Recover corrupted family tree

From Gramps
Revision as of 21:56, 21 June 2009 by Dm1407

Explanation of GRDB corruption, how to recover from it, and how to avoid it in the future.

What causes this corruption?

The leading cause of grdb corruption is moving the grdb file from its original location. Whether you move the file to another directory, rename it, copy it to another file, or transfer it to another machine or another user account, all of these will "corrupt" the file.

The problem is that each grdb file needs its database environment -- a directory with log files, lock files, temp files, etc. GRAMPS 2.2.x releases use grdb files and store the environment for each file in a directory tree under ~/.gramps/env. If your grdb file is /home/user/genealogy/MyData.grdb, then its environment is in the /home/user/.gramps/env/home/user/genealogy/MyData.grdb directory.
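The path rule above can be expressed as a small shell function. This is a sketch assuming the 2.2.x layout described here (the environment lives under ~/.gramps/env followed by the file's absolute path); `grdb_env_dir` is a hypothetical name, and symbolic links in the path are not resolved.

```shell
#!/bin/sh
# Hypothetical helper: print the GRAMPS 2.2.x environment directory that
# belongs to a .grdb file, assuming the ~/.gramps/env/<absolute path> layout.
grdb_env_dir() {
    case "$1" in
        /*) abs=$1 ;;                 # already an absolute path
        *)  abs="$PWD/${1#./}" ;;     # relative: prefix the current directory
    esac
    printf '%s/.gramps/env%s\n' "$HOME" "$abs"
}

# With HOME=/home/user this prints
# /home/user/.gramps/env/home/user/genealogy/MyData.grdb
grdb_env_dir /home/user/genealogy/MyData.grdb
```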

Moving, copying, or renaming the file carries over the file's bytes, but not its environment. This is why the moved file appears corrupted.

Another cause can be an upgrade or downgrade of your operating system that brings in a bsddb (Berkeley DB) backend that does not fully support the previous format of the database (e.g., changed hash versions). This also looks like corruption in GRAMPS, but it actually means the Berkeley DB tools must be used to convert the data to the new version.

In GRAMPS 3.0.x, an error saying that a /tmp/... file cannot be opened, when you open a grdb file, indicates database corruption. GRAMPS 3.0.x copies the grdb file you want to open into the /tmp directory and then opens that copy, so any failure is reported as '/tmp/tmpxxxxx could not be opened'.

What do I do now?

The answer depends on whether or not you still have the environment for that database. If you only copied the file and have not modified the original since, the original environment may still work for the copy. If you have modified the original database since then, its environment has changed and there is no good environment for the new file. If you removed your .gramps directory (why oh why?), then all environments are lost. Choose the matching case below.

The environment still exists

If you still have the environment directory for the original file, copy it to the environment location that matches the new file's path, following the layout described above.

Example
You copied /home/user/genealogy/MyData.grdb to /home/user/genealogy/backup/BackupData.grdb and the new file is not working.
Solution
Copy the /home/user/.gramps/env/home/user/genealogy/MyData.grdb directory to /home/user/.gramps/env/home/user/genealogy/backup/BackupData.grdb; this should fix the problem.
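A sketch of that copy step as a shell function. `copy_grdb_env` is a hypothetical helper, not part of GRAMPS. It first moves aside any environment GRAMPS may already have created for the copy, because `cp -R src dst` nests `src` inside `dst` when `dst` already exists, which would leave the copy's environment in the wrong place.

```shell
#!/bin/sh
# Hypothetical helper: copy_grdb_env SRC_ENV DST_ENV
# Copies the environment directory of the original .grdb file to the
# environment location of the copied file.
copy_grdb_env() {
    src_env=$1
    dst_env=$2
    [ -d "$src_env" ] || { echo "no environment at $src_env" >&2; return 1; }
    # If an environment already exists for the copy, keep it aside instead
    # of letting cp nest the source directory inside it.
    if [ -e "$dst_env" ]; then
        mv "$dst_env" "$dst_env.old.$$" || return 1
    fi
    mkdir -p "$(dirname "$dst_env")" || return 1
    # dst_env does not exist at this point, so cp creates it as a copy.
    cp -Rp "$src_env" "$dst_env"
}

# The example above:
# copy_grdb_env /home/user/.gramps/env/home/user/genealogy/MyData.grdb \
#               /home/user/.gramps/env/home/user/genealogy/backup/BackupData.grdb
```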

The environment is lost

If you don't have the original environment for that file, you may try dumping and loading your data using the Berkeley DB tools. Depending on your system, they may be called db_dump and db_load, db41_dump and db41_load, db4.4_dump and db4.4_load, ... On Ubuntu you will find them in the db4.4-util package. You may need a more recent version, depending on the Berkeley DB version your distribution's Python package uses; for files created on Ubuntu Hardy, for example, you will need db4.6-util. Whatever they are called, there should be a dump tool and a load tool, and they should be version 4 or later.
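To see which of these tools are installed, a loop like the following checks candidate names with `command -v`. The names are only the ones mentioned on this page; your distribution may use others. The Berkeley DB utilities also accept `-V` to print their library version, which helps pick the right pair.

```shell
#!/bin/sh
# Hypothetical helper: print the Berkeley DB dump/load tools found in PATH.
# Candidate suffixes: "" (db_dump), 4.6, 4.5, 4.4 and 41 (db41_dump).
find_bdb_tools() {
    for ver in "" 4.6 4.5 4.4 41; do
        for kind in dump load; do
            tool="db${ver}_${kind}"
            if command -v "$tool" >/dev/null 2>&1; then
                printf '%s\n' "$tool"
            fi
        done
    done
}

find_bdb_tools
```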

Basically, you just dump the grdb into a text file, then create a new grdb from that text file:

   $ db4.4_dump BackupData.grdb > somefile.txt
   $ db4.4_load newfile.grdb < somefile.txt

and then cross your fingers and hope that newfile.grdb will open in GRAMPS.

If you get the error:

db4.4_dump: eidtrans: unsupported hash version: 9

this indicates that you need a more recent version of the tools, so use the db4.6 tools instead:

   $ db4.6_dump BackupData.grdb > somefile.txt
   $ db4.6_load newfile.grdb < somefile.txt
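The "try 4.4, fall back to 4.6" procedure can be scripted. This is a sketch: `dump_and_reload` is a hypothetical helper, it assumes the tools are named `db4.4_*` and `db4.6_*` as above, and it keeps the text dump next to the new file as `<new file>.txt`.

```shell
#!/bin/sh
# Hypothetical helper: dump_and_reload OLD.grdb NEW.grdb
# Tries the 4.4 tools first and falls back to 4.6 if the dump or load
# fails (for example with "unsupported hash version: 9").
dump_and_reload() {
    old=$1
    new=$2
    if [ -e "$new" ]; then
        echo "refusing to overwrite $new" >&2
        return 1
    fi
    for ver in 4.4 4.6; do
        dump="db${ver}_dump"
        load="db${ver}_load"
        # Skip versions whose tools are not installed.
        command -v "$dump" >/dev/null 2>&1 || continue
        command -v "$load" >/dev/null 2>&1 || continue
        if "$dump" "$old" > "$new.txt" && "$load" "$new" < "$new.txt"; then
            echo "reloaded with db$ver tools"
            return 0
        fi
        rm -f "$new" "$new.txt"   # clean up before trying the next version
    done
    echo "no installed dump/load pair could read $old" >&2
    return 1
}

# dump_and_reload BackupData.grdb newfile.grdb
```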

Note: If you downgrade your distribution, you may need to dump with the 4.6 tools and load with the 4.4 or 4.5 tools.
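For that downgrade case the dump and load versions differ. A minimal sketch, with `cross_reload` as a hypothetical helper that takes the two version suffixes as arguments and refuses to write into an existing file (db_load loads into an existing database rather than replacing it).

```shell
#!/bin/sh
# Hypothetical helper: cross_reload DUMP_VER LOAD_VER OLD.grdb NEW.grdb
# Dumps with one Berkeley DB tool version and loads with another.
cross_reload() {
    dump="db$1_dump"
    load="db$2_load"
    old=$3
    new=$4
    # Start from a fresh file so no old records get mixed in.
    if [ -e "$new" ]; then
        echo "refusing to overwrite $new" >&2
        return 1
    fi
    "$dump" "$old" > "$new.txt" || return 1   # text dump, kept for inspection
    "$load" "$new" < "$new.txt"
}

# Downgrade example with the file names used above:
# cross_reload 4.6 4.4 BackupData.grdb newfile.grdb
```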

How to prevent corruption?

While moving the file is the leading cause of corruption, there appear to be other, less frequent causes that are not fully understood. So corruption cannot always be prevented.

What you can do, though, is back up the data regularly. The backups should be in XML format (the .gramps format). XML is machine- and human-readable. It is completely self-sufficient. It is also small. Good backup practices are:

  1. Export to XML from time to time, especially after large edits.
  2. Export to XML before making big changes, such as importing new data into an existing database from e.g. GEDCOM, merging records, running tools that may heavily modify the data, etc.
  3. Export to XML before upgrading GRAMPS to a newer version. In other words, export to XML with the old version before you install the new one!
  4. Export to XML before upgrading your OS.
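Regular exports can be scripted with the GRAMPS command line. This sketch assumes the GRAMPS 3.x options `-O` (open a family tree by the name shown by `gramps -l`) and `-e` (export to a file, with the format guessed from the `.gramps` extension); check `gramps --help` for the options of your version. `backup_tree` is a hypothetical helper that writes a time-stamped file so older backups are never overwritten.

```shell
#!/bin/sh
# Hypothetical helper: backup_tree "TREE NAME" BACKUP_DIR
# Exports a family tree to a time-stamped .gramps (XML) file and prints its path.
backup_tree() {
    tree=$1
    dir=$2
    mkdir -p "$dir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    # Replace characters that are awkward in file names with underscores.
    safe=$(printf '%s' "$tree" | tr -c 'A-Za-z0-9._-' '_')
    out="$dir/$safe-$stamp.gramps"
    gramps -O "$tree" -e "$out" || return 1
    [ -s "$out" ] || { echo "export produced no file: $out" >&2; return 1; }
    printf '%s\n' "$out"
}

# backup_tree "Family Tree 1" "$HOME/gramps-backups"
```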

Also, use XML format for any data migration. Moving to another machine, sending data to grandma, copying to another user on the same machine -- all of these cases should use XML.

Can't you guys solve this?

Starting with GRAMPS 3.0, this has been completely reworked using the simpler Family Tree Manager.

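However, a DB_RUNRECOVERY error can still happen. If it does, you may try the following.

List your family trees to find the <target directory> of the damaged tree in ~/.gramps/grampsdb:

   $ gramps -l

Preserve the old data by copying that directory elsewhere first:

   $ cp -r ~/.gramps/grampsdb/<target directory> <backup directory>

Then run catastrophic recovery from inside the tree's directory, using the recover tool that matches your Berkeley DB version:

   $ cd /home/<user>/.gramps/grampsdb/<target directory>
   $ db4.6_recover -c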
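The backup and recovery steps can be combined so that recovery never runs without a fresh copy of the tree. A sketch with a hypothetical `recover_tree` helper: it passes the directory to db_recover with its `-h` option instead of changing into it, and defaults to the 4.6 tool name used above.

```shell
#!/bin/sh
# Hypothetical helper: recover_tree TREE_DIR BACKUP_DIR [BDB_VERSION]
# Copies the family tree directory aside, then runs catastrophic recovery.
recover_tree() {
    tree_dir=$1
    backup_dir=$2
    ver=${3:-4.6}                    # default matches the commands above
    recover="db${ver}_recover"
    [ -d "$tree_dir" ] || { echo "not a directory: $tree_dir" >&2; return 1; }
    command -v "$recover" >/dev/null 2>&1 || { echo "$recover not found" >&2; return 1; }
    backup="$backup_dir/$(basename "$tree_dir").$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    cp -Rp "$tree_dir" "$backup" || return 1   # preserve the old data first
    "$recover" -c -h "$tree_dir"               # -c: catastrophic recovery
}

# recover_tree ~/.gramps/grampsdb/<target directory> ~/gramps-backups
```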