Restructuring the database objects: Part I
The classes and methods that manage the gramps database are at the heart of every operation in gramps — from importing, entering and editing data to producing reports and exporting data to other formats. I recently became intensely interested in the operation of this part of the codebase, and began a project to understand it and streamline it where possible. This article is the first in a series I plan to write about that work.
The gramps database is made up of several files typically stored under the .gramps./grampsdb/
Today, most of the work of managing this structure is done by two modules: base.py and dbdir.py, both in the src/gen/db directory of the source tree. These are rich and complicated modules containing several classes and many, many methods. As I began to study these modules, I saw that they were deeply interconnected (largely due to the history of development) and that they contained important subfunctions alongside the basic methods to retrieve, add, modify and delete data. I then began to look for ways to consolidate similar functionality and factor the code by major function group.
Note: None of these changes have been committed to SVN yet. I want to build and run many unit tests as well as integration tests before I do that, though I might commit some of the independent (and currently unused) new modules for safe keeping. Meanwhile I just rsync them between two different machines.
Transactions:
The first major subfunction I worked on was transaction processing. When gramps is ready to write changes to the database, it does not do so directly. A simple change made via one of the editors may actually affect several of the files in the database structure. To protect those files, the changes are grouped into a transaction. Once all the changes are gathered into the transaction, they are then committed together. Also, the original change is saved for possible undoing later on.
The code for saving and eventually writing the data to the database is currently split between a Transaction class and methods in dbdir.py and base.py. I realized that these methods could be brought together in one class that could then be placed in its own module for safekeeping. Thus was the GrampsDbTxn class born.
The GrampsDbTxn class handles all the functions of storing transactions and writing them to the database. It also contains a level of abstraction in that it does not matter what the underlying database access method is, so long as it supports put() and delete() methods as well as (possibly null-operation) methods for starting a database-access-method transaction and committing it later. Today, gramps uses BSDDB as our DBMS and it contains a (lower-level) transaction function similar to the one GrampsDbTxn provides for a gramps database.
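To illustrate the abstraction described above, here is a minimal sketch of code that relies only on put(), delete() and a begin/commit pair. The names DictBackend, apply_changes, txn_begin and txn_commit are hypothetical stand-ins chosen for this example; they are not the actual GrampsDbTxn interface.

```python
class DictBackend:
    """Hypothetical in-memory stand-in for a database access method."""

    def __init__(self):
        self.data = {}

    def txn_begin(self):
        # Null operation: a plain dict has no transactions of its own.
        return None

    def txn_commit(self, txn):
        # Null operation, for the same reason.
        pass

    def put(self, key, value, txn=None):
        self.data[key] = value

    def delete(self, key, txn=None):
        del self.data[key]


def apply_changes(backend, puts, deletes):
    """Write a batch of changes using only the four methods above."""
    txn = backend.txn_begin()
    for key, value in puts:
        backend.put(key, value, txn=txn)
    for key in deletes:
        backend.delete(key, txn=txn)
    backend.txn_commit(txn)


backend = DictBackend()
apply_changes(backend, [("I0001", "Alice"), ("I0002", "Bob")], [])
apply_changes(backend, [], ["I0002"])
```

Because apply_changes never looks inside the backend, any object offering the same four methods could be swapped in, which is the point of the abstraction.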
To begin a gramps transaction, one calls the transaction_begin method of the database, usually like this:
transaction = self.db.transaction_begin()
Then, as changes are made, calls are made to add those changes to the currently-active transaction, usually like this:
transaction.add(…)
However, due to the interconnectedness of the gramps database files, you will see modules calling commit_person() or commit_family() or remove_person() etc., which in turn eventually call transaction.add().
Later, when all the changes that are to be made are done, the transaction.commit() method is called, which does several things:
1. Begins a DBMS-level transaction
2. Adds any new records
3. Updates any existing records
4. Calls any signal handlers to let them know about the changes
5. Deletes any records that are so marked
6. Updates the cross-reference files
7. Commits the DBMS-level transaction
Note: It is important that step 4 precede step 5, to give the signal handlers the opportunity to inspect the to-be-deleted data before it is gone.
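The ordering of the commit steps can be sketched roughly as follows. This is only an illustration under assumptions: the store is a plain dict, the DBMS-level transaction and the cross-reference update are represented by log entries, and the names commit, on_change, ADD, UPDATE and DELETE are invented for the example rather than taken from the gramps code.

```python
ADD, UPDATE, DELETE = "add", "update", "delete"


def commit(store, changes, handlers, log):
    """Apply (kind, key, value) changes to store in the order listed above."""
    log.append("begin")                      # begin the DBMS-level transaction
    for kind, key, value in changes:         # add new records
        if kind == ADD:
            store[key] = value
    for kind, key, value in changes:         # update existing records
        if kind == UPDATE:
            store[key] = value
    for handler in handlers:                 # notify signal handlers first...
        handler(changes, store)
    for kind, key, _value in changes:        # ...then delete marked records
        if kind == DELETE:
            del store[key]
    log.append("xref")                       # placeholder: cross-reference update
    log.append("commit")                     # commit the DBMS-level transaction


seen = []


def on_change(changes, store):
    # The handler can still read records that are about to be deleted.
    for kind, key, _value in changes:
        if kind == DELETE:
            seen.append(store[key])


store = {"I0001": "Alice", "I0002": "Bob"}
log = []
commit(
    store,
    [(ADD, "I0003", "Carol"), (UPDATE, "I0001", "Alicia"), (DELETE, "I0002", None)],
    [on_change],
    log,
)
```

Had the deletions run before the handlers, on_change would have raised a KeyError when looking up the removed record.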
Oh, I almost forgot to mention that, when transaction.add() is called, the information regarding the pending operation is also written to the undo database (a separate file in the gramps subdirectory) so that the transaction can be undone (and later redone!) if necessary. (I plan to write more about the undo/redo operations in a later installment.)
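As a rough idea of how recording each pending operation makes undo and redo possible, here is a small sketch. It is purely hypothetical: the UndoLog class, its in-memory list and the (key, old, new) entry layout are assumptions made for illustration, and the real undo database is a separate file whose format is not described here.

```python
class UndoLog:
    """Hypothetical in-memory undo log; None stands for 'record absent'."""

    def __init__(self):
        self.entries = []

    def record(self, key, old, new):
        # Called alongside each pending operation.
        self.entries.append((key, old, new))

    def _apply(self, store, key, value):
        if value is None:
            store.pop(key, None)
        else:
            store[key] = value

    def undo(self, store):
        # Restore old values, most recent change first.
        for key, old, _new in reversed(self.entries):
            self._apply(store, key, old)

    def redo(self, store):
        # Reapply new values in the original order.
        for key, _old, new in self.entries:
            self._apply(store, key, new)


store = {"I0001": "Alice"}
undo_log = UndoLog()

undo_log.record("I0001", "Alice", "Alicia")   # an update
store["I0001"] = "Alicia"
undo_log.record("I0002", None, "Bob")         # an addition
store["I0002"] = "Bob"

undo_log.undo(store)
after_undo = dict(store)
undo_log.redo(store)
after_redo = dict(store)
```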
A BSDDB wrapper:
As mentioned above, BSDDB contains a transaction mechanism of begin…commit. It is actually a natural mechanism for the Python “with” statement which uses context-managers. Unfortunately, these are not directly supported by BSDDB, so I built a little wrapper class for the purpose. Today, you would need to write:
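from bsddb import db
env = db.DBEnv()
…
txn = env.txn_begin()    # returns a DBMS-level transaction handle
…
table.put(key, data, txn=txn)    # table: an open db.DB handle (setup elided)
table.delete(key, txn=txn)
…
txn.commit()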
With the new wrapper class, you can write:
with BSDDBtxn(..) as txn:
    txn.put()
    txn.delete()
and the commit method is called automatically when you fall off the end of the “with” statement (unless you call txn.abort() in the middle). The result is more compact and robust code. I plan to exploit this wherever possible.
While I was at it, I also added context management to the GrampsDbTxn class, so it is possible to write:
with GrampsDbTxn(…) as txn:
    self.db.commit_person()
    …etc..
and the GrampsDbTxn commit method is automatically called at the end.
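For readers unfamiliar with context managers, the pattern behind both wrappers might look something like the sketch below. It is not the actual wrapper code: FakeEnv and FakeTxn are hypothetical stand-ins for the DBMS environment and its transaction handle, TxnWrapper is an invented name, and aborting when an exception escapes the block is an assumption of this example.

```python
class FakeTxn:
    """Hypothetical stand-in for a DBMS-level transaction handle."""

    def __init__(self):
        self.state = "active"

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


class FakeEnv:
    """Hypothetical stand-in for the DBMS environment."""

    def __init__(self):
        self.txns = []

    def txn_begin(self):
        txn = FakeTxn()
        self.txns.append(txn)
        return txn


class TxnWrapper:
    """Begin a transaction on entry, commit it on a clean exit."""

    def __init__(self, env, table):
        self.env = env
        self.table = table   # a plain dict here; it does not roll back
        self.txn = None

    def __enter__(self):
        self.txn = self.env.txn_begin()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.txn is not None:          # skipped if abort() was called
            if exc_type is None:
                self.txn.commit()
            else:
                self.txn.abort()          # assumption: abort on error
            self.txn = None
        return False                      # never swallow exceptions

    def put(self, key, value):
        self.table[key] = value

    def delete(self, key):
        del self.table[key]

    def abort(self):
        self.txn.abort()
        self.txn = None


env = FakeEnv()
table = {}

with TxnWrapper(env, table) as txn:
    txn.put("I0001", "Alice")
    txn.put("I0002", "Bob")
    txn.delete("I0002")

with TxnWrapper(env, table) as txn:
    txn.abort()
```

The caller never writes the commit call, so it cannot be forgotten on any exit path out of the block.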
Well, that’s surely enough for now! Look for Part II, which will examine the undo database.
Benny
11 August 2009 @ 8:41 pm
Nice work.
A small comment. I would prefer things in gen are not named GrampsSomething, but instead GenSomething or just Something.
The idea is that the gen part of GRAMPS can be shipped independent of the GRAMPS application.
Gerald Britton
11 August 2009 @ 8:57 pm
Just following existing convention here. base.py has GrampsDbBase; dbdir.py has GrampsDbDir. Hence GrampsDbTxn which lives in the same subdirectory (src/gen/db) as the other two. Of course, we can change them as well with only a little disruption, but they are so specific to the gramps application and the gramps database layout I can’t possibly imagine how they could be used independently.