My recent work on the gramps database objects has led me to reflect upon their structure from an abstract data type point of view. I’m hoping that this blog entry will start a discussion that will ultimately lead to the development of a true ADT-based structure for gramps databases with multiple implementations.
Current structure using BSDDB
The current implementation of gramps uses BSDDB as the storage engine. This has served us well and provides an efficient back end with good Python wrappers to access it. (For those wanting more detail, see the official documentation.) Gramps today is tightly bound to BSDDB. This is not optimal and could make porting to a different engine troublesome and tricky. On the other hand, it is helpful to examine the structure and use the insight gained from that examination to design good ADTs for future implementations.
As most readers will know, a gramps database is actually a collection of BSDDB databases of various types that live together in a subdirectory in some filesystem. There are eight primary BSDDBs, one for each primary object type (Person, Family, Event, Place, Source, MediaObject, Repository and Note). These are keyed by a program-generated hash that is called a handle internally. Mirroring these are eight secondary databases that are indexed by gramps id and point (via handle) to the primary objects. Additionally, there are databases to track cross-references and other things.
Within the BSDDB world, the subdirectory is an important entity in its own right. It is considered to be the environment within which the underlying databases function. This environment provides centralized control over transactions and serialization among other things.
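To make the primary/secondary split concrete, here is a minimal sketch in which plain Python dictionaries stand in for the BSDDB tables. The names person_map, person_id_index, add_person and get_person_by_gramps_id are made up for illustration and are not taken from the gramps source; the handle and gramps id values are invented too.

```python
# Illustrative only: plain dicts stand in for the BSDDB tables.
person_map = {}        # primary table: handle -> person data
person_id_index = {}   # secondary table: gramps id -> handle


def add_person(handle, gramps_id, data):
    """Store the object under its handle and index it by gramps id."""
    person_map[handle] = data
    person_id_index[gramps_id] = handle


def get_person_by_gramps_id(gramps_id):
    """Follow the secondary index to the handle, then to the object."""
    handle = person_id_index[gramps_id]
    return person_map[handle]


add_person("a1b2c3", "I0001", {"name": "Ada"})
found = get_person_by_gramps_id("I0001")
```

The point of the sketch is only that the secondary table never holds the object itself; it holds the handle, and the handle is the single key into the primary table.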
Requirements for gramps ADTs
Here is my initial take on requirements for ADTs for gramps:
- High-level ADT
  - Groups underlying ADTs that together comprise a gramps database
  - Provides transaction, logging and serialization methods
- Object-specific ADT
  - Handles data and methods for a gramps object type
  - Uses Python dictionary methods for access
  - Automatically performs appropriate transaction processing, logging and serialization using high-level ADT methods
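One possible way to write these requirements down is as a pair of abstract classes. This is only a sketch for discussion: the class names HighLevelDb and ObjectTable and all of the method names are my own placeholders, not existing gramps code.

```python
from abc import ABC, abstractmethod
from collections.abc import MutableMapping


class HighLevelDb(ABC):
    """Groups the object tables that together make up one gramps database."""

    @abstractmethod
    def table(self, name):
        """Return the ObjectTable for an object type such as 'person'."""

    @abstractmethod
    def begin(self):
        """Start a transaction."""

    @abstractmethod
    def commit(self):
        """Make the changes of the current transaction permanent."""

    @abstractmethod
    def abort(self):
        """Discard the changes of the current transaction."""

    @abstractmethod
    def log(self, table_name, handle, old, new):
        """Record a change to one object."""

    @abstractmethod
    def serialize(self, obj):
        """Turn a gramps object into bytes for storage."""

    @abstractmethod
    def deserialize(self, raw):
        """Turn stored bytes back into a gramps object."""


class ObjectTable(MutableMapping):
    """Dictionary-style access to one object type.

    Concrete subclasses supply __getitem__, __setitem__, __delitem__,
    __iter__ and __len__, and are expected to call back into `db`
    for transactions, logging and serialization.
    """

    def __init__(self, db, name):
        self.db = db
        self.name = name
```

Deriving the object-specific class from MutableMapping means a back end only has to implement five methods and gets the rest of the dictionary interface (get, items, update and so on) for free.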
First attempt at implementation of ADTs
I’ve been building prototype ADT implementations that attempt to fulfill these requirements for the current BSDDB structure. I’m working with six classes:
- MyEnv: Manages the BSDDB environment; provides transaction support
- GenDb: Generic database type implementing Python dictionary methods
- MyDb: BSDDB DB type (derives from bsddb.dbobj.DB and GenDb)
- MyDbShelf: BSDDB DBShelf type (derives from bsddb.dbshelve.DBShelf and GenDb)
- MyTxn: Context manager for transactions (used by MyEnv)
- MyCursor: Context manager for cursors
Though I’m still in the (very) early stages with these classes, they have already allowed for some simplifications. For example, today the get_person_by_handle method calls get_by_handle, specifying the actual database and handle. Within the new structure, this can simply be person[handle] if “person” is a MyDbShelf object for the person database. The details of how the handle is looked up and the person object returned are hidden by the class.
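To show the kind of hiding I mean, here is a small sketch of a shelf-style class. It is not the prototype itself: ShelfSketch is a made-up name, a plain dict of bytes stands in for the BSDDB table, and pickle is assumed as the serialization format purely for illustration.

```python
import pickle


class ShelfSketch:
    """Hypothetical shelf-style table: dict access over a store of bytes."""

    def __init__(self, store=None):
        # `store` maps handle -> pickled bytes; a dict stands in for BSDDB.
        self._store = {} if store is None else store

    def __getitem__(self, handle):
        # Look up the raw bytes and rebuild the object for the caller.
        return pickle.loads(self._store[handle])

    def __setitem__(self, handle, obj):
        # Serialize on the way in, so callers never see the stored form.
        self._store[handle] = pickle.dumps(obj)

    def __contains__(self, handle):
        return handle in self._store


person = ShelfSketch()
person["a1b2c3"] = {"gramps_id": "I0001", "name": "Ada"}
name = person["a1b2c3"]["name"]
```

The caller writes person[handle] and gets an object back; whether the bytes came from BSDDB, a dict or something else is a detail of the class.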
It should in principle be very easy to implement the same classes using regular Python dictionaries instead of BSDDB databases. This would allow for working with non-persistent databases entirely in RAM — a possible boon for testing new modules or isolating odd troubles.
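As a rough sketch of what a RAM-only variant could look like, the following uses regular dictionaries for the tables and a context manager for transactions. MemTable, MemEnv and the table names are placeholders of mine, and the rollback is done by copying every table up front, which is fine for small test data but is certainly not how a real back end would do it.

```python
import copy
from collections.abc import MutableMapping
from contextlib import contextmanager


class MemTable(MutableMapping):
    """Dictionary-backed stand-in for one gramps object table."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, handle):
        return self._data[handle]

    def __setitem__(self, handle, obj):
        self._data[handle] = obj

    def __delitem__(self, handle):
        del self._data[handle]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class MemEnv:
    """Groups the tables and provides all-or-nothing transactions."""

    TABLE_NAMES = ("person", "family", "event", "place",
                   "source", "media", "repository", "note")

    def __init__(self):
        self.tables = {name: MemTable() for name in self.TABLE_NAMES}

    @contextmanager
    def transaction(self):
        # Snapshot every table; restore the snapshot if the block raises.
        snapshot = {name: copy.deepcopy(table._data)
                    for name, table in self.tables.items()}
        try:
            yield self
        except Exception:
            for name, data in snapshot.items():
                self.tables[name]._data = data
            raise


env = MemEnv()
person = env.tables["person"]

with env.transaction():
    person["h1"] = {"name": "Ada"}

try:
    with env.transaction():
        person["h2"] = {"name": "Bob"}
        raise ValueError("simulated failure")
except ValueError:
    pass  # the write of "h2" has been rolled back
```

A test module could build a MemEnv, run the code under test against it, and throw it away afterwards without touching the filesystem.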
More Discussion Needed
I hope by this posting to spur others on to think more about defining ADTs for gramps with an eye to implementing them for other back ends. Please add your thoughts.