GEPS 032: Database Backend API
This proposal defines a complete Database Backend API so that we can have plug-in replacements for BSDDB. This would allow the use of other databases.
This idea is refined from GEPS 010: Relational Backend, but without the relational components.
Create a Database Backend plugin. Create a functioning BSDDB plugin that is used by default. Identify which plugin to use via a file in the database directory.
This work is underway on the branch geps/gep-032-database-backend.
Identify the items that:
- need to be fixed
- are too BSDDB specific (some tools?)
- could be abstracted away from details
Step one is completed.
Develop alternative plugins that support the db API.
Exactly what is required for a class to implement a fully functional database for Gramps can currently be determined only by examining the existing BSDDB class. This involves the following components:
- data and metadata update, add, and delete
- transactions for batch or atomic changes
- signal handling
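The three components above could be captured in a small abstract interface. The following is only a sketch under stated assumptions: the class names `BackendSketch` and `MemoryBackend`, the method signatures, and the signal naming are hypothetical and are not taken from the existing BSDDB class. The transaction here only defers signals until the block finishes; it does not roll back data.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager


class BackendSketch(ABC):
    """Hypothetical minimal interface (not the real Gramps class)."""

    def __init__(self):
        self._callbacks = {}   # signal name -> list of callables
        self._pending = None   # signals queued while inside a transaction

    # --- data add, update, and delete ---
    @abstractmethod
    def add(self, table, handle, data): ...

    @abstractmethod
    def update(self, table, handle, data): ...

    @abstractmethod
    def delete(self, table, handle): ...

    # --- signal handling ---
    def connect(self, signal, callback):
        self._callbacks.setdefault(signal, []).append(callback)

    def emit(self, signal, handle):
        if self._pending is not None:
            # Inside a transaction: hold the signal until the block ends.
            self._pending.append((signal, handle))
            return
        for callback in self._callbacks.get(signal, []):
            callback(handle)

    # --- transactions for batch or atomic changes ---
    @contextmanager
    def transaction(self):
        self._pending = []
        try:
            yield
        except Exception:
            self._pending = None   # drop queued signals on failure
            raise
        queued, self._pending = self._pending, None
        for signal, handle in queued:
            self.emit(signal, handle)


class MemoryBackend(BackendSketch):
    """Toy implementation that keeps every table in a dict."""

    def __init__(self):
        super().__init__()
        self.tables = {}

    def add(self, table, handle, data):
        self.tables.setdefault(table, {})[handle] = data
        self.emit(table + "-add", handle)

    def update(self, table, handle, data):
        self.tables[table][handle] = data
        self.emit(table + "-update", handle)

    def delete(self, table, handle):
        del self.tables[table][handle]
        self.emit(table + "-delete", handle)
```

A listener connected with `connect("person-add", callback)` would be called once per added person, or after the whole batch when the additions happen inside `with db.transaction():`.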
Once a full Gramps Database class is created, there needs to be a way of:
- selecting which backend to use for new databases [DONE]
- selecting the database to load (Family Tree Manager) [DONE]
We will use the directory structure, as we do now. In each directory, the type of database needs to be identified. This could be done in two ways:
- well-defined database backend types. These could be registered, like any plugin/addon. [DONE]
- self-contained backends. Is there really any reason for Gramps itself to contain the code for the db backend? All that is necessary is for the backend to create the Database instance.
It makes sense that we will reuse and share the backends, so we should use option 1, and develop a database backend plugin type.
Because the database layer has so many functions, an extensive test suite should be created that exercises all of them to ensure that a backend works correctly.
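One possible way to organize such testing is a shared "contract" test class that every backend must pass. This is a minimal sketch, assuming a simplified backend with the hypothetical methods `add_person`, `get_person`, `remove_person`, and `count_people`; these signatures are illustrative and are not the real Gramps database API.

```python
import unittest


class ToyDictBackend:
    """Stand-in backend used only to make this example runnable."""

    def __init__(self):
        self._people = {}

    def add_person(self, handle, data):
        self._people[handle] = data

    def get_person(self, handle):
        return self._people.get(handle)

    def remove_person(self, handle):
        self._people.pop(handle, None)

    def count_people(self):
        return len(self._people)


class BackendContract:
    """Tests every backend must pass; subclasses supply make_backend()."""

    def make_backend(self):
        raise NotImplementedError

    def setUp(self):
        self.db = self.make_backend()

    def test_add_then_get(self):
        self.db.add_person("h1", {"name": "Ann"})
        self.assertEqual(self.db.get_person("h1"), {"name": "Ann"})

    def test_remove(self):
        self.db.add_person("h1", {"name": "Ann"})
        self.db.remove_person("h1")
        self.assertIsNone(self.db.get_person("h1"))
        self.assertEqual(self.db.count_people(), 0)


class ToyDictBackendTest(BackendContract, unittest.TestCase):
    """Runs the shared contract against one concrete backend."""

    def make_backend(self):
        return ToyDictBackend()
```

Each new backend would then need only a three-line subclass that returns its own instance from `make_backend()`.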
We should factor out all BSDDB dependencies, and make BSDDB the first database backend plugin. The plugin API should include functions for:
- making a new database, given a directory [DONE]
- loading the database, given the directory [DONE]
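The two functions above, combined with a file that identifies the backend, might look roughly like this. It is a sketch only: the marker file name `database.txt`, the registry, and the function names are assumptions made for illustration. A directory without a marker file is treated as BSDDB, matching the stated default.

```python
import os

# Assumed marker file name; the proposal only says "a file in the
# database directory".
MARKER_FILE = "database.txt"
DEFAULT_BACKEND = "bsddb"

_BACKENDS = {}  # backend id -> factory taking a directory


def register_backend(backend_id, factory):
    """Register a callable that builds a database for a directory."""
    _BACKENDS[backend_id] = factory


def make_new_database(directory, backend_id=DEFAULT_BACKEND):
    """Create the directory, record the backend id, return a database."""
    os.makedirs(directory, exist_ok=True)
    marker = os.path.join(directory, MARKER_FILE)
    with open(marker, "w", encoding="utf-8") as fp:
        fp.write(backend_id)
    return _BACKENDS[backend_id](directory)


def load_database(directory):
    """Read the marker file (default: bsddb) and return a database."""
    marker = os.path.join(directory, MARKER_FILE)
    backend_id = DEFAULT_BACKEND
    if os.path.exists(marker):
        with open(marker, encoding="utf-8") as fp:
            backend_id = fp.read().strip() or DEFAULT_BACKEND
    return _BACKENDS[backend_id](directory)
```

Keeping the backend id in a plain text file means the Family Tree Manager can decide which plugin to load without opening the database itself.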
Other things that might need to be changed:
Listing the databases (-l and -L). That might include changing the current listing: [DONE]
Family Tree "test_family, gramps40":
   Bsddb version: (5, 1, 29)
   Last accessed: 11/17/2013 08:47:12 AM
   Locked?: no
   Number of people: Unknown
   Path: /home/dblank/.gramps/grampsdb/528787d0
   Schema version: Unknown
DjangoDb and DictionaryDb are largely complete.
Develop a fully-tested alternative to BSDDB.
The proposal is to develop a DB-API 2.0 database backend, testing with sqlite.
The database directory will contain a small initialization program that creates the database and returns a class that can create the connection.
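As an illustration of that idea, an initialization routine for sqlite could look like the sketch below. The names `SqliteConnector` and `initialize`, the file name `sqlite.db`, and the one-table schema are all assumptions for the example. It returns an object bound to the directory rather than a bare class, which is one possible reading of the text.

```python
import os
import sqlite3


class SqliteConnector:
    """Knows how to open a DB-API 2.0 connection for one directory."""

    def __init__(self, directory, filename="sqlite.db"):  # assumed name
        self.path = os.path.join(directory, filename)

    def connect(self):
        return sqlite3.connect(self.path)


def initialize(directory):
    """Create the database file and schema, then return the connector."""
    os.makedirs(directory, exist_ok=True)
    connector = SqliteConnector(directory)
    connection = connector.connect()
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "CREATE TABLE IF NOT EXISTS person "
                "(handle TEXT PRIMARY KEY, blob_data BLOB)"
            )
    finally:
        connection.close()
    return connector
```

Because only `connect()` is backend-specific, swapping sqlite for another DB-API 2.0 driver would mostly mean replacing the connector.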
Because BSDDB did not support transactions natively and instead required a two-step transactional system, Gramps developed this support in Python. Most database systems have transactions built in, so most database backends will probably not use transactions the way BSDDB does.
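To illustrate what built-in transactions look like, here is a small sketch using Python's standard `sqlite3` module, where the connection acts as a context manager that commits on success and rolls back on an exception. The table layout and the helper names are hypothetical.

```python
import sqlite3


def open_demo_db():
    """Open an in-memory database with a toy person table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
    conn.commit()
    return conn


def add_people(conn, people):
    """Insert all rows atomically: either every row is stored or none."""
    with conn:  # commit on success, rollback if an exception escapes
        conn.executemany("INSERT INTO person VALUES (?, ?)", people)


def count_people(conn):
    return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

If a batch contains a duplicate handle, `sqlite3.IntegrityError` is raised and the rows inserted earlier in the same batch are rolled back by the database itself, with no Python-level transaction log.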
There could still be uses for the transactions, however. For example, we can use them as an abstraction for the History undo/redo. The current system only exists in the current session and is limited, so we can probably create a better method with more features (such as diffs between versions, lifetime changes, etc).
Transactions in the Python code are ignored.
Need to hook alternative backends to History Redo/Undo.
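One backend-independent way to provide undo/redo would be to record each change as an (old, new) pair above the storage layer. The following is only a sketch of that idea using two stacks; the `History` class and its methods are hypothetical and do not describe the existing Gramps undo mechanism.

```python
class History:
    """Record changes to a dict-like store so they can be undone/redone."""

    def __init__(self, store):
        self.store = store   # any mapping: handle -> data
        self._undo = []      # stack of (handle, old_data, new_data)
        self._redo = []

    def _apply(self, handle, data):
        # None means "the object does not exist".
        if data is None:
            self.store.pop(handle, None)
        else:
            self.store[handle] = data

    def commit(self, handle, new_data):
        """Apply a change and remember how to reverse it."""
        old_data = self.store.get(handle)
        self._undo.append((handle, old_data, new_data))
        self._redo.clear()   # a new change invalidates the redo stack
        self._apply(handle, new_data)

    def undo(self):
        if not self._undo:
            return False
        handle, old_data, new_data = self._undo.pop()
        self._redo.append((handle, old_data, new_data))
        self._apply(handle, old_data)
        return True

    def redo(self):
        if not self._redo:
            return False
        handle, old_data, new_data = self._redo.pop()
        self._undo.append((handle, old_data, new_data))
        self._apply(handle, new_data)
        return True
```

Because the recorded pairs are plain data, the same log could in principle be persisted to support the longer-lived history and version diffs mentioned above.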
Some systems, like Django, have a single database used per Python session. That makes sense, given that it is designed to be a webserver with fixed settings. However, what to do if you want to use the Django ORM and switch settings?
I have implemented a solution based on the following:
import sys

class ModulesCheckpoint(object):
    def __init__(self):
        # Snapshot the modules loaded at this point.
        self.original = sys.modules.copy()

    def reset(self):
        # clear modules:
        for key in list(sys.modules.keys()):
            del sys.modules[key]
        # load previous:
        for key in self.original:
            sys.modules[key] = self.original[key]

checkpoint = ModulesCheckpoint()
# do stuff
checkpoint.reset()
# do stuff again
However, this has to be done in the right place, and can't (as far as I can see) be embedded in the imported database plugin. [DONE]
It would be nice to do away with this hack, but it is the only method I can find to unload Django. It works fine, so far.
The first step is to separate all of the gramps.gen.db code into reusable and extendable components. [DONE]
This has begun with the DictionaryDB 4972, which is an in-memory replacement for BSDDB. It still needs indexes and metadata support (gender names, bookmarks, etc). Also, the Dictionary transaction is non-existent.
Currently, the best working replacement backend is "dictionarydb". [MOSTLY COMPLETE]
We can develop backends that work directly on exported formats. Others to consider: GEDCOM and CSV. These would probably use a DbDictionary, and simply import on load and export on close (data would be lost in a power outage; access would be fast because everything is in memory, but start and stop would be slow).
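The import-on-load, export-on-close pattern can be sketched with the standard `csv` module. The class name `CsvBackedDictionary` and the two-column layout (`handle`, `name`) are assumptions chosen to keep the example short; a real backend would need to cover every object type.

```python
import csv
import os


class CsvBackedDictionary:
    """In-memory dict loaded from a CSV file and written back on close."""

    FIELDS = ["handle", "name"]  # hypothetical columns

    def __init__(self, path):
        self.path = path
        self.people = {}

    def load(self):
        """Import the whole file into memory (slow start, fast access)."""
        if os.path.exists(self.path):
            with open(self.path, newline="", encoding="utf-8") as fp:
                for row in csv.DictReader(fp):
                    self.people[row["handle"]] = row["name"]

    def close(self):
        """Export everything; changes are lost if this is never reached."""
        with open(self.path, "w", newline="", encoding="utf-8") as fp:
            writer = csv.DictWriter(fp, fieldnames=self.FIELDS)
            writer.writeheader()
            for handle, name in self.people.items():
                writer.writerow({"handle": handle, "name": name})
```

All edits between `load()` and `close()` touch only the in-memory dict, which is what makes this approach fast during a session but vulnerable to a crash before `close()` runs.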
Branch geps/gep-032-backend-database is complete.
(as of May 16, 2015)
- dictionarydb and djangodb are not yet finished. Mostly metadata needs to be dealt with.
- sqlitedb hasn't been started yet.
Other Backend plugins to consider developing:
- CSV - spreadsheet
- SQLHeavy - probably not... sqlite is a thin, robust, backwards-compatible-guaranteed layer. SQLHeavy has too much that we do not need.