Difference between revisions of "GEPS 032: Database Backend API"


Revision as of 12:05, 16 May 2015

This proposal defines a complete Database Backend API so that we can have plug-in replacements for BSDDB. This would allow the use of other databases.

This idea is refined from GEPS 010: Relational Backend, but without the relational components.

Plan

Step 1

Create a Database Backend plugin. Create a functioning BSDDB plugin that is used by default. Identify which plugin to use via a file in the database directory.

This is underway under the branch geps/gep-032-database-backend here:

https://github.com/gramps-project/gramps/tree/geps/gep-032-database-backend

Identify the items that:

  • need to be fixed
  • are too BSDDB specific (some tools?)
  • could be abstracted away from details

Step one is completed.

Step 2

Develop alternative plugins that support the db API.

Exactly what a class must provide to be a fully functional database for Gramps can currently be determined only by examining the existing BSDDB class. This involves the following components:

  1. data and metadata update, add, and delete
  2. transactions for batch or atomic changes
  3. signal handling
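
The three components above can be pictured as one small interface. What follows is only a sketch under stated assumptions: the class name `BackendSketch` and the methods `add`, `update`, `delete`, `get`, `transaction`, and `connect` are hypothetical and are not taken from the Gramps code base.

```python
from contextlib import contextmanager


class BackendSketch:
    """Hypothetical in-memory backend covering the three components."""

    def __init__(self):
        self._data = {}        # handle -> record
        self._callbacks = {}   # signal name -> list of callables

    # 3. signal handling
    def connect(self, signal, callback):
        self._callbacks.setdefault(signal, []).append(callback)

    def _emit(self, signal, handle):
        for callback in self._callbacks.get(signal, []):
            callback(handle)

    # 1. data add, update, and delete
    def add(self, handle, record):
        if handle in self._data:
            raise KeyError("handle already exists: %r" % handle)
        self._data[handle] = record
        self._emit("add", handle)

    def update(self, handle, record):
        if handle not in self._data:
            raise KeyError(handle)
        self._data[handle] = record
        self._emit("update", handle)

    def delete(self, handle):
        del self._data[handle]
        self._emit("delete", handle)

    def get(self, handle):
        return self._data[handle]

    # 2. transactions for batch or atomic changes
    @contextmanager
    def transaction(self):
        snapshot = dict(self._data)   # shallow copy of the handle map
        try:
            yield self
        except Exception:
            self._data = snapshot     # restore the state from before the batch
            raise
```

In this sketch a failed transaction restores the data but does not retract signals that were already emitted, and the shallow snapshot assumes that records are replaced rather than mutated in place. A real backend would have to decide both points.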

Once a full Gramps Database class is created, there needs to be a way of:

  1. selecting which backend to use for new databases [DONE]
  2. selecting the database to load (Family Tree Manager) [DONE]

We will use the directory structure, as we do now. In each directory, the type of database needs to be identified. This could be done in two ways:

  1. well-defined database backend types. These could be registered, like any plugin/addon. [DONE]
  2. no registration at all. Is there really any reason for Gramps itself to hold the code for the db backend? All that is necessary is for the backend to create the Database instance.

It makes sense that we will reuse and share the backends, so we should use option 1, and develop a database backend plugin type.

Because the database layer has so many functions, an extensive test suite should exercise all of them to ensure that a backend works correctly.
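
One way to arrange such testing is a single set of checks that is run unchanged against every backend. The sketch below assumes a hypothetical backend interface (`add`, `get`, `delete`, `count`) and uses a stand-in `DictBackend`. None of these names come from Gramps, and a real suite would cover far more functions.

```python
class DictBackend:
    """Hypothetical stand-in; a real backend plugin would be tested instead."""

    def __init__(self):
        self._data = {}

    def add(self, handle, record):
        self._data[handle] = record

    def get(self, handle):
        return self._data.get(handle)

    def delete(self, handle):
        self._data.pop(handle, None)

    def count(self):
        return len(self._data)


def check_backend(factory):
    """Run the same assertions against any backend produced by factory()."""
    db = factory()
    assert db.count() == 0
    db.add("I0001", {"name": "Ann"})
    assert db.get("I0001") == {"name": "Ann"}
    assert db.count() == 1
    db.delete("I0001")
    assert db.get("I0001") is None
    assert db.count() == 0
    return True


check_backend(DictBackend)
```

Passing a factory rather than an instance lets each check start from an empty database, so the same function can be reused for every registered backend.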

We should factor out all BSDDB dependencies, and make BSDDB the first database backend plugin. The plugin API should include functions for:

  • making a new database, given a directory [DONE]
  • loading the database, given the directory [DONE]
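
The two functions above, combined with a per-directory file that names the backend, might fit together as sketched below. This is an illustration under assumptions, not the Gramps implementation: the registry, the marker file name `backend.txt`, the stub classes, and the helper names `register`, `create`, and `load` are all hypothetical.

```python
import os
import tempfile

MARKER = "backend.txt"      # hypothetical name of the per-directory marker file
DEFAULT_BACKEND = "bsddb"   # used when a directory has no marker file
REGISTRY = {}               # backend id -> backend class


def register(backend_id):
    """Class decorator that records a backend under the given id."""
    def decorator(cls):
        REGISTRY[backend_id] = cls
        return cls
    return decorator


@register("bsddb")
class BsddbStub:
    """Placeholder standing in for the real BSDDB plugin."""
    def __init__(self, directory):
        self.directory = directory


@register("dictionarydb")
class DictionaryStub:
    """Placeholder standing in for an in-memory backend."""
    def __init__(self, directory):
        self.directory = directory


def create(directory, backend_id=DEFAULT_BACKEND):
    """Make a new database in directory and record which backend owns it."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, MARKER), "w") as marker:
        marker.write(backend_id)
    return REGISTRY[backend_id](directory)


def load(directory):
    """Load the database in directory, falling back to the default backend."""
    path = os.path.join(directory, MARKER)
    backend_id = DEFAULT_BACKEND
    if os.path.exists(path):
        with open(path) as marker:
            backend_id = marker.read().strip()
    return REGISTRY[backend_id](directory)


with tempfile.TemporaryDirectory() as tmp:
    create(os.path.join(tmp, "tree1"), "dictionarydb")
    db = load(os.path.join(tmp, "tree1"))
    print(type(db).__name__)
```

Falling back to the default when the marker file is missing means that directories created before the plugin system existed would still open with BSDDB.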

Other things that might need to be changed:

Listing the databases (-l and -L). That might include changing the current listing: [DONE]

Family Tree "test_family, gramps40":
   Bsddb version: (5, 1, 29)
   Last accessed: 11/17/2013 08:47:12 AM
   Locked?: no
   Number of people: Unknown
   Path: /home/dblank/.gramps/grampsdb/528787d0
   Schema version: Unknown

DjangoDb and DictionaryDb are largely complete.

Transactions

Because BSDDB didn't support transactions natively, Gramps implemented them in Python. Most database systems have transactions built in, so most database backends will probably not use transactions the way the BSDDB backend does.

There could still be uses for the transactions, however. For example, we can use them as an abstraction for the History undo/redo. The current undo/redo system exists only within the current session and is limited; we can probably create a better method with more features (such as diffs between versions, lifetime changes, etc).
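
A backend-independent undo/redo can be pictured as two stacks of recorded changes. The sketch below is an assumption-laden illustration, not the existing Gramps undo code: the `History` class, its methods, and the use of a plain dict as the "database" are all hypothetical.

```python
MISSING = object()   # marks "no record", so additions and deletions can be reversed


class History:
    """Hypothetical undo/redo log over a dict that stands in for the database."""

    def __init__(self, data):
        self.data = data
        self._undo = []   # entries: (handle, old_value, new_value)
        self._redo = []

    def _apply(self, handle, value):
        if value is MISSING:
            self.data.pop(handle, None)
        else:
            self.data[handle] = value

    def commit(self, handle, new_value=MISSING):
        """Apply and record one change; omit new_value to delete the handle."""
        old_value = self.data.get(handle, MISSING)
        self._apply(handle, new_value)
        self._undo.append((handle, old_value, new_value))
        self._redo.clear()   # a fresh change invalidates anything redoable

    def undo(self):
        if not self._undo:
            return False
        handle, old_value, new_value = self._undo.pop()
        self._apply(handle, old_value)
        self._redo.append((handle, old_value, new_value))
        return True

    def redo(self):
        if not self._redo:
            return False
        handle, old_value, new_value = self._redo.pop()
        self._apply(handle, new_value)
        self._undo.append((handle, old_value, new_value))
        return True
```

Because each entry stores both the old and the new value, the same log could in principle be persisted to support history across sessions or diffs between versions, which this sketch does not attempt.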

Transactions in the Python code are ignored.

Need to hook alternative backends to History Redo/Undo.

Complications

Some systems, like Django, use a single database per Python session. That makes sense, given that Django is designed to run as a web server with fixed settings. However, what should you do if you want to use the Django ORM and switch settings?

I have implemented a solution based on the following:

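import sys

class ModulesCheckpoint(object):
    """Snapshot of sys.modules that can be restored later."""
    def __init__(self):
        # remember which modules are loaded at checkpoint time:
        self.original = sys.modules.copy()

    def reset(self):
        # clear modules:
        for key in list(sys.modules.keys()):
            del sys.modules[key]
        # load previous:
        for key in self.original:
            sys.modules[key] = self.original[key]

checkpoint = ModulesCheckpoint()
# do stuff (for example, import Django with one group of settings)
checkpoint.reset()
# do stuff again (modules imported since the checkpoint are re-imported afresh)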

However, this has to be done in the right place, and can't (as far as I can see) be embedded in the imported database plugin. [DONE]

It would be nice to do away with this hack, but it is the only method I can find to unload Django. It works fine so far.

Progress

The first step is to separate all of the gramps.gen.db code into reusable and extendable components. [DONE]

This has begun with the DictionaryDB 4972, which is an in-memory replacement for the BSDDB. It still needs indexes and metadata support (gender names, bookmarks, etc). Also, the Dictionary transaction is non-existent.

Currently, the best working replacement backend is "dictionarydb". [MOSTLY COMPLETE]

We can develop backends that work directly on exported formats. Others to consider: GEDCOM and CSV. These would probably use a DbDictionary, and simply import on load and export on close. Such a backend would be fast in use because it is in-memory, but slow on start and stop, and it would lose data in a power outage.
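
The import-on-load, export-on-close pattern can be sketched with the standard `csv` module. The class name `CsvBackedDict`, its two-column layout (handle, name), and the file name are assumptions made for illustration; a real backend would hold full Gramps objects.

```python
import csv
import os
import tempfile


class CsvBackedDict:
    """Hypothetical backend: whole file read on load, rewritten on close."""

    def __init__(self, path):
        self.path = path
        self.rows = {}   # handle -> name, held entirely in memory
        if os.path.exists(path):
            with open(path, newline="") as infile:
                for handle, name in csv.reader(infile):
                    self.rows[handle] = name

    def close(self):
        # Nothing reaches the disk before this point, so a crash or power
        # outage loses every change made since load.
        with open(self.path, "w", newline="") as outfile:
            csv.writer(outfile).writerows(sorted(self.rows.items()))


path = os.path.join(tempfile.mkdtemp(), "people.csv")
db = CsvBackedDict(path)
db.rows["I0001"] = "Ann"      # fast: only the in-memory dict is touched
db.close()                    # slow step: the whole file is written out
reloaded = CsvBackedDict(path)
```

All reads and writes between load and close touch only the dictionary, which is where both the speed and the risk of data loss come from.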

The branch geps/gep-032-database-backend is complete.

Unresolved Issues

(as of May 16, 2015)

  1. dictionarydb and djangodb are not yet finished. Mostly metadata needs to be dealt with.
  2. sqlitedb hasn't been started yet.

Other Backends

Other Backend plugins to consider developing:

  • MongoDB
  • CouchDB
  • CSV - spreadsheet
  • SQLHeavy - probably not... sqlite is a thin, robust, backwards-compatible-guaranteed layer. SQLHeavy has too much that we do not need.
  • Libgda

See Also