GEPS 032: Database Backend API

Revision as of 14:28, 18 June 2015

This proposal defines a complete Database Backend API so that we can have plug-in replacements for BSDDB. This would allow the use of other databases.

This idea is refined from GEPS 010: Relational Backend, but without the relational components.

Plan

Step 1

Create a Database Backend plugin. Create a functioning BSDDB plugin, that is used by default. Identify which plugin to use via a file in the database directory.
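As a rough illustration of identifying the plugin via a file in the database directory, the sketch below reads and writes a one-line marker file. The file name `database.txt`, the default id `bsddb`, and both helper functions are assumptions made for this example, not names confirmed by this proposal.

```python
import tempfile
from pathlib import Path

# Hypothetical marker file name and default backend id (assumptions).
MARKER_FILE = "database.txt"
DEFAULT_BACKEND = "bsddb"


def write_backend_id(db_dir, backend_id):
    """Record which backend plugin owns this database directory."""
    Path(db_dir, MARKER_FILE).write_text(backend_id + "\n", encoding="utf-8")


def read_backend_id(db_dir):
    """Return the backend id stored in the directory, or the default."""
    marker = Path(db_dir, MARKER_FILE)
    if not marker.exists():
        # Directories created before the plugin system have no marker file.
        return DEFAULT_BACKEND
    return marker.read_text(encoding="utf-8").strip() or DEFAULT_BACKEND


with tempfile.TemporaryDirectory() as db_dir:
    legacy = read_backend_id(db_dir)          # no marker yet
    write_backend_id(db_dir, "dictionarydb")
    chosen = read_backend_id(db_dir)
```

Falling back to the default when the marker is missing keeps existing BSDDB directories loadable without any migration step.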

This was developed under the branch geps/gep-032-database-backend here:

https://github.com/gramps-project/gramps/tree/geps/gep-032-database-backend

It has now been committed to gramps50 (aka master as of this writing).

Identify the items that:

  • need to be fixed
  • are too BSDDB specific (some tools?)
  • could be abstracted away from details

Step one is completed.

Step 2

Develop alternative plugins that support the db API.

Exactly what is required for a class to implement a fully functional database for Gramps can only be determined by examining the existing BSDDB class. This involves the following components:

  1. data and metadata update, add, and delete
  2. transactions for batch or atomic changes
  3. signal handling
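To make the first and third components concrete, here is a minimal sketch of an in-memory class that adds, updates, and deletes records and notifies listeners. The class `TinyBackend`, its method names, and the signal names are illustrative assumptions and are not taken from the existing BSDDB class.

```python
class TinyBackend:
    """Hypothetical in-memory backend: data changes plus signal handling."""

    def __init__(self):
        self.people = {}      # handle -> data
        self.callbacks = {}   # signal name -> list of callables

    def connect(self, signal, callback):
        """Register a callback to run whenever the signal is emitted."""
        self.callbacks.setdefault(signal, []).append(callback)

    def emit(self, signal, handles):
        """Call every callback registered for the signal."""
        for callback in self.callbacks.get(signal, []):
            callback(handles)

    def add_person(self, handle, data):
        self.people[handle] = data
        self.emit("person-add", [handle])

    def update_person(self, handle, data):
        if handle not in self.people:
            raise KeyError(handle)
        self.people[handle] = data
        self.emit("person-update", [handle])

    def delete_person(self, handle):
        del self.people[handle]
        self.emit("person-delete", [handle])


events = []
db = TinyBackend()
for name in ("person-add", "person-update", "person-delete"):
    # Bind the signal name as a default argument so each lambda keeps its own.
    db.connect(name, lambda handles, name=name: events.append((name, handles)))

db.add_person("I0001", {"name": "Ann"})
db.update_person("I0001", {"name": "Anne"})
db.delete_person("I0001")
```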

Once a full Gramps Database class is created, there needs to be a way of:

  1. selecting which backend to use for new databases [DONE]
  2. selecting the database to load (Family Tree Manager) [DONE]

We will use the directory structure, as we do now. In each directory, the type of database needs to be identified. This could be done in two ways:

  1. well-defined database backend types. These could be registered, like any plugin/addon. [DONE]
  2. is there really any reason for Gramps to have the code for the db backend? All that is necessary is for the backend to create the Database instance.

It makes sense that we will reuse and share the backends, so we should use option 1, and develop a database backend plugin type.

Because there are so many functions in the database layer, an extensive test suite should be created that exercises every function, to ensure that a backend works correctly.

We should factor out all BSDDB dependencies, and make BSDDB the first database backend plugin. The plugin API should include functions for:

  • making a new database, given a directory [DONE]
  • loading the database, given the directory [DONE]
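The two plugin API functions listed above could look roughly like the following sketch, in which backends register themselves under an id and expose one entry point for creating a database and one for loading it. The registry, the decorator, and the names `make_new` and `load` are hypothetical and only illustrate the shape of such an API.

```python
import tempfile
from pathlib import Path

BACKENDS = {}  # backend id -> backend class (hypothetical registry)


def register_backend(backend_id):
    """Class decorator that registers a backend under the given id."""
    def decorator(cls):
        BACKENDS[backend_id] = cls
        return cls
    return decorator


@register_backend("dictionarydb")
class DictionaryBackend:
    def __init__(self, directory):
        self.directory = Path(directory)
        self.data = {}

    @classmethod
    def make_new(cls, directory):
        """Make a new database, given a directory."""
        Path(directory).mkdir(parents=True, exist_ok=True)
        return cls(directory)

    @classmethod
    def load(cls, directory):
        """Load an existing database, given the directory."""
        if not Path(directory).is_dir():
            raise FileNotFoundError(directory)
        return cls(directory)


with tempfile.TemporaryDirectory() as root:
    tree_dir = Path(root, "tree1")
    created = BACKENDS["dictionarydb"].make_new(tree_dir)
    loaded = BACKENDS["dictionarydb"].load(tree_dir)
    created_dir_exists = tree_dir.is_dir()
```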

Other things that might need to be changed:

Listing the databases (-l and -L). That might include changing the current listing: [DONE]

Family Tree "test_family, gramps40":
   Bsddb version: (5, 1, 29)
   Last accessed: 11/17/2013 08:47:12 AM
   Locked?: no
   Number of people: Unknown
   Path: /home/dblank/.gramps/grampsdb/528787d0
   Schema version: Unknown

DjangoDb, DictionaryDb, and DBAPI are largely complete.

Step 3

Develop a fully-tested alternative to BSDDB.

The proposal is to develop a DB-API 2.0 (https://www.python.org/dev/peps/pep-0249/) database backend, testing with sqlite, postgresql, and mysql. [COMPLETE]

The database directory will have a small initialization program that creates the database and returns a class that can create the connection.
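A minimal sketch of such an initialization step, using the standard library `sqlite3` module (a DB-API 2.0 driver). The `initialize` function, the `SqliteConnector` class, the file name `sqlite.db`, and the `person` table layout are all assumptions for illustration. The sketch returns a connector object rather than a class, which is a simplification.

```python
import sqlite3
import tempfile
from pathlib import Path


class SqliteConnector:
    """Hypothetical connection factory handed back by the initialize step."""

    def __init__(self, path):
        self.path = str(path)

    def connect(self):
        """Open a new DB-API 2.0 connection to the database file."""
        return sqlite3.connect(self.path)


def initialize(db_dir):
    """Create the database inside db_dir and return a connector for it."""
    connector = SqliteConnector(Path(db_dir, "sqlite.db"))
    conn = connector.connect()
    try:
        # Illustrative schema only: one table keyed by handle.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS person "
            "(handle TEXT PRIMARY KEY, blob_data BLOB)"
        )
        conn.commit()
    finally:
        conn.close()
    return connector


with tempfile.TemporaryDirectory() as db_dir:
    connector = initialize(db_dir)
    conn = connector.connect()
    conn.execute(
        "INSERT INTO person (handle, blob_data) VALUES (?, ?)",
        ("I0001", b"payload"),
    )
    conn.commit()
    row = conn.execute("SELECT handle FROM person").fetchone()
    conn.close()
```

Note that the `?` placeholder is the sqlite3 parameter style. DB-API 2.0 lets each driver choose its own `paramstyle`, so the same statements would need a different placeholder for some postgresql and mysql drivers.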

Transactions

Because BSDDB did not support transactions natively, but instead required a two-step transactional system, Gramps developed this support in Python. Most database systems have transactions built in, so most database backends will probably not use transactions the way BSDDB does.

There could still be uses for the transactions, however. For example, we can use them as an abstraction for the History undo/redo. The current system exists only within the current session and is limited; we can probably create a better method with more features (such as diffs between versions, lifetime changes, etc).

Transactions in the Python code are ignored.

COMPLETE
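One way to picture a Python-level transaction that doubles as an undo record is the sketch below. It is not the Gramps transaction class: `Transaction`, its `set` and `undo` methods, and the plain dictionary used as a store are hypothetical stand-ins.

```python
_MISSING = object()  # marks "key did not exist before the change"


class Transaction:
    """Hypothetical transaction that records changes so they can be undone."""

    def __init__(self, store, description):
        self.store = store
        self.description = description
        self.changes = []  # (key, old_value, new_value), oldest first

    def set(self, key, value):
        """Apply a change and remember the previous value."""
        self.changes.append((key, self.store.get(key, _MISSING), value))
        self.store[key] = value

    def undo(self):
        """Revert the recorded changes, newest first."""
        for key, old, _new in reversed(self.changes):
            if old is _MISSING:
                self.store.pop(key, None)
            else:
                self.store[key] = old

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.undo()  # roll back a batch that failed part-way
        return False     # never swallow the exception


store = {"I0001": "Ann"}
with Transaction(store, "Edit person") as trans:
    trans.set("I0001", "Anne")
    trans.set("I0002", "Bob")
after_commit = dict(store)

trans.undo()
after_undo = dict(store)
```

Keeping the old and new value of every change is also what a diff between versions would need, at the cost of holding the history in memory for the session.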

Complications

Some systems, like Django, have a single database used per Python session. That makes sense, given that it is designed to be a webserver with fixed settings. However, what should you do if you want to use the Django ORM and switch settings?

I have implemented a solution based on the following:

import sys

class ModulesCheckpoint(object):
    def __init__(self):
        self.original = sys.modules.copy()
        
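    def reset(self):
        # Drop every module currently loaded, including anything imported
        # since the checkpoint (for example Django and its settings):
        for key in list(sys.modules.keys()):
            del sys.modules[key]
        # Restore the modules that were loaded when the checkpoint was taken:
        for key in self.original:
            sys.modules[key] = self.original[key]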

checkpoint = ModulesCheckpoint()
# do stuff
checkpoint.reset()
# do stuff again

However, this has to be done in the right place, and can't (as far as I can see) be embedded in the imported database plugin. [DONE]

It would be nice to do away with this hack, but it is the only method I can find to unload Django. It works fine, so far.
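The effect of the checkpoint can be demonstrated without Django. The sketch below is a narrower variant, not the code used in the branch: it only unloads modules added since the checkpoint instead of rebuilding the whole table. The class name `NewModulesCheckpoint` and the fake module name are assumptions for the example.

```python
import sys
import types


class NewModulesCheckpoint:
    """Variant (an assumption): forget only modules added after the checkpoint."""

    def __init__(self):
        self.original = set(sys.modules)

    def reset(self):
        for name in list(sys.modules):
            if name not in self.original:
                del sys.modules[name]


checkpoint = NewModulesCheckpoint()

# Stand-in for "import django": register a throwaway module object.
sys.modules["fake_settings_module"] = types.ModuleType("fake_settings_module")
loaded_before_reset = "fake_settings_module" in sys.modules

checkpoint.reset()
loaded_after_reset = "fake_settings_module" in sys.modules
```

Removing an entry from `sys.modules` only makes the next `import` load the module afresh. Code that already holds a reference to the old module object keeps using it, which is one reason the reset has to happen in the right place.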

Progress

The first step is to separate all of the gramps.gen.db code into reusable and extendable components. [DONE]

This has begun with the DictionaryDB 4972, which is an in-memory replacement for the BSDDB. It still needs indexes and metadata support (gender names, bookmarks, etc). Also, the Dictionary transaction is non-existent.

Currently, the best working replacement backend is "dictionarydb". [MOSTLY COMPLETE]

We can develop backends that work directly on exported formats. Others to consider: GEDCOM and CSV. These would probably use a DbDictionary and simply import on load and export on close. They would be fast because the data is in memory, but slow on start and stop, and would lose data in a power outage.
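The import-on-load, export-on-close idea might be sketched as follows for CSV. The class `CsvBackedDb`, its two-column layout, and the file name are hypothetical; a real backend would cover every object type, not a single name column.

```python
import csv
import tempfile
from pathlib import Path


class CsvBackedDb:
    """Hypothetical dictionary backend: import on load, export on close."""

    FIELDS = ["handle", "name"]

    def __init__(self, path):
        self.path = Path(path)
        self.people = {}  # all work happens on this in-memory dictionary

    def load(self):
        """Read the whole CSV file into memory, if it exists."""
        if self.path.exists():
            with self.path.open(newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    self.people[row["handle"]] = row["name"]

    def close(self):
        """Write the whole dictionary back out; nothing is saved before this."""
        with self.path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=self.FIELDS)
            writer.writeheader()
            for handle, name in self.people.items():
                writer.writerow({"handle": handle, "name": name})


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "people.csv")

    db = CsvBackedDb(path)
    db.load()
    db.people["I0001"] = "Ann Example"
    db.close()

    reopened = CsvBackedDb(path)
    reopened.load()
    roundtrip = dict(reopened.people)
```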

Branch geps/gep-032-database-backend is complete.

Unresolved Issues

(as of May 24, 2015)

  1. dictionarydb and djangodb are not yet finished. Mostly metadata needs to be dealt with.
  2. dbapi using sqlite3, postgresql, and mysql is complete.

Other Backends

Other Backend plugins to consider developing:

  • MongoDB
  • CouchDB
  • CSV - spreadsheet
  • SQLHeavy - probably not... sqlite is a thin, robust, backwards-compatible-guaranteed layer. SQLHeavy has too much that we do not need.
  • Libgda

See Also