Open main menu

Gramps β

Changes

GEPS 010: Relational Backend

3,734 bytes added, 23:54, 29 December 2014
See Also
{{man warn|This proposal has been withdrawn.|This original GEP was a design to use a relational database backend for Gramps. However, after experiments, it was shown to be too slow. However, it was shown that we could use other backends, but in an non-relational manner. Also, there are many [http://en.wikipedia.org/wiki/NoSQL NOSQL] solutions that would also work. Thus, a new proposal, [[GEPS 032: Database Backend API]] was created.}} This proposal is also related to [[GEPS 013: Gramps Webapp]] which can create a SQL interface.  There is also a SQL import and export Addon. <hr> This page is for the discussion of a proposed implementation of a relational (SQL ) backend for GRAMPS.
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).
= Reasons for adding a sql relational backend =
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow easy, single db file implementations (eg, SQLite) to more complex and sophisticated client/server (eg, MySQL).
# BSDDB is being removed from the standard distribution of Python (as of Python 2.6)
# SQLITE is being added to the standard Python distribution
# BSDDB is not a relational database, but a hierarchical onedatastore
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code
# BSDDB is a programmer's API
Further implications:
# A fullscale MySQL backend would be a trivial step from SQLite True as long as the SQL written to access the database conforms to a particular defined standard. SQLite appears to support (most of) SQL-92. So for portability reasons we must code to the SQL-92 standard if SQLlite is chosen as the lowest common denominator. --[[User:Gburto01|Gburto01]] 23:06, 4 April 2009 (UTC)
# Easy to allow multiple users in a SQLite database (uses file-locking)
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)
# We will have to develop SQL experts
I have been writing SQL for 20 years :) --[[User:Gburto01|Gburto01]] 23:06, 4 April 2009 (UTC)
Perhaps one of the most important reasons for moving away from a low-level datastore is to allow more sophisticated interfaces. Currently BSDDB offers transactions and recover, or multiuser access, but not both together. If we want to have a GRAMPS webapp or a multiuser GTK interface (for examples), then we would have to either disable transactions and recover (dangerous!) or switch backends. (Another option would be to write a server interface, but the complexity of that is probably on par with just switching backends. See [[GEPS 013: Gramps Webapp]] for more discussion.)
== Discussions of BSDDB in Python ==
http://www.python.org/dev/peps/pep-3108/
Development is not death however, it will only be out of sync of the python cycle. The home of pybsdb offering the bsddb3 package is hereL http://www.jcea.es/programacion/pybsddb.htm
 
= Research =
*[[Relational database comparison]] *[[Database abstraction layers comparison]] 
= Questions and concerns =
Is the ORM available for all platforms?
I'm not certain if we need the orm layer or not. One option would be to just use sqlAlchemy's SQL Expression Language layer for abstraction and forgo the orm. sqlAlchemy's [http://www.sqlalchemy.org/ home] and [http://www.sqlalchemy.org/docs/05/intro.html intro] might help explain some things. Another is that what ever we do we will be using a DB-API so perhaps we should test how far that takes us first before adding yet another dependency. Is the ORM available for all platforms? if your asking about programming languages no sqlAlchemy is a python module. if your asking about databases the [http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis support databases list] is decent.--[[User:AaronS|AaronS]] 19:16, 26 March 2009 (UTC) Actually, I was wondering about OS platforms, but that info is useful, too. --[[User:Dsblank|Dsblank]] 00:40, 27 March 2009 (UTC) I only researched python orms so they should work where ever python does. I confirmed sqlalchemy probably does since it supports MS-SQL and MSAccess --[[User:AaronS|AaronS]] 03:33, 27 March 2009 (UTC)
= What now? =
 
Current DBI:
 
* [[Using_database_API]]
 
== Create Object model==
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.
# it will be very important to use slots properties in src/gen/lib to make this work. Obtaining a person via get_person_from_handle, should only hit the person table. Only when the calling method needs attributes, should the attribute table be hit. This requires attributes that are not yet defined up to the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database as it needs to know where to obtain these values... . This looks like a huge work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier though, but that means rewriting the core of GRAMPS....
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)
 
Doug, I meant '''properties'''. The idea is the following. In gen/person an person contains all secondary objects and all references. In a database, this depends on the implementation. So bsddb has a flat table with the person object in one table. So one get. For sql, the secondary objects will be in other tables. Hitting all relevant tables on get of a person would be stupid. We only need to access the tables when the data is needed. In base.py, a person is obtained with get_person_from_handle. This fills up a Person() object. dbdir.py uses the unserialize method to do this. My suggestion of properties would be that with an sql backend, the handle is get, but all the other Person attributes are properties that remain None. If data is needed, the getter checks if value is None, if so, the sql database is hit, otherwise the data is just returned. Just an idea, the concept needs to be worked out. The alternative is changing /gen/lib dramatically, but I don't see the benefit of that. Note that making all objects in /gen/lib new style classes with slots would be a good move nonetheless. [[User:Bmcage|bmcage]] 13:22, 8 April 2009 (UTC)
 
Thanks for the clarification... that makes sense now :) The integration of the SQL DB with the existing GRAMPS gen/lib and gen/db does seem to be the biggest job to be done in this project. I've had a look at gen/lib and gen/db and there are lots of assumptions all over the place about how the data is stored. Trying to hide this through the use of properties may work, but in the long run, GRAMPS may need an API overhaul. But I like the idea of seeing how far we can get by using properties. --[[User:Dsblank|Dsblank]] 12:56, 9 April 2009 (UTC)
== Extending base.py ==
= See Also =
* [[ExportSql.py]]
[[Category:GEPS|S]]