Changes

← Older edit

GEPS 010: Relational Backend

4,321 bytes added, 00:20, 31 January 2022

m

no edit summary

{{man warn|This ~~page is for the discussion of~~ proposal has been withdrawn.|This original GEP was a design to use a ~~proposed implementation of an SQL~~ relational database backend for ~~GRAMPS~~Gramps. However, after experiments, it was shown to be too slow. However, it was shown that we could use other backends, but in an non-relational manner. Also, there are many [http://en.wikipedia.org/wiki/NoSQL NOSQL] solutions that would also work. Thus, a new proposal, [[GEPS 032: Database Backend API]] was created. }}

This proposal is also related to [[~~Category~~GEPS 013:~~GEPS~~Gramps Webapp]]which can create a SQL interface.

= There is also a SQL ~~Backend =~~import and export Addon.

<hr> This page is for the discussion of a proposed implementation of a relational (SQL) backend for GRAMPS. SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). = Reasons for adding a relational backend =Currently, GRAMPS uses a ~~BSD~~ [[Gramps_Glossary#bsddb|BSDDB]] database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQLas an alternative backend. This should allow easy, single db file implementations (eg, ~~specifically~~ SQLite ~~as a replacement~~) to more complex and sophisticated client/server (eg, MySQL).

First, there are a number of facts related to this proposal:

# BSDDB is being removed from the standard distribution of Python (as of Python 2.6)

# SQLITE is being added to the standard Python distribution# BSDDB is not a relational database, but a hierarchical ~~one~~datastore# BSDDB databases do not have schema or data-definitions. BSDDB requires ~~quite a bit~~ all of the database structure logic to reside in code ~~(rather than in the database)~~# ~~SQLite tables of~~ BSDDB is a ~~database reside in~~ programmer's API# SQL is a ~~single file~~declarative, independent abstraction layer# ~~SQLite~~ SQL can optimize queries (in low-level C) whereas BSDDB is done in Python ~~Next, are a number of issues that need to be tested, but best guesses suggest:~~ # An ~~SQLite~~ SQL version of a GRAMPS ~~BSDDB would~~ DB should be ~~at least 4 times smaller~~faster# An SQLite version of a GRAMPS ~~BSDDB~~ DB should be smaller than a BSDDB file.# SQL Engines can perform query optimizations# More code would reside in the db, rather than in Python# Enterprise SQL versions of GRAMPS would allow people to create and manage much ~~faster~~larger trees# An SQLite version of GRAMPS ~~would~~ might allow people to create ~~much bigger~~ larger trees## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API

Further implications:

# A fullscale MySQL backend would be a trivial step from SQLite

True as long as the SQL written to access the database conforms to a particular defined standard. SQLite appears to support (most of) SQL-92. So for portability reasons we must code to the SQL-92 standard if SQLlite is chosen as the lowest common denominator. --[[User:Gburto01|Gburto01]] 23:06, 4 April 2009 (UTC)

# Easy to allow multiple users in a SQLite database (uses file-locking)

# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)

# We will have to develop SQL experts

I have been writing SQL for 20 years :) --[[User:Gburto01|Gburto01]] 23:06, 4 April 2009 (UTC)

Perhaps one of the most important reasons for moving away from a low-level datastore is to allow more sophisticated interfaces. Currently BSDDB offers transactions and recover, or multiuser access, but not both together. If we want to have a GRAMPS webapp or a multiuser GTK interface (for examples), then we would have to either disable transactions and recover (dangerous!) or switch backends. (Another option would be to write a server interface, but the complexity of that is probably on par with just switching backends. See [[GEPS 013: Gramps Webapp]] for more discussion.)

== Discussions of BSDDB in Python ==

PEP 3108 marks BSDDB to be removed:

http://www.python.org/dev/peps/pep-3108/

Development is not death however, it will only be out of sync of the python cycle. The home of pybsdb offering the bsddb3 package is hereL http://www.jcea.es/programacion/pybsddb.htm

= Research =

*[[Relational database comparison]]

*[[Database abstraction layers comparison]]

= Questions and concerns =

== Native DB access for other languages ==

If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?

Using a Python based ORM wont tie the data just to python. any language should be able to access the db just fine. However, they wouldn't have access to pythons orm layer. Since I haven't used a true orm before I'm not certain exactly how it will effect our table relationships but I don't believe they wont make some sense in a relational way. Not that I'm saying we should use it but a quick google search started to bring up things like this [http://pecl.php.net/package/python php python package]. so there may be some hope for even using the orm layer but how complex would we really want to make it! And of course there is always the option of just using an orm and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)

~~A sqlite shelve interface for Python:~~== Power vs Dependencies ==~~http://bugs.python.org/issue3783~~Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)?

~~From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:SQLite versus Berkeley DB~~PROS:

~~<pre>~~ ~~Berkeley DB (BDB) is just the data storage layer - it does not~~ ~~support SQL or schemas. In spite of this, BDB is twice the size~~ ~~of SQLite. A comparison between BDB and SQLite is similar to a~~ ~~comparison between assembly language and a dynamic language like~~ ~~Python or Tcl. BDB is probably much faster if you~~ # Makes GRAMPS code it ~~carefully. But it is much~~ more ~~difficult to use and considerably~~ ~~less flexible.~~abstract

~~On the other hand BDB has very fine grained locking (although~~ ~~it's not very well documented), while SQLite currently has only~~ ~~database-level locking. -- fine grain locking is important for~~ ~~enterprise database engines, but much less so for embedded~~ ~~databases. In SQLite, a writer gets a lock, does an update, and~~ ~~releases the lock all in a few milliseconds. Other readers have~~ ~~to wait a few milliseconds to access the database, but is that~~ ~~really ever a serious problem?</pre>~~CONS:

~~== Comparing from BSDDB~~ # Makes it harder for other languages to ~~SQLite ==~~use the native GRAMPS db (but they can use the native db)# Adds a dependency

~~A company~~ Given that ~~justifies~~ GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a ~~switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:~~dependency?

~~Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:~~Is the ORM available for all platforms?

~~<pre>Berkeley~~ I'm not certain if we need the orm layer or not. One option would be to just use sqlAlchemy's SQL Expression Language layer for abstraction and forgo the orm. sqlAlchemy's [http://www.sqlalchemy.org/ home] and [http://www.sqlalchemy.org/docs/05/intro.html intro] might help explain some things. Another is that what ever we do we will be using a DB ~~Offers APIs~~-API so perhaps we should test how far that takes us first before adding yet another dependency. Is the ORM available for all platforms? if your asking about programming languages no sqlAlchemy is a python module. if your asking about databases the [http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis support databases list] is decent. --[[User:AaronS|AaronS]] 19:16, ~~not Query Languages~~ 26 March 2009 (UTC)

~~Berkeley DB~~ Actually, I was ~~designed for software developers~~wondering about OS platforms, ~~by softwaredevelopers. Relational database systems generally provide SQL accessto the data~~ but that ~~they manage~~info is useful, ~~and usually offer some SQL abstraction~~too. --[[User:Dsblank|Dsblank]] 00:40,~~like ODBC or JDBC, for use in applications.</pre>~~27 March 2009 (UTC)

~~What BSDDB is not~~I only researched python orms so they should work where ever python does. I confirmed sqlalchemy probably does since it supports MS-SQL and MSAccess --[[User:AaronS|AaronS]] 03:33, 27 March 2009 (UTC)

~~http://pybsddb.sourceforge.net/ref/intro/dbisnot.html~~= What now? =

~~From previous GRAMPS discussions~~Current DBI:

~~http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml&ei=2MYxSanZNaCgesqz0KQB&usg=AFQjCNG1l3yKZ4YP_L7Yo0cQ8bqWmoJKTQ&sig2=H8x1qf4YrFYlsLFlJUsZ-w~~* [[Using_database_API]]

~~From the GRAMPS archives:<pre>> Now, sometimes we get a request for a major architectural change that we> will accept. A good example is the new database backend for the upcoming> GRAMPS 2.0. The request came in to support a real database backend so we> could support larger databases. We analyzed the request, and felt that> it matched the goals of the project and would provide a significant step> forward in the usability of the program. The result was a major redesign> effort that will soon be released.~~== Create Object model==

~~I think I and few others are the ones that impacted this decision. Having~~ Going over src/gen/lib/, create an ~~850,000 person database tends to be deadly to the XML architecture that we~~ ~~were with. I've been the main person to test the integrity~~ object model of ~~the system~~ ~~with my Gedcom file importing. When I found that I couldn't import my file~~ ~~without extensive~~ how GRAMPS uses and manipulates genealogy data ~~loss, I came to Don and Alex and we all sought for~~ ~~solutions. We found that the XML interface was taking huge amounts of~~ ~~memory, and we looked for database backends that would handle the load. Don~~ ~~and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been~~ ~~happy as a clam with the Gramps project, because I'm one step closer to~~ ~~killing Windows~~.

~~I personally want~~ '''For this GEP to ~~do away~~ succeed it is extremely important that the experienced developers on the devel list agree with ~~it, but I need it for other applications~~the object model'''. ~~I've also come to the realization that both Windows and Linux are good, but~~ ~~in their own realms~~This goes without saying. ~~I don't want~~ For this to ~~become a huge flame war about~~ ~~Linux and Windows. so if you have other questions as to why I feel this way~~succeed, ~~email me~~the developers should agree with all of the major decisions.

~~> So, would we accept~~ == Select an SQL framework==# finish research and pick a ~~mySQL~~ database ~~backend? There is a good chance we> would (depending on the implementation), as long did not impact Aunt> Martha~~. ~~We have even architected the backend to support this, since we> can see that higher end databases could provide additional functionality> such as versioning~~ # finish research and ~~multiuser support~~pick a database abstraction layer.

~~We could accept mySQL because of this, but I agree with Don. If it negatively~~ ~~impacts the end user, why would we want to proceed with it? I have a friend~~ ~~that wondered about mySQL interaction, but he can see the impact that BSDDB~~ ~~has had on my database, and he has sided with me as well as the rest of the~~ ~~team. Not to say that this is not a possibility, but we need to remain~~ ~~focused on the tasks at hand.~~== Create models/tables ==

~~> So, in summary,~~ # use the ~~project is going in~~ framework to set up a ~~direction that seems to meet~~model of the database> # generate the ~~needs~~ tables# create a dump of ~~our users~~bsddb database in the sql database# validate that all things present in bsddb are present in the sql database# check validation rules. ~~If we changed directions~~Eg, ~~we might or might not~~> handle should be unique, rules must be ~~able~~ added to ensure adding to ~~reach~~ the family table an object with handle like a ~~larger audience, but numbers are not our goal. We> fully support others submitting patches and other contributions, but> they will be weighed~~ person object is '''impossible''' on ~~how they match~~ the ~~goals~~ database layer. These kind of ~~the project~~ rules can be done technically (~~and> most of the patches we've received to date do match the goals~~a primary object table with key on handle)or with rules. If~~> someone wants us take the project in~~ # best would be a ~~different direction, we may or> may not be receptive depending if~~ framework that based on the ~~direction matches our goals.> However, we will support your efforts if you decide~~ model can generate an admin module to ~~fork~~ browse the ~~project.> Who knows~~database, ~~maybe a remerge will occur~~ see eg the admin module in ~~the future, or a forked project> will make us irrelevent~~django.

~~I agree with Don on this, numbers don't matter as long as the users are happy.~~ ~~Getting things appropriately nailed down and ready~~ == New db backend for ~~the end user's use is~~ ~~what is paramount. After all, if there were no users, why would we even have~~ ~~a project with which to collaborate in the first place?~~GRAMPS ==

~~We are here for~~ # write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).# once written, this can be added as an experimental backend to GRAMPS## Family Tree manager needs to list the ~~users~~family tree type (bsddb, ~~especially Aunt Martha~~sqlite), ~~because~~ on creation of new family tree user must choose the ~~fact that~~ backend.~~many people are~~ ## User can import .gramps/gedcom files just ~~moving over~~ as this is done with bsddb backend once family tree is set up.# it will be very important to ~~Linux and having something familiar~~ use properties in src/gen/lib to ~~them~~make this work. Obtaining a person via get_person_from_handle, should only hit the person table. Only when the calling method needs attributes, ~~like a genealogical program is what matters~~ should the attribute table be hit. This requires attributes that are not yet defined up to ~~them~~the moment they are accessed. ~~Making~~ It also means that the ~~transition~~ gen/lib objects for sql need to be aware of the database as it needs to know where to obtain these values... . This looks like a huge work to ~~Linux is hard~~me, ~~don't get me wrong~~but definitely doable. ~~But we are making it one~~ ~~step~~ Just rewriting gen/lib for an sql datamodel might be easier ~~by not complicating~~ though, but that means rewriting the ~~user's experience in their move~~core of GRAMPS....

~~Like I said before,~~ Idon't understand the use of '''~~m just a bug finder. I~~slots'~~m not really a Python~~ ~~programmer, or anything, but I like to find bugs. Even if that~~'~~s all I do on~~ ~~this project, I~~'~~m rather content~~in the above. ~~Everyone else~~ How is that ~~wants~~ idea related to ~~port over to~~ ~~other toolkits and whatnot is free to do so.~~db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)

~~But also as~~ Doug, I meant '''properties'''. The idea is the following. In gen/person an ~~end user that's still~~ person contains all secondary objects and all references. In a ~~greenie to Linux in general~~database, ~~I can say~~ ~~that~~ this ~~program~~ depends on the implementation. So bsddb has ~~helped my move over~~ a flat table with the person object in one table. So one get. For sql, the secondary objects will be in other tables. Hitting all relevant tables on get of a person would be stupid. We only need to access the tables when the data is needed. In base.py, a person is obtained with get_person_from_handle. This fills up a Person() object. dbdir.py uses the unserialize method to ~~Linux~~ do this. My suggestion of properties would be that with an sql backend, the handle is get, but all the other Person attributes are properties that ~~much easier~~remain None. ~~Even~~ If data is needed, the getter checks if value is None, if so, the sql database is hit, otherwise the data is just returned. Just an idea, the concept needs to be worked out. The alternative is changing /gen/lib dramatically, but I ~~have only contributed a little in~~ don't see the ~~way~~ benefit of ~~feedback~~ that. Note that making all objects in /gen/lib new style classes with slots would be a good move nonetheless. [[User:Bmcage|bmcage]] 13:22, 8 April 2009 (~~mostly from the~~ ~~end-user perspective~~UTC).

Thanks for the clarification... that makes sense now :) The integration of the SQL DB with the existing GRAMPS gen/lib and gen/db does seem to be the biggest job to be done in this project. I've had a look at gen/lib and gen/db and there are lots of assumptions all over the place about how the data is stored. Trying to hide this through the use of properties may work, but in the long run, GRAMPS may need an API overhaul. But I like the idea of seeing how far we can get by using properties. -~~Jason</pre>~~-[[User:Dsblank|Dsblank]] 12:56, 9 April 2009 (UTC)

~~From http://osdir~~== Extending base.~~com/ml/genealogy.gramps.user/2004-06/msg00078.html:~~py ==

~~<pre>Alex said:~~Once an sql backend is stable, base.py can be extended to offer extra functionality, or better optimize for SQL. Eg, in SQL one would have probably an attribute table. To know which persons have a specific attribute, SQL would select that from the attributes table, and then look up the people. In bsddb it means however to loop over all persons, and obtain the attribute sub table of a person and looking if attribute is present there.

~~SQLite might~~ Above clearly indicates that how one goes about in the two backends is very different. The bsddb way will work in sql though (as the get_person method works, and speed should be ~~better or it might not~~comparable to bsddb if above deferred obtaining of values via slots is implemented). Nevertheless, ~~we haven't tried it. A great factor~~ ~~speaking~~ a clear mechanism to optimize for ~~BSDDB~~ sql is ~~that it is supported by a standard Python module~~needed. Continuing above example, ~~bsddb~~see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase. py]

For sql, one would use the prepare method, obtain all people in a list, then return True or False if person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look very ideal though.

~~Don said~~= See Also =* [[Addon:SQLite Export Import]]

~~This is an important factor here - ease of setup and use. GRAMPS isdifficult enough to get installed on some platforms (especiallyKDE-centric systems). Requiring someone to get an SQL database up andrunning to try out the program is probably too much effort. What I'vediscovered is that GRAMPS is one of the first programs that a lot ofnew users want to get running - usually before they have a lot ofLinux experience. So we can't make the barriers to entry too high.</pre>~~[[Category:GEPS|S]]

Patsyblefebre

Bureaucrats, Administrators

4,865

edits

Changes

GEPS 010: Relational Backend

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Contributor help pages

wiki

Tools