GEPS 033: Abstract Database API

This proposal defines an abstraction layer over the Gramps database. It:

  • Creates new database methods that allow programming without using low-level object handles
    • handles are internal details that should not be unnecessarily exposed
  • Creates a layer to separate other low-level details from direct access
    • database representation requirements and interfaces (serialized data, unicode, data maps, etc.) should not be exposed
  • Makes transactions more abstract and tied to the database
    • transaction requirements are low-level and database-specific
  • Better utilizes indices where possible (especially filters)
  • Removes code duplication and chance of errors
    • by having common functions as methods, the programmer does not need to recreate code repeatedly
    • handle-based code can be subtle, especially when proxies are active (lookups may return None)
  • Makes changing the backends easier
  • Cleans up the code
    • we can get rid of the database utility functions and incorporate the Simple database access library
  • Updates code to current and modern understandings and assumptions
    • For example, gramps_ids are now unique inside a particular database, and handles are now UUIDs and are unique across all Gramps instances

Plan

  1. Add the database to all primary and secondary objects
    • Nick has already started this: http://sourceforge.net/u/nick-h/gramps/ci/api/tree/
    • All Primary, Secondary, Reference and GrampsType classes have been updated. Classes such as Date and StyledText remain unchanged.
    • In nearly all circumstances a database instance will be available when creating a new object. The only exceptions are in two of the unit tests. An object will either be created by a database, or will be created in order to add it to a database.
  2. Design the new API functions
  3. Refactor gen/db/read.py and gen/db/write.py so that gen/db/database.py provides a single class from which all databases are subclassed
  4. Add new methods (and existing utility functions and Simple library) into database.py
  5. Create unit tests for new functions
  6. Change old code to use new methods
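
Step 3 of the plan could be sketched roughly as follows. This is only an illustration of the single-base-class idea: the class name `Database` and the methods `get_raw`, `put_raw` and `has_handle` are hypothetical and are not taken from the existing gen/db code.

```python
from abc import ABC, abstractmethod


class Database(ABC):
    """Hypothetical common base class; each backend would subclass it."""

    @abstractmethod
    def get_raw(self, handle):
        """Return the stored data for a handle, or None if absent."""

    @abstractmethod
    def put_raw(self, handle, data):
        """Store data under a handle."""

    def has_handle(self, handle):
        # Shared behaviour lives in the base class, written once.
        return self.get_raw(handle) is not None


class InMemoryDatabase(Database):
    """Toy backend used only to exercise the base class."""

    def __init__(self):
        self._data = {}

    def get_raw(self, handle):
        return self._data.get(handle)

    def put_raw(self, handle, data):
        self._data[handle] = data


db = InMemoryDatabase()
db.put_raw("h1", {"gramps_id": "I0001"})
```

Here `has_handle` works for any backend that implements the two abstract methods, which is the kind of duplication the refactoring is meant to remove.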

API

The following points capture the essence of the changes:

1. Add methods to get objects rather than handles

Examples:

person.get_events()
person.get_families()
person.get_notes()
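
A minimal sketch of how such a method might hide the handle lookup, assuming a toy in-memory database. `ToyDb`, `event_handles` and the class bodies are hypothetical stand-ins, not the actual Gramps classes.

```python
class ToyDb:
    """Stand-in database that maps handles to event objects."""

    def __init__(self):
        self._events = {}

    def add_event(self, handle, event):
        self._events[handle] = event

    def get_event_from_handle(self, handle):
        # May return None, for example when a proxy hides the record.
        return self._events.get(handle)


class Event:
    def __init__(self, description):
        self.description = description


class Person:
    def __init__(self, db):
        self.db = db
        self.event_handles = []  # handles stay an internal detail

    def get_events(self):
        """Return event objects, skipping handles that do not resolve."""
        events = (self.db.get_event_from_handle(h) for h in self.event_handles)
        return [e for e in events if e is not None]


db = ToyDb()
db.add_event("e1", Event("Birth"))
person = Person(db)
person.event_handles = ["e1", "missing"]
descriptions = [e.description for e in person.get_events()]
```

The None check is done once inside `get_events()`, so callers do not have to repeat it.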

2. Create simple access convenience methods

Examples:

person.get_display_name()
event.get_display_date()
event.get_place_name()

3. Add a get_label() method for all primary objects to replace the navigation_label utility. This could be used for bookmarks and to display the active object in the status bar.

4. Add a get_custom_values() method for GrampsTypes. At the moment we have to get the custom values from the database separately and add them to combobox lists.

5. The serialize/unserialize methods and some of the handle referents code look like they don't belong in an API interface. Should we separate them out?

6. Add add(), remove() and commit() methods for all primary objects.

So we could have:

note = Note(db)  # or db.Note()
note.add()
note.append('some text')
note.commit()
person.append_note(note)
person.commit()

But if the append method created the note, then this would become:

note = person.append_note()
note.append('some text')
note.commit()
person.commit()

Alternatively, plain list syntax might be more Pythonic than creating named methods (such as "append_note"):

note = Note(db)   # or db.Note()
note.append('some text')
person.note_list.append(note)
note.commit()
person.commit()

We should utilize Python syntax (and special methods) wherever possible. This will keep the API simple and Pythonic.
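
As one possible reading of this, a list-like container could store handles internally while exposing objects through the usual special methods. The `NoteList` class and its `lookup` argument below are assumptions made for illustration, not an existing Gramps API.

```python
class Note:
    def __init__(self, handle, text=""):
        self.handle = handle
        self.text = text


class NoteList:
    """List-like view that stores handles but yields Note objects."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable: handle -> Note or None
        self._handles = []

    def append(self, note):
        self._handles.append(note.handle)

    def __len__(self):
        return len(self._handles)

    def __iter__(self):
        # Resolve handles lazily and skip any that no longer exist.
        for handle in self._handles:
            note = self._lookup(handle)
            if note is not None:
                yield note

    def __contains__(self, note):
        return note.handle in self._handles


notes_by_handle = {}
note = Note("n1", "some text")
notes_by_handle[note.handle] = note

note_list = NoteList(notes_by_handle.get)
note_list.append(note)
texts = [n.text for n in note_list]
```

With this in place, `len(note_list)`, `note in note_list` and `for n in note_list` all work without the caller ever seeing a handle.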

7. Abstract the direct database interactions. Currently we have code like:

db.object_map[handle] = data

We just need an API to hide that, for abstraction. This shouldn't be too hard; something like:

db.add_person_direct(handle, data)

which just adds a function call to the current implementation. Other backends will be slightly different, but still fast.
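
A rough sketch of the idea, with two interchangeable toy backends. Only the name `add_person_direct` comes from the text above; `get_person_direct`, the class names, the table layout and the use of pickle are assumptions made for the example.

```python
import pickle
import sqlite3


class DictBackend:
    """Keeps raw data in a plain dict, much like a map assignment."""

    def __init__(self):
        self._person_map = {}

    def add_person_direct(self, handle, data):
        self._person_map[handle] = data

    def get_person_direct(self, handle):
        return self._person_map.get(handle)


class SqliteBackend:
    """Same interface, but stores pickled data in an in-memory SQLite table."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE person (handle TEXT PRIMARY KEY, blob_data BLOB)"
        )

    def add_person_direct(self, handle, data):
        self._conn.execute(
            "INSERT OR REPLACE INTO person (handle, blob_data) VALUES (?, ?)",
            (handle, pickle.dumps(data)),
        )

    def get_person_direct(self, handle):
        row = self._conn.execute(
            "SELECT blob_data FROM person WHERE handle = ?", (handle,)
        ).fetchone()
        return pickle.loads(row[0]) if row else None


results = []
for backend in (DictBackend(), SqliteBackend()):
    backend.add_person_direct("h1", ("h1", "I0001", "Smith"))
    results.append(backend.get_person_direct("h1"))
```

Calling code is identical for both backends, which is the point of hiding the map assignment behind a method.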

Related Ideas

1. We could get empty objects from the database.

So instead of:

gen.lib.Person(db) 

we could have:

db.get_person()

or perhaps:

db.Person()

to keep the same type of interface as gen.lib. (Note: in this last example, Person is a database method, not a class.)
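
The db.Person() variant could be as small as the following sketch. The `Database` class here is a hypothetical stand-in, and the `Person` class is a toy, not gen.lib.Person.

```python
class Person:
    """Toy object that remembers which database it belongs to."""

    def __init__(self, db):
        self.db = db
        self.handle = None  # no handle until the object is added


class Database:
    def Person(self):
        """Factory method: return a new, empty Person bound to this database."""
        # Inside a method body, the bare name Person resolves to the
        # module-level class, not to this method.
        return Person(self)


db = Database()
person = db.Person()
```

Because the factory passes `self`, the caller never has to hand the database to the constructor explicitly.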

2. We could change the way we return populated objects from the database.

So instead of:

db.get_person_from_handle(handle)

we could have:

db.get_person(handle)

or, Django-like:

db.Person.select(handle=handle)

which could be a general interface to looking up objects by any criteria.
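
A sketch of what a select-by-criteria interface might look like, assuming simple attribute equality. The `Table` class is hypothetical, and in this variant db.Person is a table-like object rather than a factory method.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel so a missing attribute never equals a criterion


class Person:
    def __init__(self, handle, gramps_id, surname):
        self.handle = handle
        self.gramps_id = gramps_id
        self.surname = surname


class Table:
    """Holds objects of one type and filters them by keyword criteria."""

    def __init__(self):
        self._rows = []

    def add(self, obj):
        self._rows.append(obj)

    def select(self, **criteria):
        # Keep objects whose attributes equal every given criterion.
        return [
            obj
            for obj in self._rows
            if all(
                getattr(obj, name, _MISSING) == value
                for name, value in criteria.items()
            )
        ]


db = SimpleNamespace(Person=Table())
db.Person.add(Person("h1", "I0001", "Smith"))
db.Person.add(Person("h2", "I0002", "Jones"))
db.Person.add(Person("h3", "I0003", "Smith"))

by_handle = db.Person.select(handle="h2")
smiths = db.Person.select(surname="Smith")
```

This toy version scans every row. A real backend would presumably translate the criteria into index lookups.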

3. Add methods could return an object. For example:

note = person.add_note()

or, alternatively, add code to the list's special methods (the double-underscore methods) or to the save()/commit() method:

person.note_list += [Note(db)]   # or db.Note()

4. Objects should know how to convert themselves to XML/serialized/json forms.

Currently, the XML exporter converts objects to XML. Having the object be responsible will allow the use of these serialized forms in other places (cut and paste, XML filter representations that include, for example, dates, etc.)
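
For illustration, an object that serializes itself might look like the sketch below. The method names `to_dict`, `to_json` and `to_xml`, and the element and attribute names, are assumptions; they are not the actual Gramps XML schema.

```python
import json
import xml.etree.ElementTree as ET


class Note:
    def __init__(self, handle, gramps_id, text):
        self.handle = handle
        self.gramps_id = gramps_id
        self.text = text

    def to_dict(self):
        """Plain-data form shared by the other serializers."""
        return {
            "handle": self.handle,
            "gramps_id": self.gramps_id,
            "text": self.text,
        }

    def to_json(self):
        return json.dumps(self.to_dict(), sort_keys=True)

    def to_xml(self):
        # Illustrative element layout only.
        elem = ET.Element("note", handle=self.handle, id=self.gramps_id)
        ET.SubElement(elem, "text").text = self.text
        return ET.tostring(elem, encoding="unicode")


note = Note("n1", "N0001", "some text")
as_json = note.to_json()
as_xml = note.to_xml()
```

An exporter, a clipboard handler and a filter could then all call the same methods instead of each building the representation themselves.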

5. Interfacing with the ListViews and TreeViews currently requires using a "fast" (non-object) serialized representation. We may want to keep this, or perhaps rewrite the ListView/TreeView to be more standard (perhaps show pages, rather than all at once).

6. Handles are unique not only within a category (e.g., Person, Family) but also across all categories and across all users. Handles are Globally Unique Identifiers. As such, we don't need different maps/indexes for each category; a single index can reference them all. So we don't need different methods such as "get_person_from_handle" and "get_family_from_handle"; we just need one such method.
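
A minimal sketch of the single-index idea, assuming handles are generated with Python's uuid module. The method name `get_from_handle` and the toy classes are hypothetical.

```python
import uuid


class Person:
    handle = None


class Family:
    handle = None


class Database:
    """One index for every object type, keyed by globally unique handle."""

    def __init__(self):
        self._index = {}

    def add(self, obj):
        # uuid4 handles are assumed not to collide across object types.
        obj.handle = uuid.uuid4().hex
        self._index[obj.handle] = obj
        return obj.handle

    def get_from_handle(self, handle):
        """Return whatever object owns the handle, or None."""
        return self._index.get(handle)


db = Database()
person_handle = db.add(Person())
family_handle = db.add(Family())

found_person = db.get_from_handle(person_handle)
found_family = db.get_from_handle(family_handle)
```

The caller does not need to know the object type in advance; the returned object carries it.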

Issues

  • We would still need access to the handles for the bookmarks and views. They need some unique record identifier, and I'm not sure that we can rely on the Gramps ID to be unique (although it should be). Any place in the code that allows duplicate Gramps IDs should be fixed, so that we can start to rely on the ID as a unique key.
  • Raw data access is used in the treeview models for views and selectors, and also for importers. Apparently we use it for performance reasons.

See Also