GEPS 017: Flexible gen.lib Interface

5,151 bytes added, 22:34, 15 October 2011
This proposal explores the possibility of making the creation of objects more general, and less tied to the particular unserializing process.
 
'''Update''': After building a [[Media:prototype.tar.gz|prototype]], it was found to be too slow for general use. Rather, it seems to be better to cache the data as it appears when it comes from the BSDDB database (pickled, serialized versions of gen.lib objects). Thus, this proposal has been withdrawn.
 
''The prototype combines delayed evaluation with the removal of properties from gen.lib objects. When a property was accessed, the delayed object was evaluated and the result was set on the attribute.''
= Overview =
For django, with mediaref in another table: self.private = False, self.__marker = 1, self.__media_list = ('Person', handle)
The aim should be clear: each engine unpacks the data it is passed in a way that allows delayed access to the attributes. The bsddb engine uses only the tuple data passed by the database table. The django engine, however, sets media_list to the value needed to obtain the media list from the media reference table.
Next, pers.marker or pers.media_list is called:
 @property
 def media_list(self):
     if not isinstance(self._media_list, list):
         # delayed retrieval of the media list from the engine using the key
         self._media_list = self._engine.get_medialist(self._media_list)
     return self._media_list
So, as _marker is not initialized, the engine is used to obtain the marker from the data. The same holds for _media_list. Note that media_list returns a list of MediaRef objects, which will themselves use delayed access to unpack further as needed, so only minimal overhead is incurred.
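The delayed access described above can be sketched as follows. This is a minimal illustration, not the prototype's code: the TupleEngine class, its call counter, and the shape of the returned MediaRef stand-ins are assumptions made for the example; only unpack_person, get_medialist and the media_list property come from the text.

```python
class TupleEngine:
    """Hypothetical bsddb-like engine: data arrives as a plain tuple."""

    def __init__(self):
        self.calls = 0  # counts how often the media list is retrieved

    def unpack_person(self, data):
        private, marker, media_keys = data
        # The media entry stays a tuple: it is only the key for later retrieval.
        return private, marker, tuple(media_keys)

    def get_medialist(self, key):
        self.calls += 1
        # Stand-in for building real MediaRef objects.
        return [("MediaRef", handle) for handle in key]


class Person:
    def __init__(self, data, engine):
        self._engine = engine
        self.private, self._marker, self._media_list = engine.unpack_person(data)

    @property
    def media_list(self):
        if not isinstance(self._media_list, list):
            # First access: replace the stored key by the real list.
            self._media_list = self._engine.get_medialist(self._media_list)
        return self._media_list


engine = TupleEngine()
person = Person((False, 1, ("m1", "m2")), engine)
calls_before = engine.calls   # nothing has been retrieved yet
first = person.media_list     # triggers the engine once
second = person.media_list    # served from the stored list
```

The engine is consulted only on the first access; later accesses return the list already stored on the object.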
 
It is important to note here that media_list is in reality defined in the MediaBase() object, not in Person, as Person inherits from MediaBase. However, unpack_person must take this entire inheritance tree into account. This must be designed cleverly, allowing for the multiple inheritance available in gen.lib. Ideas??
 
==== Unpack and slots ====
To allow an object to be initialized from another object, the private/protected attributes must be copied over without extra processing, so that delayed access can continue in the new object. That is, we cannot access e.g. .marker on the other object; we need to assign __marker directly.
 
To achieve this, all non-property attributes are added to the __slots__ list of the object (this does not work well with multiple inheritance, so it is probably not an option), and an unpack method is created that can list them out for assignment. With the example above:
 
 def __init__(self, data, source=None):
     DelayedAccess.__init__(self)
     if source:
         # copy the raw attributes so delayed access continues in the new object
         (self.private, self.__marker, self.__media_list) = source.unpack()
     else:
         (self.private, self.__marker, self.__media_list) = self._engine.unpack_person(data)
 
Where the unpack returns the private/protected variables:
 
 def unpack(self):
     return (self.private, self.__marker, self.__media_list)
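As a rough illustration of why unpack must return the raw attributes rather than the properties, the sketch below copies one object into another and checks that nothing was evaluated along the way. The Engine class and its call counter are hypothetical, and single-underscore names are used instead of __marker so that Python's name mangling does not get in the way of the example.

```python
class Engine:
    """Hypothetical engine; counts retrievals to show what was evaluated."""

    def __init__(self):
        self.calls = 0

    def unpack_person(self, data):
        return data  # (private, marker, media key)

    def get_medialist(self, key):
        self.calls += 1
        return list(key)


class Person:
    def __init__(self, engine, data=None, source=None):
        self._engine = engine
        if source is not None:
            fields = source.unpack()
        else:
            fields = engine.unpack_person(data)
        self.private, self._marker, self._media_list = fields

    def unpack(self):
        # Raw attributes, not the properties, so nothing is evaluated here.
        return (self.private, self._marker, self._media_list)

    @property
    def media_list(self):
        if not isinstance(self._media_list, list):
            self._media_list = self._engine.get_medialist(self._media_list)
        return self._media_list


engine = Engine()
original = Person(engine, data=(False, 1, ("m1", "m2")))
clone = Person(engine, source=original)
calls_after_copy = engine.calls   # copying did not touch the engine
cloned_media = clone.media_list   # evaluated only now, and only in the clone
```

After the copy, the original still holds the unevaluated key, so both objects keep their delayed state independently.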
 
As in the previous section, it is important to note here that media_list is in reality defined in the MediaBase() object, not in Person, as Person inherits from MediaBase. So, the unpack method must take this entire inheritance tree into account. This must be designed cleverly, allowing for the multiple inheritance available in gen.lib. Ideas??
We would want to avoid that adding a field means editing all inheriting objects because unpack needs to change everywhere. That is probably not a big deal, because the present un/serialize already works like that. It is probably advantageous to use a construct such as:
 
 self.__pack(source.unpack())
 
This needs to be designed cleverly because we want a really fast __init__ and assignment. Looking at the present serialize in e.g. Address:
 def serialize(self):
     """
     Convert the object to a serialized tuple of data.
     """
     return (PrivacyBase.serialize(self),
             SourceBase.serialize(self),
             NoteBase.serialize(self),
             DateBase.serialize(self),
             LocationBase.serialize(self))
 
 def unserialize(self, data):
     """
     Convert a serialized tuple of data to an object.
     """
     (privacy, source_list, note_list, date, location) = data
     PrivacyBase.unserialize(self, privacy)
     SourceBase.unserialize(self, source_list)
     NoteBase.unserialize(self, note_list)
     DateBase.unserialize(self, date)
     LocationBase.unserialize(self, location)
     return self
 
In the worst case unpack needs to work likewise.
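One possible answer to the inheritance question, offered only as a sketch: each base class declares the raw attributes it owns, and the full list is collected once per class from the method resolution order, so that adding a field touches only the class that owns it. The _fields and _all_fields names and the all_fields helper are hypothetical and not part of gen.lib; pack stands in for the __pack construct mentioned earlier.

```python
class PrivacyBase:
    _fields = ("private",)


class MediaBase:
    _fields = ("_media_list",)


class NoteBase:
    _fields = ("_note_list",)


def all_fields(cls):
    """Collect _fields from every class in the MRO, base classes first."""
    names = []
    for klass in reversed(cls.__mro__):
        # Look only at what each class declares itself, not what it inherits.
        for name in klass.__dict__.get("_fields", ()):
            if name not in names:
                names.append(name)
    return tuple(names)


class Person(PrivacyBase, MediaBase, NoteBase):
    _fields = ("_marker",)

    def __init__(self, source=None):
        if source is not None:
            self.pack(source.unpack())

    def unpack(self):
        # Raw attribute values, in the precomputed order.
        return tuple(getattr(self, name) for name in self._all_fields)

    def pack(self, values):
        for name, value in zip(self._all_fields, values):
            setattr(self, name, value)


# Computed once per class, so __init__ and unpack stay cheap.
Person._all_fields = all_fields(Person)

first = Person()
first.pack((["n1"], ("Person", "h1"), False, 1))
second = Person(source=first)
```

Whether a getattr/setattr loop is fast enough compared with the hand-written tuple assignment is exactly the performance question the proposal raises; this sketch does not settle it.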
 
 
==== getters and setters ====
 
The typical get and set methods in gen.lib would be deprecated. In 3.3 they would print a deprecation warning; in 3.4 they should be removed completely.
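A deprecation period of this kind could look roughly like the sketch below, which uses the standard warnings module. The get_marker and set_marker names are used for illustration and the message wording is an assumption.

```python
import warnings


class Person:
    def __init__(self):
        self._marker = None

    @property
    def marker(self):
        return self._marker

    @marker.setter
    def marker(self, value):
        self._marker = value

    def get_marker(self):
        # Old-style getter kept for one release, forwarding to the property.
        warnings.warn("get_marker() is deprecated, use the marker attribute",
                      DeprecationWarning, stacklevel=2)
        return self.marker

    def set_marker(self, value):
        # Old-style setter kept for one release, forwarding to the property.
        warnings.warn("set_marker() is deprecated, use the marker attribute",
                      DeprecationWarning, stacklevel=2)
        self.marker = value


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warnings are not filtered out
    person = Person()
    person.set_marker(1)
    value = person.get_marker()
```

With stacklevel=2 the warning points at the calling code rather than at gen.lib itself, which makes the remaining callers easier to find.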
 
==== bsddb get_raw methods ====
 
The get_raw_person_data method and its siblings would become private/protected to the bsddb backend. They should not be used outside gen.db, so the code in the models would no longer depend on them, allowing for a backend based on another bsddb schema or another database.
 
=== Advantages ===
 
The advantages of this approach are:
 
* The delayed access happens behind the scenes, via a standard, easy-to-understand mechanism. The hard part of obtaining data is all in the db code in gen.db, and the engine code for a db in gen.lib.
 
* We can move more freely to another database schema. This could mean several things: adding bsddb tables, or using an SQL backend. An upgrade of bsddb could even be done while still supporting normal reads of the old bsddb layout (so without an expensive upgrade before you can access the data). The only thing needed would be to write a new engine for the new schema. As an example, suppose we add type tables to store all used custom types; this change to bsddb can then be made without influencing how gen.lib works. In the present setup, serialize/unserialize must be changed.
 
* In the future, the engine could be used for more advanced things. E.g., Person().obtain(name="McDonald") could be implemented. In that case, obtain accesses the engine and does the query. Note that this is ''not'' the aim of the change; it is just a remark that this is a possibility.
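To illustrate the schema-independence claim, the sketch below runs the same Person class against two engines. Both engine classes, the in-memory ref_table dictionary, and the data layouts are assumptions made for the example; they only mimic the bsddb and django cases described in the overview.

```python
class TupleEngine:
    """Hypothetical engine for a layout that stores media handles inline."""

    def unpack_person(self, data):
        private, marker, handles = data
        return private, marker, list(handles)

    def get_medialist(self, key):
        return list(key)


class RefTableEngine:
    """Hypothetical engine for a layout with media refs in a separate table."""

    def __init__(self, ref_table):
        self._ref_table = ref_table

    def unpack_person(self, data):
        private, marker, handle = data
        # Only the lookup key is stored; the list is fetched on first access.
        return private, marker, ("Person", handle)

    def get_medialist(self, key):
        return list(self._ref_table.get(key, []))


class Person:
    def __init__(self, data, engine):
        self._engine = engine
        self.private, self._marker, self._media_list = engine.unpack_person(data)

    @property
    def media_list(self):
        if not isinstance(self._media_list, list):
            self._media_list = self._engine.get_medialist(self._media_list)
        return self._media_list


inline = Person((False, 1, ("m1", "m2")), TupleEngine())
table = {("Person", "h1"): ["m1", "m2"]}
referenced = Person((False, 1, "h1"), RefTableEngine(table))
```

Code that reads media_list sees the same list in both cases; only the engine knows how the underlying storage is laid out.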
 
= References =
# - [http://old.nabble.com/Lazy-Evaluation-in-Gramps-ts26940237.html mailing list discussion]
# - [http://www.gramps-project.org/bugs/view.php?id=3476 Lazy experiment (patch)]
# - [http://blog.gramps-project.org/?p=211 Blog post discussing ideas]
 
[[Category:GEPS|F]]