Changes

Jump to: navigation, search

Database Query API

3,310 bytes removed, 17:01, 6 February 2016
no edit summary
Starting with Gramps 5.0, there is a new method on manner to select data from the database object called "select." The goal of select this API is to make it possible easy to select most any data, based on most any by different criteria, sorted in most any order. In addition, where possible, the and make selection would be as fast as possible.
= DatabaseThis page documents the interface for developers.select() =
The database select method works as follows:= QuerySet =
db.select(TABLE-NAME, SELECT-LIST, where=WHERE-EXPRESSION, order_by=ORDER-BY-LIST, start=START-ROW, limit=LIMIT-ROW-COUNT)Each database backend now supports the following QuerySet objects:
The following are required:* db.Person* db.Family* db.Note* db.Source* db.Citation* db.Place* db.Repository* db.Event* db.Tag
* TABLE-NAME - the name Each of the table. That would be "Person", Family", "Media", "Repository", "Place", "Note", "Source", "Citation", or "Tag"* SELECT-LIST - these QuerySets allows you to chain together a list series of dot-separated field path strings from this object type (eg, "gramps_id", "primary_nameQuerySet method calls.first_name", etc)QuerySet supports the following methods:
Optional arguments:* .proxy(name, args...) - applies a named-proxy ("living", "private", or "referenced")* .filter(obj, args...) - applies a FilterObject or Python callable* .map(f) - applies a function to all selected objects* .count() - returns the number of matches* .limit() - set start or limit or selection* .order(fieldname, ...) - order-by fields given as strings; use "-name" for descending* .select([fields...]) - returns generator* .tag(tag_name) - puts a tag on all selected items, if not tagged with tag_name already
* WHERE-EXPRESSION - a matching expression, such as ("gramps_id", "==", "I0001")Database.QuerySet. These can be nested select(see below)* ORDER-BY-LIST - a list of dot-separated field path strings, each paired with a sorting direction, for example [("gramps_id", "ASC")]* START-ROW - the row number on which to start. Default is 0, meaning start at beginning* LIMIT-ROW-COUNT - the limit of how many rows to return. Default is -1, meaning no limit==
As an example, consider selecting the gramps_id from all people who have a surname of "Smith" and whose name begins with a "J", ordered by the gramps_id:
db.Person.select("Person", ) ["gramps_id"], where=["AND", [("primary_name.surname_list.0.surname", "=", "Smith"), ("primary_nameDatabase.first_name", "LIKE", "J%")]], order_by=[("gramps_id", "ASC")]) The parameters "start" and "limit" are used for paged selectsQuerySet. The result will also return the total of the selection as if start or limit had not been given filter(see Result below). === WHERE-EXPRESSION ===
The WHERE-EXPRESSION must be in one of these four forms (tuples or lists allowed):
* None - no == Database.QuerySet.filter applied to data* (dot-separated field path string, COMPARISON-OPERATOR, value)* ["AND" | "OR", [WHERE-EXPRESSION, WHERE-EXPRESSION, ...]]* ["NOT", WHERE-EXPRESSION]==
COMPARISON-OPERATOR is one of: * "LIKE" - use with "%" wildcard* "="* "!=", or "<>"* "<* "<="* ">"* ">="* "IS"* "IS NOT db.Person.order("* gramps_id"IN" Currently, value is limited to be an actual value, such as None, a str, int, or bool. Field names are not currently allowed on the right-hand side of the operator).select()
Examples:
* db.Person.order("primary_name.first_name", "=", "Marygramps_id")* ["OR", [("primary_name.first_name", "=", "Mary"), select("primary_name.first_name", "LIKE", "Eliza%")]]* ["NOT", ("primary_namedb.Person.first_name", "=", "Mary")] Note that where expressions may be recursively nested: * ["NOT", ["AND", order("primary_name.first_name", "=", "Mary"), ("gramps_id", "=", "I0003")]] === ORDER-BY-LIST === The ORDER-BY-LIST is either None or is a list of dotted-field path strings paired with "ASC" (ascending) or "DESC" (descending). Examples: * [("gramps_id", "DESC")]* [("primary_name.first_name", "ASC"), select("gramps_id", "DESC")]
The first-listed field name is the primary sort, followed by secondary sorts. In the last example above, the data would be sorted first by first_name ascending, and inside that, by gramps_id descending, such as:
The database.select method can also do joins across primary objects in each of the WHERE, ORDER, or select fields. For example consider this request:
>>> db.Family.select("Family", ["mother_handle.gramps_id"])
This will return the gramps_id's of the mothers of each family, if there is a mother.
You can join across multiple tables with more complex queries, such as:
>>> list(db.Family.select("Family", ["mother_handle.event_ref_list.ref.gramps_id"]))
That returns all of the Event gramps_ids for all events for the mother (Person) of all families, looking something like:
[{"mother_handle.event_ref_list.ref.gramps_id": ["E0001", "E0002"]}, {"mother_handle.event_ref_list.ref.gramps_id": "E0003"}, {"mother_handle.event_ref_list.ref.gramps_id": ["E0003", "E0001"]},
...
]
Note that these joins are done per record and are therefore not optimized---each requires another database access. In the future, these could be optimized via a SQL JOIN.
 
== Result ==
 
The database.select() method will always return a Result. A result is a collection of all of the data (ie, it is not a generator). You can find out how many records are returned with len(result).
 
Results are a subclass of the Python list object, with additional properties:
 
* result.total - total number of records (in the case of start or limit is given. In that case len(result) != result.total)
* result.time - the time in seconds it took to collect the data
* result.expanded - whether the data needed to be expanded (unpickled and primary objects created). BSDDB selects alway are expanded
* result.query - the actual SQL query, if one
 
When writing selects, the goal is to always have a query, and to always have expanded be False. Those will be the fastest queries. However, of course, that is not always possible. With BSDDB databases, there will never be a query, and the data will always be expanded.
 
Each element in the Result list is a dictionary of dotted-field path strings and their data (if any). If a value is None, it either means that the value is None, or that there is no such field for this primary object. For example:
 
[{"gramps_id": "I0000"},
{"gramps_id": "I0001"},
{"gramps_id": "I0002"},
{"gramps_id": "I0003"},
...]
 
== self ==
 
There is a special field name named "self" that can be used to access the preceding element in the dotted-field path as an object.
 
Examples:
 
>>> db.select("Person", ["handle.self"])
## Returns a Result of [{"handle.self": Person-object}, ...]
 
>>> db.select("Family", ["mother_handle.self"])
## Returns a Result of [{"mother_handle.self": Person-object}, ...]
 
>>> db.select("Person", ["event_ref_list.ref.self"])
## Returns a Result of [{"event_ref_list.ref.self": Event-object}, ...]
== Implementation ==
Using he dotted-field path string field-based Select API, we can write code as follows. Consider that we want to select the handle of all people whose surname is "Smith", given name starts with a "J", and ordered by gramps_id:
list(db.select("Person", ["handle", "gramps_id"], where=["AND", [.filter("primary_name.surname_list.0.surname", "primary_name__surname_list__0__surname=", "Smith"), ("primary_name.first_name", "LIKE", primary_name__first_name__LIKE="J%")]], order_by=[.order("gramps_id", "ASC")].select())
This code works on BSDDB as well as DB-API. Let's see the difference in timing on databases that have 187,294 people (created from GenFan, this is 20 full generations).

Navigation menu