Difference between revisions of "Addon:QueryGramplet"

From Gramps

Revision as of 18:10, 17 August 2015


Please use carefully, and only on data that is backed up. Help make it better by reporting any comments or problems to the author, or issues to the bug tracker.
Unless otherwise stated on this page, you can download this addon by following these instructions.
Please note that some Addons have prerequisites that need to be installed before they can be used.
This Addon/Plugin system is controlled by the Plugin Manager.


The Query Gramplet takes SQL-like queries and produces a Quick View.

Examples

The QueryGramplet in Gramps 5.0 supports SELECT, UPDATE, and DELETE. In the examples below, keywords are shown capitalized, although the SQL parser is case-insensitive. Capitalized field names such as GIVEN and SURNAME are macros and must be capitalized; see the SQL section below for more information.

DELETE FROM person WHERE GIVEN == "Travis";

SELECT * FROM person LIMIT 10;

SELECT gramps_id, GIVEN, SURNAME FROM person;

SELECT event_ref_list[0].ref FROM person;

UPDATE person SET GIVEN="Gary" WHERE GIVEN == "Travis";

SELECT gramps_id FROM person WHERE ROWNUM < 10;

SELECT gramps_id FROM person LIMIT 5;

SELECT gramps_id FROM person LIMIT 20,30;

SELECT gramps_id, father_handle.SURNAME, mother_handle.SURNAME FROM family;

UPDATE person SET tag_list = Tag("Betty") WHERE "Betty" in primary_name.first_name;

Hints:

  • You may want to run a general SELECT first ("SELECT * FROM table"); that will show you the names of the fields
  • The query will automatically outer-join tables (use FLAT to not join)
  • Assigning to a list will append onto it
  • Use Tag("name") to look up or create a tag
  • Use Date(year[, month[, day]]) to create a date
  • Use TODAY for a date set to today
  • You have access to these libraries/functions: _ (for translations), re, random, db (database)

Other options:

  • FLAT - do not create extra rows via a JOIN
  • EXPAND - do automatic JOINs
  • RAW - no extra processing
  • NORAW - follow handles, etc

This API is made possible through the generic struct/json interface. It requires very little code, because it relies on these generic structures. It should be possible to make it solid enough to expose to users (say, as a generic filter). The parser could be made more user-friendly; currently it may just throw an error.

I'd be interested in any limitations you find, or enhancement ideas.

SQL

Here is the grammar for the subset of SQL supported. The SELECT, UPDATE, DELETE, and LIMIT clauses may be in any order. The WHERE clause (if used) must be last.

SELECT expr1 [as var1][, expr2 [as var2], ...] FROM table [LIMIT number1[, number2]] [WHERE expression];

UPDATE table SET field1=expr1[, field2=expr2, ...] [LIMIT number1[, number2]] [WHERE expression];

DELETE FROM table [LIMIT number1[, number2]] [WHERE expression];

... [FLAT | EXPAND] ...

... [RAW | NORAW] ...

table is one of:

  • person
  • place
  • repository
  • event
  • citation
  • source
  • tag
  • media
  • family

Other items:

  • expr is a field, *, or expression
  • var is an alias
  • number1 by itself is maximum number of rows to select
  • number1 with number2 is start, stop (first row is zero)
  • expression is any valid Python expression
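The two LIMIT forms can be pictured as Python slices. This is only a sketch: apply_limit is a hypothetical helper, not part of the addon, and it assumes that "stop" is exclusive, as in a Python slice, which the description above does not state.

```python
def apply_limit(rows, number1, number2=None):
    """Hypothetical helper mirroring the LIMIT semantics described above."""
    if number2 is None:
        # LIMIT number1: keep at most number1 rows.
        return rows[:number1]
    # LIMIT number1, number2: rows from start up to (assumed exclusive) stop;
    # the first row has index 0.
    return rows[number1:number2]


rows = ["I%04d" % n for n in range(40)]
first_five = apply_limit(rows, 5)     # like "LIMIT 5"
window = apply_limit(rows, 20, 30)    # like "LIMIT 20,30"
```

Under that assumption, "LIMIT 20,30" would return ten rows, starting with the twenty-first record.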

expression and expr may use:

  • random.random() (or other random method)
  • ROWNUM (zero-based counter)
  • col[N] (alias to column)
  • aliases
  • object - the primitive gen.lib object (such as Person, Family, etc)

RAW/NORAW: RAW does not turn the results into strings, but leaves the selected values as raw Python values. The default is NORAW. Once set, the new setting remains the default for this session.

FLAT/EXPAND: with FLAT, rows are not cross-product joined with other multi-valued columns; those columns are left as lists. The default is EXPAND. Once set, the new setting remains the default for this session.
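The difference between the two modes can be illustrated with a small Python sketch. It is not the addon's code: expand_row and as_values are hypothetical names, and treating an empty list as a single None (to keep the row, outer-join style) is an assumption.

```python
from itertools import product


def as_values(value):
    """Turn one cell into the list of values it contributes."""
    if isinstance(value, list):
        # Assumption: an empty list still keeps the row, with None in that column.
        return value or [None]
    return [value]


def expand_row(row):
    """EXPAND-style: one output row per combination of multi-valued columns."""
    return [list(combo) for combo in product(*(as_values(v) for v in row))]


# FLAT-style row: multi-valued columns are left as lists.
flat_row = ["I0001", ["Smith", "Jones"], ["E0001", "E0002"]]
expanded_rows = expand_row(flat_row)
```

Here one FLAT row with two surnames and two event references becomes four EXPAND rows.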

The following shortcuts (also called "macros") can be used in expressions and as a field:

  • SURNAME, short for "primary_name.surname_list[0].surname"
  • GIVEN, short for "primary_name.first_name"

Macros are expanded by a low-level text replacement. We could add other macros, and even allow users to define their own.
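As a rough illustration of what text replacement means here, the following sketch expands the two documented macros with a regular expression. The MACROS table repeats the definitions above; the function expand_macros and the whole-word matching rule are assumptions, not the addon's implementation.

```python
import re

# The two macros documented above; how the addon expands them is assumed.
MACROS = {
    "SURNAME": "primary_name.surname_list[0].surname",
    "GIVEN": "primary_name.first_name",
}

# Match macro names only as whole, upper-case words.
MACRO_PATTERN = re.compile(r"\b(" + "|".join(MACROS) + r")\b")


def expand_macros(query):
    """Replace each macro name in the query text with its expansion."""
    return MACRO_PATTERN.sub(lambda match: MACROS[match.group(1)], query)


expanded = expand_macros("SELECT gramps_id, GIVEN, SURNAME FROM person")
```

Because the match is case-sensitive, a lowercase "surname" is left alone, which is consistent with the rule that macros must be capitalized. A purely textual replacement like this one would also rewrite a macro name that appears inside a quoted string.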

Pre-Defined Functions and Libraries

The following are defined for use in your queries:

  • Tag(name) - Create or look up a tag by its name
  • re - The Python regular expression library
  • random - The Python random library
  • db - the current Gramps database
  • sdb - Simple Database API to the database
  • Today() - a Gramps Date object set to today's date
  • Date() - creates a Gramps Date object
  • lib - to access gramps.gen.lib object definitions
  • _(text) - for translations

Examples:

SELECT gramps_id, primary_name.surname_list.surname 
FROM person 
WHERE any([re.match("Sm.*th", name) for name in col[1]]);

Searches all primary_name surnames for names that start with "Sm" and contain "th" later in the name (re.match anchors only at the start of the string). col[1] is primary_name.surname_list.surname, which is a list of surnames.
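One detail is worth checking when adapting this query: Python's re.match anchors the pattern at the start of the string only. The sketch below, using made-up surnames, compares the pattern from the example with a variant that also requires the name to end in "th".

```python
import re

surnames = ["Smith", "Smyth", "Smathers", "Goldsmith"]

# re.match anchors at the start only, so anything may follow the "th".
prefix_matches = [name for name in surnames if re.match("Sm.*th", name)]

# Adding "$" (or using re.fullmatch) also requires the name to end in "th".
strict_matches = [name for name in surnames if re.match("Sm.*th$", name)]
```

With these names, the first pattern accepts "Smathers" as well as "Smith" and "Smyth", while the second accepts only "Smith" and "Smyth".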

UPDATE person SET tag_list=Tag("Smith") WHERE SURNAME == "Smith";

Lists

When an attribute is a list, you can select elements from items in the list, and also filter the list. For example, consider a person's parent_family_list. You can select only a single component, say private, of the parent family like:

SELECT parent_family_list("private") FROM person;

This would select only the private component from the parent families.

Likewise, you can filter the list to, say, only show those families that are private:

SELECT parent_family_list(private=True) FROM person;

This will only show (in the finally selected people) the parent families that are private.

Finally, you can both limit, and select from a list:

SELECT parent_family_list("gramps_id", private=True) FROM person;

That will limit the list to be a list of family gramps_id for private families.
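The three forms above can be modelled in plain Python on a list of dictionaries. This is a sketch under assumptions: select_from_list is a hypothetical stand-in for the call syntax, and the sample families are invented.

```python
def select_from_list(items, field=None, **filters):
    """Hypothetical model of list("field", key=value) on a list of dicts."""
    # Keep only the items whose attributes equal every keyword filter.
    kept = [item for item in items
            if all(item.get(key) == value for key, value in filters.items())]
    if field is None:
        return kept
    # Project each remaining item down to the single requested component.
    return [item.get(field) for item in kept]


parent_families = [
    {"gramps_id": "F0001", "private": True},
    {"gramps_id": "F0002", "private": False},
]

private_flags = select_from_list(parent_families, "private")
private_only = select_from_list(parent_families, private=True)
private_ids = select_from_list(parent_families, "gramps_id", private=True)
```

The positional argument picks a component and the keyword arguments filter, so combining them yields the gramps_id values of the private families only.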

You can delete an entire list by assigning None to it:

UPDATE person SET note_list=None;

You can delete an item in a list by assigning None to it:

UPDATE person SET note_list[0]=None;

Notes

Some notes on use:

1) Most SQL clauses (UPDATE table, FROM table, SELECT ..., SET field=value, ..., LIMIT ...) can appear in any position, any order

2) ...except the WHERE clause: it must be last; this is because the WHERE clause is not parsed, because:

3) The WHERE clause accepts any valid Python expression. Some libraries (such as random) are already imported, so they are ready for use in expressions.

SELECT * from person WHERE random.random() < .1;

This selects records where each has a 10% chance of being selected.
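A minimal model of how such a WHERE clause could be evaluated is sketched below. It assumes the expression is passed to eval() once per row with random, re, ROWNUM and col available; filter_rows is a hypothetical function, not the addon's code.

```python
import random
import re


def filter_rows(rows, where):
    """Evaluate a Python WHERE expression once per row (hypothetical model)."""
    code = compile(where, "<where>", "eval")
    selected = []
    for rownum, col in enumerate(rows):
        # Names the expression may use; ROWNUM is a zero-based counter.
        env = {"random": random, "re": re, "ROWNUM": rownum, "col": col}
        if eval(code, env):
            selected.append(col)
    return selected


random.seed(0)  # fixed seed so the sample is repeatable
rows = [("I%04d" % n,) for n in range(1000)]
sample = filter_rows(rows, "random.random() < .1")
first_ten = filter_rows(rows, "ROWNUM < 10")
```

Each row is tested independently, so the sample size varies around 10% of the rows rather than being exactly 10%.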

4) The SELECT fields use the bracketed notation for list references. Use "event_ref_list[0]".

5) JOINS are not necessary, because it automatically looks up all relations through the handles. In a SELECT, columns with multiple values in a list will appear as an outer-join with other values in the row.

6) UPDATE will work on any field, through a joined object or on the primary object. For example, you can update the birth date of an event through the person's referenced events.

7) Tables are lowercase, single (not plural) form (eg, person, tag, event).

8) Implemented "LIMIT number", "LIMIT start, stop", and "WHERE ROWNUM < number" (ROWNUM can be used in any expression).

9) Field names are the actual names of the fields of the gramps.gen.lib objects, verbatim, no differences. You might need to look up what you need... no help yet from this interface (although I am working on defining a built-in Schema that could help)

10) .handle or .ref automatically look up their references.

11) Shortcut: you can use col[N] in the WHERE clause to reference a column selected. N is zero-based.

SELECT gramps_id, private FROM person WHERE not col[1];

That will select all people where private is not True. That would include None (non-existent record) and False values. To select only False values, use:

SELECT gramps_id, private FROM person WHERE col[1] == False;

That will not select None values.
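The difference between the two conditions follows from ordinary Python truthiness, as this small sketch with invented rows shows.

```python
# Each row is (gramps_id, private); None stands for a missing value.
rows = [("I0001", True), ("I0002", False), ("I0003", None)]

# WHERE not col[1]: keeps False and None, because both are falsy.
not_private = [col[0] for col in rows if not col[1]]

# WHERE col[1] == False: keeps only real False values, since None == False
# evaluates to False.
explicitly_public = [col[0] for col in rows if col[1] == False]  # noqa: E712
```

The first condition returns I0002 and I0003; the second returns only I0002.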

12) If an object doesn't match any selected field, it just doesn't show. For example, to find all of the people with at least two surnames on their primary name, use:

SELECT primary_name.surname_list[1] FROM person;

or

SELECT gramps_id, primary_name.surname_list[1] FROM person WHERE col[1];

13) You do not need to reference a field before you can use it in the WHERE clause.

14) The semicolon is optional.

15) Be careful selecting all fields from all records... that could take up a lot of memory, and bring down Gramps.

16) This should be fairly fast, but it does call eval(). This might make things a little slower, but made the code much easier to write. And it does use the full power of Python.

17) You can use parentheses in an "UPDATE table SET field=value" value. Something like:

UPDATE table SET field=(field + 1);

but that hasn't been well-tested. (Speaking of testing, there is a Vassilii-inspired unittest with the QueryQuickview... will add more there).

18) The primary_name... stuff is really long and verbose. See "shortcuts" above.

19) Fields that contain other objects, or lists of objects, will show as dictionaries and lists of dictionaries. You can refine those fields by further specifying subparts. Maybe we should not show these, or show in another form...

20) If a selected field does not exist in a record, then it will have a value of None. For example, if you are selecting those people that have a second surname on their primary name, and there are some people who do not have a second surname, it will appear as None. If all columns are None, then the item will not be selected at all.
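The behaviour in note 20 can be sketched in Python as follows. The helper names get_path and select, the tuple-based field paths, and the sample people are all assumptions made for illustration.

```python
def get_path(record, path):
    """Follow keys and indexes; return None when any step is missing."""
    value = record
    for step in path:
        try:
            value = value[step]
        except (KeyError, IndexError, TypeError):
            return None
    return value


def select(records, paths):
    """Build one row per record, dropping rows whose columns are all None."""
    rows = [[get_path(record, path) for path in paths] for record in records]
    return [row for row in rows if any(value is not None for value in row)]


people = [
    {"gramps_id": "I0001",
     "primary_name": {"surname_list": [{"surname": "Garner"},
                                       {"surname": "Boucher"}]}},
    {"gramps_id": "I0002",
     "primary_name": {"surname_list": [{"surname": "Smith"}]}},
]

second_surname = ("primary_name", "surname_list", 1, "surname")
with_id = select(people, [("gramps_id",), second_surname])
only_second = select(people, [second_surname])
```

When gramps_id is also selected, the person without a second surname still appears, with None in that column; when the second surname is the only column, that person is dropped.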

21) If you know that only one value will match, then a "LIMIT 1" may be a way to speed up the query.

Older documentation for the QueryGramplet in Gramps 3.4

This has different table names (people rather than person), and other names that are different (surname vs primary_name.surname_list[0].surname).



SQL-like:

select FIELDS from TABLE where PYTHON-BOOLEAN-EXPRESSION;

TABLES: FIELDS:

  1. people: given_name, surname, suffix, title, birth_date, death_date, gender, birth_place, death_place, change, marker
  2. families:
  3. sources:
  4. events:

Examples:

$ select * from people where surname.startswith("Smith")


$ select given_name, surname from people;
 
$ select * from sources;

$ select * from events;

$ select * from families;