GEPS 047: Select data from SQL API
GEP to document discussions about creating an API to allow fast database queries, targeting SQLite and Postgresql.
Contents
- 1 Design Goals
- 2 API
- 2.1 Basic Usage
- 2.2 Simple Examples
- 2.3 The `what` Parameter
- 2.4 Extracting Attributes
- 2.5 Extracting from Arrays (List Comprehensions)
- 2.6 Array Expansion Pattern
- 2.7 JOINing Tables
- 2.8 Basic JOIN Syntax
- 2.9 JOIN with Additional Conditions
- 2.10 Reverse Direction (from Family to Person)
- 2.11 JOIN with Mother Relationship
- 2.12 JOIN with Multiple Conditions
- 2.13 JOIN with ORDER BY
- 2.14 JOIN with Variable-Index Array Access
- 2.15 JOIN with Variable-Index Array Access and Additional Conditions
- 2.16 How JOIN Detection Works
- 2.17 Available Tables
- 2.18 Notes on JOINs
- 2.19 The `where` Parameter
- 2.20 Basic Filtering
- 2.21 Filtering with `any()` and List Comprehensions
- 2.22 Combining Filters
- 2.23 Array Expansion in `where`
- 2.24 Combining `what` and `where` with List Comprehensions
- 2.25 The `order_by` Parameter
- 2.26 Common Patterns
- 2.27 Getting All Handles
- 2.28 Filtering by Array Length
- 2.29 Extracting from Nested Structures
- 2.30 Working with Event References
- 2.31 Variable-Index Array Access
- 2.32 Limitations
- 2.33 Hints and Best Practices
- 2.34 Examples by Use Case
- 2.35 Find All Persons with Birth Events
- 2.36 Extract All Event Reference Handles
- 2.37 Find People with Families
- 2.38 Find People Who Are Fathers (Using JOIN)
- 2.39 Find Married Families with Male Fathers
- 2.40 Get Surnames of All Males, Sorted
- 2.41 Extract Role Values from Event References with Conditions
- 3 See also
Design Goals
1. Easy for Gramps developers to use (no new syntax, no ORMs) 2. Provide fast access to select data, targeting filters 3. Ability to be used from SQLite and Postgresql 4. Easily used by Gramps Web 5. Targeting exploiting Gramps 6.0 selected_handles in filters
API
This section shows examples using `db.select_from_person()`, `db.select_from_family()`, and other `select_from_*` functions to query the database using Python string expressions.
Basic Usage
The `select_from_*` functions accept three main parameters:
- what: What data to extract (defaults to the entire object)
- where: Filter conditions (optional)
- order_by: Sorting order (optional)
Simple Examples
# Get all person handles
handles = list(db.select_from_person(what="person.handle"))
# Get handles of all males
male_handles = list(
db.select_from_person(
what="person.handle",
where="person.gender == Person.MALE"
)
)
# Get handles sorted by surname
sorted_handles = list(
db.select_from_person(
what="person.handle",
order_by="person.primary_name.surname_list[0].surname"
)
)
The `what` Parameter
The `what` parameter specifies what data to extract from each matching record.
Extracting Attributes
You can access nested attributes using dot notation:
# Get surnames surnames = list(db.select_from_person(what="person.primary_name.surname_list[0].surname")) # Get birth dates birth_dates = list(db.select_from_person(what="person.birth_ref.ref")) # Get primary names names = list(db.select_from_person(what="person.primary_name.first_name"))
Extracting from Arrays (List Comprehensions)
You can extract values from arrays using list comprehensions:
# Extract all role values from event_ref_list
role_values = list(
db.select_from_person(
what="[eref.role.value for eref in person.event_ref_list]"
)
)
# Extract role values with a condition
primary_roles = list(
db.select_from_person(
what="[eref.role.value for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY]"
)
)
# Extract multiple fields
event_data = list(
db.select_from_person(
what="[eref.ref for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY]"
)
)
Note: List comprehensions in `what` return one row per matching array element. If a person has 3 matching event_refs, you'll get 3 rows.
Array Expansion Pattern
For simpler cases, you can use the array expansion pattern:
# Extract role.value from all event_refs
role_values = list(
db.select_from_person(
what="item.role.value",
where="item in person.event_ref_list"
)
)
You can also include person-level fields along with array element fields:
# Get both person.handle and role.value for each event_ref
# Use a list for multiple fields
results = list(
db.select_from_person(
what=["person.handle", "item.role.value"],
where="item in person.event_ref_list"
)
)
# Returns tuples: (handle, role_value)
This is equivalent to the list comprehension but uses a different syntax. When you include `person.handle`, each row will show which person the array element belongs to.
JOINing Tables
You can reference other tables in your queries using natural table prefixes. The system automatically detects table references and generates appropriate JOIN clauses.
Basic JOIN Syntax
Simply reference other tables using their lowercase names (e.g., `family`, `person`, `event`):
# Join person with family - get person and family handles where person is the father
results = list(
db.select_from_person(
what=["person.handle", "family.handle"],
where="person.handle == family.father_handle"
)
)
JOIN with Additional Conditions
You can combine JOIN conditions with other filters:
# Get person and family where person is father and family type is married
results = list(
db.select_from_person(
what=["person.handle", "family.handle", "family.type.value"],
where="person.handle == family.father_handle and family.type.value == FamilyRelType.MARRIED"
)
)
Reverse Direction (from Family to Person)
You can query from any table and JOIN to others:
# Get family and father's information
results = list(
db.select_from_family(
what=["family.handle", "person.handle", "person.primary_name.first_name"],
where="family.father_handle == person.handle"
)
)
JOIN with Mother Relationship
# Get person and family where person is the mother
results = list(
db.select_from_person(
what=["person.handle", "family.handle"],
where="person.handle == family.mother_handle"
)
)
JOIN with Multiple Conditions
# Get person and family with conditions on both tables
results = list(
db.select_from_person(
what=["person.handle", "family.handle"],
where="person.handle == family.father_handle and person.gender == Person.MALE and family.type.value == FamilyRelType.MARRIED"
)
)
JOIN with ORDER BY
# Join and sort results
results = list(
db.select_from_person(
what=["person.handle", "family.handle"],
where="person.handle == family.father_handle",
order_by=["person.handle", "family.handle"]
)
)
JOIN with Variable-Index Array Access
You can join tables using variable-index array access, which is useful for joining through array references:
# Join person to event through the birth event reference
# This uses person.birth_ref_index to find the birth event in event_ref_list
results = list(
db.select_from_person(
what=["person.handle", "event.handle"],
where="person.event_ref_list[person.birth_ref_index].ref == event.handle"
)
)
This is particularly useful for joining to related records through indexed array references. The system automatically handles the variable index (like `person.birth_ref_index`) and generates the appropriate SQL subquery.
JOIN with Variable-Index Array Access and Additional Conditions
You can combine variable-index array access joins with additional filters:
from gramps.gen.lib import EventType
# Join person to event via birth, and filter by event type
results = list(
db.select_from_person(
what=["person.handle", "event.handle", "event.type.value"],
where="person.event_ref_list[person.birth_ref_index].ref == event.handle and event.type.value == EventType.BIRTH"
)
)
This finds all persons with birth events that are actually marked as BIRTH type events.
How JOIN Detection Works
- Table References: When you use `family.handle`, `person.handle`, etc., the system detects these as table references
- Automatic JOIN Generation: JOIN clauses are automatically added to the SQL query
- Join Conditions: The system requires explicit join conditions in your WHERE clause (e.g., `person.handle == family.father_handle`). Join conditions must be explicitly specified - there are no fallback defaults.
- Table Prefixes: When JOINs are present, all table attributes automatically use table prefixes (e.g., `person.json_data`, `family.json_data`)
Available Tables
You can reference these tables in your queries:
- `person` - Person records
- `family` - Family records
- `event` - Event records
- `place` - Place records
- `source` - Source records
- `citation` - Citation records
- `repository` - Repository records
- `media` - Media records
- `note` - Note records
- `tag` - Tag records
Notes on JOINs
1. Table Names: Use lowercase table names (e.g., `family`, not `Family`) to reference tables 2. Class Names: PascalCase names like `Person`, `FamilyRelType` are classes from the environment, not table references 3. Constants: All Gramps constants are available (e.g., `Person.MALE`, `FamilyRelType.MARRIED`, `EventType.MARRIAGE`) 4. Join Type: Currently supports INNER JOIN. The system automatically determines the join condition based on handle relationships 5. Multiple JOINs: You can reference multiple tables in a single query 6. Handle-Only Joins: JOINs are only allowed between handle fields. Valid handle fields include:
- `handle` (the primary handle, e.g., `person.handle`) - `ref` (reference handle, e.g., `event_ref.ref`, `citation_ref.ref`) - Any field ending in `_handle` (e.g., `family.father_handle`, `family.mother_handle`)
Examples of valid joins: - `person.handle == family.father_handle` ✓ - `family.mother_handle == person.handle` ✓ - `event_ref.ref == event.handle` ✓ - `person.event_ref_list[person.birth_ref_index].ref == event.handle` ✓ (variable-index array access) - `person.event_ref_list[0].ref == event.handle` ✓ (constant-index array access)
Examples of invalid joins (will be ignored): - `person.gender == family.type.value` ✗ (not handle fields) - `person.gramps_id == family.gramps_id` ✗ (not handle fields)
The `where` Parameter
The `where` parameter filters which records to include.
Basic Filtering
# Filter by gender
males = list(
db.select_from_person(
what="person.handle",
where="person.gender == Person.FEMALE"
)
)
# Filter by multiple conditions
married_males = list(
db.select_from_person(
what="person.handle",
where="person.gender == Person.UNKNOWN and len(person.family_list) > 0"
)
)
# String operations
smiths = list(
db.select_from_person(
what="person.handle",
where="person.primary_name.surname_list[0].surname.startswith('Smith')"
)
)
Filtering with `any()` and List Comprehensions
Check if any element in an array matches a condition:
# Find persons with any event_ref
persons_with_events = list(
db.select_from_person(
what="person.handle",
where="any([eref for eref in person.event_ref_list])"
)
)
# Find persons with any PRIMARY event_ref
persons_with_primary = list(
db.select_from_person(
what="person.handle",
where="any([eref for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY])"
)
)
# Negation: find persons with no event_refs
persons_without_events = list(
db.select_from_person(
what="person.handle",
where="not any([eref for eref in person.event_ref_list])"
)
)
Combining Filters
You can combine `any()` filters with other conditions:
# Find male persons with any event_ref
male_with_events = list(
db.select_from_person(
what="person.handle",
where="person.gender == Person.MALE and any([eref for eref in person.event_ref_list])"
)
)
Array Expansion in `where`
You can also use array expansion for filtering:
# Get role values from event_refs where role.value == 1
primary_roles = list(
db.select_from_person(
what="item.role.value",
where="item in person.event_ref_list and item.role.value == 1"
)
)
# Include person.handle to see which person each event_ref belongs to
results = list(
db.select_from_person(
what=["person.handle", "item.role.value"],
where="item in person.event_ref_list and item.role.value == 1"
)
)
Combining `what` and `where` with List Comprehensions
You can combine list comprehensions in both `what` and `where` for powerful filtering and extraction:
# Filter persons with any PRIMARY event_ref, then extract role values from PRIMARY event_refs
primary_roles = list(
db.select_from_person(
what="[eref.role.value for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY]",
where="any([eref for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY])"
)
)
# Different conditions: filter persons with any event_ref, extract only PRIMARY roles
mixed = list(
db.select_from_person(
what="[eref.role.value for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY]",
where="any([eref for eref in person.event_ref_list])"
)
)
# Combine with person-level filtering
female_primary = list(
db.select_from_person(
what="[eref.role.value for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY]",
where="person.gender == Person.FEMALE and any([eref for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY])"
)
)
The `order_by` Parameter
Sort results using the `order_by` parameter:
# Sort by surname
sorted_persons = list(
db.select_from_person(
what="person.handle",
order_by="person.primary_name.surname_list[0].surname"
)
)
==== Sort by multiple fields (use a list) ====
sorted_persons = list(
db.select_from_person(
what="person.handle",
order_by=["person.primary_name.surname_list[0].surname", "person.primary_name.first_name"]
)
)
Common Patterns
Getting All Handles
all_handles = list(db.select_from_person(what="person.handle"))
Filtering by Array Length
# Persons with events
with_events = list(
db.select_from_person(
what="person.handle",
where="len(person.event_ref_list) > 0"
)
)
# Persons without events
without_events = list(
db.select_from_person(
what="person.handle",
where="len(person.event_ref_list) == 0"
)
)
Extracting from Nested Structures
# Get all citation handles from persons
citation_handles = list(
db.select_from_person(
what="[cit.handle for cit in person.citation_list]"
)
)
# Get note handles
note_handles = list(
db.select_from_person(
what="[note.handle for note in person.note_list]"
)
)
Working with Event References
# Get all event reference handles
event_ref_handles = list(
db.select_from_person(
what="[eref.ref for eref in person.event_ref_list]"
)
)
# Get PRIMARY event reference handles only
primary_event_refs = list(
db.select_from_person(
what="[eref.ref for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY]"
)
)
Variable-Index Array Access
You can access array elements using a variable index (like `person.birth_ref_index`):
# Get the birth event reference using birth_ref_index
birth_refs = list(
db.select_from_person(
what="person.event_ref_list[person.birth_ref_index]"
)
)
# Get the role value from the birth event reference
birth_roles = list(
db.select_from_person(
what="person.event_ref_list[person.birth_ref_index].role.value"
)
)
# Filter persons who have a valid birth event reference
persons_with_birth = list(
db.select_from_person(
what="person.handle",
where="person.event_ref_list[person.birth_ref_index]"
)
)
# Filter persons whose birth event reference has PRIMARY role
persons_with_primary_birth = list(
db.select_from_person(
what="person.handle",
where="person.event_ref_list[person.birth_ref_index].role.value == EventRoleType.PRIMARY"
)
)
Limitations
1. List Comprehensions in `what`: When using list comprehensions in `what`, you get one row per matching array element. If a person has 5 matching event_refs, you'll get 5 rows.
2. Array Expansion vs List Comprehension:
- Array expansion (`item in person.array_path`) works in `where` and expands the result set - List comprehensions in `what` extract values from arrays - You cannot combine array expansion in `where` with list comprehensions in `what` effectively
3. Complex Boolean Logic: You can use `and` and `or` in `where` clauses with full structure preservation. Most combinations are supported:
All cases work correctly: - Simple combinations: `condition1 and condition2` or `condition1 or condition2` - Same operator throughout: `A and B and C` or `A or B or C` - Mixed operators: `A and B or C and D` - Nested expressions: `(A and B) or (C and D)` - Deeply nested structures: `(A and (B or C)) and D` - `any()` with any boolean structure: `(A and B) or any([...])` or `(A or any([...])) and B` - Array expansion with AND: `(A and B) and (item in person.array)` - Array expansion with OR: `(A and B) or (item in person.array)`
Array Expansion in OR Expressions:
When array expansion is used in an OR expression, the query uses a UNION to combine results:
# This works correctly, including persons with empty arrays:
db.select_from_person(
what="person.handle",
where="(person.gender == Person.MALE and len(person.family_list) > 0) or (item in person.event_ref_list)"
)
Note: If you need one row per person (not per array element), use `any()` instead:
# Returns one row per person (not per array element):
db.select_from_person(
what="person.handle",
where="(person.gender == Person.MALE and len(person.family_list) > 0) or any([item for item in person.event_ref_list])"
)
The difference: `any()` returns one row per person, while array expansion returns one row per array element.
4. Single Condition in List Comprehensions: List comprehensions support only one `if` condition. Multiple conditions should be combined with `and`:
# This works: what="[item.field for item in person.array if item.field1 == 1 and item.field2 == 2]" # Multiple if clauses are not supported: # what="[item.field for item in person.array if item.field1 == 1 if item.field2 == 2]" # Not supported
5. Nested Arrays: Nested array access (arrays within arrays) is not supported.
What works: - Single-level arrays: `person.event_ref_list`, `person.citation_list` - Accessing attributes of array elements: `item.role.value` where `item` is from `person.event_ref_list`
What doesn't work:
- Nested arrays: Accessing arrays that are properties of array elements
- Example: If an event_ref had a `sub_items` array, `item.sub_items` would not work
- The code does not support `json_each` within `json_each` (nested array iteration)
- Arrays within nested objects: `person.some_object.array_field` where `some_object` itself contains arrays
6. Lambda Functions: Lambda functions (callables) are not supported. All arguments (`what`, `where`, `order_by`) must be strings or lists of strings.
Hints and Best Practices
1. Use Constants: All of the Gramps objects and types are available in the environment
# Good: where="eref.role.value == EventRoleType.PRIMARY" where="person.gender == Person.MALE" # Avoid: where="eref.role.value == 1" # What does 1 mean?
2. Empty Arrays: To check for empty arrays, use `len()` or `not any()`:
# Both work: where="len(person.event_ref_list) == 0" where="not any([eref for eref in person.event_ref_list])"
3. Performance:
- List comprehensions in `what` can return many rows if arrays are large - `any()` in `where` is efficient for filtering - Consider filtering in `where` before extracting in `what` to reduce result set size
4. Testing for Existence: Use `any()` to test if any element exists:
# Check if person has any event_refs where="any([eref for eref in person.event_ref_list])"
5. Extracting Multiple Fields:
- For array expansion, you can include both person-level and array element fields:
# Get person.handle along with array element fields
results = list(
db.select_from_person(
what=["person.handle", "item.role.value", "item.ref"],
where="item in person.event_ref_list"
)
)
- For list comprehensions, you can extract multiple fields using tuples:
# Extract multiple fields at once using tuples
refs_and_roles = list(db.select_from_person(what="[(eref.ref, eref.role.value) for eref in person.event_ref_list]"))
# Returns list of tuples: [(ref1, role1), (ref2, role2), ...]
# You can also extract one field at a time if needed
refs = list(db.select_from_person(what="[eref.ref for eref in person.event_ref_list]"))
roles = list(db.select_from_person(what="[eref.role.value for eref in person.event_ref_list]"))
6. Order of Operations:
- `where` filters which records (persons) to include - `what` determines what data to extract from those records - When both use list comprehensions, `where` filters persons first, then `what` extracts from matching array elements
7. Debugging: If a query doesn't work as expected:
- Check that attribute paths are correct (use dot notation) - Verify array names (e.g., `event_ref_list`, not `event_refs`) - Ensure constants are imported and used correctly - Test with simpler queries first, then add complexity
Examples by Use Case
Find All Persons with Birth Events
birth_handles = list(
db.select_from_person(
what="person.handle",
where="any([eref for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY])"
)
)
Extract All Event Reference Handles
# Just the handles
all_event_refs = list(
db.select_from_person(
what="[eref.ref for eref in person.event_ref_list]"
)
)
# With person.handle to see which person each belongs to (using array expansion)
all_event_refs_with_person = list(
db.select_from_person(
what=["person.handle", "item.ref"],
where="item in person.event_ref_list"
)
)
Find People with Families
have_family = list(
db.select_from_person(
what="person.handle",
where="len(person.family_list) > 0"
)
)
Find People Who Are Fathers (Using JOIN)
# Using JOIN to find people who are fathers
fathers = list(
db.select_from_person(
what=["person.handle", "family.handle"],
where="person.handle == family.father_handle"
)
)
Find Married Families with Male Fathers
married_families = list(
db.select_from_person(
what=["person.handle", "family.handle"],
where="person.handle == family.father_handle and person.gender == Person.MALE and family.type.value == FamilyRelType.MARRIED"
)
)
Get Surnames of All Males, Sorted
male_surnames = list(
db.select_from_person(
what="person.primary_name.surname_list[0].surname",
where="person.gender == Person.MALE",
order_by="person.primary_name.surname_list[0].surname"
)
)
Extract Role Values from Event References with Conditions
# Get PRIMARY role values from persons who have PRIMARY events
primary_roles = list(
db.select_from_person(
what="[eref.role.value for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY]",
where="any([eref for eref in person.event_ref_list if eref.role.value == EventRoleType.PRIMARY])"
)
)