Optimizing SQL with Gramps-Connect
This is the first blog post about a new project under the Gramps umbrella: Gramps-Connect. The idea is to take as much of Gramps as we can to the web, focusing on collaboration. There are already many ideas from users and developers, a proposal page detailing some of the planning, a student (Kathy) working on adding and editing data from the web, and a demo website, gramps-connect.org. We are building Gramps-Connect on top of Django, a set of Python web development tools.
One of my goals in working on Gramps-Connect is to get a very large family tree on-line for collaboration among a number of researchers. The organizing property of these data are those people who are descended from John Chenoweth & Mary Calvert in the US (see http://www.chenowethsite.com/ for some of this information).
So far, the privatized dataset has the following numbers:
The GEDCOM file is 43 MB. I imported the file into Gramps Gtk, and exported it to Django (using current trunk base, which will become Gramps 3.2). This took hours (overnight, on a hefty machine). Afterwards, the SQLite file was 228 MB (this file is stored in src/web/sqlite.db).
Now, the question was how would the response time be in viewing this data through the browser? I didn’t pay too much attention to the stats when not logged in (as this is using probably_alive to determine what info to show, is recursive, and needs some speedup work). So I logged in (id = admin/password = gramps), and began to browse the People View. I was disappointed: it took over 30 seconds to view each page of 15 names! (BTW, this view has all names in it, including all alternate names, unlike Gramps Gtk which only shows primary names).
Was this slowness a limitation of Django? Sqlite? Data organization? To narrow down the problem I installed the Django Debug Toolbar, and turned it on (uncommented all debug_toolbar items, and set DEBUG = True in src/web/settings.py) and reloaded the page. The toolbar shows a pane on the right-hand side of the screen. One of the tabs is SQL queries. It showed 311 queries per people page view. Clicking on the Query tab shows all of the queries for that page, and even a graphical bar for each showing the relative percentage time that that query took across the entire time of that page creation.
There were a few query types that were taking about 800 milliseconds (.8 seconds) each. That adds up fast! So I opened up a console window, started sqlite on the Django database (“sqlite3 sqlite.db”) and started poking around. All of the Gramps tables are prefixed with “grampsdb_”. So the Person model is stored in the “grampsdb_person” table. Some Sqlite meta commands:
sqlite> .indices grampsdb_eventref
sqlite> .timer on
I could see the issue by cutting and pasting the SQL query from the browser window to the sqlite prompt, and saw the same timings from the browser:
sqlite> SELECT “grampsdb_eventref”.”id”, “grampsdb_eventref”.”object_type_id”, “grampsdb_eventref”.”object_id”, “grampsdb_eventref”.”order”, “grampsdb_eventref”.”last_saved”, “grampsdb_eventref”.”last_changed”, “grampsdb_eventref”.”private”, “grampsdb_eventref”.”ref_object_id”, “grampsdb_eventref”.”role_type_id” FROM “grampsdb_eventref” WHERE (“grampsdb_eventref”.”object_type_id” = 25 AND “grampsdb_eventref”.”object_id” = 64647 ) ORDER BY “grampsdb_eventref”.”order” ASC;
I started adding indexes to see if I can get the time down. (One current limitation with Django is that it can’t create multi-field indexes by itself… you have to do that through raw SQL.) I created indexes, testing the queries as a went to see what would help reduce time. I found a couple that help:
sqlite> create index grampsdb_name_surname ON grampsdb_name (surname, first_name);
sqlite> create index grampsdb_eventref_object_id_object_type_id ON grampsdb_eventref
Now the views appear instantaneously! So without adding any code, examining the timings of each individual query, and creating new indexes on the fly (with the browser open, viewing live data), one can optimize functionality throughout Gramps-Connect. I’m really enjoying my Gramps database now!