Difference between revisions of "Gramps Performance"

Revision as of 14:58, 7 September 2011

Comparison of performance on large datasets between different GRAMPS versions

Performance tests

GRAMPS should perform well on datasets of 10,000 to 30,000 people. A good benchmark is to test GRAMPS on a dataset of about 100,000 people and to track performance with every new version.

Furthermore, this page can serve as proof to users that the present version of GRAMPS is not slow. From version 2.2.5 onwards, special attention will be given to performance, so that it does not deteriorate due to changes.

If you want to work with a large database, read Tips for large databases.

General setup

For a fair comparison, use the same hardware and the same datasets. Each program may use its optimal representation, so GRAMPS is tested in its native database format, GRDB.

Should somebody want to publish results for commercial software under Windows, this is allowed, provided the comparison is fair: use the same hardware and dataset (so test on a dual-boot machine) and the internal format of the program.

The first table lists the datasets. Pay attention to the copyright.

The second table is a table with hardware configuration. Add your machine to this list if you do some tests and want to add them to this article.

The third table gives the test results, which are subjective. Please do not run other software while doing the tests.

The Test Results

Genealogical datasets

Private datasets will not be shared for any reason. Free datasets are provided under the following copyright: use for testing of genealogical programs only, no publication, no sharing. They have been created from free information on the net whose publishers explicitly state that it can be used freely. Should you nevertheless feel that certain data is misplaced, or that the original author does not have the right to distribute the data, please contact us so that the information can be removed.

FAQ

My computer hangs on open, eating memory? These are LARGE datasets, so do NOT open them directly. For GRAMPS, open them as follows: create a new GRDB file, then in the empty file choose Import from the File menu and import the dataset.
What is tar.bz? This is a compression format. You must uncompress the file before importing it.
Can you provide the GEDCOM? No. Offering GEDCOM risks attracting too much traffic to this site. If you need GEDCOM, install GRAMPS, import the dataset, and then choose "Export to GEDCOM".
What is in these files? See the dataset summaries at the bottom of this page.
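
The uncompress step from the FAQ can also be scripted. The following is a minimal sketch, assuming the download is a bzip2-compressed tar archive; the function name `extract_dataset` and the file names in the usage note are hypothetical and not part of GRAMPS.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path, target_dir):
    """Unpack a bzip2-compressed tar archive and return its member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(target, **kwargs)
    return names
```

A call such as `extract_dataset("testdb120000.tar.bz2", "datasets")` (hypothetical paths) would leave the uncompressed file in `datasets`, ready to be imported into an empty database as described above.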

Code	name	Download size	People	Size	Copyright
d01	Doug's test GEDCOM	-	100993	32MB	Private
d02	testdb80000	11.2 MB	82688	70MB	Testing only, no sharing, no publication * NOTE: THIS FILE IS MISSING. IF ANYONE HAS A COPY, PLEASE CONTACT [email protected] *
d03	testdb120000	18.5MB	124032	105MB	Testing only, no sharing, no publication
d04	Jean-Raymond's test GEDCOM french forum	-	52699	13.6MB	Private

Hardware configurations

Code	Processor	clock	RAM	OS	User
H01	Pentium 4	2.66 GHz	512 MB	Linux	?
H02	?	1.7 GHz	512 MB	Linux	?
H03	AMD Athlon64 X2	2x2.1 GHz	1 GB	Kubuntu 6.06	?
H04	Intel Centrino Duo	2x1.66 GHz	2 GB	Ubuntu 9.04	User:Duncan
H05	Intel Centrino Duo	2x1.66 GHz	2 GB	Ubuntu 8.10	User:Duncan
H06	AMD Phenom 9500	Quad Core 2.2 GHz	3GB	Windows Vista	Jean-Raymond Floquet
H07	Intel Pentium 4	2.80 GHz	512 MB *	Ubuntu 9.04	User:Romjerome
H08	Intel Celeron Dual Core	2.60 GHz	2 GB	Ubuntu 10.04	User:Romjerome

(*) + 80MB of swap used on import

Tests table

Code	test
T01	Time to import GEDCOM/GRAMPS in empty native file format (GRDB)
T02	Size native file format (GRDB)
T03	Time to open native file format (GRDB) for clean/nonclean start on people view (*)
T04	Time to open edit person dialog
T05	Time to delete/undelete person
T06	Open event view clean/after T03 (*)
T07	Sort on date in event view
T08	Overall editing responsiveness

(*) A clean start means the computer was restarted, so the Python methods and modules must also be loaded and started. A non-clean start means you have opened GRAMPS with the .grdb file before and open it again. Parts of the file, as well as Python itself, are then still in memory, so access is faster.
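
Timings in the clean/non-clean notation can be collected with a small helper. This is only a sketch, assuming the action under test can be triggered as a Python callable; `time_call` and `format_duration` are hypothetical names, not GRAMPS functions. Repeating a call inside one process only approximates the non-clean case, and a truly clean start still requires a reboot.

```python
import time


def time_call(action, repeats=2):
    """Run `action` `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return timings


def format_duration(seconds):
    """Format seconds in the style used on this page, e.g. 157 -> '2m37s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m{secs}s" if minutes else f"{secs}s"
```

The first element of the returned list corresponds to the earlier run and the later elements to repeated runs, so a pair can be written as `format_duration(first) + "/" + format_duration(second)`.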

Performance results

General remark: unless marked notrans, tests are done with transactions enabled in the GRAMPS preferences. This gives a performance boost. For safety, only change this setting on an empty database. You have been warned!

Comp	GRAMPS	data	T01	T02
H03	2.2.4 notrans	d01	2h	542.6MB (v11)
H03	2.2.4	d01	24 min	544.5MB
H03	2.2.4	d02	20 min	323MB
H03	2.2.4	d03	25 min	527MB
H03	2.2.6	d02	15min	332MB
H03	2.2.6	d03	23min	528MB (v12)
H04	3.0.4	d03	1h:56m	?
H05	2.2.10 (trans?)	d03	1h:56m	?
H06	3.1.2	d04	8min	937MB
H07	3.1.90 - 2009-7-20 (trans?)	d03	2h:44m	2GB *
H08	3.3.0 (+ DB upgrade ...)	d03	35m (work in progress 51%)	959MB (work in progress)

(*) 1520MB log files - 480MB tables
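
The T01 column mixes several duration notations ("2h", "24 min", "1h:56m"), which makes rows hard to compare directly. The sketch below converts them to seconds and computes the slowdown of the notrans run; `parse_duration` is a hypothetical helper written for this page, and it assumes only the units h, min, m and s occur.

```python
import re

_UNIT_SECONDS = {"h": 3600, "min": 60, "m": 60, "s": 1}
# "min" must come before "m" so that "24 min" is not read as "24 m" + "in".
_TOKEN = re.compile(r"(\d+)\s*(h|min|m|s)")


def parse_duration(text):
    """Convert strings such as '2h', '24 min' or '1h:56m' to seconds."""
    main_part = text.split("(")[0]  # drop remarks like "(work in progress 51%)"
    tokens = _TOKEN.findall(main_part)
    if not tokens:
        raise ValueError(f"no duration found in {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in tokens)


# T01 values copied from the table above (H03, GRAMPS 2.2.4, dataset d01).
notrans = parse_duration("2h")
trans = parse_duration("24 min")
print(notrans / trans)  # → 5.0
```

On these two rows, importing d01 without transactions took five times as long as importing it with transactions enabled.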

Comp	data	GRAMPS	T03	T04	T05	T06	T07	T08	result
H02	d01	2.2.4	T03 = 4m17s	T04 = ?	T05 = ?/?	T06 = ?	T07 = ?	T08 =
H03	d03	2.2.4	T03 = 2m37s/4m3s	T04 = 3s	T05 = 43s/23s	T06 = 1m23s/12s	T07 = 20s	T08 =	very bad
H03	d01	2.2.4	T03 = 2m22s/2m	T04 = 3s	T05 = 33s	T06 = 1m9s/10s	T07 = 18s	T08 =	very bad
H02	d01	2.2.5	T03 = 12s	T04 = ?	T05 = ?/?	T06 = ?	T07 = ?	T08 =
H03	d03	2.2.6	T03 = /17s	T04 = 1s	T05 = 20s/18s	T06 = ?/9s	T07 = 21s	T08 =	Excellent
H03	d02	2.2.6	T03 = ?/24s	T04 = 1s	T05 = 17s/13s	T06 = ?/11s	T07 = 17s	T08 =	Excellent
H05	d03	2.2.10	T03 = 1m15s/16s	T04 = 1s	T05 = 16s/13s	T06 = 11s/1s	T07 = 26s	T08 =	good after loading each view once
H06	d04	3.1.2	T03 = 1m30/?	T04 = 10s	T05 = ?/?	T06 = ?	T07 = 19s	T08 = 11s	not bad
H07	d03	3.1.90 2009-7-20	Cannot allocate memory (also python-2.6)	T04 = \	T05 = \	T06 = \	T07 = \	T08 =	size limitation on 3.0.x, 3.1.x and trunk ...
H08	d03	3.3.0	T03 = ?/?	T04 = ?	T05 = ?/?	T06 = ?	T07 = ?	T08 =	description
?	db	version	T03 = ?/?	T04 = ?	T05 = ?/?	T06 = ?	T07 = ?	T08 =	description

Dataset summaries

For every test dataset, create a summary with Report:

Summary of the database

Summary of database test d01

Number of individuals: 100993
Males: 53046
Females: 47947
Individuals with incomplete names: 324
Individuals missing birth dates: 42726
Disconnected individuals: 19
Number of families: 36554
Unique surnames: 15308

Summary of database test d02

Number of individuals: 82688
Males: 44736
Females: 37952
Individuals with incomplete names: 17120
Individuals missing birth dates: 31528
Disconnected individuals: 880
Number of families: 32256
Unique surnames: 13957

Summary of database test d03

Number of individuals: 124032
Males: 67104
Females: 56928
Individuals with incomplete names: 25680
Individuals missing birth dates: 47292
Disconnected individuals: 1320
Number of families: 48384
Unique surnames: 20695

Summary of database test d04

Number of individuals: 52699
Males: 26420
Females: 26279
Individuals with incomplete names: 2
Individuals missing birth dates: 16427
Disconnected individuals: 0
Number of families: 24604
Unique surnames: 5822
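
The summaries can be sanity-checked with a few lines of arithmetic. The sketch below copies the figures from the four summaries above and verifies that males plus females equals the number of individuals; it assumes every individual is recorded as male or female, which holds for these four datasets but need not hold for others. The names `SUMMARIES` and `check_summary` are hypothetical.

```python
# Figures copied from the dataset summaries on this page.
SUMMARIES = {
    "d01": {"individuals": 100993, "males": 53046, "females": 47947, "families": 36554},
    "d02": {"individuals": 82688, "males": 44736, "females": 37952, "families": 32256},
    "d03": {"individuals": 124032, "males": 67104, "females": 56928, "families": 48384},
    "d04": {"individuals": 52699, "males": 26420, "females": 26279, "families": 24604},
}


def check_summary(summary):
    """Return True when males + females add up to the number of individuals."""
    return summary["males"] + summary["females"] == summary["individuals"]


for code, summary in SUMMARIES.items():
    people_per_family = summary["individuals"] / summary["families"]
    print(code, check_summary(summary), round(people_per_family, 2))
```

All four datasets pass the check. The individuals-per-family ratio gives a rough idea of how densely connected each dataset is.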

Possible Future Optimizations

Some things can be fine-tuned to obtain better results. An overview:

See if GRAMPS can pass this:

The Confucius Challenge
- Confucius Cascade: a real-world test consisting of increasingly gigantic GEDCOMs, with tough time limits.
- Confucius Cup 2008
- Two Huge GEDCOM Files

@@ Line 158: / Line 158: @@
 |H07 || 3.1.90 - 2009-7-20 (trans?)|| d03 || 2h:44m  || 2GB *
 |-
-|H08 || 3.3.0 (+ DB upgrade ...)|| d03 || 30m (work in progress)|| 750MB (work in progress)
+|H08 || 3.3.0 (+ DB upgrade ...)|| d03 || 35m (work in progress 51%)|| 959MB (work in progress)
 |}
