GEPS 009: Import Export Merge
This page discusses a proposed implementation for merging old and new data in GRAMPS, both during import and as an independent merge process. Because this action is closely related to import and export, the proposal has been named "Import Export Merge".
Import Export Merge
Officially, GRAMPS import does not merge existing data with new data being imported. (The Spreadsheet/CSV import does perform a type of merge, but let's leave that aside for the moment; it is discussed in a section of the Gramps Manual.) However, the standard GRAMPS import will duplicate some data (such as events, but not people) if you import a GEDCOM file twice. This proposal aims to fix this bug by allowing imported data to be merged intelligently, either interactively or automatically, so that the result is better than what the current version produces.
The same process can be used to let the user interactively merge two objects in GRAMPS. For example, a user may realize that two person entries are really the same person, and so should be combined.
Current Related Files
Some work in trunk/gramps40 towards these ends:
- Method to_struct added to all gen.lib objects. Returns JSON-like dictionary self-documenting the fields/values recursively.
- to_struct used to find differences, gramps/gen/merge/diff.py
- Report based on diff:
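To illustrate how a recursive, JSON-like structure can be used to find differences, here is a minimal sketch. It is not the code in gramps/gen/merge/diff.py; the function name `diff_structs` and the field names in the sample data are hypothetical and only stand in for whatever to_struct actually returns.

```python
def diff_structs(old, new, path=""):
    """Return a list of (path, old_value, new_value) for every difference.

    A missing key or list item is reported as None on the side where it
    is absent (so a stored None and a missing value look the same here).
    """
    diffs = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            sub = f"{path}.{key}" if path else str(key)
            if key not in old:
                diffs.append((sub, None, new[key]))
            elif key not in new:
                diffs.append((sub, old[key], None))
            else:
                diffs.extend(diff_structs(old[key], new[key], sub))
    elif isinstance(old, list) and isinstance(new, list):
        for i in range(max(len(old), len(new))):
            sub = f"{path}[{i}]"
            if i >= len(old):
                diffs.append((sub, None, new[i]))
            elif i >= len(new):
                diffs.append((sub, old[i], None))
            else:
                diffs.extend(diff_structs(old[i], new[i], sub))
    elif old != new:
        diffs.append((path, old, new))
    return diffs


# Hypothetical sample data; real to_struct output has different fields.
person_a = {
    "gramps_id": "I0001",
    "primary_name": {"first_name": "John", "surname": "Smith"},
    "event_ref_list": ["E1"],
}
person_b = {
    "gramps_id": "I0001",
    "primary_name": {"first_name": "Jon", "surname": "Smith"},
    "event_ref_list": ["E1", "E2"],
}

for path, old_value, new_value in diff_structs(person_a, person_b):
    print(path, old_value, new_value)
```

Each reported path identifies one field that a merge dialog or report would need to resolve, which is why a self-documenting structure is convenient for this purpose.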
Currently, export to GEDCOM and CSV covers only part of the information. Although GEDCOM is the "standard" lingua franca of genealogy, it is inherently limited, particularly because of the various extant versions of GEDCOM. CSV has the advantage that it can be imported into any current spreadsheet, particularly LibreOffice or OpenOffice.org. What are the limitations of CSV exports in GRAMPS?
CSV export/import is limited to the main objects in GRAMPS. It was not designed as a general purpose import/export but rather an alternative input/output tool.
Merging is not a trivial task, though probably not impossible. A description of the current functionality for merging people can be found in Merging People. The aims of merging should first be defined in an unambiguous form.
Import can be subdivided into three cases:
Fresh Data Import
This is probably the simplest and safest option: delete the current GRAMPS database (archive it first!) and import all data.
Simply append all imported data to the existing database. The editing task would be left to the user. This option should be relatively easy to implement.
Leave some of the editing of the data to the program. Although manual intervention by the user would inevitably be required, part of the work could be done by the program.
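As one example of editing that the program could take over, exact duplicates (such as events created by importing the same GEDCOM file twice) can be detected by comparing record content rather than identifiers. The sketch below is only an assumption about how this might be done: the record layout, the ignored keys `handle` and `gramps_id`, and the function names are hypothetical and are not taken from the GRAMPS code.

```python
import hashlib
import json


def fingerprint(record, ignore=("handle", "gramps_id")):
    """Hash the content of a record, ignoring identifier fields."""
    content = {k: v for k, v in record.items() if k not in ignore}
    payload = json.dumps(content, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def split_duplicates(existing, incoming):
    """Split incoming records into (new, duplicates) by content."""
    seen = {fingerprint(r) for r in existing}
    new, duplicates = [], []
    for record in incoming:
        fp = fingerprint(record)
        if fp in seen:
            duplicates.append(record)
        else:
            new.append(record)
            seen.add(fp)  # also catches duplicates within the import
    return new, duplicates


# Hypothetical event records.
existing = [{"handle": "a1", "type": "Birth", "date": "1900-01-01"}]
incoming = [
    {"handle": "b7", "type": "Birth", "date": "1900-01-01"},
    {"handle": "b8", "type": "Death", "date": "1970-05-02"},
]
new, duplicates = split_duplicates(existing, incoming)
print(len(new), len(duplicates))
```

Only exact matches are caught this way; near matches (for example a differently formatted date) would still need to be presented to the user.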
Merge Two Objects
This is a topic that was initially overlooked. For further information see Merging People.
Merging two databases that contain the same handles (internal references) will break records. This can only happen when importing the .gpkg and .gramps formats, because handles are stored in those files. See Handle.bash.
The above text is a raw outline only. The writer is not really familiar with GRAMPS and only offered to the Coordinator to open a wiki page because everybody else seemed reluctant to do so. It is no doubt only the bare bones of the task and a very small step towards a potential programming effort, which can only proceed if other people interested in the topic contribute and are willing to discuss it in wiki style. There is some hope that such a discussion will take place, as there has already been a considerable exchange of thoughts and information on the developers' mailing list.
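A simple check before importing can show whether handles in the incoming file already exist in the target database. The following is a minimal sketch under the assumption that each side can be represented as a dictionary mapping handle to record; the function name and data layout are hypothetical and do not reflect the GRAMPS database API.

```python
def find_handle_collisions(existing, incoming):
    """Classify handles that occur on both sides.

    existing, incoming: dict mapping handle -> record (any comparable value).
    Returns (identical, conflicting) lists of handles.
    """
    identical, conflicting = [], []
    for handle in sorted(existing.keys() & incoming.keys()):
        if existing[handle] == incoming[handle]:
            identical.append(handle)    # same record, safe to skip
        else:
            conflicting.append(handle)  # same handle, different content
    return identical, conflicting


# Hypothetical data.
existing = {"h1": {"name": "Smith"}, "h2": {"name": "Jones"}}
incoming = {"h2": {"name": "Brown"}, "h1": {"name": "Smith"}, "h3": {"name": "Lee"}}
identical, conflicting = find_handle_collisions(existing, incoming)
print(identical, conflicting)
```

Conflicting handles are the dangerous case: the importer would have to assign a new handle, merge the two records, or ask the user.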
Julio patch set
Julio wrote custom merge code in the 2.2.x branch. You can find it here. This code has been integrated.
UID, GUID and _UID, what is needed in GRAMPS?
- On the generation and handling of UID
- Linking in Notes
- What's the best method for GEDCOM merge
- Suggestion of two stage importer
- Towards database synchronizing
- Questions on gedcom REFN strategy
The discussion of UIDs fits with the merge problem. There are some unofficial standards for UIDs that we should perhaps follow:
- mail list discussion on ldsoss:
- word document with the basics
- Bettergedcom comments about UUID
Julio has a patch against Utils.py to generate a UID; see here.
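To make the idea concrete, a UID can be generated with Python's standard uuid module and then used to match incoming records against existing ones. This is a minimal sketch and not Julio's patch: the function names, the `uid` field, and the choice of a plain 32-character hexadecimal string (without any checksum suffix that some programs append) are assumptions made for illustration.

```python
import uuid


def make_uid():
    """Return a random UID as 32 uppercase hexadecimal characters."""
    return uuid.uuid4().hex.upper()


def match_by_uid(existing, incoming):
    """Pair incoming records with existing ones that carry the same UID.

    Returns (matches, unmatched), where matches is a list of
    (existing_record, incoming_record) tuples.
    """
    index = {r["uid"]: r for r in existing if r.get("uid")}
    matches, unmatched = [], []
    for record in incoming:
        hit = index.get(record.get("uid"))
        if hit is not None:
            matches.append((hit, record))
        else:
            unmatched.append(record)
    return matches, unmatched


# Hypothetical records.
uid = make_uid()
existing = [{"uid": uid, "name": "John Smith"}]
incoming = [{"uid": uid, "name": "Jon Smith"}, {"uid": make_uid(), "name": "Ann Lee"}]
matches, unmatched = match_by_uid(existing, incoming)
print(len(matches), len(unmatched))
```

A stable UID that survives export and re-import is what would let a merge recognize the same person or event even when handles and IDs differ between databases.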
- 2013-01-13 Towards database syncing
Related Gramps Bug Numbers
- #684: REFN vs. INDI - Feature Request
- #2370: Errors occur when importing or exporting gramps data to gedcom format
- #2623: Import Export Merge (GEPS 009) - Feature Request
- #4169: To generate numbering class - Feature Request
- #5125: Expand CSV support with AFN and REFN - Feature Request
- #5253: Read and display the content of a .gramps into Gramps without import - Feature Request