GEPS 009: Import Export Merge

This page is for discussion of a proposed implementation of merging old and new data in GRAMPS, both during import and as an independent merge process. As this action is closely related to import and export, this section has been named "Import Export Merge".

Import Export Merge

Current State

Officially, GRAMPS import does not merge existing data with new data being imported. (The Spreadsheet/CSV importer does perform a type of merge, but let's leave that aside for the moment. It is discussed in a section of the Gramps Manual.) However, the standard GRAMPS import will duplicate some data (such as events, but not people) if you import the same GEDCOM file twice. This proposal aims to fix this bug by letting the user merge imported data intelligently, either interactively or automatically.

The same process can let a user interactively merge two objects in GRAMPS. For example, a user may realize that two person entries are really the same person, and so should be combined.

Current Related Files

  1. Lib
    1. gramps/plugins/lib/libgrdb.py
    2. gramps/plugins/lib/libmixin.py
  2. Import
    1. gramps/plugins/importer/import*.py
    2. gramps/plugins/importer/importgedcom.glade
  3. Merging
    1. gramps/gen/merge/*.py
    2. gramps/gui/glade/*.glade

Some work in trunk/gramps40 towards these ends:

  • Method to_struct added to all gen.lib objects. Returns JSON-like dictionary self-documenting the fields/values recursively.
  • to_struct used to find differences, gramps/gen/merge/diff.py
  • Report based on diff:

Database-diff-report.png
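
The idea of finding differences from nested, JSON-like dictionaries can be illustrated with a minimal sketch. This is not the code in gramps/gen/merge/diff.py; the function name diff_structs and the field names in the sample records are assumptions made for illustration only.

```python
def diff_structs(left, right, path=""):
    """Return a list of (path, left_value, right_value) for every difference.

    Hypothetical helper: it walks two nested structures made of dicts,
    lists and plain values, such as a to_struct-style dictionary.
    A value of None means the entry is missing on that side.
    """
    differences = []
    if isinstance(left, dict) and isinstance(right, dict):
        for key in sorted(set(left) | set(right)):
            child = f"{path}.{key}" if path else str(key)
            if key not in left:
                differences.append((child, None, right[key]))
            elif key not in right:
                differences.append((child, left[key], None))
            else:
                differences.extend(diff_structs(left[key], right[key], child))
    elif isinstance(left, list) and isinstance(right, list):
        for index in range(max(len(left), len(right))):
            child = f"{path}[{index}]"
            if index >= len(left):
                differences.append((child, None, right[index]))
            elif index >= len(right):
                differences.append((child, left[index], None))
            else:
                differences.extend(
                    diff_structs(left[index], right[index], child)
                )
    elif left != right:
        differences.append((path, left, right))
    return differences


# Sample records with made-up field names (assumption, not the real schema).
old = {
    "gramps_id": "I0001",
    "primary_name": {"first_name": "John", "surname": "Smith"},
    "event_ref_list": ["E1"],
}
new = {
    "gramps_id": "I0001",
    "primary_name": {"first_name": "Jon", "surname": "Smith"},
    "event_ref_list": ["E1", "E2"],
}
changes = diff_structs(old, new)
for where, before, after in changes:
    print(where, before, after)
```

Each reported path names one field that differs, which is the kind of information a diff report or an interactive merge dialog would need to present to the user.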

Exporting

Currently, exporting to GEDCOM and CSV covers only part of the information held in Gramps. Though GEDCOM is the "standard" lingua franca of genealogy, it is inherently limited, particularly because of the various extant versions of GEDCOM. CSV has the advantage that it can be imported into any current spreadsheet, particularly LibreOffice or OpenOffice.org. What are the limitations of CSV exports in Gramps?

CSV export/import is limited to the main objects in GRAMPS. It was not designed as a general-purpose import/export format, but rather as an alternative input/output tool.

Merging

This is not a trivial task, though probably not an impossible one. A description of the current functionality for merging people can be found in Merging People. The aims of merging should first be defined unambiguously.

Import can be divided into three cases:

Fresh Data Import

This is probably the simplest and safest option: delete the current Gramps database (archive it first!) and import all the data.

Append Import

Simply append all imported data to the existing database. The editing task would be left to the user. This option should be relatively easy to implement.

Merge Import

Leave some of the editing of the data to the program. Manual intervention by the user would inevitably still be required, but part of the work could be done by the program.
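
As a rough sketch of how the program could take over part of that work, incoming records might first be sorted into three groups: records that are new, records identical to existing ones (which can be skipped instead of duplicated), and records that conflict and need the user's attention. The function classify_incoming and the record keys and fields below are hypothetical; the sketch assumes each record has some stable key shared by both data sets.

```python
def classify_incoming(existing, incoming):
    """Sort incoming records into new, identical and conflicting groups.

    Both arguments map a stable record key to a dict of fields.
    Hypothetical helper for illustration, not Gramps code.
    """
    result = {"new": [], "identical": [], "conflict": []}
    for key, record in incoming.items():
        if key not in existing:
            result["new"].append(key)        # safe to append
        elif existing[key] == record:
            result["identical"].append(key)  # skip, avoids duplicates
        else:
            result["conflict"].append(key)   # ask the user
    return result


# Made-up sample data (assumption): events keyed by a stable identifier.
existing = {
    "EV-1": {"type": "Birth", "date": "1900-01-01"},
    "EV-2": {"type": "Death", "date": "1970-05-05"},
}
incoming = {
    "EV-1": {"type": "Birth", "date": "1900-01-01"},
    "EV-2": {"type": "Death", "date": "1971-05-05"},
    "EV-3": {"type": "Marriage", "date": "1925-06-01"},
}
groups = classify_incoming(existing, incoming)
print(groups)
```

Only the "conflict" group would need interactive handling; importing the same file twice would place every record in "identical", so nothing would be duplicated.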

Merge Two Objects

This is a topic that was initially overlooked. For further information see Merging People.

Handle issue

Merging two databases that share the same handles (internal references) will break records. This can only happen when importing the .gpkg and .gramps formats, because handles are stored in those files. See Handle.bash.
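
One possible way to avoid broken records, sketched below, is to detect incoming handles that already exist and give those records fresh handles, rewriting any references to them within the imported data. This is an assumption about how the problem could be approached, not a description of what Gramps does; the function remap_colliding_handles and the "refs" field are hypothetical.

```python
import uuid


def remap_colliding_handles(existing_handles, incoming):
    """Give fresh handles to incoming records whose handle already exists.

    existing_handles: set of handles already in the database.
    incoming: dict mapping handle -> record dict; each record may carry
    a "refs" list of handles it points to (hypothetical layout).
    Returns (remapped_records, mapping_old_to_new).
    """
    # Pick a new random handle for every collision.
    mapping = {
        handle: uuid.uuid4().hex
        for handle in incoming
        if handle in existing_handles
    }
    remapped = {}
    for handle, record in incoming.items():
        new_record = dict(record)
        # Rewrite references so they follow the renamed records.
        new_record["refs"] = [
            mapping.get(ref, ref) for ref in record.get("refs", [])
        ]
        remapped[mapping.get(handle, handle)] = new_record
    return remapped, mapping


existing_handles = {"h1", "h2"}
incoming = {
    "h1": {"name": "Imported person", "refs": ["h3"]},
    "h3": {"name": "Imported family", "refs": ["h1"]},
}
remapped, mapping = remap_colliding_handles(existing_handles, incoming)
print(mapping)
```

Note that this treats a colliding handle as a different object that happens to share a handle. If the two records really are the same object, for example when re-importing a file exported from the same database, they would need to be merged rather than renamed.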

Comments

The above text is a rough outline only. The writer is not really familiar with Gramps and only offered the Coordinator to open a wiki page because everybody else seemed reluctant to do so. This is no more than the bare bones of the task and a very small step towards a potential programming task, which can only go ahead with input from other people who are interested in the topic and willing to discuss it in wiki style. There is some hope that such a discussion will take place, as there has been a considerable exchange of thoughts and information on the developers' mailing list.

Julio patch set

Julio wrote custom merge code in the 2.2.x branch. You can find it here. This code has been integrated.

I think that that was referring to just a couple of functions for creating UUID strings, and that has been incorporated into Gramps. So, I think it is done. Thanks! -Doug

UID, GUID and _UID, what is needed in GRAMPS?

Gramps discussions:

The discussion of UIDs is closely tied to the merge problem. There are some unofficial standards for UIDs that we should perhaps follow:

Julio has a patch against Utils.py to generate a UID, see here
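
For orientation, generating a unique identifier string is straightforward with Python's standard uuid module. The sketch below is not Julio's patch and does not claim to match the _UID format written by any other genealogy program; the function names make_uid and same_record are hypothetical.

```python
import uuid


def make_uid():
    """Return a random 128-bit identifier as 32 uppercase hex digits."""
    return uuid.uuid4().hex.upper()


def same_record(uid_a, uid_b):
    """Compare two UIDs, ignoring case and surrounding whitespace."""
    return uid_a.strip().upper() == uid_b.strip().upper()


uid = make_uid()
print(uid)
```

A UID stored with each object and preserved across export and import would give the merge process a stable key for recognizing that two records describe the same object, independently of the internal handle.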

Related Discussions

Related Gramps Bug Numbers

  • #684: REFN vs. INDI - Feature Request
  • #2370: Errors occur when importing or exporting gramps data to gedcom format
  • #2623: Import Export Merge (GEPS 009) - Feature Request
  • #4169: To generate numbering class - Feature Request
  • #5125: Expand CSV support with AFN and REFN - Feature Request
  • #5253: Read and display the content of a .gramps into Gramps without import - Feature Request
  • #7072: A friendly way for recovery - Feature Request