Addon:PlaceCleanupGramplet


Please use this addon carefully, and only on data that is backed up. Help make it better by reporting any comments or problems to the author, or issues to the bug tracker.
Unless otherwise stated on this page, you can download this plugin by following these instructions.
Please note that some Addons have prerequisites that need to be installed before they can be used.
This Addon/Plugin system is controlled by the Plugin Manager.

Released for Gramps 5.x + versions

Place Cleanup Gramplet

The Place Cleanup Gramplet addon uses the Web Services feature of the GeoNames.org geographical database to look up as many as 2,000 place names per day. Selected parts of the validated information can then be pushed into Gramps to correct and improve incomplete Place records.

Purpose of this Gramplet

This was written to assist in cleaning up already existing places. You can use it while adding a new place, but you must first create the place with the normal Gramps Add -> Place and select the new place in the top window before using the Gramplet.

When starting with GEDCOM imported trees, the Places that are created are often incomplete or simply don't take full advantage of Gramps features. The place names and/or titles are often left as strings of comma-separated names.

Even when transcribing events from census or other sources, Place data might have been ambiguous about which administrative area enclosed a locality and which areas were, in turn, enclosed. Sometimes, Places may have been entered with the proper enclosure hierarchy but without the Latitude/Longitude coordinates needed to plot Events on the maps of the Gramps Geography view.

Rather than interrupt data entry or a research workflow to validate each Place, it can be more efficient to come back and handle these deficiencies collectively.

Previously, cleaning up these shortfalls required a lot of manual work.

The Gramplet also assists in merging new places with places already present: when you do a search, if the Place you are looking for exists elsewhere in your Gramps database and is complete, you can merge your current place with the completed place.

The Gramplet makes use of the GeoNames database via the web to get information to fill out the Places.

Warning: Automated records updates are risky!

As always, back up your current tree BEFORE using any tool that automates data entry or editing. While it is usually possible to Undo changes individually within a session, you may decide that you need to restart from an earlier state.
See make a Gramps XML backup

Usage

Installing the Place Cleanup Gramplet

The Gramplet can be installed like any other Addon; see Installing Addons in Gramps.

Once installed, make the Gramplet visible in the Places category, on either the Place Tree or Places view (whichever you usually use). Load the Gramplet in the bottom bar by clicking the Down Arrowhead button (also known as the Gramplet Bar Menu) at the far top right of the bottombar titles, and then using the Add a gramplet menu item.

GeoNames account prerequisite

Add free WebServices at the Login screen

The first time you attempt a search, you will be reminded to obtain and set a GeoNames user ID, which is required for use of their data. You can sign up for the ID at the following link: http://www.geonames.org/login.

The account signup requires responding to a confirmation email.

Just having a basic GeoNames account isn't quite enough. The Gramplet needs access to the Web Services. Return to the login screen once you've validated your account, then enable the Free Web Services by clicking the link in that section of their login screen. It is shown in the screen capture to the right.

Once you've logged into the GeoNames website, requests from Gramps on the same machine will be part of the same session. If the session log-in expires or you exceed the number of requests allowed in a day, the Gramplet will report failing to find matches for placenames being searched. If a suspiciously high number of failures occurs, try logging in again.

Once you have an ID, set it in the Gramplet via the Preferences button (see below).
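
When a search fails for account reasons, the GeoNames service normally says so in the response body. The sketch below is only an illustration, not the Gramplet's own code. It assumes the JSON form of the service, in which an error arrives as a `status` object with a numeric `value`, and it assumes the code numbers 10, 18 and 19 from the GeoNames web service exception list; check those against the GeoNames documentation. The function name `explain_geonames_status` and the hint texts are hypothetical.

```python
import json

# Assumed subset of GeoNames error codes (verify against the GeoNames docs).
HINTS = {
    10: "authorization problem: check the user ID and that Free Web Services are enabled",
    18: "daily request limit reached: try again tomorrow",
    19: "hourly request limit reached: wait a while and retry",
}


def explain_geonames_status(body):
    """Return a hint for an error response, or None for a normal response."""
    data = json.loads(body)
    status = data.get("status")
    if status is None:
        # No status object: the response carries ordinary search results.
        return None
    code = status.get("value")
    fallback = "GeoNames error: " + str(status.get("message", "unknown"))
    return HINTS.get(code, fallback)
```

A hint of the first kind points back to the Free Web Services link on the login screen; the limit messages simply mean waiting before searching again.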

Preferences

Preferences icon
Place Cleanup Gramplet - Preferences

The Gramplet Preferences are accessed via the Preferences button, located in the far upper right of the main view of the Gramplet, next to the Title field.

The preferences dialog has the following settings:

  • Keep Web Links checkbox. The GeoNames database contains various URL links to other web data, associated with each place. For example, many places have a Wikipedia link. If you check this setting, the Gramplet will save these web links with the place data. They will appear in the 'Internet' tab of the Place Edit dialog.
  • Add citation and source to Place checkbox. If you check this setting, a Citation will be added to each place, citing the 'GeoNames' Source and Repository, with the Citation having a 'Volume/Page:' value with the GeoNames ID number. There will be a Citation for each Place, but only a single Source and Repository.
  • GeoNames User ID field. This contains the necessary GeoNames user ID. You can sign up for the ID at the following link: http://www.geonames.org/login.
  • Alternative Names Languages to keep field. This contains a space-separated list of ISO 639-1 two-letter language codes: en de fr it es, et cetera. As you work with the Gramplet, you will see a list of alternate names available for most places, with the language code for each name. If you want the names in a given language automatically checked off for inclusion in your Gramps place, add its code to this field.
  • Currently Enclosed Places. When you use the Ok button after selecting a GeoNames place, the Gramplet may have to deal with a place enclosure. If you are modifying a current place, and your place is already 'Enclosed by' another place, the Gramplet has two choices on how to handle changes to the enclosure.
    • Keep current Enclosure radio button selector. If you select this setting, and your place is already 'Enclosed by' another place, then no changes to the 'Enclosed by' place value will be made, even if your current enclosing place differs from that provided by the GeoNames data.
      If there is no 'Enclosed by' place value, and the GeoNames data has an enclosing place, then one will be created, and selected for approval.
    • Replace with GeoNames Enclosure radio button selector. If you select this setting, and your place is already 'Enclosed by' another place, then the current enclosure is replaced by the enclosing place from the GeoNames data. If it did not exist before this, it will then be created, and selected for approval.
      While this ensures that the enclosure matches GeoNames, it is possible that the previous enclosing place will no longer be used, and will remain in your database.
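
The two enclosure settings above amount to a small decision rule. The following is a minimal sketch of that rule as described on this page, not the Gramplet's actual code. `choose_enclosure` and its arguments are hypothetical names, and the case where 'Replace' is selected but GeoNames offers no enclosing place is an assumption, since this page does not describe it.

```python
def choose_enclosure(current, from_geonames, keep_current=True):
    """Pick the 'Enclosed by' value for a place.

    current       -- existing enclosing place name, or None if there is none
    from_geonames -- enclosing place name offered by GeoNames, or None
    keep_current  -- True for 'Keep current Enclosure',
                     False for 'Replace with GeoNames Enclosure'
    """
    if current is not None and keep_current:
        # 'Keep': an existing enclosure is never touched.
        return current
    if from_geonames is None:
        # Assumption: with nothing offered by GeoNames, leave things as they are.
        return current
    # Either there was no enclosure yet, or 'Replace' is selected.
    return from_geonames
```

For example, `choose_enclosure("Lamar County", "Lamar", keep_current=False)` yields the GeoNames value, which is the situation in which the old enclosing place can be left behind unused.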

Main view

Use extra caution when changing the Title field

The place data found and pushed to update the Place view will depend on the contents of this field. If you change it in inappropriate ways, you could end up with mismatched place data.
For example, if the originally active record in the Place view is "Berlín, Usulután", and you manually type in a Place title of "Berlin, Germany" to search, when you find and accept the Berlin place, your originally selected Place view record for the municipality in El Salvador will be changed to Europe!

Place Cleanup - finding a particular place title

The Main view has several buttons, a 'Title' field, a small 'xx Matches' status, and a 'Choices' or match results list.

  • Title field. This field shows the Place that is currently under investigation. It provides the search terms that are used for finding a place. It changes automatically to the place title when the Gramps Place view has an active selection. The field can also be modified by the user: just set the cursor in it and edit as normal. This is generally only needed when trying to find a place whose title has unexpected abbreviations, a historical hierarchy, or other issues that make finding a match in the GeoNames database difficult. (As an example, your Gramps place hierarchy has "west berliner, west germany". There are no matches and you wonder if GeoNames only has post-unification names. Then you notice that there is a typographical error with a trailing 'er'. So you change it to successfully search for "west berlin, west germany" instead of that elusive donut.)
  • Matches status. This small status block is located on the left side of the Gramplet, just below the 'Title' label. It indicates the number of matching places found when searching the local Places list or the GeoNames database. The status also indicates where the matches were found, either 'Local', or 'GeoNames'.
  • Search result places list. This area lists the found matches. The user should select the correct match for the place under investigation, then press the Select button, double-click the row, or press 'Enter' or 'Space' to move to the result/edit screen.
There are two columns in this list: the names and the 'Type' columns. The names are comma-separated places in the hierarchy, as found in the local database or in GeoNames. The 'Type' column shows the Gramps Type when showing places from the local database, and shows GeoNames types when showing results from the GeoNames search.
Avoiding rework

If you search GeoNames for a place that is already in your database, the search results may return some entries annotated with a red strikethrough. This indicates that those places are already enclosed by the current place and cannot be selected.

The Type column is provided for informational use only, to help the user determine a bit more about the found place. The GeoNames types are not always obvious; the type starts with one of 'A', 'P', or 'S' for 'Administrative place', 'Populated place', or 'Spot' respectively. You can restrict the returned search matches to desired categories with the matching checkboxes (see below). The remaining part of the type field for GeoNames can be decoded from http://www.geonames.org/export/codes.html, if desired.
  • Populated checkbox. If you check this setting, 'Populated Places' (typically cities, towns, villages etc.) will be returned for a GeoNames search. This setting does not affect a local database search. This is initially enabled, as it allows the search to return what most people expect for a city type search.
  • Admin checkbox. If you check this setting, 'Administrative Divisions' (countries, states, regions etc.) will be returned for a GeoNames search. This setting does not affect a local database search. This is initially enabled, as it allows the search to return what most people expect for a county, state, country type search.
  • Spot checkbox. If you check this setting, 'Spots' (typically farms, buildings, graveyards, churches etc.) will be returned for a GeoNames search. This setting does not affect a local database search. This is initially disabled, as most place searches don't go to this level of detail.
  • Help button. This brings up this web page.
  • Find button. This performs a search for the placename indicated in the Title field. The search is performed in two stages:
    • Initially, the local Gramps place list is examined for a match. The local match is only searched among 'Complete' places, those with Latitude/Longitude coordinates and Place Type data filled in. Only the initial segment of the place is used for searching. For example, if you search for 'Paris, FR', and your local Gramps Place Tree data has both 'Paris, Lamar, Texas, United States' and 'Paris, Paris, Île-de-France, France', both will be shown. Alternative names in the local Gramps places are also searched.
    • The second stage is the GeoNames search. It is performed automatically if the first stage doesn't find any matches. If the first stage did find one or more matches, but none of them were correct, the user can press the Find button again to do the GeoNames search. Note that the Web Service of the GeoNames search can return only ten results at a time. If the place you want is not in the first subset of results, press the Find button again to see the next ten.
Find shops locally before resorting to the net

The 'Find' button changes to 'Find GeoNames' after a local database search where something was found. This is a reminder that you can search that online database if the local search did not return what you wanted.

  • Select button. After the user selects one of the matches in the 'Choices' list, this causes the result/edit screen to be shown.
  • Edit button. If no suitable places are found, the user can use this to advance directly to the result/edit screen. This can occur if the place you are searching for is too detailed, for example 'My Estate, Paris, TX' or '123 Mains St., Paris, TX'. Since such detailed information is not going to be found in the GeoNames search, you have to edit it manually.
  • Next Place button. This causes the Gramplet to advance to the next incomplete Place in the Gramps Places list. Incomplete places are those missing any of the Latitude/Longitude or Place Type data. If a place is found that is not used by any Event and does not enclose another place, you will get a popup dialog suggesting that the place be deleted.
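
The two-stage search behind the Find button can be sketched roughly as follows. This is an illustration under stated assumptions, not the Gramplet's code. The place dictionaries, the exact case-insensitive comparison, and the function names are all hypothetical. The second stage assumes the GeoNames `searchJSON` endpoint with its `q`, `maxRows`, `startRow`, `featureClass` and `username` parameters; whether the Gramplet uses the JSON or the XML form of the service is not stated on this page.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://api.geonames.org/searchJSON"  # assumed endpoint
PAGE_SIZE = 10  # this page says results arrive ten at a time


def initial_segment(title):
    """'Paris, FR' -> 'paris': only the first segment is used locally."""
    return title.split(",")[0].strip().casefold()


def is_complete(place):
    # Coordinates are kept as strings here, so an empty string means "missing".
    return bool(place["lat"] and place["lon"] and place["type"])


def local_matches(title, places):
    """Stage 1: complete local places whose name or alternate name matches."""
    wanted = initial_segment(title)
    return [
        place for place in places
        if is_complete(place)
        and any(name.casefold() == wanted for name in place["names"])
    ]


def geonames_search_url(title, username, populated=True, admin=True,
                        spot=False, page=0):
    """Stage 2: build the request; page 1 asks for the next ten results."""
    params = [
        ("q", title),
        ("maxRows", PAGE_SIZE),
        ("startRow", page * PAGE_SIZE),
        ("username", username),
    ]
    # The Populated / Admin / Spot checkboxes map to feature classes P, A, S.
    for enabled, feature_class in ((populated, "P"), (admin, "A"), (spot, "S")):
        if enabled:
            params.append(("featureClass", feature_class))
    return SEARCH_URL + "?" + urlencode(params)
```

With this layout, pressing Find a second time corresponds to calling `geonames_search_url` with `page` increased by one.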

Result/Edit view

Place Cleanup Result View

This view presents the found place data for review and potential editing.

At the top of this view is the full title of the Current place, in bold text.

  • Primary Name field. This field is displayed based on the name chosen as primary from the list of available names. It is not editable.
  • Postal Code field. This field displays the postal code(s). It can be edited. There is a small Orig checkbox just above this field; if checked, the original postal code from the place is used instead of the found postal code. Note that this checkbox is disabled if there is no original postal code.
  • Type field. This field displays the place type. It can be edited. There is a small Orig checkbox just above this field; if checked, the original place type from the place is used instead of the found place type. Note that this checkbox is disabled if the original place type is 'Unknown'.
NOTE

The GeoNames database does NOT provide Gramps Place Type data. This Gramplet uses the available data to make a 'best guess' at the correct Place Type. You should always check and make sure that the value is correct, and fix it if necessary.

  • Latitude and Longitude fields. These fields display the Latitude and Longitude. They can be edited. There is a small Orig checkbox just above these fields; if checked, the original Latitude and Longitude from the place are used instead of the found Latitude and Longitude. Note that this checkbox is disabled if there is no original Latitude and Longitude.
  • ID field. This field displays the Gramps ID. It cannot be edited. There is a small Orig checkbox just above this field; if checked, the original ID from the place is used instead of the found ID. Note that this checkbox is disabled if there is no original ID.
  • The Names list. This list contains the names associated with the found place. Each name has an associated Language and Date. There is also an 'Inc' column which displays the include status of the name in that row. If there is a 'P' in the 'Inc' column, that name is used as the Primary name. If the column has a checkmark '✔' the name will be included in the final Place alternative names list. If you double-click, press 'Enter' or 'Space' on a selected row, that will toggle the include status.
  • Keep button. If one or more of the names in the list are selected, pressing this button will cause the selected names to be marked for inclusion in the final Place alternative names list.
  • Primary button. If one or more of the names in the list are selected, pressing this button will cause the topmost selected name to be marked as the Primary name.
  • Discard button. If one or more of the names in the list are selected, pressing this button will cause the selected names to be unmarked so they will not be included in the final Place alternative names list.
  • Cancel button. Pressing this button returns the view to the main search view. No changes are saved.
  • Ok button. Pressing this modifies the selected place with the new values. In addition, the next level of the place hierarchy will be set into the place. If that next level is already present in the local Gramps place list, and has a matching GeoNames ID, the modifications are completed. If not, the next level in the place hierarchy is shown in the results view.
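
The 'Inc' column of the Names list, together with the 'Alternative Names Languages to keep' preference, can be pictured with a small sketch. It is illustrative only: the `(name, language)` tuples and both function names are hypothetical, and leaving the Primary row untouched when toggling is an assumption not stated on this page.

```python
def initial_inclusion(names, keep_languages, primary_index=0):
    """Return the starting 'Inc' mark for each (name, language) row.

    keep_languages is the space-separated preference string, e.g. "en de fr".
    """
    keep = set(keep_languages.split())
    marks = []
    for index, (_name, language) in enumerate(names):
        if index == primary_index:
            marks.append("P")   # used as the Primary name
        elif language in keep:
            marks.append("✔")   # pre-checked as an alternative name
        else:
            marks.append("")    # not included unless the user keeps it
    return marks


def toggle(mark):
    """Double-click, 'Enter' or 'Space' on a row flips its include status."""
    if mark == "P":
        return mark  # assumption: the Primary row is not toggled this way
    return "" if mark == "✔" else "✔"
```

Given the rows Munich (en), München (de), Monaco di Baviera (it) and a preference of "en de fr", only the German name starts out checked next to the Primary name.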

FAQ

  • GeoNames data is primarily intended for present day usage. It contains little historical information. I have heard that it is pretty complete for the United States and Canada, and less so for Europe, although it seems to have all the European cities I have tested so far. If you find an error or omission in the GeoNames data, you may want to tell GeoNames about it; see http://www.geonames.org/manual.html.
  • GeoNames finds multiple places: Example, 'Santa Rosa, CA'. GeoNames finds both 'Santa Rosa, Sonoma, California, United States' and 'City of Santa Rosa, Sonoma, California, United States'. The way the GeoNames database is structured, they have a 'Populated Place', which generally has lots of Alternate names and postal codes, and an 'Administrative subdivision' for the same place. The 'City of' version is the Administrative subdivision, which generally doesn't have postal codes and alternate names. I personally prefer the first version, which should appear first in the 'Choices' list.
  • Timeouts: When using the GeoNames web database with a free account, you will sometimes get a timeout. The timeout is set to 20 seconds for each request. Unfortunately, some actions take several requests, so the user may see an unresponsive Gramps for up to 40 seconds. You can always try the request again by pressing the last used button. According to the GeoNames web site, if you pay for an account, you may get better performance, as the paid accounts use more lightly loaded servers.
  • Not Found: While the GeoNames database contains some historical names, it mostly tries to be up to date with the current times. Many places in a typical family tree are described as of the time the event occurred. So it is entirely possible that the described place no longer exists. It may now belong to a different country, or other administrative subdivision. Or it may have been subsumed within another larger place. Initially you should try removing the intermediate place data, leaving only the initial segment and the country. You may get a lot of matches, but that may give you a better clue as to the current situation. Or you may have to do some research outside the Gramplet to figure out what it might be called now.
  • Place Types: Since GeoNames does not have place 'type' data in the same way that Gramps does, this addon uses an algorithm to attempt to pick an appropriate place type. If a country is known to have a rigid administrative place type structure, such as the United States with its City/County/State/Country, then the various parts of the place hierarchy are typed accordingly.
If the structure is not always known or used, then the algorithm attempts to assign place types by looking at the place name. For example if the place name includes a word that matches a place type, such as "Harris County" or "Smith Township", then the place type is set accordingly. To do this, all the various place alternate names are scanned, in case the primary name does not include a suitable word.
If you know that a particular country has a rigid (or even mostly rigid) hierarchical structure, and this addon seems to be getting it wrong a lot, please let the author know at paulr2787 at gmail.com. I can modify the code to make it work better if I understand the correct country structure.
  • IDs: When GeoNames is the source of place data, the normal place ID ('P0001') is replaced by a GeoNames specific ID ('GEO12345'). The number portion of this ID is the GeoNames identifier for the place. By using this type of ID, the addon can easily identify when enclosing places are already present in your database, which means that the user doesn't have to do a local search for and selection of each level of enclosing place.
  • GetGov: How does this interoperate with the data from GetGov? At this time it does not integrate with that data. If you already have places loaded from the GetGov addon, then the ID field will have a Gov ID. If you want to do a lookup with GeoNames for that place, you should probably set the Place Cleanup preferences for the enclosure to keep the current values. In addition, you should probably use the 'Orig' checkbox for the ID to keep the original Gov ID.
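
The name-based fallback described under Place Types above can be pictured as a simple word scan. This is a sketch under stated assumptions rather than the addon's actual algorithm: the word list is purely illustrative, `guess_place_type` is a hypothetical name, and returning the matched word as the type (or 'Unknown' when nothing matches) is a simplification.

```python
# Illustrative word list only; the addon's real list is not given on this page.
TYPE_WORDS = ("County", "Township", "Parish", "Province", "District")


def guess_place_type(primary_name, alternate_names=()):
    """Scan the primary name, then the alternate names, for a type word."""
    for name in (primary_name, *alternate_names):
        words = {word.casefold() for word in name.replace(",", " ").split()}
        for type_word in TYPE_WORDS:
            if type_word.casefold() in words:
                return type_word
    return "Unknown"
```

Scanning the alternate names matters for cases where the primary name is a bare "Smith" but one of the alternates reads "Smith Township". Whatever the guess, the Type field on the result screen should still be checked by hand.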

Issues

This tool is fairly complex. Please report any issues or desired enhancements to paulr2787 at gmail.com.

I have included a French translation of the tool, mostly as a test of the internationalization code. Since I don't speak French (USA Texas version of English only), please excuse my French.