GEPS 024: Natural transcription of Records

Latest revision as of 00:25, 5 January 2022

This was formerly GEPS 024: Certificates

Natural transcription of Records is a method for creating and storing genealogical information from a document. For example, one might create a Census Record that would allow users to enter data straight from a census sheet. The current Census Gramplet already does this. In this GEP, the associated data would be stored in a well-defined location in the database, and in exported file formats, so that the record not only leads to an update of the data in the family tree, but also allows the flow from a source to the data in the family tree to be retraced.

Needs

  1. A way to create, and edit over time, a record definition
    1. See Gramps Census XML format
  2. A way to view the records
  3. A way to map record fields onto database items, where user intervention allows coupling to existing objects, or choosing not to add data.
  4. A way to recreate the document from the database (see Gramps Census Report)

Records themselves would not be part of GEDCOM export. The data learnt from the record would be present in the family tree and would appear in GEDCOM like that.

Example Workflows

Census

  1. Find a census sheet that you would like to enter data from
  2. In the Records transcription view, select Record Type 'Census' and a Layout, e.g. 'UK1871'. Record Types and Layouts are predefined in an XML file.
  3. Set the header data of the census
  4. Add the rows in column manner as present in the census sheet. This is literal transcription.
  5. An import function could be written for downloadable content that enters this automatically from the downloaded data
  6. Click the "Transcribe to Family Tree" button. This saves the Record to the database. Gramps calculates what would be added to an empty family tree from this data, the "Proposed Transcription". On the left an empty 'Before' is shown, on the right the new data (Person Objects, Census Events, Source Objects, Citation Objects, ...).
  7. The empty left part shows drop-down boxes to select existing Persons that could be the people in the census, based on the given name. The user can explicitly select an existing person, which updates the right part of the window. Check boxes in the right part allow choices, e.g. 'Set name as alternate name', ....
  8. For every setting in the left part of this "Proposed Transcription", the user can give a "Reasoning", which is free text, a "Conclusion", which is also free text, and a Confidence Level.
  9. The user needs to approve the changes
  10. On approval, the data is stored in the family tree. The Confidence Level goes to the citations. The Reasoning and Conclusion are stored in a Note of type "Analysis Document" in the citation. The Citation holds a link to the Record ID.
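
The approval step can be sketched in Python as follows. This is only an illustration of what step 10 stores: the class names, the `approve` function, the ID "R0001" and the sample texts are all hypothetical, and none of this is the Gramps API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    note_type: str
    text: str


@dataclass
class Citation:
    record_id: str   # link back to the transcribed Record
    confidence: int  # Confidence Level chosen by the user
    notes: List[Note] = field(default_factory=list)


@dataclass
class Decision:
    """One user choice made in the "Proposed Transcription" window."""
    reasoning: str
    conclusion: str
    confidence: int


def approve(record_id: str, decision: Decision) -> Citation:
    """Build the citation that step 10 describes for one approved decision."""
    text = f"Reasoning: {decision.reasoning}\nConclusion: {decision.conclusion}"
    note = Note(note_type="Analysis Document", text=text)
    return Citation(record_id=record_id, confidence=decision.confidence, notes=[note])


# Made-up sample values, for illustration only.
decision = Decision(
    reasoning="Same name, age and parish as the existing person.",
    conclusion="The census row refers to the existing person.",
    confidence=3,
)
citation = approve("R0001", decision)
```

Keeping the Record ID on the citation is what later allows the original record to be shown again for any data entered this way.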

Baptize Record

A baptize record typically lists the following information:

  1. The gender and name of the child(ren)
  2. The date of baptism, sometimes the date of birth as well
  3. The names of the parents of the child. Some lazy record keepers only list the father.
  4. Usually the names of the witnesses. The baptism had to be witnessed by at least two people; in Catholic families, these were the godparents.
  5. Often the name of the priest

So, the workflow here would be:

  1. Find the baptize record you would like to enter data from
  2. Select in the Records transcription view Record Type 'Baptize Church Record', and Layout 'Generic'. Record Types and Layouts are predefined in an xml.
  3. Set the header data of the record: the page number or index number, the name of the church, and the source title. Publication info could be imported from e.g. BibTeX.
  4. As in Census, every 'entry' could be added in a column manner. As normally only one row is needed, it would be logical to default to an editor style of data entry. For a baptize record, natural language input may be possible, like a form that must be filled in (think: On ______ the <boy/girl> was baptized, and given the name _______ ....)

This step is literal transcription. Note that twins need a double entry, so 'multiple rows' as in the census.

  5. An import function could be written for downloadable content that enters this automatically from the downloaded data
  6. Click the "Transcribe to Family Tree" button. This saves the Record to the database. Gramps calculates what would be added to an empty family tree from this data, the "Proposed Transcription". On the left an empty 'Before' is shown, on the right the new data (Person Objects, Baptize Events, Associations to witnesses, Source Objects, Citation Objects, ...).
  7. The empty left part shows drop-down boxes to select existing Persons that could be the people in the record, based on the given name. The user can explicitly select an existing person, which updates the right part of the window. Check boxes in the right part allow choices, e.g. 'Set name as alternate name', ....
  8. For every setting in the left part of this "Proposed Transcription", the user can give a "Reasoning", which is free text, a "Conclusion", which is also free text, and a Confidence Level.
  9. The user needs to approve the changes
  10. On approval, the data is stored in the family tree. The Confidence Level goes to the citations. The Reasoning and Conclusion are stored in a Note of type "Analysis Document" in the citation. The Citation holds a link to the Record ID.
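
The fill-in-the-blank style of entry mentioned in this workflow can be sketched as a template whose placeholders are field names. The Python below is a minimal sketch under assumptions: the `<name>` placeholder syntax follows the example sentence, while `placeholders`, `fill` and the sample values are hypothetical and not part of Gramps.

```python
import re

# Template text with one placeholder per field; shortened for the example.
TEMPLATE = (
    "On <datebapt>, a <Gender> was baptized, and given the name <Name>. "
    "Witnesses are <First Witness> and <Second Witness>."
)

# A placeholder is anything between angle brackets, spaces allowed.
PLACEHOLDER = re.compile(r"<([^<>]+)>")


def placeholders(template):
    """List the field names the form asks for, in reading order."""
    return PLACEHOLDER.findall(template)


def fill(template, values):
    """Replace each placeholder by its value, or by blanks if not yet entered."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), "______"), template)


# Made-up sample values, for illustration only.
values = {"datebapt": "1 May 1850", "Gender": "girl", "Name": "Maria"}
print(placeholders(TEMPLATE))
print(fill(TEMPLATE, values))
```

Fields the user has not filled in yet stay visible as blanks, which mirrors a paper form.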

Retracing Steps

The normal data of Gramps now holds the same information it would hold without natural transcription. However, in the normal data it is hard to see quickly which data comes from which source: notes and citation objects must be checked. In this proposal the Citation holds a link to the Record ID, so for all data entered in this way, the record the user entered it from can be shown again. Note that this information will also be present in a collection of sources, citations and objects in the normal family tree. Seeing this information in its original form is useful. (An alternative is storing this duplicate in all citation objects, or in the main source object in the form of a Note.)

As the Record is still present, a user could in theory ask to see what e.g. a person would look like if a certain record were not taken into account. This opens many possibilities.
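
Both uses of the Record link can be sketched as simple filters. The Python below assumes a hypothetical flattened view in which every fact carries the ID of the Record its citation points to; the `Fact` tuple, the IDs and the sample facts are made up for illustration and are not Gramps data structures.

```python
from collections import namedtuple

# Hypothetical flattened view: record_id is None for data entered by other means.
Fact = namedtuple("Fact", ["person", "description", "record_id"])

facts = [
    Fact("I0001", "Census 1841, occupation: farmer", "R0001"),
    Fact("I0001", "Baptism, 1 May 1820", "R0002"),
    Fact("I0001", "Nickname entered by hand", None),
]


def facts_from_record(facts, record_id):
    """Retrace: which data in the tree came from this record?"""
    return [f for f in facts if f.record_id == record_id]


def without_record(facts, record_id):
    """What-if view: the data as it would look without this record."""
    return [f for f in facts if f.record_id != record_id]


print(facts_from_record(facts, "R0001"))
print(without_record(facts, "R0001"))
```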

Record definitions via XML

Different records can be stored in different XML files. First the user-downloaded versions in ~/.gramps/grampsxx/plugins/records are scanned. Next the installed definitions are scanned, skipping duplicates that are also present in the plugins directory.
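
A minimal Python sketch of this scan order follows. It assumes that two definitions count as duplicates when they share a file name, which the proposal does not specify; `collect_definitions` and the directory names in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path


def collect_definitions(user_dir, installed_dir):
    """Return {file name: path}, letting user-downloaded files win."""
    found = {}
    for directory in (Path(user_dir), Path(installed_dir)):
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.xml")):
            # setdefault keeps the first hit, so installed duplicates are skipped.
            found.setdefault(path.name, path)
    return found


# Demonstration with throw-away directories standing in for the two locations.
with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp, "user")
    installed_dir = Path(tmp, "installed")
    user_dir.mkdir()
    installed_dir.mkdir()
    (user_dir / "census.xml").write_text("<records/>")
    (installed_dir / "census.xml").write_text("<records/>")
    (installed_dir / "baptize.xml").write_text("<records/>")
    found = collect_definitions(user_dir, installed_dir)
    names = sorted(found)
    census_is_user_copy = found["census.xml"].parent == user_dir
```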

Current Census XML

Currently (version 4.0), the census plugin uses the following XML.

<?xml version="1.0" encoding="UTF-8"?>
  <censuses>
   <census id='UK1841' title='1841 England and Wales Census' date='6 Jun 1841'>
       <heading>
           <_attribute>City or Borough</_attribute>
       </heading>
       <heading>
           <_attribute>Parish or Township</_attribute>
       </heading>
       <column>
           <_attribute>Name</_attribute>
           <size>25</size>
       </column>
       <column>
           <_attribute>Age</_attribute>
           <size>5</size>
       </column>
       <column>
           <_attribute>Occupation</_attribute>
           <size>25</size>
       </column>
       <column>
           <_attribute>Where Born</_attribute>
           <size>5</size>
       </column>
   </census>
 </censuses>


When records are general, the census is just one of the record types. Some other problems with this design are:

  1. It is very census-oriented, with 'column' used as an element name.
  2. Why is date part of the census line? It is unclear what that date actually is.
  3. English content like 'Where Born' is also the attribute key. As a consequence, non-English users who make a record definition will try to use English, which may be wrong, or worse, they will use a translated string so that it shows up in Gramps. If the user definition is accepted into Gramps, we will need to change it to proper English, and the records already present for the creator of the definition are no longer valid.
  4. Only text input is possible for an attribute. We can allow more versatility with a range, a bool, and a list of values.
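
The third problem comes from using the display text as the key. The Python sketch below separates the two for a shortened version of the census XML. The slug rule in `make_nameid` is an assumption chosen for illustration, and both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the census definition, for the example only.
LEGACY = """<censuses>
  <census id='UK1841' title='1841 England and Wales Census' date='6 Jun 1841'>
    <heading><_attribute>City or Borough</_attribute></heading>
    <column><_attribute>Where Born</_attribute><size>5</size></column>
  </census>
</censuses>"""


def make_nameid(label):
    """Derive a fixed, language-neutral key from the English label."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def split_keys(xml_text):
    """Return {key: English label} for every heading and column."""
    root = ET.fromstring(xml_text)
    keys = {}
    for elem in root.iter():
        if elem.tag in ("heading", "column"):
            label = elem.findtext("_attribute")
            keys[make_nameid(label)] = label
    return keys


print(split_keys(LEGACY))
```

Once the key is fixed, the English label can be corrected or translated without invalidating records that were stored under that key.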

Proposed Record XML

<?xml version="1.0" encoding="UTF-8"?>
  <records>
   <record type='Census' layout='UK1841' title='1841 England and Wales Census' datefixed='1' date='6 Jun 1841' fieldorder='columns'>
       <heading nameid='city_borough' _description='City or borough where this Census Page was taken' object='Place'/>
       <heading nameid='parish_township' _description='Parish or Township where this Census Page was taken' object='Place' />
       <field nameid='Name' size='25' object='Person' relation='center' _description='Name of a person on Census row'/>
       <field nameid='Age' size='5'/>
       <field nameid='Occupation' size='25'/>
       <field nameid='Where Born' size='5' />
   </record>
   <record type='Baptize Church' layout='Generic Catholic' title='Baptize Record Catholic Church' datefixed='0' fieldorder='editor'>
       <heading nameid='parish' _description='Parish where this Baptize was done' object='Place'/>
       <heading nameid='churchbooktitle' _description='Title of Church Book' object='Source' />
       <heading nameid='page_index' _description='Page number or index number of the record' />
       <field nameid='datebapt' type='datefield'/>
       <field nameid='Name' size='25' object='Person' _description='Name of person born' relation='center'/>
       <field nameid='Gender' type='genderfield' object='Person' _description='Gender of person born' relation='center'/>
       <field nameid='datebirth' type='datefield' _description='Some Baptize records register the birth date' optional='1'/>
       <field nameid='Father' size='25' object='Person' relation='father'/>
       <field nameid='Mother' size='25' object='Person' relation='mother'/>
       <field nameid='First Witness' object='Person' optional='1' size='25' relation='witness'/>
       <field nameid='Second Witness' object='Person' optional='1' size='25' relation='witness'/>
       <field nameid='Celebrant' object='Person' optional='1' size='25' relation='celebrant'/>
       <field nameid='addressparents' size='35' object='Place' optional='1' _description='Some records contain part of the address of the parents'/>
       <freeflowinputs>
       <freeflowinput style='condensed' lang='en'>
       On <datebapt>, a <Gender> was baptized, and given the name <Name>. His father is <Father>, his mother <Mother>, 
       who gave birth on <datebirth>. Witnesses are <First Witness> and <Second Witness> and celebrant <Celebrant>.
       </freeflowinput>
       </freeflowinputs>
   </record>

 </records>
 <translation>
 <key id="city_borough" _en="City or Borough"/>
 <key id="parish_township" _en="Parish or Township"/>
 <key id="name" _en="Name"/>
 ....
 </translation>

Some features:

  1. nameid is a fixed key, which obtains its true English value via the translation section. _en can be translated via po files.
  2. The relation attribute could be dropped, as nameid could be mapped to a relation in code. Keep?
  3. freeflowinput should not be translated. Translators should update this XML and add their language; in the GUI a drop-down box with all languages can be offered. It is not good practice to translate such long texts via po. Using po puts pressure on the English text not to change too much. Working like this makes it possible to offer a Latin version and have people inspect the typical form in Latin. The style attribute allows different freeflow texts for the same language.
  4. Fields are textfield by default, which needs a size attribute.
  5. If something other than a textfield is needed, the field is given a type, e.g. genderfield, datefield, ...
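
The last two features can be sketched as a small reader for field elements. This Python is a sketch only: it uses a shortened record and assumes that a missing size on a textfield should be rejected. The `Field` class and `parse_field` are hypothetical names, not existing Gramps code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Shortened record definition, for the example only.
SAMPLE = """<record type='Baptize Church' layout='Generic Catholic'>
  <field nameid='datebapt' type='datefield'/>
  <field nameid='Name' size='25' object='Person' relation='center'/>
  <field nameid='Gender' type='genderfield' object='Person' relation='center'/>
</record>"""


@dataclass
class Field:
    nameid: str
    type: str
    size: Optional[int]


def parse_field(elem):
    """Read one <field>, applying the textfield default and its size rule."""
    field_type = elem.get("type", "textfield")  # textfield when no type is given
    size = elem.get("size")
    if field_type == "textfield" and size is None:
        raise ValueError(f"textfield {elem.get('nameid')!r} needs a size attribute")
    return Field(elem.get("nameid"), field_type, int(size) if size is not None else None)


fields = [parse_field(e) for e in ET.fromstring(SAMPLE).iter("field")]
for f in fields:
    print(f)
```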

Record input GUI

Record to Family Tree generation

Record Manipulations

Other

Current Census Limitations

Things needed to bring the Census work up to this level of integration:

  1. If you change the way a certificate is defined, we need a way to change the data. For example, changing a column name disassociates all of the information in the database.
  2. How to handle translations?
  3. Code should be able to add items such as sources to the database
  4. Items lose their ordering

Other Connections

  • Gramps' FamilySearch API will perhaps have the ability to connect and download an entire census sheet. Consider creating the Certificate from the downloaded definition.

See also