GEPS 018: Evidence style sources

From Gramps
Revision as of 21:43, 9 August 2011 by Jralls (Talk | contribs) (Background: More points on more fields)

Background

Many users, particularly if they aren't experienced researchers, may have difficulty abstracting the details from the wide variety of source types they encounter into the 4 fields that Gramps provides. Worse, the 4 fields aren't really adequate to capture all of the possible source information and redisplay it in well-formatted footnotes or endnotes in a report or reference links in a web page.

Elizabeth Shown Mills is an eminent American genealogist who has written extensively about collecting, analyzing, and citing evidence in genealogical research and publications, including the books Evidence! Citation and Analysis for the Family Historian and an expanded version, Evidence Explained: Citing Family History Sources from Artifacts to Cyberspace. While most readers focus on the formats of the citations provided in the books, in reality every publisher has a style guide and Evidence Explained isn't used by any of them. The real value in these books is Mills's explanation of how to effectively analyze the evidence and how to integrate the many pieces of evidence (and Mills is well known for taking the "reasonably exhaustive search" requirement of the BCG's Genealogical Proof Standard to the absolute limit) into a well supported conclusion.

Citation styles are the concern of published material, and will differ both for the medium and for the publisher. So long as the necessary information of creator, title, enclosing work (e.g. for magazine or journal articles), publisher (if published) or repository (if not), date, and details (like page number) is available in the citation, the style isn't very important to the reader. Publishers want all of their publications to have a consistent style and issue style manuals to help authors prepare their work.

For a computer program like GRAMPS, the goals should be to collect all of the necessary information noted above in a way that is easy for users to enter, to support evidence analysis and comparison to create "proof arguments", and to link those proof arguments to the genealogical conclusions in the database.
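As a rough illustration of collecting the elements listed above in separate fields, a record might look something like the following. This is only a sketch: the `SourceRecord` class and its field names are hypothetical and are not part of the current Gramps data model, and the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceRecord:
    """Hypothetical container for the citation elements listed above."""
    creator: str
    title: str
    date: str
    enclosing_work: Optional[str] = None  # e.g. the journal holding an article
    publisher: Optional[str] = None       # for published material
    repository: Optional[str] = None      # for unpublished material
    details: Optional[str] = None         # e.g. a page number

    def missing_elements(self) -> List[str]:
        """Return the names of necessary elements that are still empty."""
        missing = [name for name in ("creator", "title", "date")
                   if not getattr(self, name)]
        # A source needs a publisher (if published) or a repository (if not).
        if not (self.publisher or self.repository):
            missing.append("publisher or repository")
        return missing


# Placeholder values, for illustration only.
record = SourceRecord(creator="Example, Anne",
                      title="An Example Family History",
                      date="1900")
print(record.missing_elements())
```

Keeping each element in its own field means a report never has to parse it back out of free text, and the entry form can tell the user which elements are still missing.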

GRAMPS's present data structure maps directly to the SOUR and SOUR_CITATION structures in the GEDCOM 5.5 standard, and the source entry form maps directly to the data structure. While it's possible to cram everything needed for a good citation into those four fields, parsing the information back out to actually create a citation is unnecessarily challenging.

Bibliography Data Formats

  • BibTeX (www.bibtex.org) has emerged as a common format (for interchange at least) among bibliography and reference management tools and offers a much richer set of available fields.
  • The U.S. Library of Congress has published the Metadata Object Description Schema, an XML schema for encoding library catalog data. That wouldn't be very interesting except that BibUtils uses it as an intermediate format for converting between a variety of bibliography file standards.
  • Zotero, Mendeley, and Papers use Citation Style Language (CSL), an XML schema, at least as an import/export medium. (Zotero uses a relational database for its actual storage.)
  • Thomson Reuters EndNote is easily the most popular commercial reference management program. It uses a proprietary file format which has nevertheless been reverse-engineered many times so that bibliographies can be easily exchanged between EndNote and other reference managers.
  • Most of the major commercial genealogy programs use a proprietary relational schema for storage of citation data. These fall into three broad categories: binary (similar to GRAMPS's key/value schema, where a citation is composed of several records each having a key/value pair and the program's logic parses the keys to display the citation in the desired format); single-table (where a database tuple is defined which contains the maximum needed fields, each of which is assigned a value according to a parsing scheme in the program's logic); and multiple-table, where different citation types are stored in tables with tuple schemas which reflect the requirements of each. As so often in programming, each has costs and benefits with respect to

Further Reading

John Yates has, with Mills's permission, encoded the elements of the specific examples in Evidence Explained: Two Computer Ready Parametrizations of "Evidence Style" Historical Sources.

See also:

Workflow

Entering source information

There are three broad alternatives for entering source data into a program, and GRAMPS should support all three:

  • Form based: The traditional keyboard data entry method. The fields can be fixed or flexible: The former is easier on the developers, the latter more helpful to users, especially if they are inexperienced.
  • Import: GRAMPS should be able to import (and export) source data from regular reference managers like Zotero, Pybliographer, and BibTeX. The code in BibUtils could be adapted to speed development.
  • Parsing: It's becoming more common for the large reference websites to provide a ready-made citation on the webpage along with the data or image being presented. (Google Books even offers to download it as a BibTeX file, but that's unfortunately not yet common.) It would be very helpful if the user could just paste this citation into a block and GRAMPS took care of parsing it into the appropriate database fields. Experienced users might find typing into the parsing text-entry to be a faster way of inputting source data than using form-based input.
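To give a feel for the import and parsing alternatives, here is a minimal sketch that splits one pasted BibTeX entry into named fields. It is an illustration under simplifying assumptions, not a full BibTeX parser: it handles a single entry, ignores nested braces, string macros, and concatenation, and the function name `parse_bibtex_entry` and the sample entry are made up for this example.

```python
import re

# One entry: @type{key, ...fields... }
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
# One field: name = {value} or name = "value" (no nested braces)
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def parse_bibtex_entry(text):
    """Split a single, simple BibTeX entry into type, key, and fields."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError("not a single BibTeX entry")
    entry_type, cite_key, body = match.groups()
    fields = {}
    for name, braced, quoted in FIELD_RE.findall(body):
        # Exactly one of the two value groups is non-empty for each field.
        fields[name.lower()] = braced or quoted
    return {"type": entry_type.lower(), "key": cite_key, "fields": fields}


# Made-up entry, for illustration only.
sample = """@book{example1900,
  author = {Example, Anne},
  title = {An Example Family History},
  year = "1900"
}"""

parsed = parse_bibtex_entry(sample)
print(parsed["type"], parsed["fields"])
```

A production importer would more likely build on an existing converter such as BibUtils; the point here is only that a pasted citation can be mapped onto separate database fields.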

Further Discussion of Form-based Input

When the end user cites a source for information, they would be prompted with a window where they would select a main type and drill down through subtypes, as in the first few columns of the table presentation I've given. Once it is selected, the user will be prompted for the required (and perhaps optional) fields specific for that type of source reference.

The user would select the type of the source, and fill in the fields, for L (biblio list), F (full citation), and S (short citation) at citation time. The templates I've provided would be in pop up menus for the user to select.
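The drill-down from main type to subtype, followed by a prompt for the fields of the chosen type, could be sketched as below. The type names and field lists are invented placeholders, not the actual Evidence Explained categories from the table, and a real tree would also attach the L, F, and S templates to each leaf.

```python
# Hypothetical type tree: main type -> subtype -> field lists.
SOURCE_TYPES = {
    "Book": {
        "Printed book": {"required": ["author", "title", "year"],
                         "optional": ["publisher", "page"]},
    },
    "Census": {
        "Population schedule": {"required": ["jurisdiction", "year", "page"],
                                "optional": ["household"]},
    },
}


def subtypes(main_type):
    """Return the subtypes the user can pick under a main type."""
    return sorted(SOURCE_TYPES[main_type])


def missing_required(main_type, subtype, values):
    """Return required fields of the chosen type that have no value yet."""
    spec = SOURCE_TYPES[main_type][subtype]
    return [name for name in spec["required"] if not values.get(name)]


# Placeholder input, for illustration only.
entered = {"jurisdiction": "Example County", "year": "1900"}
print(subtypes("Census"))
print(missing_required("Census", "Population schedule", entered))
```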

Comment: a pop-up is not very user friendly. A wizard button on the source editor would be better: it lets you define the source, asks for the fields, and shows the automatic citation markup based on the templates at the bottom while the user adds fields. On Save, all this data is saved in the attributes as needed. To be investigated: whether a new field is needed on the source editor.

Generating citations in reports

Then, when generating a report that contains citations, the markup needs to be applied to the fields according to the specifications in the table method or template method I've provided: substitute the variables, italicize, embed with the proper punctuation, and so on. Remove optional variables (and their punctuation) if the variable was not input. Remove privacy fields unless a privacy flag is turned on, so that things like home addresses and phone numbers of people aren't put in reports unless you "force" it.
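The rules in this paragraph might be sketched as follows. The `Segment` structure and `render` function are hypothetical stand-ins for the table or template markup, not the markup language itself; each segment carries its own punctuation so that an omitted field takes its punctuation with it, and `<i>` tags stand in for whatever italic markup the report backend uses.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One template variable with its surrounding punctuation (hypothetical)."""
    field: str
    prefix: str = ""
    suffix: str = ""
    italic: bool = False
    private: bool = False


def render(template, values, include_private=False):
    """Substitute values into a template, dropping empty or private segments."""
    parts = []
    for seg in template:
        value = values.get(seg.field)
        if not value:
            continue  # variable not input: drop it and its punctuation
        if seg.private and not include_private:
            continue  # privacy field: only shown when forced
        if seg.italic:
            value = "<i>" + value + "</i>"
        parts.append(seg.prefix + value + seg.suffix)
    return "".join(parts)


# Made-up full-citation template and values, for illustration only.
FULL = [
    Segment("author", suffix=", "),
    Segment("title", italic=True),
    Segment("publisher", prefix=" (", suffix=")"),
    Segment("address", prefix="; held at ", private=True),
    Segment("page", prefix=", p. "),
]
values = {"author": "Anne Example", "title": "An Example Family History",
          "address": "1 Example Street", "page": "12"}

print(render(FULL, values))
print(render(FULL, values, include_private=True))
```

In this example the publisher was never entered, so its parentheses disappear, and the address appears only in the second call, where the privacy flag forces it.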

And the first time a citation is encountered in a report, use the Full version (F). The second and succeeding times use the Short (S) version. And when a bibliography is called for, use the L (List) template for that.
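A small sketch of that bookkeeping, assuming the three forms of each citation have already been rendered to strings. The `ReportCitations` class and the sample strings are hypothetical.

```python
class ReportCitations:
    """Tracks which sources a report has already cited (hypothetical helper)."""

    def __init__(self):
        # source id -> {"F": full, "S": short, "L": bibliography list form}
        self._forms = {}

    def cite(self, source_id, forms):
        """Return the Full form on first use, the Short form afterwards."""
        first_time = source_id not in self._forms
        self._forms.setdefault(source_id, forms)
        return forms["F"] if first_time else forms["S"]

    def bibliography(self):
        """Return the List form of every cited source, sorted alphabetically."""
        return sorted(forms["L"] for forms in self._forms.values())


# Made-up rendered forms, for illustration only.
book = {"F": "Anne Example, An Example Family History (1900), p. 12.",
        "S": "Example, Family History, p. 12.",
        "L": "Example, Anne. An Example Family History. 1900."}

report = ReportCitations()
print(report.cite("S0001", book))  # first occurrence: Full form
print(report.cite("S0001", book))  # later occurrence: Short form
print(report.bibliography())
```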

Template definition

The templates would be stored in an internal database, as would the completed citations for storage and retrieval.

But, these would only be a (good) starting set. Part of the beauty of this parametrization is that the end user can use the language of the mark up in this table or template to define his own source style, punctuation, field quoted or italicized, etc. So in essence, any source output style can be accommodated, and is under full control of the end user. Evidence Style templates can be supplied as a starting set, not the only set. New Evidence Styles can be added, old ones deleted or modified, as the user wishes.
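One way to picture a user-editable template store is the sketch below, which keeps one markup string per source type and form (L, F, or S). It uses Python's standard `sqlite3` module purely for illustration; the table layout, the function names, and the `{author}, {title}` markup are assumptions and say nothing about how Gramps would actually store templates.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a template store; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS template ("
        " source_type TEXT NOT NULL,"
        " form TEXT NOT NULL CHECK (form IN ('L', 'F', 'S')),"
        " markup TEXT NOT NULL,"
        " PRIMARY KEY (source_type, form))"
    )
    return conn


def save_template(conn, source_type, form, markup):
    """Add a new template or overwrite an existing one."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO template VALUES (?, ?, ?)",
                     (source_type, form, markup))


def delete_template(conn, source_type, form):
    """Remove a template the user no longer wants."""
    with conn:
        conn.execute("DELETE FROM template WHERE source_type = ? AND form = ?",
                     (source_type, form))


def load_template(conn, source_type, form):
    """Return the stored markup, or None if there is no such template."""
    row = conn.execute(
        "SELECT markup FROM template WHERE source_type = ? AND form = ?",
        (source_type, form)).fetchone()
    return row[0] if row else None


store = open_store()
save_template(store, "Printed book", "S", "{author}, {title}")   # starting set
save_template(store, "Printed book", "S", "{author}: {title}")   # user's edit
print(load_template(store, "Printed book", "S"))
```

Shipping the Evidence Style templates as rows in such a store, rather than hard-coding them, is what would let users add, modify, or delete styles.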


Proposed changes

Proposed Interface changes

How to store the fields? Attributes in the attribute tab of source?

Proposed Report changes

Reports use the new citation style, using templates to build the citation.

References

  1. Original users mailing list discussion: Evidence Explained Style Sources