Difference between revisions of "Meaningful filenames"

From Gramps

Revision as of 15:19, 21 July 2008

After thinking about the limits on how we can structure our files and folders (see Portable_Filenames), the next step is to develop a semantic controlled vocabulary.

Before going too deep into this, let's look at what we want to achieve.

  • Understandable filenames
  • Computer readable filenames
  • A system simple enough to remember

To be understandable we need to be able to use full words where appropriate.

To be computer readable we need to separate the parts in a way which a script can easily recognise and, more importantly, in a way which would never occur in real language. So it would be no good to mark a name section with the word name if the word name can also appear somewhere in the filename where it is not meant to be a marker.

To be simple enough to remember, the system should not be too complicated. After all, GRAMPS is meant to store the real information; this is just a supplement.

What name-parts do we need?

It would be nice if we could have files called

Marriage of Mary Angus Jones and Matthew Williams, 2nd Dec 1923 (William Angus is to Mary's right).jpg

But this meets only one of the criteria above, that of understandable filenames. How can a computer know who got married, what their surnames are, and so on? And anyway, because of the limitations described in Portable_Filenames, we can't have file names like that. We have to drop the reliance on capitalisation, drop the spaces, drop the comma and drop the brackets. To be computer readable we need to separate the sections with a system of markers to indicate where the surname, event name, etc. are.

So what sections do we want to be able to identify? Here's a basic list that should be enough for most situations. Remember that GRAMPS stores the more complex information; we're just trying to give a useful structure to our files.

  • Surname
  • Firstname
  • Date
  • Event type
  • Place
  • Source
  • Note

GEDCOM based

Here is a proposed system contributed by Duncan Lithgow.

Each marker ends with two hyphens (--). Two, because we can't rely on the marker being recognised by its capitalisation: with a single hyphen, a surname like Besour-Jean could be read as beSOUR-Jean, and the system would think that SOUR- marks a source section.
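
The point about the double hyphen can be checked with a small sketch. The function name looks_like_marker is made up for this illustration, and the case-insensitive search is an assumption based on the rule that capitalisation cannot be relied on.

```python
import re

def looks_like_marker(text: str, marker: str, terminator: str) -> bool:
    """Return True if marker + terminator occurs in text, ignoring case.

    Hypothetical helper, only meant to illustrate the naming clash.
    """
    pattern = re.escape(marker + terminator)
    return re.search(pattern, text, re.IGNORECASE) is not None

surname = "besour-jean"

# With a single hyphen the surname is mistaken for a source marker.
print(looks_like_marker(surname, "SOUR", "-"))   # True

# With two hyphens the surname is left alone.
print(looks_like_marker(surname, "SOUR", "--"))  # False
```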

Marker   Meaning              Example value                     GRAMPS XML equivalent   GEDCOM equivalent
PLAC--   place marker         london__england                   ?                       ?
INDV--   individual marker    mary_jones                        ?                       ?
EVNT--   event marker         marriage                          ?                       ?
DATE--   date marker          2008-12-31                        ?                       ?
SOUR--   source marker        lds_church_website                ?                       ?
SURN--   family name marker   jones                             ?                       ?
FIRS--   first name marker    mary                              ?                       ?
NOTE--   note marker          is_that_marys_father_beside_her   ?                       ?

In order for the file name to be parsed as meaningful text, I think we would also need:

Marker   Description                         Example       Rendering
_        space indicator                     mary_jones    mary jones
__       comma followed by space indicator   jones__mary   jones, mary
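
As a minimal sketch of how these two indicators could be turned back into text (the function name render_value is made up for this example, not part of GRAMPS):

```python
def render_value(raw: str) -> str:
    """Turn a stored value such as 'london__england' into readable text."""
    # Handle the double underscore first; otherwise each of its two
    # underscores would be turned into a separate space.
    return raw.replace("__", ", ").replace("_", " ")

print(render_value("mary_jones"))       # mary jones
print(render_value("london__england"))  # london, england
```

The order of the two replacements matters: swapping them would render london__england with two spaces and no comma.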

Examples

Image file

Filename

EVNT--marriage_SURN--jones_FIRS--mary_angus__SURN--williams_FIRS--matthew_DATE--1923-12-02_NOTE--william_angus_to_right_of_mary.jpg

This could be parsed (by GRAMPS?) as the description:

Event: Marriage
Surname: Jones
Firstname: Mary Angus
Surname: Williams
Firstname: Matthew
Date: 2nd Dec, 1923
Note: William angus to the right of mary

or it could make the text:

Mary Angus Jones and Matthew Williams, marriage 2nd Dec 1923. (William angus to the right of mary)
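
A script could split such a filename into its marked sections roughly as follows. This is only a sketch under a few assumptions: the names MARKERS, MARKER_RE and parse_filename are invented here, markers are matched without regard to case, and a marker is only recognised at the start of the name or after an underscore.

```python
import re
from pathlib import PurePath

# The eight markers from the table above; each is followed by "--".
MARKERS = ("PLAC", "INDV", "EVNT", "DATE", "SOUR", "SURN", "FIRS", "NOTE")
MARKER_RE = re.compile(
    r"(?:^|_)(" + "|".join(MARKERS) + r")--", re.IGNORECASE
)

def parse_filename(filename: str) -> list[tuple[str, str]]:
    """Split a marked-up filename into (marker, value) pairs, in order."""
    stem = PurePath(filename).stem  # drop the extension, e.g. ".jpg"
    # re.split with one capturing group returns
    # [text before first marker, marker, value, marker, value, ...]
    pieces = MARKER_RE.split(stem)
    pairs = []
    for marker, value in zip(pieces[1::2], pieces[2::2]):
        # Strip leftover separator underscores around the value.
        pairs.append((marker.upper(), value.strip("_")))
    return pairs

name = ("EVNT--marriage_SURN--jones_FIRS--mary_angus__SURN--williams"
        "_FIRS--matthew_DATE--1923-12-02"
        "_NOTE--william_angus_to_right_of_mary.jpg")
for marker, value in parse_filename(name):
    print(marker, value)
```

The result is a list rather than a dictionary because markers such as SURN and FIRS can occur more than once in one filename.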

Source text

File name

SOUR--uk_census_EVNT--census_PLAC--london__england_DATE--1840-03-21_SURN--jones_FIRS--mary.pdf

This could be parsed (by GRAMPS?) as the description:

Source: Uk census
Event: Census
Place: London, england
Date: 21st March, 1840
Surname: Jones
Firstname: Mary

or it could make the text:

Uk census, Place: London, england, on 21st March 1840. This is a source connected to Mary Jones
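
Going the other way, a filename in this scheme could be assembled from a list of fields. The sketch below is an illustration only: to_stored and build_filename are invented names, and it assumes that values contain nothing but letters, digits, hyphens, spaces and ", " pairs (anything else ruled out by Portable_Filenames would need extra handling).

```python
def to_stored(text: str) -> str:
    """Lower-case a value and encode ', ' as '__' and ' ' as '_'."""
    # Encode the comma-plus-space pair before the plain spaces.
    return text.lower().replace(", ", "__").replace(" ", "_")

def build_filename(fields: list[tuple[str, str]], extension: str) -> str:
    """Join (marker, value) pairs into one marked-up filename."""
    parts = [f"{marker}--{to_stored(value)}" for marker, value in fields]
    return "_".join(parts) + "." + extension

fields = [
    ("SOUR", "Uk census"),
    ("EVNT", "census"),
    ("PLAC", "London, england"),
    ("DATE", "1840-03-21"),
    ("SURN", "Jones"),
    ("FIRS", "Mary"),
]
print(build_filename(fields, "pdf"))
```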

GRAMPS ID based

This is another attempt by Duncan Lithgow to find a good system.

GRAMPS IDs use the first character to denote the type of item the ID refers to. This could be converted to work in filenames.

Marker   Description    GRAMPS ID equivalent
P--      place          P
I--      individual     I
F--      family         F
E--      event          E
S--      source         S
O--      media object   O
R--      repository     R
N--      note           N

Extending this idea a bit with some more markers we could get a filename like:

E--marriage_SN--jones_FN--mary_angus_SN--williams_FN--matthew_DT--1923-12-02_N--william_angus_to_right_of_mary.jpg

This can store the same information as in the GEDCOM based schema.
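
To illustrate that the two schemas carry the same information, here is a rough sketch that rewrites the short markers as the longer GEDCOM based ones. The names ID_TO_GEDCOM, ID_MARKER_RE and to_gedcom_style are invented for this example. Reading SN, FN and DT as surname, first name and date is an assumption taken from the example filename above, and F, O and R are left out because the GEDCOM based table has no counterpart for them.

```python
import re

# Short marker -> GEDCOM based marker. SN, FN and DT are assumed to mean
# surname, first name and date, judging by the example filename.
ID_TO_GEDCOM = {
    "P": "PLAC", "I": "INDV", "E": "EVNT", "S": "SOUR", "N": "NOTE",
    "SN": "SURN", "FN": "FIRS", "DT": "DATE",
}
# A marker counts only at the start of the name or after an underscore.
ID_MARKER_RE = re.compile(r"(^|_)(SN|FN|DT|[PIESN])--", re.IGNORECASE)

def to_gedcom_style(filename: str) -> str:
    """Replace each short marker with its GEDCOM based equivalent."""
    def swap(match: re.Match) -> str:
        long_marker = ID_TO_GEDCOM[match.group(2).upper()]
        return f"{match.group(1)}{long_marker}--"
    return ID_MARKER_RE.sub(swap, filename)

short = ("E--marriage_SN--jones_FN--mary_angus_SN--williams_FN--matthew"
         "_DT--1923-12-02_N--william_angus_to_right_of_mary.jpg")
print(to_gedcom_style(short))
```

Apart from the single underscore before the second surname marker, the output matches the image file example in the GEDCOM based section.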