Difference between revisions of "Portable Filenames"

From Gramps
m (External links: added link)
m (Recommendations: added a few clarifications)

Revision as of 16:21, 16 October 2008

In order to be able to move our media files from one computer to another it is critical that the names of our files can be understood by the different file systems and encodings they meet.

To find a set of characters that meets all these criteria, this article was originally based on content from Wikipedia, especially the articles Filenames, Comparison of file systems and ASCII character encoding, and on Naming a File from MSDN. Please add other references to improve this article.

Introduction

If you want to make sure your files can be safely moved between different types of computers, you need to consider what your files can and can't be called and how they can and can't be organised. For example, a file called uk_census_of_15.5.1851.txt may not be handled correctly by a Windows computer because of the extra periods in its name. And even though your computer might let you make a file called birth_certificate_of_André_Mollier.jpg, it may not open on all computers because of that accented é.

If you follow the rules below then your directories and files will be handled without issues on all of the following:

  • Servers
  • USB drives
  • CDs, DVDs, Blu-ray discs and HD DVDs
  • Hard drives formatted with FAT32, NTFS or ext2/3
  • Computers running Windows 95 and later
  • Any POSIX-compliant system (Linux, Unix, OS X)

and much more...

The rules on this page do not cover CDs and DVDs that use the original 1988 file format, ISO 9660:1988 level 1. This is very unlikely to be an issue for you unless your operating system is from before 1995, which is when the Joliet extensions to the original CD format (ISO 9660 level 1) were added.

File and directory names

To make a list of unsafe characters would take up far too much space, so here is the list of what is safe. Notice that a space is not a safe character.

  • a-z Lowercase alphabetical characters without any accents (see below)
  • A-Z Uppercase alphabetical characters without any accents (see below)
  • 0-9 Numerals
  • - Hyphens/dashes (except at the start)
  • _ Underscores
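
As an illustration of the list above, here is a minimal Python sketch that turns a free-text title into a name stem using only the safe characters. The function name to_safe_stem is hypothetical, and replacing every unsafe run (including spaces) with a single underscore is an assumption made for this example, not a rule from this page.

```python
import re
import unicodedata


def to_safe_stem(text: str) -> str:
    """Return a name stem (no extension) built only from a-z, A-Z, 0-9, - and _.

    Hypothetical helper: the underscore replacement policy is an assumption.
    """
    # Split accented letters into base letter + combining mark, then drop
    # everything that is not plain ASCII (so "é" becomes "e").
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of characters outside the safe list with one underscore.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", ascii_text)
    # A hyphen must not start the name.
    return safe.lstrip("-")


print(to_safe_stem("birth certificate of André Mollier"))
```

The extension (for example .jpg) would be added afterwards, so that the only period in the final name is the one that starts the extension.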

The characters from the list of safe characters must be used with some care:

  • a-z, A-Z Capital and small letters. Always use mixed case: MYFILE.txt can become myfile.txt without warning, whereas MyFile.txt will not be changed by most file systems. Windows ignores capitalisation, so two names in the same directory that differ only in capitalisation, e.g. uk_census_1851.txt and UK_Census_1851.txt, cannot coexist on many systems.
  • - Hyphens should not start a file name. Many types of scripts use a hyphen to indicate that what follows is an option for the script. If the file name itself starts with a hyphen, the script will try to interpret it as an option and almost certainly fail.
  • . Period/full stop. It is allowed, but only to mark the start of a file extension, and it may not start a directory name. (Limitation of ISO 9660)
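
The rules above could be checked mechanically along these lines. This is a rough Python sketch, not a definitive validator: the names is_portable_name and case_collisions are hypothetical, and reading "only to mark the start of a file extension" as "at most one period, never at the start" is an assumption.

```python
import re
from collections import defaultdict

# Safe characters only; no leading hyphen or period; at most one period,
# which must be followed by a non-empty extension (assumed interpretation).
_PORTABLE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*(\.[A-Za-z0-9_-]+)?")


def is_portable_name(name: str) -> bool:
    """True if the whole name matches the safe-character rules."""
    return _PORTABLE_NAME.fullmatch(name) is not None


def case_collisions(names):
    """Group names from one directory that differ only in capitalisation."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


for candidate in ["uk_census_1851.txt", "uk_census_of_15.5.1851.txt", "-notes.txt"]:
    print(candidate, is_portable_name(candidate))
```

case_collisions is meant to be run on the names within a single directory, since that is where a case-insensitive system would treat two names as the same file.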

Illegal file names

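  • The following names are reserved on Windows and must not be used as file names: CON, PRN, AUX, CLOCK$, NUL, COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8 and LPT9 (reference). The names . and .. are also reserved (reference).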
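
A check against the list above might look like the following Python sketch. The function name is_reserved_name is hypothetical. Comparing case-insensitively and ignoring the extension (so that con.txt is also rejected) are assumptions based on how Windows is generally understood to treat these names, not something stated on this page.

```python
# Reserved names copied from the list above.
RESERVED = (
    {"CON", "PRN", "AUX", "CLOCK$", "NUL"}
    | {f"COM{i}" for i in range(10)}
    | {f"LPT{i}" for i in range(10)}
)


def is_reserved_name(name: str) -> bool:
    """True if the name is . or .. or its stem is a reserved device name."""
    if name in {".", ".."}:
        return True
    # Assumption: only the part before the first period matters,
    # and capitalisation is ignored.
    stem = name.split(".", 1)[0]
    return stem.upper() in RESERVED


for candidate in ["con.txt", "LPT1", "console.txt"]:
    print(candidate, is_reserved_name(candidate))
```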

Limits

There are limits imposed by operating systems and file formats. The lowest of each of these limits (for systems after 1994) is listed below.

  • The number of directories in any path on a CD must not be more than eight, e.g. (the CD itself)/2/3/4/5/6/7/file.txt (the limit of ISO 9660). Is this valid after 1995? --DuncanNZ 12:21, 16 October 2008 (EDT)
  • The number of directories on a CD is limited to 65,535 (the limit of ISO 9660 on Windows)
  • The length of a file's path, e.g. /genealogy/sources/uk_census_1851.txt, is limited to 256 characters (the limit of Windows Path Size)
  • The length of a file's name, e.g. uk_census_1851.txt, is limited to 31 characters including the period and extension (the limit of the Macintosh HFS file system)
  • The size of a file is limited to 2 gigabytes (the limit of ISO 9660 and the Macintosh HFS file system)
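
The path-related limits above can be tested with a short Python sketch like the one below. The function name check_limits and the constant names are hypothetical. Counting the root as the first of the eight levels and the file itself as the last follows the example path given above, which is an assumed reading of the ISO 9660 limit. The file-size and directory-count limits are not checked here.

```python
MAX_LEVELS = 8        # root + directories + file, as in the example path above
MAX_PATH_CHARS = 256  # whole path
MAX_NAME_CHARS = 31   # single file or directory name, including extension


def check_limits(path: str) -> list:
    """Return a list of human-readable problems for a /-separated path."""
    problems = []
    parts = [part for part in path.split("/") if part]
    # Assumption: the root (the CD itself) counts as level one.
    levels = len(parts) + 1
    if levels > MAX_LEVELS:
        problems.append(f"{levels} levels deep (limit {MAX_LEVELS})")
    if len(path) > MAX_PATH_CHARS:
        problems.append(f"path is {len(path)} characters (limit {MAX_PATH_CHARS})")
    for part in parts:
        if len(part) > MAX_NAME_CHARS:
            problems.append(f"name '{part}' is {len(part)} characters (limit {MAX_NAME_CHARS})")
    return problems


print(check_limits("/genealogy/sources/uk_census_1851.txt"))
```

Returning a list of problems rather than a single True/False makes it easier to report every limit a path breaks in one pass.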

Needing clarification

The following are not recommended but I can't find any reference to why they could be a problem. They are all ASCII characters.

#        | Number sign                       | Yes   | Not reserved (ref)
&        | Ampersand                         | Yes   | Not reserved (ref)
'        | Apostrophe                        | Yes   | Not reserved (ref). Some websites have trouble handling file names containing apostrophes (PHP Bug #33198)
( and )  | Parentheses                       | Maybe | Unclear (ref)
+        | Plus                              | Yes   | Not reserved (ref)
,        | Comma                             | Yes   | Not reserved (ref)
;        | Semicolon                         | Yes   | Not reserved (ref)
=        | Equals sign                       | Yes   | Not reserved (ref)
@        | At sign                           | Yes   | Not reserved (ref)
[ and ]  | Square brackets or box brackets   | Yes   | Not reserved (ref)
^        | Caret                             | Yes   | Not reserved (ref)
_        | Underscore                        | Yes   | Not reserved (ref)
{ and }  | Curly brackets                    | Yes   | Not reserved (ref)
~        | Tilde                             | Yes   | Not reserved (ref)

External links