Manual Generation 3.0

From Gramps

Revision as of 19:21, 17 October 2011

This page collects ways to create the GRAMPS manual (DocBook/PDF/HTML) starting from the Gramps 3.0 Wiki Manual. This is a work in progress; no decision has been taken yet on how to proceed.

How to create a manual starting from the wiki

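MediaWiki to OpenDocument

  • The OpenDocument Export extension makes it possible to export single pages or collections from MediaWiki in OpenDocument Text format (.odt).

MediaWiki to PDF

  • The PDF Writer extension makes it possible to export single pages or collections from MediaWiki in PDF format (.pdf).
  • Conversion using Prince XML (see the page Manual Generation 3.0 Prince).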

XML to XML

  1. Wikipedia uses the Wikimedia DTD, an XML-based format, for sharing its data. DocBook is available as XML too (as well as SGML, of which XML is a subset).
  2. We can test exporting our wiki data in the Wikimedia DTD format.
  3. We then need a script (XSLT, Python, Perl, sh?) that converts data from the Wikimedia DTD to DocBook/SGML.
  • Pandoc converts files from one markup format into another.
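
Step 3 could start from something like the following minimal Python sketch. It assumes the export looks like the usual MediaWiki export (a `mediawiki` root with `page`, `title` and `text` elements); the function names and the choice of `literallayout` to hold the untranslated wiki text are illustrative assumptions, not an existing Gramps script. Converting the wiki markup itself is not attempted here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def export_to_docbook(export_xml):
    """Turn a MediaWiki export string into a skeleton DocBook <book> string."""
    root = ET.fromstring(export_xml)
    book = ET.Element("book")
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = "", ""
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text or ""
            elif name == "text":
                text = node.text or ""
        # One chapter per wiki page (an assumption about the desired layout).
        chapter = ET.SubElement(book, "chapter")
        ET.SubElement(chapter, "title").text = title
        # The wiki markup is copied verbatim; translating it is the hard part.
        ET.SubElement(chapter, "literallayout").text = text
    return ET.tostring(book, encoding="unicode")
```

The namespace is stripped rather than hard-coded because the export namespace changes between MediaWiki versions.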

Text to XML

  • All wiki pages are saved as txt: header.txt, preface.txt, chapter_01.txt, ..., which could be included into one file later. txt2tags supports Wikipedia markup.
  • The output will be a full gramps.xml/gramps.sgml file in UTF-8 encoding, to avoid problems with non-ASCII characters. The present Makefiles in GRAMPS can create html/manual/pdf from these xml files. A possible solution for keeping DocBook: OpenJade + DSSSL. Note that yelp may open xhtml too.
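
As a rough illustration of the first bullet, the saved text files could be merged with a few lines of Python instead of the txt2tags include command. This is only a sketch: the function name is hypothetical, and it assumes every input file is already UTF-8.

```python
from pathlib import Path


def merge_text_files(parts, output):
    """Concatenate the given text files, in order, into one UTF-8 file."""
    chunks = []
    for part in parts:
        # Reading as UTF-8 fails loudly if a file uses another encoding.
        chunks.append(Path(part).read_text(encoding="utf-8").rstrip("\n"))
    # Separate the parts with one blank line and end with a newline.
    Path(output).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
```

For example, `merge_text_files(["header.txt", "preface.txt", "chapter_01.txt"], "manual.txt")` would produce one file in reading order (the output name is an assumption).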

We should keep an eye on official developments here: http://www.mediawiki.org/wiki/DocBook_XML_export

Wikibooks

An alternative is to proceed as Wikibooks does for its print versions.

PHP

  • wiki2xml is a GPL script for parsing MediaWiki.
  • WikiRenderer is a PHP component that can parse wiki content and transform it into XHTML, into any other markup language, or into wiki content with a different syntax. It appears to work with DokuWiki syntax, which is close to MediaWiki syntax (the headline rule is inverted). Demo: http://wikirenderer.berlios.de/en/demo.php

wt2db

wt2db converts a text file in a special format, similar to that used in WikiWikiWebs, into DocBook XML/SGML.

wiki text to html

A Python program can be used to generate html from the text of the Gramps manual wiki pages.

xhtml to ODT

xhtml2odt stylesheets convert namespaced XHTML to ODT.

html to html translation

  • Translate toolkit

We can try translating the generated HTML with the Translate Toolkit:

html2po <html> > <pot>
msgmerge --no-wrap <po> <pot> > <new_po>
po2html -t <html> -i <new_po> -o <new_html>

where each <x> stands for a file of that format; substitute your own file names.
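
If the three commands need to be repeated for many pages, they could be driven from Python. The sketch below only mirrors the command lines shown above; the function names and the idea of returning (argv, stdout file) pairs are assumptions made for illustration, and the tools themselves must be installed separately.

```python
import subprocess


def translation_commands(html, pot, po, new_po, new_html):
    """Return the three steps as (argv, stdout_file) pairs."""
    return [
        (["html2po", html], pot),                      # html2po <html> > <pot>
        (["msgmerge", "--no-wrap", po, pot], new_po),  # msgmerge ... > <new_po>
        (["po2html", "-t", html, "-i", new_po, "-o", new_html], None),
    ]


def run_translation(html, pot, po, new_po, new_html):
    """Run each step, redirecting stdout to a file where the shell used '>'."""
    for argv, stdout_file in translation_commands(html, pot, po, new_po, new_html):
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Keeping the command construction separate from the execution makes it possible to inspect or log the commands without running them.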

  • GNUnited Nations

GNUnited Nations (GNUN) is a build system for www.gnu.org translations. It generates a PO template (.pot) for an original HTML article, and merges the changes into all translations, which are maintained as PO (.po) files. Finally, it regenerates the translations in HTML format.

The goal of GNUN is to make maintenance of gnu.org translations easier and to avoid the effect of seriously outdated translations when a particular team becomes inactive.

html to docbook

  • Html2Docbook converts project documentation from HTML to DocBook.
  1. Convert all of your HTML to XHTML using Tidy. Enable 'enclose-block-text' in the config file; otherwise any unenclosed text (which is allowed under XHTML Transitional but not under XHTML Strict) will vanish.
  2. Use the Html2DocBook XSL stylesheet to convert the XHTML into DocBook. There is no way to merge the multiple XHTML files into a single document, so the stylesheet converts each HTML page into a section. Be sure to pass in the filename (minus the extension) as a parameter; it becomes the section id.
  3. Combine the multiple DocBook section files into a single file, and rearrange the sections into the proper order.
  4. Correct any validity errors. (At this point, there are likely to be a few, depending on how good the original HTML was.)
  5. Peruse the now valid DocBook document, and look for the following:
     • Broken links (xref elements that should be link elements)
     • Missing headers (the heading logic isn't perfect. You'll lose at most 1 header per page, though, and most pages come through with all headers intact.)
     • Overuse of emphasis and emphasis role="bold"

  • HTML::ToDocBook is a CPAN Perl module that converts an XHTML file into DocBook.
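
Part of the review in step 5 above can be automated. The following Python sketch counts the elements a reviewer would want to look at; the function name and the returned keys are hypothetical, and treating a section without a title child as a "missing header" is an assumption about how lost headings show up.

```python
import xml.etree.ElementTree as ET


def review_docbook(docbook_xml):
    """Count the elements that step 5 asks a reviewer to look at."""
    root = ET.fromstring(docbook_xml)
    counts = {"xref": 0, "emphasis": 0, "bold": 0, "untitled_sections": 0}
    for node in root.iter():
        name = node.tag.rsplit("}", 1)[-1]  # ignore any XML namespace
        if name == "xref":
            counts["xref"] += 1
        elif name == "emphasis":
            counts["emphasis"] += 1
            if node.get("role") == "bold":
                counts["bold"] += 1
        elif name == "section":
            has_title = any(c.tag.rsplit("}", 1)[-1] == "title" for c in node)
            if not has_title:
                counts["untitled_sections"] += 1
    return counts
```

The counts only point at places to check by hand; whether an xref should really be a link still needs a human decision.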

Manual Text Guidelines

For the above scripts to work, the manual must restrict itself to a small set of templates and syntax, because the scripts cannot support everything. Hence, the Manual section does not have the full capabilities of a normal MediaWiki.

General textual guidelines

  • Only approved templates may be used. These are:
    1. {{grampsmanualcopyright}}: the copyright template. This will be stripped out on manual generation.
    2. {{man label|Labels}}: template for GUI elements, example: Labels
    3. {{man button|Buttons}}: template for GUI buttons, example: Buttons
    4. {{man tip| 1=title |2=text.}}: template to add a tip to the text
    5. {{man note| title |text}}: template to add a note to the text
    6. {{man warn| title |text}}: template to add a warning to the text
    7. {{man index|prevpage|nextpage}}: template to add the bottom index bar. This will be stripped out on manual generation.
    8. {{languages}}: template to add the language bar. This will be stripped out on manual generation.
  • The following markup code may be used:
    1. ''' bold ''': for bold or menu selections in GRAMPS, e.g. Edit->Preferences
    2. '' italic '': for italic or filenames in GRAMPS, e.g. filename
    3. <code> code sections</code>: for commands you type in the command line.
    4. <tt>''Replaceable text''</tt>: for text the user must type into a GUI element after replacing it with their own value, e.g. John Doe.
    5. <tt>Anything you type in</tt>: for text the user must type into a GUI element as is, e.g. John Doe.
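
The template rule above lends itself to an automatic check before generation. Here is a minimal Python sketch: the approved names are taken from the list above, while the function name, the regular expression and the case-insensitive comparison are assumptions made for illustration.

```python
import re

# Template names copied from the approved list above.
APPROVED_TEMPLATES = {
    "grampsmanualcopyright", "man label", "man button", "man tip",
    "man note", "man warn", "man index", "languages",
}

# Captures the template name: everything between '{{' and the first '|' or '}}'.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def unapproved_templates(wiki_text):
    """Return the sorted names of templates that are not on the approved list."""
    names = {m.group(1).lower() for m in TEMPLATE_RE.finditer(wiki_text)}
    return sorted(names - APPROVED_TEMPLATES)
```

Running this over each manual page would give editors a list of templates to replace before the conversion scripts are run.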

Tables, lists

Tables are special. Can we support them? Perhaps only tables via a template? As far as I can see, the present manual has only one table, which we can easily change into a list.

Lists. Can we support the list tokens: * and #, also nested?

Indent. Can we support the indent token: :

Cross referencing images/text

One may NOT link to whole pages, only to subsections in the Manual, so:

  • [[manualpage#subsection]]: only this form of internal link is allowed
  • [http://non_manual_subsection description]: all other links must be written as external links.

One may also link to Manual images in the text. See the discussion on Talk:Manual_Generation_3.0.
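
The linking rule could be checked mechanically as well. The Python sketch below flags internal links that lack a subsection; the function name is hypothetical, and treating targets that start with "Image:" or "File:" as allowed image links is an assumption, since the exact image syntax is still under discussion.

```python
import re

# Matches [[target]] and [[target|label]], capturing only the target.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
IMAGE_PREFIXES = ("image:", "file:")  # assumed image link prefixes


def disallowed_links(wiki_text):
    """Return internal link targets that are not of the form page#subsection."""
    bad = []
    for match in LINK_RE.finditer(wiki_text):
        target = match.group(1).strip()
        if target.lower().startswith(IMAGE_PREFIXES):
            continue  # image links are assumed to be allowed
        page, sep, subsection = target.partition("#")
        # Require both a page name and a subsection, as in [[manualpage#subsection]].
        if not (sep and page.strip() and subsection.strip()):
            bad.append(target)
    return bad
```

External links in single brackets are not matched by the pattern, so they pass through unchecked.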


Pdf output

JadeTeX (needs TeX)

openjade -t tex -d DSSSL DCL file_XML
  • DSSSL is the stylesheet
  • DCL is the declaration, something like: /usr/share/sgml/openjade-1.3.2/pubtext/xml.dcl
pdfjadetex TEX_file

xmlto (needs PassiveTeX and TeX)

xmlto pdf mydoc.xml

Is it Tim Waugh ? Ancestors.xsl Birthday.xsl

FOP (needs Java)

see Manual generation

A Test

There is a user request on the bug tracker (bug 2132) for a downloadable text-format user manual.

Steps:

  1. Go to the web pages of the manual (wiki) and save full local copies.
  2. Create an empty file and copy into it the HTML <body> content of the pages saved in step 1, leaving out scripts and JavaScript.
  3. Copy all images into one directory and change the links in the code to point to it.
  4. Turn some href links into relative links.
  5. Add or clean up anchors.
  6. Clean up the result with Tidy.
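
Steps 2 and 4 above are tedious by hand and could be sketched in Python as follows. This is an assumption-laden illustration: the function names are hypothetical, the regular expressions only handle straightforward markup with double-quoted attributes, and a real HTML parser would be more robust.

```python
import re

SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the content of <body> with all <script> blocks removed (step 2)."""
    match = BODY_RE.search(html)
    body = match.group(1) if match else html
    return SCRIPT_RE.sub("", body).strip()


def make_links_relative(html, base_url):
    """Drop base_url from href/src attributes so the links become relative (step 4)."""
    prefix = re.escape(base_url.rstrip("/") + "/")
    pattern = re.compile(r'((?:href|src)=")' + prefix)
    return pattern.sub(r"\1", html)
```

The output of these two functions would still need the anchor cleanup of step 5 and a final pass through Tidy.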