Publication Guidelines
From DocWiki
This DocWiki first describes a document management system that supports:
- Multiple authors
- Authoring in a single, canonical form
- Collaboration
- Mixing and matching of content from multiple pages and articles so that it can be repurposed for different documents, and
- Excellent version/revision control
This in turn enables a publication system that supports:
- Single-source publishing: publishing in multiple formats (HTML, PDF, doc, csv, RTF?), and
- Separate theming of output products for different users, preferably using CSS.
Furthermore, all of this is to be done using set structures and templates within the MIKE2.0 organizational framework, which will bring efficiency and consistency to the CIS documentation.
Basic Workflow
Export as XHTML
To be provided.
Conversion of XHTML
The conversion script for the exported XHTML performs several essential tasks:
- It removes any tables of contents, if present
- It removes the [Edit] links at section heads, if present
- It allows a new root URL to be set, for use on the new target site
- It cleans up and corrects converted image references
- It lists the images that may need to be uploaded separately to the new target site.
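The tasks above can be sketched in Python as follows. This is only an illustration, not the actual conversion script: the function name `convert_xhtml` and the regular expressions are assumptions, and the patterns assume markup from an older MediaWiki skin (a `<table id="toc">` table of contents and `<span class="editsection">` edit links). The image clean-up step is omitted because the source does not say what it corrects.

```python
import re

# Assumed markup from an older MediaWiki skin; adjust these patterns to
# match what your own export actually contains.
TOC_RE = re.compile(r'<table id="toc".*?</table>', re.DOTALL)
EDIT_RE = re.compile(r'<span class="editsection">.*?</span>', re.DOTALL)
IMG_SRC_RE = re.compile(r'<img\b[^>]*\bsrc="([^"]+)"')


def convert_xhtml(xhtml, old_root, new_root):
    """Return (cleaned_xhtml, image_urls) for one exported page."""
    # Drop the table of contents and the [edit] links, if present.
    cleaned = TOC_RE.sub("", xhtml)
    cleaned = EDIT_RE.sub("", cleaned)
    # Re-point links and image sources from the old root URL to the new one.
    attr_re = re.compile(r'\b(href|src)="' + re.escape(old_root))
    cleaned = attr_re.sub(lambda m: m.group(1) + '="' + new_root, cleaned)
    # Collect the images that may need a separate upload to the target site.
    images = sorted(set(IMG_SRC_RE.findall(cleaned)))
    return cleaned, images
```

Regular expressions are fragile on arbitrary HTML; for anything beyond a quick pass, a real XML or HTML parser would likely be a safer choice.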
Further Info to be Organized
MediaWiki API
- Clean XHTML can be generated directly from MediaWiki via a URL. Passing action=render to index.php returns just the rendered page content; the API equivalent (api.php) is action=parse. See:
- http://www.mediawiki.org/wiki/API:Parsing_wikitext
- http://www.mediawiki.org/wiki/Manual:Parameters_to_index.php
- http://docwiki.citizen-dan.org/api.php
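As a rough sketch, the two kinds of request URL can be built like this: action=render goes to index.php, and action=parse goes to api.php. The helper names are hypothetical, and the index.php location is an assumption (only the api.php address appears above).

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(index_php, title):
    """URL that asks index.php for the rendered page body only."""
    return index_php + "?" + urlencode({"title": title, "action": "render"})


def parse_api_url(api_php, title):
    """URL that asks api.php to parse a page and return its HTML as JSON."""
    params = {"action": "parse", "page": title, "prop": "text", "format": "json"}
    return api_php + "?" + urlencode(params)


def fetch(url):
    """Download a URL and return its body as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

For example, `render_url("http://docwiki.citizen-dan.org/index.php", "Publication_Guidelines")` builds the render URL for this page, assuming index.php sits next to api.php on that host.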
Wikitext Converters
PHP
- Perhaps the best choice from a PHP developer's point of view is the Text_Wiki PEAR package. It "transforms Wiki and BBCode markup into XHTML, LaTeX or plain text markup." It even has an adapter for MediaWiki markup.
- PHP: http://www.ffnn.nl/pages/projects/wiki-text-to-html.php
- PHP, with good explanation and online demo: http://www.novell.com/communities/node/3370/wiki-cool-solutions-converter
Others
- Wiky is a client-side wiki-markup-to-HTML converter written in JavaScript. Because it is bidirectional, it can convert wiki markup to HTML and later convert the generated HTML back to wiki markup.
- http://confluence.atlassian.com/pages/viewpage.action?pageId=157448
- http://deplate.sourceforge.net/
- http://search.cpan.org/~migo/wikitext-perl-1.01/lib/Text/WikiText.pm
- http://code.google.com/p/gwtwiki/wiki/Mediawiki2HTML
Use of MediaWiki
- http://www.mediawiki.org/wiki/Extension:Collections
- http://www.wikipublisher.org/wiki/ (PMWiki only)
- http://www.mediawiki.org/wiki/Alternative_parsers
- https://code.google.com/p/wikidocbook/
- Documentation of how MediaWiki's internal wikitext-to-HTML parser works: http://www.mediawiki.org/wiki/User:HappyDog/WikiText_parsing
- Magnus' magic MediaWiki-to-XML-to-stuff converter; VERY NICE and in PHP: http://toolserver.org/~magnus/wiki2xml/w2x.php
- http://fedoraproject.org/wiki/DocsProject/Wiki2XML
- http://github.com/dcramer/py-wikimarkup
- WikiModel: http://code.google.com/p/wikimodel/ (Nepomuk project; incredibly interesting and useful)
- SwiD (semantic wiki editor, based on WikiModel): http://code.google.com/p/swid/wiki/SwidFeatures
- http://wikimodel.sourceforge.net/ (same project, but older project host??)
- http://www.uop.edu.jo/download/PdfCourses/DS/D1.2_v10_CDS-Tools.pdf
- UIMA Wikipedia collection reader v.0.4: http://www.fabienpoulard.info/index.php?tpost/en/2010/03/04/Release-of-the-Wikipedia-collection-reader-v04
HTML to DocBook
- From http://wiki.docbook.org/topic/Html2DocBook:
  - Convert all of your HTML to XHTML using Tidy. Enable 'enclose-block-text' in the config file; otherwise any unenclosed text (which is allowed under XHTML Transitional but not under XHTML Strict) will vanish.
  - Use the XSL stylesheet (on the linked page) to convert the XHTML into DocBook. There is no way to merge the multiple XHTML files into a single document, so the stylesheet converts each HTML page into a <section>. Be sure to pass in the filename (minus the extension) as a parameter; it becomes the section id.
  - Combine the multiple DocBook <section> files into a single file, and rearrange the sections into the proper order.
  - Correct any validity errors. (At this point there are likely to be a few, depending on how good the original HTML was.)
  - Peruse the now-valid DocBook document and look for the following:
    - Broken links
    - <xref> elements that should be <link>s
    - Missing headers (the heading logic isn't perfect; you'll lose at most one header per page, though, and most pages come through with all headers intact)
    - Overuse of <emphasis> and <emphasis role="bold">
- Also see the XSLT example
- http://code.google.com/p/gwtwiki/wiki/Mediawiki2Docbook
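The combine step of the Html2DocBook procedure, plus a quick tally to guide the final review, might look like the sketch below. The function names are hypothetical, the wrapper element is assumed to be an `<article>`, and the code assumes un-namespaced (DocBook 4 style) `<section>` documents; reading the per-page files from disk and putting them in order is left to the caller.

```python
import xml.etree.ElementTree as ET


def combine_sections(section_xml_strings, title):
    """Wrap already-ordered <section> documents in a single <article>."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    for xml_text in section_xml_strings:
        section = ET.fromstring(xml_text)
        if section.tag != "section":
            raise ValueError("expected <section>, got <%s>" % section.tag)
        article.append(section)
    return ET.tostring(article, encoding="unicode")


def count_tags(docbook_xml, tags):
    """Count occurrences of each tag, e.g. to spot heavy <emphasis> use."""
    root = ET.fromstring(docbook_xml)
    return {tag: sum(1 for _ in root.iter(tag)) for tag in tags}
```

A high `xref` or `emphasis` count only flags where to look; deciding which `<xref>` elements should become `<link>`s still needs a human reading.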
Stylesheets (CSS)
- http://www.re.be/css2xslfo/examples.xhtml
- CSStoXML is a small command-line Java tool, based on CSSParser, for converting CSS2 to XML. It also has an option to apply an XSL transformation to the generated XML.
- http://snippets.dzone.com/posts/show/7510 (Ruby)
- From http://www.xmlplease.com/xmlcss: <?xml-stylesheet type="text/css" href="products.css"?> is a so-called processing instruction that points to a CSS stylesheet. A very small W3C standard, "Associating Style Sheets with XML documents", defines how an XML document can include a CSS or an XSLT stylesheet.
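A minimal sketch of attaching such a processing instruction to an existing XML document is shown here. The function name `add_css_stylesheet` is hypothetical, and the code assumes the stylesheet href contains no double quotes and that any XML declaration sits at the very start of the text.

```python
def add_css_stylesheet(xml_text, css_href):
    """Insert an xml-stylesheet processing instruction before the root element."""
    pi = '<?xml-stylesheet type="text/css" href="%s"?>' % css_href
    stripped = xml_text.lstrip()
    # The instruction must come after the XML declaration, if there is one.
    if stripped.startswith("<?xml "):
        end = stripped.index("?>") + 2
        return stripped[:end] + "\n" + pi + stripped[end:]
    return pi + "\n" + stripped
```

Browsers that support CSS for XML would then style the document using the rules in the referenced stylesheet.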