Technology articles and news by Dawn Ahukanna and Anni Poulsen

XML Publishing - Part 1

December 1st, 2006 by Dawn Ahukanna

XML Publishing - Part 1: Producing HTML. This article, part of a series, will show you how to do it.


Copyright
© 2006 Dawn Ahukanna
Terms
The copyright of this article belongs to the author, unless otherwise marked. Any kind of reproduction, or storing of any part of the design, text, images or other content in this article is not allowed without the permission of the owner.
Summary
Have you ever needed to produce HTML, PDF, Text and RTF versions of the same document, in seconds? This article, part of a series, will show you how to do it. These multiple document formats can be produced using an XML document which is transformed using XSL code. In this first article, we’ll look at producing HTML from an XML document.
Section 1: Source code for the article
Download the source code bundle from Ref3.

For convenience, I have created a source code download bundle that contains a Windows batch script called ‘runAnt.cmd’, a sample Ant build script called ‘build.xml’, an XML file called ‘article.xml’ in the ‘src/xml’ sub-directory and an XSL file called ‘articleHtml.xsl’ in the ‘src/xsl’ sub-directory. These files produce an HTML file of this very article. I’ve only provided a Windows batch script, as most Linux users know how to write their own scripts. Leave a comment if you need a Linux/Unix version.
Source code directory structure
The source code bundle contains the following directories and files:
Code:
build.xml - Ant build script
build.properties - Properties for Ant build script
runAnt.cmd - Windows batch file for:
             1. Checking Java and Ant versions.
             2. Running Ant to create an HTML document from XML and XSL.
             3. Printing help instructions for using the batch file.

browser - Directory containing browser example and includes the files listed below.
  browser\article.css
  browser\article.js
  browser\effects.js
  browser\prototype.js
  browser\scriptaculous.js
  browser\accordion.js
  browser\articleHtml.xsl
  browser\article.xml
  browser\extern.gif
  browser\process-simple.jpg
  browser\xmldialect9.jpg
  browser\xmldialect1.jpg
  browser\xmldialect10.jpg
  browser\xmldialect7.jpg
  browser\xmldialect8.jpg
  browser\xslcode.jpg

src - Directory containing all the source code and includes the files listed below.
  src\js - Directory containing JavaScript files including scriptaculous libraries.
    src\js\effects.js
    src\js\scriptaculous.js
    src\js\accordion.js
    src\js\prototype.js
    src\js\article.js

  src\css - Directory containing CSS file for the article.
    src\css\article.css

  src\xsl - Directory containing XSL file for the article.
    src\xsl\articleHtml.xsl

  src\images - Directory containing image files for the article and includes the files listed below.
    src\images\process-simple.jpg
    src\images\extern.gif
    src\images\xmldialect1.jpg
    src\images\xmldialect7.jpg
    src\images\xmldialect8.jpg
    src\images\xmldialect9.jpg
    src\images\xmldialect10.jpg
    src\images\xslcode.jpg

  src\xml - Directory containing XML file for the article.
    src\xml\article.xml

Section 2: So you have 5 minutes …
If you really can’t wait …
If you are impatient like me and just want to see something instantly:
  1. Download the source code bundle from Ref3. Section 1 of this article gives full details of the contents of the source code bundle.
  2. Enable JavaScript in your browser if you want the scriptaculous (JavaScript libraries) Accordion effect to work in the generated HTML file. For details see Ref4.
  3. Unzip the source code bundle and go to the sub-directory called ‘browser’.
  4. Double-click on the ‘article.xml’ file.
  5. You should be able to see the HTML for this article generated by the browser’s XSLT processor. However, the HTML exists only in the browser’s memory and no actual HTML file is created. To generate an actual HTML file, use the process described in Section 3 of this article.
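If you are wondering how double-clicking an XML file can end up as HTML: a browser applies a stylesheet when the XML document names one in an ‘xml-stylesheet’ processing instruction. The sketch below assumes the bundled ‘browser\article.xml’ links to ‘articleHtml.xsl’ in this way. The element content is placeholder text and has not been copied from the bundle.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Tells the browser's XSLT processor which stylesheet to apply. -->
<!-- Assumption: the stylesheet sits in the same directory as this file. -->
<?xml-stylesheet type="text/xsl" href="articleHtml.xsl"?>
<article>
  <title>XML Publishing - Part 1</title>
  <summary>Placeholder summary text.</summary>
</article>
```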
Section 3: You still here ?
Introduction
Have you ever needed to produce HTML, PDF, Text and RTF versions of the same document, in seconds? This article, part of a series, will show you how to do it. These multiple document formats can be produced using an XML document which is transformed using XSL code. In this first article, we’ll look at producing HTML from an XML document.
You don’t need to know everything about HTML, XML and XSL to understand this article. The definitions section of this article contains brief descriptions and references for these acronyms.
If you follow the steps below, you should be able to use the sample XML document with the sample XSL file from the source code download bundle to generate an HTML document of this article. If you are really feeling adventurous, you could produce your own XML document and use it to generate an HTML document.
How it all started:
I had a problem. How do I write a document once and quickly produce it in different document formats? I wanted to write the documents as ordinary text documents and somehow automatically create the different document formats later. I settled on creating the originals as XML documents, using XSL code to produce the formatted document. To produce a new document format, all I had to do was to write the corresponding XSL code file.
To create the different document formats, I needed an XSLT processor. I decided to use the XSLT processor embedded in Apache Ant 1.6.5, which is a standard, open source software build tool.
System Information
As the XSLT processor is provided by Apache Ant 1.6.5, which is Java-based, a compatible Java J2SE Runtime Environment (JRE) must be installed on the computer. The transformation process should work on any operating system that supports Ant 1.6.5, for example Windows 2000, Windows XP and Unix operating systems such as Linux.
Outline of the transformation process
The transformation process is shown in the diagram below.
XML Transformation Process

The XSLT processor takes as inputs optional parameters, an XML file and an XSL file.
The XSLT processor produces a corresponding output HTML document.

The basic idea is to use one XML document containing the information and have one XSL file for each document format I wanted to produce. The advantage is that I write the source material once and generate different document formats as required in seconds.
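As a rough sketch of how the three inputs map onto Ant, the ‘xslt’ task takes the XML file, the XSL file and the output file as attributes, with optional parameters as nested ‘param’ elements. The project name below is made up, the paths follow the bundle layout described in Section 1, and the supplied ‘build.xml’ may well be organised differently.

```xml
<project name="xmlpublish-sketch" default="generate" basedir=".">
  <!-- Hypothetical target: one XML file plus one XSL file yields one HTML file. -->
  <target name="generate">
    <xslt in="src/xml/article.xml"
          style="src/xsl/articleHtml.xsl"
          out="build/xmlpublish.htm">
      <!-- Optional parameter, read inside the stylesheet via xsl:param. -->
      <param name="libDebug" expression="0"/>
    </xslt>
  </target>
</project>
```

Producing another document format would then mean pointing ‘style’ at a different XSL file and ‘out’ at a different output file.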
Things you must do first
  1. You will need Administrator rights on your computer. At the very least, you need to be able to install software.
  2. Download and install a Java J2SE Runtime Environment (JRE) 1.4.x or greater from Ref1.
  3. Download and install Ant 1.6.5 from Ref2.
  4. Set the JAVA_HOME and ANT_HOME environment variables on your computer. Alternatively, just update these variables in the batch file called ‘runAnt.cmd’ supplied in the source code download bundle. See below for details of how to do this.
  5. Download the source code bundle from Ref3. See Section 1 of this article for the full details of the contents of the source code bundle.
  6. Unzip the source code download bundle into an empty directory. If you haven’t set the JAVA_HOME and ANT_HOME environment variables, update them in the Windows batch script ‘runAnt.cmd’ using your favorite text editor. These are currently set to the installation defaults for Ant and the Java Runtime Environment (JRE).

    In the ‘runAnt.cmd’ batch file
    Code:
    ...
    set JAVA_HOME=C:\progra~1\java\jre1.5.0_09
    ...
    set ANT_HOME=C:\progra~1\apache-ant-1.6.5

    Note:
    For directory paths that have spaces in them, use the short 8 character names in JAVA_HOME and ANT_HOME. ‘progra~1’ is the default 8 character name for the ‘Program Files’ directory.
  7. Check that your setup works by opening a command window, changing to the extracted source code bundle directory and running the command ‘runAnt -check’ at the prompt. You should get the output shown below. In the example below, the source code download bundle was unzipped to the ‘C:\tools\xmlpublish’ directory.

    Change directory to the unzipped source code bundle
    Code:
    C:\> cd \tools\xmlpublish

    Run the batch script to check versions of Java and Ant
    Code:
    C:\tools\xmlpublish>runAnt.cmd -check

    Output
    Code:
    C:\tools\xmlpublish>
    Setting JAVA_HOME and ANT_HOME.
    ==========================================================
    Checking Java Version
    ==========================================================
    java version "1.4.2_12"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_12-b03)
    Java HotSpot(TM) Client VM (build 1.4.2_12-b03, mixed mode)
    ==========================================================
    Checking Ant Version
    ==========================================================
    Unable to locate tools.jar. Expected to find it in C:\progra~1\java\j2re1.4.2_12\lib\tools.jar
    Apache Ant version 1.6.5 compiled on June 2 2005

    The check confirms the version of both the Java Runtime Environment (JRE) and Ant. In the example, the JRE version is 1.4.2 update 12 and the Ant version is 1.6.5.
    Please ignore the complaint about ‘Unable to locate tools.jar. Expected to find it in C:\progra~1\java\j2re1.4.2_12\lib\tools.jar’ as it does not affect this process. If you are interested, take a look at the batch file code for other JREs that were tested.
  8. Use the command ‘runAnt -help’ to display all the possible options or just take a look at the batch file code.

    Run the batch script to get usage information
    Code:
    C:\tools\xmlpublish>runAnt.cmd

    or
    Code:
    C:\tools\xmlpublish>runAnt.cmd -help

    Produces the following output:
    Code:
    C:\tools\xmlpublish>
    ==========================================================
    Using runAnt.
    ==========================================================
    Type the following:
    "runAnt -build" : To run the supplied build script.
    "runAnt -check" : To check the versions of the Java JDK and Ant installed.
    "runAnt -help" : To get this message.

  9. Enable JavaScript in your browser if you want the scriptaculous (JavaScript libraries) Accordion effect to work in the generated HTML file. See Ref4.

Using the supplied sample XML and XSL

To produce an HTML document using the Ant build tool, run the ‘runAnt.cmd’ batch script from the directory where you unzipped the bundle, using the command ‘runAnt -build’.

Code:
C:\tools\xmlpublish>runAnt.cmd -build

Successful Output
Code:
C:\tools\xmlpublish>
==========================================================
Running Ant.
==========================================================
Unable to locate tools.jar. Expected to find it in C:\progra~1\java\j2re1.4.2_12\lib\tools.jar
Buildfile: build.xml

clean:
     [echo] build.dir = ./build
   [delete] Deleting directory C:\tools\xmlpublish\build
     [echo]  publish.target.dir = ./complete
   [delete] Deleting directory C:\tools\xmlpublish\complete

init:
     [echo] start time = November 23 2006
     [echo]  build.dir = ./build
    [mkdir] Created dir: C:\tools\xmlpublish\build
     [echo]  publish.target.dir = ./complete
    [mkdir] Created dir: C:\tools\xmlpublish\complete

generate:
     [echo] generate.debug:0
     [echo] start.touch.time:23/Nov/2006 10:35 PM
     [xslt] Processing C:\tools\xmlpublish\src\xml\article.xml to C:\tools\xmlpublish\build\xmlpublish.htm
     [xslt] Loading stylesheet C:\tools\xmlpublish\src\xsl\articleHtml.xsl

copyfiles:
     [copy] Copying 15 files to C:\tools\xmlpublish\complete

copyimages:
     [copy] Copying 4 files to C:\tools\xmlpublish\complete

BUILD SUCCESSFUL
Total time: 1 second

Wow, all of this done in 1 second. I challenge anyone to try and type just the output above in one second. It makes this process very easy to repeat and test often. I’ve probably run this process at least 50 times whilst writing this article.
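
The target names in the output above (clean, init, generate, copyfiles, copyimages) suggest a chain of dependent Ant targets. The sketch below is a guess at how such a chain could be wired together, not a copy of the supplied ‘build.xml’. The property names and values come from the echoed output, but what each copy target actually copies is an assumption, so the file counts will not match.

```xml
<project name="xmlpublish-sketch" default="copyimages" basedir=".">
  <!-- Property names and values as echoed in the build output. -->
  <property name="build.dir" value="./build"/>
  <property name="publish.target.dir" value="./complete"/>

  <target name="clean">
    <delete dir="${build.dir}"/>
    <delete dir="${publish.target.dir}"/>
  </target>

  <target name="init" depends="clean">
    <mkdir dir="${build.dir}"/>
    <mkdir dir="${publish.target.dir}"/>
  </target>

  <target name="generate" depends="init">
    <xslt in="src/xml/article.xml"
          style="src/xsl/articleHtml.xsl"
          out="${build.dir}/xmlpublish.htm"/>
  </target>

  <!-- Assumption: copies the generated page plus CSS and JavaScript. -->
  <target name="copyfiles" depends="generate">
    <copy todir="${publish.target.dir}">
      <fileset dir="${build.dir}"/>
      <fileset dir="src/css"/>
      <fileset dir="src/js"/>
    </copy>
  </target>

  <!-- Assumption: copies the article's images next to the page. -->
  <target name="copyimages" depends="copyfiles">
    <copy todir="${publish.target.dir}">
      <fileset dir="src/images"/>
    </copy>
  </target>
</project>
```

Because each target depends on the previous one, asking Ant for the last target runs the whole chain in order, which is what makes the process so easy to repeat.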

If the process is successful, go to the ‘complete’ sub-directory and double-click on the ‘xmlpublish.htm’ file. Your browser should open, displaying your brand-new generated copy of this article.

If the process fails, and it shouldn’t if you are using the sample, the output screen will also give details of the cause of the error.

Unsuccessful Output
Code:
C:\tools\xmlpublish>
==========================================================
Running Ant.
==========================================================
Unable to locate tools.jar. Expected to find it in C:\progra~1\java\j2re1.4.2_12\lib\tools.jar
Buildfile: build.xml

clean:
     [echo] build.dir = ./build
   [delete] Deleting directory C:\tools\xmlpublish\build
     [echo]  publish.target.dir = ./complete
   [delete] Deleting directory C:\tools\xmlpublish\complete

init:
     [echo] start time = November 23 2006
     [echo]  build.dir = ./build
    [mkdir] Created dir: C:\tools\xmlpublish\build
     [echo]  publish.target.dir = ./complete
    [mkdir] Created dir: C:\tools\xmlpublish\complete

generate:
     [echo] generate.debug:0
     [echo] start.touch.time:23/Nov/2006 10:35 PM
     [xslt] Processing C:\tools\xmlpublish\src\xml\article.xml to C:\tools\xmlpublish\build\xmlpublish.htm
     [xslt] Loading stylesheet C:\tools\xmlpublish\src\xsl\articleHtml.xsl
     [xslt] C:\tools\xmlpublish\src\xsl\articleHtml.xsl:304:47: Fatal Error! Could not find template named: buildImage
     [xslt] : Fatal Error! Fatal error during transformation Cause: Fatal error
during transformation
     [xslt] Failed to process C:\tools\xmlpublish\src\xml\article.xml

BUILD FAILED
C:\tools\xmlpublish\build.xml:92: Fatal error
during transformation

Total time: 0 seconds

In the output above, I had made an error in naming my XSL template that was being referred to on line 304, column 47 in articleHtml.xsl. This built-in level of error reporting and debug information is one of the reasons I decided to use Ant 1.6.5 as the XSLT processor.
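To make the error concrete, here is a minimal sketch of a named template and the call that refers to it. The template body is hypothetical and is not taken from ‘articleHtml.xsl’. The two ‘name’ values must match exactly; if the template were declared under any other name, the processor would stop with ‘Could not find template named: buildImage’.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="image">
    <!-- The name used here must match a declared named template. -->
    <xsl:call-template name="buildImage"/>
  </xsl:template>

  <!-- Declared with the same name, so the call above resolves. -->
  <!-- The markup below is a hypothetical body. -->
  <xsl:template name="buildImage">
    <img src="{url}" alt="{description}" title="{description}"/>
  </xsl:template>

</xsl:stylesheet>
```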
You should be able to see the generated HTML file for this article in a directory called ‘complete’. Open the ‘xmlpublish.htm’ file in your browser and you should see this very article. To get an explanation of the XML dialect created and used for this article go to Section 4, otherwise head straight to Section 5.
Section 4: Now for the explanation …
I decided to supply ‘real world’ examples of the Ant build script, XML and XSL files. Whilst they do not embody every possible best practice, they are fully working examples that can also be used as a reference, as opposed to really simple examples that only tell you what is already obvious.
How many times have you asked for an example that was a complete solution as opposed to code snippets?
If you are not familiar with XML or XSL, see the references and definitions sections of this article. The next 2 sub sections explain the structure of the XML for the article and the XSL code.
The sample XML file contains a completely made up XML dialect (XML tag set) to describe the article in a way that is intuitive for me. Hopefully, it will be as intuitive for you. If it’s not, change it and make sure you reflect the changes in the XSL file.
XML dialect basic principles
    I have applied some personal preferences and principles for creating the XML dialect. Explaining them may help to understand the XML dialect and the XSL code for this article.
  1. XML structure=Document structure

    The structure, order and position of elements in the XML document should be the same in the transformed document. This makes it easy to edit the XML document in a text editor, gives you a good idea of the resulting document structure and helps with debugging XSL errors. That doesn’t stop you from changing the structure in the transforming XSL; keeping it the same just means less XSL code.

  2. Use short, descriptive, meaningful tag names

    A ‘list’ tag groups different kinds of other tags.

  3. Use plural tag names to group singular tag names

    For tags grouping other tags of the same type, use plural names. So the ‘references’ tag will always contain the ‘reference’ tags. Same applies for the ‘definitions’ tag etc.

  4. Create the tags to simplify the XSL code

    To simplify the XSL code, I have ‘text’ tags surrounding text so that I don’t have to include a lot of formatting logic.

  5. Declare and use variables

    Don’t hard code values in your XSL code. Yes, it is quicker, but it is a lazy and sloppy practice. Even if you have a photographic memory, you cannot remember every single line you wrote and, more importantly, why. Plus I’m sure you have better things to use to clog your memory banks. Using variables means fewer changes to the code, and therefore fewer syntax and logic errors. It also makes the code easy to read and maintain, so that you don’t need a photographic memory.

  6. Be consistent

    Use attributes to supply specific information about a particular tag and make them optional. If I’m checking any attribute of a tag in my XSL code, I have to assume that it may not be present and code accordingly. I can also use the same XSL code, if the attribute is present on more than one tag e.g. the title attribute.

  7. Comment, comment, comment

    You can never have too many comments. For XSL I try to keep my comments on the same line as the code, if it fits on one line, to reduce the blank lines in the transformed HTML document.

  8. Include debugging code

    You can never do too much testing. See my tip on using ‘xsl:comment’ for debugging, Ref5.

  9. Don’t get too hung up on frameworks and standards

    Frameworks and standards are there to help not hinder you, but they are not a ’silver bullet’. For instance, I could have used DocBook XML and XSL for this exercise. If you can code something yourself that fully meets the requirement, follows best software practice and is a more productive, predictable use of your time that does the job, just do it.

There are probably a few more points but you get the basic method to my madness. Self explanatory XML, minimal and flexible XSL code, both containing comments, debugging code and the coder clearly identified.
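Principles 5, 6 and 8 above are easier to picture in code. The following is a minimal sketch, not an extract from ‘articleHtml.xsl’: the variable name ‘panelClass’, its value and the generated markup are all made up for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Principle 5: declare a value once instead of hard coding it. -->
  <!-- 'panelClass' and its value are hypothetical. -->
  <xsl:variable name="panelClass" select="'panel'"/>

  <xsl:template match="section">
    <!-- Principle 8: xsl:comment writes a debugging note into the HTML. -->
    <xsl:comment> section start </xsl:comment>
    <div class="{$panelClass}">
      <!-- Principle 6: the 'title' attribute is optional, so test for it. -->
      <xsl:if test="@title">
        <h2><xsl:value-of select="@title"/></h2>
      </xsl:if>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```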
XML dialect (or XML tag set) for the article
The overall structure of the XML dialect could also be represented as a DTD or XSD. That could be something to try as I’m not supplying any. The XML dialect is made up of the following set of tags. All the relationships are defined with reference to the ‘article’ root tag. The structure has been updated and changed as I’ve added more information to the article or made changes to the XSL code and CSS stylesheet. Every tag has an optional ‘title’ element or attribute that is used to give it a headline.

Article XML dialect structure for the article. This could also be represented as a DTD or XSD.

Code:
article (root tag)
  copyright (child tag) - Contains text only.
  terms (child tag) - Contains text only.
  title (child tag) - Contains text only.
  subtitle (child tag) - Contains text only.
  summary (child tag) - Contains text only.
  sections (child tag) - Contains a set of one or more 'section' tags.
  references (child tag) - Contains a set of one or more 'reference' tags.
  definitions (child tag) - Contains a set of one or more 'definition' tags.
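
As an illustration of the structure above, a skeleton document might look like the following. The text content is placeholder material, and the empty ‘section’, ‘reference’ and ‘definition’ tags only mark where the real content would go.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<article>
  <copyright>© 2006 Dawn Ahukanna</copyright>
  <terms>The copyright of this article belongs to the author.</terms>
  <title>XML Publishing - Part 1</title>
  <subtitle>Producing HTML</subtitle>
  <summary>Placeholder summary text.</summary>
  <sections>
    <!-- one or more 'section' tags -->
    <section/>
  </sections>
  <references>
    <!-- one or more 'reference' tags -->
    <reference/>
  </references>
  <definitions>
    <!-- one or more 'definition' tags -->
    <definition/>
  </definitions>
</article>
```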

The section tag can contain the following tags.

Section XML dialect structure for the article. This could also be represented as part of a DTD or XSD.

Code:
sections (child tag)
  section (grandchild tag) - Contains other various tags.
  The optional 'type' attribute is used to apply different presentation styles.

section (grandchild tag)
  subsection (great grandchild tag) - Is exactly the same as a section.

  text (great grandchild tag) - Contains text only.
    The optional 'stack' attribute is used to apply different float
    presentation styles.

  image (great grandchild tag) - Contains a 'url' tag and a 'description' tag.
    The optional 'stack' attribute is used to apply different float
    presentation styles.

  url (great great grandchild tag) -
    Contains the full URL that identifies the source of the image

  description (great great grandchild tag) -
    Contains text that describes the image and
    is used for the alt and title HTML attributes.

  code (great grandchild tag) - Contains text only, for displaying code.
    The optional 'stack' attribute is used to apply different float
    presentation styles.

  output (great grandchild tag) -
    Contains text only, for displaying the resulting output of any code.

  reflink (great grandchild tag) -
    Contains an 'id' attribute that connects to a 'reference' tag.

  list (great grandchild tag) -
    Contains a set of any of section 'grandchild' tags as well as the 'item' tag.
    The 'ordered' attribute indicates whether the list is numbered or bullets.
    ordered='true' means 'item' tags are numbered, otherwise they are bullets.

  item (great great grandchild tag) -
    Contains text and is formatted as a bullet or number.
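
A single section using most of these tags could be sketched as below. The attribute values (‘panel’, ‘left’), the example.com URL and the text content are assumptions made for illustration; check ‘article.xml’ in the bundle for the values the stylesheet actually understands.

```xml
<section title="Hypothetical section title" type="panel">
  <text>Opening paragraph of the section.</text>
  <image stack="left">
    <url>http://www.example.com/images/process-simple.jpg</url>
    <description>XML Transformation Process</description>
  </image>
  <code>runAnt.cmd -build</code>
  <output>BUILD SUCCESSFUL</output>
  <reflink id="Ref3"/>
  <list ordered="true">
    <item>First numbered step.</item>
    <item>Second numbered step.</item>
  </list>
  <subsection title="Hypothetical subsection title">
    <text>A subsection is structured exactly like a section.</text>
  </subsection>
</section>
```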

The references tag contains the following tags.

Reference XML dialect structure for the article. This could also be represented as part of a DTD or XSD.

Code:
references (child tag)

  reference (grandchild tag) - Contains a set of 'title', 'url' and 'description' tags.

reference (grandchild tag)
  title (great grandchild tag) - Contains title of the external reference.

  url (great grandchild tag) -
    Contains the full URL that identifies the source of the external reference.

  description (great grandchild tag) -
    Contains text that describes the external reference and
    is used for the alt and title HTML attributes.
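
To show how a ‘reflink’ in a section might connect to a ‘reference’, here is a small sketch. It assumes the ‘reference’ tag carries an ‘id’ attribute matching the one on ‘reflink’, which the structure above implies but does not state. The URL is a placeholder.

```xml
<article>
  <sections>
    <section title="Source code for the article">
      <text>Download the source code bundle from</text>
      <!-- Points at the reference whose id is 'Ref3'. -->
      <reflink id="Ref3"/>
    </section>
  </sections>
  <references>
    <!-- Assumption: the matching 'id' lives on the 'reference' tag. -->
    <reference id="Ref3">
      <title>Source code bundle</title>
      <url>http://www.example.com/xmlpublish.zip</url>
      <description>Download bundle for this article.</description>
    </reference>
  </references>
</article>
```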

The definitions tag contains the following tags.

Definitions XML dialect structure for the article. This could also be represented as part of a DTD or XSD.

Code:
definitions (child tag)
  text (grandchild tag) - Contains text only.

  definition (grandchild tag) - Contains a set of 'text' and
  one or more 'url' tags.

definition (grandchild tag)
  text (great grandchild tag) - Contains text only.
  The optional 'stack' attribute is used to apply different float
  presentation styles.

  url (great grandchild tag) -
    Contains the full URL that identifies the source of the external reference.
    The 'type' attribute defines whether it is a definition or tutorial URL.
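
A definitions block following this structure might look like the sketch below. The ‘type’ values ‘definition’ and ‘tutorial’ are guesses based on the description above, and the ‘title’ attribute relies on the rule that every tag has an optional title.

```xml
<definitions>
  <text>Brief descriptions of the technologies mentioned in this article.</text>
  <definition title="What is XML?">
    <text>XML is a markup language designed to describe data.</text>
    <!-- Assumed 'type' values: one per kind of link. -->
    <url type="definition">http://en.wikipedia.org/wiki/XML</url>
    <url type="tutorial">http://www.w3schools.com/xml/default.asp</url>
  </definition>
</definitions>
```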

XSL code for the article XML dialect
If you examine the commented XSL code, it basically carries out the rules I’ve defined above on the article XML dialect to produce an HTML version of this article. It should be easy to follow, and I’ve included code that shows alternative ways of doing the same thing in XSL. I also have XSL templates, like ‘buildPanel’, specifically for generating the scriptaculous accordion panel effect.

All the templates defined in the XSL code.

Code:
xsl:param 'libDebug' - debugging input parameter to the XSLT processor
                       supplied externally from Ant.

xsl:param 'timeStamp' - time stamp input parameter to the XSLT processor
                        supplied externally from Ant.

xsl:template match='n' - Where 'n' is an XML tag/node.
                         For each XML tag that is 'matched' from the XML
                         document, the XSL code in the matched template is run.

xsl:template name='m' - Where 'm' is a custom XSL template.
                        The XSL code is only run when the template is
                        explicitly called using 'xsl:call-template' element.
                        These templates are prefixed with the word 'build'.
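
Put together, the pieces listed above could be sketched as the following stylesheet. Only the names ‘libDebug’, ‘timeStamp’ and ‘buildPanel’ come from this article. The ‘heading’ parameter, the default values and the generated markup are hypothetical, and the real ‘articleHtml.xsl’ is considerably longer.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Input parameters; the defaults apply when Ant supplies no value. -->
  <xsl:param name="libDebug" select="'0'"/>
  <xsl:param name="timeStamp" select="''"/>

  <!-- Matched template: runs for every 'section' tag that is processed. -->
  <xsl:template match="section">
    <xsl:call-template name="buildPanel">
      <xsl:with-param name="heading" select="@title"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Named template: runs only when called with xsl:call-template. -->
  <!-- The 'heading' parameter and the markup are hypothetical. -->
  <xsl:template name="buildPanel">
    <xsl:param name="heading"/>
    <div class="panel">
      <h2><xsl:value-of select="$heading"/></h2>
      <xsl:apply-templates/>
    </div>
    <xsl:if test="$libDebug = '1'">
      <xsl:comment> generated <xsl:value-of select="$timeStamp"/> </xsl:comment>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

A named template keeps the context node of its caller, so the ‘xsl:apply-templates’ inside ‘buildPanel’ still processes the children of the matched ‘section’.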

Section 5: What next?
Got something to say?
Hope you had some fun working through or just reading this. If you have found this useful, easy, difficult, too long, too short etc. or you just want to express an opinion, please leave me a comment.
Ready for the next bit? …
Coming Soon: In Part 2, create your very own XML document and the corresponding XSL to transform the XML document to HTML.
Definitions
This section provides very brief descriptions and references for all the relevant technologies mentioned in this article.
What is XML?
XML is a markup language much like HTML that was designed to describe data, but the tags are not predefined.

Definition:
http://en.wikipedia.org/wiki/XML

FAQ:
xml.silmaril.ie/basics/whatisxml/

Tutorial:
http://www.w3schools.com/xml/default.asp

What is XSL?

XSL describes how an XML document should be displayed. XSL consists of three parts:

XSLT - a language for transforming XML documents

XPath - a language for navigating inside XML documents

XSL-FO - a language for visually formatting XML documents


Run the sample in the bundle download, if you can’t wait.

Definition:
http://en.wikipedia.org/wiki/Extensible_Stylesheet_Language

Tutorial:
http://www.w3schools.com/xsl/xsl_languages.asp

What is XQuery?
XQuery is a language for querying any system that can produce XML data such as XML files, databases that produce XML data etc.

Definition:
http://en.wikipedia.org/wiki/XQuery

Tutorial:
http://www.w3schools.com/xquery/default.asp

What is DTD?
A DTD (Document Type Definition) defines the legal building blocks and structure of an XML document.

Definition:
http://en.wikipedia.org/wiki/Document_Type_Definition

Tutorial:
http://www.w3schools.com/dtd/default.asp

What is XSD?
An XSD (XML Schema Definition) describes the structure of an XML document and is a more recent alternative to a DTD.

Definition:
http://en.wikipedia.org/wiki/XSD

Tutorial:
http://www.w3schools.com/schema/default.asp

What is an XSLT processor?
An XSLT processor is a program that produces the converted document by applying the rules defined in the XSL document to the source XML document. Most modern browsers and build tools contain embedded XSLT processors.
What is a build tool?
A build tool is a software tool that converts source code files into executable code. An example of this is Apache Ant.

Definition:
http://en.wikipedia.org/wiki/Build_tool

What is an XML dialect?
An XML dialect is a set of predefined XML tags arranged in a specific way, as described in a DTD or XSD. A set of tags can be created to capture any sort of structured information. A number of XML tag sets or dialects are defined and commonly used, such as XHTML, RSS, DocBook, MathML, XSL, WSDL and ebXML.
