Beginning XSLT

Read pages 194 through 207 of Chapter 6 in Learning XML.

Corrections

On page 202, do not include the processing instruction in any of the work you'll be doing for this class; we'll be using the Xalan XSLT processor as an independent application. In any case, there's a misprint; the processing instruction should read:

<?xml-stylesheet type="text/xsl" href="mytrans.xsl"?>

On page 204, replace this:

<comment>
  Find out which episode
</comment>

with the following (which makes part of Table 6-3 correct):

<!-- Find out which episode -->

Document Madness

When you learned about DTDs, you had an XML file and a DTD file that described how to validate it. Since the DTD didn't look very much like XML, it was easy to tell them apart.

When you used RELAX NG, you had two XML-format files. One of them, which we referred to as the “target file,” was the file to be validated. It had tags that described a catalog or a movie list or some such. The second file was a list of instructions that told the validator what to expect when it parsed the first file. Since they both looked like XML, this caused some confusion. (For example, where did <empty/> really belong, and what did it really mean?)

When you learned about Cascading Style Sheets, the confusion went away. A CSS stylesheet doesn't look anything like XML; it's just a list of specifications of how to visually present the elements in the XML file.

Now that you're learning about XSLT, you have three documents to deal with, and at least two of them are in XML format.

  1. A source document: the document that you want to transform. It could be a catalog, a movie list, or an XHTML document.

  2. A transformation document that specifies which portions of the source document you wish to extract and manipulate. The transformation document also tells what output to produce for each chosen portion of the source document. This document is also called the XSLT stylesheet.

  3. An output document that is the result of passing the source document through the transformation document.

Because the XSLT stylesheet contains two different markups (the transformation commands and the output elements), it must use namespaces to differentiate them. By convention, the xsl: prefix is used for any of the XSL elements. The XSLT stylesheet also refers to the input document in order to tell which items to extract and process, but those references are always inside attributes, so we don't need another namespace to differentiate items from the input document.
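To make the namespace point concrete, here is a minimal Python sketch (standard library only) that sorts the elements of a small stylesheet into XSLT instructions and output elements. The one-template stylesheet and the classify helper are illustrations made up for this sketch; note that the test is on the namespace URI, not on the xsl: prefix itself, which is only a convention.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A tiny one-template stylesheet, used only for illustration.
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="source">
    <p align="right"><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>
"""

def classify(stylesheet_text):
    """Split element names into XSLT instructions and literal output elements."""
    instructions, output = [], []
    for elem in ET.fromstring(stylesheet_text).iter():
        # ElementTree spells a namespaced tag as "{namespace-uri}local-name".
        if elem.tag.startswith("{" + XSL_NS + "}"):
            instructions.append(elem.tag.split("}", 1)[1])
        else:
            output.append(elem.tag)
    return instructions, output

instructions, output = classify(STYLESHEET)
print("XSLT elements:  ", instructions)
print("output elements:", output)
```

The word "source" in the match attribute never shows up in either list, because it is attribute text rather than an element. That is why no third namespace is needed for the input document.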

Here's the sample stylesheet on pages 203 and 204. It mixes three kinds of content: elements in the XSLT namespace (those with the xsl: prefix), selections from the input document (the values of the match attributes), and output (everything else, such as <html> and <blockquote>).

<xsl:stylesheet id="quotes"
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="quotelist">
    <html>
        <body>
            <h1>Quotes</h1>
            <xsl:apply-templates/>
        </body>
    </html>
</xsl:template>

<xsl:template match="quote | aphorism">
    <blockquote>
        <xsl:apply-templates/>
    </blockquote>
</xsl:template>

<xsl:template match="body">
    <p><xsl:apply-templates/></p>
</xsl:template>

<xsl:template match="source">
    <p align="right"><xsl:apply-templates/></p>
</xsl:template>

</xsl:stylesheet>

The XSLT Processing Cycle

Let's build the example stylesheet shown on page 198 to transform the XML on pages 197 and 198 into HTML. Here's the stylesheet. The line numbers at the left are for reference in the following explanation.

     1  <xsl:stylesheet version="1.0"
     2      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     3  
     4  <xsl:template match="manual">
     5      <html>
     6      <head>
     7          <title><xsl:value-of select="attribute::id"/></title>
     8      </head>
     9      <body>
    10          <h2><xsl:value-of select="attribute::id"/></h2>
    11          <xsl:apply-templates/>
    12      </body>
    13      </html>
    14  </xsl:template>
    15  
    16  <xsl:template match="parts-list">
    17      <h3>Parts List</h3>
    18      <ul>
    19          <xsl:apply-templates/>
    20      </ul>
    21      <hr />
    22  </xsl:template>
    23  
    24  <xsl:template match="part">
    25      <li>Part <xsl:value-of select="attribute::label"/>
    26          (<xsl:value-of select="attribute::count"/>) -
    27          <xsl:apply-templates/>
    28      </li>
    29  </xsl:template>
    30  
    31  <xsl:template match="instructions">
    32      <h3>Instructions</h3>
    33      <ol>
    34          <xsl:apply-templates/>
    35      </ol>
    36  </xsl:template>
    37  
    38  <xsl:template match="step">
    39      <li><p>
    40         <xsl:apply-templates/>
    41       </p></li>
    42  </xsl:template>
    43  
    44  </xsl:stylesheet>

XSLT starts at the root node, which contains the entire document. Its default processing tells it to go to each of the root node's children and find a matching template. In this case, the root node's first child is the root element, the <manual> element. XSLT then searches the stylesheet to see if any template is available to handle that element. Yes, there is; it's on lines 4-14. If you look at lines 7 and 10, you'll see a construct that hasn't been discussed in the book yet. The <xsl:value-of> element lets us extract the text content of an element or the value of an attribute by specifying what to select. In this case, attribute::id (which can be abbreviated as @id) means to extract the value of the id attribute. Notice that we've done two things that can't be done with style sheets. We've put the value of an attribute into the document, and we've put it in at two different places (once in the <title> and once in the <h2>).
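As a rough analogy for what <xsl:value-of select="attribute::id"/> does, the following Python sketch reads an attribute once and places it in two spots. The one-line source element and the value_of_attribute helper are assumptions made for illustration; only the id value model-rocket is taken from the output shown in these notes.

```python
import xml.etree.ElementTree as ET

# Assumed root element; only the id value comes from the output in the notes.
source = ET.fromstring('<manual id="model-rocket"/>')

def value_of_attribute(node, name):
    """Rough stand-in for <xsl:value-of select="attribute::name"/> (same as @name).

    Like value-of, it yields an empty string when the attribute is missing.
    """
    return node.get(name, "")

doc_id = value_of_attribute(source, "id")
head = "<head><title>" + doc_id + "</title></head>"
heading = "<h2>" + doc_id + "</h2>"
print(head)
print(heading)
```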

As of line 10 in the stylesheet, our output document looks like this:

<html>
<head>
<title>model-rocket</title>
</head>
<body>
    <h2>model-rocket</h2>

The <xsl:apply-templates/> at line 11 tells XSLT to proceed to the children of the <manual> element.

Note: The XSLT processor “remembers” that it was handling <manual> on line 11 of the stylesheet. It won't return to line 12 until it finishes processing all of <manual>'s children, as it has been told to do in line 11.

XSLT moves to <manual>'s first child element, the <parts-list> element, and searches the stylesheet for any template that is set up to match that particular element. There it is on lines 16 through 22. XSLT produces the output specified from lines 17 and 18, and the output document now looks like this:

<html>
<head>
<title>model-rocket</title>
</head>
<body>
    <h2>model-rocket</h2>
       <h3>Parts List</h3>
       <ul>

On line 19, the XSLT processor once again encounters an <xsl:apply-templates/>, so it adds line 19 to a list of “where I was before I had to stop and take care of the kids.” XSLT starts processing the children of the <parts-list> element. The first child is a <part>, and there's a template for those elements on lines 24 through 29.

The XSLT processor produces the output on lines 25 through 27 until it hits the <xsl:apply-templates/> on line 27, and, once again, puts line 27 on hold until it finishes processing all the children of the <part> element. The only child is a text node, which is handled by simply putting the text into the output document. Here's what the “hold list” looks like so far. We've drawn it like a set of those little Russian nesting dolls, because that, effectively, is how these sorts of things are implemented.

Processing <manual> via lines 4 through 14, and hold at line 11, until you finish all its children:
    Processing <parts-list> via lines 16 through 22, and hold at line 19, until you finish all its children:
        Processing first <part> via lines 24 through 29, and hold at line 27, until you finish all its children:
            Processing the text within the <part> by placing it into the output document.
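The hold list behaves like a stack: the most recent hold is the first one to be crossed off. Here is a minimal Python sketch of that idea, assuming a one-part fragment of the source document (rebuilt from the output shown in these notes, not copied from the book). The HOLD_LINE table and the walk function are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumed fragment of the source; rebuilt from the output shown in the notes.
SOURCE = """\
<manual id="model-rocket">
  <parts-list>
    <part label="A" count="1">fuselage, left half</part>
  </parts-list>
</manual>
"""

# Line of each template's <xsl:apply-templates/> in the numbered stylesheet.
HOLD_LINE = {"manual": 11, "parts-list": 19, "part": 27}

def walk(node, holds, trace):
    """Visit node, recording the hold list each time text reaches the output."""
    holds.append((node.tag, HOLD_LINE[node.tag]))  # pause at apply-templates
    if node.text and node.text.strip():
        trace.append((node.text.strip(), list(holds)))
    for child in node:
        walk(child, holds, trace)
    holds.pop()  # all children finished: cross this hold off the list

trace = []
walk(ET.fromstring(SOURCE), [], trace)
for text, holds in trace:
    for depth, (tag, line) in enumerate(holds):
        print("  " * depth + f"<{tag}> holding at line {line}")
    print("  " * len(holds) + f"text: {text}")
```

When the text of the first <part> is written, three holds are open (lines 11, 19, and 27), which matches the list above.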

Here's our output document once we've put in the text:

<html>
<head>
<title>model-rocket</title>
</head>
<body>
    <h2>model-rocket</h2>
       <h3>Parts List</h3>
       <ul>
       <li>Part A (1) - fuselage, left half

We've now finished the processing on line 27, so we can strike it off our list, and continue on to line 28. Here's what our list now looks like:

Processing <manual> via lines 4 through 14, and hold at line 11, until you finish all its children:
    Processing <parts-list> via lines 16 through 22, and hold at line 19, until you finish all its children:
        Processing first <part> via lines 24 through 29, and hold at line 27, until you finish all its children:
            [crossed off] Processing the text within the <part> by placing it into the output document.

And our output document:

<html>
<head>
<title>model-rocket</title>
</head>
<body>
    <h2>model-rocket</h2>
       <h3>Parts List</h3>
       <ul>
       <li>Part A (1) - fuselage, left half</li>

That finishes line 28, and it goes off of our hold list:

Processing <manual> via lines 4 through 14, and hold at line 11, until you finish all its children:
    Processing <parts-list> via lines 16 through 22, and hold at line 19, until you finish all its children:
        [crossed off] Processing first <part> via lines 24 through 29, and hold at line 27, until you finish all its children:
            [crossed off] Processing the text within the <part> by placing it into the output document.

We're still holding at line 19, since we haven't finished with all the children of <parts-list>. For each of the other <part> elements, we'll go to lines 24 through 29, which will hold at line 27 for the embedded text. Once we finish all the children of the <parts-list> element, we can cross line 19 off our hold list:

Processing <manual> via lines 4 through 14, and hold at line 11, until you finish all its children:
    [crossed off] Processing <parts-list> via lines 16 through 22, and hold at line 19, until you finish all its children:

and complete the output on lines 20 and 21:

<html>
<head>
<title>model-rocket</title>
</head>
<body>
    <h2>model-rocket</h2>
       <h3>Parts List</h3>
       <ul>
       <li>Part A (1) - fuselage, left half</li>
       <li>Part B (1) - fuselage, right half</li>
       <li>Part F (4) - steering fin</li>
       <li>Part N (3) - rocket nozzle</li>
       <li>Part C (1) - crew capsule</li>
       </ul>
       <hr/>

To make a long story short, the XSLT processor now goes to the next child of the <manual> element, which is the <instructions> element. This will cause XSLT to process lines 31 through 36. Line 34 gets added to the hold list, until all the children of <instructions> are processed. Each of the children is a <step> element, which is processed by lines 38 through 42, with a hold at line 40 until each step's text goes into the output document.

Once all the <step>s have been processed, we cross line 34 off the hold list, and finish lines 35 and 36, which puts the closing </ol> tag into the output.

That pops us back to the hold at line 11, which sees that there are no more children of the <manual> element, so that hold is stricken from the list, and we complete our output in lines 12 and 13.
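The whole cycle can be imitated in a few lines of Python, with one function per template and ordinary function calls playing the role of the hold list. This is only a sketch of the idea, not how Xalan or any other real XSLT processor is written. The source document is an assumption: the parts are rebuilt from the output shown in these notes, and the step text is a placeholder. Unlike a real processor, the sketch also drops whitespace-only text and trims the text it copies.

```python
import xml.etree.ElementTree as ET

# Assumed source document: parts rebuilt from the output shown in the notes;
# the step text is a hypothetical placeholder.
SOURCE = """\
<manual id="model-rocket">
  <parts-list>
    <part label="A" count="1">fuselage, left half</part>
    <part label="B" count="1">fuselage, right half</part>
  </parts-list>
  <instructions>
    <step>Hypothetical step text.</step>
  </instructions>
</manual>
"""

def apply_templates(node):
    """Process node's children in document order, like <xsl:apply-templates/>."""
    out = []
    if node.text and node.text.strip():
        out.append(node.text.strip())            # text node: copy to the output
    for child in node:
        out.append(TEMPLATES[child.tag](child))  # find and run the matching template
        if child.tail and child.tail.strip():
            out.append(child.tail.strip())       # text that follows the child
    return "".join(out)

def manual(n):
    ident = n.get("id")
    return ("<html><head><title>" + ident + "</title></head><body><h2>" + ident
            + "</h2>" + apply_templates(n) + "</body></html>")

def parts_list(n):
    return "<h3>Parts List</h3><ul>" + apply_templates(n) + "</ul><hr/>"

def part(n):
    return ("<li>Part " + n.get("label") + " (" + n.get("count") + ") - "
            + apply_templates(n) + "</li>")

def instructions(n):
    return "<h3>Instructions</h3><ol>" + apply_templates(n) + "</ol>"

def step(n):
    return "<li><p>" + apply_templates(n) + "</p></li>"

# One entry per <xsl:template match="...">.
TEMPLATES = {"manual": manual, "parts-list": parts_list, "part": part,
             "instructions": instructions, "step": step}

root = ET.fromstring(SOURCE)
html = TEMPLATES[root.tag](root)  # the root node hands off to <manual>
print(html)
```

Each call to apply_templates that has not yet returned corresponds to one line on the hold list. When the last child is done, the call returns and the enclosing template writes its closing tags.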
