XPath and XSLT

Read pages 207 through 227 of Chapter 6 in Learning XML.

Controlling the Transformation with XPath

In lecture 7, our example had a template for each and every element in the source document. This is not a typical transformation. There's no law that says you must recursively apply templates for every single element, or that you must represent every source element in the output document. Usually you will want to extract only certain elements from the source, or you may wish to change the order in which elements appear.

The example used the simple <xsl:apply-templates/>, which operates in “all my children” mode–it indiscriminately visits all the child element and text nodes.

Instead, we will use this form of <xsl:apply-templates/> to select exactly which nodes we'd like to visit:

<xsl:apply-templates select="XPath-expression"/>

Our source document will be an expanded version of the sample wrestling club database from lecture 2. We've expanded it to include a subset of XHTML, and have added another association with two clubs in it. Let's write a stylesheet that converts the database to an XHTML file as follows:

And that's all. We don't want any of the other elements in the source document to appear in the output document. Here's the XSLT, with the lines numbered for reference.

   1	<?xml version="1.0"?>
   2	<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   3	    version="1.0">
   4	    
   5	<xsl:template match="club-database">
   6	    <html>
   7	    <head>
   8	        <title>Club Listing</title>
   9	    </head>
  10	    <body>
  11	        <h1>Club Listing</h1>
  12	        <xsl:apply-templates select="association"/>
  13	    </body>
  14	    </html>
  15	</xsl:template>
  16	
  17	<xsl:template match="association">
  18	    <h2><xsl:value-of select="@id"/></h2>
  19	    <ul>
  20	        <xsl:apply-templates select="club"/>
  21	    </ul>
  22	</xsl:template>
  23	
  24	<xsl:template match="club">
  25	    <li><xsl:value-of select="name"/>
  26	    <xsl:text> (</xsl:text>
  27	    <xsl:value-of select="location"/>
  28	    <xsl:text>)</xsl:text></li>
  29	</xsl:template>
  30	</xsl:stylesheet>

Here's the output from the stylesheet. Things to note about this stylesheet:

OK, so when do we gain anything by using XPath? Let's say we wanted to just get a list of people's names from the XML file. Here's an approach that will not work. (We've left out the enclosing <xsl:stylesheet> element.)

<xsl:template match="club-database">
    <html>
    <head>
        <title>List of People</title>
    </head>
    <body>
        <h3>List of People</h3>
        <ul>
            <xsl:apply-templates/>
        </ul>
    </body>
    </html>
</xsl:template>

<xsl:template match="person">
    <li><xsl:value-of select="."/></li>
</xsl:template>

Because the <xsl:apply-templates/> visits all the children, it will gather the text from all the intervening elements, resulting in the following. [The size has been reduced to take up less space on the screen.]

List of People

You may try to fix this by changing the <xsl:apply-templates/> line in the first template above to <xsl:apply-templates select="person"/>. This won't work; here's the HTML that it generates:

<h3>List of People</h3>
<ul></ul>

This failed because the XPath expression is wrong. When XSLT encounters that line, it is processing the <club-database> node. select="person" asks for all <person> nodes that are immediate children of the current node–and there are no such nodes. We can write one of the following two XPath expressions which will correctly identify the nodes we want. The first one searches for any <person> node at any level below the current node. The second one explicitly details the two different paths to <person> nodes.

<xsl:apply-templates select=".//person"/>
<xsl:apply-templates select="association/club/contact/person | 
    association/club/contact-list/contact/person"/>

In both cases, the nodes which get selected for template application are called the context node set.
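
To make the two paths concrete, here is a minimal sketch of the nesting they imply. Only the element names come from the paths above; the id values and the people's names are invented placeholders, and the real database contains more elements than shown.

```xml
<club-database>
    <association id="EXAMPLE">                 <!-- hypothetical id -->
        <club id="X01">                        <!-- hypothetical id -->
            <contact>
                <!-- reached by association/club/contact/person -->
                <person>Person A</person>
            </contact>
        </club>
        <club id="X02">                        <!-- hypothetical id -->
            <contact-list>
                <contact>
                    <!-- reached only by association/club/contact-list/contact/person -->
                    <person>Person B</person>
                </contact>
                <contact>
                    <person>Person C</person>
                </contact>
            </contact-list>
        </club>
    </association>
</club-database>
```

With <club-database> as the current node, select="person" matches nothing because no <person> is a direct child, while .//person reaches all three.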

Which is better? In this case, a timing test shows no significant difference between the two, but this is a short file. Using an explicit path gives you more flexibility; the following code would find only those people who are the sole contact for a club (i.e., not part of a contact list):

<xsl:apply-templates select="association/club/contact/person"/>

However, this will miss any <contact-list> elements that contain only one <contact>. The answer is to use a predicate to extract those elements.

<xsl:apply-templates select="association/club/contact/person | 
    association/club/contact-list[count(contact)=1]/contact/person"/>

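In the example above, the predicate is attached to an intermediate step of the path (<contact-list>) rather than to the node finally selected (<person>). A predicate can also test the selected element itself: in the path contact/phone[@type="id"], the predicate examines an attribute that belongs to the <phone> element in question.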

The important thing to note about a path is that each new level further restricts the nodes that an XSLT processor selects. Paths are sometimes easier to read from right to left. The long path with the predicate can be interpreted as: “select all <person> elements inside a <contact> whose parent is a <contact-list> with one <contact> child; all of that within a <club> that is nested in an <association>.”

Some XPath Expressions

Here is some XML that describes streets in Paris: their arrondissement (civil district), beginning and ending points relative to other streets, and nearest Metro stop. Some very long streets, such as Cherche-Midi, are divided into segments by building number.

Presume that the context node, the node currently being processed by XSLT, is the node starting at line 9 (the <name> element for Cherche-Midi).

 1  <street-list>
 2      <street>
 3          <name arrond="12">Capri</name>
 4          <begin>Wattignies, 59</begin>
 5          <end>Cl.-Decaen, 45</end>
 6          <metro>Michel-Bizot</metro>
 7      </street>
 8      <street>
 9          <name arrond="6">Cherche-Midi</name>
10          <begin>Sèvres, 1</begin>
11          <end>Vaugirard, 144</end>
12          <metro>Sèvres-Babyl.</metro>
13          <segment from="17" to="49" arrond="6">
14              <metro>Rennes</metro>
15          </segment>
16          <segment from="50" to="121" arrond="6">
17              <metro>Vaneau</metro>
18          </segment>
19          <segment from="122" arrond="15">
20              <metro>Flaguière</metro>
21          </segment>
22      </street>
23      <street>
24          <name arrond="2">Evariste Galois</name>
25          <begin>Noisy-le-Sec</begin>
26          <end>Léon-Frapié</end>
27          <metro>St-Farjeau</metro>
28      </street>
29  </street-list>

Here are some XPath expressions, with a list of the nodes that they will select (by line number).

following-sibling::metro
Line 12. The following siblings are the nodes on lines 10, 11, 12, 13, 16, and 19, but only line 12 is a <metro> element.
following::metro
Lines 12, 14, 17, 20, 27. This is because following includes all nodes in the document that come after the context node, no matter what their nesting level.
/street-list/street[count(segment)=0]
Lines 2, 23. This path starts at the root. The action happens in the predicate, which restricts the selection to streets which have no <segment> children.
../segment[@from > 25]
Lines 16, 19. The .. moves us back to the node at line 8. We could have just as well said following-sibling::segment[@from > 25].
descendant-or-self::*[@arrond mod 2 = 1]
No nodes selected. The context node's arrond attribute is 6, which is even, and the context node has no element descendants.
../descendant-or-self::*[@arrond mod 2 = 1]
Line 19. The .. moves you up to the node at line 8, and it does have descendants.
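
One way to check a result like these yourself (a sketch, not necessarily how the expressions above were tested) is a small stylesheet that first moves the context to the node at line 9 and then prints whatever the expression selects. It assumes the street list above is the input document; following::metro is used only as an example of an expression to test.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
    <!-- make the second street's <name> (line 9) the context node -->
    <xsl:for-each select="/street-list/street[2]/name">
        <!-- the expression under test, evaluated relative to that node -->
        <xsl:for-each select="following::metro">
            <xsl:value-of select="."/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Run against the street list, this should print the five station names from lines 12, 14, 17, 20, and 27, one per line. Swap in a different expression in the inner select to test the others.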

A Warning About Predicate Expressions

A predicate of the form [expr1 != expr2] will not always yield the same results as [not(expr1 = expr2)]. Let's say you have a section of a catalog that looks like this:

<catalog>
    <item>
        <name> Cup </name> <price units="USD">3.00</price>
    </item>
    <item>
        <name> Spoon </name> <price units="CDN">2.00</price>
    </item>
    <item>
        <name> Dish </name> <price>7.00</price>
    </item>
</catalog>

The following instruction will select only the spoon.

<xsl:apply-templates select="/catalog/item[price/@units != 'USD']"/>

Here's what's happening. For every <item> node, the XSLT processor asks the question “Does the price have a units attribute whose value is not equal to USD?”

Node  | Answer | Explanation                                                            | Result
Cup   | false  | USD isn't not-equal-to USD                                             | node is ignored
Spoon | true   | CDN is not-equal-to USD                                                | node is selected
Dish  | false  | The price has no units attribute, so the expression fails immediately. | node is ignored

The following instruction is not equivalent; it will select both the spoon and the dish.

<xsl:apply-templates select="/catalog/item[not(price/@units = 'USD')]"/>

Here's what's happening. For every <item> node, the XSLT processor takes the logical opposite of the answer to this question: “Does the price have a units attribute whose value is equal to USD?”

Node  | Answer | Explanation                                                            | not(Answer) | Result
Cup   | true   | USD is equal to USD                                                    | false       | node is ignored
Spoon | false  | CDN is not-equal-to USD                                                | true        | node is selected
Dish  | false  | The price has no units attribute, so the expression fails immediately. | true        | node is selected
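
The two tables can be checked with a short stylesheet. This is a sketch that assumes the catalog above is the input document; it also tries a third predicate, not(price/@units), which picks out items whose price has no units attribute at all.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<xsl:output method="text"/>

<xsl:template match="/catalog">
    <!-- != is true only if a units attribute exists and differs from USD -->
    <xsl:text>units != 'USD': </xsl:text>
    <xsl:value-of select="count(item[price/@units != 'USD'])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- not(=) is also true when there is no units attribute at all -->
    <xsl:text>not(units = 'USD'): </xsl:text>
    <xsl:value-of select="count(item[not(price/@units = 'USD')])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- true only when the units attribute is missing -->
    <xsl:text>no units: </xsl:text>
    <xsl:value-of select="count(item[not(price/@units)])"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

For the three-item catalog the counts should be 1, 2, and 1. If you want "has a units attribute, and it is not USD" spelled out explicitly, item[price/@units and not(price/@units = 'USD')] says so directly.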

Counting Nodes

The count(nodeSet) function tells you how many items there are in the given node set. Using our street example above, we could have a template like this:

<xsl:template match="street-list">
    There are <xsl:value-of select="count(street)"/> streets.
    There are <xsl:value-of
        select="count(street[substring(name,1,1)='C'])"/>
      streets beginning with the letter C.
</xsl:template>

This produces the text:

There are 3 streets.
There are 2 streets beginning with the letter C.

The substring function is described on page 222 of Learning XML. Perl and C programmers, please take note: in XSLT, the first character of a string is at offset number 1, not zero.
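
As a small illustration of the 1-based offsets, the following sketch makes a few substring() calls on literal strings. It also uses starts-with(), an XPath 1.0 function that expresses the "begins with C" test without any offsets. It assumes the street list above as input.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<xsl:output method="text"/>

<xsl:template match="/street-list">
    <!-- substring(string, start, length): start counts from 1, so this is "C" -->
    <xsl:value-of select="substring('Capri', 1, 1)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- omitting the length takes the rest of the string: "apri" -->
    <xsl:value-of select="substring('Capri', 2)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- counts the same streets as the substring() predicate -->
    <xsl:value-of select="count(street[starts-with(name, 'C')])"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

This should print C, apri, and 2 on separate lines.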

Counting and <xsl:for-each>

Here's the model for the <xsl:for-each> element:

<xsl:for-each select="XPath-expression">
    <!-- actions to perform -->
</xsl:for-each>

The XSLT processor will perform the specified actions for each of the nodes selected by your XPath-expression.

The position() function provides you with your numeric position in the context node set as you go through the loop; the last() function gives the number of the last node in the context node set (the list of nodes to be processed). Consider this template to produce a numbered list of the streets.

<xsl:template match="street-list">
    <p>
    <xsl:for-each select="street">
        Street <xsl:value-of select="position()"/> of
        <xsl:value-of select="last()"/>: 
        <xsl:apply-templates select="name"/><br />
    </xsl:for-each>
    </p>
</xsl:template>

We could also get the same results by using <xsl:apply-templates>.

<xsl:template match="street-list">
    <p>
        <xsl:apply-templates select="street"/>
    </p>
</xsl:template>

<xsl:template match="street">
    Street <xsl:value-of select="position()"/> of
    <xsl:value-of select="last()"/>: 
    <xsl:apply-templates select="name"/><br />
</xsl:template>
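
One caveat when switching from <xsl:for-each> to <xsl:apply-templates>: position() and last() count every node in the selected list, not just the elements you care about. The two versions above agree because both say select="street". With a bare <xsl:apply-templates/>, a processor that keeps the whitespace between elements would also count those text nodes. The sketch below shows one way to avoid that, using <xsl:strip-space>; it assumes the street list above as input.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<!-- drop whitespace-only text nodes that are children of <street-list> -->
<xsl:strip-space elements="street-list"/>

<xsl:template match="street-list">
    <p>
        <!-- no select: visits all children, which are now only <street> elements -->
        <xsl:apply-templates/>
    </p>
</xsl:template>

<xsl:template match="street">
    Street <xsl:value-of select="position()"/> of
    <xsl:value-of select="last()"/>:
    <xsl:value-of select="name"/><br />
</xsl:template>

</xsl:stylesheet>
```

Without the <xsl:strip-space> line, the three streets would typically be numbered 2, 4, and 6 of 7, because the whitespace text nodes around them take the odd positions.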

Simple decisions with <xsl:if>

In the previous example, you'll notice that we're generating a <br /> after every street. In reality, we don't need a line break after the last street; the closing paragraph tag will take care of the line space for us. So, let's use an <xsl:if> element that will add the line break only if the current street is not the last one. The additional code is the <xsl:if> line:

<xsl:template match="street">
    Street <xsl:value-of select="position()"/> of
    <xsl:value-of select="last()"/>: 
    <xsl:apply-templates select="name"/>
    <xsl:if test="position() != last()"><br /></xsl:if>
</xsl:template>
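
The same position() != last() test is handy for separators in general. As a sketch, this stylesheet (applied to the street list above) writes the Metro stops of each segmented street as a comma-separated list; the output layout is made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
    <!-- visit only the streets that are divided into segments -->
    <xsl:apply-templates select="street-list/street[segment]"/>
</xsl:template>

<xsl:template match="street">
    <xsl:value-of select="name"/>
    <xsl:text>: </xsl:text>
    <xsl:for-each select="segment/metro">
        <xsl:value-of select="."/>
        <!-- add a separator after every stop except the last -->
        <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

For the sample data, only Cherche-Midi has segments, so the output should be the single line "Cherche-Midi: Rennes, Vaneau, Flaguière".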

Let's use our wrestling club database to do some conditional processing. In our template for displaying an individual club, we want to put a notice that a club needs a charter renewal if its charter year is less than 2001. The style attribute lets us attach a style property and value to a single HTML element.

<xsl:template match="club">
    <h3><xsl:value-of select="name"/>
        (<xsl:value-of select="@id"/>)</h3>
    <xsl:if test="charter &lt; 2001">
        <div style="color:red;">[Club charter requires renewal.]</div>
    </xsl:if>
    <p>Location: <xsl:value-of select="location"/></p>
</xsl:template>

Using this template on the club database XML file produces:

Gilroy Hawks (H25)

[Club charter requires renewal.]

Location: Gilroy

California Gold (H23)

Location: San Jose

Benicia USA Wrestling Club (Q18)

Location: Benicia

North Bay Wrestling Club (Q22)

Location: Sausalito
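
A note on the test attribute: the comparison is written charter &lt; 2001 because a literal < may not appear inside an XML attribute value. A literal > is allowed, so the same condition can be phrased the other way around. The sketch below is a variant of the template above, not a replacement for it.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<xsl:template match="club">
    <h3><xsl:value-of select="name"/>
        (<xsl:value-of select="@id"/>)</h3>
    <!-- same condition as charter &lt; 2001, written with > instead -->
    <xsl:if test="2001 > charter">
        <div style="color:red;">[Club charter requires renewal.]</div>
    </xsl:if>
    <p>Location: <xsl:value-of select="location"/></p>
</xsl:template>

</xsl:stylesheet>
```

Which form you use is a matter of taste; the &lt; form keeps the tested element on the left, which many people find easier to read.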

Complex decisions with <xsl:choose>

If you are familiar with programming languages, you might expect an <xsl:else> for yes/no decisions. Sorry, but that's not in XSLT. Instead, you have to use the <xsl:choose> construct: the first <xsl:when> whose test condition is met is taken, and <xsl:otherwise> is taken if no condition is met.

Let's use our wrestling club database to do a two-way test. In our template for displaying an individual club, we want to show which clubs are up to date (charter year 2001 or 2002), and which need to be renewed (charter year 2000 or less). In this example, the style attribute lets us attach a style property and value to a single HTML element.

<xsl:template match="club">
    <h3><xsl:value-of select="name"/>
        (<xsl:value-of select="@id"/>)</h3>
    <xsl:choose>
        <xsl:when test="charter &lt; 2001">
            <div style="color:red;">[Club charter requires renewal.]</div>
        </xsl:when>
        <xsl:otherwise>
            <div style="color:green;">[Club charter is up to date.]</div>
        </xsl:otherwise>
    </xsl:choose>
    <p>Location: <xsl:value-of select="location"/></p>
</xsl:template>

Here's the beginning of the HTML that it produces:

Gilroy Hawks (H25)

[Club charter requires renewal.]
Location: Gilroy

California Gold (H23)

[Club charter is up to date.]
Location: San Jose

Here's a multi-way test for labelling the type of phone numbers:

<xsl:template match="phone">
    <p>
    <xsl:value-of select="."/>
    <xsl:choose>
        <xsl:when test="@type='home'"> (home)</xsl:when>
        <xsl:when test="@type='work'"> (work)</xsl:when>
        <xsl:when test="@type='fax'"> (fax)</xsl:when>
        <xsl:when test="@type='cell'"> (cell phone)</xsl:when>
    </xsl:choose>
    </p>
</xsl:template>
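
Note that this template has no <xsl:otherwise>, so a phone whose type is not one of the four listed (or which has no type attribute) gets no label at all. If you would rather flag such cases, a fallback can be added. In this sketch the "(other)" label is an arbitrary choice.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<xsl:template match="phone">
    <p>
    <xsl:value-of select="."/>
    <xsl:choose>
        <xsl:when test="@type='home'"> (home)</xsl:when>
        <xsl:when test="@type='work'"> (work)</xsl:when>
        <xsl:when test="@type='fax'"> (fax)</xsl:when>
        <xsl:when test="@type='cell'"> (cell phone)</xsl:when>
        <!-- reached only when none of the tests above is true -->
        <xsl:otherwise> (other)</xsl:otherwise>
    </xsl:choose>
    </p>
</xsl:template>

</xsl:stylesheet>
```

The <xsl:otherwise> element must come last inside <xsl:choose>, after all of the <xsl:when> elements.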

Creating Attributes

To this point, the elements that we've created in the output document have always had the same fixed attributes. Sometimes you may want to create an element whose attributes depend upon some value in the input document. For example, consider this list of statistics:

<statistics>
    <stat n="50"/>
    <stat n="30"/>
    <stat n="100"/>
</statistics>

We'd like to create a crude bar graph, which we shall do by using a style attribute to set the width of each bar.

 1 <xsl:template match="statistics">
 2  <html>
 3      <head>
 4      <title>Statistics</title>
 5      <style type="text/css">
 6          .bargraph { background-color: #66cc66; }
 7      </style>
 8      </head>
 9      <body>
10      <xsl:apply-templates select="stat"/>
11      </body>
12  </html>
13  </xsl:template>
14  
15  <xsl:template match="stat">
16  <p class="bargraph" style="width: {@n};">
17      <xsl:value-of select="@n"/>
18  </p>
19  </xsl:template>

Lines 5-7
When we construct the <head> of the output document, we'll create a CSS stylesheet to go along with it.
Line 16
Each statistic produces a paragraph element with the bargraph class and a style attribute. The curly braces are a special XSLT notation (an attribute value template); they are the moral equivalent of <xsl:value-of select="@n"/>, which we can't use here because you can't put a tag inside a tag.
Line 17
This writes the value of the input attribute n.

And here's the result:

50

30

100
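
Curly braces can hold any XPath expression, and one attribute value can mix several of them with literal text. Here is a sketch that reuses the <stat> elements above; the px unit, the doubling factor, and the title text are illustrative choices, not part of the original example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<xsl:template match="/">
    <html>
    <body>
        <xsl:apply-templates select="statistics/stat"/>
    </body>
    </html>
</xsl:template>

<xsl:template match="stat">
    <!-- each {...} is evaluated as XPath; everything else is copied literally -->
    <p class="bargraph" style="width: {@n * 2}px;"
        title="stat {position()} of {last()}">
        <xsl:value-of select="@n"/>
    </p>
</xsl:template>

</xsl:stylesheet>
```

For <stat n="50"/>, the first of the three, this should produce style="width: 100px;" and title="stat 1 of 3". If you ever need a literal curly brace in such an attribute, double it: {{ or }}.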

The <xsl:attribute> Element

There are times when even the curly-brace notation won't work. For example, if the attribute that you want to create is conditional upon some other test, you can't use the simple equivalent of <xsl:value-of>. Consider the following, where you want to set the color of a numeric quantity only if it's negative. In this case, you need to use <xsl:attribute>, which adds an attribute to its enclosing output document element.

Template:
<xsl:template match="balance">
<span>
    <xsl:if test=". &lt; 0">
        <xsl:attribute name="style">color: red;</xsl:attribute>
    </xsl:if><xsl:value-of select="."/>
</span>
</xsl:template>

Applied to this XML:
<balance>3.00</balance>
<balance>-2.00</balance>

Resulting HTML:
<span>3.00</span>
<span style="color: red;">-2.00</span>
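
Because the content of <xsl:attribute> is itself a template, the decision can also be moved inside it. The sketch below always writes a style attribute and picks the color with <xsl:choose>; using black for non-negative balances is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<xsl:template match="balance">
<span>
    <!-- the attribute must be created before any content is added to <span> -->
    <xsl:attribute name="style">
        <xsl:text>color: </xsl:text>
        <xsl:choose>
            <xsl:when test=". &lt; 0">red</xsl:when>
            <xsl:otherwise>black</xsl:otherwise>
        </xsl:choose>
        <xsl:text>;</xsl:text>
    </xsl:attribute>
    <xsl:value-of select="."/>
</span>
</xsl:template>

</xsl:stylesheet>
```

Applied to the two <balance> elements above, this should give <span style="color: black;">3.00</span> and <span style="color: red;">-2.00</span>.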