Relax NG (continued)

References and Definitions

What we've been doing so far works well for simple markup. In a complex grammar, however, the nested patterns quickly become too dense to understand, and the indentation marches off the right side of the page. To avoid this problem, we use <define> to give a name to part of the pattern, and <ref> to refer to it from elsewhere in the grammar. Named definitions live inside a <grammar> element, and the pattern for the root element goes inside <start>. Here's the current address book grammar, modularized. To save space, we've eliminated some whitespace.

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<start>
<element name="addressBook">
    <zeroOrMore>
        <ref name="cardContent"/>
    </zeroOrMore>
</element>
</start>

<define name="cardContent">
    <element name="card">
        <ref name="nameContent"/>
        <choice>
            <element name="email"> <text/> </element>
            <element name="phone"> <text/> </element>
        </choice>
        <optional>
            <element name="note"> <text/> </element>
        </optional>
    </element>
</define>

<define name="nameContent">
    <choice>
        <element name="name"> <text/> </element>
        <group>
            <element name="firstname"> <text/>  </element>
            <element name="lastname"> <text/> </element>
        </group>
    </choice>
</define>
</grammar>
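
To see how the references expand, here is a small instance document that the grammar above should accept. It is a sketch with invented names and contact details: the first card uses the single <name> branch of nameContent, and the second uses the <firstname>/<lastname> group along with the optional <note>.

<addressBook>
    <card>
        <name>John Smith</name>
        <email>js@example.com</email>
    </card>
    <card>
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <phone>408-555-1212</phone>
        <note>Prefers phone calls</note>
    </card>
</addressBook>

A card that listed both <email> and <phone> would be rejected, since cardContent allows exactly one of them.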

This is more than a notational convenience; it's an absolute necessity if we are to have recursive grammars, in which an item is defined in terms of itself. Here's an example:

A document consists of one or more lists. A list consists of one or more items, each of which may contain either text or another nested list.

This specification requires definitions and references. We will show a sample valid document first, and follow it by the Relax NG:

<document>
  <list>
    <item> First item outer </item>
    <item> Second item outer </item>
    <item>
      <list>
          <item> nested first </item>
          <item> nested second </item>
      </list>
    </item>
    <item> Third item outer </item>
  </list>
</document>

<grammar xmlns="http://relaxng.org/ns/structure/1.0">

<start>
<element name="document">
   <oneOrMore>
      <ref name="List"/>
   </oneOrMore>
</element>
</start>

<define name="List">
   <element name="list">
      <oneOrMore>
         <element name="item">
            <choice>
               <text />
               <ref name="List"/>
            </choice>
         </element>
      </oneOrMore>
   </element>
</define>

</grammar>

Note: a definition name can be the same as an element name, but it's better to give it a different name. We've used an initial capital letter for this purpose.

Empty Elements

To declare an element that must be empty, specify <empty/> as its content rather than <text/>. The distinction matters: <text/> matches any amount of text, including none, so an element declared with <text/> may be left empty but may also contain text. <empty/> forbids any text or child elements between the opening and closing tags.
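
As a small sketch, here is how the two cases might look side by side. The element names br and comment are only illustrations:

<element name="br">
    <empty/>
</element>
<element name="comment">
    <text/>
</element>

Under these patterns, <br/> (or equivalently <br></br>) is valid and <br>text</br> is not, while both <comment/> and <comment>Call back Tuesday</comment> are valid.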

Attributes

To specify an element's attributes, you use <attribute>. Here's how we'd specify some of the attributes for HTML's <img/> element, which requires src and alt attributes and allows optional width and height attributes.

<element name="img">
    <empty/>
    <attribute name="alt"> <text/> </attribute>
    <attribute name="src"> <text/> </attribute>
    <optional>
        <attribute name="width"> <text/> </attribute>
    </optional>
    <optional>
        <attribute name="height"> <text/> </attribute>
    </optional>
</element>

Note: the following text is copied directly from the Relax NG tutorial; it's beautifully written and nearly impossible to improve upon.

Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The group and choice patterns can be applied to attribute patterns in the same way they are applied to element patterns. For example, if we wanted to allow either a name attribute or both a givenName and a familyName attribute, we can specify this in the same way that we would if we were using elements:

<element name="addressBook">
   <zeroOrMore>
      <element name="card">
         <choice>
            <attribute name="name">
               <text/>
            </attribute>
            <group>
               <attribute name="givenName">
                  <text/>
               </attribute>
               <attribute name="familyName">
                  <text/>
               </attribute>
            </group>
         </choice>
         <attribute name="email">
            <text/>
         </attribute>
      </element>
   </zeroOrMore>
</element>

The group and choice patterns can combine element and attribute patterns without restriction. For example, the following pattern would allow a choice of elements and attributes independently for both the name and the email part of a card:

<element name="addressBook">
   <zeroOrMore>
      <element name="card">
         <choice>
            <element name="name">
               <text/>
            </element>
            <attribute name="name">
               <text/>
            </attribute>
         </choice>
         <choice>
            <element name="email">
               <text/>
            </element>
            <attribute name="email">
               <text/>
            </attribute>
         </choice>
      </element>
   </zeroOrMore>
</element>

As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:

<card name="John Smith" email="js@example.com"/>
<card email="js@example.com" name="John Smith"/>
<card email="js@example.com"><name>John Smith</name></card>
<card name="John Smith"><email>js@example.com</email></card>
<card><name>John Smith</name><email>js@example.com</email></card>

However, it would not match

<card><email>js@example.com</email><name>John Smith</name></card>

because the pattern for card requires any email child element to follow any name child element.


While the preceding examples show the power and flexibility of Relax NG, I don't recommend them as an example of good design. If you give people too many options for data-oriented markup, they will wonder why that last example doesn't work, given that “every other combination works great.” We'll see a way to get around even this problem a bit later.

There is one difference between attribute and element patterns: if an attribute pattern has no content, it defaults to <text/>, whereas an element pattern must always specify its content (use <empty/> if there is none). Thus, we can rewrite the specification of <img/> as follows:

<element name="img">
    <empty/>
    <attribute name="alt"/>
    <attribute name="src"/>
    <optional>
        <attribute name="width"/>
    </optional>
    <optional>
        <attribute name="height"/>
    </optional>
</element>

Choices for Attribute Values

We can use the <choice> element in Relax NG to specify that an attribute can have one of a specific set of values. For example, if we want an align attribute to have the possible values left, right, or center, we'd specify:

<attribute name="align">
    <choice>
        <value type="string">left</value>
        <value type="string">right</value>
        <value type="string">center</value>
    </choice>
</attribute>
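
If you leave off the type attribute, Relax NG compares the value using its built-in token type, which strips leading and trailing whitespace and collapses internal runs of whitespace before comparing; type="string" compares the characters exactly. The same <choice> of <value> patterns works for element content as well. As a sketch, here is a hypothetical priority element:

<element name="priority">
    <choice>
        <value>low</value>
        <value>medium</value>
        <value>high</value>
    </choice>
</element>

Because these values default to the token type, <priority> high </priority> is valid; with type="string" on each <value>, the surrounding spaces would make it invalid.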

Mini-Exercise

Let's take the description for the wrestling club database from the second lecture, and translate it into Relax NG.

See the solution

Mixed Content

All of our examples so far have been data-oriented: most of the meaning and structure is carried by the elements, and the text simply fills in the blanks.

Let us now turn our attention to describing narrative-oriented markup. These are markup languages like HTML, where text is king, and elements are sprinkled throughout to add structure. Consider the following folksy version of a weather report:

<report>
Here's your weather for <month>April</month> <day>3</day>, <year>2002</year>.
Morning <cloud time="am">fog</cloud>, <cloud time="pm">clearing</cloud> in the afternoon.
The high will be from <min-high>75</min-high> to <max-high>79</max-high> degrees,
with an overnight low between <min-low>46</min-low> and <max-low>50</max-low> degrees.
A total of <precip type="rain" units="in">1.5</precip> inches
of rain fell, much to the delight of local farmers.
</report>

This is called mixed content, since text is mixed with elements. Relax NG lets you specify mixed content with the <interleave> pattern, whose child patterns may match in any order; including <text/> among them allows text to appear anywhere in between. Here's the pattern for a weather report. You'll notice that our patterns become longer as we go along, since we are able to make them more detailed and specific.

<element name="report">
<interleave>
    <text />
    <element name="day"><text/></element>
    <element name="month"><text/></element>
    <element name="year"><text/></element>
    <zeroOrMore>
        <element name="cloud">
            <text/>
            <attribute name="time">
                <choice>
                    <value type="string">am</value>
                    <value type="string">pm</value>
                </choice>
            </attribute>
        </element>
    </zeroOrMore>
    <element name="min-low"><text/></element>
    <element name="min-high"><text/></element>
    <element name="max-low"><text/></element>
    <element name="max-high"><text/></element>
    <zeroOrMore>
        <element name="precip">
            <text/>
            <attribute name="type"/>
            <attribute name="units"/>
        </element>
    </zeroOrMore>
</interleave>
</element>

Note that <interleave> does not by itself allow its child elements to repeat. In the specification above, you can have exactly one <month>, <day>, and <year>. We had to use <zeroOrMore> to allow multiple <cloud> elements within the weather report.

Finally, you may declare
<mixed> <!-- some pattern --> </mixed>
as a shortcut for
<interleave> <text/> <!-- some pattern --> </interleave>
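
As a sketch of <mixed> in use, here is a hypothetical HTML-style paragraph element in which bold and italic spans may appear anywhere in the text, any number of times:

<element name="para">
    <mixed>
        <zeroOrMore>
            <choice>
                <element name="b"> <text/> </element>
                <element name="i"> <text/> </element>
            </choice>
        </zeroOrMore>
    </mixed>
</element>

This accepts, for example, <para>Read the <b>whole</b> report, <i>not</i> just the <b>summary</b>.</para>. Without the <zeroOrMore>, the paragraph would have to contain exactly one <b> or <i>.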

Data Types

In the weather report, we specified the minimum and maximum temperatures as <text/>, but that's really too broad a category. Content like 73 or -12.5 is fine; it would be nice to say that content like low 90's or twenty-two is invalid. Likewise, if you have a markup language that keeps track of people's personal information, you want to ensure that a person's age is an integer and that their bank balance is a decimal number.

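Relax NG can validate the data types of element content and attribute values. It has only two built-in types of its own (string and token), but it can use external datatype libraries, and the usual choice is the set of datatypes defined by XML Schema. You tell Relax NG to use these datatypes by adding a datatypeLibrary attribute to the root element of your grammar: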

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

You may now specify the required data type for an element content or attribute value. The following fragment specifies that a bank account's acct-id must be an ID (i.e., unique and must begin with a letter or underscore), the <age> must contain an integer, and the <balance> a decimal number.

<element name="account">
    <attribute name="acct-id">
        <data type="ID"/>
    </attribute>
    <element name="owner"> <text /> </element>
    <element name="age">
        <data type="integer"/>
    </element>
    <element name="balance">
        <data type="decimal"/>
    </element>
</element>
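
For comparison, here is an instance that the account pattern above should accept, assuming the XML Schema datatype library has been declared as shown earlier. The values are invented for illustration:

<account acct-id="a1024">
    <owner>Pat Lee</owner>
    <age>42</age>
    <balance>1520.75</balance>
</account>

Changing <age> to 42.5, or <balance> to $1,520.75, would make the document invalid, since neither is a legal integer or decimal literal. An acct-id of 1024 would fail too, because an ID cannot begin with a digit.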

Here's a list of the most popular data types.

integer
Positive or negative whole number without a decimal point. Subtypes include positiveInteger, negativeInteger, nonPositiveInteger, and nonNegativeInteger.
decimal
Decimal number, using a period as the decimal point.
float and double
Allow exponential (E) notation; double has a larger range and greater precision than float. Examples: 3.5e12, 0.4e-2
ID
An ID must begin with a letter or underscore, followed by any mix of letters, digits, dots, hyphens, or underscores. An ID must be unique within a document. Note that this datatype is capitalized.
IDREF
The value must be an ID that exists in the current document.
NMTOKEN
A name token: any sequence of letters, digits, dots, hyphens, underscores, or colons. Unlike an ID, it doesn't have to be unique, and it may begin with a digit, dot, or hyphen.
string
Any string. This is effectively the same as <text/>, but it is the specification you must use if you wish to have parameters.
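
ID and IDREF are usually used together to link one part of a document to another. As a sketch with hypothetical element and attribute names (and the XML Schema datatype library declared as before), each <person> below carries a unique id, and the optional manager attribute must name one of those ids:

<element name="staff">
    <oneOrMore>
        <element name="person">
            <attribute name="id">
                <data type="ID"/>
            </attribute>
            <optional>
                <attribute name="manager">
                    <data type="IDREF"/>
                </attribute>
            </optional>
            <text/>
        </element>
    </oneOrMore>
</element>

With a validator that enforces the ID and IDREF rules, a document where manager="p7" appears but no person has id="p7" would be rejected, as would two people sharing the same id.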

Dates and Times

Dates and times are written in the formats of the ISO 8601 standard, as adopted by XML Schema.

Data type     Example
date          2002-05-27
gYear         2002
gMonth        --05
gDay          ---21
gYearMonth    2002-05
gMonthDay     --05-27
time          13:20:48, or 13:20:37-05:00 with a time zone offset
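
As a sketch, here is a hypothetical meeting element that uses two of these types, assuming the XML Schema datatype library is declared on the grammar as shown earlier:

<element name="meeting">
    <attribute name="day">
        <data type="date"/>
    </attribute>
    <attribute name="start">
        <data type="time"/>
    </attribute>
    <text/>
</element>

Under this pattern, <meeting day="2002-05-27" start="13:20:00">Budget review</meeting> is valid, while day="05/27/2002" is not, because it isn't in ISO 8601 form.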

Further Refinement of Values

Things like positiveInteger cover a lot of territory (and work only with integers). What if you decide that you need a price to be a positive decimal number? Or a quantity must be an integer greater than or equal to 10 and less than or equal to 100? You can attach parameters to a data type to restrict its values to a range, using inclusive or exclusive bounds: minInclusive, minExclusive, maxInclusive, and maxExclusive.

<element name="price">
    <data type="decimal">
        <param name="minExclusive">0</param>
    </data>
</element>
<element name="qty">
    <data type="integer">
        <param name="minInclusive">10</param>
        <param name="maxInclusive">100</param>
    </data>
</element>
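
Inclusive and exclusive bounds can be mixed in the same <data> element. As a sketch, here is a hypothetical discount rate that may be zero but must stay below 1 (that is, below 100 percent):

<element name="discount-rate">
    <data type="decimal">
        <param name="minInclusive">0</param>
        <param name="maxExclusive">1</param>
    </data>
</element>

Here 0 and 0.25 are valid, while 1 and -0.1 are not.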

You may restrict the length of an element's text or an attribute's value with the length, minLength, and maxLength parameters. Here's a fragment that restricts a postal-code attribute to exactly seven characters, and a city attribute to at least four but no more than seventeen characters:

<attribute name="postal-code">
    <data type="string">
        <param name="length">7</param>
    </data>
</attribute>
<attribute name="city">
    <data type="string">
        <param name="minLength">4</param>
        <param name="maxLength">17</param>
    </data>
</attribute>
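
Whitespace counts toward the length of a string. If you want surrounding spaces ignored, the XML Schema token type collapses whitespace before the length is checked. As a sketch, here is a hypothetical two-letter country-code element:

<element name="country-code">
    <data type="token">
        <param name="length">2</param>
    </data>
</element>

With token, <country-code> CA </country-code> is valid because its value collapses to CA; with type="string", the two spaces would be counted and the length check would fail.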

Finally, the most important and powerful way to restrict a string's values: regular expressions. The keyword for these parameters, pattern, is inherited from XML Schema and is not to be confused with the element and attribute patterns that Relax NG sets up. Here's an element for verifying a Canadian postal code (letter, digit, letter, one or more whitespace characters, digit, letter, digit) and a US phone number in the form 408-555-1212:

<element name="canada-post">
    <data type="string">
        <param name="pattern">[A-Z]\d[A-Z]\s+\d[A-Z]\d</param>
    </data>
</element>
<element name="us-phone">
    <data type="string">
        <param name="pattern">\d{3}-\d{3}-\d{4}</param>
    </data>
</element>
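
XML Schema regular expressions are implicitly anchored: the pattern must match the entire value, so there is no need for ^ or $. With the us-phone pattern above, 408-555-1212 is valid, but 1-408-555-1212 is not. As another sketch, here is a hypothetical product-code attribute made of two uppercase letters, a hyphen, and four digits:

<attribute name="product-code">
    <data type="string">
        <param name="pattern">[A-Z]{2}-\d{4}</param>
    </data>
</attribute>

AB-1234 matches; ab-1234 does not (matching is case-sensitive), and neither does XAB-1234, because the pattern has to cover the whole value.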

This, by the way, now lets us update the wrestling club database grammar so that we can check that the age groups consist of the letters K, C, J, and O, in that order, and at most one of each:

<element name="age-groups">
    <empty/>
    <attribute name="type">
        <data type="string">
            <param name="pattern">K?C?J?O?</param>
        </data>
    </attribute>
</element>
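
One thing to watch: every part of K?C?J?O? is optional, so the empty string also matches, and type="" would be accepted. If at least one age group is required, you can add a minLength parameter next to the pattern; a value must satisfy every parameter in the <data> element. A sketch:

<element name="age-groups">
    <empty/>
    <attribute name="type">
        <data type="string">
            <param name="pattern">K?C?J?O?</param>
            <param name="minLength">1</param>
        </data>
    </attribute>
</element>

With this version, KJO and C are valid, while an empty value, JK (out of order), and KK (a repeated group) are not.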