What we've been doing so far works well for simple markup. In any complex grammar, however, the notation will become far too dense to be understandable, and the indentation will march off the right side of the paper. To avoid this problem, we'll use <define> to give a name to part of the pattern, and use <ref> to refer to it from another section of the grammar. Here's the current address book grammar, modularized. To save space, we've eliminated some whitespace.
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="addressBook">
      <zeroOrMore> <ref name="cardContent"/> </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="card">
      <ref name="nameContent"/>
      <choice>
        <element name="email"> <text/> </element>
        <element name="phone"> <text/> </element>
      </choice>
      <optional>
        <element name="note"> <text/> </element>
      </optional>
    </element>
  </define>
  <define name="nameContent">
    <choice>
      <element name="name"> <text/> </element>
      <group>
        <element name="firstname"> <text/> </element>
        <element name="lastname"> <text/> </element>
      </group>
    </choice>
  </define>
</grammar>
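As a quick check on the modularized grammar, here is a small instance document that it should accept. The names and contact details are made up for illustration. The first card uses the single name element, and the second uses the firstname/lastname pair.

```xml
<addressBook>
  <!-- nameContent: single <name>; then <email>; optional <note> present -->
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
    <note>Prefers e-mail</note>
  </card>
  <!-- nameContent: <firstname> + <lastname> group; then <phone>; no <note> -->
  <card>
    <firstname>Fred</firstname>
    <lastname>Bloggs</lastname>
    <phone>408-555-1212</phone>
  </card>
</addressBook>
```

A card that contained both an email and a phone element would be rejected, since the choice allows only one of them.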
This is not just a notational convenience; it's an absolute necessity if we are to have recursive grammars, that is, grammars in which an item is defined in terms of itself. Here's an example:
A document consists of one or more lists. A list consists of one or more items, each of which may contain either text or another nested list.
This specification requires definitions and references. We will show a sample valid document first, followed by the Relax NG grammar:
<document>
  <list>
    <item> First item outer </item>
    <item> Second item outer </item>
    <item>
      <list>
        <item> nested first </item>
        <item> nested second </item>
      </list>
    </item>
    <item> Third item outer </item>
  </list>
</document>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="document">
      <oneOrMore> <ref name="List"/> </oneOrMore>
    </element>
  </start>
  <define name="List">
    <element name="list">
      <oneOrMore>
        <element name="item">
          <choice>
            <text />
            <ref name="List"/>
          </choice>
        </element>
      </oneOrMore>
    </element>
  </define>
</grammar>
Note: a definition name can be the same as an element name, but it's better to give it a different name. We've used an initial capital letter for this purpose.
To create an empty element, specify <empty/> as the element content rather than <text/>. Specifying <text/> still allows you to put no text between the opening and closing tags; specifying <empty/> forbids any text or child elements between them.
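A minimal sketch of the difference follows. The element names sample, separator, and remark are hypothetical and are not part of the address book grammar.

```xml
<element name="sample" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Valid: <separator/> or <separator></separator>.
       Invalid: <separator>text</separator>. -->
  <element name="separator"><empty/></element>
  <!-- Valid: <remark>some text</remark>, and also <remark/>,
       because <text/> matches zero or more characters. -->
  <element name="remark"><text/></element>
</element>
```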
To specify an element's attributes, you use <attribute>. Here's how we'd specify some of the attributes for HTML's <img/> element. This element requires src and alt attributes, and has optional width and height attributes.
<element name="img">
  <empty/>
  <attribute name="alt"> <text/> </attribute>
  <attribute name="src"> <text/> </attribute>
  <optional>
    <attribute name="width"> <text/> </attribute>
  </optional>
  <optional>
    <attribute name="height"> <text/> </attribute>
  </optional>
</element>
Note: the following text is copied directly from the Relax NG tutorial; it's beautifully written and nearly impossible to improve upon.
Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.
The group and choice patterns can be applied to attribute patterns in the same way they are applied to element patterns. For example, if we wanted to allow either a name attribute or both a givenName and a familyName attribute, we can specify this in the same way that we would if we were using elements:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <attribute name="name"> <text/> </attribute>
        <group>
          <attribute name="givenName"> <text/> </attribute>
          <attribute name="familyName"> <text/> </attribute>
        </group>
      </choice>
      <attribute name="email"> <text/> </attribute>
    </element>
  </zeroOrMore>
</element>
The group and choice patterns can combine element and attribute patterns without restriction. For example, the following pattern would allow a choice of elements and attributes independently for both the name and the email part of a card:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <element name="name"> <text/> </element>
        <attribute name="name"> <text/> </attribute>
      </choice>
      <choice>
        <element name="email"> <text/> </element>
        <attribute name="email"> <text/> </attribute>
      </choice>
    </element>
  </zeroOrMore>
</element>
As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:
<card name="John Smith" email="js@example.com"/>
<card email="js@example.com" name="John Smith"/>
<card email="js@example.com"><name>John Smith</name></card>
<card name="John Smith"><email>js@example.com</email></card>
<card><name>John Smith</name><email>js@example.com</email></card>
However, it would not match
<card><email>js@example.com</email><name>John Smith</name></card>
because the pattern for card requires any email child element to follow any name child element.
While the preceding examples show the power and flexibility of Relax NG, I don't recommend them as an example of good design. If you give people too many options for data-oriented markup, they will wonder why that last example doesn't work, given that “every other combination works great.” We'll see a way to get around even this problem a bit later.
There is one difference between attribute and element: <text/> is the default for the content of an attribute pattern, whereas an element pattern is not allowed to be empty. Thus, we can rewrite the specification of <img/> as follows:
<element name="img">
  <empty/>
  <attribute name="alt"/>
  <attribute name="src"/>
  <optional>
    <attribute name="width"/>
  </optional>
  <optional>
    <attribute name="height"/>
  </optional>
</element>
We can use the <choice> element in Relax NG to specify that an attribute can have one of a specific set of values. For example, if we want an align attribute to have the possible values left, right, or center, we'd specify:
<attribute name="align">
  <choice>
    <value type="string">left</value>
    <value type="string">right</value>
    <value type="string">center</value>
  </choice>
</attribute>
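To see the attribute pattern in context, here is a minimal sketch that attaches it to an element. The element name para is hypothetical, and wrapping the attribute in <optional> is an assumption made for illustration.

```xml
<element name="para" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- align may be omitted; if present it must be one of the three values -->
  <optional>
    <attribute name="align">
      <choice>
        <value type="string">left</value>
        <value type="string">right</value>
        <value type="string">center</value>
      </choice>
    </attribute>
  </optional>
  <text/>
</element>
```

With this pattern, <para align="center">Hello</para> and <para>Hello</para> would be accepted, while <para align="justify">Hello</para> would not.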
Let's take the description for the wrestling club database from the second lecture, and translate it into Relax NG.
All of our examples so far have been data-oriented: most of the meaning and structure is carried by the elements, and the text simply fills in the blanks.
Let us now turn our attention to describing narrative-oriented markup. These are markup languages like HTML, where text is king, and elements are sprinkled throughout to add structure. Consider the following folksy version of a weather report:
<report> Here's your weather for <month>April</month> <day>3</day>,<year>2002</year>. Morning <cloud time="am">fog</cloud>, <cloud time="pm">clearing</cloud> in the afternoon. The high will be from <min-high>75</min-high> to <max-high>79</max-high> degrees, with an overnight low between <min-low>46</min-low> and <max-low>50</max-low> degrees. A total of <precip type="rain" units="in">1.5</precip> inches of rain fell, much to the delight of local farmers. </report>
This is called mixed content, since it has text mixed with elements, and the elements may appear in any order. Relax NG lets you specify mixed content with the <interleave> pattern. Its children may appear in any order. Here's the pattern for a weather report. You'll notice that our patterns become longer as we go along, since we are able to make them more detailed and specific.
<element name="report">
  <interleave>
    <text />
    <element name="day"><text/></element>
    <element name="month"><text/></element>
    <element name="year"><text/></element>
    <zeroOrMore>
      <element name="cloud">
        <text/>
        <attribute name="time">
          <choice>
            <value type="string">am</value>
            <value type="string">pm</value>
          </choice>
        </attribute>
      </element>
    </zeroOrMore>
    <element name="min-low"><text/></element>
    <element name="min-high"><text/></element>
    <element name="max-low"><text/></element>
    <element name="max-high"><text/></element>
    <zeroOrMore>
      <element name="precip">
        <text/>
        <attribute name="type"/>
        <attribute name="units"/>
      </element>
    </zeroOrMore>
  </interleave>
</element>
Note that using <interleave> does not automatically allow an unlimited number of any of the child elements. In the specification above, you can have exactly one each of <month>, <day>, and <year>. We had to use <zeroOrMore> to allow multiple <cloud> elements within the weather report.
Finally, you may declare
<mixed> <!-- some pattern --> </mixed>
as a shortcut for
<interleave> <text/> <!-- some pattern --> </interleave>
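As a sketch of the shortcut in use, here is an abbreviated version of the weather report pattern written with <mixed>. Only a few of the child elements are repeated, and the time attribute is simplified to plain text to keep the example short. Note that when <mixed> has several children they are treated as an ordered group, so an inner <interleave> is still needed if the elements may appear in any order.

```xml
<element name="report" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- <mixed> supplies the interleaved <text/> for us -->
  <mixed>
    <!-- the inner <interleave> lets these elements appear in any order -->
    <interleave>
      <element name="month"><text/></element>
      <element name="day"><text/></element>
      <element name="year"><text/></element>
      <zeroOrMore>
        <element name="cloud">
          <text/>
          <attribute name="time"/>
        </element>
      </zeroOrMore>
    </interleave>
  </mixed>
</element>
```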
In the weather report, we specified the minimum and maximum temperatures as <text/>, but that's really a bit too broad a categorization. Content like 73 or -12.5 is fine; it would be nice to say that content like low 90's or twenty-two is invalid.
If you have a markup language that keeps track of people's personal
information, you want to ensure that the person's
age is an integer and that their bank balance is a floating point
number.
Relax NG is able to validate the data types of element content and attribute values. It borrows its data typing language from XML Schema. You tell Relax NG to use these datatypes by modifying the root element of your grammar as follows:
<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
You may now specify the required data type for element content or an attribute value. The following fragment specifies that a bank account's acct-id must be an ID (i.e., unique, and beginning with a letter or underscore), the <age> must contain an integer, and the <balance> a decimal number.
<element name="account">
  <attribute name="acct-id">
    <data type="ID"/>
  </attribute>
  <element name="owner"> <text /> </element>
  <element name="age"> <data type="integer"/> </element>
  <element name="balance"> <data type="decimal"/> </element>
</element>
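The same technique addresses the temperature problem raised earlier in the weather report. The following is a minimal sketch for one of the temperature elements. The datatypeLibrary attribute is placed directly on the element here so that the fragment stands on its own; in a full grammar it would normally be inherited from the root element.

```xml
<element name="min-high"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- accepts content such as 73 or -12.5;
       rejects content such as "low 90's" or "twenty-two" -->
  <data type="decimal"/>
</element>
```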
Here's a list of the most popular data types.
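- integer: a whole number. Related types are positiveInteger, negativeInteger, nonPositiveInteger, and nonNegativeInteger.
- decimal: a number that may have a fractional part, written without an exponent.
- float and double: floating point numbers, which may be written in E notation. double allows a larger range of exponents than float. Examples: 3.5e12, 0.4e-2
- ID: an identifier that must be unique within the document.
- IDREF: a reference to an ID value that appears elsewhere in the document.
- NMTOKEN: a name token, made up of name characters with no whitespace.
- string: similar to <text/>, but it is the specification you must use if you wish to have parameters.

Dates and times are specified as per the ISO 8601 specification.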
Data Type | Example |
---|---|
date | 2002-05-27 |
gYear | 2002 |
gMonth | --05-- |
gDay | ---21 |
gYearMonth | 2002-05 |
gMonthDay | --05-27 |
time | 13:20:48 or 13:20:37-05:00 |
Things like positiveInteger include a lot of territory (and work only with integers). What if you decide that you need a price to be a positive decimal number? Or a quantity must be an integer greater than or equal to 10 and less than or equal to 100? You can attach parameters to data types to restrict the valid values to a range bounded by inclusive or exclusive minimum and maximum values.
<element name="price">
  <data type="decimal">
    <param name="minExclusive">0</param>
  </data>
</element>
<element name="qty">
  <data type="integer">
    <param name="minInclusive">10</param>
    <param name="maxInclusive">100</param>
  </data>
</element>
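The fragment above shows minExclusive, minInclusive, and maxInclusive. For completeness, here is a sketch that uses both exclusive bounds together. The discount element is hypothetical; it is assumed to hold a fraction strictly between 0 and 1.

```xml
<element name="discount"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="decimal">
    <!-- value must be greater than 0 ... -->
    <param name="minExclusive">0</param>
    <!-- ... and less than 1; 0 and 1 themselves are rejected -->
    <param name="maxExclusive">1</param>
  </data>
</element>
```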
You may restrict the length of an element's text or an attribute's value with the length, minLength and maxLength parameters. Here's a fragment that restricts a postal-code attribute to be exactly seven characters long, and a city attribute to be at least four but no more than seventeen characters long:
<attribute name="postal-code">
  <data type="string">
    <param name="length">7</param>
  </data>
</attribute>
<attribute name="city">
  <data type="string">
    <param name="minLength">4</param>
    <param name="maxLength">17</param>
  </data>
</attribute>
Finally, the most important and powerful way to restrict a string's values: regular expressions. The keyword for these parameters, pattern, has been inherited from XML Schema, and is not to be confused with the patterns of elements and attributes that Relax NG sets up. Here's an element for verifying a Canadian postal code (letter, digit, letter, one or more whitespace characters, digit, letter, digit) and one for a US phone number in the form 408-555-1212:
<element name="canada-post">
  <data type="string">
    <param name="pattern">[A-Z]\d[A-Z]\s+\d[A-Z]\d</param>
  </data>
</element>
<element name="us-phone">
  <data type="string">
    <param name="pattern">\d{3}-\d{3}-\d{4}</param>
  </data>
</element>
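XML Schema regular expressions must match the entire value, so no ^ or $ anchors are needed (or used) in these patterns. As one more sketch in the same style, here is a pattern for a US ZIP code with an optional four-digit extension. The us-zip element name is hypothetical.

```xml
<element name="us-zip"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="string">
    <!-- five digits, optionally followed by a hyphen and four digits:
         matches 95123 and 95123-4567, but not 9512 or 95123-45 -->
    <param name="pattern">\d{5}(-\d{4})?</param>
  </data>
</element>
```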
This, by the way, now lets us update the wrestling club database grammar so that we can check that the age groups consist of the letters K, C, J, and O, in that order, and at most one of each:
<element name="age-groups">
  <empty/>
  <attribute name="type">
    <data type="string">
      <param name="pattern">K?C?J?O?</param>
    </data>
  </attribute>
</element>
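One thing to be aware of: because every letter in K?C?J?O? is optional, the pattern also accepts an empty string. If an empty type attribute should be rejected (an assumption about the intended rules, not something the club description is known to state), one possible sketch adds a minLength parameter alongside the pattern. When several parameters are given, a value must satisfy all of them.

```xml
<element name="age-groups"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <empty/>
  <attribute name="type">
    <data type="string">
      <!-- letters K, C, J, O in that order, at most one of each -->
      <param name="pattern">K?C?J?O?</param>
      <!-- assumption: at least one age group must be listed -->
      <param name="minLength">1</param>
    </data>
  </attribute>
</element>
```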