Advanced DTDs/Relax NG

General Entities

Read pages 162-165, and pages 172-177 of Learning XML.

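All XML processors are required to recognize the following five entities:

&lt;     < (less than)
&gt;     > (greater than)
&amp;    & (ampersand)
&apos;   ' (apostrophe)
&quot;   " (quotation mark)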

Those are the only named entities that XML predefines. If you want any others, you can use numeric character references, either in hexadecimal or decimal. This is less than satisfactory. Quick: what do these references represent when displayed? &#241; &#225; &#233; &#237; &#243; &#250; &#161; and &#191;.

All right, what about these: &ntilde; &aacute; &eacute; &iacute; &oacute; &uacute; &iexcl; and &iquest;. That's right - those are the entities for doing Spanish text. Here's how you define them. (We'll do them all in decimal to be consistent.)

<!ENTITY ntilde "&#241;">
<!ENTITY aacute "&#225;">
<!ENTITY eacute "&#233;">
<!ENTITY iacute "&#237;">
<!ENTITY oacute "&#243;">
<!ENTITY uacute "&#250;">
<!ENTITY iexcl  "&#161;">
<!ENTITY iquest "&#191;">

To write the words ¡Acción en español!, we'd use the entities as follows:

&iexcl;Acci&oacute;n en espa&ntilde;ol!
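To see the declarations and the references working together, here is a minimal sketch of a complete document. The <greeting> element is made up for this illustration, and only the three entities that the sentence needs are declared, using the internal subset so that the example is self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE greeting [
    <!-- hypothetical one-element document type, for illustration only -->
    <!ELEMENT greeting (#PCDATA)>
    <!ENTITY iexcl  "&#161;">
    <!ENTITY oacute "&#243;">
    <!ENTITY ntilde "&#241;">
]>
<greeting>&iexcl;Acci&oacute;n en espa&ntilde;ol!</greeting>
```

A parser that expands the entities hands the application the text ¡Acción en español! as the content of <greeting>.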

Of course, you may use entities for any abbreviation you want:

<!ENTITY dac "De Anza College">
<!ENTITY cis "Computer and Information Science">
<!ENTITY fhda "Foothill-De Anza">
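An abbreviation entity is referenced in exactly the same way as a character entity. The following is a small sketch; the <announcement> element and its wording are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE announcement [
    <!-- hypothetical element, for illustration only -->
    <!ELEMENT announcement (#PCDATA)>
    <!ENTITY dac "De Anza College">
    <!ENTITY cis "Computer and Information Science">
]>
<announcement>The &cis; department at &dac; offers this course.</announcement>
```

If the name of the department ever changes, only the entity declaration has to be edited; every &cis; reference picks up the new text.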

In addition to general entities, which are used in an XML document, there are also parameter entities, which are used as “shortcuts” within the DTD itself. As long as we're talking about the Spanish entities, they'd clearly be useful in many different DTDs. Parameter entities let you “include” other files. Let's say we put all the entities for the Spanish characters into a file called spanish.ent. By adding the following two lines to the wrestling club DTD, we can then use the easy-to-read entities for a Spanish translation of the database.

<!ENTITY % spanish SYSTEM "spanish.ent">
%spanish;

<!ELEMENT   club-database   (association+) >
<!ELEMENT    association     (club+) >
<!ATTLIST    association
    id      ID          #REQUIRED
>
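For reference, here is a sketch of what spanish.ent might contain, assuming it holds exactly the eight declarations shown earlier. The file contains only declarations: no DOCTYPE and no elements. The comment at the top is an addition for this illustration.

```xml
<!-- spanish.ent: general entities for Spanish characters -->
<!ENTITY ntilde "&#241;">
<!ENTITY aacute "&#225;">
<!ENTITY eacute "&#233;">
<!ENTITY iacute "&#237;">
<!ENTITY oacute "&#243;">
<!ENTITY uacute "&#250;">
<!ENTITY iexcl  "&#161;">
<!ENTITY iquest "&#191;">
```

The declaration of %spanish only names the file; it is the reference %spanish; on the following line that actually pulls these declarations into the DTD. Keep in mind that a non-validating parser is not required to read external files, so it may leave these entities undefined.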

Note: If you want to include files within an XML file rather than the DTD, use general entities. For example, if you have a book divided into chapters, you can put each chapter into a separate file, and use general entities to include them. This example uses the internal subset of the DTD, as described on pages 176-177 of Learning XML.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "/usr/local/mybook/docbook.dtd"
[
    <!ENTITY ch01   SYSTEM "ch01.xml">
    <!ENTITY ch02   SYSTEM "ch02.xml">
    <!ENTITY ch03   SYSTEM "ch03.xml">
]>

<book>
    <title>My Book</title>
    <subtitle>An Example of Including XML Files</subtitle>
&ch01;
&ch02;
&ch03;
</book>
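Each chapter file is then an ordinary fragment of XML. Here is a sketch of what ch01.xml could look like; the chapter title and text are placeholders, and the element names assume the DocBook vocabulary that the DOCTYPE above points to.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<chapter>
    <title>Getting Started</title>
    <para>Text of the first chapter goes here.</para>
</chapter>
```

Note that the chapter file must not have a DOCTYPE declaration of its own; only the main document may have one. The first line is optional, but it is a good idea when the file's encoding might differ from that of the main document.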

Modularizing

The other use of parameter entities is to modularize code. For example, if you have a genealogy DTD, there is quite a bit of duplicated markup:

<!ELEMENT birth (year, month, day)>
<!ELEMENT marriage (person-ref, year, month, day)>
<!ELEMENT death (year, month, day)>

Using a parameter entity eliminates the duplication and makes the DTD easier to read. You may also use a parameter entity for a repeated set of attributes, as shown on page 178 of Learning XML.

<!ENTITY % date "year, month, day">
<!ELEMENT birth (%date;)>
<!ELEMENT marriage (person-ref, %date;)>
<!ELEMENT death (%date;)>
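The same technique works for attributes. The following is a sketch only, not the example from the book: the entity name event-atts and the two attributes in it are made up for the illustration.

```xml
<!ENTITY % date "year, month, day">

<!-- hypothetical attribute set shared by several elements -->
<!ENTITY % event-atts
    "id      ID     #IMPLIED
     source  CDATA  #IMPLIED">

<!ELEMENT birth (%date;)>
<!ATTLIST birth %event-atts;>

<!ELEMENT death (%date;)>
<!ATTLIST death %event-atts;>
```

These declarations must go in an external DTD file. In the internal subset, a parameter entity reference may appear only between declarations, not inside one.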

Some people invent parameter entities to make the content of elements or attributes clearer. For example, in the weather report, we might wish to let document writers know that temperatures can have decimals, but water reservoir information must be integers.

<!ENTITY % integer "#PCDATA">
<!ENTITY % float "#PCDATA">
<!ENTITY % text "CDATA">
<!ELEMENT report (temperatures, water-banks)>
<!ELEMENT temperatures (city+)>
<!ELEMENT city (max, min)>
<!ATTLIST city
    name %text; #REQUIRED>
<!ELEMENT max (%float;)>
<!ELEMENT min (%float;)>
<!ELEMENT water-banks (reservoir+)>
<!ELEMENT reservoir (current, capacity)>
<!ATTLIST reservoir
    name %text; #REQUIRED>
<!ELEMENT current (%integer;)>
<!ELEMENT capacity (%integer;)>

Validators that use DTDs can't enforce this; you could still write the following, and the validator would think everything is fine. Newer methods of writing grammars and their validators can do this enforcement; we'll see it later.

<current>five hundred</current>
<capacity>320.5</capacity>

Conditional Sections

The examples on pages 173-175 of Learning XML explain this nicely. The only additional thing to note is that, in a DTD, the first declaration is the one that counts, and the internal subset is parsed before any external DTD. Thus, in the example of the disclaimer, if %use-disclaimer; is set to INCLUDE, the DTD will use the first definition of disclaimer, not the empty string. This first-one-wins rule applies to entities and to attributes (in ATTLIST declarations); an ELEMENT may be declared only once.
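Here is a rough sketch of how the pieces fit together in an external DTD. It is not copied from the book: the file name notice.dtd, the <notice> element, and the disclaimer wording are all assumptions made for this illustration.

```xml
<!-- notice.dtd (hypothetical external DTD) -->

<!-- Default: leave the disclaimer out. An earlier declaration in the
     internal subset of a document overrides this one. -->
<!ENTITY % use-disclaimer "IGNORE">

<![%use-disclaimer;[
    <!ENTITY disclaimer "This document is a draft and may contain errors.">
]]>

<!-- Fallback: takes effect only when the section above was ignored. -->
<!ENTITY disclaimer "">

<!ELEMENT notice (#PCDATA)>
```

A document that wants the disclaimer puts <!ENTITY % use-disclaimer "INCLUDE"> in its internal subset. Because the internal subset is read first, that declaration wins, the conditional section is included, and &disclaimer; expands to the draft warning. Without the override, the section is ignored and &disclaimer; expands to the empty string.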


Introduction to Relax NG

As mentioned before, DTDs are not the only way to specify an XML grammar. There are several other candidates, and I've decided to go with Relax NG (RNG) rather than the World Wide Web Consortium's XML Schema. In my opinion, XML Schema is the unfortunate result that occurs when a group of highly intelligent and well-intentioned designers tries to create a notation that will be all things to all people.

The following material is not in Learning XML. Much of it has been derived from the Relax NG tutorial, which is online at http://www.oasis-open.org/committees/relax-ng/tutorial.html.

First Steps

Let's take a very simple grammar: an address book consists of zero or more cards, each of which consists of a name and email address. Here's the specification in RNG.

The first line has an xmlns attribute, which we will discuss in a future lecture. The <start> element tells a validator where to start validating; i.e., which element is the root element. The rest is a pattern that tells what a valid document should look like. Relax NG works by specifying a pattern for structure and content of valid documents. Any document that matches the pattern is valid; any document that doesn't, isn't.

If you've read the tutorial, you'll notice that they don't have the <grammar> and <start> elements. They aren't necessary for a “self-contained” example like this one, but as soon as we start specifying more complex grammars, we'll need them. So why not now?

<grammar xmlns="http://relaxng.org/ns/structure/1.0">

<start>
<element name="addressBook">
    <zeroOrMore>
        <element name="card">
            <element name="name">
                <text/>
            </element>
            <element name="email">
                <text/>
            </element>
        </element>
    </zeroOrMore>
</element>
</start>

</grammar>
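Here is a small document that matches this pattern. The names and addresses are invented sample data.

```xml
<addressBook>
    <card>
        <name>John Smith</name>
        <email>js@example.com</email>
    </card>
    <card>
        <name>Fred Bloggs</name>
        <email>fb@example.net</email>
    </card>
</addressBook>
```

Because of <zeroOrMore>, an empty <addressBook/> also matches. A card with <email> before <name> does not, since the pattern lists the two elements in a fixed order.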

If you want to require at least one <card> element, replace <zeroOrMore> with <oneOrMore>. An optional element is enclosed in an <optional> element. So, if we want a card to be able to contain an optional <note> element, we'd have this specification (the added material is the <optional> block after the <email> element):

<grammar xmlns="http://relaxng.org/ns/structure/1.0">

<start>
<element name="addressBook">
    <zeroOrMore>
        <element name="card">
            <element name="name">
                <text/>
            </element>
            <element name="email">
                <text/>
            </element>
            <optional>
                <element name="note">
                    <text/>
                </element>
            </optional>
        </element>
    </zeroOrMore>
</element>
</start>

</grammar>
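As a quick check of what this pattern accepts, here is a sketch of a document in which one card has a note and the other does not. The data is invented.

```xml
<addressBook>
    <card>
        <name>John Smith</name>
        <email>js@example.com</email>
        <note>Prefers to be contacted in the morning.</note>
    </card>
    <card>
        <name>Fred Bloggs</name>
        <email>fb@example.net</email>
    </card>
</addressBook>
```

Both cards match. A card with two <note> elements would not, because <optional> means zero or one.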

Choices and Groups

Let's say that we can contact someone either by email or by phone number (but not both). We'd modify our specification with a <choice> specification:

<choice>
    <element name="email">
        <text/>
    </element>
    <element name="phone">
        <text/>
    </element>
</choice>
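To show where the fragment goes, here is a sketch of the complete grammar, assuming that the <choice> simply takes the place of the original <email> element inside <card>.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">

<start>
<element name="addressBook">
    <zeroOrMore>
        <element name="card">
            <element name="name">
                <text/>
            </element>
            <!-- exactly one of the two alternatives must appear -->
            <choice>
                <element name="email">
                    <text/>
                </element>
                <element name="phone">
                    <text/>
                </element>
            </choice>
        </element>
    </zeroOrMore>
</element>
</start>

</grammar>
```

With this grammar, a card containing a <name> followed by a <phone> matches, and so does one with a <name> followed by an <email>. A card containing both, or neither, does not.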

Now let's say that we can either have a name (for example, a company name) or a first name/last name pair. The first pattern below will not work. It allows exactly one of <name>, <firstname>, or <lastname>, so a card could have a first name or a last name, but not both. The second pattern groups the first and last name together with the <group> specifier, and it works great.

<choice>
    <element name="name">
        <text/>
    </element>
    <element name="firstname">
        <text/>
    </element>
    <element name="lastname">
        <text/>
    </element>
</choice>

<choice>
    <element name="name">
    <element name="name">
        <text/>
    </element>
    <group>
        <element name="firstname">
            <text/>
        </element>
        <element name="lastname">
            <text/>
        </element>
    </group>
</choice>
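Here is a sketch of a document that the pattern with <group> accepts. It assumes the <choice> replaces the <name> element inside <card> and that the <email> element is still required after it; the data is invented.

```xml
<addressBook>
    <!-- matches the first alternative: a single name -->
    <card>
        <name>De Anza College</name>
        <email>info@example.edu</email>
    </card>
    <!-- matches the second alternative: the firstname/lastname group -->
    <card>
        <firstname>Ana</firstname>
        <lastname>Lopez</lastname>
        <email>ana@example.com</email>
    </card>
</addressBook>
```

The pattern without <group> would reject the second card, because it permits only one of the three elements. The pattern with <group> rejects a card that has a <firstname> but no <lastname>, which is what we want.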