CIS 97YT - Advanced Relax NG

Lists

Some XML markup languages let you have whitespace-separated lists of items as attribute values or element content. For example, you might want to allow any number of stylesheet classes in an HTML class attribute, or define a custom complex-number element that holds two floating-point values, as follows:

<div class="floatleft bordered emphatic">
<complex-number>3.2 4.5</complex-number>

In Relax NG, the <list> pattern describes such values. It splits the text on whitespace and matches each resulting token against the patterns inside it:

<element name="div">
    <text/>
    <attribute name="class">
        <list>
            <oneOrMore>
                <data type="NMTOKEN"/>
            </oneOrMore>
        </list>
    </attribute>
</element>
<element name="complex-number">
    <list>
        <data type="double"/>
        <data type="double"/>
    </list>
</element>
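
Both NMTOKEN and double are datatypes from the W3C XML Schema datatype library, so a schema that uses them has to name that library with a datatypeLibrary attribute on the <data> element or on one of its ancestors. As a minimal sketch, assuming complex-number is the root of its own standalone schema, the second definition could be written out in full like this:

```xml
<element name="complex-number"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
    <!-- exactly two whitespace-separated tokens, each a valid double -->
    <list>
        <data type="double"/>
        <data type="double"/>
    </list>
</element>
```

Under this sketch, <complex-number>3.2 4.5</complex-number> should be accepted, while <complex-number>3.2</complex-number> (one token) and <complex-number>3.2 4.5 1.0</complex-number> (three tokens) should be rejected.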

Referencing External Files

If you have a module that you want to use among many different markups, use the <externalRef> element. Let's say you have elements and attributes for mathematical formulas, and you want to include them all as one package. Put them into a file called formula.rng and say this:

<element name="math">
    <externalRef href="formula.rng"/>
</element>
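
The referenced file must itself contain a complete Relax NG pattern, and that pattern is used in place of the <externalRef>. These notes do not show formula.rng, so the following is a hypothetical version for illustration only; the variable element and the shape of the start pattern are assumptions:

```xml
<!-- formula.rng: hypothetical contents, for illustration only -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
    <start>
        <!-- this pattern is what stands in for the <externalRef> -->
        <mixed>
            <zeroOrMore>
                <ref name="variable"/>
            </zeroOrMore>
        </mixed>
    </start>

    <define name="variable">
        <element name="variable">
            <text/>
        </element>
    </define>
</grammar>
```

When the external file is a <grammar>, the referencing schema sees only what its <start> pattern matches; named definitions such as variable stay private to formula.rng.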

Namespaces

Please read pages 41-45 of Learning XML. Note: on page 45, the line of code that reads </myns:abstract> should simply read </abstract>.

DTDs are not inherently namespace-aware. You can make a direct declaration of an element (such as the one on page 44) with the prefix included, as in the following example, but that ties you down to a specific prefix.

<!ELEMENT eq:variable (#PCDATA)>

To avoid hard-coding the prefix, you can build every element name from parameter entities, which a document can then override in its internal subset:

<!-- prefix can be overridden in the internal subset of a
     schema document to establish a different namespace prefix -->
<!ENTITY % prefix 'eq:'>

<!-- if %prefix is defined (e.g. as foo:) then you must also define %suffix
     as the suffix for the appropriate namespace declaration (e.g. :foo) -->
<!ENTITY % suffix ':eq'>

<!ENTITY % nds 'xmlns%suffix;'>

<!-- Define all the element names, with optional prefix -->
<!ENTITY % formula "%prefix;formula">
<!ENTITY % variable "%prefix;variable">

<!ELEMENT %formula; (#PCDATA | %variable;)*>
<!ATTLIST %formula;
    %nds;   CDATA   #FIXED 'http://www.mathstuff.org'>
<!ELEMENT %variable; (#PCDATA)>
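
To see how the override works, here is a sketch of a document that swaps the eq: prefix for m:. It assumes the declarations above are saved as formula.dtd, a file name chosen here only for illustration. The first declaration of an entity is the one that binds, and the internal subset is read before the external file, so the two entities declared here take precedence over the defaults:

```xml
<?xml version="1.0"?>
<!DOCTYPE m:formula SYSTEM "formula.dtd" [
    <!-- read before formula.dtd, so these definitions are the ones used -->
    <!ENTITY % prefix 'm:'>
    <!ENTITY % suffix ':m'>
]>
<m:formula xmlns:m="http://www.mathstuff.org">
    <m:variable>P</m:variable> =
    <m:variable>m</m:variable>
</m:formula>
```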

While this technique does solve the problem for DTDs, it's not truly aware of namespaces; it just works by tacking on the appropriate prefixes. To be truly namespace-aware, a schema must connect element names to a namespace URI rather than to a prefix. This is what Relax NG does with the ns attribute. Here's the preceding example, written in Relax NG:

<element name="formula" ns="http://www.mathstuff.org">
    <interleave>
        <text />
        <zeroOrMore>
            <element name="variable"> <text/> </element>
        </zeroOrMore>
    </interleave>
</element>

The URI declared in the outer ns attribute is “inherited” by all the children of that element; that's why we didn't have to specify an ns attribute on the <element name="variable"> pattern.
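
To make the inheritance concrete, here is a sketch of the same pattern with ns written out on the inner element as well. It should accept exactly the same documents as the shorter version:

```xml
<element name="formula" ns="http://www.mathstuff.org">
    <interleave>
        <text/>
        <zeroOrMore>
            <!-- ns repeated explicitly instead of being inherited -->
            <element name="variable" ns="http://www.mathstuff.org">
                <text/>
            </element>
        </zeroOrMore>
    </interleave>
</element>
```

Conversely, writing ns="" on the inner element would require variable to be in no namespace at all.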

Once this is set up, the RNG will validate an XML file using any prefix, so long as its xmlns specification points to the proper URI. This will validate:

<math:formula xmlns:math="http://www.mathstuff.org">
    <math:variable>P</math:variable> =
    <math:variable>m</math:variable>
</math:formula>

This, however, will not validate. The different prefix is irrelevant, but its xmlns declaration points to a different URI:

<eq:formula xmlns:eq="http://www.mathworld.org">
    <eq:variable>P</eq:variable> =
    <eq:variable>m</eq:variable>
</eq:formula>
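
Since the schema looks only at the URI, a prefix is not needed at all. As a sketch, a document that declares http://www.mathstuff.org as its default namespace should also validate:

```xml
<!-- no prefix: both elements pick up the default namespace -->
<formula xmlns="http://www.mathstuff.org">
    <variable>P</variable> =
    <variable>m</variable>
</formula>
```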

Applying Namespaces

We can now apply this knowledge to the wrestling club database. Often, a club will have its website URL in the <info> element, and may wish to use <b> and <i> elements. While we could add these as part of the definition of our club database markup language, they are really HTML elements, and it is appropriate to use namespaces to mark them as such:

<club-database xmlns:html="http://www.w3.org/1999/xhtml">
<association id="SCVWA">
<club id="H25">
    <charter>2000</charter>
    <name>Gilroy Hawks</name>
    <location>Gilroy</location>
    <!-- [snip] -->
    <info>
        USA Wrestling card
        <html:b>required - <html:i>No Exceptions</html:i></html:b>.
        See <html:a href="http://www.someclub.com/">our website</html:a>
        for further details. 
    </info>
</club>
<!-- remainder of document -->

In the schema, the <info> element refers to a recursive definition named HTML:

<element name="info">
    <ref name="HTML"/>
</element>

<define name="HTML">
<interleave>
    <text/>
    <zeroOrMore ns="http://www.w3.org/1999/xhtml">
    <choice>
        <element name="b">
            <ref name="HTML"/>
        </element>
        <element name="i">
            <ref name="HTML"/>
        </element>
        <element name="a">
            <attribute name="href"/>
            <ref name="HTML"/>
        </element>
    </choice>
    </zeroOrMore>
</interleave>
</define>
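
An <interleave> of <text/> with other patterns is common enough that Relax NG provides a shorthand for it, <mixed>. As a sketch, the same definition written with that shorthand should be equivalent:

```xml
<define name="HTML">
<!-- <mixed> p </mixed> is short for <interleave> <text/> p </interleave> -->
<mixed>
    <zeroOrMore ns="http://www.w3.org/1999/xhtml">
    <choice>
        <element name="b">
            <ref name="HTML"/>
        </element>
        <element name="i">
            <ref name="HTML"/>
        </element>
        <element name="a">
            <attribute name="href"/>
            <ref name="HTML"/>
        </element>
    </choice>
    </zeroOrMore>
</mixed>
</define>
```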

Context Sensitivity

So far, so good. However, the current definition lets us nest an HTML <a> element within another <a> element. This is a meaningless construct, and should be invalid. Relax NG lets you provide several definitions for an element, and the context determines which one applies. We will change the definition of our HTML subset to say that an <a> element contains HTML without links; that new definition, HTML_without_link, redefines the <b> and <i> elements so that they cannot contain links either.

<define name="HTML">
<interleave>
    <text/>
    <zeroOrMore ns="http://www.w3.org/1999/xhtml">
    <choice>
        <element name="b">
            <ref name="HTML"/>
        </element>
        <element name="i">
            <ref name="HTML"/>
        </element>
        <element name="a">
            <attribute name="href"/>
            <ref name="HTML_without_link"/>
        </element>
    </choice>
    </zeroOrMore>
</interleave>
</define>

<define name="HTML_without_link">
<interleave>
    <text/>
    <zeroOrMore ns="http://www.w3.org/1999/xhtml">
    <choice>
        <element name="b">
            <ref name="HTML_without_link"/>
        </element>
        <element name="i">
            <ref name="HTML_without_link"/>
        </element>
    </choice>
    </zeroOrMore>
</interleave>
</define>
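
To check the effect of the two definitions, here are two sketched <info> fragments. The first should be accepted; the second should be rejected, because everything inside an <a> is matched against HTML_without_link, which has no <a> alternative at any depth:

```xml
<!-- accepted: <b> and <i> inside a link -->
<info xmlns:html="http://www.w3.org/1999/xhtml">
    See <html:a href="http://www.someclub.com/"><html:b>our
    <html:i>website</html:i></html:b></html:a> for details.
</info>

<!-- rejected: a second link nested inside the first, even via <b> -->
<info xmlns:html="http://www.w3.org/1999/xhtml">
    See <html:a href="http://www.someclub.com/"><html:b><html:a
    href="http://www.someclub.com/">our website</html:a></html:b></html:a>.
</info>
```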
