This document was updated on 3 July 2002. Updates are marked with a red bar at the side, as this paragraph is marked.
The objects of this assignment are:
The following HTML file is not well-formed. You must convert it to well-formed XHTML. You may not do so by eliminating all the tags and keeping only the text! The resulting file must look the same in a web browser as the original file.
Copy and paste the text below into a file. Name the file lastname_initial_1a.html. For example, if your name is “Frank Smith,” the file would have a name like smith_f_1a.html.
<html>
<head>
<title>
Exercise 1a
</title>
</head>
<body bgcolor=white>
<div align="center">
<h2>Exercise 1a</h2>
<img src=xml_logo.png border=1 width=100 height=40 alt="XML Logo">
</div>
<p>
The goal of this exercise is to convert a non-well-formed HTML
file to a file that will be <b><i>well-formed</b></i> with respect
to the rules of XML.
</p>
<hr width="50%" noshade>
<p>
Quick quiz: XML is...
</p>
<ol type="A">
<li> A format for structuring data.
<li> A three-letter acronym.
<li> A really cool technology.
<li> All of the above.
</OL>
<!-- The correct answer is D -- as if you didn't know that already. -->
</body>
</html>
Click this link to view the image that the HTML page uses. Then right-click the image; a pop-up menu will let you save the image on your disk. (You don't have to include that file in the results that you email to me.)
If you haven't already created the Windows batch file for the wellformed checker (or the Unix shell file), go to the batch files page (or shell files page) and follow the instructions to create the wellformed.bat or wellformed.sh file. You should put this file in the same directory as your HTML file.
Continuing the example, Frank Smith, working on Windows, would test his file by typing this at the MS-DOS command prompt:
wellformed smith_f_1a.html
You may object, “But that's an HTML file. Don't I have to name it smith_f_1a.xml before I can check it?” No, you don't. The wellformed checker will parse any file that you give it, no matter what its name. Unlike most other applications, the XML tools that we are using don't care about the file name. They care only about what is inside the file.
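To make the point about file names concrete, here is a minimal sketch in Python. It is not the wellformed checker used in this course; the helper name is_well_formed and the sample file names are made up for illustration, and the parser is Python's built-in expat binding rather than the tool behind the batch file. The check reads the bytes of the file and hands them to the parser, so the extension never enters into it.

```python
import os
import tempfile
import xml.parsers.expat


def is_well_formed(path):
    """Return True if the file at `path` parses as well-formed XML.

    Hypothetical helper for illustration: it looks only at the bytes
    in the file and never inspects the file name or its extension.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    # The same well-formed content saved under three different names.
    content = b"<html><head><title>Test</title></head><body><p>Hi</p></body></html>"
    with tempfile.TemporaryDirectory() as folder:
        for name in ("sample.html", "sample.xml", "sample.txt"):
            path = os.path.join(folder, name)
            with open(path, "wb") as handle:
                handle.write(content)
            print(name, is_well_formed(path))
```

All three files get the same verdict, because the verdict depends only on the content.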
The following XML file is not well-formed. You must convert it to well-formed XML. Since you've never seen this markup language before (it's a custom one that was designed for this particular exercise), you may be wondering how to be sure that you've made the corrections the “right way.” I want you to have that unsettled feeling; it leads into the next topic that we'll take up.
For this exercise, modify the tags in any way that you feel is reasonable so that the end result is a well-formed document.
Copy and paste the text below into a file. Name the file lastname_initial_1b.xml. For example, if your name is “Frank Smith,” the file would have a name like smith_f_1b.xml.
<?xml version="1.0"?>
<catalog>
<company>Office Magick</company>
<department name="Office Supplies" code=235>
    <item>
        <name>Stapler</name>
        <manufacturer>Bostich</manufacturer>
        <color-list>
            <color sku="S367-B" hex=#000000>black
            <color sku="S367-PY"hex=#ffffcc>pastel yellow
        </color-list>
        <price amt="8.95">
        <summary>Heavy-duty office stapler</summary>
        <description>
        This stapler has a 30-sheet capacity and a lever
        action for accurate placement.
        </description>
    </item>
    <item>
        <!-- On backorder -- cannot restock. -->
        <name>Notebook</name>
        <color-list>
            <color sku="NB1783-G"hex="#00ff00">Green</color>
            <color sku="NB1783-Y" hex="#ffff00">Yellow</color>
            <color hex="#ff0000" sku="NB1783-R">Red</color>
        </color-list>
    </item>
<department code="240">
    Computer Peripherals
    <item>
        <name>Mouse</name>
        <color-list>
            <color sku="M-0115-LG" hex="#cccccc">Light Gray
            <color hex="#ffffff">White
        </color>
        <price units="USD" />10.95</price>
    </item>
</department>
</catalog>
Windows users may use the wellformed.bat file to check whether their files are well-formed. You will run this batch file from an MS-DOS prompt.
Unix/Linux users may use the wellformed.sh shell script. Run it from a shell prompt.
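Note: XML parsers stop at the first well-formedness error that they encounter, so fix that error and run the check again to find the next one. Parsers also parse as much as they can and generate an error only when it is impossible to proceed, which means the reported position can be well past the place where the mistake was actually made.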
In the following example, because it's possible to nest one <ul> inside another, the parser can't know that you haven't closed the first <ul> element until it hits the final </div> closing tag. This XHTML, with line numbers for reference:
 1 <div align="left">
 2 <ul>
 3 <li>List 1, item 1</li>
 4 <li>List 1, item 2</li>
 5
 6 <ul>
 7 <li>List 2, item 1</li>
 8 <li>List 2, item 2</li>
 9 </ul>
10 </div>
generates this error (the 10:6 means line ten, character number six):
[Fatal Error] testfile.html:10:6: The element type "ul" must be terminated by the matching end-tag "</ul>".
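The same behavior can be reproduced with a small Python sketch. This is an illustration under assumptions, not the course's checker: it uses Python's built-in expat parser, the helper name first_error is made up, and the wording and column of the message will differ from the output shown above. What carries over is that the parser reports only the first error, and reports it at the closing </div> on line 10 rather than at the unclosed <ul> on line 2.

```python
import xml.parsers.expat

# The XHTML fragment from the example, without the reference numbers.
LINES = [
    '<div align="left">',
    '<ul>',
    '<li>List 1, item 1</li>',
    '<li>List 1, item 2</li>',
    '',
    '<ul>',
    '<li>List 2, item 1</li>',
    '<li>List 2, item 2</li>',
    '</ul>',
    '</div>',
]


def first_error(text):
    """Return (line, column, message) for the first well-formedness
    error in `text`, or None if the text is well-formed.

    Hypothetical helper for illustration. expat counts lines from 1
    and columns from 0.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


if __name__ == "__main__":
    # Prints a tuple whose first element is the line of the first error.
    print(first_error("\n".join(LINES)))
```

Once the first error is fixed, running the check again reveals the next one, if any. That fix-and-recheck loop is the normal way to work through a file with several problems.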
You will have two files. The first one, lastname_initial_1a.html, will contain well-formed XHTML. The second one, lastname_initial_1b.xml, will contain well-formed XML. Send them as attachments to my email address. If you wish, you can create a .ZIP file and send that as an attachment.