Markup comes from the bad old days before word processors. If you needed a brochure, you'd type it on a typewriter, and then literally mark it up with a red pen to tell the typesetter what you wanted it to look like. The typesetter would follow your instructions and return a finished document to you:
How to Buy a Wrench
There are two kinds of wrenches: wrenches with fixed size, and adjustable wrenches.
In this instance, we're using markup not only to show how text should be presented (italic versus normal text), but also to tell how the document is structured: some of the words form a heading, the other words are just ordinary text.
The idea of using markup to impose structure on otherwise anonymous data is such a good one that people came up with a standardized way to create markup languages for general use. This method was called the Standard Generalized Markup Language, or SGML. SGML isn't really a language in and of itself; it is more of a “rulebook” that tells you how to develop markup languages. Any markup language that follows the SGML rulebook is called an application of SGML.
The most widely known application of SGML is a language used to mark up text for delivery and presentation on the World Wide Web. That language is HTML, the HyperText Markup Language. In HTML, we can mark up the example above to send to a web browser instead of a typesetter:
<h3>How to Buy a Wrench</h3> <p> There are two kinds of wrenches: wrenches with fixed size, and <i>adjustable</i> wrenches. </p>
There are many other applications of SGML, but they're mostly found in large corporations and government agencies. That's because the SGML rulebook is very complex, which makes it hard to learn. For example, SGML allows optional opening and closing tags. Quick: is </li> required or not? How about <body>? Additionally, it's difficult (and expensive!) to develop tools that can manage data that's marked up according to those rules.
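To make the optional-tag problem concrete, here is a minimal sketch in Python using the standard library's html.parser module. In HTML 4 the closing </li> tag may be omitted, so the list below is legal even though neither list item is ever closed. The class name TagLogger and the helper tag_events are made up for this illustration. The tokenizer reports only the tags that are physically present, which means any tool that wants the real structure has to know the rulebook well enough to work out where each item ends.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start and end tag that appears in the markup."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the list of (kind, tag) events found in the markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Legal HTML 4: both closing </li> tags have been left out.
events = tag_events("<ul><li>fixed<li>adjustable</ul>")
for kind, tag in events:
    print(kind, tag)
```

Two list items are opened and none is closed, yet a browser still displays a two-item list. Supporting that kind of inference is part of what makes SGML tools hard to write.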
While HTML is a good thing, it doesn't solve all our problems. Consider the following two tables. While the data is structured into rows and cells, there's nothing to tell you (other than your intuition) that the first table gives maximum and minimum temperatures, while the second table gives current and maximum capacities for water reservoirs.
<table border="1"> <tr> <td>Chicago</td><td>13</td><td>6</td> </tr> <tr> <td>Dallas</td><td>60</td><td>20</td> </tr> </table>
<table border="1"> <tr> <td>Calero</td><td>5538</td><td>10050</td> </tr> <tr> <td>Uvas</td><td>6095</td><td>9935</td> </tr> </table>
To solve the complexity issue, XML was designed as a subset of SGML. It eliminates the features that make SGML difficult to learn and parse while retaining 90% of the power of SGML. Tools that analyze and display XML are easier to write, and are widespread and inexpensive. Since XML is a subset of SGML, it lets you devise any set of tags you wish, thus solving the problem of differentiating what would otherwise be anonymous numbers:
<temperatures> <city name="Chicago"> <max>13</max><min>6</min> </city> <city name="Dallas"> <max>60</max><min>20</min> </city> </temperatures>
<water-banks> <reservoir name="Calero"> <current>5538</current><capacity>10050</capacity> </reservoir> <reservoir name="Uvas"> <current>6095</current><capacity>9935</capacity> </reservoir> </water-banks>
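Because the tags now name the data, a program can ask for values by meaning rather than by position in a table row. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on the temperature document; the function name read_temperatures is made up for this illustration.

```python
import xml.etree.ElementTree as ET

TEMPERATURES = """
<temperatures>
  <city name="Chicago"> <max>13</max><min>6</min> </city>
  <city name="Dallas"> <max>60</max><min>20</min> </city>
</temperatures>
"""


def read_temperatures(xml_text):
    """Return {city name: (max, min)} from a <temperatures> document."""
    root = ET.fromstring(xml_text.strip())
    readings = {}
    for city in root.findall("city"):
        # The element names say which number is which.
        high = int(city.findtext("max"))
        low = int(city.findtext("min"))
        readings[city.get("name")] = (high, low)
    return readings


readings = read_temperatures(TEMPERATURES)
print(readings)  # {'Chicago': (13, 6), 'Dallas': (60, 20)}
```

The same lookup against the HTML table would have to rely on the second cell being the maximum and the third the minimum, which nothing in the markup actually says.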
Please read pages 31-40 in Learning XML.
Consider the following example:
<p>Here is some <b>important</b> and <i>useful</i> information.</p>
The <p> element is the parent of five children: the text “Here is some ”, the <b> element, the text “ and ”, the <i> element, and the text “ information.”
Each of these children is the sibling of the other children. Note that the <b> and <i> elements also have children: the text “important” and the text “useful”, respectively.
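The family relationships can be checked with a short sketch using Python's standard xml.dom.minidom module, which represents runs of text as child nodes in their own right. The helper name describe is made up for this illustration.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<p>Here is some <b>important</b> and <i>useful</i> information.</p>"
)
p = doc.documentElement  # the <p> element


def describe(node):
    """Label a node as a piece of text or as a named element."""
    if node.nodeType == node.TEXT_NODE:
        return ("text", node.data)
    return ("element", node.tagName)


# The five children of <p>, in document order.
children = [describe(child) for child in p.childNodes]
for child in children:
    print(child)

# Siblings: the node right after <b> is the text " and ".
b = p.childNodes[1]
print(describe(b.nextSibling))

# <b> is itself a parent: its only child is the text "important".
print(describe(b.firstChild))
```

Note that the text between the elements counts as children too, which is why <p> has five children rather than two.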
Please read pages 53-58 in Learning XML.