XML




Section 1: Introduction

XML, or eXtensible Markup Language, is a widely used language for data representation.

It is extensible because you can add data and information without disrupting existing functionality.

It can be implemented on many platforms with very little effort and provides the rules that facilitate the development of industry- and application-specific markup languages.

XML serves the same purpose that grammar and punctuation rules would serve in an essay.

XML is similar to other markup languages in that it has opening and closing tags, attributes, and values.

XML differs from other markup languages because its tags and attributes are not predefined: you define the ones your data needs.

The most crucial difference between XML and earlier markup languages such as HTML is that its tags assign meaning to the content being marked up rather than specifying how that content is displayed.

XML was designed to store and send information.

Components of a web page

Page structure is specified in HTML, and CSS is used to control presentation. XML files can be used to provide the content for web pages. JavaScript can be used to read the data from the XML file and build the resulting page. An example of this functionality can be seen in this page. When the page loads, a script runs that reads the data from the XML file and writes it to the screen. We'll see another version of this in a later section.

Another example demonstrates the survey simulation example from the array notes using data from an XML file rather than randomly generated data: link

Section 2: Structure

XML follows a family structure with parent, child, and sibling relationships:

XML Document Structure

The basic structure of XML looks something like this:

<root>
  <child>
    <subchild>...</subchild>
  </child>
</root>

Well Formed XML File

Valid XML File

The following is an example of XML code:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Students</to>
  <from>Dr. Parker</from>
  <subject>Reminder</subject>
  <body>Don't forget to study hard for the final!</body>
</note>

This note has a recipient, sender, subject, and body. It's nothing more than pure information wrapped in XML tags. Notice how you can tell exactly what the code is describing based on the tags alone. This is what is meant by XML being a self-descriptive language.
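To see how a program might read this note, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The names NOTE_XML and fields are illustrative only and are not part of the course files; the declaration uses version 1.0, which is the version most parsers expect.

```python
import xml.etree.ElementTree as ET

# The note example, stored as bytes so the encoding declaration is honored.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Students</to>
  <from>Dr. Parker</from>
  <subject>Reminder</subject>
  <body>Don't forget to study hard for the final!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# Each child's tag names the piece of data it holds.
fields = {child.tag: child.text for child in root}

for tag, text in fields.items():
    print(f"{tag}: {text}")
```

Because the tags describe the data, the program can look values up by name instead of by position.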

Using the example, we'll work through the code line by line:

Section 3: Syntax

XML syntax is very strict, but very simple.



All XML Elements Must Have a Closing Tag

In HTML, the following code is legal:

<p>This is a paragraph.
<p>This is another one.

In XML, every element must have a closing tag. The same code would look like this:

<p>This is a paragraph.</p>
<p>This is another one.</p>



XML Tags are Case Sensitive

With XML, the tag <Letter> is different from the tag <letter>. You must make sure that opening and closing tags are written in the same case.

<Body>This is incorrect.</body>
<body>This is correct.</body>



All XML Tags Must be Properly Nested

In HTML some elements can be improperly nested within each other like this:

<b><i>This text is bold and italic.</b></i>

In XML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic.</i></b>



XML Documents Must Have a Root Element

All other elements must be within this root element. All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:

<root>
  <child>
    <subchild>...</subchild>
  </child>
</root>



XML Attribute Values Must be Quoted

XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. The following example is incorrect:

<note date=1/1/2011>
...
</note>

This is correct:

<note date="1/1/2011">
...
</note>
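The rules covered so far can be checked by handing small snippets to a parser. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample labels are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One snippet per rule, plus one snippet that follows all of them.
samples = {
    "missing closing tag": "<p>This is a paragraph.",
    "mismatched case": "<Body>This is incorrect.</body>",
    "improper nesting": "<b><i>Bold and italic.</b></i>",
    "two root elements": "<p>One.</p><p>Two.</p>",
    "unquoted attribute": "<note date=1/1/2011>...</note>",
    "all rules followed": '<note date="1/1/2011"><body>ok</body></note>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'rejected'}")
```

Only the last sample should be accepted; each of the others breaks exactly one of the rules above.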



White Space is Preserved in XML

In XML, white space is preserved. Here is an example of some HTML code with white space:

<p>This sentence has      white space in it. It won't show up in the browser.</p>

In the browser, the sentence would look like this:

This sentence has white space in it. It won't show up in the browser.

This is because HTML collapses multiple consecutive spaces into a single space when the page is displayed. In XML, the gap between the words "has" and "white" would be preserved.
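A quick way to confirm this is to parse the sentence and compare the result with the original text. This is a small sketch using Python's standard xml.etree.ElementTree module; SENTENCE and preserved are illustrative names.

```python
import xml.etree.ElementTree as ET

SENTENCE = "This sentence has      white space in it."

element = ET.fromstring(f"<p>{SENTENCE}</p>")

# The parsed text is compared character for character with the original.
preserved = element.text == SENTENCE
print(repr(element.text), preserved)
```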



Miscellaneous XML Syntax Rules

A comment in XML looks identical to an HTML comment. It looks like this:

<!-- Remember to comment confusing code, or the grader will get you. -->

One last note - in XML a CR/LF is converted to LF only. If you're unsure of what this means, you probably don't need to worry about it.
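Both points can be observed with a short experiment. The sketch below assumes Python's standard xml.etree.ElementTree module with its default settings; the sample document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# "\r\n" is a CR/LF pair, the line ending many Windows editors write.
document = "<body><!-- a comment -->line one\r\nline two</body>"
body = ET.fromstring(document)

# With default settings the comment is skipped, and the line break
# arrives as a single "\n".
print(repr(body.text))
```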

Section 4: Elements

XML documents can be extended to carry more information at any time. Referring to the note example we've been using, we could decide to add tags such as <date>, <header>, etc. Adding these tags won't break the XML application. As long as it's still able to locate the <to>, <from>, <subject>, and <body> tags, it will continue to function. In this way, XML is an extensible language.
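Here is a minimal sketch of that idea in Python, using the standard xml.etree.ElementTree module. The function summarize stands in for "the XML application" and is hypothetical; the note text is shortened for space.

```python
import xml.etree.ElementTree as ET

def summarize(note_xml: str) -> str:
    """Build a one-line summary from the four tags the application relies on."""
    note = ET.fromstring(note_xml)
    return (f"{note.findtext('from')} to {note.findtext('to')}: "
            f"{note.findtext('subject')} - {note.findtext('body')}")

original = """<note>
  <to>Students</to><from>Dr. Parker</from>
  <subject>Reminder</subject><body>Study hard!</body>
</note>"""

# The same note with a <date> element added; summarize() never asks for it.
extended = original.replace("<note>", "<note><date>1/1/2011</date>", 1)

print(summarize(original))
print(summarize(extended))
```

Because the function looks elements up by name, the extra element is simply ignored and both calls produce the same summary.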

To understand XML terminology, you have to know how relationships between XML elements are named, and how element content is described. Elements are related as parents, siblings, and children. Examine the code below:

<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>

  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>

  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>

In this case, <book> is the root element of this document. The elements <title>, <prod>, and <chapter> are sibling elements because they have the same parent element (<book>). They are also child elements of <book>.
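These relationships map directly onto how a parser exposes the document. The sketch below walks the book example with Python's standard xml.etree.ElementTree module; BOOK_XML and the other variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""

book = ET.fromstring(BOOK_XML)

# Iterating over an element yields its children, which are siblings of each other.
children = [child.tag for child in book]
print(book.tag, "->", children)

# Attributes are exposed as a dictionary on the element.
print(book.find("prod").attrib)

# Each <chapter> is in turn the parent of its <para> elements.
for chapter in book.findall("chapter"):
    paras = [para.text for para in chapter.findall("para")]
    print(chapter.text.strip(), "->", paras)
```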

XML elements must follow these naming rules:

Naming rules stipulate that you cannot use spaces. The characters "-" and "." are allowed in names, but it is best to avoid them. For example, if you name something "first-name", it could be a mess if your software tries to subtract name from first. Or if you name something "first.name", your software may think that "name" is a property of the object "first". It's a better idea to use "_" as a separator. Also, use common sense when naming your elements. The element <book_title> will always beat <the_title_of_the_book>.
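The difference between a forbidden name and a merely discouraged one shows up when the names are given to a parser. This is a small sketch using Python's standard xml.etree.ElementTree module; the helper parses and the element names are invented for the example.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

space_in_name = parses("<first name>Ada</first name>")    # space: not allowed
hyphen_in_name = parses("<first-name>Ada</first-name>")   # accepted, but discouraged
underscore_name = parses("<first_name>Ada</first_name>")  # the suggested style

print(space_in_name, hyphen_in_name, underscore_name)
```

The parser rejects only the name containing a space; the hyphenated name is accepted, so avoiding it is a matter of style and of keeping other software out of trouble.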

Section 5: Viewing an XML file

Many browsers (Firefox and Internet Explorer, for example) use a built-in style sheet to format the data in files with a .xml extension.

Section 6: DTDs and schemas

Because XML is intended to share information among many applications, it is important that each application understands the information being shared. Schemas and DTDs provide the ground rules for communicating.

For example, the date 02/04/09 could be interpreted as April 2, 2009 or February 4, 2009. A DTD or schema will define how the date should be interpreted.
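The ambiguity is easy to reproduce. In this short Python sketch the same string is read with two different format patterns; the variable names are illustrative, and the patterns stand in for the kind of agreement a DTD or schema would record.

```python
from datetime import datetime

raw = "02/04/09"

# Two equally plausible readings of the same characters.
month_first = datetime.strptime(raw, "%m/%d/%y").date()
day_first = datetime.strptime(raw, "%d/%m/%y").date()

print(month_first.isoformat())  # 2009-02-04 (February 4, 2009)
print(day_first.isoformat())    # 2009-04-02 (April 2, 2009)
```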

A DTD sets the rules for your XML document and consists of the following:

A schema, though more robust, serves the same purpose as a DTD. You will see them as .xsd (XML schema definition) files.

Section 7: Validation

XML documents can be described as well formed and valid. A well formed XML document is one that adheres to the XML syntax rules found above.

An XML document can optionally reference a Document Type Definition (DTD) or a schema (preferred choice) that defines the proper structure of the XML document. Parsers can read the DTD/schema and check that the XML document follows the structure defined by the DTD/schema. A valid XML document is a well formed XML document that conforms to the rules of the DTD/schema. The purpose of a DTD/schema is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. For much more information about DTDs, visit the w3schools.com tutorial found here.

XML Schema is an XML-based alternative to a DTD. Again, it's worth reading the w3schools.com tutorial found here.

DTDs and schemas are essential for business-to-business (B2B) transactions and mission-critical systems. Validating XML documents ensures that disparate systems can manipulate data structured in standardized ways and prevents errors caused by missing or malformed data.

An XML document is not required to reference a DTD/schema, but validating XML parsers can use a DTD/schema to ensure that the document has the proper structure.

Validating an XML document helps guarantee that independent developers will exchange data in a standardized form that conforms to the DTD/schema.
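To make the distinction between well formed and valid concrete, here is a rough sketch in Python. It is not real DTD or schema validation: the standard xml.etree.ElementTree module only checks well-formedness, so the list REQUIRED_CHILDREN is a hypothetical, hand-written stand-in for the rules a DTD/schema would supply, and the function check is invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule standing in for a DTD/schema: a <note> must contain
# exactly these children, in this order.
REQUIRED_CHILDREN = ["to", "from", "subject", "body"]

def check(note_xml: str) -> str:
    """Classify a document by how far it gets through the two checks."""
    try:
        note = ET.fromstring(note_xml)
    except ET.ParseError:
        return "not well formed"
    tags = [child.tag for child in note]
    if note.tag == "note" and tags == REQUIRED_CHILDREN:
        return "valid"
    return "well formed but not valid"

complete = "<note><to>a</to><from>b</from><subject>c</subject><body>d</body></note>"
no_body = "<note><to>a</to><from>b</from><subject>c</subject></note>"
broken = "<note><to>a</from></note>"

print(check(complete), "|", check(no_body), "|", check(broken))
```

A validating parser does the same kind of comparison, but reads the rules from the DTD/schema instead of from a list in the program.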

W3C Markup Validation Service

Internet Explorer Tools for Validating XML and Viewing XSLT Output

XML and XSLT

Section 8: XML Coding

Details for curious students only....

To learn how to create a webpage from data stored in an XML document, proceed to the XML Coding notes.

Section 9: XBRL

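XBRL, or eXtensible Business Reporting Language, is an XML-based standard for financial reporting. In the United States, the SEC requires publicly traded companies to file their financial statements in XBRL, so you can obtain financial reports online in a form that software can read directly.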

Section 10: Resources

XML is an enormous topic, so there's a ton of material out there to explore.

Files to play with (remember to view source) by Lee Griffin