Section 1: Introduction
XML, or eXtensible Markup Language, is a widely used language for data representation.
It is extensible because you can add data and information without disrupting existing functionality.
It can be implemented on many platforms with very little effort and provides the rules that facilitate the development of industry- and application-specific markup languages.
- These specialized XML languages allow for the transmission of data between users regardless of the platform, communication channel, operating system, or software application being used.
XML serves the same purpose that grammar and punctuation rules would serve in an essay.
- XML provides the structure that makes the data well presented and understandable.
XML is similar to other markup languages in that it has opening and closing tags, attributes, and attribute values.
XML differs from other markup languages because its tags and attributes are not necessarily predefined.
- New tags can be added as needed.
- XML is (or should be) self-explanatory and gives structure to presented information in easily interpreted and stored forms.
The most crucial difference between XML and other pre-existing markup languages is that it uses tags to assign meaning to the actual content being marked up.
- This allows applications to be developed that truly separate the data content from the formatting and structure aspects of the document.
XML was designed to store and send information.
- Whereas HTML is used to structure information display, XML is used to describe information.
- XML does not actually do anything on its own, but rather is used as a complement to languages such as HTML, JavaScript, and PHP.
- It is also used in the Office Open XML Formats and by major companies like Salesforce.

Page structure is specified in HTML, and CSS is used to control presentation. XML files can be used to provide the content for web pages. JavaScript can be used to read the data from the XML file and build the resulting page. An example of this functionality can be seen in this page. When the page loads, a script runs that reads the data from the XML file and echoes it on the screen. We'll see another version of this in a later section.
Another example demonstrates the survey simulation example from the array notes using data from an XML file rather than randomly generated data: link
Section 2: Structure
XML follows a family structure with parent, child, and sibling relationships:

The basic structure of XML looks something like this:
<root>
  <child>
    <subchild>...</subchild>
  </child>
</root>
Well Formed XML File
- One root element
- Proper Nesting
- Well-formed Entities
Valid XML File
- Well-formed
- Must have an inline or external DTD (Document Type Definition) or schema
- Must follow the rules of the DTD or schema
- You can validate your XML here
The following is an example of XML code:
<?xml version="1.1" encoding="ISO-8859-1"?>
<note>
  <to>Students</to>
  <from>Dr. Parker</from>
  <subject>Reminder</subject>
  <body>Don't forget to study hard for the final!</body>
</note>
This note has a recipient, sender, subject, and body. It's nothing more than pure information wrapped in XML tags. Notice how you can tell exactly what the code is describing based on the tags alone. This is what is meant by XML being a self-descriptive language.
Using the example, we'll work through the code line by line:
- Line 1 is the XML declaration. It defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.1 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set. Documents should include the XML declaration to identify the version of XML used. A document that lacks an XML declaration is treated as XML 1.0, so a document that relies on XML 1.1 features must declare its version or errors could result. Note that XML 1.1 is the most recent version, but XML 1.0 remains far more widely used and supported.
- Line 2 is the <note> element, the root element of the document. It's equivalent to saying, "this document is a note".
- Lines 3 - 6 contain child elements of the <note> element (to, from, subject, and body).
- Line 7 defines the end of the <note> element.
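As a minimal sketch of how a program might read this note, the Python below parses the same elements with the standard library's xml.etree.ElementTree module. The function name read_note and the variable NOTE_XML are illustrative choices, not part of the original notes, and the XML declaration is left out for simplicity.

```python
import xml.etree.ElementTree as ET

# The note document from this section, without the XML declaration
# (a document with no declaration is treated as XML 1.0).
NOTE_XML = """<note>
<to>Students</to>
<from>Dr. Parker</from>
<subject>Reminder</subject>
<body>Don't forget to study hard for the final!</body>
</note>"""


def read_note(xml_text):
    """Return the root element's children as a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


note = read_note(NOTE_XML)
print(note["to"])       # Students
print(note["subject"])  # Reminder
```

Because the tags describe the data, the program can ask for "to" or "subject" by name instead of relying on the position of each value.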
Section 3: Syntax
XML syntax is very strict, but very simple.
All XML Elements Must Have a Closing Tag
In HTML, the following code is legal:
<p>This is a paragraph.
<p>This is another one.
In XML, every element must have a closing tag. The same code would look like this:
<p>This is a paragraph.</p>
<p>This is another one.</p>
XML Tags are Case Sensitive
With XML, the tag <Letter> is different from the tag <letter>. You must make sure that opening and closing tags are written in the same case.
<Body>This is incorrect.</body>
<body>This is correct.</body>
All XML Tags Must be Properly Nested
In HTML some elements can be improperly nested within each other like this:
<b><i>This text is bold and italic.</b></i>
In XML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic.</i></b>
XML Documents Must Have a Root Element
All other elements must be within this root element. All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:
<root>
  <child>
    <subchild>...</subchild>
  </child>
</root>
XML Attribute Values Must be Quoted
XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. The following example is incorrect:
<note date=1/1/2011>
...
</note>
This is correct:
<note date="1/1/2011">
...
</note>
White Space is Preserved in XML
In XML, white space is preserved. Here is an example of some HTML code with white space:
<p>This sentence has          white space in it. It won't show up in the browser.</p>
In the browser, the sentence would look like this:
This sentence has white space in it. It won't show up in the browser.
This is because HTML collapses multiple consecutive spaces into a single space. In XML, the gap between the words "has" and "white" would be preserved.
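The difference can be observed with a parser. The sketch below, which assumes Python's standard xml.etree.ElementTree module, parses a sentence containing a run of spaces and shows that the run survives; the collapsed variable only imitates what an HTML renderer would display.

```python
import xml.etree.ElementTree as ET

# Six consecutive spaces sit between "has" and "white".
source = "<p>This sentence has      white space in it.</p>"

# The XML parser hands back the text exactly as written.
text = ET.fromstring(source).text
print(repr(text))

# Imitate HTML rendering for comparison: runs of white space become one space.
collapsed = " ".join(text.split())
print(repr(collapsed))
```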
Miscellaneous XML Syntax Rules
A comment in XML looks identical to an HTML comment. It looks like this:
<!-- Remember to comment confusing code, or the grader will get you. -->
One last note - in XML a CR/LF is converted to LF only. If you're unsure of what this means, you probably don't need to worry about it.
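The syntax rules in this section can be checked mechanically, because an XML parser refuses any document that breaks them. The following sketch assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample labels are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One sample per rule from this section, plus a correct document.
samples = {
    "missing closing tag": "<p>This is a paragraph.",
    "mismatched case": "<Body>This is incorrect.</body>",
    "improper nesting": "<b><i>This text is bold and italic.</b></i>",
    "unquoted attribute": "<note date=1/1/2011>...</note>",
    "two root elements": "<p>One</p><p>Two</p>",
    "well-formed": '<note date="1/1/2011">...</note>',
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the last sample is accepted; every other sample violates exactly one of the rules above.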
Section 4: Elements
XML documents can be extended to carry more information at any time. Referring to the note example we've been using, we could decide to add tags such as <date>, <header>, etc. Adding these tags won't break the XML application. As long as it is still able to locate the <to>, <from>, <subject>, and <body> tags, it will continue to function. In this way, XML is an extensible language.
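To make the extensibility point concrete, here is a small sketch (Python, standard xml.etree.ElementTree) of an application that only looks up the four tags it knows about. The function name summarize and the shortened body text are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

KNOWN_FIELDS = ("to", "from", "subject", "body")


def summarize(xml_text):
    """Look up only the elements this application knows about, by name."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name) for name in KNOWN_FIELDS}


original = (
    "<note><to>Students</to><from>Dr. Parker</from>"
    "<subject>Reminder</subject><body>Study hard!</body></note>"
)

# The same note with two extra elements the application has never seen.
extended = (
    "<note><date>1/1/2011</date><header>Final exam</header>"
    "<to>Students</to><from>Dr. Parker</from>"
    "<subject>Reminder</subject><body>Study hard!</body></note>"
)

print(summarize(original) == summarize(extended))  # True
```

Because the lookup is by tag name rather than by position, the added <date> and <header> elements are simply ignored.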
To understand XML terminology, you have to know how relationships between XML elements are named, and how element content is described. Elements are related as parents, siblings, and children. Examine the code below:
<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
In this case, <book> is the root element of this document. The elements <title>, <prod>, and <chapter> are sibling elements because they have the same parent element (<book>). They are also child elements of <book>.
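These relationships can also be explored in code. The sketch below assumes Python's standard xml.etree.ElementTree module, whose elements do not keep a reference to their parent, so a child-to-parent map (here called parent_of, an illustrative name) is built first.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""

root = ET.fromstring(BOOK_XML)

# Map every element to its parent by walking the whole tree once.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Children of the root element.
children = [child.tag for child in root]
print(root.tag, children)  # book ['title', 'prod', 'chapter', 'chapter']

# Siblings of <title>: the other children of its parent.
title = root.find("title")
siblings = [el.tag for el in parent_of[title] if el is not title]
print(siblings)  # ['prod', 'chapter', 'chapter']

# Attributes are available as a dictionary.
prod = root.find("prod")
print(prod.attrib)  # {'id': '33-657', 'media': 'paper'}
```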
XML elements must follow these naming rules:
- Names can contain letters, numbers, and other characters
- Names must start with a letter or underscore, not a number or other punctuation character
- Names must not start with the letters xml (or XML, or Xml, etc.)
- Names cannot contain spaces
Naming rules stipulate that you cannot use spaces. The characters "-" and "." are allowed in names, but it is best to avoid them as well. For example, if you name something "first-name," it could be a mess if your software tries to subtract name from first. Or if you name something "first.name," your software may think that "name" is a property of the object "first." It's a better idea to use "_" as a separator. Also, use common sense when naming your elements. The element <book_title> will always beat <the_title_of_the_book>.
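The naming rules above can be approximated with a short check. This is a simplified, ASCII-only sketch in Python: the real XML specification also permits many non-English letters and the colon, and the names NAME_PATTERN and is_valid_name are illustrative assumptions.

```python
import re

# Simplified rule: start with a letter or underscore, then letters,
# digits, underscores, hyphens, or periods. ASCII only.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_name(name):
    """Apply the naming rules from this section to a candidate element name."""
    if name.lower().startswith("xml"):
        return False  # names beginning with xml (in any case) are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["book_title", "2nd_title", "xmlData", "book title", "first-name"]:
    print(candidate, is_valid_name(candidate))
```

Note that "first-name" passes the check: it is a legal name, just one that is better avoided for the reasons given above.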
Section 5: Viewing an XML file
Many browsers (Firefox and Internet Explorer, for example) use a built-in style sheet to format the data in files with a .xml extension.
- A "-" sign indicates that all child elements are being displayed
- Clicking the "-" sign collapses the container element and hides all the children
- Clicking the "+" sign expands the container element and shows all the children
Section 6: DTDs and schemas
Because XML is intended to share information among many applications, it is important that each application understands the information being shared. Schemas and DTDs provide the ground rules for communicating.
For example, the date 02/04/09 could be interpreted as April 2, 2009 or February 4, 2009. A DTD or schema will define how the date should be interpreted.
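The ambiguity is easy to reproduce. In this short Python sketch the same string is read with two different format assumptions, month first and day first; the variable names are illustrative.

```python
from datetime import datetime

raw = "02/04/09"

# Interpretation 1: month/day/year
as_month_first = datetime.strptime(raw, "%m/%d/%y").date()

# Interpretation 2: day/month/year
as_day_first = datetime.strptime(raw, "%d/%m/%y").date()

print(as_month_first)  # 2009-02-04
print(as_day_first)    # 2009-04-02
```

Both calls succeed, which is exactly the problem: without an agreed rule, neither application can tell that the other one read the date differently.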
A DTD sets the rules for your XML document and is made up of the following:
- Elements - Declared by <!ELEMENT element-name category> or <!ELEMENT element-name (element-content)>
  - Some elements you are already familiar with from HTML are <html>, <b>, etc. When one element contains another, it is a compound element.
- Entities - Named pieces of data (which may come from databases, scripts, applications, etc.) that are substituted wherever the entity is referenced. They can be internal (text typed within the document itself) or external (content stored outside the document, such as a style sheet).
  - Internal = <!ENTITY name "text">
  - External = <!ENTITY name SYSTEM "URI"> (the keyword SYSTEM lets the parser know it is an external entity)
- Attributes - Additional information within elements (<a href="...">). Possible attributes must be included in the DTD.
  - Declared by <!ATTLIST element_name attribute_name type default_value>
- DTD Example
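As a rough illustration of the declarations listed above, the sketch below embeds a small inline DTD in a document and parses it with Python's standard xml.etree.ElementTree module. The DTD contents and the entity name prof are invented for this example. Note that this parser does not validate the document against the DTD; it only reads the declarations and expands the internal entity.

```python
import xml.etree.ElementTree as ET

# A note with an inline DTD declaring elements, one attribute, and one
# internal entity. All names here are illustrative.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to, from)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ATTLIST note date CDATA #IMPLIED>
  <!ENTITY prof "Dr. Parker">
]>
<note date="1/1/2011">
  <to>Students</to>
  <from>&prof;</from>
</note>"""

root = ET.fromstring(DOC)

# The entity reference &prof; has been replaced by its declared text.
print(root.findtext("from"))  # Dr. Parker
print(root.get("date"))       # 1/1/2011
```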
A schema, though more robust, serves the same purpose as a DTD. You will see them as .xsd (XML schema definition) files.
- Schemas will eventually replace DTDs.
- They use XML syntax, and are more extensible than DTDs
- They support data types
- They support namespaces
- They make it easier to convert data between data types
Section 7: Validation
XML documents can be described as well formed and valid. A well formed XML document is one that adheres to the XML syntax rules found above.
An XML document can optionally reference a Document Type Definition (DTD) or a schema (preferred choice) that defines the proper structure of the XML document. Parsers can read the DTD/schema and check that the XML document follows the structure defined by the DTD/schema. A valid XML document is a well formed XML document that conforms to the rules of the DTD/schema. The purpose of a DTD/schema is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. For much more information about DTDs, visit the w3schools.com tutorial found here.
XML Schema is an XML-based alternative to a DTD. Again, it's worth reading the w3schools.com tutorial found here.
DTDs and schemas are essential for business-to-business (B2B) transactions and mission-critical systems. Validating XML documents ensures that disparate systems can manipulate data structured in standardized ways and prevents errors caused by missing or malformed data.
An XML document is not required to reference a DTD/schema, but validating XML parsers can use a DTD/schema to ensure that the document has the proper structure.
Validating an XML document helps guarantee that independent developers will exchange data in a standardized form that conforms to the DTD/schema.
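The distinction between well formed and valid can be sketched in a few lines of Python. The standard-library parser used here is non-validating, so the structural check below is a hand-written stand-in for what a validating parser would do with a DTD or schema, not real DTD validation. The rule list REQUIRED_CHILDREN and the function check_note are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule standing in for a DTD declaration such as
# <!ELEMENT note (to, from, subject, body)>
REQUIRED_CHILDREN = ["to", "from", "subject", "body"]


def check_note(xml_text):
    """Classify a document as not well-formed, well-formed but invalid, or valid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"
    tags = [child.tag for child in root]
    if root.tag == "note" and tags == REQUIRED_CHILDREN:
        return "valid"
    return "well-formed but invalid"


good = "<note><to>A</to><from>B</from><subject>C</subject><body>D</body></note>"
missing_body = "<note><to>A</to><from>B</from><subject>C</subject></note>"
broken = "<note><to>A</from></note>"

for doc in (good, missing_body, broken):
    print(check_note(doc))
```

The second document obeys every syntax rule yet lacks a required element, which is the kind of error that only validation catches.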
Internet Explorer Tools for Validating XML and Viewing XSLT Output
Section 8: XML Coding
Details for curious students only....
To learn how to create a webpage from data stored in an XML document, proceed to the XML Coding notes.
Section 9: XBRL
XBRL, or eXtensible Business Reporting Language, is an XML-based language for financial reporting, and publicly traded companies are legally required to use it. You can obtain financial reports online and in real time.
- SEC XBRL Rule
- XBRL originated at the American Institute of Certified Public Accountants (AICPA)
- It resolves many issues caused by differing reporting and regulatory rules worldwide.
- Chances are that your tax returns will be filed using this technology.
- XBRL Example
Section 10: Resources
XML is an enormous topic, so there's a ton of material out there to explore.
- XML - Wikipedia - Wikipedia XML entry, tons of information and worth skimming
- W3Schools XML - Spectacular resource, absolutely essential to pros and newbies alike
- O'Reilly XML.com - Technically verbose tutorial, slightly outdated as noted in its introduction
- Editors - List of XML editors
- Tizag tutorial
- XML for the absolute beginner
- CodeProject Tutorial
Files to play with (remember to view source) by Lee Griffin