An Introduction to XSLT and XPath

By Miro Stoichev

Introduction

With the advent of WAP, more and more sites are being re-coded in Wireless Markup Language (WML) to cater to WAP devices. While coding in WML is relatively easy, maintaining multiple sets of WML documents for different WAP devices can be a daunting task, since at the moment each device has a very different look and feel.

Ideally, we should have a single markup language that can be used to support all devices, be it a Web browser or a WAP device. However, given the current state of affairs, this is likely to remain a dream for the next couple of years. Until WAP 2.0 declares the standard markup language to be XHTML, developers have to devise mechanisms to tailor their site content for the various devices in the market.

The Extensible Markup Language (XML), a technology that has been around for quite some time but has not gained wide acceptance until recently, seems a perfect candidate for content encoding. So instead of coding your site in HTML and WML, you would just need to code your site in XML. Depending on the types of devices that are accessing your site, you can then transform the XML documents into the appropriate markup languages (HTML or WML) that can be understood by the devices.

In this article, we will look at how you can use Extensible Stylesheet Language Transformations (XSLT) and XPath to transform XML so that you can tailor your content for different devices. I assume readers are familiar with HTML, WML and XML.

Microsoft XML Parser technology

For this article, I will make use of the Microsoft XML Parser 3.0 (MSXML3) July Preview release. Microsoft has committed to supporting the W3C’s XML Recommendation version 1.0 and is releasing a new preview of its XML parser every alternate month. For the latest MSXML parser, point your web browser to http://msdn.microsoft.com/xml/default.html.

If you are currently using Microsoft Internet Explorer 5.0, you will have MSXML version 2.0 installed on your system. It is important that you download the latest version of MSXML, since the older version supports an older version of XSL. The latest MSXML release supports the W3C’s XSLT specification.

XSLT, XPath and MSXML3

XSLT is an XML-based language that allows you to transform an XML document to another XML document of a different structure. You can think of XSLT as simply a language that allows you to specify commands (in the form of XSLT elements) to transform an XML document from one form to another.

What about XPath? Since an XML document primarily contains elements and text, there must be a way for XSLT to locate specific elements. The task of XPath is to help XSLT locate those elements so that they can be processed.

To perform the transformation using XSLT and XPath, you need an XSLT processor. There are quite a number of excellent XSLT processors available on the Internet, such as XT and Saxon. For this article, we will make use of MSXML3. MSXML3 is more than just an XML parser; it is also an XSLT processor.

Enough theory! I am a strong believer in learning by example, so let’s get started!

A Real-World Example

I run some training courses for developers in Singapore and have a web site that contains information about all the courses available. There are a couple of different courses and I simply maintain them in plain text files, although storing all this information in a database seems like a better idea. However, besides wanting my customers to view the site using a web browser, I also want them to be able to check the course information using a WAP device. This is an added convenience for people who want to quickly confirm the timing of a course as well as its content.

The obvious choice for the markup language to use is XML. There is little motivation for me to code the pages in HTML, because by doing so I have to create another set of WML pages for WAP devices.

A typical course page looks like this:

<?xml version='1.0'?>
<Course id="xmlxslt">
  <Title>XML/XSLT - Extensible Markup Language &amp; Extensible Stylesheet Language</Title>
  <Duration>16</Duration> <!-- in hours -->
  <Synopsis>XML is the language used for describing data. With the advent of WAP, web sites developers are increasingly deploying their sites in XML and using the transformation engine of XSL, which is XSLT, to tailor their web pages to different browsers. Participants will be developing applications that can dynamically adapt to different browsers.</Synopsis>
  <Fees>800</Fees> <!-- in S$ -->
  <CourseDates>
    <Date>
      <Mode>Evenings</Mode>
      <Day>14</Day>
      <Month>10</Month>
      <Year>2000</Year>
      <Venue>Rock Tower</Venue>
      <Time>
        <From>1730</From>
        <To>2130</To>
      </Time>
    </Date>
    <Date>
      <Mode>Full Days</Mode>
      <Day>21</Day>
      <Month>10</Month>
      <Year>2000</Year>
      <Venue>Developers Unit</Venue>
      <Time>
        <From>0900</From>
        <To>1700</To>
      </Time>
    </Date>
  </CourseDates>
</Course>

The XML document contains the following information:

  • Course title and synopsis
  • Cost of the course in Singapore dollars (S$)
  • The timing of the course. You can have more than one run of a course.

If you load the XML document using Microsoft Internet Explorer 5 (IE5), you should see the following display:

Obviously we wouldn’t want to let our customers see the XML document in its raw form (what IE5 did was to apply a default stylesheet to an XML document without any stylesheet specified). For customers using a web browser, we should display the course page using HTML. Let’s see how we can do that.

Creating The Stylesheet

Now that we have the course information marked up in XML, let’s see how we can devise a stylesheet for displaying the XML document nicely on a web browser.

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <html>
      <body bgcolor="wheat">
        <b>Course Title: </b><xsl:value-of select="Course/Title"/><br/>
        <b>Synopsis: </b><center><i><xsl:value-of select="Course/Synopsis"/></i></center><br/>
        <b>Duration: </b><xsl:value-of select="(Course/Duration) div 8"/> days <br/>
        <b>Fees: </b>S$<xsl:value-of select="Course/Fees"/><br/>
        <b>Dates:</b><br/>
        <xsl:for-each select="Course/CourseDates/Date[Day!='']">
          <xsl:value-of select="Day"/>/<xsl:value-of select="Month"/>/<xsl:value-of select="Year"/>
          From <xsl:value-of select="Time/From"/>hrs to <xsl:value-of select="Time/To"/>hrs
          @ <xsl:value-of select="Venue"/> -
          <xsl:value-of select="(/Course/Duration) div (((Time/To) - (Time/From)) div 100)"/>
          <xsl:value-of select="Mode"/> <br/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

To associate the stylesheet with the XML document, simply insert an additional processing instruction (PI) at the top of the XML document, right after the XML declaration:

<?xml version='1.0'?>
<?xml-stylesheet type='text/xsl' href='HTML.xsl'?>
<Course id="xmlxslt">

Refreshing the XML document on IE5 will yield a nicely formatted document:

What IE 5 did was to load the specified stylesheet and transform the XML document into HTML. It is important to note that the transformation takes place on the browser-side. This is known as client-side transformation.

Dissecting The Stylesheet

Let’s now take a closer look at the stylesheet. The first line of the XSLT stylesheet is:

<?xml version="1.0"?>

Because an XSLT stylesheet is itself an XML document, it begins with the XML declaration shown above.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

Next we define the namespace for the XSL stylesheet with the <xsl:stylesheet> element. Note that the newer MSXML3 parser supports both the MS XSL standard as well as the W3C XSLT Recommendation. The difference is in the namespace. To use the MS XSL standard, change the namespace to:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

The older MS XSL specification does not conform to the current XSLT Specification 1.0. As such, it contains some XSL elements that do not work with other XSLT processors. Also note that using the W3C XSLT Recommendation requires an additional attribute named version to be included.

Note: The MSXML3 parser was, at the time of this writing, in beta release and as such Microsoft does not recommend that you use it on a production server.

With this namespace declaration, all elements that begin with the xsl: prefix are treated as XSL elements, and the XSL processor acts on them according to their defined functions.

Let’s take a look at the next XSL element in our stylesheet:

<xsl:template match="/">

The <xsl:template> element is used as a template to match against the source XML document. What this element is doing is basically matching against the root of the XML document. The root of the XML document is the start of the XML document (not to be confused with the root element of the XML document). This match is indicated by the attribute match in the <xsl:template> element. The value of “/” indicates the “root” of the XML document.

The value of the match attribute contains an XPath expression. XPath is a language for addressing parts of an XML document. As its name implies, an XPath expression is like the directory path you use to locate files on your hard disk. It specifies an expression to locate the required elements in your XML document. In addition to path expressions, XPath also supports functions that operate on a group of elements. We will look at XPath functions shortly.

Back to our stylesheet: once the root of the XML document is matched, it becomes the context node. The context node is analogous to your current directory.
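
To make the difference between the document root and the root element concrete, here is a minimal sketch. It assumes the course document uses the element names that the stylesheet refers to (Course, CourseDates, Date). The text output method and the standard XPath 1.0 functions name() and count() are assumptions added for illustration; the stylesheet in this article does not use them.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Assumption: plain-text output, chosen only to keep the sketch short -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The context node is the document root, so * selects its single
         element child: the root element of the document -->
    <xsl:text>Root element: </xsl:text>
    <xsl:value-of select="name(*)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- A relative path is resolved against the context node,
         much like a path relative to the current directory -->
    <xsl:text>Course runs: </xsl:text>
    <xsl:value-of select="count(Course/CourseDates/Date)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Applied to a course document with those element names and two <Date> entries, this should report Course as the root element and two course runs.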

Next we see that we have some familiar tags:

<html>
<body bgcolor="wheat">

Since these tags do not begin with the “xsl” prefix, they are copied to the output tree. The output tree is the final document produced by the transformation.

The next line contains another XSL element, <xsl:value-of>. Basically, the <xsl:value-of> element displays the value of the element that is indicated by the select attribute. In this case, it is “Course/Title”, which means look under the context node (root of the XML document, in this case) for the element <Course> which contains the child element <Title>.

Once the required element is located, the text of the element (<Title>) would be printed in the output tree.

<b>Course Title: </b><xsl:value-of select="Course/Title"/><br/>

The next line prints the text contained within the <Synopsis> element.

<b>Synopsis: </b><center><i><xsl:value-of select="Course/Synopsis"/></i></center><br/>

The next line is interesting. It shows that besides locating elements within an XML document, XPath can also be used to perform mathematical operations. In this case, since the duration of a course is represented in hours, we divide it by eight (using the div operator) to return the number of days per course.

<b>Duration: </b><xsl:value-of select="(Course/Duration) div 8"/> days <br/>
<b>Fees: </b>S$<xsl:value-of select="Course/Fees"/><br/>
<b>Dates: </b><br/>
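
As a rough illustration of XPath arithmetic, the sketch below applies div to the <Duration> value on its own. The text output method and the use of ceiling() are assumptions added for illustration; they are not part of the stylesheet in this article.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Assumption: plain-text output, chosen only to keep the sketch short -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- div is floating-point division: a Duration of 16 gives 2,
         while a Duration of 20 would give 2.5 -->
    <xsl:value-of select="Course/Duration div 8"/>
    <xsl:text> days&#10;</xsl:text>

    <!-- ceiling() rounds up to the next whole number, which may be
         preferable when a partial day still occupies a calendar day -->
    <xsl:value-of select="ceiling(Course/Duration div 8)"/>
    <xsl:text> days (rounded up)&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

XPath spells division as div because the / character is already taken as the path separator.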

An XPath expression can also filter the elements it selects. In the next line, we select all the <Course>/<CourseDates>/<Date> elements whose <Day> is not an empty string. We also use the <xsl:for-each> element to perform looping. This element is very much like the looping construct found in typical programming languages. And so it reads: “Find all <Course>/<CourseDates>/<Date> elements whose <Day> element is non-empty and loop through each element.”

<xsl:for-each select="Course/CourseDates/Date[Day!='']">

Once in the loop, the context node is the <Course>/<CourseDates>/<Date> element currently being processed, starting with the first. So the XPath expressions in the select attributes of the <xsl:value-of> elements can simply be written relative to the context node. Thus select="Month" refers to the <Month> child of the current <Date> element; for the first run, that is the same element the absolute path "/Course/CourseDates/Date/Month" reaches.

<xsl:value-of select="Day"/>/<xsl:value-of select="Month"/>/<xsl:value-of select="Year"/>
From <xsl:value-of select="Time/From"/>hrs to <xsl:value-of select="Time/To"/>hrs
@ <xsl:value-of select="Venue"/> -
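
The sketch below illustrates how the context node changes inside <xsl:for-each>. It assumes the element names used by the stylesheet, text output, and the standard XPath position() function and positional predicate [2]; none of these last three appear in the stylesheet in this article.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Assumption: plain-text output, chosen only to keep the sketch short -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="Course/CourseDates/Date">
      <!-- Inside the loop the context node is the current <Date>, so
           Venue and Mode are looked up beneath that element only -->
      <xsl:value-of select="position()"/>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="Venue"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="Mode"/>
      <xsl:text>)&#10;</xsl:text>
    </xsl:for-each>

    <!-- Outside the loop the context node is the document root again, so a
         specific run has to be addressed explicitly, here with a positional
         predicate that picks the second <Date> -->
    <xsl:value-of select="/Course/CourseDates/Date[2]/Venue"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the sample course, the loop should list Rock Tower (Evenings) followed by Developers Unit (Full Days), and the final line should print Developers Unit again.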

The next line calculates the number of days (or evenings) over which the course is run. It takes the number of hours allocated to the course (/Course/Duration) and divides it by the number of hours covered in each session, which is the difference between the end and start times divided by 100.

<xsl:value-of select="(/Course/Duration) div (((Time/To) - (Time/From)) div 100)"/>
<xsl:value-of select="Mode"/> <br/>
</xsl:for-each>
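
To see where that figure comes from, the following sketch prints the intermediate values for each run. It assumes the element names used by the stylesheet and text output; the labels in the output are illustrative only.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Assumption: plain-text output, chosen only to keep the sketch short -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="Course/CourseDates/Date">
      <!-- Times are HHMM strings that XPath converts to numbers, so
           1730 to 2130 yields (2130 - 1730) div 100 = 4 hours.
           The spaces around the minus sign matter, because a hyphen
           is also a legal character in element names -->
      <xsl:value-of select="Mode"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="(Time/To - Time/From) div 100"/>
      <xsl:text> hours per session, </xsl:text>

      <!-- Total hours divided by hours per session, e.g. 16 div 4 = 4 -->
      <xsl:value-of select="/Course/Duration div ((Time/To - Time/From) div 100)"/>
      <xsl:text> sessions&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample data this should give 4 hours and 4 sessions for the evening run, and 8 hours and 2 sessions for the full-day run. Note that subtracting HHMM values only works when both times fall on the same minute of the hour: a run from 0900 to 1730 would give 8.3 rather than 8.5.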

We then close off our HTML document:

</body>
</html>

And finally we close off the <xsl:template> and <xsl:stylesheet> elements.

</xsl:template>
</xsl:stylesheet>

Note: One interesting thing to note is that when you do a View Source, you will see XML as the source, not HTML. In IE5, you won’t be able to see the HTML code if you are performing client-side transformation.
