Content Management and Distribution Using XML


By Miro Stoichev

Introduction

With more and more Internet-enabled devices hitting the market (wireless phones, handheld computers, even refrigerators), more and more people are going to be surfing the web on things that hardly, if at all, resemble the desktop computer and web browser that started all the commotion in the first place. This is a good thing: it means greater connectivity and can lead to wonderful things like voice recognition and wireless broadband. But right now, in the real world, with all those different devices (not to mention new formats and supposed “standards” popping up all the time), maintaining a separate output for every device and format from the same underlying content can quickly become overwhelming.

That’s one of the reasons we’ve started a project on the Wireless Developer Network that attempts to solve at least part of the problem of outputting content to a number of different formats. It’s called XMLCast: it takes content stored in a tidy XML structure, parses it, and outputs that content in several formats, such as HTML, simple HTML (for less capable browsers like those found on handheld devices), RSS, and of course WML. We’ll kick XMLCast off with this article and will keep adding to the application over the next several weeks. When we’re done, you should have a good idea of how you too can use XML to standardize data sharing and distribution between multiple applications, devices, or customers.

We’ve developed XMLCast using Microsoft’s Active Server Pages (ASP). We chose ASP because of Microsoft’s feature-rich XML parser, the simplicity of VBScript, and the sheer number of web sites running on IIS. On the plus side, you don’t need to configure the WML MIME type on your server, because the ASP script sets the content type itself (though you may want to configure it anyway if you also serve static .wml files).
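For readers following along outside ASP, here is a minimal Python sketch of the same idea: choose a Content-Type for each output format before writing the response body. The format keys, the function name content_type_for, and the choice of text/xml for RSS are assumptions for illustration, not XMLCast’s actual code.

```python
# Hypothetical mapping from an XMLCast output format to the Content-Type
# header a server-side script would send. The keys are assumptions.
CONTENT_TYPES = {
    "html": "text/html",
    "shtml": "text/html",        # "simple HTML" is still HTML
    "rss": "text/xml",           # assumption; application/rss+xml is also common
    "wml": "text/vnd.wap.wml",   # the WML media type WAP browsers expect
}


def content_type_for(output_format: str) -> str:
    """Return the Content-Type for a requested output format (case-insensitive)."""
    try:
        return CONTENT_TYPES[output_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported output format: {output_format!r}") from None


if __name__ == "__main__":
    for fmt in ("html", "WML", "rss"):
        print(fmt, "->", content_type_for(fmt))
```

Because the header is set per response, a request for the WML output carries text/vnd.wap.wml even if the web server has no mapping for the .wml extension.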

A working understanding of what XML is and how it works is essential to following XMLCast. For everything you would ever need to know about XML, give xml.com a visit. Beyond that, a good working knowledge of ASP, HTML, and WML will help you understand what we’re doing in this application.

For simplicity, we’ll use a news page as our content, but the application can be adapted to any sort of content, as long as you know its XML structure. Each news article is made up of four basic elements:

  • Heading
  • Link
  • Summary
  • Paragraphs

The Heading is the title of the news article. The Link is a unique identifier for each article, so the full article can be retrieved from a “Summary” page. The Summary is a short description of the article that is listed on the Summary page. The individual <paragraph> elements are enclosed within a <paragraphs> element, mostly because for WML, XMLCast sends out each paragraph as a separate card. But we’re getting ahead of ourselves, so let’s have a look at the XML DTD and an example news XML document.
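To make the one-card-per-paragraph idea concrete, here is a minimal Python sketch that turns one article’s paragraphs into a WML 1.1 deck in which each card links to the next. The function name wml_deck, the card ids (p1, p2 and so on) and the “Next” link text are hypothetical; this illustrates the technique, not the deck XMLCast actually emits.

```python
from __future__ import annotations

from xml.sax.saxutils import escape, quoteattr


def wml_deck(heading: str, paragraphs: list[str]) -> str:
    """Build a WML deck with one card per paragraph, each linking to the next."""
    cards = []
    for i, text in enumerate(paragraphs, start=1):
        # Every card except the last gets a "Next" link to the following card.
        nav = f'<br/><a href="#p{i + 1}">Next</a>' if i < len(paragraphs) else ""
        cards.append(
            f'<card id="p{i}" title={quoteattr(heading)}>'
            f"<p>{escape(text)}{nav}</p></card>"
        )
    return (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
        '"http://www.wapforum.org/DTD/wml_1.1.xml">\n'
        "<wml>" + "".join(cards) + "</wml>"
    )


if __name__ == "__main__":
    print(wml_deck("First article", ["Paragraph one.", "Paragraph two."]))
```

Splitting on paragraph boundaries keeps each card small, which suits the limited memory and screen size of early WAP phones.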

Here’s the news.dtd file

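```xml
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT news (logo, date, article+)>
<!ELEMENT logo (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT article (heading, link, summary, paragraphs)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ELEMENT summary (#PCDATA)>
<!ELEMENT paragraphs (paragraph+)>
<!ELEMENT paragraph (#PCDATA)>
```

We separated the DTD from the XML document and store it in a dtd directory on our site; each XML document points to it through its DOCTYPE declaration. Because XML names are case-sensitive, the element names declared here must match the tags in the documents exactly.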
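Python’s standard-library parsers (ElementTree, minidom, expat) are non-validating, so they will not check a document against news.dtd; a third-party library such as lxml can. As a minimal sketch, assuming you only want to catch malformed articles, the hypothetical check_news function below hand-checks the article and paragraphs content models, using the lowercase element names that appear in news.xml.

```python
import xml.etree.ElementTree as ET

# Children each <article> must have, in this order (lowercase, as in news.xml).
ARTICLE_MODEL = ["heading", "link", "summary", "paragraphs"]


def check_news(xml_text: str) -> list:
    """Return a list of content-model problems; an empty list means none found."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "news":
        problems.append(f"root element is <{root.tag}>, expected <news>")
    articles = root.findall("article")
    if not articles:
        problems.append("no <article> elements found")
    for n, article in enumerate(articles, start=1):
        tags = [child.tag for child in article]
        if tags != ARTICLE_MODEL:
            problems.append(f"article {n}: children are {tags}, expected {ARTICLE_MODEL}")
            continue
        paras = article.find("paragraphs")
        # paragraphs (paragraph+): at least one <paragraph> and nothing else.
        if len(paras) == 0 or any(p.tag != "paragraph" for p in paras):
            problems.append(f"article {n}: <paragraphs> needs one or more <paragraph>")
    return problems
```

A hand-rolled check like this can drift out of sync with the DTD, so treat it as a stopgap rather than a substitute for real validation.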

Here’s an example news.xml file

```xml
<?xml version="1.0"?>
<!DOCTYPE news SYSTEM "http://localhost/dtd/news.dtd">
<news>
  <logo>devnetlogo.gif</logo>
  <date>May 30, 2000</date>
  <article>
    <heading>This is the heading for the first article</heading>
    <link>news1</link>
    <summary>This is the summary for the first article.</summary>
    <paragraphs>
      <paragraph>This is the first paragraph for the first article.</paragraph>
      <paragraph>This is the second paragraph for the first article.</paragraph>
      <paragraph>This is the third paragraph for the first article.</paragraph>
      <paragraph>This is the fourth paragraph for the first article.</paragraph>
      <paragraph>This is the fifth paragraph for the first article.</paragraph>
    </paragraphs>
  </article>
  <article>
    <heading>This is the heading for the second article.</heading>
    <link>news2</link>
    <summary>This is the summary for the second article.</summary>
    <paragraphs>
      <paragraph>This is the first paragraph for the second article.</paragraph>
      <paragraph>This is the second paragraph for the second article.</paragraph>
      <paragraph>This is the third paragraph for the second article.</paragraph>
      <paragraph>This is the fourth paragraph for the second article.</paragraph>
      <paragraph>This is the fifth paragraph for the second article.</paragraph>
    </paragraphs>
  </article>
  <article>
    <heading>This is the heading for the third article.</heading>
    <link>news3</link>
    <summary>This is the summary for the third article.</summary>
    <paragraphs>
      <paragraph>This is the first paragraph for the third article.</paragraph>
      <paragraph>This is the second paragraph for the third article.</paragraph>
      <paragraph>This is the third paragraph for the third article.</paragraph>
      <paragraph>This is the fourth paragraph for the third article.</paragraph>
      <paragraph>This is the fifth paragraph for the third article.</paragraph>
    </paragraphs>
  </article>
</news>
```

You get the idea. We didn’t use any tricky XML here, to cut down on surprises. One thing to note: if you use #PCDATA as the content type for an element (as we do for paragraph), make sure the text escapes any characters that are significant to XML. A literal & or < inside an element is a well-formedness error, so write them as &amp; and &lt; (&gt;, &quot; and &apos; are also predefined). HTML-only entities such as &nbsp; are not predefined in XML, so they must either be declared in the DTD or replaced with numeric character references such as &#160;.

OK, now that we’ve gone through the bones, let’s look at the skin. Once the XML document is complete, XMLCast loads it, checks which output method was requested (HTML, sHTML, RSS or WML), and then calls the appropriate function for output. So let’s parse the data out so we can output it any way we want.
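Here is a minimal Python sketch of that load-then-dispatch flow using the standard library’s ElementTree. Only two renderers are shown; the function names, the ?link= query parameter, and what counts as “simple HTML” are assumptions, and RSS and WML output would be further entries in the hypothetical RENDERERS table.

```python
import xml.etree.ElementTree as ET
from html import escape


def load_articles(xml_text: str) -> list:
    """Pull heading, link, summary and paragraphs out of each <article>."""
    root = ET.fromstring(xml_text)
    return [
        {
            "heading": a.findtext("heading", ""),
            "link": a.findtext("link", ""),
            "summary": a.findtext("summary", ""),
            "paragraphs": [p.text or "" for p in a.iterfind("paragraphs/paragraph")],
        }
        for a in root.iterfind("article")
    ]


def render_html(articles):
    # The parser has already decoded &amp; and &lt;, so re-escape on output.
    items = "".join(
        f'<li><a href="?link={escape(a["link"])}">{escape(a["heading"])}</a>'
        f"<br>{escape(a['summary'])}</li>"
        for a in articles
    )
    return f"<html><body><ul>{items}</ul></body></html>"


def render_shtml(articles):
    # Hypothetical "simple HTML": headings only, separated by line breaks.
    return "<br>".join(escape(a["heading"]) for a in articles)


# Output method name -> renderer; RSS and WML renderers would be added here.
RENDERERS = {"html": render_html, "shtml": render_shtml}


def xmlcast(xml_text: str, output_format: str) -> str:
    """Parse the news document, then hand it to the requested renderer."""
    try:
        render = RENDERERS[output_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported output format: {output_format!r}") from None
    return render(load_articles(xml_text))
```

Keeping parsing separate from rendering means each new output format only needs one more function that accepts the same list of articles.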
