Electronic Design

Software Directory: Internet Standards Series

XML: Extensible Markup Language
XML makes data interchange easier on embedded systems by using a standard, extensible encoding scheme for structured data. It's not as efficient as custom binary data structures, but XML's text-based implementation allows standard encoding and decoding engines to convert data between XML documents and program data structures.
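
As a rough illustration of that conversion, the sketch below uses Python's standard xml.etree.ElementTree module to encode a small program data structure as XML text and decode it again. The reading element, its field names, and both helper functions are invented for this example and aren't part of any standard.

```python
import xml.etree.ElementTree as ET


def encode_reading(reading: dict) -> str:
    """Encode a flat dict as a <reading> element with one child per key."""
    root = ET.Element("reading")
    for name, value in reading.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def decode_reading(document: str) -> dict:
    """Decode the XML text back into a dict; every value comes back as a string."""
    root = ET.fromstring(document)
    return {child.tag: child.text for child in root}


sample = {"sensor": "temp1", "celsius": 21.5}
xml_text = encode_reading(sample)
decoded = decode_reading(xml_text)
print(xml_text)
print(decoded)
```

Note that the decoded values are all text. Recovering the original types is up to the application or to a schema layered on top of XML.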

Beyond data exchange, XML is a framework for a host of other protocols and interpretations of data. XML and its variants are changing the way that the Internet deals with information, as well as how embedded systems will handle it. Even HyperText Markup Language (HTML) will eventually be superseded by XHTML, its XML-based cousin.

XML itself is a simple document definition syntax. It's a subset of the Standard Generalized Markup Language (SGML), ISO standard 8879. A variety of other standards or proposed standards, such as XLink and XSLT, build on XML.

Although XML isn't a communication protocol or an execution or scripting system, it can serve as the document format for either. For example, the Simple Object Access Protocol (SOAP) uses XML for passing data as part of its protocol.

See associated table

XML Syntax
XML has a relatively straightforward syntax. Still, XML documents typically will be generated and manipulated by applications rather than by developers using a text editor.

A well-formed XML document consists of three sections: a prolog, a document element, and optional miscellaneous items such as comments and processing instructions. The prolog declares the document type. It may also define the syntax used in the document element, or reference a document type definition (DTD) that performs the same function. A DTD can specify the type, values, and number of items for an element.

The figure presents a simple XML document and a tree diagram of the matching data structure. Applications that generate or parse an XML document use a similar data structure.

Each element consists of a starting and an ending tag. The tag names, bounded by angle brackets, are arbitrary, but they normally describe the contents of the element. An element can contain a value or additional elements. An empty element is indicated by a trailing />. Also, an element can have any number of named attributes associated with it.
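
To make the element rules concrete, here is a minimal sketch that parses a small document with Python's standard xml.etree.ElementTree module and walks the resulting tree. The device document is made up for illustration (it is not the article's figure), and describe is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A made-up document: nested elements, attributes, and an empty element.
DOCUMENT = """<?xml version="1.0"?>
<device id="42">
  <name>Thermostat</name>
  <setpoint units="C">21</setpoint>
  <alarm/>
</device>"""

root = ET.fromstring(DOCUMENT)


def describe(element, depth=0):
    """Return one line per element: tag, attributes, and text value, indented by depth."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} {element.attrib} {text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


for line in describe(root):
    print(line)
```

The parser returns the same kind of tree that an application would build before generating a document: each node carries a tag name, a dictionary of attributes, an optional value, and a list of child elements.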

Values may contain any character, although special characters such as < and & are encoded as entity references (&lt; and &amp;) so they can't be confused with tags.
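
The encoding of special characters can be sketched with Python's standard library, which offers both an explicit escape function and automatic escaping during serialization. The code element name and the sample value are arbitrary choices for this example.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = "if (a < b && c > d)"

# Explicit escaping: &, <, and > become entity references.
encoded = escape(raw)
print(encoded)  # if (a &lt; b &amp;&amp; c &gt; d)

# ElementTree applies the same encoding automatically when it serializes a value.
element = ET.Element("code")
element.text = raw
serialized = ET.tostring(element, encoding="unicode")
print(serialized)

# Decoding restores the original characters.
restored = ET.fromstring(serialized).text
```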

SOAP: Simple Object Access Protocol
Don't get dirty hands developing a custom messaging protocol for a distributed application if a little SOAP can do the job. SOAP isn't as space-efficient as a custom binary protocol, but it provides a standards-based framework that a custom protocol never will. SOAP also makes publishing an interface for an embedded device much easier with its well-defined base protocol and data-encoding schemes.

SOAP is a lightweight XML-based protocol that facilitates data exchange in a distributed environment. The standard defines three components: an envelope, encoding rules for its contents, and a data-exchange protocol. SOAP itself doesn't define the interpretation of the data being exchanged. Additional standards that use SOAP do. SOAP's simplicity has made it a popular base for other distributed Internet protocols.

SOAP is a one-way protocol that describes how data can be moved from one point to another. Typically, it's implemented in a client/server environment where an initial SOAP message to a server generates a response. The data in the request is application-specific, as is the kind of response returned by the server. An equally viable scenario has a server generating multiple messages in response to a request.

The SOAP standard describes how SOAP operates over a HyperText Transfer Protocol (HTTP) connection, with or without the HTTP Extension Framework. SOAP is protocol-independent and not restricted to HTTP, yet most implementations use HTTP. One reason is that SOAP's client/server orientation maps well to HTTP Web servers and browser clients. It works equally well for embedded applications where an embedded Web server is involved. SOAP has no restriction on the type of client or server.

Other standards are being built using SOAP, including eBusiness XML (ebXML). Further improvements are being made to the SOAP standard too, adding such features as attachments, encryption, and digital-signature support. Like many XML-related standards, though, these are still in a state of flux.

XML and SOAP are only frameworks. They're not useful by themselves. Application-specific schemas, actions, and protocols must be defined for individual applications. Employing SOAP and XML is advantageous because XML encoders and decoders are becoming standard components in embedded Web browsers and servers. Moreover, it's simple to generate XML documents directly from applications.

See associated table

The Protocol
An HTTP message is the usual transport mechanism for a SOAP envelope. Additions to the typical HTTP header, such as the SOAPAction entry, indicate the intent of the message.
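
A minimal sketch of such a message, assuming SOAP 1.1 conventions, follows. It only assembles the bytes of an HTTP POST and doesn't send anything. The host, path, SOAPAction URI, and envelope contents are placeholders invented for this example.

```python
def build_soap_request(host: str, path: str, action: str, envelope: str) -> bytes:
    """Assemble an HTTP/1.1 POST that carries a SOAP envelope as its body."""
    body = envelope.encode("utf-8")
    header_lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        'Content-Type: text/xml; charset="utf-8"',
        f"Content-Length: {len(body)}",
        f'SOAPAction: "{action}"',  # tells the server the intent of the message
    ]
    return ("\r\n".join(header_lines) + "\r\n\r\n").encode("ascii") + body


# Placeholder envelope; a real one would hold application-specific elements.
ENVELOPE = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    "<SOAP-ENV:Body/>"
    "</SOAP-ENV:Envelope>"
)

request = build_soap_request(
    "device.example.com",                            # hypothetical host
    "/thermostat",                                   # hypothetical path
    "http://example.com/thermostat#GetTemperature",  # hypothetical action URI
    ENVELOPE,
)
print(request.decode("utf-8"))
```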

The envelope is an XML document containing a header and body. It provides environmental details about the body and the encoding scheme that it uses.

The header is optional. Values of attributes in header elements are used by the recipient of the message. The recipient may forward the SOAP message. If it does so, the header is removed and may be replaced by the recipient before the new message is sent.
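
The forwarding behavior might look roughly like the sketch below, which strips the incoming header and optionally inserts a replacement before passing the message on. The routing namespace, the hop and ping elements, and the forward function are all hypothetical. Only the SOAP 1.1 envelope namespace comes from the standard.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ROUTE_NS = "http://example.com/routing"                 # hypothetical application namespace
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("r", ROUTE_NS)

INCOMING = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
                   xmlns:r="http://example.com/routing">
  <SOAP-ENV:Header><r:hop>gateway-1</r:hop></SOAP-ENV:Header>
  <SOAP-ENV:Body><r:ping/></SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def forward(document, new_hop=None):
    """Drop the incoming Header; optionally insert a replacement ahead of the Body."""
    envelope = ET.fromstring(document)
    old_header = envelope.find(f"{{{SOAP_ENV}}}Header")
    if old_header is not None:
        envelope.remove(old_header)
    if new_hop is not None:
        header = ET.Element(f"{{{SOAP_ENV}}}Header")
        ET.SubElement(header, f"{{{ROUTE_NS}}}hop").text = new_hop
        envelope.insert(0, header)  # the Header precedes the Body
    return ET.tostring(envelope, encoding="unicode")


stripped = forward(INCOMING)
replaced = forward(INCOMING, new_hop="gateway-2")
```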

The body element has an XML tag of SOAP-ENV:Body. The application-specific XML elements are included in the contents of the body. The SOAP encoding rules supply definitions for standard data types, such as strings and integers. This makes creating XML data structures easier because new XML schemas don't have to be created.
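
As a sketch of how the pieces fit together, the code below builds a SOAP 1.1 envelope with an optional header and a body. The envelope and encoding namespaces are the ones the SOAP 1.1 specification defines. The application namespace, the Transaction and GetTemperature elements, and the build_envelope helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"  # SOAP 1.1 encoding rules
APP_NS = "http://example.com/thermostat"                # hypothetical application namespace
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("m", APP_NS)


def build_envelope(sensor: str, transaction_id: str) -> str:
    """Build an envelope whose body carries one application-specific request."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    # Declares which encoding rules the contents follow.
    envelope.set(f"{{{SOAP_ENV}}}encodingStyle", SOAP_ENC)

    # Optional header: details for the recipient, not part of the request data.
    header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    ET.SubElement(header, f"{{{APP_NS}}}Transaction").text = transaction_id

    # Body: the application-specific XML elements.
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetTemperature")
    ET.SubElement(request, "sensor").text = sensor
    return ET.tostring(envelope, encoding="unicode")


envelope_text = build_envelope("temp1", "5")
print(envelope_text)
```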

SOAP includes definitions for data structures and arrays. Data structures are typically the basis for implementing more complex actions, such as remote procedure calls. Standard error responses are defined by the SOAP standard.
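
The standard error response in SOAP 1.1 is a Fault element inside the body, with faultcode and faultstring children. A client could pick it apart along the lines of the sketch below. The sample response text and the extract_fault helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# A made-up server response that reports an error.
FAULT_RESPONSE = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode>SOAP-ENV:Client</faultcode>
      <faultstring>Unknown sensor</faultstring>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def extract_fault(document):
    """Return (faultcode, faultstring) if the body holds a Fault, else None."""
    root = ET.fromstring(document)
    fault = root.find("env:Body/env:Fault", {"env": SOAP_ENV})
    if fault is None:
        return None
    return fault.findtext("faultcode"), fault.findtext("faultstring")


fault = extract_fault(FAULT_RESPONSE)
print(fault)
```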

See associated figure

XHTML: Extensible HyperText Markup Language
XHTML could replace HTML as the document format of choice on the Internet. HTML documents make up most of the document files presently on the Internet. Still, even the latest HTML 4.01 specification lacks the extensibility of XHTML, HTML's XML-based cousin. Migration to XHTML will take years, but support for it is already incorporated into the latest versions of Web browsers. XHTML will also provide better support for new elements, attributes, and features.

Using XHTML documents in place of HTML documents is a simple alternative to employing XML documents and Extensible Stylesheet Language (XSL) documents because XHTML is very close to HTML in terms of syntax and semantics. XHTML uses the same tags as HTML within a more rigorous XML syntax. Unlike HTML, which might have unmatched tags, XHTML must follow XML syntax and semantic rules. Plus, HTML-style scripting is supported.

The similarity between XHTML and HTML is no accident. The designers of XHTML tightened up HTML's syntax so that it meets XML requirements, and they made the use of a DTD mandatory. Three DTDs are defined for XHTML:

Strict: Uses cascading style sheets (CSS) for formatting
Transitional: Uses embedded formatting
Frameset: Similar to HTML for frame support

The Strict DTD is designed to split out the style and presentation information to CSS documents in much the same way that XML can be displayed by using XSL documents. The Transitional DTD supports embedded formatting normally found in HTML documents that don't use CSS formatting. The Frameset DTD provides HTML-style frame support. A DTD must be specified with XHTML, whereas DTDs are optional with HTML.
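
The DTD is selected by the DOCTYPE declaration at the top of the document. The sketch below generates that declaration for each of the three XHTML 1.0 DTDs. The public identifiers and the absolute system identifiers are the ones published by the W3C, while the doctype helper and the table name are just illustrative.

```python
# Public and system identifiers for the three XHTML 1.0 DTDs.
XHTML_DTDS = {
    "Strict": (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    "Transitional": (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
    "Frameset": (
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    ),
}


def doctype(flavor: str) -> str:
    """Return the DOCTYPE declaration that selects the named XHTML 1.0 DTD."""
    public_id, system_id = XHTML_DTDS[flavor]
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">'


for name in XHTML_DTDS:
    print(doctype(name))
```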

See associated table

Bigger Is Better
HTML is more compact. But that compactness often causes problems when HTML documents are displayed, especially on different Web browsers. One browser may ignore an inconsistency while another doesn't display the document at all. XHTML's more rigorous syntax and semantics should make browser support more consistent. It also should simplify browser implementation because a significant portion of browser implementation addresses these types of HTML exceptions.

The example is relatively well formed, yet it still shows off some of the differences between HTML and XHTML. Two major areas make the XHTML document larger than the HTML document. The first comprises the XML declaration and the DOCTYPE declaration; the latter specifies the DTD used with the document. In this case, it's the Strict DTD. The second includes the additional attributes for the <html> tag. HTML documents may include attributes in this tag as well. Therefore, an XHTML document might be closer in size to a comparable HTML document than this example suggests.

The paragraph tag <p> is an example of where HTML usually differs from XHTML. A trailing </p> tag can be used with HTML, but that rarely happens. It's required with XHTML. Minor differences also exist with empty XML elements. For instance, the HTML <br> line break changes to <br/> with XHTML.
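
One way to see the difference is to feed both styles of markup to an XML parser, as in the sketch below. The two fragments are shortened versions of the article's listings, and is_well_formed is a hypothetical helper. It checks XML well-formedness only and doesn't validate against a DTD.

```python
import xml.etree.ElementTree as ET

# HTML style: <br> is never closed and the trailing </p> is omitted.
HTML_BODY = """<body>
  <p>Click <br>
  <a href="page1.html">Page 1</a><br>
  to see more.
</body>"""

# XHTML style: empty elements use <br/> and every <p> has a matching </p>.
XHTML_BODY = """<body>
  <p>Click <br/>
  <a href="page1.html">Page 1</a><br/>
  to see more.</p>
</body>"""


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(HTML_BODY))
print(is_well_formed(XHTML_BODY))
```

An HTML browser would render both fragments, but only the second one survives an XML parser.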

XHTML support is just beginning to appear in HTML editors. A more difficult task will be changing systems with application-generated documents.

HTML

<html>
  <head>
    <title>Page Title</title>
  </head>
  <body>
    <p align="center">Click <br>
    <a href="page1.html">Page 1</a><br>
    <a href="page2.html">Page 2</a><br>
    to see more.
  </body>
</html>
 

XHTML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html 
 PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
     xml:lang="en" lang="en">
  <head>
    <title>Page Title</title>
  </head>
  <body>
    <p style="text-align: center">Click <br/>
    <a href="page1.html">Page 1</a><br/>
    <a href="page2.html">Page 2</a><br/>
    to see more.</p>
  </body>
</html>