Tutorial: XML and Java for Scientists/Engineers

TABLE OF CONTENTS

  1. The Business Case for XML and Java
    Purpose : ....
    Topics : ....

  2. Introduction to XML Technology
    Purpose : ....
    Topics : ....

  3. How can Java and XML work together?
    Purpose : ....
    Topics : ....

  4. Getting Started with Java and XML
    Purpose : Explain how to write and run a Java/XML application
    Topics : Writing and Running an XML-based Application. Mechanics of the SAX and DOM parsers

  5. Examples: Validating XML files with SAX and DOM Technology
    Purpose : Demonstrate essential elements of SAX and DOM parsing technologies
    Topics : Validate a simple XML file with SAX. Validate a simple XML file with DOM.

  6. XML Schema
    Purpose : ....
    Topics : ....

  7. Application Programs Based on Java and XML
    Purpose : Demonstrate a range of applications for Java and XML
    Topics : ....

  8. XML and Databases
    Purpose : ...
    Topics : ....

  9. XML and the Semantic Web
    Purpose : ...
    Topics : ....

  10. Summary
    Purpose : Summarize the key points.

  11. References and Web Resources

  12. Glossary of Terms


The Business Case for XML and Java

BUSINESS MODEL [Complex Systems]

COMPETITIVE FACTORS

To be competitive, systems engineering development procedures must address:

COMPETITIVE STRATEGY AND OPERATIONAL PERFORMANCE

Competitive strategy is improved by:

  1. Lowering costs through web-enabled procurement of goods and services.
  2. Just-in-time delivery of inventory.
  3. Concurrent development of project subsystems.
  4. Improved communication among project participants -- everyone needs to know what they are supposed to be working on, and when it's due!
  5. Organizational structure that promotes quality control -- give all employees responsibility and a feeling of ownership in business objectives.

Operational performance is affected by the extent to which we can:

  1. Access and interact with a wide range of data and information sources at appropriate times.
  2. Maintain a shared view of the project objectives, and at the same time focus on specific tasks.
  3. Draw on the experience of similar projects when making project-oriented decisions.
  4. Get up to speed on project history soon after joining the organization.
  5. Express and measure progress of groups and individuals.

KEY STRATEGIES AND ENABLING TECHNOLOGIES

We need computational support for the collection, organization, generation and sharing of design/project data and information.

Key Technical Areas

  1. Object modeling of systems using the Unified Modeling Language (UML) and automation of model-based system behavior simulation.

  2. Semi-formal and formal languages for the representation of system requirements, system specifications, and allocation flowdown in heterogeneous (from the physical layer perspective) hierarchies.

    Textual descriptions of systems tend to be informal and, unfortunately, ambiguous. Much of the ambiguity can be removed with semi-formal descriptions, for example, activity and statechart diagrams.

  3. Object-relational databases and multiple views (engineering and others) of system data.

Role of UML and XML. The Unified Modeling Language (UML) is a visual modeling language that covers nearly all aspects of engineering system development, including representation of system requirements, system structure, and system behavior.

There is a one-to-one correspondence between the visual format of UML and an eXtensible Markup Language (XML) format called XMI (XML Metadata Interchange). This means that UML diagrams can be converted to equivalent XML representations and stored either in files or in an (XML) database. Database storage, in turn, means that UML descriptions of engineering products/components and processes can be kept in company-wide databases, queried, and reused as needed.

Enabling Technologies

  1. High-speed Internet Infrastructure.
  2. Applications software (e.g., scheduling programs, office productivity suites, systems engineering design software linking project requirements to models of design).
  3. Relational and object-oriented databases.
  4. Computer languages that provide a framework for human-to-human, human-to-computer and computer-to-computer communication.
  5. Human-computer interfaces that are easy to use.

ROLE OF COMPUTER LANGUAGES AND DATABASES

Computer languages are designed to strike a balance between simplicity and problem-solving support for a well-defined range of tasks:

No model can claim to be the definitive one. New generations of computer languages are needed for:

Language Definition and Use

We also need guidance on how to use the language (e.g., online training).

Capabilities and Limitations of HTML

HTML is designed for markup of hypertext and hypermedia (i.e., text, images, audio, video) that is linked together in an arbitrary way, and is intended for display in a web browser.

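HTML has a fixed tag set:

  1. Early versions of HTML were not flexible enough to easily display content on a wide range of devices (e.g., web browsers, Palm Pilots, cell phones).
  2. Poor support for content searches -- if you want to view part of a page you need to download the entire document.
  3. No support for multilingual documents (e.g., Canada).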

Example. Here is the HTML code for the previous paragraph:

    <p>
    HTML has a fixed tag set:

    <ul>
       <li>
       Early versions of HTML were not flexible enough to easily
       display content in a wide range of devices (e.g., web
       browsers, palm pilot, cell phone).
       <li>
       Poor support for content searches -- if you want to view part
       of a page you need to download the entire document.
       <li>
       No support for multilingual documents (e.g., Canada).
    </ul>

As we will soon see, XML mitigates the limitations of HTML, and in doing so enhances business productivity in numerous ways (e.g., it allows machine-to-machine communication of content and flexible formatting of content through stylesheets).


Introduction to XML Technology

WHAT IS XML?

XML stands for eXtensible Markup Language. It enables people and computers to search for and exchange scientific/business data, commercial products, and multilingual products with ease and speed.

XML technology provides a method for tagging (or marking up) text in a document so that the meaning of its components can be established, thereby allowing the document to be used in multiple applications. Unlike HTML (which has the limitations described above), XML allows user-defined tags and attributes, and its uses stretch far beyond the presentation of material. Because you can create your own tags and attributes in XML, you can use markup to describe the content of each element. In other words, XML is a way of marking up data so that it is self-describing.

Key Design Principles

XML is an open standard developed by the World Wide Web Consortium (W3C). This means that XML is non-proprietary and can be used with a wide range of applications, including authoring tools (e.g., better tag semantics for specialist information), content display engines (i.e., different views of the same data), translation tools, and database exchange/applications.

Separation of Data and Presentation

XML Technology employs three languages:

Note. The specification of a document's contents (XML) is completely separated from the formatting instructions (XSL).

[XML-XSL ]

This allows data to be exported to various types of user devices.

XML IS MARKUP

Simply put, markup refers to anything put on a document that provides special meaning or provides extra information (e.g., underlined text in a word processor is a form of markup). Markup by itself is of very limited use unless others understand its use. Hence, we need a set of rules encompassing the following points:

A markup language is such a set of rules (e.g., HTML is a markup language for the display of documents on the Web). As we will soon see in the applications below, language elements include declarations, tags, and tag attributes. XML requires correctly nested, matching tags and quoted attribute values; a document that obeys these rules is well-formed, and a well-formed document that also conforms to its DTD (or schema) is valid.
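To make the well-formedness rules concrete, here is a minimal Java sketch that uses the JAXP DOM parser (javax.xml.parsers) with validation switched off, so a parse succeeds only if the text is well-formed. The class name WellFormedCheck and the sample strings are illustrative, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Returns true if the text obeys XML's syntax rules (matching, properly
    // nested tags, quoted attribute values, ...). No DTD validation is attempted.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // a well-formedness violation is a fatal parse error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<contact gender=\"M\"><FirstName>John</FirstName></contact>")); // true
        System.out.println(isWellFormed("<contact><FirstName>John</contact>")); // false: tags not nested
        System.out.println(isWellFormed("<contact gender=M></contact>"));       // false: unquoted value
        System.out.println(isWellFormed("<Contact></contact>"));                // false: case differs
    }
}
```

Note that none of these checks needs a DTD; checking validity is a separate, optional step.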

TECHNOLOGIES BUILT ON TOP OF XML

A notation for trees is not enough! The real power of XML lies in generic languages and tools supporting:

XML-BASED STANDARDS AND MARKUP LANGUAGES

Customized Markup Languages include:


How can Java and XML work together?

COMPLEMENTARY BENEFITS OF JAVA AND XML

We want software components that:

Benefits of Java:

Benefits of XML:

Application categories include:

  1. Those requiring the web client to present different views of the data to different users.
  2. Those requiring the web client to mediate between two or more heterogeneous databases (e.g., to publish or exchange database contents).
  3. Those wishing to distribute a significant proportion of the processing load from the web server to the web client.

THREE-TIER ARCHITECTURE INVOLVING XML

[XML Three Tier Architecture ]

User services:

  1. Prepare data for human users.

Business services:

  1. Take the user's request and decide what information is required and how to get it. After the information is obtained, synthesize it into a suitable format for the user.

Data services:

  1. Host data, descriptions of data, and small programs for presenting and manipulating the data.

AGGREGATING XML DATA FROM MULTIPLE SOURCES

[XML Java Servlet ]

Points to note:

  1. JDBC is a standard application programming interface for accessing SQL (Structured Query Language) databases.
  2. Java servlets collect and aggregate data from suppliers, and then create a presentation-neutral XML document for distribution.
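The aggregation step can be sketched with the standard JAXP DOM and transformation APIs. This is a minimal sketch, not a servlet: the supplier feeds are hypothetical XML strings (in the architecture above they would come from JDBC queries or supplier sites), and CatalogAggregator, catalog, item, and supplier are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CatalogAggregator {

    // Merge the <item> elements of several supplier feeds into one <catalog>
    // document, tagging each copy with the name of the supplier it came from.
    static Document aggregate(String... supplierFeeds) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document catalog = builder.newDocument();
            Element root = catalog.createElement("catalog");
            catalog.appendChild(root);

            for (String feed : supplierFeeds) {
                Document supplier = builder.parse(new InputSource(new StringReader(feed)));
                String name = supplier.getDocumentElement().getAttribute("name");
                NodeList items = supplier.getElementsByTagName("item");
                for (int i = 0; i < items.getLength(); i++) {
                    // importNode deep-copies a node owned by another Document
                    Element copy = (Element) catalog.importNode(items.item(i), true);
                    copy.setAttribute("supplier", name);
                    root.appendChild(copy);
                }
            }
            return catalog;
        } catch (Exception e) {
            throw new IllegalStateException("could not aggregate feeds", e);
        }
    }

    // Serialize the merged tree as plain XML text, with no styling attached.
    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize", e);
        }
    }
}
```

The output carries no formatting instructions, so the same document could later be styled differently for each kind of client.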

CLIENT-SIDE PROCESSING OF WEB SEARCH RESULTS

[XML Client Side Computing ]

Client-side computing:

  1. Reduces the number of iterations needed for a precise search.
  2. Reduces network traffic by moving some of the required processing from the server to client-side computers.
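As a minimal sketch of client-side processing, assume a hypothetical result format in which each <hit> element carries a year attribute and a <title> child. The client parses the results once, then refines them locally as often as the user likes, with no further requests to the server. ResultFilter and its methods are illustrative names, not an established API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ResultFilter {

    // Parse the result set once, when it arrives from the server.
    static Document load(String resultsXml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(resultsXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("bad result document", e);
        }
    }

    // Refine the search locally: keep the titles of hits whose year is >= minYear.
    static List<String> titlesSince(Document results, int minYear) {
        List<String> titles = new ArrayList<>();
        NodeList hits = results.getElementsByTagName("hit");
        for (int i = 0; i < hits.getLength(); i++) {
            Element hit = (Element) hits.item(i);
            if (Integer.parseInt(hit.getAttribute("year")) >= minYear) {
                titles.add(hit.getElementsByTagName("title").item(0).getTextContent());
            }
        }
        return titles;
    }
}
```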

GETTING STARTED WITH JAVA AND XML

In this section we introduce document type definitions (DTDs) and describe the mechanics of the SAX (Simple API for XML) and DOM (Document Object Model) parsers.

WRITING AND RUNNING AN XML-BASED APPLICATION

The adjacent figure shows the pathway of development and execution for an XML-based application.

[XML DOM Tree ]

It is convenient to partition this pathway into two phases: (1) Definition of the XML Document, and (2) XML document processing.

Phase 1: Definition of the XML Document.

XML documents are defined in two parts: (1) a document type definition (DTD), and (2) the XML markup itself.

  1. Document Type Definitions (DTDs). A document type definition (DTD) provides applications with advance notice of what names and structures can be used in a particular document type (i.e., it is a formal description, in XML declaration syntax, of a particular type of document). A DTD sets out what names are to be used for the different types of element, where they may occur, and how they all fit together.

  2. XML Markup. The XML markup is composed of elements (delimited by matching start and end tags) and their attributes. XML documents must be well-formed: start and end tags must match, and elements must be properly nested.

Most often these parts will be defined in separate files (as shown in the figure above), but it is possible to embed the DTD in the same file as the XML markup. An example of the latter is shown below.

Phase 2: XML Document Processing.

XML documents are processed by a combination of: (1) parser and (2) application programming interface (API) software.

  1. A parser reads the XML document as plain text. The API offers programmers a set of functionality that they can call from a program to request information from the parser as it processes the document. The two most common APIs are the Document Object Model (DOM) and the Simple API for XML (SAX).

    Some parsers can also validate an XML document instance against the document type definition (DTD) that describes its vocabulary, checking whether the actual markup conforms to the rules the DTD lays down.

  2. Processing Instructions. P876; X6.

Note. Using a DTD when editing files means you can be certain that all documents which belong to a particular type will be constructed and named in a consistent and conformant manner. DTDs are less important for processing documents already known to be well-formed, but they are still needed if you want to take advantage of XML's special attribute types like the built-in ID/IDREF cross-reference mechanism.
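To illustrate the ID/IDREF mechanism, here is a minimal sketch built on a hypothetical parts list whose internal DTD declares id as an ID attribute and uses as an IDREF. Because of those declarations, the DOM method getElementById can follow the cross-reference; without a DTD (or schema) declaring the ID type, the parser has no way of knowing which attribute is an identifier. IdRefDemo and the parts vocabulary are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class IdRefDemo {

    // Hypothetical parts list: "id" is declared as type ID, "uses" as IDREF.
    static final String PARTS =
        "<!DOCTYPE parts [\n" +
        "  <!ELEMENT parts (part+)>\n" +
        "  <!ELEMENT part (#PCDATA)>\n" +
        "  <!ATTLIST part id ID #REQUIRED uses IDREF #IMPLIED>\n" +
        "]>\n" +
        "<parts>\n" +
        "  <part id=\"p1\">Motor</part>\n" +
        "  <part id=\"p2\" uses=\"p1\">Pump</part>\n" +
        "</parts>";

    // Follow the "uses" cross-reference of part partId; null if it has none.
    static String usedPart(String partId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);  // validation also checks each IDREF matches an ID
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;              // treat validation errors as failures
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(PARTS)));

            // getElementById works here because the DTD declares "id" as type ID
            Element part = doc.getElementById(partId);
            String ref = part.getAttribute("uses");
            return ref.isEmpty() ? null : doc.getElementById(ref).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, usedPart("p2") follows the reference to p1 and returns "Motor".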

WHAT IS SAX? AND WHY USE IT?

You should think of the Simple API for XML (SAX) as a "serial access" protocol for XML. This is the fast-to-execute mechanism you would use to read and write XML data in a server, for example. This is also called an event-driven protocol, because the technique is to register your handler with a SAX parser, after which the parser invokes your callback methods whenever it sees a new XML tag (or encounters an error, or wants to tell you anything else).

[SAX Processing]

Figure : Processing a SAX Document

The basic outline of the SAX parsing APIs is shown at right. To start the process, an instance of the SAXParserFactory class is used to generate an instance of the parser.

The parser wraps an XMLReader object (org.xml.sax.XMLReader). When the parser's parse() method is invoked, the reader invokes one of several callback methods implemented in the application. Those methods are defined by the interfaces ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.

Here is a summary of the key SAX APIs:

  1. SAXParserFactory. A SAXParserFactory object creates an instance of the parser determined by the system property, javax.xml.parsers.SAXParserFactory.

  2. SAXParser. The SAXParser class defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.

  3. XMLReader. The SAXParser wraps an XMLReader. Typically, you don't care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader() method so that you can configure it. It is the XMLReader that carries on the conversation with the SAX event handlers you define.

  4. DefaultHandler. Not shown in the diagram, a DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with empty, do-nothing methods), so you can override only the ones you're interested in.

  5. ContentHandler. The methods startDocument and endDocument are invoked at the beginning and end of the document, and startElement and endElement are invoked when a start tag or an end tag is recognized. This interface also defines the methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.

  6. ErrorHandler. Methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes, the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure the correct handling, you'll need to supply your own error handler to the parser.

  7. DTDHandler. Defines methods you will generally never be called upon to use. They are used when processing a DTD to recognize and act on declarations of notations and unparsed entities.

  8. EntityResolver. The resolveEntity method is invoked when the parser must locate data identified by a URI. In most cases, a URI is simply a URL, which specifies the location of a document, but in some cases the document may be identified by a URN -- a public identifier, or name, that is unique in the web space. The public identifier may be specified in addition to the URL. The EntityResolver can then use the public identifier instead of the URL to find the document, for example to access a local copy of the document if one exists.
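The APIs above fit together as in the following minimal sketch: a SAXParserFactory produces a SAXParser, and a DefaultHandler subclass overrides only the callbacks it needs, recording each event in a list. The class name SaxEventLog is hypothetical. Note that characters() may deliver one run of text in several pieces, so the sketch buffers text until the next tag.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLog extends DefaultHandler {

    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument()   { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            events.add("attr " + atts.getQName(i) + "=" + atts.getValue(i));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end " + qName);
    }

    // The parser may split one run of text across several characters() calls.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // DefaultHandler silently ignores recoverable errors; rethrow them instead.
    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e;
    }

    // Record buffered text (ignoring pure whitespace) and reset the buffer.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) events.add("text " + s);
        text.setLength(0);
    }

    // SAXParserFactory -> SAXParser -> parse(source, handler)
    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SaxEventLog handler = new SaxEventLog();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, parsing <contact gender="F"><FirstName>Yoko</FirstName></contact> yields the events startDocument, start contact, attr gender=F, start FirstName, text Yoko, end FirstName, end contact, endDocument, in that order.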

Note. SAX2 is a new version of the popular Simple API for XML, incorporating support for namespaces, for filter chains, and for querying and setting features and properties in the parser. If you're looking for SAX1, the original version released in May 1998, it is still available at http://www.megginson.com/SAX/SAX1/ .

SAX Parsers.

Insert material soon ..... see Williams et al., pg 242.

Overview of Event-Driven XML Processing

Processing XML Documents Using SAX

WHAT IS DOM? AND WHY USE IT?

The document object model (DOM) protocol converts an XML document into a collection of objects in your program. You can then manipulate the object model in any way that makes sense. This mechanism is also known as the "random access" protocol, because you can visit any part of the data at any time. You can then modify the data, remove it, or insert new data.
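A minimal sketch of this random access, assuming a hypothetical <contacts> document whose <contact> elements (at least two of them) are direct children of the root: because the whole tree is held in memory, the program can change the first contact, remove the second, and insert a new one at the front. DomEditDemo is an illustrative name; every call used is a standard org.w3c.dom method.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomEditDemo {

    // Load the whole document into memory, then modify, remove and insert nodes.
    static List<String> edit(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
        Element root = doc.getDocumentElement();
        NodeList contacts = root.getElementsByTagName("contact"); // a live list

        // Modify: jump straight to the first contact
        contacts.item(0).setTextContent("John W.");

        // Remove: the second contact (the live list shrinks accordingly)
        root.removeChild(contacts.item(1));

        // Insert: a new contact in front of all the others
        Element ringo = doc.createElement("contact");
        ringo.setTextContent("Ringo");
        root.insertBefore(ringo, root.getFirstChild());

        List<String> names = new ArrayList<>();
        for (int i = 0; i < contacts.getLength(); i++) {
            names.add(contacts.item(i).getTextContent());
        }
        return names;
    }
}
```

With the input <contacts><contact>John</contact><contact>Yoko</contact></contacts>, edit returns [Ringo, John W.].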

Processing a DOM document ......

[DOM Processing]

Figure : Processing a DOM Document

You use the javax.xml.parsers.DocumentBuilderFactory class to get a DocumentBuilder instance, and use that to produce a Document (a DOM) that conforms to the DOM specification. The builder you get, in fact, is determined by the System property, javax.xml.parsers.DocumentBuilderFactory, which selects the factory implementation that is used to produce the builder. (The platform's default value can be overridden from the command line.)

You can also use the DocumentBuilder newDocument() method to create an empty Document that implements the org.w3c.dom.Document interface. Alternatively, you can use one of the builder's parse methods to create a Document from existing XML data. The result is a DOM tree like that shown in the diagram.

Note. Although they are called objects, the entries in the DOM tree are actually fairly low-level data structures. For example, under every element node (which corresponds to an XML element) there is a separate text node which contains the element's text content. This issue will be explored at length in the DOM section of the tutorial, but users who are expecting objects are usually surprised to find that asking an element node for its value (getNodeValue()) returns null! For a truly object-oriented tree, see the JDOM API.
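The following minimal sketch shows the text-node behavior described in the note. TextNodeDemo is an illustrative name; getTextContent() is the DOM Level 3 convenience method that returns the text beneath a node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextNodeDemo {

    // Parse a small document and return its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Element e = parseRoot("<FirstName>John</FirstName>");
        Node child = e.getFirstChild();

        System.out.println(e.getNodeName());                       // FirstName (the tag name)
        System.out.println(e.getNodeValue());                      // null: elements have no value
        System.out.println(child.getNodeType() == Node.TEXT_NODE); // true
        System.out.println(child.getNodeValue());                  // John (text lives in the child)
        System.out.println(e.getTextContent());                    // John (DOM Level 3 shortcut)
    }
}
```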

Thinking in Trees. So far in our discussion we have considered XML documents to be flat textual files. However, in order to understand how XML documents fit together we really need to think of XML documents and their contents as a labeled tree containing:

Mapping DOM Interfaces to Java

Manipulating DOM Trees

DOM Parsers.

Insert material soon .....


Examples: Validating XML files with SAX and DOM Technology

EXAMPLE 1: VALIDATING XML INPUT WITH DOM TECHNOLOGY

This section is adapted from Deitel and Deitel, Section 8-14.


simpleContact.xml. Consider the XML file:
    <?xml version = "1.0"?>
    <!-- Fig 8.16 : simpleContact.xml    -->
    <!-- Input file for traverseDOM.java -->

    <!DOCTYPE contacts [
       <!ELEMENT contacts ( contact+ )>
       <!ELEMENT contact ( FirstName, LastName )>
       <!ATTLIST contact gender ( M | F ) "M">
       <!ELEMENT FirstName ( #PCDATA )>
       <!ELEMENT LastName ( #PCDATA )>
    ]>

    <!-- And here is the XML content -->

    <contacts>
       <contact gender = "M">
         <FirstName> John   </FirstName>
         <LastName>  Lennon </LastName>
       </contact>
       <contact gender = "F">
         <FirstName> Yoko  </FirstName>
         <LastName>  Ono   </LastName>
       </contact>
    </contacts>

The program below uses DOM technology to read and validate the xml file. The command:

    prompt >> java TraverseDOM simpleContact.xml

generates the output:

    Document node: #document
    Root element: contacts

    Element node: contacts

    Element node: contact
            Attribute: gender ; Value = M

    Element node: FirstName
            Text:  John   

    Element node: LastName
            Text:   Lennon 

    Element node: contact
            Attribute: gender ; Value = F

    Element node: FirstName
            Text:  Yoko  

    Element node: LastName
            Text:   Ono   

Points to note are as follows:

  1. The XML Declaration. The XML declaration (which, in fact, is optional):
        <?xml version = "1.0"?>
    

    tells a parser that the document contains XML content. The declaration needs to appear right at the beginning of the document.

  2. Comments in XML Files. XML comments look just like HTML comments:
        <!-- Fig 8.16 : simpleContact.xml    -->
        <!-- Input file for traverseDOM.java -->
    

  3. Document Type Definition (DTD). Usually, the document type definition is specified in a file that is separate from the XML document. However, in this simple example we lump the DTD and XML markup into a single file called "simpleContact.xml."

    The line of code:

           <!ELEMENT contacts ( contact+ )>
    

    defines "contacts" as a list containing one or more "contact" items. Each contact:

           <!ELEMENT contact ( FirstName, LastName )>
    

    will contain "FirstName" and "LastName" elements, and have one attribute:

           <!ATTLIST contact gender ( M | F ) "M">
    

    "gender", which can be set to "M" or "F" (and defaults to "M" if omitted). The "FirstName" and "LastName" elements

           <!ELEMENT FirstName ( #PCDATA )>
           <!ELEMENT LastName ( #PCDATA )>
    

    contain just plain text (Parsed Character Data or PCDATA). Validating parsers read the DTD before they read your document so that they can identify where every element type ought to come and how each relates to the other, so that applications which need to know this in advance (most editors, search engines, navigators, databases) can set themselves up correctly.

  4. Tags. Tags describe the role of items in your XML documents, and are written as angle brackets (known as delimiters) with a "tag name" between them. For example, in HTML, <p> is a tag indicating the beginning of a paragraph.

  5. Elements. An element consists of its tags plus their content.

    Tags tell a program (usually an application program or web browser) to do something to whatever is contained within the tags. Elements may be used in one of two ways:

    1. As shown in the graphic above, elements may be composed of a start tag, followed by content and an end tag. End tags always begin with "</", followed by the name of the matching start tag.

    2. As an empty element. Empty elements are used where there is no content between a start and an end tag; instead, a single tag is used with a forward slash before the closing bracket. For example:
          <?xml version="1.0"?>
          <SampleDoc />
      

    Note -- Unlike HTML, XML is case sensitive.

    Attributes. Any start tag (or empty-element tag) can carry attributes; in a validated document, each attribute must be declared in the DTD. Attributes are name/value pairs of the form:

        <tagname attribute="value">
    

    In HTML 4.0, for example, the <body> tag can take the following attributes:

        CLASS ID    DIR       LANG     STYLE   TITLE
        BACKGROUND  BGCOLOR   ALINK    LINK    VLINK   TEXT
    

    An example is:

        <body bgcolor="red">
    
              ..... content of page removed .....
    
        </body>
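The attribute default declared in item 3 can be observed directly. This minimal sketch reuses the DTD from simpleContact.xml with a hypothetical contact that omits gender; a DTD-aware parser supplies the default "M", and the DOM records that the value was not written in the document (Attr.getSpecified() returns false). DefaultAttributeDemo is an illustrative name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DefaultAttributeDemo {

    // The DTD from simpleContact.xml, with one contact that omits gender.
    static final String XML =
        "<!DOCTYPE contacts [\n" +
        "  <!ELEMENT contacts ( contact+ )>\n" +
        "  <!ELEMENT contact ( FirstName, LastName )>\n" +
        "  <!ATTLIST contact gender ( M | F ) \"M\">\n" +
        "  <!ELEMENT FirstName ( #PCDATA )>\n" +
        "  <!ELEMENT LastName ( #PCDATA )>\n" +
        "]>\n" +
        "<contacts><contact><FirstName>Paul</FirstName>" +
        "<LastName>McCartney</LastName></contact></contacts>";

    static Attr genderOfFirstContact() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;   // treat validation errors as failures
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            Element contact = (Element) doc.getElementsByTagName("contact").item(0);
            return contact.getAttributeNode("gender");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Attr gender = genderOfFirstContact();
        System.out.println(gender.getValue());     // M, supplied by the DTD default
        System.out.println(gender.getSpecified()); // false: not written in the document
    }
}
```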
    

Program Source Code

The program source code is partitioned into two files: TraverseDOM.java and MyErrorHandler.java.


TraverseDOM.java. The source code is as follows:
/*
 *  =============================================================
 *  TraverseDOM.java: Traverses DOM and prints various nodes.
 * 
 *  Developed by Deitel and Deitel, Prentice-Hall, 2001 
 *  See Chapter 8. 
 * 
 *  Modified for JAXP compatibility                   March, 2002
 *  =============================================================
 */

import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class TraverseDOM {

   private Document document;   
   
   public TraverseDOM( String file ) {

      try {

         // Obtain the default parser

         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         factory.setValidating( true );
         DocumentBuilder builder = factory.newDocumentBuilder();

         // Set error handler for validation errors

         builder.setErrorHandler( new MyErrorHandler() );

         // Obtain document object from XML document

         document = builder.parse( new File( file ) );
         processNode( document );

      } 

      // Catch parsing errors .......

      catch ( SAXParseException spe ) {
         System.err.println( 
            "Parse error: " + spe.getMessage() );
         System.exit( 1 );
      }

      catch ( SAXException se ) {
         se.printStackTrace();         
      }

      catch ( FileNotFoundException fne ) {
         System.err.println( "File \'" 
            + file + "\' not found. " );
         System.exit( 1 );
      }

      catch ( Exception e ) {
         e.printStackTrace();
      }
   }

   // Method to process document, element, cdata and text nodes.

   public void processNode( Node currentNode ) {

      switch ( currentNode.getNodeType() ) {

         // Process a Document node

         case Node.DOCUMENT_NODE:
              Document doc = ( Document ) currentNode;

              System.out.println( 
                 "Document node: " + doc.getNodeName() +
                 "\nRoot element: " +
                 doc.getDocumentElement().getNodeName() );

              processChildNodes( doc.getChildNodes() );
              break;

         // Process an Element node

         case Node.ELEMENT_NODE:   
              System.out.println( "\nElement node: " + currentNode.getNodeName() );

              NamedNodeMap attributeNodes = currentNode.getAttributes();

              for ( int i = 0; i < attributeNodes.getLength(); i++){
                 Attr attribute = ( Attr ) attributeNodes.item( i );

                 System.out.println( "\tAttribute: " + 
                    attribute.getNodeName() + " ; Value = " +
                    attribute.getNodeValue() );
              }

              processChildNodes( currentNode.getChildNodes() );
              break;

         // Process a text node and a CDATA section

         case Node.CDATA_SECTION_NODE:
         case Node.TEXT_NODE: 
              Text text = ( Text ) currentNode;

              if ( !text.getNodeValue().trim().equals( "" ) )
                  System.out.println( "\tText: " + text.getNodeValue() );
              break;
      }
   }

   // Process child nodes .....

   public void processChildNodes( NodeList children ) {

      if ( children.getLength() != 0 ) 
         for ( int i = 0; i < children.getLength(); i++)
            processNode( children.item( i ) );

   }

   // Main method for traversing and printing the DOM tree ....

   public static void main( String args[] ) {

      if ( args.length < 1 ) {
         System.err.println( 
            "Usage: java TraverseDOM <filename>" );
         System.exit( 1 );
      }

      TraverseDOM traverseDOM = new TraverseDOM( args[ 0 ] );    
   }
}


TraverseDOM traverses and prints nodes in the DOM. Points to note are as follows:

  1. The fragment of code:
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating( true );
        DocumentBuilder builder = factory.newDocumentBuilder();
    

    obtains the default parser. The first line creates and assigns a "DocumentBuilderFactory" object to the reference variable "factory". The class "DocumentBuilderFactory" obtains an instance of a DOM parser, the default JAXP parser in this case. The second line indicates that a validating parser is being used. Line three creates a "DocumentBuilder" object -- this object provides an interface for loading and parsing XML documents.

  2. The line of code:
        builder.setErrorHandler( new MyErrorHandler() );
    

    sets the error handler for validation errors (i.e., handling exceptions related to parsing the XML document).

  3. The following lines parse the XML file into a Document object and then pass that Document to processNode():
        document = builder.parse( new File( file ) );
        processNode( document );
    

  4. The abbreviated method:
       public void processNode( Node currentNode ) {
    
          switch ( currentNode.getNodeType() ) {
    
             // Process a Document node
    
             case Node.DOCUMENT_NODE:
                  .... code removed .....
                  break;
    
             // Process an Element node
    
             case Node.ELEMENT_NODE:   
                  .... code removed .....
                  break;
    
             // Process a text node and a CDATA section
    
             case Node.CDATA_SECTION_NODE:
             case Node.TEXT_NODE: 
                  .... code removed .....
                  break;
          }
       }
    
       // Process child nodes .....
    
       public void processChildNodes( NodeList children ) {
    
          if ( children.getLength() != 0 ) 
             for ( int i = 0; i < children.getLength(); i++)
                processNode( children.item( i ) );
    
       }
    

    takes one "Node" argument and outputs information about that node; processChildNodes then applies processNode to each child, so the whole tree is traversed recursively. The node type determines which case of the switch statement handles the node. Document nodes are handled by the case:

             case Node.DOCUMENT_NODE:
    

    This case outputs the document node and processes its child nodes by calling "processChildNodes." Element nodes are handled by the case:

             case Node.ELEMENT_NODE:   
    

    After the element's attributes have been printed, this case processes the element's children by calling "processChildNodes." The pair of cases:

             case Node.CDATA_SECTION_NODE:
             case Node.TEXT_NODE: 
    

    match CDATA section and text nodes; the text is printed only if it is not entirely whitespace.


MyErrorHandler.java. The details of MyErrorHandler.java are:
/*
 *  =============================================================
 *  MyErrorHandler.java: Error Handler for validation errors.
 *  
 *  Developed by Deitel and Deitel, Prentice-Hall, 2001 
 *  See Chapter 8. 
 * 
 *  Modified for JAXP compatibility                   March, 2002
 *  =============================================================
 */

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MyErrorHandler implements ErrorHandler { 

   // Throw SAXException for fatal errors

   public void fatalError( SAXParseException exception ) throws SAXException {
      throw exception;
   }

   // Rethrow validation errors so that parsing stops at the first one
   public void error( SAXParseException e ) throws SAXParseException {
      throw e;
   }

   // Print any warnings 

   public void warning( SAXParseException err ) throws SAXParseException {
      System.err.println( "Warning: " + err.getMessage() );
   }

}

Points to note are as follows:

  1. .....
  2. .....

EXAMPLE 2: VALIDATING XML INPUT WITH SAX TECHNOLOGY

Insert material soon .....


XML Schema

LIMITATIONS OF DTDs

A DTD is for specifying the structure (only) of an XML file -- it gives the names of the elements, attributes, and entities that can be used, and how they fit together. Because DTDs were designed for use with traditional text documents, they have no mechanism for defining the content of elements in terms of data types, because XML has no data types: text is just text. A DTD therefore cannot be used to specify numeric ranges or to define limitations or checks on the text content, only on the markup that surrounds it.

Second, traditional DTD notation does not give document type designers the power to impose enough constraints on the data -- for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices. Such constraints ease the development of software that uses the data, because the developer has less error-checking code to write.

XML SCHEMA

The XML Schema recommendation provides a means of specifying element content in terms of data types, so that document type designers can provide criteria for validating the content of elements as well as the markup itself. Schemas are written as XML files, thus avoiding the need for processing software to be able to read XML Declaration Syntax, which is different from XML Instance Syntax.
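As a minimal sketch, the JAXP validation API (javax.xml.validation) can check an instance document against a schema. The one-element schema below is hypothetical: it says that a <width> element must contain a positive integer, a constraint that a DTD could only express as (#PCDATA). SchemaCheck is an illustrative name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCheck {

    // Hypothetical schema: <width> must hold an xs:positiveInteger (1, 2, 3, ...).
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">" +
        "<xs:element name=\"width\" type=\"xs:positiveInteger\"/>" +
        "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // True if the instance document satisfies the schema's data-type constraints.
    static boolean isValid(String xml) {
        try {
            // With no ErrorHandler set, the Validator throws on the first error.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here isValid("<width>40</width>") is true, while "<width>0</width>", "<width>-5</width>" and "<width>wide</width>" are all rejected; a DTD declaring width as (#PCDATA) would accept all four.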

XML PARSERS WITH SCHEMA SUPPORT

Some Notes.

  1. Schemas are now a formal W3C Recommendation, and a number of sites are serving useful applications. See, for example, http://www.schema.net. There is a separate Schema FAQ at http://www.schemavalid.com. The term "vocabulary" is sometimes used to refer to DTDs and Schemas together.


Application Programs based on Java and XML

EXAMPLE 3: A SIMPLE PLOT2D PROGRAM

This example demonstrates the use of SAX technology in a 2D barchart plotting program implemented as an applet. The DTD and XML markup files are downloaded from a server to the client machine.

Run the Applet.

Source code.

This program is defined in three files: TestXmlPanel.java, barchart.dtd, and barchart.xml. The program files before and after compilation are as follows:

    ==============================================================
    Before Compilation                           After Compilation
    ==============================================================
    TestXmlPanel.java                            TestXmlPanel.java 
    barchart.xml                                      barchart.xml
    barchart.dtd                                      barchart.dtd
                                                      jpanel.class
                                                TestXmlPanel.class
                                          WindowEventHandler.class
                                                    parsexml.class  
    ==============================================================

barchart.dtd. Details of barchart.dtd:

    <?xml encoding="UTF-8"?>
    <!ELEMENT barchart  (rectangle+) > 

    <!ELEMENT rectangle  (titleofbar,width,height,color) > 

    <!ELEMENT titleofbar (#PCDATA)> 
    <!ELEMENT width  (#PCDATA)> 
    <!ELEMENT height (#PCDATA)> 
    <!ELEMENT color  (#PCDATA)> 

The second line states that a barchart is composed of one or more "rectangle" elements. Each "rectangle" element must contain exactly one "titleofbar", "width", "height", and "color" element, in that order, and each of these holds just plain text (Parsed Character Data, or PCDATA).

barchart.xml. Details of barchart.xml:

    <?xml version="1.0"?> 
    <!DOCTYPE barchart SYSTEM "barchart.dtd">

    <barchart>
   
      <rectangle>
        <titleofbar>  Bar1  </titleofbar> 
        <width>         20  </width> 
        <height>       100  </height> 
        <color>        Red  </color>  
      </rectangle>  

      <rectangle>
        <titleofbar>  Bar2  </titleofbar> 
        <width>         20  </width> 
        <height>       150  </height> 
        <color>       Blue  </color>  
      </rectangle>  
    
      <rectangle>
        <titleofbar>  Bar3  </titleofbar> 
        <width>         20  </width> 
        <height>       175  </height> 
        <color>     Yellow  </color>  
      </rectangle>  
   
      <rectangle>
        <titleofbar>  Bar4  </titleofbar> 
        <width>         20  </width> 
        <height>       125  </height> 
        <color>      Green  </color>  
      </rectangle>  
 </barchart> 
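The SAX side of the plotting program can be sketched as follows. This is not the TestXmlPanel.java source (which is not reproduced here); the class BarchartReader and its Bar holder are hypothetical. The handler accumulates the text of each field (characters() may deliver it in several chunks), trims the padding spaces that barchart.xml puts around its values, and builds one Bar per rectangle element. The sample string omits the DOCTYPE line so that the sketch does not need barchart.dtd on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: read barchart-style XML into a list of bars with SAX.
class BarchartReader {

    static class Bar {
        String title, color;
        int width, height;
    }

    static List<Bar> read(String xml) {
        List<Bar> bars = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private Bar current;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0);                       // start collecting fresh text
                if (qName.equals("rectangle")) current = new Bar();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);         // may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                String value = text.toString().trim();  // drop the padding spaces
                switch (qName) {
                    case "titleofbar": current.title = value; break;
                    case "width":      current.width = Integer.parseInt(value); break;
                    case "height":     current.height = Integer.parseInt(value); break;
                    case "color":      current.color = value; break;
                    case "rectangle":  bars.add(current); break;
                    default:           break;
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return bars;
    }

    public static void main(String[] args) {
        String xml = "<barchart>"
            + "<rectangle><titleofbar> Bar1 </titleofbar><width> 20 </width>"
            + "<height> 100 </height><color> Red </color></rectangle>"
            + "<rectangle><titleofbar> Bar2 </titleofbar><width> 20 </width>"
            + "<height> 150 </height><color> Blue </color></rectangle>"
            + "</barchart>";
        for (Bar b : read(xml)) {
            System.out.println(b.title + " " + b.width + "x" + b.height + " " + b.color);
        }
    }
}
```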

EXAMPLE 4: INTERACTIVE TRAVEL PLANNER (FROM DEITEL ET AL.)

In this example (adapted from Deitel and Deitel, 2001) we use DOM technology for the implementation of an interactive travel planner. This program is defined in five files:

  1. planner.dtd. Document type definition file for the day planner application.
  2. planner.xml. XML markup file for the day planner application.
  3. DayPlanner.java. Defines the GUI for the day planner application.
  4. DOMPlanner.java. Uses DOM to build a tree of planning activities. Also contains code for querying and searching the DOM tree.
  5. MyErrorHandler.java. Error handling for validation errors.

The program files before and after compilation are as follows:

    ==============================================================
    Before Compilation                           After Compilation
    ==============================================================
    DayPlanner.java                                DayPlanner.java 
    DOMPlanner.java                                DOMPlanner.java
    MyErrorHandler.java                        MyErrorHandler.java 
    planner.xml                                        planner.xml
    planner.dtd                                        planner.dtd
                                                  DayPlanner.class   
                                                DayPlanner$1.class  
                                                  DOMPlanner.class 
                                              MyErrorHandler.class 
    ==============================================================

planner.dtd. Details of planner.dtd:

    <!-- Fig. 6.14: planner.dtd -->
    <!-- DTD for day planner    -->

    <!ELEMENT planner ( year* )>

    <!ELEMENT year ( date+ )>
    <!ATTLIST year value CDATA #REQUIRED>

    <!ELEMENT date ( note+ )>
    <!ATTLIST date month CDATA #REQUIRED>
    <!ATTLIST date day CDATA #REQUIRED>

    <!ELEMENT note ( #PCDATA )>
    <!ATTLIST note time CDATA #IMPLIED>

planner.xml. A sample travel planner XML document is as follows:

    <?xml version = "1.0"?>
    <!-- Travel Planner XML document -->
    <!DOCTYPE planner SYSTEM "planner.dtd">

    <planner>
        <year value = "2001">
           <date month = "12" day = "22">
              <note time = "0630">Go to airport             </note>
              <note time = "0930">Catch flight to Vancouver </note>
           </date>
        </year>

        <year value = "2002">

           <date month = "1" day = "3">
              <note time = "1430">Catch flight to Washington DC </note>
           </date>

           <date month = "1" day = "8">
              <note time = "0830">Leave Washington DC for New Zealand </note>
           </date>

           <date month = "1" day = "10">
              <note time = "0800">Arrive in Auckland, New Zealand   </note>
              <note time = "1130">Arrive in Queenstown, New Zealand </note>
           </date>

           <date month = "1" day = "22">
              <note time = "1700">Leave Queenstown for Washington DC </note>
              <note time = "1200">Meet Jim at LAX for a drink        </note>
              <note time = "2000">Arrive back in Washington DC       </note>
           </date>

           <date month = "3" day = "22">
              <note time = "2030"> Leave Washington DC for Sweden </note>
           </date>

           <date month = "3" day = "30">
              <note time = "0800"> Leave Copenhagen for Washington DC </note>
           </date>
        </year>
    </planner>
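The querying half of DOMPlanner.java can be sketched as follows. This is not the Deitel code; the class PlannerQuery and its notesFor method are hypothetical. The document is parsed into a DOM tree, and the year, date, and note elements are then walked to collect the notes for one day. The sample omits the DOCTYPE line so that planner.dtd is not needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: query a planner document through the DOM tree.
class PlannerQuery {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns "time text" for every note on the given date.
    static List<String> notesFor(Document doc, String year, String month, String day) {
        List<String> result = new ArrayList<>();
        NodeList years = doc.getElementsByTagName("year");
        for (int i = 0; i < years.getLength(); i++) {
            Element y = (Element) years.item(i);
            if (!y.getAttribute("value").equals(year)) continue;
            NodeList dates = y.getElementsByTagName("date");
            for (int j = 0; j < dates.getLength(); j++) {
                Element d = (Element) dates.item(j);
                if (!d.getAttribute("month").equals(month)
                        || !d.getAttribute("day").equals(day)) continue;
                NodeList notes = d.getElementsByTagName("note");
                for (int k = 0; k < notes.getLength(); k++) {
                    Element n = (Element) notes.item(k);
                    result.add(n.getAttribute("time") + " " + n.getTextContent().trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<planner><year value='2002'>"
            + "<date month='1' day='10'>"
            + "<note time='0800'>Arrive in Auckland, New Zealand   </note>"
            + "<note time='1130'>Arrive in Queenstown, New Zealand </note>"
            + "</date></year></planner>";
        for (String note : notesFor(parse(xml), "2002", "1", "10")) {
            System.out.println(note);
        }
    }
}
```

Unlike the SAX reader for the bar chart, the whole tree stays in memory here, so the same parsed document can be searched repeatedly for different dates.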

EXAMPLE 5: USING SCALABLE VECTOR GRAPHICS (SVG)

Motivation for SVG. Major problems and challenges with graphics on the Web:

  1. Large download times.
  2. Not scalable.
  3. No separation of style and content.
  4. No hyperlinking.
  5. Little or no metadata.
  6. Limited or no animation.

SVG (Scalable Vector Graphics) is an XML-based graphics specification designed to address these issues. It is an open-standard language, developed by the W3C, for describing two-dimensional graphics in XML. The official Scalable Vector Graphics (SVG) 1.0 Specification is available from: http://www.w3.org/TR/SVG/

The W3C candidate recommendation documentation for SVG describes SVG as capable of rendering vector graphic shapes, images and text. Graphical objects can be grouped, styled, transformed and composited into previously rendered objects. The feature set includes nested transformations, clipping paths, alpha masks, filter effects and template objects.

The full power of SVG graphics lies in its ability to be interactive and dynamic. Animation effects can be defined and triggered via scripting. And because SVG is written in XML, it is possible to generate charts and graphs on the fly from data pulled from an XML repository.
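As a sketch of generating a chart on the fly, the hypothetical SvgBarChart class below turns bar heights and colours, such as those in barchart.xml from Example 3, into an SVG document made of rect elements. The 10-unit gap between bars and the bottom-aligned layout are arbitrary choices. Because SVG's y axis points down, each bar's y coordinate is the chart height minus the bar's height.

```java
import java.util.Locale;

// Hypothetical sketch: build an SVG bar chart as text from data values.
class SvgBarChart {

    static final int GAP = 10;   // assumed spacing between bars

    static String toSvg(String[] colors, int[] heights, int barWidth) {
        int maxHeight = 0;
        for (int h : heights) maxHeight = Math.max(maxHeight, h);
        int chartWidth = GAP + heights.length * (barWidth + GAP);

        StringBuilder svg = new StringBuilder();
        svg.append("<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"")
           .append(chartWidth).append("\" height=\"").append(maxHeight).append("\">\n");
        for (int i = 0; i < heights.length; i++) {
            int x = GAP + i * (barWidth + GAP);
            int y = maxHeight - heights[i];          // SVG's y axis points down
            svg.append("  <rect x=\"").append(x)
               .append("\" y=\"").append(y)
               .append("\" width=\"").append(barWidth)
               .append("\" height=\"").append(heights[i])
               .append("\" fill=\"").append(colors[i].toLowerCase(Locale.ROOT))
               .append("\"/>\n");
        }
        svg.append("</svg>\n");
        return svg.toString();
    }

    public static void main(String[] args) {
        // Values taken from barchart.xml in Example 3.
        String[] colors = { "Red", "Blue", "Yellow", "Green" };
        int[] heights = { 100, 150, 175, 125 };
        System.out.print(toSvg(colors, heights, 20));
    }
}
```

Plain string building keeps the sketch short; a real application would more likely build the SVG as a DOM tree so that the output is guaranteed to be well formed.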

Characteristics of SVG.

The key characteristics of SVG are as follows (Cheong M.S., 2001):

  1. Graphical Objects. SVG supports three types of graphics elements: shapes, text, and raster images. The basic shapes in SVG are rectangles, circles, ellipses, lines, polylines (connected straight line segments forming open shapes), and polygons. Each basic shape is equivalent to a path, so it can be filled with paint and stroked. Each fill and stroke operation has adjustable opacity settings. The supported types of paint are solid colours, linear and radial gradients, and patterns. This means you can draw a circle with a black outline, filled with a pattern of your choice, using only text descriptions and without the need for any graphic files.

    Example. Drawing an object in SVG comes down to issuing a series of instructions such as "draw a red rectangle with a blue outline that is 10cm wide and 5cm high." The code will look something like this:

        <rect   x =   "30"
                y =   "30"
            width = "10cm"
           height =  "5cm"
             fill =  "red"
           stroke = "blue" />
    

    SVG objects are positioned precisely on the page by their x and y coordinates, and by any transformations applied to them. Other styling properties, such as fill and stroke, can be specified with Cascading Style Sheets (CSS) or as XML presentation attributes. Graphics can also be layered: SVG paints elements in document order, so an element that appears later in the file is drawn on top of earlier ones (SVG 1.0 has no equivalent of the CSS z-index property).

    If several drawing elements share similar attributes, they can be grouped together using the group <g> element. For example:

        <g transform="translate(255, 30)" id="clock">
        <circle style="fill: #ffffcc; stroke: black;" r="20" cy="20" cx="20"/>
        <line style="stroke: black;"
                 y2="20" x2="38" y1="20" x1="20"
                 transform="rotate(186, 20, 20)"/>
        <line style="stroke: black;"
                 y2="20" x2="33" y1="20" x1="20"
                 transform="rotate(53, 20, 20)"/>
        </g>
    

    As this fragment illustrates, individual objects or groups of objects can be labeled with id attributes, so that scripts can later find and manipulate them to produce animation.

    Raster images can still be embedded even though SVG is a vector format. The <image> element lets you embed a bitmap image such as a PNG or JPEG file, much as the <img> element does in HTML.

  2. Raster Effects and Other Tricks. Suppose you want to add a shadow and some lighting effects to a circle. Normally you would apply the filter effects in a paint program like Photoshop and then export the image into a web-compliant format like GIF or JPEG. With SVG, you can declare these filters directly in your code, without using any paint program. The filters are specified in such a way that the graphics remain scalable and displayable at different resolutions.

    To give you an idea, some examples of "Photoshop-like" filters that can be declared in SVG are lighting effects, Gaussian blurring, blending, displacement mapping, and many more. It is also possible to adjust the hue, saturation, and luminance of drawings. The masking feature (similar to that found in Photoshop) allows you to composite an object into the background. You have seen what different combinations of effects can produce in a paint program; just imagine the possibilities available to you here using SVG.

    Since SVG uses a coordinate system to position elements on a page, you can transform elements by rotating, skewing, or scaling them. It is simply a matter of mapping the coordinates of the current element into a new coordinate system.

    Another of SVG's advantages is its palette of 16 million colours, with support for colour profiles. This makes for more consistent rendering of colour across different viewing platforms and output devices. Better still, because the graphics are vectors, they can be printed without loss of colour or detail and without the jagged edges that appear when bitmaps are enlarged.

  3. Text and Fonts. SVG offers web designers more control over the layout and rendering of text on web pages. Many web designers share the frustration of not being able to render text in specific font faces without saving it as an image file. Limitations and inconsistencies in current web standards make rendering text in particular fonts a tricky task. SVG should remove much of this frustration, because fonts can be embedded directly in an SVG document.

    Styling functions available to graphical objects are available to text as well, which means you can add shadow effects to a string of text or even fill it with a gradient.

    Another great feature of text rendition in SVG is that text strings can be rendered in a single line or along the outline of a path. You can chart the path of a curve and have the baseline of the text string follow that path.

    Because SVG is a text-based format, it can also improve the accessibility of web pages, especially for the visually impaired. Users who rely on screen readers and speech synthesizers will be able to read text within an SVG image. You can also include explanations for text elements whose meaning is not clear from their text content. Text within SVG images can also be indexed by search engines or searched within a browser by a user, features not available with Flash.

  4. Support for Scripting and Animation. SVG's full potential can be exploited with the help of scripting and programming languages such as JavaScript, Perl, or Java. Graphics can be generated on the fly, for example by having a script query a database and present high-quality charts and graphs that update dynamically.

    SVG's support for scripting also makes web content more interactive. Using scripts, you can specify events to be triggered with certain actions, indicate zooming and panning options and even specify different cursors to use in different circumstances.

    SVG also fully supports the Document Object Model (DOM), a platform- and language-neutral interface that programs and scripts can use to access and update HTML, XHTML, and XML documents. Besides the SVG DOM, animation effects can be achieved with SVG's animation elements, which describe time-based modifications to a document's elements. These make it possible to create animations akin to those in Flash intros.
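The sketch below shows the kind of DOM manipulation described above, written in Java with the standard JAXP DOM API rather than in a browser scripting language. The hypothetical ClockUpdater class parses the clock group from the earlier <g> fragment, finds it by its id attribute, and rewrites the rotate() transform of one hand. It also uses java.awt.geom.AffineTransform to compute where the tip (x2, y2) of a hand lands after rotate(angle, 20, 20), a rotation about the clock's centre. A plain loop is used to find the id because getElementById only recognises attributes declared as type ID, and no DTD or schema declares one here.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: update an SVG clock hand through the DOM.
class ClockUpdater {

    // The clock group from the <g> example above.
    static final String CLOCK =
          "<g transform='translate(255, 30)' id='clock'>"
        + "<circle style='fill: #ffffcc; stroke: black;' r='20' cy='20' cx='20'/>"
        + "<line style='stroke: black;' y2='20' x2='38' y1='20' x1='20' transform='rotate(186, 20, 20)'/>"
        + "<line style='stroke: black;' y2='20' x2='33' y1='20' x1='20' transform='rotate(53, 20, 20)'/>"
        + "</g>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Linear search for the element whose id attribute matches.
    static Element findById(Document doc, String id) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (id.equals(e.getAttribute("id"))) return e;
        }
        return null;
    }

    // Rotate hand number `hand` (0 = first <line>) to `degrees` about the centre (20, 20).
    static void setHand(Document doc, int hand, double degrees) {
        Element clock = findById(doc, "clock");
        Element line = (Element) clock.getElementsByTagName("line").item(hand);
        line.setAttribute("transform", "rotate(" + degrees + ", 20, 20)");
    }

    // Where the point (x2, y2) ends up after rotate(degrees, 20, 20).
    static Point2D tip(double x2, double y2, double degrees) {
        AffineTransform rotation =
            AffineTransform.getRotateInstance(Math.toRadians(degrees), 20, 20);
        return rotation.transform(new Point2D.Double(x2, y2), null);
    }

    public static void main(String[] args) {
        Document doc = parse(CLOCK);
        setHand(doc, 0, 90);
        Element first = (Element) doc.getElementsByTagName("line").item(0);
        System.out.println(first.getAttribute("transform"));
        // Approximately (20, 38): y grows downward, so +90 degrees points the hand straight down.
        System.out.println(tip(38, 20, 90));
    }
}
```

Both SVG's rotate(a, cx, cy) and getRotateInstance(theta, anchorx, anchory) rotate about a fixed anchor point and turn the positive x axis toward the positive y axis, which is why the two agree here.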

Complete Examples

We now use the Batik SVG Toolkit to present a handful of examples demonstrating capabilities of SVG. They are adapted from Eisenberg J.D., SVG Essentials, O'Reilly, 2002.

  1. Simple Grid. Create a simple grid with assembled boxes of blue and black squares (adapted from ).
  2. ENCE 489C Logo (adapted from ).
  3. Instruments. Create an array of instrument graphics (adapted from Chapter 12, Eisenberg J.D., SVG Essentials, O'Reilly, 2002).

Viewing Support. Currently, a plug-in is needed for a browser to render SVG graphics (i.e., no browser yet fully supports SVG graphics). Netscape- and IE-compatible SVG viewers are already available for both Windows and Macintosh platforms.

The following are sites where SVG viewers can be downloaded for free: Adobe SVG Viewer ( www.adobe.com/svg ); Batik ( xml.apache.org/batik ); IBM SVG Viewer (www.alphaworks.ibm.com/tech/svgview).

SVG Editors. Current SVG editors include Adobe Illustrator 9 and 10, Adobe LiveMotion, Beatwave E-Picture Pro, CorelDRAW, ILOG JViews, Jasc Trajectory Pro, Mayura Draw, QuarkXPress and Sphinx Open.

SVG Conversion Tools. Tools are already available to convert other formats into SVG. One example is the Graphics2D SVG Generator from Apache's Batik project, which allows Java applications that draw with the Graphics2D API to export their graphics in SVG format. These graphics can then be imported, viewed, and modified in different environments.

EXAMPLE 6: READING SCIENTIFIC DATA ...

Try to use something from NASA Goddard XML site .... insert material soon ..... (i.e., April 2002)

EXAMPLE 7: HIERARCHICAL GRAPH MODEL (ADAPTED FROM NATASHA KOSITSYNA)

Create simple hierarchical graph models ...

Graph.dtd.



Graph.xml.




XML and Databases

WHY XML AND DATABASES

Communicating Database Structures Using XML

Exchanging XML Documents in Distributed Applications

XML DESIGN FOR DATA

MAPPING BETWEEN RDBMS AND XML STRUCTURES

MORE COMPLEX RELATIONSHIPS

DATABASE STRUCTURES FOR EXISTING XML


XML and the Semantic Web

LIMITATIONS OF XML

THE SEMANTIC WEB LAYER CAKE


Summary

WHERE CAN I FIND AN XML BROWSER?

Because the XML specification is still relatively new, much of what you see now is experimental. And because the potential number of different XML applications is unlimited, no single browser can be expected to handle 100% of everything.

Some of the generic parts of XML processing (e.g., parsing, tree management, searching, formatting) are being combined into general-purpose libraries or toolkits, making it easier for developers to take a consistent approach when writing XML applications. Such applications can then be customized by adding semantics for specific markets, or by using languages like Java to develop plug-ins for generic browsers and have the specialist modules delivered transparently over the Web.

  1. MSIE 5.5 handles XML but currently still renders it via the HTML model. Microsoft was also the architect of a hybrid (invalid) solution, "data islands", in which fragments of XML could be embedded in HTML files, because current HTML-only browsers simply ignore element markup they do not recognize; this approach has now been superseded by XHTML.

  2. The publicly released Netscape code (Mozilla) and the almost indistinguishable Netscape 6 (there is no version 5) have XML/CSS support based on James Clark's expat XML parser, and this support seems to be more robust, if less slick, than that of MSIE.

  3. Opera now supports XML and CSS on MS-Windows and Linux and is the most complete implementation so far. The browser size is tiny by comparison with the others, but features are good and the speed is excellent, although the earlier slavish insistence on mimicking everything Netscape did, especially the bugs, still shows through in places.

CHOOSING BETWEEN SAX AND DOM

Best uses of SAX.

SAX is the best choice of parser:

Best uses of DOM.

DOM is the best choice of parser:


References and Web Resources

REFERENCES

  1. Ahmed K. et al., Professional Java XML, Wrox Press, 2001.
  2. Akif M. et al., Java XML: Programmer's Reference, Wrox Press, 2001.
  3. Berners-Lee T., Hendler J., Lassila O., The Semantic Web, Scientific American, May 2001.
  4. Berners-Lee T., Business Model for the Semantic Web, October 2001.
  5. Deitel and Deitel, XML: How to Program (2nd Edition), Prentice Hall, 2002. ISBN: 0-13-067396-X. (Scheduled for publication in February 2002.)
  6. Frauenfelder M., A Smarter Web, MIT Technology Review, November 2001.
  7. Williams K. et al., Professional XML Databases, Wrox Press, 2000.

WEB RESOURCES

  1. Scientific American Article: XML and the 2nd-Generation Web
  2. XML, Java, and the Future of the Web
  3. XML at the World Wide Web Consortium (W3C)
  4. XML at SUN Microsystems
  5. XML at Microsoft
  6. XML at IBM
  7. Java and XML Tutorial from SUN Microsystems
  8. Java Technology and XML: FAQ
  9. Peter Flynn's XML FAQ
  10. XML.com: Integration by Parts: XSLT, XML and SVG
  11. XML Tutorial.
  12. XML Schema
  13. XML Stylesheets
  14. What is the Document Object Model (DOM)?
  15. XML Metadata Interchange (XMI)
  16. Astronomical Data Center (ADC) XML Resource Page. This page contains a summary of XML resources and tools currently under development at NASA Goddard Space Flight Center.

XML-BASED STANDARDS AND MARKUP LANGUAGES

  1. DrawML Specification
  2. Mathematical Markup Language (MathML) 1.0 Specification
  3. Vector Markup Language (VML) (has now been superseded by SVG).
  4. Scalable Vector Graphics (SVG) 1.0 Specification
  5. Eisenberg J.D., SVG Essentials, O'Reilly, 2002.
  6. Synchronized Multimedia Integration Language (SMIL) 1.0 Specification
  7. XDF: eXtensible Data Format. XDF is a common scientific data format based on general mathematical principles, object models, and XML that can be used throughout the scientific disciplines. It includes these key features: (a) hierarchical data structures, (b) arrays of any dimension merged with coordinate information, and (c) high-dimensional tables. These elements are merged with field information, variable resolution, easy wrapping of existing data, user-specified coordinate systems, and searchable ASCII metadata. Other space-related XML development links can be found at the CCSDS workshop page.
  8. Markup language for command & control and data interoperability for end-to-end mission operations.
  9. The Earth Science Markup Language.
  10. Cheong M.S., An Introduction to SVG, 30th August 2001. See http://tutorials.freeskills.com/read/id/347.

XML-BASED APPLICATIONS

  1. Batik 1.1 SVG Toolkit. A companion technology is the SVGGraphics2D toolkit. This toolkit: (1) allows applications to export their graphics into SVG format; (2) does not require any modification of the graphics code to export to SVG; and (3) offers the user the ability to use the DOM API to manipulate the generated document.
  2. Ptplot from the Ptolemy project at Berkeley
  3. Applicability of XML to Intelligent Transportation Systems
  4. A technology comparison: Shockwave Flash (SWF) versus Scalable Vector Graphics (SVG).

PRESENTATIONS

  1. Creating a JSVGCanvas.
  2. W3C Presentation on XML Technologies and Applications
  3. Sun Microsystems SVG Developer

KNOWLEDGE MANAGEMENT

  1. An Introduction to Markup Languages and Ontologies.
  2. W3C's Resource Description Framework (RDF).
  3. Resource Description Framework (RDF) for Idiots. An essential read!
  4. An Introduction to RDF. This is an article published by IBM developerWorks.
  5. Thinking XML: Basic XML and RDF techniques for Knowledge Management. This is an article published by IBM developerWorks.
  6. The DARPA Agent Markup Language Homepage.
  7. W3C Semantic Web Advanced Development (SWAD)
  8. Agents on the Semantic Web, IEEE Intelligent Systems, 2001.

Developed in December 2001 by Mark Austin
Copyright © 2001-2002, Mark Austin, University of Maryland