Friday, May 20, 2011

Java API for XML Processing (JAXP)

The Java API for XML Processing (JAXP), part of the Java SE platform, supports the processing of XML documents using Document Object Model (DOM), Simple API for XML (SAX), and Extensible Stylesheet Language Transformations (XSLT). JAXP enables applications to parse, transform, validate and query XML documents using an API that is independent of a particular XML processor implementation.
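As a small illustration of the "query" capability, the sketch below evaluates an XPath expression against an XML string using the javax.xml.xpath package that ships with JAXP. The class name XPathQueryExample and the sample document are made up for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical class name; only the javax.xml.xpath calls are part of JAXP.
class XPathQueryExample {

    // Evaluates an XPath expression against the XML text and returns the
    // string value of the result.
    static String query(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<list><item>a</item><item>b</item></list>";
        System.out.println(query(xml, "/list/item[2]"));
    }
}
```

The code never names a concrete XPath engine; the factory picks whichever implementation is configured.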

JAXP also provides namespace support, which lets you resolve naming conflicts. JAXP allows the user to use any XML-compliant parser or XSL processor from within the application. JAXP provides a pluggability layer to enable vendors to provide their own implementations without introducing dependencies in application code. Using this software, application and tool developers can build fully-functional XML-enabled Java applications for e-commerce, application integration, and web publishing.
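To make the naming-conflict point concrete, here is a minimal sketch in which two elements share the local name "title" but live in different namespaces. The namespace URIs, the sample document, and the class name NamespaceExample are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceExample {
    // Hypothetical namespace URIs, used only for this illustration.
    static final String BOOK_NS = "urn:example:book";
    static final String PERSON_NS = "urn:example:person";

    static final String XML =
        "<order xmlns:b='urn:example:book' xmlns:p='urn:example:person'>"
      + "<b:title>JAXP Basics</b:title>"
      + "<p:title>Dr.</p:title>"
      + "</order>";

    // Returns the text of the first <title> element in the given namespace.
    static String titleIn(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace awareness is off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            NodeList hits = doc.getElementsByTagNameNS(namespaceUri, "title");
            return hits.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleIn(BOOK_NS));
        System.out.println(titleIn(PERSON_NS));
    }
}
```

Without the setNamespaceAware(true) call, the parser treats "b:title" and "p:title" as plain names and the namespace-based lookup does not find them.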

JAXP Parsers

JAXP supports object-based and event-based parsing. For object-based parsing, W3C DOM is supported. For event-based parsing, SAX and StAX are supported.


The JAXP API can be divided into two main parts: a parsing API and a transform API. Implementations that support the transform API are typically XSLT processors, which require an XML parser to read input documents. Because of this, these implementations typically bundle an XML parser as part of their distribution.

The factory APIs let you plug in an XML implementation offered by another vendor without changing your source code. The implementation you get depends on the setting of the javax.xml.parsers.SAXParserFactory, javax.xml.parsers.DocumentBuilderFactory, and javax.xml.transform.TransformerFactory system properties. You can set them with System.setProperty() in the code, in an Ant build script, or with -DpropertyName="..." on the command line. The default values (unless overridden at runtime on the command line or in the code) point to Sun's implementation.
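The following sketch shows the pluggability layer from the application's side: the code asks each factory for an instance and prints which concrete class it got back. The class name FactoryLookup is made up for this example, and the class names printed depend on your JDK and classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

class FactoryLookup {

    // Returns the concrete class names chosen for the three JAXP factories.
    static String[] implementations() {
        return new String[] {
            SAXParserFactory.newInstance().getClass().getName(),
            DocumentBuilderFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        // null here means no override was given and the default lookup applies.
        System.out.println("Override: "
            + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        for (String name : implementations()) {
            System.out.println(name);
        }
    }
}
```

Running it as java -Djavax.xml.parsers.DocumentBuilderFactory=com.example.MyFactory FactoryLookup would request a different DOM implementation without touching the source. com.example.MyFactory is a hypothetical class name and has to exist on the classpath for the lookup to succeed.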

DOM (Document Object Model)

DOM provides a familiar tree structure of objects. You can use the DOM API to manipulate the hierarchy of application objects it encapsulates. The DOM API is ideal for interactive applications because the entire object model is present in memory, where it can be accessed and manipulated by the user. Constructing the DOM requires reading the entire XML structure and holding the object tree in memory, so it is much more CPU- and memory-intensive than SAX. For this reason, the SAX API tends to be preferred for server-side applications and data filters that do not require an in-memory representation of the data.
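A minimal sketch of the DOM style: parse a document into a tree, change the tree in memory, then query it. The class name DomExample, the method name, and the element names are assumptions for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomExample {

    // Parses the XML, appends one <item> to the root element, and returns
    // how many <item> elements the root then contains.
    static int addItemAndCount(String xml, String newItemText) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The whole document is read into an in-memory tree here.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            // The tree can be modified freely once it is in memory.
            Element item = doc.createElement("item");
            item.setTextContent(newItemText);
            root.appendChild(item);

            return root.getElementsByTagName("item").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<list><item>a</item><item>b</item></list>";
        System.out.println(addItemAndCount(xml, "c"));
    }
}
```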

SAX (Simple API for XML)

SAX is an event-driven, serial-access mechanism that does element-by-element processing. At this level, the API reads and writes XML to a data repository or the web. SAX is the API mainly used for server-side and high-performance applications.
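The event-driven style can be sketched as follows: the parser calls back into a handler for each element it meets, and no tree is built. The class name SaxExample and the choice to count elements are assumptions for this example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxExample {

    // Counts how often each element name occurs, in order of first appearance.
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new LinkedHashMap<>();

        // The parser invokes startElement once per opening tag, serially.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum);
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<list><item/><item/></list>"));
    }
}
```

Because the handler only keeps a small map of counters, memory use stays flat regardless of document size, which is the trade-off against DOM described above.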


The XSLT APIs defined in javax.xml.transform let you write XML data to a file or convert it into other forms. You can even use them in conjunction with the SAX APIs to convert legacy data to XML.
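As a rough sketch of a conversion "into other forms", the example below applies a small stylesheet that turns a list of items into plain text. The stylesheet, the sample document, and the class name XsltExample are all made up for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltExample {
    // Illustrative stylesheet: emits each <item> value followed by ';'.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='list/item'>"
      + "<xsl:value-of select='.'/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the output.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<list><item>a</item><item>b</item></list>"));
    }
}
```

To write to a file instead of a string, pass new StreamResult(new java.io.File(...)) with a path of your choice.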


The Java Platform, Standard Edition version 6.0 includes JAXP 1.4. JAXP 1.4 is a maintenance release of JAXP 1.3 with support for the Streaming API for XML (StAX). JAXP defines a pluggability mechanism to dynamically load compliant implementations of SAX and DOM parsers using the javax.xml.parsers and javax.xml.transform APIs. In an analogous manner, the StAX APIs define pluggability mechanisms that allow applications to dynamically load compliant implementations of StAX.

The StAX API exposes methods for iterative, event-based processing of XML documents. XML documents are treated as a filtered series of events, and infoset states can be stored in a procedural fashion. Moreover, unlike SAX, the StAX API is bidirectional, enabling both reading and writing of XML documents.
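The sketch below shows both directions with the javax.xml.stream API: an XMLStreamWriter produces a document, and an XMLStreamReader pulls events back out of it. The class name StaxExample and the element names are assumptions for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxExample {

    // Writes the values as <list><item>...</item>...</list>.
    static String write(List<String> items) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("list");
            for (String item : items) {
                writer.writeStartElement("item");
                writer.writeCharacters(item);
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    // Pulls events from the reader and collects the text of each <item>.
    static List<String> read(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            List<String> items = new ArrayList<>();
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    items.add(reader.getElementText());
                }
            }
            reader.close();
            return items;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = write(List.of("a", "b"));
        System.out.println(xml);
        System.out.println(read(xml));
    }
}
```

Unlike the SAX handler, the loop in read() is driven by the application, so it can stop early or skip ahead whenever it likes.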

JAXP Implementations

Apache has several Java parsers: Crimson, Xerces 1, and Xerces 2. The reason is historical: Apache accepted donations from two different companies. IBM donated XML4J, which became Apache Xerces 1. Sun donated Project X, which became Apache Crimson. Xerces 2 is a third parser, written as a rewrite. Its goals include maintainability, modularity, and the implementation of certain features that neither of the original parsers achieved. Xerces 2 was designed to fill the long-term needs of Apache projects going forward. At the time of writing, the current version of Xerces is 2.11.

The following implementations support the transform component of JAXP and also bundle a parser:

| Name | Parser Implementation | XSLT Processor Implementation | Comment |
|------|-----------------------|-------------------------------|---------|
| Apache Xalan-J | Xerces 2.7 | Xalan XSLT | None |
| JAXP Reference Implementation | Xerces 2.7 | XSLT | |
| J2SE 1.4 | Crimson | Xalan-J XSLT, cvs tag: xalan_2_2_d10 | Uses JAXP RI version later than 1.1.2 |
| J2SE 5.0 | Xerces 2.7 | XSLT | Uses JAXP RI version 1.3 |
| J2SE 6.0 | Xerces 2.7 | Xalan 2.6 | Uses JAXP RI version 1.4 |
