RPGLE XML Parser - XML-SAX and XML-INTO sample code

What is XML and why do we need it today?
Well, the very reason you are reading this post today is probably that you received XML data from a third-party vendor who wants to communicate with you, and you need to parse that data and put it into a relational database, most probably a Db2 table. I have been parsing XML data for a while using the Xerces Java parser from Apache. I took a peek at the RPG parser a few years back, but it didn't suit me because there was too much coding involved. The reason I like RPG over COBOL is the same reason I liked the Java parser over the RPG parser.

But now, with XML-SAX, you don't have to learn a whole set of op-codes to do parsing. I have given actual code here that will help you parse, and I explain how to do it without assuming any prior knowledge of XML parsing. Before I go ahead with the example code for RPGLE, let's see how the inner mechanics of XML-SAX work.

The DOM parser available in RPG through the XML-INTO op-code can handle simple documents, but I find it annoying to define all the data structures it needs. XML-INTO also has its limitations, which are well documented by Scott Klement in his articles. Recently, though, I got a request to do parsing for a project that involved a lot of XML data, and the limited number of people who know both Java and the iSeries back end forced me to take a hard look at the RPG parser again. That is when I came across another document by Scott Klement on why he prefers the SAX parser XML-SAX over the DOM parser XML-INTO.

Why XML-SAX?


While RPG's XML-INTO op-code makes XML parsing easy, it has its limitations. It doesn't work when there are special characters in a tag name or when the name of the XML element is stored in a variable. And if you're still running 5.4, it can't handle character data and attributes in the same element. When you hit these limits, it's XML-SAX to the rescue! In addition, it is cumbersome to define data structures with the same layout and field names as the XML document.
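
To make the data-structure point concrete, here is a minimal free-form sketch of XML-INTO. The document layout and every name in it (order, id, customer, total, xmlDoc) are made up for illustration. Note how the data structure has to repeat the XML layout name for name, which is exactly the part that gets tedious for larger documents.

```rpgle
**free
ctl-opt dftactgrp(*no);

// Hypothetical layout: the data structure name and its subfield
// names must match the XML element names for XML-INTO to map them.
dcl-ds order qualified;
  id       packed(7 : 0);
  customer varchar(30);
  total    packed(9 : 2);
end-ds;

dcl-s xmlDoc varchar(500);

xmlDoc = '<order><id>1001</id><customer>Acme</customer>'
       + '<total>250.75</total></order>';

// case=any lets XML names in any case match the RPG names.
xml-into order %xml(xmlDoc : 'doc=string case=any');

dsply ('Customer: ' + order.customer);

*inlr = *on;
return;
```

Every new element the vendor adds means another subfield here, and a nested or repeating element means a nested or dimensioned data structure.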

Benefits of SAX Parser


  • SAX parsers have certain benefits over DOM-style parsers. The quantity of memory that a SAX parser must use in order to function is typically much smaller than that of a DOM parser. DOM parsers must have the entire tree in memory before any processing can begin, so the amount of memory used by a DOM parser depends entirely on the size of the input data. The memory footprint of a SAX parser, by contrast, is based only on the maximum depth of the XML file (the maximum depth of the XML tree) and the maximum data stored in XML attributes on a single XML element. Both of these are always smaller than the size of the parsed tree itself.
  • Because of the event-driven nature of SAX, processing documents with a SAX parser can often be faster than with DOM-style parsers. Memory allocation takes time, so the larger memory footprint of the DOM is also a performance issue.
  • Due to the nature of DOM, streamed reading from disk is impossible. Processing XML documents larger than main memory is also impossible with DOM parsers but can be done with SAX parsers. However, DOM parsers may make use of disk space as memory to sidestep this limitation.
  • If you have to process one order per XML document, DOM is good enough, but for a whole batch of orders arriving in a single XML document I would rather go with the XML-SAX parser.

SAX (Simple API for XML) is an event-based sequential access parser API developed by the XML-DEV mailing list for XML documents. A parser that implements SAX (i.e., a SAX Parser) functions as a stream parser, with an event-driven API. The user defines a number of callback methods that will be called when events occur during parsing. The SAX events include:


  • XML Text nodes
  • XML Element nodes
  • XML Processing Instructions
  • XML Comments

Events are fired when each of these XML features is encountered, and again when its end is encountered. In the general SAX model, XML attributes are provided as part of the data passed to element events; RPG's XML-SAX instead reports each attribute through its own events (*XML_ATTR_NAME, *XML_ATTR_CHARS and *XML_END_ATTR). SAX parsing is unidirectional; previously parsed data cannot be re-read without starting the parsing operation again.

The syntax of the XML-SAX op-code itself is simple enough. It will look something like this if you're parsing the XML from a variable in your program:

xml-sax %handler( your-handler-here : your-parameter-here )
        %XML( xml-string : 'doc=string' );

Or it will look like this if you're parsing an XML document that's stored in a stream file in the IFS:

xml-sax %handler( your-handler-here : your-parameter-here )
        %XML( ifs-path-name : 'doc=file' );
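
To show how the handler and the parameter fit together, here is a minimal free-form sketch that parses a string and counts what it sees. The names saxHandler, state_t, state and xmlDoc, and the sample document, are my own assumptions for illustration; the handler's parameter list follows the shape the compiler expects for an XML-SAX handler.

```rpgle
**free
ctl-opt dftactgrp(*no);

// Hypothetical communication area: XML-SAX hands it to the
// handler on every event, so it is where parse state is kept.
dcl-ds state_t qualified template;
  elements   int(10);
  attributes int(10);
  textBytes  int(10);
end-ds;

dcl-ds state likeds(state_t) inz;
dcl-s xmlDoc varchar(500);

xmlDoc = '<orders><order id="1"><item>widget</item></order></orders>';

// The parser calls saxHandler once for every event in the document.
xml-sax %handler(saxHandler : state) %xml(xmlDoc : 'doc=string');

dsply ('Elements seen: ' + %char(state.elements));

*inlr = *on;
return;

dcl-proc saxHandler;
  dcl-pi *n int(10);
    commArea    likeds(state_t);
    event       int(10) value;
    string      pointer value;
    stringLen   int(20) value;
    exceptionId int(10) value;
  end-pi;

  select;
    when event = *XML_START_ELEMENT;
      // string points at the element name, stringLen is its length
      commArea.elements += 1;
    when event = *XML_ATTR_NAME;
      commArea.attributes += 1;
    when event = *XML_CHARS;
      // text between tags; it can arrive in more than one piece
      commArea.textBytes += stringLen;
    other;
      // events this sketch does not care about are ignored
  endsl;

  return 0;   // zero tells the parser to keep going
end-proc;
```

To read the document from the IFS instead, only the %XML part changes: pass the path name and 'doc=file', and the handler stays the same.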
