XTRAN Example — Analyze XML Tags & Attributes
Scenario — you want to analyze the use of tags and tag attributes in XML.
XTRAN to the rescue!
XTRAN treats XML as a computer language, in which each tag, line and segment of text, or end tag is a "statement", and each tag attribute is a "statement attribute".
The following example uses an XTRAN rules file comprising 131 non-comment lines of XTRAN's rules language ("meta-code") to analyze all tags and attributes in XML.
The XML mining rules for this example can easily be enhanced to produce DSV output that can be interactively queried using existing XTRAN rules.
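To make the idea of DSV output concrete, here is a minimal Python sketch, not XTRAN meta-code, that writes tag and attribute tallies as delimiter-separated rows. It assumes the tallies are already held in plain dictionaries; the function name `tallies_to_dsv`, the column names, and the `|` delimiter are all hypothetical choices made for illustration, and the format XTRAN's own rules would produce may differ.

```python
import csv
import io


def tallies_to_dsv(tag_counts, attr_counts, delimiter="|"):
    """Render tallies as DSV text: one row per tag, then one row per attribute.

    tag_counts:  {tag name: occurrence count}
    attr_counts: {tag name: {attribute name: occurrence count}}
    All names and the column layout are assumptions for illustration only.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["tag", "attribute", "count"])
    for tag in sorted(tag_counts):
        # A tag row leaves the attribute column empty.
        writer.writerow([tag, "", tag_counts[tag]])
        for attr in sorted(attr_counts.get(tag, {})):
            writer.writerow([tag, attr, attr_counts[tag][attr]])
    return buffer.getvalue()
```

Keeping the tag name on every attribute row makes each row self-describing, so the output can be filtered or sorted without tracking context from earlier rows.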
The following is an English paraphrase of the XTRAN rules used for this example:
For each XML tag occurrence
    Tally tag occurrence
    For each of tag's attributes if any
        Tally attribute occurrence for tag
Sort tags
For each XML tag seen, alphabetically
    Report tag tally
    Sort attributes for tag
    For each attribute seen for this tag, alphabetically
        Report attribute tally
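The paraphrase above can be sketched in Python for readers who want to experiment with the same logic. This is not XTRAN meta-code and does not use XTRAN's parser or XIR; it is a rough approximation that scans for start tags with regular expressions, so it tolerates value-less attributes but does not handle CDATA sections, processing instructions, or attribute values containing ">". The names `tally_tags` and `report`, the column widths, and the choice to key an "empty" tag as its name plus a trailing "/" are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Illustrative patterns only; a real XML parser handles far more cases.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
START_TAG = re.compile(r"<([A-Za-z_:][\w:.\-]*)((?:\s+[^<>]*?)?)\s*(/?)>")
ATTRIBUTE = re.compile(r"([A-Za-z_:][\w:.\-]*)(?:\s*=\s*(?:\"[^\"]*\"|'[^']*'))?")


def tally_tags(xml_text):
    """Tally each start tag and, per tag, each attribute occurrence."""
    tag_counts = defaultdict(int)
    attr_counts = defaultdict(lambda: defaultdict(int))
    # Drop comments first so markup inside them is not counted.
    for match in START_TAG.finditer(COMMENT.sub("", xml_text)):
        name, attr_text, slash = match.groups()
        # Assumption: an "empty" tag is keyed with a trailing "/".
        key = name + "/" if slash else name
        tag_counts[key] += 1
        for attr in ATTRIBUTE.finditer(attr_text):
            attr_counts[key][attr.group(1)] += 1
    return tag_counts, attr_counts


def report(tag_counts, attr_counts):
    """Report tags alphabetically, each followed by its sorted attributes."""
    lines = ["XML Tag and Attribute Usage"]
    for tag in sorted(tag_counts):
        lines.append(f"{tag:<12}{tag_counts[tag]}")
        for attr in sorted(attr_counts.get(tag, {})):
            lines.append(f"    {attr:<12}{attr_counts[tag][attr]}")
    return "\n".join(lines)
```

Calling `print(report(*tally_tags(xml_text)))` on a string of XML prints one line per tag with its attributes indented beneath it.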
How can such powerful and generalized XML analysis be automated in only 131 code lines of XTRAN rules? Because there is so much capability already available as part of XTRAN's rules language. These rules take advantage of the following functionality:
- Text file input and output
- Text manipulation
- Text formatting
- Delimited list manipulation
- Environment variable manipulation
- "Per statement" recursive iterator
- Access to XTRAN's Internal Representation (XIR)
- Navigation in XIR
- Meta-variable pointers
Process Flowchart
Here is a flowchart for this process, in which the elements are color coded:
- BLUE for XTRAN versions (runnable programs)
- ORANGE for XTRAN rules (text files)
- RED for code
- PURPLE for text data files

XML input:
<tag1><tag2/><tag3 att1="att1v" att2="att2v">
This is the first line of text, followed by a comment.<!-- comment 1 -->
This is the second line of text, with no comment.
</tag3>
<!-- comment 2 --> This is the third line of text, preceded by a comment.
<tag3>Some text in a "tag3".</tag3>
<tag4 attb="attbv1" atta/>
This is more nonmarkup text.<tag4 attb="attbv2"/>
</tag1>
Output:
Running the rules paraphrased above on the XML input above generated the following XTRAN analysis output. Note how each "empty" tag is indicated with a trailing "/".
XML Tag and Attribute Usage

tag1        1
tag2/       1
tag3        2
    att1        1
    att2        1
tag4/       2
    atta        1
    attb        2