XSLT is miserable, but isn't this old news? My issue is that most real-world transformations require look-up tables, calls to other systems for derived data, etc. The XSLT extension format is no fun at all. Sadly, I have yet to see a good, general-purpose, framework-y library for sort-of-declarative transformations. I suspect that, because most transformation code exists to achieve interoperability between systems, the problems at hand involve impedance mismatches, which are inherently yucky problems.

Anyone seen any good schemes?



> Anyone seen any good schemes?

I've been meaning to look at HXT; I've read it has something like that, but I haven't needed to transform XML in a long time, so it's fallen by the wayside. On the other hand, TFAA describes himself as a "Haskell programmer" and does not use HXT, so maybe it's not that good.

An alternative I've thought about (but not implemented, on the grounds of having absolutely no need for it these days, as noted above) is implementing what I consider the good part of XSLT (tree transformation via template matching through XPath selectors) in Python on top of lxml. Something akin to Flask, where the app would be a group of templates and the routing would be a sequence of XPath assertions (instead of HTTP paths + methods). Along with a few helper functions or methods (to easily recurse into the rest of the tree), this ought to materialize most of XSLT's strengths in a general-purpose language (making extensibility trivial), and template groups would improve modularity significantly.
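
To make that concrete, here's a minimal sketch of the shape I have in mind, assuming lxml; the XPathApp name and decorator API are invented for illustration, not an existing library:

    from lxml import etree

    class XPathApp:
        def __init__(self):
            self.routes = []  # (compiled XPath, handler) pairs

        def match(self, xpath):
            # Register a template for nodes matched by this XPath.
            def decorator(fn):
                self.routes.append((etree.XPath(xpath), fn))
                return fn
            return decorator

        def apply(self, node):
            # Dispatch to the first matching template, akin to
            # xsl:apply-templates.
            for selector, fn in self.routes:
                if node in selector(node.getroottree()):
                    return fn(node, self.apply)
            # Built-in rule: nothing matched, recurse into children.
            return [self.apply(child) for child in node]

    app = XPathApp()

    @app.match("//person")
    def person(node, recurse):
        return {"name": node.findtext("name")}

    doc = etree.fromstring("<people><person><name>Ada</name></person></people>")
    print(app.apply(doc))  # [{'name': 'Ada'}]

Each handler receives a recurse callback (the analogue of those helper functions), so a template can descend into the rest of the tree on its own terms.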


If you ever want to push that forward, hit my email (in profile) and I'll try to help.

But I have to wonder if Python is the right tool for the problem. I get the sense that XSLT transform engines are deployed to handle really big documents, and I wonder if a Python-based tool could compete on speed with Xalan or Saxon.


Most of my work has been in Java, so I can't speak authoritatively outside of its ecosystem. When transforming XML, I've found I almost always need to write tree-walking/visitor-ish code and end up using a tool like XMLBeans to provide a more literate interface to the source and target documents. When I say "literate", I mean that instead of writing sourceDocument.getElement("foo").getElement("bar"), I can just say sourceDocument.getFoo().getBar(). XML schemas are certainly helpful in that they enable tools like XMLBeans to generate language-friendly abstractions. Often, I've written my own schemas for sources or targets that only had implicitly specified schemas (via documentation, examples, or simply observation of actual messages).
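
For what it's worth, Python gets a rough analogue of that "literate" style from lxml.objectify, which maps child elements onto attribute access without the schema-driven code generation XMLBeans does (a different technique, but the same readability win):

    from lxml import objectify

    doc = objectify.fromstring(
        "<policy><insured><name>Ada</name></insured>"
        "<deductible>750</deductible></policy>")

    # Attribute access instead of find("insured").find("name") chains.
    print(doc.insured.name)     # Ada
    print(int(doc.deductible))  # 750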

The reason I prefer to work in non-declarative code land is that I usually must inject many service references into the translators. When converting an industry standard XML format into a company's internal domain model for quoting insurance policies, I had to employ a set of heuristics to create a valid policy from a set of coverage requirements which likely were ill-specified. For example, we didn't offer a $750 auto deductible. Should this be converted (with a note attached) to $500 or $1000? This decision varied by state, policy type, etc. We had a metamodel which I injected into the transformers at the points where such decisions were made.
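
A stripped-down sketch of that injection pattern (every name here is hypothetical, and the real decision table lived in our metamodel rather than a dict):

    from dataclasses import dataclass

    @dataclass
    class DeductibleRules:
        table: dict  # state -> offered deductibles; stands in for the metamodel

        def nearest_offered(self, state, requested):
            offered = self.table.get(state, [500, 1000])
            # Ties break toward the first (lower) deductible in the list.
            return min(offered, key=lambda d: abs(d - requested))

    class PolicyTransformer:
        def __init__(self, rules):
            self.rules = rules  # injected service reference

        def transform(self, source):
            requested = source["deductible"]
            offered = self.rules.nearest_offered(source["state"], requested)
            notes = ([] if offered == requested
                     else ["deductible adjusted from %s to %s" % (requested, offered)])
            return {"deductible": offered, "notes": notes}

    rules = DeductibleRules({"CA": [500, 1000], "TX": [250, 500, 1000]})
    print(PolicyTransformer(rules).transform({"state": "CA", "deductible": 750}))
    # {'deductible': 500, 'notes': ['deductible adjusted from 750 to 500']}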

That the source and/or target of a transformation are XML is a red herring, though. Most of my time is spent not on XML-ness itself but on solving fundamental impedance-mismatch issues when converting between two different domain models, sets of assumptions, etc. Document formats don't matter for these problems, although formats with better surrounding toolsets certainly let one concentrate immediately on the hardest part of the problem. I even prefer talking about these problems using terms like "model conversion" instead of "document translation" -- too many marketing folks have convinced IT managers that, through the magic of their overpriced ETL tools, "document conversion" problems are a trivial drag & drop matter.

One way I've thought of architecting these model transformations is through the invention of a few intermediate model definitions, each one becoming less source-like and more target-like. I think some stages of conversion are more compatible with declarative approaches. Perhaps attempting conversion in only a single pass has led me to throw the baby out with the bathwater w.r.t. declarative schemes?
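
In code, the staged version might look something like this (stage names and model shapes are invented for illustration):

    def normalize_source(doc):
        # Stage 1: flatten source quirks into a neutral shape.
        return {"coverages": doc.get("covs", [])}

    def resolve_business_rules(model):
        # Stage 2: apply heuristics/lookups -- the stubbornly
        # non-declarative part.
        model["coverages"] = [c.upper() for c in model["coverages"]]
        return model

    def emit_target(model):
        # Stage 3: reshape into the target domain model.
        return {"policy": {"coverage_codes": model["coverages"]}}

    PIPELINE = [normalize_source, resolve_business_rules, emit_target]

    def convert(doc):
        for stage in PIPELINE:
            doc = stage(doc)
        return doc

    print(convert({"covs": ["bi", "pd"]}))
    # {'policy': {'coverage_codes': ['BI', 'PD']}}

The passes at either end (1 and 3) could plausibly be declarative templates; stage 2 is where the injected services would live.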

For those of you who don't do corporate IT development, the sad reality is that a huge percentage of development effort is spent on data conversion/translation between systems. The ratio of glue:substance is highly skewed toward glue. Furthermore, the ratio gets worse as short-term benefits are prioritized over long-term ones, development is siloed between business groups, and data modeling takes a backseat to gettin' stuff "done".


You could call it "XSLT: The Good Parts".


Pretty much.



