If that is the worst XML that you have experienced then you have been very very lucky. I've seen stuff that:
- Has been built by string manipulation and therefore isn't well-formed, and needs hacky preprocessing before it can be parsed (no, not HTML)
- Is full of redundant information (e.g. count attributes giving the number of child elements)
- Makes evil use of vast numbers of namespaces where the element names are all the same
- Is basically a container for delimited or fixed-format data
- Has attributes that contain entire encoded XML documents
<sob>
There are probably some other horrors that therapy and/or alcohol have let me forget (like systems running SQL queries that do string compares on lumps of XML).
I've seen CDATA elements that contained complete XML documents including other CDATA elements. Good thing they had a hand rolled non-standard XML parser that allowed nested CDATA tags.
I noticed it when I wanted to read the doc in with tinyXML while working 70 hour weeks to fix up some other issue before a deadline. I ended up doing a memset(nestedXMLStart, ' ', nestedXMLLength) as a preprocessing step to bulldoze the whole construct. Not pretty, but it worked.
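For the curious, the bulldozing looks roughly like this (a Python sketch of what was a one-line memset in C; the function name and offsets here are made up, and you're assumed to have already located the nested span):

    # Rough Python equivalent of the memset hack: blank out the nested
    # document-inside-CDATA span with spaces before handing the buffer to a
    # real XML parser, so byte offsets elsewhere stay unchanged.
    def bulldoze_span(xml_bytes: bytes, start: int, length: int) -> bytes:
        buf = bytearray(xml_bytes)
        buf[start:start + length] = b" " * length
        return bytes(buf)

    # cleaned = bulldoze_span(raw_doc, nested_xml_start, nested_xml_length)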
... and so on. I can't think of any advantage to (or excuse for) doing it this way. The only possible reason for it that I can think of, is rather comprehensive ignorance of how XML and related standards work.
The Apache module that provides the gateway to Ecometry? That thing was a bear. I can't believe they did all the processing in the Apache module rather than with a CGI.
It didn't actually use regular expressions, but parsed the input with a lot of char pointer manipulation.
The reason it worked like this was to have the "XML" tags map directly to the terminal input screens run on the backend, because it essentially entered data as if a human were typing it into the terminal and navigating the forms.
I keep meaning to write this up as the worst example of XML abuse I've ever seen; I'm actually surprised someone else here has used the same thing (or even more surprised that more than one party has implemented the same braindead thing).
I'm a little embarrassed to admit this, but when I first read your post, I honestly thought that you were giving an example of how simple XML can be! After all, I can quickly glance at this bit of XML and understand what it is intended to represent. This doesn't look like SOAP-style XML to me at all; it looks like simple XML over HTTP.
It gets so much worse than this. Much of the generated XML I've seen (especially the stuff used for SOAP) is nearly unreadable by humans.
Oh, it wasn't meant as a criticism. I think my original misperception just strengthens your point. XML written as simply as possible is still more complicated than JSON.
That said, I feel about XML much the same way I feel about Java. They are typically more complicated and verbose than the alternatives, but I don't think you can pin the horrendous level of complexity on the languages themselves. There seems to be a stronger cultural disdain for complexity in the communities around other frameworks and languages, and JSON reflects this.
Everyone is (in theory) opposed to "unnecessary" complexity. The real issue is where you draw the line. For some reason, the python/django and ruby/rails communities seem more inclined to say "no" to complexity, even if it means giving something up. To turn it around, you have to get a hell of a lot of benefit to convince rails/django people to accept the overhead of additional complexity. I'm not sure why - maybe it's because these frameworks were created by people who were aghast at what they saw developing in the Java world?
This seems to be the case with JSON vs XML as well. XML isn't going to be as simple as JSON, but there's no reason you can't describe data in XML in a way that is concise, reasonably simple, and easy to read (both for people and computers).
> but there's no reason you can't describe data in XML in a way that is concise, reasonably simple, and easy to read (both for people and computers)
Yes, there is. Each item in XML requires at least 5 + 2*len(tag) extra characters: <name>asdf</name>. Unless you make everything an attribute (e.g. <x name="asdf"/>), in which case you only need 5 extra characters (<x />) per parent item. JSON is minimal: you need no extra characters (or, arguably, 2 extra characters for strings with no spaces).
That's for people. It's even worse for machines, because they have to be able to parse <name>asdf</name> and <person name="asdf"/> and <person name="asdf">...</person> and then be able to verify it using a schema external to the XML in question.
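To make the point concrete, here's the kind of special-casing a consumer ends up writing, sketched with nothing but Python's standard library (the element and attribute names are invented for illustration):

    import json
    import xml.etree.ElementTree as ET

    # Three XML spellings of the same fact; a consumer has to cope with all of them.
    docs = [
        '<person><name>asdf</name></person>',
        '<person name="asdf"/>',
        '<person name="asdf"><note>extra</note></person>',
    ]

    def person_name(xml_text):
        root = ET.fromstring(xml_text)
        # Try the attribute form first, then fall back to the child-element form.
        return root.attrib.get("name") or root.findtext("name")

    print([person_name(d) for d in docs])           # ['asdf', 'asdf', 'asdf']

    # The JSON equivalent has exactly one obvious shape.
    print(json.loads('{"name": "asdf"}')["name"])   # asdf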
What I don't understand is why people ever thought XML was a good idea. I took one look at it back in '98 when it came out (or whenever it was), said "this isn't human readable (unless maybe you like writing your HTML by hand)", and proceeded to successfully ignore it except when I have to use it.
Heh, I'm embarrassed to admit that the protocol-buffer system I'm using at work looks kind of like that. Our most common message types contain a repeated field, each entry holding a name/value pair, instead of just containing a list of non-required fields.
I'm embarrassed because I was the lead designer of said system. In my defense, protobufs had just been open-sourced, and it was my first time using them. I thought that name/value pairs would give me greater flexibility. We're using a dynamic language, and I thought re-compiling the .proto files every time I added a new item would be a pain. It turns out that there's still a file somewhere in the system listing all the valid names, so we could have just as easily put the list of valid names into a .proto file and avoided sending the names over the wire with every message, saving them in every log record, etc. Our build system is sufficiently well automated that recompiling the .protos is painless and almost unnoticeable.
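To make the difference concrete, the two designs look roughly like this at the data level (schematic Python with invented field names, not our actual messages or .proto definitions):

    # What we built: a generic bag of name/value pairs, so the field names ride
    # along in every message and every log record.
    name_value_style = {
        "fields": [
            {"name": "user_id", "value": "42"},
            {"name": "status",  "value": "active"},
        ]
    }

    # What plain optional, typed fields would have given us: the names live in
    # the schema (the .proto file), not on the wire.
    typed_field_style = {
        "user_id": 42,
        "status": "active",
    }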
Oh, well, maybe someday I'll get around to redoing our protocol. Fortunately, it's purely internal, and we don't yet store long-lived data in that format, so there's still hope.
My favorite part[2] is how <params> may only contain <param> elements, and each <param> has exactly one child. Why is it not just <params><i4>41</i4></params>? It's no surprise that the creator of XML-RPC was involved with SOAP.
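You can see the nesting straight from Python's standard library (xmlrpc.client is stdlib; the method name here is made up):

    import xmlrpc.client

    # Serialise a single integer argument the XML-RPC way.
    print(xmlrpc.client.dumps((41,), methodname="example.add"))
    # The value comes out wrapped as
    # <params><param><value><int>41</int></value></param></params>,
    # one <param> and one <value> per argument, rather than the flatter
    # <params><i4>41</i4></params> suggested above.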
Are you really having to deal with the SOAP protocol yourself? I mean, there are libraries that handle all this for you. Just read in the WSDL, and call the methods as you need them. No need to muck around with XML.
Of course there are libraries that do all of this stuff - and they work nearly all the time. However, when they stop working you end up having to look at what is actually going over the wire (or the WSDL if you are really unlucky) to try and work out what on earth is going on.
98% of the time SOAP works without too many problems - the trouble is that those other 2% of cases can be truly awful to debug because you have then got to wade through all of the crap that is in there to support the "easy to use" functionality.
Afraid not. The WSDL is so baroque, so deeply nested with objects inside objects inside objects, that no SOAP client is capable of parsing it.
To write requests, I've had to resort to building templates manually. To read responses, I've had to resort to walking the DOM tree to hunt-and-peck for the fields I need.
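The response-walking ends up looking something like this (a Python sketch, not the real client; the field name is invented):

    import xml.etree.ElementTree as ET

    def find_field(soap_response, local_name):
        """Walk the whole tree and return the text of the first element whose
        local name matches, ignoring whichever namespace the server used."""
        root = ET.fromstring(soap_response)
        for el in root.iter():
            if el.tag.rsplit("}", 1)[-1] == local_name:
                return el.text
        return None

    # e.g. find_field(response_xml, "AccountNumber")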
Would a JSON implementation be better then? I mean, it sounds like the design is just painful, and regardless of the implementation it would be horrible.
Or do you think JSON encourages simplicity enough to overcome these issues? That the person who created this interface would have created something cleaner?
I guess what I'm asking is, is it the API that sucks, or the implementation (or both)?
* I've never had issues with XML-RPC or SOAP implementations. I prefer JSON because I can use it from JavaScript easily. But having consumed SOAP and XML-RPC APIs (mostly with banks), I've never had problems that I'd blame on the implementation.
I can't fairly evaluate that without understanding why it was done that way. It looks to me like something that might have been dumped from a database table named "person" with two columns, one called "name" and the other called "value."
It might be a lazy way to dump the database but that's not XML's fault.
The difference is that in JSON there's pretty much only one way to do it, the way you have there. With XML there's many ways to do it (see examples on this thread). So when using XML, you need to write code adapted to the way it's 'encoded'.
I think it is the other way around. XML is explicitly designed to be writable and editable by hand, which is also the reason for some of its syntactic redundancy.
You need to understand the initial use case for XML. It was invented for document-oriented markup languages like HTML, MathML, DocBook, etc. You can definitely write XHTML by hand, and a JSON-based syntax for the same kind of documents (with mixed content and so on) would be a lot harder to read and write.
My understanding is that XML was derived from document-oriented SGML, to beat SGML into a form that would work well with XSL and XPath.
But I'd like to point out that the way SGML-derived markup distinguishes attributes and child nodes is entirely arbitrary. You could as easily make attributes child nodes - it's all in how you interpret what's written. Likewise, you can "convert" SGML to JSON (or YAML or S-expr or whatever) very easily, bearing in mind that attributes and child nodes sit in the same space with each other - a well-formed XHTML document, for example, can be re-expressed in JSON without ambiguity, since tags have a well-specified, unambiguous list of allowed children and attributes - just give text nodes the name "text" and you're golden.
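A minimal sketch of that re-expression, using only Python's standard library (the "text" key is the convention suggested above; the dict layout is just one reasonable choice):

    import json
    import xml.etree.ElementTree as ET

    def to_jsonable(el):
        # Attributes and child nodes are folded into the same space; text and
        # tail content become children keyed "text", preserving document order.
        node = {"tag": el.tag}
        node.update(el.attrib)
        children = []
        if el.text and el.text.strip():
            children.append({"text": el.text.strip()})
        for child in el:
            children.append(to_jsonable(child))
            if child.tail and child.tail.strip():
                children.append({"text": child.tail.strip()})
        if children:
            node["children"] = children
        return node

    doc = ET.fromstring('<p class="intro">Hello <em>world</em>!</p>')
    print(json.dumps(to_jsonable(doc), indent=2))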
Except that bad programmers in other domains will typically shrug and say "so what" when you point out the lack of code quality, while enterprise programmers will try to convince you that their twisted ways are actually better. IME, of course.