If that is the worst XML that you have experienced then you have been very very lucky. I've seen stuff that:
- Has been built by string manipulation and therefore isn't well-formed, and needs hacky preprocessing before it can be parsed (no, not HTML)
- Is full of redundant information (e.g. count attributes giving the number of child elements)
- Makes evil use of vast numbers of namespaces where the element names are all the same
- Is basically a container for delimited or fixed-format data
- Has attributes that contain entire encoded XML documents
<sob>
There are probably some other horrors that therapy and/or alcohol have let me forget (like systems running SQL queries that do string compares on lumps of XML).
I've seen CDATA elements that contained complete XML documents including other CDATA elements. Good thing they had a hand rolled non-standard XML parser that allowed nested CDATA tags.
I noticed it when I wanted to read the doc in with tinyXML while working 70 hour weeks to fix up some other issue before a deadline. I ended up doing a memset(nestedXMLStart, ' ', nestedXMLLength) as a preprocessing step to bulldoze the whole construct. Not pretty, but it worked.
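For the curious, the bulldozing looks roughly like this (a Python sketch of what was a one-line memset in C; the function name and offsets here are made up, and you're assumed to have already located the nested span):

    # Rough Python equivalent of the memset hack: blank out the nested
    # document-inside-CDATA span with spaces before handing the buffer to a
    # real XML parser, so byte offsets elsewhere stay unchanged.
    def bulldoze_span(xml_bytes: bytes, start: int, length: int) -> bytes:
        buf = bytearray(xml_bytes)
        buf[start:start + length] = b" " * length
        return bytes(buf)

    # cleaned = bulldoze_span(raw_doc, nested_xml_start, nested_xml_length)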
... and so on. I can't think of any advantage to (or excuse for) doing it this way. The only possible reason for it that I can think of, is rather comprehensive ignorance of how XML and related standards work.
The Apache module that provides the gateway to Ecometry? That thing was a bear. I can't believe they did all the processing in the Apache module rather than with a CGI.
It didn't actually use regular expressions, but parsed the input with a lot of char pointer manipulation.
The reason it worked like this was to have the "XML" tags map directly to the terminal input screens run on the backend, because it essentially entered data as if a human were typing it into the terminal and navigating the forms.
I keep meaning to write this up as the worst example of XML abuse I've ever seen; I'm actually surprised someone else here has used the same thing (or even more surprised that more than one party has implemented the same braindead thing).
I'm a little embarrassed to admit this, but when I first read your post, I honestly thought that you were giving an example of how simple XML can be! After all, I can quickly glance at this bit of XML and understand what it is intended to represent. This doesn't look like SOAP-style XML to me at all; it looks like simple XML over HTTP.
It gets so much worse than this. Much of the generated XML I've seen (especially the stuff used for SOAP) is nearly unreadable by humans.
Oh, it wasn't meant as a criticism. I think my original misperception just strengthens your point. XML written as simply as possible is still more complicated than JSON.
That said, I feel about XML much the same way I feel about Java. They are typically more complicated and verbose than the alternatives, but I don't think you can pin the horrendous level of complexity on the languages themselves. There seems to be a stronger cultural disdain for complexity in the communities around other frameworks and languages, and JSON reflects this.
Everyone is (in theory) opposed to "unnecessary" complexity. The real issue is where you draw the line. For some reason, the python/django and ruby/rails communities seem more inclined to say "no" to complexity, even if it means giving something up. To turn it around, you have to get a hell of a lot of benefit to convince rails/django people to accept the overhead of additional complexity. I'm not sure why - maybe it's because these frameworks were created by people who were aghast at what they saw developing in the Java world?
This seems to be the case with JSON vs XML as well. XML isn't going to be as simple as JSON, but there's no reason you can't describe data in XML in a way that is concise, reasonably simple, and easy to read (both for people and computers).
> but there's no reason you can't describe data in XML in a way that is concise, reasonably simple, and easy to read (both for people and computers)
Yes, there is. Each item in XML requires at least 5 + 2*len(tag) extra characters: <name>asdf</name>. Unless you make everything an attribute (e.g. <x name="asdf"/>), in which case you only need 5 extra characters (<x />) per parent item. JSON is minimal: you need no extra characters (or, arguably, 2 extra characters for strings with no spaces).
That's for people. It's even worse for machines, because they have to be able to parse <name>asdf</name> and <person name="asdf"/> and <person name="asdf">...</person> and then be able to verify it using a schema external to the XML in question.
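To make the point concrete, here's the kind of special-casing a consumer ends up writing, sketched with nothing but Python's standard library (the element and attribute names are invented for illustration):

    import json
    import xml.etree.ElementTree as ET

    # Three XML spellings of the same fact; a consumer has to cope with all of them.
    docs = [
        '<person><name>asdf</name></person>',
        '<person name="asdf"/>',
        '<person name="asdf"><note>extra</note></person>',
    ]

    def person_name(xml_text):
        root = ET.fromstring(xml_text)
        # Try the attribute form first, then fall back to the child-element form.
        return root.attrib.get("name") or root.findtext("name")

    print([person_name(d) for d in docs])           # ['asdf', 'asdf', 'asdf']

    # The JSON equivalent has exactly one obvious shape.
    print(json.loads('{"name": "asdf"}')["name"])   # asdf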
What I don't understand is why people ever thought XML was a good idea. I took one look at it back in '98 when it came out (or whenever it was), said "this isn't human readable (unless maybe you like writing your HTML by hand)", and proceeded to successfully ignore it except when I have to use it.
Heh, I'm embarrassed to admit that the protocol-buffer system I'm using at work looks kind of like that. Our most common message types contain a repeated field, each entry holding a name/value pair, instead of just containing a list of non-required fields.
I'm embarrassed because I was the lead designer of said system. In my defense, protobufs had just been open-sourced, and it was my first time using them. I thought that name/value pairs would give me greater flexibility. We're using a dynamic language, and I thought re-compiling the .proto files every time I added a new item would be a pain. It turns out that there's still a file somewhere in the system listing all the valid names, so we could have just as easily put the list of valid names into a .proto file and avoided sending the names over the wire with every message, saving them in every log record, etc. Our build system is sufficiently well automated that recompiling the .protos is painless and almost unnoticeable.
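To make the difference concrete, the two designs look roughly like this at the data level (schematic Python with invented field names, not our actual messages or .proto definitions):

    # What we built: a generic bag of name/value pairs, so the field names ride
    # along in every message and every log record.
    name_value_style = {
        "fields": [
            {"name": "user_id", "value": "42"},
            {"name": "status",  "value": "active"},
        ]
    }

    # What plain optional, typed fields would have given us: the names live in
    # the schema (the .proto file), not on the wire.
    typed_field_style = {
        "user_id": 42,
        "status": "active",
    }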
Oh, well, maybe someday I'll get around to redoing our protocol. Fortunately, it's purely internal, and we don't yet store long-lived data in that format, so there's still hope.
My favorite part[2] is how <params> may only contain <param> elements, and each <param> has exactly one child. Why is it not just <params><i4>41</i4></params>? It's no surprise that the creator of XML-RPC was involved with SOAP.
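You can see the nesting straight from Python's standard library (xmlrpc.client is stdlib; the method name here is made up):

    import xmlrpc.client

    # Serialise a single integer argument the XML-RPC way.
    print(xmlrpc.client.dumps((41,), methodname="example.add"))
    # The value comes out wrapped as
    # <params><param><value><int>41</int></value></param></params>,
    # one <param> and one <value> per argument, rather than the flatter
    # <params><i4>41</i4></params> suggested above.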
Are you really having to deal with the SOAP protocol yourself? I mean, there are libraries that handle all this for you. Just read in the WSDL, and call the methods as you need them. No need to muck around with XML.
Of course there are libraries that do all of this stuff - and they work nearly all the time. However, when they stop working you end up having to look at what is actually going over the wire (or the WSDL if you are really unlucky) to try and work out what on earth is going on.
98% of the time SOAP works without too many problems - the trouble is that those other 2% of cases can be truly awful to debug because you have then got to wade through all of the crap that is in there to support the "easy to use" functionality.
Afraid not. The WSDL is so baroque, so deeply nested with objects inside objects inside objects, that no SOAP client is capable of parsing it.
To write requests, I've had to resort to building templates manually. To read responses, I've had to resort to walking the DOM tree to hunt-and-peck for the fields I need.
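The response-walking ends up looking something like this (a Python sketch, not the real client; the field name is invented):

    import xml.etree.ElementTree as ET

    def find_field(soap_response, local_name):
        """Walk the whole tree and return the text of the first element whose
        local name matches, ignoring whichever namespace the server used."""
        root = ET.fromstring(soap_response)
        for el in root.iter():
            if el.tag.rsplit("}", 1)[-1] == local_name:
                return el.text
        return None

    # e.g. find_field(response_xml, "AccountNumber")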
Would a JSON implementation be better then? I mean, it sounds like the design is just painful, and regardless of the implementation it would be horrible.
Or do you think JSON encourages simplicity enough to overcome these issues? That the person who created this interface would have created something cleaner?
I guess what I'm asking is, is it the API that sucks, or the implementation (or both)?
* I've never had issues with XML-RPC or SOAP implementations. I prefer JSON because I can use it from JavaScript easily. But having consumed SOAP and XML-RPC APIs (mostly with banks), I've never had problems that I'd blame on the implementation.
I can't fairly evaluate that without understanding why it was done that way. It looks to me like something that might have been dumped from a database table named "person" with two columns, one called "name" and the other called "value."
It might be a lazy way to dump the database but that's not XML's fault.
The difference is that in JSON there's pretty much only one way to do it, the way you have there. With XML there's many ways to do it (see examples on this thread). So when using XML, you need to write code adapted to the way it's 'encoded'.
I think it is the other way around. XML is explicitly designed to be writable and editable by hand, which is also the reason for some of its syntactic redundancy.
You need to understand the initial use case for XML. It was invented for document-oriented markup languages like HTML, MathML, DocBook, etc. You can definitely write XHTML by hand, and a JSON-based syntax for the same kind of documents (with mixed content and so on) would be a lot harder to read and write.
My understanding is that XML was derived from document-oriented SGML, to beat SGML into a form that would work well with XSL and XPath.
But I'd like to point out that the way SGML-derived markup distinguishes attributes and child nodes is entirely arbitrary. You could as easily make attributes child nodes - it's all in how you interpret what's written. Likewise, you can "convert" SGML to JSON (or YAML or S-expr or whatever) very easily, bearing in mind that attributes and child nodes sit in the same space with each other - a well-formed XHTML document, for example, can be re-expressed in JSON without ambiguity, since tags have a well-specified, unambiguous list of allowed children and attributes - just give text nodes the name "text" and you're golden.
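A minimal sketch of that re-expression, using only Python's standard library (the "text" key is the convention suggested above; the dict layout is just one reasonable choice):

    import json
    import xml.etree.ElementTree as ET

    def to_jsonable(el):
        # Attributes and child nodes are folded into the same space; text and
        # tail content become children keyed "text", preserving document order.
        node = {"tag": el.tag}
        node.update(el.attrib)
        children = []
        if el.text and el.text.strip():
            children.append({"text": el.text.strip()})
        for child in el:
            children.append(to_jsonable(child))
            if child.tail and child.tail.strip():
                children.append({"text": child.tail.strip()})
        if children:
            node["children"] = children
        return node

    doc = ET.fromstring('<p class="intro">Hello <em>world</em>!</p>')
    print(json.dumps(to_jsonable(doc), indent=2))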
Except that bad programmers in other domains will typically shrug and say "so what" when you point out the lack of code quality, while enterprise programmers will try to convince you that their twisted ways are actually better. IME, of course.