XML is neat. I like it very much, and get to use it quite a bit at work. I especially like JAXB, Java’s XML binding system, which comes with a very useful tool: XJC, the XML to Java compiler. It takes an XSD (a W3C XML Schema file describing the structure and data types of an XML document format) and turns it into annotated Java classes. JAXB’s unmarshalling system can then use these classes to turn any XML file matching that document format into a set of instances of these classes.
With some utility classes to wrap things up neatly, the code to read in, say, an XML configuration file, is as easy as this:
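import java.io.IOException;
import java.net.Socket;

public static void main(String[] args) throws IOException {
    // Unmarshaller is one of the utility classes mentioned above, not JAXB's own javax.xml.bind.Unmarshaller
    Configuration config = new Unmarshaller<Configuration>("config.xsd", Configuration.class).unmarshall();
    // Open a connection to the host and port read from the configuration file
    Socket socket = new Socket(config.getHostname(), config.getPort());
    for (Configuration.User user : config.getUser()) {
        // Do something with the user object....
    }
}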
Anyway, Java was not what I was going to write about. It’s just that, coding Java and relying on XSD files, I come into contact with quite a few schemas written by others. Whether it’s people wanting help on IRC channels I frequent, or a partner system at work that my code has to interface with, I have to read the schema and try to understand it. To get to my point, let me show you two examples.
First example
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Second example
<?xml version="1.0"?>
<schema xmlns             ="http://www.w3.org/2001/XMLSchema"
        targetNamespace   ="http://www.w3schools.com"
        elementFormDefault="qualified">
  <element name="note">
    <complexType>
      <sequence>
        <element name="to" type="string"/>
        <element name="from" type="string"/>
        <element name="heading" type="string"/>
        <element name="body" type="string"/>
      </sequence>
    </complexType>
  </element>
</schema>
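The first example is from the W3Schools schema how-to; the second is my improved version. Now, besides the better indentation, what was my main improvement? I removed the prefix for the XML Schema namespace by declaring it as the default namespace instead. (This works here because the schema never refers to types or elements of its own; one that did would need a prefix for its target namespace instead.) The prefixed style is something I see all over the place. Guides and tutorials all over the net use it, and newbies mimic it.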
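To check that the two spellings really describe the same document format, here is a minimal sketch using the javax.xml.validation API that ships with the JDK. The class name SchemaEquivalenceCheck, its helper methods and the sample note documents are made up for this illustration; the two schema strings are compact, single-line renderings of the two examples above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, written only for this illustration.
public class SchemaEquivalenceCheck {

    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // First example: the XML Schema namespace is bound to the xs: prefix.
    public static final String PREFIXED =
            schemaText("xmlns:xs='" + XSD_NS + "' xmlns='http://www.w3schools.com'", "xs:");

    // Second example: the XML Schema namespace is the default namespace.
    public static final String UNPREFIXED =
            schemaText("xmlns='" + XSD_NS + "'", "");

    // Builds the note schema, putting prefix p in front of every schema tag and type name.
    static String schemaText(String namespaceDeclarations, String p) {
        StringBuilder fields = new StringBuilder();
        for (String name : new String[] {"to", "from", "heading", "body"}) {
            fields.append("<" + p + "element name='" + name + "' type='" + p + "string'/>");
        }
        return "<" + p + "schema " + namespaceDeclarations
                + " targetNamespace='http://www.w3schools.com' elementFormDefault='qualified'>"
                + "<" + p + "element name='note'><" + p + "complexType><" + p + "sequence>"
                + fields
                + "</" + p + "sequence></" + p + "complexType></" + p + "element>"
                + "</" + p + "schema>";
    }

    // Returns true if the document validates against the schema, false if it does not.
    public static boolean isValid(String schemaText, String documentText) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(documentText)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the document does not match the schema
            }
        } catch (SAXException | IOException e) {
            // The schema itself could not be compiled or read: a bug in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String note = "<note xmlns='http://www.w3schools.com'>"
                + "<to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Hi!</body></note>";
        String broken = "<note xmlns='http://www.w3schools.com'><to>Tove</to></note>";

        System.out.println("prefixed schema, good note:     " + isValid(PREFIXED, note));
        System.out.println("unprefixed schema, good note:   " + isValid(UNPREFIXED, note));
        System.out.println("prefixed schema, broken note:   " + isValid(PREFIXED, broken));
        System.out.println("unprefixed schema, broken note: " + isValid(UNPREFIXED, broken));
    }
}
```

Both schemas should accept the complete note and reject the one with missing elements, since a validator only cares about the namespace each tag ends up in, not about how that namespace was written down.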
XML has some attributes that start with xmlns; these have a special meaning. xmlns:something="http://something.else/" defines a namespace prefix, “something”, that is an alias for the namespace “http://something.else/”. This is one way to specify which namespace a tag belongs to. When you later write <something:a/> (on the element where you defined the prefix, or on any of its descendants), you declare that the a tag belongs to the “http://something.else/” namespace. But it is only one way to specify namespaces. Typing prefixes everywhere gets tiresome, and it makes the text harder to read. Being easy for a human to read is one of the key goals of XML, and one of its great benefits.
So there’s an alternative method. You can just say xmlns="http://something.else/", and that tag and every unprefixed tag within it belongs to the namespace “http://something.else/”! Of course, if you need to mix namespaces, you typically use prefixes. But you can use xmlns for the most prevalent namespace and prefixes for the exceptions, or, if the elements of one namespace are nested wholly within those of another, just use the xmlns attribute twice.
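A small sketch makes this concrete. It parses a prefixed and an unprefixed spelling of the same tags with the JDK’s namespace-aware DOM parser and prints each tag as {namespace}localName. The class NamespaceDemo, its helper methods and the second namespace “http://other.example/” are all invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, written only for this illustration.
public class NamespaceDemo {

    // Parses the XML with namespace awareness switched on and returns the root element.
    public static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // DOM factories are not namespace aware by default
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the first child; the samples below contain no whitespace, so it is an element.
    public static Element firstChild(Element parent) {
        return (Element) parent.getFirstChild();
    }

    // Describes a tag as {namespace}localName, ignoring whatever prefix was typed.
    public static String describe(Element element) {
        return "{" + element.getNamespaceURI() + "}" + element.getLocalName();
    }

    public static void main(String[] args) {
        // The same two tags, once with a prefix and once with a default namespace.
        String prefixed = "<something:a xmlns:something='http://something.else/'>"
                + "<something:b/></something:a>";
        String unprefixed = "<a xmlns='http://something.else/'><b/></a>";
        // Default namespace for the common case, a prefix (o:) for the exception.
        String mixed = "<a xmlns='http://something.else/' xmlns:o='http://other.example/'>"
                + "<o:b/></a>";

        for (String xml : new String[] {prefixed, unprefixed, mixed}) {
            Element a = root(xml);
            System.out.println(describe(a) + " contains " + describe(firstChild(a)));
        }
    }
}
```

To a namespace-aware parser the first two documents are the same thing: the prefix is only spelling. In the third, the unprefixed a tag picks up the default namespace while the prefixed b tag belongs to the other one.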
So why do people keep asking for help with overly ugly schemas, with documents that have a prefix on every single tag? Why do tutorials and guides do the same? I think the problem is merely a lack of understanding of namespaces. People are just doing what the people before them did, without understanding WHY those special attributes are what they are, or what those letters before the tag name mean. Like the cults that gave cargo cult programming its name, they are merely copying the visible efforts that seem to give the right results, without understanding what makes them work.
Don’t be that way. Namespaces might be more complex than plain tags and attributes, but they aren’t that hard to use. Take the time to learn how they work and how you declare them. Not only will your documents be much more readable, and you’ll be typing less, but you will also have learned an important part of what makes XML tick and gained a better understanding of it.