Web Devout tidings


Archive for April 16th, 2006

The problem with the NET

Sunday, April 16th, 2006

Use of XHTML today is decidedly harmful to the health of the Web when the documents are sent with the text/html content type for Internet Explorer compatibility. This is why I wrote the Beware of XHTML article, explaining some of the reasons why this misuse can lead to future problems and illustrating the situation with some examples. Now I’d like to talk about another potential problem with XHTML sent as text/html: Null End Tags.

As explained in the article, when a document is sent with the text/html content type, most major browsers including Internet Explorer, Firefox, and Opera treat the webpage as if it is actually regular HTML. As a result, they don’t correctly handle self-closing tags, instead treating them as start tags with an erroneous character inside them.

However, there’s more of a problem than just this. Technically speaking, if the browsers are really treating the page like HTML (and therefore SGML), they shouldn’t think that the closing slash is an erroneous character. According to the rules of SGML, they should think that it’s part of a Null End Tag, a kind of shorthand for simple elements that only contain character data. For example, according to the rules of SGML, a title element could be written <title/This is the title of the page/ which would be the equivalent of <title>This is the title of the page</title>. The first slash finishes the start tag, and the second slash represents the end tag if the element can have one.

Now let’s look at a situation in which this would present a problem. Think about this markup: <div>This<br/>is<br/>a<br/>test.</div>. That would work as expected in XHTML, but here’s how a browser should see it when treating the page as HTML: <div>This<br>>is<br>>a<br>>test.</div>. Notice the extra > after each br tag. Since the br element is defined as an empty element (it doesn’t have contents and doesn’t have an end tag), only one slash is relevant for each element. The slash finishes the tag right there, meaning the > character isn’t considered part of the tag, but rather character data after the tag. The presence of a space before the slash makes no difference.

The above markup should result in the following output when treated like HTML:

This
>is
>a
>test.

…That is, if browsers supported Null End Tags. Unfortunately, the major ones currently don’t, meaning that this issue gets entirely overlooked by most web developers. Rather than properly treating the slash as part of a Null End Tag, they treat it as an error and just skip past the character, often resulting in something quite like what it would get with the correct XHTML treatment, but for the wrong reasons.

Keep in mind that there are many issues like this, and use of XHTML should be avoided unless used correctly, with a correct XHTML content type.