Web Devout tidings


Archive for February 17th, 2007

XHTML 1.1 Second Edition WD allows text/html… why?

Saturday, February 17th, 2007

An XHTML 1.1 Second Edition Working Draft has just been published in an attempt to correct some problems with the previous Recommendation. However, in the Strictly Conforming Documents section, I feel that they have introduced a brand new problem. The paragraph in question is:

XHTML 1.1 documents SHOULD be labeled with the Internet Media Type text/html as defined in [RFC2854] or application/xhtml+xml as defined in [RFC3236]. For further information on using media types with XHTML, see the informative note [XHTMLMIME].

text/html is now included alongside application/xhtml+xml as one of the content types XHTML 1.1 “should” be sent as. I see this as a big mistake that should be fixed before the specification advances further. In theory and in practice, the content type header tells the user agent what kind of file it is dealing with. Every major web browser parses text/html documents as HTML, not as XML / XHTML. The fact that XHTML 1.0 allowed this has led to a widespread misunderstanding of what XHTML is and how user agents handle it. That misunderstanding, in turn, has left the vast majority of so-called “XHTML” documents in such a state that they would fall apart if they were handled as XHTML was meant to be handled. Even supposed web standards experts fall victim to this misunderstanding all the time, as is evident in this list of standards-related sites that break as XHTML.

XHTML was designed in part to move the Web beyond its old state, which tolerated invalid markup and sloppy legacy behaviors in CSS and elsewhere. However, because XHTML was allowed to be sent as text/html, people are in essence writing XHTML just as they wrote HTML before, or even worse. The document looks like XHTML, but it depends on browsers treating it as HTML. Most so-called XHTML pages on the Web today aren’t even well-formed, which is exactly what XML and XHTML were designed to refuse outright. XHTML on the Web has been the same disaster HTML was, except the situation is even more complicated than before. XHTML 1.0 has failed.
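To make the difference between the two kinds of parsing concrete, here is a rough sketch in Python. It is only an analogy: it assumes that Python's expat-based xml.etree.ElementTree stands in for a browser's XML parser (which stops at the first well-formedness error) and that the lenient html.parser tokenizer stands in for a browser's HTML parser. The sample strings and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical "XHTML-looking" markup with typical mistakes: an unquoted
# attribute value, an unclosed <p>, and a non-self-closed <br>.
SLOPPY = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
          '<p class=intro>Hello<br></body></html>')

# The same content written as well-formed XHTML.
WELL_FORMED = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
               '<p class="intro">Hello<br /></p></body></html>')


def parses_as_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed.

    XML parsing is draconian: any well-formedness error aborts the parse.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer that records start tags and ignores bad nesting."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_html(markup: str) -> list[str]:
    """Tokenize the markup as HTML and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(parses_as_xml(SLOPPY))       # False: rejected as XML
    print(parses_as_xml(WELL_FORMED))  # True
    print(parses_as_html(SLOPPY))      # ['html', 'body', 'p', 'br']
```

The sloppy document is silently accepted by the HTML tokenizer but rejected by the XML parser. Sent as text/html, the same kind of page never reaches an XML parser at all, so its errors go unnoticed.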

Now, XHTML 1.1 is about to do the same thing. By allowing an incorrect content type, one that instructs browsers to parse the document as HTML rather than XML, the specification authors are promoting the incorrect use of XHTML.

What warrants this change? Is it because most XHTML pages on the Web use the wrong content type? The road to progress is not to simply approve of whatever poor and harmful practices are used on the Web. XHTML is an XML format. That’s the only significant thing that sets it apart from HTML. If you’re going to allow it to be served and handled as plain old HTML, why bother having an XHTML standard at all? To put it another way, if you’re going to allow an XHTML document to be sent as text/html, which in turn would cause all major browsers to treat it as plain old HTML, why not instead recommend the use of HTML for those documents? Doing otherwise only further pollutes the already poor state of XHTML on the Web.

For further reading about the problems with XHTML on the Web today, see the Beware of XHTML article.