Web Devout tidings


Validate XHTML parsed as HTML

When you send XHTML to a browser using the common text/html content type, all major browsers will respond by using their regular HTML parsers on your page, regardless of the doctype. For some reason, the W3C HTML Validator doesn’t follow this widely-accepted convention. Instead, if you’re using an XHTML doctype, the W3C Validator will use an XML parser on your page. Obviously, that will give different results than what your browser is seeing. And unfortunately, there was no easy way to force the W3C Validator to parse your page with an HTML parser like everyone else did.

That is, until now. I’ve just released the new Validate XHTML Parsed as HTML tool. It works very much like the HTML Good Practice Checker: you submit the URL you want to test, it makes a few minimal changes to the beginning of your markup in order to modify how the W3C Validator sees your code, you click the button to validate it, and the results appear below.

The purpose of this tool is to illustrate how the compatibility issues between XHTML and HTML are not as simple as whether or not you follow the HTML Compatibility Guidelines. A fully-compliant HTML parser following widely-accepted conventions for parsing mode selection would encounter all of these errors when attempting to parse your page. Popular web browsers don’t support the Null End Tag construct, so they would see it slightly differently, but they would still see errors in each instance of /> on the page. I thought one of the selling qualities of XHTML was that it was supposed to put an end to lax error handling. I guess not.
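To make the Null End Tag point a little more concrete, here is a rough Python sketch of how an SGML-style HTML 4 parser would read /> on an EMPTY element such as br or img: the slash closes the tag, and the > that follows is left over as ordinary character data. The function name as_sgml_would_read and the regular expression are hypothetical and written only for illustration. The sketch deliberately ignores non-empty elements, where the Null End Tag behaves differently, and it is not how the tool or the W3C Validator is implemented.

```python
import re

# Elements declared EMPTY in the HTML 4.01 Strict DTD.
EMPTY_ELEMENTS = (
    "area", "base", "br", "col", "hr",
    "img", "input", "link", "meta", "param",
)

# Matches an XHTML-style self-closing tag for one of the EMPTY elements,
# e.g. <br /> or <img src="x.png" alt="" />.
_SELF_CLOSING = re.compile(
    r"<(" + "|".join(EMPTY_ELEMENTS) + r")\b([^<>]*?)\s*/>",
    re.IGNORECASE,
)


def as_sgml_would_read(markup: str) -> str:
    """Rewrite <br /> as <br>&gt; to show the stray '>' character.

    Simplification: only EMPTY elements are handled, and attribute
    values containing '<' or '>' are not supported.
    """
    return _SELF_CLOSING.sub(r"<\1\2>&gt;", markup)


if __name__ == "__main__":
    print(as_sgml_would_read("<p>one<br />two</p>"))
    # prints: <p>one<br>&gt;two</p>
```

Browsers do not do this, which is exactly why the validator results and the browser rendering differ.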

4 Responses to “Validate XHTML parsed as HTML”

  1. David Cassidy Says:

    I have to wonder if this is how the W3C validator team “politely” reminds us to send the proper content-type when using (X)HTML. This is probably just some minor oversight on their part, however.

    But even if not, there is a bit of legitimacy in doing things this way. You, yourself, have stressed the importance of sending the proper content-type on more occasions than I care to count. In fact, your article entitled “Beware of XHTML” is probably one of the finest ever written. And the “Anti-XHTML Movement” is growing with each day.

    In the meantime, however, I can easily see how this is causing more trouble than we need. Your tool will be a welcome addition to the arsenal of many developers who may not yet be enlightened. Thanks!

    Posted using Mozilla Firefox 2.0.0.6 on Linux.

  2. Daniel S Says:

    I personally don’t care very much about the difference between HTML and XHTML. But I may be a somewhat special “use-case”.

    For sites I create for myself I use XHTML 1.0 Strict (I could use 1.1, but that somehow doesn’t appeal to me).

    I had a project that was sent as application/xhtml+xml to browsers that know that media type and as text/html to all others. The only problem I saw there was cacheability. I made sure the CSS worked identically. I guess the big problems start when one wants to use JavaScript and DOM methods (which I didn’t).

    Now, whenever I write a website I just use HTML 4 Strict, because I can see that it’s no worse than XHTML, which is sent as text/html most of the time anyway. The only problem I see with that approach is that I can’t validate HTML as well as XHTML.

    Where is the error? <a href="href">
    W3C’s Validator can’t tell.

    I just wish there were better information on how XHTML sent as HTML really causes problems (and what the real problems are when it eventually gets sent as real XHTML).
    Yeah, some are obvious, like some parts of CSS, and error handling. I guess DOM methods like createElement(NS) as well.

    As for CSS, I wish there weren’t any parts that apply only to HTML or only to XHTML. Where’s the sense in that?

    And why is every list of arguments full of nonsense arguments?

    Yeah, no one wants to write complex mixtures of comments and CDATA sections in script or style blocks. But why is this an argument when the best practice is simply to keep style and behaviour outside of the document? See examples 3 and 4 of Beware of XHTML.

    Posted using Mozilla Firefox 2.0.0.6 on Windows.

  3. David Hammond Says:

    “Now, whenever I write a website I just use HTML 4 Strict, because I can see that it’s no worse than XHTML, which is sent as text/html most of the time anyway. The only problem I see with that approach is that I can’t validate HTML as well as XHTML.”

    It sounds like you may be interested in this: HTML Good Practice Checker

    “I just wish there were better information on how XHTML sent as HTML really causes problems (and what the real problems are when it eventually gets sent as real XHTML).”

    Most of the big problems come from people mistakenly thinking that browsers will treat their text/html XHTML as XML (<div />, CDATA sections, etc.) or mistakenly thinking that browsers parsing their text/html XHTML as XML will display it like HTML parsers (error handling, script/style contents, tables, CSS, DOM, and so on…).

    Some problems are more a matter of theory and philosophy. Fully-compliant HTML parsers are useless on the Web today because all of the XHTML sent as text/html has polluted the Web. A fully-compliant HTML parser would make a mess of an XHTML document sent as text/html, because XHTML following the HTML Compatibility Guidelines actually isn’t compatible with fully-compliant HTML parsers. Because of this pollution, you’ll probably never see fully-compliant HTML parsers in practice. And I think this shows a pretty bastardized interpretation of what a “standard” is supposed to be. Sending XHTML as text/html is an anti-standard. In the long term it hurts both the HTML and XHTML standards, yet in the short term it gives no benefit worth mentioning. So using it is simply reckless and disrespectful of the standards in general.

    No web browser can afford to be 100% HTML 4.01 compliant because it would cause almost all XHTML sites (including the valid ones) to break, and no web browser can afford to parse all pages with XHTML doctypes as XML because it would cause the vast majority of XHTML sites (including the valid ones) to break. Serving XHTML as text/html forces web browsers to not follow the standard.

    Posted using Mozilla Firefox 2.0.0.6 on Linux.

  4. Daniel S Says:

    “It sounds like you may be interested in this: HTML Good Practice Checker”

    Yeah, I’m already using your tool. Good work there.

    However, more people are using the validator only and not your tool. Like I said, [img[/a] (missing “]” for img-element) is not consistently parsed by browsers. Yet, for the validator that code is perfectly fine. That’s definitely a problem!

    On the other hand, stricter DTD rules can’t find many obvious mistakes. Stress the comment content which is shown in IE 7 and Opera 9 but not in Firefox.

    “or mistakenly thinking that browsers parsing their text/html XHTML as XML will display it like HTML parsers (error handling, script/style contents, tables, CSS, DOM, and so on…).”

    I completely understand this. But imagine what a shock it was to learn that the most basic DOM methods wouldn’t function correctly when used in real XHTML (well, if browsers implemented them as specced). So I really understand how the web is polluted and standards are hurt (don’t rely on browser bugs).
    Yet Hixie only mentions document.write() as a problem. Nowhere does he state how dangerous createElement() is, nor that the method that should be used instead isn’t reliably implemented cross-browser.

    Posted using Mozilla Firefox 2.0.0.6 on Windows.
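
A footnote on the createElement() point raised in the last comment: the sketch below shows the difference between createElement() and createElementNS() as DOM Level 2 describes it. It uses Python’s xml.dom.minidom as a stand-in DOM implementation, not a browser, so it says nothing about what any particular browser actually does. The helper name make_paragraphs is hypothetical and exists only for this illustration.

```python
from xml.dom.minidom import getDOMImplementation

XHTML_NS = "http://www.w3.org/1999/xhtml"


def make_paragraphs():
    """Create one <p> with createElement() and one with createElementNS().

    Returns the two elements so their namespaces can be compared.
    """
    impl = getDOMImplementation()
    # An XML document whose root element is in the XHTML namespace.
    doc = impl.createDocument(XHTML_NS, "html", None)

    plain = doc.createElement("p")                  # no namespace at all
    namespaced = doc.createElementNS(XHTML_NS, "p")  # a real XHTML <p>
    return plain, namespaced


if __name__ == "__main__":
    plain, namespaced = make_paragraphs()
    print(plain.namespaceURI)       # prints: None
    print(namespaced.namespaceURI)  # prints: http://www.w3.org/1999/xhtml
```

In an XML document, an element with no namespace is not an XHTML element even if it happens to be called p, which is the reason scripts written for text/html pages can misbehave once the same markup is served as real XHTML.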