Web Devout tidings

The whimzical world of HTML 5

A lot of scary stuff is going on in HTML 5 development. You know all the things we’ve learned about browser/engine-neutral code, building standards on top of other standards, using semantic markup, and so on? Well, from what I’ve seen, the HTML working group seems to be throwing all of that out the window.

I should first note that I only recently subscribed to the HTML WG mailing list, and I haven’t yet had a chance to read the full breadth of the discussion. The talk right now, though, seems to be centered on something called “bugmode”, a new standard mechanism that would let browsers add an unlimited number of “quirks modes” which webpages can subscribe to. It’s currently proposed as something like this:

<html bugmode="ie7 gecko1.8 opera9">

This would basically cause these browsers to use snapshots of the respective layout engines when displaying the page. All future versions of Internet Explorer would use the IE 7 engine, all future versions of Firefox would use the Gecko 1.8 engine, and all future versions of Opera would use the Opera 9 engine.
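
To make the consequence concrete, here is a rough sketch in Python of what a browser would have to do with that attribute. Everything in it is my own assumption for illustration: the token format (an engine name followed directly by a version number) is only inferred from the example above, and the function names are hypothetical. Nothing like this has been specified anywhere.

```python
import re

# Assumed token format, inferred from the example above: an engine name
# followed directly by a version number, e.g. "ie7" or "gecko1.8".
TOKEN = re.compile(r"^([a-z]+)(\d+(?:\.\d+)*)$")


def parse_bugmode(value):
    """Map each engine name in a bugmode value to the version it pins."""
    pinned = {}
    for token in value.lower().split():
        match = TOKEN.match(token)
        if match:  # tokens that do not fit the assumed format are skipped
            pinned[match.group(1)] = match.group(2)
    return pinned


def engine_version_to_use(bugmode, engine, current_version):
    """Return the engine version a browser would render the page with.

    If the page pins this engine, the pinned snapshot wins no matter how
    new the browser is; otherwise the current engine is used.
    """
    return parse_bugmode(bugmode).get(engine, current_version)
```

With the attribute value above, a future Gecko 1.9 browser would still have to answer "1.8" here, which means shipping the old engine alongside the new one indefinitely.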

Am I the only one who thinks this is a terrible idea?

First of all, since when do web developers experience significant problems with new versions of Firefox or Opera? I’ve never had anything important break with the release of a new version. I’ve only experienced such a problem in Internet Explorer, since IE has to fix major implementation flaws in very fundamental areas of the standards, like the basic behavior of the width and height properties. IE is uniquely in this position because most of its engine was developed before the current CSS standards were in place (Microsoft basically extrapolated from CSS 1 however they saw fit at the time) and the engine had no development work for half a decade to correct the inconsistencies.

So I personally wouldn’t mind it if IE added some sort of conditional comment type of thing to target new quirks modes in IE, but I don’t see why there should be a whole new attribute added to the HTML standard just for triggering new browser-specific quirks modes.

I should point out that this is still very much a brainstorming session, and this idea may fade away in a couple weeks, but I’m still bothered by the number of people who seem to be taking this discussion seriously.

I talked a little more about this issue in a comment on Chris Wilson’s blog.

Now, Ian Hickson, who was responsible for a lot of the WHATWG Web Applications 1.0 work and will serve as an editor for the W3C HTML 5 specification, has it in his mind that the HTML WG is chartered to move HTML 5 away from SGML. That is, he believes it is one of the stated intentions of the HTML WG that future versions of HTML will not be SGML languages.

Here is the charter quote from which he derived this idea:

The Group will define conformance and parsing requirements for ‘classic HTML’, taking into account legacy implementations; the Group will not assume that an SGML parser is used for ‘classic HTML’.

The charter uses the term “classic HTML” to refer to non-XHTML HTML. In SGML terms, this would be the markup using HTML 4.01’s SGML declaration, rather than XML as used by XHTML. Currently, no major web browser uses a full-featured SGML parser to parse classic HTML content. Therefore, it is wise not to assume that a browser can handle any SGML rules thrown at it in a new version of HTML. What the charter is saying is that the group will take into account this fact when developing the new standard. It does not say that HTML 5 shouldn’t be parseable by an SGML parser; it just says not to assume that an SGML parser will always be used.

However, Ian Hickson and others in the HTML WG have used this twisted interpretation of the charter as an excuse to unnecessarily break compatibility with the SGML standard. I’ll say it again: unnecessarily breaking compatibility with the SGML standard. I haven’t yet seen anything they’re trying to accomplish with HTML 5 that couldn’t be done in an SGML-compatible way.

They want to circumvent the issue of XML-style self-closing tag constructs causing problems in user agents which support the default SGML syntax for null end tags? Just set NETENABL to NO in the SGML declaration. This wouldn’t expressly allow XML-style self-closing constructs in HTML, and in these cases the “/” character would be considered invalid, but it brings a fully-compliant SGML parser to the behavior that all major browsers currently exhibit. Note that this would only be handled as intended when used on elements defined as EMPTY, as is currently the case in all major web browsers. If you were to truly support XML-style self-closing tags even for non-EMPTY elements (which may indeed require a significant departure from how HTML is currently constructed), that would cause problems with legacy user agents, which the HTML WG charter says to avoid. A change to the SGML declaration would be somewhat of an issue for fully compliant SGML parsers, since they generally use the Content-type header to determine which SGML grammar is being used, and we should probably avoid giving HTML 5 a different content-type than HTML 4, but at least this would keep HTML 5 compatible with SGML so it isn’t impossible for an SGML parser to parse it.
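
For anyone unfamiliar with null end tags, the following is a toy model in Python of the three readings of an XML-style self-closing construct on an EMPTY element such as br. It is not a parser, the mode names are made up for this illustration, and the "report the slash, then skip it" recovery in the NETENABL NO case is my assumption about how a parser would carry on after the error.

```python
def interpret_self_closing(tag_source, mode):
    """Toy model of how '<name/>' on an EMPTY element is read.

    Returns (element_name, trailing_character_data, error_or_None).
    The three mode names are hypothetical labels, not real parser options.
    """
    if not (tag_source.startswith("<") and tag_source.endswith("/>")):
        raise ValueError("expected source of the form '<name/>'")
    name = tag_source[1:-2].strip()

    if mode == "sgml-net":
        # Null end tags enabled: "<name/" is already the complete start tag,
        # so the ">" that follows is ordinary character data.
        return (name, ">", None)
    if mode == "sgml-no-net":
        # NETENABL NO: the "/" is invalid in this position. Assumed recovery:
        # report it, skip it, and close the tag at ">".
        return (name, "", "'/' is not allowed in a start tag")
    if mode == "browser":
        # What the major browsers do today: silently ignore the "/".
        return (name, "", None)
    raise ValueError("unknown mode: " + mode)
```

The point of the sketch is that the second and third cases produce the same element with no stray ">" in the content, differing only in whether an error is reported.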

It has also been proposed that HTML 5 should have no DTD. For similar reasons, I ask, why? I’ve seen the proposed elements in Web Applications 1.0, which is roughly considered the starting point for HTML 5 development, and I don’t see anything there that would require the absence of a DTD. I’m curious what the W3C Validator development team thinks of this. The W3C Validator currently operates strictly via an SGML/DTD parser (the upcoming new version of the Validator also comes equipped with an XML parser in order to also check for well-formedness). Without a DTD, the validator would have to hard-code all of the rules for HTML 5. And how exactly does omitting a DTD benefit anyone?

Not only does Ian Hickson want to omit a DTD, but he doesn’t seem to think that a version indicator is even necessary. His proposed new doctype declaration is simply <!DOCTYPE html>. So that’s it. Every future version of HTML had better be 100% backwards compatible. No mistakes may be made or else the HTML standard is screwed for life. I think history has shown us that this assumption that we can reasonably keep a sane standard backwards-compatible forever is a bit unwise. At one time, the isindex element seemed like a good idea. There are plenty of people who want the q element redefined in HTML 5 so that the browser doesn’t display quotation marks by itself. HTML 5 already attempts to redefine some elements and attributes from HTML 4. I guarantee that there are features currently in Web Applications 1.0 which people are going to see as a mistake several years down the road and want to correct. It will end up causing compatibility problems if there isn’t a version number to go along with those changes. Maybe we’ll have to use bugmode after all.
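
As a small illustration of what gets lost, here is a Python sketch that pulls the language and version out of a W3C doctype declaration. The regular expression and the function name are my own and only cover the W3C public identifiers for HTML and XHTML; it is not how any browser or validator actually does this.

```python
import re

# Matches W3C public identifiers such as "-//W3C//DTD HTML 4.01//EN"
# or "-//W3C//DTD XHTML 1.0 Strict//EN" inside a doctype declaration.
PUBLIC_ID = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD\s+(X?HTML)\s+([\d.]+)',
    re.IGNORECASE,
)


def html_version(doctype):
    """Return (language, version) from a doctype, or None if it has none."""
    match = PUBLIC_ID.search(doctype)
    if match is None:
        return None
    return (match.group(1).upper(), match.group(2))


html4 = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
         '"http://www.w3.org/TR/html4/strict.dtd">')
print(html_version(html4))              # the HTML 4.01 Strict doctype
print(html_version("<!DOCTYPE html>"))  # the proposed HTML 5 doctype
```

The first call yields a language and version; the second has nothing to extract, so any tool that needs to tell one version of HTML from the next has to guess.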

Speaking of new features, let’s talk about some of them. To start off, there are some good things proposed in Web Applications 1.0. I like the section element, nav element, article element, aside element, the redefinition of the dl element, and some of the other stuff. But there are some elements and attributes that just make me scratch my head:

* Why do we have a canvas element? Why not simply use a script to apply some state to any given element to turn it into a canvas? People who have worked with the Google Maps API are familiar with the idea of using a script to replace an arbitrary element (be it a div, p, etc.) with a new object. In most cases, a canvas element could be replaced with a div element, and then the script just sets it to a canvas just as browsers often allow scripts to set arbitrary elements to be contentEditable. What ever happened to semantic markup? What semantics does a canvas element express?

* ping attributes? In my a? Thanks for slowing my Web experience and using up more of my bandwidth so that advertising companies can track my habits. Much appreciated. I hope my browser quickly adds an option to disable this functionality, because I for one don’t want it. If a website is going to gossip to others about how I’m using the site, it should put in the effort to do it server-side with its own bandwidth.

* embed element, why won’t you die? Is it the popular thing these days to just call whatever is out there on the Web “the standard”?

I could go on, but my point is that a lot of stuff is being proposed pretty quickly, and I question the motivation and thought behind a lot of these propositions. People seem to be caught up in how to add such-and-such functionality to web apps rather than focusing on semantics and other things we were supposed to have learned since the old boom days of the Web. I dunno, it just feels like we’ve been through all of this before. Even though this is being discussed in a public forum, the types of propositions are all too reminiscent of the seemingly random “sounded-good-at-the-time” features Netscape and Internet Explorer kept adding during the last browser wars. Does anyone know where I can buy some cheap shock collars?

This entry was posted on Monday, April 23rd, 2007 at 04:13 UTC and is filed under HTML 5, Specifications.

7 Responses to “The whimzical world of HTML 5”

Jeff Schiller Says:

April 24th, 2007 at 00:55 UTC

> Is it the popular thing these days to just call whatever is out there on the Web “the standard”

From what I understand, this is one way of stating Ian Hickson’s goal: to document the popular things that authors and browsers have done with the web today so that future generations (hundreds of years from now) can write their own browser to browse those ancient web pages.

I agree – lots of scary discussions right now. But frankly, I don’t think too many people think “bugmode” is a good idea (I, for one, DO NOT).

Posted using Mozilla Firefox 2.0.0.2pre on Linux.
Wulf Says:

April 24th, 2007 at 23:30 UTC

I disagree about canvas (same principle behind :hover – why resort to a script when you don’t have to?) and ping (uses much less bandwidth than a hidden redirect, which is the current method [used by Google]).

However, other than the aforementioned exceptions, I agree – there’s some pretty scary stuff going on with HTML 5.

Posted using Mozilla Firefox 3.0a on Windows.
Chris Wilson Says:

April 27th, 2007 at 20:38 UTC

David, you missed the bit where they’re considering putting the marquee element in HTML5. :)

Posted using Internet Explorer (Windows) 7.0 on Windows.
mpt Says:

April 29th, 2007 at 06:20 UTC

* Yes, bugmode is a terrible idea, because it would make writing a new useful Web browser in 2030 prohibitively difficult. It’s an anticompetitive move.

* It’s already impossible for an SGML parser to parse HTML4 as used in the real world, because the mass of humans who author the Web (and the computer programs written by those humans) will never be that perfect. The main change in HTML5 is the level of honesty.

* A DTD can represent only a small fraction of HTML5’s conformance requirements. The judgement is that misleading authors, by providing DTD-based “validation” that tests only a small fraction of the requirements, does more harm than good.

* Backward compatibility is vital for the same reason bugmode is a bad idea: to make it feasible for programmers to write new useful Web browsers in decades to come. Making obsolete elements non-conforming does not contradict this goal: the specification will continue to describe how to render those things that are no longer conforming, as long as a non-trivial number of Web pages use them.

* As far as I know, canvas and embed both exist because browser vendors find it extremely difficult to implement generalized behavior (which is why object is so patchily implemented, for example).

* The argument for ping is pretty much exactly what you stated: that people who care will be able to disable it in their browser, whereas it’s difficult or impossible to do the same with server-side tracking. I disagree on the grounds that browsers won’t let you do this because the UI would be ridiculous. (“Let Web sites track which links you click on: (*) Always ( ) Sometimes”)

Posted using Safari 312.6 on Macintosh.
kalikiana Says:

May 5th, 2007 at 21:48 UTC

> Not only does Ian Hickson want to omit a DTD, but he doesn’t seem to think that a version indicator is even necessary.

Wow. I heard and read bad things about HTML5 before. I know that not everybody likes XHTML. But now I can hardly believe any more that this is coming from the W3C Group. They really ought to go order an internet connection.

Posted using Mozilla Firefox 2.0.0.3 on Linux.
greg Says:

May 22nd, 2007 at 20:20 UTC

wait, so that I’m clear – requiring bug mode means that in order to make a browser 10 years from now, you will need to include every version of every browser’s rendering engine from the last decade?

what a horrible idea.

FWIW – canvas doesn’t sound like a bad idea. semantics – “visual stuff goes here.” Better than a semantically vacuous div.

Posted using Mozilla Firefox 2.0.0.3 on Windows.
Jacques Distler Says:

May 26th, 2007 at 04:43 UTC

I don’t know why you are so enamoured of DTDs. DTD-based validators suck.

Posted using SeaMonkey/Mozilla Suite 1.5a on Macintosh.