Web Devout tidings



The whimsical world of HTML 5

Monday, April 23rd, 2007

A lot of scary stuff is going on in HTML 5 development. You know all the things we’ve learned about browser/engine-neutral code, building standards on top of other standards, using semantic markup, and so on? Well, from what I’ve seen, the HTML working group seems to be throwing all of that out the window.

I should first note that I only recently subscribed to the HTML WG mailing list, and I haven’t yet had a chance to read the full breadth of the discussion, but the talk right now seems to be centered on something called “bugmode”, a new standard mechanism for browsers to add an unlimited number of “quirks modes” which webpages can subscribe to. It’s currently proposed as something like this:

<html bugmode="ie7 gecko1.8 opera9">

This would basically cause these browsers to use snapshots of the respective layout engines when displaying the page. All future versions of Internet Explorer would use the IE 7 engine, all future versions of Firefox would use the Gecko 1.8 engine, and all future versions of Opera would use the Opera 9 engine.
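To make the proposed behavior concrete, here is a minimal sketch of how a browser might read that attribute. Everything in it is an assumption for illustration: the names parse_bugmode and engine_version_for are hypothetical, and the token format (an engine name followed by a version number) is inferred only from the example above, not from any specification text.

```python
import re


def parse_bugmode(value: str) -> dict[str, str]:
    """Split a bugmode value into {engine: pinned_version}.

    Assumes each token is lowercase letters followed by a dotted
    version number, e.g. "gecko1.8". Tokens that do not fit are skipped.
    """
    pins = {}
    for token in value.split():
        match = re.fullmatch(r"([a-z]+)([0-9][0-9.]*)", token)
        if match:
            engine, version = match.groups()
            pins[engine] = version
    return pins


def engine_version_for(bugmode: str, engine: str, current_version: str) -> str:
    """Return the engine snapshot a browser would render with.

    If the page pins this engine, the pinned version wins forever;
    otherwise the browser uses its current engine.
    """
    return parse_bugmode(bugmode).get(engine, current_version)
```

Under these assumptions, a future Gecko 1.9 browser asked to render the page above would fall back to its 1.8 snapshot, while an engine the page never mentions would render with whatever version it currently ships.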

Am I the only one who thinks this is a terrible idea?

First of all, since when do web developers experience significant problems with new versions of Firefox or Opera? I’ve never had anything important break with the release of a new version. I’ve only experienced such a problem in Internet Explorer, since IE has to fix major implementation flaws in very fundamental areas of the standards, like the basic behavior of the width and height properties. IE is uniquely in this position because most of its engine was developed before the current CSS standards were in place (Microsoft basically extrapolated from CSS 1 however it saw fit at the time) and the engine had no development work for half a decade to correct the inconsistencies.

So I personally wouldn’t mind it if IE added some sort of conditional comment type of thing to target new quirks modes in IE, but I don’t see why there should be a whole new attribute added to the HTML standard just for triggering new browser-specific quirks modes.

I should point out that this is still very much a brainstorming session, and this idea may fade away in a couple weeks, but I’m still bothered by the number of people who seem to be taking this discussion seriously.

I talked a little more about this issue in a comment on Chris Wilson’s blog.

Now, Ian Hickson, who was responsible for a lot of the WHATWG Web Applications 1.0 work and will serve as an editor for the W3C HTML 5 specification, has it in his mind that the HTML WG is chartered to move HTML 5 away from SGML. That is, he believes it is one of the stated intentions of the HTML WG that future versions of HTML will not be SGML languages.

Here is the charter quote from which he derived this idea:

The Group will define conformance and parsing requirements for ‘classic HTML’, taking into account legacy implementations; the Group will not assume that an SGML parser is used for ‘classic HTML’.

The charter uses the term “classic HTML” to refer to non-XHTML HTML. In SGML terms, this would be the markup using HTML 4.01’s SGML declaration, rather than XML as used by XHTML. Currently, no major web browser uses a full-featured SGML parser to parse classic HTML content. Therefore, it is wise not to assume that a browser can handle any SGML rules thrown at it in a new version of HTML. What the charter is saying is that the group will take into account this fact when developing the new standard. It does not say that HTML 5 shouldn’t be parseable by an SGML parser; it just says not to assume that an SGML parser will always be used.

However, Ian Hickson and others in the HTML WG have used this twisted interpretation of the charter as an excuse to unnecessarily break compatibility with the SGML standard. I’ll say it again: unnecessarily breaking compatibility with the SGML standard. I haven’t yet seen anything they’re trying to accomplish with HTML 5 that couldn’t be done in an SGML-compatible way.

They want to circumvent the issue of XML-style self-closing tag constructs causing problems in user agents which support the default SGML syntax for null end tags? Just set NETENABL to NO in the SGML declaration. This wouldn’t expressly allow XML-style self-closing constructs in HTML, and in these cases the “/” character would be considered invalid, but it brings a fully-compliant SGML parser to the behavior that all major browsers currently exhibit. Note that this would only be handled as intended when used on elements defined as EMPTY, as is currently the case in all major web browsers. If you were to truly support XML-style self-closing tags even for non-EMPTY elements (which may indeed require a significant departure from how HTML is currently constructed), that would cause problems with legacy user agents, which the HTML WG charter says to avoid. A change to the SGML declaration would be somewhat of an issue for fully compliant SGML parsers, since they generally use the Content-type header to determine which SGML grammar is being used, and we should probably avoid giving HTML 5 a different content-type than HTML 4, but at least this would keep HTML 5 compatible with SGML so it isn’t impossible for an SGML parser to parse it.
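The three behaviors being compared here can be laid side by side in a toy sketch. This is not a real SGML parser and not any browser’s actual code: the function read_empty_tag and the mode names "net", "no-net", and "browser" are hypothetical, and the sketch only handles a bare start tag for an EMPTY element, such as <br />.

```python
import re

# Hypothetical mode names, used only in this sketch.
MODES = ("net", "no-net", "browser")


def read_empty_tag(markup: str, mode: str):
    """Interpret markup like '<br>' or '<br />' for an EMPTY element.

    Returns (element_name, trailing_character_data, errors).
      "net"     : SGML with null end tags enabled, as in HTML 4.01
      "no-net"  : SGML with NETENABL set to NO
      "browser" : what major browsers do in practice
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    match = re.fullmatch(r"<([A-Za-z][A-Za-z0-9]*)\s*(/?)>", markup)
    if match is None:
        raise ValueError("this sketch only handles a bare start tag")
    name = match.group(1).lower()
    has_slash = match.group(2) == "/"

    if not has_slash:
        # Plain '<br>' reads the same way in every mode.
        return (name, "", [])
    if mode == "net":
        # The '/' closes the start tag; the '>' is left over as text.
        return (name, ">", [])
    if mode == "no-net":
        # The '/' is simply invalid, so it is reported and nothing leaks.
        return (name, "", ["'/' is not allowed in a start tag"])
    # Browsers silently ignore the '/'.
    return (name, "", [])
```

The point of the comparison is the second element of the result: only the "net" reading leaves a stray “>” in the page, while the "no-net" reading produces the same element as the browsers do and merely reports the slash as an error.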

It has also been proposed that HTML 5 should have no DTD. For similar reasons, I ask, why? I’ve seen the proposed elements in Web Applications 1.0, which is roughly considered the starting point for HTML 5 development, and I don’t see anything there that would require the absence of a DTD. I’m curious what the W3C Validator development team thinks of this. The W3C Validator currently operates strictly via an SGML/DTD parser (the upcoming new version of the Validator also comes equipped with an XML parser in order to also check for well-formedness). Without a DTD, the validator would have to hard-code all of the rules for HTML 5. And how exactly does omitting a DTD benefit anyone?

Not only does Ian Hickson want to omit a DTD, but he doesn’t seem to think that a version indicator is even necessary. His proposed new doctype declaration is simply <!DOCTYPE html>. So that’s it. Every future version of HTML had better be 100% backwards compatible. No mistakes may be made or else the HTML standard is screwed for life. I think history has shown us that this assumption that we can reasonably keep a sane standard backwards-compatible forever is a bit unwise. At one time, the isindex element seemed like a good idea. There are plenty of people who want the q element redefined in HTML 5 so that the browser doesn’t display quotation marks by itself. HTML 5 already attempts to redefine some elements and attributes from HTML 4. I guarantee that there are features currently in Web Applications 1.0 which people are going to see as a mistake several years down the road and want to correct. It will end up causing compatibility problems if there isn’t a version number to go along with those changes. Maybe we’ll have to use bugmode after all.
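A small sketch of what the missing version indicator means in practice: a tool that wants to know which version of HTML it is looking at can read it out of an HTML 4.01 or XHTML 1.0 doctype, but gets nothing from the proposed one. The function name html_version_from_doctype is hypothetical, and the pattern only covers W3C public identifiers of the usual form; it is an illustration, not a complete doctype parser.

```python
import re
from typing import Optional


def html_version_from_doctype(doctype: str) -> Optional[str]:
    """Return the version named in a W3C public identifier, or None.

    Looks for identifiers such as "-//W3C//DTD HTML 4.01//EN" or
    "-//W3C//DTD XHTML 1.0 Strict//EN".
    """
    match = re.search(r'PUBLIC\s+"-//W3C//DTD X?HTML ([0-9.]+)', doctype)
    return match.group(1) if match else None


html401 = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')
proposed = "<!DOCTYPE html>"

print(html_version_from_doctype(html401))   # 4.01
print(html_version_from_doctype(proposed))  # None
```

With the proposed doctype there is simply nothing to match on, so any later change in the meaning of an element has to be handled without knowing which version the author wrote against.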

Speaking of new features, let’s talk about some of them. To start off, there are some good things proposed in Web Applications 1.0. I like the section element, nav element, article element, aside element, the redefinition of the dl element, and some of the other stuff. But there are some elements and attributes that just make me scratch my head:

  • Why do we have a canvas element? Why not simply use a script to apply some state to any given element to turn it into a canvas? People who have worked with the Google Maps API are familiar with the idea of using a script to replace an arbitrary element (be it a div, p, etc.) with a new object. In most cases, a canvas element could be replaced with a div element, and then the script would just turn it into a canvas, much as browsers often allow scripts to set arbitrary elements to be contentEditable. Whatever happened to semantic markup? What semantics does a canvas element express?
  • ping attributes? In my a? Thanks for slowing my Web experience and using up more of my bandwidth so that advertising companies can track my habits. Much appreciated. I hope my browser quickly adds an option to disable this functionality, because I for one don’t want it. If a website is going to gossip to others about how I’m using the site, it should put in the effort to do it server-side with its own bandwidth.
  • embed element, why won’t you die? Is it the popular thing these days to just call whatever is out there on the Web “the standard”?

I could go on, but my point is that a lot of stuff is being proposed pretty quickly, and I question the motivation and thought behind a lot of these proposals. People seem to be caught up in how to add such-and-such functionality to web apps rather than focusing on semantics and other things we were supposed to have learned since the old boom days of the Web. I dunno, it just feels like we’ve been through all of this before. Even though this is being discussed in a public forum, the types of proposals are all too reminiscent of the seemingly random “sounded-good-at-the-time” features Netscape and Internet Explorer kept adding during the last browser wars. Does anyone know where I can buy some cheap shock collars?