Null end tags in XHTML
I have mentioned here and there that XHTML (and XML in general) wasn’t designed to support SGML null end tags. This isn’t completely true. XML supports a restricted and altered form of null end tags, and in fact they are used all the time.
Null end tags are a way to abbreviate an end tag to a single character. They are not supported by most common HTML user agents, but they do exist in HTML’s profile of SGML and there are HTML user agents that support them. In HTML and the default SGML profile, null end tags look like this:
<title/This is the title of the page/
For fully compliant HTML user agents, the above is equivalent to the following:
<title>This is the title of the page</title>
As you can see, the contents of the element are surrounded by “/” characters, which are used more or less like the quotes used for attribute values. If an SGML profile and DTD require a certain element’s end tag to be omitted, only one slash is relevant for the element (any further slashes will be treated as character data). For example, the following tag is valid in HTML:
<img src="image.png" alt="An image"/
Although it doesn’t save any characters and isn’t widely supported, it is perfectly legal according to the standard. This is where the problems with XHTML that I have mentioned come in. The above isn’t legal in XHTML; the closest equivalent is the following:
<img src="image.png" alt="An image"/>
An XHTML user agent would see the above as a single img tag, but a fully compliant HTML user agent would see it as the same null end tag form as the previous example, followed by a “>” character. That “>” character would be treated as regular character data and would be displayed on the page itself. Despite common practice, a space before the “/” character wouldn’t change this.
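To make the HTML reading concrete, here is a toy Python sketch of how a parser using HTML’s default delimiters might expand null end tags into ordinary tags. It is not a real SGML parser: the EMPTY_ELEMENTS set is a hypothetical stand-in for what the DTD declares EMPTY, attribute values are assumed to be double-quoted, nested null end tags aren’t handled, and the name expand_html_net is made up for this example.

```python
import re

# Hypothetical stand-in for the DTD: a partial list of elements declared EMPTY.
EMPTY_ELEMENTS = {"img", "br", "hr", "meta", "link", "input"}

# A NET-enabling start tag: "<", a name, optional double-quoted attributes,
# optional whitespace, then the "/" that ends the start tag.
NET_START = re.compile(r'<([A-Za-z][A-Za-z0-9.-]*)((?:\s+[^\s=/>]+="[^"]*")*)\s*/')


def expand_html_net(markup: str) -> str:
    """Rewrite HTML-style null end tags as full start and end tags (toy version)."""
    out = []
    pos = 0
    while True:
        m = NET_START.search(markup, pos)
        if m is None:
            out.append(markup[pos:])  # no more NET-enabling start tags
            return "".join(out)
        name, attrs = m.group(1), m.group(2)
        out.append(markup[pos:m.start()])
        out.append(f"<{name}{attrs}>")
        if name.lower() in EMPTY_ELEMENTS:
            # EMPTY element: the "/" only closes the start tag and there is
            # no end tag, so whatever follows (even ">") is character data.
            pos = m.end()
        else:
            # The content runs to the next "/", which is the null end tag.
            # str.index raises ValueError if that "/" is missing.
            close = markup.index("/", m.end())
            out.append(markup[m.end():close])
            out.append(f"</{name}>")
            pos = close + 1


print(expand_html_net('<title/This is the title of the page/'))
# prints <title>This is the title of the page</title>
print(expand_html_net('<img src="image.png" alt="An image"/>'))
# prints <img src="image.png" alt="An image">>
print(expand_html_net('<img src="image.png" alt="An image" />'))
# prints <img src="image.png" alt="An image">>
```

The last two calls show the stray “>” that an HTML parser would leave behind, with or without a space before the slash.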
I have said that this issue is due to XHTML/XML not supporting null end tags. However, it’s more accurate to say that XML doesn’t support null end tags in the same way as HTML. Rather than the contents being surrounded by two slashes, they are surrounded by a slash and a greater-than sign (/ ... >), with the additional constraint that they may only be used when the contents are empty. So the second img example is actually XML’s version of null end tags: the start tag ends with the “/” character and the end tag is represented by the “>” character. Because end tags may not be omitted in XML, the “>” character is always required, and because the null end tag rule in XML is defined as “IMMEDNET” (explained below), the element must close immediately after it is started, so there may be no actual content.
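Python’s standard-library XML parser (xml.etree.ElementTree, backed by expat) makes the XML side easy to check. This small sketch uses a made-up helper, parse_xml, to show that the “/>” form yields an element with no content, identical to an explicit start tag and end tag, while the HTML-style form and a “/” separated from its “>” are both rejected as not well-formed.

```python
import xml.etree.ElementTree as ET


def parse_xml(markup: str):
    """Return (tag, attributes, text, number of children), or None if not well-formed."""
    try:
        elem = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return elem.tag, dict(elem.attrib), elem.text, len(elem)


# XML's null end tag: "/" ends the start tag and ">" immediately ends the element.
print(parse_xml('<img src="image.png" alt="An image"/>'))
# ...which parses exactly like an explicit start tag and end tag with no content.
print(parse_xml('<img src="image.png" alt="An image"></img>'))
# HTML-style null end tags around content are not well-formed XML.
print(parse_xml('<title/This is the title of the page/'))
# IMMEDNET: nothing, not even a space, may come between "/" and ">".
print(parse_xml('<br/ >'))
```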
Although the specifications don’t clearly discuss these issues, they are a result of the respective standards’ SGML declarations that define the profile of SGML used. See the HTML SGML declaration and the XML SGML declaration. These declarations are automatically assumed by the browser when it is given a hint to treat the page as HTML or XML (such as via the content type). The SGML declaration defines the most basic level of how the SGML document is written. It defines which characters delimit tags, marked sections, character references, processing instructions, etc., what kinds of shorthand features may be used, possibly some default character entities that are available regardless of the DTD, and other lexical aspects of the document. The SGML declaration is applied on top of a default profile, called the “reference concrete syntax”, that is defined in the SGML standard itself.
The HTML SGML declaration isn’t very big because it mostly uses SGML’s defaults. The defaults include “/” for syntax.delim.net, meaning that null end tag contents are delimited by “/” characters. XML uses “>” for syntax.delim.net, plus “/” for syntax.delim.nestc. Nestc is an extension to the original SGML standard that provides a different value for the null end tag delimiter that finishes the start tag. XML uses other extensions, such as more specific options in the “features” section. HTML enables features.minimize.shorttag, which allows shorthand constructs like null end tags, while XML specifically has features.minimize.shorttag.starttag.netenabl set to “IMMEDNET” which, as mentioned above, enables null end tags with the restriction that they must close immediately after opening.
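The settings described above can be modelled as data to show how the same mechanism produces two different syntaxes. The Python sketch below is a toy, not an SGML declaration parser: the NetProfile class, its field names, and null_end_tag_form are hypothetical, attributes are left out, and only the settings discussed here are captured (the NET and NESTC delimiters, IMMEDNET, and whether end tags may be omitted, which the two declarations control through OMITTAG). Given an element, it returns the null end tag spelling a profile allows, or None when the profile allows none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NetProfile:
    """The few SGML declaration settings that shape null end tags (simplified)."""
    nestc: str      # delimiter that ends a NET-enabling start tag
    net: str        # the null end tag delimiter itself
    immednet: bool  # True: NET must follow NESTC immediately (no content)
    omittag: bool   # True: end tags may be omitted where the DTD allows it


# HTML sets no separate NESTC: its start tag is closed by the NET "/" itself.
HTML = NetProfile(nestc="/", net="/", immednet=False, omittag=True)
XML = NetProfile(nestc="/", net=">", immednet=True, omittag=False)


def null_end_tag_form(name: str, content: str, profile: NetProfile,
                      declared_empty: bool = False) -> Optional[str]:
    """Spell an element with a null end tag, or return None if the profile forbids it."""
    if declared_empty and content:
        return None  # an EMPTY element cannot have content
    if profile.immednet and content:
        return None  # IMMEDNET: the element must close right after it opens
    if profile.net in content:
        return None  # the content would be cut short at the first NET delimiter
    if declared_empty and profile.omittag:
        return f"<{name}{profile.nestc}"  # EMPTY element: no end tag at all
    return f"<{name}{profile.nestc}{content}{profile.net}"


print(null_end_tag_form("title", "This is the title of the page", HTML))
# prints <title/This is the title of the page/
print(null_end_tag_form("img", "", HTML, declared_empty=True))  # prints <img/
print(null_end_tag_form("img", "", XML, declared_empty=True))   # prints <img/>
print(null_end_tag_form("title", "This is the title of the page", XML))  # prints None
```

Keeping the delimiters as data rather than hard-coding them mirrors the point of the declarations: one parsing rule, two concrete syntaxes.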
The reason XML was designed to support this form of null end tags was to reduce the potential clutter caused by a large number of empty elements. The null end tag delimiters were altered so that the null end tags don’t look too alien for people who are used to the widely supported parts of the HTML standard. They were designed to look like regular start tags with a simple slash before the end, which reminds people of the function of end tags. In this way, they managed to design XML to be strict, efficient, intuitive, and compatible with modern SGML user agents that know where to find the SGML declaration.
August 9th, 2006 at 16:32 UTC
Do you know which HTML user agents actually support them? I’ve checked all of these lesser used features of SGML in a range of browsers, but haven’t come across a single one which doesn’t choke horribly on them…
Posted using Mozilla Firefox on Linux.
August 9th, 2006 at 18:24 UTC
Because so many websites are written like XHTML but sent as text/html, no practical web browser can safely support null end tags in HTML without spewing a bunch of greater-than signs over the pages. Only user agents that are more true to the SGML standard support them. Off-hand, I know the W3C HTML validator supports them, as you can see if you set the option to display the parse tree. Some other classic SGML user agents (not necessarily web browsers) also support them. I am currently finishing up writing an SGML / XML parser and syntax highlighter which properly supports null end tags, empty tags, unclosed tags, and some other lesser-known features of SGML. The reason few user agents support them is likely not difficulty of implementation, but rather legacy and real-world compatibility.
Posted using Mozilla Firefox on Linux.
August 10th, 2006 at 05:09 UTC
Cool. Any idea on a release date?
Also, the comment system says you’re running Linux. If you don’t mind me asking, what distro (and DE) are you using? [Until Wine gets better] I’m planning on setting up a dual-boot system, but I can’t seem to decide on a distro & DE…
BTW: Awesome site :)
Posted using Mozilla Firefox 2.0b1 on Windows.
August 10th, 2006 at 14:54 UTC
The parser/syntax highlighter will be an integrated part of the Web Devout rewrite I’m working on. It’ll be used to highlight source in various new articles, and I plan to somehow integrate it into the webpage test system as well. It’s being written in PHP because that’s the primary language used on this site, although output will be cached to improve performance.
I’m currently using Ubuntu Dapper Drake and GNOME. I use Wine to test Internet Explorer 4 through 6, and I run IE7 in a VMware virtual machine with Windows XP (yes, I bought a legitimate copy). These days, Ubuntu is pretty much the leader in desktop Linux, and it’s a really solid distro. KDE isn’t bad as a desktop environment, but it proved way too unstable for me (especially the sound system). GNOME has worked great for me.
Posted using Mozilla Firefox on Linux.
February 20th, 2007 at 05:41 UTC
“Despite common practice, a space before the “/” character wouldn’t change this.”
Actually, the XHTML spec says to include a space. In addition, omitting the space removes compatibility with Netscape 4.
Still, what you said is obviously correct.
Posted using Mozilla Firefox on Windows.
February 20th, 2007 at 14:08 UTC
The XHTML spec only suggests a space to maximize (but not guarantee!) compatibility with older user agents such as Netscape 4. It isn’t required, only strongly encouraged if you’re sending the page as text/html.
Posted using Mozilla Firefox on Linux.