Web Devout tidings


Archive for the 'Web Devout' Category

HTML good practice checker

Sunday, July 1st, 2007

Do you like clean markup? Do you use HTML and still prefer to quote all your attribute values, use lower-case tag names, and generally follow good clean markup practices? Do you wish you could force the HTML Validator to be even more strict so you could quickly identify stray XHTML-style self-closing tags in your HTML and other issues that it usually ignores?

If so, then you may find my new HTML good practice checker useful. It sets up a custom SGML declaration (for markup parsing rules) and DTD (for document structure rules) that instruct the W3C HTML Validator to be stricter with your document.

Here is a partial list of the new rules enforced:

  • All tag and attribute names must be lower-case.
  • All attribute values must be quoted.
  • Declarations are case-sensitive like in XML.
  • SGML Null End Tags (NET) are not allowed. This means that the validator will recognize that a <br /> in an HTML document is a problem.
  • End tags must be used on all non-empty elements. Note: If an end tag is forbidden in normal HTML, it’s still forbidden here.
  • Start tags must be used on all elements.
  • You may not place tr elements directly inside a table; you must wrap them in a tbody. In fact, in terms of document structure, tr was never truly allowed as a child of table in HTML. Rows were normally assumed to be inside a tbody element whose start and end tags had been omitted, so this rule is really just a natural consequence of the two rules above. Note that XHTML behaves differently: there, tr actually is allowed as a child of table, so the good practice rule of an explicit tbody element also improves consistency between HTML and XHTML.
  • Nested tables are not allowed.
  • Unclosed tags and empty tags (obscure and poorly supported SGML shorthand features) are no longer allowed.
  • Attributes may no longer use minimized form (for example, the disabled attribute must be written disabled="disabled").
  • Hexadecimal character references must use a lower-case “x” like in XML.
  • The following presentational elements may not be used: tt, i, b, big, small.
  • The q element may not be used, due to major unresolvable compatibility issues.
  • The width and height attributes are required on img elements.
  • The name attribute has been removed on the a element. You should use id instead.
  • The following attributes were removed from the table element: width, border, frame, rules, cellspacing, cellpadding, datapagesize (a reserved attribute).
  • The following attributes were removed from all other table-related elements: width, align, char, charoff, valign.
  • On the script element, the reserved event and for attributes have been removed.
  • In order to avoid issues when user agents confuse UTF-8 and ISO-8859-1, characters above &#126; (the tilde, ~) may no longer be written directly in the document. Use character references for them instead.
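
The checker enforces these rules through the SGML declaration and DTD, not through a script. Purely to illustrate two of them in isolation, here is a rough Python sketch with hypothetical function names. It flags characters above &#126;, rewrites them as decimal character references, and spots XHTML-style self-closing tags. Being regexp-based, it would also trip on a "/>" inside an attribute value, so treat it as a quick lint rather than a parser.

```python
import re

# Hypothetical helpers that mimic two of the rules above. The real checker
# relies on a custom SGML declaration and DTD, not on code like this.

# Rough pattern for an XHTML-style self-closing tag such as <br />.
# Being a regexp, it can also trip on "/>" inside an attribute value.
NET_TAG = re.compile(r"<[A-Za-z][^<>]*/>")


def find_high_chars(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for each character above U+007E (~)."""
    hits = []
    # Split on "\n" only; splitlines() would treat some non-ASCII characters
    # (such as U+2028) as line breaks and silently skip them.
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 126:
                hits.append((line_no, col, ch))
    return hits


def escape_high_chars(source: str) -> str:
    """Rewrite each character above U+007E as a decimal character reference."""
    return "".join(ch if ord(ch) <= 126 else f"&#{ord(ch)};" for ch in source)


def find_net_tags(source: str) -> list[str]:
    """Return every self-closing tag, which the stricter rules reject in HTML."""
    return NET_TAG.findall(source)
```

For example, escape_high_chars("café") returns "caf&#233;", and find_net_tags("<p>a<br />b</p>") returns ["<br />"].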

I’m always open to feedback. For the most part, the things this system can check are currently limited to rules you can specify in the SGML declaration and DTD. Keep in mind that this system is new and it’s possible that there are bugs. If you come across any, please let me know.

Oops. New Webpage Test system sort of up

Tuesday, May 1st, 2007

Well, I had planned to make this new Webpage Test version a big release with lots of fanfare, but during a minor site-wide update I accidentally put up part of the new system and overwrote the old version. So let’s call this a version 2.5.

Here’s the scoop:

Over the last few months, I’ve been working on a full-featured SGML parser in PHP in whatever spare time I had between work and dealing with a spine/leg problem. Right now, it’s nearly finished aside from a few important bugs and some tweaks here and there. But the SGML parser is not live right now. The point of using a full SGML parser for syntax highlighting rather than a traditional regexp-based highlighter is that I want it to be accurate, and it currently doesn’t handle certain situations correctly.

What’s up right now is the highlighter I planned to use for content that included PHP and other preprocessing languages. This makes use of a regexp-based highlighting framework I designed inspired by Bluefish’s syntax highlighting framework. Unlike the SGML parser, this syntax highlighter does not have knowledge of the document structure and does not check content against a DTD. It uses a hard-coded list of known elements, attributes, and entities, and generally makes an effort to highlight typical HTML markup reasonably well for a regexp-based highlighter. I plan to also use this framework for CSS and ECMAScript syntax highlighting in the future, at least at first.
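
To give a rough idea of what a regexp-based highlighter does, here is a minimal Python sketch. It is not the Bluefish-inspired framework itself. It finds comments, tags, quoted attribute values, and entity references with regular expressions, escapes every piece of the original text, and wraps the matches in code elements. The class names comment, tag, string, and entity are hypothetical. Because it matches patterns rather than parsing, it knows nothing about document structure, which is exactly the limitation described above.

```python
import html
import re

# Token patterns, tried left to right at each position. The class names
# ("comment", "tag", "entity", "string") are hypothetical.
TOKEN = re.compile(
    r"(?P<comment><!--.*?-->)"   # comments first, so they win over tags
    r"|(?P<tag><[^<>]*>)"        # any start tag, end tag, or declaration
    r"|(?P<entity>&#?\w+;)",     # entity and character references
    re.DOTALL,
)
ATTR_VALUE = re.compile(r"\"[^\"]*\"|'[^']*'")  # quoted values inside a tag


def _wrap(cls: str, text: str) -> str:
    """Escape text and wrap it in a code element with a semantic class."""
    return f'<code class="{cls}">{html.escape(text, quote=False)}</code>'


def _highlight_tag(tag: str) -> str:
    """Highlight a whole tag, with its quoted attribute values nested inside."""
    parts, pos = [], 0
    for m in ATTR_VALUE.finditer(tag):
        parts.append(html.escape(tag[pos:m.start()], quote=False))
        parts.append(_wrap("string", m.group()))
        pos = m.end()
    parts.append(html.escape(tag[pos:], quote=False))
    return '<code class="tag">' + "".join(parts) + "</code>"


def highlight(source: str) -> str:
    """Return highlighted HTML; the source text itself is only escaped."""
    parts, pos = [], 0
    for m in TOKEN.finditer(source):
        parts.append(html.escape(source[pos:m.start()], quote=False))
        if m.lastgroup == "tag":
            parts.append(_highlight_tag(m.group()))
        else:
            parts.append(_wrap(m.lastgroup, m.group()))
        pos = m.end()
    parts.append(html.escape(source[pos:], quote=False))
    return "".join(parts)
```

Since the source is only ever escaped, stripping the code elements from the output and unescaping the rest gives back the original text character for character.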

This new version of the Webpage Test system displays the highlighted (X)HTML source when viewing a saved page. The next version will also display the highlighted CSS and ECMAScript source if relevant.

Here are some of the design criteria that went into this new feature:

  • Pure (X)HTML markup should be highlighted as accurately as possible with proper indicators for errors and common mistakes. This would have been delivered well with the SGML parser, but for now you’ll get the cheap imitation.
  • (X)HTML markup containing server-side preprocessing instructions like PHP cannot be assumed to be valid (X)HTML before those instructions are executed, so a full SGML parser is not appropriate for this source. A simpler regexp-based highlighter will be used.
  • Highlighting color schemes should be consistent across languages. Constructs from different languages which serve similar respective purposes should be the same color if possible. By default, strings are red, variables are cyan, “special” constructs are yellow, escaped characters are blue, comments are grey and italic, etc.
  • No highlighting color scheme will please everyone. The content is ambitiously highlighted using code elements and semantic class names, and a single modular stylesheet is used for the styling. This allows people to change the styles through user stylesheets or for Web Devout to easily provide options for highlighting schemes in the future.
  • The highlighted source shouldn’t be altered from the original source other than adding in the highlighting elements. The highlighter doesn’t mess with whitespace, throw in extra attributes, screw with empty elements, or anything of the sort. The characters you see are the characters that were entered.
  • Lines should be numbered, but if possible, selecting the highlighted source and copy/pasting into another application shouldn’t result in any extra characters beyond the original source. An ordered list can’t be used since the different list items would overlap with the highlighting elements, and Firefox and other browsers are known to copy the numbers to the clipboard when copy/pasting. Instead, all modern browsers should get generated content with CSS counters. Since Internet Explorer doesn’t support generated content and CSS counters, it is unfortunately given conditional comments with the numbers as inline data, which means copy/pasting would include the line numbers if you’re using Internet Explorer.
  • A single test case will likely be viewed several times. Because of this, the syntax highlighting is only done once upon submission and is cached.
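
One way to get CSS-counter line numbers without the overlap problem is to put an empty marker element at the start of each line of the already-highlighted output. This is an assumption, not the system’s actual code. Because the markers hold no character data for browsers other than IE, they never have to nest around the highlighting elements. A stylesheet can then draw the numbers as generated content with rules such as pre { counter-reset: line; } and .ln:before { counter-increment: line; content: counter(line); }. The class name ln is hypothetical. Here is a Python sketch:

```python
# Hypothetical marker class "ln". A stylesheet along these lines would then
# number the markers with generated content:
#   pre { counter-reset: line; }
#   .ln:before { counter-increment: line; content: counter(line) " "; }


def add_line_markers(highlighted: str, ie_fallback: bool = True) -> str:
    """Put an empty marker element at the start of every line.

    Apart from the optional IE-only conditional comment, the markers hold no
    text, so they never need to nest around the highlighting elements, even
    when a highlighted comment spans several lines. With ie_fallback, each
    marker carries its number as inline data inside a conditional comment
    that only Internet Explorer reads.
    """
    out = []
    for number, line in enumerate(highlighted.split("\n"), start=1):
        inner = f"<!--[if IE]>{number} <![endif]-->" if ie_fallback else ""
        out.append(f'<span class="ln">{inner}</span>{line}')
    return "\n".join(out)
```

Since this only needs to run once at submission time, its output can simply be stored with the saved test case and served as-is on later views.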

There are several other improvements in this version besides the syntax highlighter:

  • Newly saved test cases now reuse expired IDs in order to minimize the length of the URL.
  • You may now easily load in remote sites by typing something like http://www.webdevout.net/test?http://www.w3.org/. The HTTP headers are displayed and highlighted for clarity.
  • There’s nicer feedback when saving a test case.
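
Here is a minimal Python sketch of the first two items. It rests on assumptions of my own: IDs are non-negative integers written in base 36, each test case’s save time is kept in a dictionary, and an ID becomes reusable once its test case is past the two-day minimum that saved pages are kept. The helper names are hypothetical. The remote-site form simply treats everything after the "?" as the address to fetch.

```python
import string
from urllib.parse import urlsplit

ALPHABET = string.digits + string.ascii_lowercase  # 36 URL-safe symbols
MIN_LIFETIME = 2 * 24 * 60 * 60  # keep saved pages at least two days (seconds)


def encode_id(n: int) -> str:
    """Write a non-negative integer in base 36, so small IDs give short URLs."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def allocate_id(saved_at: dict[int, float], now: float) -> int:
    """Claim the smallest ID that is unused or whose test case has expired."""
    n = 0
    while n in saved_at and now - saved_at[n] < MIN_LIFETIME:
        n += 1
    saved_at[n] = now  # record the new save time (overwrites an expired entry)
    return n


def target_url(request_url: str) -> str:
    """Return the remote address passed as the query, as in test?http://..."""
    return urlsplit(request_url).query
```

Reusing the smallest free number keeps IDs, and therefore URLs, short. The linear scan is fine for a sketch but would want an index of free IDs on a busy server.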

There are still some known bugs in this version. The highlighter doesn’t yet make sure that all highlighting code elements are closed, so the output can be invalid if your markup ends unexpectedly. As mentioned above, this wasn’t supposed to go live yet, but it’s reasonably stable anyway, so I figure it isn’t worth reverting to the old version.

Safari displays 1×1 alphatransparent PNGs too dark

Friday, April 20th, 2007

I finally figured out why Safari was displaying the heading backgrounds on the main Web Devout site too dark: In general, Safari 2.0 seems to screw up the brightness or gamma correction on 1-pixel by 1-pixel alphatransparent PNGs. This is even true for PNGs which don’t have any gamma correction information included. Interestingly, if you change the image size to anything else, the brightness problem goes away. Why does Safari decide to darken 1×1 PNGs? Your guess is as good as mine.

I was using a repeating 1×1 alphatransparent PNG as the background in order to simulate an RGBA value in a CSS 2.x-compatible way. To fix the problem in Safari, I simply changed the image size to 2×1.
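
If you would rather generate such an image than draw it, here is a Python sketch that writes a solid-color RGBA PNG using only zlib and struct. The defaults produce the 2×1 size that avoids the Safari darkening, and no gAMA chunk is written. The function name and the example color are hypothetical.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build a PNG chunk: length, type, data, then CRC-32 of type and data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def solid_rgba_png(r: int, g: int, b: int, a: int,
                   width: int = 2, height: int = 1) -> bytes:
    """Return a PNG filled with one RGBA color (2x1 by default, not 1x1)."""
    # IHDR: width, height, bit depth 8, color type 6 (RGBA),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    # Each scanline starts with filter type 0 (None), followed by its pixels.
    row = b"\x00" + bytes([r, g, b, a]) * width
    idat = zlib.compress(row * height)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat)
            + _chunk(b"IEND", b""))
```

Writing solid_rgba_png(0, 0, 0, 128) to a file such as bg.png and using it as a repeating background gives roughly a half-transparent black overlay, the kind of stand-in for an RGBA value described above.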

I just wanted to point this out in case anyone else runs into it and becomes stumped like I was for a while. The problem seems unique to Safari/WebKit; Konqueror doesn’t seem to have this problem.

Frankly, this is just one of a seemingly endless list of bang-your-head-on-the-desk bugs I regularly find in Safari in quite basic areas. Another one that bothered me for a while was that background images in Safari will repeat if the box is shorter or narrower than the background image, even if you set background-repeat: no-repeat. You’ll notice this if you also use background-position. This just shows that passing something like Acid2 first doesn’t necessarily mean you’re the cream of the crop. Please exterminate these weird bugs.

Webpage Test tool updates

Sunday, March 4th, 2007

The Webpage Test tool has been updated with a new look and a few new features.

The biggest improvement is the ability to temporarily save test pages, similar to services like pastebin. The saved pages will remain on the server for at least two days before being deleted to save space. The saved page’s URL also uses a short identifier rather than encoding the entire source, as the (now removed) “Link” feature did.

You may also specify a base URL outside the HTML. This is useful if you’re trying to offer someone corrected webpage source with URLs relative to the original URL but don’t want to confuse the person by including a base element in the HTML.
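
Relative URLs resolve against a base URL the same way they would against the page’s own address. One way a tool could apply a separately supplied base URL is to add a base element only to the copy of the page it serves, leaving the submitted markup untouched. That approach is an assumption here, not a description of how the Webpage Test tool does it. A Python sketch:

```python
import html
import re
from urllib.parse import urljoin

HEAD_START = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def inject_base(markup: str, base_url: str, xhtml: bool = False) -> str:
    """Add a base element right after the head start tag of the served copy."""
    close = " />" if xhtml else ">"
    tag = f'<base href="{html.escape(base_url)}"{close}'
    match = HEAD_START.search(markup)
    if match is None:
        # HTML lets the head start tag be omitted; this sketch leaves such
        # documents alone rather than guess where the head begins.
        return markup
    return markup[: match.end()] + tag + markup[match.end():]


if __name__ == "__main__":
    # A relative URL in the page resolves against the base URL like this:
    print(urljoin("http://example.com/dir/page.html", "img/logo.png"))
```

With the base URL http://example.com/dir/page.html, a relative img/logo.png resolves to http://example.com/dir/img/logo.png.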

Basic HTML templates for HTML 4.01 Strict, XHTML 1.0, and XHTML 1.1 are available via links at the top of the page. Note that the XHTML templates come with the correct content-type (application/xhtml+xml) so browsers handle it like real XHTML. Because Internet Explorer doesn’t support true XHTML, IE will give you the usual download dialog instead of the webpage.

Finally, the system has been updated with a snazzier look. The new look is supported by Firefox, Opera, Safari, Konqueror, and other modern browsers. Internet Explorer currently falls back to a simpler look.

I have recently resumed work on a PHP-based SGML parser and syntax highlighter I’ve been developing, which I will try to eventually incorporate into this system. It aims to support much more of the SGML standard than the common alternatives and also indicates some common errors, such as invalidly placed elements and unrecognized character entities, elements, and attributes. It will not, however, attempt to be a complete validator.

Validity and well-formedness

Tuesday, February 20th, 2007

I’ve just published a new web development article called Validity and Well-Formedness, which explains the distinctions between valid and well-formed XHTML.

If the W3C HTML Validator says your XHTML page is valid, that means it’s also well-formed, right? Wrong! This article has several examples of XHTML documents which are perfectly valid but are malformed and won’t even load in an XML parser.
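
You can check well-formedness on its own, without any DTD, by handing the document to an XML parser. Here is a minimal Python sketch using the standard library’s ElementTree (the function name is hypothetical). Per the article, a document that passes the validator can still fail a check like this one, so the two tests answer different questions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """True if an XML parser accepts the document; no DTD validation happens."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True
```

A bare ampersand or improperly nested tags, for example, make the parser reject the document outright.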