Web Devout tidings


Archive for the 'Web Devout' Category

No, I haven’t been silenced

Sunday, October 28th, 2007

I haven’t really added anything to the Web Devout site in a while. In fact, I’ve even fallen behind on the security summary updates for all three browsers. A lot of it is due to my very busy work week, trying to fulfill the “guy who solves all our problems” role for several different projects. But I have been committing my weekends to Web Devout.

So why haven’t there been any changes on the site? Well, changes are happening, but they’re happening somewhere else. A couple of weeks ago, I was given a powerful new server, all to myself, with tons of bandwidth to spare. Because this is the first time I’ve had full administrative control over the server, and I can finally use recent versions of PHP and whatnot, I’ve decided it’s time to give several sections of the site some much-needed backend love (nmiaow). And while I was at it, I went ahead and started rewriting the entire site from scratch, based around a new modular homegrown infrastructure that will make future development a lot cleaner. I’m designing it with the idea that I’ll eventually open source it for other people to use. This rewrite will also give me a chance to implement an OpenID-based user account system that works seamlessly across all services on this site, and I hope to have an early working version of my long-planned public bug tracking system ready by the time the new site goes public.

For those of you just tuning in, I want to make a single public bug reporting and tracking system for all versions of all web browsers, with focus on simplicity and ease of use (as opposed to, well, Bugzilla). Well-confirmed bugs will be automatically put into a table structure similar to the current web browser standards support resource. I hope for it to be entirely user-driven, with little to no editorial oversight needed. We’ll see how well that goes. ;)

I’m still a long way from having something to show you, but hopefully it will be worth it.

Validate XHTML parsed as HTML

Sunday, August 12th, 2007

When you send XHTML to a browser using the common text/html content type, all major browsers will respond by using their regular HTML parsers on your page, regardless of the doctype. For some reason, the W3C HTML Validator doesn’t follow this widely-accepted convention. Instead, if you’re using an XHTML doctype, the W3C Validator will use an XML parser on your page. Obviously, that will give different results than what your browser is seeing. And unfortunately, there was no easy way to force the W3C Validator to parse your page with an HTML parser like everyone else did.

That is, until now. I’ve just released the new Validate XHTML Parsed as HTML tool. It works very much like the HTML Good Practice Checker: you submit the URL you want to test, it makes a few minimal changes to the beginning of your markup in order to modify how the W3C Validator sees your code, you click the button to validate it, and the results appear below.
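To make the mismatch concrete, here is a minimal Python sketch of the two decision rules. It is a simplified model, not real browser or validator source code: `choose_parser`, `validator_parser` and the set of XML media types are assumptions for illustration. Browsers key on the Content-Type header, while the validator, as described above, keyed on the doctype.

```python
# Simplified model of parser-mode selection. The function names and the
# media type set are illustrative assumptions, not code from any real product.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

XHTML_STRICT = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def choose_parser(content_type: str, doctype: str) -> str:
    """Browser convention: the media type decides; the doctype is ignored."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    return "xml" if mime in XML_MIME_TYPES else "html"


def validator_parser(content_type: str, doctype: str) -> str:
    """Validator behaviour described in the post: an XHTML doctype means XML."""
    return "xml" if "XHTML" in doctype else "html"


# The same page served as text/html gets two different treatments.
print(choose_parser("text/html; charset=utf-8", XHTML_STRICT))     # html
print(validator_parser("text/html; charset=utf-8", XHTML_STRICT))  # xml
```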

The purpose of this tool is to illustrate that the compatibility issues between XHTML and HTML are not as simple as whether or not you follow the HTML Compatibility Guidelines. A fully compliant HTML parser following the widely accepted conventions for parsing mode selection would encounter all of the reported errors when parsing your page. Popular web browsers don’t support the Null End Tag construct, so they would see things slightly differently, but they would still see an error at each instance of /> on the page. I thought one of the selling points of XHTML was that it was supposed to put an end to lax error handling. I guess not.
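In an SGML-based HTML parser, `<br />` is read as a NET-enabling start tag `<br /` followed by a literal `>` of character data, so the `/>` never means what the author intended. The rough Python sketch below just lists where those constructs sit in a page; `find_net_constructs` is a hypothetical helper, and because it is regexp-based it does not skip comments, `script` contents, or attribute values that happen to contain `/>`.

```python
import re

# Every "/>" in the markup. Deliberately rough: comments, <script> contents
# and attribute values that happen to contain "/>" are not skipped.
NET_PATTERN = re.compile(r"/>")


def find_net_constructs(markup: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) positions of each "/>" in the markup."""
    hits = []
    for line_no, line in enumerate(markup.splitlines(), start=1):
        for match in NET_PATTERN.finditer(line):
            hits.append((line_no, match.start() + 1))
    return hits


page = '<p>Hello<br />\n<img src="a.png" alt="" /></p>'
print(find_net_constructs(page))  # [(1, 13), (2, 25)]
```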

What’s up

Friday, August 10th, 2007

I apologize for the lack of updates here lately. I’ve been swamped at work with projects of biblical proportions (you know, getting colleges to actually talk to each other), all due at the end of August or September. Hopefully after that I can get back full-swing into some of the stuff I’m developing for Web Devout.

In the meantime, you can read some of my latest posts at Tech Center Current or my random non-serious blog Nanobits.

HTML good practice checker

Sunday, July 1st, 2007

Do you like clean markup? Do you use HTML and still prefer to quote all your attribute values, use lower-case tag names, and generally follow good clean markup practices? Do you wish you could force the HTML Validator to be even more strict so you could quickly identify stray XHTML-style self-closing tags in your HTML and other issues that it usually ignores?

If so, then you may find my new HTML good practice checker useful. It sets up a custom SGML declaration (for markup parsing rules) and DTD (for document structure rules) which instruct the W3C HTML Validator to be more strict with your document.

Here is a partial list of the new rules enforced:

  • All tag and attribute names must be lower-case.
  • All attribute values must be quoted.
  • Declarations are case-sensitive like in XML.
  • SGML Null End Tags (NET) are not allowed. This means that the validator will recognize that a <br /> in an HTML document is a problem.
  • End tags must be used on all non-empty elements. Note: If an end tag is forbidden in normal HTML, it’s still forbidden here.
  • Start tags must be used on all elements.
  • You may not write <tr> tags directly inside a table; you must put them in a tbody. In fact, in terms of document structure, tr was never truly allowed as a child of table in HTML: rows were always treated as being inside a tbody element whose start and end tags had been omitted. So this rule is actually just a natural consequence of the two rules above. Note that XHTML differs here, since tr actually is allowed as a child of table, so the good practice rule of an explicit tbody element improves consistency between HTML and XHTML.
  • Nested tables are not allowed.
  • Unclosed tags and empty tags (obscure and poorly-supported SGML shorthand rules) are no longer allowed.
  • Attributes may no longer use minimized form (for example, the disabled attribute must be written disabled="disabled").
  • Hexadecimal character references must use a lower-case “x” like in XML.
  • The following presentational elements may not be used: tt, i, b, big, small.
  • The q element may not be used, due to major unresolvable compatibility issues.
  • The width and height attributes are required on img elements.
  • The name attribute has been removed on the a element. You should use id instead.
  • The following attributes were removed from the table element: width, border, frame, rules, cellspacing, cellpadding, datapagesize (a reserved attribute).
  • The following attributes were removed from all other table-related elements: width, align, char, charoff, valign.
  • On the script element, the reserved event and for attributes have been removed.
  • In order to avoid issues when user agents confuse UTF-8 and ISO-8859-1, characters above &#126; are no longer allowed to be written directly in the document. You should use character references for them.
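The checker itself gets the W3C Validator to enforce these rules through its custom SGML declaration and DTD. For a quick local approximation of a handful of them (lower-case names, quoted values, no `/>`, no minimized attributes, banned elements, and no raw characters above &#126;), here is a rough regexp-based Python sketch. The `check` function and its messages are hypothetical; unlike a real SGML parser it knows nothing about document structure, so treat it as a lint pass rather than a validator.

```python
import re

# Rough regexp-based approximation of a few of the rules above. It is NOT the
# SGML declaration/DTD approach and it will miss cases (for example, ">"
# inside a quoted attribute value confuses the tag pattern).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")
ATTR = re.compile(r"""([A-Za-z][-A-Za-z0-9_:.]*)(\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?""")
BANNED = {"tt", "i", "b", "big", "small", "q"}


def check(markup: str) -> list[str]:
    """Return a list of human-readable rule violations found in the markup."""
    problems = []
    for tag in TAG.finditer(markup):
        closing, name, rest = tag.groups()
        if name != name.lower():
            problems.append(f"tag name not lower-case: {name}")
        if not closing and name.lower() in BANNED:
            problems.append(f"element not allowed: {name.lower()}")
        if rest.rstrip().endswith("/"):
            problems.append(f"null end tag (/>) on: {name.lower()}")
            rest = rest.rstrip()[:-1]  # drop the "/" before scanning attributes
        for attr in ATTR.finditer(rest):
            attr_name, assignment, value = attr.groups()
            if attr_name != attr_name.lower():
                problems.append(f"attribute name not lower-case: {attr_name}")
            if assignment is None:
                problems.append(f"minimized attribute: {attr_name}")
            elif value[0] not in "\"'":
                problems.append(f"unquoted attribute value: {attr_name}")
    # Characters above &#126; must be written as character references.
    for ch in markup:
        if ord(ch) > 126:
            problems.append(f"raw character above &#126;: U+{ord(ch):04X}")
    return problems
```

For example, `check('<p class="x">ok<br></p>')` returns an empty list, while `<br />`, `<input disabled>` or an unquoted `class=x` each produce a message.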

I’m always open to feedback. For the most part, the things this system can check are currently limited to rules you can specify in the SGML declaration and DTD. Keep in mind that this system is new and it’s possible that there are bugs. If you come across any, please let me know.

Oops. New Webpage Test system sort of up

Tuesday, May 1st, 2007

Well, I had planned to make this new Webpage Test version a big release with lots of fanfare, but during a minor site-wide update I accidentally put up part of the new system and overwrote the old version. So let’s call this a version 2.5.

Here’s the scoop:

Over the last few months, I’ve been working on a full-featured SGML parser in PHP in whatever spare time I had between work and dealing with a spine/leg problem. Right now, it’s nearly finished aside from a few important bugs and some tweaks here and there, but it isn’t live yet. The whole point of using a full SGML parser for syntax highlighting, rather than a traditional regexp-based highlighter, is accuracy, and the parser still doesn’t handle certain situations correctly.

What’s up right now is the highlighter I planned to use for content that included PHP and other preprocessing languages. This makes use of a regexp-based highlighting framework I designed inspired by Bluefish’s syntax highlighting framework. Unlike the SGML parser, this syntax highlighter does not have knowledge of the document structure and does not check content against a DTD. It uses a hard-coded list of known elements, attributes, and entities, and generally makes an effort to highlight typical HTML markup reasonably well for a regexp-based highlighter. I plan to also use this framework for CSS and ECMAScript syntax highlighting in the future, at least at first.
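As a rough illustration of the regexp approach (not the Bluefish-inspired framework itself), here is a minimal Python sketch. The `highlight` function, the short `KNOWN_ELEMENTS` list and the class names are all assumptions for illustration. It wraps tokens in `code` elements with semantic class names, flags elements missing from the hard-coded list, and only escapes everything else, so stripping the `code` tags and unescaping gives back the original source.

```python
import html
import re

# Hypothetical hard-coded list of known elements (the real one would be longer).
KNOWN_ELEMENTS = {"html", "head", "title", "body", "p", "a", "img", "br", "div", "span"}

# One combined pattern; each named group doubles as a semantic class name.
TOKEN = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<tag></?[A-Za-z][A-Za-z0-9]*)"
    r"|(?P<string>(?<==)(?:\"[^\"]*\"|'[^']*'))"  # quoted value right after "="
    r"|(?P<entity>&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)",
    re.DOTALL,
)


def highlight(source: str) -> str:
    """Wrap recognised tokens in <code class="..."> without altering any characters."""
    out = []
    pos = 0
    for match in TOKEN.finditer(source):
        # Text between tokens is only escaped, never rewritten.
        out.append(html.escape(source[pos:match.start()], quote=False))
        kind = match.lastgroup
        text = match.group()
        if kind == "tag" and text.lstrip("</").lower() not in KNOWN_ELEMENTS:
            kind = "unknown-tag"  # flag elements missing from the hard-coded list
        out.append(f'<code class="{kind}">{html.escape(text, quote=False)}</code>')
        pos = match.end()
    out.append(html.escape(source[pos:], quote=False))
    return "".join(out)
```

Like any highlighter without knowledge of document structure, it will happily mark up things a real SGML parser would reject; the lookbehind on `=` is just a cheap way to avoid treating apostrophes in prose as strings.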

This new version of the Webpage Test system displays the highlighted (X)HTML source when viewing a saved page. The next version will also display the highlighted CSS and ECMAScript source if relevant.

Here are some of the design criteria that went into this new feature:

  • Pure (X)HTML markup should be highlighted as accurately as possible with proper indicators for errors and common mistakes. This would have been delivered well with the SGML parser, but for now you’ll get the cheap imitation.
  • (X)HTML markup containing server-side preprocessing instructions like PHP cannot be assumed to be valid (X)HTML before those instructions are executed, so a full SGML parser is not appropriate for this source. A simpler regexp-based highlighter will be used.
  • Highlighting color schemes should be consistent across languages. Constructs from different languages which serve similar respective purposes should be the same color if possible. By default, strings are red, variables are cyan, “special” constructs are yellow, escaped characters are blue, comments are grey and italic, etc.
  • No highlighting color scheme will please everyone. The content is ambitiously highlighted using code elements and semantic class names, and a single modular stylesheet is used for the styling. This allows people to change the styles through user stylesheets or for Web Devout to easily provide options for highlighting schemes in the future.
  • The highlighted source shouldn’t be altered from the original source other than adding in the highlighting elements. The highlighter doesn’t mess with whitespace, throw in extra attributes, screw with empty elements, or anything of the sort. The characters you see are the characters that were inputted.
  • Lines should be numbered, but if possible, selecting the highlighted source and copy/pasting into another application shouldn’t result in any extra characters beyond the original source. An ordered list can’t be used since the different list items would overlap with the highlighting elements, and Firefox and other browsers are known to copy the numbers to the clipboard when copy/pasting. Instead, all modern browsers should get generated content with CSS counters. Since Internet Explorer doesn’t support generated content and CSS counters, it is unfortunately given conditional comments with the numbers as inline data, which means copy/pasting would include the line numbers if you’re using Internet Explorer.
  • A single test case will likely be viewed several times. Because of this, the syntax highlighting is only done once upon submission and is cached.
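One way to meet the line-numbering criterion, sketched here under assumptions rather than taken from the actual system, is to put an empty marker element at the start of each line instead of wrapping lines in list items. Because the marker encloses nothing, it never overlaps the highlighting elements. Other browsers number the markers with a CSS counter as generated content, and old Internet Explorer gets the number as inline text inside a conditional comment. The `add_line_markers` name, the `ln` and `source` class names, and the stylesheet are hypothetical; the sketch uses `lt IE 8` because IE 8 added generated content and counters, and in 2007 that condition covered every IE version.

```python
# Assumed stylesheet: every marker bumps a counter and prints it as generated
# content, which browsers leave out when the text is copied.
LINE_NUMBER_CSS = """
pre.source { counter-reset: line; }
pre.source .ln:before { counter-increment: line; content: counter(line) " "; }
"""


def add_line_markers(highlighted: str) -> str:
    """Prefix each line of already-highlighted source with an empty marker.

    The marker is empty, so it never has to enclose (and overlap) the
    highlighting elements the way an <li> per line would. Old Internet
    Explorer gets the number as inline text inside a conditional comment.
    """
    trailing_newline = highlighted.endswith("\n")
    lines = highlighted.split("\n")
    if trailing_newline:
        lines.pop()  # don't number the empty "line" after the final newline
    numbered = "\n".join(
        f'<span class="ln"><!--[if lt IE 8]>{n} <![endif]--></span>{line}'
        for n, line in enumerate(lines, start=1)
    )
    return numbered + ("\n" if trailing_newline else "")
```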

There are several other improvements in this version besides the syntax highlighter:

  • Newly saved test cases now reuse expired IDs in order to minimize the length of the URL.
  • You may now easily load in remote sites by typing something like http://www.webdevout.net/test?http://www.w3.org/. The HTTP headers are displayed and highlighted for clarity.
  • There’s nicer feedback when saving a test case.
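Reusing expired IDs to keep URLs short can be done by always handing out the smallest free number. The Python sketch below assumes numeric IDs rendered in base 36 and some external process that decides when a test case has expired; `IdAllocator` and `to_base36` are hypothetical names, not the site’s actual code.

```python
import heapq

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Render a non-negative integer with digits 0-9 and a-z."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


class IdAllocator:
    """Hand out the smallest free numeric ID so URLs stay short."""

    def __init__(self) -> None:
        self._next = 0                # next never-used ID
        self._free: list[int] = []    # min-heap of IDs released by expired cases

    def allocate(self) -> str:
        # Prefer the smallest released ID; otherwise take a fresh one.
        if self._free:
            n = heapq.heappop(self._free)
        else:
            n = self._next
            self._next += 1
        return to_base36(n)

    def expire(self, case_id: str) -> None:
        # Make an expired case's ID available for reuse.
        heapq.heappush(self._free, int(case_id, 36))
```

A min-heap keeps the choice of the shortest available ID cheap even when many cases expire at once.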

There are still some known bugs in this version. The highlighter doesn’t yet make sure that all highlighting code elements are closed, so it’s possible for the output to be invalid if your markup ends unexpectedly. As mentioned above, this wasn’t supposed to go live yet, but it’s reasonably stable anyway, so I figure it isn’t worth rolling back to the old version.
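A straightforward way to close that gap, sketched here as an assumption rather than the planned fix, is to count opening and closing highlighting elements and append whatever closing tags are missing. Assuming the highlighter escapes every literal `<` from the source (as any HTML-producing highlighter must), the only tags in its output are its own, so counting `code` tags with a regexp is safe. `close_open_code_elements` is a hypothetical name.

```python
import re

# Matches an opening <code ...> or a closing </code> tag.
CODE_TAG = re.compile(r"<(/?)code\b[^>]*>")


def close_open_code_elements(highlighted: str) -> str:
    """Append a </code> for every highlighting element still open at the end."""
    depth = 0
    for tag in CODE_TAG.finditer(highlighted):
        if tag.group(1):
            # A stray closer is not repaired here; it just can't push depth below 0.
            depth = max(depth - 1, 0)
        else:
            depth += 1
    return highlighted + "</code>" * depth
```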