Web Devout tidings

Archive for May 1st, 2007

Oops. New Webpage Test system sort of up

Tuesday, May 1st, 2007

Well, I had planned to make this new Webpage Test version a big release with lots of fanfare, but during a minor site-wide update I accidentally put up part of the new system and overwrote the old version. So let’s call this version 2.5.

Here’s the scoop:

Over the last few months, I’ve been working on a full-featured SGML parser in PHP in whatever spare time I had between work and dealing with a spine/leg problem. It’s nearly finished, aside from a few important bugs and some tweaks here and there, but it isn’t live yet. The point of using a full SGML parser for syntax highlighting rather than a traditional regexp-based highlighter is accuracy, and the parser still mishandles certain situations.

What’s up right now is the highlighter I planned to use for content that includes PHP and other preprocessing languages. It’s built on a regexp-based highlighting framework that I designed, inspired by Bluefish’s syntax highlighting framework. Unlike the SGML parser, this highlighter has no knowledge of the document structure and does not check content against a DTD. It uses a hard-coded list of known elements, attributes, and entities, and generally makes an effort to highlight typical HTML markup reasonably well for a regexp-based highlighter. I plan to use this framework for CSS and ECMAScript syntax highlighting as well, at least initially.
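The following minimal sketch makes the regexp-based approach concrete. It is written in Python, not the PHP the site uses, and it is not Web Devout's code. It assumes tiny hypothetical lists of known elements, attributes, and entities, and hypothetical class names such as element, attribute, string, entity, and comment. Unknown names are not rejected. They get an extra "unknown" class instead, because the highlighter has no DTD to check against. Every source character is escaped and kept, so the highlighted text reads back exactly as it was submitted.

```python
import html
import re

# Hypothetical, deliberately tiny hard-coded lists; a real highlighter would
# carry the full element/attribute/entity sets for HTML 4.01 and XHTML 1.0.
KNOWN_ELEMENTS = {"html", "head", "title", "body", "p", "a", "div", "span", "br", "img"}
KNOWN_ATTRIBUTES = {"href", "src", "alt", "class", "id", "title", "style"}
KNOWN_ENTITIES = {"amp", "lt", "gt", "quot", "nbsp", "copy"}

# Top-level tokens, tried left to right at each position in the source.
TOKEN_RE = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<tag></?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>)"
    r"|(?P<entity>&#?[A-Za-z0-9]+;)",
    re.S,
)
NAME_RE = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")
TAG_PART_RE = re.compile(
    r"(?P<attr>[A-Za-z_:][-A-Za-z0-9_:.]*)(?=\s*=)"
    r"""|(?P<string>"[^"]*"|'[^']*')"""
)

def esc(text: str) -> str:
    return html.escape(text, quote=False)

def wrap(cls: str, text: str) -> str:
    # Semantic (hypothetical) class names; colors belong in a separate stylesheet.
    return f'<code class="{cls}">{esc(text)}</code>'

def highlight_tag(tag: str) -> str:
    m = NAME_RE.match(tag)  # always matches: TOKEN_RE guarantees the tag shape
    name = m.group(2)
    known = name.lower() in KNOWN_ELEMENTS
    out = [esc(m.group(1)), wrap("element" if known else "element unknown", name)]
    rest, pos = tag[m.end():], 0
    for part in TAG_PART_RE.finditer(rest):
        out.append(esc(rest[pos:part.start()]))  # whitespace, '=', etc. kept as-is
        if part.lastgroup == "attr":
            known = part.group().lower() in KNOWN_ATTRIBUTES
            out.append(wrap("attribute" if known else "attribute unknown", part.group()))
        else:
            out.append(wrap("string", part.group()))
        pos = part.end()
    out.append(esc(rest[pos:]))
    return "".join(out)

def highlight(source: str) -> str:
    out, pos = [], 0
    for m in TOKEN_RE.finditer(source):
        out.append(esc(source[pos:m.start()]))  # plain text between tokens
        text = m.group()
        if m.lastgroup == "comment":
            out.append(wrap("comment", text))
        elif m.lastgroup == "tag":
            out.append(highlight_tag(text))
        else:
            name = text[1:-1]
            known = name.startswith("#") or name in KNOWN_ENTITIES
            out.append(wrap("entity" if known else "entity unknown", text))
        pos = m.end()
    out.append(esc(source[pos:]))
    return "".join(out)
```

Like any regexp-based highlighter, this one is easy to fool. A ">" inside a quoted attribute value ends the tag match early, and nothing checks nesting or content models.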

This new version of the Webpage Test system displays the highlighted (X)HTML source when viewing a saved page. The next version will also display the highlighted CSS and ECMAScript source if relevant.

Here are some of the design criteria that went into this new feature:

  • Pure (X)HTML markup should be highlighted as accurately as possible, with proper indicators for errors and common mistakes. The SGML parser will deliver this well, but for now you’ll get the cheap imitation.
  • (X)HTML markup containing server-side preprocessing instructions like PHP cannot be assumed to be valid (X)HTML before those instructions are executed, so a full SGML parser is not appropriate for this source. A simpler regexp-based highlighter will be used.
  • Highlighting color schemes should be consistent across languages. Constructs from different languages which serve similar respective purposes should be the same color if possible. By default, strings are red, variables are cyan, “special” constructs are yellow, escaped characters are blue, comments are grey and italic, etc.
  • No highlighting color scheme will please everyone. The content is ambitiously highlighted using code elements and semantic class names, and a single modular stylesheet is used for the styling. This allows people to change the styles through user stylesheets or for Web Devout to easily provide options for highlighting schemes in the future.
  • The highlighted source shouldn’t be altered from the original source other than adding in the highlighting elements. The highlighter doesn’t mess with whitespace, throw in extra attributes, screw with empty elements, or anything of the sort. The characters you see are the characters that were submitted.
  • Lines should be numbered, but if possible, selecting the highlighted source and copy/pasting it into another application shouldn’t produce any characters beyond the original source. An ordered list can’t be used, since highlighting elements that span several lines (a multi-line comment, for example) would overlap the list items, and Firefox and other browsers are known to copy the list numbers to the clipboard. Instead, all modern browsers get generated content with CSS counters. Since Internet Explorer doesn’t support generated content or CSS counters, it is unfortunately given the numbers as inline text inside conditional comments, which means copy/pasting will include the line numbers if you’re using Internet Explorer.
  • A single test case will likely be viewed several times. Because of this, the syntax highlighting is only done once upon submission and is cached.
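The line-numbering requirement can be sketched as follows, again in Python. The sketch assumes the highlighted markup contains only well-nested highlighting tags, with all text already escaped. It also assumes each line gets a hypothetical span class="line" wrapper. When a highlighting element spans a line break, it is closed before the break and reopened after it. That way each line's wrapper nests properly and no characters are added to the text. The numbers come from CSS counters, shown here as a stylesheet string that is likewise an assumption.

```python
import re

# Splits highlighted markup into tags and text runs. Assumes text runs are
# already HTML-escaped, so every literal '<' begins a tag.
TAG_SPLIT_RE = re.compile(r"(<[^>]+>)")
NAME_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")

LINE_START = '<span class="line">'  # hypothetical wrapper class
LINE_END = "</span>"

# Hypothetical stylesheet: the numbers are generated content, not document
# text, so most browsers leave them out when copying.
LINE_NUMBER_CSS = """
pre.source { counter-reset: line; }
pre.source .line { counter-increment: line; }
pre.source .line:before { content: counter(line) ". "; }
"""

def close_all(open_tags):
    # End tags for every open element, innermost first.
    return "".join(f"</{NAME_RE.match(t).group(1)}>" for t in reversed(open_tags))

def number_lines(highlighted: str) -> str:
    open_tags = []  # start tags currently open, outermost first
    out = [LINE_START]
    for piece in TAG_SPLIT_RE.split(highlighted):
        if not piece:
            continue
        if piece.startswith("</"):
            if open_tags:
                open_tags.pop()
            out.append(piece)
        elif piece.startswith("<"):
            open_tags.append(piece)
            out.append(piece)
        else:
            for i, text in enumerate(piece.split("\n")):
                if i > 0:
                    # At a line break: close what is open, keep the newline itself
                    # between the wrappers, then reopen the same elements.
                    out.append(close_all(open_tags) + LINE_END + "\n"
                               + LINE_START + "".join(open_tags))
                out.append(text)
    out.append(close_all(open_tags) + LINE_END)
    return "".join(out)
```

Because the newline stays between the wrappers rather than being replaced, the text a browser copies is the original source.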
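The minimal sketch below shows one way to highlight once at submission and cache the result. The in-memory dictionary and the SavedCaseStore name are hypothetical stand-ins for whatever persistent storage the site actually uses. The highlighter is passed in as a function.

```python
from typing import Callable, Dict, Tuple

class SavedCaseStore:
    """Hypothetical in-memory store: highlight once on save, reuse on every view."""

    def __init__(self, highlight: Callable[[str], str]) -> None:
        self._highlight = highlight
        # case id -> (raw source, highlighted HTML)
        self._cases: Dict[str, Tuple[str, str]] = {}

    def save(self, case_id: str, source: str) -> None:
        # The expensive highlighting step runs exactly once per submission.
        self._cases[case_id] = (source, self._highlight(source))

    def source(self, case_id: str) -> str:
        return self._cases[case_id][0]

    def view(self, case_id: str) -> str:
        # Views never re-run the highlighter; they read the cached result.
        return self._cases[case_id][1]
```

A persistent version would write both strings to disk or a database at save time. The point is only that views read stored output instead of re-running the highlighter.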

There are several other improvements in this version besides the syntax highlighter:

  • Newly saved test cases now reuse expired IDs in order to minimize the length of the URL.
  • You can now easily load remote sites by typing something like http://www.webdevout.net/test?http://www.w3.org/. The HTTP headers are displayed and highlighted for clarity.
  • There’s nicer feedback when saving a test case.
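Reusing expired IDs while keeping URLs short could work as sketched below. The sketch assumes numeric IDs rendered in base 36 and a min-heap of expired numbers, so the shortest available ID is always handed out first. Both choices are assumptions, not a description of how Web Devout actually stores test cases.

```python
import heapq

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n: int) -> str:
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

class IdAllocator:
    """Hypothetical allocator: smallest expired ID first, else a fresh one."""

    def __init__(self) -> None:
        self._next = 0              # next never-used number
        self._free: list[int] = []  # min-heap of expired numbers

    def allocate(self) -> str:
        if self._free:
            n = heapq.heappop(self._free)
        else:
            n = self._next
            self._next += 1
        return to_base36(n)

    def expire(self, case_id: str) -> None:
        # Assumes each ID is expired at most once.
        heapq.heappush(self._free, int(case_id, 36))
```

Taking the smallest freed number first matters because short IDs are the scarce ones. Once ID "2" is free, it should be reused before the allocator moves on to two-character IDs.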
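The sketch below shows how loading a remote page through the query string, and highlighting its response headers, might work. Several details are assumptions made for illustration: that everything after "?" is the target URL, and the class names status, header-name, and header-value. The network fetch itself is left out.

```python
import html
import re
from typing import Optional
from urllib.parse import urlsplit

HEADER_RE = re.compile(r"([^:\s]+):(.*)")

def remote_target(request_url: str) -> Optional[str]:
    # Assumption: the whole query string is the address to fetch.
    query = urlsplit(request_url).query
    return query if query.startswith(("http://", "https://")) else None

def wrap(cls: str, text: str) -> str:
    return f'<code class="{cls}">{html.escape(text, quote=False)}</code>'

def highlight_headers(raw: str) -> str:
    out = []
    for i, line in enumerate(raw.splitlines(keepends=True)):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original CRLF/LF untouched
        m = HEADER_RE.match(body)
        if i == 0 and body.startswith("HTTP/"):
            out.append(wrap("status", body))
        elif m:
            out.append(wrap("header-name", m.group(1)) + ":" + wrap("header-value", m.group(2)))
        else:
            out.append(html.escape(body, quote=False))
        out.append(ending)
    return "".join(out)
```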

There are still some known bugs in this version. The highlighter doesn’t yet make sure that all highlighting code elements are closed, so the output can be invalid if your markup ends unexpectedly. As mentioned above, this wasn’t supposed to go live yet, but it’s reasonably stable anyway, so I figure it isn’t worth reverting to the old version.
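A minimal fix for the unclosed-element bug is to track open elements on a stack and append the missing end tags once the input runs out. This sketch assumes the highlighter emits properly nested tags and escapes all text. It simply ignores a stray end tag that doesn't match the innermost open element.

```python
import re

# Matches start and end tags; assumes text content is escaped, so '<' only begins tags.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def close_unclosed(markup: str) -> str:
    stack = []  # names of currently open elements, outermost first
    for m in TAG_RE.finditer(markup):
        closing, name = m.group(1), m.group(2)
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
        else:
            stack.append(name)
    # Close whatever is still open, innermost first.
    return markup + "".join(f"</{name}>" for name in reversed(stack))
```

For example, an unterminated comment at the end of a test case would leave a comment code element open. Under these assumptions, the sketch appends the one missing end tag.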