Escaping style and script data

When including style or script information in a webpage, it's usually best to refer to external files rather than include the data in the HTML file itself. If you nevertheless prefer inline style or script data, this article describes ways to prevent potential parsing problems.

Hiding from unsupporting browsers

This issue rarely matters today, but very old browsers have a problem when they come across a style or script element with inline content. Browsers that don't support these elements display the contents on the webpage itself. Although such browsers have negligible usage these days, the problem is easy to avoid, and HTML and CSS even include features made specifically to help.

The basic idea behind preventing old browsers from displaying this data is to make them believe that the data is inside a comment. CSS deliberately allows you to put meaningless <!-- and --> tokens anywhere outside at-rules and rulesets, and the tokens are simply ignored. This means you can enclose your entire inline style data in what looks like a regular HTML comment, but browsers that support CSS will still handle the style information normally. Here is an example:

<style type="text/css">
	<!--
	body
	{
		background: #eee;
		color: #000;
	}
	-->
</style>

With the script element, it's a little different. ECMAScript (including JavaScript and JScript) allows a <!-- token at the beginning of the script to be ignored, along with any additional characters up to the next newline. However, a --> token is not ignored in the same way and, in engines without special handling for it, raises a syntax error. The solution is to use a single-line ECMAScript comment (//) to hide that token from browsers that support ECMAScript. Here is the result:

<script type="text/javascript">
	<!--
	function foo ()
	{
		bar();
	}
	//-->
</script>

Parsing differences between HTML and XHTML

The style and script elements are defined slightly differently in HTML and XHTML. In HTML, their contents are defined as CDATA, meaning everything from the start of the contents to the next occurrence of the closing token (</ in this case) is considered character data that isn't parsed as markup. In XHTML, however, their contents are defined as PCDATA, meaning the contents are parsed as markup: comments are removed, ampersands (&) are treated as the beginning of character references, marked sections are recognized, and so on. If you wish to include < and & characters in your script or stylesheet, you will run into problems in XHTML.
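Without some way to mark the contents as character data, the only option under XML parsing is to write these characters as character references. The following is a minimal sketch of that escaping. The function name escapeForXhtml is made up for illustration, and the approach is only safe when the page really is parsed as XML, because an HTML parser hands the references to the script or style engine undecoded.

```javascript
// Hypothetical helper, not part of any library: escapes the two characters
// that an XML parser would otherwise treat as markup inside PCDATA.
function escapeForXhtml(source) {
  // Replace "&" first so the "&" in "&lt;" is not escaped a second time.
  return source.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// What would have to appear between <script> and </script> in XHTML:
console.log(escapeForXhtml('document.title = "Foo & Bar";'));
// prints: document.title = "Foo &amp; Bar";
```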

To force the browser to handle the contents as CDATA in XHTML, you must wrap them in a CDATA section. This looks like the following:

<script type="text/javascript">
	<![CDATA[
		document.title = "Foo & Bar";
	]]>
</script>

Most major browsers don't support CDATA sections when they parse a document as HTML. If you send XHTML as text/html (the default in most setups, and not recommended), most major browsers parse it like regular HTML and stumble on the CDATA sections, even though the element definitions in the XHTML DTD allow for them. If you plan to send your XHTML as text/html and want it handled correctly in both cases, use the comment syntax of the styling or scripting language to comment out the CDATA section markers. Where CDATA sections aren't supported, the markers are then ignored as part of a comment; where they are supported, the markers are interpreted properly and the comment delimiters that remain are harmless. The following code illustrates the concept. It is given twice, once for each parsing mode; the source is identical in both listings, and only the way the browser parses it differs.

<!-- application/xhtml+xml -->

<style type="text/css">
	/*<![CDATA[*/
		body
		{
			background: #eee;
		}
	/*]]>*/
</style>

<script type="text/javascript">
	//<![CDATA[
		document.title = "Foo & Bar";
	//]]>
</script>

<!-- text/html, no support for marked sections -->

<style type="text/css">
	/*<![CDATA[*/
		body
		{
			background: #eee;
		}
	/*]]>*/
</style>

<script type="text/javascript">
	//<![CDATA[
		document.title = "Foo & Bar";
	//]]>
</script>
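The same guarding pattern can be produced mechanically. The sketch below is only an illustration: guardCdata and its language argument are hypothetical names, and it assumes the source itself contains no ]]> sequence, which would end the CDATA section early.

```javascript
// Hypothetical helper: wraps inline style or script source in CDATA markers
// that are commented out in the embedded language itself.
function guardCdata(source, language) {
  if (language === "css") {
    // CSS has only block comments, so each marker gets its own /* ... */.
    return "/*<![CDATA[*/\n" + source + "\n/*]]>*/";
  }
  if (language === "javascript") {
    // A single-line comment hides each marker from the script engine.
    return "//<![CDATA[\n" + source + "\n//]]>";
  }
  throw new Error("unsupported language: " + language);
}

// Produces the contents of the script element shown above.
console.log(guardCdata('document.title = "Foo & Bar";', "javascript"));
```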

Putting the two together

If you want to write XHTML that can also be handled by common HTML user agents, as well as by old browsers that don't support the style and script elements, there is a monster of a setup that accomplishes this. The following code is given three times, once for each parsing scenario; the source is identical in each listing, and only the parsing result differs.

<!-- application/xhtml+xml -->

<style type="text/css">
	<!--/*--><![CDATA[/*><!--*/
		body
		{
			background: #eee;
		}
	/*]]>*/-->
</style>

<script type="text/javascript">
	<!--//--><![CDATA[//><!--
		document.title = "Foo & Bar";
	//--><!]]>
</script>

<!-- text/html -->

<style type="text/css">
	<!--/*--><![CDATA[/*><!--*/
		body
		{
			background: #eee;
		}
	/*]]>*/-->
</style>

<script type="text/javascript">
	<!--//--><![CDATA[//><!--
		document.title = "Foo & Bar";
	//--><!]]>
</script>

<!-- text/html, no support for style, script, or marked sections -->

<style type="text/css">
	<!--/*--><![CDATA[/*><!--*/
		body
		{
			background: #eee;
		}
	/*]]>*/-->
</style>

<script type="text/javascript">
	<!--//--><![CDATA[//><!--
		document.title = "Foo & Bar";
	//--><!]]>
</script>

Some tokens in the above code were added to catch other potential problems. For example, the first // was added in case a browser supports the script element but doesn't ignore the entire line after <!--. Some situations cannot be accounted for, such as a browser that supports CDATA sections but not the style or script elements. Remember that the very concept of sending XHTML to be handled by an HTML engine is essentially a hack and cannot be done flawlessly.
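To see why the combined markers survive XML parsing, it can help to model what the parser hands to the script or style engine. The sketch below is a deliberately simplified model, not a real XML parser: xmlTextContent is a hypothetical name, and the function only drops comments and unwraps CDATA sections, ignoring entity references, nested elements, and well-formedness checks.

```javascript
// Simplified model of how an XML parser reads element content:
// comments are dropped and CDATA sections are unwrapped verbatim.
function xmlTextContent(content) {
  let out = "";
  let i = 0;
  while (i < content.length) {
    if (content.startsWith("<!--", i)) {
      // Skip everything up to and including the closing "-->".
      const end = content.indexOf("-->", i + 4);
      if (end === -1) throw new Error("unterminated comment");
      i = end + 3;
    } else if (content.startsWith("<![CDATA[", i)) {
      // Copy the section body as plain character data.
      const end = content.indexOf("]]>", i + 9);
      if (end === -1) throw new Error("unterminated CDATA section");
      out += content.slice(i + 9, end);
      i = end + 3;
    } else {
      out += content[i];
      i += 1;
    }
  }
  return out;
}

// Contents of the script element from the listings above.
const scriptContent =
  '<!--//--><![CDATA[//><!--\n' +
  'document.title = "Foo & Bar";\n' +
  '//--><!]]>';

console.log(xmlTextContent(scriptContent));
```

Under this model the script engine receives three lines, and the first and last both begin with //, so only the assignment is executed.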

Credits go to Ian Hickson for the final version.