Understanding HTML5 Validation
One of the things that we need to get used to when making the switch from HTML4/XHTML to HTML5 is the way HTML5 validation works, because it’s drastically different from what we’ve become accustomed to in previous iterations of web markup.
First, it should be noted that the W3C’s HTML5 validation engine is “experimental”, so it’s a work in progress that will likely see many changes over the next year or more. Also, we shouldn’t refer to it as a “validator” anymore; it’s now more accurately referred to as a “conformance checker” (although for simplicity I’ll be using the term “validation” and its derivatives).
Thus, when you validate a page, the following warning is given:
The validator checked your document with an experimental feature: HTML5 Conformance Checker. This feature has been made available for your convenience, but be aware that it may be unreliable, or not perfectly up to date with the latest development of some cutting-edge technologies.
That having been said, let’s compare validation results using the same code for both HTML5 and XHTML. Here’s the code we’re going to validate in HTML5 and XHTML:
    <!DOCTYPE html>
    <head>
    <meta charset="UTF-8" />
    <title>HTML5 Validation</title>
    <link rel="stylesheet" href="style.css">
    <script></script>
    </head>
    <embed>
    Text Snippet #1<br>
    <p>
    <p>Text Snippet #2</P>
    <FOrM>
    <input>
    </form>
    <textarea></textarea>
    <a href=index.html target="_blank"><div>& Text Snippet #3</div></a>
When we switch to XHTML, we’ll make two changes: We’ll add the proper doctype, and we’ll use the old character encoding meta tag.
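If you’d rather not edit the file by hand, those two changes can be sketched as a small Python helper. This is only an illustration, not part of the original experiment: the function name `to_xhtml_variant` is made up, and the replacement strings are simply the XHTML 1.0 Strict doctype and the older meta tag used in the XHTML listing further down.

```python
import re

# The XHTML 1.0 Strict doctype and the pre-HTML5 encoding declaration,
# copied from the XHTML listing in this article.
XHTML_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
LEGACY_META = (
    '<meta http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-1">'
)


def to_xhtml_variant(html5_source):
    """Hypothetical helper: apply the two changes and nothing else."""
    # Change 1: swap the short HTML5 doctype for the XHTML Strict one.
    out = re.sub(
        r"<!DOCTYPE\s+html\s*>",
        lambda _match: XHTML_DOCTYPE,
        html5_source,
        count=1,
        flags=re.IGNORECASE,
    )
    # Change 2: swap <meta charset="..."> for the older http-equiv form.
    out = re.sub(
        r'<meta\s+charset="[^"]*"\s*/?>',
        lambda _match: LEGACY_META,
        out,
        count=1,
        flags=re.IGNORECASE,
    )
    return out
```

Everything else in the document, sloppy markup included, passes through untouched, which is the point of the comparison.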
Just to make something clear: I’m not doing this comparison in order to imply that HTML5 is better or that XHTML is too strict. The purpose of this experiment is to help us understand what direction HTML5 validation has now taken.
HTML5: 0 Errors; XHTML: 23 Errors
The code shown above is (believe it or not) 100% valid HTML5. The only warnings given by the HTML5 validator are those that are given when validating virtually any document (the warning I mentioned above and another warning related to direct input). But there are no reported errors (using either Validator.nu or the W3C Markup Validator).
On the other hand, if you take the same code and validate it using XHTML (changing the doctype and character encoding), the W3C validator will print 23 validation errors.
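If you want to repeat the HTML5 half of this check from a script instead of pasting into the web form, here is a rough Python sketch. It assumes the checker’s web-service mode: a POST of the raw document to an endpoint such as `https://validator.w3.org/nu/?out=json`, answered with a JSON object containing a `messages` list. The endpoint URL, the User-Agent string and the helper names are my assumptions, not something taken from the tools as described above, and the counts you get may differ from the figures quoted here because the checker is still changing.

```python
import json
import urllib.request

# Assumption: the public checker endpoint with JSON output enabled.
CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_request(markup, url=CHECKER_URL):
    """Prepare (but do not send) a POST carrying the document to check."""
    return urllib.request.Request(
        url,
        data=markup.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            # Hypothetical identifier; some services reject blank agents.
            "User-Agent": "html5-validation-demo/0.1",
        },
        method="POST",
    )


def count_messages(response_body):
    """Tally the checker's messages by their "type" field."""
    counts = {}
    for message in json.loads(response_body).get("messages", []):
        kind = message.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def check(markup):
    """Send the document to the checker and summarize the reply."""
    with urllib.request.urlopen(build_request(markup), timeout=30) as reply:
        return count_messages(reply.read().decode("utf-8"))
```

Calling `check(open("test.html").read())` (with a file name of your choosing) returns a dictionary of message counts. This only covers the HTML5 side; the XHTML run still goes through the DTD-based W3C validator.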

For reference, below you’ll find the code I’m using for XHTML validation. It’s the same as the code example above, except that it has the XHTML Strict doctype, the meta tag has been changed, and the title reads “XHTML Validation”. Go ahead and copy the markup and try validating it:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <title>XHTML Validation</title>
    <link rel="stylesheet" href="style.css">
    <script></script>
    </head>
    <embed>
    Text Snippet #1<br>
    <p>
    <p>Text Snippet #2</P>
    <FOrM>
    <input>
    </form>
    <textarea></textarea>
    <a href=index.html target="_blank"><div>& Text Snippet #3</div></a>
If you look carefully at the code, you’ll see a whole slew of seemingly atrocious code mistakes. Here’s a list of all the problems that the code has from the standpoint of XHTML validation:
- No <html> element
- The <meta> element is not closed
- The <link> element is not closed
- The <script> element doesn’t have a type attribute
- No <body> tag
- A nonstandard <embed> element is used, and it’s not closed
- Stray text (i.e. “character data”) with no paragraph or other required parent element
- A <br> element with no self-closing slash
- A paragraph element with no closing tag
- A closing paragraph tag in uppercase
- A form element in mixed case with no action attribute
- A stray non-closed <input> element that’s not wrapped in a <div> or other required parent element
- A stray <textarea> element with missing rows and cols attributes
- An anchor element with an unquoted href attribute
- A deprecated target attribute on the anchor
- A block-level element (<div>) nested inside an inline element (<a>)
- An ampersand that’s not coded as a special entity
- No closing <body> and <html> tags
As you can see there are quite a few problems in that document that the XHTML validator flags as errors, while the HTML5 validator has no problem with any of those things listed, and gives the user the feel-good green screen that we all know and love. While it would be beneficial to discuss a number of these “errors” that are now acceptable in HTML5, that’s not the purpose of this article, so I’ll leave those for another time.
What Accounts For These Differences?
The reason there’s such a big difference is simple: HTML validation is now separated from “linting”. A validator should not throw errors for code styling inconsistencies, but should only throw errors for, well, code errors. Thus, developers have been asking for HTML lint tools to aid us in creating consistent and maintainable code. At least one such tool is now available for use, but I’m not completely sure of the quality of the tool, so use at your own discretion.
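To make the split between validating and linting a little more concrete, here is a toy lint pass in Python. It is not the tool mentioned above, and it’s nowhere near a real HTML parser: it’s a regex-based sketch, with made-up names, that flags two purely stylistic choices from the test document (tag names that aren’t lowercase, and unquoted attribute values). Both are acceptable to the HTML5 checker, but both are things a team might still want to keep consistent.

```python
import re

# Opening or closing tag: group 1 is the tag name, group 2 the attributes.
# Doctypes and comments start with "<!" and are deliberately not matched.
TAG_RE = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)([^<>]*)>")
# Quoted attribute values, blanked out before looking for unquoted ones.
QUOTED_RE = re.compile(r'"[^"]*"|' r"'[^']*'")
# name=value where the value does not begin with a quote character.
UNQUOTED_ATTR_RE = re.compile(r"""([A-Za-z-]+)\s*=\s*[^\s"'>]""")


def lint(markup):
    """Return style findings; none of these are HTML5 validation errors."""
    findings = []
    for match in TAG_RE.finditer(markup):
        name, attrs = match.group(1), match.group(2)
        if name != name.lower():
            findings.append("tag name not lowercase: " + match.group(0))
        # Empty the quoted values so their contents can't be mistaken
        # for attributes (e.g. "charset=..." inside a content value).
        bare = QUOTED_RE.sub('""', attrs)
        for attr in UNQUOTED_ATTR_RE.finditer(bare):
            findings.append("unquoted attribute value: " + attr.group(1))
    return findings
```

Run against the test document from earlier, this would report the uppercase closing </P>, the mixed-case <FOrM>, and the unquoted href, while staying silent about everything the checker already accepts. Which rules belong in a linter is a team decision; the validator no longer makes it for you.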
Also, HTML5 is designed to be backwards-compatible, so it will conform to both HTML4 and XHTML coding styles. Jeffrey Zeldman alluded to this feature of HTML5 when he wrote that the oldest web document is almost valid HTML5.
What Does This Mean?
The fact that the validators don’t spit out any errors does not mean the code is good. Developers should still endeavor to adopt consistent coding methods to keep their code clean and organized. Thus, I’m not trying to discourage developers from paying attention to their coding style, but instead helping us recognize that the validator is now concerned only with real markup errors.
So what do you think? What are your thoughts on this direction in HTML5 validation (or conformance checking) compared to HTML4 and XHTML?