Thread: HTML Validation
2006-04-12, 12:32 PM   #7
cd34
I believe that badly broken HTML will cause serious problems.

What a browser does with a page and how the Google bot reads it are totally different. Google uses Python, so let's assume they use Python for their bot. Python has an SGML parser (sgmllib) that breaks your page into tags and text. A tree structure gets built from those pieces, and the bot goes to work on that tree.
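
To make the "tree structure" idea concrete, here is a minimal sketch in Python. It is my own illustration, not anything Google has published. It uses html.parser from the Python 3 standard library as a stand-in for the old sgmllib module, and TreeBuilder and parse_tree are hypothetical names. It also assumes every tag gets closed, so void elements like <br> would confuse it.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Hypothetical helper: turns tag events into a nested tree of dicts."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "children": []}
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Naive: only close the element if it is the innermost open one.
        if len(self.stack) > 1 and self.stack[-1]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append(text)


def parse_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


print(parse_tree("<p><b>hi</b> there</p>"))
```

With clean markup, the <b> ends up as a child of the <p>, and the text hangs off whichever element was open when it appeared. Everything a bot does afterwards depends on that tree being right.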

Things like

Code:
<a href="page.html>hi there</a>
will sometimes render properly in browsers. Hanging table cells, improperly nested cells, and the like all make life harder for an automated process. Browsers have long tolerated an unclosed <td>, <tr>, <b>, etc., but a stricter automated parser can trip over any of those.
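
Here is a rough sketch of why the missing quote hurts. It assumes a strict scanner that ends an attribute value only at the next double quote. read_attribute_value is a hypothetical helper written for this post, not part of any real crawler or library.

```python
def read_attribute_value(html, name):
    """Return (value, closed) for the first name="..." attribute in html.

    The scanner is strict: the value ends only at the next double quote,
    so a missing closing quote lets the value run on through the markup.
    closed is False when no closing quote exists before the end of input.
    """
    marker = name + '="'
    start = html.find(marker)
    if start == -1:
        return None, False
    value_start = start + len(marker)
    end = html.find('"', value_start)
    if end == -1:
        return html[value_start:], False
    return html[value_start:end], True


good = '<a href="page.html">hi there</a>'
broken = '<a href="page.html>hi there</a> <a href="other.html">next</a>'

print(read_attribute_value(good, "href"))    # ('page.html', True)
print(read_attribute_value(broken, "href"))  # ('page.html>hi there</a> <a href=', True)
```

In the broken case the "URL" swallows the link text and the start of the next link. A browser may guess what you meant. A strict scanner like this one will not.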

While I don't follow every recommendation the validators give, I do make sure the HTML isn't broken. Some of my pages have technically incorrect HTML, such as bgimage, bgcolor, etc. used as attributes on elements where the standard doesn't define them, but I'll let that slide. That won't break a parser.

However, improperly nested content can sometimes cause problems.

Code:
<a href="page.html"><h1>hi there</a></h1>
An automated process can get confused by the above. Depending on how it parses, I suspect you might lose the effect of the <h1>. Google probably goes to great lengths to spider the web as well as it can, but why gamble on that?
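
If you want to catch this kind of thing yourself, a stack-based check is enough. The sketch below is only an illustration under my own assumptions: it uses Python 3's html.parser, NestingChecker and find_nesting_problems are hypothetical names, and it treats every tag as needing a matching close (so void elements like <br> or <img> would be flagged too).

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Hypothetical helper: records end tags that close the wrong element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # (closing tag, tag that was expected) pairs

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            expected = self.open_tags[-1] if self.open_tags else None
            self.problems.append((tag, expected))


def find_nesting_problems(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


print(find_nesting_problems('<a href="page.html"><h1>hi there</a></h1>'))
# [('a', 'h1')]
print(find_nesting_problems('<h1><a href="page.html">hi there</a></h1>'))
# []
```

The first result says a </a> showed up while the <h1> was still open. What a given bot does at that point (drop the <h1>, close it early, or give up on the link) is up to its authors, which is the gamble.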