Greenguy's Board


2006-04-12, 01:32 PM   #7
cd34
a.k.a. Sparky
 
Join Date: Sep 2004
Location: West Palm Beach, FL, USA
Posts: 2,396
It is my belief that extremely broken html will cause extreme problems.

What the browser does and how Googlebot reads a page are totally different. Google uses Python, so we'll assume that they use Python for their bot. Python has an SGML parser (sgmllib) that reads your page tag by tag; code built on it dissects the page into a tree structure, then goes to work on that.
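To make the tree idea concrete, here is a minimal sketch. It uses html.parser from the Python 3 standard library as a stand-in, and the TreeBuilder class and build_tree function are made-up names for illustration. Nothing here is meant to describe what Google actually runs.

```python
from html.parser import HTMLParser

class TreeBuilder(HTMLParser):
    """Hypothetical helper: turns tag events into nested [tag, children] lists."""

    def __init__(self):
        super().__init__()
        self.root = ["document", []]
        self.stack = [self.root]  # innermost open element is last

    def handle_starttag(self, tag, attrs):
        node = [tag, []]
        self.stack[-1][1].append(node)  # attach to the current parent
        self.stack.append(node)         # and descend into the new element

    def handle_endtag(self, tag):
        # Naive: trusts that the end tag closes the innermost open element.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())

def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

print(build_tree('<h1><a href="page.html">hi there</a></h1>'))
```

Note that the builder trusts every end tag to close whatever was opened last. That assumption is exactly what broken markup violates.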

Things like

Code:
<a href="page.html>hi there</a>
will sometimes be rendered properly in browsers. Hanging table cells, improperly nested cells, etc. -- all go towards making an automated process have problems. It used to be that you could get away without closing the <td>, <tr>, <b>, etc.; however, those can all break automatic parsing.
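A rough sketch of why the missing quote hurts, assuming a naive reader that simply pairs up double quotes. The quoted_hrefs helper and the second link to other.html are invented for the example; real crawlers are surely more forgiving than this.

```python
import re

def quoted_hrefs(html):
    """Hypothetical helper: collect href values by pairing up double quotes."""
    # Each value runs from an opening quote to the very next quote,
    # wherever in the page that next quote happens to be.
    return re.findall(r'href="([^"]*)"', html)

broken = '<a href="page.html>hi there</a> <a href="other.html">next</a>'
fixed = '<a href="page.html">hi there</a> <a href="other.html">next</a>'

print(quoted_hrefs(fixed))   # two clean links
print(quoted_hrefs(broken))  # one value that swallowed the markup after it
```

With the quote missing, the first "link" runs on until the opening quote of the next href, so the link text is swallowed and the second link is never seen.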

While I don't follow every recommendation that the validators give, I do make sure that the html isn't broken. I have pages with technically incorrect html (bgimage, bgcolor, etc. used as attributes on elements where the standard doesn't allow them), but I'll let that slide. That won't break a parser.

However, improperly nested content can sometimes cause problems.

Code:
<a href="page.html"><h1>hi there</a></h1>
An automated process can get confused by the above. Depending on how they are parsing, I would suspect you might lose the effect of the <h1>. Now, Google probably goes to great lengths to make sure they can spider the web to the best of their ability, but why gamble on that?
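If you want to catch this kind of thing yourself, a stack-based check is one way to do it. This is only a sketch using html.parser from the Python 3 standard library; NestingChecker and check_nesting are made-up names, the list of void tags is deliberately short, and it says nothing about how any search engine handles the page.

```python
from html.parser import HTMLParser

class NestingChecker(HTMLParser):
    """Hypothetical helper: records end tags that don't close the innermost open tag."""

    # Elements that never take an end tag (partial list, for illustration).
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []  # (end tag seen, innermost open tag or None)

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags like <br/> open and close nothing

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        expected = self.open_tags[-1] if self.open_tags else None
        self.problems.append((tag, expected))
        # Recover by dropping the nearest matching open tag, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

def check_nesting(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems

print(check_nesting('<a href="page.html"><h1>hi there</a></h1>'))
print(check_nesting('<h1><a href="page.html">hi there</a></h1>'))
```

The first call reports that </a> arrived while <h1> was still the innermost open tag; the second, correctly nested version reports nothing.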
All times are GMT -4.

