2005-03-16, 10:13 AM
#2
Subversive filth of the hedonistic decadent West
Join Date: Mar 2003
Location: Southeast Florida
Posts: 27,936
For those too lazy to go over and read the AVN article.
Quote:
How can a Webmaster tell if his or her page has been Googlejacked? For starters, type “allinurl:yourdomain.com” in the search box at Google to see what comes up. The subject domain should lead the list. If there are other entries on the page bearing the correct page title and excerpt – and in some cases, cached result – but incorrect URLs, chances are good the page has been targeted by a Googlejack, accidental or otherwise.
Fixing the problem is not as easy as detecting it. Webmasters can’t ban 302 referrers or most redirect scripts, because the target server never sees the redirect itself during the connection request. Click-throughs from the page carrying the redirect script can be banned, but that only affects surfers, not the search engine spiders where the problem resides. Webmasters can request the removal of pages from Google, but that’s a lengthy, tedious process that only works within specific parameters.
Schmidt suggests several steps Webmasters can take to minimize the chances that their pages will be hijacked:
Always redirect “non-www” domains (yourdomain.com) to the www version (www.yourdomain.com), or vice-versa, and do it using a 301 code instead of a 302 code (see the sketch after this list).
Always use absolute internal linking on Websites (include the full domain name in links pointing from one page to another within the same site, e.g., “http://www.yourdomain.com/page2.html” rather than just “/page2.html”).
Include a bit of constantly updated content on all pages, like a time stamp, a random quote, or a page counter.
Use the meta tag on all pages.
Make all pages confirm their URL “artificially” by inserting a 302 redirect from any URL to the exact same URL and then serving a “200 OK” status code.
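A minimal sketch of the first tip, assuming a CGI-capable server and the placeholder domain “yourdomain.com” (the script and paths are hypothetical, not from the article): answer any request for the bare domain with a permanent 301 pointing at the www version, so spiders are never handed a 302.

#!/usr/bin/env python
# Hypothetical CGI sketch: 301-redirect the bare domain to the
# www version so spiders never see a 302 for it.
import os
import sys

host = os.environ.get('HTTP_HOST', '').lower()
path = os.environ.get('REQUEST_URI', '/')

if host == 'yourdomain.com':
    # Permanent redirect: the Status header sets the HTTP response code.
    sys.stdout.write('Status: 301 Moved Permanently\r\n')
    sys.stdout.write('Location: http://www.yourdomain.com%s\r\n\r\n' % path)
else:
    # Already on the www host: serve the page normally.
    sys.stdout.write('Status: 200 OK\r\n')
    sys.stdout.write('Content-Type: text/html\r\n\r\n')
    sys.stdout.write('<html><body>...normal page content...</body></html>\n')

The same rule is more often written into the server configuration itself; the only point that matters here is that the status line says 301, not 302.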
Schmidt also suggests that Webmasters take precautions to avoid becoming inadvertent hijackers:
Always use 301 redirects instead of 302 redirects, disallow redirect scripts in the “robots.txt” file (see the example after this list), or both.
Request removal of all redirect script URLs from Google’s index. Simply including the URLs in the robots.txt file won’t remove them from Google. That move just ensures the URLs are not revisited by Google spiders.
If you discover that one of your pages has hijacked someone else’s in Google’s index accidentally, make the script in question return a 404 (page not found) error and then request removal of the script from Google’s index.
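For the robots.txt tip above, a minimal example, assuming the redirect script lives at the hypothetical path “/redirect.cgi”:

User-agent: *
Disallow: /redirect.cgi

As the article says, this only stops compliant spiders from revisiting the script; it does not remove URLs that are already in the index.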
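And a sketch of the last step, under the same hypothetical CGI setup: gut the offending redirect script so it answers every request with a 404, then file the removal request with Google.

#!/usr/bin/env python
# Hypothetical replacement body for a redirect script that has
# accidentally hijacked someone else's page: return 404 so Google
# will honor a removal request for this URL.
import sys

sys.stdout.write('Status: 404 Not Found\r\n')
sys.stdout.write('Content-Type: text/html\r\n\r\n')
sys.stdout.write('<html><body><h1>404 Not Found</h1></body></html>\n')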