Fast 404 error checker for Link Lists
Does anyone know where I can get a script like the one in the subject?
It must check links for 404s and redirects. It should be a PHP script so I can run it from a cron job :) The one I have fails from time to time. |
Please guys, I know you're hiding something :)
|
Do you have the links to be checked in a database, or does it just have to check them from the site?
Regards, Thomas |
It doesn't matter. I think checking from the db will be faster, but I'd be happy with a bot as well.
|
I don't know any PHP script which will do this 'out-of-the-box'
Do you have any coding experience? If so, check: http://www.php.net/manual/nl/ref.curl.php If not, feel free to contact me ;) |
|
thanks so much guys!
Mr. Stiff, I can write PHP, but I'm confused about what to do with curl. How can it help me out? To Joneze: thanks for the link! |
Hi,
Curl is a good program for getting webpages, headers, etc. It's installed on most (good) hosting servers. Here's how I use it:
- Column 'lastspider' on my gallery table
- Query the table, getting URLs not spidered in the last xxx days/hours/weeks/whatever
- Use the curl extension to connect to each URL
- You can choose to download only the headers, which is much faster than downloading the full page
- Check the header response (must be 200; 404 means page not found, 301 or 302 means redirect)
- Update your table! |
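The steps above could be sketched in PHP roughly as follows. This is only a sketch under assumptions: the function names `classify_status` and `check_url` are made up for illustration, and the database query and update are left out because the thread shows nothing of the schema beyond the `lastspider` column.

```php
<?php
// Map an HTTP status code to the outcomes named in the list above.
function classify_status(int $code): string
{
    if ($code === 200) {
        return 'ok';
    }
    if ($code === 301 || $code === 302) {
        return 'redirect';
    }
    if ($code === 404) {
        return 'not_found';
    }
    // Anything else, including 0 (curl got no HTTP response at all).
    return 'other';
}

// Request only the headers of $url and return the HTTP status code.
// $timeout is an assumed default, in seconds.
function check_url(string $url, int $timeout = 10): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only, no body download
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't print anything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // report redirects instead of following them
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

From a cron job you would loop over the URLs whose `lastspider` value is older than your cutoff, call `classify_status(check_url($url))` on each, and write the result and the new `lastspider` time back to the table. Some servers answer header-only requests differently from normal page requests, so it may be safer to treat an 'other' result as "check again later" than as a dead link.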
Quote:
|
I have a checker. It doesn't use curl() and it fails me, giving invalid results most of the time.
I don't like scripts that are Zend-encoded, since I tend to optimise scripts myself to make them unique. Thanks guys, this thread should be useful for those who don't know about it. |
Mr. Stiff, it's a good idea. "Researching" curl right now.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.sortlinks.com");
curl_setopt($ch, CURLOPT_HEADER, 0); // 0 = do not include response headers in the output
curl_setopt($ch, CURLOPT_REFERER, $host); // $host is assumed to be set earlier in the script
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)");
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
$t = curl_exec($ch);
echo $t;

I was only able to get the full page, like PHP's file() would. Could you please let me know how to get the header and the so-called spider response? |
code = curl_easy_setopt(http_headconn, CURLOPT_NOBODY, 1);
|
Fatal error: Call to undefined function: curl_easy_setopt()
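`curl_easy_setopt()` belongs to libcurl's C API, which is why PHP reports it as undefined. In PHP's curl extension the same option is set with `curl_setopt()`. A minimal sketch, assuming you want the raw header text back as a string; `fetch_headers` and `status_from_headers` are made-up helper names for illustration:

```php
<?php
// Pull the status code out of the first line of a raw header block,
// e.g. "HTTP/1.1 404 Not Found". Returns 0 if no status line is found.
function status_from_headers(string $headers): int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $headers, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Fetch only the response headers of $url as a string ('' on failure).
function fetch_headers(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // PHP equivalent of the C call above
    curl_setopt($ch, CURLOPT_HEADER, true);         // include the headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return them instead of printing
    $out = curl_exec($ch);
    return is_string($out) ? $out : '';
}
```

Calling `status_from_headers(fetch_headers($url))` then gives the numeric response code for a link without downloading the page body.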
|
On my domains I block all known offline browsers, email harvesters, download managers, etc.
Curl is one of those that I block... because I don't want anyone 'mirroring' my content. I've tried using scripts to clean out the 404s and redirects, but nothing is 100% accurate. Even manual checking isn't perfect, as you could check at the time the server is going thru a reset for whatever reason. You should use (and trust) whichever you find the most satisfactory for you... or a combination of 2 or 3. Just my 2c worth. |
oast, how do you manage to distinguish good bots from bad ones?
|
Thru the User Agent string that (nearly) all programs use to identify themselves.
I then use .htaccess to forbid (or redirect) the 'bad boys'. |
A small extract from the filtering lines of my .htaccess file looks like this:
Quote:
mod_rewrite can be a very powerful tool if used correctly. AFAIK Apache is the only server it is available on, but as a large number of hosting companies prefer Apache, you should be OK |
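The rules from oast's .htaccess are not reproduced in this thread. Purely as an illustration of the idea, here is the same kind of User-Agent filtering done in PHP rather than mod_rewrite; the function name and the blocklist entries are assumptions, not anyone's real configuration:

```php
<?php
// Return true if the User-Agent string contains any blocked fragment
// (case-insensitive). An empty User-Agent is treated as blocked here,
// which is a design choice of this sketch.
function is_blocked_agent(string $userAgent, array $blocked): bool
{
    if (trim($userAgent) === '') {
        return true;
    }
    foreach ($blocked as $fragment) {
        if (stripos($userAgent, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical blocklist, for illustration only.
$blocked = ['curl', 'wget', 'HTTrack'];
```

At the top of a page this could be used as `if (is_blocked_agent($_SERVER['HTTP_USER_AGENT'] ?? '', $blocked)) { http_response_code(403); exit; }`. The User-Agent is supplied by the client, so a check like this only stops programs that identify themselves honestly.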
Quote:
Honestly Bill, I was naive at the time. I don't do things like that any more |blowkiss| |
Quote:
Definitely leave in the line 'curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)");'. This will fool those webmasters checking for Curl ;) |
thanks ost and Mr. Stiff - good job :)
|