Hi,
Curl is a handy tool for fetching web pages, HTTP headers, and so on. It's available on most (good) hosting servers.
Here's how I use it:
- Keep a 'lastspider' column on my gallery table
- Query the table for URLs that haven't been spidered in the last xxx days/hours/weeks/whatever
- Use the curl extension to connect to each URL
- You can choose to download only the headers, which is much faster than downloading the full page
- Check the response status code (it should be 200): 404 means page not found, 301 or 302 means a redirect
- Update your table!
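A minimal sketch of the "query" and "update" steps, written in Python with the standard-library sqlite3 module standing in for whatever database holds the gallery. Only the `lastspider` column comes from the post; the `id`, `url` and `status` columns, storing `lastspider` as a Unix timestamp, and the helper names `urls_due` and `mark_spidered` are hypothetical.

```python
import sqlite3
import time

# Hypothetical schema: only 'lastspider' is from the post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS gallery (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    lastspider INTEGER NOT NULL DEFAULT 0,  -- Unix time of last check, 0 = never
    status INTEGER                          -- last HTTP status code seen
)
"""


def urls_due(conn, max_age_seconds, now=None):
    """Return (id, url) rows not spidered within the last max_age_seconds."""
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_seconds
    cur = conn.execute(
        "SELECT id, url FROM gallery WHERE lastspider < ? ORDER BY lastspider",
        (cutoff,),
    )
    return cur.fetchall()


def mark_spidered(conn, row_id, status, now=None):
    """Store the status code and the time of this check for one row."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "UPDATE gallery SET lastspider = ?, status = ? WHERE id = ?",
        (now, status, row_id),
    )
    conn.commit()
```

For a "every 7 days" policy you would call `urls_due(conn, 7 * 86_400)`, check each URL, then call `mark_spidered` with the status you got back. Ordering by `lastspider` means the stalest URLs are handled first if a run is cut short.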
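The header check can be sketched with Python's urllib in place of the curl extension (a swapped-in library, not the post's actual code). It sends a HEAD request so only the headers are transferred, and it turns off automatic redirect-following so a 301 or 302 is reported rather than silently followed. The names `head_status` and `classify` are hypothetical.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so 301/302 surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_opener = urllib.request.build_opener(_NoRedirect)


def head_status(url, timeout=10.0):
    """Send a HEAD request; return the status code, or None if unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with _opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 3xx (redirects disabled), 4xx and 5xx responses end up here
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, ...
        return None


def classify(status):
    """Map a status code to the cases listed in the post."""
    if status == 200:
        return "ok"
    if status == 404:
        return "not_found"
    if status in (301, 302):
        return "redirect"
    return "other"  # anything else, including None (unreachable)
```

Note that a few servers handle HEAD badly (for example answering 405 Method Not Allowed), so an unexpected code is worth re-checking with a normal GET before treating the page as broken.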