Here's what some spiders will do:
telnet adevil.com 80
Trying 67.19.97.178...
Connected to 178.67-19-97.reverse.theplanet.com.
Escape character is '^]'.
GET /tgpweb/F14/b/index.htm HTTP/1.0
Host: adevil.com
HTTP/1.1 401 Authorization Required
Date: Thu, 21 Oct 2004 12:31:15 GMT
Server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a PHP-CGI/0.1b
Connection: close
Content-Type: text/html; charset=iso-8859-1
Here's what a good spider should do:
telnet adevil.com 80
Trying 67.19.97.178...
Connected to 178.67-19-97.reverse.theplanet.com.
Escape character is '^]'.
GET /tgpweb/F14/b/index.htm HTTP/1.0
Host: adevil.com
User-Agent: wGet
HTTP/1.1 200 OK
Date: Thu, 21 Oct 2004 12:21:05 GMT
Server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a PHP-CGI/0.1b
Last-Modified: Wed, 20 Oct 2004 12:48:14 GMT
ETag: "28042d-31bb-41765e8e"
Accept-Ranges: bytes
Content-Length: 12731
Connection: close
Content-Type: text/html
Basically, the first request is missing the User-Agent: header and receives a 401. The second request, which includes the User-Agent: header, returns the content in question.
I tried a few different requests and I'm still a little unsure exactly what he is blocking, but most User-Agents go through, so it appears he is only blocking requests with a blank or missing User-Agent (the browser ID string).
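If you want to reproduce the two telnet sessions above from a script, here is a rough Python sketch. The function names (build_request, status_code, fetch_status) are my own, made up for illustration, and I am assuming the server still answers plain HTTP/1.0 on port 80 the way it did in the transcripts. It only reads as far as the status line, so treat it as a quick probe rather than a real spider.

```python
import socket


def build_request(path, host, user_agent=None):
    """Build a raw HTTP/1.0 GET request; User-Agent is sent only if given."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    if user_agent:
        lines.append(f"User-Agent: {user_agent}")
    # A blank line (CRLF CRLF) terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_code(response):
    """Return the numeric status from the first line of a raw response."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    # e.g. "HTTP/1.1 401 Authorization Required" -> 401
    return int(status_line.split()[1])


def fetch_status(host, path, user_agent=None, port=80, timeout=10):
    """Send the request over a plain socket and return the status code."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(path, host, user_agent))
        data = b""
        # Read until the status line is complete or the server closes.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return status_code(data)
```

Calling fetch_status("adevil.com", "/tgpweb/F14/b/index.htm") and then the same call with user_agent="wGet" should let you compare the two cases side by side. Whether you still see 401 versus 200 depends on how the server is configured when you run it.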