#1 |
If something goes wrong at the plant, blame the guy who can't speak English
Join Date: Aug 2003
Location: Kent, UK
Posts: 33
|
Link Bots
I'm having trouble submitting to quite a few sites, getting 401 errors (authorisation required). The guy who manages the server where my sites are hosted says there's an empty .htaccess file but nothing else that would generate this error. I've just seen a reference to a 'link bot'. Does this mean a spider, and if so (or not), any ideas what my server guy should be looking for? Thanks
__________________
British Dollars |
#2 |
The Original Greenguy (Est'd 1996) & AVN HOF Member - I Crop Pics For Thumbs In My Sleep
|
Post a URL that's going 401 & we'll have a look at it
#3 |
If something goes wrong at the plant, blame the guy who can't speak English
Join Date: Aug 2003
Location: Kent, UK
Posts: 33
|
This is one of them:
http://www.adevil.com/tgpweb/F14/b/index.htm
The Outlawsporn script said it went 401, and so far, out of 120 submissions, 9 said it right away, and probably others will when the owner runs their bot/script. I do have a paysite on the same webspace (www.adevilsamateurs.com) which has .htaccess for the members area, but all my free sites and associated pics and banners are in the public part of my web space.
#4 |
Subversive filth of the hedonistic decadent West
Join Date: Mar 2003
Location: Southeast Florida
Posts: 27,936
|
Having spaces in your file names isn't helping you.
The HPA on your main page has what look like clickable thumbs, so it got rejected at my LL. Also, it loaded really slowly when I was doing reviews yesterday.
#5 |
a.k.a. Sparky
Join Date: Sep 2004
Location: West Palm Beach, FL, USA
Posts: 2,396
|
Here's what some spiders will do:

    telnet adevil.com 80
    Trying 67.19.97.178...
    Connected to 178.67-19-97.reverse.theplanet.com.
    Escape character is '^]'.
    GET /tgpweb/F14/b/index.htm HTTP/1.0
    Host: adevil.com

    HTTP/1.1 401 Authorization Required
    Date: Thu, 21 Oct 2004 12:31:15 GMT
    Server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a PHP-CGI/0.1b
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

Here's what a good spider should do:

    telnet adevil.com 80
    Trying 67.19.97.178...
    Connected to 178.67-19-97.reverse.theplanet.com.
    Escape character is '^]'.
    GET /tgpweb/F14/b/index.htm HTTP/1.0
    Host: adevil.com
    User-Agent: wGet

    HTTP/1.1 200 OK
    Date: Thu, 21 Oct 2004 12:21:05 GMT
    Server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a PHP-CGI/0.1b
    Last-Modified: Wed, 20 Oct 2004 12:48:14 GMT
    ETag: "28042d-31bb-41765e8e"
    Accept-Ranges: bytes
    Content-Length: 12731
    Connection: close
    Content-Type: text/html

Basically, the first request is missing the User-Agent: header and receives a 401. The second request, with the User-Agent: header, does return the content in question. I tried a few different requests and I am a little unsure exactly what he is blocking, but most User-Agents go through, so it appears he is only blocking blank User-Agents (the browser ID string).
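For anyone who would rather script the comparison than type it into telnet, here is a minimal Python sketch of the same two requests. The function names `build_request` and `status_line` are made up for illustration, and it assumes the server still answers plain HTTP on port 80; it is not what any particular link bot actually does.

```python
import socket
from typing import Optional


def build_request(host: str, path: str, user_agent: Optional[str] = None) -> bytes:
    """Build a bare HTTP/1.0 GET request like the ones typed into telnet."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    if user_agent is not None:
        # Only add the header when a value is supplied, so the
        # "blank User-Agent" case can be reproduced by passing None.
        lines.append(f"User-Agent: {user_agent}")
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_line(host: str, path: str, user_agent: Optional[str] = None,
                port: int = 80, timeout: float = 10.0) -> str:
    """Send the request and return only the first line of the response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, user_agent))
        data = b""
        # Read until the end of the status line (or until the server closes).
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("iso-8859-1")
```

Calling `status_line("adevil.com", "/tgpweb/F14/b/index.htm")` and then the same call with `"wGet"` as the third argument should show whether the two status lines differ, as they did in the telnet sessions.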
__________________
SnapReplay.com a different way to share photos - iPhone & Android |
#6 |
Subversive filth of the hedonistic decadent West
Join Date: Mar 2003
Location: Southeast Florida
Posts: 27,936
|
How can I see what User-Agent string my bots are sending?
|
#7 |
a.k.a. Sparky
Join Date: Sep 2004
Location: West Palm Beach, FL, USA
Posts: 2,396
|
You can have it spider a page where you have access to the raw logs. Then grep the weblogs for the page that you had it spider.
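The log check can be sketched in Python as well. This assumes the server writes Apache's "combined" log format, which records the User-Agent as the last quoted field (the plain "common" format does not log it at all); `agents_for_path` is a hypothetical helper name, and the log path in the usage note is only an example.

```python
import re
from typing import Iterable, List

# Apache "combined" format ends with: "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def agents_for_path(lines: Iterable[str], path: str) -> List[str]:
    """Return the User-Agent of every logged request for the given path."""
    agents = []
    for line in lines:
        match = COMBINED.search(line)
        if match and match.group("path") == path:
            # Apache logs a missing header as "-".
            agents.append(match.group("agent"))
    return agents
```

Usage would be something like `agents_for_path(open("access_log"), "/test-page.html")` after pointing the bot at that page; a result of `"-"` means the bot sent no User-Agent at all.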
You can also run grep -r "User-Agent:" * in the directory where the script resides.
You could run ettercap, tcpdump or some other packet capture program on the server to watch packets as it does a check. That might be a bit overwhelming, as a LOT of stuff might be going on at the same time, and it will normally require root access -- only for the insanely curious.
There are a bunch of other ways -- a CGI script that dumps the environment to a location on the server, or emails you the info when someone hits it, etc.
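In the same spirit as the "script that dumps the environment" idea, here is a rough Python sketch of a tiny web server that records the User-Agent of whatever hits it. It stands in for a CGI script rather than being one; `AgentLogger`, `seen_agents` and `start_server` are names invented for this example, and binding to 127.0.0.1 is only a safe default for trying it locally.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Optional

# Every User-Agent seen so far; None means the header was missing entirely.
seen_agents: List[Optional[str]] = []


class AgentLogger(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        agent = self.headers.get("User-Agent")
        seen_agents.append(agent)
        # Echo the value back so it can also be read from the response body.
        body = f"User-Agent: {agent!r}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the console quiet; seen_agents holds what we care about


def start_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the logger in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), AgentLogger)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To test a real bot you would have to start it on an address the bot can reach, submit that URL, and then look at `seen_agents`; an entry of `None` is the blank User-Agent case discussed above.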
#8 |
The Original Greenguy (Est'd 1996) & AVN HOF Member - I Crop Pics For Thumbs In My Sleep
|
Are you submitting it with the /index.htm at the end, or just http://www.adevil.com/tgpweb/F14/b/? The reason I ask is that there's no index.html page in that directory, and if the script/bot is thinking there should be one and not finding it, that may cause the 401 error.
Of course, I am not a tech guy.
#9 |
Screw you, guys. I'm going home.
|
Neither adevil.com nor adevilsamateurs.com domain load for me at all
#10 |
If something goes wrong at the plant, blame the guy who can't speak English
Join Date: Aug 2003
Location: Kent, UK
Posts: 33
|
I have given my server guy the list of sites that turned me down in my last set of uploads, and he has come back with the following, which may be of interest to anyone following this:
"These sites are requesting URL's on your domain with a "^" character in the heading. This character is prohibited for security reasons and can be exploited. It is usually included in poorly written scripts and unnecessary to normal web server trafficing. However I have rescinded this security directive server wide to confirm/deny this fact. Please confirm that these link trades are now permissible?"
It will now be interesting to see what happens with my next batch of free site submissions. With regard to .htm and .html, I tried both and that wasn't the problem. Both adevil.com domains should have the www prefix in front, e.g. http://www.adevil.com/ - they seem to load OK for me - hope they are OK for everyone else.
#11 |
a.k.a. Sparky
Join Date: Sep 2004
Location: West Palm Beach, FL, USA
Posts: 2,396
|
Personally, I don't think his argument holds water -- I demonstrated a GET request that didn't have a ^ in it. However, he does appear to have fixed the problem.

    telnet adevil.com 80
    Trying 67.19.97.178...
    Connected to 178.67-19-97.reverse.theplanet.com.
    Escape character is '^]'.
    GET /tgpweb/F14/b/index.htm HTTP/1.0
    Host: adevil.com

    HTTP/1.1 200 OK
    Date: Thu, 21 Oct 2004 18:17:53 GMT
    Server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a PHP-CGI/0.1b
    Last-Modified: Wed, 20 Oct 2004 12:48:14 GMT
    ETag: "28042d-31bb-41765e8e"
    Accept-Ranges: bytes
    Content-Length: 12731
    Connection: close
    Content-Type: text/htm

That same request, with no User-Agent header, failed earlier.