Script To Spider My Server?
As you all know, if you have been doing this for a long time and you have been lazy like me, then you have a server full of thousands and thousands of gallery and site URLs....
Well, I would like to have something that you can put on your server and click a button to spider the server for all the index pages that exist on it, then compile them into a nice HTML page so you can see everything you have out there... Hope this makes sense? Anyone know of anything that might be available? Thanks, Tom
You guys are close to what I'm looking for. I think cleo is closer, but I'm not wanting to spider a site or a database...
cleo, do you know if that script will pull the HTML pages? They aren't in a database, they are just on my server. I want to spider my entire server. There are like 30 domain names, and because of years of building and submitting galleries there are literally millions of galleries on my server across the domains. So I want to spider my server for all the web pages that are on it, and then, if that isn't enough...lol, I want it to place everything into a nice format so that (1) I can see what all I have out there and (2) I could even take that one page that has all my galleries on it and submit it to search engines, etc.
Here is a quick-and-ugly one-liner for bash:
host=www.yourdomain.com; for var in $(find . -type f -name index.html | cut -c 3-); do echo "<a href=\"http://$host/$var\">$host/$var</a><br>" >> /tmp/$host-indexpages.html; done

You would have to run this from the root of each domain, so it's definitely not a run-once solution, but it will get the job done.
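If all your domains happen to live under one directory on the box (just an assumption about the layout; swap /home/websites for whatever your server actually uses), you could extend the same idea to walk every docroot in one pass, something like:

#!/bin/bash
# Rough sketch -- assumes each domain's docroot is /home/websites/<domain>/
out=/tmp/all-indexpages.html
> "$out"
for docroot in /home/websites/*/; do
    host=$(basename "$docroot")
    # list every index.html under this docroot and turn it into a link
    find "$docroot" -type f -name index.html | while read -r page; do
        url="http://$host/${page#$docroot}"
        echo "<a href=\"$url\">$url</a><br>" >> "$out"
    done
done

That would leave one page in /tmp with a link to every index page on every domain, which sounds like what Tom is after.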
The one that I'm using only does databases. I found it over at www.hotscripts.com, and I seem to remember seeing ones that do what you want, so you may want to do a search over there.
You won't be able to get all the URLs black-box style by spidering your web sites, because you don't necessarily have links going to *all* your pages. You'll only get a full picture by walking the directories on the server itself. Entreri.
You could open a shell and type
find / -name "*.html" -print
Thanks to everyone, but for someone who knows very little code, none of this helps me at all... :(
So I guess a custom script is the answer....
The problem with doing searches directly on the server is that the script will only be able to find the index pages relative to the server's directory structure...it won't know the domain setup. Of course, one could add to the script the ability to parse your web server settings to try to translate all that.
If your structure is relatively simple, though, you could get around it using Cleo's suggestion:

find / -name "index.html" -print

or

find / -name "index.html" -print > filenames.txt

to have the results put into a file you can download. Then use a global search/replace on the file in your favorite word processor, changing the server-relative directories to the URL equivalent. For example, if you had

/home/websites/foobar.com/site1/index.html
/home/websites/foobar.com/site2/index.html

replace /home/websites/foobar.com/ with http://foobar.com/ to get

http://foobar.com/site1/index.html
http://foobar.com/site2/index.html
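If you'd rather skip the word processor step, the same global replace could be done right on the server with sed (using the example paths from above; swap in your own):

sed 's|^/home/websites/foobar\.com|http://foobar.com|' filenames.txt > urls.txt

urls.txt would then have one finished URL per line, ready to paste into a page or a submit list.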
Hey Tom,
I had a custom script done a couple of years ago. The script reads every directory and searches for index.html, then writes everything to a new HTML page, creating a URL and using the description for the text link. Basically it creates a list of all the index.html pages that I have. I've only used it on single domains one at a time, so I'll have to look at it to see if it can handle complete servers, but it works for me. Hit me up on ICQ this afternoon. I'm taking my girls out to brunch soon, but I'll be working later.
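For anyone curious what a script like that might look like, here's a rough sketch in bash (not the actual script, just a guess at the approach; the docroot and domain below are placeholders) that pulls each page's <title> to use as the link text:

#!/bin/bash
# Hypothetical sketch of the idea above -- adjust docroot/domain to your setup
docroot=/home/websites/foobar.com
domain=foobar.com
out=/tmp/$domain-galleries.html

echo "<html><body>" > "$out"
find "$docroot" -type f -name index.html | while read -r page; do
    # use the page's <title> as the link text, falling back to the URL
    title=$(sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$page" | head -1)
    url="http://$domain${page#$docroot}"
    echo "<a href=\"$url\">${title:-$url}</a><br>" >> "$out"
done
echo "</body></html>" >> "$out"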