#1 |
They have the Internet on computers, now?
Script To Spider My Server?
As you all know, if you have been doing this for a long time and you have been lazy like me, then you have a server full of thousands and thousands of gallery or site urls....
Well, I would like to see or have something that you can put on your server and click a button to spider the server for all the index pages that exist on it, then compile them into a nice html page so you can see everything you have out there... Hope this makes sense? Anyone know of anything that might be available? Thanks, Tom
__________________
Webmasters Start Making Big Time Money Here
#2 |
Subversive filth of the hedonistic decadent West
Join Date: Mar 2003
Location: Southeast Florida
Posts: 27,936
#3 |
Eighteen 'til I Die
#4 |
They have the Internet on computers, now?
You guys are close to what I'm looking for. I think Cleo is closer, but I'm not wanting to spider a site or a database...
Cleo, do you know if that script will pull the html pages? They aren't in a database, they are just on my server..... I want to spider my entire server.... There are about 30 domain names, and after years of building and submitting galleries there are literally millions of galleries on the server across those domains. So I want to spider the server for all the webpages that are on it, and then, if that isn't enough... lol, I want it to put everything into a nice format so that (1) I can see what all I have out there, and (2) I could even take that one page that has all my galleries in it and submit it to search engines, etc. etc....
#5 |
Shut up brain, or I'll stab you with a Q-tip!
Join Date: Aug 2003
Posts: 114
Here is a quick-and-ugly one-liner for bash:
host=www.yourdomain.com; for var in `find . -type f -name index.html | cut -c 3-`; do echo "<a href=\"http://$host/$var\">$host/$var</a><br>" >> /tmp/$host-indexpages.html; done
You would have to run this from the root of each domain, so it's definitely not a run-once solution, but it will get the job done.
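If you don't want to cd into each domain root by hand, the same idea can be wrapped in a loop over domain directories. This is only a sketch under one big assumption: that each domain lives in a directory named after its hostname under a single webroot (e.g. /var/www/example.com), which may not match your server's layout at all.

```shell
# Sketch: loop the one-liner above over every domain directory.
# ASSUMPTION: directories under <webroot> are named after their hostnames.
list_index_pages() {
    webroot=$1 out=$2
    : > "$out"                          # start the report file empty
    for dir in "$webroot"/*/; do
        host=$(basename "$dir")         # directory name doubles as hostname
        find "$dir" -type f -name index.html | while read -r f; do
            rel=${f#"$dir"}             # path relative to the domain root
            printf '<a href="http://%s/%s">%s/%s</a><br>\n' \
                "$host" "$rel" "$host" "$rel" >> "$out"
        done
    done
}
```

Run as, say, `list_index_pages /var/www /tmp/all-indexpages.html` and you get one link page covering every domain in one pass.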
#6 |
Subversive filth of the hedonistic decadent West
Join Date: Mar 2003
Location: Southeast Florida
Posts: 27,936
The one that I'm using only does databases. I found it over at www.hotscripts.com, and I seem to remember seeing ones that do what you want, so you may want to do a search over there.
#7
WHO IS FONZY!?! Don't they teach you anything at school?
Join Date: Feb 2004
Posts: 42
Quote:
You won't be able to spider all the URLs black-box style by crawling your web sites, because you don't necessarily have links going to *all* your pages. You'll only get a full picture by spidering the directories themselves on the server. Entreri.
#8
They have the Internet on computers, now?
Quote:
#9
WHO IS FONZY!?! Don't they teach you anything at school?
Join Date: Feb 2004
Posts: 42
Quote:
Entreri.
#10 |
Subversive filth of the hedonistic decadent West
Join Date: Mar 2003
Location: Southeast Florida
Posts: 27,936
You could open a shell and type
find / -name "*.html" -print
#11
WHO IS FONZY!?! Don't they teach you anything at school?
Join Date: Feb 2004
Posts: 42
Quote:
Entreri.
#12 |
They have the Internet on computers, now?
Thanks to everyone, but for someone who knows very little code, none of this helps me at all...
So I guess a custom script is the answer....
#13 |
Lord help me, I'm just not that bright
Join Date: Jun 2004
Posts: 106
The problem with doing searches directly on the server is that the script will only be able to find the index pages relative to the server's directory structure... it won't know the domain setup. Of course, one could add to the script the ability to parse your web server settings to try to translate all that.
If your structure is relatively simple, though, you can get around it using Cleo's suggestion:
find / -name "index.html" -print
or, to have the results put into a file you can download:
find / -name "index.html" -print > filenames.txt
Then use a global search/replace on the file in your favorite word processor, changing the server-relative directories to their url equivalents. Example: if you had
/home/websites/foobar.com/site1/index.html
/home/websites/foobar.com/site2/index.html
replace /home/websites/foobar.com/ with http://foobar.com/ to become
http://foobar.com/site1/index.html
http://foobar.com/site2/index.html
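If you'd rather skip the word processor, that same search/replace can be done with sed right on the server. A small sketch — paths_to_urls is a made-up helper name, and the paths and domain are just the example values from the post above:

```shell
# Translate server-relative paths to URLs, sed edition.
# $1 = document root on disk, $2 = domain name; reads paths on stdin.
paths_to_urls() {
    sed "s|^$1/|http://$2/|"
}

# Example usage, with the foobar.com layout from the post:
#   find /home/websites/foobar.com -name "index.html" -print |
#       paths_to_urls /home/websites/foobar.com foobar.com > urls.txt
```

The `|` delimiter in the sed expression is what lets the slashes in the paths pass through without escaping.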
#14 |
Jim? I heard he's a dirty pornographer.
Join Date: Aug 2003
Location: Washington, DC
Posts: 2,706
Hey Tom,
I had a custom script done a couple of years ago. The script reads every directory and searches for index.html, then writes everything to a new html page, creating a url and using the description for the text link. Basically it creates a list of all the index.html pages that I have. I've only used it on single domains, one at a time, so I'll have to look at it to see if it can handle complete servers, but it works for me. Hit me up on ICQ this afternoon. I'm taking my girls out to brunch soon, but I'll be working later.
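For anyone who wants to roll their own, something along the lines of the script described above can be sketched in a few lines of shell. This is only a guess at how such a script might work — make_link_page is a made-up name, and using each page's <title> as the link description is an assumption, not the actual script:

```shell
# Sketch: walk one domain's docroot, find every index.html, and build a
# link page using each page's <title> as the link text.
# ASSUMPTION: pages have a <title> tag on a single line; fall back to the
# relative path when none is found.
make_link_page() {
    docroot=$1 domain=$2 out=$3
    printf '<html><body>\n' > "$out"
    find "$docroot" -type f -name index.html | while read -r f; do
        rel=${f#"$docroot"/}            # path relative to the docroot
        title=$(grep -o '<title>[^<]*</title>' "$f" | head -n 1 |
                sed 's|<title>||;s|</title>||')
        [ -n "$title" ] || title="$domain/$rel"
        printf '<a href="http://%s/%s">%s</a><br>\n' \
            "$domain" "$rel" "$title" >> "$out"
    done
    printf '</body></html>\n' >> "$out"
}
```

Called as, say, `make_link_page /home/websites/foobar.com foobar.com gallerylist.html`, it would produce one submit-ready page of links for that domain.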