Greenguy's Board

Script To Spider My Server? (http://www.greenguysboard.com/board/showthread.php?t=9160)

3xTom 2004-07-10 03:07 PM

Script To Spider My Server?
 
As you all know, if you have been doing this for a long time and have been lazy like me, then you have thousands and thousands of gallery or site URLs on your server...

Well, I would like something that I can put on my server and, at the click of a button, have it spider the server for all the index pages that exist on it, then compile them into a nice HTML page so I can see everything I have out there...

Hope this makes sense. Does anyone know of anything that might be available?

Thanks
Tom

Cleo 2004-07-10 03:45 PM

I would use something that searches a database.

Like this
http://www.ezscripting.co.uk/csvsearch/

Chop Smith 2004-07-10 07:17 PM

This might be what you are looking for

http://home.snafu.de/tilman/xenulink.html

3xTom 2004-07-10 08:42 PM

You guys are close to what I'm looking for. I think Cleo is closer, but I'm not wanting to spider a site or a database...

Cleo, do you know if that script will pull the HTML pages? They aren't in a database; they are just on my server...

I want to spider my entire server. There are about 30 domain names, and after years of building and submitting galleries there are literally millions of galleries on my server in each of the domains. So I want to spider my server for all the web pages that are on it. Then, if that isn't enough (lol), I want it to place everything into a nice format so that (1) I can see everything I have out there and (2) I could even take that one page with all my galleries on it and submit it to search engines, etc.

airdick 2004-07-10 09:09 PM

Here is a quick-and-ugly one-liner for bash:

host=www.yourdomain.com; find . -type f -name index.html | cut -c 3- | while read -r var; do echo "<a href=\"http://$host/$var\">$host/$var</a><br>"; done >> "/tmp/$host-indexpages.html"

You would have to run this from the root of each domain, so it's definitely not a run-once solution, but it will get the job done.
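
For anyone who would rather not repeat that per domain, here is a minimal sketch that wraps the same find/cut idea in a loop. It assumes (this is not stated in the thread) that every domain's web root is a directory named after the domain under one common parent, such as /home/websites/foobar.com. The function name list_index_pages is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list every index.html under each domain directory as an HTML link.
# ASSUMPTION: each subdirectory of the base directory is named after its
# domain (e.g. /home/websites/foobar.com) and is that domain's web root.
list_index_pages() {
    local base=$1 docroot host page
    for docroot in "$base"/*/; do
        [ -d "$docroot" ] || continue          # skip if the glob matched nothing
        host=$(basename "$docroot")            # directory name doubles as host name
        # Print paths relative to the web root, one per line, in sorted order.
        (cd "$docroot" && find . -type f -name index.html | cut -c 3- | sort) |
        while IFS= read -r page; do
            printf '<a href="http://%s/%s">%s/%s</a><br>\n' \
                "$host" "$page" "$host" "$page"
        done
    done
}
```

Something like `list_index_pages /home/websites > /tmp/indexpages.html` would then put all the links in one file. If the directory names do not match the domain names, the host part of each link will be wrong and needs a lookup table instead.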

Cleo 2004-07-10 09:11 PM

The one that I'm using only does databases. I found it over at www.hotscripts.com, and I seem to remember seeing ones that do what you want, so you may want to do a search over there.

Entreri 2004-07-10 09:15 PM

Quote:

Originally posted by 3xlinks
You guys are close to what I'm looking for. I think Cleo is closer, but I'm not wanting to spider a site or a database...

Cleo, do you know if that script will pull the HTML pages? They aren't in a database; they are just on my server...

I want to spider my entire server. There are about 30 domain names, and after years of building and submitting galleries there are literally millions of galleries on my server in each of the domains. So I want to spider my server for all the web pages that are on it. Then, if that isn't enough (lol), I want it to place everything into a nice format so that (1) I can see everything I have out there and (2) I could even take that one page with all my galleries on it and submit it to search engines, etc.

If all your files are on a single server, you could write a small script to create an inventory of all the index pages (or whatever you're looking for) that you have and generate a report. (If you have many servers, then you'll have to run the script on all of them.)

You won't be able to find all the URLs black-box style by spidering your web sites, because you don't necessarily have links going to *all* your pages. You'll get the full picture by walking the directories themselves on the server.

Entreri.

3xTom 2004-07-10 10:52 PM

Quote:

Originally posted by Entreri
If all your files are on a single server, you could write a small script to create an inventory of all the index pages (or whatever you're looking for)

That's exactly what I'm looking for.

Entreri 2004-07-10 11:03 PM

Quote:

Originally posted by 3xlinks
That's exactly what I'm looking for.
Then I'm sure Cleo can give you a shell snippet to do just that. ;) If you want something more sophisticated, a Perl script should do the trick.

Entreri.

Cleo 2004-07-10 11:17 PM

You could open a shell and type

find / -name "*.html" -print

Entreri 2004-07-10 11:37 PM

Quote:

Originally posted by 3xlinks
That's exactly what I'm looking for.
I didn't see it initially but airdick's bash snippet above fits your primary need. If you want better reporting or automation, then you'd need to have somebody write a script for your requirements.

Entreri.

3xTom 2004-07-10 11:41 PM

Thanks to everyone, but for someone who knows very little code, none of this helps me at all... :(

So I guess a custom script is the answer...

Bunnyhop 2004-07-11 02:10 AM

The problem with doing searches directly on the server is that the script will only find the index pages relative to the server's directory structure; it won't know the domain setup. Of course, one could add to the script the ability to parse your web server settings to try to translate all that.
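
As a rough illustration of that "parse your web server settings" idea, the sketch below pulls directory-to-domain pairs out of a config file. The thread never names the web server, so this assumes an Apache-style config where each site has ServerName and DocumentRoot lines inside a VirtualHost block. The function name docroot_map is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print "document-root<TAB>domain" for each virtual host.
# ASSUMPTION: an Apache-style config file with ServerName and DocumentRoot
# directives inside <VirtualHost> ... </VirtualHost> blocks.
docroot_map() {
    awk '
        tolower($1) == "<virtualhost"   { name = ""; root = "" }   # new block
        tolower($1) == "servername"     { name = $2 }
        tolower($1) == "documentroot"   { root = $2; gsub(/"/, "", root) }
        tolower($1) == "</virtualhost>" && name != "" && root != "" {
            print root "\t" name                                   # block done
        }
    ' "$1"
}
```

Running `docroot_map` on the server's config file (its location varies by install) would give the list of prefixes to swap for domain names. ServerAlias lines and included config files are ignored here.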

If your structure is relatively simple, though, you could get around it using Cleo's suggestion:

find / -name "index.html" -print

or

find / -name "index.html" -print > filenames.txt

to have the results put into a file you can download.

Then use a global search/replace on the file in your favorite word processor, changing the server-relative directories to the URL equivalent. For example, if you had

/home/websites/foobar.com/site1/index.html
/home/websites/foobar.com/site2/index.html

Replace /home/websites/foobar.com/
with
http://foobar.com/

to become

http://foobar.com/site1/index.html
http://foobar.com/site2/index.html
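
The same search/replace can be done on the server without downloading the file. Here is a minimal sketch using the example paths above; the function name paths_to_urls is made up for illustration, and the prefix and URL are passed in as arguments rather than detected.

```shell
#!/usr/bin/env bash
# Sketch: swap a leading directory prefix for a URL prefix on every input line.
# ASSUMPTION: every file under the prefix is served from the given URL.
paths_to_urls() {
    local prefix=$1 url=$2 path
    while IFS= read -r path; do
        case $path in
            # Quoting "$prefix" makes it a literal match, not a pattern.
            "$prefix"*) printf '%s%s\n' "$url" "${path#"$prefix"}" ;;
            *)          printf '%s\n' "$path" ;;   # leave other lines untouched
        esac
    done
}
```

For example: `paths_to_urls /home/websites/foobar.com/ http://foobar.com/ < filenames.txt > urls.txt`. It has to be run once per domain prefix.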

SirMoby 2004-07-11 10:40 AM

Hey Tom,

I had a custom script done a couple of years ago. The script reads every directory and searches for index.html. Then it writes everything to a new HTML page, creating a URL for each page and using the description for the text link.

Basically it creates a list of all the index.html pages that I have. I've only used it for single domains at a time, so I'll have to look at it to see if it can handle complete servers, but it works for me.

Hit me up on ICQ this afternoon. I'm taking my girls out to brunch soon but I'll be working later.
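
SirMoby's script is not shown in the thread, so the following is only a guess at the "description as link text" step, not his code. It assumes "description" means a meta description tag written on one line, in lowercase, with double-quoted attributes in name-then-content order. The function name link_for_page is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print one HTML link for a page, using its description as the text.
# ASSUMPTION: the description is a tag of the exact form
#   <meta name="description" content="...">
# on a single line; anything else falls back to the URL as link text.
link_for_page() {
    local file=$1 url=$2 desc
    # Capture the content="..." value of the first matching meta tag, if any.
    desc=$(sed -n 's/.*<meta name="description" content="\([^"]*\)".*/\1/p' "$file" | head -n 1)
    printf '<a href="%s">%s</a><br>\n' "$url" "${desc:-$url}"
}
```

Called once per index page with the file path and its URL, this produces the same kind of link list as the one-liner earlier in the thread, with friendlier link text where a description exists.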


All times are GMT -4.

© Greenguy Marketing Inc