Greenguy's Board


Old 2004-07-10, 03:07 PM   #1
3xTom
They have the Internet on computers, now?
 
3xTom's Avatar
 
Join Date: Aug 2003
Location: @home
Posts: 141
Send a message via ICQ to 3xTom
Script To Spider My Server?

As you all know, if you have been doing this for a long time and have been lazy like me, you end up with thousands and thousands of gallery or site URLs on your server....

Well, I would like something that I can put on my server and, at the click of a button, have it spider the server for all the index pages that exist on it, then compile them into a nice HTML page so I can see everything I have out there...

Hope this makes sense? Does anyone know of anything that might be available?

Thanks
Tom
3xTom is offline   Reply With Quote
Old 2004-07-10, 03:45 PM   #2
Cleo
Subversive filth of the hedonistic decadent West
 
Cleo's Avatar
 
Join Date: Mar 2003
Location: Southeast Florida
Posts: 27,936
I would use something that searches a database.

Like this
http://www.ezscripting.co.uk/csvsearch/
__________________
Free Rides on Uber and Lyft
Uber Car: uberTzTerri
Lyft Car: TZ896289
Cleo is offline   Reply With Quote
Old 2004-07-10, 07:17 PM   #3
Chop Smith
Eighteen 'til I Die
 
Chop Smith's Avatar
 
Join Date: Apr 2003
Location: Mississippi
Posts: 2,168
Send a message via ICQ to Chop Smith
This might be what you are looking for

http://home.snafu.de/tilman/xenulink.html
__________________
Chop Smith is offline   Reply With Quote
Old 2004-07-10, 08:42 PM   #4
3xTom
You guys are close to what I'm looking for. I think Cleo is closer, but I'm not wanting to spider a site or a database...

Cleo, do you know if that script will pull the HTML pages? They aren't in a database, they are just on my server.....

I want to spider my entire server.... There are like 30 domain names, and because of years of building and submitting galleries there are literally millions of galleries on my server in each of the domains. So I want to spider my server for all the web pages that are on it. Then, if that isn't enough... lol, I want it to place everything into a nice format so that (1) I can see everything I have out there and (2) I could even take that one page that has all my galleries in it and submit it to search engines, etc.
3xTom is offline   Reply With Quote
Old 2004-07-10, 09:09 PM   #5
airdick
Shut up brain, or I'll stab you with a Q-tip!
 
Join Date: Aug 2003
Posts: 114
Here is a quick-and-ugly one-liner for bash:

host=www.yourdomain.com; find . -type f -name index.html | cut -c 3- | while IFS= read -r var; do echo "<a href=\"http://$host/$var\">$host/$var</a><br>"; done >> "/tmp/$host-indexpages.html"

You would have to run this from the root of each domain, so it's definitely not a run-once solution, but it will get the job done.
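
One way to get closer to a run-once job is to loop over every domain directory. This is only a sketch: it assumes each domain's document root is a directory named after the domain, all sitting under one parent directory (the thread never says how the server is laid out), and the function name list_index_links is made up for illustration.

```shell
# list_index_links PARENT_DIR
# Assumes PARENT_DIR holds one directory per domain, e.g. PARENT_DIR/foobar.com/
# (a hypothetical layout). Prints one HTML link per index.html found.
list_index_links() {
    parent=$1
    for docroot in "$parent"/*/; do
        # Skip the unexpanded pattern when PARENT_DIR has no subdirectories.
        [ -d "$docroot" ] || continue
        host=$(basename "$docroot")
        # cd in a subshell so find prints paths relative to the docroot.
        (cd "$docroot" && find . -type f -name index.html) |
            sed 's|^\./||' |
            while IFS= read -r page; do
                printf '<a href="http://%s/%s">%s/%s</a><br>\n' \
                    "$host" "$page" "$host" "$page"
            done
    done
}
```

Something like `list_index_links /home/websites > /tmp/indexpages.html` would then write all the links to a single page. Directory names that are not real hostnames would still get links, so check the output before using it.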
airdick is offline   Reply With Quote
Old 2004-07-10, 09:11 PM   #6
Cleo
The one that I'm using only does databases. I found it over at www.hotscripts.com, and I seem to remember seeing ones that do what you want, so you may want to do a search over there.
Cleo is offline   Reply With Quote
Old 2004-07-10, 09:15 PM   #7
Entreri
WHO IS FONZY!?! Don't they teach you anything at school?
 
Join Date: Feb 2004
Posts: 42
Quote:
Originally posted by 3xlinks
You guys are close to what I'm looking for. I think Cleo is closer, but I'm not wanting to spider a site or a database...

Cleo, do you know if that script will pull the HTML pages? They aren't in a database, they are just on my server.....

I want to spider my entire server.... There are like 30 domain names, and because of years of building and submitting galleries there are literally millions of galleries on my server in each of the domains. So I want to spider my server for all the web pages that are on it. Then, if that isn't enough... lol, I want it to place everything into a nice format so that (1) I can see everything I have out there and (2) I could even take that one page that has all my galleries in it and submit it to search engines, etc.
If all your files are on a single server, you could write a small script to create an inventory of all the index pages (or whatever you're looking for) that you have and generate a report. (If you have many servers, then you'll have to run the script on each of them.)

You won't be able to find all the URLs black-box style by spidering your web sites, because you don't necessarily have links going to *all* your pages. You'll get the full picture by walking the directories themselves on the server.

Entreri.
Entreri is offline   Reply With Quote
Old 2004-07-10, 10:52 PM   #8
3xTom
Quote:
Originally posted by Entreri
If all your files are on a single server, you could write a small script to create an inventory of all the index pages (or whatever you're looking for)
That's exactly what I'm looking for.
3xTom is offline   Reply With Quote
Old 2004-07-10, 11:03 PM   #9
Entreri
Quote:
Originally posted by 3xlinks
That's exactly what I'm looking for.
Then I'm sure Cleo can give you a shell snippet to do just that. If you want something more sophisticated, a Perl script should do the trick.

Entreri.
Entreri is offline   Reply With Quote
Old 2004-07-10, 11:17 PM   #10
Cleo
You could open a shell and type

find / -name "*.html" -print
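
Starting from / walks the whole filesystem, prints permission errors for directories you cannot read, and would also match a directory that happens to be named something.html. A narrower variant could look like the sketch below. The function name find_html_pages is invented here, and the directory to search is left as an argument because the thread does not say where the sites live.

```shell
# find_html_pages ROOT
# Lists regular files ending in .html or .htm under ROOT, sorted.
# ROOT is whichever directory holds your sites (an assumption: pass it in
# rather than scanning from /). Errors such as "Permission denied" are hidden.
find_html_pages() {
    find "$1" -type f \( -name '*.html' -o -name '*.htm' \) -print 2>/dev/null | sort
}
```

For example, `find_html_pages /home/websites > pages.txt` would save the list to a file, with /home/websites standing in for your real web root.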
Cleo is offline   Reply With Quote
Old 2004-07-10, 11:37 PM   #11
Entreri
Quote:
Originally posted by 3xlinks
That's exactly what I'm looking for.
I didn't see it initially but airdick's bash snippet above fits your primary need. If you want better reporting or automation, then you'd need to have somebody write a script for your requirements.

Entreri.
Entreri is offline   Reply With Quote
Old 2004-07-10, 11:41 PM   #12
3xTom
Thanks to everyone, but for someone who knows very little code, none of this helps me at all...

So I guess a custom script is the answer....
3xTom is offline   Reply With Quote
Old 2004-07-11, 02:10 AM   #13
Bunnyhop
Lord help me, I'm just not that bright
 
Join Date: Jun 2004
Posts: 106
The problem with searching directly on the server is that the script will only find the index pages as paths within the server's directory structure... it won't know the domain setup. Of course, one could add to the script the ability to parse your web server settings to try to translate all that.

If your structure is relatively simple, though, you could get around it using Cleo's suggestion:

find / -name "index.html" -print

or

find / -name "index.html" -print > filenames.txt

to have the results put into a file you can download.

Then use a global search/replace on the file in your favorite word processor, changing the server-relative directories to the URL equivalent. For example, if you had

/home/websites/foobar.com/site1/index.html
/home/websites/foobar.com/site2/index.html

Replace /home/websites/foobar.com/
with
http://foobar.com/

to become

http://foobar.com/site1/index.html
http://foobar.com/site2/index.html
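
The same search/replace can be done on the server with sed instead of a word processor. A minimal sketch, reusing the example paths above; the function name paths_to_urls is made up, and it assumes neither argument contains a "|" character or regex-special characters other than dots.

```shell
# paths_to_urls PREFIX BASE_URL
# Reads file paths on stdin and replaces a leading PREFIX with BASE_URL.
# Assumes the arguments contain no "|" and no regex-special characters
# other than ".", which is escaped below.
paths_to_urls() {
    # Escape dots so "foobar.com" matches literally rather than any character.
    prefix=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed "s|^$prefix|$2|"
}
```

It could be fed straight from find, for instance `find / -name "index.html" -print 2>/dev/null | paths_to_urls /home/websites/foobar.com/ http://foobar.com/ > urls.txt`. Paths that do not start with the prefix pass through unchanged, so with several domains you would run it once per domain.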
Bunnyhop is offline   Reply With Quote
Old 2004-07-11, 10:40 AM   #14
SirMoby
Jim? I heard he's a dirty pornographer.
 
SirMoby's Avatar
 
Join Date: Aug 2003
Location: Washington, DC
Posts: 2,706
Hey Tom,

I had a custom script done a couple of years ago. The script reads every directory and searches for index.html. Then it writes everything to a new HTML page, creating a URL for each page and using the description for the link text.

Basically it creates a list of all the index.html pages that I have. I've only used it on a single domain at a time, so I'll have to look at it to see if it can handle complete servers, but it works for me.
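
This is not SirMoby's script, which is not shown in the thread, but the idea could be sketched roughly as follows. It guesses that "the description" means the page's meta description tag, and it assumes that tag is written in lowercase on one line with double-quoted values. The function names page_description and index_links are invented for this example.

```shell
# page_description FILE
# Prints the content of the first <meta name="description" ...> tag in FILE,
# or nothing if there is none. Assumes the tag is lowercase, on one line,
# with name before content and double-quoted values; real pages may differ.
page_description() {
    sed -n 's/.*<meta name="description" content="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# index_links DOCROOT HOST
# Prints one link per index.html under DOCROOT, using the description as the
# link text and falling back to the URL when no description is found.
index_links() {
    (cd "$1" && find . -type f -name index.html) | sed 's|^\./||' |
        while IFS= read -r page; do
            url="http://$2/$page"
            text=$(page_description "$1/$page")
            printf '<a href="%s">%s</a><br>\n' "$url" "${text:-$url}"
        done
}
```

Running `index_links /home/websites/foobar.com foobar.com > links.html` would handle one domain at a time, much like the script described above. The sketch does not HTML-escape the description text, so treat the output as a draft to review.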

Hit me up on ICQ this afternoon. I'm taking my girls out to brunch soon but I'll be working later.
SirMoby is offline   Reply With Quote
All times are GMT -4.

