Old 2008-12-08, 09:52 PM   #1
cd34
a.k.a. Sparky
 
Join Date: Sep 2004
Location: West Palm Beach, FL, USA
Posts: 2,396
Any site ripper would work.

if you're comfortable with wget, you could do something like:

wget -r -l 99 http://site.com/

You MIGHT try it without -F first; adding -F tells wget to treat an input file as HTML, and relative URLs in it get resolved against a <base href> (or the --base option).
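
For example, if you had a local HTML page of links saved somewhere, something along these lines might work (urls.html is just a made-up file name here):

wget -r -l 99 -F --base=http://site.com/ -i urls.html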

--limit-rate=300k would limit the fetch to roughly 300 KB/second. If you do one site at a time, then depending on the page composition, that might be enough to keep from killing the server when it spiders everything.
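
Putting the pieces together, something like this (site.com is a placeholder; --wait=1 adds a one-second pause between requests, which is optional but a bit kinder to the server):

wget -r -l 99 --limit-rate=300k --wait=1 http://site.com/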

It will put everything in a directory structure that you can shuffle around. If a page isn't linked to from anywhere, it won't get spidered, but you can supply an input file of URLs, so if you have a sitemap you could parse it to pull in the pages that aren't internally cross-linked. With WordPress, I don't think that's a problem.
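
If you ever do need the sitemap trick, here's a rough sketch (site.com and urls.txt are just placeholders, and it assumes a plain sitemap.xml with <loc> entries):

wget -qO- http://site.com/sitemap.xml | grep -o '<loc>[^<]*' | sed 's/<loc>//' > urls.txt
wget -i urls.txt --limit-rate=300k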
__________________
SnapReplay.com a different way to share photos - iPhone & Android
Old 2008-12-08, 10:00 PM   #2
walrus
Oh no, I'm sweating like Roger Ebert
 
Join Date: May 2005
Location: Los Angeles
Posts: 1,773
I'll give it a shot, but probably not for a couple of days. I would love to have this work; the sites might not be dynamic, but at least they would be alive.
__________________
Naked Girlfriend Porn TGP
free partner account