Any site ripper would work.

If you're comfortable with wget, you could do something like:

wget -r -l 99 http://site.com/

You might also try adding -k (--convert-links), which rewrites the links in the downloaded pages so relative URLs resolve for local viewing. (-F/--force-html only applies when reading URLs from an input file with -i, where pairing it with -B/--base=URL acts as if a <base href> were present.)

Adding --limit-rate=300k would cap the fetch at 300 KB/second. If you do one site at a time, depending on the page composition, that might be enough to keep from killing the server while it spiders everything.
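
Putting those flags together, a sketch of the full invocation (site.com and the 300k cap are placeholders to adjust for your case):

# mirror recursively to depth 99, rewrite links for local viewing, cap bandwidth
wget -r -l 99 -k --limit-rate=300k http://site.com/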

It will save everything in a directory structure mirroring the site, which you can shuffle around afterward. A page that nothing links to won't get spidered this way, but you can supply an input file of URLs with -i, so if you have a sitemap, you could parse it to pull the pages that aren't internally crosslinked. With WordPress, I don't think this is a problem.
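
As a rough sketch of the sitemap route, assuming a standard sitemap.xml with one <loc> entry per line (the sed pattern is a quick-and-dirty parse, not a real XML parser):

# extract the URL list from the sitemap
wget -qO- http://site.com/sitemap.xml | sed -n 's:.*<loc>\(.*\)</loc>.*:\1:p' > urls.txt
# fetch each URL, recreating the directory structure, with the same rate cap
wget -i urls.txt -x --limit-rate=300k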