Greenguy's Board > Blogs and Blogging
2008-12-08, 08:52 PM   #15
cd34
a.k.a. Sparky
Join Date: Sep 2004
Location: West Palm Beach, FL, USA
Posts: 2,396
Any site ripper would work.

If you're comfortable with wget, you could do something like:

wget -r -l 99 http://site.com/

You might also add -k (--convert-links) to the command line, which rewrites the links in the downloaded pages so relative URLs resolve locally. (-F only matters when you feed wget an HTML file via -i; pairing it with -B supplies the <base href> used to resolve that file's relative URLs.)

--limit-rate=300k caps the fetch at 300 KB/second. If you do one site at a time, then depending on the page composition, that might be enough to keep from killing the server when it spiders everything.
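Putting the flags above together, one polite mirror run per site might look like this (site.com is a placeholder, and the 1-second -w pause between requests is my own addition, not something required):

```shell
# -r -l 99           recurse, up to 99 levels deep
# --limit-rate=300k  cap the transfer at roughly 300 KB/second
# -w 1               wait 1 second between requests (assumption: extra politeness)
wget -r -l 99 --limit-rate=300k -w 1 http://site.com/
```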

It will put everything in a directory structure which you can shuffle around. A page that isn't linked to won't get spidered, but you can supply an input file of URLs, so if you have a sitemap, you could parse it to pull in the pages that aren't internally crosslinked. With WordPress, I don't think this is a problem.
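The sitemap idea above can be sketched like this. The sample sitemap.xml is made up for illustration (a real WordPress sitemap has the same `<loc>` entries), and the final wget line is commented out so the sketch runs without touching the network:

```shell
# Assumption: a sitemap.xml in the usual sitemaps.org format, e.g.:
cat > sitemap.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://site.com/</loc></url>
  <url><loc>http://site.com/orphan-page/</loc></url>
</urlset>
EOF

# Pull every <loc> URL out of the sitemap into a flat list.
grep -o '<loc>[^<]*</loc>' sitemap.xml | sed 's/<[^>]*>//g' > urls.txt

# Then feed the list to wget with -i so orphan pages get fetched too:
# wget -r -l 99 --limit-rate=300k -i urls.txt
```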
__________________
SnapReplay.com a different way to share photos - iPhone & Android
 

Powered by vBulletin® Version 3.8.1
Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
© Greenguy Marketing Inc