Have you ever seen
a shareware or freeware program that looked nice, but you thought you wouldn't
have any use for it? I did too, and I kick myself for not using it years
and years ago. The program is Page Sucker. It's like TiVo for the internet.
Page Sucker is a page ripper. That
is, it downloads entire web sites to be viewed at a later time. My guess
is that it was designed for those who have to pay long distance for a dial-up
connection. Instead of having the user pay for each minute he or she is online,
Page Sucker retrieves all the content at once and thus cuts down on long
distance charges. Then the user can view the pages offline without having
to pay more money to the phone company. It's also speedy: when you browse
normally, your computer has to not only fetch everything from the site you're
at but also render it on your screen. A page ripper just writes the raw data
straight to your hard drive, so no time is spent interpreting and displaying
the data as it comes in.
I've found Page Sucker to be much, much more valuable than that, though. I have gone into many of my favorite sites and grabbed only what I needed, thanks to Page Sucker's filters. Say I want to download sounds from MST3K (Mystery Science Theater 3000). I don't want pictures or HTML pages, just the sounds. I can select just WAV files and AIFF files, and have only those saved. I recently went to the Star Wars web site to get pictures, just JPEGs, and downloaded over 200 MB of them! The only drawback to ripping a site is the bandwidth it uses. When you rip a whole site, or even just the major file types the site is known for (e.g., MP3s on mp3.com, PDFs on Seagate), you're slowing down the server end a bit. There are some articles on the net that show how to peel a site rather than rip it, so look around on the web and you'll see how to do this. With some sites it's more efficient to select a small number of threads (sort of like the number of hands grabbing items from a web site at once) as opposed to many, which may choke the web site's speed.
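To make the filter and thread ideas concrete, here is a minimal Python sketch. It is not how Page Sucker is implemented; the example.com URLs, the function names, and the `fake_fetch` stand-in (which only returns a file name instead of downloading anything) are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
import posixpath

# Assumed filter set: the sound formats mentioned above.
SOUND_EXTENSIONS = {".wav", ".aiff", ".aif"}

def matches_filter(url, allowed_extensions):
    """Return True if the URL's path ends in one of the allowed extensions."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in allowed_extensions

def filter_links(urls, allowed_extensions):
    """Keep only the links whose file type passes the filter."""
    return [u for u in urls if matches_filter(u, allowed_extensions)]

def fake_fetch(url):
    """Stand-in for a real download: just report the file name."""
    return posixpath.basename(urlparse(url).path)

# Hypothetical links found on a page.
links = [
    "http://www.example.com/mst3k/index.html",
    "http://www.example.com/mst3k/sounds/crow.wav",
    "http://www.example.com/mst3k/sounds/tom.AIFF",
    "http://www.example.com/mst3k/pics/bots.jpg",
]

sound_links = filter_links(links, SOUND_EXTENSIONS)

# A small pool (two "hands") is gentler on the server than a large one.
with ThreadPoolExecutor(max_workers=2) as pool:
    saved = list(pool.map(fake_fetch, sound_links))
```

Here `sound_links` ends up holding only the two sound files, and `max_workers` plays the role of the thread count: raising it grabs more items at once, at the cost of more load on the site.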
You can also point Page Sucker at sites that contain XXX pics and movies, as well as sites that charge a monthly access fee for their information and files. You may find out the hard way, though, that you can rip one single page from these secure sites but not the whole site (due to security measures). Let's take a look at an imaginary site I made up called iluvtoyz.com:

OK, let's say for example that we're at the I Luv Toyz web site looking at a page that shows Star Wars specials from January 16th of 2004 (specials040116.html). There may be some pictures of toys that we want to keep on our computer, so we would press the "Start Download" button. Some unsecured sites will let you download EVERYTHING all the way down to the base URL (www.iluvtoyz.com), but in this case let's assume that you will only download the data from the specials040116.html page. The "Max Depth to Dig" field shows "INF" for infinite, but for this example it would read 0 (zero), since we only want to download the information on that page and not dig any further for other pages, graphics, and code. Leaving it at INF means that the program will search every public folder and file on iluvtoyz.com and download it to your hard drive.
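The effect of the depth setting can be sketched with a tiny pretend site held in memory. This is only an illustration under assumptions: the paths in `SITE` are invented, `max_depth=None` stands in for INF, and the sketch assumes a page's own pictures are always saved while the depth limit only controls how far links to other pages are followed. How Page Sucker itself counts depth may differ.

```python
from collections import deque

# Hypothetical site map: each page lists its own pictures ("assets")
# and the other pages it links to ("links").
SITE = {
    "/specials040116.html": {
        "assets": ["/img/xwing.jpg"],
        "links": ["/index.html"],
    },
    "/index.html": {
        "assets": ["/img/logo.gif"],
        "links": ["/specials040116.html", "/hotwheels.html"],
    },
    "/hotwheels.html": {"assets": [], "links": []},
}

def crawl(start, max_depth=None):
    """Return the files that would be saved, in breadth-first order.

    max_depth=0 saves only the start page and its assets;
    max_depth=None (the INF setting) follows links without limit.
    """
    seen_pages = {start}
    saved = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        entry = SITE.get(page, {"assets": [], "links": []})
        saved.append(page)
        for asset in entry["assets"]:
            if asset not in saved:
                saved.append(asset)
        # Stop digging once the depth limit is reached.
        if max_depth is not None and depth >= max_depth:
            continue
        for link in entry["links"]:
            if link not in seen_pages:
                seen_pages.add(link)
                queue.append((link, depth + 1))
    return saved

only_this_page = crawl("/specials040116.html", max_depth=0)
whole_site = crawl("/specials040116.html", max_depth=None)
```

With a depth of 0 only the specials page and its picture are saved; with no limit the walk reaches every page the pretend site links to.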
Now let's change the file name at the end of the URL shown above to "specials031224.html", which would be the specials for Christmas Eve. Sometimes it makes more sense to change the URL in the web browser first, just to check whether the file exists. For example, you might type specials031223.html into your browser or Page Sucker to see if there were specials on the day before Christmas Eve. If you type it into the browser and the page doesn't exist, you get the usual 404 message (Page not found), and if you type it into Page Sucker, it won't download any data. If you type it into the browser and there actually were specials for December 23, the page loads normally, and if you do the same in Page Sucker, all the same data is saved to your hard drive.
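Guessing at dated file names like these is easy to automate. The sketch below builds names in the specialsYYMMDD.html pattern from a date and checks them against a made-up list of pages. The `EXISTING` set is purely hypothetical and stands in for the "does it load or do I get a 404" test you would do in the browser.

```python
from datetime import date, timedelta

def specials_page(day):
    """Build a file name like specials040116.html from a date."""
    return f"specials{day:%y%m%d}.html"

def pages_ending_on(day, count):
    """File names for `day` and the days before it, newest first."""
    return [specials_page(day - timedelta(days=n)) for n in range(count)]

# Hypothetical set of pages that exist on the server.
EXISTING = {
    "specials040116.html",
    "specials031224.html",
    "specials031223.html",
}

candidates = pages_ending_on(date(2003, 12, 24), 3)
found = [name for name in candidates if name in EXISTING]
```

In this made-up case the December 24 and December 23 pages are found, while the December 22 guess would be the one that comes back as a 404.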
Here we've gone to the root of the folder "Saga", which means all files listed in the Saga subfolder will be downloaded to your drive. This is the same example as the previous one, but with the maximum depth set to "Infinite" as shown. Therefore, all Star Wars pages listed under Saga would be saved to your hard drive. If there's a special every day, that's a lot of pages and a lot of graphics!
Here we're in a totally different directory on the web site, in the Hot Wheels section, looking for the content in the Treasure Hunts directory. If the content in the above example could be downloaded, then the same certainly applies in this example. You may be asking, "...but how do you know what directory to go to?" It's easy. Just go to your web browser (Netscape, Internet Explorer, etc.) and navigate like you would on any page. When the page loads, the URL will show in the navigation field (the place where you type in where you want to go). Simply copy the URL from the browser, paste it into the Base URL field in Page Sucker, and press "Start Download". It's easier than you think.
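One way to picture what starting from a folder means is a simple scope test: a link counts only if it lives on the same host and under the folder of the base URL. This is a rough sketch, not Page Sucker's actual rule, and the folder paths on iluvtoyz.com are invented for the example.

```python
from urllib.parse import urlparse

def in_scope(url, base_url):
    """Return True if url is on the same host and under the base URL's folder."""
    base = urlparse(base_url)
    target = urlparse(url)
    if target.netloc.lower() != base.netloc.lower():
        return False
    # Treat the base as a directory: keep everything up to the last slash.
    base_dir = base.path.rsplit("/", 1)[0] + "/"
    return target.path.startswith(base_dir)

# Hypothetical folder layout for the imaginary site.
SAGA = "http://www.iluvtoyz.com/starwars/saga/"
ROOT = "http://www.iluvtoyz.com"

saga_page = "http://www.iluvtoyz.com/starwars/saga/specials040116.html"
hunts_page = "http://www.iluvtoyz.com/hotwheels/treasurehunts/index.html"

saga_only = [u for u in (saga_page, hunts_page) if in_scope(u, SAGA)]
everything = [u for u in (saga_page, hunts_page) if in_scope(u, ROOT)]
```

Starting from the Saga folder keeps only the Saga page, while starting from the root of the site keeps both.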
Now this is the base URL, the root of the web site, the mother lode. One way to check a site's security is to go here first and try downloading the information. If you succeed (and the site isn't secure), you'll get all the stuff you selected from the site (HTML pages, GIF images, sounds, etc.). If not, Page Sucker may look like it's working and may take a minute or two. It'll finish with a pop-up window saying "Your download is complete!", but when you check for files on your hard drive, there will be nothing there.
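Since the "download complete" message can't be trusted on its own, it helps to count what actually landed on disk. Here is a small sketch that does that for any folder; the temporary folder and the index.html file in the demo are stand-ins, not anything Page Sucker creates.

```python
from pathlib import Path
import tempfile

def count_saved_files(folder):
    """Count regular files anywhere under the download folder."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file())

# Demo with a throwaway folder standing in for the download folder.
with tempfile.TemporaryDirectory() as folder:
    empty_count = count_saved_files(folder)   # nothing was saved
    (Path(folder) / "index.html").write_text("<html></html>")
    one_count = count_saved_files(folder)     # one file was saved
```

A count of zero after a "successful" run is the sign that the site blocked the rip.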
Visit www.pagesucker.com for more info. Happy hacking!