Wget examples and scripts

November 27, 2005

Wget is a non-interactive command-line tool for downloading files from the Web, available for Unix and Windows. It can download Web pages and files, submit form data, follow links, and mirror entire Web sites to make local copies. Wget is free software, and it is one of the most useful applications you can install on your computer.

You can download the latest version of Wget from the developers' home page. Precompiled versions of Wget are available for Windows and for most flavors of Unix. Many Unix operating systems have Wget pre-installed, so type which wget to see whether you already have it.

Wget supports a multitude of options and parameters, and this variety may be confusing to people unfamiliar with it. You can list the available options by typing wget --help or, on a Unix box, man wget.

Here are a few useful examples of how to use Wget:

1) Download the main page of a site and save it as yahoo.htm
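A minimal sketch of this step in a POSIX shell. The work is done by Wget's -O option, which writes the downloaded document to the file you name instead of using the server's file name. The function name save_page and the WGET variable (which lets you point at a different wget binary) are hypothetical helpers, not part of Wget.

```shell
#!/bin/sh
# WGET defaults to the wget found on your PATH; override it to use another binary.
: "${WGET:=wget}"

# save_page URL FILE: download a single page and store it under FILE.
# -O writes the retrieved document to FILE.
save_page() {
  "$WGET" -O "$2" "$1"
}
```

For example, save_page http://www.example.com/ yahoo.htm runs wget -O yahoo.htm http://www.example.com/; the URL is a placeholder for the page you want to save.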

2) Use Wget with an HTTP firewall:

Set the proxy in the Korn or Bash shell

Set the proxy in the C shell

Run Wget through an anonymous proxy

Run Wget through a proxy that requires authentication
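A sketch of these four steps as a Korn or Bash shell script, assuming a proxy at proxy.example.com on port 8080 (a placeholder; use your firewall's host and port). Wget picks up its proxy settings from the http_proxy and ftp_proxy environment variables, and the C-shell form is shown as a comment. The helper names fetch_via_proxy and fetch_via_auth_proxy, and the PROXY_USER and PROXY_PASS variables, are hypothetical.

```shell
#!/bin/sh
# Placeholder proxy address; replace with your own host and port.
PROXY_URL="http://proxy.example.com:8080/"

# Korn or Bash shell: export the variables Wget reads for its proxy settings.
http_proxy=$PROXY_URL
ftp_proxy=$PROXY_URL
export http_proxy ftp_proxy

# C-shell equivalent (type these in csh or tcsh, not in this script):
#   setenv http_proxy http://proxy.example.com:8080/
#   setenv ftp_proxy  http://proxy.example.com:8080/

# WGET defaults to the wget found on your PATH.
: "${WGET:=wget}"

# Anonymous proxy: the exported variables are all Wget needs.
fetch_via_proxy() {
  "$WGET" "$1"
}

# Proxy that requires authentication: pass the credentials explicitly.
# PROXY_USER and PROXY_PASS are hypothetical variables set by the caller.
fetch_via_auth_proxy() {
  "$WGET" --proxy-user="$PROXY_USER" --proxy-password="$PROXY_PASS" "$1"
}
```

Keep in mind that credentials given on the command line may be visible to other users of the machine through ps.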

3) Make a local mirror of the Wget home page that you can browse from your hard drive

Here are the options we will use:

-m to mirror the site
-k to make all links local
-D to stay within the specified domain
--follow-ftp to follow FTP links
-np to never ascend to the parent directory

The following two options deal with Web sites that try to block automated download tools such as Wget:

-U to masquerade as a Mozilla browser
-e robots=off to ignore the server's robot-exclusion (robots.txt) directives
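The options above combine into a single command. This sketch wraps it in a hypothetical mirror_wget_home function. The URL https://www.gnu.org/software/wget/ and the domain gnu.org are the current GNU locations of the Wget home page; the User-Agent string "Mozilla/5.0" is an assumption, since any Mozilla-style string serves the purpose.

```shell
#!/bin/sh
# WGET defaults to the wget found on your PATH.
: "${WGET:=wget}"

# mirror_wget_home: make a browsable local copy of the Wget home page.
mirror_wget_home() {
  # -m             mirror (recursion, time-stamping, unlimited depth)
  # -k             rewrite links in saved pages to point at the local copies
  # -D gnu.org     only follow links into the listed domain
  # --follow-ftp   also follow FTP links found in HTML pages
  # -np            never ascend to the parent directory
  # -U             send a Mozilla-style User-Agent string
  # -e robots=off  ignore robots.txt exclusion rules
  "$WGET" -m -k -D gnu.org --follow-ftp -np \
    -U "Mozilla/5.0" -e robots=off \
    "https://www.gnu.org/software/wget/"
}
```

According to the Wget manual, -D does not turn on host spanning (-H). Without -H, Wget stays on www.gnu.org, so FTP links that point to a different host are not followed unless you add -H.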

4) Download all images from Playboy site

Here are the options we will use:

-r for recursive download
-l 0 for unlimited levels
-t 1 for one download attempt per link
-nd not to create local directories
-A to download only files with the specified extensions
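A sketch of that command, wrapped in a hypothetical grab_images function that takes the site's URL as its argument. The extension list jpg,jpeg,gif,png is an assumption; adjust it to the image formats you want.

```shell
#!/bin/sh
# WGET defaults to the wget found on your PATH.
: "${WGET:=wget}"

# grab_images URL: fetch every image reachable from URL into the current directory.
grab_images() {
  # -r recursive, -l 0 no depth limit, -t 1 one attempt per link,
  # -nd save all files flat in one directory,
  # -A keep only files with these extensions
  "$WGET" -r -l 0 -t 1 -nd -A jpg,jpeg,gif,png "$1"
}
```

Note that -A does not stop Wget from fetching HTML pages during recursion: it downloads them to find links and then deletes the ones that do not match the accept list.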

5) Web image collector

The following Korn-shell script reads from a list of URLs and downloads all images found anywhere on those sites. The images are then processed, and all images smaller than a certain size are deleted. The remaining images are saved in a folder named after the URL. The url_list.txt file contains one URL per line.

This script was originally written to run under AT&T UWIN on Windows, but it will also work in any native Unix environment that has the Korn shell.
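A sketch of such a collector in POSIX shell syntax, which ksh also accepts, under these assumptions: "smaller than a certain size" means file size in bytes (20480 by default), the folder name is the URL with its scheme removed and slashes turned into underscores, and the helper names url_to_dir and collect_images are hypothetical.

```shell
#!/bin/sh
# WGET defaults to the wget found on your PATH.
: "${WGET:=wget}"

# url_to_dir URL: turn a URL into a folder name,
# e.g. http://www.example.com/pics/ -> www.example.com_pics
url_to_dir() {
  printf '%s\n' "$1" | sed -e 's|^[A-Za-z]*://||' -e 's|/*$||' -e 's|/|_|g'
}

# collect_images LIST MIN_BYTES: for each URL in LIST (one per line),
# download all images into a folder named after the URL, then delete
# every image smaller than MIN_BYTES bytes.
collect_images() {
  list=$1
  min_bytes=$2
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue            # skip blank lines
    dir=$(url_to_dir "$url")
    mkdir -p "$dir"
    # Recursive image download; -P puts the files in the URL's folder.
    "$WGET" -r -l 0 -t 1 -nd -A jpg,jpeg,gif,png -P "$dir" "$url"
    # "-size -Nc" matches files of fewer than N bytes.
    find "$dir" -type f -size -"${min_bytes}"c -exec rm -f {} +
  done < "$list"
}

# Run only when the URL list exists (first argument, or url_list.txt).
if [ -f "${1:-url_list.txt}" ]; then
  collect_images "${1:-url_list.txt}" "${2:-20480}"
fi
```

Saved as, say, collect_images.sh, it can be run as sh collect_images.sh url_list.txt 20480; with no arguments it looks for url_list.txt in the current directory.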

6) Wget options

