
Automatically Validate HTTP Proxies

Submitted on August 5, 2009 – 4:14 pm

Let’s say you have downloaded a long list of Web proxy servers. Now you are stuck with the task of weeding out the proxies that are dead, slow, fake, or otherwise unusable. Some applications claim to validate proxy servers, but most of them share one problem: they are excruciatingly slow. They also tend to hang once in a while, and if your list of proxies is long enough, they may crash altogether because of numerous memory leaks and other such examples of fine programming.

The following script, which I hope you will find useful, goes through some very long proxy lists in just a minute or two and gets rid of the trash. A few words about how it works are in order. I created a simple HTML page on my Web server (see the $pvcurl variable below). This page contains a unique text string (the $pvcstring variable). Matching that string confirms that the proxy returned the real page rather than an error page or a substitute of its own.
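The marker check comes down to a single grep. Here is a minimal sketch, assuming the page body arrives on standard input. The function name has_marker is hypothetical and is not part of the script below; the sample page is a stand-in, not the real contents of $pvcurl. It uses only POSIX shell features, so it should also run under ksh.

```shell
# has_marker is a hypothetical helper used for illustration only.
# It reads a page body on standard input and succeeds only if the
# marker string passed as the first argument appears in it.
has_marker() {
	# -F treats the marker as a fixed string; -q suppresses output.
	grep -qF -- "$1"
}

# A stand-in for the body of the validation page (assumed content).
page='<html><body>191628769290432845414226</body></html>'

if printf '%s\n' "$page" | has_marker "191628769290432845414226"
then
	echo "marker found"
else
	echo "marker missing"
fi
```

In the real check, the output of wget would be piped into the function instead of the sample string.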

The first step is to ping the proxy and see if it responds in a reasonable period of time. The ping commands are launched in the background to speed up the process. If the proxy does respond, the next step is to use wget, going through that proxy, to download $pvcurl and look for $pvcstring in the result. If everything checks out, the proxy is added to the final list of good proxies. Like the ping commands, the wget processes are started in the background, each with a 30-second timeout.
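The background pattern described above can be sketched in isolation. This is an illustration under stated assumptions, not the script's own method: check_one is a hypothetical placeholder for the ping and wget steps, its "port 8080 is good" rule is made up for the demo, and the shell's built-in wait is used here in place of polling the process list. The addresses come from 192.0.2.0/24, which is reserved for documentation.

```shell
# check_one is a hypothetical placeholder for the real ping + wget check.
check_one() {
	# Illustrative rule only: pretend entries on port 8080 are "good".
	case "$1" in
		*:8080) echo "$1" >> "$results" ;;
	esac
}

# Temporary file that collects the entries that pass the check.
results=$(mktemp)

# Start one background job per entry instead of checking them one by one.
for entry in 192.0.2.10:8080 192.0.2.11:3128 192.0.2.12:8080
do
	check_one "$entry" &
done

# wait blocks until every background job started by this shell has exited.
wait

# Background jobs finish in no particular order, so sort before printing.
good=$(sort "$results")
rm -f "$results"
echo "$good"
```

Because each job appends one short line, the results file ends up with one entry per good proxy regardless of the order in which the jobs finish.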

Download: proxy_validate.ksh

#!/bin/ksh
 
configure() {
	pvcurl="http://www.krazyworks.com/pvc.html"
	pvcstring="191628769290432845414226"
	wget_timeout=30
 
	proxyin="/tmp/proxylist.in"
 
	if [ ! -f "$proxyin" ]
	then
		echo "Proxy list $proxyin not found. Exiting..."
		exit 1
	fi
 
	proxyout="/root/proxylist.out"
 
	if [ -f "$proxyout" ]
	then
		rm "$proxyout"
	fi
}
 
cleanup() {
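	# Note: killall stops every wget process it is allowed to signal,
	# not just the ones started by this script.
	killall wget 2>/dev/null
	for i in 1 2 3 4 5
	do
		# rm -f is silent when the file does not exist.
		rm -f "/tmp/proxy_verify.tmp$i"
	done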
}
 
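wgetrun() {
	# Fetch the test page through the proxy named in $http_proxy and
	# keep the proxy only if the marker string comes back.
	if wget -q --timeout="$wget_timeout" --tries=1 -O - "$pvcurl" | grep -q "$pvcstring"
	then
		echo "${proxy}:${port}" >> "$proxyout"
	fi
}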
 
pingrun() {
	# Run the slower wget check only if the host answers a single ping.
	if ping -q -c 1 -W 5 "$proxy" >/dev/null 2>&1
	then
		wgetrun &
	fi
}
 
verify() {
	sort -u "$proxyin" > "/tmp/proxy_verify.tmp1"
	mv "/tmp/proxy_verify.tmp1" "$proxyin"
	proxy_total=$(wc -l "$proxyin" | awk '{print $1}')
 
	i=1
	j=1
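	# Each line is expected to look like host:port.
	while IFS=: read -r proxy port rest
	do
		# Skip blank or malformed lines that lack a host or a port.
		if [ -z "$proxy" ] || [ -z "$port" ]
		then
			continue
		fi
		echo "Processing proxy $i of $proxy_total"
		# wget picks up the proxy to use from the http_proxy variable.
		export http_proxy="http://${proxy}:${port}"
		(( i = i + 1 ))
 
		pingrun &
 
		# Every 100 proxies, check whether too many wget processes are still running.
		if [ $j -eq 100 ]
		then
			if [ "$(ps -ef | grep -c "[w]get")" -gt 100 ]
			then
				sleep $wget_timeout
				killall wget
			fi
			j=1
		else
			(( j = j + 1 ))
		fi
	done < "$proxyin"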
 
	echo "Waiting for threads to finish ($wget_timeout seconds)..."
	while [ "$(ps -ef | grep -Ec "[w]get|[p]ing")" -gt 0 ]
	do
		sleep 5
	done
}
 
# RUNTIME
 
configure
cleanup
verify
cleanup
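
The script assumes that every line of /tmp/proxylist.in has the form host:port. Downloaded lists are often messier than that, so it may be worth filtering the file first. Here is a rough sketch of such a filter. The function name valid_entry is hypothetical and not part of proxy_validate.ksh, and the sample entries use the 192.0.2.0/24 documentation range.

```shell
# valid_entry is a hypothetical helper, not part of proxy_validate.ksh.
# It succeeds when its argument looks like host:port with a port in 1-65535.
valid_entry() {
	host=${1%%:*}
	port=${1#*:}
	# Reject entries with no colon, an empty host, or an empty port.
	[ "$host" != "$1" ] && [ -n "$host" ] && [ -n "$port" ] || return 1
	# Reject ports that contain anything other than digits.
	case "$port" in
		*[!0-9]*) return 1 ;;
	esac
	[ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

# Classify a few sample lines; only the first one is well formed.
while read -r line
do
	if valid_entry "$line"
	then
		echo "keep: $line"
	else
		echo "skip: $line"
	fi
done <<'EOF'
192.0.2.10:8080
not-a-proxy
192.0.2.11:99999
EOF
```

This only checks the shape of each entry. Whether the proxy actually works is still decided by the ping and wget steps in the script.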
