Automatically Validate HTTP Proxies
Let’s say you downloaded a long list of Web proxy servers. Now you are stuck with the task of weeding out the proxies that are dead, slow, fake, or otherwise unusable. Several applications claim to validate proxy servers, but most share the same problem: they are excruciatingly slow. They also tend to hang once in a while. And, if your list of proxies is too long, they may crash altogether because of numerous memory leaks and other such examples of fine programming.
Here is a hopefully useful script that goes through very long proxy lists in just a minute or two and gets rid of the trash. A few words about how it works are in order. I created a simple HTML page on my Web server (see the $pvcurl variable in the script). This page contains a unique text string (the $pvcstring variable).
The first step is to ping the proxy and see if it responds within a reasonable period of time (5 seconds in the script). The ping commands are launched in the background to speed up the process. If the proxy does respond, the next step is to use wget to download $pvcurl through the proxy and look for $pvcstring in the result. If everything checks out, the proxy is added to the final list of good proxies. Just like the ping commands, the wget processes are started in the background, with a 30-second timeout.
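For a single proxy, the two steps described above can be sketched roughly as follows. This is only an illustration under a few assumptions, not the script itself: check_proxy is a hypothetical helper name, the http:// prefix on the proxy address is my addition, and the ping options follow Linux syntax.

```shell
# Sketch of the two-step check for one proxy (POSIX shell).
# check_proxy is a hypothetical helper; pvcurl and pvcstring mirror
# the variables used in the full script.
pvcurl="http://www.krazyworks.com/pvc.html"
pvcstring="191628769290432845414226"

check_proxy() {
    host=$1
    port=$2

    # Step 1: send one ping and wait at most 5 seconds for the reply.
    ping -q -c 1 -W 5 "$host" >/dev/null 2>&1 || return 1

    # Step 2: fetch the check page through the proxy and look for the
    # unique string. The function returns grep's exit status.
    http_proxy="http://${host}:${port}" \
        wget -q --timeout=30 --tries=1 -O - "$pvcurl" | grep -qF -- "$pvcstring"
}

# Example call (address is a placeholder):
# check_proxy 192.0.2.10 8080 && echo "192.0.2.10:8080"
```

The function succeeds only when both the ping and the page match succeed, so it can be used directly in an if statement or with &&.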
Download: proxy_validate.ksh
#!/bin/ksh

# Set the check page, the timeout, and the input and output files.
configure() {
    pvcurl="http://www.krazyworks.com/pvc.html"
    pvcstring="191628769290432845414226"
    wget_timeout=30
    proxyin="/tmp/proxylist.in"
    if [ ! -f "$proxyin" ]
    then
        echo "Proxy list $proxyin not found. Exiting..."
        exit 1
    fi
    proxyout="/root/proxylist.out"
    if [ -f "$proxyout" ]
    then
        rm "$proxyout"
    fi
}

# Kill leftover wget processes and remove temporary files.
cleanup() {
    killall wget 2>/dev/null
    for i in 1 2 3 4 5
    do
        if [ -f "/tmp/proxy_verify.tmp$i" ]
        then
            rm "/tmp/proxy_verify.tmp$i"
        fi
    done
}

# Fetch the check page through the proxy (wget reads $http_proxy) and
# record the proxy if the page contains the check string.
wgetrun() {
    if wget -q --timeout=$wget_timeout --tries=1 -O - "$pvcurl" | grep -q "$pvcstring"
    then
        echo "${proxy}:${port}" >> "$proxyout"
    fi
}

# Ping the proxy once, waiting up to 5 seconds; try wget only if it answers.
pingrun() {
    if ping -q -c 1 -W 5 "$proxy" >/dev/null 2>&1
    then
        wgetrun &
    fi
}

verify() {
    # Sort and de-duplicate the input list in place.
    sort -u "$proxyin" > "/tmp/proxy_verify.tmp1"
    mv "/tmp/proxy_verify.tmp1" "$proxyin"
    proxy_total=$(wc -l "$proxyin" | awk '{print $1}')
    i=1
    j=1
    # Each line is expected to look like host:port.
    while IFS=: read -r proxy port rest
    do
        echo "Processing proxy $i of $proxy_total"
        export http_proxy="http://${proxy}:${port}"
        (( i = i + 1 ))
        pingrun &
        # Once 100 proxies have been launched, check the number of running
        # wget processes on each pass. If more than 100 are alive, wait one
        # timeout period, kill the stragglers, and restart the count.
        if [ $j -eq 100 ]
        then
            if [ $(ps -ef | grep -c "[w]get") -gt 100 ]
            then
                sleep $wget_timeout
                killall wget 2>/dev/null
                j=1
            fi
        else
            (( j = j + 1 ))
        fi
    done < "$proxyin"
    echo "Waiting for threads to finish ($wget_timeout seconds)..."
    while [ $(ps -ef | grep -Ec "[w]get|[p]ing") -gt 0 ]
    do
        sleep 5
    done
}

# RUNTIME
configure
cleanup
verify
cleanup
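The script assumes that every line of the input file has the form host:port. Downloaded lists are often messier than that, so a small pre-filter may help. The sketch below is my own assumption about what such a filter could look like and is not part of the script; clean_proxy_list is a hypothetical name.

```shell
# Hypothetical pre-filter: read a raw list on stdin and print only the
# lines that look like host:port, sorted and de-duplicated.
clean_proxy_list() {
    while IFS=: read -r host port rest
    do
        # Skip lines with a missing field or extra colon-separated fields.
        [ -n "$host" ] && [ -n "$port" ] && [ -z "$rest" ] || continue
        # Skip ports that contain anything other than digits.
        case $port in
            *[!0-9]*) continue ;;
        esac
        echo "${host}:${port}"
    done | sort -u
}
```

It could be run once before the validation script, for example as clean_proxy_list < raw_list.txt > /tmp/proxylist.in, where raw_list.txt stands for whatever file the list was downloaded to.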
