
SEO for Unix Geeks

Submitted on February 21, 2013 – 11:45 pm

So, you are finally done with your shiny new Web site. Congratulations. There is one small problem: the only two visitors to the site are you and Googlebot. And even Google seems to have indexed just a few pages and ignored the rest of your creation. Is your site really so lame that even bots avoid it? Don’t be too hard on yourself: according to a study by Netcraft, there were about 74 million active Web sites out there at the end of 2008 and 109.5 million in May of 2009. That’s about one Web site for every two dozen Internet users, according to World Internet Stats. Clearly, you’ve got an uphill battle to fight.

You will be competing against a global army of millions of Webmasters, advertisers, PR experts, and sysadmins. But, to quote “Futurama’s” Professor Farnsworth, “Good news, everyone!” Most people you are up against are idiots. These are the kind of people who would pay thousands of dollars to an SEO “expert” to generate useless and, in many cases, fake traffic to their site. Would you believe me if I told you that in just a couple of hours I can swamp your Web site with hits from thousands of unique users from all over the world without spending a dime on SEO or advertising? I can, and it will take me about ten seconds of actual work. It will be actual, real traffic coming from unique IPs in dozens of countries, accessing random pages on your site, and this traffic will not arouse Google’s suspicions. In fact, the only limiting factor will be the performance of your Web server.

Stupid Tricks

Don’t worry, I will reveal this “secret” method that SEO “experts” use to rip off naive site owners. And I won’t charge you anything. It all starts with collecting a list of Web proxies. There are thousands of them and they are free to use. A number of sites on the Net maintain current proxy lists, and you can even grab a copy of a simple script I wrote that will quickly validate a huge list of proxies (a minimal sketch of the same idea follows the list below). The second step is to generate a list of URLs on your site. Most CMS applications can create a sitemap – a nicely formatted list of pages on your site. Now you have two text files: one containing the list of proxy servers, one per line, looking something like this:

202.175.3.112:80
119.40.99.2:8080
193.37.152.236:3128
83.2.83.44:8080
151.11.83.170:80
119.70.40.101:8080
208.78.125.18:80
189.109.46.210:3128
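
If you don’t want to hunt down that validation script, a minimal version might look something like the sketch below. It assumes the host:port format shown above, tests each proxy by fetching a small page through it with wget, and keeps the responsive ones; the test URL and the good_proxies.txt file name are just placeholders:

#!/bin/bash
# Minimal proxy validation sketch: keep only the proxies that answer.
# Assumes proxy_list.txt holds host:port entries, one per line.
> good_proxies.txt
cat proxy_list.txt | while read proxy_server
do
    # Some wget builds insist on the scheme prefix in http_proxy
    if http_proxy="http://$proxy_server" wget --tries=1 --timeout=5 -q -O /dev/null "http://www.google.com/"
    then
        echo "$proxy_server" >> good_proxies.txt
    fi
done
echo "`wc -l < good_proxies.txt` working proxies saved to good_proxies.txt"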

And the other text file contains the list of URLs from your site, also one per line:

https://www.krazyworks.com/
Copying directories using tar and rsync
Find largest files
Using more on multiple outputs
Veritas Cluster Troubleshooting
Installing and Configuring Ganglia
Improving WordPress Performance
Numeric File Permissions in Unix
Recovering from Veritas VEA GUI errors
VxVM Recovery Cheatsheet for Solaris
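
If your CMS publishes a standard sitemap.xml, one quick-and-dirty way to turn it into url_list.txt is to pull out the <loc> entries with wget, grep, and sed. This is just a sketch; the sitemap URL below is a placeholder for wherever your CMS puts it:

# Extract every <loc>...</loc> entry from the sitemap into url_list.txt
wget -q -O - "http://www.example.com/sitemap.xml" | grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//' > url_list.txt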

All you have to do now is to write a simple shell script that will use “wget” to download each URL through each proxy server. Here’s an example:

#!/bin/bash
# For each proxy in proxy_list.txt, fetch five random URLs from url_list.txt
cat proxy_list.txt | while read proxy_server
    do
        export http_proxy="$proxy_server"

        echo "Visiting 5 random URLs through $proxy_server"
        cat url_list.txt | while read url
        do
            echo "`expr $RANDOM % 1000`:$url"
        done | sort -n| sed 's/[0-9]*://' | head -5 | while read url2
        do
            echo "  getting $url2"
            wget --tries=1 --timeout=5 -U Mozilla -q --proxy=on -O /dev/null "$url2"
        done
    done

For each proxy server in proxy_list.txt, this script will pick five random URLs and download them to /dev/null. If you now use AWStats or Webalizer to view the updated traffic stats, you will see many unique visitors and many more hits. This process can be further randomized by rotating the user-agent string (Mozilla, IE, Opera, etc.) and the number of URLs accessed through each proxy. For example, in the script above you can replace:

done | sort -n| sed 's/[0-9]*://' | head -5 | while read url2

with

done | sort -n| sed 's/[0-9]*://' | head -$(echo "`expr $RANDOM % 10`+1"|bc -l) | while read url2

This will select a random number (from 1 to 10) of random URLs to be downloaded through each proxy. A random delay can be added between Wget runs and several Wget instances can be started in parallel to make your fake traffic look even more realistic. You can even randomize the order in which the proxies are being used. This simple trick has been used by SEO “experts” for years to fleece their clients.
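
The random delay and the shuffled proxy order don’t require anything fancy either – something along these lines would do. This is only a sketch; it reuses the same decorate-sort trick instead of relying on shuf, which older systems may not have, and it just hits the front page of the site:

#!/bin/bash
# Shuffle the proxy list, then fetch the front page through each proxy
# in the background, pausing a random 0-29 seconds between wget runs.
cat proxy_list.txt | while read proxy_server
do
    echo "`expr $RANDOM % 1000`:$proxy_server"
done | sort -n | sed 's/[0-9]*://' | while read proxy_server
do
    # Cover both http:// and https:// URLs
    export http_proxy="$proxy_server" https_proxy="$proxy_server"
    # Run in the background so several wget instances can overlap
    wget --tries=1 --timeout=5 -U Mozilla -q -O /dev/null "https://www.krazyworks.com/" &
    # Random 0-29 second pause before moving on to the next proxy
    sleep `expr $RANDOM % 30`
done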

And here’s an example of using wget with random user-agent strings:

#!/bin/bash

# Base wget command: single attempt, short timeout, download to /dev/null
# (same options as in the first script)
WGET="wget --tries=1 --timeout=5 -O /dev/null"

proxies_total=$(wc -l proxy_list.txt | awk '{print $1}')
user_agents_total=$(wc -l user_agents.txt | awk '{print $1}')

cat url_list.txt | while read url
do
    # Select a random proxy server from proxy_list.txt
    proxy_server_random=$(cat proxy_list.txt | while read proxy_server
    do
        echo "`expr $RANDOM % $proxies_total`^$proxy_server"
    done | sort -n | sed 's/[0-9]*^//' | head -1)

    # Set the shell HTTP proxy variable
    export http_proxy="$proxy_server_random"

    # Select a random user-agent from user_agents.txt
    user_agent_random=$(cat user_agents.txt | while read user_agent
    do
        echo "`expr $RANDOM % $user_agents_total`:$user_agent"
    done | sort -n | sed 's/[0-9]*://' | head -1)

    # Download the URL
    echo "Downloading $url"
    echo "Proxy server: $proxy_server_random"
    echo "User agent: $user_agent_random"

    $WGET -q --proxy=on -U "$user_agent_random" "$url"
done
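
For completeness, user_agents.txt is just a plain text file with one user-agent string per line; any realistic browser strings will do, for example:

Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)
Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.14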

If you don’t like wget, you can use the script with an actual browser. In the example below we set the maximum number of URLs ($maxurls) to visit to a random value between 1 and 15. Then we randomize the list of proxies and pick a random number of them, from 1 to the maximum available. Each selected proxy is used to visit a random number of URLs, from 1 to $maxurls, and the URLs themselves are picked at random as well. There are plugins for Firefox that will pick up the http_proxy shell variable on browser startup and let you dynamically modify the user-agent string.

#!/bin/ksh
maxurls=$(echo "`expr $RANDOM % 15 `+1"|bc -l)
proxies_total=$(wc -l /tmp/traffic_builder_proxylist.txt | awk '{print $1}')
maxproxies=$(echo "`expr $RANDOM % $proxies_total`+1"|bc -l)
cat /tmp/traffic_builder_proxylist.txt | while read proxy
do
   echo "`expr $RANDOM % $proxies_total`^$proxy"
done | sort -n |  sed 's/[0-9]*^//' | head -${maxproxies} | 
while read random_proxy
do
   export http_proxy=$random_proxy
   urls_total=$(wc -l /tmp/traffic_builder_sitemap.txt | awk '{print $1}')
   urls_to_visit=$(echo "`expr $RANDOM % $maxurls`+1"|bc -l)
   cat /tmp/traffic_builder_sitemap.txt | while read url
   do
      echo "`expr $RANDOM % $urls_total`^$url"
   done | sort -n | sed 's/[0-9]*^//' | head -${urls_to_visit} | 
   while read random_url
   do
      firefox "$random_url" &
   done
   sleep 60
   killall firefox
done

Here’s another ridiculous gimmick offered by the SEO crowd to unsuspecting site owners: article submission engines. You see, there are sites out there that publish your press releases and other articles. There are also applications that will submit your press release to hundreds of such sites. The idea is that a great many sites out there will end up linking to your site and, therefore, increase the rank of your site with various search engines (“various search engines” is my shorthand for “Google”).

This is a typical pyramid scheme. You see, these sites publishing your articles already have a higher ranking than your site. When you search for your article on Google, these sites will show up first, even though your site is the original source of the article. Essentially, these enterprising characters expect you to spend your time and money writing articles and to give them away for free, so they can grow their own site rank while you settle for the leftovers. If you have unique material on your site that may be of interest to others, you should make sure that nobody uses it without your permission, and you should definitely not be giving it away for free.

After governments around the world started cracking down on email spam, the spammers found a new outlet for their creativity: blog comments. The idea is the same as with the “free articles”: to have as many pages out there as possible linking to your site and, supposedly, increasing your site’s ranking with the search engines. There are applications out there that will automatically spam thousands of blogs, leaving various stupid comments that contain a link to your site. Most of these comments will be immediately intercepted and killed by filters, but some will make it through.

The problem with this approach is that most blog engines and CMS applications out there use automated filters. These filters communicate with a central database, where they store every new occurrence of suspected spam. It will take only a couple of days for these filters to identify your automated comment as spam. If you persist with your blog spamming campaign, your site will be blacklisted and you will no longer be able to post even legitimate comments. Some spammers resort to a simple solution: they register temporary domain names with DNS records pointing to the IP of their main site. They use these throwaway domain names to spam blogs and search engines. When they get blacklisted, they simply register a new dummy domain name, while their main domain name remains clean.

PageRank

None of these stupid tricks will make your site more popular or make you any serious money. What does make a difference, however, is unique and useful content. Imagine a hypothetical scenario: two Web sites are identical in every respect. They are literally copies of each other, the only difference being the domain name. You go to Google and type a search query. Both sites have been fully indexed by Google and both contain a page that matches your search query with equal accuracy. And so here’s the question: which site will appear closer to the top of the search results, and why?

If the search engine in question is Google (and it is), the concept of PageRank® comes into play. Here’s the official Google description of the concept:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important”.

On the one hand, Google is talking about the “uniquely democratic nature of the web” and on the other it assumes that some pages are more equal than others. A contradiction in terms, it would seem. This simple definition of PageRank hides a fair amount of advanced math, but the essence of it is to determine the probability of ending up at a particular page after many clicks. The answer to the question about the two identical sites is that the site with the higher PageRank (in combination with additional factors) will appear first in the search results. For example, nearly 97% of all Wikipedia pages rank in Google’s top 10 because so many pages with a high PageRank link to Wikipedia.
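
For the curious, the commonly cited simplified formula from the original Brin and Page paper looks like this, where T1 through Tn are the pages linking to page A, C(T) is the number of outbound links on page T, and d is a damping factor usually set around 0.85:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

Run that calculation over the whole link graph until the numbers stop changing and you get, roughly, the likelihood that a random surfer clicking from link to link ends up on a given page.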

It would seem, then, that your goal is to seek out pages with a high PageRank and somehow get a link to your own site onto those pages. This, however, is only part of the puzzle. If I am using Google to search for “file system inode marked bad incore” – a Veritas filesystem error – and your site does not contain this phrase, then your PageRank doesn’t really matter. If, on the other hand, I am searching for “double-D naturals”, something tells me Google will have many pages of search results for me to explore. What conclusion can we draw from this? It is easier to be near the top of the search results if your site matches a fairly obscure search query.

Useful Content

Sure, the number of people searching for an explanation of a filesystem error message pales in comparison to the crowd looking for photos of well-endowed females. But what’s the use of matching popular keywords if your site would appear on page twenty-four of the search results? It is far more advantageous to match an obscure keyword but end up in the top ten search hits.

You can use Google Webmaster Tools to find out the top search queries that led visitors to your site. The list will look something like this:

#    %    Query                                 Position
1    23%  wget examples                         6
2    14%  medussa                               17
3    11%  mysql grant syntax                    2
4     3%  nfs mount permission denied           9
5     2%  bonnie++ example                      6
6     2%  pkg-get                               9
7     2%  vxvm cheat sheet                      6
8     2%  veritas volume manager cheat sheet    6
9     2%  wget user agent                       10
10    1%  grant syntax                          7
11    1%  mysql grant                           24
12    1%  calengoo                              12
13    1%  grant syntax mysql                    2

From this list, select all entries with a “Position” of 10 or less. These are the topics that interest your audience. These are also the areas in which your site leads the pack, so to speak. You want more traffic – write more useful content on these topics. Apparently, there are a great many people starving for examples of using wget. Quite a few people are also interested in the Medussa distributed password-cracking application. MySQL grant syntax gets people every time, as do NFS “permission denied” errors. Veritas Volume Manager may be of interest mostly to specialists, but when those vxvm volumes fail to start, specialists go online in search of answers. Luckily, I know a whole lot about these things. You are reading another article about wget (among other things), and I see another NFS-related post coming in the future.
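
If you export that report as a CSV file – say, top_queries.csv, with the position in the last column (both the file name and the column layout are assumptions here) – picking out the entries already in the top ten is a one-liner:

# Print rows whose last field (Position) is 10 or less; skip the header line.
# Quick and dirty: queries that contain commas will confuse it.
awk -F, 'NR > 1 && $NF + 0 <= 10' top_queries.csv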
