Copying Data: Are We There Yet?

Submitted on December 27, 2009, 7:12 pm

I am sure this will sound familiar: you are copying a large amount of data, either locally or over the network, and you are wondering how long it will take and whether there is a way to make it go faster. You may be surprised, but the type of files you are copying matters: 1 GB worth of many small files will take considerably longer to copy than two 500 MB files, because every file carries its own overhead for creating it, updating directory metadata and, on a spinning disk, seeking to it. The hardware you are using is an important consideration, but it is not the only factor limiting data transfer speed.
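Before starting a big copy, it can be worth checking how many files are involved and how large they are on average, since a tree of many tiny files will be dominated by per-file overhead. The helper below is a minimal POSIX shell sketch under a few assumptions: `tree_profile` is a hypothetical name, the sizes come from `du -sk` (disk usage in KB, not exact byte counts), and file names containing newlines would be miscounted.

```shell
# tree_profile DIR  (hypothetical helper)
# Prints: <number of regular files> <total KB used> <average KB per file>
# Sizes come from du -sk, i.e. disk usage, not exact byte counts.
tree_profile() {
    dir="$1"
    # Count regular files only; directories and symlinks are skipped
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Total disk usage of the tree in kilobytes
    kb=$(du -sk "$dir" | awk '{print $1}')
    # Average size per file, guarding against an empty tree
    awk -v f="$files" -v k="$kb" 'BEGIN {
        avg = (f > 0) ? k / f : 0
        printf "%d %d %.1f\n", f, k, avg
    }'
}
```

For example, `tree_profile /disk1` prints the file count, the total KB and the average KB per file; a small average over a large count is a hint that the copy will be slower than the raw data volume suggests.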

Here’s one scenario: you are copying 100 GB of data from one partition to another partition on the same disk. The disk is a 7200 RPM, 3 Gbit/s SATA-II drive in an external USB 2.0 enclosure. In theory, the SATA-II interface supports transfer speeds of up to 300 MB/s. However, since you are reading from and writing to the same disk, every byte crosses the interface twice and the heads keep seeking between the two partitions, so the copy will get only about 25% of that, or 75 MB/s. The USB 2.0 interface is the tighter limit: it supports up to 480 Mbit/s, which is about 60 MB/s in theory, and the same factor of four applies to it. In this example, the absolute best copy speed you can expect is about 15 MB/s, and you’d be lucky to get half of that.
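The arithmetic above is easy to redo for other hardware. Here is a minimal sketch in POSIX shell with awk; the function names are hypothetical, the divide-by-four factor is the rule of thumb from this example rather than a measured constant, and the conversion ignores protocol overhead other than line encoding.

```shell
# link_mbs MBIT [BITS_PER_BYTE]  (hypothetical helper)
# Theoretical link rate in MB/s. Use 8 bits per byte for a plain conversion
# (USB 2.0: 480 / 8 = 60), or 10 for links with 8b/10b line encoding such
# as SATA-II (3000 / 10 = 300).
link_mbs() {
    awk -v mbit="$1" -v bpb="${2:-8}" 'BEGIN { printf "%.1f\n", mbit / bpb }'
}

# same_disk_mbs RATE_MBS  (hypothetical helper)
# Rule of thumb from this example: reading and writing on the same disk
# leaves roughly a quarter of the available bandwidth for the copy.
same_disk_mbs() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", r / 4 }'
}
```

With the numbers from this scenario, `same_disk_mbs "$(link_mbs 480)"` prints 15.0 for the USB 2.0 path, and `same_disk_mbs "$(link_mbs 3000 10)"` prints 75.0 for the SATA-II path.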

So, let’s take a look at the actual disk performance for this example:

iostat -xk 1

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.00    0.00   27.00   57.00    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     8.00   98.00   84.00  6145.00  8092.00   156.45    23.83   86.20   5.34  97.20
sdc1              0.00     0.00   98.00    0.00  6145.00     0.00   125.41     3.02   27.76   9.47  92.80
sdc2              0.00     8.00    0.00   84.00     0.00  8092.00   192.67    20.82  154.38   9.19  77.20
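If you want to log these numbers rather than eyeball them, you can pull the read and write columns for one device out of the iostat report. This is a minimal sketch: `io_rates` is a hypothetical helper, and it finds the rkB/s and wkB/s columns by header name because the column order differs between sysstat versions. Keep in mind that the first report iostat prints covers the time since boot, not the current interval.

```shell
# io_rates DEVICE  (hypothetical helper)
# Reads `iostat -xk` output on stdin and prints "<rkB/s> <wkB/s>" for each
# report line of DEVICE. Columns are located by header name, not position.
io_rates() {
    awk -v dev="$1" '
        # Header line ("Device:" or "Device"): remember the column numbers
        $1 ~ /^Device:?$/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "rkB/s") rcol = i
                if ($i == "wkB/s") wcol = i
            }
            next
        }
        # Data line for the requested device, once a header has been seen
        rcol && $1 == dev { print $rcol, $wcol }
    '
}
```

For example, `iostat -xk 5 3 | io_rates sdc1` prints one line per report for the source partition.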

The data is being copied from the /dev/sdc1 partition to /dev/sdc2. The average read speed is about 6 MB/s (6145 kB/s in the rkB/s column), so copying 100 GB may take up to 5 hours. Is there a way to speed things up? One option is to take the disk out of the USB enclosure and connect it internally to a SATA-II port. This alone can cut the copy time down to about 30 minutes, which easily justifies the time spent moving the disk. To speed things up even further, copy the data from your SATA-II drive to another internal drive (preferably on a different controller) and then copy it back to a different partition on the original disk. That way no disk has to read and write at the same time, which can shorten the copy further, although each pass is still limited by the sustained throughput of the slower drive.
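The time estimate follows directly from the rkB/s column. Below is a minimal sketch with a hypothetical `eta_hours` helper, assuming the rate stays constant and that 1 GB is 1024 × 1024 kB (iostat's kB are 1024-byte units).

```shell
# eta_hours SIZE_GB RATE_KBS  (hypothetical helper)
# Hours needed to copy SIZE_GB gigabytes at a constant RATE_KBS kB/s,
# the unit iostat -k reports in.
eta_hours() {
    awk -v gb="$1" -v kbs="$2" 'BEGIN {
        printf "%.1f\n", gb * 1024 * 1024 / kbs / 3600
    }'
}
```

For this example, `eta_hours 100 6145` prints 4.7, in line with the roughly 5-hour estimate.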

Copying data over the network normally doesn’t stress the hard drives, unless you have an HPC cluster with an InfiniBand network or something of that nature. Running iostat will show you disk I/O, but it is not the best way to estimate transfer rates when moving large amounts of data over the network. A simple tool for watching real-time network upload and download speed is “bmon”. This small but useful application runs in your terminal window and displays detailed network statistics for each NIC, as well as a cool ASCII graph.

[Screenshot: bmon real-time network speed monitoring]
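If bmon isn't installed, the same per-interface byte counters can be read directly on Linux from /proc/net/dev. This is a minimal sketch: `net_bytes` and `net_rate` are hypothetical helpers, the approach is Linux-specific, and like bmon it counts all traffic on the interface, not just your copy.

```shell
# net_bytes IFACE [FILE]  (hypothetical helper)
# Prints "<rx_bytes> <tx_bytes>" for IFACE from a /proc/net/dev-style file.
net_bytes() {
    # Turn the "iface:" separator into a space so fields split the same way
    # whether or not the kernel printed a space after the colon;
    # field 2 is received bytes, field 10 is transmitted bytes
    sed 's/:/ /' "${2:-/proc/net/dev}" |
        awk -v ifc="$1" '$1 == ifc { print $2, $10 }'
}

# net_rate IFACE SECONDS  (hypothetical helper)
# Samples the counters twice and prints the average rates over the interval.
net_rate() {
    before=$(net_bytes "$1")
    sleep "$2"
    after=$(net_bytes "$1")
    echo "$before $after" | awk -v s="$2" '{
        printf "rx %.1f KB/s  tx %.1f KB/s\n", ($3 - $1) / 1024 / s, ($4 - $2) / 1024 / s
    }'
}
```

For example, `net_rate eth0 10` prints the average receive and transmit rates of eth0 over ten seconds.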

However, bmon gives you no way to tell the network traffic created by your copy process apart from all the other network traffic on the system. When moving data over the network, you would normally use FTP for best performance, but you may also use NFS, Samba, or even HTTP. There are many tools for testing network performance, and one of the most common is Bonnie++. Although it is not a network testing application, Bonnie++ performs a series of read/write tests on a filesystem of your choice. If that filesystem happens to be NFS- or Samba-mounted, the test results will reflect NFS or Samba performance (unless your network is fast enough to exceed the performance of the storage system behind it).

In the following example we run Bonnie++ on a system with 512 MB of RAM:

deathstar:~ # bonnie++ -n 0 -u 0 -r 512 -s 1024 -f -b -d /backups
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.01d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
deathstar        1G           16917   5  9500   2           24482   3 115.7   0
deathstar,1G,,,16917,5,9500,2,,,24482,3,115.7,0,,,,,,,,,,,,,

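Explanation of the parameters:

“-n 0” Skip the file creation tests
“-u 0” Run as root (UID 0)
“-r 512” This system has 512 MB of RAM
“-s 1024” Size of the test file in MB; it should be at least twice the amount of RAM
“-f” Fast mode: skip the per-character I/O tests (which is why the Per Chr columns are empty)
“-b” No write buffering: call fsync() after every write
“-d /backups” Directory on the filesystem to test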
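Since -r and -s depend on the amount of RAM, you can derive them instead of typing them in. This is a minimal Linux-only sketch: `bonnie_cmd` is a hypothetical helper that only prints the command line, so you can review it before running it. MemTotal in /proc/meminfo is usually a little lower than the installed RAM, so the values may come out slightly below the nominal size.

```shell
# bonnie_cmd DIR [MEMINFO]  (hypothetical helper)
# Prints a bonnie++ command line with -r set to RAM in MB and -s set to
# twice that. Reads MemTotal (in kB) from /proc/meminfo by default.
bonnie_cmd() {
    ram_mb=$(awk '$1 == "MemTotal:" { printf "%d", $2 / 1024 }' "${2:-/proc/meminfo}")
    echo "bonnie++ -n 0 -u 0 -r $ram_mb -s $((ram_mb * 2)) -f -b -d $1"
}
```

On a machine with exactly 512 MB reported, `bonnie_cmd /backups` prints the same command used in the example above.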

It is important to understand that Bonnie++ does not test the hard drive or the network. It tests filesystem performance. If, for example, you run a test on a local disk and see performance lower than expected, it does not mean your disk is going bad. It may be just that your CPU is overloaded, you are running out of RAM, or there may be an OS issue. Therefore, if you are using Bonnie++ to compare performance of different hard drives, you need to make sure that all other system parameters during your testing remain unchanged.
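When comparing runs, it helps to keep the CSV line that Bonnie++ prints at the end of each test and pull out the few numbers you care about. Below is a minimal sketch; the function name `bonnie_summary` is hypothetical, and the field positions (5, 7 and 11) assume the Bonnie++ 1.0x CSV layout shown in the output above. Newer 1.9x releases use a different CSV layout, so the positions would need adjusting there.

```shell
# bonnie_summary  (hypothetical helper)
# Reads Bonnie++ 1.0x CSV lines on stdin and prints block write, rewrite
# and block read throughput, converting K/sec to MB/s.
bonnie_summary() {
    awk -F, '{
        printf "%s write %.1f MB/s, rewrite %.1f MB/s, read %.1f MB/s\n",
            $1, $5 / 1024, $7 / 1024, $11 / 1024
    }'
}
```

Fed the CSV line from the run above, it prints `deathstar write 16.5 MB/s, rewrite 9.3 MB/s, read 23.9 MB/s`.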

You can read more about Bonnie++ and see additional usage examples here.

Let’s say you are copying a large amount of data from the /disk1 filesystem to /disk2. You have started your copy process (cp, rsync, tar, whatever you decided to use) and now you need to know how long it will take. Below is a simple Korn shell script that measures the source directory, watches how fast the target directory grows, and estimates the remaining time. It assumes the target started out empty. The usage for this example would be: copy_progress.ksh /disk1 /disk2

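#!/bin/ksh
#
# copy_progress.ksh - estimate the remaining time of a running copy by
# comparing the size of the source tree with the growth of the target tree.
# Assumes the target started out empty (or holds only data copied from the
# source); otherwise the remaining size will be underestimated.

# ------------------
# CONFIGURATION
# ------------------

# Expect exactly two arguments: the source and the target directory
if [ $# -ne 2 ]
then
        echo "Usage: copy_progress.ksh /source /target"
        exit 1
else
        if [ "$1" = "$2" ]
        then
                echo "Error: Source and target directories must be different"
                exit 1
        else
                source="$1"
                target="$2"
        fi
fi

# ------------------
# FUNCTIONS
# ------------------

# Total size of the source tree, in KB
analyze_source() {
        echo "Calculating size of source"
        source_size=$(du -sk "$source" | awk '{print $1}')
}

# Current size of the target tree, in KB
analyze_target() {
        echo "Calculating size of target"
        target_size=$(du -sk "$target" | awk '{print $1}')
}

# Measure how much the target grows over one minute and extrapolate
analyze_transfer() {
        echo "Analyzing transfer parameters"
        analyze_target
        start_size=$target_size
        start_time=$SECONDS

        echo "Sleeping 1 minute"
        sleep 60

        analyze_target
        end_size=$target_size
        end_time=$SECONDS

        size_delta=$(echo "scale=0;$end_size - $start_size" | bc -l)
        time_delta=$(echo "scale=0;$end_time - $start_time" | bc -l)

        # Without growth there is no rate to extrapolate from
        # (and bc would hit a division by zero below)
        if [ "$size_delta" -le 0 ]
        then
                echo "Error: target did not grow; is the copy still running?"
                exit 1
        fi

        # du reports 1024-byte blocks, so this is KB/s, then MB/s
        transfer_rate_kbps=$(echo "scale=2;$size_delta / $time_delta" | bc -l)
        transfer_rate_mbps=$(echo "scale=2;$transfer_rate_kbps / 1024" | bc -l)

        size_remaining=$(echo "scale=0;$source_size - $target_size" | bc -l)
        time_remaining_sec=$(echo "scale=0;$size_remaining / $transfer_rate_kbps" | bc -l)
        time_remaining_min=$(echo "scale=2;$time_remaining_sec / 60" | bc -l)
        time_remaining_hr=$(echo "scale=2;$time_remaining_min / 60" | bc -l)
}

# Print the current rate (MB/s) and the estimated time remaining
show_results() {

cat << EOF

Current transfer rate:  $transfer_rate_mbps MB/s
Time remaining:         $time_remaining_min min
EOF
}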

# ------------------
# RUNTIME
# ------------------

analyze_source
analyze_transfer
show_results

And here is sample output from the script:

icebox:/var/adm/bin # ./copy_progress.ksh /disk1 /disk2
Calculating size of source
Analyzing transfer parameters
Calculating size of target
Calculating size of target

Current transfer rate:  22.89 Mb/s
Time remaining:         1.00 min

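In the script you can change the “sleep 60” wait time. If the script “sleeps” for five minutes instead of one, the result will be more accurate, because short bursts and stalls average out over the longer interval. The reported rate is in megabytes per second: du reports kilobytes, and the script divides the rate by 1024. Keep in mind that this script will not work if you are moving files instead of copying them.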
