Making Rsync Faster

Submitted on November 24, 2012 – 3:13 pm

If you Google something along the lines of “make rsync faster”, the most common thing you’ll see is people saying “hey, I have a gigabit network connection and my rsync is crawling along at a hundred kilobytes per second.” Well, the issue here is not the network. Rsync needs time to analyze the source and destination, generate checksums and compare timestamps, build a list of stuff to transfer and then, finally, start the copy process, one item at a time. You see the problem, I am sure: with lots of small files, it is this per-item overhead, not the bandwidth, that sets the pace.

The logical question that comes to mind: can I run multiple rsyncs in parallel? The quick answer is “no”: you give rsync a source, a destination and transfer parameters, and you get what you get. But you can be more creative in how you feed this data to rsync. Here’s an example (and, for the sake of simplicity, both source and destination are NFS-mounted filesystems):

The source directory is /tmp/source NFS-mounted from some remote file server with the following contents:

 

As you can see, the first level contains a hundred folders. And each folder contains more subfolders and files. In all, more than 11,000 folders and 100,000 files. If you launch rsync with the most common options to sync source to destination (i.e. rsync -a /tmp/source/ /tmp/target/), you are unlikely to get very good throughput. Let’s time the process:
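The timing output from the original post did not survive here. As a rough sketch of how such a baseline could be measured, the function below wraps the same `rsync -a` call and prints the elapsed wall-clock seconds. The name `baseline_sync` is made up for illustration and is not from the post.

```shell
#!/bin/bash
# Sketch only: baseline_sync is a hypothetical helper, not from the original post.
# Runs a single rsync over the whole tree and prints the elapsed seconds.
baseline_sync() {
    local src="$1" dst="$2"
    local start end
    start=$(date +%s)
    # Trailing slashes: copy the contents of src into dst
    rsync -a "${src}/" "${dst}/" || return 1
    end=$(date +%s)
    echo "elapsed: $((end - start))s"
}

# Example call with the article's paths:
# baseline_sync /tmp/source /tmp/target
```

Whatever number this prints on your system is the figure to beat with the parallel runs.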

 

So, the whole thing took about six minutes. We can try launching a separate rsync for each of the one hundred first-level folders like so:
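The original command is missing from this copy of the post, so what follows is only a guess at its general shape: `find` lists the first-level folders and one background rsync is started for each. The function name `parallel_sync` is an assumption made for this sketch.

```shell
#!/bin/bash
# Sketch only: parallel_sync is a hypothetical helper, not the post's original script.
# Starts one background rsync per first-level folder of src, then waits for all.
parallel_sync() {
    local src="$1" dst="$2" dir
    mkdir -p "${dst}"
    # -mindepth 1 excludes src itself; -maxdepth 1 stops at the first level
    find "${src}" -mindepth 1 -maxdepth 1 -type d | {
        while IFS= read -r dir; do
            # No trailing slash on dir, so the folder itself is recreated under dst
            rsync -a "${dir}" "${dst}/" &
        done
        # wait must run in the same subshell that started the background jobs
        wait
    }
}

# Example call with the article's paths:
# parallel_sync /tmp/source /tmp/target
```

Note that nothing here limits how many rsyncs start at once, and folder names containing newlines would confuse the `read` loop.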

 

This will run multiple rsyncs in parallel – one for each folder, as defined by the “maxdepth” option for the “find” command. There are a couple of potential issues here. First, if you happen to have some files located above the “maxdepth” level, your rsyncs will miss them. Second, having too many rsyncs running at the same time may simply kill your server. So we need to a) pick up any files located above the “maxdepth” level; and b) introduce some sort of flow control to keep the number of rsync processes in check.
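The final script from the post is not preserved in this copy. As one possible way to cover both points, the sketch below first copies the files sitting directly in the source folder, then hands the first-level folders to `xargs -P`, which caps the number of concurrent rsyncs. This is a swapped-in technique: judging by the comments, the original script polled the process list instead. The names `throttled_sync` and `maxthreads` and the default of 5 are assumptions for illustration, and `-P`, `-0`, `-print0`, `-mindepth` and `-maxdepth` are GNU/BSD extensions.

```shell
#!/bin/bash
# Sketch only: throttled_sync and maxthreads are illustrative names, and
# xargs -P stands in for whatever flow control the original script used.
throttled_sync() {
    local src="$1" dst="$2" maxthreads="${3:-5}"
    mkdir -p "${dst}"
    # a) Top-level files: the anchored pattern /*/ excludes every first-level
    #    directory, so this pass copies only what sits directly in src
    rsync -a --exclude='/*/' "${src}/" "${dst}/"
    # b) One rsync per first-level folder, at most maxthreads running at a time
    find "${src}" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "${maxthreads}" -I{} rsync -a {} "${dst}/"
}

# Example call with the article's paths and 10 concurrent rsyncs:
# throttled_sync /tmp/source /tmp/target 10
```

A sensible value for `maxthreads` depends on the file server and the client, so it is worth trying a few settings and timing each run.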

 

And this is how you squeeze the most performance out of rsync and make full use of your available bandwidth. Your network admins will love you for this.


15 Comments »

  • Mihai Cristian Satmarean says:

    I was looking for this for a million years!
    Thanks!

  • Andre ten Bohmer says:

    Thanks! Boosted a cache partition copy from 40 minutes to just under 4 minutes.

  • BRUTE says:

    Answer as many as you want :) (Preferably all :P)

    As of Aug 2012, what does the latest version of any Linux distro have that Windows 7 doesn’t? (Must be something significant!)

    If Windows magically became open source freeware tomorrow, would you continue to use Linux or would you switch to Windows now that its free? Why?

    Why should I switch to Linux if it doesn’t run iTunes for my iPhone, Samsung Kies for my tablet, and many other applications (yes, I know, Linux is not to blame for this…but the fact remains: Using Linux would have substantial drawbacks for me personally…so why should I switch?

    Is there any truly notable software that runs ONLY on Linux?

    Which Linux offers the most FUNCTIONAL user interface? (I don’t care about it looking all pretty, and I don’t care which one is the most popular, I want to know which one is most pleasant to work with and most intuitive.

    Most importantly, which distro is the fullest in terms of user customization, administrative freedom, file management simplicity, etc. (i.e., which distro(s) will assume that I’m a computer whiz and would get super annoyed at things like hidden files, having to check a box to show hidden files, etc.

    Name the number 1 strength of Linux that is a weakness/not featured in Windows 7:

    Name the number 1 strength of Linux that is a weakness/not featured in the latest Mac OSX:

    Lastly, the last 2 questions vice versa

    **********NOTE: I have 2 internal hard drives so please don’t warn me about losing all my Windows stuff if I install Linux. I would be installing Linux on the secondary HDD and accessing that OS via BIOS…IF you can convince me its worth it ;P**********
    FYI I’m not trying to troll on Linux or make any statement as to a particular OS’ inferiority or superiority. In fact, I WANT to make the migration over to open source because open source doesn’t have an angle like paid-for software does but I obviously have residual non-open source software and hardware that would be a complete waste to just get rid of. I wrote that just for you Charlie Kelly, O’ superior one.

  • Kaylla says:

    I am using ubuntu linux and am trying to create a cron job to run rsync as root while I’m logged in as user.

  • The Villain says:

    I’m having trouble finding a way to print output to the command prompt and then press return. For example, if I need to type a password in and then hit return, how do I do that? I’m trying to sftp into a computer and get it to download a certain file, but I can’t find a way to print output to the command prompt. All help is appreciated, thanks.

  • skychi99 says:

    How can I create a image of my hard drive every month for backups?

    I can afford almost anything, but I’d rather it be cost effective, whether it’s hard drives, CDs, DVDs, etc.

  • Marlon P says:

    I want to be able to transfer files to and from my home computer remotely while accessing it from an SSH connection…How do I do this? Do I need to set up an FTP server on my home computer? It’s running Debian Linux and I access it from my Android phone using ConnectBot.

  • Ssshhhh Im becoming aroused says:

    I need a good free software to backup my files.
    What do you recommend?

  • Anny says:

    I am creating a script to be run by cron to sync my FTP servers’ files. My issue is that when the script runs, it asks for the ssh password. Since it is run at 1am by cron, I obviously don’t want to enter the password. I have made RSA & DSA keys for my servers, but if a reboot occurs, it doesn’t automatically connect. Is there a way to insert the password when it is asked, or another method to let these two servers connect without password authentication?

  • Darío Fernández says:

    Thanks man! It works great and fast! You saved me a lot of downtime :)

  • Roberto Bauco says:

    slight mod for compatibility for long ps

    while [ $(ps -efww | grep -ci rsync) -gt ${maxthreads} ]

  • Ashok Kumar says:

    Does anyone have a similar script for an AIX server?

  • Rare_ONE says:

    Igor, this script is batshit crazy. Way faster than the regular rsync running in multiple sessions.

  • clydevargas says:

    Great script! One thing to consider for anyone using this is that it might miss some things if you’re trying to keep the source and target identical, as you would with “rsync -a --delete”. For example, if a folder is deleted from /tmp/source it will remain on /tmp/target. The same goes for a folder added to /tmp/target: it will not get deleted on subsequent runs of the script.

  • magic wed says:

    Too bad if your destination server is an NFS4 server. Trond Myklebust’s rdirplus code patch to NFS4 will make your rsync remote listener take forever to produce a basic list of files on the destination server, and it gets even worse with high latency networks.

    To avoid Trond’s buggy code you got to avoid listing files on your destination NFS4 server.

    You could be better off just using a simple cp -rp command. Funny how you can transfer a file a few hundred megabytes in size in just seconds to and from an NFS4 server but to list a folder containing 50,000 files – forget it.
