Posted by: yegorich | April 9, 2011

Get total rsync progress using Python

Large backups with rsync can take a really long time, so it is important to know how far along they are. rsync provides the --progress option to show the progress of each file:
 
1238099 100%  146.38kB/s    0:00:08  (xfer#5, to-check=169/396)

The last number is the total number of files to process. It is accurate when there are only a few files. For a large transfer, however, rsync handles the files in chunks, so this total grows each time rsync picks up another chunk.
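
To make the counter concrete, here is a minimal sketch that pulls the two numbers out of the sample line above. The helper name parse_to_check is made up for illustration, and accepting to-chk and ir-chk as well is an assumption based on the abbreviated labels that newer rsync releases appear to print.

```python
import re

# Matches "to-check=169/396". Newer rsync releases are assumed to print
# "to-chk=" or "ir-chk=" instead, so the pattern accepts those as well.
TO_CHECK = re.compile(r'(?:to-check|to-chk|ir-chk)=(\d+)/(\d+)')


def parse_to_check(line):
    """Return (remaining, known_total) from a progress line, or None."""
    match = TO_CHECK.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = '1238099 100%  146.38kB/s    0:00:08  (xfer#5, to-check=169/396)'
print(parse_to_check(sample))  # (169, 396)
```

The first number is how many files are still waiting, the second is how many rsync knows about so far, and it is that second number that keeps growing on big transfers.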

My solution is to do a dry run first, extract the real total number of files from its statistics, and then calculate the overall progress against that total.

import re
import subprocess
import sys

source, destination = sys.argv[1], sys.argv[2]

print('Dry run:')
# An argument list (instead of shell=True) avoids quoting problems with paths.
cmd = ['rsync', '-az', '--stats', '--dry-run', source, destination]
# universal_newlines=True makes the pipe return text instead of bytes.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
remainder = proc.communicate()[0]
# Some rsync versions print thousands separators here, e.g. "1,234".
mn = re.findall(r'Number of files: ([\d,]+)', remainder)
total_files = int(mn[0].replace(',', ''))
print('Number of files: ' + str(total_files))

print('Real rsync:')
cmd = ['rsync', '-avz', '--progress', source, destination]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
while True:
    output = proc.stdout.readline()
    if output == '':
        # Empty string means end of output: rsync has exited.
        break
    if 'to-check' in output:
        m = re.findall(r'to-check=(\d+)/(\d+)', output)
        # files seen so far minus files still waiting, as a share of the real total
        progress = (100 * (int(m[0][1]) - int(m[0][0]))) // total_files
        sys.stdout.write('\rDone: ' + str(progress) + '%')
        sys.stdout.flush()
        if int(m[0][0]) == 0:
            break
proc.wait()

print('\rFinished')
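
The percentage arithmetic in the loop can also be pulled into a small function, which makes it easy to check by hand. This is only a sketch: percent_done is a hypothetical helper, and the total of 1000 files is an invented dry-run result, not a measured one.

```python
def percent_done(remaining, known_total, total_files):
    """Percentage of files handled so far, capped at 100.

    remaining and known_total are the two numbers from to-check=R/T;
    total_files is the count reported by the dry run.
    """
    if total_files <= 0:
        # Nothing to transfer, so treat the job as complete.
        return 100.0
    done = known_total - remaining
    return min(100.0, 100.0 * done / total_files)


# to-check=169/396 with a (made-up) dry-run total of 1000 files:
print('%.1f%%' % percent_done(169, 396, 1000))  # 22.7%
```

Dividing by the dry-run total rather than by the second to-check number is what keeps the percentage from jumping backwards when rsync loads another chunk of files.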

Responses

  1. Great workaround – line 25 needs an indent.

  2. Nice catch. Thanks

  3. Thanks

  4. yegorich, just for kicks, i did a little mod to your script

    import subprocess
    import re
    import sys

    print('Dry run:')
    cmd = 'rsync -az --stats --dry-run ' + sys.argv[1] + ' ' + sys.argv[2]
    proc = subprocess.Popen(cmd,
                            shell=True,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            universal_newlines=True,  # read text, not bytes
                            )
    remainder = proc.communicate()[0]
    # Some rsync versions print thousands separators here, e.g. "1,234".
    mn = re.findall(r'Number of files: ([\d,]+)', remainder)
    total_files = int(mn[0].replace(',', ''))
    print('Number of files: ' + str(total_files))

    print('Real rsync:')
    cmd = 'rsync -ahvz --progress ' + sys.argv[1] + ' ' + sys.argv[2]
    proc = subprocess.Popen(cmd,
                            shell=True,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            universal_newlines=True,
                            )

    line = ''
    while True:
        output = proc.stdout.readline()
        if output == '':
            # End of output: rsync has exited.
            break

        if 'to-check' in output:
            m = re.findall(r'to-check=(\d+)/(\d+)', output)
            progress = (100 * (int(m[0][1]) - int(m[0][0]))) // total_files

            sys.stdout.write('\rDone: ' + str(progress) + '% ' + '| ' + str(m[0][0]) + '/' + str(total_files) + ' | ' + line)
            sys.stdout.flush()
        else:
            # Remember the last non-progress line (usually the file name).
            line = output.strip()

    print('\rFinished')

  5. Thank you for the tutorial but it didn’t work. Rsync freezes in Real Rsync: Is there something I need to setup? How to fix it and how do I show this progress in Django? Thanks. Will appreciate your help!!

