Large backups with rsync can take a very long time, so it is important to know how far along the transfer really is. rsync provides the --progress option to show the progress of each file:
1238099 100% 146.38kB/s 0:00:08 (xfer#5, to-check=169/396)
The last number (396 in to-check=169/396) is the total number of files rsync knows about, and the number before it (169) is how many are still left to check. For a small number of files this total is accurate. But if you have to transfer a large number of files, rsync handles them in chunks, so the total grows each time rsync picks up another chunk of files.
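To make the two counters concrete, here is a minimal sketch that pulls them out of the sample line above. The helper name parse_to_check is made up for illustration, and the pattern also accepts the shorter "to-chk=" spelling on the assumption that newer rsync versions print it that way.

```python
import re

# Accept both "to-check=" and "to-chk=" (assumption: the spelling
# depends on the rsync version in use).
TO_CHECK_RE = re.compile(r'to-(?:check|chk)=(\d+)/(\d+)')


def parse_to_check(line):
    """Return (remaining, known_total) from a --progress line, or None."""
    match = TO_CHECK_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = '1238099 100% 146.38kB/s 0:00:08 (xfer#5, to-check=169/396)'
remaining, known_total = parse_to_check(sample)
checked = known_total - remaining
print(checked, 'of', known_total, 'files known so far have been checked')
```

For the sample line this gives 227 of 396, but 396 is only the total rsync has discovered so far, which is why a percentage based on it can jump backwards on large transfers.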
My solution is to do a dry run first, extract the real total number of files from its statistics, and then calculate the overall progress against that total.
import re
import subprocess
import sys

src, dst = sys.argv[1], sys.argv[2]

print('Dry run:')
# --stats together with --dry-run reports the file count without copying anything.
cmd = ['rsync', '-az', '--stats', '--dry-run', src, dst]
result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
# Some rsync versions print thousands separators ("1,234"), so allow commas.
mn = re.search(r'Number of files: ([\d,]+)', result.stdout)
total_files = int(mn.group(1).replace(',', ''))
print('Number of files: ' + str(total_files))

print('Real rsync:')
cmd = ['rsync', '-avz', '--progress', src, dst]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
# Iteration stops when rsync closes its output, so the loop cannot hang.
for output in proc.stdout:
    # Older rsync prints "to-check=", newer versions print "to-chk=" or "ir-chk=".
    m = re.search(r'(?:to-check|to-chk|ir-chk)=(\d+)/(\d+)', output)
    if m and total_files:
        remaining, seen = int(m.group(1)), int(m.group(2))
        progress = 100 * (seen - remaining) // total_files
        sys.stdout.write('\rDone: ' + str(progress) + '%')
        sys.stdout.flush()
proc.wait()
print('\rFinished')
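The arithmetic behind the script can be tried without running rsync at all. The sketch below is only an illustration: the function names parse_total_files and overall_percent, the sample statistics line, and the total of 1,000 files are all made up for the example.

```python
import re


def parse_total_files(stats_output):
    """Extract the file count from rsync --stats output (commas allowed)."""
    match = re.search(r'Number of files: ([\d,]+)', stats_output)
    if match is None:
        raise ValueError('no "Number of files" line found')
    return int(match.group(1).replace(',', ''))


def overall_percent(remaining, known_total, total_files):
    """Percent of all files checked so far, based on the dry-run total."""
    if total_files <= 0:
        return 100
    return min(100, 100 * (known_total - remaining) // total_files)


# Hypothetical dry-run statistics line reporting 1,000 files in total,
# combined with the counters from to-check=169/396.
total = parse_total_files('Number of files: 1,000 (reg: 900, dir: 100)')
print(overall_percent(169, 396, total))
```

With these made-up numbers the overall progress is 22%, whereas dividing by the 396 files rsync had discovered at that point would have suggested 57%.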