Need Advices about my Backup Strategy

CarlB · 15. August 2020

Hi everyone!

I opened this thread because I'm a "newbie" on backup-strategies and this kind of stuff. Moreover I'm not a system-admin or a computer engineer ( I do a totally different job), so I also don't have a particular knowledge of linux-servers administration tasks, but I know by experience (I haved loss data in the past) the importance of having data safely stored and backed-up. For this reason about last year I started to look for information about nas-servers, in particular about openmediavault and (with the valuable help of the forum) I set-up my omv-server.

OMV is awsome and with the help of nextcloud and syncthing dockers the server fits perfectly all my needs but as many newbie I've been victim of misconception about RAID that have led me to become interested in real backup and backup strategies only recently.

That being said, last month I bought an additional hard-drive and an usb3 enclosure and started to manually backing-up my data daily with rsync ( as illustrated in the omv-getting started guide).

Having to do the backup manually is became tedious so today I started to think about automating the job and I've do the following:

1) Edit /etc/fstab adding the backup disk entry (I've copy the line generated by the WebUI when i manually mount the backup disk).

2) Created the related mountpoint : /srv/dev-disk-by-label-BACKUP

3) Wrote a bash script that do what I've done manually up to now. I'm totally new to bash (this is my first script, so be good, I do my best..) .

The script execute a daily incremental backup to the backup-drive. After checking that the drive is plugged-in and that the mountpoint exist, the backup drive is mounted, the rsync job is executed and the backup drive is unmounted. The previous backup is retained. (Daily Copy + Last Day One). In order to get the script work the drive is manually power on/off at the backup-time.

The script is the following:

Bash

#!/bin/bash

# DAILY- BACKUP-JOB: The script execute a daily incremental backup to an external USB-drive.
#                    After checking that the drive is plugged-in and that the mntpoint exist,
#                    the backup drive is mounted, the rsync job is executed and the backup drive
#                    is unmounted. The previous backup is retained. (Daily Copy + Last Day One).


# Preliminary Stuff

SECONDS=0

echo -e "\nBACKUP CRON-JOB LOG -- $(date +"%a %d %b %Y")"

echo -e "\n[$(date +"%T")]--Starting Backup-Job ..."

source_lbl="DATA"
destination_lbl="BACKUP"

destination_mntpoint="/srv/dev-disk-by-label-${destination_lbl}"
source_mntpoint="/srv/dev-disk-by-label-${source_lbl}"

# Check the presence of the backup drive
echo "Checking backup drive presence ..."

if findfs LABEL=${destination_lbl} > /dev/null 2>&1; then

  echo "backup drive successfully found at: $(findfs LABEL=${destination_lbl})"

else

  ELAPSED="Elapsed Time: $(($SECONDS / 3600))hrs-$((($SECONDS /60) % 60))min-$(($SECONDS %60))sec"
  echo "error: backup drive not found."
  echo "Backup-Job failed!"
  echo -e "[$(date +"%T")]--Execution aborted.\n"
  echo $ELAPSED
  echo "---------------------------------------------------------"
  echo -e "---------------------------------------------------------\n\n"
  exit 0

fi

# Check the existence of the backup mountpoint
echo "Checking backup mountpoint presence ..."

if cd ${destination_mntpoint} > /dev/null 2>&1; then

  echo -e  "backup mountpoint successfully found at: ${destination_mntpoint}"

else

  ELAPSED="Elapsed Time: $(($SECONDS / 3600))hrs-$((($SECONDS /60) % 60))min-$(($SECONDS %60))sec"
  echo "error: backup mountpoint not found."
  echo "Backup-Job failed!"
  echo -e "[$(date +"%T")]--Execution aborted.\n"
  echo $ELAPSED
  echo "---------------------------------------------------------"
  echo -e "---------------------------------------------------------\n\n"
  exit 0

fi

# Mount the backup filesystem
echo "Mounting the backup filesystem..."
mount LABEL=${destination_lbl} ${destination_mntpoint} > /dev/null 2>&1
echo "backup filesystem successfully mounted."

# Rotating backup directories
echo "Rotating backup directories..."

mv ${destination_mntpoint}/daily.backup.0 ${destination_mntpoint}/daily.backup.2
mv ${destination_mntpoint}/daily.backup.1 ${destination_mntpoint}/daily.backup.0
mv ${destination_mntpoint}/daily.backup.2 ${destination_mntpoint}/daily.backup.1

echo "backup rotation successfully done."

# Executing rsync job
echo "Executing rsync backup job..."

rsync --archive --delete \
      --human-readable --stats \
      ${source_mntpoint}/ ${destination_mntpoint}/daily.backup.0

echo -e "\nrsync backup-job successfully done.\n"

# Unmounting backup filesystem
echo "Unmounting backup filesystem..."
umount LABEL=${destination_lbl} ${destination_mntpoint} > /dev/null 2>&1
echo "backup filesystem successfully unmounted."

# Final Stuff
ELAPSED="Elapsed Time: $(($SECONDS / 3600))hrs-$((($SECONDS /60) % 60))min-$(($SECONDS %60))sec"
echo "Backup-Job succeed!"
echo -e "[$(date +"%T")]--Execution terminated.\n"
echo $ELAPSED
echo "---------------------------------------------------------"
echo -e "---------------------------------------------------------\n\n"
exit 0

Alles anzeigen

4) creating a cron-job in the Web-UI that execute the script daily and send me the output by e-mail.

I've tested the script and the cron-job and everything is good but I'm not fully convinced and satisfied about my backup setup and I'd like to have advices from the forum about the following:

Firstly I have a question on the backup strategy itself. The data source is a RAID1 array of 1TB size, the backup destination is an external 2TB hard drive. On the backup drive are stored two fully indipendent copy of the data: the daily copy and the previous-day one. Is it possible to increase the number of copy stored (going back for example of one week or possibly more) in order to make the most of the backup drive capacity and be able to reach, if needed, also the old versions of the files? Theoretically the idea is to edit the script to mantain only the modified/deleted file on the previous days folders. Is this doable with rsync? Do you have other ideas to reach the same goal?

Secondly I have a question about security. I've read that if one of the client get infected by a ransomware, the malware can reach also the nas-server and encrypt the shares.

Is it capable also to encrypt the backup disk? To minimize risks the backup disk is not shared and is manually power on/off and automatically mounted/unmounted by the script only for the backup-time.

Is it possible to automate also this action? the idea is to let the drive on and plugged-in 24h but to activate/deactivate the usb3 pci card with the backup script only for the backup-time or maybe by using hdparm. Any idea?

But, even if it were possible I'm not sure about safeness of this operation...if I could do it also a ransomware could do that, right?

The second idea is to buy a programmable light switch (timer) and physically connect the backup drive enclosure to it, program the timer to stay on only for about 1-2 hours in the night (the daily backup only takes few minutes in most cases) and execute the cron job in that interval of time. Doing that I'll expose the backup disk to risks for more time but in the night there are also much less probability that a client get infected (they sleep) .What do you think about it? Is a good idea?

henfri · 15. August 2020

To keep more days/weeks/years of Backup with little extra space usage, use rsnapshot

Morlan · 15. August 2020

maybe check the usbbackup-plugin

Adoby · 15. August 2020

As has been mentioned, rsnapshot is an alternative.

However, I did as you did and wrote my own backup script based on rsync. It started more than 10 years ago, and it has been rewritten and improved many times.

Using rsync it is possible to create hardlinked snapshots. This means that you can create what looks like a full backup copy, but only new or modified files have to be backed up. The rest can be hardlinked from a previous snapshot.

This makes it possible to store many versions in little space. I typically store 7 daily snapshots, four weekly and at least three monthly. My script not only creates new snapshots, it also deletes old snapshots in a way that builds up the requested number of snapshots over time.

I don't worry about ransom ware. I only use Linux and Android and I am careful not to use elevated access rights and I am careful about what code is executed and where.

Here is my script: https://github.com/WikiBox/snapshot.sh

CarlB · 15. August 2020

henfri , Morlan thanks for advices. I've checked plugins that you mention and yes, they are great and easy to use, but it seems that they both work with shared folder only so I would have to set a separate job for each shared folder (I don't have a "master" shared folder that contain the others, my fault). Moreover I need to backup the entire DATA disk wich contains all the shared folders but also the nextcloud data folder that is not a shared folder. For theese reasons I prefer to keep going with the personal rsync script way as Adoby does.

I've downloaded your script Adoby and I'll take a look and study that in the next days in order to modify my scripts and get hardlinked snapshots too in order to reach the goal of the first question.

For ransomware topic you are a lucky guy! In the sense that I'm also careful about what code is executed and where but I use windows for work (no alternatives because there aren't softwares alternatives for linux) and the omv-server is also used by "non-careful users" (and they also use windows too) so I'm concerned about theese kind of disasters. So in my position, what would you do? Are the ideas previously explained good (or at least reasonably dangerous)?

In the case the DATA disk get's encrypted, the last snapshot remain clean right? If I understend correctly in this case the cron-job would create a new snapshot corrupted (an independent copy full of encrypted file) but the last clean snapshot is manteined because all the hardlinks are removed (every file is modified/encrypted) right?

Adoby · 15. August 2020

Have at it!

Please post your modified script. I am very interested to see if I can improve my script.

I am not at all sure how the ransom ware works. I assume it tries to detect all volumes and at some certain point in time starts to encrypt them.

One way to protect the backups could be to have the backup script check to see if a sentinel file in the source volume is intact, alternatively store the script in the source volume and run it from there as a parameter to bash. If the source is encrypted the script can't run. I assume...

The sentinel file(s) could be some Word-documents or Excel-files. Just the sort of files ransomware would like to encrypt.

Also you could have the script mount and unmount the backup volume, as needed. That way the backups are only vulnerable while the backup is running. You could also have the script automatically verify that the backup volume is OK, after a successful backup, by mounting the backup volume read only and checksum some sentinel file(s).

Even more protection could be to have two separate backup volumes and alternate between them. Check that both are OK before mounting any of them rw to actually backup anything.

Possibly you could create a safe_mount.sh script that first mounts a filesystem read only, finds all sentinel files and checks them. Sentinel files might match "^Sentinel######\.ext$", where ###### is the checksum of the file and ext is some document format.

Also a safe_unmount.sh that does the same but to verify that the filesystem is OK when it is unmounted.

I am not sure if this will foil ransomware, but it might?

Another option could be to have a two-step backup. First you backup as normal on your NAS, that has Windows clients and SMB. Then you have another NAS that is used ONLY for backups. It only uses NFS (or SSH?) and it can ONLY connect to the main NAS. Perhaps even using a direct connection to a dedicated network card in the main NAS? Then you have the backup NAS to periodically power up and checks sentinel file in the main NAS and update backup snapshots. And then powers down.

CarlB · 15. August 2020

Very intersting advices. Thank you!

I'll study theese solutions and try to implement the script as soon as possible but due to job and to my newbie skills on bash I don't grant that this happen rapidly. Anyway when things ready I'll post everything here to continue the discussion. See you soon!

henfri · 15. August 2020

You can use rsnapshot without the Plugin.

Use Something tested...

Adoby · 15. August 2020

rsync is very well tested...

sismondi · 15. August 2020

duplicati work fine too.

Adoby · 16. August 2020

For me, the main reason to using my own script, rather than some ready-built solution, is that I like to have as much control and understanding as possible of what the backup software is doing. Also I want the files in my main backups to be unencrypted, uncompressed and stored in a way that exactly mirrors how they were stored in the source filesystem. KISS!

My script is very small and relatively simple. Possible to understand in detail. The heavy lifting is done by rsync. It is rsync that does all the copying and hard linking. And I trust rsync.

My script only controls the parameters used by rsync and afterwards purge old backups.

I don't recommend anyone to grab my script and use it. Instead, feel free to have a look at the script. Modify it and include ideas from it into your own script. And use that.

I am (slowly) rewriting my script in C++ to do what my script+rsync does now. But to also use file check sums to avoid copying unchanged files that has only been renamed or moved. And also to use the check sums for bit rot protection. During a backup two copies of each (old) file are available, source and backup. Hopefully at least one of the copies is correct. If one of them has been corrupted it can (most likely) be replaced by the correct copy. Either from source to backup or from backup to source. I will not replicate all of rsync capabilities. Just locally mounted filesystems (possibly via NFS from remote servers) syncing and hard linking, but with added use of check sums, both to identify files and to check that files are not corrupt. Hopefully I will have something working later this year.

The idea is to use the backup software, check sums and the backup copy to provide a form of extra redundancy and protection for files stored long term. Especially stuff like rarely changing media files like e-books, music, photos and video.

CarlB · 19. August 2020

Hi everyone!

I have installed rsnapshot and take a quick look on how it works and how it has to be configured and yep, it's certainly the easiest and faster way to have incremental snapshot but the idea to write my own script fashinated me and I have decide to stay on this way, also because I agree with Adoby observations about the much higher backup-strategy customization level reachable with a personal rsync script ( pre-post backup automated stuffs, naming style conventions, logfiles, mail notifications with personalized style...).

So starting from the previous script, inspired by the Adoby solution I have completely rewrite the script in a snapshot-oriented manner in a way that feets my needs. It isn't completely ultimated ( i have to implement the pre/post backup security checks discussed early...) and I have to do some final test but it works and it's pretty much done. (I know maybe it's bloated or don't well written due to my limitate knowledge of bash but it works and do what I want so ok for the moment...).

The script basically do what discussed so far...It implement a snapshot-based versioned-backup strategy of my omv-data drive to an external usb3 drive following the naming convention that i want. The drive is automatically mounted/unmounted within the script execution, logfiles are generated and sended as attachment via mail using mutt because mail doesn't allow me to attach files ( the personalized script log is linked in the mail body and the rsync detailed log is attacched).

The core job is done using rsync with --link-dest option, and the clean/rotation job of the previous snapshots is done parsing the directory list with a couple of for loops based on the snapshot-number that is part of the snapshot name.

As I said before the security checks on sentinel files are only sketchy and will be implemented on next upgrades.

The script takes in input the backup frequency desidered (e.g. daily) like rsnapshot in order to use the same script for the various frequency.

Source and Destination as well as the Retention Policy could be setted on the variable declaration sections.

So... Here's the code ---> (seebelow) . Any advices are welcome!

henfri · 19. August 2020

Zitat von Adoby

rsync is very well tested...

And your Script?

And your c++ Programm?

olduser · 20. August 2020

CarlB:

I'm not a PROgrammer and am a habitual function user, but you might want to put all variable assignments (especially the ones that touch disks) in their own file which will be the main file people read. For instance, I don't want to have to scroll 200 lines to see where the logs are going. Same for the actual Rsync command as well and how it's called. For instance, what if I want to use the script arbitrarily in a one-off fashion? If I did, that would kind of suggest to detach the cron logic.

It's always easy to critique someone's stuff, especially in contrast to your own style, but you might want to use functions and do some perm checks.

Being that I haven't used "hdparm", I have a honest question: How do you know this succeeds?

Code

# Put the backup disk to standby mode (spin-down)
echolog "Turning off backup disk..."
hdparm -y ${backup_dev} > /dev/null 2>&1
echolog "backup disk successfully turned off."

Can that not fail? I've never used it, but I see hdparm has the "-b", "-C" and "-I" switches:

Code

-b   Get/set bus state (0 == off, 1 == on, 2 == tristate)
-C   Check drive power mode status
-I   Detailed/current information directly from drive

Again though, I've never used it so I don't know if any switch can detect status, so maybe -y always works.

Adoby · 21. August 2020

Zitat von henfri

And your Script?
And your c++ Programm?

I have tested my script a lot more than I have tested any other backup solution. And I understand exactly what it does and how.

And at some point I will have tested my C++ program much more than any other combined backup snapshot, bitrot detection and correction and file level deduplication solution. As far as I know there are no other similar solutions, so that should be easy.

Are you saying that nobody should write any new scripts or any new software, because old software is better tested? So we should never do anything new or different? Only use what is already written?

There is nothing that prevents you from using whatever software you want to use.

CarlB · 21. August 2020

During Final Tests of the previous script some bugs came up. Bug were fixed and some minor update to the rotation engine were done. Updated code is attacched below.

olduser :

Zitat

[...] you might want to put all variable assignments (especially the ones that touch disks) in their own file which will be the main file people read. For instance, I don't want to have to scroll 200 lines to see where the logs are going. Same for the actual Rsync command as well and how it's called.

thanks for the advices. I was thinking about spread the code in multiple files using functions to separate tasks and I will certainly do that in the future.

Zitat

[...] what if I want to use the script arbitrarily in a one-off fashion? If I did, that would kind of suggest to detach the cron logic

I don't think about detaching the cron logic because the script was written for this purpose. If I need to sync to folders in a one-off manner I simply run an rsync job from the command line.

Zitat

Being that I haven't used "hdparm", I have a honest question: How do you know this succeeds? [...] Again though, I've never used it so I don't know if any switch can detect status, so maybe -y always works.

I'm not completely sure that hdparm -y always work but I made some tests and (if the disk support power management functionality, i.e. hdparm -C returns a valid answer) the command works regardless of the mount status of the drive. Being that said you're right and bacause there is no 100% guarantee that umount and hdparm -y works I have inserted conditional tests also for them.

About the "testing" matter I'm completely agree with Adoby . I don't reccomend to anyone to use my-script without analyzing & understand it and certainly I don't mean that one cannot use the ready-to-use solutions present.

I want that the backups are done following exactly my needs ( custom logs, naming convention...) and I think that the best way of doing it it's via a custom script.

About the reliability of that solution I agree that the strategy should be based on something well tested but once the custom rotation engine is validated I don't have any doubt about the reliability of the script because it simply run rsync (whose reliability is out of discussion).

my-backup.sh.txt

Need Advices about my Backup Strategy

CarlB 15. August 2020

Jetzt mitmachen!