Deduplication of Rsnapshot Backup-Data possible??

    • OMV 2.x

    • Deduplication of Rsnapshot Backup-Data possible??

      Hello everybody!

      Is there any way to consolidate the backup data from incremental rsnapshot backups?


      My problem: free space on the backup drive is getting low, because I have made, and will keep making, some bigger data moves inside my data structure (e.g. importing a year's worth of photos, or correcting a typo in a parent folder's name). Each of these causes a copy instead of a hardlink, even though the data itself hasn't actually changed.

      Is there a way to deduplicate the data? I'm sure there is quite a lot of potential to save space.

      Google led me to a program called "hardlink" ( manpages.debian.org/stretch/hardlink/hardlink.1.en.html or dinotools.de/2013/07/10/linux-platz-sparen-mit-hardlinks/ )

      Does anyone have experience with this?

      I'm a Linux beginner all over again every time I have to maintain my NAS, so I wasn't able to test it myself (probably because my OMV 2.2 version is too old... not sure about that).


      Any advice is highly welcome!
    • I use rsync directly, not rsnapshot, for my incremental snapshot-style backups. I suspect the backup storage is very similar.

      When I restructure the data storage I typically also purge all but the latest snapshot and restructure the latest snapshot to mimic the new data storage. That way I avoid storing the same file twice in the backup storage. To be doubly safe you could also wait to delete the old snapshot until the new one has finished.

      A clever backup utility should perhaps automatically hardlink to the same file in the backup storage even if it is in the wrong place, using a combination of file metadata and file checksum. It could also handle duplicate files in the backup storage the same way.
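
A minimal Python sketch of that idea, for illustration only: it groups files by size and SHA-256 checksum and replaces duplicates with hard links. The function names `file_digest` and `dedupe_by_hardlink` are made up for this example, and this is not how the `hardlink` utility is implemented. It assumes everything under `root` is on one filesystem, that no other names for these files exist outside `root`, and it ignores ownership, permissions and timestamps, which hard-linked files necessarily share.

```python
import hashlib
import os
import stat
from collections import Counter, defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_by_hardlink(root, dry_run=True):
    """Hard-link identical files under root; return the bytes (to be) freed."""
    # Group every path by the inode it names, so files that are already
    # hard-linked to each other are treated as one stored copy.
    inodes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if stat.S_ISREG(info.st_mode) and info.st_size > 0:
                inodes[(info.st_dev, info.st_ino, info.st_size)].append(path)

    # Only sizes that occur on more than one inode can hide duplicates.
    size_counts = Counter((dev, size) for dev, _ino, size in inodes)

    keepers = {}  # (device, size, checksum) -> path of the copy that is kept
    freed = 0
    for (dev, _ino, size), paths in sorted(inodes.items()):
        if size_counts[(dev, size)] < 2:
            continue
        key = (dev, size, file_digest(paths[0]))
        keeper = keepers.setdefault(key, paths[0])
        if keeper == paths[0]:
            continue  # first inode seen with this content: keep it
        freed += size
        if not dry_run:
            for path in paths:
                # Link under a temporary name, then rename over the duplicate,
                # so the path never disappears if the run is interrupted.
                temp = path + ".dedupe-tmp"
                os.link(keeper, temp)
                os.replace(temp, path)
    return freed
```

Calling `dedupe_by_hardlink(root)` with the default `dry_run=True` only reports how many bytes could be reclaimed, which is a sensible first step on a backup drive; nothing is changed until `dry_run=False` is passed.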
      OMV 4, 7 x ODROID HC2, 1 x ODROID HC1, 3 x 12TB, 2 x 8TB, 1 x 4TB, 1 x 2TB SSHD, 1 x 500GB SSD, GbE, WiFi mesh
    • Well, if I did that I wouldn't be able to roll back, would I? I really like being able to roll back, since I don't know now which file or which state I'm going to miss.

      If this 'hardlink' program works, it would do exactly that: check checksums and hardlink everything that exists more than once. It would be perfect together with rsnapshot (which is basically rsync, as far as I understood), run once a month for example, to clean up everything that got duplicated in the meantime.
    • Sorry, I don't get the point... It doesn't matter where the "hardlink" command is installed, or does it???

      The files are to be hardlinked on (or within) the filesystem (partition) where they already are. There is only 116 GB free (out of 8 TB), which is not much when some major change happens.
    • If the files are part of rsync snapshots, made by rsnapshot, then they already are hardlinked. You wanted to have the next snapshot of the newly structured content to be hardlinked to the old snapshots?

      That may be difficult if you are low on space. I assume that you must have an old snapshot and a new snapshot on the same filesystem in order to successfully combine them using the hardlink utility.

      But I have no direct experience with either rsnapshot or the hardlink utility. I may be wrong...

      If you already have the new snapshot on the filesystem, then you should be able to use the hardlink utility without problems.
    • OK, now I think I got it.

      There are already quite a lot of snapshots (the latest from last night) on the filesystem. They are hardlinked for the most part, as long as rsnapshot found the data in the same place and in the same condition. But since there were some big moves, folder renames etc. in the source data structure in the past, there will be quite a lot of data that rsnapshot didn't recognize as hardlinkable but that is actually identical.

      I'm looking for a way to hardlink this backup data after the fact, in order to save space while still being able to change parts of the source folder structure without instantly running low on backup disk space.

      So IF the 'hardlink' command works as described, it should work, I assume. Maybe there is somebody who has already worked with it???
    • rsnapshot is just a frontend for rsync and one main point of rsnapshot is to hardlink files that haven't changed. If the files aren't hardlinked, they must have changed (or they are on different filesystems). There is no utility that is going to fix this.
      omv 4.1.22 arrakis | 64 bit | 4.15 proxmox kernel | omvextrasorg 4.1.15
      omv-extras.org plugins source code and issue tracker - github

      Please read this before posting a question and this and this for docker questions.
      Please don't PM for support... Too many PMs!
    • Well, I understood that rsnapshot does not hardlink if a file has just been moved or its containing folder renamed. Does that already count as a "different filesystem"??

      Or will rsnapshot hardlink files that are identical but have been moved? (It actually doesn't hardlink them, as I found out in my own experiment.)

      But the 'hardlink' program was described as doing just that???
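
An experiment like that can be checked directly: two paths are hard links to the same file exactly when they share a device and inode number. A small Python sketch follows; the helper names `are_hardlinked` and `link_count` are made up for this example.

```python
import os


def are_hardlinked(path_a, path_b):
    """Return True if both paths name the same inode on the same filesystem."""
    a = os.lstat(path_a)
    b = os.lstat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def link_count(path):
    """Return how many directory entries (names) point at this file's inode."""
    return os.lstat(path).st_nlink
```

For a file that was left untouched between two snapshots, `are_hardlinked` on its path in each snapshot directory should return True; for a file that was moved or whose folder was renamed in between, it would be expected to return False even though the contents are identical.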
    • For rsync (and rsnapshot?) to hard link, the files have to be unchanged AND not moved since the last snapshot.

      But, as I said, you can cheat and move the files in the old snapshot, to match the new structure, before you take a new snapshot. Then rsync can hardlink fine.

      And from what I saw of the man page of the hardlink utility it can "fix" hard links between snapshots when the files have been moved but otherwise are unchanged.

      Rather interesting. Would be nice to have a backup utility that can do snapshots and hard links just like rsync, but also can search for candidates to hard link to.

      For instance it might figure out that the files:

      /work/ongoing/project1/big.zip
      /work/finished/2019/project1/big.zip.bak

      actually are the same files but with different names/locations.
    • Adoby wrote:

      And from what I saw of the man page of the hardlink utility it can "fix" hard links between snapshots when the files have been moved but otherwise are unchanged.
      That's exactly what I'm looking for!! It would be way too much work to fix the folder structure manually in a daily, weekly and monthly snapshot backup structure, or it might not actually be possible at all...

      At the moment I'm updating to Debian 9.8 / OMV 4 (it was time to do so anyway), hoping that installing or using the 'hardlink' program will work after that so I can try it...
    • Hardlinks are not what you want if you are moving stuff all over the place and still want it to not take up space. You should be using a CoW filesystem and snapshots. I have about 20 snapshots (using rsnapshot) of almost 6 TB worth of data on an 8 TB drive. I don't move things around all the time though.
    • Well, if you have reorganized the original folder structure on the data storage, there is, hypothetically, no reason why a backup utility, otherwise similar to rsync, couldn't be smart enough during the backup to search for hard link candidates within the previous snapshot in the backup storage. Then renamed/moved folders/files wouldn't matter. They would still be hard linked, if the files match.

      And a CoW or checksummed filesystem wouldn't even be strictly necessary for that. Checksums could be calculated separately.

      You could perhaps specify different "hard link levels" to use when searching for hard link candidates while updating snapshots, from none through standard to aggressive: 0 = no hard linking, 1 = hard link if path + name + size + time match, 2 = hard link if checksum + name + size + time match, 3 = hard link if checksum + size match.
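
A rough Python sketch of how such levels could be expressed, purely hypothetical since no existing tool is being described: each level defines which attributes form the comparison key. The names `FileInfo`, `match_key` and `find_link_candidate` are invented for this example, and the checksums are assumed to have been computed beforehand.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    rel_path: str   # path relative to the snapshot root
    size: int       # size in bytes
    mtime: int      # modification time, whole seconds
    checksum: str   # precomputed content checksum


def match_key(info, level):
    """Return the tuple of attributes that must be equal at this level."""
    name = posixpath.basename(info.rel_path)
    if level == 0:
        return None  # hard linking disabled
    if level == 1:
        return (info.rel_path, info.size, info.mtime)
    if level == 2:
        return (info.checksum, name, info.size, info.mtime)
    if level == 3:
        return (info.checksum, info.size)
    raise ValueError(f"unknown hard link level: {level}")


def find_link_candidate(new_file, previous_snapshot, level):
    """Return a file from the previous snapshot to hard link to, or None."""
    wanted = match_key(new_file, level)
    if wanted is None:
        return None
    for old_file in previous_snapshot:
        if match_key(old_file, level) == wanted:
            return old_file
    return None
```

With the big.zip example from earlier in the thread, a file that was both moved and renamed to big.zip.bak would only be found at level 3; a file that was moved but kept its name and timestamp would already be found at level 2.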

      Bausau wrote:

      That's exactly what I'm looking for!! It would be way too much work to fix the folder structure manually in a daily, weekly and monthly snapshot backup structure, or it might not actually be possible at all...
      Well, you should only need to update the latest snapshot. That is the only one that is used for hard links. I assume...

      But I typically purge old snapshots, except the latest, as well, if I change something. If the folder structure has changed I wouldn't want to restore something that uses the old structure. But then I back up several folder trees separately depending on their contents. For instance, only movies in one snapshot backup structure, and only ebooks in another, and so on... And I launch the rsync snapshot script sequentially for each folder tree that is backed up.
    • Adoby wrote:

      couldn't be smart enough during the backup to search for hard link candidates within the previous snapshot in the backup storage. Then renamed/moved folders/files wouldn't matter. They would still be hard linked, if the files match.

      And it wouldn't even be strictly necessary with a CoW or checksummed filesystem. Checksums could be calculated separately.
      I suggest this because, as with rsync/rsnapshot, it is still easy to get at the backed-up file structure. If you had to checksum all the files and you had a lot of data, it might be a struggle to run a backup hourly. If the filesystem approach is not an option, then borgbackup is very good since it dedupes at the block level and compresses. It also allows you to mount backups.
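
To illustrate what deduplication at the block level means, here is a deliberately simplified Python sketch. It is not borgbackup's code or API: borg uses content-defined chunking, while this example assumes fixed-size blocks, and the names `store_blocks` and `restore` are made up. Each unique block is stored once under its hash, and a file becomes a list of block hashes.

```python
import hashlib


def store_blocks(data, store, block_size=4096):
    """Split data into fixed-size blocks and add unseen blocks to the store.

    Returns the "recipe": the ordered list of block hashes for this data.
    """
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # identical blocks are kept only once
        recipe.append(digest)
    return recipe


def restore(recipe, store):
    """Rebuild the original data from its recipe."""
    return b"".join(store[digest] for digest in recipe)
```

Because blocks are identified by content rather than by path, a moved or renamed file adds no new blocks to the store, which is why this approach is not affected by reorganizing folders the way hardlink-based snapshots are.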