Checking your photo backups with hashdeep
It’s not just your photos and video clips where you might worry that a backup could become faulty over time or that errors might creep in. Hashdeep can help here.
Introduction
I keep my photos and video clips on an external NVMe drive, simply because that has proven to be the best way to work with multiple devices. This data is backed up regularly, e.g. whenever a new photo is imported into Lightroom, using Robocopy to a NAS system.
The NAS runs Openmediavault with ZFS as the file system. Nevertheless, this data, along with other data, is backed up via USB backup to external disks in an ICY BOX enclosure. The hard drive for the photos is even swapped every Monday, which means there are two more copies of my photos.
And where does Hashdeep come into play?
Hashdeep is available for practically all Linux distributions, including Debian, which Openmediavault is based on.
Hashdeep can generate checksums for all files in a directory tree and write them to a checksum file. It was originally developed to detect files that had been changed after a hacker attack. Here is an example of the call:
hashdeep * > ~/hashes.txt
The text file now contains the paths and names of the files for which hashes were generated.
cat ~/hashes.txt
%%%% HASHDEEP-1.0
%%%% size,md5,sha256,filename
## Invoked from: /home/pm
## $ hashdeep 22_11_2024.jex bench.sh Bilder bin Blogbeiträge bookmarks.html c.sum Dokumente Downloads JoplinBackup Musik Nextcloud nvme0n1.log nvme1n1.log Öffentlich _resources R-Lavalier-II-L-Smartlav+.WAV Schreibtisch sizer.sh svn Videos vmware-pm Vorlagen wsizes yubico-authenticator-7.1.0-linux
##
41,35052144c737180d3ceea24566ee8324,6b642776b4881ff0253caf1bc168277e070eee0be11d1e4a9049bd8ac5824066,/home/pm/c.sum
1557,2e8e571ded4ac15dd4b67146aabe4dd0,d948fbbe0d32eb1d7da2fc46dd88b4142428366a91998ef775703c1ef6812acb,/home/pm/bench.sh
554,2005cf8c8659bd2949de79fb9cbb4c47,8aab4b737230183034e887a02a8dbaa566571bdab0e31f277a6d3596df1967ba,/home/pm/wsizes
11,25931bf8908a44507163449f0b616bab,4526f17352f6d8f9f5feabb6091b89fd37b9caf5ae1bc43e0a56e8741c642c05,/home/pm/sizer.sh
216,6a0c1f31c5e93f24f1a71007166a85a6,76af459f2e8499cef1e7991ab93b9107c5f7e0e95831c24e8508d8a8f2ad8ffe,/home/pm/nvme1n1.log
222,cc1a3c8511bfcab1a7eee288d0ef978a,2bb20d1ad7eaffbf82464e2926e2808d3366053aecb976155590e75655eb3557,/home/pm/nvme0n1.log
115443,af230c729bb198a06f0764a08e4da9d0,d6d55aedf147d49dee53889c84280378e14c5153f63a116a19ce3d775a8d2bcf,/home/pm/bookmarks.html
3227648,16eebb29f065f63ecef07190cc0d51ce,f629c44abe97a6704abc3e4a47db5708008167dab985ec8f54cf3679f67d7b66,/home/pm/22_11_2024.jex
30309632,dab8c26aefb9d09a9f19e8e7329d2231,21c3888ea66b76805daa02fcdc098dae0ebe18484eade794c2947bd9188c68b7,/home/pm/R-Lavalier-II-L-Smartlav+.WAV
To verify that the stored data is still consistent, a check is carried out using Hashdeep’s audit mode. It looks like this:
hashdeep -a -v -k ~/hashes.txt *
hashdeep: Audit passed
Files matched: 3
Files partially matched: 0
Files moved: 0
New files found: 0
Known files not found: 0
Modern hard drives can usually detect errors in stored data by means of internal parity/ECC information, but a detected error cannot always be corrected by the drive itself.
Hashdeep can be used to check whether files have been changed. In conjunction with the file timestamps, it may then be possible to determine when such a change occurred.
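For example, once an audit reports a mismatch, a small sketch like the following (assuming GNU find and the hash file from above) can help narrow down when files were last modified:
# List every file modified after the hash file was written (GNU find);
# the timestamps hint at when a detected change may have occurred.
find . -type f -newer ~/hashes.txt -printf '%T+ %p\n'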
And the application for photo backup?
My photos live in one folder, with a subfolder for each year. Within each year folder there are further subfolders named after the date and, where applicable, the event, and that is where the photos are stored. With the -r switch, hashdeep can traverse these directories recursively.
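To illustrate, here is a hypothetical layout (the folder names are invented) together with a matching recursive call; the -l switch makes hashdeep record relative rather than absolute paths:
Fotos/LWH/2016/2016-07-23_Zoo/IMG_0001.CR2
Fotos/LWH/2016/2016-08-10_Hiking/IMG_0123.CR2
hashdeep -r -l 2016/*    # hashes everything below 2016/, with relative paths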
I then created the following script to make this as automated as possible.
#!/bin/bash
# $Id: wrapper.sh 302 2024-10-24 17:02:29Z pm $
# $Rev: 302 $
# $Author: pm $
##################################
# Fotos = foto
# blog-fotos = blog
# Redirect fd 5 to syslog via logger; if tracing is enabled with set -x,
# the xtrace output (prefixed with line numbers via PS4) lands there.
exec 5> >(logger -t "$0")
BASH_XTRACEFD="5"
PS4='$LINENO: '
# Read the last checked year from the state file ($1), add one, and wrap
# around to the start year ($2) once the current year has been passed.
incyear () {
    local LY
    LY=$(cat "$1")
    local YEAR=$((LY+1))
    if [ "${YEAR}" -gt "$(date +%Y)" ]; then
        YEAR=$2
    fi
    echo "${YEAR}" > "$1"
    echo "${YEAR}"
}
case "$1" in
foto)
    STARTYEAR=2003
    BASEDIR="Fotos/LWH/"
    CHECKYEAR=~/hashdeep/check-$1
    YEAR=$(incyear "$CHECKYEAR" "$STARTYEAR")
    cd /Pool/${BASEDIR} || exit 1
    echo "Year: Foto $YEAR"
    # Audit one year per run against the previously generated hash file.
    time hashdeep -a -r -l -vv -k ~/hashdeep/$1-${YEAR}.txt ${YEAR}/*
    echo "Fotos"
    ;;
foto-gen)
    #STARTYEAR=2008
    BASEDIR="Fotos/LWH/"
    cd /Pool/${BASEDIR} || exit 1
    # Generate MD5 hashes with 16 threads; -l records relative paths.
    time hashdeep -j 16 -c md5 -r -l ${2}/* > ~/hashdeep/foto-${2}.txt
    ;;
usb-foto)
    # Called every 20 weeks; over all years this rotates through all 3 disks.
    # The same works with a 10-week or 17-week schedule.
    STARTYEAR=2003
    #mount /dev/sdj1 /root/usb
    BASEDIR="Fotos/LWH"
    CHECKYEAR=~/hashdeep/$1
    YEAR=$(incyear "$CHECKYEAR" "$STARTYEAR")
    cd /root/usb/${BASEDIR} || exit 1
    echo "Year: $YEAR"
    # Audit the USB copy against the hash file generated on the NAS.
    time hashdeep -a -r -l -v -k ~/hashdeep/foto-${YEAR}.txt ${YEAR}/*
    #umount /dev/sdj1
    ;;
*)
    echo "$0 Help"
    echo "foto           Checks the hashes for the photos, one year per run"
    echo "foto-gen YEAR  Generates the hash file for YEAR"
    echo "usb-foto       Checks the USB drive against the hash files, one year per run"
    ;;
esac
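The rotation is driven by the incyear helper: each run audits exactly one year, and the state file remembers where the rotation stands. A hypothetical walk-through, assuming the current year is 2024 and a throwaway state file:
echo 2023 > /tmp/check-foto    # pretend the last run checked 2023
incyear /tmp/check-foto 2003   # prints 2024; the state file now holds 2024
incyear /tmp/check-foto 2003   # 2025 would exceed 2024, so it wraps: prints 2003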
Let’s start with the simplest option: generating the hashes for all photos of one year. The script is then called as follows: wrapper.sh foto-gen 2016.
The corresponding photos are located in $BASEDIR/2016; the hashdeep call creates hashes for all of the files and saves them in ~/hashdeep/foto-2016.txt. The prefix foto is used because the full script is a bit longer and also handles the blog photos and the videos. The important point is that you change into the base directory first, so that hashdeep (with -l) records only relative paths.
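This is also what makes the later audit on the USB disk work: because the hash file contains only relative paths, the very same file can be checked against a copy of the tree under a completely different mount point. A condensed sketch of both halves, using the paths from the script above:
# Generate the hashes on the NAS pool (relative paths thanks to -l) ...
cd /Pool/Fotos/LWH && hashdeep -r -l 2016/* > ~/hashdeep/foto-2016.txt
# ... and later audit the copy on the USB disk with the same hash file.
cd /root/usb/Fotos/LWH && hashdeep -a -r -l -v -k ~/hashdeep/foto-2016.txt 2016/*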
We will look at checking and using usbbackup from Openmediavault in Part II.