Debian package
Chapril's backup is deployed through a Debian package hosted on a private repository. Since the repository is not published, this page describes the essential parts of the package, as well as the configuration of the archive integrity check.
Backup aspects
Backup script
It is provided by borgmatic.
A configuration is added alongside it under /etc:
- /etc/borgmatic.d/root.yaml
location:
    source_directories:
        - /
    exclude_patterns:
        - '/dev'
        - '/media/*'
        - '/mnt/*'
        - '/proc'
        - '/run/*'
        - '/srv/backups/*.chapril.org'
        - '/sys'
        - '/var/cache/*'
        - '/var/lib/backuppc/*'
        - '/var/lib/libvirt/images/'
    repositories:
        - 'ssh://backup@backup.chapril.org:/srv/backups/{fqdn}'

storage:
    ssh_command: ssh -p 2242 -A
    archive_name_format: '{now:%Y-%m-%dT%H:%M:%S}'
    # for bullseye:
    borg_cache_directory: /var/cache/borg

consistency:
    check_last: 2
    prefix: '20'

retention:
    keep_daily: 7
    keep_weekly: 4
    prefix: '20'

hooks:
    before_backup:
        - echo "Launching root backup at $(date -Iseconds)"
        - for file in /etc/borg/scripts/pre-hooks/* ; do test -e "$file" || continue; echo "Executing $file..."; $file; done
    after_backup:
        - for file in /etc/borg/scripts/post-hooks/* ; do test -e "$file" || continue; echo "Executing $file..."; $file; done
        - echo "Succeeded root backup at $(date -Iseconds)"
        - borgmatic info --archive latest --json
    on_error:
        - echo "Failed root backup at $(date -Iseconds)"
    # for bullseye:
    # after_check:
    #     - echo "Succeeded root checks at $(date -Iseconds)"
    # after_prune:
    #     - echo "Succeeded root prune at $(date -Iseconds)"
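To illustrate how archive_name_format relates to the prefix: '20' values used by the consistency and retention sections, here is a minimal Python sketch. It only mimics the timestamp formatting; borg and borgmatic do the real work, and the helper name archive_name is hypothetical.

```python
from datetime import datetime

# Same strftime pattern as archive_name_format in root.yaml.
ARCHIVE_NAME_FORMAT = "%Y-%m-%dT%H:%M:%S"
# Same value as the `prefix` keys of the consistency and retention sections.
PREFIX = "20"


def archive_name(moment: datetime) -> str:
    """Render an archive name like the {now:...} placeholder (illustrative helper)."""
    return moment.strftime(ARCHIVE_NAME_FORMAT)


# Arbitrary example date, not taken from a real backup.
name = archive_name(datetime(2024, 1, 15, 1, 42, 7))
print(name)                     # 2024-01-15T01:42:07
print(name.startswith(PREFIX))  # True
```

In other words, the '20' prefix selects every archive whose name starts with a year between 2000 and 2099, which covers all archives named with this format.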
Systemd entry
The backup is triggered by a systemd timer that delays the start by a random amount of time, so that all machines do not hit Félicette (the backup server) at once and effectively DDoS it.
- /etc/systemd/system/borgmatic.timer
[Unit]
Description=Run borgmatic backup

[Timer]
# Will trigger at 01:00 each day
# + 0-60 random minutes
# + 30 minutes delay from borgmatic.service
OnCalendar=*-*-* 01:00:00
Persistent=true
RandomizedDelaySec=60 minutes

[Install]
WantedBy=timers.target
- /etc/systemd/system/borgmatic.service
[Unit]
Description=borgmatic backup
Wants=network-online.target
After=network-online.target
ConditionACPower=true

[Service]
Type=oneshot

## Lower CPU and I/O priority.
Nice=19
CPUSchedulingPolicy=batch
IOSchedulingClass=best-effort
IOSchedulingPriority=7
IOWeight=100

## Logs
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=borgmatic
# Prevent rate limiting of borgmatic log events.
LogRateLimitIntervalSec=0

## Launcher
# Delay start to prevent backups immediately upon system startup
ExecStartPre=sleep 30m
ExecStart=borgmatic -v1
Restart=no
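The timer and service settings combine into a start window for the actual borgmatic run. The following Python sketch works through that arithmetic; it assumes the delays simply add up (OnCalendar, then RandomizedDelaySec, then the ExecStartPre sleep), and the function name backup_start_window is hypothetical.

```python
from datetime import datetime, timedelta

# Values copied from borgmatic.timer and borgmatic.service; the date itself is arbitrary.
ON_CALENDAR = datetime(2024, 1, 15, 1, 0, 0)  # OnCalendar=*-*-* 01:00:00
RANDOMIZED_DELAY = timedelta(minutes=60)      # RandomizedDelaySec=60 minutes
PRE_START_SLEEP = timedelta(minutes=30)       # ExecStartPre=sleep 30m


def backup_start_window(trigger: datetime) -> tuple:
    """Return the earliest and latest moments at which borgmatic itself can start."""
    earliest = trigger + PRE_START_SLEEP
    latest = trigger + RANDOMIZED_DELAY + PRE_START_SLEEP
    return earliest, latest


earliest, latest = backup_start_window(ON_CALENDAR)
print(earliest.time(), latest.time())  # 01:30:00 02:30:00
```

Under these assumptions each machine starts its backup somewhere between 01:30 and 02:30, which spreads the load on the backup server over an hour.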
Pre-hook scripts
- scripts/pre-hooks/dump-mysql
#!/bin/bash

if ! test -x /usr/bin/mysql ; then
    exit 0
fi

backup_dir=/var/backups/mysql
databases=$(mysql --defaults-file=/etc/mysql/debian.cnf -B -N --execute="SHOW DATABASES" | grep -v 'lost+found\|performance_schema\|information_schema')

for db in $databases ; do
    mkdir -p $backup_dir
    chmod 700 $backup_dir
    mysqldump --defaults-file=/etc/mysql/debian.cnf --events $db | bzip2 - > $backup_dir/$db.sql.bz2
done
- scripts/pre-hooks/dump-pgsql
#!/bin/bash

if ! test -x /usr/bin/psql ; then
    exit 0
fi

backup_dir=/var/backups/pgsql
databases=$(su - postgres -c 'psql -c "\l"' | tail -n+4|cut -d'|' -f 1|sed -e '/^ *$/d'|sed -e '$d'| grep -v '^[[:space:]]*template0[[:space:]]*$')

for db in $databases ; do
    mkdir -p $backup_dir
    chmod 700 $backup_dir
    su - postgres -c "pg_dump $db" | bzip2 - > $backup_dir/$db.sql.bz2
done
- scripts/pre-hooks/dump-influxdb
#!/bin/bash

if test -x /usr/bin/influxd ; then
    backup_dir=/var/backups/influxdb
    db=icinga2

    # Prepare.
    mkdir -p $backup_dir
    chmod 700 $backup_dir

    # Backup.
    influxd backup -portable -database $db -host localhost:8088 $backup_dir/$db

    # Prune.
    find $backup_dir/$db -type f -mtime +2 -delete
fi
- scripts/pre-hooks/dump-selections
#!/bin/bash

backup_dir=/var/backups/selections
dpkg --get-selections > $backup_dir
Post-install script
- debian/postinst
#!/bin/sh
# postinst script for backup-chapril
#
# see: dh_installdeb(1)

# summary of how this script can be called:
#        * <postinst> `configure' <most-recently-configured-version>
#        * <old-postinst> `abort-upgrade' <new version>
#        * <conflictor's-postinst> `abort-remove' `in-favour' <package>
#          <new-version>
#        * <postinst> `abort-remove'
#        * <deconfigured's-postinst> `abort-deconfigure' `in-favour'
#          <failed-install-package> <version> `removing'
#          <conflicting-package> <version>
# for details, see https://www.debian.org/doc/debian-policy/ or
# the debian-policy package

case "$1" in
    configure)
        backup_host="backup@backup.chapril.org"
        err=1
        # test whether SSH connectivity is available
        ssh -p 2242 -A $backup_host -o BatchMode=yes true
        if [ 0 -eq $? ]
        then
            # if so, test whether the repository needs to be initialized
            borg_bin="/usr/bin/borg"
            export BORG_RSH="ssh -p 2242 -A"
            backup_dest="$backup_host:/srv/backups/`hostname --fqdn`"
            $borg_bin list $backup_dest
            if [ $? -ne 0 ]
            then
                # if needed, initialize the repository
                $borg_bin init --encryption none $backup_dest
                if [ 0 -eq $? ]
                then
                    echo " ############################################################ "
                    echo " #                  Repository initialized                  # "
                    echo " ############################################################ "
                    err=0
                fi
            else
                echo "Repository already initialized"
                err=0
            fi
        fi
        if [ 0 -ne $err ]
        then
            # otherwise, explain how to initialize the repository
            borg_bin="/usr/bin/borg"
            backup_dest="$backup_host:/srv/backups/`hostname --fqdn`"
            echo " ############################################################ "
            echo " # Unable to check and/or initialize the repository.        # "
            echo " #                                                          # "
            echo " # Check SSH connectivity:                                  # "
            echo " #     ssh -p 2242 -A backup@backup.chapril.org             # "
            echo " #                                                          # "
            echo " # Then initialize the repository by hand:                  # "
            echo BORG_RSH=\"ssh -p 2242 -A\" $borg_bin init --encryption none $backup_dest
            echo " #                                                          # "
            echo " ############################################################ "
        fi
    ;;

    abort-upgrade|abort-remove|abort-deconfigure)
    ;;

    *)
        echo "postinst called with unknown argument \`$1'" >&2
        exit 1
    ;;
esac

# dh_installdeb will replace this with shell code automatically
# generated by other debhelper scripts.

#DEBHELPER#

exit 0
Rsyslog
- /etc/rsyslog.d/borgmatic.conf
if $programname == 'borgmatic' then /var/log/borgmatic.log
& stop
Log rotate
- debian/borgmatic
/var/log/borgmatic.log {
    rotate 6
    weekly
    compress
    missingok
    notifempty
}
Host configuration
This is mostly SSH configuration: each client key is restricted to its own repository.
- __felicette__/etc/ssh/authorized_keys/backup
command="borg serve --restrict-to-path /srv/backups/dns.cluster.chapril.org",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc ssh-ed25519 ... root@dns.cluster.chapril.org
command="borg serve --restrict-to-path /srv/backups/admin.cluster.chapril.org",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc ssh-ed25519 ... root@admin.cluster.chapril.org
command="borg serve --restrict-to-path /srv/backups/mail.cluster.chapril.org",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc ssh-ed25519 ... root@mail.cluster.chapril.org
...
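Every entry in this file follows the same pattern, so a new client line can be derived from its FQDN and public key. The Python sketch below shows one way to build such a line; the function name authorized_key_line is hypothetical and the key string is a placeholder, not a real key.

```python
# Restrictions applied to every client key in the authorized_keys file above.
RESTRICTIONS = "no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc"


def authorized_key_line(fqdn: str, public_key: str) -> str:
    """Build one authorized_keys entry confining a client to its own repository."""
    command = f'command="borg serve --restrict-to-path /srv/backups/{fqdn}"'
    return f"{command},{RESTRICTIONS} {public_key} root@{fqdn}"


# Placeholder key material, for illustration only.
print(authorized_key_line("dns.cluster.chapril.org", "ssh-ed25519 PLACEHOLDER"))
```

The forced command means that a client connecting with its key can only run borg serve, and only against its own directory under /srv/backups.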
Monitoring configuration
A script, deployed by the monitoring-plugins-chapril package, parses the backup log on each machine:
- /usr/lib/nagios/plugins/check_borgmatic
#!/usr/bin/env python3

import datetime, itertools, os, re

now = datetime.datetime.now(datetime.timezone.utc)
max_backup_delay = datetime.timedelta(1, 7200)

def get_name(match):
    return match.group('name')

def check_backup(filename):
    with open(filename) as f:
        logs = f.read()
    mixed_statuses = list(re.finditer(r'(?P<status>Succeeded|Failed) (?P<name>\w+) backup at (?P<date>\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d\+\d\d:\d\d)$', logs, re.MULTILINE))
    for name, statuses in itertools.groupby(sorted(mixed_statuses, key=get_name), key=get_name):
        last = sorted(statuses, key=lambda x: x.group('date'))[-1]
        print('{name}: {status} at {date}'.format(**last.groupdict()))
        last_date = datetime.datetime.fromisoformat(last.group('date'))
        last_status = last.group('status')
        if last_status != 'Succeeded' or now - last_date > max_backup_delay:
            failure.append(name)

failure = []
try:
    check_backup ("/var/log/borgmatic.log")
except Exception:
    check_backup ("/var/log/borgmatic.log.1")

if failure:
    exit (1)
else:
    exit (0)
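The plugin relies on the "Succeeded ... backup at ..." and "Failed ... backup at ..." lines echoed by the borgmatic hooks. The sketch below applies the same regular expression to a small, made-up log excerpt to show which status is kept for each backup name. The sample lines omit the syslog prefix, and the helper last_status_by_name is hypothetical.

```python
import re

# Same pattern as in check_borgmatic.
STATUS_RE = re.compile(
    r'(?P<status>Succeeded|Failed) (?P<name>\w+) backup at '
    r'(?P<date>\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d\+\d\d:\d\d)$',
    re.MULTILINE,
)

# Made-up log excerpt, for illustration only.
SAMPLE_LOG = """\
Launching root backup at 2024-01-14T01:42:07+01:00
Succeeded root backup at 2024-01-14T01:58:31+01:00
Launching root backup at 2024-01-15T02:03:11+01:00
Failed root backup at 2024-01-15T02:03:45+01:00
"""


def last_status_by_name(logs: str) -> dict:
    """Keep, for each backup name, the most recent (status, date) pair."""
    last = {}
    for match in STATUS_RE.finditer(logs):
        name = match.group('name')
        # Like the plugin, compare the ISO timestamps as strings.
        if name not in last or match.group('date') > last[name].group('date'):
            last[name] = match
    return {name: (m.group('status'), m.group('date')) for name, m in last.items()}


print(last_status_by_name(SAMPLE_LOG))
# {'root': ('Failed', '2024-01-15T02:03:45+01:00')}
```

The "Launching" lines are ignored because the pattern only accepts the two final statuses, so a backup that started but never finished shows up as a stale last date rather than as a failure line.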
And the icinga2 configuration:
- __admin__/etc/icinga2/zones.d/global-templates/services/backups.conf
object CheckCommand "backup" {
  command = [ "sudo", PluginDir + "/check_borgmatic" ]
}

apply Service "Backup " {
  import "generic-service"
  check_command = "backup"
  command_endpoint = host.vars.client_endpoint
  assign where host.address && !host.vars.external
}
Integrity check aspects
The check runs every night, directly on the machine where the backups are stored (Félicette).
Check script
- __felicette__/srv/bin/check_backup.sh
#! /bin/bash

logger="/var/log/check_backup.log"
borg_bin="/usr/bin/borg"
backup_dest="/srv/backups/"

echo ======================================================================== >> $logger
echo " New backup check" >> $logger
echo ======================================================================== >> $logger
date >> $logger
echo "" >> $logger

cd $backup_dest
for repository in $(ls -d $backup_dest/*$(hostname -d))
do
    echo "== Checking $repository" >> $logger
    date >> $logger
    echo "" >> $logger
    $borg_bin check $repository 2>&1 >> $logger
    rc=$?
    if [[ $rc != 0 ]]; then exit $rc; fi
done

echo "" >> $logger
date >> $logger
echo Returned $rc >> $logger
echo ======================================================================== >> $logger
exit $rc
Cron entry
- __felicette__/etc/cron.d/check_backup
00 4 * * * root bash /srv/bin/check_backup.sh
Log rotate
- __felicette__/etc/logrotate.d/check-backup
/var/log/check_backup.log {
    weekly
    rotate 52
    compress
    delaycompress
    missingok
    notifempty
    create 644 backup backup
}
Monitoring configuration
A script parses the check_backup log on the machine:
- __felicette__/usr/local/lib/nagios/plugins/check_check_backup
#!/usr/bin/env python
# -*- encoding:utf8 -*-

import datetime, os, re, locale

today= datetime.datetime.now ()
max_backup_delay = datetime.timedelta (1, 7200)

def last_backup (log_file):
    with open(log_file) as s:
        logs_ok = re.findall (r'^([ a-zéûA-Z:,0-9]*)( \(UTC\+0[12]00\))?\nReturned 0\n={30}', s.read (), re.MULTILINE)[-1][0]
    print "Last backup check : " + logs_ok
    try:
        return datetime.datetime.strptime (logs_ok, '%a %b %d %X %Z %Y')
    except:
        locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
        return datetime.datetime.strptime (logs_ok, '%A %d %B %Y, %X')

try:
    last_backup_date= last_backup ("/var/log/check_backup.log")
except:
    last_backup_date= last_backup ("/var/log/check_backup.log.1")

if today - last_backup_date < max_backup_delay:
    exit (0)
else:
    exit (1)
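The plugin above is written for Python 2. As an illustration of what its regular expression captures, here is a minimal Python 3 sketch run against a made-up log tail. It assumes the date line is in the C locale with a UTC timezone name, and it uses an explicit %H:%M:%S where the plugin uses %X; the helper name last_successful_check is hypothetical.

```python
import re
from datetime import datetime

# Same pattern as in check_check_backup: the date line right before "Returned 0".
OK_RE = re.compile(
    r'^([ a-zéûA-Z:,0-9]*)( \(UTC\+0[12]00\))?\nReturned 0\n={30}',
    re.MULTILINE,
)

# Made-up tail of /var/log/check_backup.log, for illustration only.
SAMPLE_LOG = (
    "== Checking /srv/backups/dns.cluster.chapril.org\n"
    "Mon Jan 15 04:00:01 UTC 2024\n"
    "\n"
    "\n"
    "Mon Jan 15 04:12:33 UTC 2024\n"
    "Returned 0\n"
    + "=" * 72 + "\n"
)


def last_successful_check(log_text: str) -> datetime:
    """Return the date of the last run that ended with 'Returned 0'."""
    date_line = OK_RE.findall(log_text)[-1][0]
    # Assumes `date` output in the C locale, e.g. "Mon Jan 15 04:12:33 UTC 2024".
    return datetime.strptime(date_line, '%a %b %d %H:%M:%S %Z %Y')


print(last_successful_check(SAMPLE_LOG))  # 2024-01-15 04:12:33
```

Only the final date of a run is matched, because it is the only date line immediately followed by "Returned 0" and the closing separator.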
And the icinga2 configuration:
- __admin__/etc/icinga2/zones.d/global-templates/services/backups.conf
object CheckCommand "check_backup" {
  command = [ "/usr/local/lib/nagios/plugins/check_check_backup" ]
}
- __admin__/etc/icinga2/zones.d/master/galanga/icinga2.conf
/* Backup checks */
apply Service "Check Backup " {
  import "generic-service"
  check_command = "check_backup"
  command_endpoint = host.vars.client_endpoint
  assign where host.name == "felicette.cluster.chapril.org"
}