Debian package

The Chapril backup is deployed through a Debian package hosted on a private repository. Since the repository itself is not published, this page describes the essential parts of the package, as well as the configuration of the archive integrity checks.

Backup aspects

Backup script

The backup script itself is provided by borgmatic.

We add a configuration file under /etc:

/etc/borgmatic.d/root.yaml
location:
  source_directories:
    - /
  exclude_patterns:
    - '/dev'
    - '/media/*'
    - '/mnt/*'
    - '/proc'
    - '/run/*'
    - '/srv/backups/*.chapril.org'
    - '/sys'
    - '/var/cache/*'
    - '/var/lib/backuppc/*'
    - '/var/lib/libvirt/images/'
  repositories:
    - 'ssh://backup@backup.chapril.org:/srv/backups/{fqdn}'

storage:
  ssh_command: ssh -p 2242 -A
  archive_name_format: '{now:%Y-%m-%dT%H:%M:%S}'
  # for bullseye: borg_cache_directory: /var/cache/borg

consistency:
  check_last: 2
  prefix: '20'

retention:
  keep_daily: 7
  keep_weekly: 4
  prefix: '20'

hooks:
  before_backup:
    - echo "Launching root backup at $(date -Iseconds)"
    - for file in /etc/borg/scripts/pre-hooks/*  ; do test -e "$file" || continue; echo "Executing $file..."; $file; done
  after_backup:
    - for file in /etc/borg/scripts/post-hooks/* ; do test -e "$file" || continue; echo "Executing $file..."; $file; done
    - echo "Succeeded root backup at $(date -Iseconds)"
    - borgmatic info --archive latest --json
  on_error:
    - echo "Failed root backup at $(date -Iseconds)"
# for bullseye:
#  after_check:
#    - echo "Succeeded root checks at $(date -Iseconds)"
#  after_prune:
#    - echo "Succeeded root prune at $(date -Iseconds)"
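
As a rough illustration of why `prefix: '20'` matches the archives named by `archive_name_format`, the sketch below renders the same strftime pattern in Python. The helper `archive_name` is made up for this example and is not part of borgmatic; it only mimics what the `{now:...}` placeholder is expected to produce.

```python
from datetime import datetime

# Same strftime pattern as archive_name_format in root.yaml.
ARCHIVE_NAME_FORMAT = "%Y-%m-%dT%H:%M:%S"
# Same value as the consistency/retention prefix in root.yaml.
PREFIX = "20"


def archive_name(moment):
    """Render an archive name like the {now:...} placeholder (hypothetical helper)."""
    return moment.strftime(ARCHIVE_NAME_FORMAT)


name = archive_name(datetime(2024, 3, 9, 1, 42, 7))
print(name)                     # 2024-03-09T01:42:07
print(name.startswith(PREFIX))  # True for any year from 2000 to 2099
```

Any archive created by hand under a name that does not start with `20` would therefore be ignored by the consistency checks and by pruning.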

Systemd units

The backup is triggered by a systemd timer that delays the start by a random amount of time, so that the machines do not all hit Félicette (the backup server) at the same moment.

/etc/systemd/system/borgmatic.timer
[Unit]
Description=Run borgmatic backup
 
[Timer]
# Will trigger at 01:00 each day
# + 0-60 random minutes
# + 30 minutes delay from borgmatic.service
OnCalendar=*-*-* 01:00:00
Persistent=true
RandomizedDelaySec=60 minutes
 
[Install]
WantedBy=timers.target
/etc/systemd/system/borgmatic.service
[Unit]
Description=borgmatic backup
Wants=network-online.target
After=network-online.target
ConditionACPower=true
 
[Service]
Type=oneshot
 
## Lower CPU and I/O priority.
Nice=19
CPUSchedulingPolicy=batch
IOSchedulingClass=best-effort
IOSchedulingPriority=7
IOWeight=100
 
## Logs
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=borgmatic
# Prevent rate limiting of borgmatic log events.
LogRateLimitIntervalSec=0
 
## Launcher
# Delay start to prevent backups immediately upon system startup
ExecStartPre=sleep 30m
ExecStart=borgmatic -v1
Restart=no
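
To make the timing concrete, here is a small sketch that adds up the three delays configured above (`OnCalendar`, `RandomizedDelaySec` and the `sleep 30m` in `ExecStartPre`). It is only arithmetic on the values from the unit files; the function name `backup_window` is an assumption of this example, and catch-up runs caused by `Persistent=true` after a missed schedule are ignored.

```python
from datetime import datetime, timedelta

ON_CALENDAR = timedelta(hours=1)          # OnCalendar=*-*-* 01:00:00
RANDOMIZED_DELAY = timedelta(minutes=60)  # RandomizedDelaySec=60 minutes
START_PRE_SLEEP = timedelta(minutes=30)   # ExecStartPre=sleep 30m


def backup_window(day):
    """Return the (earliest, latest) moment borgmatic itself can start on that day."""
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    earliest = midnight + ON_CALENDAR + START_PRE_SLEEP
    latest = earliest + RANDOMIZED_DELAY
    return earliest, latest


earliest, latest = backup_window(datetime(2024, 1, 15, 12, 0))
print(earliest.time(), latest.time())  # 01:30:00 02:30:00
```

Under these assumptions, each machine starts its backup somewhere between 01:30 and 02:30.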

Pre-hook scripts

scripts/pre-hooks/dump-mysql
#!/bin/bash
 
if ! test -x /usr/bin/mysql ; then
    exit 0
fi
 
backup_dir=/var/backups/mysql
databases=$(mysql --defaults-file=/etc/mysql/debian.cnf -B -N --execute="SHOW DATABASES" | grep -v 'lost+found\|performance_schema\|information_schema')
 
for db in $databases ; do
    mkdir -p $backup_dir
    chmod 700 $backup_dir
    mysqldump --defaults-file=/etc/mysql/debian.cnf --events $db | bzip2 - > $backup_dir/$db.sql.bz2
done
scripts/pre-hooks/dump-pgsql
#!/bin/bash
 
if ! test -x /usr/bin/psql ; then
    exit 0
fi
 
backup_dir=/var/backups/pgsql
databases=$(su - postgres -c 'psql -c "\l"' | tail -n+4|cut -d'|' -f 1|sed -e '/^ *$/d'|sed -e '$d'| grep -v '^[[:space:]]*template0[[:space:]]*$')
 
for db in $databases ; do
    mkdir -p $backup_dir
    chmod 700 $backup_dir
    su - postgres -c "pg_dump $db" | bzip2 - > $backup_dir/$db.sql.bz2
done
scripts/pre-hooks/dump-influxdb
#!/bin/bash
 
if test -x /usr/bin/influxd ; then
    backup_dir=/var/backups/influxdb
    db=icinga2
 
    # Prepare.
    mkdir -p $backup_dir
    chmod 700 $backup_dir
 
    # Backup.
    influxd backup -portable -database $db -host localhost:8088 $backup_dir/$db
 
    # Prune.
    find $backup_dir/$db -type f -mtime +2 -delete
fi
scripts/pre-hooks/dump-selections
#!/bin/bash
 
backup_dir=/var/backups/selections
 
dpkg --get-selections > $backup_dir

Post-install script

debian/postinst
#!/bin/sh
# postinst script for backup-chapril
#
# see: dh_installdeb(1)
 
# summary of how this script can be called:
#        * <postinst> `configure' <most-recently-configured-version>
#        * <old-postinst> `abort-upgrade' <new version>
#        * <conflictor's-postinst> `abort-remove' `in-favour' <package>
#          <new-version>
#        * <postinst> `abort-remove'
#        * <deconfigured's-postinst> `abort-deconfigure' `in-favour'
#          <failed-install-package> <version> `removing'
#          <conflicting-package> <version>
# for details, see https://www.debian.org/doc/debian-policy/ or
# the debian-policy package
 
 
case "$1" in
    configure)
 
    backup_host="backup@backup.chapril.org"
 
    err=1
    # check for SSH connectivity and whether the repository needs to be initialized
    ssh -p 2242 -A $backup_host -o BatchMode=yes true
 
    if [ 0 -eq $? ]
    then
      # if so, check whether the repository needs to be initialized
      borg_bin="/usr/bin/borg"
      export BORG_RSH="ssh -p 2242 -A"
      backup_dest="$backup_host:/srv/backups/`hostname --fqdn`"
      $borg_bin list $backup_dest
      if [ $? -ne 0 ]
      then
        # if needed, initialize the repository
        $borg_bin init --encryption none $backup_dest
        if [ 0 -eq $? ]
        then
          echo " ############################################################ "
          echo " #                  Repository initialized                  # "
          echo " ############################################################ "
          err=0
        fi
      else
        echo "Repository already initialized"
        err=0
      fi
    fi
    if [ 0 -ne $err ]
    then
      # otherwise, explain how to initialize the repository
      borg_bin="/usr/bin/borg"
      backup_dest="$backup_host:/srv/backups/`hostname --fqdn`"
      echo " ############################################################ "
      echo " #  Could not check and/or initialize the repository.       # "
      echo " #                                                          # "
      echo " #            Check SSH connectivity:                       # "
      echo " #         ssh -p 2242 -A backup@backup.chapril.org         # "
      echo " #                                                          # "
      echo " # Then initialize the repository by hand:                  # "
      echo BORG_RSH=\"ssh -p 2242 -A\" $borg_bin init --encryption none $backup_dest
      echo " #                                                          # "
      echo " ############################################################ "
    fi
 
    ;;
 
    abort-upgrade|abort-remove|abort-deconfigure)
    ;;
 
    *)
        echo "postinst called with unknown argument \`$1'" >&2
        exit 1
    ;;
esac
 
# dh_installdeb will replace this with shell code automatically
# generated by other debhelper scripts.
 
#DEBHELPER#
 
exit 0
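
The decision logic of the `configure` branch can be summarized by the sketch below. It is a simplified model under the assumption that only three facts matter (SSH reachable, `borg list` succeeds, `borg init` succeeds); the function name and the outcome labels are invented for this example and do not appear in the package.

```python
def postinst_outcome(ssh_ok, repo_exists, init_ok):
    """Model what the configure step ends up doing (hypothetical labels).

    ssh_ok      -- the BatchMode ssh test returned 0
    repo_exists -- `borg list` on the destination returned 0
    init_ok     -- `borg init --encryption none` returned 0 (only tried if needed)
    """
    if not ssh_ok:
        # err stays at 1: the script prints the manual instructions.
        return "manual-instructions"
    if repo_exists:
        return "already-initialized"
    return "initialized" if init_ok else "manual-instructions"


print(postinst_outcome(ssh_ok=True, repo_exists=False, init_ok=True))   # initialized
print(postinst_outcome(ssh_ok=False, repo_exists=False, init_ok=False)) # manual-instructions
```

In every case the script still ends with `exit 0`, so a failed initialization does not make the package installation fail.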

Rsyslog

/etc/rsyslog.d/borgmatic.conf
if $programname == 'borgmatic' then /var/log/borgmatic.log
& stop

Log rotate

debian/borgmatic
/var/log/borgmatic.log
{
  rotate 6
  weekly
  compress
  missingok
  notifempty
}

Host configuration

This is mostly SSH configuration.

__felicette__/etc/ssh/authorized_keys/backup
command="borg serve --restrict-to-path /srv/backups/dns.cluster.chapril.org",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc ssh-ed25519 ... root@dns.cluster.chapril.org
command="borg serve --restrict-to-path /srv/backups/admin.cluster.chapril.org",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc ssh-ed25519 ... root@admin.cluster.chapril.org
command="borg serve --restrict-to-path /srv/backups/mail.cluster.chapril.org",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc ssh-ed25519 ... root@mail.cluster.chapril.org
...
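
Since every entry follows the same pattern, a line for a new machine could be generated along these lines. This is only a sketch: the helper name is hypothetical, and the public key is passed in as an opaque string (shown here with the same `...` placeholder as above).

```python
# Restrictions applied to every backup client key, as in the entries above.
RESTRICTIONS = "no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc"


def authorized_keys_line(fqdn, public_key):
    """Build one authorized_keys entry restricting a client to its own repository."""
    command = 'command="borg serve --restrict-to-path /srv/backups/{}"'.format(fqdn)
    return "{},{} {} root@{}".format(command, RESTRICTIONS, public_key, fqdn)


print(authorized_keys_line("dns.cluster.chapril.org", "ssh-ed25519 ..."))
```

The `--restrict-to-path` option is what keeps one machine's key from reading or writing another machine's repository.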

Monitoring configuration

On each machine, a script parses the backup log; it is deployed by the monitoring-plugins-chapril package:

/usr/lib/nagios/plugins/check_borgmatic
#!/usr/bin/env python3
 
import datetime, itertools, os, re
 
now = datetime.datetime.now(datetime.timezone.utc)
max_backup_delay = datetime.timedelta(1, 7200)
 
def get_name(match):
    return match.group('name')
 
def check_backup(filename):
    with open(filename) as f:
        logs = f.read()
        mixed_statuses = list(re.finditer(r'(?P<status>Succeeded|Failed) (?P<name>\w+) backup at (?P<date>\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d\+\d\d:\d\d)$', logs, re.MULTILINE))
        for name, statuses in itertools.groupby(sorted(mixed_statuses, key=get_name), key=get_name):
            last = sorted(statuses, key=lambda x: x.group('date'))[-1]
            print('{name}: {status} at {date}'.format(**last.groupdict()))
            last_date = datetime.datetime.fromisoformat(last.group('date'))
            last_status = last.group('status')
            if last_status != 'Succeeded' or now - last_date > max_backup_delay:
                failure.append(name)
 
failure = []
try:
    check_backup ("/var/log/borgmatic.log")
except Exception:
    check_backup ("/var/log/borgmatic.log.1")
 
if failure:
    exit (1)
else:
    exit (0)
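
To see what the plugin's regular expression picks up, the sketch below applies the same pattern to two made-up log lines. The syslog prefix (date, host name) is an assumption about the rsyslog file format; only the part written by the `echo` hooks matters to the pattern.

```python
import re

# Same pattern as in check_borgmatic.
STATUS_RE = re.compile(
    r'(?P<status>Succeeded|Failed) (?P<name>\w+) backup at '
    r'(?P<date>\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d\+\d\d:\d\d)$',
    re.MULTILINE,
)

# Invented sample; real lines come from the before/after/on_error hooks.
SAMPLE_LOG = (
    "Jan 14 01:47:10 host borgmatic: Succeeded root backup at 2024-01-14T01:47:10+01:00\n"
    "Jan 15 01:52:31 host borgmatic: Failed root backup at 2024-01-15T01:52:31+01:00\n"
)


def last_status_per_name(logs):
    """Return the most recent status for each backup name found in the log."""
    latest = {}
    for match in STATUS_RE.finditer(logs):
        previous = latest.get(match.group('name'))
        # ISO dates with the same offset sort correctly as plain strings.
        if previous is None or match.group('date') >= previous.group('date'):
            latest[match.group('name')] = match
    return {name: m.group('status') for name, m in latest.items()}


print(last_status_per_name(SAMPLE_LOG))  # {'root': 'Failed'}
```

Note that the pattern only accepts a positive UTC offset (`+01:00`, `+02:00`), which fits machines running on French local time.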

And the icinga2 configuration:

__admin__/etc/icinga2/zones.d/global-templates/services/backups.conf
object CheckCommand "backup" {
	command = [ "sudo", PluginDir + "/check_borgmatic" ]
}
 
apply Service "Backup " {
  import "generic-service"
 
  check_command = "backup"
  command_endpoint = host.vars.client_endpoint
 
  assign where host.address && !host.vars.external
}

Integrity check aspects

Integrity is checked every night, directly on the machine where the backups are stored (Félicette).

Check script

__felicette__/srv/bin/check_backup.sh
#! /bin/bash
 
logger="/var/log/check_backup.log"
borg_bin="/usr/bin/borg"
backup_dest="/srv/backups/"
 
 
echo ======================================================================== 	>> $logger
echo "                           New backup check" 				                >> $logger
echo ======================================================================== 	>> $logger
date 						>> $logger
echo "" 					>> $logger
 
cd $backup_dest
 
for repository in $(ls -d $backup_dest/*$(hostname -d))
do
 
    echo "==  Checking $repository"		>> $logger
    date 						        >> $logger
    echo "" 					        >> $logger
 
    # Redirect stdout first, then stderr, so that both end up in the log file.
    $borg_bin check $repository >> $logger 2>&1
    rc=$?
    if [[ $rc != 0 ]]; then exit $rc; fi
done
 
echo "" 					>> $logger
date 						>> $logger
echo Returned $rc 				>> $logger
echo ========================================================================   >> $logger
 
exit $rc
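
The control flow of this script (select the repositories whose name ends with the local domain, check them one by one, stop at the first failure) can be sketched as follows. The function names are hypothetical, and the `check` callable stands in for `borg check`; nothing here calls borg.

```python
def select_repositories(entries, domain):
    """Keep the entries ending with the domain, like the *$(hostname -d) glob."""
    return sorted(entry for entry in entries if entry.endswith(domain))


def check_all(repositories, check):
    """Run check(repo) on each repository; stop at the first non-zero return code."""
    rc = 0
    for repo in repositories:
        rc = check(repo)
        if rc != 0:
            break
    return rc


entries = ["mail.cluster.chapril.org", "lost+found", "dns.cluster.chapril.org"]
repos = select_repositories(entries, "cluster.chapril.org")
print(repos)  # ['dns.cluster.chapril.org', 'mail.cluster.chapril.org']
print(check_all(repos, lambda repo: 0))  # 0
```

Because the script exits as soon as one check fails, the final "Returned 0" line is never written in that case, which is what the monitoring plugin relies on.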

Cron entry

__felicette__/etc/cron.d/check_backup
00 4 * * * root bash /srv/bin/check_backup.sh

Log rotate

__felicette__/etc/logrotate.d/check-backup
/var/log/check_backup.log {
        weekly
        rotate 52
        compress
        delaycompress
        missingok
        notifempty
        create 644 backup backup
}

Monitoring configuration

A script on the machine parses the check_backup log:

__felicette__/usr/local/lib/nagios/plugins/check_check_backup
#!/usr/bin/env python
# -*- encoding:utf8 -*-
 
import datetime, os, re, locale
 
today= datetime.datetime.now ()
max_backup_delay = datetime.timedelta (1, 7200)
 
def last_backup (log_file):
    with open(log_file) as s:
        logs_ok = re.findall (r'^([ a-zéûA-Z:,0-9]*)( \(UTC\+0[12]00\))?\nReturned 0\n={30}', s.read (), re.MULTILINE)[-1][0]
        print("Last backup check : " + logs_ok)
        try:
            return datetime.datetime.strptime (logs_ok, '%a %b %d %X %Z %Y')
        except:
            locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
            return datetime.datetime.strptime (logs_ok, '%A %d %B %Y, %X')
 
try:
    last_backup_date= last_backup ("/var/log/check_backup.log")
except:
    last_backup_date= last_backup ("/var/log/check_backup.log.1")
 
if today - last_backup_date < max_backup_delay:
    exit (0)
else:
    exit (1)
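
The following sketch shows what the plugin's regular expression extracts from the end of a successful run, and how the 26-hour freshness limit applies. The log excerpt is invented to match the layout written by check_backup.sh, the helper names are assumptions of this example, and date parsing is left out because it depends on the locale of the machine.

```python
import re
from datetime import datetime, timedelta

# Same pattern as in check_check_backup.
OK_RE = re.compile(
    r'^([ a-zéûA-Z:,0-9]*)( \(UTC\+0[12]00\))?\nReturned 0\n={30}',
    re.MULTILINE,
)
MAX_BACKUP_DELAY = timedelta(days=1, seconds=7200)  # 26 hours, as in the plugin

# Invented excerpt following the layout produced by check_backup.sh.
SAMPLE_LOG = (
    "==  Checking /srv/backups/dns.cluster.chapril.org\n"
    "Mon Jan 15 04:00:01 CET 2024\n"
    "\n"
    "\n"
    "Mon Jan 15 04:37:12 CET 2024\n"
    "Returned 0\n"
    + "=" * 72 + "\n"
)


def last_successful_check(logs):
    """Return the date line written just before the last 'Returned 0', or None."""
    matches = OK_RE.findall(logs)
    return matches[-1][0] if matches else None


def is_fresh(last_check, now):
    """True if the last successful check is less than 26 hours old."""
    return now - last_check < MAX_BACKUP_DELAY


print(last_successful_check(SAMPLE_LOG))  # Mon Jan 15 04:37:12 CET 2024
print(is_fresh(datetime(2024, 1, 15, 4, 37, 12), datetime(2024, 1, 16, 6, 0)))  # True
```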

And the icinga2 configuration:

__admin__/etc/icinga2/zones.d/global-templates/services/backups.conf
object CheckCommand "check_backup" {
    command = [ "/usr/local/lib/nagios/plugins/check_check_backup" ]
}
__admin__/etc/icinga2/zones.d/master/galanga/icinga2.conf
/* Backup checks */
apply Service "Check Backup " {
  import "generic-service"
 
  check_command = "check_backup"
  command_endpoint = host.vars.client_endpoint
 
  assign where host.name == "felicette.cluster.chapril.org"
}