Best Practices
January 31, 2026 · 10 min read

How to Monitor Backup Jobs and Get Alerts on Failure

Set up reliable monitoring for your database and file backups. Get instant alerts when backup jobs fail, run too long, or don't run at all.

Author: WizStatus Team

Backups are only useful if they actually run. Too many organizations discover their backup failed only when they need to restore. Here's how to set up reliable monitoring for all your backup jobs.

The Backup Monitoring Problem

Backup failures are uniquely dangerous because:

  1. They're silent: no user notices when a backup doesn't run
  2. They're infrequent: daily or weekly jobs are easy to forget
  3. They're critical: if a failure is discovered too late, the consequences are severe
  4. They're assumed: "The backup runs every night... right?"

Without monitoring, failed backups go unnoticed for days, weeks, or until disaster strikes.

What to Monitor

Backup Completion

Did the backup job finish successfully?

#!/bin/bash
# Ping only if pg_dump exits 0; -fsS makes curl fail quietly on HTTP errors
if pg_dump database > backup.sql; then
  curl -fsS https://wizstatus.com/ping/backup-complete
fi

Backup File Validity

Did it actually produce a valid backup?

# Check the file exists and is larger than 1000 bytes
# (wc -c works on both Linux and macOS; stat -f%z is BSD/macOS only)
if [ -s backup.sql ] && [ "$(wc -c < backup.sql)" -gt 1000 ]; then
  curl -fsS https://wizstatus.com/ping/backup-valid
fi

Backup Duration

Is it taking longer than expected?

START=$(date +%s)
if pg_dump database > backup.sql; then
  END=$(date +%s)
  DURATION=$((END - START))
  # Report the run time (in seconds) along with the success ping
  curl -fsS "https://wizstatus.com/ping/backup?duration=$DURATION"
fi

Storage Space

Is there room for backups?

# -P keeps each filesystem on one line, so column 5 is always "Use%"
USED=$(df -P /backup | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$USED" -gt 90 ]; then
  # Alert separately for disk space
  echo "Backup disk nearly full: $USED%"
fi

Database Backup Monitoring

PostgreSQL

#!/bin/bash
# postgres-backup.sh
# Without pipefail, the pipeline's status is gzip's, which hides pg_dump failures
set -o pipefail

BACKUP_DIR="/backup/postgres"
DATE=$(date +%Y%m%d)
BACKUP_FILE="$BACKUP_DIR/prod-$DATE.sql.gz"

# Create backup and check the file is non-empty
if pg_dump production | gzip > "$BACKUP_FILE" && [ -s "$BACKUP_FILE" ]; then
  # Test backup integrity (checks the gzip stream, not the SQL inside it)
  if gunzip -t "$BACKUP_FILE"; then
    curl -fsS https://wizstatus.com/ping/postgres-backup
  fi
fi

MySQL/MariaDB

#!/bin/bash
# mysql-backup.sh
set -o pipefail  # fail if either mysqldump or gzip fails

# Compute the filename once so a run that crosses midnight checks the right file
BACKUP_FILE="/backup/mysql-$(date +%Y%m%d).sql.gz"

if mysqldump --all-databases | gzip > "$BACKUP_FILE" && [ -s "$BACKUP_FILE" ]; then
  curl -fsS https://wizstatus.com/ping/mysql-backup
fi

MongoDB

#!/bin/bash
# mongo-backup.sh

mongodump --out=/backup/mongo-$(date +%Y%m%d)

if [ $? -eq 0 ]; then
  curl -fsS https://wizstatus.com/ping/mongo-backup
fi

File Backup Monitoring

Rsync Backups

#!/bin/bash
# rsync-backup.sh

rsync -avz --delete /data/ /backup/data/

if [ $? -eq 0 ]; then
  curl -fsS https://wizstatus.com/ping/file-backup
fi

Restic/Borg Backups

#!/bin/bash
# restic-backup.sh

restic backup /important-data

if [ $? -eq 0 ]; then
  restic check
  if [ $? -eq 0 ]; then
    curl -fsS https://wizstatus.com/ping/restic-backup
  fi
fi
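The heading also covers Borg, but the example above only shows restic. A comparable sketch for Borg follows. The repository path and ping URL are placeholders, and treating Borg's "warning" exit code (1 in Borg 1.x, for example when a file changed while being read) as a failure is a deliberately strict choice you may want to relax.

```shell
#!/bin/bash
# borg-backup.sh: a sketch, not a drop-in script.
# BORG_REPO and PING_URL are placeholders; set them for your environment.
BORG_REPO="${BORG_REPO:-/backup/borg-repo}"
PING_URL="${PING_URL:-https://wizstatus.com/ping/borg-backup}"

borg_backup_and_ping() {
  local source_dir="$1"

  # {hostname} and {now} are Borg archive-name placeholders
  borg create --stats "$BORG_REPO::{hostname}-{now}" "$source_dir"
  local rc=$?
  # Borg 1.x: 0 = success, 1 = warning, 2 = error. Only a clean run counts here.
  if [ "$rc" -ne 0 ]; then
    echo "borg create exited with $rc" >&2
    return "$rc"
  fi

  # Check repository consistency before reporting success
  if ! borg check "$BORG_REPO"; then
    echo "borg check failed" >&2
    return 2
  fi

  curl -fsS --retry 3 "$PING_URL" > /dev/null
}

# Example: borg_backup_and_ping /important-data
```

Running `borg check` on every backup can be slow for large repositories; some setups run it on a separate, less frequent schedule with its own heartbeat monitor.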

Cloud Backup Monitoring

AWS S3 Backup

#!/bin/bash
# s3-backup.sh

aws s3 sync /data s3://my-backup-bucket/$(date +%Y%m%d)/

if [ $? -eq 0 ]; then
  curl -fsS https://wizstatus.com/ping/s3-backup
fi

Monitoring Cloud Backup Services

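For managed services such as AWS Backup, Azure Backup, or Google Cloud Backup and DR, there is no script to add a ping to. Instead, either:

  1. Route the service's job events (e.g. via Amazon SNS or Azure Event Grid) to a small webhook endpoint that pings your monitor on success, or
  2. Poll backup job status via the provider's API on a schedule, and ping when the recent jobs succeeded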
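As a sketch of the polling approach for AWS Backup, assuming the AWS CLI is installed and configured: the script counts jobs created since a given time in the FAILED and COMPLETED states, and pings only if at least one job completed and none failed. The ping URL is a placeholder, and other terminal states (ABORTED, EXPIRED, PARTIAL) are not checked here.

```shell
#!/bin/bash
# aws-backup-check.sh: a sketch that polls AWS Backup instead of wiring up events.
# Assumes a configured AWS CLI; PING_URL is a placeholder.
PING_URL="${PING_URL:-https://wizstatus.com/ping/aws-backup}"

# Count backup jobs in a given state created after a timestamp (ISO 8601)
count_jobs() {
  local state="$1" since="$2"
  aws backup list-backup-jobs \
    --by-state "$state" \
    --by-created-after "$since" \
    --query 'length(BackupJobs)' \
    --output text
}

check_aws_backups() {
  local since="$1"
  local failed completed
  failed=$(count_jobs FAILED "$since") || return 1
  completed=$(count_jobs COMPLETED "$since") || return 1

  # Ping only if something completed and nothing failed in the window
  if [ "$failed" -eq 0 ] && [ "$completed" -gt 0 ]; then
    curl -fsS --retry 3 "$PING_URL" > /dev/null
  else
    echo "AWS Backup: $completed completed, $failed failed since $since" >&2
    return 1
  fi
}

# Example (GNU date): check_aws_backups "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)"
```

Run it from cron a little after the backup window closes; if the poller itself stops running, the heartbeat monitor will still alert.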

Setting Up Heartbeat Monitoring

Step 1: Create Monitors

For each backup job:

  1. Create a heartbeat monitor
  2. Set the expected schedule (e.g., "Daily at 2:00 AM")
  3. Set the grace period (typical backup duration + 30 minutes)
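To pick the grace period from data rather than guesswork, one option is to take the longest recent run and add the 30-minute buffer. A minimal sketch, assuming you already collect past durations in seconds (for example from your backup logs); the function name is hypothetical:

```shell
#!/bin/bash
# A sketch for choosing a grace period from recent run times.
# Input: past durations in seconds. Output: grace period in minutes,
# computed as the longest observed run (rounded up) plus 30.
grace_minutes() {
  local max=0 d
  for d in "$@"; do
    [ "$d" -gt "$max" ] && max=$d
  done
  # Round seconds up to whole minutes, then add the 30-minute buffer
  echo $(( (max + 59) / 60 + 30 ))
}

# Example: runs of 30, 45 and 40 minutes
# grace_minutes 1800 2700 2400   # prints 75
```

Using the maximum rather than the average avoids false alarms on the occasional slow night; revisit the value as your data grows.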

Step 2: Configure Alerts

Backup failures are critical, so route alerts accordingly:

  • Email to backup admin
  • Slack to ops channel
  • SMS for database backups
  • PagerDuty for production systems

Step 3: Add Pings to Scripts

Modify each backup script to ping on success:

# At the end of successful backup
curl -fsS --retry 3 https://wizstatus.com/ping/YOUR-TOKEN
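One way to make sure the ping really means "succeeded" is to wrap the backup command so curl only runs on a zero exit code. A minimal sketch; `run_and_ping` is a hypothetical helper name and `YOUR-TOKEN` stays a placeholder:

```shell
#!/bin/bash
# A sketch of a wrapper that pings only when the wrapped command succeeded.
# PING_URL is a placeholder; replace YOUR-TOKEN with your monitor's token.
PING_URL="${PING_URL:-https://wizstatus.com/ping/YOUR-TOKEN}"

# Run the given command; ping the heartbeat monitor only if it exits 0
run_and_ping() {
  if "$@"; then
    # --max-time keeps a slow network from hanging the backup script
    curl -fsS --retry 3 --max-time 10 "$PING_URL" > /dev/null
  else
    local rc=$?
    echo "Backup command failed with exit code $rc; not pinging" >&2
    return "$rc"
  fi
}

# Example: run_and_ping /usr/local/bin/postgres-backup.sh
```

If the ping itself fails, the monitor alerts as though the backup had not run, which errs on the safe side.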

Verification Beyond Monitoring

A heartbeat confirms the job ran and reported success; it does not prove the backup can be restored. Also verify the following:

Test Restores

Periodically restore backups to verify they work:

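#!/bin/bash
# Monthly restore test
# pg_restore needs a custom-format dump (pg_dump -Fc). For plain .sql.gz dumps,
# use: gunzip -c backup.sql.gz | psql -v ON_ERROR_STOP=1 -d test_restore

createdb test_restore || exit 1
# Always drop the scratch database, even if the restore fails
trap 'dropdb test_restore' EXIT

if pg_restore -d test_restore /backup/latest.dump; then
  # Run a validation query and require at least one row
  USERS=$(psql -d test_restore -tAc "SELECT count(*) FROM users")
  if [ "${USERS:-0}" -gt 0 ]; then
    curl -fsS https://wizstatus.com/ping/restore-test
  fi
fi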

Backup Size Monitoring

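Alert if backup size changes dramatically (here: shrinks by more than 50% or more than doubles):

# wc -c is portable; stat -f%z only works on BSD/macOS
TODAY_SIZE=$(wc -c < today-backup.sql)
YESTERDAY_SIZE=$(wc -c < yesterday-backup.sql)

if [ "$YESTERDAY_SIZE" -eq 0 ]; then
  echo "Yesterday's backup is empty; cannot compare sizes"
else
  CHANGE=$(( (TODAY_SIZE - YESTERDAY_SIZE) * 100 / YESTERDAY_SIZE ))
  if [ "$CHANGE" -lt -50 ] || [ "$CHANGE" -gt 100 ]; then
    echo "Backup size changed by ${CHANGE}%"
    # Send alert about unusual change
  fi
fi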

Common Backup Failures

Disk Full

# Check before starting (-P: one line per filesystem, -k: sizes in 1 KB blocks)
AVAILABLE=$(df -Pk /backup | tail -1 | awk '{print $4}')
REQUIRED=10000000  # about 10 GB, in KB

if [ "$AVAILABLE" -lt "$REQUIRED" ]; then
  echo "Insufficient space for backup"
  exit 1
fi
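A fixed 10GB threshold drifts out of date as data grows. A sketch that derives the requirement from the previous backup instead, assuming the next backup is at most 1.5x the last one (an arbitrary margin) and that `df -Pk` reports available space in column 4, as POSIX-format output does:

```shell
#!/bin/bash
# A sketch: size the free-space check from the previous backup instead of a
# fixed 10GB. The 1.5x headroom is an assumption; tune it to your growth rate.
required_kb() {
  local last_backup="$1"
  local bytes
  bytes=$(wc -c < "$last_backup") || return 1
  # Bytes -> KB (rounded up), then add 50% headroom
  echo $(( (bytes + 1023) / 1024 * 3 / 2 ))
}

available_kb() {
  # -P: POSIX output, one line per filesystem; -k: 1024-byte blocks
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example (the backup path is hypothetical):
# if [ "$(available_kb /backup)" -lt "$(required_kb /backup/postgres/prod-latest.sql.gz)" ]; then
#   echo "Insufficient space for backup"; exit 1
# fi
```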

Permission Errors

# Test write access
touch /backup/.write-test
if [ $? -ne 0 ]; then
  echo "Cannot write to backup directory"
  exit 1
fi
rm /backup/.write-test

Network Timeouts

# For remote backups, check connectivity first (-w 5: give up after 5 seconds)
if ! nc -z -w 5 backup-server 22 2>/dev/null; then
  echo "Cannot reach backup server"
  exit 1
fi
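Connectivity can also drop mid-transfer, after the pre-check passed. A small retry helper with exponential backoff is one way to ride out transient failures; the attempt count and delays are assumptions, and the remote target in the example is hypothetical:

```shell
#!/bin/bash
# A sketch of a retry helper for transient network failures.
# Usage: retry ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "Giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "Attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
    delay=$((delay * 2))   # exponential backoff
  done
}

# Example: retry 3 30 rsync -az /data/ backup-server:/backup/data/
```

Retrying suits idempotent transfers such as rsync or aws s3 sync; the heartbeat ping should still follow only the final, successful attempt.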

Best Practices

1. Ping Only on Verified Success

Don't rely on the exit code alone. Verify the backup:

  • File exists
  • File has reasonable size
  • File passes integrity check (gunzip -t, etc.)
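The three checks above can be combined into one function that a script calls before pinging. A sketch, assuming a default minimum size of 1000 bytes (matching the earlier example) and that only `.gz` files get an integrity test; `verify_backup` is a hypothetical name:

```shell
#!/bin/bash
# A sketch of "verified success": the file exists, is not trivially small,
# and (for .gz files) passes gzip's integrity test.
verify_backup() {
  local file="$1" min_bytes="${2:-1000}"

  if [ ! -s "$file" ]; then
    echo "Missing or empty: $file" >&2
    return 1
  fi

  # Arithmetic expansion strips the padding some wc implementations add
  local size
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -lt "$min_bytes" ]; then
    echo "Too small ($size bytes): $file" >&2
    return 1
  fi

  case "$file" in
    *.gz)
      if ! gzip -t "$file" 2>/dev/null; then
        echo "Corrupt gzip archive: $file" >&2
        return 1
      fi
      ;;
  esac
}
```

A backup script would then end with something like `verify_backup "$BACKUP_FILE" && curl -fsS ...`, so the monitor only hears about backups that passed all three checks.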

2. Keep Backup Logs

exec > /var/log/backup-$(date +%Y%m%d).log 2>&1
# All output goes to log

3. Alert on Long Duration

If a backup usually takes 30 minutes but one run takes 3 hours, investigate.
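Besides the heartbeat grace period, you can put a hard cap on the job itself so a hung backup fails instead of running forever. A sketch using GNU coreutils `timeout`, which exits with status 124 when it kills the command; the 3h limit in the example is an assumption, and on macOS `timeout` is only available after installing coreutils (as `gtimeout` by default):

```shell
#!/bin/bash
# A sketch: cap a backup's run time with GNU timeout. A job that hits the
# limit returns 124, so a wrapper that pings only on success stays silent.
run_with_limit() {
  local limit="$1"
  shift
  timeout "$limit" "$@"
  local rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "Timed out after $limit: $*" >&2
  fi
  return "$rc"
}

# Example: run_with_limit 3h pg_dump production > /backup/prod.sql
```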

4. Separate Monitoring per Environment

Production and staging backups should have separate monitors.

5. Document Retention Policy

Know how many backups you keep and verify rotation works.
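Verifying rotation can be automated too. A sketch that fails if no backup is newer than a given age, or if more files are kept than the policy allows; the directory, file pattern, and limits in the example are hypothetical:

```shell
#!/bin/bash
# A sketch of a rotation check: the newest backup is recent, and the number
# of kept files does not exceed the retention policy.
# Usage: check_retention DIR PATTERN MAX_COUNT MAX_AGE_MINUTES
check_retention() {
  local dir="$1" pattern="$2" max_count="$3" max_age_min="$4"
  local count recent

  count=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l)
  recent=$(find "$dir" -maxdepth 1 -type f -name "$pattern" -mmin "-$max_age_min" | wc -l)
  count=$((count)); recent=$((recent))   # strip any wc padding

  if [ "$recent" -eq 0 ]; then
    echo "No backup newer than $max_age_min minutes in $dir" >&2
    return 1
  fi
  if [ "$count" -gt "$max_count" ]; then
    echo "$count backups in $dir; policy allows $max_count (rotation broken?)" >&2
    return 1
  fi
}

# Example: check_retention /backup/postgres 'prod-*.sql.gz' 14 1500
```

Pinging a separate heartbeat monitor when this check passes turns "verify rotation works" into something that alerts on its own.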

Monitoring Checklist

  • All databases have backup monitoring
  • All file backups have monitoring
  • Heartbeat monitors match backup schedules
  • Grace periods account for backup duration
  • Critical alerts go to on-call
  • Monthly restore tests are scheduled
  • Backup logs are retained
  • Disk space is monitored separately

Never discover a failed backup when you need to restore. Set up heartbeat monitoring for all your backup jobs with WizStatus.
