How to Monitor Cron Jobs: Step-by-Step Guide

Cron jobs are the backbone of automated tasks on Unix systems. But when they fail silently, you might not know until critical processes break. Here's how to set up reliable monitoring for all your cron jobs.

Why Monitor Cron Jobs?

Cron jobs fail silently. Common issues include:

Job never starts - Typo in crontab, wrong path
Job crashes - Runtime errors, missing dependencies
Job hangs - Infinite loops, deadlocks
Job runs but fails - Database errors, network issues
Server reboots - Cron service isn't running after a reboot, or a run scheduled during the downtime is skipped

Without monitoring, you discover these problems when users complain or data goes stale.

Method 1: Heartbeat Monitoring (Recommended)

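The most reliable approach: your cron job pings a monitoring service each time it completes successfully. If a ping doesn't arrive on schedule, the service alerts you, which also catches jobs that never start.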

Step 1: Get Your Ping URL

Create a heartbeat monitor in your monitoring service. You'll receive a unique URL like:

text

https://wizstatus.com/ping/abc123

Step 2: Modify Your Cron Job

Add a ping after successful execution:

Before:

cron

0 2 * * * /home/user/backup.sh

After:

cron

0 2 * * * /home/user/backup.sh && curl -fsS --retry 3 https://wizstatus.com/ping/abc123

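The && ensures the ping only runs if the script succeeds. The curl flags matter too: -f makes curl exit with an error on HTTP failures, -sS hides the progress meter but still prints errors, and --retry 3 retries transient failures up to three times.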

Step 3: Configure Expected Schedule

In your monitoring dashboard:

Set schedule: "Daily at 2:00 AM"
Set grace period: 30-60 minutes (depending on job duration)
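To make the schedule and grace period concrete, here is a minimal sketch of how a monitoring service might decide that a job is overdue. The function name is_overdue and its parameters are hypothetical and chosen for illustration; your monitoring service's actual logic may differ.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_overdue(expected_run: datetime, grace: timedelta,
               last_ping: Optional[datetime], now: datetime) -> bool:
    """Return True if the run expected at `expected_run` has not pinged
    and the grace period has already passed."""
    deadline = expected_run + grace
    if now <= deadline:
        return False  # still inside the grace window, no alert yet
    # A ping only counts if it arrived at or after this run's scheduled time.
    return last_ping is None or last_ping < expected_run


# Example: daily job at 2:00 AM with a 30 minute grace period
run = datetime(2024, 1, 1, 2, 0)
print(is_overdue(run, timedelta(minutes=30), None, datetime(2024, 1, 1, 2, 31)))
```

With these inputs no ping has arrived by 2:31 AM, one minute past the deadline, so the check reports the job as overdue.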

Step 4: Test the Setup

Run the job manually, then verify both paths:

Success: the ping appears in your monitoring dashboard
Failure: make the job fail on purpose and confirm you get an alert

Method 2: Wrapper Script

For complex jobs, create a wrapper:

bash

#!/bin/bash
# cron-wrapper.sh: run a command and ping the monitor only if it succeeds

PING_URL="$1"
shift

# Run the command; "$@" keeps arguments that contain spaces intact
"$@"
EXIT_CODE=$?

# Ping only on success
if [ "$EXIT_CODE" -eq 0 ]; then
  curl -fsS --retry 3 "$PING_URL"
else
  echo "Job failed with exit code $EXIT_CODE" >&2
fi

exit "$EXIT_CODE"

Usage:

cron

0 2 * * * /home/user/cron-wrapper.sh https://wizstatus.com/ping/abc123 /home/user/backup.sh
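If your jobs are already driven from Python, the same wrapper idea can be sketched there. This is an illustrative version, not a drop-in replacement: run_and_ping is a hypothetical helper, and the ping is passed in as a callable so the logic can be exercised without touching the network.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_and_ping(command: Sequence[str], ping: Callable[[], None]) -> int:
    """Run `command` and call `ping` only if it exits with status 0."""
    # Passing a list avoids shell word-splitting of the arguments.
    result = subprocess.run(command)
    if result.returncode == 0:
        ping()
    else:
        print(f"Job failed with exit code {result.returncode}", file=sys.stderr)
    return result.returncode


# Example with a stand-in ping that only records the call
pings = []
exit_code = run_and_ping([sys.executable, "-c", "print('backup done')"],
                         lambda: pings.append("ping"))
```

In real use the ping callable could wrap an HTTP request to your ping URL, for example urllib.request.urlopen with a timeout.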

Method 3: Email + Log Monitoring

Traditional but less reliable approach:

cron

MAILTO=alerts@yourcompany.com
0 2 * * * /home/user/backup.sh 2>&1 | tee -a /var/log/backup.log

Drawbacks:

Email can be delayed or filtered
Doesn't catch jobs that don't run at all
Requires parsing logs

Cron Syntax Refresher

text

┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sunday=0)
│ │ │ │ │
* * * * * command

Common schedules:

cron

* * * * *     # Every minute
0 * * * *     # Every hour
0 0 * * *     # Daily at midnight
0 2 * * *     # Daily at 2 AM
0 0 * * 0     # Weekly on Sunday
0 0 1 * *     # Monthly on the 1st
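To see how the five fields map onto a point in time, here is a small sketch that checks whether a datetime matches a cron expression. It is deliberately limited: the hypothetical cron_matches function supports only * and single integers (no ranges, lists, or steps) and expects day of week as 0-6 with Sunday=0, as in the diagram above.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; only '*' and a single integer are supported."""
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute selected by `expr`."""
    minute, hour, dom, month, dow = expr.split()
    # cron counts Sunday as 0; datetime.weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    if not (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(month, when.month)):
        return False
    dom_ok = _field_matches(dom, when.day)
    dow_ok = _field_matches(dow, cron_dow)
    # When both day fields are restricted, cron runs the job if either matches.
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


# 2024-01-07 is a Sunday, so the weekly schedule matches at midnight
print(cron_matches("0 0 * * 0", datetime(2024, 1, 7, 0, 0)))
```

A check like this can help confirm that the schedule you enter in the monitoring dashboard matches the one in your crontab.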

Monitoring Different Job Types

Backup Jobs

bash

#!/bin/bash
# backup.sh

# Compute the filename once so the dump and the check use the same path
BACKUP_FILE="/backup/db-$(date +%Y%m%d).sql"

# Ping only if pg_dump succeeds and the backup file exists and is non-empty
if pg_dump production > "$BACKUP_FILE" && [ -s "$BACKUP_FILE" ]; then
  curl -fsS https://wizstatus.com/ping/backup-token
fi

Report Generation

bash

#!/bin/bash
# daily-report.sh

python generate_report.py

if [ $? -eq 0 ] && [ -f /reports/daily-$(date +%Y%m%d).pdf ]; then
  curl -fsS https://wizstatus.com/ping/report-token
fi

Data Sync Jobs

bash

#!/bin/bash
# sync.sh

rsync -avz /source/ /destination/
RSYNC_EXIT=$?

if [ $RSYNC_EXIT -eq 0 ]; then
  curl -fsS https://wizstatus.com/ping/sync-token
else
  echo "Rsync failed with code $RSYNC_EXIT"
fi

Queue Processors

For continuous processors, ping periodically:

python

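import time
import requests

PING_URL = "https://wizstatus.com/ping/queue-token"
PING_INTERVAL = 300  # 5 minutes

last_ping = 0

while True:
    process_next_job()  # placeholder for your own queue-handling function

    if time.time() - last_ping > PING_INTERVAL:
        try:
            # Time out so a slow monitoring endpoint cannot stall the queue
            requests.get(PING_URL, timeout=10)
        except requests.RequestException:
            pass  # a failed ping should not crash the processor
        last_ping = time.time()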

Handling Job Duration

For jobs that might exceed the grace period:

Start/End Pings

bash

#!/bin/bash

# Ping start
curl -fsS https://wizstatus.com/ping/job-token/start

# Long running job; ping completion only if it succeeds
./long-backup-process.sh && curl -fsS https://wizstatus.com/ping/job-token
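The start/end pattern can also be sketched in Python. In this illustrative version, run_with_start_end is a hypothetical helper and ping is a callable that receives a URL suffix ("/start" or an empty string), so the order of pings can be checked without a real monitoring endpoint. The completion ping is sent only when the job exits with status 0.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence, Tuple


def run_with_start_end(command: Sequence[str],
                       ping: Callable[[str], None]) -> Tuple[int, float]:
    """Send a start ping, run `command`, and send the completion ping only on success.

    Returns the exit code and the elapsed time in seconds.
    """
    ping("/start")
    started = time.monotonic()
    result = subprocess.run(command)
    elapsed = time.monotonic() - started
    if result.returncode == 0:
        ping("")  # empty suffix stands for the plain ping URL
    return result.returncode, elapsed


# Example with a stand-in ping that records which suffixes were sent
sent = []
code, seconds = run_with_start_end([sys.executable, "-c", "pass"], sent.append)
```

The measured duration is a useful input when choosing a grace period for the job.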

Dynamic Grace Periods

Estimate job duration and set grace accordingly:

Short jobs (< 5 min): 10 minute grace
Medium jobs (5-30 min): 45 minute grace
Long jobs (30+ min): Job duration + 30 minutes
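The tiers above can be written as a small function. This is a sketch under one assumption: the tiers overlap at exactly 30 minutes, and here a 30 minute job is treated as long. The name grace_minutes is hypothetical.

```python
def grace_minutes(duration_minutes: float) -> float:
    """Suggest a grace period in minutes from an estimated job duration."""
    if duration_minutes < 5:
        return 10                      # short jobs
    if duration_minutes < 30:
        return 45                      # medium jobs
    return duration_minutes + 30       # long jobs: duration plus 30 minutes


for duration in (2, 15, 90):
    print(duration, "min job ->", grace_minutes(duration), "min grace")
```

Treat the result as a starting point and adjust it once you have observed real run times.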

Common Issues and Solutions

Issue: Ping fails due to network

bash

# Add retries and timeout
curl -fsS --retry 3 --retry-delay 10 --max-time 30 "$PING_URL"
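If the ping is sent from Python rather than curl, a similar retry loop can be sketched as follows. ping_with_retries is a hypothetical helper; the send and sleep callables are injected so the behavior can be checked without network access or real waiting. Unlike curl, which retries only transient errors, this sketch retries on any exception.

```python
import time
from typing import Callable


def ping_with_retries(send: Callable[[], None], retries: int = 3,
                      delay: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `send` once, then up to `retries` more times if it raises.

    Returns True as soon as one attempt succeeds, False if all attempts fail.
    """
    for attempt in range(retries + 1):
        try:
            send()
            return True
        except Exception:
            if attempt < retries:
                sleep(delay)  # wait before the next attempt
    return False


# Example: a stand-in sender that always succeeds
delivered = ping_with_retries(lambda: None)
```

Returning a boolean instead of raising keeps a failed ping from turning a successful job into a failed one.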

Issue: Job runs as root but curl isn't found

Cron runs jobs with a minimal PATH, so use the full path to curl (find it by running: which curl):

cron

0 2 * * * /path/to/backup.sh && /usr/bin/curl -fsS "$PING_URL"

Issue: Environment variables not available

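Cron does not load your shell profile, so define the variables you need in the crontab or in the script:

cron

PATH=/usr/local/bin:/usr/bin:/bin
PING_URL=https://wizstatus.com/ping/abc123
0 2 * * * /path/to/backup.sh && curl -fsS "$PING_URL"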

Issue: Job output clutters logs

Redirect appropriately:

cron

0 2 * * * /path/to/job.sh > /var/log/job.log 2>&1 && curl -fsS "$PING_URL"

Testing Your Setup

Verify cron is running

bash

systemctl status cron    # Debian/Ubuntu; the service is named crond on RHEL, CentOS, and Fedora
# or
service cron status

Test job execution

bash

# Run manually
/path/to/your/script.sh

# Check if ping was received
# (verify in monitoring dashboard)

Simulate failure

bash

# Temporarily add this line near the top of the script, then run the job
exit 1

# Verify no ping is sent
# Verify alert is triggered after grace period
# Remove the line once the test is done

Checklist

Identified all cron jobs to monitor
Created heartbeat monitors for each job
Matched monitor schedule to cron schedule
Set appropriate grace periods
Modified cron entries to ping on success
Tested successful execution pings
Tested failure scenarios
Set up notification channels

Never let a cron job fail silently again. Set up heartbeat monitoring and get alerts within minutes when jobs don't complete.
