PagerDuty Integration: Escalation Policies for Incidents

PagerDuty is a widely used platform for on-call management and incident response. Integrating it with your uptime monitoring helps ensure critical alerts reach the right people, even at 3 AM. Here's how to set it up properly.

Why PagerDuty?

On-call management - Schedules, rotations, overrides
Smart escalations - If one person doesn't respond, alert another
Multi-channel delivery - Phone, SMS, push, email
Incident tracking - Timeline, acknowledgment, resolution
Analytics - Response times, alert patterns

Integration Setup

Step 1: Create a Service

Log into PagerDuty
Go to Services → Service Directory
Click "+ New Service"
Name it (e.g., "Production Uptime Monitoring")
Select escalation policy
Choose integration type

Step 2: Add Integration

For most monitoring tools:

In the service, open the Integrations tab
Click "+ Add Integration"
Select your monitoring tool or "Events API v2"
Copy the Integration Key (sent as routing_key in Events API v2 requests)

Step 3: Configure Monitoring Tool

In your monitoring service:

Go to notification settings
Add PagerDuty integration
Paste the Integration Key
Map severity levels

Escalation Policies

Basic Escalation

Level 1 (0 min):  Primary On-Call
Level 2 (15 min): Secondary On-Call
Level 3 (30 min): Engineering Manager
Level 4 (45 min): VP Engineering
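
To make the timing concrete, here is a minimal Python sketch of the ladder above. It only models who has been notified after a given number of unacknowledged minutes; the names ESCALATION_LEVELS and notified_targets are illustrative and not part of PagerDuty.

```python
from bisect import bisect_right

# Escalation levels from the example above: (delay in minutes, target).
ESCALATION_LEVELS = [
    (0, "Primary On-Call"),
    (15, "Secondary On-Call"),
    (30, "Engineering Manager"),
    (45, "VP Engineering"),
]


def notified_targets(minutes_unacknowledged: float) -> list[str]:
    """Return every target notified after the given unacknowledged wait."""
    delays = [delay for delay, _ in ESCALATION_LEVELS]
    # bisect_right counts how many level delays are <= the elapsed time.
    reached = bisect_right(delays, minutes_unacknowledged)
    return [target for _, target in ESCALATION_LEVELS[:reached]]
```

For example, notified_targets(20) returns the primary and secondary on-call, because the 30-minute level has not been reached yet.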

Creating Escalation Policy

Go to People → Escalation Policies
Click "+ New Escalation Policy"
Add levels with targets and delays
Assign to services

Best Practices

Short initial timeout - 10-15 minutes for critical services
Multiple escalation levels - At least 3
Clear ownership - Someone is always responsible
Regular reviews - Revisit policies and targets quarterly

On-Call Schedules

Setting Up Schedules

Go to People → On-Call Schedules
Click "+ New Schedule"
Configure rotation:
- Rotation type (daily, weekly)
- Handoff time
- Team members

Schedule Types

Weekly Rotation:

Week 1: Alice
Week 2: Bob
Week 3: Carol
Week 4: Alice...
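
The weekly rotation is modular arithmetic over the member list. A small sketch, assuming a hypothetical rotation start date (a Monday) and handoff at the start of each week:

```python
from datetime import date

ROTATION = ["Alice", "Bob", "Carol"]
ROTATION_START = date(2024, 1, 1)  # hypothetical start of week 1 (a Monday)


def on_call_for(day: date) -> str:
    """Return who is on call on the given day in a weekly rotation."""
    weeks_elapsed = (day - ROTATION_START).days // 7
    # The modulo wraps back to the first person after the last one.
    return ROTATION[weeks_elapsed % len(ROTATION)]
```

With three people, week 4 maps back to Alice, matching the example above.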

Follow-the-Sun:

Americas (8 AM - 4 PM PT): Team A
EMEA (8 AM - 4 PM GMT): Team B
APAC (8 AM - 4 PM JST): Team C
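
Before relying on a follow-the-sun layout, it is worth checking that the shifts really cover all 24 hours. The sketch below converts the three example shifts to UTC and lists any hours with no owner. The fixed offsets are assumptions (PT taken as PST, UTC-8; GMT as UTC+0; JST as UTC+9), and daylight saving time is ignored.

```python
# Shifts from the example: (team, UTC offset in hours, local start, local end).
# Offsets are assumptions: PT as PST (UTC-8), GMT (UTC+0), JST (UTC+9).
SHIFTS = [
    ("Team A", -8, 8, 16),
    ("Team B", 0, 8, 16),
    ("Team C", 9, 8, 16),
]


def coverage_by_utc_hour() -> dict[int, list[str]]:
    """Map each UTC hour (0-23) to the teams on shift during that hour."""
    coverage = {hour: [] for hour in range(24)}
    for team, utc_offset, start, end in SHIFTS:
        for local_hour in range(start, end):
            # Local time minus the offset gives UTC; wrap around midnight.
            coverage[(local_hour - utc_offset) % 24].append(team)
    return coverage


def uncovered_hours() -> list[int]:
    """Return the UTC hours during which no team is on shift."""
    return [hour for hour, teams in coverage_by_utc_hour().items() if not teams]
```

Under these assumptions, uncovered_hours() returns [7]: the hour starting at 07:00 UTC falls between the APAC and EMEA shifts, while the hour starting at 23:00 UTC is covered twice. Extending one shift or moving a handoff time closes the gap.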

Overrides

For planned absences:

Go to schedule
Click on the time slot
Create override with replacement

Alert Routing

Severity Mapping

Map monitoring severity to PagerDuty:

Monitor Status   PagerDuty Severity   Action
Down             critical             Page immediately
Degraded         error                Page after 5 min
Warning          warning              Create incident, no page
Info             info                 Log only
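
In code, this mapping is a plain lookup. The sketch below is illustrative: the status names come from the table, the function name is hypothetical, and rejecting unknown statuses (rather than guessing a severity) is a design choice, not a PagerDuty requirement.

```python
# Monitor status -> Events API v2 severity, following the table above.
SEVERITY_BY_STATUS = {
    "down": "critical",
    "degraded": "error",
    "warning": "warning",
    "info": "info",
}


def to_pagerduty_severity(monitor_status: str) -> str:
    """Translate a monitor status into a PagerDuty severity value."""
    key = monitor_status.strip().lower()
    if key not in SEVERITY_BY_STATUS:
        # Fail loudly so an unmapped status is never paged at the wrong level.
        raise ValueError(f"Unmapped monitor status: {monitor_status!r}")
    return SEVERITY_BY_STATUS[key]
```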

Service Dependencies

If Service A depends on Service B, configure alerting so that:

Service B alerts trigger first
Service A alerts are suppressed while B is down
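
The suppression rule can be sketched as a check performed on the monitoring side before paging. The dependency map and function name below are hypothetical, and only direct dependencies are considered.

```python
# Hypothetical dependency map: service -> services it directly depends on.
DEPENDS_ON = {"service-a": ["service-b"]}


def should_page(service: str, down_services: set[str]) -> bool:
    """Page only if the service is down and none of its dependencies are."""
    if service not in down_services:
        return False
    dependencies = DEPENDS_ON.get(service, [])
    # If an upstream dependency is down, its own alert covers the outage.
    return not any(dep in down_services for dep in dependencies)
```

When both services are down, only service-b pages; service-a pages on its own only if service-b is healthy.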

Incident Workflows

Acknowledgment

When alerted:

Acknowledge within SLA (e.g., 5 min)
Acknowledgment stops escalation (if the service has an acknowledgment timeout, the incident re-triggers once it expires)
Snooze if needed (with reason)

Resolution

After fixing:

Resolve incident in PagerDuty
Your monitoring tool should also send a resolve event once the check recovers
Add notes for postmortem

Auto-Resolution

Configure your monitoring tool to resolve PagerDuty incidents when the service recovers by sending a resolve event with the same dedup_key as the original trigger:

{
  "routing_key": "...",
  "dedup_key": "monitor-123",
  "event_action": "resolve"
}
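
One way to drive this from a monitor is to emit an event only when the status changes. The sketch below is an assumption about how such a monitor might be structured; event_for_transition and the summary text are illustrative, and the routing key is a placeholder.

```python
from typing import Optional

ROUTING_KEY = "YOUR_INTEGRATION_KEY"  # placeholder, as in the examples


def event_for_transition(monitor_id: str, was_up: bool, is_up: bool) -> Optional[dict]:
    """Return the event to send for a status change, or None if unchanged."""
    if was_up == is_up:
        return None
    event = {
        "routing_key": ROUTING_KEY,
        # The same key for trigger and resolve ties both to one incident.
        "dedup_key": f"monitor-{monitor_id}",
        "event_action": "resolve" if is_up else "trigger",
    }
    if not is_up:
        # Trigger events carry a payload; resolve events do not need one.
        event["payload"] = {
            "summary": f"Monitor {monitor_id} is DOWN",
            "severity": "critical",
            "source": f"monitor-{monitor_id}",
        }
    return event
```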

Event Payload

Trigger Event

{
  "routing_key": "YOUR_INTEGRATION_KEY",
  "event_action": "trigger",
  "dedup_key": "production-api-down",
  "payload": {
    "summary": "Production API is DOWN",
    "severity": "critical",
    "source": "WizStatus Monitoring",
    "custom_details": {
      "monitor": "Production API",
      "url": "https://api.example.com",
      "error": "HTTP 500",
      "duration": "5 minutes"
    }
  },
  "links": [{
    "href": "https://dashboard.wizstatus.com/monitors/123",
    "text": "View in Dashboard"
  }]
}
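
If you build this payload in code, a small helper can catch malformed events before they are sent. This is a sketch, not an official client: build_trigger_event is a hypothetical name, and the checks cover only the severity values and the required summary and source fields.

```python
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, dedup_key, summary, severity, source,
                        custom_details=None, links=None) -> dict:
    """Assemble a trigger event shaped like the example above."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"Invalid severity: {severity!r}")
    if not summary or not source:
        raise ValueError("summary and source are required")
    payload = {"summary": summary, "severity": severity, "source": source}
    if custom_details:
        payload["custom_details"] = dict(custom_details)
    event = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": payload,
    }
    if links:
        event["links"] = list(links)
    return event
```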

Resolve Event

{
  "routing_key": "YOUR_INTEGRATION_KEY",
  "event_action": "resolve",
  "dedup_key": "production-api-down"
}

Deduplication

Use a stable dedup_key per check to prevent duplicate incidents:

Same dedup_key = same incident
Repeat triggers with that key update the open incident instead of creating a new one
A resolve event closes only the incident with the matching key

Good dedup keys:

production-api-http-check
ssl-cert-expiry-example-com
monitor-12345
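
Keys in this style can be derived from the monitor's name so they stay stable across alerts. A minimal sketch, with make_dedup_key as a hypothetical helper:

```python
import re


def make_dedup_key(*parts: str) -> str:
    """Join the parts into a lowercase, hyphen-separated key."""
    joined = "-".join(parts).lower()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", joined).strip("-")
```

For example, make_dedup_key("Production API", "HTTP check") yields production-api-http-check.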

Testing Integration

Send Test Alert

Most monitoring tools have a "Send Test Alert" button.

Or call the Events API directly:

curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H "Content-Type: application/json" \
  -d '{
    "routing_key": "YOUR_KEY",
    "event_action": "trigger",
    "dedup_key": "test-alert-001",
    "payload": {
      "summary": "Test alert from monitoring",
      "severity": "info",
      "source": "Test"
    }
  }'
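
The same request can be made from Python using only the standard library. This is a sketch under the same assumptions as the curl command: YOUR_KEY is a placeholder, and build_request and send_event are illustrative names. Building the request is kept separate from sending it so the payload can be inspected without network access.

```python
import json
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

TEST_EVENT = {
    "routing_key": "YOUR_KEY",  # placeholder: use your Integration Key
    "event_action": "trigger",
    "dedup_key": "test-alert-001",
    "payload": {
        "summary": "Test alert from monitoring",
        "severity": "info",
        "source": "Test",
    },
}


def build_request(event: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event: dict, timeout: float = 10.0) -> tuple[int, str]:
    """Send the event and return the HTTP status code and response body."""
    with urllib.request.urlopen(build_request(event), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling send_event(TEST_EVENT) with a real key posts the test alert. Afterwards, send a matching event with "event_action": "resolve" and the same dedup_key to close it.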