PagerDuty is one of the most widely used platforms for on-call management and incident response. Integrating it with your uptime monitoring helps critical alerts reach the right people, even at 3 AM. Here's how to set it up properly.
Why PagerDuty?
- On-call management - Schedules, rotations, overrides
- Smart escalations - If one person doesn't respond, alert another
- Multi-channel delivery - Phone, SMS, push, email
- Incident tracking - Timeline, acknowledgment, resolution
- Analytics - Response times, alert patterns
Integration Setup
Step 1: Create a Service
- Log into PagerDuty
- Go to Services → Service Directory
- Click "+ New Service"
- Name it (e.g., "Production Uptime Monitoring")
- Select escalation policy
- Choose integration type
Step 2: Add Integration
For most monitoring tools:
- In the service, open the Integrations tab
- Click "+ Add Integration"
- Select your monitoring tool or "Events API v2"
- Copy the Integration Key (routing key)
Step 3: Configure Monitoring Tool
In your monitoring service:
- Go to notification settings
- Add PagerDuty integration
- Paste the Integration Key
- Map severity levels
Escalation Policies
Basic Escalation
Level 1 (0 min): Primary On-Call
Level 2 (15 min): Secondary On-Call
Level 3 (30 min): Engineering Manager
Level 4 (45 min): VP Engineering
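To make the timing concrete, here is a minimal Python sketch that models the four levels above and reports who would have been notified after an incident has gone unacknowledged for a given number of minutes. The names `EscalationLevel`, `BASIC_POLICY`, and `notified_targets` are made up for illustration; this models the policy on paper and does not call PagerDuty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    delay_minutes: int  # minutes after the incident triggers
    target: str         # who is notified at this level


# Mirrors the basic escalation listed above.
BASIC_POLICY = [
    EscalationLevel(0, "Primary On-Call"),
    EscalationLevel(15, "Secondary On-Call"),
    EscalationLevel(30, "Engineering Manager"),
    EscalationLevel(45, "VP Engineering"),
]


def notified_targets(policy, minutes_unacknowledged):
    """Return every target whose level has been reached without an acknowledgment."""
    return [
        level.target
        for level in policy
        if level.delay_minutes <= minutes_unacknowledged
    ]


print(notified_targets(BASIC_POLICY, 20))
```

With 20 unacknowledged minutes, the primary and secondary on-call have both been notified, and the engineering manager is next at the 30-minute mark.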
Creating Escalation Policy
- Go to People → Escalation Policies
- Click "+ New Escalation Policy"
- Add levels with targets and delays
- Assign to services
Best Practices
- Short initial timeout - 10-15 minutes for critical
- Multiple escalation levels - At least 3
- Clear ownership - Someone is always responsible
- Regular updates - Review quarterly
On-Call Schedules
Setting Up Schedules
- Go to People → On-Call Schedules
- Click "+ New Schedule"
- Configure rotation:
  - Rotation type (daily, weekly)
  - Handoff time
  - Team members
Schedule Types
Weekly Rotation:
Week 1: Alice
Week 2: Bob
Week 3: Carol
Week 4: Alice...
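A weekly rotation is just modular arithmetic over the member list. The sketch below is an illustration only: `ROTATION`, `on_call_for`, and the start date are assumptions, and it ignores handoff times and overrides, which PagerDuty handles for you.

```python
from datetime import date

# Order matches the weekly rotation above.
ROTATION = ["Alice", "Bob", "Carol"]


def on_call_for(day, rotation_start, members=ROTATION):
    """Return who is on call on `day`, given the date the rotation began."""
    weeks_elapsed = (day - rotation_start).days // 7
    return members[weeks_elapsed % len(members)]


start = date(2024, 1, 1)  # hypothetical first day of week 1
print(on_call_for(date(2024, 1, 22), start))
```

January 22 falls in week 4, so the rotation wraps around to Alice.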
Follow-the-Sun:
Americas (8 AM - 4 PM PT): Team A
EMEA (8 AM - 4 PM GMT): Team B
APAC (8 AM - 4 PM JST): Team C
Overrides
For planned absences:
- Go to schedule
- Click on the time slot
- Create override with replacement
Alert Routing
Severity Mapping
Map monitoring severity to PagerDuty:
| Monitor Status | PagerDuty Severity | Action |
|---|---|---|
| Down | Critical | Page immediately |
| Degraded | Error | Page after 5 min |
| Warning | Warning | Create incident, no page |
| Info | Info | Log only |
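If your monitoring tool lets you script the integration, the table above reduces to a lookup. This is a small sketch, assuming the monitor reports its status as one of the four strings in the first column; `SEVERITY_MAP` and `to_pagerduty_severity` are illustrative names. The four target values are the severities the Events API v2 accepts.

```python
# Monitor status (lowercased) mapped to PagerDuty Events API v2 severity.
SEVERITY_MAP = {
    "down": "critical",
    "degraded": "error",
    "warning": "warning",
    "info": "info",
}


def to_pagerduty_severity(monitor_status):
    """Translate a monitor status into a PagerDuty severity, rejecting unknown values."""
    key = monitor_status.strip().lower()
    if key not in SEVERITY_MAP:
        raise ValueError("unmapped monitor status: %r" % monitor_status)
    return SEVERITY_MAP[key]


print(to_pagerduty_severity("Degraded"))
```

Raising on an unknown status is a deliberate choice here: silently defaulting to a low severity could hide a real outage.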
Service Dependencies
If Service A depends on Service B, configure:
- Service B alerts trigger first
- Service A alerts suppressed if B is down
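The suppression rule can be sketched as a filter that runs before any event is sent. This is an illustration of the idea on the monitoring side, not a PagerDuty feature; the `DEPENDS_ON` map and `services_to_page` are hypothetical, and only direct dependencies are checked.

```python
# Hypothetical dependency map: each service lists the services it depends on.
DEPENDS_ON = {"service-a": ["service-b"]}


def services_to_page(down_services, depends_on=DEPENDS_ON):
    """Return the down services whose direct dependencies are all still up."""
    down = set(down_services)
    return {
        service
        for service in down
        if not any(dep in down for dep in depends_on.get(service, []))
    }


print(services_to_page(["service-a", "service-b"]))
```

When both services are down, only `service-b` pages, because the outage in `service-a` is explained by its dependency.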
Incident Workflows
Acknowledgment
When alerted:
- Acknowledge within SLA (e.g., 5 min)
- Acknowledgment stops escalation
- Snooze if needed (with reason)
Resolution
After fixing:
- Resolve incident in PagerDuty
- Monitoring should auto-resolve too
- Add notes for postmortem
Auto-Resolution
Configure your monitoring to resolve PagerDuty incidents when service recovers:
{
  "routing_key": "...",
  "dedup_key": "monitor-123",
  "event_action": "resolve"
}
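Auto-resolution comes down to sending the right event when a monitor changes state. The following sketch shows one way to pick the event, assuming the monitor exposes its previous and current up/down state; `event_for_transition` and its parameters are illustrative, and the summary, source, and severity values are placeholders.

```python
def event_for_transition(was_up, is_up, routing_key, dedup_key):
    """Build the event for a state change, or None if nothing changed."""
    if was_up == is_up:
        return None  # no transition, nothing to send
    event = {
        "routing_key": routing_key,
        "dedup_key": dedup_key,
        "event_action": "resolve" if is_up else "trigger",
    }
    if not is_up:
        # A trigger event needs a payload; a resolve event does not.
        event["payload"] = {
            "summary": "Monitor %s is DOWN" % dedup_key,
            "source": "uptime-monitor",  # placeholder source name
            "severity": "critical",
        }
    return event


print(event_for_transition(False, True, "YOUR_INTEGRATION_KEY", "monitor-123"))
```

Because the resolve event reuses the `dedup_key` of the trigger, it closes the incident that the outage opened.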
Event Payload
Trigger Event
{
  "routing_key": "YOUR_INTEGRATION_KEY",
  "event_action": "trigger",
  "dedup_key": "production-api-down",
  "payload": {
    "summary": "Production API is DOWN",
    "severity": "critical",
    "source": "WizStatus Monitoring",
    "custom_details": {
      "monitor": "Production API",
      "url": "https://api.example.com",
      "error": "HTTP 500",
      "duration": "5 minutes"
    }
  },
  "links": [{
    "href": "https://dashboard.wizstatus.com/monitors/123",
    "text": "View in Dashboard"
  }]
}
Resolve Event
{
  "routing_key": "YOUR_INTEGRATION_KEY",
  "event_action": "resolve",
  "dedup_key": "production-api-down"
}
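Before sending events like the ones above, it can help to check them locally. This sketch validates the fields the Events API v2 requires: a routing key, a known `event_action`, a `summary`, `source`, and `severity` for triggers, and a `dedup_key` for acknowledge and resolve. The function name `validate_event` and the wording of the messages are illustrative.

```python
ACTIONS = {"trigger", "acknowledge", "resolve"}
SEVERITIES = {"critical", "error", "warning", "info"}


def validate_event(event):
    """Return a list of problems found in an Events API v2 event (empty if none)."""
    problems = []
    if not event.get("routing_key"):
        problems.append("routing_key is missing")

    action = event.get("event_action")
    if action not in ACTIONS:
        problems.append("event_action must be one of: " + ", ".join(sorted(ACTIONS)))
    elif action == "trigger":
        payload = event.get("payload") or {}
        for field in ("summary", "source", "severity"):
            if not payload.get(field):
                problems.append("payload.%s is missing" % field)
        severity = payload.get("severity")
        if severity and severity not in SEVERITIES:
            problems.append(
                "payload.severity must be one of: " + ", ".join(sorted(SEVERITIES))
            )
    elif not event.get("dedup_key"):
        # acknowledge and resolve must point at an existing alert
        problems.append("dedup_key is required for " + action)
    return problems


print(validate_event({"routing_key": "YOUR_INTEGRATION_KEY", "event_action": "resolve"}))
```

The example prints a single problem, because a resolve event without a `dedup_key` cannot be matched to an incident.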
Deduplication
Use dedup_key to prevent duplicate incidents:
- Same dedup_key = same incident
- New triggers with the same key are added to the existing open incident instead of opening a new one
- Resolve closes the specific incident
Good dedup keys:
production-api-http-check
ssl-cert-expiry-example-com
monitor-12345
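Stable keys are easier to get right when they are derived from the monitor's identity rather than typed by hand. Below is a minimal sketch that builds keys in the style shown above; `dedup_key_for` is a hypothetical helper, and the 255-character cap is an assumption about the API limit that you should confirm against the Events API reference.

```python
import re

MAX_DEDUP_KEY_LENGTH = 255  # assumed limit; check the Events API v2 reference


def dedup_key_for(*parts):
    """Join the parts into a lowercase, hyphen-separated key."""
    slugs = [re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-") for part in parts]
    return "-".join(slug for slug in slugs if slug)[:MAX_DEDUP_KEY_LENGTH]


print(dedup_key_for("Production API", "HTTP check"))
print(dedup_key_for("SSL cert expiry", "example.com"))
```

Avoid putting timestamps or error text in the key: anything that changes between checks would create a new incident for every failure.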
Testing Integration
Send Test Alert
Most monitoring tools have a "Send Test Alert" button.
Or call the Events API directly:
curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H "Content-Type: application/json" \
  -d '{
    "routing_key": "YOUR_KEY",
    "event_action": "trigger",
    "dedup_key": "test-alert-001",
    "payload": {
      "summary": "Test alert from monitoring",
      "severity": "info",
      "source": "Test"
    }
  }'
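If you would rather send the test from a script, the same request can be made with Python's standard library. This is a sketch under the same assumptions as the curl call: `YOUR_KEY` is a placeholder, and `build_request` and `send_event` are illustrative helper names. Building the request is kept separate from sending it so the request can be inspected without any network traffic.

```python
import json
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_request(event):
    """Build the POST request for an Events API v2 event without sending it."""
    return urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event, timeout=10.0):
    """Send the event and return the decoded JSON response body."""
    with urllib.request.urlopen(build_request(event), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


TEST_EVENT = {
    "routing_key": "YOUR_KEY",
    "event_action": "trigger",
    "dedup_key": "test-alert-001",
    "payload": {
        "summary": "Test alert from monitoring",
        "severity": "info",
        "source": "Test",
    },
}

# To actually send the test alert, call: send_event(TEST_EVENT)
request = build_request(TEST_EVENT)
print(request.get_method(), request.full_url)
```

`urlopen` raises `urllib.error.HTTPError` on a 4xx or 5xx response, so an invalid routing key surfaces as an exception rather than a silent failure.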
Verify Escalation
- Trigger test incident
- Don't acknowledge
- Wait for escalation timeout
- Verify next level is notified
Common Configurations
Small Team (2-3 people)
- Simple weekly rotation
- 2-level escalation
- SMS + push notifications
Medium Team (5-10 people)
- Weekly rotation with primary/secondary
- 3-level escalation
- Different schedules for business/after-hours
Large Team (10+ people)
- Follow-the-sun schedules
- Tiered escalation by severity
- Multiple services per team
- Dedicated incident commander
Troubleshooting
Alerts Not Arriving
- Check Integration Key is correct
- Verify the service is not disabled or in a maintenance window
- Check user notification rules
- Review PagerDuty event logs
Wrong Person Alerted
- Check schedule is current
- Verify no overrides
- Check escalation policy assignment
- Review user time zone settings
Too Many Alerts
- Review deduplication settings
- Check for flapping monitors
- Adjust thresholds
- Consider suppression rules
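One way to spot a flapping monitor is to count how often its state changes across recent checks. The sketch below is a rough heuristic rather than anything PagerDuty provides; `is_flapping` and the default threshold of 3 transitions are assumptions you would tune to your check interval.

```python
def is_flapping(recent_states, max_transitions=3):
    """Return True if the up/down state changed more than `max_transitions` times."""
    transitions = sum(
        1 for previous, current in zip(recent_states, recent_states[1:])
        if previous != current
    )
    return transitions > max_transitions


# True means the check passed, False means it failed.
print(is_flapping([True, False, True, False, True]))
print(is_flapping([True, True, False, False, False]))
```

The first history changes state four times and counts as flapping; the second is a single clean outage and does not. A flapping monitor is usually a candidate for a longer confirmation period or a threshold adjustment rather than more pages.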
Integration Checklist
- PagerDuty service created
- Integration key added to monitoring
- Escalation policy configured
- On-call schedule set up
- Severity mapping configured
- Auto-resolution enabled
- Test alert sent successfully
- Escalation tested
- Team notified of on-call duties