glossary.categories.devopsAcronyme

MTTR

Mean Time To Repair / Mean Time To Recovery

Également connu sous le nom de: Mean Time To RecoveryMean Repair Time

Mean Time To Repair/Recovery - the average time to restore service after a failure.

Définition

Mean Time To Repair (or Recovery) is a key reliability metric that measures the average time taken to restore a system to operational status after a failure. MTTR starts when a failure is detected and ends when normal service is restored. A lower MTTR indicates more efficient incident response and recovery processes. MTTR is one of the four key DevOps metrics (alongside deployment frequency, lead time, and change failure rate) used to measure software delivery performance.

Exemples

MTTR Calculation

How to calculate MTTR from incident data.

// MTTR Calculation Example
const incidents = [
  { duration: 45 },  // 45 minutes
  { duration: 30 },  // 30 minutes
  { duration: 60 },  // 60 minutes
  { duration: 15 },  // 15 minutes
];

const totalDowntime = incidents.reduce((sum, i) => sum + i.duration, 0);
const mttr = totalDowntime / incidents.length;
// MTTR = 150 / 4 = 37.5 minutes

Cas d'usage

Measuring incident response effectiveness
SRE team performance tracking
Identifying process improvement opportunities
Capacity planning for on-call teams

Bonnes pratiques

  • Automate detection to reduce time-to-detect
  • Maintain runbooks for common failure scenarios
  • Practice incident response through game days
  • Implement automated recovery where possible
  • Track MTTR trends over time to measure improvement

FAQ

Mettez vos connaissances sur MTTR en pratique

Commencez à surveiller votre infrastructure avec WizStatus.

Aucune carte de crédit requise • 20 monitors gratuits pour toujours