REST API Health Check Endpoints: Design and Implementation

Health check endpoints are a fundamental component of production-ready REST APIs. They provide a standardized way for monitoring systems, load balancers, and orchestration platforms to assess service health.

A well-designed health check endpoint does more than return 200 OK. It provides meaningful information about the service's ability to handle requests and its dependency health.

What Are Health Check Endpoints?

Health check endpoints are dedicated API routes that report the operational status of a service. They serve multiple purposes:

Load balancers use them to determine which instances can receive traffic
Orchestration systems use them to manage container lifecycle
Monitoring systems use them to track availability and alert on issues

Common Health Check Patterns

Pattern	Endpoint	Purpose
Liveness	`/health/live` or `/healthz`	Is the process running?
Readiness	`/health/ready`	Can it handle requests?
Deep health	`/health/complete`	Are all dependencies healthy?

These endpoints typically return JSON responses with status fields. HTTP status codes convey the overall health (200 for healthy, 503 for unhealthy), while response bodies provide diagnostic details.

Why Health Check Design Matters

Avoiding Operational Problems

Poor health check implementation causes real operational problems:

An overly aggressive health check that fails during temporary database hiccups might cause all instances to be removed from load balancing simultaneously, creating a complete outage from a recoverable situation.

Conversely, a too-simple health check that always returns 200 might keep routing traffic to an instance that's running but unable to serve requests.

Kubernetes Integration

Health checks are critical for container orchestration. Kubernetes uses:

Liveness probes: To restart unhealthy containers
Readiness probes: To control traffic routing

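Misconfigured probes are a common source of failed rollouts and avoidable restarts. A liveness probe that depends on a database, for example, can restart every container during a brief database outage.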

Debugging and Documentation

Health endpoints serve as documentation and debugging aids. A detailed health response helps operators quickly identify which component is causing issues without digging through logs.

How to Design Health Check Endpoints

Implement Two Core Endpoints

At minimum, implement liveness and readiness endpoints.

Liveness Endpoint

The liveness endpoint (/health/live) should return quickly with minimal logic:

app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'alive' });
});

Avoid dependency checks in liveness probes: restarting a container does not fix an unavailable database. This endpoint answers only one question: "Should this container be killed and restarted?"

Readiness Endpoint

The readiness endpoint (/health/ready) checks whether the service can handle actual requests:

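app.get('/health/ready', async (req, res) => {
  try {
    // Ping critical dependencies in parallel; any rejection means "not ready".
    await Promise.all([db.ping(), cache.ping()]);
    res.status(200).json({ status: 'ready' });
  } catch (error) {
    // Log details server-side; keep the response body generic.
    console.error('Readiness check failed:', error);
    res.status(503).json({ status: 'not_ready' });
  }
});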

Return 503 if any critical dependency is unavailable.

Implement Detailed Health Checks

For comprehensive monitoring, implement a detailed health endpoint:

{
  "status": "healthy",
  "timestamp": "2025-01-17T10:30:00Z",
  "version": "1.2.3",
  "components": {
    "database": {
      "status": "healthy",
      "latency_ms": 5
    },
    "cache": {
      "status": "healthy",
      "latency_ms": 1
    },
    "external_api": {
      "status": "degraded",
      "latency_ms": 500
    }
  }
}
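One way to assemble a response of this shape is to time each dependency check and collect the results. The sketch below is illustrative only: `timeCheck` and `buildHealthReport` are hypothetical helper names, each check is assumed to be a function returning a promise, and the sketch reports only "healthy" or "unhealthy" (deriving "degraded", for instance from a latency threshold, is left out).

```javascript
// Time a single dependency check and report its status and latency.
// `check` is assumed to be a function that returns a promise.
const timeCheck = async (check) => {
  const start = Date.now();
  try {
    await check();
    return { status: 'healthy', latency_ms: Date.now() - start };
  } catch {
    return { status: 'unhealthy', latency_ms: Date.now() - start };
  }
};

// Run all checks in parallel and assemble the response body.
// `checks` maps a component name to its check function.
const buildHealthReport = async (checks, version) => {
  const names = Object.keys(checks);
  const results = await Promise.all(names.map((name) => timeCheck(checks[name])));
  const components = Object.fromEntries(names.map((name, i) => [name, results[i]]));
  const allHealthy = results.every((r) => r.status === 'healthy');
  return {
    status: allHealthy ? 'healthy' : 'unhealthy',
    timestamp: new Date().toISOString(),
    version,
    components,
  };
};
```

Running the checks in parallel keeps the total response time close to the slowest single check rather than the sum of all of them.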

Consider Partial Health States

An API might function with degraded cache performance or when a non-critical external service is unavailable. Design health responses that distinguish between:

Healthy: All systems operational
Degraded: Functional but with reduced performance
Unhealthy: Unable to serve requests
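The three states can be derived from per-component results. The following is a minimal sketch, assuming each component carries a `critical` flag; that flag and the helper names `aggregateStatus` and `httpStatusFor` are assumptions made for illustration, not part of the response format shown earlier.

```javascript
// Derive an overall status from per-component results.
// Each component is assumed to look like { status, critical }.
const aggregateStatus = (components) => {
  const entries = Object.values(components);
  // A critical component that is down makes the whole service unhealthy.
  if (entries.some((c) => c.critical && c.status === 'unhealthy')) {
    return 'unhealthy';
  }
  // Any other component that is not fully healthy only degrades the service.
  if (entries.some((c) => c.status !== 'healthy')) {
    return 'degraded';
  }
  return 'healthy';
};

// A degraded service can still serve traffic, so only "unhealthy" maps to 503.
const httpStatusFor = (status) => (status === 'unhealthy' ? 503 : 200);
```

With this mapping, a failing non-critical external API yields "degraded" with a 200 response, so load balancers keep routing traffic while monitoring can still alert on the degraded state.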

Health Check Best Practices

Keep Liveness Probes Fast

Liveness probes should:

Complete in under 100ms
Never fail due to dependency issues
Avoid external calls

A slow or flaky liveness probe causes unnecessary container restarts that worsen, rather than improve, service stability.

Implement Timeouts for Readiness Checks

A database check that hangs indefinitely makes your health endpoint unreliable:

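const checkDatabase = async () => {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Timeout')), 2000);
  });
  try {
    return await Promise.race([db.ping(), timeout]);
  } finally {
    // Clear the timer so it does not linger after the ping settles.
    clearTimeout(timer);
  }
};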

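Set reasonable timeouts (1-2 seconds) and treat a timeout as unhealthy. Keep the total check time below the probe's own timeout; otherwise the probe fails before the endpoint can respond.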
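When a readiness check covers several dependencies, the same timeout idea can be applied to each check while running them in parallel, so the total time stays bounded by the timeout rather than by the sum of all checks. This is a sketch under stated assumptions: `withTimeout` and `checkReadiness` are hypothetical helpers, and each check is assumed to be a function returning a promise.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
const withTimeout = (promise, ms) => {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Timeout')), ms);
  });
  // Clear the timer once either side settles so it does not linger.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
};

// Resolve to true only if every check succeeds within the timeout.
const checkReadiness = async (checks, ms = 2000) => {
  const results = await Promise.allSettled(
    // Promise.resolve().then(check) also turns a synchronous throw into a rejection.
    checks.map((check) => withTimeout(Promise.resolve().then(check), ms))
  );
  return results.every((r) => r.status === 'fulfilled');
};
```

A readiness handler could then call something like `checkReadiness([() => db.ping(), () => cache.ping()])` and return 200 or 503 based on the result.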

Cache Dependency Results

For deep health endpoints, cache results briefly to avoid hammering dependencies:

let cachedHealth = null;
let cacheTime = 0;

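app.get('/health/complete', async (req, res) => {
  const now = Date.now();
  // Refresh only when the cached result is older than 5 seconds.
  if (!cachedHealth || now - cacheTime >= 5000) {
    // checkAllDependencies() is assumed to return a report with a `status` field.
    cachedHealth = await checkAllDependencies();
    cacheTime = now;
  }
  // Keep the HTTP status code in line with the reported health.
  res.status(cachedHealth.status === 'unhealthy' ? 503 : 200).json(cachedHealth);
});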

A 5-10 second cache is usually appropriate.

Include Version Information

Include version information in health responses to help diagnose deployment-related issues:

{
  "status": "healthy",
  "version": "2.3.1",
  "git_sha": "abc123f"
}
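A common way to populate these fields is to inject them at build or deploy time. The sketch below assumes two environment variables, `APP_VERSION` and `GIT_SHA`; both names are hypothetical and should be replaced with whatever your build pipeline actually sets.

```javascript
// Read version metadata from the environment, falling back to "unknown".
// APP_VERSION and GIT_SHA are assumed names, not a standard.
const versionInfo = (env = process.env) => ({
  version: env.APP_VERSION || 'unknown',
  git_sha: env.GIT_SHA || 'unknown',
});
```

Spreading the result into the health response (for example `{ status: 'healthy', ...versionInfo() }`) makes it easy to confirm which build a given instance is running.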

Never Expose Sensitive Information

Avoid revealing internal IPs, credentials, detailed error messages, or other information that could aid attackers. Health endpoints should be informative for operators without being exploitable.
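One way to stay informative without leaking details is to log the full error internally and expose only a generic reason. In the sketch below, the keys are standard Node.js system error codes, while the public reason strings and the helper names `toPublicError` and `reportFailure` are assumptions chosen for illustration.

```javascript
// Map known Node.js system error codes to generic, safe reason strings.
const SAFE_REASONS = new Map([
  ['ECONNREFUSED', 'connection_refused'],
  ['ETIMEDOUT', 'timeout'],
  ['ENOTFOUND', 'dns_failure'],
]);

// Build the public view of a failure: no message, host, or stack trace.
const toPublicError = (error) => ({
  status: 'unhealthy',
  reason: SAFE_REASONS.get(error.code) ?? 'unavailable',
});

// Log the full error for operators, return only the sanitized view.
const reportFailure = (name, error, log = console.error) => {
  log(`Health check failed: ${name}`, error);
  return toPublicError(error);
};
```

An error such as "connect ECONNREFUSED 10.0.0.5:5432" then appears in the logs in full, while the response body carries only `"reason": "connection_refused"`.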

Configure Kubernetes Probes Appropriately

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
  timeoutSeconds: 3

Key settings:

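initialDelaySeconds: Give the application time to start before the first probe runs
periodSeconds and failureThreshold: Together they set how many consecutive failures, over how long, trigger action; tune them to ride out transient issues
timeoutSeconds: Set it above the health endpoint's worst-case response time, including any dependency timeouts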
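To reason about how quickly a probe reacts, a rough rule of thumb is to multiply periodSeconds by failureThreshold. The sketch below applies that approximation to the values above; `approxDetectionSeconds` is a hypothetical helper, and the estimate ignores timeoutSeconds and probe scheduling details.

```javascript
// Rough time for a probe to accumulate enough consecutive failures to act.
// This is an approximation: it ignores timeoutSeconds and scheduling jitter.
const approxDetectionSeconds = ({ periodSeconds, failureThreshold }) =>
  periodSeconds * failureThreshold;

const liveness = { periodSeconds: 10, failureThreshold: 3 };
const readiness = { periodSeconds: 5, failureThreshold: 3 };

console.log(approxDetectionSeconds(liveness)); // 30
console.log(approxDetectionSeconds(readiness)); // 15
```

With these settings, a failing instance is taken out of rotation after roughly 15 seconds and restarted after roughly 30, which leaves room for short-lived problems to clear before a restart.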

Conclusion

Well-designed health check endpoints are essential infrastructure for operating reliable REST APIs. By implementing thoughtful liveness, readiness, and detailed health endpoints, you enable automated systems to make good decisions about routing and lifecycle management.

Key Takeaways

Invest time in getting health checks right
Test health check behavior under various failure scenarios
Tune probe configurations based on actual application behavior

The health check infrastructure you build will be exercised constantly in production, making it some of the most critical code in your service.