Monitoring & alerts
External uptime probes, app.unhealthy / app.recovered, top error paths
Percher pings every paid app from outside the cluster on a schedule and tells you when external probes start failing. This is distinct from crash diagnostics: the watchdog watches the container process, while uptime monitoring checks whether the app is actually reachable from the internet (DNS, Caddy, TLS, 5xx). Both can fire for the same incident.
Probe cadence per plan
- Free — disabled (no external probes)
- Starter — every 5 minutes
- Maker — every minute
- Pro — every 30 seconds
The probe fetches the app's primary URL (the verified custom domain if there is one, otherwise the *.percher.run subdomain) with an 8-second timeout. It sends a HEAD request and falls back to GET for hosts that answer HEAD with 405. Any 2xx or 3xx response counts as up; 4xx responses, 5xx responses, and network errors count as down.
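The classification rule above can be sketched as two small functions. This is a minimal illustration, not Percher's implementation: the names `is_up` and `should_retry_with_get` are hypothetical, and treating a timeout as a network error (represented here by `None`) is an assumption. The text does not say how 1xx responses are handled; this sketch counts them as down.

```python
from typing import Optional

PROBE_TIMEOUT_SECONDS = 8  # timeout stated in the text


def is_up(status_code: Optional[int]) -> bool:
    """Return True when a probe result counts as up.

    status_code is None when the request failed before any HTTP response
    arrived (assumed here to cover DNS, TLS, connection errors and timeouts).
    """
    if status_code is None:
        return False
    # 2xx and 3xx count as up; everything else counts as down.
    return 200 <= status_code < 400


def should_retry_with_get(method: str, status_code: Optional[int]) -> bool:
    """Return True when a HEAD probe should be repeated as a GET."""
    return method == "HEAD" and status_code == 405
```

Under these assumptions, `is_up(301)` is true, `is_up(404)` and `is_up(None)` are false, and only a HEAD probe answered with 405 triggers the GET fallback.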
Outage detection
A single failed probe is not an outage. Three consecutive failures flip the app to down and fire app.unhealthy. A subsequent successful probe flips it back to up and fires app.recovered, but only on channels where the matching app.unhealthy was actually delivered (events are paired per channel). A 15-minute cooldown limits re-firing while the app is flapping. The Health tab shows live status, uptime percentages for the last 24 hours, 7 days, and 30 days, a latency sparkline, and a 7-day outage log.
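The detection rules can be read as a small state machine. The sketch below is an illustration under stated assumptions, not Percher's code: `OutageDetector` and its fields are hypothetical names, it models a single notification channel whose deliveries always succeed, and it measures the cooldown from the last app.unhealthy that was fired (the text does not say what the cooldown is measured from).

```python
from dataclasses import dataclass
from typing import List, Optional

FAILURE_THRESHOLD = 3        # consecutive failures before the app is down
COOLDOWN_SECONDS = 15 * 60   # 15-minute cooldown from the text


@dataclass
class OutageDetector:
    status: str = "up"
    consecutive_failures: int = 0
    unhealthy_delivered: bool = False          # pairing flag for app.recovered
    last_unhealthy_at: Optional[float] = None  # assumed cooldown reference point

    def observe(self, probe_ok: bool, now: float) -> List[str]:
        """Record one probe result and return the events to fire."""
        events: List[str] = []
        if probe_ok:
            self.consecutive_failures = 0
            if self.status == "down":
                self.status = "up"
                # Only pair a recovery with an unhealthy that went out.
                if self.unhealthy_delivered:
                    events.append("app.recovered")
                    self.unhealthy_delivered = False
            return events

        self.consecutive_failures += 1
        if self.status == "up" and self.consecutive_failures >= FAILURE_THRESHOLD:
            self.status = "down"
            in_cooldown = (
                self.last_unhealthy_at is not None
                and now - self.last_unhealthy_at < COOLDOWN_SECONDS
            )
            if not in_cooldown:
                events.append("app.unhealthy")
                self.unhealthy_delivered = True
                self.last_unhealthy_at = now
        return events
```

In this model, an app that goes down again inside the cooldown still flips to down, but no second app.unhealthy is fired, so the following recovery is silent as well. Because three consecutive failures are required, the third failure lands roughly two to three probe intervals after an outage begins, for example about 10 to 15 minutes at a 5-minute cadence.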
Webhook payload — app.unhealthy
{
  "type": "app.unhealthy",
  "id": "wh_abc123",
  "timestamp": 1735689600000,
  "data": {
    "appId": "app_xyz",
    "appName": "my-app",
    "url": "https://my-app.percher.run",
    "consecutiveFailures": 3,
    "statusCode": 502,
    "error": "bad gateway",
    "latencyMs": 8000
  }
}
Webhook payload — app.recovered
{
  "type": "app.recovered",
  "id": "wh_def456",
  "timestamp": 1735689900000,
  "data": {
    "appId": "app_xyz",
    "appName": "my-app",
    "url": "https://my-app.percher.run",
    "statusCode": 200,
    "latencyMs": 47
  }
}
Per-route 5xx tracking
The Analytics tab's Top error paths panel shows which routes returned 4xx or 5xx responses, sorted by 5xx count, with errorRate% computed from the per-route request total. The data is aggregated daily from Caddy access logs and re-summed across the selected window, so a route that fails consistently every day for a month rises to the top of the 30-day view even if it is never the worst on any single day. The panel updates every 15 minutes.
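The re-summing step can be sketched as follows. This is a hypothetical illustration, not the actual aggregation job: the row shape (`DailyRouteStats`), the function name `top_error_paths`, and the choice of 4xx plus 5xx as the errorRate numerator are all assumptions; the text only states that the rate is computed from the per-route request total.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class DailyRouteStats(TypedDict):
    """One route's counters for one day (assumed shape)."""
    path: str
    requests: int
    errors_4xx: int
    errors_5xx: int


def top_error_paths(daily_rows: List[DailyRouteStats]) -> List[dict]:
    """Re-sum daily rows over a window and rank routes by 5xx count."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"requests": 0, "errors_4xx": 0, "errors_5xx": 0}
    )
    for row in daily_rows:
        t = totals[row["path"]]
        t["requests"] += row["requests"]
        t["errors_4xx"] += row["errors_4xx"]
        t["errors_5xx"] += row["errors_5xx"]

    ranked = []
    for path, t in totals.items():
        errors = t["errors_4xx"] + t["errors_5xx"]
        if errors == 0:
            continue  # routes without errors do not appear in the panel
        # Assumption: the rate counts both 4xx and 5xx responses.
        rate = 100.0 * errors / t["requests"] if t["requests"] else 0.0
        ranked.append({"path": path, **t, "errorRate": round(rate, 2)})

    ranked.sort(key=lambda r: r["errors_5xx"], reverse=True)
    return ranked
```

With this sketch, a route that returns five 5xx responses on each of three days (15 in total) ranks above a route that returns ten on a single day, which is the behavior the 30-day view describes.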