Cron Monitoring
Cron jobs fail quietly unless you design monitoring around the schedule. Good cron monitoring checks whether the job should have run, whether it actually ran, whether it finished, and whether the result was healthy enough to trust.
What to monitor
Heartbeat received
The job started or finished and sent a ping within the expected schedule window.
Missed execution
No heartbeat arrived after the schedule plus a grace period, which usually means cron did not run or the host was unavailable.
Failed execution
The job ran but exited with a non-zero status, threw an exception, or wrote an explicit failure event.
Slow execution
The job completed but took longer than its normal runtime budget, so it may overlap future executions.
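The four signals above can be sketched as a single classification step. This is a minimal illustration, not a reference implementation: `RunRecord`, `RunState`, and `classify_run` are hypothetical names, and the sketch assumes the monitor already knows the scheduled time, the grace period, and a runtime budget for the job. A heartbeat that arrives late is treated here as received; a stricter monitor might flag it separately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class RunState(Enum):
    PENDING = "pending"   # no heartbeat yet, but still inside the grace period
    HEALTHY = "healthy"
    MISSED = "missed"
    FAILED = "failed"
    SLOW = "slow"


@dataclass
class RunRecord:
    scheduled_at: datetime                # when the schedule says the job should run
    heartbeat_at: Optional[datetime]      # when a ping arrived, if any
    exit_status: Optional[int] = None     # process exit code, if reported
    runtime: Optional[timedelta] = None   # measured duration, if reported


def classify_run(
    record: RunRecord,
    now: datetime,
    grace: timedelta,
    runtime_budget: timedelta,
) -> RunState:
    """Map one scheduled run onto the monitoring states described above."""
    if record.heartbeat_at is None:
        # Missed execution: nothing arrived after the schedule plus grace period.
        if now > record.scheduled_at + grace:
            return RunState.MISSED
        return RunState.PENDING
    # Failed execution: the job ran but reported a non-zero exit status.
    if record.exit_status is not None and record.exit_status != 0:
        return RunState.FAILED
    # Slow execution: the job completed but exceeded its runtime budget.
    if record.runtime is not None and record.runtime > runtime_budget:
        return RunState.SLOW
    return RunState.HEALTHY
```

Failure is checked before slowness so that a job that is both slow and failing is reported as failed, which is usually the more urgent signal.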
Production cron monitoring checklist
- Define the exact cron expression and timezone for every production job.
- Send a heartbeat when the job starts and another when it finishes.
- Alert on missed finishes, not only failed process exits.
- Keep logs with job name, schedule, host, start time, end time, and exit status.
- Set a grace period that matches the job's normal runtime plus an operational buffer.
- Document ownership, escalation paths, and manual recovery steps.
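Several checklist items (start and finish heartbeats, structured logs with exit status) can be handled by a small wrapper around the job command. The sketch below is one possible shape, under stated assumptions: `run_monitored` is a hypothetical helper, and `ping` is a caller-supplied function that delivers an event name to whatever monitoring service is in use. The event names `start`, `success`, and `fail` are illustrative, not any particular vendor's API.

```python
import json
import socket
import subprocess
import sys
import time
from datetime import datetime, timezone
from typing import Callable, Sequence


def run_monitored(
    job_name: str,
    schedule: str,
    command: Sequence[str],
    ping: Callable[[str], None],
) -> dict:
    """Run `command`, ping at start and finish, and emit one JSON log line."""

    def safe_ping(event: str) -> None:
        # A monitoring outage should never stop the job itself.
        try:
            ping(event)
        except Exception as exc:
            print(f"heartbeat {event!r} failed: {exc}", file=sys.stderr)

    started = datetime.now(timezone.utc)
    safe_ping("start")
    clock = time.monotonic()
    try:
        exit_status = subprocess.run(list(command)).returncode
    except OSError:
        # The command could not be launched; 127 mirrors the shell's
        # "command not found" exit code.
        exit_status = 127
    runtime_seconds = time.monotonic() - clock
    finished = datetime.now(timezone.utc)
    safe_ping("success" if exit_status == 0 else "fail")

    # One structured record with the fields the checklist asks for.
    record = {
        "job": job_name,
        "schedule": schedule,
        "host": socket.gethostname(),
        "start_time": started.isoformat(),
        "end_time": finished.isoformat(),
        "runtime_seconds": round(runtime_seconds, 3),
        "exit_status": exit_status,
    }
    print(json.dumps(record), file=sys.stderr)
    return record
```

Because the finish ping is sent only after the command returns, a job that hangs or a host that dies mid-run produces a start heartbeat with no finish, which is the case the "missed finishes" item is meant to catch on the monitoring side.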
Recommended alert windows
| Job type | Suggested grace period | Escalation |
|---|---|---|
| Frequent health sync | 1-2 missed intervals | Notify Slack or email first |
| Nightly backup | Runtime plus 30-60 minutes | Page if the recovery point objective (RPO) is at risk |
| Billing or ETL job | Runtime plus downstream SLA buffer | Page owner and create incident ticket |
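As a worked example of the table above, the grace period can be derived either from a number of missed intervals or from typical runtime plus a buffer, and then added to the scheduled time to get the moment an alert should fire. The function names and the sample figures (a 5-minute sync, a 40-minute backup with a 45-minute buffer) are assumptions chosen for illustration. Naive datetimes are used for brevity; production code should carry an explicit timezone.

```python
from datetime import datetime, timedelta


def grace_from_intervals(interval: timedelta, missed_intervals: int) -> timedelta:
    """Grace period for frequent jobs: tolerate N missed intervals."""
    return interval * missed_intervals


def grace_from_runtime(typical_runtime: timedelta, buffer: timedelta) -> timedelta:
    """Grace period for long jobs: normal runtime plus a buffer."""
    return typical_runtime + buffer


def alert_deadline(scheduled_at: datetime, grace: timedelta) -> datetime:
    """The time after which a missing heartbeat should raise an alert."""
    return scheduled_at + grace


# Health sync every 5 minutes, tolerating 2 missed intervals.
sync_grace = grace_from_intervals(timedelta(minutes=5), 2)

# Nightly backup at 02:00 that normally takes 40 minutes, with a 45-minute buffer.
backup_grace = grace_from_runtime(timedelta(minutes=40), timedelta(minutes=45))
backup_deadline = alert_deadline(datetime(2024, 1, 1, 2, 0), backup_grace)

print(sync_grace)       # 0:10:00
print(backup_deadline)  # 2024-01-01 03:25:00
```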
Platform notes
- Linux cron needs explicit logging because failed jobs can disappear into local mail or syslog.
- Kubernetes CronJobs need monitoring for missed starts, failed Pods, and concurrency policy side effects.
- AWS EventBridge rules should be paired with target failure metrics, dead-letter queues, and CloudWatch alarms.
- Quartz jobs should track trigger misfires, thread pool starvation, and application-level exceptions.