Cron Monitoring

Cron jobs fail quietly unless you design monitoring around the schedule. Good cron monitoring checks whether the job should have run, whether it actually ran, whether it finished, and whether the result was healthy enough to trust.


What to monitor

Heartbeat received

The job started or finished and sent a ping within the expected schedule window.

Missed execution

No heartbeat arrived after the schedule plus a grace period, which usually means cron did not run or the host was unavailable.

Failed execution

The job ran but exited with a non-zero status, threw an exception, or wrote an explicit failure event.

Slow execution

The job completed but exceeded its normal runtime budget, so it may overlap the next scheduled run.
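
As a rough illustration, the four signals above can be folded into one classification function. This is a minimal sketch, not a reference implementation: the RunRecord type, the classify_run function, and the extra "pending", "running", and "healthy" labels are assumptions introduced here, and a real monitor would read these fields from its own heartbeat store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical record of what the monitor knows about one scheduled run."""
    scheduled_at: datetime
    started_at: Optional[datetime] = None    # time of the start heartbeat, if any
    finished_at: Optional[datetime] = None   # time of the finish heartbeat, if any
    exit_status: Optional[int] = None        # exit status reported with the finish


def classify_run(run: RunRecord, now: datetime,
                 grace: timedelta, runtime_budget: timedelta) -> str:
    """Return 'pending', 'missed', 'running', 'failed', 'slow', or 'healthy'."""
    if run.started_at is None and run.finished_at is None:
        # No heartbeat at all: missed only once schedule + grace has passed.
        return "missed" if now > run.scheduled_at + grace else "pending"

    # Fall back to the scheduled time if only a finish heartbeat arrived.
    start = run.started_at or run.scheduled_at

    if run.finished_at is None:
        # Started but not finished: flag it once it exceeds the budget.
        return "slow" if now - start > runtime_budget else "running"

    if run.exit_status != 0:
        # Assumption: a finish without a recorded zero exit status is a failure.
        return "failed"

    if run.finished_at - start > runtime_budget:
        return "slow"
    return "healthy"
```

Checking for a missing heartbeat first keeps a host outage from being reported as a slow or failed run.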

Production cron monitoring checklist

Define the exact cron expression and timezone for every production job.
Send a heartbeat when the job starts and another when it finishes.
Alert on missing finish heartbeats, not only on failed process exits.
Keep logs with job name, schedule, host, start time, end time, and exit status.
Set a grace period that matches the job's normal runtime plus an operational buffer.
Document ownership, escalation paths, and manual recovery steps.
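
Several checklist items (start and finish heartbeats, structured logs, exit status) can be handled by a thin wrapper around the job command. The sketch below is one possible shape, assuming a hypothetical send_ping callable that delivers events to whatever monitoring service is in use; it is injected so the example contains no network code. The function name run_with_heartbeats and the event names "start", "finish", and "fail" are also assumptions.

```python
import logging
import socket
import subprocess
import sys
from datetime import datetime, timezone

log = logging.getLogger("cron-wrapper")


def run_with_heartbeats(job_name, schedule, command, send_ping):
    """Run `command`, calling send_ping(event, payload) at start and finish.

    send_ping is a hypothetical callable; in practice it might POST to a
    monitoring endpoint.
    """
    started = datetime.now(timezone.utc)
    send_ping("start", {"job": job_name})

    # subprocess.run waits for the command and records its exit status.
    result = subprocess.run(command)
    finished = datetime.now(timezone.utc)

    # The fields the checklist asks every log entry to carry.
    record = {
        "job": job_name,
        "schedule": schedule,
        "host": socket.gethostname(),
        "start": started.isoformat(),
        "end": finished.isoformat(),
        "exit_status": result.returncode,
    }
    log.info("cron run: %s", record)

    # Report failure explicitly instead of relying on a missing finish ping.
    send_ping("finish" if result.returncode == 0 else "fail", record)
    return record


if __name__ == "__main__":
    # Usage: print stands in for a real ping sender.
    run_with_heartbeats(
        "demo-job", "*/5 * * * *",
        [sys.executable, "-c", "print('job ran')"],
        lambda event, payload: print(event, payload),
    )
```

If the host dies mid-run, the monitor sees a start without a finish, which is the case the "missing finish" alert exists to catch.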

Recommended alert windows

Job type	Suggested grace period	Escalation
Frequent health sync	1-2 missed intervals	Notify Slack or email first
Nightly backup	Runtime plus 30-60 minutes	Page if recovery point objective is at risk
Billing or ETL job	Runtime plus downstream SLA buffer	Page owner and create incident ticket
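
The grace periods in the table reduce to simple arithmetic on the schedule. As a small sketch under assumed numbers (the function names, the 40-minute runtime, and the 45-minute buffer are illustrative, not recommendations):

```python
from datetime import datetime, timedelta


def grace_for_interval_job(interval, missed_intervals=2):
    """Grace for a frequent job: tolerate a number of missed intervals."""
    return interval * missed_intervals


def grace_for_batch_job(typical_runtime, buffer):
    """Grace for a batch job: typical runtime plus an operational buffer."""
    return typical_runtime + buffer


def alert_deadline(scheduled_at, grace):
    """Time after which a missing finish heartbeat should raise an alert."""
    return scheduled_at + grace


# Nightly backup scheduled for 02:00, assumed to run about 40 minutes,
# with a 45-minute buffer (inside the 30-60 minute range in the table).
backup_grace = grace_for_batch_job(timedelta(minutes=40), timedelta(minutes=45))
deadline = alert_deadline(datetime(2024, 1, 1, 2, 0), backup_grace)
print(deadline.strftime("%H:%M"))  # 03:25
```

For a health sync that runs every 5 minutes, grace_for_interval_job(timedelta(minutes=5)) gives a 10-minute grace period, the upper end of the "1-2 missed intervals" row.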

Platform notes

Linux cron needs explicit logging because output from failed jobs can disappear into local mail or syslog.
Kubernetes CronJobs need monitoring for missed starts, failed Pods, and concurrency policy side effects.
AWS EventBridge rules should be paired with target failure metrics, dead-letter queues, and CloudWatch alarms.
Quartz jobs should track trigger misfires, thread pool starvation, and application-level exceptions.
