Cron Monitoring

Cron jobs fail quietly unless you design monitoring around the schedule. Good cron monitoring checks whether the job should have run, whether it actually ran, whether it finished, and whether the result was healthy enough to trust.

What to monitor

Heartbeat received

The job started or finished and sent a ping within the expected schedule window.
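One way to emit these pings is to wrap the job in a small helper. This is a minimal sketch, not a definitive implementation: the URL `https://monitor.example.com/ping/nightly-backup` and the event names `start`, `success`, and `fail` are assumptions for illustration, so substitute whatever your monitoring service expects.

```python
import urllib.request
from typing import Callable

# Hypothetical endpoint; replace with the URL your monitoring service provides.
PING_BASE_URL = "https://monitor.example.com/ping/nightly-backup"


def http_ping(event: str, timeout: float = 10.0) -> None:
    """Send one heartbeat. A monitoring outage must not break the job itself."""
    try:
        urllib.request.urlopen(f"{PING_BASE_URL}/{event}", timeout=timeout).close()
    except OSError:
        # URLError and socket timeouts are OSError subclasses; ignore them here.
        pass


def run_with_heartbeats(
    job: Callable[[], None],
    ping: Callable[[str], None] = http_ping,
) -> None:
    """Ping 'start' before the job, then 'success' or 'fail' afterwards."""
    ping("start")
    try:
        job()
    except Exception:
        ping("fail")
        raise  # keep the original failure visible to cron and the logs
    ping("success")
```

Passing `ping` as a parameter keeps the wrapper testable without network access, for example `run_with_heartbeats(my_job, ping=print)`.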

Missed execution

No heartbeat arrived after the schedule plus a grace period, which usually means cron did not run or the host was unavailable.
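The missed-execution rule can be sketched as a deadline comparison. The function name `is_missed` and its parameters are hypothetical, and the sketch assumes the expected run time has already been computed from the cron expression elsewhere.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_missed(
    expected_run: datetime,
    grace: timedelta,
    last_heartbeat: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True when no heartbeat covers expected_run and the grace period is over."""
    deadline = expected_run + grace
    if now <= deadline:
        return False  # still inside the schedule window; too early to alert
    # A heartbeat older than the expected run belongs to a previous execution.
    return last_heartbeat is None or last_heartbeat < expected_run


if __name__ == "__main__":
    run = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
    print(is_missed(run, timedelta(minutes=30), None, run + timedelta(minutes=45)))
```

Using timezone-aware datetimes throughout avoids false alerts around daylight saving changes.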

Failed execution

The job ran but exited with a non-zero status, threw an exception, or wrote an explicit failure event.
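A wrapper that captures the exit status makes this kind of failure explicit instead of leaving it to cron. The sketch below uses only the standard library. `JobResult` and `run_job` are hypothetical names, and keeping the last 2000 characters of stderr is an arbitrary choice.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class JobResult:
    exit_status: int
    stderr_tail: str  # last part of stderr, useful context for an alert

    @property
    def failed(self) -> bool:
        return self.exit_status != 0


def run_job(command: List[str]) -> JobResult:
    """Run the command, capture its output, and report the exit status."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return JobResult(completed.returncode, completed.stderr[-2000:])


if __name__ == "__main__":
    # Stand-in for a real job: a child Python process that exits with status 3.
    result = run_job([sys.executable, "-c", "import sys; sys.exit(3)"])
    if result.failed:
        print(f"job failed with exit status {result.exit_status}")
```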

Slow execution

The job completed, but took longer than its normal runtime budget and may overlap future executions.
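A slow-run check can compare the measured duration against both the runtime budget and the schedule interval. The labels `ok`, `slow`, and `overlap` and the function name `classify_runtime` are assumptions for illustration.

```python
from datetime import timedelta


def classify_runtime(duration: timedelta, budget: timedelta, interval: timedelta) -> str:
    """Label a finished run by how its duration compares to budget and interval."""
    if duration >= interval:
        return "overlap"  # the next scheduled run was due before this one finished
    if duration > budget:
        return "slow"  # finished, but over the normal runtime budget
    return "ok"


if __name__ == "__main__":
    label = classify_runtime(
        duration=timedelta(minutes=12),
        budget=timedelta(minutes=10),
        interval=timedelta(minutes=15),
    )
    print(label)
```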

Production cron monitoring checklist

  • Define the exact cron expression and timezone for every production job.
  • Send a heartbeat when the job starts and another when it finishes.
  • Alert when a finish heartbeat is missing, not only when the process exits with a failure.
  • Keep logs with job name, schedule, host, start time, end time, and exit status.
  • Set a grace period that matches the job's normal runtime plus an operational buffer.
  • Document ownership, escalation paths, and manual recovery steps.
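The logging item in the checklist above could be met with one structured log line per run. The following is a sketch under assumptions: the field names and the function `build_log_record` are hypothetical, and JSON is only one reasonable format.

```python
import json
import socket
from datetime import datetime, timezone
from typing import Optional


def build_log_record(
    job_name: str,
    schedule: str,
    tz_name: str,
    started_at: datetime,
    finished_at: datetime,
    exit_status: int,
    host: Optional[str] = None,
) -> str:
    """Serialize one run as a single JSON line with the checklist fields."""
    record = {
        "job": job_name,
        "schedule": schedule,
        "timezone": tz_name,
        "host": host or socket.gethostname(),
        "start_time": started_at.isoformat(),
        "end_time": finished_at.isoformat(),
        "duration_seconds": (finished_at - started_at).total_seconds(),
        "exit_status": exit_status,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 2, 5, tzinfo=timezone.utc)
    print(build_log_record("nightly-backup", "0 2 * * *", "UTC", start, end, 0))
```

One line per run keeps the log easy to search by job name and easy to ship to a log aggregator.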

Recommended alert windows

Job type             | Suggested grace period             | Escalation
Frequent health sync | 1-2 missed intervals               | Notify Slack or email first
Nightly backup       | Runtime plus 30-60 minutes         | Page if recovery point objective is at risk
Billing or ETL job   | Runtime plus downstream SLA buffer | Page owner and create incident ticket
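The two grace-period styles in the table, counted in missed intervals or in runtime plus a buffer, can be expressed as small helpers. The function names and default values below are assumptions chosen from the ranges in the table, not fixed recommendations.

```python
from datetime import timedelta


def grace_from_intervals(interval: timedelta, missed_intervals: int = 2) -> timedelta:
    """Frequent jobs: tolerate a small number of missed intervals before alerting."""
    return interval * missed_intervals


def grace_from_runtime(
    typical_runtime: timedelta,
    buffer: timedelta = timedelta(minutes=30),
) -> timedelta:
    """Long jobs: allow the usual runtime plus a fixed operational buffer."""
    return typical_runtime + buffer


if __name__ == "__main__":
    # A sync every 5 minutes, and a backup that usually takes 40 minutes.
    print(grace_from_intervals(timedelta(minutes=5)))
    print(grace_from_runtime(timedelta(minutes=40), buffer=timedelta(minutes=45)))
```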

Platform notes

  • Linux cron needs explicit logging because output from failed jobs can disappear into local mail or syslog.
  • Kubernetes CronJobs need monitoring for missed starts, failed Pods, and concurrency policy side effects.
  • AWS EventBridge rules should be paired with target failure metrics, dead-letter queues, and CloudWatch alarms.
  • Quartz jobs should track trigger misfires, thread pool starvation, and application-level exceptions.
