Advanced scheduling: Execute periodically with cron jobs

If you want to run a job periodically you could repeatedly submit jobs, or qedit existing jobs after they run, but both of those options are a kludge. Instead, the condor_schedd provides support for cron-like jobs as a first-class citizen.

The cron-like feature builds on the ability to defer job execution. However, instead of using deferral_time, commands analogous to crontab(5) fields are available. cron_month, cron_day_of_month, cron_day_of_week, cron_hour, and cron_minute all behave as you would expect, and default to * when not provided.

To run a job every two minutes,

executable = /bin/date
log = cron.log
output = cron.out
error = cron.err

cron_minute = 0-59/2
on_exit_remove = false

queue

Note – on_exit_remove = false is required or the job will only be run once. It is arguable that on_exit_remove should default to false for jobs using cron_* commands.

After submitting and waiting 10 minutes, results can be found in the cron.log file.

$ grep ^00 cron.log
000 (009.000.000) 09/09 09:22:46 Job submitted from host: <127.0.0.1:56639>
001 (009.000.000) 09/09 09:24:00 Job executing on host: <127.0.0.1:45887>
006 (009.000.000) 09/09 09:24:00 Image size of job updated: 75
004 (009.000.000) 09/09 09:24:00 Job was evicted.
001 (009.000.000) 09/09 09:26:00 Job executing on host: <127.0.0.1:45887>
004 (009.000.000) 09/09 09:26:00 Job was evicted.
001 (009.000.000) 09/09 09:28:00 Job executing on host: <127.0.0.1:45887>
004 (009.000.000) 09/09 09:28:00 Job was evicted.
001 (009.000.000) 09/09 09:30:00 Job executing on host: <127.0.0.1:45887>
004 (009.000.000) 09/09 09:30:00 Job was evicted.
001 (009.000.000) 09/09 09:32:00 Job executing on host: <127.0.0.1:45887>
004 (009.000.000) 09/09 09:32:01 Job was evicted.

Note – the job appears to be evicted instead of terminated. What really happens is the job remains in the queue on termination. This is arguably a poor choice of wording in the log.

Just like for job deferral, there is no guarantee resources will be available at exactly the right time to run the job. cron_prep_time and cron_window provide a means to introduce tolerance.

Common question: What happens when a job takes longer than the time between defined starts, i.e. job takes 30 minutes to complete and is set to be run every 15 minutes?

Answer: The job will run serially. It will not stack up. The job does not need to serialize itself.

Note – a common complication, arguably a bug, which occurs only in pools with little or no new jobs being submitted, is that matchmaking must happen in time for the job dispatch. The Schedd does not publish a new Submitter Ad for the cron job’s owner when the job completes. This means that submitter ad the Negotiator sees may have zero idle jobs, resulting in no new match being handed out to dispatch the job on the next time it is set to execute.

Enjoy.

Advertisements

Tags: , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: