Archive for October, 2012

Pre and Post job scripts

October 29, 2012

Condor has a few ways to run programs associated with a job, beyond the job itself. If you’re an administrator, you can use the USER_JOB_WRAPPER. If you’re a user who is friends with your administrator, you can use Job Hooks. If you are ambitious, you can wrap all your jobs in a script that runs programs before and after your actual job.
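
For the administrator route, USER_JOB_WRAPPER names a program the condor_starter runs in place of the job; it receives the job's full command line as its arguments and is expected to eventually exec the job. A minimal sketch, with a hypothetical pre_program staged on the execute machine:

USER_JOB_WRAPPER = /usr/local/bin/job_wrapper

$ cat /usr/local/bin/job_wrapper
#!/bin/sh
# Run a hypothetical program before the job, then replace this process
# with the job itself ("$@" is the job's executable and arguments).
/usr/local/bin/pre_program
exec "$@"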

Or, you can use the PreCmd and PostCmd attributes on your job. They specify programs to run before and after your job executes. For example,

$ cat prepost.job
cmd = /bin/sleep
args = 1

log = prepost.log
output = prepost.out
error = prepost.err

+PreCmd = "pre_script"
+PostCmd = "post_script"

transfer_input_files = pre_script, post_script
should_transfer_files = always

queue
$ cat pre_script
#!/bin/sh
date > prepost.pre

$ cat post_script
#!/bin/sh
date > prepost.post
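
Condor executes these directly on the execute machine, so make sure they are executable before submitting (more on this in the gotchas below):

$ chmod +x pre_script post_script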

Running,

$ condor_submit prepost.job
Submitting job(s)
.
1 job(s) submitted to cluster 1.

...wait a few seconds, or 259...

$ cat prepost.pre
Sun Oct 14 18:06:00 UTC 2012

$ cat prepost.post
Sun Oct 14 18:06:02 UTC 2012

That’s about it, except for some gotchas.

  • transfer_input_files is manual and required – the scripts are not transferred automatically
  • The scripts are run from Iwd, so you can't use +PreCmd = "/bin/blah"; instead use +PreCmd = "blah" and transfer_input_files = /bin/blah
  • should_transfer_files = always, because the scripts are run from Iwd; if the job runs local to the Schedd, Iwd will be in the EXECUTE directory but the scripts won't be
  • Script stdout/err and exit code are ignored
  • You must use the quoted +Attr = "" string syntax; an unquoted +PreCmd = pre_script won't work
  • There is no way to pass arguments to the scripts
  • There is no starter environment, thus no $_CONDOR_JOB_AD/$_CONDOR_MACHINE_AD, but you can find .job_ad and .machine_ad in $_CONDOR_SCRATCH_DIR (see the sketch after this list)
  • Make sure the scripts are executable, otherwise the job will be put on hold with a reason similar to: Error from 127-0-0-1.NO_DNS: Failed to execute '…/dir_30626/pre_script': Permission denied
  • PostCmd is broken in Condor 7.6, but works in 7.8
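
To illustrate the scratch-directory point above: since the scripts run from the job's scratch directory, the ad files are in the current working directory. A sketch of a pre script that records the execute machine's name (Machine is a standard machine-ad attribute):

#!/bin/sh
# No starter environment here, but the script runs from the scratch
# directory, where the starter drops copies of the job and machine ads.
grep '^Machine ' .machine_ad > prepost.pre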

Tip: ISO8601 dates in your logs

October 22, 2012

Condor produces internal data in both structured and unstructured forms.

The structured forms are just that and designed to be processed by external programs. These are the event logs (UserLog or EVENT_LOG), the HISTORY file and PER_JOB_HISTORY_DIR and POOL_HISTORY_DIR, and the job_queue.log and Accountantnew.log transaction logs.

The unstructured forms are for debugging and designed to be read by a person, often an experienced person. They are often called trace, or debug, logs and are the files in the LOG directory, or the extra output seen when passing -debug to command-line tools, e.g. condor_q -debug.

Consuming and processing the unstructured forms with external programs is increasingly important. Consider tracing incidents through a deployment of 50,000 geographically distributed, physical and virtual systems. Or, even 100 local systems.

More and more tools for aggregating unstructured logs are emerging, and they all need to do some basic parsing of the logs. Help make their integration simpler by using a well-defined format for timestamps.

For instance, ISO 8601:

DEBUG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z "
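
One payoff: ISO 8601 timestamps compare lexically, so even naive tools can slice a log by time. A sketch, with a hypothetical log path and time range, and assuming all daemons log with the same UTC offset:

$ awk '$1 >= "2012-10-22T14:00" && $1 < "2012-10-22T15:00"' /var/log/condor/SchedLog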

Advanced scheduling: Execute periodically with cron jobs

October 15, 2012

If you want to run a job periodically you could repeatedly submit jobs, or qedit existing jobs after they run, but both of those options are kludges. Instead, the condor_schedd provides support for cron-like jobs as a first-class citizen.

The cron-like feature builds on the ability to defer job execution. However, instead of using deferral_time, commands analogous to crontab(5) fields are available. cron_month, cron_day_of_month, cron_day_of_week, cron_hour, and cron_minute all behave as you would expect, and default to * when not provided.

To run a job every two minutes,

executable = /bin/date
log = cron.log
output = cron.out
error = cron.err

cron_minute = 0-59/2
on_exit_remove = false

queue

Note – on_exit_remove = false is required or the job will only be run once. It is arguable that on_exit_remove should default to false for jobs using cron_* commands.

After submitting and waiting 10 minutes, results can be found in the cron.log file.

$ grep ^00 cron.log
000 (009.000.000) 09/09 09:22:46 Job submitted from host: <127.0.0.1:56639>
001 (009.000.000) 09/09 09:24:00 Job executing on host: <127.0.0.1:45887>
006 (009.000.000) 09/09 09:24:00 Image size of job updated: 75
004 (009.000.000) 09/09 09:24:00 Job was evicted.
001 (009.000.000) 09/09 09:26:00 Job executing on host: <127.0.0.1:45887>
004 (009.000.000) 09/09 09:26:00 Job was evicted.
001 (009.000.000) 09/09 09:28:00 Job executing on host: <127.0.0.1:45887>
004 (009.000.000) 09/09 09:28:00 Job was evicted.
001 (009.000.000) 09/09 09:30:00 Job executing on host: <127.0.0.1:45887>
004 (009.000.000) 09/09 09:30:00 Job was evicted.
001 (009.000.000) 09/09 09:32:00 Job executing on host: <127.0.0.1:45887>
004 (009.000.000) 09/09 09:32:01 Job was evicted.

Note – the job appears to be evicted instead of terminated. What really happens is the job remains in the queue on termination. This is arguably a poor choice of wording in the log.
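
Since the job never leaves the queue on its own, remove it yourself when the periodic run is no longer wanted, e.g. for the job above,

$ condor_rm 9.0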

Just like for job deferral, there is no guarantee resources will be available at exactly the right time to run the job. cron_prep_time and cron_window provide a means to introduce tolerance.
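
For example, with arbitrary values,

# Allow matching up to 90 seconds before the scheduled start, and allow
# the job to start as much as 120 seconds late if no resource was free.
cron_prep_time = 90
cron_window = 120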

Common question: What happens when a job takes longer than the time between defined starts, e.g. the job takes 30 minutes to complete and is set to run every 15 minutes?

Answer: The job will run serially. It will not stack up. The job does not need to serialize itself.

Note – a common complication, arguably a bug, which occurs only in pools with few or no new jobs being submitted, is that matchmaking must happen in time for the job dispatch. The Schedd does not publish a new Submitter Ad for the cron job’s owner when the job completes. This means the submitter ad the Negotiator sees may show zero idle jobs, resulting in no new match being handed out to dispatch the job the next time it is set to execute.

Enjoy.

Tip: notification = never

October 8, 2012

By default, the condor_schedd will notify you, via email, when your job completes. This is a handy feature when running a few jobs, but can become overwhelming if you are running many jobs.

It can even turn into a problem if you are being notified at a mailbox you do not monitor.

# df /var
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/...             233747128 215868920   5813032  98% /

# du -a /var | sort -n -r | head -n 4
150436072       /var
111752396       /var/spool
111706452       /var/spool/mail
108702404       /var/spool/mail/matt

Yes, that’s ~105GB of job completion notification emails. All ignored. Oops.

The email notification feature is controlled on a per-job basis by the notification command in a job’s submit file. See man condor_submit. To turn off email notification, set it to NEVER, e.g.

$ echo queue | condor_submit -a cmd=/bin/hostname -a notification=never

If you are a pool administrator and want to change the default from COMPLETE to NEVER use the SUBMIT_EXPRS configuration parameter.

Notification = NEVER
SUBMIT_EXPRS = $(SUBMIT_EXPRS) Notification

Users will still be able to override the configured default by putting notification = complete|always|error in their submit files.
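
For example, a user who wants mail only when something goes wrong would put this in a submit file:

notification = error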

Keep those disks clean.

Partitionable slot utilization

October 1, 2012

There are already ways to get pool utilization information on a macro level. Until Condor 7.8 and the introduction of TotalSlot{Cpus,Memory,Disk}, there were no good ways to get utilization on a micro level. At least not with only the standard command-line tools.

Resource data is available per slot. Getting macro (pool) utilization always requires aggregating data across multiple slots. Getting micro (slot) utilization should not.

In a pool using partitionable slots, you can now get per slot utilization from the slot itself. There is no need for any extra tooling to perform aggregation or correlation. This means condor_status can directly provide utilization information on the micro, per slot level.

$ echo "           Name  Cpus Avail Util%  Memory Avail Util%"
$ condor_status -constraint "PartitionableSlot =?= TRUE" \
    -format "%15s" Name \
    -format "%6d" TotalSlotCpus \
    -format "%6d" Cpus \
    -format "%5d%%" "((TotalSlotCpus - Cpus) / (TotalSlotCpus * 1.0)) * 100" \
    -format "%8d" TotalSlotMemory \
    -format "%6d" Memory \
    -format "%5d%%" "((TotalSlotMemory - Memory) / (TotalSlotMemory * 1.0)) * 100" \
    -format "\n" TRUE

           Name  Cpus Avail Util%  Memory Avail Util%
  slot1@eeyore0    16    12   25%   65536 48128   26%
  slot2@eeyore0    16    14   12%   65536 58368   10%
  slot1@eeyore1    16    12   25%   65536 40960   37%
  slot2@eeyore1    16    15    6%   65536 62464    4%

This is especially useful when machines are configured into combinations of multiple partitionable slots or partitionable and static slots.
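
As a sketch of carving up such a machine, using the standard SLOT_TYPE configuration knobs with hypothetical sizes, one partitionable slot alongside four static slots:

# One 8-core partitionable slot plus four 2-core static slots.
SLOT_TYPE_1 = cpus=8, memory=32768
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_2 = cpus=2, memory=8192
NUM_SLOTS_TYPE_2 = 4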

Someday the pool utilization script should be integrated with condor_status.

