Archive for the ‘Condor’ Category

Tip: notification = never

October 8, 2012

By default, the condor_schedd will notify you, via email, when your job completes. This is a handy feature when running a few jobs, but can become overwhelming if you are running many jobs.

It can even turn into a problem if you are being notified at a mailbox you do not monitor.

# df /var
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/...             233747128 215868920   5813032  98% /

# du -a /var | sort -n -r | head -n 4
150436072       /var
111752396       /var/spool
111706452       /var/spool/mail
108702404       /var/spool/mail/matt

Yes, that’s ~105GB of job completion notification emails. All ignored. Oops.
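If the spool has already filled, truncating the mailbox frees the space immediately. A minimal cleanup, assuming none of the mail is wanted:

# > /var/spool/mail/matt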

The email notification feature is controlled on a per-job basis by the notification command in a job’s submit file. See man condor_submit. To turn off email notification, set it to NEVER, e.g.

$ echo queue | condor_submit -a cmd=/bin/hostname -a notification=never

If you are a pool administrator and want to change the default from COMPLETE to NEVER, use the SUBMIT_EXPRS configuration parameter.

Notification = NEVER
SUBMIT_EXPRS = $(SUBMIT_EXPRS) Notification

Users will still be able to override the configured default by putting notification = complete|always|error in their submit files.
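For example, a user who wants mail only when a job exits with an error could submit:

executable = /bin/hostname
notification = error
queue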

Keep those disks clean.


Partitionable slot utilization

October 1, 2012

There are already ways to get pool utilization information on a macro level. Before Condor 7.8 introduced TotalSlot{Cpus,Memory,Disk}, there was no good way to get utilization on a micro level. At least not with only the standard command-line tools.

Resource data is available per slot. Getting macro, pool-level utilization always requires aggregating data across multiple slots. Getting micro, slot-level utilization should not.

In a pool using partitionable slots, you can now get per slot utilization from the slot itself. There is no need for any extra tooling to perform aggregation or correlation. This means condor_status can directly provide utilization information on the micro, per slot level.

$ echo "           Name  Cpus Avail Util%  Memory Avail Util%"
$ condor_status -constraint "PartitionableSlot =?= TRUE" -format "%15s" Name -format "%6d" TotalSlotCpus -format "%6d" Cpus -format "%5d%%" "((TotalSlotCpus - Cpus) / (TotalSlotCpus * 1.0)) * 100" -format "%8d" TotalSlotMemory -format "%6d" Memory -format "%5d%%" "((TotalSlotMemory - Memory) / (TotalSlotMemory * 1.0)) * 100" -format "\n" TRUE 

           Name  Cpus Avail Util%  Memory Avail Util%
  slot1@eeyore0    16    12   25%   65536 48128   26%
  slot2@eeyore0    16    14   12%   65536 58368   10%
  slot1@eeyore1    16    12   25%   65536 40960   37%
  slot2@eeyore1    16    15    6%   65536 62464    4%

This is especially useful when machines are configured into combinations of multiple partitionable slots or partitionable and static slots.
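For reference, a minimal sketch of such a configuration, splitting a 16 core machine into one partitionable slot plus static slots; the sizes here are illustrative:

# half the machine as a single partitionable slot
SLOT_TYPE_1 = cpus=8,memory=50%,disk=50%
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1

# the other half as 8 static, single core slots
SLOT_TYPE_2 = cpus=1,memory=auto,disk=auto
SLOT_TYPE_2_PARTITIONABLE = FALSE
NUM_SLOTS_TYPE_2 = 8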

Someday the pool utilization script should be integrated with condor_status.

Advanced scheduling: Execute in the future with job deferral

September 24, 2012

One advanced scheduling feature of Condor is the ability to set a time, in the future, when a job should be run. This is called a deferral time.

Using the deferral_time command, you simply specify a time, in seconds since the Unix epoch, when your job should run:

executable = /bin/date
log = deferral.log
output = deferral.out
error = deferral.err

deferral_time = 1357016400

queue

Use date(1) to generate the deferral_time.

$ date -d @1357016400
Tue Jan  1 00:00:00 EST 2013
$ date +%s -d "2013-01-01 00:00:00"
1357016400
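GNU date can also compute a time relative to now, which is handy for quick tests:

$ date +%s -d "now + 2 hours"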

After submitting the job and waiting until 1 Jan 2013, you can see the result by looking in deferral.log and deferral.out.

$ grep ^00 deferral.log
000 (001.000.000) 08/15 22:33:00 Job submitted from host: <127.0.0.1:56006>
001 (001.000.000) 01/01 00:00:00 Job executing on host: <127.0.0.2:57590>
006 (001.000.000) 01/01 00:00:00 Image size of job updated: 75
005 (001.000.000) 01/01 00:00:00 Job terminated.

$ cat deferral.out
Tue Jan  1 00:00:00 EST 2013

Of course there is no guarantee that a resource will be available at a precise time in the future. A job that does not run at its deferral_time will be put on Hold for manual intervention.

To reduce the likelihood of missing the deferral_time and needing manual intervention, the deferral_prep_time and deferral_window commands are available. Respectively, they specify the amount of time before the deferral_time that the job can be matched with a resource and how long after the deferral_time execution is acceptable.

executable = /bin/date
log = deferral.log
output = deferral.out
error = deferral.err

deferral_time = 1357016400

# 1 day = 24 hour * 60 min * 60 sec = 86,400 seconds
# 1/2 day = 86,400 sec / 2 = 43,200 seconds
deferral_prep_time = 86400
deferral_window = 43200

queue

In the example above, the job may be matched to a resource, where it will keep the resource Claimed/Busy for up to a day (deferral_prep_time) in advance of its actual run. This will make it more likely that the job will run at precisely the deferral_time. It also means that for accounting purposes, you will be charged for using the resource, though the job has not yet run.

Additionally, if the job is not matched or otherwise does not start at precisely deferral_time, it has half a day (deferral_window) to run before it is put on hold for manual intervention.
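If a job does miss its window, you can see why it is held, push the deferral time forward, and release it. A sketch, using a hypothetical job id 1.0:

$ condor_q -hold 1.0
$ condor_qedit 1.0 DeferralTime $(date +%s -d "now + 1 hour")
$ condor_release 1.0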

That’s it.

MATLAB jobs and Condor

September 17, 2012

The question of how to run MATLAB jobs with Condor comes up from time to time and there is no central location for the knowledge.

If you want to use MATLAB with Condor,

Read http://sysadminhelp.cms.caltech.edu/AOK/HTC/; it provides an informative overview and a detailed example, including how to use mcc to create a standalone executable (see the sketch after this list).

Or read http://www.liv.ac.uk/csd/escience/condor/matlab/

Or read http://www.lehigh.edu/computing/hpc/running/condor_matlab.html

Possibly read https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToRunMatlab

Likely just enter “MATLAB condor” into your favorite search engine and you will find useful results.
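As a rough sketch of the mcc approach, and only a sketch: assuming mcc -m myprog.m produced the binary myprog and the wrapper run_myprog.sh, and that the MATLAB Compiler Runtime is installed at /opt/mcr on the execute nodes (all of these names are hypothetical), a submit file might look like:

# run the mcc wrapper; its first argument is the MCR install root
executable = run_myprog.sh
arguments = /opt/mcr
transfer_input_files = myprog
should_transfer_files = yes
when_to_transfer_output = on_exit
log = myprog.log
output = myprog.out
error = myprog.err
queue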

The Owner state

September 11, 2012

What is the Owner state? Why are my slots in the Owner state? Why do jobs not start immediately after I restart Condor? How do I keep slots from going into the Owner state?

$ condor_status
Name               OpSys      Arch   *State*  Activity LoadAv Mem   ActvtyTime
eeyore.local       LINUX      X86_64 *Owner*  Idle     0.480  7783  0+00:00:04
                     Machines Owner Claimed Unclaimed Matched Preempting
        X86_64/LINUX        1     1       0         0       0          0
               Total        1     1       0         0       0          0

All these questions require understanding a bit of Condor policy. Specifically, the START policy on execute nodes.

In Condor, the decision to run a job on a resource, a.k.a. a slot, is made by both the job and the resource. The job specifies the requirements it has for a resource, e.g. Memory > 1024. And, the resource specifies the requirements it has for jobs, e.g. MemoryUsage < Memory. If both do not agree, the job will not be run on the resource.

First, the Owner state is an indication that the resource is not available to run jobs. The resource is in use by its Owner. Historically, this has been someone at the resource’s console. Generally, think of Owner as meaning Unavailable.

Second, the resource specifies its requirements using the START configuration parameter.

$ condor_config_val -v START
START: ( (KeyboardIdle > 15 * 60) && ( ((LoadAvg - CondorLoadAvg) <= 0.3) || (State != "Unclaimed" && State != "Owner")) )
  Defined in '/etc/condor/condor_config', line 754.

When the START policy evaluates to False, i.e. starting jobs is not allowed, the resource will appear in the Owner state.

Note that the default START policy does not care about aspects of the jobs that might run. It cares entirely about aspects of the resource itself. It is designed to protect a user at the console (KeyboardIdle) and other processes already running on the machine (LoadAvg - CondorLoadAvg).

Third, jobs may not start immediately on restart because the KeyboardIdle timer has been reset to 0. This means waiting 15 minutes for the START policy, specifically KeyboardIdle > 15 * 60, to evaluate to True and the resource to become available.
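You can watch the timer climb with condor_status:

$ condor_status -format "%s " Name -format "%s " State -format "%d\n" KeyboardIdle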

Finally, on dedicated resources, which are very common, the KeyboardIdle component of the START policy can be removed. In fact, it is quite common to simply set START = TRUE.
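For instance, the standard always-run policy for dedicated nodes:

START = TRUE
SUSPEND = FALSE
PREEMPT = FALSE
KILL = FALSE

A condor_reconfig on the node puts the change into effect.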

Maintaining tight accounting group quota usage with preemption

September 4, 2012

Many metrics in a distributed system should be viewed over time. However, the desire is often for an instantaneous view. Take, for example, usage data.

Usage data in Condor is commonly viewed through accounting groups. Accounting groups can be given quotas that specify what percent of a pool each can use.

Instead of viewing accounting group usage summed over a week or a day or even hours, it is common to want any snapshot of usage to match configured quotas. At the same time, it is common to want accounting groups to exceed their quota if resources are available. These two desires are at odds.

Erik Erlandson has written about how to use preemption to maintain tight accounting group quota usage.

Instantaneous views should not be relied on, but with preemption it is possible to reduce the window you sum over.
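The gist of the recipe, sketched here with hypothetical group names and a 50/50 split (see Erik’s post for the full treatment):

GROUP_NAMES = group_a, group_b
GROUP_QUOTA_DYNAMIC_group_a = 0.5
GROUP_QUOTA_DYNAMIC_group_b = 0.5

# let groups exceed quota when resources are free...
GROUP_ACCEPT_SURPLUS = TRUE

# ...but make over-quota usage preemptable by under-quota submitters
PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) && (RemoteGroupResourcesInUse >= RemoteGroupQuota)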

Now, if preemption is not an option, the desire to see usage snapshots matching configured quotas must be abandoned.

Pool utilization and schedd statistic graphs

June 22, 2012

Assuming you are gathering pool utilization and schedd statistics, you might be able to see something like this,

Queue depth and job rates

This graph is for a single schedd and may show the impact of queue depth, a.k.a. the number of jobs waiting in the queue, on job submission, start, and completion rates. The submission rate is on top. The start and completion rates overlap, which is good. I say may show because other factors have not been ruled out, such as other processes on the system that started to run it out of memory. Note that the base rate is a function of job duration and the number of available slots. Despite hundreds of slots, the max rate is quite low because the jobs were only minutes long.

Over this nine day period, as the queue grew to 1.8 million jobs, the utilization remained above 95%,

Pool utilization

Wallaby: Skeleton Group

June 19, 2012

Read about Wallaby’s Skeleton Group feature. Much like /etc/skel for accounts on a single system, it provides base configuration to nodes as they join a pool. It is especially useful for pools with dynamic and opportunistic resources.

Schedd stats with OpenTSDB

June 14, 2012

Building on Pool utilization with OpenTSDB, the condor_schedd also advertises a plethora of useful statistics that can be harvested with condor_status.

Make the metrics,

$ tsdb mkmetric condor.schedd.jobs.idle condor.schedd.jobs.running condor.schedd.jobs.held condor.schedd.jobs.mean_runtime condor.schedd.jobs.mean_waittime condor.schedd.jobs.historical.mean_runtime condor.schedd.jobs.historical.mean_waittime condor.schedd.jobs.submission_rate condor.schedd.jobs.start_rate condor.schedd.jobs.completion_rate
...

Obtain schedd_stats_for_opentsdb.sh and run,

$ while true; do ./schedd_stats_for_opentsdb.sh; sleep 15; done | nc -w 30 tsdb-host 4242 &
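If you would rather roll your own, here is a minimal sketch of such a script, emitting OpenTSDB put lines from a few schedd ad attributes (the host tag value is illustrative):

#!/bin/sh
# one OpenTSDB "put" line per metric, read from the schedd ad
NOW=$(date +%s)
condor_status -schedd \
  -format "put condor.schedd.jobs.idle $NOW %d host=eeyore0\n" TotalIdleJobs \
  -format "put condor.schedd.jobs.running $NOW %d host=eeyore0\n" TotalRunningJobs \
  -format "put condor.schedd.jobs.held $NOW %d host=eeyore0\n" TotalHeldJobs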

View the results at,

http://tsdb-host:4242/#start=1h-ago&m=sum:condor.schedd.jobs.idle&o=&m=sum:condor.schedd.jobs.running&o=&m=sum:condor.schedd.jobs.held&o=&m=sum:10m-avg:condor.schedd.jobs.submission_rate&o=axis%20x1y2&m=sum:10m-avg:condor.schedd.jobs.start_rate&o=axis%20x1y2&m=sum:10m-avg:condor.schedd.jobs.completion_rate&o=axis%20x1y2&ylabel=jobs&y2label=rates&yrange=%5B0:%5D&y2range=%5B0:%5D&key=out%20center%20top%20horiz%20box

Pool utilization with OpenTSDB

June 12, 2012

Merging the pool utilization script with OpenTSDB.

Once you have followed the stellar OpenTSDB Getting Started guide, make the metrics with,

$ tsdb mkmetric condor.pool.slots.unavail condor.pool.slots.avail condor.pool.slots.total condor.pool.slots.used condor.pool.slots.used_of_avail condor.pool.slots.used_of_total condor.pool.cpus.unavail condor.pool.cpus.avail condor.pool.cpus.total condor.pool.cpus.used condor.pool.cpus.used_of_avail condor.pool.cpus.used_of_total condor.pool.memory.unavail condor.pool.memory.avail condor.pool.memory.total condor.pool.memory.used condor.pool.memory.used_of_avail condor.pool.memory.used_of_total
metrics condor.pool.slots.unavail: [0, 0, 1]
metrics condor.pool.slots.avail: [0, 0, 2]
metrics condor.pool.slots.total: [0, 0, 3]
...
metrics condor.pool.memory.used_of_total: [0, 0, 17]

Obtain utilization_for_opentsdb.sh before running,

$ while true; do ./utilization_for_opentsdb.sh; sleep 15; done | nc -w 30 tsdb-host 4242 &

View the results at,

http://tsdb-host:4242/#start=1h-ago&m=sum:condor.pool.cpus.total&o=&m=sum:condor.pool.cpus.used&o=&m=sum:condor.pool.cpus.used_of_avail&o=axis%20x1y2&ylabel=cpus&y2label=%2525+utilization&yrange=%5B0:%5D&y2range=%5B0:1%5D&key=out%20center%20top%20horiz%20box

The number of statistics about operating Condor pools has been growing over the years. All are easily retrieved via condor_status for feeding into OpenTSDB.

