Posts Tagged ‘ClassAds’

Configuration and policy evaluation

December 10, 2012

Figuring out how evaluation happens in configuration and policy is a common problem. The confusion is justified.

Configuration provides substitution with $() syntax, while policy is full ClassAd language evaluation without $() syntax.

Configuration is all the parameters listed in files discoverable with condor_config_val -config.

$ condor_config_val -config
Configuration source:
	/etc/condor/condor_config
Local configuration sources:
	/etc/condor/config.d/00personal_condor.config

Policy is the ClassAd expression found on the right-hand side of specific configuration parameters. For instance,

$ condor_config_val -v START
START: ( (KeyboardIdle > 15 * 60) && ( ((LoadAvg - CondorLoadAvg) <= 0.3) || (State != "Unclaimed" && State != "Owner")) )
  Defined in '/etc/condor/condor_config', line 753.

Configuration evaluation allows for substitution of configuration parameters with $().

$ cat /etc/condor/condor_config | head -n753 | tail -n1
START			= $(UWCS_START)

$ condor_config_val -v UWCS_START
UWCS_START: ( (KeyboardIdle > 15 * 60) && ( ((LoadAvg - CondorLoadAvg) <= 0.3) || (State != "Unclaimed" && State != "Owner")) )
  Defined in '/etc/condor/condor_config', line 808.

$ cat /etc/condor/condor_config | head -n808 | tail -n3
UWCS_START	= ( (KeyboardIdle > $(StartIdleTime)) \
                    && ( $(CPUIdle) || \
                         (State != "Unclaimed" && State != "Owner")) )

Here START is actually the value of UWCS_START, provided by $(UWCS_START).

The substitution is recursive. Explore /etc/condor/condor_config and the JustCPU parameter. It is actually a parameter that is never read by daemons or tools. It is only useful in other configuration parameters. It’s shorthand.
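To make the recursion concrete, here is a minimal sketch of repeated $() substitution in shell and awk. It is not how Condor implements configuration: the function name expand_param is made up, and the single-line definitions are illustrative, chosen so that START expands to the value reported above. Unlike Condor, the sketch does not handle line continuations or self-references.

```shell
#!/bin/sh
# Illustrative only: a tiny model of $() substitution, not Condor's parser.

# expand_param FILE NAME: print NAME's value with every $(REF) replaced,
# repeating until no references remain (capped at 100 substitutions).
expand_param() {
    awk -v want="$2" '
        /=/ {
            # Split "NAME = value" at the first "=".
            name = $0;  sub(/[ \t]*=.*/, "", name)
            value = $0; sub(/^[^=]*=[ \t]*/, "", value)
            conf[name] = value
        }
        END {
            out = conf[want]
            while (passes++ < 100 && match(out, /\$\([A-Za-z_][A-Za-z0-9_]*\)/)) {
                ref = substr(out, RSTART + 2, RLENGTH - 3)
                out = substr(out, 1, RSTART - 1) conf[ref] substr(out, RSTART + RLENGTH)
            }
            print out
        }
    ' "$1"
}

# Hypothetical single-line definitions written to a temporary file.
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
MINUTE = 60
StartIdleTime = 15 * $(MINUTE)
BackgroundLoad = 0.3
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
CPUIdle = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
UWCS_START = ( (KeyboardIdle > $(StartIdleTime)) && ( $(CPUIdle) || (State != "Unclaimed" && State != "Owner")) )
START = $(UWCS_START)
EOF

expand_param "$conf_file" START
rm -f "$conf_file"
```

With these definitions the script prints the same fully substituted expression that condor_config_val -v START showed, even though START itself only says $(UWCS_START).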

Policy evaluation is full ClassAd expression evaluation. The evaluation happens at the appropriate times while daemons or tools are running.

Taking START as an example, the words KeyboardIdle, LoadAvg, CondorLoadAvg, and State are attributes found on machine ads. The expression is evaluated by the condor_startd and condor_negotiator to figure out if a job is allowed to start on a resource.

$ condor_status -l slot1@eeyore.local | grep -e ^KeyboardIdle -e ^LoadAvg -e ^CondorLoadAvg -e ^State
KeyboardIdle = 0
LoadAvg = 0.290000
CondorLoadAvg = 0.0
State = "Owner"

Evaluation happens by recursively evaluating those attributes. The expression ((KeyboardIdle > 15 * 60) && (((LoadAvg - CondorLoadAvg) <= 0.3) || (State != "Unclaimed" && State != "Owner"))) becomes ((0 > 15 * 60) && (((0.29 - 0.0) <= 0.3) || ("Owner" != "Unclaimed" && "Owner" != "Owner"))). And so forth.
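The evaluation above can be approximated with awk, whose comparison and boolean operators read the same way for this particular expression. This is only a sketch: awk is not a ClassAd evaluator (it has no UNDEFINED or ERROR values), and the function name eval_start is made up for illustration.

```shell
#!/bin/sh
# Illustrative only: evaluates the START expression with awk operators,
# which merely resemble ClassAd semantics for fully defined attributes.

# eval_start KEYBOARD_IDLE LOAD_AVG CONDOR_LOAD_AVG STATE
eval_start() {
    awk -v KeyboardIdle="$1" -v LoadAvg="$2" -v CondorLoadAvg="$3" -v State="$4" '
        BEGIN {
            start = (KeyboardIdle > 15 * 60) &&
                    (((LoadAvg - CondorLoadAvg) <= 0.3) ||
                     (State != "Unclaimed" && State != "Owner"))
            print (start ? "TRUE" : "FALSE")
        }'
}

# Values from the condor_status output above.
eval_start 0 0.290000 0.0 Owner
# The same machine after 20 idle minutes.
eval_start 1200 0.290000 0.0 Owner
```

The first call prints FALSE, because 0 > 15 * 60 already fails; the second prints TRUE, because the keyboard has been idle long enough and the non-Condor load is under 0.3.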

That’s it.

Custom resource attributes: Facter

November 29, 2011

Condor provides a large set of attributes, facts, about resources for scheduling and querying, but it does not provide everything possible. Instead, there is a mechanism to extend the set. Previously, we added FreeMemoryMB. The set can also be extended with information from Facter.

Facter provides an extensible set of facts about a system. To include facter facts we need a means to translate them into attributes and add them to the Startd configuration.

$ facter
...
architecture => x86_64
domain => local
facterversion => 1.5.9
hardwareisa => x86_64
hardwaremodel => x86_64
physicalprocessorcount => 1
processor0 => Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz
selinux => true
selinux_config_mode => enforcing
swapfree => 3.98 GB
swapsize => 4.00 GB
...

The facts are of the form name => value, not very far off from ClassAd attributes. A simple script to convert all the facts into attributes with string values is,

/usr/libexec/condor/facter.sh

#!/bin/sh
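# Exit if facter is not installed (&> is a bashism, so redirect portably).
type facter > /dev/null 2>&1 || exit 1
# Turn each "name => value" fact into a facter_name = "value" attribute.
facter | sed 's/\([^ ]*\) => \(.*\)/facter_\1 = "\2"/'

$ facter.sh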
...
facter_architecture = "x86_64"
facter_domain = "local"
facter_facterversion = "1.5.9"
facter_hardwareisa = "x86_64"
facter_hardwaremodel = "x86_64"
facter_physicalprocessorcount = "1"
facter_processor0 = "Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz"
facter_selinux = "true"
facter_selinux_config_mode = "enforcing"
facter_swapfree = "3.98 GB"
facter_swapsize = "4.00 GB"
...
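Facts whose values contain a double quote, or facter output lines that are not of the form name => value, would likely come out as lines that are not valid attributes. A slightly more defensive variant of the conversion is sketched below. The function name facts_to_attrs is made up for illustration, and escaping embedded quotes as \" is an assumption about the ClassAd string syntax in use.

```shell
#!/bin/sh
# Illustrative variant of facter.sh; facts_to_attrs is a hypothetical name.

# Read "name => value" lines on stdin and print facter_name = "value".
# Lines that do not match are dropped; double quotes in values are escaped.
facts_to_attrs() {
    sed -n -e 's/"/\\"/g' \
           -e 's/^\([^ ]*\) => \(.*\)$/facter_\1 = "\2"/p'
}

# Reading from stdin lets the conversion be tried without facter installed.
printf '%s\n' 'architecture => x86_64' 'swapfree => 3.98 GB' | facts_to_attrs
```

In the real script the function would be fed by facter itself, i.e. facter | facts_to_attrs.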

And the configuration, simply dropped into /etc/condor/config.d,

/etc/condor/config.d/49facter.config

FACTER = /usr/libexec/condor/facter.sh
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) FACTER
STARTD_CRON_FACTER_EXECUTABLE = $(FACTER)
STARTD_CRON_FACTER_PERIOD = 300

A condor_reconfig and the facter facts will be available,

$ condor_status -long | grep ^facter
...
facter_architecture = "x86_64"
facter_facterversion = "1.5.9"
facter_domain = "local"
facter_swapfree = "3.98 GB"
facter_selinux = "true"
facter_hardwaremodel = "x86_64"
facter_selinux_config_mode = "enforcing"
facter_processor0 = "Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz"
facter_selinux_mode = "targeted"
facter_hardwareisa = "x86_64"
facter_swapsize = "4.00 GB"
facter_physicalprocessorcount = "1"
...

For scheduling, just use the facter information in job requirements, e.g. requirements = facter_selinux == "true".

Or, query your pool to see what resources are not running selinux,

$ condor_status -const 'facter_selinux == "false"'
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
eeyore.local       LINUX      X86_64 Unclaimed Idle     0.030  3760  0+00:12:31
                     Machines Owner Claimed Unclaimed Matched Preempting
        X86_64/LINUX        1     0       0         1       0          0
               Total        1     0       0         1       0          0

Oops.

Cap job runtime: Debugging periodic job policy in a Condor pool

December 5, 2009

Job policy includes a set of periodic expressions: PeriodicHold, PeriodicRelease, and PeriodicRemove. Periodic expressions on a job are evaluated in the context of the job’s ad. They should evaluate to a boolean; for example, if PeriodicRemove is TRUE, the job is removed. They are evaluated by the Schedd and a Shadow. The Schedd evaluates them as frequently as PERIODIC_EXPR_INTERVAL, which defaults to 60 seconds in Condor 7.4 and 300 seconds in earlier versions. The Shadow evaluates them periodically, based on PERIODIC_EXPR_INTERVAL, and every time it receives an update from the Starter running the job. Those updates occur periodically, controlled by STARTER_UPDATE_INTERVAL on the Starter.

Say you want to put a job on hold if it runs for more than 60 seconds. To do this you need job policy and two points of reference: the start time of the job, and the current time.

The job policy you want to use is PeriodicHold.

For the two time references you need to look at a job’s ad to see what is available. You can see a job’s ad with condor_q -long. The start time is available as JobCurrentStartDate, seconds since Epoch, and it would appear that ServerTime is the current time, seconds since Epoch.

Here’s a job you might write:

executable = /bin/sleep
arguments =  29m
notification = never
periodic_hold = (ServerTime - JobCurrentStartDate) >= 60
queue 1

If you submit it and wait a minute you will see it is on Hold.

$ condor_q

-- Submitter: robin.local :  : robin.local
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   1.0   matt           11/5  19:03   0+00:00:19 H  0   2.0  sleep 29m         

1 jobs; 0 idle, 0 running, 1 held

But, it only ran for 19 seconds. Check why.

$ condor_q -hold

-- Submitter: robin.local :  : robin.local
 ID      OWNER           HELD_SINCE HOLD_REASON                   
   1.0   matt           11/5  19:04 The UNKNOWN (never set) PeriodicHold expres

1 jobs; 0 idle, 0 running, 1 held

The hold reason is truncated, but you can use condor_q -long to see all of it.

$ condor_q -long | grep ^HoldReason
HoldReason = "The UNKNOWN (never set) PeriodicHold expression '' evaluated to UNDEFINED"
HoldReasonCode = 5
HoldReasonSubCode = 0

This is saying that something went wrong and your PeriodicHold expression evaluated to UNDEFINED. You wanted it to evaluate to a boolean. The way expression evaluation in ClassAds happens, if part of an expression evaluates to UNDEFINED then chances are the entire expression will. For instance, the expression A + 1 will evaluate to the value of the A attribute plus 1. When A is a number like 2, the expression evaluates to 2 + 1 => 3. When A is not defined, the expression evaluates to UNDEFINED + 1 => UNDEFINED.
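The propagation can be mimicked in a few lines of shell. This is a sketch of the behaviour described above for addition, not Condor's evaluator, and the function name classad_add is hypothetical.

```shell
#!/bin/sh
# Illustrative only: models how UNDEFINED propagates through addition.

# classad_add LEFT RIGHT: print the sum, or UNDEFINED if either side is.
classad_add() {
    if [ "$1" = UNDEFINED ] || [ "$2" = UNDEFINED ]; then
        echo UNDEFINED
    else
        echo $(( $1 + $2 ))
    fi
}

A=2
classad_add "$A" 1          # A is defined
classad_add UNDEFINED 1     # A is not defined
```

The first call prints 3 and the second prints UNDEFINED. The "chances are" matters because boolean operators can still produce a definite value: in ClassAds, FALSE && UNDEFINED is FALSE.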

There is a handy way to debug expressions in Condor: use the debug() ClassAd function.

Your job becomes:

executable = /bin/sleep
arguments =  29m
notification = never
periodic_hold = debug((ServerTime - JobCurrentStartDate) >= 60)
queue 1

When you submit the job and look in either the SchedLog or ShadowLog you will see “Classad debug” messages.

ShadowLog:

...: Classad debug: ServerTime --> UNDEFINED
...: Classad debug: JobCurrentStartDate --> ERROR
...: Classad debug: JobCurrentStartDate --> 1257476798
...: Classad debug: debug((ServerTime - JobCurrentStartDate) >= 60) --> UNDEFINED

Very clearly, ServerTime is evaluating to UNDEFINED. That may seem strange because it looked like it was part of the job ad. However, ServerTime is somewhat special. Likely buggy. It is only added to a job ad in response to a query, e.g. condor_q. It is not actually part of the job ad. Annoying.

There is a solution. Another special attribute is CurrentTime. It is available to all expressions, but it is not a visible attribute on a job ad. Also annoying. You have to know it is there. Using it we can rewrite the job as follows.

executable = /bin/sleep
arguments =  29m
notification = never
periodic_hold = debug((CurrentTime - JobCurrentStartDate) >= 60)
queue 1

After submitting the job and waiting a minute you can see that it has been put on hold.

$ condor_q

-- Submitter: robin.local :  : robin.local
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   3.0   matt           11/5  19:09   0+00:01:00 H  0   0.0  sleep 29m         

1 jobs; 0 idle, 0 running, 1 held

The details of the hold are also much more what you would expect.

$ condor_q -long | grep ^HoldReason
HoldReason = "The job attribute PeriodicHold expression 'debug((CurrentTime - JobCurrentStartDate) >= 60)' evaluated to TRUE"
HoldReasonCode = 3
HoldReasonSubCode = 0

In the ShadowLog you can see that CurrentTime is available as expected.

ShadowLog:

...: Classad debug: CurrentTime --> 1257477062
...: Classad debug: JobCurrentStartDate --> ERROR
...: Classad debug: JobCurrentStartDate --> 1257477002
...: Classad debug: debug((CurrentTime - JobCurrentStartDate) >= 60) --> 1
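
Plugging the logged values into the expression by hand confirms the result. The helper below is a hypothetical shell sketch, not Condor code: it takes the two timestamps and the limit and reports what the comparison yields, treating a missing time reference as UNDEFINED.

```shell
#!/bin/sh
# Illustrative only: redo the PeriodicHold arithmetic from the ShadowLog.

# hold_check NOW START LIMIT: print TRUE, FALSE, or UNDEFINED.
hold_check() {
    if [ "$1" = UNDEFINED ] || [ "$2" = UNDEFINED ]; then
        echo UNDEFINED
    elif [ $(( $1 - $2 )) -ge "$3" ]; then
        echo TRUE
    else
        echo FALSE
    fi
}

hold_check 1257477062 1257477002 60   # CurrentTime and JobCurrentStartDate from the log
hold_check UNDEFINED 1257476798 60    # the earlier ServerTime attempt
```

The first call prints TRUE, since the two timestamps are exactly 60 seconds apart; the second prints UNDEFINED, matching the earlier hold reason.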

Remember to remove debug() from your expressions.

Custom ClassAd attributes in Condor: FreeMemoryMB via STARTD_CRON

November 17, 2009

ClassAds enable a lot of power in Condor. ClassAds are the data representation used for most everything in Condor. Generally, a ClassAd is a set of name-value pairs. Values are typed, and include the usual suspects plus expressions. In Condor, jobs, users, slots, daemons, etc. all have a ClassAd representation with their own set of attributes, e.g. a job has a Requirements attribute whose value is an expression such as FreeMemoryMB > 1024, and a slot has a Memory attribute whose value is the amount of memory allocated to the slot.

ClassAds are schema-free, which means they can be arbitrarily extended. Any given ad can have any attribute a user or administrator wants to put in it. Attributes are given meaning by when and where they are referenced.

On a slot there are different classes of attributes: resource statistics, such as Disk and LoadAvg; resource properties, such as NumCpus and HasVM; policy expressions, such as Requirements or Rank; etc.

One attribute not provided by default on a slot is a statistic representing the total amount of free memory on the system. There are multiple interpretations of what free memory might really mean. On Linux, it could be the number in the free column on the Mem: row reported by the free program, e.g.

$ free -m
             total       used       free     shared    buffers     cached
Mem:          2016       1790        225          0         93        845
-/+ buffers/cache:        851       1165
Swap:         1983          0       1983

or it might be the value on the buffers/cache line. It might be some combination of totalram, freeram, sharedram, bufferedram as reported by sysinfo(2). It might also include information about totalswap and freeswap, also from sysinfo(2).

The meaning is often a function of what kinds of policies are desired in a Condor deployment.

Once you have picked a meaning for your deployment, Condor provides you with the STARTD_CRON mechanism to include a FreeMemoryMB attribute in your slot ads. From there the attribute can be referenced by policy on jobs, during negotiation, and slot policy.

You need two things: first, a way to calculate the value for FreeMemoryMB, for which we’ll use the simple shell script below that pulls MemFree: out of /proc/meminfo; second, configuration that tells the condor_startd to run the program.

free_memory_mb.sh:

#!/bin/sh

FREE_MEMORY_KB=$(grep ^MemFree < /proc/meminfo | awk '{print $2}')
echo "FreeMemoryMB = $((FREE_MEMORY_KB / 1024))"

condor_config.local:

STARTD_CRON_JOBLIST = FREE_MEMORY_MB
STARTD_CRON_FREE_MEMORY_MB_EXECUTABLE = $(LIBEXEC)/free_memory_mb.sh
STARTD_CRON_FREE_MEMORY_MB_PERIOD = $(UPDATE_INTERVAL)s

Notes: First, the documentation is out of sync with 7.4.0 and the units on _PERIOD must be specified; second, UPDATE_INTERVAL needs to be defined, it specifies how often the condor_startd will periodically send updates to the Collector.
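
If the buffers/cache interpretation of free memory fits your policy better, the script can sum MemFree, Buffers and Cached instead. The following is a sketch under that assumption; the function name free_plus_cache_mb and the optional file argument (handy for trying it against sample data) are not part of the original script.

```shell
#!/bin/sh
# Illustrative variant of free_memory_mb.sh: counts buffers and page cache
# as free. The function name and file argument are hypothetical.

# free_plus_cache_mb [FILE]: FILE is in /proc/meminfo format.
free_plus_cache_mb() {
    awk '/^(MemFree|Buffers|Cached):/ { kb += $2 }
         END { printf "FreeMemoryMB = %d\n", kb / 1024 }' "${1:-/proc/meminfo}"
}

# Only run against the live system where /proc/meminfo exists.
if [ -r /proc/meminfo ]; then
    free_plus_cache_mb
fi
```

The anchored pattern keeps SwapCached out of the sum, and the result is truncated to whole megabytes just like the original script.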

After reconfiguring the Startd, or just restarting condor, you can view the new attribute with condor_status:

$ condor_status -long  | grep ^FreeMemoryMB | sort | uniq -c
     10 FreeMemoryMB = 174

$ free -m
             total       used       free     shared    buffers     cached
Mem:          2016       1842        173          0        101        874
-/+ buffers/cache:        866       1149
Swap:         1983          0       1983

Yes, the values are different between FreeMemoryMB and free. The amount of free memory is changing constantly and we are just sampling it. You can increase the sampling rate, but beware that means you will generate more frequent updates to the Collector. Maybe not a problem when you have 32 machines and 256 slots, but definitely something to consider when you have 3000 machines and 24000 slots.
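
For a rough sense of scale, the update rate seen by the Collector can be estimated as slots divided by the update interval. This back-of-the-envelope sketch assumes every slot sends exactly one ad per interval and uses 300 seconds as an example interval; the function name updates_per_second is hypothetical.

```shell
#!/bin/sh
# Illustrative estimate only: assumes one update per slot per interval.

# updates_per_second SLOTS INTERVAL_SECONDS
updates_per_second() {
    awk -v slots="$1" -v interval="$2" 'BEGIN { printf "%.2f\n", slots / interval }'
}

updates_per_second 256 300      # 32 machines, 256 slots
updates_per_second 24000 300    # 3000 machines, 24000 slots
updates_per_second 24000 60     # same pool, sampling five times as often
```

Under those assumptions the small pool generates under one update per second, the large pool 80, and the large pool with a 60 second interval 400.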

Final note: A better name for the attribute representing free memory on a system is TotalFreeMemoryMB to remain consistent with other attributes. For instance, Disk is a slot’s share of the TotalDisk free on the system.

