Custom ClassAd attributes in Condor: FreeMemoryMB via STARTD_CRON

ClassAds are the source of much of Condor's power. They are the data representation used for nearly everything in Condor. A ClassAd is a set of name-value pairs. Values are typed, and the types include the usual suspects plus expressions. In Condor, jobs, users, slots, daemons, etc. all have a ClassAd representation with their own set of attributes. For example, a job has a Requirements attribute whose value is an expression such as FreeMemoryMB > 1024, and a slot has a Memory attribute whose value is the amount of memory allocated to the slot.

ClassAds are schema-free, which means they can be arbitrarily extended. Any given ad can have any attribute a user or administrator wants to put in it. Attributes are given meaning by when and where they are referenced.

A slot carries several classes of attributes: resource statistics, such as Disk and LoadAvg; resource properties, such as NumCpus and HasVM; policy expressions, such as Requirements or Rank; and so on.

One attribute not provided by default on a slot is a statistic representing the total amount of free memory on the system. There are multiple interpretations of what free memory might really mean. On Linux, it could be the number in the free column on the Mem: row reported by the free program, e.g.

$ free -m
             total       used       free     shared    buffers     cached
Mem:          2016       1790        225          0         93        845
-/+ buffers/cache:        851       1165
Swap:         1983          0       1983

or it might be the value on the buffers/cache line. It might be some combination of totalram, freeram, sharedram, bufferedram as reported by sysinfo(2). It might also include information about totalswap and freeswap, also from sysinfo(2).

The meaning is often a function of what kinds of policies are desired in a Condor deployment.
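
As one illustration, the "free plus buffers and cache" interpretation could be computed from /proc/meminfo roughly as follows. This is only a sketch: the function name free_plus_cache_mb and its optional file argument are made up for this example, and summing MemFree, Buffers and Cached is just one of the possible definitions discussed above.

```shell
#!/bin/bash
# Sketch: treat "free" as MemFree + Buffers + Cached, roughly the free column
# of the "-/+ buffers/cache" line printed by free.
# The function name and the optional file argument are illustrative only.
free_plus_cache_mb() {
    local meminfo="${1:-/proc/meminfo}"
    # Sum the three fields (reported in kB) and convert to whole MB.
    awk '/^(MemFree|Buffers|Cached):/ { kb += $2 } END { print int(kb / 1024) }' "$meminfo"
}

# Only print when /proc/meminfo is available (i.e. on Linux).
if [ -r /proc/meminfo ]; then
    echo "FreeMemoryMB = $(free_plus_cache_mb)"
fi
```

The pattern is anchored at the start of the line, so SwapCached is not counted along with Cached.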

Once you have picked a meaning for your deployment, Condor provides the STARTD_CRON mechanism to include a FreeMemoryMB attribute in your slot ads. From there the attribute can be referenced by job policy, during negotiation, and by slot policy.
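
For the job side, a submit description that references the attribute might be generated like this. It is a sketch under assumptions: write_submit_file, the file name sleep.sub and the /bin/sleep test job are illustrative choices, and the 1024 threshold simply mirrors the Requirements example earlier.

```shell
#!/bin/bash
# Sketch: write a submit description whose Requirements expression
# references the custom FreeMemoryMB slot attribute.
# write_submit_file and its arguments are illustrative names.
write_submit_file() {
    local path="$1" min_free_mb="$2"
    cat > "$path" <<EOF
universe     = vanilla
executable   = /bin/sleep
arguments    = 60
requirements = FreeMemoryMB > ${min_free_mb}
queue
EOF
}

write_submit_file sleep.sub 1024
# Then submit with: condor_submit sleep.sub
```

Keep in mind that on a slot that does not advertise FreeMemoryMB, the comparison evaluates to undefined rather than true, so such a job would not match that slot.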

You need two things. First, a way to calculate the value for FreeMemoryMB: we’ll use the simple bash script below, which pulls MemFree out of /proc/meminfo. Second, configuration that tells the condor_startd to run the program.


#!/bin/bash
# Print free memory in MB as a ClassAd attribute for the condor_startd.
FREE_MEMORY_KB=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
echo "FreeMemoryMB = $((FREE_MEMORY_KB / 1024))"
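
A configuration along the following lines would hook such a script into the condor_startd. Treat it as a sketch rather than a verified configuration: the job name FREE_MEMORY_MB, the script path and the 10 second values are assumptions made for illustration, and the knob names follow the current STARTD_CRON documentation, so they may need adjusting for your Condor version.

```
# Hypothetical job name and script path; adjust for your installation.
STARTD_CRON_JOBLIST = FREE_MEMORY_MB
STARTD_CRON_FREE_MEMORY_MB_EXECUTABLE = /usr/local/libexec/free_memory_mb.sh
# Units are given explicitly on the period.
STARTD_CRON_FREE_MEMORY_MB_PERIOD = 10s
STARTD_CRON_FREE_MEMORY_MB_MODE = Periodic
# How often, in seconds, the condor_startd sends updates to the Collector.
UPDATE_INTERVAL = 10
```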



Notes: First, the documentation is out of sync with 7.4.0, and the units on _PERIOD must be specified. Second, UPDATE_INTERVAL needs to be defined; it specifies how often the condor_startd sends updates to the Collector.

After reconfiguring the condor_startd, or just restarting Condor, you can view the new attribute with condor_status:

$ condor_status -long  | grep ^FreeMemoryMB | sort | uniq -c
     10 FreeMemoryMB = 174

$ free -m
             total       used       free     shared    buffers     cached
Mem:          2016       1842        173          0        101        874
-/+ buffers/cache:        866       1149
Swap:         1983          0       1983

Yes, FreeMemoryMB and free report different values. The amount of free memory changes constantly, and we are only sampling it. You can increase the sampling rate, but beware that doing so generates more frequent updates to the Collector. Maybe not a problem when you have 32 machines and 256 slots, but definitely something to consider when you have 3000 machines and 24000 slots.
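
To get a feel for that load, here is a back-of-the-envelope estimate. It assumes each slot sends exactly one update per UPDATE_INTERVAL and ignores any other update triggers. The helper updates_per_second and the 10 second interval are example choices, not Condor defaults or tools.

```shell
#!/bin/bash
# Rough estimate of slot-ad updates per second arriving at the Collector,
# assuming every slot sends one update per UPDATE_INTERVAL seconds.
# updates_per_second is an illustrative helper, not a Condor tool.
updates_per_second() {
    local slots="$1" interval_s="$2"
    awk -v s="$slots" -v i="$interval_s" 'BEGIN { printf "%.1f\n", s / i }'
}

updates_per_second 256 10      # small pool: 32 machines, 256 slots
updates_per_second 24000 10    # large pool: 3000 machines, 24000 slots
```

Under these assumptions the larger pool produces roughly a hundred times the update traffic of the smaller one, so a sampling period that is harmless on a small pool may need to be lengthened on a large one.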

Final note: A better name for the attribute representing free memory on a system is TotalFreeMemoryMB to remain consistent with other attributes. For instance, Disk is a slot’s share of the TotalDisk free on the system.


