ClassAds are the source of much of Condor's power: they are the data representation used for almost everything in Condor. Generally, a ClassAd is a set of name-value pairs. Values are typed, and include the usual suspects (integers, reals, strings, booleans) plus expressions. In Condor, jobs, users, slots, daemons, etc. all have a ClassAd representation with their own set of attributes, e.g. a job has a Requirements attribute whose value is an expression such as FreeMemoryMB > 1024, and a slot has a Memory attribute whose value is the amount of memory allocated to the slot.
ClassAds are schema-free, which means they can be arbitrarily extended. Any given ad can have any attribute a user or administrator wants to put in it. Attributes are given meaning by when and where they are referenced.
On a slot there are different classes of attributes: resource statistics, such as Disk and LoadAvg; resource properties, such as NumCpus and HasVM; policy expressions, such as Requirements or Rank; etc.
One attribute not provided by default on a slot is a statistic representing the total amount of free memory on the system. There are multiple interpretations of what free memory might really mean. On Linux, it could be the number in the free column on the Mem: row reported by the free program, e.g.
$ free -m
             total       used       free     shared    buffers     cached
Mem:          2016       1790        225          0         93        845
-/+ buffers/cache:        851       1165
Swap:         1983          0       1983
or it might be the value on the -/+ buffers/cache line. It might be some combination of totalram, freeram, sharedram, bufferram as reported by sysinfo(2). It might also include information about totalswap and freeswap, also from sysinfo(2).
The meaning is often a function of what kinds of policies are desired in a Condor deployment.
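As an illustration of how the interpretations differ, here is a minimal sketch of one alternative: counting buffers and page cache as free, roughly what the -/+ buffers/cache line of free reports. The helper name available_memory_mb and its optional file argument are made up for this example; the field names are the ones found in /proc/meminfo.

```shell
#!/bin/sh
# Sketch only: treat MemFree + Buffers + Cached as "free" memory.
# available_memory_mb is a hypothetical helper; the optional argument
# (a file in /proc/meminfo format) is there to make it easy to try out.
available_memory_mb() {
    # Exact matches on the first field, so SwapCached: is not counted.
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
         END { printf "%d\n", kb / 1024 }' "${1:-/proc/meminfo}"
}

# Print the result in the same "Name = Value" form a slot attribute uses.
if [ -r /proc/meminfo ]; then
    echo "FreeMemoryMB = $(available_memory_mb)"
fi
```

Whether this number or plain MemFree is the right one depends on the policy you want to write against it.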
Once you have picked a meaning for your deployment, Condor provides the STARTD_CRON mechanism to include a FreeMemoryMB attribute in your slot ads. From there the attribute can be referenced by job policy, during negotiation, and by slot policy.
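For example, a job could ask for slots on machines with more than 1 GB free. The following is a minimal sketch, assuming a vanilla universe job; the file name free_memory_job.sub and the /bin/sleep payload are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch only: write a submit description whose Requirements expression
# references the custom FreeMemoryMB slot attribute.
# The file name and the sleep job are placeholders, not part of the setup.
cat > free_memory_job.sub <<'EOF'
universe     = vanilla
executable   = /bin/sleep
arguments    = 60
requirements = FreeMemoryMB > 1024
queue
EOF

echo "wrote free_memory_job.sub"
```

The file would then be handed to condor_submit. Slots that do not advertise FreeMemoryMB will not match such a job, since the expression evaluates to undefined for them. The same expression can be used to filter slots by hand with condor_status -constraint 'FreeMemoryMB > 1024'.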
You need two things: first, a way to calculate the value for FreeMemoryMB (we'll use the simple shell script below, which pulls MemFree: out of /proc/meminfo); second, configuration that tells the condor_startd to run the program.
free_memory_mb.sh:
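#!/bin/sh
# MemFree is reported in kB in /proc/meminfo; take the numeric field.
FREE_MEMORY_KB=$(grep ^MemFree < /proc/meminfo | awk '{print $2}')
# Emit the attribute for the slot ad, converted to MB.
echo "FreeMemoryMB = $((FREE_MEMORY_KB / 1024))"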
condor_config.local:
STARTD_CRON_JOBLIST = FREE_MEMORY_MB
STARTD_CRON_FREE_MEMORY_MB_EXECUTABLE = $(LIBEXEC)/free_memory_mb.sh
STARTD_CRON_FREE_MEMORY_MB_PERIOD = $(UPDATE_INTERVAL)s
Notes: first, the documentation is out of sync with 7.4.0, and the units on _PERIOD must be specified (hence the trailing s, for seconds); second, UPDATE_INTERVAL needs to be defined; it specifies how often, in seconds, the condor_startd sends updates to the Collector.
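Before pointing the condor_startd at a script, it can help to run it by hand and check what it prints. Below is a minimal sketch of such a check, assuming the script's output consists only of attribute assignments of the form Name = Value, like the one free_memory_mb.sh prints. check_ad_output is a hypothetical helper, not a Condor tool.

```shell
#!/bin/sh
# Sketch only: succeed if stdin is non-empty and every line looks like
# "Name = Value". check_ad_output is a made-up helper for illustration.
check_ad_output() {
    out=$(cat)
    # Fail on empty output, or if any line does not match the pattern.
    [ -n "$out" ] && ! printf '%s\n' "$out" |
        grep -qEv '^[A-Za-z_][A-Za-z0-9_]* = .+$'
}

# Example with a literal line of the kind free_memory_mb.sh produces.
if echo "FreeMemoryMB = 174" | check_ad_output; then
    echo "output looks well formed"
fi
```

In practice you would pipe the real script into it, e.g. sh free_memory_mb.sh | check_ad_output.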
After reconfiguring the Startd, or just restarting Condor, you can view the new attribute with condor_status:
$ condor_status -long | grep ^FreeMemoryMB | sort | uniq -c
     10 FreeMemoryMB = 174
$ free -m
             total       used       free     shared    buffers     cached
Mem:          2016       1842        173          0        101        874
-/+ buffers/cache:        866       1149
Swap:         1983          0       1983
Yes, the values are different between FreeMemoryMB and free. The amount of free memory changes constantly and we are only sampling it. You can increase the sampling rate, but beware that doing so generates more frequent updates to the Collector. Maybe not a problem when you have 32 machines and 256 slots, but definitely something to consider when you have 3000 machines and 24000 slots.
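To get a feel for the numbers, here is a back-of-the-envelope sketch. It assumes one ad per slot per update interval, spread evenly over time, and uses 300 and 60 seconds purely as example intervals; updates_per_second is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: rough average Collector update rate.
# Usage: updates_per_second SLOTS INTERVAL_SECONDS
# Assumes one ad per slot per interval, which is a simplification.
updates_per_second() {
    awk -v slots="$1" -v interval="$2" \
        'BEGIN { printf "%.2f\n", slots / interval }'
}

updates_per_second 256 300      # small pool, example 300 s interval
updates_per_second 24000 300    # large pool, same interval
updates_per_second 24000 60     # large pool, sampling five times as often
```

Under these assumptions the small pool averages under one update per second, while the large pool goes from 80 to 400 updates per second when the interval drops from 300 to 60 seconds.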
Final note: a better name for the attribute representing free memory on a system is TotalFreeMemoryMB, to remain consistent with other attributes. For instance, Disk is a slot's share of the TotalDisk free on the system.