Posts Tagged ‘Utilization’

Partitionable slot utilization

October 1, 2012

There are already ways to get pool utilization information at a macro level. Until Condor 7.8 introduced the TotalSlotCpus, TotalSlotMemory and TotalSlotDisk attributes, there was no good way to get utilization at a micro level, at least not with only the standard command-line tools.

Resource data is available per slot. Getting macro (pool) utilization always requires aggregating data across multiple slots; getting micro (per-slot) utilization should not.

In a pool using partitionable slots, you can now get per-slot utilization from the slot itself: TotalSlotCpus and TotalSlotMemory give the partitionable slot's full size, while Cpus and Memory give what is still available to be claimed. There is no need for extra tooling to perform aggregation or correlation, so condor_status can report utilization directly at the micro, per-slot level.

$ echo "           Name  Cpus Avail Util%  Memory Avail Util%"
$ condor_status -constraint "PartitionableSlot =?= TRUE" \
    -format "%15s" Name \
    -format "%6d" TotalSlotCpus \
    -format "%6d" Cpus \
    -format "%5d%%" "((TotalSlotCpus - Cpus) / (TotalSlotCpus * 1.0)) * 100" \
    -format "%8d" TotalSlotMemory \
    -format "%6d" Memory \
    -format "%5d%%" "((TotalSlotMemory - Memory) / (TotalSlotMemory * 1.0)) * 100" \
    -format "\n" TRUE

           Name  Cpus Avail Util%  Memory Avail Util%
  slot1@eeyore0    16    12   25%   65536 48128   26%
  slot2@eeyore0    16    14   12%   65536 58368   10%
  slot1@eeyore1    16    12   25%   65536 40960   37%
  slot2@eeyore1    16    15    6%   65536 62464    4%

This is especially useful when machines are configured with multiple partitionable slots, or with a mix of partitionable and static slots.
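The Util% columns are just (total - available) / total, truncated to an integer by the %d format. Below is a minimal bash sketch that recomputes them, for example to check numbers or post-process saved output. The util_pct and util_table names and the five-column input layout are assumptions for illustration, not part of condor_status.

```shell
#!/usr/bin/env bash
# Sketch: recompute the Util% columns shown above.
# util_pct TOTAL AVAIL prints the in-use share of TOTAL as an integer percent,
# truncated like the "%5d%%" format in the condor_status command.
util_pct() {
  local total=$1 avail=$2
  if (( total <= 0 )); then
    echo 0
    return
  fi
  echo $(( (total - avail) * 100 / total ))
}

# util_table reads "Name TotalSlotCpus Cpus TotalSlotMemory Memory" rows on
# stdin (an assumed layout) and prints rows with the same widths as above.
util_table() {
  local name tc c tm m
  while read -r name tc c tm m; do
    printf '%15s%6d%6d%5d%%%8d%6d%5d%%\n' \
      "$name" "$tc" "$c" "$(util_pct "$tc" "$c")" \
      "$tm" "$m" "$(util_pct "$tm" "$m")"
  done
}

util_table <<'EOF'
slot1@eeyore0 16 12 65536 48128
slot2@eeyore0 16 14 65536 58368
EOF
```

Running it reproduces the first two rows of the output above. In practice the here-document could be replaced by condor_status output printed with one space-separated -format per attribute (Name, TotalSlotCpus, Cpus, TotalSlotMemory, Memory) and a trailing newline.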

Someday the pool utilization script should be integrated with condor_status.


Pool utilization and schedd statistic graphs

June 22, 2012

Assuming you are gathering pool utilization and schedd statistics, you might be able to see something like this:

[Graph: Queue depth and job rates]

This graph is for a single schedd and may show the impact of queue depth (the number of jobs waiting in the queue) on job submission, start and completion rates. The submission rate is plotted at the top. The start and completion rates overlap, which is good. I say "may show" because other factors have not been ruled out, such as other processes on the system that started to run it out of memory. Note that the base rate is a function of job duration and the number of available slots. Despite having hundreds of slots, the maximum rate is quite low because the jobs were minutes long.

Over this nine-day period, as the queue grew to 1.8 million jobs, utilization remained above 95%:

[Graph: Pool utilization]
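One way to read "the base rate is a function of job duration and number of available slots": once every slot is busy, each slot can start at most one new job per job duration, so the ceiling on start and completion rates is roughly slots / duration (Little's law). A back-of-the-envelope bash sketch; the max_jobs_per_minute name is hypothetical, and the slot counts and durations are made up for illustration, not taken from the graph.

```shell
#!/usr/bin/env bash
# Rough ceiling on start/completion rate when every slot is busy:
#   jobs per minute ~= slots * 60 / job_duration_seconds
# Uses integer arithmetic, so the result is rounded down.
max_jobs_per_minute() {
  local slots=$1 duration_s=$2
  echo $(( slots * 60 / duration_s ))
}

# Illustrative inputs only: 400 slots with 5-minute jobs vs 6-second jobs.
max_jobs_per_minute 400 300   # prints 80
max_jobs_per_minute 400 6     # prints 4000
```

This is why hundreds of slots running jobs that last minutes still produce a modest rate, regardless of how deep the queue gets.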

Pool utilization with OpenTSDB

June 12, 2012

Merging the pool utilization script with OpenTSDB.

Once you have followed the stellar OpenTSDB Getting Started guide, create the metrics with:

$ tsdb mkmetric \
    condor.pool.slots.unavail condor.pool.slots.avail condor.pool.slots.total \
    condor.pool.slots.used condor.pool.slots.used_of_avail condor.pool.slots.used_of_total \
    condor.pool.cpus.unavail condor.pool.cpus.avail condor.pool.cpus.total \
    condor.pool.cpus.used condor.pool.cpus.used_of_avail condor.pool.cpus.used_of_total \
    condor.pool.memory.unavail condor.pool.memory.avail condor.pool.memory.total \
    condor.pool.memory.used condor.pool.memory.used_of_avail condor.pool.memory.used_of_total
metrics condor.pool.slots.unavail: [0, 0, 1]
metrics condor.pool.slots.avail: [0, 0, 2]
metrics condor.pool.slots.total: [0, 0, 3]
...
metrics condor.pool.memory.used_of_total: [0, 0, 17]
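Since all 18 names follow one condor.pool.&lt;resource&gt;.&lt;measure&gt; pattern, they can be generated instead of typed. A small bash sketch; metric_names is a hypothetical helper, and the commented tsdb line assumes tsdb is on your PATH as in the command above.

```shell
#!/usr/bin/env bash
# Print the metric names in the same order as the "tsdb mkmetric" command:
# condor.pool.<resource>.<measure> for 3 resources x 6 measures = 18 names.
metric_names() {
  local resource measure
  for resource in slots cpus memory; do
    for measure in unavail avail total used used_of_avail used_of_total; do
      echo "condor.pool.${resource}.${measure}"
    done
  done
}

# Equivalent to the long command above (requires tsdb on PATH):
#   tsdb mkmetric $(metric_names)
metric_names
```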

Obtain utilization_for_opentsdb.sh, then run:

$ while true; do ./utilization_for_opentsdb.sh; sleep 15; done | nc -w 30 tsdb-host 4242 &
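The loop simply pipes text to OpenTSDB's line ("telnet") interface on port 4242, where each data point is a line of the form put &lt;metric&gt; &lt;timestamp&gt; &lt;value&gt; &lt;tagk&gt;=&lt;tagv&gt;, with at least one tag. The script itself is not reproduced here; this is a hypothetical bash sketch of emitting one such line. The put_line name, the pool tag, its default value "example" and the POOL_TAG override are all assumptions, and the values are borrowed from the utilization example only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical: print one OpenTSDB data point in the line protocol:
#   put <metric> <unix-timestamp> <value> <tagk>=<tagv>
# The "pool" tag (default "example", override via POOL_TAG) is an assumption;
# OpenTSDB expects at least one tag per data point.
put_line() {
  local metric=$1 value=$2 ts=${3:-$(date +%s)}
  printf 'put %s %s %s pool=%s\n' "$metric" "$ts" "$value" "${POOL_TAG:-example}"
}

# Fixed timestamp for illustration; omit the third argument to use "now".
put_line condor.pool.cpus.total 12217 1327968000
put_line condor.pool.cpus.used 4631 1327968000
```

A script producing lines in this format could be fed to nc exactly as in the loop above.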

View the results at:

http://tsdb-host:4242/#start=1h-ago&m=sum:condor.pool.cpus.total&o=&m=sum:condor.pool.cpus.used&o=&m=sum:condor.pool.cpus.used_of_avail&o=axis%20x1y2&ylabel=cpus&y2label=%2525+utilization&yrange=%5B0:%5D&y2range=%5B0:1%5D&key=out%20center%20top%20horiz%20box

The number of statistics published about operating Condor pools has grown over the years, and all of them are easily retrieved with condor_status for feeding into OpenTSDB.

Pool utilization

January 31, 2012

Here is a utilization script for a Condor pool.

$ ./utilization.sh
       Unavailable Available    Total     Used:  Avail   Total
Slots         5968      5451    11419     4179  76.66%  36.59%
Cpus          6314      5903    12217     4631  78.45%  37.90%
Memory    14277325  11776800 26054125  9908190  84.13%  38.02%

And, if you know your workload will not run on slots with less than 1 GB of memory, you can pass a constraint so that slots that are too small are counted as unavailable:

$ ./utilization.sh 'Memory < 1024'
       Unavailable Available    Total     Used:  Avail   Total
Slots         6292      5127    11419     4177  81.47%  36.57%
Cpus          6638      5579    12217     4629  82.97%  37.88%
Memory    14592711  11461414 26054125  9904193  86.41%  38.01%

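Remember, if an attribute is not present on all slots, you need to use the meta-comparison operators =?= and =!=, e.g. 'MyCustomAttr =!= True'. With == or !=, a comparison against a missing attribute evaluates to UNDEFINED rather than TRUE or FALSE.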
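The percentage columns can be reproduced from the other numbers. Judging from the figures in the tables above, Total is Unavailable + Available, "Used: Avail" is Used / Available, "Used: Total" is Used / Total, and the percentages are truncated rather than rounded (memory shows 38.02% where rounding would give 38.03%). This bash sketch is a reconstruction under those inferred assumptions, not the actual utilization.sh; pct and util_row are hypothetical names.

```shell
#!/usr/bin/env bash
# Reconstruction under inferred assumptions, not the real utilization.sh:
#   Total = Unavailable + Available
#   Used: Avail = Used / Available,  Used: Total = Used / Total
# Percentages are truncated to two decimals, as the printed figures suggest.

# pct USED BASE -> e.g. "76.66%". Integer math keeps it exact: scale by
# 10000 to keep two decimals, then split into whole and fractional parts.
pct() {
  local used=$1 base=$2
  if (( base <= 0 )); then
    printf '0.00%%'
    return
  fi
  local hundredths=$(( used * 10000 / base ))
  printf '%d.%02d%%' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# util_row LABEL UNAVAILABLE AVAILABLE USED -> one row in the table's layout.
util_row() {
  local label=$1 unavail=$2 avail=$3 used=$4
  local total=$(( unavail + avail ))
  printf '%-10s%8d%10d%9d%9d%8s%8s\n' "$label" "$unavail" "$avail" \
    "$total" "$used" "$(pct "$used" "$avail")" "$(pct "$used" "$total")"
}

printf '%s\n' "       Unavailable Available    Total     Used:  Avail   Total"
util_row Slots 5968 5451 4179
util_row Cpus 6314 5903 4631
util_row Memory 14277325 11776800 9908190
```

Running it prints the first table above, and util_row Slots 6292 5127 4177 reproduces the first row of the filtered table.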

