Posts Tagged ‘Statistics’

Statistics changes in HTCondor 7.7

February 12, 2013

Notice to HTCondor 7.8 users –

Statistics implemented during the 7.5 series that landed in 7.7.0 were rewritten by the time 7.8 was released. If you were using the original statistics for monitoring and/or reporting, here is a table to help you map the old attribute names (left column) to the new ones (right column).

See: 7.6 -> 7.8 schedd stats

Note: the *Rate and *Mean attributes have to be calculated from other values, and reproducing UpdateTime requires memory, i.e. keeping state between samples.
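Computing a *Rate yourself comes down to remembering the previous sample of a cumulative counter and dividing the change in the count by the elapsed time. Below is a minimal sketch of that, assuming you pass in a sample time and a counter value on each call. The function name, the state file and the numbers are made up for illustration and are not part of HTCondor.

```shell
#!/bin/sh
# Hypothetical sketch: derive a per-second rate from a cumulative counter
# by remembering the previous sample in a small state file.
# Arguments: state file, sample time (epoch seconds), counter value.
rate_from_counter() {
  state_file=$1
  now=$2
  count=$3
  if [ -r "$state_file" ]; then
    # Previous sample, stored as "<time> <count>"
    read -r prev_time prev_count < "$state_file"
    awk -v t0="$prev_time" -v c0="$prev_count" -v t1="$now" -v c1="$count" 'BEGIN {
      # No elapsed time, or the counter went backwards (e.g. a daemon restart)
      if (t1 <= t0 || c1 < c0) print "n/a"
      else printf "%.3f\n", (c1 - c0) / (t1 - t0)
    }'
  else
    echo "n/a"  # first sample: nothing to compare against yet
  fi
  # Remember this sample for the next call
  printf '%s %s\n' "$now" "$count" > "$state_file"
}

# Usage, with whatever cumulative counter you are tracking:
#   rate_from_counter /tmp/counter.state "$(date +%s)" "$counter_value"
```

Keeping the state in a file rather than a shell variable means the sketch also works when it is called from a fresh process each time, e.g. from cron.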

Pool utilization and schedd statistic graphs

June 22, 2012

Assuming you are gathering pool utilization and schedd statistics, you might see something like this:

[Graph: Queue depth and job rates]

This graph is for a single schedd and may show the impact of queue depth, a.k.a. the number of jobs waiting in the queue, on job submission, start and completion rates. The submission rate is on top. The start and completion rates overlap, which is good: jobs are completing as fast as they are being started. I say “may show” because other factors have not been ruled out, such as other processes on the system that started to run it out of memory. Note that the base rate is a function of job duration and the number of available slots. Despite having hundreds of slots, the maximum rate is quite low because the jobs were minutes long.
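To put a number on that last point: when every slot is busy, each slot can start a new job only once per job duration, so the steady-state start (and completion) rate tops out at roughly slots divided by job duration (Little's law). A back-of-the-envelope helper, with purely illustrative inputs:

```shell
#!/bin/sh
# Rough ceiling on the steady-state job start rate when all slots are busy:
# rate = slots / job duration (Little's law). Inputs are illustrative only.
max_start_rate() {
  slots=$1
  duration_seconds=$2
  awk -v s="$slots" -v d="$duration_seconds" 'BEGIN { printf "%.2f\n", s / d }'
}

# Example: 300 slots running 5-minute jobs -> prints 1.00 (jobs per second)
max_start_rate 300 300
```

So even a few hundred slots cannot start more than about one job per second when the jobs run for minutes, however deep the queue gets.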

Over this nine-day period, as the queue grew to 1.8 million jobs, the utilization remained above 95%:

[Graph: Pool utilization]
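One simple way to compute a utilization percentage is to count the slots in the Claimed state and divide by the total number of slots. This definition is an assumption for illustration, not necessarily how the graph was produced, and it treats every slot ad equally. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical utilization measure: percentage of slot ads in the Claimed state.
# Reads one slot State per line on stdin.
utilization() {
  awk '{ total++ }
       $1 == "Claimed" { claimed++ }
       END {
         if (total > 0) printf "%.1f%%\n", 100 * claimed / total
         else print "n/a"
       }'
}

# Typical input:
#   condor_status -format '%s\n' State | utilization
```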

Schedd stats with OpenTSDB

June 14, 2012

Building on Pool utilization with OpenTSDB, here is how to harvest the plethora of useful statistics the condor_schedd also advertises, using condor_status.

Make the metrics:

$ tsdb mkmetric condor.schedd.jobs.idle condor.schedd.jobs.running condor.schedd.jobs.held condor.schedd.jobs.mean_runtime condor.schedd.jobs.mean_waittime condor.schedd.jobs.historical.mean_runtime condor.schedd.jobs.historical.mean_waittime condor.schedd.jobs.submission_rate condor.schedd.jobs.start_rate condor.schedd.jobs.completion_rate
...

Obtain schedd_stats_for_opentsdb.sh and run:

$ while true; do ./schedd_stats_for_opentsdb.sh; sleep 15; done | nc -w 30 tsdb-host 4242 &
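schedd_stats_for_opentsdb.sh itself is not reproduced here. Whatever it does internally, its job is to print lines in OpenTSDB's telnet-style put format (metric, timestamp, value and at least one tag), which nc then ships to port 4242. The following is a hypothetical sketch covering only the idle, running and held counts, using the schedd ad attributes TotalIdleJobs, TotalRunningJobs and TotalHeldJobs; the function name and the schedd tag are my own choices.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual schedd_stats_for_opentsdb.sh:
# turn "Name Idle Running Held" lines into OpenTSDB lines of the form
#   put <metric> <epoch-seconds> <value> <tagk>=<tagv>
# Optional argument: timestamp to use (defaults to the current time).
to_opentsdb_puts() {
  ts=${1:-$(date +%s)}
  awk -v ts="$ts" 'NF == 4 {
    name = $1
    # OpenTSDB restricts tag values to a small character set, so replace
    # anything other than letters, digits, ".", "_" and "-" (such as the
    # "@" in schedd names) with "_"
    gsub(/[^A-Za-z0-9._-]/, "_", name)
    printf "put condor.schedd.jobs.idle %s %s schedd=%s\n", ts, $2, name
    printf "put condor.schedd.jobs.running %s %s schedd=%s\n", ts, $3, name
    printf "put condor.schedd.jobs.held %s %s schedd=%s\n", ts, $4, name
  }'
}

# Feed it from the schedd ads, e.g.:
#   condor_status -schedd -format '%s ' Name -format '%d ' TotalIdleJobs \
#     -format '%d ' TotalRunningJobs -format '%d\n' TotalHeldJobs | to_opentsdb_puts
```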

View the results at:

http://tsdb-host:4242/#start=1h-ago&m=sum:condor.schedd.jobs.idle&o=&m=sum:condor.schedd.jobs.running&o=&m=sum:condor.schedd.jobs.held&o=&m=sum:10m-avg:condor.schedd.jobs.submission_rate&o=axis%20x1y2&m=sum:10m-avg:condor.schedd.jobs.start_rate&o=axis%20x1y2&m=sum:10m-avg:condor.schedd.jobs.completion_rate&o=axis%20x1y2&ylabel=jobs&y2label=rates&yrange=%5B0:%5D&y2range=%5B0:%5D&key=out%20center%20top%20horiz%20box

Update on Negotiation cycle statistics

December 5, 2010

Back in March 2010, I posted an AWK script that processes a NegotiatorLog and, on a per-cycle basis, reports the number of matches, the cycle duration, the match rate (matches per second), the number of rejections, the number of submitters and the total number of slots considered.

Now, with GT1458 in 7.5.4 and thanks to Dan Bradley, you can get most of these statistics right from the Negotiator’s ad. Run condor_status -negotiator -long and you’ll find:

LastNegotiationCycleTime0 = 446803200
LastNegotiationCycleMatches0 = 990
LastNegotiationCycleDuration0 = 5
LastNegotiationCycleRejections0 = 2
LastNegotiationCycleActiveSubmitterCount0 = 1
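The 7.5.4 ad does not carry a match rate, but it is easy to derive from a cycle's matches and duration. A small sketch, assuming the "Attr = value" layout shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Derive the match rate (matches per second) of one negotiation cycle from
# "condor_status -negotiator -long" output. The argument is the numeric
# suffix on the attribute names (0 in the output above; defaults to 0).
match_rate() {
  idx=${1:-0}
  awk -v idx="$idx" '
    $1 == ("LastNegotiationCycleMatches" idx)  { matches = $3 }
    $1 == ("LastNegotiationCycleDuration" idx) { duration = $3 }
    END {
      if (duration > 0) printf "%.2f\n", matches / duration
      else print "n/a"   # attribute missing or zero-second cycle
    }'
}

# Usage:
#   condor_status -negotiator -long | match_rate 0
```

For the sample values above this gives 990 / 5 = 198.00 matches per second.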

Soon, thanks to Erik Erlandson and Jon Thomas, when GT1393 gets merged, you will get access to a host of new statistics, including:

LastNegotiationCyclePeriod0 = 25
LastNegotiationCycleMatchRate0 = 198.000000
LastNegotiationCycleTotalSlots0 = 1100
LastNegotiationCycleMatchRateSustained0 = 39.599998
LastNegotiationCycleSubmittersShareLimit0 = ""
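Judging from the sample values (an inference from the numbers, not from the GT1393 code), the new rate attributes relate to the others as follows, with Period presumably being the time from the start of one cycle to the start of the next; 39.599998 is simply 39.6 after floating point rounding:

```latex
\text{MatchRate} = \frac{\text{Matches}}{\text{Duration}} = \frac{990}{5} = 198
\qquad
\text{MatchRateSustained} = \frac{\text{Matches}}{\text{Period}} = \frac{990}{25} = 39.6
```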

Negotiation cycle statistics

March 29, 2010

If you have ever wondered how your Negotiator is doing, you may be interested in this AWK script. It summarizes negotiation cycles by reading NegotiatorLog output.

You can either pass it your NegotiatorLog or, if you only want to summarize recent cycles, pipe the output of tail -n into it.

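#!/usr/bin/gawk -f
#
# Summarize HTCondor negotiation cycles from NegotiatorLog output.
# Needs GNU awk: gensub(), mktime() and strftime() are gawk extensions.
#
# Usage (the file name negotiation_cycles.awk is just an example):
#   ./negotiation_cycles.awk NegotiatorLog
#   tail -n 10000 NegotiatorLog | ./negotiation_cycles.awk

# Convert a "MM/DD HH:MM:SS ..." log line to epoch seconds. The log has no
# year, so a fixed leap year (1984) is used; only differences are meaningful.
function parse_time(string) {
   return mktime(gensub(/([^\/]*)\/([^ ]*) ([^:]*):([^:]*):([^ ]*) .*/,
                        "1984 \\1 \\2 \\3 \\4 \\5", "g", string))
}

BEGIN { started = 0; finished = 0 }

# Start of a cycle: remember when, and reset the per-cycle counters
/Started Negotiation Cycle/ {
   started = parse_time($0)
#   if (finished) print "Delay:", started - finished
   finished = 0; matched = 0; rejected = 0; submitters = 0; slots = 0
}

/Matched/ {
   matched += 1
}

/Rejected/ {
   rejected += 1
}

# e.g. "... Public ads include 1 submitter, 1100 startd"
/Public ads include .* submitter, .* startd/ {
   submitters = $6
   slots = $8
}

# End of a cycle: print a one-line summary
/Finished Negotiation Cycle/ {
   finished = parse_time($0)
   if (!started) next #{ print "Skipping first cycle"; next }
#   if (!matched) next #{ print "Skipping cycle with no matches"; next }
   duration = finished - started
   if (!duration) next # { print "Skipping zero second cycle"; next }
   print strftime("%m/%d %T", started), "::",
       matched, "matches in",
       duration, "seconds",
       "(" matched / duration "/s) with",
       rejected, "rejections,",
       submitters, "submitters,",
       slots, "slots"
}

END {
   #if (!finished) print "Skipping last cycle"
}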

Condor’s debug logs do not include a year or time zone in their timestamps, so cycles that span a year boundary or a daylight saving time change will produce bogus results.

