Archive for June, 2012

Pool utilization and schedd statistic graphs

June 22, 2012

Assuming you are gathering pool utilization and schedd statistics, you might be able to see something like this,

Queue depth and job rates

This graph is for a single schedd and may show the impact of queue depth, a.k.a. the number of jobs waiting in the queue, on job submission, start and completion rates. The submission rate is on top. The start and completion rates overlap, which is good. I say may show because other factors have not been ruled out, such as other processes on the system that started to run it out of memory. Note that the base rate is a function of job duration and the number of available slots. Despite having hundreds of slots, the max rate is quite low because the jobs were only minutes long; for example, 400 slots running 5-minute jobs can complete at most about 80 jobs per minute.

Over this nine-day period, as the queue grew to 1.8 million jobs, utilization remained above 95%,

Pool utilization

Wallaby: Skeleton Group

June 19, 2012

Read about Wallaby’s Skeleton Group feature. Similar to /etc/skel for accounts on a single system, it provides a base configuration to nodes as they join a pool. It is especially useful for pools with dynamic and opportunistic resources.

Schedd stats with OpenTSDB

June 14, 2012

Building on Pool utilization with OpenTSDB: the condor_schedd also advertises a plethora of useful statistics that can be harvested with condor_status.

Make the metrics,

$ tsdb mkmetric condor.schedd.jobs.idle condor.schedd.jobs.running condor.schedd.jobs.held condor.schedd.jobs.mean_runtime condor.schedd.jobs.mean_waittime condor.schedd.jobs.historical.mean_runtime condor.schedd.jobs.historical.mean_waittime condor.schedd.jobs.submission_rate condor.schedd.jobs.start_rate condor.schedd.jobs.completion_rate
...

Obtain schedd_stats_for_opentsdb.sh and run,

$ while true; do ./schedd_stats_for_opentsdb.sh; sleep 15; done | nc -w 30 tsdb-host 4242 &
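
If you want a starting point for the script, here is a minimal sketch; it is not the original schedd_stats_for_opentsdb.sh. The attribute names (TotalIdleJobs, TotalRunningJobs, TotalHeldJobs) and the schedd tag are assumptions, only a few of the metrics created above are covered, and the rate and runtime metrics would need extra bookkeeping between samples.

#!/bin/sh
# Sketch only, not the original schedd_stats_for_opentsdb.sh.
# Pull a few statistics out of each schedd ad and emit OpenTSDB
# "put metric timestamp value tag=value" lines on stdout.
NOW=$(date +%s)
condor_status -schedd \
   -format "%s " Name \
   -format "%d " TotalIdleJobs \
   -format "%d " TotalRunningJobs \
   -format "%d\n" TotalHeldJobs | \
while read NAME IDLE RUNNING HELD; do
   echo "put condor.schedd.jobs.idle $NOW $IDLE schedd=$NAME"
   echo "put condor.schedd.jobs.running $NOW $RUNNING schedd=$NAME"
   echo "put condor.schedd.jobs.held $NOW $HELD schedd=$NAME"
done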

View the results at,

http://tsdb-host:4242/#start=1h-ago&m=sum:condor.schedd.jobs.idle&o=&m=sum:condor.schedd.jobs.running&o=&m=sum:condor.schedd.jobs.held&o=&m=sum:10m-avg:condor.schedd.jobs.submission_rate&o=axis%20x1y2&m=sum:10m-avg:condor.schedd.jobs.start_rate&o=axis%20x1y2&m=sum:10m-avg:condor.schedd.jobs.completion_rate&o=axis%20x1y2&ylabel=jobs&y2label=rates&yrange=%5B0:%5D&y2range=%5B0:%5D&key=out%20center%20top%20horiz%20box

Pool utilization with OpenTSDB

June 12, 2012

Merging the pool utilization script with OpenTSDB.

Once you have followed the stellar OpenTSDB Getting Started guide, make the metrics with,

$ tsdb mkmetric condor.pool.slots.unavail condor.pool.slots.avail condor.pool.slots.total condor.pool.slots.used condor.pool.slots.used_of_avail condor.pool.slots.used_of_total condor.pool.cpus.unavail condor.pool.cpus.avail condor.pool.cpus.total condor.pool.cpus.used condor.pool.cpus.used_of_avail condor.pool.cpus.used_of_total condor.pool.memory.unavail condor.pool.memory.avail condor.pool.memory.total condor.pool.memory.used condor.pool.memory.used_of_avail condor.pool.memory.used_of_total
metrics condor.pool.slots.unavail: [0, 0, 1]
metrics condor.pool.slots.avail: [0, 0, 2]
metrics condor.pool.slots.total: [0, 0, 3]
...
metrics condor.pool.memory.used_of_total: [0, 0, 17]

Obtain utilization_for_opentsdb.sh before running,

$ while true; do ./utilization_for_opentsdb.sh; sleep 15; done | nc -w 30 tsdb-host 4242 &
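
utilization_for_opentsdb.sh is, likewise, a thin wrapper around condor_status. A rough sketch, not the original script, follows; it only covers the slot, cpu and memory totals and used values, and the pool tag is made up.

#!/bin/sh
# Sketch only, not the original utilization_for_opentsdb.sh.
# Sum slot, cpu and memory figures from the startd ads and emit
# OpenTSDB "put" lines. Memory is reported by Condor in megabytes.
NOW=$(date +%s)
condor_status -format "%s " State -format "%d " Cpus -format "%d\n" Memory | \
awk -v now=$NOW '
   { slots++; cpus += $2; mem += $3 }
   $1 == "Claimed" { used_slots++; used_cpus += $2; used_mem += $3 }
   END {
      printf "put condor.pool.slots.total %d %d pool=main\n", now, slots
      printf "put condor.pool.slots.used %d %d pool=main\n", now, used_slots
      printf "put condor.pool.cpus.total %d %d pool=main\n", now, cpus
      printf "put condor.pool.cpus.used %d %d pool=main\n", now, used_cpus
      printf "put condor.pool.memory.total %d %d pool=main\n", now, mem
      printf "put condor.pool.memory.used %d %d pool=main\n", now, used_mem
   }'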

View the results at,

http://tsdb-host:4242/#start=1h-ago&m=sum:condor.pool.cpus.total&o=&m=sum:condor.pool.cpus.used&o=&m=sum:condor.pool.cpus.used_of_avail&o=axis%20x1y2&ylabel=cpus&y2label=%2525+utilization&yrange=%5B0:%5D&y2range=%5B0:1%5D&key=out%20center%20top%20horiz%20box

The number of statistics about operating Condor pools has been growing over the years. All are easily retrieved via condor_status for feeding into OpenTSDB.

Hadoop JobTracker and NameNode configuration error: FileNotFoundException jobToken

June 8, 2012

FYI for anyone running into java.io.FileNotFoundException: File file:/tmp/hadoop-USER/mapred/system/JOBID/jobToken does not exist when running Hadoop MapReduce jobs.

Your JobTracker needs access to an HDFS NameNode to share information with TaskTrackers. If the JobTracker is misconfigured and cannot connect to a NameNode, it will write to local disk instead. Or at least it does in hadoop-1.0.1. The result is that TaskTrackers fail to find the jobToken in HDFS, or on their local disks, and throw a FileNotFoundException.

I ran into this because I did not have a fs.default.name property in my JobTracker’s mapred-site.xml. I was putting it in hdfs-site.xml instead.
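
For reference, the property in question looks like the following; the NameNode host and port here are placeholders, so point it at your own NameNode.

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9000</value>
</property>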

I have not found messages in the JobTracker (or TaskTracker) logs that explicitly indicate this misconfiguration.

Oops.

Service as a Job: Hadoop MapReduce TaskTracker

June 6, 2012

Continuing to build on other examples of services run as jobs, such as HDFS NameNode and HDFS DataNode, here is an example for the Hadoop MapReduce framework’s TaskTracker.

mapred_tasktracker.sh

#!/bin/sh -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1
JOBTRACKER_ENDPOINT=$2

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the tasktracker and gather logs
function term {
   ./bin/hadoop-daemon.sh stop tasktracker
# Useful if we can transfer data back
#   tar czf logs.tgz logs
#   cp logs.tgz $_CONDOR_SCRATCH_DIR
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . http.address must be set to port 0 (ephemeral)
cat > conf/mapred-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$JOBTRACKER_ENDPOINT</value>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
EOF

# Try to shutdown cleanly
trap term SIGTERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_PID_DIR=$PWD
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
./bin/hadoop-daemon.sh start tasktracker

# Wait for pid file
PID_FILE=$(echo hadoop-*-tasktracker.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Report back some static data about the tasktracker
# e.g. condor_chirp set_job_attr SomeAttr SomeData
# None at the moment.

# While the tasktracker is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # None at the moment.
   sleep 30
done

The job description below uses the same templating technique as the DataNode example. The description uses a variable JobTrackerAddress, provided on the command-line as an argument to condor_submit.

mapred_tasktracker.job

# Submit w/ condor_submit -a JobTrackerAddress=<address>
# e.g. <address> = $HOSTNAME:9001

cmd = mapred_tasktracker.sh
args = hadoop-1.0.1-bin.tar.gz $(JobTrackerAddress)

transfer_input_files = hadoop-1.0.1-bin.tar.gz

# RFE: Ability to get output files even when job is removed
#transfer_output_files = logs.tgz
#transfer_output_remaps = "logs.tgz=logs.$(cluster).tgz"
output = tasktracker.$(cluster).out
error = tasktracker.$(cluster).err

log = tasktracker.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue

From here you can run condor_submit -a JobTrackerAddress=$HOSTNAME:9001 mapred_tasktracker.job a few times and build up a MapReduce cluster, assuming you are already running a JobTracker. You can also hit the JobTracker’s web interface to see TaskTrackers check in.
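
For instance, something like the following, assuming the JobTracker is on the submit host at port 9001, stands up four TaskTrackers and lets you watch them in the queue:

$ for i in 1 2 3 4; do
    condor_submit -a JobTrackerAddress=$HOSTNAME:9001 mapred_tasktracker.job
  done
$ condor_q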

Assuming you are also running an HDFS instance, you can run some jobs against your new cluster.

For fun, I ran time hadoop jar hadoop-test-1.0.1.jar mrbench -verbose -maps 100 -inputLines 100 three times, against 4, 8 and 12 TaskTrackers. The resulting run times were,

# TaskTrackers   runtime (mm:ss)
 4               01:26
 8               00:50
12               00:38
