Archive for the ‘Cloud’ Category

A recipe for starting cloud images with virt-install

January 8, 2014

I’m a fan of using the same OS image across multiple environments. So, I’m a fan of using cloud images, those with cloud-init installed, even outside of a cloud.

The trick to this is properly triggering the NoCloud datasource. It’s actually more of a pain than you would think, and not very well documented. Here’s my recipe (from Fedora 19),

wget http://download.fedoraproject.org/pub/fedora/linux/releases/test/20-Beta/Images/x86_64/Fedora-x86_64-20-Beta-20131106-sda.raw.xz
xz -d Fedora-x86_64-20-Beta-20131106-sda.raw.xz

printf '#cloud-config\npassword: fedora\nchpasswd: {expire: False}\nssh_pwauth: True\n' > user-data

NAME=node0
cp Fedora-x86_64-20-Beta-20131106-sda.raw $NAME.raw
echo "instance-id: $NAME; local-hostname: $NAME" > meta-data
genisoimage -output $NAME-cidata.iso -volid cidata -joliet -rock user-data meta-data
virt-install --import --name $NAME --ram 512 --vcpus 2 --disk $NAME.raw --disk $NAME-cidata.iso,device=cdrom --network bridge=virbr0

Log in with username fedora and password fedora.

You’ll also want to boost the amount of RAM if you plan on doing anything interesting in the guest.

You can repeat the last five steps, from setting NAME through virt-install, to start multiple guests. Just make sure to change NAME each time.

If you want to ssh into the guest, you can use virsh console, log in and use ifconfig / ip addr to find the address. Or, you can use arp -e and virsh dumpxml to match MAC addresses. Or just run arp -e before and after starting the guest.
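
The MAC-matching route can be scripted. Here is a small sketch; the mac_of helper and its sed pattern are mine, and the guest name node0 is assumed:

```shell
#!/bin/sh
# Pull the first interface MAC out of libvirt domain XML (as printed by
# virsh dumpxml), so it can be matched against the host's ARP table.
mac_of() {
    sed -n "s/.*<mac address='\([^']*\)'.*/\1/p" | head -n1
}

# Usage against a running guest (needs virsh, and the guest must have
# already made some network traffic so it appears in the ARP table):
#   MAC=$(virsh dumpxml node0 | mac_of)
#   arp -e | grep -i "$MAC" | awk '{print $1}'
```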

Note, you need to follow the meta-data and user-data lines very closely. If you don’t, you may not trigger the NoCloud datasource properly. It took me a number of tries to get it right. Also, the volid needs to be “cidata”, a configurable parameter for NoCloud, or the ISO won’t be found. The chpasswd bit is to prevent being prompted to change your password the first time you log in.
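
Since getting these two files exactly right is the hard part, a couple of throwaway checks before building the ISO can save a boot cycle. A sketch, with helper names of my own invention:

```shell
#!/bin/sh
# Sanity-check the NoCloud seed files from the recipe above.
check_user_data() {
    # must start with the literal line "#cloud-config"; if the \n
    # escapes were written out literally, cloud-init ignores the file
    head -n1 "$1" | grep -qx '#cloud-config'
}
check_meta_data() {
    # must carry the instance-id that triggers NoCloud
    grep -q 'instance-id:' "$1"
}

# e.g. check_user_data user-data && check_meta_data meta-data && echo seed ok
```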

Consider becoming a fan of consistent OS images across your environments too!

Your API is a feature, give it real resource management

January 14, 2013

So much these days is about distributed resource management. That’s anything that can be created and destroyed in the cloud[0]. Proper management is especially important when the resource’s existence is tied to a real economy, e.g. your user’s credit card[1].

EC2 instance creation without idempotent RunInstances

Above is a state machine required to ensure that resources created in AWS EC2 are not lost, i.e. do not have to be manually cleaned up. The green arrows represent the error-free flow. The rest is about error handling or external state changes, e.g. a user-terminated operation. This is from before EC2 supported idempotent instance creation.

The state machine rewritten to use idempotent instance creation,

EC2 instance creation with idempotent RunInstances

What’s going on here? Handling failure during resource creation.

The important failure to consider as a client is what happens if you ask your resource provider to create something and you never hear back. This is a distributed system, there are numerous reasons why you may not hear back. For simplicity, consider the client code crashed between sending the request and receiving the response.

The solution is to construct a transaction for resource creation[2]. To construct a transaction, you need to atomically associate a piece of information with the resource at creation time. We’ll call that piece of information an anima.

In the old EC2 API, the only way to construct an anima was through controlling a security group or keypair. Since neither is tied to a real economy, both are reasonable options. The non-idempotent state machine above uses the keypair as it is less resource intensive for EC2.

On creation failure and with the anima in hand[3], the client must search the remote system for the anima before reattempting creation. This is handled by the GM_CHECK_VM state above.

Unfortunately, without explicit support in the API, i.e. lookup by anima, the search can be unnatural and expensive. For example, EC2 instances are not indexed on keypair. Searching requires a client side scan of all instances.

With the introduction of idempotent RunInstances, the portion of the state machine for constructing and locating the anima is reduced to the GM_SAVE_CLIENT_TOKEN state, an entirely local operation. The reduction in complexity is clear.
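
The reduced flow amounts to: save a client token, then retry creation with that same token until a definitive answer comes back. A mock sketch of the idea; provider_create stands in for an idempotent RunInstances, and its state is faked with files under /tmp:

```shell
#!/bin/sh
# Idempotent-creation sketch.  provider_create is a stand-in for an API
# call like RunInstances with a client token: given the same token it
# always returns the same instance, so retries after a lost response
# are safe.
provider_create() {
    token=$1
    # mock provider state: one file per token
    if [ ! -f "/tmp/instance.$token" ]; then
        echo "i-$(date +%s)" > "/tmp/instance.$token"
    fi
    cat "/tmp/instance.$token"
}

create_instance() {
    token=$1   # GM_SAVE_CLIENT_TOKEN: persist the token before attempting
    tries=0
    while [ $tries -lt 3 ]; do
        id=$(provider_create "$token") && { echo "$id"; return 0; }
        tries=$((tries + 1))
    done
    return 1
}
```

Calling create_instance twice with the same token yields the same instance id, which is exactly the property the second state machine relies on: no search for the anima, no client-side scan.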

After two years, EC2 appears to be the only API providing idempotent instance creation[4], though other APIs are starting to provide atomic anima association, often through metadata or instance attributes, and some even provide lookup.

You should provide an idempotent resource creation operation in your API too!

[0] “in the cloud” – really anywhere in any distributed system!
[1] Making money from forgotten or misplaced resources is a short term play.
[2] Alternatively, you can choose an architecture with a janitor process, which will bring its own complexities.
[3] “in hand” – so long as your hand is reliable storage.
[4] After a quick survey, I’m looking at you Rackspace, RimuHosting, vCloud, OpenNebula, OpenStack, Eucalyptus, GoGrid, Deltacloud, Google Compute Engine and Gandi.

Service as a Job: Hadoop MapReduce TaskTracker

June 6, 2012

Continuing to build on other examples of services run as jobs, such as HDFS NameNode and HDFS DataNode, here is an example for the Hadoop MapReduce framework’s TaskTracker.

mapred_tasktracker.sh

#!/bin/sh -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1
JOBTRACKER_ENDPOINT=$2

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the tasktracker and gather logs
function term {
   ./bin/hadoop-daemon.sh stop tasktracker
# Useful if we can transfer data back
#   tar czf logs.tgz logs
#   cp logs.tgz $_CONDOR_SCRATCH_DIR
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . http.address must be set to port 0 (ephemeral)
cat > conf/mapred-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$JOBTRACKER_ENDPOINT</value>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
EOF

# Try to shutdown cleanly
trap term SIGTERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_PID_DIR=$PWD
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
./bin/hadoop-daemon.sh start tasktracker

# Wait for pid file
PID_FILE=$(echo hadoop-*-tasktracker.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Report back some static data about the tasktracker
# e.g. condor_chirp set_job_attr SomeAttr SomeData
# None at the moment.

# While the tasktracker is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # None at the moment.
   sleep 30
done

The job description below uses the same templating technique as the DataNode example. The description uses a variable JobTrackerAddress, provided on the command-line as an argument to condor_submit.

mapred_tasktracker.job

# Submit w/ condor_submit -a JobTrackerAddress=<address>
# e.g. <address> = $HOSTNAME:9001

cmd = mapred_tasktracker.sh
args = hadoop-1.0.1-bin.tar.gz $(JobTrackerAddress)

transfer_input_files = hadoop-1.0.1-bin.tar.gz

# RFE: Ability to get output files even when job is removed
#transfer_output_files = logs.tgz
#transfer_output_remaps = "logs.tgz logs.$(cluster).tgz"
output = tasktracker.$(cluster).out
error = tasktracker.$(cluster).err

log = tasktracker.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue

From here you can run condor_submit -a JobTrackerAddress=$HOSTNAME:9001 mapred_tasktracker.job a few times and build up a MapReduce cluster, assuming you are already running a JobTracker. You can also hit the JobTracker’s web interface to see TaskTrackers check in.

Assuming you are also running an HDFS instance, you can run some jobs against your new cluster.

For fun, I ran time hadoop jar hadoop-test-1.0.1.jar mrbench -verbose -maps 100 -inputLines 100 three times, once each against 4, 8 and 12 TaskTrackers. The resulting run times were,

# nodes    runtime
      4      01:26
      8      00:50
     12      00:38
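
Converted to seconds, those runtimes work out to rough speedups over the 4-node case; a quick awk check, numbers taken from the table above:

```shell
#!/bin/sh
# mrbench wall-clock times from above: 4 nodes = 86s, 8 = 50s, 12 = 38s
awk 'BEGIN {
    base = 86
    printf "8 nodes:  %.2fx\n", base / 50
    printf "12 nodes: %.2fx\n", base / 38
}'
```

So roughly 1.7x at 8 nodes and 2.3x at 12, well short of linear, as you would expect for a fixed-size benchmark.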

Service as a Job: HDFS NameNode

April 16, 2012

Scheduling an HDFS DataNode is a powerful function. However, an operational HDFS instance also requires a NameNode. Here is an example of how a NameNode can be scheduled, followed by scheduled DataNodes, to create an HDFS instance.

From here, HDFS instances can be dynamically created on shared resources. Workflows can be built to manage, grow and shrink HDFS instances. Multiple HDFS instances can be deployed on a single set of resources.

The control script is based on hdfs_datanode.sh. It discovers the NameNode’s endpoints and chirps them.

hdfs_namenode.sh

#!/bin/sh -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the namenode
function term {
   ./bin/hadoop-daemon.sh stop namenode
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . fs.default.name,dfs.http.address must be set to port 0 (ephemeral)
#  . dfs.name.dir must be in _CONDOR_SCRATCH_DIR for cleanup
# FIX: Figure out why a hostname is required, instead of 0.0.0.0:0 for
#      fs.default.name
cat > conf/hdfs-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$HOSTNAME:0</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>$_CONDOR_SCRATCH_DIR/name</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
EOF

# Try to shutdown cleanly
trap term SIGTERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
export HADOOP_PID_DIR=$PWD

./bin/hadoop namenode -format
./bin/hadoop-daemon.sh start namenode

# Wait for pid file
PID_FILE=$(echo hadoop-*-namenode.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Wait for the log
LOG_FILE=$(echo $HADOOP_LOG_DIR/hadoop-*-namenode-*.log)
while [ ! -s $LOG_FILE ]; do sleep 1; done

# It would be nice if there were a way to get these without grepping logs
while ! grep -q "IPC Server listener on" $LOG_FILE; do sleep 1; done
IPC_PORT=$(grep "IPC Server listener on" $LOG_FILE | sed 's/.* on \(.*\):.*/\1/')
while ! grep -q "Jetty bound to port" $LOG_FILE; do sleep 1; done
HTTP_PORT=$(grep "Jetty bound to port" $LOG_FILE | sed 's/.* to port \(.*\)$/\1/')

# Record the port number where everyone can see it
condor_chirp set_job_attr NameNodeIPCAddress \"$HOSTNAME:$IPC_PORT\"
condor_chirp set_job_attr NameNodeHTTPAddress \"$HOSTNAME:$HTTP_PORT\"

# While namenode is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # Nothing to do.
   sleep 30
done

The job description file is standard at this point.

hdfs_namenode.job

cmd = hdfs_namenode.sh
args = hadoop-1.0.1-bin.tar.gz

transfer_input_files = hadoop-1.0.1-bin.tar.gz

#output = namenode.$(cluster).out
#error = namenode.$(cluster).err

log = namenode.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue

In operation –

Submit the namenode and find its endpoints,

$ condor_submit hdfs_namenode.job
Submitting job(s).
1 job(s) submitted to cluster 208.

$ condor_q -long 208| grep NameNode       
NameNodeHTTPAddress = "eeyore.local:60182"
NameNodeIPCAddress = "eeyore.local:38174"

Open a browser window to http://eeyore.local:60182 to find the cluster summary,

1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :        0 KB
  DFS Used                              :        0 KB
  Non DFS Used                          :        0 KB
  DFS Remaining                         :        0 KB
  DFS Used%                             :       100 %
  DFS Remaining%                        :         0 %
 Live Nodes                             :           0
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0

Add a datanode,

$ condor_submit -a NameNodeAddress=hdfs://eeyore.local:38174 hdfs_datanode.job 
Submitting job(s).
1 job(s) submitted to cluster 209.

Refresh the cluster summary,

1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :     9.84 GB
  DFS Used                              :       28 KB
  Non DFS Used                          :     9.54 GB
  DFS Remaining                         :   309.79 MB
  DFS Used%                             :         0 %
  DFS Remaining%                        :      3.07 %
 Live Nodes                             :           1
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0

And another,

$ condor_submit -a NameNodeAddress=hdfs://eeyore.local:38174 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 210.

Refresh,

1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :    19.69 GB
  DFS Used                              :       56 KB
  Non DFS Used                          :    19.26 GB
  DFS Remaining                         :   435.51 MB
  DFS Used%                             :         0 %
  DFS Remaining%                        :      2.16 %
 Live Nodes                             :           2
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0

All the building blocks necessary to run HDFS on scheduled resources.

Service as a Job: HDFS DataNode

April 4, 2012

Building on other examples of services run as jobs, such as Tomcat, Qpidd and memcached, here is an example for the Hadoop Distributed File System‘s DataNode.

Below is the control script for the datanode. It mirrors the memcached script, but does not publish statistics. However, datanode statistics/metrics could be pulled and published.

hdfs_datanode.sh

#!/bin/sh -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1
NAMENODE_ENDPOINT=$2

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the datanode and gather logs
function term {
   ./bin/hadoop-daemon.sh stop datanode
# Useful if we can transfer data back
#   tar czf logs.tgz logs
#   cp logs.tgz $_CONDOR_SCRATCH_DIR
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . dfs.data.dir must be in _CONDOR_SCRATCH_DIR for cleanup
#  . address,http.address,ipc.address must be set to port 0 (ephemeral)
cat > conf/hdfs-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>$NAMENODE_ENDPOINT</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>$_CONDOR_SCRATCH_DIR/data</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:0</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:0</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
EOF

# Try to shutdown cleanly
trap term SIGTERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_PID_DIR=$PWD
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
./bin/hadoop-daemon.sh start datanode

# Wait for pid file
PID_FILE=$(echo hadoop-*-datanode.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Report back some static data about the datanode
# e.g. condor_chirp set_job_attr SomeAttr SomeData
# None at the moment.

# While the datanode is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # None at the moment.
   sleep 30
done

The job description below uses a templating technique. The description uses a variable NameNodeAddress, which is not defined in the description. Instead, the value is provided as an argument to condor_submit. In fact, a complete job can be defined without a description file, e.g. echo queue | condor_submit -a executable=/bin/sleep -a args=1d, but more on that some other time.

hdfs_datanode.job

# Submit w/ condor_submit -a NameNodeAddress=<address>
# e.g. <address> = hdfs://$HOSTNAME:2007

cmd = hdfs_datanode.sh
args = hadoop-1.0.1-bin.tar.gz $(NameNodeAddress)

transfer_input_files = hadoop-1.0.1-bin.tar.gz

# RFE: Ability to get output files even when job is removed
#transfer_output_files = logs.tgz
#transfer_output_remaps = "logs.tgz logs.$(cluster).tgz"
output = datanode.$(cluster).out
error = datanode.$(cluster).err

log = datanode.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue

hadoop-1.0.1-bin.tar.gz is available from http://archive.apache.org/dist/hadoop/core/hadoop-1.0.1/

Finally, here is a running example,

A namenode is already running –

$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)

Submit a datanode, knowing the namenode’s IPC port is 9000 –

$ condor_submit -a NameNodeAddress=hdfs://$HOSTNAME:9000 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 169.

$ condor_q
-- Submitter: eeyore.local : <127.0.0.1:59889> : eeyore.local
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD 
 169.0   matt            3/26 15:17   0+00:00:16 R  0   0.0 hdfs_datanode.sh h
1 jobs; 0 idle, 1 running, 0 held

Storage is now available, though not very much as my disk is almost full –

$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 63810015232 (59.43 GB)
Present Capacity: 4907495424 (4.57 GB)
DFS Remaining: 4907466752 (4.57 GB)
DFS Used: 28672 (28 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Submit 9 more datanodes –

$ condor_submit -a NameNodeAddress=hdfs://$HOSTNAME:9000 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 170.
...
1 job(s) submitted to cluster 178.

$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 638100152320 (594.28 GB)
Present Capacity: 40399958016 (37.63 GB)
DFS Remaining: 40399671296 (37.63 GB)
DFS Used: 286720 (280 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 10 (10 total, 0 dead)

At this point you can run a workload against the storage, visit the namenode at http://localhost:50070, or simply use ./bin/hadoop fs to interact with the filesystem.

Remember, all the datanodes were dispatched by a scheduler, run alongside existing workload on your pool, and are completely manageable by standard policies.

Amazon S3 – Object Expiration, what about Instance Expiration

December 28, 2011

AWS is providing APIs that take distributed computing concerns into account. One could call them cloud concerns these days. Unfortunately, not all cloud providers are doing the same.

Idempotent instance creation showed up in Sept 2010, providing the ability to simplify interactions with EC2. Idempotent resource allocation is critical for distributed systems.

S3 object expiration appeared in Dec 2011, allowing for service-side managed deallocation of S3 resources.

Next up? It would be great to have an EC2 instance expiration feature. One that could be (0) assigned per instance and (1) adjusted while the instance exists. Bonus if it can also be (2) adjusted from within the instance without credentials. Think leases.

Service as a Job: Memcached

December 5, 2011

Running services such as Tomcat or Qpidd shows how to schedule and manage a service’s life-cycle via Condor. It is also possible to gather and centralize statistics about a service as it runs. Here is an example of how, with memcached.

As with Tomcat and Qpidd, there is a control script and a job description.

New in the control script for memcached is a loop to monitor and chirp back statistics.

memcached.sh

#!/bin/sh

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

PORT_FILE=$TMP/.ports

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off memcached.
function term {
   rm -f $PORT_FILE
   kill %1
}

# Spawn memcached, and make sure we can shut it down cleanly.
trap term SIGTERM
# memcached will write port information to env(MEMCACHED_PORT_FILENAME)
env MEMCACHED_PORT_FILENAME=$PORT_FILE memcached -p -1 "$@" &

# We might have to wait for the port
while [ ! -s $PORT_FILE ]; do sleep 1; done

# The port file's format is:
#  TCP INET: 56697
#  TCP INET6: 47318
#  UDP INET: 34453
#  UDP INET6: 54891
sed -i -e 's/ /_/' -e 's/\(.*\): \(.*\)/\1=\2/' $PORT_FILE
source $PORT_FILE
rm -f $PORT_FILE

# Record the port number where everyone can see it
condor_chirp set_job_attr MemcachedEndpoint \"$HOSTNAME:$TCP_INET\"
condor_chirp set_job_attr TCP_INET $TCP_INET
condor_chirp set_job_attr TCP_INET6 $TCP_INET6
condor_chirp set_job_attr UDP_INET $UDP_INET
condor_chirp set_job_attr UDP_INET6 $UDP_INET6

# While memcached is running, collect and report back stats
while kill -0 %1; do
   # Collect stats and chirp them back into the job ad
   echo stats | nc localhost $TCP_INET | \
    grep -v -e END -e version | tr '\r' '\0' | \
     awk '{print "stat_"$2,$3}' | \
      while read -r stat; do
         condor_chirp set_job_attr $stat
      done
   sleep 30
done

A refresher about chirp. Jobs are stored in condor_schedd processes. They are described using the ClassAd language, extensible name/value pairs. chirp is a tool a job can use while it runs to modify its ClassAd stored in the schedd.

The job description, passed to condor_submit, is vanilla except for how arguments are passed to memcached.sh. The dollar-dollar ($$) use, see man condor_submit, allows memcached to use as much memory as is available on the slot where it gets scheduled. Slots may have different amounts of Memory available.

memcached.job

cmd = memcached.sh
args = -m $$(Memory)

log = memcached.log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = if_needed
when_to_transfer_output = on_exit

queue

An example, note that the set of memcached servers to use is generated from condor_q,

$ condor_submit -a "queue 4" memcached.job
Submitting job(s)....
4 job(s) submitted to cluster 80.

$ condor_q -format "%s\t" MemcachedEndpoint -format "total_items: %d\t" stat_total_items -format "memory: %d/" stat_bytes -format "%d\n" stat_limit_maxbytes
eeyore.local:50608	total_items: 0	memory: 0/985661440
eeyore.local:47766	total_items: 0	memory: 0/985661440
eeyore.local:39130	total_items: 0	memory: 0/985661440
eeyore.local:57410	total_items: 0	memory: 0/985661440

$ SERVERS=$(condor_q -format "%s," MemcachedEndpoint); for word in $(cat words); do echo $word > $word; memcp --servers=$SERVERS $word; \rm $word; done &
[1] 959

$ condor_q -format "%s\t" MemcachedEndpoint -format "total_items: %d\t" stat_total_items -format "memory: %d/" stat_bytes -format "%d\n" stat_limit_maxbytes
eeyore.local:50608	total_items: 480	memory: 47740/985661440
eeyore.local:47766	total_items: 446	memory: 44284/985661440
eeyore.local:39130	total_items: 504	memory: 50140/985661440
eeyore.local:57410	total_items: 490	memory: 48632/985661440

$ condor_q -format "%s\t" MemcachedEndpoint -format "total_items: %d\t" stat_total_items -format "memory: %d/" stat_bytes -format "%d\n" stat_limit_maxbytes
eeyore.local:50608	total_items: 1926	memory: 191264/985661440
eeyore.local:47766	total_items: 1980	memory: 196624/985661440
eeyore.local:39130	total_items: 2059	memory: 204847/985661440
eeyore.local:57410	total_items: 2053	memory: 203885/985661440

$ condor_q -format "%s\t" MemcachedEndpoint -format "total_items: %d\t" stat_total_items -format "memory: %d/" stat_bytes -format "%d\n" stat_limit_maxbytes
eeyore.local:50608	total_items: 3408	memory: 338522/985661440
eeyore.local:47766	total_items: 3542	memory: 351784/985661440
eeyore.local:39130	total_items: 3666	memory: 364552/985661440
eeyore.local:57410	total_items: 3600	memory: 357546/985661440

[1]  + done       for word in $(cat words); do; echo $word > $word; memcp --servers=$SERVERS ; 

Enjoy.

Custom resource attributes: Facter

November 29, 2011

Condor provides a large set of attributes, facts, about resources for scheduling and querying, but it does not provide everything possible. Instead, there is a mechanism to extend the set. Previously, we added FreeMemoryMB. The set can also be extended with information from Facter.

Facter provides an extensible set of facts about a system. To include facter facts, we need a means to translate them into attributes, plus a bit of Startd configuration.

$ facter
...
architecture => x86_64
domain => local
facterversion => 1.5.9
hardwareisa => x86_64
hardwaremodel => x86_64
physicalprocessorcount => 1
processor0 => Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz
selinux => true
selinux_config_mode => enforcing
swapfree => 3.98 GB
swapsize => 4.00 GB
...

The facts are of the form name => value, not very far off from ClassAd attributes. A simple script to convert all the facts into attributes with string values is,

/usr/libexec/condor/facter.sh

#!/bin/sh
type facter &> /dev/null || exit 1
facter | sed 's/\([^ ]*\) => \(.*\)/facter_\1 = "\2"/'

$ facter.sh
...
facter_architecture = "x86_64"
facter_domain = "local"
facter_facterversion = "1.5.9"
facter_hardwareisa = "x86_64"
facter_hardwaremodel = "x86_64"
facter_physicalprocessorcount = "1"
facter_processor0 = "Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz"
facter_selinux = "true"
facter_selinux_config_mode = "enforcing"
facter_swapfree = "3.98 GB"
facter_swapsize = "4.00 GB"
...

And the configuration, simply dropped into /etc/condor/config.d,

/etc/condor/config.d/49facter.config

FACTER = /usr/libexec/condor/facter.sh
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) FACTER
STARTD_CRON_FACTER_EXECUTABLE = $(FACTER)
STARTD_CRON_FACTER_PERIOD = 300

A condor_reconfig and the facter facts will be available,

$ condor_status -long | grep ^facter
...
facter_architecture = "x86_64"
facter_facterversion = "1.5.9"
facter_domain = "local"
facter_swapfree = "3.98 GB"
facter_selinux = "true"
facter_hardwaremodel = "x86_64"
facter_selinux_config_mode = "enforcing"
facter_processor0 = "Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz"
facter_selinux_mode = "targeted"
facter_hardwareisa = "x86_64"
facter_swapsize = "4.00 GB"
facter_physicalprocessorcount = "1"
...

For scheduling, just use the facter information in job requirements, e.g. requirements = facter_selinux == "true".

Or, query your pool to see what resources are not running selinux,

$ condor_status -const 'facter_selinux == "false"'
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
eeyore.local       LINUX      X86_64 Unclaimed Idle     0.030  3760  0+00:12:31
                     Machines Owner Claimed Unclaimed Matched Preempting
        X86_64/LINUX        1     0       0         1       0          0
               Total        1     0       0         1       0          0

Oops.

Getting started: Condor and EC2 – EC2 execute node

November 10, 2011

We have been over starting and managing instances from Condor, using condor_ec2_q to help, and importing existing instances. Here we will cover extending an existing pool using execute nodes run from EC2 instances. We will start with an existing pool, create an EC2 instance, configure the instance to run condor, authorize the instance to join the existing pool, and run a job.

Let us pretend that the node running your existing pool’s condor_collector and condor_schedd is called condor.condorproject.org.

These instructions require bi-directional connectivity between condor.condorproject.org and your EC2 instance. condor.condorproject.org must be connected to the internet with a publicly routable address; it cannot be behind a NAT, and ports for the Collector and Schedd must be open in its firewall. The EC2 execute nodes have to be able to connect to condor.condorproject.org to talk to the condor_collector and condor_schedd. Okay, let’s start.

I am going to use ami-60bd4609, a publicly available Fedora 15 AMI. You can either start the instance via the AWS console, or submit it by following previous instructions.

Once the instance is up and running, login and sudo yum install condor. Note, until BZ656562 is resolved, you will have to sudo mkdir /var/run/condor; sudo chown condor.condor /var/run/condor before starting condor. Start condor with sudo service condor start to get a personal condor.

Configuring condor on the instance is very similar to creating a multiple node pool. You will need to set the CONDOR_HOST, ALLOW_WRITE, and DAEMON_LIST,

# cat > /etc/condor/config.d/40execute_node.config
CONDOR_HOST = condor.condorproject.org
DAEMON_LIST = MASTER, STARTD
ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
^D

If you do not give condor.condorproject.org WRITE permissions, the Schedd will fail to start jobs. StartLog will report,

PERMISSION DENIED to unauthenticated@unmapped from host 128.105.291.82 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 128.105.291.82,condor.condorproject.org, hostname size = 1, original ip address = 128.105.291.82

Now remember, we need bi-directional connectivity. So condor.condorproject.org must be able to connect to the EC2 instance’s Startd. The condor_startd will listen on an ephemeral port by default. You could restrict it to a port range or use condor_shared_port. For simplicity, just force a non-ephemeral port of 3131,

# echo "STARTD_ARGS = -p 3131" >> /etc/condor/config.d/40execute_node.config

Now open TCP port 3131 in the instance’s iptables firewall; if you are using the Fedora 15 AMI, the firewall is off by default and needs no adjustment. Additionally, TCP port 3131 needs to be authorized in the instance’s security group. Use the AWS Console or ec2-authorize GROUP -p 3131.

If you miss either of these steps, the Schedd will fail to start jobs on the instance, likely with a message similar to,

Failed to send REQUEST_CLAIM to startd ec2-174-129-47-20.compute-1.amazonaws.com <174.129.47.20:3131>#1220911452#1#... for matt: SECMAN:2003:TCP connection to startd ec2-174-129-47-20.compute-1.amazonaws.com <174.129.47.20:3131>#1220911452#1#... for matt failed.
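As an aside, if you would rather not dedicate one fixed port, the port-range alternative mentioned above can be sketched like this. The range values are examples of my choosing, and /tmp stands in for the real config directory:

```shell
# Confine Condor's listeners to a port range instead of one fixed port,
# so a single firewall rule (and security group entry) covers them all.
# The LOWPORT/HIGHPORT values are examples; pick a range that suits you.
CONF=/tmp/40execute_node.config   # stand-in for /etc/condor/config.d/40execute_node.config
rm -f "$CONF"                     # start fresh for this example
cat >> "$CONF" <<EOF
LOWPORT = 30000
HIGHPORT = 30100
EOF
cat "$CONF"
```

With a range in place, you would open 30000-30100 in iptables and the security group instead of the single port 3131.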

A quick service condor restart on the instance, and a condor_status on condor.condorproject.org, would hopefully show that the instance has joined the pool. Except the instance has not been authorized yet. In fact, the CollectorLog will probably report,

PERMISSION DENIED to unauthenticated@unmapped from host 174.129.47.20 for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 174.129.47.20,ec2-174-129-47-20.compute-1.amazonaws.com
PERMISSION DENIED to unauthenticated@unmapped from host 174.129.47.20 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ADVERTISE_MASTER authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 174.129.47.20,ec2-174-129-47-20.compute-1.amazonaws.com

The instance needs to be authorized to advertise itself into the Collector. A good way to do that is to add,

ALLOW_ADVERTISE_MASTER = $(ALLOW_WRITE), ec2-174-129-47-20.compute-1.amazonaws.com
ALLOW_ADVERTISE_STARTD = $(ALLOW_WRITE), ec2-174-129-47-20.compute-1.amazonaws.com

to condor.condorproject.org’s configuration and reconfig with condor_reconfig. Note, ALLOW_WRITE is included here because I am assuming you are following the previous instructions; if you already have ALLOW_ADVERTISE_MASTER/STARTD configured, you should append to them instead. Also, appending for each new instance will get tedious. You could be very trusting and allow *.amazonaws.com, but it is better to use SSL or PASSWORD authentication. I will describe that some other time.
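Until then, a small helper can at least take the tedium out of appending hostnames. This is a sketch of my own; the config file path and the seeding behavior are choices for illustration, not part of the original recipe:

```shell
# add_advertise: append an EC2 hostname to both ADVERTISE ACL macros
# in a dedicated config file, seeding the file on first use.
add_advertise() {
    host=$1
    conf=$2
    if [ ! -f "$conf" ]; then
        printf 'ALLOW_ADVERTISE_MASTER = $(ALLOW_WRITE)\nALLOW_ADVERTISE_STARTD = $(ALLOW_WRITE)\n' > "$conf"
    fi
    # Append ", <host>" to the end of each ALLOW_ADVERTISE_* line.
    sed -i -e "/^ALLOW_ADVERTISE_/ s/\$/, $host/" "$conf"
}

# Example: authorize two instances, then run condor_reconfig (not shown).
conf=/tmp/30ec2_advertise.config   # stand-in for a file under /etc/condor/config.d
rm -f "$conf"
add_advertise ec2-174-129-47-20.compute-1.amazonaws.com "$conf"
add_advertise ec2-50-17-104-50.compute-1.amazonaws.com "$conf"
cat "$conf"
```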

After the reconfig, the instance will eventually show up in a condor_status listing.

$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
localhost.localdom LINUX      INTEL  Unclaimed Benchmar 0.430  1666  0+00:00:04

The name is not very helpful, but also not a problem.

It is time to submit a job.

$ condor_submit
Submitting job(s)
cmd = /bin/sleep
args = 1d
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue
^D
1 job(s) submitted to cluster 31.

$ condor_q
-- Submitter: condor.condorproject.org : <128.105.291.82:36900> : condor.condorproject.org
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  31.0   matt           11/11 11:11   0+00:00:00 I  0   0.0  sleep 1d
1 jobs; 1 idle, 0 running, 0 held

The job will stay idle forever, which is no good. The problem can be found in the SchedLog,

Enqueued contactStartd startd=<10.72.55.105:3131>
In checkContactQueue(), args = 0x9705798, host=<10.72.55.105:3131>
Requesting claim localhost.localdomain <10.72.55.105:3131>#1220743035#1#... for matt 31.0
attempt to connect to <10.72.55.105:3131> failed: Connection timed out (connect errno = 110).  Will keep trying for 45 total seconds (24 to go).

The root cause is that the instance has two addresses: a private one, which is not routable from condor.condorproject.org but which the instance is advertising,

$ condor_status -format "%s, " Name -format "%s\n" MyAddress
localhost.localdomain, <10.72.55.105:3131>

And a public one, which can be found from within the instance,

$ curl -f http://169.254.169.254/latest/meta-data/public-ipv4
174.129.47.20

Condor has a way to handle this. The TCP_FORWARDING_HOST configuration parameter can be set to the public address for the instance.

# echo "TCP_FORWARDING_HOST = $(curl -f http://169.254.169.254/latest/meta-data/public-ipv4)" >> /etc/condor/config.d/40execute_node.config

A condor_reconfig will apply the change, though a restart is needed to clear out the old entry first. Oops. Note, you cannot set TCP_FORWARDING_HOST to the instance’s public-hostname, because the public hostname is resolved within the instance, where it resolves to the instance’s internal, private address.

When setting TCP_FORWARDING_HOST, also set PRIVATE_NETWORK_INTERFACE to let the host talk to itself over its private address.

# echo "PRIVATE_NETWORK_INTERFACE = $(curl -f http://169.254.169.254/latest/meta-data/local-ipv4)" >> /etc/condor/config.d/40execute_node.config

Doing so will prevent the condor_startd from using its public address to send DC_CHILDALIVE messages to the condor_master, which might fail because of a firewall or security group setting,

attempt to connect to <174.129.47.20:34550> failed: Connection timed out (connect errno = 110).  Will keep trying for 390 total seconds (200 to go).
attempt to connect to <174.129.47.20:34550> failed: Connection timed out (connect errno = 110).
ChildAliveMsg: failed to send DC_CHILDALIVE to parent daemon at <174.129.47.20:34550> (try 1 of 3): CEDAR:6001:Failed to connect to <174.129.47.20:34550>

Or simply because the master does not trust the public address,

PERMISSION DENIED to unauthenticated@unmapped from host 174.129.47.20 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 174.129.47.20,ec2-174-129-47-20.compute-1.amazonaws.com, hostname size = 1, original ip address = 174.129.47.20
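The two knobs travel together, so a tiny helper that emits both lines can be handy. This is a sketch of my own; on a real instance the two arguments would come from the metadata-service curl commands shown above:

```shell
# Emit the pair of address knobs for an execute node. On an instance,
# you would call it with the metadata-service values, roughly:
#   ec2_address_config "$(curl -f .../public-ipv4)" "$(curl -f .../local-ipv4)"
# (URLs abbreviated; see the curl commands above.)
ec2_address_config() {
    printf 'TCP_FORWARDING_HOST = %s\n' "$1"
    printf 'PRIVATE_NETWORK_INTERFACE = %s\n' "$2"
}

# Example with the addresses from this post; append the output to the
# instance's /etc/condor/config.d/40execute_node.config.
ec2_address_config 174.129.47.20 10.72.55.105
```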

Now run that service condor restart and the public, routable address will be advertised,

$ condor_status -format "%s, " Name -format "%s\n" MyAddress
localhost.localdomain, <174.129.47.20:3131?noUDP>

The job will be started on the instance automatically,

$ condor_q -run
-- Submitter: condor.condorproject.org : <128.105.291.82:36900> : condor.condorproject.org
 ID      OWNER            SUBMITTED     RUN_TIME HOST(S)
  31.0   matt           11/11 11:11   0+00:00:11 localhost.localdomain

If you want to clean up the localhost.localdomain name, set the instance’s hostname and restart condor,

$ sudo hostname $(curl -f http://169.254.169.254/latest/meta-data/public-hostname)
$ sudo service condor restart
(wait for the startd to advertise)
$ condor_status -format "%s, " Name -format "%s\n" MyAddress
ec2-174-129-47-20.compute-1.amazonaws.com, <174.129.47.20:3131?noUDP>

In summary,

Configuration changes on condor.condorproject.org,

ALLOW_ADVERTISE_MASTER = $(ALLOW_WRITE), ec2-174-129-47-20.compute-1.amazonaws.com
ALLOW_ADVERTISE_STARTD = $(ALLOW_WRITE), ec2-174-129-47-20.compute-1.amazonaws.com

Setup on the instance,

# cat > /etc/condor/config.d/40execute_node.config
CONDOR_HOST = condor.condorproject.org
DAEMON_LIST = MASTER, STARTD
ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
STARTD_ARGS = -p 3131
^D
# echo "TCP_FORWARDING_HOST = $(curl -f http://169.254.169.254/latest/meta-data/public-ipv4)" >> /etc/condor/config.d/40execute_node.config
# echo "PRIVATE_NETWORK_INTERFACE = $(curl -f http://169.254.169.254/latest/meta-data/local-ipv4)" >> /etc/condor/config.d/40execute_node.config
# hostname $(curl -f http://169.254.169.254/latest/meta-data/public-hostname)

Getting started: Condor and EC2 – Importing instances with condor_ec2_link

November 7, 2011

Starting and managing instances described Condor’s powerful ability to start and manage EC2 instances. But what if you are already using something other than Condor to start your instances, such as the AWS Management Console?

Importing instances turns out to be straightforward, if you know how instances are started. In a nutshell, the condor_gridmanager executes a state machine and records its current state in a job attribute named GridJobId. To import an instance, submit a job that is already in the state where an instance id has been assigned: take a submit file and add +GridJobId = "ec2 https://ec2.amazonaws.com/ BOGUS INSTANCE-ID", where INSTANCE-ID is the actual identifier of the instance you want to import. For instance,

...
ec2_access_key_id = ...
ec2_secret_access_key = ...
...
+GridJobId = "ec2 https://ec2.amazonaws.com/ BOGUS i-319c3652"
queue
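Putting it together, a complete import submit file can be generated like so. Everything below is a placeholder sketch; substitute your own instance id and credential file paths:

```shell
# Generate a complete import submit file for an already-running instance.
# All values are placeholders; fill in your own id and key-file paths.
INSTANCE_ID=i-319c3652
cat > /tmp/import.sub <<EOF
universe = grid
grid_resource = ec2 https://ec2.amazonaws.com/
executable = imported-$INSTANCE_ID
log = \$(executable).\$(cluster).log
ec2_access_key_id = /path/to/AccessKeyID
ec2_secret_access_key = /path/to/SecretAccessKey
+GridJobId = "ec2 https://ec2.amazonaws.com/ BOGUS $INSTANCE_ID"
queue
EOF
cat /tmp/import.sub
```

Then condor_submit /tmp/import.sub would link the job to the running instance.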

It is important to get the ec2_access_key_id and ec2_secret_access_key correct. Without them Condor will not be able to communicate with EC2 and EC2_GAHP_LOG will report,

$ tail -n2 $(condor_config_val EC2_GAHP_LOG)
11/11/11 11:11:11 Failure response text was '<Response><Errors><Error><Code>AuthFailure</Code><Message>AWS was not able to validate the provided access credentials</Message></Error></Errors><RequestID>ab50f005-6d77-4653-9cec-298b2d475f6e</RequestID></Response>'.

This error will not be reported back into the job, for instance by putting it on hold; instead, the gridmanager will decide the EC2 service is down for the job. Oops.

$ grep down $(condor_config_val GRIDMANAGER_LOG)
11/11/11 11:11:11 [10697] resource https://ec2.amazonaws.com is now down
11/11/11 11:14:22 [10697] resource https://ec2.amazonaws.com is still down

To simplify the import, here is a script that will use ec2-describe-instances to get useful metadata about the instance and populate a submit file for you,

condor_ec2_link

#!/bin/sh

# Provide three arguments:
#  . instance id to link
#  . path to file with access key id
#  . path to file with secret access key

# TODO:
#  . Get EC2UserData (ec2-describe-instance-attribute --user-data)

ec2-describe-instances --show-empty-fields $1 | \
   awk '/^INSTANCE/ {id=$2; ami=$3; keypair=$7; type=$10; zone=$12; ip=$17; group=$29}
        /^TAG/ {name=$5}
        END {print "universe = grid\n",
                   "grid_resource = ec2 https://ec2.amazonaws.com\n",
                   "executable =", ami"-"name, "\n",
                   "log = $(executable).$(cluster).log\n",
                   "ec2_ami_id =", ami, "\n",
                   "ec2_instance_type =", type, "\n",
                   "ec2_keypair_file = name-"keypair, "\n",
                   "ec2_security_groups =", group, "\n",
                   "ec2_availability_zone =", zone, "\n",
                   "ec2_elastic_ip =", ip, "\n",
                   "+EC2InstanceName = \""id"\"\n",
                   "+GridJobId = \"$(grid_resource) BOGUS", id, "\"\n",
                   "queue\n"}' | \
      condor_submit -a "ec2_access_key_id = $2" \
                    -a "ec2_secret_access_key = $3"
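If you want to check the awk program without an AWS account, you can feed it a fabricated ec2-describe-instances record. The values below are made up; only the columns the script actually reads are filled in, and the rest are padding:

```shell
# Fabricate one ec2-describe-instances --show-empty-fields record:
# 29 whitespace-separated fields, with made-up values only in the
# columns the condor_ec2_link awk program reads.
awk 'BEGIN {
    for (i = 1; i <= 29; i++) f[i] = "x"
    f[1] = "INSTANCE";   f[2] = "i-319c3652";   f[3] = "ami-e1f53a88"
    f[7] = "TheKeyPair"; f[10] = "t1.micro";    f[12] = "us-east-1a"
    f[17] = "174.129.47.20"; f[29] = "sg-4f706226"
    line = f[1]
    for (i = 2; i <= 29; i++) line = line " " f[i]
    print line
    print "TAG instance i-319c3652 Name TheName"
}' > /tmp/fake-describe.txt

# Run the same field extraction the script performs, previewing a few
# of the submit-file lines it would hand to condor_submit. Note that
# $(grid_resource) is a submit macro expanded later by condor_submit.
awk '/^INSTANCE/ {id=$2; ami=$3; keypair=$7; type=$10; zone=$12; ip=$17; group=$29}
     /^TAG/ {name=$5}
     END {print "executable =", ami"-"name
          print "ec2_instance_type =", type
          print "ec2_security_groups =", group
          print "+GridJobId = \"$(grid_resource) BOGUS", id, "\""}' \
    /tmp/fake-describe.txt > /tmp/preview.sub

cat /tmp/preview.sub
```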

In action,

$ ./condor_ec2_link i-319c3652 /home/matt/Documents/AWS/Cert/AccessKeyID /home/matt/Documents/AWS/Cert/SecretAccessKey
Submitting job(s).
1 job(s) submitted to cluster 1739.

$ ./condor_ec2_q 1739
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1739.0   matt           11/11 11:11   0+00:00:00 I  0   0.0 ami-e1f53a88-TheNa
  Instance name: i-319c3652
  Groups: sg-4f706226
  Keypair file: /home/matt/Documents/AWS/name-TheKeyPair
  AMI id: ami-e1f53a88
  Instance type: t1.micro
1 jobs; 1 idle, 0 running, 0 held

(20 seconds later)

$ ./condor_ec2_q 1739
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1739.0   matt           11/11 11:11   0+00:00:01 R  0   0.0 ami-e1f53a88-TheNa
  Instance name: i-319c3652
  Hostname: ec2-50-17-104-50.compute-1.amazonaws.com
  Groups: sg-4f706226
  Keypair file: /home/matt/Documents/AWS/name-TheKeyPair
  AMI id: ami-e1f53a88
  Instance type: t1.micro
1 jobs; 0 idle, 1 running, 0 held

There are a few things that can be improved here, the most notable of which is the RUN_TIME: it counts from when the job was linked, not from when the instance actually launched. The Gridmanager periodically gets status data from EC2, which is how the EC2RemoteVirtualMachineName (Hostname) attribute gets populated on the job. The instance’s launch time is also available, but it is not used to fix up the RUN_TIME. Oops.
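For what it is worth, the fix-up is simple arithmetic once the launch time is in hand. A sketch with made-up timestamps (GNU date assumed; in practice the launch time would come from the EC2 status data):

```shell
# Compute how long an instance has been up from its launch time, the
# quantity RUN_TIME ought to reflect for an imported instance.
# Both timestamps are made up for the example.
launch="2011-11-07T11:11:11+0000"
now="2011-11-07T12:41:11+0000"
elapsed=$(( $(date -d "$now" +%s) - $(date -d "$launch" +%s) ))
echo "$elapsed seconds"    # 1.5 hours = 5400 seconds
```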

