Posts Tagged ‘Cloud’

A recipe for starting cloud images with virt-install

January 8, 2014

I’m a fan of using the same OS image across multiple environments. So, I’m a fan of using cloud images, those with cloud-init installed, even outside of a cloud.

The trick to this is properly triggering the NoCloud datasource. It’s actually more of a pain than you would think, and not very well documented. Here’s my recipe (from Fedora 19),

wget http://download.fedoraproject.org/pub/fedora/linux/releases/test/20-Beta/Images/x86_64/Fedora-x86_64-20-Beta-20131106-sda.raw.xz
xz -d Fedora-x86_64-20-Beta-20131106-sda.raw.xz

echo "#cloud-config\npassword: fedora\nchpasswd: {expire: False}\nssh_pwauth: True" > user-data

NAME=node0
cp Fedora-x86_64-20-Beta-20131106-sda.raw $NAME.raw
echo "instance-id: $NAME; local-hostname: $NAME" > meta-data
genisoimage -output $NAME-cidata.iso -volid cidata -joliet -rock user-data meta-data
virt-install --import --name $NAME --ram 512 --vcpus 2 --disk $NAME.raw --disk $NAME-cidata.iso,device=cdrom --network bridge=virbr0

Login with username fedora and password fedora.

You’ll also want to boost the amount of RAM if you plan on doing anything interesting in the guest.

You can repeat the last five commands (from setting NAME through virt-install) to start multiple guests, just make sure to change NAME each time.

If you want to ssh into the guest, you can use virsh console, login and use ifconfig / ip addr to find the address. Or, you can use arp -e and virsh dumpxml to match MAC addresses. Or just arp -e before and after starting the guest.
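For example, a minimal sketch of the MAC-matching approach, assuming the default libvirt NAT network and that the guest has already sent some traffic so it appears in the ARP cache:

# MAC address libvirt assigned to the guest
MAC=$(virsh dumpxml $NAME | awk -F"'" '/mac address/ {print $2}')
# Find the matching entry in the host's ARP cache
arp -e | grep -i "$MAC"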

Note, you need to follow the meta-data and user-data lines very closely. If you don’t, you may not trigger the NoCloud datasource properly; it took me a number of tries to get it right. Also, the volid needs to be “cidata” or the ISO won’t be found, though that label turns out to be a configurable parameter for NoCloud. The chpasswd bit prevents being prompted to change your password the first time you login.
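For reference, here is what the two seed files should look like on disk for NAME=node0 (a sketch; user-data must begin with #cloud-config on its own first line, and both files need real newlines, not literal \n):

$ cat user-data
#cloud-config
password: fedora
chpasswd: {expire: False}
ssh_pwauth: True

$ cat meta-data
instance-id: node0
local-hostname: node0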

Consider becoming a fan of consistent OS images across your environments too!


Your API is a feature, give it real resource management

January 14, 2013

So much these days is about distributed resource management. That’s anything that can be created and destroyed in the cloud[0]. Proper management is especially important when the resource’s existence is tied to a real economy, e.g. your user’s credit card[1].

EC2 instance creation without idempotent RunInstance

Above is the state machine required to ensure that resources created in AWS EC2 are not lost, i.e. do not have to be manually cleaned up. The green arrows represent the error-free flow. The rest handles errors or external state changes, e.g. a user-terminated operation. This is from before EC2 supported idempotent instance creation.

The state machine rewritten to use idempotent instance creation,

EC2 instance creation with idempotent RunInstance

What’s going on here? Handling failure during resource creation.

The important failure to consider as a client is what happens if you ask your resource provider to create something and you never hear back. This is a distributed system, there are numerous reasons why you may not hear back. For simplicity, consider the client code crashed between sending the request and receiving the response.

The solution is to construct a transaction for resource creation[2]. To construct a transaction, you need to atomically associate a piece of information with the resource at creation time. We’ll call that piece of information an anima.

In the old EC2 API, the only way to construct an anima was through controlling a security group or keypair. Since neither is tied to a real economy, both are reasonable options. The non-idempotent state machine above uses the keypair as it is less resource intensive for EC2.

On creation failure and with the anima in hand[3], the client must search the remote system for the anima before reattempting creation. This is handled by the GM_CHECK_VM state above.

Unfortunately, without explicit support in the API, i.e. lookup by anima, the search can be unnatural and expensive. For example, EC2 instances are not indexed on keypair. Searching requires a client side scan of all instances.

With the introduction of idempotent RunInstances, the portion of the state machine for constructing and locating the anima is reduced to the GM_SAVE_CLIENT_TOKEN state, an entirely local operation. The reduction in complexity is clear.
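To make the idea concrete, here is a minimal sketch using today’s aws CLI (which post-dates this post); the client token plays the role of the anima, and the AMI id and storage path are placeholders:

# Generate the token and record it in reliable storage before the first attempt
TOKEN=$(uuidgen)
echo $TOKEN > /reliable/storage/node0.token

# Safe to retry after a crash or timeout: EC2 returns the same instance
# for the same client token instead of launching a duplicate
aws ec2 run-instances --image-id ami-12345678 --instance-type t2.micro \
    --client-token "$TOKEN"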

After two years, EC2 appears to be the only API providing idempotent instance creation[4], though other APIs are starting to provide atomic anima association, often through metadata or instance attributes, and some even provide lookup.

You should provide an idempotent resource creation operation in your API too!

[0] “in the cloud” – really anywhere in any distributed system!
[1] Making money from forgotten or misplaced resources is a short term play.
[2] Alternatively, you can choose an architecture with a janitor process, which will bring its own complexities.
[3] “in hand” – so long as your hand is reliable storage.
[4] After a quick survey, I’m looking at you Rackspace, RimuHosting, vCloud, OpenNebula, OpenStack, Eucalyptus, GoGrid, Deltacloud, Google Compute Engine and Gandi.

Service as a Job: Hadoop MapReduce TaskTracker

June 6, 2012

Continuing to build on other examples of services run as jobs, such as HDFS NameNode and HDFS DataNode, here is an example for the Hadoop MapReduce framework’s TaskTracker.

mapred_tasktracker.sh

#!/bin/sh -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1
JOBTRACKER_ENDPOINT=$2

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the tasktracker and gather logs
function term {
   ./bin/hadoop-daemon.sh stop tasktracker
# Useful if we can transfer data back
#   tar czf logs.tgz logs
#   cp logs.tgz $_CONDOR_SCRATCH_DIR
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . http.address must be set to port 0 (ephemeral)
cat > conf/mapred-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$JOBTRACKER_ENDPOINT</value>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
EOF

# Try to shutdown cleanly
trap term SIGTERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_PID_DIR=$PWD
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
./bin/hadoop-daemon.sh start tasktracker

# Wait for pid file
PID_FILE=$(echo hadoop-*-tasktracker.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Report back some static data about the tasktracker
# e.g. condor_chirp set_job_attr SomeAttr SomeData
# None at the moment.

# While the tasktracker is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # None at the moment.
   sleep 30
done

The job description below uses the same templating technique as the DataNode example. The description uses a variable JobTrackerAddress, provided on the command-line as an argument to condor_submit.

mapred_tasktracker.job

# Submit w/ condor_submit -a JobTrackerAddress=<address>
# e.g. <address> = $HOSTNAME:9001

cmd = mapred_tasktracker.sh
args = hadoop-1.0.1-bin.tar.gz $(JobTrackerAddress)

transfer_input_files = hadoop-1.0.1-bin.tar.gz

# RFE: Ability to get output files even when job is removed
#transfer_output_files = logs.tgz
#transfer_output_remaps = "logs.tgz logs.$(cluster).tgz"
output = tasktracker.$(cluster).out
error = tasktracker.$(cluster).err

log = tasktracker.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue

From here you can run condor_submit -a JobTrackerAddress=$HOSTNAME:9001 mapred_tasktracker.job a few times and build up a MapReduce cluster, assuming you are already running a JobTracker. You can also hit the JobTracker’s web interface to watch TaskTrackers check in.

Assuming you are also running an HDFS instance, you can run some jobs against your new cluster.

For fun, I ran time hadoop jar hadoop-test-1.0.1.jar mrbench -verbose -maps 100 -inputLines 100 three times against 4 TaskTrackers, 8 TaskTrackers and 12 TaskTrackers. The resulting run times were,

TaskTrackers  runtime (mm:ss)
 4            01:26
 8            00:50
12            00:38

Service as a Job: HDFS NameNode

April 16, 2012

Scheduling an HDFS DataNode is a powerful function. However, an operational HDFS instance also requires a NameNode. Here is an example of how a NameNode can be scheduled, followed by scheduled DataNodes, to create an HDFS instance.

From here, HDFS instances can be dynamically created on shared resources. Workflows can be built to manage, grow and shrink HDFS instances. Multiple HDFS instances can be deployed on a single set of resources.

The control script is based on hdfs_datanode.sh. It discovers the NameNode’s endpoints and chirps them.

hdfs_namenode.sh

#!/bin/sh -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the namenode
function term {
   ./bin/hadoop-daemon.sh stop namenode
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . fs.default.name,dfs.http.address must be set to port 0 (ephemeral)
#  . dfs.name.dir must be in _CONDOR_SCRATCH_DIR for cleanup
# FIX: Figure out why a hostname is required, instead of 0.0.0.0:0 for
#      fs.default.name
cat > conf/hdfs-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$HOSTNAME:0</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>$_CONDOR_SCRATCH_DIR/name</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
EOF

# Try to shutdown cleanly
trap term SIGTERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
export HADOOP_PID_DIR=$PWD

./bin/hadoop namenode -format
./bin/hadoop-daemon.sh start namenode

# Wait for pid file
PID_FILE=$(echo hadoop-*-namenode.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Wait for the log
LOG_FILE=$(echo $HADOOP_LOG_DIR/hadoop-*-namenode-*.log)
while [ ! -s $LOG_FILE ]; do sleep 1; done

# It would be nice if there were a way to get these without grepping logs
while [ ! $(grep "IPC Server listener on" $LOG_FILE) ]; do sleep 1; done
IPC_PORT=$(grep "IPC Server listener on" $LOG_FILE | sed 's/.* on \(.*\):.*/\1/')
while [ ! $(grep "Jetty bound to port" $LOG_FILE) ]; do sleep 1; done
HTTP_PORT=$(grep "Jetty bound to port" $LOG_FILE | sed 's/.* to port \(.*\)$/\1/')

# Record the port number where everyone can see it
condor_chirp set_job_attr NameNodeIPCAddress \"$HOSTNAME:$IPC_PORT\"
condor_chirp set_job_attr NameNodeHTTPAddress \"$HOSTNAME:$HTTP_PORT\"

# While namenode is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # Nothing to do.
   sleep 30
done

The job description file is standard at this point.

hdfs_namenode.job

cmd = hdfs_namenode.sh
args = hadoop-1.0.1-bin.tar.gz

transfer_input_files = hadoop-1.0.1-bin.tar.gz

#output = namenode.$(cluster).out
#error = namenode.$(cluster).err

log = namenode.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue

In operation –

Submit the namenode and find its endpoints,

$ condor_submit hdfs_namenode.job
Submitting job(s).
1 job(s) submitted to cluster 208.

$ condor_q -long 208 | grep NameNode
NameNodeHTTPAddress = "eeyore.local:60182"
NameNodeIPCAddress = "eeyore.local:38174"

Open a browser window to http://eeyore.local:60182 to find the cluster summary,

1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :        0 KB
  DFS Used                              :        0 KB
  Non DFS Used                          :        0 KB
  DFS Remaining                         :        0 KB
  DFS Used%                             :       100 %
  DFS Remaining%                        :         0 %
 Live Nodes                             :           0
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0

Add a datanode,

$ condor_submit -a NameNodeAddress=hdfs://eeyore.local:38174 hdfs_datanode.job 
Submitting job(s).
1 job(s) submitted to cluster 209.

Refresh the cluster summary,

1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :     9.84 GB
  DFS Used                              :       28 KB
  Non DFS Used                          :     9.54 GB
  DFS Remaining                         :   309.79 MB
  DFS Used%                             :         0 %
  DFS Remaining%                        :      3.07 %
 Live Nodes                             :           1
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0

And another,

$ condor_submit -a NameNodeAddress=hdfs://eeyore.local:38174 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 210.

Refresh,

1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :    19.69 GB
  DFS Used                              :       56 KB
  Non DFS Used                          :    19.26 GB
  DFS Remaining                         :   435.51 MB
  DFS Used%                             :         0 %
  DFS Remaining%                        :      2.16 %
 Live Nodes                             :           2
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0

All the building blocks necessary to run HDFS on scheduled resources.

Service as a Job: HDFS DataNode

April 4, 2012

Building on other examples of services run as jobs, such as Tomcat, Qpidd and memcached, here is an example for the Hadoop Distributed File System’s DataNode.

Below is the control script for the datanode. It mirrors the memcached script, but does not publish statistics. However, datanode statistics/metrics could be pulled and published.

hdfs_datanode.sh

#!/bin/sh -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1
NAMENODE_ENDPOINT=$2

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the datanode and gather logs
function term {
   ./bin/hadoop-daemon.sh stop datanode
# Useful if we can transfer data back
#   tar czf logs.tgz logs
#   cp logs.tgz $_CONDOR_SCRATCH_DIR
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . dfs.data.dir must be in _CONDOR_SCRATCH_DIR for cleanup
#  . address,http.address,ipc.address must be set to port 0 (ephemeral)
cat > conf/hdfs-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>$NAMENODE_ENDPOINT</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>$_CONDOR_SCRATCH_DIR/data</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:0</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:0</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
EOF

# Try to shutdown cleanly
trap term SIGTERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_PID_DIR=$PWD
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
./bin/hadoop-daemon.sh start datanode

# Wait for pid file
PID_FILE=$(echo hadoop-*-datanode.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Report back some static data about the datanode
# e.g. condor_chirp set_job_attr SomeAttr SomeData
# None at the moment.

# While the datanode is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # None at the moment.
   sleep 30
done

The job description below uses a templating technique. The description uses a variable NameNodeAddress, which is not defined in the description. Instead, the value is provided as an argument to condor_submit. In fact, a complete job can be defined without a description file, e.g. echo queue | condor_submit -a executable=/bin/sleep -a args=1d, but more on that some other time.

hdfs_datanode.job

# Submit w/ condor_submit -a NameNodeAddress=<address>
# e.g. <address> = hdfs://$HOSTNAME:2007

cmd = hdfs_datanode.sh
args = hadoop-1.0.1-bin.tar.gz $(NameNodeAddress)

transfer_input_files = hadoop-1.0.1-bin.tar.gz

# RFE: Ability to get output files even when job is removed
#transfer_output_files = logs.tgz
#transfer_output_remaps = "logs.tgz logs.$(cluster).tgz"
output = datanode.$(cluster).out
error = datanode.$(cluster).err

log = datanode.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue

hadoop-1.0.1-bin.tar.gz is available from http://archive.apache.org/dist/hadoop/core/hadoop-1.0.1/

Finally, here is a running example,

A namenode is already running –

$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)

Submit a datanode, knowing the namenode’s IPC port is 9000 –

$ condor_submit -a NameNodeAddress=hdfs://$HOSTNAME:9000 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 169.

$ condor_q
-- Submitter: eeyore.local : <127.0.0.1:59889> : eeyore.local
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD 
 169.0   matt            3/26 15:17   0+00:00:16 R  0   0.0 hdfs_datanode.sh h
1 jobs; 0 idle, 1 running, 0 held

Storage is now available, though not very much as my disk is almost full –

$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 63810015232 (59.43 GB)
Present Capacity: 4907495424 (4.57 GB)
DFS Remaining: 4907466752 (4.57 GB)
DFS Used: 28672 (28 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Submit 9 more datanodes –

$ condor_submit -a NameNodeAddress=hdfs://$HOSTNAME:9000 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 170.
...
1 job(s) submitted to cluster 178.

$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 638100152320 (594.28 GB)
Present Capacity: 40399958016 (37.63 GB)
DFS Remaining: 40399671296 (37.63 GB)
DFS Used: 286720 (280 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 10 (10 total, 0 dead)

At this point you can run a workload against the storage, visit the namenode at http://localhost:50070, or simply use ./bin/hadoop fs to interact with the filesystem.
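For example, a quick smoke test against the new filesystem (a sketch; run from the unpacked hadoop-1.0.1 directory using the same conf/hdfs-site.xml as the dfsadmin commands above):

$ ./bin/hadoop fs -conf conf/hdfs-site.xml -mkdir /tmp
$ ./bin/hadoop fs -conf conf/hdfs-site.xml -put /etc/hosts /tmp/hosts
$ ./bin/hadoop fs -conf conf/hdfs-site.xml -ls /tmp
$ ./bin/hadoop fs -conf conf/hdfs-site.xml -cat /tmp/hosts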

Remember, all the datanodes were dispatched by a scheduler, run alongside existing workload on your pool, and are completely manageable by standard policies.

Amazon S3 – Object Expiration, what about Instance Expiration

December 28, 2011

AWS is providing APIs that take distributed computing concerns into account. One could call them cloud concerns these days. Unfortunately, not all cloud providers are doing the same.

Idempotent instance creation showed up in Sept 2010, providing the ability to simplify interactions with EC2. Idempotent resource allocation is critical for distributed systems.

S3 object expiration appeared in Dec 2011, allowing for service-side managed deallocation of S3 resources.

Next up? It would be great to have an EC2 instance expiration feature. One that could be (0) assigned per instance and (1) adjusted while the instance exists. Bonus if it can also be (2) adjusted from within the instance without credentials. Think leases.

Service as a Job: Memcached

December 5, 2011

Running services such as Tomcat or Qpidd shows how to schedule and manage a service’s life-cycle via Condor. It is also possible to gather and centralize statistics about a service as it runs. Here is an example of how, using memcached.

As with tomcat and qpidd, there is a control script and a job description.

New in the control script for memcached is a loop that monitors the service and chirps statistics back into the job ad.

memcached.sh

#!/bin/sh

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

PORT_FILE=$TMP/.ports

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off memcached.
function term {
   rm -f $PORT_FILE
   kill %1
}

# Spawn memcached, and make sure we can shut it down cleanly.
trap term SIGTERM
# memcached will write port information to env(MEMCACHED_PORT_FILENAME)
env MEMCACHED_PORT_FILENAME=$PORT_FILE memcached -p -1 "$@" &

# We might have to wait for the port
while [ ! -s $PORT_FILE ]; do sleep 1; done

# The port file's format is:
#  TCP INET: 56697
#  TCP INET6: 47318
#  UDP INET: 34453
#  UDP INET6: 54891
sed -i -e 's/ /_/' -e 's/\(.*\): \(.*\)/\1=\2/' $PORT_FILE
source $PORT_FILE
rm -f $PORT_FILE

# Record the port number where everyone can see it
condor_chirp set_job_attr MemcachedEndpoint \"$HOSTNAME:$TCP_INET\"
condor_chirp set_job_attr TCP_INET $TCP_INET
condor_chirp set_job_attr TCP_INET6 $TCP_INET6
condor_chirp set_job_attr UDP_INET $UDP_INET
condor_chirp set_job_attr UDP_INET6 $UDP_INET6

# While memcached is running, collect and report back stats
while kill -0 %1; do
   # Collect stats and chirp them back into the job ad
   echo stats | nc localhost $TCP_INET | \
    grep -v -e END -e version | tr '\r' '\0' | \
     awk '{print "stat_"$2,$3}' | \
      while read -r stat; do
         condor_chirp set_job_attr $stat
      done
   sleep 30
done

A refresher about chirp: jobs are stored in condor_schedd processes and described using the ClassAd language, an extensible set of name-value pairs. chirp is a tool a job can use while it runs to modify its ClassAd stored in the schedd.
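A minimal illustration (a sketch; MyAttr is an arbitrary attribute name), for a job submitted with +WantIOProxy = TRUE:

# Inside the running job: add or update an attribute in the job's classad
condor_chirp set_job_attr MyAttr 42

# On the submit machine: read it back out of the schedd
condor_q -format 'MyAttr = %d\n' MyAttr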

The job description, passed to condor_submit, is vanilla except for how arguments are passed to memcached.sh. The dollardollar syntax (see man condor_submit) allows memcached to use as much memory as is available on the slot where it gets scheduled; slots may have different amounts of Memory available.

memcached.job

cmd = memcached.sh
args = -m $$(Memory)

log = memcached.log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = if_needed
when_to_transfer_output = on_exit

queue

An example; note that the set of memcached servers to use is generated from condor_q,

$ condor_submit -a "queue 4" memcached.job
Submitting job(s)....
4 job(s) submitted to cluster 80.

$ condor_q -format "%s\t" MemcachedEndpoint -format "total_items: %d\t" stat_total_items -format "memory: %d/" stat_bytes -format "%d\n" stat_limit_maxbytes
eeyore.local:50608	total_items: 0	memory: 0/985661440
eeyore.local:47766	total_items: 0	memory: 0/985661440
eeyore.local:39130	total_items: 0	memory: 0/985661440
eeyore.local:57410	total_items: 0	memory: 0/985661440

$ SERVERS=$(condor_q -format "%s," MemcachedEndpoint); for word in $(cat words); do echo $word > $word; memcp --servers=$SERVERS $word; \rm $word; done &
[1] 959

$ condor_q -format "%s\t" MemcachedEndpoint -format "total_items: %d\t" stat_total_items -format "memory: %d/" stat_bytes -format "%d\n" stat_limit_maxbytes
eeyore.local:50608	total_items: 480	memory: 47740/985661440
eeyore.local:47766	total_items: 446	memory: 44284/985661440
eeyore.local:39130	total_items: 504	memory: 50140/985661440
eeyore.local:57410	total_items: 490	memory: 48632/985661440

$ condor_q -format "%s\t" MemcachedEndpoint -format "total_items: %d\t" stat_total_items -format "memory: %d/" stat_bytes -format "%d\n" stat_limit_maxbytes
eeyore.local:50608	total_items: 1926	memory: 191264/985661440
eeyore.local:47766	total_items: 1980	memory: 196624/985661440
eeyore.local:39130	total_items: 2059	memory: 204847/985661440
eeyore.local:57410	total_items: 2053	memory: 203885/985661440

$ condor_q -format "%s\t" MemcachedEndpoint -format "total_items: %d\t" stat_total_items -format "memory: %d/" stat_bytes -format "%d\n" stat_limit_maxbytes
eeyore.local:50608	total_items: 3408	memory: 338522/985661440
eeyore.local:47766	total_items: 3542	memory: 351784/985661440
eeyore.local:39130	total_items: 3666	memory: 364552/985661440
eeyore.local:57410	total_items: 3600	memory: 357546/985661440

[1]  + done       for word in $(cat words); do; echo $word > $word; memcp --servers=$SERVERS ; 

Enjoy.

Getting started: Condor and EC2 – Importing instances with condor_ec2_link

November 7, 2011

Starting and managing instances describes Condor’s powerful ability to start and manage EC2 instances, but what if you are already using something other than Condor to start your instances, such as the AWS Management Console?

Importing instances turns out to be straightforward, if you know how instances are started. In a nutshell, the condor_gridmanager executes a state machine and records its current state in an attribute named GridJobId. To import an instance, submit a job that is already in the state where an instance id has been assigned. You can take a submit file and add +GridJobId = "ec2 https://ec2.amazonaws.com/ BOGUS INSTANCE-ID". The INSTANCE-ID needs to be the actual identifier of the instance you want to import. For instance,

...
ec2_access_key_id = ...
ec2_secret_access_key = ...
...
+GridJobId = "ec2 https://ec2.amazonaws.com/ BOGUS i-319c3652"
queue

It is important to get the ec2_access_key_id and ec2_secret_access_key correct. Without them Condor will not be able to communicate with EC2 and EC2_GAHP_LOG will report,

$ tail -n2 $(condor_config_val EC2_GAHP_LOG)
11/11/11 11:11:11 Failure response text was
'AuthFailure: AWS was not able to validate the provided access credentials
(RequestID ab50f005-6d77-4653-9cec-298b2d475f6e)'.

This error will not be reported back into the job (which would put it on hold); instead, the gridmanager will think EC2 is down for the job. Oops.

$ grep down $(condor_config_val GRIDMANAGER_LOG)
11/11/11 11:11:11 [10697] resource https://ec2.amazonaws.com is now down
11/11/11 11:14:22 [10697] resource https://ec2.amazonaws.com is still down

To simplify the import, here is a script that will use ec2-describe-instances to get useful metadata about the instance and populate a submit file for you,

condor_ec2_link

#!/bin/sh

# Provide three arguments:
#  . instance id to link
#  . path to file with access key id
#  . path to file with secret access key

# TODO:
#  . Get EC2UserData (ec2-describe-instance-attribute --user-data)

ec2-describe-instances --show-empty-fields $1 | \
   awk '/^INSTANCE/ {id=$2; ami=$3; keypair=$7; type=$10; zone=$12; ip=$17; group=$29}
        /^TAG/ {name=$5}
        END {print "universe = grid\n",
                   "grid_resource = ec2 https://ec2.amazonaws.com\n",
                   "executable =", ami"-"name, "\n",
                   "log = $(executable).$(cluster).log\n",
                   "ec2_ami_id =", ami, "\n",
                   "ec2_instance_type =", type, "\n",
                   "ec2_keypair_file = name-"keypair, "\n",
                   "ec2_security_groups =", group, "\n",
                   "ec2_availability_zone =", zone, "\n",
                   "ec2_elastic_ip =", ip, "\n",
                   "+EC2InstanceName = \""id"\"\n",
                   "+GridJobId = \"$(grid_resource) BOGUS", id, "\"\n",
                   "queue\n"}' | \
      condor_submit -a "ec2_access_key_id = $2" \
                    -a "ec2_secret_access_key = $3"

In action,

$ ./condor_ec2_link i-319c3652 /home/matt/Documents/AWS/Cert/AccessKeyID /home/matt/Documents/AWS/Cert/SecretAccessKey
Submitting job(s).
1 job(s) submitted to cluster 1739.

$ ./condor_ec2_q 1739
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1739.0   matt           11/11 11:11   0+00:00:00 I  0   0.0 ami-e1f53a88-TheNa
  Instance name: i-319c3652
  Groups: sg-4f706226
  Keypair file: /home/matt/Documents/AWS/name-TheKeyPair
  AMI id: ami-e1f53a88
  Instance type: t1.micro
1 jobs; 1 idle, 0 running, 0 held

(20 seconds later)

$ ./condor_ec2_q 1739
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1739.0   matt           11/11 11:11   0+00:00:01 R  0   0.0 ami-e1f53a88-TheNa
  Instance name: i-319c3652
  Hostname: ec2-50-17-104-50.compute-1.amazonaws.com
  Groups: sg-4f706226
  Keypair file: /home/matt/Documents/AWS/name-TheKeyPair
  AMI id: ami-e1f53a88
  Instance type: t1.micro
1 jobs; 0 idle, 1 running, 0 held

There are a few things that can be improved here, the most notable of which is the RUN_TIME. The Gridmanager gets status data from EC2 periodically. This is how the EC2RemoteVirtualMachineName (Hostname) gets populated on the job. The instance’s launch time is also available. Oops.

Getting started: Condor and EC2 – condor_ec2_q tool

November 2, 2011

While Getting started with Condor and EC2, it is useful to display the EC2-specific attributes on jobs. This is a script that mirrors condor_q output, using its formatting parameters, and adds details for EC2 jobs.

condor_ec2_q:

#!/bin/sh

# NOTE:
#  . Requires condor_q >= 7.5.2, old classads do not
#    have %
#  . When running, jobs show RUN_TIME of their current
#    run, not accumulated, which would require adding
#    in RemoteWallClockTime
#  . See condor_utils/condor_q.cpp:encode_status for
#    JobStatus map

# TODO:
#  . Remove extra ShadowBday==0 test,
#    condor_gridmanager < 7.7.5 (3a896d01) did not
#    delete ShadowBday when a job was not running.
#    RUN_TIME of held EC2 jobs would be wrong.

echo ' ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD'
condor_q \
   -format '%4d.' ClusterId \
   -format '%-3d ' ProcId \
   -format '%-14s ' Owner \
   -format '%-11s ' 'formatTime(QDate,"%m/%d %H:%M")' \
   -format '%3d+' 'ifThenElse(ShadowBday =!= UNDEFINED, ifThenElse(ShadowBday != 0, time() - ShadowBday, int(RemoteWallClockTime)), int(RemoteWallClockTime)) / (60*60*24)' \
   -format '%02d:' '(ifThenElse(ShadowBday =!= UNDEFINED, ifThenElse(ShadowBday != 0, time() - ShadowBday, int(RemoteWallClockTime)), int(RemoteWallClockTime)) % (60*60*24)) / (60*60)' \
   -format '%02d:' '(ifThenElse(ShadowBday =!= UNDEFINED, ifThenElse(ShadowBday != 0, time() - ShadowBday, int(RemoteWallClockTime)), int(RemoteWallClockTime)) % (60*60)) / 60' \
   -format '%02d ' 'ifThenElse(ShadowBday =!= UNDEFINED, ifThenElse(ShadowBday != 0, time() - ShadowBday, int(RemoteWallClockTime)), int(RemoteWallClockTime)) % 60' \
   -format '%-2s ' 'substr("?IRXCH>S", JobStatus, 1)' \
   -format '%-3d ' JobPrio \
   -format '%-4.1f ' ImageSize/1024.0 \
   -format '%-18.18s' 'strcat(Cmd," ",Arguments)' \
   -format '\n' Owner \
   -format '  Instance name: %s\n' EC2InstanceName \
   -format '  Hostname: %s\n' EC2RemoteVirtualMachineName \
   -format '  Keypair file: %s\n' EC2KeyPairFile \
   -format '  User data: %s\n' EC2UserData \
   -format '  User data file: %s\n' EC2UserDataFile \
   -format '  AMI id: %s\n' EC2AmiID \
   -format '  Instance type: %s\n' EC2InstanceType \
   "$@" | awk 'BEGIN {St["I"]=0;St["R"]=0;St["H"]=0} \
   	       {St[$6]++; print} \
   	       END {for (i=0;i<=7;i++) jobs+=St[substr("?IRXCH>S",i,1)]; \
	       	    print jobs, "jobs;", \
		          St["I"], "idle,", St["R"], "running,", St["H"], "held"}'

In action,

$ condor_q
-- Submitter: eeyore.local :  : eeyore.local
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
1728.0   matt           10/31 23:09   0+00:04:15 H  0   0.0  EC2_Instance-ami-6
1732.0   matt           11/1  01:43   0+05:16:46 R  0   0.0  EC2_Instance-ami-6
5 jobs; 0 idle, 4 running, 1 held

$ ./condor_ec2_q 
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1728.0   matt           10/31 23:09   0+00:04:15 H  0   0.0  EC2_Instance-ami-6
  Instance name: i-31855752
  Hostname: ec2-50-19-175-62.compute-1.amazonaws.com
  Keypair file: /home/matt/Documents/AWS/EC2_Instance-ami-60bd4609.1728.pem
  User data: Hello EC2_Instance-ami-60bd4609!
  AMI id: ami-60bd4609
  Instance type: m1.small
1732.0   matt           11/01 01:43   0+05:16:48 R  0   0.0  EC2_Instance-ami-6
  Instance name: i-a90edcca
  Hostname: ec2-107-20-6-83.compute-1.amazonaws.com
  Keypair file: /home/matt/Documents/AWS/EC2_Instance-ami-60bd4609.1732.pem
  User data: Hello EC2_Instance-ami-60bd4609!
  AMI id: ami-60bd4609
  Instance type: m1.small
5 jobs; 0 idle, 4 running, 1 held

Condor the Cloud Scheduler @ Open Source Cloud Computing Forum, 10 Feb 2010

April 23, 2010

Red Hat hosted another Open Source Cloud Computing forum on 10 February 2010. It included a number of new technologies under development at Red Hat. I presented a second session in a series on Condor, this time covering Condor as a cloud scheduler. Like before, the proceedings of the forum will be available online for 12 months. The Condor: Cloud Scheduler presentation is also attached to this blog.

