Archive for the ‘Grid’ Category

Statistic changes in HTCondor 7.7

February 12, 2013

Notice to HTCondor 7.8 users –

Statistics implemented during the 7.5 series that landed in 7.7.0 were rewritten by the time 7.8 was released. If you were using the original statistics for monitoring and/or reporting, here is a table to help you map old (left column) to new (right column).

See – 7.6 -> 7.8 schedd stats
(embedding content requires javascript, which is not available on wordpress.com)

Note: the *Rate and *Mean attributes require math, and UpdateTime requires memory


Concurrency Limits: Group defaults

January 21, 2013

Concurrency limits protect resources by capping the number of jobs requiring a specific resource that can run at any one time.

For instance, limiting licenses and filer access at four regional data centers,

CONCURRENCY_LIMIT_DEFAULT = 15
license.north_LIMIT = 30
license.south_LIMIT = 30
license.east_LIMIT = 30
license.west_LIMIT = 45
filer.north_LIMIT = 75
filer.south_LIMIT = 150
filer.east_LIMIT = 75
filer.west_LIMIT = 75

Notice the repetition.

In addition to the repetition, every license.* and filer.* must be known and recorded in configuration. The set may be small in this example, but imagine imposing a limit on each user or each submission. The set of users is broad, dynamic and may differ by region. The set of submissions is a more extreme version of the users case, yet it is still realistic.

To simplify configuration management for groups of limits, a feature providing group defaults for limits was added in the Condor 7.8 series.

The feature requires that only the exceptions to a rule be called out explicitly in configuration. For instance, license.west and filer.south are the exceptions in the configuration above. The simplified configuration, available in 7.8,

CONCURRENCY_LIMIT_DEFAULT = 15
CONCURRENCY_LIMIT_DEFAULT_license = 30
CONCURRENCY_LIMIT_DEFAULT_filer = 75
license.west_LIMIT = 45
filer.south_LIMIT = 150
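
A job opts into a limit with the concurrency_limits submit command. As a minimal sketch, mirroring the sleep jobs used in the loop below, a submit file requesting one of these limits might look like,

universe = vanilla
executable = /bin/sleep
arguments = 1d
concurrency_limits = license.north
queue 1000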

In action,

$ for limit in license.north license.south license.east license.west filer.north filer.south filer.east filer.west; do echo queue 1000 | condor_submit -a cmd=/bin/sleep -a args=1d -a concurrency_limits=$limit; done

$ condor_q -format '%s\n' ConcurrencyLimits -const 'JobStatus == 2' | sort | uniq -c | sort -n
     30 license.east
     30 license.north
     30 license.south
     45 license.west
     75 filer.east
     75 filer.north
     75 filer.west
    150 filer.south

Partitionable slot utilization

October 1, 2012

There are already ways to get pool utilization information on a macro level. Until Condor 7.8 and the introduction of TotalSlot{Cpus,Memory,Disk}, there were no good ways to get utilization on a micro level. At least not with only the standard command-line tools.

Resource data is available per slot. Getting macro, pool utilization always requires aggregating data across multiple slots. Getting micro, slot utilization should not.

In a pool using partitionable slots, you can now get per slot utilization from the slot itself. There is no need for any extra tooling to perform aggregation or correlation. This means condor_status can directly provide utilization information on the micro, per slot level.

$ echo "           Name  Cpus Avail Util%  Memory Avail Util%"
$ condor_status -constraint "PartitionableSlot =?= TRUE" \
    -format "%15s" Name \
    -format "%6d" TotalSlotCpus \
    -format "%6d" Cpus \
    -format "%5d%%" "((TotalSlotCpus - Cpus) / (TotalSlotCpus * 1.0)) * 100" \
    -format "%8d" TotalSlotMemory \
    -format "%6d" Memory \
    -format "%5d%%" "((TotalSlotMemory - Memory) / (TotalSlotMemory * 1.0)) * 100" \
    -format "\n" TRUE

           Name  Cpus Avail Util%  Memory Avail Util%
  slot1@eeyore0    16    12   25%   65536 48128   26%
  slot2@eeyore0    16    14   12%   65536 58368   10%
  slot1@eeyore1    16    12   25%   65536 40960   37%
  slot2@eeyore1    16    15    6%   65536 62464    4%

This is especially useful when machines are configured into combinations of multiple partitionable slots or partitionable and static slots.

Someday the pool utilization script should be integrated with condor_status.

Custom resource attributes: Facter

November 29, 2011

Condor provides a large set of attributes, facts, about resources for scheduling and querying, but it does not provide everything possible. Instead, there is a mechanism to extend the set. Previously, we added FreeMemoryMB. The set can also be extended with information from Facter.

Facter provides an extensible set of facts about a system. To include facter facts, we need a way to translate them into attributes and a bit of Startd configuration to publish them.

$ facter
...
architecture => x86_64
domain => local
facterversion => 1.5.9
hardwareisa => x86_64
hardwaremodel => x86_64
physicalprocessorcount => 1
processor0 => Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz
selinux => true
selinux_config_mode => enforcing
swapfree => 3.98 GB
swapsize => 4.00 GB
...

The facts are of the form name => value, not very far off from ClassAd attributes. A simple script to convert all the facts into attributes with string values is,

/usr/libexec/condor/facter.sh

#!/bin/sh
# bail out quietly if facter is not installed
type facter >/dev/null 2>&1 || exit 1
# rewrite "name => value" lines as ClassAd string attributes named facter_<name>
facter | sed 's/\([^ ]*\) => \(.*\)/facter_\1 = "\2"/'

$ facter.sh
...
facter_architecture = "x86_64"
facter_domain = "local"
facter_facterversion = "1.5.9"
facter_hardwareisa = "x86_64"
facter_hardwaremodel = "x86_64"
facter_physicalprocessorcount = "1"
facter_processor0 = "Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz"
facter_selinux = "true"
facter_selinux_config_mode = "enforcing"
facter_swapfree = "3.98 GB"
facter_swapsize = "4.00 GB"
...

And the configuration, simply dropped into /etc/condor/config.d,

/etc/condor/config.d/49facter.config

FACTER = /usr/libexec/condor/facter.sh
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) FACTER
STARTD_CRON_FACTER_EXECUTABLE = $(FACTER)
STARTD_CRON_FACTER_PERIOD = 300

After a condor_reconfig, the facter facts will be available,

$ condor_status -long | grep ^facter
...
facter_architecture = "x86_64"
facter_facterversion = "1.5.9"
facter_domain = "local"
facter_swapfree = "3.98 GB"
facter_selinux = "true"
facter_hardwaremodel = "x86_64"
facter_selinux_config_mode = "enforcing"
facter_processor0 = "Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz"
facter_selinux_mode = "targeted"
facter_hardwareisa = "x86_64"
facter_swapsize = "4.00 GB"
facter_physicalprocessorcount = "1"
...

For scheduling, just use the facter information in job requirements, e.g. requirements = facter_selinux == "true".
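
For illustration, a minimal submit file sketch that only matches machines reporting SELinux enabled (the executable here is just a placeholder),

universe = vanilla
executable = /bin/hostname
requirements = facter_selinux == "true"
queue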

Or, query your pool to see what resources are not running selinux,

$ condor_status -const 'facter_selinux == "false"'
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
eeyore.local       LINUX      X86_64 Unclaimed Idle     0.030  3760  0+00:12:31
                     Machines Owner Claimed Unclaimed Matched Preempting
        X86_64/LINUX        1     0       0         1       0          0
               Total        1     0       0         1       0          0

Oops.

Getting started: Condor and EC2 – condor_ec2_q tool

November 2, 2011

While Getting started with Condor and EC2, it is useful to display the EC2-specific attributes on jobs. This is a script that mirrors condor_q output, using its formatting parameters, and adds details for EC2 jobs.

condor_ec2_q:

#!/bin/sh

# NOTE:
#  . Requires condor_q >= 7.5.2, old classads do not
#    have %
#  . When running, jobs show RUN_TIME of their current
#    run, not accumulated, which would require adding
#    in RemoteWallClockTime
#  . See condor_utils/condor_q.cpp:encode_status for
#    JobStatus map

# TODO:
#  . Remove extra ShadowBday==0 test,
#    condor_gridmanager < 7.7.5 (3a896d01) did not
#    delete ShadowBday when a job was not running.
#    RUN_TIME of held EC2 jobs would be wrong.

echo ' ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD'
condor_q \
   -format '%4d.' ClusterId \
   -format '%-3d ' ProcId \
   -format '%-14s ' Owner \
   -format '%-11s ' 'formatTime(QDate,"%m/%d %H:%M")' \
   -format '%3d+' 'ifThenElse(ShadowBday =!= UNDEFINED, ifThenElse(ShadowBday != 0, time() - ShadowBday, int(RemoteWallClockTime)), int(RemoteWallClockTime)) / (60*60*24)' \
   -format '%02d:' '(ifThenElse(ShadowBday =!= UNDEFINED, ifThenElse(ShadowBday != 0, time() - ShadowBday, int(RemoteWallClockTime)), int(RemoteWallClockTime)) % (60*60*24)) / (60*60)' \
   -format '%02d:' '(ifThenElse(ShadowBday =!= UNDEFINED, ifThenElse(ShadowBday != 0, time() - ShadowBday, int(RemoteWallClockTime)), int(RemoteWallClockTime)) % (60*60)) / 60' \
   -format '%02d ' 'ifThenElse(ShadowBday =!= UNDEFINED, ifThenElse(ShadowBday != 0, time() - ShadowBday, int(RemoteWallClockTime)), int(RemoteWallClockTime)) % 60' \
   -format '%-2s ' 'substr("?IRXCH>S", JobStatus, 1)' \
   -format '%-3d ' JobPrio \
   -format '%-4.1f ' ImageSize/1024.0 \
   -format '%-18.18s' 'strcat(Cmd," ",Arguments)' \
   -format '\n' Owner \
   -format '  Instance name: %s\n' EC2InstanceName \
   -format '  Hostname: %s\n' EC2RemoteVirtualMachineName \
   -format '  Keypair file: %s\n' EC2KeyPairFile \
   -format '  User data: %s\n' EC2UserData \
   -format '  User data file: %s\n' EC2UserDataFile \
   -format '  AMI id: %s\n' EC2AmiID \
   -format '  Instance type: %s\n' EC2InstanceType \
   "$@" | awk 'BEGIN {St["I"]=0;St["R"]=0;St["H"]=0} \
   	       {St[$6]++; print} \
   	       END {for (i=0;i<=7;i++) jobs+=St[substr("?IRXCH>S",i,1)]; \
	       	    print jobs, "jobs;", \
		          St["I"], "idle,", St["R"], "running,", St["H"], "held"}'

In action,

$ condor_q
-- Submitter: eeyore.local :  : eeyore.local
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
1728.0   matt           10/31 23:09   0+00:04:15 H  0   0.0  EC2_Instance-ami-6
1732.0   matt           11/1  01:43   0+05:16:46 R  0   0.0  EC2_Instance-ami-6
5 jobs; 0 idle, 4 running, 1 held

$ ./condor_ec2_q 
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1728.0   matt           10/31 23:09   0+00:04:15 H  0   0.0  EC2_Instance-ami-6
  Instance name: i-31855752
  Hostname: ec2-50-19-175-62.compute-1.amazonaws.com
  Keypair file: /home/matt/Documents/AWS/EC2_Instance-ami-60bd4609.1728.pem
  User data: Hello EC2_Instance-ami-60bd4609!
  AMI id: ami-60bd4609
  Instance type: m1.small
1732.0   matt           11/01 01:43   0+05:16:48 R  0   0.0  EC2_Instance-ami-6
  Instance name: i-a90edcca
  Hostname: ec2-107-20-6-83.compute-1.amazonaws.com
  Keypair file: /home/matt/Documents/AWS/EC2_Instance-ami-60bd4609.1732.pem
  User data: Hello EC2_Instance-ami-60bd4609!
  AMI id: ami-60bd4609
  Instance type: m1.small
5 jobs; 0 idle, 4 running, 1 held

Getting started: Condor and EC2 – Starting and managing instances

October 31, 2011

Condor has the ability to start and manage the lifecycle of instances in EC2. The integration was released in early 2008 with version 7.1.

The integration started with users being able to upload AMIs to S3 and manage instances using the EC2 and S3 SOAP APIs. At the time, mid-2007, creating a useful AMI required so much user interaction that the complexity of supporting S3 AMI upload was not justified. The implementation settled on pure instance lifecycle management, a very powerful base and a core Condor strength.

A point of innovation during the integration was how to transactionally start instances. The instance’s security group (originally) and ssh keypair (finally), were used as a tracking key. This innovation turned into an RFE and eventually resulted in idempotent instance creation, a feature all Cloud APIs should support. In fact, all distributed resource management APIs should support it, more on this sometime.

Today, in Condor 7.7 and MRG 2, Condor uses the EC2 Query API via the ec2_gahp, and that’s our starting point. We’ll build a submit file, start an instance, get key metadata about the instance, and show how to control the instance’s lifecycle just like any other job’s.

First, the submit file,

universe = grid
grid_resource = ec2 https://ec2.amazonaws.com/

ec2_access_key_id = /home/matt/Documents/AWS/Cert/AccessKeyID
ec2_secret_access_key = /home/matt/Documents/AWS/Cert/SecretAccessKey

ec2_ami_id = ami-60bd4609
ec2_instance_type = m1.small

ec2_user_data = Hello $(executable)!

executable = EC2_Instance-$(ec2_ami_id)

log = $(executable).$(cluster).log

ec2_keypair_file = $(executable).$(cluster).pem

queue

The universe must be grid. The grid resource string is ec2 https://ec2.amazonaws.com/, and the URL may be changed if a proxy is needed, or for debugging with a redirect.

The ec2_access_key_id and ec2_secret_access_key are full paths to files containing your credentials for accessing EC2. These are needed so Condor can act on your behalf when talking to EC2. They need not and should not be world readable. Take a look at EC2 User Guide: Amazon EC2 Credentials for information on obtaining your credentials.
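
Since the files contain secrets, it is worth locking their permissions down. A quick sketch, assuming the paths used in the submit file above,

$ chmod 600 /home/matt/Documents/AWS/Cert/AccessKeyID \
            /home/matt/Documents/AWS/Cert/SecretAccessKey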

The ec2_ami_id and ec2_instance_type are required. They specify the AMI off which to base the instance and the type of instance to create, respectively. ami-60bd4609 is an EBS backed Fedora 15 image supported by the Fedora Cloud SIG. A list of instance types can be found in EC2 User Guide: Instance Families and Types. I picked m1.small because the AMI is 32-bit.

ec2_user_data is optional, but when provided gives the instance some extra data to act on when starting up. It is described in EC2 User Guide: Using Instance Metadata. This is an incredibly powerful feature, allowing parameterization of AMIs.

The executable field is simply a label here. It should really be called label or name and integrate with the AWS Console.

The log is our old friend the structured log of lifecycle events.

The ec2_keypair_file is the file where Condor will put the ssh keypair used for accessing the instance. This is a file instead of a keypair name because Condor generates a new keypair for each instance as part of tracking the instances. Eventually Condor should use EC2’s idempotent RunInstances.

Second, let’s submit the job,

$ condor_submit f15-ec2.sub            
Submitting job(s).
1 job(s) submitted to cluster 1710.

$ condor_q
-- Submitter: eeyore.local :  : eeyore.local
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
1710.0   matt           10/30 20:58   0+00:00:00 I  0   0.0  EC2_Instance-ami-6
1 jobs; 1 idle, 0 running, 0 held

Condor is starting up a condor_gridmanager, which is in turn starting up an ec2_gahp to communicate with EC2.

$ pstree | grep condor 
     |-condor_master-+-aviary_query_se
     |               |-condor_collecto---4*[{condor_collect}]
     |               |-condor_negotiat---4*[{condor_negotia}]
     |               |-condor_schedd-+-condor_gridmana---ec2_gahp---2*[{ec2_gahp}]
     |               |               |-condor_procd
     |               |               `-4*[{condor_schedd}]
     |               |-condor_startd-+-condor_procd
     |               |               `-4*[{condor_startd}]
     |               `-4*[{condor_master}]

Third, when the job is running the instance will also be started in EC2. Take a look at the log file, EC2_Instance-ami-60bd4609.1710.log, for some information. Also, the instance name and hostname will be available on the job ad,

$ condor_q -format "Instance name: %s\n" EC2InstanceName -format "Instance hostname: %s\n" EC2RemoteVirtualMachineName -format "Keypair: %s\n" EC2KeyPairFile
Instance name: i-7f37e31c
Instance hostname: ec2-184-72-158-77.compute-1.amazonaws.com
Keypair: /home/matt/Documents/AWS/EC2_Instance-ami-60bd4609.1710.pem

The instance name can be used with the AWS Console or ec2-describe-instances,

$ ec2-describe-instances i-7f37e31c
RESERVATION	r-f6592498	821108636519	default
INSTANCE	i-7f37e31c	ami-60bd4609	ec2-184-72-158-77.compute-1.amazonaws.com	ip-10-118-37-239.ec2.internal	running	SSH_eeyore.local_eeyore.local#1710.0#1320022728	0		m1.small	2011-10-31T00:59:01+0000	us-east-1c	aki-407d9529			monitoring-disabled	184.72.158.77	10.118.37.239			ebs					paravirtual	xen		sg-e5a18c8c	default
BLOCKDEVICE	/dev/sda1	vol-fe4aaf93	2011-10-31T00:59:24.000Z

The instance hostname along with the ec2_keypair_file will let us access the instance,

$ ssh -i /home/matt/Documents/AWS/EC2_Instance-ami-60bd4609.1710.pem ec2-user@ec2-184-72-158-77.compute-1.amazonaws.com
The authenticity of host 'ec2-184-72-158-77.compute-1.amazonaws.com (184.72.158.77)' can't be established.
RSA key fingerprint is f2:6e:da:bb:53:47:34:b6:2e:fe:63:62:a5:c8:a5:2e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-184-72-158-77.compute-1.amazonaws.com,184.72.158.77' (RSA) to the list of known hosts.

Appliance:	Fedora-15 appliance 1.1
Hostname:	localhost.localdomain
IP Address:	10.118.37.239

[ec2-user@localhost ~]$ 

Notice that the Fedora instances use a default account of ec2-user, not root.

Also, the user data is available in the instance. Any program could read and act on it.

[ec2-user@localhost ~]$ curl http://169.254.169.254/latest/user-data
Hello EC2_Instance-ami-60bd4609!

Finally, to control the instance’s lifecycle, simply issue condor_hold or condor_rm and the instance will be terminated. You can also run shutdown -H now inside the instance. Here I’ll run sudo shutdown -H now.

[ec2-user@localhost ~]$ sudo shutdown -H now
Broadcast message from ec2-user@localhost.localdomain on pts/0 (Mon, 31 Oct 2011 01:11:55 -0400):
The system is going down for system halt NOW!
[ec2-user@localhost ~]$
Connection to ec2-184-72-158-77.compute-1.amazonaws.com closed by remote host.
Connection to ec2-184-72-158-77.compute-1.amazonaws.com closed.

You will notice that condor_q does not immediately reflect that the instance is terminated, even though ec2-describe-instances will. This is because Condor only polls for status changes in EC2 every 5 minutes by default. The GRIDMANAGER_JOB_PROBE_INTERVAL configuration param is the control.
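
If five minutes is too slow while experimenting, the interval can be lowered. A sketch, assuming a file dropped into /etc/condor/config.d (the file name is made up) and followed by a condor_reconfig,

# /etc/condor/config.d/42ec2_probe.config
# poll EC2 for instance status changes every 60 seconds instead of the default 300
GRIDMANAGER_JOB_PROBE_INTERVAL = 60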

In this case, the instance was shut down at Sun Oct 30 21:12:52 EDT 2011 and Condor noticed at 21:14:40,

$ tail -n11 EC2_Instance-ami-60bd4609.1710.log
005 (1710.000.000) 10/30 21:14:40 Job terminated.
	(1) Normal termination (return value 0)
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	0  -  Run Bytes Sent By Job
	0  -  Run Bytes Received By Job
	0  -  Total Bytes Sent By Job
	0  -  Total Bytes Received By Job
...

Bonus: use periodic_hold or periodic_remove to cap how long an instance can run. Add periodic_hold = (time() - ShadowBday) >= 60 to the submit file and your instance will be terminated, by Condor, after 60 seconds.

$ tail -n6 EC2_Instance-ami-60bd4609.1713.log
001 (1713.000.000) 10/30 21:33:39 Job executing on host: ec2 https://ec2.amazonaws.com/
...
012 (1713.000.000) 10/30 21:37:54 Job was held.
	The job attribute PeriodicHold expression '( time() - ShadowBday ) >= 60' evaluated to TRUE
	Code 3 Subcode 0
...

The instance was not terminated at exactly 60 seconds because the PERIODIC_EXPR_INTERVAL configuration defaults to 300 seconds, just like the GRIDMANAGER_JOB_PROBE_INTERVAL.

Imagine keeping your EC2 instance inventory in Condor. Condor’s policy engine and extensible metadata for jobs automatically extend to instances running in EC2.

Submitting a DAG via Aviary using Python

September 16, 2011

Submitting individual jobs through Condor’s various interfaces is, unsurprisingly, the first thing people do. A quick second is submitting DAGs. I have previously discussed this in Java with BirdBath.

Aviary is a suite of APIs that expose Condor features via powerful, easy to use developer interfaces. It builds on experience from other implementations and takes an approach of exposing common use cases through clean abstractions, while maintaining the Condor philosophy of giving experts access to extended features.

The code is maintained in the contributions section of the Condor repository and is documented in the Grid Developer Guide.

The current implementation provides a SOAP interface for job submission, control and query. It is broken into two parts: a plugin to the condor_schedd that exposes submission and control, and a daemon, the aviary_query_server, exposing the data querying capabilities.

Installation on Fedora 15 and beyond is a simple yum install condor-aviary. The condor-aviary package includes configuration placed in /etc/condor/config.d. A reconfig of the condor_master, to start the aviary_query_server, and a restart of the condor_schedd, to load the plugin, are necessary.
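
As a sketch, the install and the subsequent bounce might look like the following, run as root,

# yum install condor-aviary
# condor_reconfig              # master picks up the new config and starts the aviary_query_server
# condor_restart -schedd       # restart only the condor_schedd so it loads the plugin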

Once installed, there are examples in the repository, including a python submit script.

Starting with the submit.py above, submit_dag.py is a straightforward extension following the CondorSubmitDAG.java example.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright 2009-2011 Red Hat, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# uses Suds - https://fedorahosted.org/suds/
from suds.client import Client
import sys, pwd, os, logging, argparse

def attr_builder(type_, format):
    def attr(name, value):
        attr = client.factory.create("ns0:Attribute")
        attr.name = name
        attr.type = type_
        attr.value = format % (value,)
        return attr
    return attr
string_attr=attr_builder('STRING', '%s')
int_attr=attr_builder('INTEGER', '%d')
expr_attr=attr_builder('EXPRESSION', '%s')


parser = argparse.ArgumentParser(description='Submit a job remotely via SOAP.')
parser.add_argument('-v', '--verbose', action='store_true',
                    default=False, help='enable SOAP logging')
parser.add_argument('-u', '--url', action='store', nargs='?', dest='url',
                    default='http://localhost:9090/services/job/submitJob',
                    help='http or https URL prefix to be added to cmd')
parser.add_argument('dag', action='store', help='full path to dag file')
args =  parser.parse_args()


uid = pwd.getpwuid(os.getuid())[0] or "nobody"

client = Client('file:/var/lib/condor/aviary/services/job/aviary-job.wsdl')

client.set_options(location=args.url)

if args.verbose:
    logging.basicConfig(level=logging.INFO)
    logging.getLogger('suds.client').setLevel(logging.DEBUG)
    print client

try:
    result = client.service.submitJob(
        '/usr/bin/condor_dagman',
        '-f -l . -Debug 3 -AutoRescue 1 -DoRescueFrom 0 -Allowversionmismatch -Lockfile %s.lock -Dag %s' % (args.dag, args.dag),
        uid,
        args.dag[:args.dag.rindex('/')],
        args.dag[args.dag.rindex('/')+1:],
        [],
        [string_attr('Env', '_CONDOR_MAX_DAGMAN_LOG=0;_CONDOR_DAGMAN_LOG=%s.dagman.out' % (args.dag,)),
         int_attr('JobUniverse', 7),
         string_attr('UserLog', args.dag + '.dagman.log'),
         string_attr('RemoveKillSig', 'SIGUSR1'),
         expr_attr('OnExitRemove', '(ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >= 0 && ExitCode <= 2))')]
	)
except Exception, e:
    print 'invocation failed at: ', args.url
    print e
    sys.exit(1)	

if result.status.code != 'OK':
    print result.status.code,'; ', result.status.text
    sys.exit(1)

print args.verbose and result or result.id.job
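
As a usage sketch (the DAG path and service URL here are placeholders), invoking the script looks something like,

$ python submit_dag.py -u http://localhost:9090/services/job/submitJob /home/matt/dags/diamond.dag

On success it prints the id of the newly submitted DAGMan job.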

Getting Started: Multiple node Condor pool with firewalls

June 21, 2011

Creating a Condor pool with no firewalls up is quite a simple task. Before the condor_shared_port daemon, doing the same with firewalls was a bit painful.

Condor uses dynamic ports for everything except the Collector. The Collector endpoint is the bootstrap. This means a Schedd might start up on a random ephemeral port, and each of its shadows might as well. This causes headaches for firewalls, as large ranges of ports need to be opened for communication. There are ways to control the ephemeral range used. Unfortunately, doing so only narrows the port range somewhat, does not guarantee that Condor is what is actually on those ports, and can limit scale.
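
For reference, the ephemeral range is controlled with the LOWPORT and HIGHPORT configuration knobs. A sketch, not a recommendation,

# confine Condor's inbound and outbound ephemeral ports to a narrow range
LOWPORT = 9600
HIGHPORT = 9700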

The condor_shared_port daemon allows Condor to use a single inbound port on a machine.

Again, using Fedora 15. I had no luck with firewalld and firewall-cmd. Instead I fell back to using straight iptables.

The first thing to do is pick a port for Condor to use on your machines. The simplest thing to do is pick 9618, the port typically known as the Collector’s port.

On all machines where Condor is going to run, you want to –

# lokkit --enabled

# service iptables start
Starting iptables (via systemctl):  [  OK  ]

# service iptables status
Table: filter
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination
1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
2    ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0
3    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
4    REJECT     all  --  0.0.0.0/0            0.0.0.0/0 reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination
1    REJECT     all  --  0.0.0.0/0            0.0.0.0/0 reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination

If you want to ssh to the machine again, be sure to insert rules above the “REJECT all -- …” rule –

# iptables -I INPUT 4 -p tcp -m tcp --dport 22 -j ACCEPT

And open a port, both TCP and UDP, for the shared port daemon –

# iptables -I INPUT 5 -p tcp -m tcp --dport condor -j ACCEPT
# iptables -I INPUT 6 -p udp -m udp --dport condor -j ACCEPT
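
To keep the rules across reboots, they can be written to the file the iptables service reads at start. A sketch, assuming the stock Fedora layout,

# iptables-save > /etc/sysconfig/iptables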

Next you want to configure Condor to use the shared port daemon, with port 9618 –

# cat > /etc/condor/config.d/41shared_port.config
SHARED_PORT_ARGS = -p 9618
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
COLLECTOR_HOST = $(CONDOR_HOST)?sock=collector
USE_SHARED_PORT = TRUE
^D

In order, SHARED_PORT_ARGS tells the shared port daemon to listen on port 9618, DAEMON_LIST tells the master to start the shared port daemon, COLLECTOR_HOST specifies that the collector will be on the sock named “collector”, and finally USE_SHARED_PORT tells all daemons to register and use the shared port daemon.

After you put that configuration on all your systems, run service condor restart, and go.

You will have the shared port daemon listening on 9618 (condor), and all communication between machines will route through it.

# lsof -i | grep $(pidof condor_shared_port)
condor_sh 31040  condor    8u  IPv4  74105      0t0  TCP *:condor (LISTEN)
condor_sh 31040  condor    9u  IPv4  74106      0t0  UDP *:condor

That’s right, you have a condor pool with firewalls and a single port opened for communication on each node.

A protocol mismatch: Some unnecessary Starter->Shadow communication

May 1, 2011

During previous runs that pushed an out of the box Schedd to see what it could handle with respect to job rates, Shadows were repeatedly complaining about being delayed in reusing claims.

It turned out that a bug introduced in Sept 2010 was causing a protocol mismatch which, when removed, allows the Schedd to sustain about 95 job completions per second.

During the Condor 7.5 development series, the Schedd gained the ability to reuse existing Shadows to serially run jobs. This change has huge benefits, reducing load on the Schedd and the underlying OS. A topic for another time: Condor relies heavily on process separation in its architecture, and Shadows are effectively pooled workers – think workers in a thread pool.

Here are the relevant logs at the time of the delayed claim reuse.

ShadowLog –

04/21/11 15:12:49 (1.1) (23946): Reporting job exit reason 100 and attempting to fetch new job.
04/21/11 15:12:49 (1.1) (23946): Reading job ClassAd from STDIN
04/21/11 15:12:53 (1.1) (23946): Switching to new job 1.1
04/21/11 15:12:53 (?.?) (23946): Initializing a VANILLA shadow for job 1.1
04/21/11 15:12:53 (1.1) (23946): no UserLog found
04/21/11 15:12:53 (1.1) (23946): ClaimStartd is true, trying to claim startd 
04/21/11 15:12:53 (1.1) (23946): Requesting claim description
04/21/11 15:12:53 (1.1) (23946): in RemoteResource::initStartdInfo()
04/21/11 15:12:53 (1.1) (23946): Request was NOT accepted for claim description
04/21/11 15:12:53 (1.1) (23946): Completed REQUEST_CLAIM to startd description
04/21/11 15:12:53 (1.1) (23946): Entering DCStartd::activateClaim()
04/21/11 15:12:53 (1.1) (23946): DCStartd::activateClaim: successfully sent command, reply is: 2
04/21/11 15:12:53 (1.1) (23946): Request to run on none  was DELAYED (previous job still being vacated)
04/21/11 15:12:53 (1.1) (23946): activateClaim(): will try again in 1 seconds
04/21/11 15:12:54 (1.1) (23946): Entering DCStartd::activateClaim()
04/21/11 15:12:54 (1.1) (23946): DCStartd::activateClaim: successfully sent command, reply is: 1
04/21/11 15:12:54 (1.1) (23946): Request to run on none  was ACCEPTED

StartLog –

04/21/11 15:12:49 Changing activity: Idle -> Busy
04/21/11 15:12:49 Received TCP command 60008 (DC_CHILDALIVE) from unauthenticated@unmapped , access level DAEMON
04/21/11 15:12:49 Received job ClassAd update from starter.
04/21/11 15:12:49 Received job ClassAd update from starter.
04/21/11 15:12:49 Closing job ClassAd update socket from starter.
04/21/11 15:12:49 Called deactivate_claim_forcibly()
04/21/11 15:12:49 In Starter::kill() with pid 24278, sig 3 (SIGQUIT)
04/21/11 15:12:49 Send_Signal(): Doing kill(24278,3) [SIGQUIT]
04/21/11 15:12:49 in starter:killHard starting kill timer
04/21/11 15:12:52 9400356 kbytes available for ".../execute"
04/21/11 15:12:52 Publishing ClassAd for 'MIPS' to slot 1
04/21/11 15:12:52 Publishing ClassAd for 'KFLOPS' to slot 1
04/21/11 15:12:52 Trying to update collector 
04/21/11 15:12:52 Attempting to send update via TCP to collector eeyore.local 
04/21/11 15:12:52 Sent update to 1 collector(s)
04/21/11 15:12:53 Got REQUEST_CLAIM while in Claimed state, ignoring.
04/21/11 15:12:53 DaemonCore: No more children processes to reap.
04/21/11 15:12:53 Got activate claim while starter is still alive.
04/21/11 15:12:53 Telling shadow to try again later.
04/21/11 15:12:53 Starter pid 24278 exited with status 0
04/21/11 15:12:53 Canceled hardkill-starter timer (2535)
04/21/11 15:12:53 State change: starter exited
04/21/11 15:12:53 Changing activity: Busy -> Idle
04/21/11 15:12:54 Got activate_claim request from shadow ()

StarterLog –

04/21/11 15:12:49 Job 1.1 set to execute immediately
04/21/11 15:12:49 Starting a VANILLA universe job with ID: 1.1
04/21/11 15:12:49 IWD: /tmp
04/21/11 15:12:49 About to exec /bin/true 
04/21/11 15:12:49 Create_Process succeeded, pid=24279
04/21/11 15:12:49 Process exited, pid=24279, status=0
04/21/11 15:12:49 Got SIGQUIT.  Performing fast shutdown.
04/21/11 15:12:49 ShutdownFast all jobs.
04/21/11 15:12:53 condor_read() failed: recv() returned -1, errno = 104 Connection reset by peer, reading 5 bytes from .
04/21/11 15:12:53 IO: Failed to read packet header
04/21/11 15:12:53 **** condor_starter (condor_STARTER) pid 24278 EXITING WITH STATUS 0
04/21/11 15:12:54 Setting maximum accepts per cycle 4.

First, the ShadowLog shows the Shadow is being told, by the Startd, to delay claim reuse. The Shadow decides to sleep() for a second. Second, the StartLog shows the Startd rejecting the claim reuse because the Starter still exists. Third, the StarterLog shows the Starter still exists because it is waiting in condor_read().

Using D_FULLDEBUG and D_NETWORK, it turned out that the Starter was trying to talk to the Shadow, sending an update, when the Shadow was not expecting to process anything from the Starter. The Shadow was not responding, causing the Starter to wait and eventually timeout.

The issue was the update not being properly blocked from executing during the shutdown. A two line patch and a re-run of the submission-completion test resulted in zero delays in claim reuse and a bump to about 95 jobs processed per second.

The dip in the rates is from SHADOW_WORKLIFE and a topic for the discussion on pooled Shadows.

Quick walk with Condor: Looking at Scheduler performance w/o notification

April 24, 2011

Recently I did a quick walk with the Schedd, looking at its submission and completion rates. Out of the box, submitting jobs with no special consideration for performance, the Schedd comfortably ran 55 jobs per second.

Without sending notifications, the Schedd can sustain a rate of 85 jobs per second.

I ran the test again, this time with notification=never and in two configurations: first, with 500,000 jobs submitted upfront; second, with submissions occurring during completions. The goal was to get a sense of performance when the Shadow is not burdened with sending email notifications of job completions, and to figure out how the Schedd performs with respect to servicing condor_submit at the same time it is running jobs.

First, submitting 500,000 jobs and then letting them drain showed a sustained rate of about 86 jobs per second.

Upfront submission of 500K jobs, drain at rate of 85 jobs per second

Second, building up the submission rate to about 100 jobs per second showed a sustained rate of about 83 jobs per second (81 shown in graph below).

Submission and drain rates of 85 jobs per second

The results are quite satisfying, and show the Schedd can sustain a reasonably high job execution rate at the same time it services submissions.

