Condor Configuration: Subsystem and Local-name

Subsystem

Every Condor daemon has a built-in notion of its subsystem. The names are fairly logical, e.g. condor_startd’s subsystem is STARTD while condor_collector’s subsystem is COLLECTOR. See the pattern? As of 7.4 there are about 30, including MASTER, SCHEDD, SHADOW, STARTER, TOOL, GRIDMANAGER, VM_GAHP, …

All Condor daemons read the same configuration files, so the subsystem is a useful mechanism for varying configuration parameters between daemons. For instance, the configuration parameter NOT_RESPONDING_TIMEOUT controls how long a daemon can go without sending a keep-alive to its parent. It defaults to one hour, but maybe you do not want to wait an hour if your condor_collector hangs. To shorten the wait you can set COLLECTOR.NOT_RESPONDING_TIMEOUT = 1800 (the value is in seconds), which means the condor_collector can stay unresponsive for at most 30 minutes.
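As a small illustration of the prefix, the fragment below puts a pool-wide value next to the collector-only override. Writing out the one-hour default explicitly is only for comparison; it is not required.

```
# Read by every daemon (value in seconds; one hour is the default).
NOT_RESPONDING_TIMEOUT = 3600
# Read only by daemons whose subsystem is COLLECTOR.
COLLECTOR.NOT_RESPONDING_TIMEOUT = 1800
```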

Local-name

As you surely know, the condor_master reads the DAEMON_LIST parameter to figure out which daemons it should run, e.g. DAEMON_LIST = MASTER, STARTD runs a condor_startd. It is common to run multiple copies of a daemon. For deployment testing, for example, an installation may want a shadow pool that runs only no-op-like jobs on a newer version of Condor than is in production, while sharing the production hardware. (I want to meet the folks who buy an extra 5,000-node cluster just for production testing.) In such a configuration the DAEMON_LIST may be MASTER, STARTD, SHADOW_STARTD. Pretend SHADOW_STARTD is defined to be some different condor_startd version.

SHADOW_STARTD = $(STARTD)
DAEMON_LIST = MASTER, STARTD, SHADOW_STARTD

This means the condor_master tries to run two condor_startd daemons. That is not enough configuration to make it work, though. Each Startd will read the same parameters, e.g. STARTD_LOG, EXECUTE, or policy like START, which is probably not what was intended. In fact, having two Startds share an EXECUTE directory is a recipe for disaster.

Both STARTD and SHADOW_STARTD are the condor_startd executable, even if they are different versions, so they both have the same subsystem. This is where local-name comes to the rescue. Each daemon can be given a -local-name argument:

SHADOW_STARTD_ARGS = -local-name SHADOW

Local-name provides the needed differentiator. You can now set configuration specific to the SHADOW_STARTD:

STARTD.SHADOW.EXECUTE = $(LOCAL_DIR)/shadow_execute

Keep in mind that this is still not enough configuration to run two Startds on a single system. You will probably also need to set STARTD.SHADOW.ADDRESS_FILE, STARTD.SHADOW.STARTD_NAME and STARTD.SHADOW.STARTD_LOG, and disable USE_PROCD.
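Pulling the pieces together, a minimal sketch of the two-Startd setup might look like the following. The parameter names are the ones mentioned in this post. The right-hand-side values for the address file, name and log are hypothetical placeholders, and $(LOG) is assumed to be defined as in a stock configuration. Treat it as a starting point to adapt, not a tested configuration.

```
# Run a second condor_startd alongside the regular one.
SHADOW_STARTD = $(STARTD)
SHADOW_STARTD_ARGS = -local-name SHADOW
DAEMON_LIST = MASTER, STARTD, SHADOW_STARTD

# Read only by the Startd started with -local-name SHADOW.
STARTD.SHADOW.EXECUTE = $(LOCAL_DIR)/shadow_execute

# Hypothetical values; pick paths and a name that suit your site.
STARTD.SHADOW.ADDRESS_FILE = $(LOG)/.shadow_startd_address
STARTD.SHADOW.STARTD_NAME = shadow
STARTD.SHADOW.STARTD_LOG = $(LOG)/ShadowStartdLog

# Disabled as suggested in the text.
USE_PROCD = False
```

Anything without the STARTD.SHADOW. prefix, such as START policy, is still shared by both Startds unless it is overridden the same way.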

2 Responses to “Condor Configuration: Subsystem and Local-name”

  1. Ben Cotton Says:

    Purdue doesn’t buy a 5000-node cluster for testing, but when you’ve got several large clusters, turning one of them into a test cluster works pretty well. Of course, we use Condor to scavenge cycles from PBS. If we were using Condor as the primary scheduler, it would be a different story.
