
Condor Configuration: Subsystem and Local-name

August 20, 2010


Every Condor daemon has a burned-in notion of its subsystem. These are fairly logical, e.g. condor_startd’s subsystem is STARTD while condor_collector’s subsystem is COLLECTOR. See the pattern? As of 7.4 there are about 30, including MASTER, SCHEDD, SHADOW, STARTER, TOOL, GRIDMANAGER, VM_GAHP, …

All Condor daemons read the same configuration files. The subsystem is a useful mechanism for varying configuration parameters between daemons: prefix a parameter with a subsystem name and a dot, and the setting applies only to daemons of that subsystem. For instance, the configuration parameter NOT_RESPONDING_TIMEOUT controls how long a daemon can go without sending a keep-alive to its parent. It defaults to one hour, but maybe you do not want to wait for an hour if your condor_collector hangs. To achieve this you can set COLLECTOR.NOT_RESPONDING_TIMEOUT = 1800, in seconds of course, which means the condor_collector can be unresponsive for at most 30 minutes.
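Written out as a configuration fragment, the override could look like the sketch below. The explicit default line is only there for contrast; the text above says the default is already one hour, so it would not normally need to be set.

```
# Default for every daemon: one hour, in seconds (shown only for contrast)
NOT_RESPONDING_TIMEOUT = 3600

# Override that applies only to daemons whose subsystem is COLLECTOR: 30 minutes
COLLECTOR.NOT_RESPONDING_TIMEOUT = 1800
```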


As you surely know, the condor_master reads the DAEMON_LIST parameter to figure out what daemons it should run, e.g. DAEMON_LIST = MASTER, STARTD runs a condor_startd. It is also popular to run multiple copies of a daemon. As a way to do deployment testing, an installation may want to have a shadow pool that only runs no-op-like jobs on a newer version of Condor than is in production, while sharing the production hardware. I want to meet the folks who buy an extra 5,000-node cluster just for production testing. In such a configuration the DAEMON_LIST may be MASTER, STARTD, SHADOW_STARTD. Pretend the SHADOW_STARTD is defined to be some different condor_startd version.


This means the condor_master tries to run two condor_startd daemons. This is not enough configuration to make it work though. Each Startd will read the same parameters, e.g. STARTD_LOG, EXECUTE or policy like START. That is probably not what was intended. In fact having two Startds share an EXECUTE is a recipe for disaster.

Both the STARTD and SHADOW_STARTD are the condor_startd executable, even if they are different versions, so they both have the same subsystem. Local-name to the rescue. Each daemon can be given a -local-name argument on its command line, which names that particular instance.
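A sketch of how the second Startd might be declared and handed its local name. The install path /opt/condor-new/sbin/condor_startd and the local name shadow are assumptions made up for illustration; the name shadow is chosen only because it lines up with the STARTD.SHADOW prefix used later in this post.

```
# Run the production Startd plus a second, differently named one
DAEMON_LIST = MASTER, STARTD, SHADOW_STARTD

# Hypothetical path to the newer condor_startd binary
SHADOW_STARTD = /opt/condor-new/sbin/condor_startd

# Arguments the condor_master passes when it starts SHADOW_STARTD
SHADOW_STARTD_ARGS = -local-name shadow
```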


Local-name provides the needed differentiator. You can now set configuration specific to the SHADOW_STARTD by prefixing a parameter with both the subsystem and the local name, e.g. STARTD.SHADOW.STARTD_LOG.
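For example, the two Startds could be kept from sharing an EXECUTE directory or a START policy with something like the fragment below. It is a sketch only: it assumes the second Startd was started with -local-name shadow, and the directory and the policy expression are hypothetical.

```
# Read only by the condor_startd started with -local-name shadow
# (hypothetical directory, separate from the production EXECUTE)
STARTD.SHADOW.EXECUTE = /var/lib/condor/shadow_execute

# Hypothetical policy for the test pool, independent of production START
STARTD.SHADOW.START = True
```

The production Startd has no local name, so it ignores these lines and keeps using the plain EXECUTE and START settings.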


Keep in mind, a local name and a handful of overrides are not enough config to run two Startds on a single system. You will probably also need to set STARTD.SHADOW.ADDRESS_FILE, STARTD.SHADOW.STARTD_NAME, STARTD.SHADOW.STARTD_LOG and disable USE_PROCD.
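Pulled together, those extra settings might look roughly like this. The parameter names are copied from the list above; every value is a hypothetical placeholder, and whether USE_PROCD should be disabled for both Startds or only for the second one is left open here.

```
# Names as listed above; all values are illustrative placeholders
STARTD.SHADOW.ADDRESS_FILE = $(LOG)/.shadow_startd_address
STARTD.SHADOW.STARTD_NAME  = shadow
STARTD.SHADOW.STARTD_LOG   = $(LOG)/ShadowStartLog

# Disable the procd (shown here as a global setting, which is an assumption)
USE_PROCD = False
```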

