Posts Tagged ‘Configuration’

Configuration and policy evaluation

December 10, 2012

Figuring out how evaluation happens in configuration and policy is a common problem. The confusion is justified.

Configuration provides substitution with $() syntax, while policy is full ClassAd language evaluation without $() syntax.

Configuration is all the parameters listed in files discoverable with condor_config_val -config.

$ condor_config_val -config
Configuration source:
	/etc/condor/condor_config
Local configuration sources:
	/etc/condor/config.d/00personal_condor.config

Policy is the ClassAd expression found on the right-hand side of specific configuration parameters. For instance,

$ condor_config_val -v START
START: ( (KeyboardIdle > 15 * 60) && ( ((LoadAvg - CondorLoadAvg) <= 0.3) || (State != "Unclaimed" && State != "Owner")) )
  Defined in '/etc/condor/condor_config', line 753.

Configuration evaluation allows for substitution of configuration parameters with $().

$ cat /etc/condor/condor_config | head -n753 | tail -n1
START			= $(UWCS_START)

$ condor_config_val -v UWCS_START
UWCS_START: ( (KeyboardIdle > 15 * 60) && ( ((LoadAvg - CondorLoadAvg) <= 0.3) || (State != "Unclaimed" && State != "Owner")) )
  Defined in '/etc/condor/condor_config', line 808.

$ cat /etc/condor/condor_config | head -n808 | tail -n3
UWCS_START	= ( (KeyboardIdle > $(StartIdleTime)) \
                    && ( $(CPUIdle) || \
                         (State != "Unclaimed" && State != "Owner")) )

Here START is actually the value of UWCS_START, provided by $(UWCS_START).

The substitution is recursive. Explore /etc/condor/condor_config and the JustCPU parameter. It is actually a parameter that is never read by daemons or tools. It is only useful in other configuration parameters. It’s shorthand.
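To make the recursion concrete, here is a minimal sketch of $() substitution in Python. It is not Condor's implementation. The parameter table is a hypothetical, trimmed-down stand-in for the real file (the MINUTE and StartIdleTime values are assumed, chosen to match the expanded output shown above), and treating an undefined macro as an empty string is also an assumption.

```python
import re

# Hypothetical, trimmed-down parameter table modeled on the excerpts above.
# MINUTE and StartIdleTime are assumed values, not copied from a real file.
CONFIG = {
    "MINUTE": "60",
    "StartIdleTime": "15 * $(MINUTE)",
    "UWCS_START": "(KeyboardIdle > $(StartIdleTime))",
    "START": "$(UWCS_START)",
}

# Matches $(NAME) and captures NAME.
MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(value, config, depth=0):
    """Replace every $(NAME) in value with the fully expanded value of NAME."""
    if depth > 20:
        raise ValueError("macro nesting too deep (possible self-reference)")
    # Assumption: an undefined macro expands to the empty string.
    return MACRO.sub(
        lambda m: expand(config.get(m.group(1), ""), config, depth + 1),
        value,
    )


print(expand(CONFIG["START"], CONFIG))  # (KeyboardIdle > 15 * 60)
```

Note that KeyboardIdle is left alone. It is not wrapped in $(), so it survives substitution and is only resolved later, during policy evaluation.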

Policy evaluation is full ClassAd expression evaluation. The evaluation happens at the appropriate times while daemons or tools are running.

Taking START as an example, the words KeyboardIdle, LoadAvg, CondorLoadAvg and State are attributes found on machine ads. START is evaluated by the condor_startd and condor_negotiator to figure out if a job is allowed to start on a resource.

$ condor_status -l slot1@eeyore.local | grep -e ^KeyboardIdle -e ^LoadAvg -e ^CondorLoadAvg -e ^State
KeyboardIdle = 0
LoadAvg = 0.290000
CondorLoadAvg = 0.0
State = "Owner"

Evaluation happens by recursively evaluating those attributes. The expression ((KeyboardIdle > 15 * 60) && (((LoadAvg - CondorLoadAvg) <= 0.3) || (State != "Unclaimed" && State != "Owner"))) becomes ((0 > 15 * 60) && (((0.29 - 0.0) <= 0.3) || ("Owner" != "Unclaimed" && "Owner" != "Owner"))). And so forth, until a single value remains. Here 0 > 15 * 60 is false, so the whole expression is false.
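The same evaluation can be sketched in Python. This is a hand translation of the START expression, not a ClassAd parser, and Python's semantics are only an approximation: real ClassAd evaluation also has UNDEFINED and ERROR values and its own string comparison rules, none of which are modeled here. The function and variable names are made up for illustration.

```python
# Attribute values copied from the condor_status output above.
machine_ad = {
    "KeyboardIdle": 0,
    "LoadAvg": 0.29,
    "CondorLoadAvg": 0.0,
    "State": "Owner",
}


def start(ad):
    """Hand-translated START expression, evaluated against one machine ad."""
    keyboard_idle_long_enough = ad["KeyboardIdle"] > 15 * 60
    cpu_idle = (ad["LoadAvg"] - ad["CondorLoadAvg"]) <= 0.3
    already_busy = ad["State"] != "Unclaimed" and ad["State"] != "Owner"
    return keyboard_idle_long_enough and (cpu_idle or already_busy)


print(start(machine_ad))  # False
```

The keyboard was touched zero seconds ago, so the first clause fails and nothing else matters.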

That’s it.

Wallaby: Skeleton Group

June 19, 2012

Read about Wallaby’s Skeleton Group feature. Working similarly to /etc/skel for accounts on a single system, it provides base configuration to nodes as they join a pool. It is especially useful for pools with dynamic and opportunistic resources.

Customizing Condor configuration: LOCAL_CONFIG_FILE vs LOCAL_CONFIG_DIR

June 16, 2011

Condor has a powerful configuration system. The language is powerful and so are the ways to extend default configuration.

All Condor processes, daemons/auxiliary programs/command-line tools, read configuration files in the same way. They start with what is commonly called the “global configuration file.” It is not so much global as it is a place for Condor distributors to put configuration that should be common to all installations, not to be directly edited by users. It is a place where distributors can safely change configuration between versions without having to worry about merge conflicts, and users do not have to worry about reapplying their changes.

The global configuration file is one of the following, in order:

0) Filename specified in a CONDOR_CONFIG environment variable
1) /etc/condor/condor_config
2) /usr/local/etc/condor_config
3) ~condor/condor_config

For those who care, src/condor_utils/condor_config.cpp defines the order. Fedora uses /etc/condor/condor_config, allowing CONDOR_CONFIG to override.
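As a rough illustration of that search order, here is a Python sketch. It is not a port of condor_config.cpp. The function names are invented, and one behavior is a simplifying assumption: when CONDOR_CONFIG is set but the file is missing, the sketch just falls through to the next candidate, which the real code may well treat differently.

```python
import os


def candidate_paths(environ, condor_home):
    """List the global config file candidates in the order given above."""
    paths = []
    if environ.get("CONDOR_CONFIG"):
        paths.append(environ["CONDOR_CONFIG"])
    paths.append("/etc/condor/condor_config")
    paths.append("/usr/local/etc/condor_config")
    paths.append(os.path.join(condor_home, "condor_config"))
    return paths


def find_global_config(environ, condor_home, exists=os.path.isfile):
    """Return the first candidate that exists, or None if none do."""
    for path in candidate_paths(environ, condor_home):
        if exists(path):
            return path
    return None


# ~condor is the home directory of the condor user, if there is one.
print(find_global_config(os.environ, os.path.expanduser("~condor")))
```

The exists argument is only there so the search can be exercised without touching the filesystem.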

The most important aspect of the global config file is how it enables users to extend configuration.

Historically, extension was done via the LOCAL_CONFIG_FILE. It provided a single location where a user/administrator could add configuration. It has been around for almost 15 years (since ~1997), and still works well for some use cases, such as host config files managed in a shared filesystem. However, it has a huge drawback: it requires coordinated editing. The coordination extends to features that are packaged and installed on top of Condor. Using it as a StringList does not alleviate the coordination, just extends it to the global config file.

Enter LOCAL_CONFIG_DIR, in March 2006. It provides the common configuration directory mechanism found in other systems software, such as /etc/ld.so.conf.d and /etc/yum.repos.d. It allows administrators and packages to play in a single sandbox and be properly isolated.

The way to extend Condor configuration in Fedora or Red Hat Enterprise Linux is via /etc/condor/config.d, set from LOCAL_CONFIG_DIR=/etc/condor/config.d in /etc/condor/condor_config.

But wait, you’re right, some coordination is still necessary when there is parameter overlap between files. That’s much less coordination though.

The way Condor’s configuration language works means that files read later during configuration can override parameters set in earlier files. For instance,

$ ls -l /etc/condor/config.d
total 16
-rw-r--r--. 1 root root  720 May 31 11:42 00personal_condor.config
-rw-r--r--. 1 root root 1434 May 31 11:39 61aviary.config

$ grep DAEMON_LIST /etc/condor/config.d/00personal_condor.config
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD

$ condor_config_val -v DAEMON_LIST
DAEMON_LIST: COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD, QUERY_SERVER
  Defined in '/etc/condor/config.d/61aviary.config', line 20.

This can be handled in a few simple ways:

0) Append to parameters whenever possible, for instance above –

$ grep DAEMON_LIST /etc/condor/config.d/61aviary.config
DAEMON_LIST = $(DAEMON_LIST), QUERY_SERVER

1) Separate user managed files from package managed files –

Prefix all files with two-digit numbers with the following ranges:

. 00 – reserved for a default config, e.g. 00personal_condor.config
. 10-40 – user/admin configuration files
. 50-80 – packaged configuration files
. 99 – reserved for features requiring control of configuration
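The interaction between file order and appending can be sketched in a few lines of Python. This is a toy reader, not Condor's parser: the file contents are in-memory stand-ins, files are assumed to be read in sorted name order (which is what the numeric prefixes rely on), and only a parameter's reference to itself is expanded.

```python
# In-memory stand-ins for the two files under /etc/condor/config.d.
FILES = {
    "61aviary.config": "DAEMON_LIST = $(DAEMON_LIST), QUERY_SERVER\n",
    "00personal_condor.config": "DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD\n",
}


def load(files):
    """Read files in sorted name order; later assignments replace earlier ones."""
    config = {}
    for name in sorted(files):
        for line in files[name].splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = (part.strip() for part in line.split("=", 1))
            # A self-reference picks up whatever value was set so far,
            # which is what turns an override into an append.
            value = value.replace("$(%s)" % key, config.get(key, ""))
            config[key] = value
    return config


print(load(FILES)["DAEMON_LIST"])
# COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD, QUERY_SERVER
```

Drop the $(DAEMON_LIST) from the second file and the result is just QUERY_SERVER, which is exactly the kind of silent override the numbering scheme is meant to keep under control.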

Finally, if you still need to use LOCAL_CONFIG_FILE, you can always set it within a configuration file under /etc/condor/config.d.

Condor Configuration: Subsystem and Local-name

August 20, 2010

Subsystem

Every Condor daemon has a burned-in notion of its subsystem. These are fairly logical, e.g. condor_startd’s subsystem is STARTD while condor_collector’s subsystem is COLLECTOR. See the pattern? As of 7.4 there are about 30, including MASTER, SCHEDD, SHADOW, STARTER, TOOL, GRIDMANAGER, VM_GAHP, …

All Condor daemons read the same configuration files. Subsystem is a useful mechanism to vary configuration parameters between daemons. For instance, the configuration parameter NOT_RESPONDING_TIMEOUT controls how long a daemon can go without sending a keep-alive to its parent. It defaults to one hour, but maybe you do not want to wait for an hour if your condor_collector hangs. To achieve this you can set COLLECTOR.NOT_RESPONDING_TIMEOUT = 1800, in seconds of course, which means the condor_collector only gets to go off the reservation for at most 30 minutes.
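A sketch of how a subsystem-prefixed parameter shadows the plain one, in Python. The lookup function is hypothetical and the explicit 3600 simply stands in for the one hour default. The point is only that the daemon's own prefix is tried first.

```python
# The plain value stands in for the one hour default. The prefixed value
# is the collector-specific override from the text.
config = {
    "NOT_RESPONDING_TIMEOUT": "3600",
    "COLLECTOR.NOT_RESPONDING_TIMEOUT": "1800",
}


def lookup(config, name, subsystem):
    """Prefer SUBSYSTEM.NAME, then fall back to plain NAME."""
    prefixed = "%s.%s" % (subsystem, name)
    if prefixed in config:
        return config[prefixed]
    return config.get(name)


print(lookup(config, "NOT_RESPONDING_TIMEOUT", "COLLECTOR"))  # 1800
print(lookup(config, "NOT_RESPONDING_TIMEOUT", "SCHEDD"))     # 3600
```

Every other daemon reads the same files and still sees the one hour value.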

Local-name

As you surely know, the condor_master reads the DAEMON_LIST parameter to figure out what daemons it should run, e.g. DAEMON_LIST = MASTER, STARTD runs a condor_startd. It is often popular to run multiple copies of a daemon. As a way to do deployment testing, an installation may want to have a shadow pool that only runs no-op-like jobs on a newer version of Condor than is in production, while sharing the production hardware. I want to meet the folks who buy an extra 5,000 node cluster just for production testing. In such a configuration the DAEMON_LIST may be MASTER, STARTD, SHADOW_STARTD. Pretend the SHADOW_STARTD is defined to be some different condor_startd version.

SHADOW_STARTD = $(STARTD)
DAEMON_LIST = MASTER, STARTD, SHADOW_STARTD

This means the condor_master tries to run two condor_startd daemons. This is not enough configuration to make it work though. Each Startd will read the same parameters, e.g. STARTD_LOG, EXECUTE or policy like START. That is probably not what was intended. In fact having two Startds share an EXECUTE is a recipe for disaster.

Both the STARTD and SHADOW_STARTD are the condor_startd executable, even if they are different versions, so they both have the same subsystem. Local-name to the rescue here. Each daemon can be given a -local-name parameter,

SHADOW_STARTD_ARGS = -local-name SHADOW

Local-name provides the needed differentiator. You can now set specific configuration for the SHADOW_STARTD,

STARTD.SHADOW.EXECUTE = $(LOCAL_DIR)/shadow_execute

Keep in mind, this is not enough config to run two Startds on a single system. You will probably also need to set STARTD.SHADOW.ADDRESS_FILE, STARTD.SHADOW.STARTD_NAME, STARTD.SHADOW.STARTD_LOG and disable USE_PROCD.
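Putting subsystem and local-name together, a lookup might try names from most to least specific. The four-step order in this Python sketch is an assumption based on my reading of the manual, so check it against your version. The paths are hypothetical, already-expanded values.

```python
# Hypothetical, already-expanded values.
config = {
    "EXECUTE": "/var/lib/condor/execute",
    "STARTD.SHADOW.EXECUTE": "/var/lib/condor/shadow_execute",
}


def candidates(name, subsystem, local_name=None):
    """Names to try, most specific first (assumed order)."""
    names = []
    if local_name:
        names.append("%s.%s.%s" % (subsystem, local_name, name))
        names.append("%s.%s" % (local_name, name))
    names.append("%s.%s" % (subsystem, name))
    names.append(name)
    return names


def lookup(config, name, subsystem, local_name=None):
    """Return the value of the first candidate name that is set."""
    for candidate in candidates(name, subsystem, local_name):
        if candidate in config:
            return config[candidate]
    return None


# The production Startd has no local name and uses the plain parameter.
print(lookup(config, "EXECUTE", "STARTD"))            # /var/lib/condor/execute
# The Startd started with -local-name SHADOW gets its own directory.
print(lookup(config, "EXECUTE", "STARTD", "SHADOW"))  # /var/lib/condor/shadow_execute
```

Anything not set with the STARTD.SHADOW prefix falls back to the shared value, which is why the extra parameters listed above need setting too.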

