In case you did not know, you can run multiple copies of Condor on a single machine. You can even run multiple copies of portions of Condor on a single machine.
You may want to do this to help overprovision a system, to enforce strict workload isolation, or to run an administrative shadow pool.
The condor_master makes some effort to prevent duplicate copies from running against the same configuration, so the important thing is to figure out which configuration must be unique.
Here is the configuration for running a second Startd on a machine. Put it in /etc/condor/config.d/31shadow_startd.config.
# Define the daemon name for the shadow startd, to be put in
# DAEMON_LIST. The master will read DAEMON_LIST and look for
# configuration for each daemon, such as SHADOW_STARTD_ARGS and
# SHADOW_STARTD_ENVIRONMENT.
SHADOW_STARTD = $(STARTD)

# Arguments for the SHADOW_STARTD daemon:
#  -f means foreground (eliminates need for adding SHADOW_STARTD to DC_DAEMON_LIST)
#  -local-name gives the SHADOW_STARTD a namespace for its configuration
SHADOW_STARTD_ARGS = -f -local-name SHADOW_STARTD

# The startd spawns starters, each of which writes to a log file. That
# log file is found by looking up the STARTER_LOG configuration. There
# is no STARTER_ARGS, similar to SHADOW_STARTD_ARGS, to pass
# -local-name. Instead the STARTER_LOG is put in the environment of
# the startd, by the master, so it can be inherited by the
# starters. Similarly, the startd and starters share a procd. Multiple
# startd/starter sets sharing a procd will result in failures. So a
# unique PROCD_ADDRESS is also provided.
SHADOW_STARTD_ENVIRONMENT = "_CONDOR_STARTER_LOG=$(STARTER_LOG).shadow _CONDOR_PROCD_ADDRESS=$(PROCD_ADDRESS).shadow"

# Configuration in the SHADOW_STARTD namespace:
SHADOW_STARTD.STARTD_LOG = $(STARTD_LOG).shadow
SHADOW_STARTD.STARTD_ADDRESS_FILE = $(STARTD_ADDRESS_FILE).shadow

# A unique EXECUTE directory, create 1777 owned by condor.
SHADOW_STARTD.EXECUTE = $(EXECUTE).shadow

# A unique name to avoid collisions in the collector.
SHADOW_STARTD.STARTD_NAME = SHADOW_STARTD

# Put an extra attribute on the slots from the SHADOW_STARTD, useful
# in identifying them.
SHADOW_STARTD.IsShadowStartd = TRUE
SHADOW_STARTD.STARTD_ATTRS = $(STARTD_ATTRS), IsShadowStartd

# The shadow startd may have special requirements for jobs.
#SHADOW_STARTD.START = IsForShadowStartd =?= TRUE

# Add the SHADOW_STARTD to the set of daemons managed by the master.
DAEMON_LIST = $(DAEMON_LIST), SHADOW_STARTD
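The pattern above boils down to a handful of settings that must differ per startd: the local name, the starter log and procd address passed through the environment, the startd log, the address file, the EXECUTE directory and the STARTD_NAME. As a minimal sketch, assuming you want to stamp out the same pattern for other names, a hypothetical helper (make_extra_startd_config is not part of HTCondor) can print the core fragment. It deliberately leaves out the IsShadowStartd attribute and the START expression, which are specific to this example.

```shell
# Hypothetical helper: print an HTCondor config fragment for an extra startd.
#   $1 = daemon/local name (e.g. SHADOW_STARTD)
#   $2 = suffix for per-instance files and directories (e.g. shadow)
make_extra_startd_config() {
  if [ "$#" -ne 2 ]; then
    echo "usage: make_extra_startd_config NAME SUFFIX" >&2
    return 2
  fi
  local name="$1" suffix="$2"
  # Single-quoted formats keep $(...) literal: these are HTCondor macros,
  # expanded by condor_config, not shell command substitutions.
  printf '%s = $(STARTD)\n' "$name"
  printf '%s_ARGS = -f -local-name %s\n' "$name" "$name"
  printf '%s_ENVIRONMENT = "_CONDOR_STARTER_LOG=$(STARTER_LOG).%s _CONDOR_PROCD_ADDRESS=$(PROCD_ADDRESS).%s"\n' "$name" "$suffix" "$suffix"
  printf '%s.STARTD_LOG = $(STARTD_LOG).%s\n' "$name" "$suffix"
  printf '%s.STARTD_ADDRESS_FILE = $(STARTD_ADDRESS_FILE).%s\n' "$name" "$suffix"
  printf '%s.EXECUTE = $(EXECUTE).%s\n' "$name" "$suffix"
  printf '%s.STARTD_NAME = %s\n' "$name" "$name"
  printf 'DAEMON_LIST = $(DAEMON_LIST), %s\n' "$name"
}
```

For example, `make_extra_startd_config SECOND_STARTD second > /etc/condor/config.d/32second_startd.config` would describe a further startd; the name SECOND_STARTD, the suffix and the file name are illustrative only, and its EXECUTE directory still has to be created by hand.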
After installing the configuration, make sure to create the EXECUTE directory.
# sudo -u condor mkdir -m1777 $(condor_config_val SHADOW_STARTD.EXECUTE)
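To confirm the directory came out as intended, here is a minimal sketch of a hypothetical check_exec_dir helper (not an HTCondor tool). It assumes GNU coreutils `stat`, as found on Linux, and succeeds only if the directory exists with the expected octal mode and, when a third argument is given, the expected owner.

```shell
# Hypothetical helper: check a directory's existence, octal mode and owner.
#   $1 = directory, $2 = expected mode (e.g. 1777), $3 = optional owner
check_exec_dir() {
  local dir="$1" want_mode="$2" want_owner="${3:-}"
  local mode owner
  if [ ! -d "$dir" ]; then
    echo "$dir: missing or not a directory" >&2
    return 1
  fi
  mode="$(stat -c '%a' "$dir")"    # GNU stat: octal mode, incl. sticky bit
  owner="$(stat -c '%U' "$dir")"   # GNU stat: owning user name
  if [ "$mode" != "$want_mode" ]; then
    echo "$dir: mode $mode, expected $want_mode" >&2
    return 1
  fi
  if [ -n "$want_owner" ] && [ "$owner" != "$want_owner" ]; then
    echo "$dir: owner $owner, expected $want_owner" >&2
    return 1
  fi
  return 0
}
```

Run it as `check_exec_dir "$(condor_config_val SHADOW_STARTD.EXECUTE)" 1777 condor`, using the 1777 mode and condor owner named in the configuration comment.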
Then test that it is working:
# service condor start
$ pstree -p | grep condor
|-condor_master(9947)-+-condor_collecto(9949)
|                     |-condor_negotiat(9950)
|                     |-condor_schedd(9951)---condor_procd(9956)
|                     |-condor_startd(9952)
|                     `-condor_startd(9953)
Notice the two Startds. You can also check /var/log/condor for the shadow startd’s log files, which carry the .shadow suffix set in the configuration.
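If you want to script the "two Startds" check, one option is to count condor_startd entries in the `pstree -p` output. The count_startds function below is a hypothetical helper; it assumes each process is printed as name(pid), which is what `pstree -p` does.

```shell
# Hypothetical helper: read `pstree -p` output on stdin and print how many
# condor_startd processes appear in it.
count_startds() {
  # In a basic regex "(" is literal, so this matches e.g. condor_startd(9952).
  grep -o 'condor_startd([0-9]*)' | wc -l | tr -d ' '
}
```

With the configuration above, `pstree -p | count_startds` should print 2.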
$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
slot1@SHADOW_START LINUX      X86_64 Unclaimed Idle     0.000   987  0+00:00:04
slot1@eeyore.local LINUX      X86_64 Unclaimed Idle     0.000   987  0+00:00:04
slot2@SHADOW_START LINUX      X86_64 Unclaimed Idle     0.000   987  0+00:00:20
slot2@eeyore.local LINUX      X86_64 Unclaimed Idle     0.000   987  0+00:00:20
slot3@SHADOW_START LINUX      X86_64 Unclaimed Idle     0.000   987  0+00:00:21
slot3@eeyore.local LINUX      X86_64 Unclaimed Idle     0.000   987  0+00:00:21
slot4@SHADOW_START LINUX      X86_64 Unclaimed Idle     0.000   987  0+00:00:22
slot4@eeyore.local LINUX      X86_64 Unclaimed Idle     0.000   987  0+00:00:22

                     Machines Owner Claimed Unclaimed Matched Preempting
        X86_64/LINUX        8     0       0         8       0          0
               Total        8     0       0         8       0          0

$ echo -e 'cmd=/bin/sleep\nargs=1d\nrequirements=IsShadowStartd=!=TRUE\nqueue 8\nrequirements=IsShadowStartd=?=TRUE\nqueue 8' | condor_submit
Submitting job(s)................
16 job(s) submitted to cluster 1.

The first 8 jobs (1.0 to 1.7) will not run on the shadow startd, while the second 8 (1.8 to 1.15) will run only on it. Since each startd has four slots, four jobs from each group start running:

$ condor_q -run

-- Submitter: eeyore.local : : eeyore.local
 ID      OWNER            SUBMITTED     RUN_TIME HOST(S)
   1.0   matt           10/18 13:50   0+00:00:16 slot1@eeyore.local
   1.1   matt           10/18 13:50   0+00:00:16 slot2@eeyore.local
   1.2   matt           10/18 13:50   0+00:00:16 slot3@eeyore.local
   1.3   matt           10/18 13:50   0+00:00:16 slot4@eeyore.local
   1.8   matt           10/18 13:50   0+00:00:16 slot1@SHADOW_STARTD@eeyore.local
   1.9   matt           10/18 13:50   0+00:00:16 slot2@SHADOW_STARTD@eeyore.local
   1.10  matt           10/18 13:50   0+00:00:16 slot3@SHADOW_STARTD@eeyore.local
   1.11  matt           10/18 13:50   0+00:00:16 slot4@SHADOW_STARTD@eeyore.local
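The `echo -e` pipeline above is convenient for a quick test; for something you want to keep around, the same job set can live in a submit description file. This is a sketch only: the file name shadow_test.sub is a hypothetical choice, and the contents are the same commands the pipeline sends to condor_submit, just one per line with comments.

```shell
# Write a submit description equivalent to the echo -e pipeline.
# The quoted 'EOF' delimiter stops the shell from touching anything inside.
cat > shadow_test.sub <<'EOF'
cmd = /bin/sleep
args = 1d
# First 8 jobs: never match slots advertising IsShadowStartd = TRUE
requirements = IsShadowStartd =!= TRUE
queue 8
# Next 8 jobs: only match slots from the shadow startd
requirements = IsShadowStartd =?= TRUE
queue 8
EOF
```

Submit it with `condor_submit shadow_test.sub`. The same attribute is handy outside of jobs too: `condor_status -constraint 'IsShadowStartd =?= TRUE'` should list only the shadow startd's slots.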