The condor_schedd’s architecture currently represents each running job as a condor_shadow process it can directly manage. I’m going to skip the pros & cons of threading vs processes, or multiplexing vs not. Those are great topics, but not of interest right now. The condor_schedd uses processes and does not multiplex, so when it is managing 10,000 running jobs it will have 10,000 running condor_shadow processes. Other than the shock of 10,000 processes, what are some immediate concerns? 1) CPU usage: actually very small, since condor_shadow processes spend nearly all their time waiting on I/O, 2) I/O usage: condor_shadows may spend nearly all their time waiting on I/O, but when that I/O happens it is typically small, maybe 10K of network and a few K of disk, 3) memory usage: now that’s an interesting one.
Quick ways to look at memory usage?
1) top, but it’s not going to be as helpful as you might like.
$ top -n1 -p $(pidof condor_shadow-base)
...
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10487 matt      20   0 10092 3756 3176 S  0.0  0.2   0:00.01 condor_shadow-b
This gives a general idea about the memory usage of the condor_shadow. It has about 3756KB of resident memory and is sharing about 3176KB.
What we really want to see is the amount of memory that cannot be shared between condor_shadow processes. This will give us a rough lower bound on how much memory is needed for 10,000 condor_shadows. top might suggest that number is 3756K-3176K=580K, but that’s a common mistake: RES and SHR only count pages that are currently resident, and SHR reflects pages that could be shared rather than pages that actually are. A quick web search will tell you more.
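For a per-mapping view of what is actually private, one option on Linux is /proc/<pid>/smaps. The sketch below is not part of the original investigation; it assumes the usual smaps line format ("Private_Dirty:        12 kB") and the helper names are made up for illustration. It counts resident private pages only, so it will not match pmap's writeable/private figure, which counts mapped size.

```python
def private_kb(smaps_text):
    """Sum Private_Clean and Private_Dirty (in kB) over all mappings.

    Assumes the Linux smaps format, where each mapping carries lines
    such as "Private_Dirty:        12 kB".
    """
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith(("Private_Clean:", "Private_Dirty:")):
            total += int(line.split()[1])
    return total


def private_kb_of_pid(pid):
    """Convenience wrapper (hypothetical name): read smaps for a live process."""
    with open(f"/proc/{pid}/smaps") as f:
        return private_kb(f.read())
```

Calling private_kb_of_pid() on a running shadow's PID gives a number to compare against the RES-SHR guess from top.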
2) pmap, which gives a better idea of where the memory is going.
$ pmap -d $(pidof condor_shadow-base)
Address   Kbytes Mode  Offset           Device    Mapping
00110000     884 r-x-- 0000000000000000 0fd:00000 libstdc++.so.6.0.13
001ed000      16 r---- 00000000000dc000 0fd:00000 libstdc++.so.6.0.13
001f1000       8 rw--- 00000000000e0000 0fd:00000 libstdc++.so.6.0.13
001f3000      24 rw--- 0000000000000000 000:00000 [ anon ]
001f9000      44 r-x-- 0000000000000000 0fd:00000 libnss_files-2.11.1.so
00204000       4 r---- 000000000000a000 0fd:00000 libnss_files-2.11.1.so
00205000       4 rw--- 000000000000b000 0fd:00000 libnss_files-2.11.1.so
0063a000     116 r-x-- 0000000000000000 0fd:00000 libgcc_s-4.4.3-20100127.so.1
00657000       4 rw--- 000000000001c000 0fd:00000 libgcc_s-4.4.3-20100127.so.1
00733000       4 r-x-- 0000000000000000 000:00000 [ anon ]
00745000       8 r-x-- 0000000000000000 0fd:00000 libcom_err.so.2.1
00747000       4 rw--- 0000000000002000 0fd:00000 libcom_err.so.2.1
0074a000       8 r-x-- 0000000000000000 0fd:00000 libkeyutils-1.2.so
0074c000       4 rw--- 0000000000001000 0fd:00000 libkeyutils-1.2.so
0074f000     716 r-x-- 0000000000000000 0fd:00000 libkrb5.so.3.3
00802000      24 rw--- 00000000000b3000 0fd:00000 libkrb5.so.3.3
0082d000     276 r-x-- 0000000000000000 0fd:00000 libfreebl3.so
00872000       4 rw--- 0000000000045000 0fd:00000 libfreebl3.so
00873000      16 rw--- 0000000000000000 000:00000 [ anon ]
00880000     180 r-x-- 0000000000000000 0fd:00000 libgssapi_krb5.so.2.2
008ad000       4 rw--- 000000000002d000 0fd:00000 libgssapi_krb5.so.2.2
008b0000     168 r-x-- 0000000000000000 0fd:00000 libk5crypto.so.3.1
008da000       4 rw--- 000000000002a000 0fd:00000 libk5crypto.so.3.1
008dd000      32 r-x-- 0000000000000000 0fd:00000 libkrb5support.so.0.1
008e5000       4 rw--- 0000000000007000 0fd:00000 libkrb5support.so.0.1
008e8000      28 r-x-- 0000000000000000 0fd:00000 libcrypt-2.11.1.so
008ef000       4 r---- 0000000000007000 0fd:00000 libcrypt-2.11.1.so
008f0000       4 rw--- 0000000000008000 0fd:00000 libcrypt-2.11.1.so
008f1000     156 rw--- 0000000000000000 000:00000 [ anon ]
0091a000     328 r-x-- 0000000000000000 0fd:00000 libssl.so.1.0.0
0096c000      16 rw--- 0000000000051000 0fd:00000 libssl.so.1.0.0
00974000     120 r-x-- 0000000000000000 0fd:00000 ld-2.11.1.so
00992000       4 r---- 000000000001d000 0fd:00000 ld-2.11.1.so
00993000       4 rw--- 000000000001e000 0fd:00000 ld-2.11.1.so
00996000    1464 r-x-- 0000000000000000 0fd:00000 libc-2.11.1.so
00b04000       8 r---- 000000000016e000 0fd:00000 libc-2.11.1.so
00b06000       4 rw--- 0000000000170000 0fd:00000 libc-2.11.1.so
00b07000      12 rw--- 0000000000000000 000:00000 [ anon ]
00b0c000      88 r-x-- 0000000000000000 0fd:00000 libpthread-2.11.1.so
00b22000       4 r---- 0000000000015000 0fd:00000 libpthread-2.11.1.so
00b23000       4 rw--- 0000000000016000 0fd:00000 libpthread-2.11.1.so
00b24000       8 rw--- 0000000000000000 000:00000 [ anon ]
00b28000      12 r-x-- 0000000000000000 0fd:00000 libdl-2.11.1.so
00b2b000       4 r---- 0000000000002000 0fd:00000 libdl-2.11.1.so
00b2c000       4 rw--- 0000000000003000 0fd:00000 libdl-2.11.1.so
00b2f000     160 r-x-- 0000000000000000 0fd:00000 libm-2.11.1.so
00b57000       4 r---- 0000000000027000 0fd:00000 libm-2.11.1.so
00b58000       4 rw--- 0000000000028000 0fd:00000 libm-2.11.1.so
00b5b000      72 r-x-- 0000000000000000 0fd:00000 libz.so.1.2.3
00b6d000       4 rw--- 0000000000011000 0fd:00000 libz.so.1.2.3
00b7b000     112 r-x-- 0000000000000000 0fd:00000 libselinux.so.1
00b97000       4 r---- 000000000001b000 0fd:00000 libselinux.so.1
00b98000       4 rw--- 000000000001c000 0fd:00000 libselinux.so.1
00b9b000      80 r-x-- 0000000000000000 0fd:00000 libresolv-2.11.1.so
00baf000       4 ----- 0000000000014000 0fd:00000 libresolv-2.11.1.so
00bb0000       4 r---- 0000000000014000 0fd:00000 libresolv-2.11.1.so
00bb1000       4 rw--- 0000000000015000 0fd:00000 libresolv-2.11.1.so
00bb2000       8 rw--- 0000000000000000 000:00000 [ anon ]
042a3000    1476 r-x-- 0000000000000000 0fd:00000 libcrypto.so.1.0.0
04414000      80 rw--- 0000000000170000 0fd:00000 libcrypto.so.1.0.0
04428000      12 rw--- 0000000000000000 000:00000 [ anon ]
049f0000     188 r-x-- 0000000000000000 0fd:00000 libpcre.so.0.0.1
04a1f000       4 rw--- 000000000002e000 0fd:00000 libpcre.so.0.0.1
08048000    2496 r-x-- 0000000000000000 0fd:00001 condor_shadow-base
082b8000       8 rw--- 0000000000270000 0fd:00001 condor_shadow-base
082ba000      12 rw--- 0000000000000000 000:00000 [ anon ]
09c64000     396 rw--- 0000000000000000 000:00000 [ anon ]
b7819000      24 rw--- 0000000000000000 000:00000 [ anon ]
b7839000      12 rw--- 0000000000000000 000:00000 [ anon ]
bf930000      84 rw--- 0000000000000000 000:00000 [ stack ]
mapped: 10092K    writeable/private: 972K    shared: 0K
“writable/private: 972K” is a better number to go by. It suggests that 10,000 shadows will require at least 972*10,000/2^20=9.269GB of memory.
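To make that estimate repeatable, here is a minimal sketch that totals the writable, non-shared rows of `pmap -d` output and scales the result to a process count. It assumes the six-column layout shown above (Address, Kbytes, Mode, Offset, Device, Mapping) and that a shared mapping carries an `s` in its mode string; the function names are made up for this example.

```python
def writable_private_kb(pmap_d_output):
    """Total Kbytes of rows whose mode is writable and not shared."""
    total = 0
    for line in pmap_d_output.splitlines():
        fields = line.split()
        # Data rows have a numeric Kbytes column; this skips the header
        # row and the trailing "mapped: ..." summary line.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        mode = fields[2]
        if "w" in mode and "s" not in mode:
            total += int(fields[1])
    return total


def projected_gib(per_process_kb, processes):
    """Scale a per-process KB figure to GiB across many processes."""
    return per_process_kb * processes / 2**20
```

With the numbers above, projected_gib(972, 10_000) is where the roughly 9.27GB figure comes from.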
So that’s where the memory is. Now why is it there?
I’ve been playing with Massif lately, so it seemed like a reasonable place to start. It can produce a usage graph,
[Massif usage graph: heap usage in KB on the y axis, peaking at 335.8 KB; x axis running from 0 to 17.22 Mi]
Massif tells us about dynamic memory allocations, which in this case amount to 335KB. That is only a third of the 972K of private memory reported by pmap. This suggests that two-thirds of the private memory in the shadow may be from static allocations.
objdump is another tool I’ve been using lately to inspect ELF binaries.
$ objdump -t condor_shadow-base | grep -e "\.bss" -e "\.data" | sort -k5 | c++filt | tail
082bc900 l     O .bss   00000100 pe_logname.7591
082bca00 l     O .bss   00000100 pe_user.7592
082bc160 l     O .bss   00000100 priv_identifier::id
082b8d80 l     O .data  00000120 CondorEnvironList
082bbbc0 g     O .bss   000001c4 ConfigTab
082bbf20 l     O .bss   00000200 priv_history
082ba920 l     O .bss   00001000 answer.7409
082b9920 l     O .bss   00001000 answer.7437
082b89a0 g       .data  00000000 __data_start
082b89a0  w      .data  00000000 data_start
What’s this? Two 4K static buffers named answer (0x1000 bytes each).
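The shell pipeline works because the size column is zero-padded hex, so a plain text sort orders it correctly. As an alternative, the sketch below parses `objdump -t` lines and ranks symbols by their numeric size. It assumes the section name is followed directly by the hex size and then the symbol name, as in the output above; the function name is invented for the example.

```python
def largest_symbols(objdump_t_output, sections=(".bss", ".data"), top=10):
    """Return the `top` largest symbols as (size_bytes, name, section)."""
    symbols = []
    for line in objdump_t_output.splitlines():
        tokens = line.split()
        for i, tok in enumerate(tokens):
            # The number of flag columns varies, so locate the section
            # token instead of relying on a fixed field position.
            if tok in sections and i + 2 < len(tokens):
                try:
                    size = int(tokens[i + 1], 16)
                except ValueError:
                    break
                symbols.append((size, " ".join(tokens[i + 2:]), tok))
                break
    symbols.sort(key=lambda s: s[0], reverse=True)
    return symbols[:top]
```

Fed the listing above, the two answer buffers come out on top at 4096 bytes each.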
* * *
After some code inspection it turned out that two functions in ckpt_name.c each contained static char answer[MAXPATHLEN], where MAXPATHLEN is 4096 on this system, matching the 0x1000 sizes above. One of those functions, gen_exec_name, was dead code and easy to remove. The other, gen_ckpt_name, was active, with all but one of its call sites immediately strdup()’ing its result. After the necessary modifications,
$ objdump -t condor_shadow-sans-answers | grep -e "\.bss" -e "\.data" | sort -k5 | c++filt | tail
082b9720 g     O .bss   000000e8 MaxLog
082b8b00 l     O .data  000000f0 SigNameArray
082ba900 l     O .bss   00000100 pe_logname.7591
082baa00 l     O .bss   00000100 pe_user.7592
082ba160 l     O .bss   00000100 priv_identifier::id
082b8d80 l     O .data  00000120 CondorEnvironList
082b9bc0 g     O .bss   000001c4 ConfigTab
082b9f20 l     O .bss   00000200 priv_history
082b89a0 g       .data  00000000 __data_start
082b89a0  w      .data  00000000 data_start
And while running,
$ pmap -d $(pidof condor_shadow-sans-answers) | grep -e rw -e private
...
082b8000       8 rw--- 0000000000270000 0fd:00001 condor_shadow-sans-answers
082ba000       4 rw--- 0000000000000000 000:00000 [ anon ]
089f0000     396 rw--- 0000000000000000 000:00000 [ anon ]
b7877000      24 rw--- 0000000000000000 000:00000 [ anon ]
b7897000      12 rw--- 0000000000000000 000:00000 [ anon ]
bff58000      84 rw--- 0000000000000000 000:00000 [ stack ]
mapped: 10084K    writeable/private: 964K    shared: 0K
We see an expected decrease in the shadow’s private address space. But, even though those answer buffers were the largest around, we’re aiming at small game. This is a ~0.8% improvement, or ~78MB out of our ~9.269GB. At least we now have a procedure for further reducing memory, even if it is a fraction of a percent at a time. In fact, with a little more digging there are more possibilities here,
$ objdump -t condor_shadow-sans-answers | grep -e "\.bss" -e "\.data" | sort -k5 | c++filt | \
    grep NULL_XACTION | tail -n1
082babbc l     O .bss   00000004 classad::NULL_XACTION
While classad::NULL_XACTION may be small, there are quite a few of them,
$ objdump -t condor_shadow-sans-answers | grep -e "\.bss" -e "\.data" | sort -k5 | c++filt | \
    awk '{print $6}' | sort | uniq -c | sort -n | tail
      2 DEFAULT_INDENT
      2 EvalBool(compat_classad::ClassAd*,
      2 SharedPortEndpoint::SharedPortEndpoint(char
      2 SharedPortEndpoint::UseSharedPort(MyString*,
      2 std::__ioinit
      2 SubsystemInfo::setClass(SubsystemInfoLookup
      3 vtable
     11 guard
    107 classad::NULL_XACTION
* * *
And it turns out classad::NULL_XACTION is not only defined in a header file that is included in many places, but it is completely unused. Note: Please don’t put data like NULL_XACTION in a header.
$ objdump -t condor_shadow-sans-answers-null_xaction | grep -e "\.bss" -e "\.data" | sort -k5 | c++filt | \
    awk '{print $6}' | sort | uniq -c | sort -n | tail
      1 ZZZZZ
      2 DEFAULT_INDENT
      2 EvalBool(compat_classad::ClassAd*,
      2 SharedPortEndpoint::SharedPortEndpoint(char
      2 SharedPortEndpoint::UseSharedPort(MyString*,
      2 std::__ioinit
      2 SubsystemInfo::setClass(SubsystemInfoLookup
      3 vtable
     11 guard
Since NULL_XACTION is never actually used, its removal should not change the private address space of the running shadow, only the overall binary and mapped size.
$ pmap -d $(pidof condor_shadow-sans-answers-null_xaction) | grep -e rw -e private
...
082b2000       8 rw--- 0000000000269000 0fd:00001 condor_shadow-sans-answers-null_xaction
082b4000       4 rw--- 0000000000000000 000:00000 [ anon ]
08a0a000     396 rw--- 0000000000000000 000:00000 [ anon ]
b77f5000      24 rw--- 0000000000000000 000:00000 [ anon ]
b7815000      12 rw--- 0000000000000000 000:00000 [ anon ]
bfd28000      84 rw--- 0000000000000000 000:00000 [ stack ]
mapped: 10060K    writeable/private: 964K    shared: 0K
Which appears to be the case,
$ size *
   text    data     bss     dec     hex filename
2554154    5032   14360 2573546  2744ea condor_shadow-base
2554146    5032    6168 2565346  2724e2 condor_shadow-sans-answers
2528350    4648    5724 2538722  26bce2 condor_shadow-sans-answers-null_xaction
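The step-to-step differences in that table are easier to read as deltas. A small sketch, assuming the six-column Berkeley-style `size` layout shown above (text, data, bss, dec, hex, filename) and using made-up function names:

```python
def parse_size_output(text):
    """Parse `size` output rows into dicts, skipping the header line."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        if len(f) == 6 and f[0].isdigit():
            rows.append({"text": int(f[0]), "data": int(f[1]),
                         "bss": int(f[2]), "filename": f[5]})
    return rows


def deltas(rows):
    """Difference of each row against the previous one, per section."""
    out = []
    for prev, cur in zip(rows, rows[1:]):
        d = {k: cur[k] - prev[k] for k in ("text", "data", "bss")}
        d["filename"] = cur["filename"]
        out.append(d)
    return out
```

On the table above, removing the answer buffers shrinks bss by 8192 bytes (two 4096-byte buffers), and removing NULL_XACTION shrinks bss by a further 444 bytes, most of which would be the 107 four-byte copies.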
Ok, so that was just a diversion. Back to big game. From the original pmap dump, there are a number of libraries linked in with the shadow that are using memory of their own. One of those libraries, libcrypt, is using far more than the others.
008e8000      28 r-x-- 0000000000000000 0fd:00000 libcrypt-2.11.1.so
008ef000       4 r---- 0000000000007000 0fd:00000 libcrypt-2.11.1.so
008f0000       4 rw--- 0000000000008000 0fd:00000 libcrypt-2.11.1.so
008f1000     156 rw--- 0000000000000000 000:00000 [ anon ]
* * *
The existing build system for Condor, which uses imake(!), has a nasty habit of linking all binaries with all libraries. And as it turns out, crypt is not used by Condor code anymore. It has been replaced by OpenSSL’s DES implementation, likely many years ago.
Running without libcrypt,
$ pmap -d $(pidof condor_shadow-sans-answers-null_xaction-crypt) | grep -e rw -e private
005e6000       4 rw--- 000000000000b000 0fd:00000 libnss_files-2.11.1.so
00657000       4 rw--- 000000000001c000 0fd:00000 libgcc_s-4.4.3-20100127.so.1
0073b000       8 rw--- 00000000000e0000 0fd:00000 libstdc++.so.6.0.13
0073d000      24 rw--- 0000000000000000 000:00000 [ anon ]
00747000       4 rw--- 0000000000002000 0fd:00000 libcom_err.so.2.1
0074c000       4 rw--- 0000000000001000 0fd:00000 libkeyutils-1.2.so
00802000      24 rw--- 00000000000b3000 0fd:00000 libkrb5.so.3.3
008ad000       4 rw--- 000000000002d000 0fd:00000 libgssapi_krb5.so.2.2
008da000       4 rw--- 000000000002a000 0fd:00000 libk5crypto.so.3.1
008e5000       4 rw--- 0000000000007000 0fd:00000 libkrb5support.so.0.1
0096c000      16 rw--- 0000000000051000 0fd:00000 libssl.so.1.0.0
00993000       4 rw--- 000000000001e000 0fd:00000 ld-2.11.1.so
00b06000       4 rw--- 0000000000170000 0fd:00000 libc-2.11.1.so
00b07000      12 rw--- 0000000000000000 000:00000 [ anon ]
00b23000       4 rw--- 0000000000016000 0fd:00000 libpthread-2.11.1.so
00b24000       8 rw--- 0000000000000000 000:00000 [ anon ]
00b2c000       4 rw--- 0000000000003000 0fd:00000 libdl-2.11.1.so
00b58000       4 rw--- 0000000000028000 0fd:00000 libm-2.11.1.so
00b6d000       4 rw--- 0000000000011000 0fd:00000 libz.so.1.2.3
00b98000       4 rw--- 000000000001c000 0fd:00000 libselinux.so.1
00bb1000       4 rw--- 0000000000015000 0fd:00000 libresolv-2.11.1.so
00bb2000       8 rw--- 0000000000000000 000:00000 [ anon ]
04414000      80 rw--- 0000000000170000 0fd:00000 libcrypto.so.1.0.0
04428000      12 rw--- 0000000000000000 000:00000 [ anon ]
04a1f000       4 rw--- 000000000002e000 0fd:00000 libpcre.so.0.0.1
082ad000       8 rw--- 0000000000264000 0fd:00001 condor_shadow-sans-answers-null_xaction-crypt
082af000       4 rw--- 0000000000000000 000:00000 [ anon ]
0908e000     396 rw--- 0000000000000000 000:00000 [ anon ]
b77bd000      20 rw--- 0000000000000000 000:00000 [ anon ]
b77dc000      12 rw--- 0000000000000000 000:00000 [ anon ]
bf820000      84 rw--- 0000000000000000 000:00000 [ stack ]
mapped: 9548K    writeable/private: 780K    shared: 0K
We see a huge improvement in both mapped and private memory usage. Going from 972K to 780K (780/972=0.802) is a ~20% improvement, or a reduction in memory of ~1.8GB over 10,000 running jobs.
BIG NOTE: The condor_shadow used above is from the 7.5 development series, which contains a problem around inclusion of a static parameter table on its heap. Using a version without the config table and some config minimization tricks yields similar results, with the private memory footprint going from 708K to 516K (516/708=0.729). All on my 32-bit laptop of course. Memory on a 64-bit architecture goes from ~1,400K to ~1,000K (1000/1400=0.714).
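The before/after ratios quoted here can be checked with a few lines. This is only a worked-arithmetic sketch; the function name and the 10,000-process default are chosen for illustration.

```python
def reduction(before_kb, after_kb, processes=10_000):
    """Fraction of private memory saved and total savings in GiB."""
    saved_kb = before_kb - after_kb
    return {
        "fraction_saved": saved_kb / before_kb,
        "saved_gib": saved_kb * processes / 2**20,
    }


for label, before, after in [
    ("7.5 series, 32-bit", 972, 780),
    ("without config table, 32-bit", 708, 516),
    ("64-bit (approximate)", 1400, 1000),
]:
    r = reduction(before, after)
    print(f"{label}: {r['fraction_saved']:.1%} saved, "
          f"{r['saved_gib']:.2f} GiB over 10,000 shadows")
```

Both 32-bit comparisons save the same 192K per shadow, which is why the percentage looks better on the smaller baseline.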
There is clearly still more to be done here. In addition to evaluating the other libraries linked into the shadow and eliminating many of the static buffers, the memory allocations reported by Massif can be explored. However, with the config table factored out, there is likely only about 100KB left to explore there.