
Know your environment: Importance of infrastructure – DNS

April 6, 2011

Consider running nscd on your core nodes.

All programs run in some environment, and that environment has a significant impact on how they execute. It pays to be very aware of how a program interacts with its environment.

Applications rely on their host operating system for much of their execution environment. An application might rely on the operating system for little more than memory allocation, perhaps without even calling alloc/free itself. Or an application may use a complex mixture of disk, memory, network, semaphores, etc. All of these resources come with trade-offs.

Distributed systems execute in an even more complex mixture. They will often hit all of the operating system provided resources, but also services present on the network. As a result, the need to understand the execution environment expands beyond the application itself to the administrators who run it.

During some scale testing of Condor, an unusual execution pattern appeared in condor_submit and condor_q: periodic 5 to 15 minute stalls, at about a 20% frequency, noticeable as gaps in log files and debug output. LogTimeGap.awk, included at the bottom of this post, is handy for spotting such gaps.

Investigation with strace revealed the time gaps occurred during communication with DNS servers, specifically while resolving an alias used in COLLECTOR_HOST. Try strace -e connect,sendto,recvfrom,poll condor_q and time _CONDOR_TOOL_DEBUG=D_ALL condor_q -debug 2>&1 | awk -f LogTimeGap.awk 1. Simply replacing the alias with the canonical name it referred to eliminated the gaps immediately. See bug 682442. A straightforward resolution, but it may not go far enough.
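
As a quick reference, the canonical name behind an alias can be found with host or dig. The names and addresses below are purely illustrative.

$ host condor-collector.example.com
condor-collector.example.com is an alias for collector01.example.com.
collector01.example.com has address 192.0.2.10

$ dig +noall +answer condor-collector.example.com
condor-collector.example.com. 300 IN CNAME collector01.example.com.
collector01.example.com.      300 IN A     192.0.2.10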

The issue was slow name resolution, and only a single case was worked around. What about other cases? What if non-alias resolutions start taking a long time? Two more complete solutions present themselves: 0) perform a transform step on the configuration and pre-resolve all hostnames, or 1) add another service to the environment, one designed to mitigate these issues: nscd, the name service cache daemon.

The first option is fairly straightforward, and quite reasonable for many deployments, especially those that take advantage of Wallaby to simplify configuration management. Though it will take some care, will have to be Condor aware, and may leave a gap if host based authentication is being used. Since host based authentication is the out-of-the-box configuration and common in many deployments, that gap may be a high barrier to entry.
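
As a rough sketch of what that transform step might look like, assuming a flat configuration file and using getent for resolution (the file name and hostnames are hypothetical, and real deployments would also need to handle macros and multiple files):

#!/bin/sh
# Sketch: pre-resolve COLLECTOR_HOST to an address before deploying the
# configuration, so the running daemons never need DNS for it.
CONFIG=condor_config.local
NAME=$(awk -F= '/^COLLECTOR_HOST/ {gsub(/ /, "", $2); print $2; exit}' "$CONFIG")
ADDR=$(getent hosts "$NAME" | awk '{print $1; exit}')
test -n "$ADDR" && sed -i "s/^COLLECTOR_HOST.*/COLLECTOR_HOST = $ADDR/" "$CONFIG"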

The second option appears simpler. It amounts to service nscd start and possibly chkconfig --level 23 nscd on. It handles host based authentication configurations more simply, and does not require the transformation step. However, you will have added yet another service to Condor’s environment, and, for that matter, a service you know will have a significant impact on execution. What happens when someone redeploying forgets to enable nscd, when nscd starts to misbehave, or when nscd simply is not available? Are you sure nscd interoperates with your round-robin domain aliases? nscd will also interpose itself for other applications on the same system, whether or not they were written with proper layering in mind. A simple solution, but with possibly non-obvious implications.
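
For completeness, enabling nscd and confirming its hosts cache is actually being consulted looks roughly like the following on a Red Hat style system; nscd -g prints cache statistics and typically needs root, and the hostname is hypothetical.

$ service nscd start
$ chkconfig --level 23 nscd on

# Generate a lookup, then check that the hosts cache hit/miss counters move.
$ getent hosts collector01.example.com
$ nscd -g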


Some interesting data during a particularly bad time period –

Running time condor_reschedule 1,000 times with and without nscd.
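
The collection loop was nothing more elaborate than the following (a reconstruction, not the exact harness used); bash’s time writes its report to stderr, which the greps below pick out of the log.

$ for i in $(seq 1000); do { time condor_reschedule; } 2>&1; done > no-nscd.log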

Without,

# Distribution of delays at second resolution, samples x time,
# 0 is the target. 0 is not hit, some executions up to 25 seconds.
$ grep ^real no-nscd.log | sed 's/.*m\([^.]*\).*/\1/' | sort -n | uniq -c
    327 0
    403 5
    197 10
     61 15
     10 20
      2 25

# Breakdown of sub-second executions, looking for consistency.
# Execution is not consistent, or not as consistent as with nscd.
$ grep ^real no-nscd.log | grep 0m0 | sort -n | uniq -c
     10 real	0m0.024s
     19 real	0m0.025s
     52 real	0m0.026s
     71 real	0m0.027s
     55 real	0m0.028s
     41 real	0m0.029s
     27 real	0m0.030s
      7 real	0m0.031s
      6 real	0m0.032s
      7 real	0m0.033s
      6 real	0m0.034s
      4 real	0m0.035s
      6 real	0m0.037s
      1 real	0m0.038s
      4 real	0m0.039s
      2 real	0m0.040s
      1 real	0m0.041s
      1 real	0m0.042s
      1 real	0m0.043s
      1 real	0m0.045s
      1 real	0m0.053s
      1 real	0m0.057s
      1 real	0m0.071s
      1 real	0m0.126s
      1 real	0m0.230s

With,

# Distribution of delays at second resolution, samples x time,
# 0 is the target.
$ grep ^real nscd.log | sed 's/.*m\([^.]*\).*/\1/' | sort -n | uniq -c
   1000 0

# Breakdown of sub-second executions, looking for consistency.
$ grep ^real nscd.log | grep 0m0 | sort -n | uniq -c
     69 real	0m0.012s
    888 real	0m0.013s
     36 real	0m0.014s
      6 real	0m0.015s
      1 real	0m0.016s

LogTimeGap.awk

#!/bin/awk -f
# LogTimeGap.awk - report gaps between consecutive timestamped log lines.
# Requires gawk, as gensub() and mktime() are gawk extensions.
# Usage: awk -f LogTimeGap.awk [MAX_GAP_SECONDS] < logfile

# Convert a line beginning with an MM/DD/YY HH:MM:SS timestamp into seconds.
# The year in the log is ignored and pinned to 1984, since only differences
# between consecutive timestamps matter.
function parse_time(string) {
   return mktime(gensub(/([^/]*)\/([^ ]*)\/([^ ]*) ([^:]*):([^:]*):([^ ]*) .*/,
                        "1984 \\1 \\2 \\4 \\5 \\6", "g", string))
}

BEGIN {
   previous_time = 0; previous_line = ""; current_time = 0
   # MAX_GAP comes from the first command-line argument, defaulting to 30
   # seconds. ARGC is reset so the argument is not treated as an input file;
   # the log is read from stdin.
   ARGC = 1
   MAX_GAP = ARGV[1]
   if (MAX_GAP == "") MAX_GAP = 30
   print "Maximum allowable gap:", MAX_GAP, "seconds"
}

# For every line, compute the gap since the previous line and report any gap
# larger than MAX_GAP along with the two lines that bound it.
{
   current_time = parse_time($0)
   gap = current_time - previous_time
   if (previous_time > 0 && gap > MAX_GAP) {
      print "Found gap of " gap " seconds:\n", previous_line "\n", $0
   }
   previous_line = $0
   previous_time = current_time
}

END { }

NFS and Job Initial Working Directory (Iwd)

February 14, 2010

Condor deployments tend to include a network file system, such as NFS, AFS or SMB, which allows users easy access to their files across many machines. The presence of such file systems also means that a user can skip using Condor’s file transfer mechanisms and have their jobs write output or read input directly from the networked locations, often the user’s home directory. Condor is more than happy to do this, as long as the user’s credentials are available to access the home directory, which is often the case. Condor will even go one step further.
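
Before getting to that extra step, this is roughly what such a submit file looks like when it leans on the shared filesystem instead of file transfer; all paths and names here are made up for illustration.

# Hypothetical submit file: no file transfer, I/O goes straight to the
# user's NFS-mounted home directory.
universe              = vanilla
executable            = /home/jdoe/bin/analyze
initialdir            = /home/jdoe/run42
input                 = data.in
output                = data.out
error                 = data.err
log                   = analyze.log
should_transfer_files = NO
queue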

Sometime in the past, a user was automating their job submission to Condor, similar to what DAGMan does, and ran into a problem when their files were written to NFS. Their meta-scheduler, as such tools are called, was reading job output files and getting stale cached data. The job might have completed, but the machine on which the meta-scheduler was running saw only part of the output. To get around this issue the condor_schedd, which in this case was managing jobs for the meta-scheduler, was changed to try to flush the NFS cache for the job’s Iwd. When a job completes, the Schedd checks whether the Iwd is on NFS, and if so creates a temporary file that is immediately deleted. The Schedd’s log reports “Forcing NFS sync of Iwd” and a .condor_nfs_sync_XXXXXX file briefly lives in the Iwd (a shell equivalent of the trick is sketched below). This, of course, has pros and cons.
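
The trick itself is easy to demonstrate outside of Condor: creating and immediately removing a file in a directory is intended to force the NFS client to revalidate its cached view of that directory. A minimal shell equivalent, with a made-up Iwd, not the Schedd’s actual code:

# Force an NFS attribute-cache revalidation of a directory.
IWD=/home/jdoe/run42
TMP=$(mktemp "$IWD/.condor_nfs_sync_XXXXXX") && rm -f "$TMP"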

On the plus side, this is helpful to meta-schedulers because now they never have to bother making sure data sources aren’t stale. Arguably the meta-scheduler should be fixed in this situation. On the negative side, all jobs that have an Iwd in NFS now incur a penalty in the form of some NFS round trips when they complete. This penalty can actually be very dramatic, even halving the number of jobs a single Schedd can complete in a second.

To address the performance hit, in Condor 7.4, the IwdFlushNFSCache job attribute was introduced. It defaults to True, and can be changed in a submit file with +IwdFlushNFSCache = False or for all new jobs with IwdFlushNFSCache = False followed by SUBMIT_EXPRS = IwdFlushNFSCache in configuration. As expected, IwdFlushNFSCache works as a guard to the code in the condor_schedd that flushes the Iwd on job completion.
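
Spelled out, per job in a submit file or for all new jobs via configuration:

# In a submit file, for a single job:
+IwdFlushNFSCache = False

# In configuration, for all new jobs submitted through that Schedd:
IwdFlushNFSCache = False
SUBMIT_EXPRS = IwdFlushNFSCache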

Maybe in future versions of Condor (7.5+) the default will become False and those who need the cache flushing functionality will place +IwdFlushNFSCache = True in their submit files.

