Archive for the ‘User Experience’ Category

Web design complexity

January 7, 2013

One thing that has always impressed me is the ability of web designers to deal with browser idiosyncrasies.

For instance, knowing why this happens in firefox-17.0.1-1.fc17.x86_64 –

A bootstrap btn-primary viewed from 0.0.0.0

A bootstrap btn-primary viewed from 0.0.0.0

A bootstrap btn-primary viewed from localhost

A bootstrap btn-primary viewed from localhost

A bootstrap btn-primary firebug computed color from 0.0.0.0 and localhost

A bootstrap btn-primary firebug computed color from 0.0.0.0 and localhost

Needless to say, the web is littered with questions about why btn-primary background color is not always white. Most have answers, some with varying degrees of complex css. Others involve changing versions of software. All the while it might just be the URL used to view the page.

Test in a production environment.

Advertisements

Timeouts from condor_rm and condor_submit

December 16, 2009

The condor_schedd is an event driven process, like all other Condor daemons. It spends its time waiting in select(2) for events to process. Events include: condor_q queries, spawning and reaping condor_shadow processes, accepting condor_submit submissions, negotiating with the Negotiator, removing jobs during condor_rm. The responsiveness of the Schedd to user interaction, e.g. condor_q, condor_rm, condor_submit, and process interaction, e.g. messages with condor_shadow, condor_startd or condor_negotiator, is effected by how long it takes to process an event and how many events are waiting to be processed.

For instance, if a thousand condor_shadow processes start up at the same time there may be a thousand keep-alive messages for the Schedd to process after a single call to select. Once select returns, no new events will be considered until the Schedd calls select again. A condor_rm request would have to wait. Likewise, if any one event takes a long time to process, such as a negotiation cycle, it can also keep the Schedd from getting back to select and accepting new events.

Basically, to function well, the Schedd needs to get back to select as fast as possible.

From a user perspective, when the Schedd does not get back to select quickly, a condor_rm or condor_submit attempt may appear to fail, e.g.

$ time condor_rm -a

Could not remove all jobs.

real	0m20.069s
user	0m0.020s
sys	0m0.020s

As of the Condor 7.4 series, this rarely happens because of internal events that the Schedd is processing. The Schedd uses structures that allow such events to be interleaved with calls to select. However, some events still take long periods of time, e.g. the removal of 300,000 jobs above. One such event is a negotiation cycle initiated by the Negotiator. If a condor_rm, condor_q, condor_submit, etc happens during a negotiation, there is a good chance it may timeout.

Though a simple re-try of the tool will often succeed, this timeout may be annoying to users of the tools, be they people or processes. An alternative to a re-try is to extend the timeout used by the tool. The default timeout is 20 seconds, which is very often long enough, but may not be in large pools.

To extend the timeout for condor_submit, put SUBMIT_TIMEOUT_MULTIPLER=3 in the configuration file read by condor_submit. To extend the timeout for condor_q, condor_rm, etc, put TOOL_TIMEOUT_MULTILIER=3 in the configuration file read by the tool. These changes will take the default timeout, 20 seconds, and multiply it by 3, giving the Schedd 60 seconds to respond. For instance, with 100Ks of jobs in the queue:

$ _CONDOR_TOOL_TIMEOUT_MULTIPLIER=3 time condor_rm -a
All jobs marked for removal.
0.01user 0.02system 0:53.99elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4374minor)pagefaults 0swaps

%d bloggers like this: