Archive for the ‘QMF’ Category

Submitting to Condor from QMF: A minimal Vanilla Universe job

April 30, 2009

Part of Red Hat’s MRG product is a management system that covers Condor. At the core of that system is the Qpid Management Framework (QMF), built on top of AMQP. Condor components modeled in QMF allow for information retrieval and control injection. I previously discussed How QMF Submission to Condor Could work, and now there’s a prototype of that interface.

Jobs in Condor are represented as ClassAds, a.k.a. a job ad, which are essentially schema-free. Any schema they have is defined by the code paths that handle jobs. This includes not only the name of attributes but also the types of their values.. For instance, a job ad does not have to have an attribute that describes what program to execute, the Cmd attribute whose value is a string, but if it does then the job can actually run.

Who cares what’s in a job ad?

Most all components in a Condor system handle jobs in one way or another, which means they all help define what a job ad looks like. To complicate things a bit, different components handle jobs in different ways depending on the job’s type or requested features. For simplicity I’m only going to discuss jobs that want to run in the Vanilla Universe with minimal extra features, e.g. no file transfer requests.

The Schedd is the first component that deals with jobs. Its purpose is to manage them. It insists all jobs it manages have: an Owner string, an identify of the job’s owner; a ClusterId string, an identifier assigned by the Schedd; a ProcId string, an identifier within the ClusterId, also assigned by the Schedd; a JobStatus integer, specifying the state the job is in, e.g. Idle, Held, Running; and, a JobUniverse integer, denoying part of a job’s type, e.g. Vanilla Universe or Grid Universe.

The Negotaitor is sent jobs by the Schedd to perform matching with machines. To do that matching it requires that a job have a Requirements expression.

The Shadow helps the Schedd manage a job while it is running. The Schedd gives it the job ad to the Shadow, and the Shadow insists on finding an Iwd string representing a path to the job’s initial working directory.

The Startd handles jobs similarly to the Negotiator. Since matching is bi-directional in Condor and execution resources get final say in running a job, the Startd evaluates the job’s Requirements expression before accepting it.

The Starter is responsible for actually running the job. It requires that a job have the Cmd string attribute. Without one the Starter does not know what program to run. An additional requirement the Starter imposes is the validity of the Owner string. If the Starter is to impersonate the Owner, then the string must specify an identify known on the Starter’s machine.

Now, those are not the only attributes on a job ad. Often components will fill in sane defaults for attributes they need but are not present. For instance, the Shadow will fill in the TransferFiles string, and the Schedd will assume a MaxHosts integer.

What about tools?

It’s not just the components that manage jobs that care about attributes on a job. The condor_q command-line tool also expects to find certain attributes to help it display information. In addition to those attributes already described, it requires: a QDate integer, the number of seconds since EPOCH when the job was submitted; a RemoteUserCpu float, an attribute set while a job runs; a JobPrio integer, specifying the job’s relative priority to other jobs; and, an ImageSize integer, specifying the amount of memory used by the job in KiB.

What does a submitter care about?

Anyone who wants to submit a job to Condor is going to have to specify enough attributes on the job ad to make all the Condor components happy. Those attributes will vary depending on the type of job and the features it desires. Luckily Condor can fill in many attributes with sane defaults. For instance, condor_q wants a QDate and a RemoteUserCpu. Those are two attributes that the submitter should not be able to or have to specify.

To perform a simple submission for running a pre-staged program, the job ad needs: a Cmd, a Requirements, a JobUniverse, an Iwd, and an Owner. Additionally, an Args string is possible if the Cmd takes arguments.

Given the knownledge of what attributes are required on a job ad, and using the QMF Python Console Tutorial, I was able to quickly write up an example program to submit a job via QMF.

#!/usr/bin/env python

from qmf.console import Session
from sys import exit

UNIVERSE = {"VANILLA": 5, "SCHEDULER": 7, "GRID": 9, "JAVA": 10, "PARALLEL": 11, "LOCAL": 12, "VM": 13}
JOB_STATUS = ("", "Idle", "Running", "Removed", "Completed", "Held", "")

ad = {"Cmd":          {"TYPE": STRING_TYPE,  "VALUE": "/bin/sleep"},
      "Args":         {"TYPE": STRING_TYPE,  "VALUE": "120"},
      "Requirements": {"TYPE": EXPR_TYPE,    "VALUE": "TRUE"},
      "JobUniverse":  {"TYPE": INTEGER_TYPE, "VALUE": "%s" % (UNIVERSE["VANILLA"],)},
      "Iwd":          {"TYPE": STRING_TYPE,  "VALUE": "/tmp"},
      "Owner":        {"TYPE": STRING_TYPE,  "VALUE": "nobody"}}

session = Session(); session.addBroker("amqp://localhost:5672")
schedulers = session.getObjects(_class="scheduler", _package="mrg.grid")
result = schedulers[0].Submit(ad)

if result.status:
    print "Error submitting job:", result.text

print "Submitted job:", result.Id

jobs = session.getObjects(_class="job", _package="mrg.grid")
job = reduce(lambda x,y: x or (y.CustomId == result.Id and y), jobs, False)
if not job:
    print "Did not find job"

print "Job status:", JOB_STATUS[job.JobStatus]
print "Job properties:"
for prop in job.getProperties():
    print " ",prop[0],"=",prop[1]

The program does more than just submit, it also looks up the job based on what has been published via QMF. The job that it does submit runs /bin/sleep 120 as the nobody user from /tmp on any, Requirements = TRUE, execution node.

The job’s ClassAd is presented as nested maps. The top level map holds attribute names mapped to values. Those values are themselves maps that specify the type of the actual value and a representation of the actual value. All representations of values are strings. The type specifies how the string should be handled, e.g. if it should be parsed into an int or float.

Two good sources of information about job ad attributes are the UW’s Condor Manual’s Appendix, and the output from condor_submit -dump when run against a job submission file.

How QMF Submission to Condor Could Work

March 23, 2009

I recently goofed and told someone that they could use the Qpid Management Framework (QMF) to submit jobs to Condor. What I meant to say is they can use AMQP. This is maybe understandable because QMF is a management framework built on top of AMQP, and MRG Grid already has many parts of Condor modeled in QMF, but submission via QMF could be very different than via AMQP.

QMF is a framework that allows for the modeling of objects that can publish information about themselves as well as respond to actions. All information and control is sent via AMQP messages.

Along with a quick correction to my comment, s/QMF/AMQP/, I went ahead and mocked up a QMF submission interface to make my comment almost true.

Existing Submission Interfaces

Condor already has a number of submission interfaces: the command-line tools, e.g. condor_submit; a GAHP interface, the condor_c-gahp; a SOAP interface, once termed Birdbath; the previously mentioned AMQP interface; and a few others. So, what’s one more? Or, why one more!?

Command-line Interface

The command-line interface is the default means for submitting jobs to Condor’s Scheduler, the condor_schedd. The condor_submit tool takes a job description file, performs some processing on it, and generates one or many ClassAds representing jobs, a.k.a job ads. The condor_schedd only cares about job ads, and is never exposed to the job description file. condor_submit’s processing is sometimes shallow, e.g. executable = /bin/true becomes Cmd = "/bin/true", and sometimes not, e.g. getenv = TRUE becomes Environment = "<contents of env for condor_submit>". Sometimes the processing is even iterative in nature, e.g. queue 1000000 generates one million copies of the job constructed since the last queue command. The job description file is really a script in the condor_submit language that generates jobs. This makes the condor_submit tool thick, and optimizations that it performs requires it to be tightly integrated with condor_schedd.

SOAP Interface

The SOAP interface (starts slide 15) is very different from condor_submit. It is implemented within the condor_schedd, and exposes a transactional interface that accepts job ads as input. This means no high level job description file processing. It also means the thick condor_submit tool could be implemented on top of the SOAP interface. A job ad that might be submitted via SOAP would look like:

    Arguments="Hello there!";

This is a job ad that may have been created from a job description file like:

   executable = /bin/echo
   arguments = Hello there!
   requirements = TRUE
   queue 1

Pass that to condor_submit -dump to have a look.

A QMF Interface

So, what about a QMF submission interface. A nice aspect of the condor_submit interface is the script nature of the input. Unfortunately, there are some things that cannot be cleanly captured on the remote side of a submission, e.g. the getenv command, transfer_input_files, platform specific requirements bits, or working directory information. To some extent these reasons, and the desire to keep script processing out of the condor_schedd, is why the SOAP interface only deals in job ads. It’s also a reason why a QMF interface should only handle job ads.

A benefit of the SOAP interface is, quite obviously, that it makes for a more natural programmatic interface. Unfortunately, it also exposes concepts and optimizations that are used by condor_submit and may not be needed by other submission programs, e.g. transactions and clusters.

A Submission

One thing that is an afterthought when using both interfaces is the notion of a submission, something that binds together jobs based on their overall purpose. Often a cluster is thought of as the means to group jobs. However, a single job description file can generate multiple clusters. Likewise, the SOAP interface can allow for group all jobs into a single cluster, but if one of the jobs is a DAGMan workflow then the point of the single cluster is violated. The use of clusters to associate jobs is broken.

Two things the QMF interface can do are: 1) simplify the operations required to perform a submission; and, 2) motivate its users to materialize the notion of a submission.

A QMF submission API

   submit, ClassAd -> Id : Submit a new job described by ClassAd


   create, void -> Id : Create a transaction to submit data and a job ad
   send, Id x Data -> void : Spool data for a forthcoming job ad

This interface would be a great simplification over the SOAP API. It eliminates the necessity of a transaction and chunked data transfers, and it does not expose the notion of a cluster. Without a cluster, job association must be done in some other way. The natural way is via an attribute on job ads, including DAGMan jobs. All jobs in a submission could have an attribute Submission = "Monday Parameter Sweep Run, features: A, B, D", a +Submission = "Monday..." in a job description file.

This interface does not have some of the high level niceties of a condor_submit submission. However, those niceties are not necessarily the ability to do many things with one line, e.g. queue 100, but to have a well defined description of a job. Understanding executable becomes the Cmd attribute is one thing, knowing universe = vanilla becomes JobUniverse = 5 is significantly different. Shortcomings in the high level interface can be addressed with improved specification for a job ad.

%d bloggers like this: