Last modified on 07/10/14 18:00:17

Utility Functions

Utility functions are site-definable Python functions that Cobalt uses to generate a job's score. They are evaluated internally by the queue manager, cqm. Fairly sophisticated decisions and behaviors can be encoded in these score functions, which allows the scheduler to be tailored closely to a specific system and its workload priorities.

Examples of Utility Functions

We have been testing out two utility functions which try to achieve two different goals. Both functions make use of the ratio of queued time to requested wall time. This is a value that increases as jobs wait, and also captures the fact that, for example, waiting an hour before running is more painful for a 20 minute job than for a 6 hour job. We usually refer to this ratio as the "unitless waiting time".
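To make the ratio concrete, here is a minimal sketch that computes the unitless waiting time for the two jobs mentioned above after each has waited one hour. The helper name unitless_wait is hypothetical and not part of Cobalt; it assumes queued time in seconds and requested wall time in minutes.

```python
def unitless_wait(queued_secs, wall_minutes):
    """Ratio of time spent queued to requested wall time (hypothetical helper)."""
    # Convert the request to seconds so both quantities share a unit.
    return float(queued_secs) / (wall_minutes * 60.0)

one_hour = 3600
short_job = unitless_wait(one_hour, 20)   # 20 minute request
long_job = unitless_wait(one_hour, 360)   # 6 hour request
print(short_job, long_job)
```

After the same hour in the queue, the 20 minute job has waited three times its requested run time, while the 6 hour job has waited only one sixth of its own.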

The first function is defined as:

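def wfp():
    # wall_time is in minutes; convert to seconds to match queued_time,
    # using a local name so the value supplied by cqm is left untouched
    wall_secs = wall_time * 60.0
    val = (queued_time / wall_secs)**3 * size
    return (val, 0.65 * val)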

This function aims to avoid large job starvation. All jobs get increasingly angry the longer they wait, and bigger jobs more so: the score grows with the cube of the unitless waiting time and is scaled linearly by job size.
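The effect of the cube and the size factor can be seen in a standalone sketch of the same formula. The name wfp_score and its explicit parameters are assumptions made so the code runs outside cqm, and the node counts are illustrative only.

```python
def wfp_score(queued_time, wall_time, size):
    """Standalone copy of the wfp formula (hypothetical wrapper).

    queued_time is in seconds, wall_time in minutes.
    """
    ratio = float(queued_time) / (wall_time * 60.0)
    val = ratio ** 3 * size
    return (val, 0.65 * val)

# Two jobs that each request 60 minutes and have waited 2 hours.
small = wfp_score(7200, 60, 512)
large = wfp_score(7200, 60, 8192)

# The small job again, after waiting 4 hours instead of 2.
small_later = wfp_score(14400, 60, 512)

print(small[0], large[0], small_later[0])
```

With equal waits, the larger job's score is higher in proportion to its size. Doubling the wait multiplies a job's score by eight.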

The second function is defined as:

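import math

def unicef():
    # n is the log base 2 of the partition size, never less than 6 (64 nodes)
    n = max(math.ceil(math.log(size)/math.log(2)), 6.0)
    z = n - 5
    # clamp wall_time (minutes) to at least 1 to avoid dividing by zero
    val = (queued_time / (60*z*max(wall_time, 1.0)))
    return (val, min(0.75*val, 4.5))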

This function is used on our T&D system and on the development portion of Intrepid. Its goal is to provide fast turnaround for small jobs. In this function, job size serves to penalize jobs by "stretching out" the wall time requested. This slows the rate at which large jobs accumulate their utility scores. The computation of n finds the log base 2 of the size of the partition which will run the job, rounding up to the next power of two. On our machines, 2^6 = 64 is the smallest available partition. The value of z is thus 1 for the smallest partition, so small jobs don't get their wall time stretched, while any larger job will.
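The stretching can be illustrated with a standalone sketch of the same arithmetic. The names stretch_factor and unicef_score, their explicit parameters, and the job sizes are assumptions chosen for illustration; inside Cobalt the values come from cqm.

```python
import math

def stretch_factor(size):
    """z from unicef: 1 for the 64-node partition, larger for bigger ones."""
    n = max(math.ceil(math.log(size) / math.log(2)), 6.0)
    return n - 5

def unicef_score(queued_time, wall_time, size):
    """Standalone copy of the unicef formula (hypothetical wrapper)."""
    z = stretch_factor(size)
    val = queued_time / (60.0 * z * max(wall_time, 1.0))
    return (val, min(0.75 * val, 4.5))

# A 100-node request rounds up to 2**7, and a 3000-node request to 2**12.
for size in (32, 64, 100, 3000):
    print(size, stretch_factor(size))

# Each job requests 30 minutes and has waited one hour.
small = unicef_score(3600, 30, 64)
large = unicef_score(3600, 30, 3000)
print(small, large)
```

Under these assumptions the 3000-node job has z = 7, so it accumulates score at one seventh the rate of the 64-node job with the same request and wait.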

Writing Utility Functions

If you are creating your own utility function, cqm provides the following information:

queued_time = the current queued time of the job in seconds
wall_time = the requested wall clock time of the job in minutes
wall_time_p = the predicted wall clock time of the job in seconds
hold_time = the time the job has been in a hold state
size = the size of the job being considered
user_name = user of the job under consideration
project = project of the job
queue_priority = priority factor of the queue
jobid = the jobid of the job
score = the current job score
state = the current state of the job

Keep in mind that Cobalt currently tracks requested wall-clock times in minutes, while the queued time is in seconds.
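As a starting point, a new utility function might look like the sketch below. The function name my_utility and the 0.5 factor are arbitrary placeholders. The module-level assignments are stand-in values so the sketch runs outside Cobalt, and the two-element return simply mirrors the examples above.

```python
# Stand-in values so the sketch runs on its own; inside Cobalt these
# names are provided by cqm for the job being scored.
queued_time = 5400.0   # seconds
wall_time = 45.0       # minutes
size = 256

def my_utility():
    # Convert the request to seconds so it matches queued_time, and clamp
    # it to at least one minute to avoid dividing by zero.
    wall_secs = 60.0 * max(wall_time, 1.0)
    val = queued_time / wall_secs
    # Return a two-element tuple like the examples above; 0.5 is arbitrary.
    return (val, 0.5 * val)

print(my_utility())
```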