Utility Functions
Utility functions are site-definable functions that Cobalt uses to generate a job's score. They are Python functions that are evaluated internally by the queue manager, cqm. Fairly sophisticated decisions and behaviors can be encoded in these score functions, which allows the scheduler's behavior to be tailored closely to a specific system and its workload priorities.
Examples of Utility Functions
We have been testing out two utility functions which try to achieve two different goals. Both functions make use of the ratio of queued time to requested wall time. This is a value that increases as jobs wait, and also captures the fact that, for example, waiting an hour before running is more painful for a 20 minute job than for a 6 hour job. We usually refer to this ratio as the "unitless waiting time".
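As a rough illustration of the ratio described above, the following sketch compares a 20 minute job and a 6 hour job that have each waited one hour. The helper name `unitless_wait` and its arguments are made up for this example and are not part of Cobalt; the sketch assumes queued time in seconds and requested wall time in minutes.

```python
def unitless_wait(queued_seconds, wall_minutes):
    """Ratio of queued time to requested wall time, both in seconds."""
    return queued_seconds / (wall_minutes * 60.0)

one_hour = 3600.0
short_job = unitless_wait(one_hour, 20)    # 20 minute request
long_job = unitless_wait(one_hour, 360)    # 6 hour request

# The short job has already waited three times its requested run time,
# while the long job has waited only a sixth of its requested run time.
print(short_job, long_job)
```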
The first function is defined as:
    def wfp():
        global wall_time
        wall_time = wall_time * 60
        val = (queued_time / wall_time)**3 * size
        return (val, 0.65 * val)
This function aims to avoid large job starvation. All jobs get increasingly angry the longer they wait, and bigger jobs more so.
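To see how quickly this score grows, here is a standalone sketch of the same formula. Unlike wfp, it takes its inputs as arguments instead of reading the names supplied by cqm; the name `wfp_score` and the sample job sizes are hypothetical.

```python
def wfp_score(queued_time, wall_time_minutes, size):
    """Cube of the unitless waiting time, scaled by job size."""
    ratio = queued_time / (wall_time_minutes * 60.0)
    return ratio ** 3 * size

# Two jobs that each requested 60 minutes, sampled after 1 and 2 hours queued.
for size in (512, 8192):
    after_one_hour = wfp_score(3600.0, 60, size)
    after_two_hours = wfp_score(7200.0, 60, size)
    print(size, after_one_hour, after_two_hours)
```

Because the ratio is cubed, doubling the wait multiplies the score by eight, and at any given wait the larger job's score is larger in proportion to its size.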
The second function is defined as:
    def unicef():
        n = max(math.ceil(math.log(size)/math.log(2)), 6.0)
        z = n - 5
        # avoid dividing by zero
        val = (queued_time / (60*z*max(wall_time, 1.0)))
        return (val, min(0.75*val, 4.5))
This function is used on our T&D system and on the development portion of intrepid. Its goal is to provide fast turnaround for small jobs. In this function, job size penalizes jobs by "stretching out" the requested wall time, which slows the rate at which large jobs accumulate utility score. The computation of n finds the log base 2 of the size of the partition that will run the job. On our machines, 2^6 = 64 is the smallest available partition. The value of z is thus 1 for the smallest partition, so small jobs don't get their wall time stretched, while any larger job will.
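The stretch factor z can be examined on its own. The sketch below copies the n and z computation from unicef into a helper called `stretch_factor`; the helper name and the sample sizes are chosen only for illustration.

```python
import math

def stretch_factor(size):
    """Return z, the amount by which unicef stretches the requested wall time."""
    # Log base 2 of the job size, rounded up, but never below 6 (2^6 = 64).
    n = max(math.ceil(math.log(size) / math.log(2)), 6.0)
    return n - 5

for size in (1, 64, 100, 1000):
    print(size, stretch_factor(size))
```

Any job of 64 or fewer gets z = 1, so its wall time is not stretched. A job of size 100 rounds up to n = 7 and gets z = 2, and a job of size 1000 rounds up to n = 10 and gets z = 5.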
Writing Utility Functions
If you are creating your own utility function, the following information is provided by cqm:
    queued_time = the current queued time
    wall_time = the requested wall clock time of the job in seconds
    wall_time_p = the predicted wall clock time of the job in seconds
    hold_time = the time the job has been in a hold state
    size = the size of the job being considered
    user_name = user of the job under consideration
    project = project of the job
    queue_priority = priority factor of the queue
    jobid = the jobid of the job
    score = the current job score
    state = the current state of the job
Keep in mind that Cobalt currently tracks requested wall-clock times in minutes, while the queued time is in seconds.
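A minimal sketch of a site-defined function built on these names might look like the following. The function name `example_utility` is hypothetical, and the module-level assignments are stand-ins so the snippet runs by itself; the assumption is that cqm already defines those names when it evaluates the function. The sketch also assumes wall_time arrives in minutes, as the note above says, and converts a local copy to seconds.

```python
# Stand-ins for values cqm would supply (assumption, for a standalone run).
queued_time = 7200.0   # seconds spent in the queue
wall_time = 30.0       # requested wall time, assumed to be in minutes
size = 512             # job size (not used by this simple example)

def example_utility():
    # Convert a local copy to seconds; guard against a zero wall time.
    wall_seconds = max(wall_time, 1.0) * 60.0
    val = queued_time / wall_seconds
    # Two-element return value, copying the shape used by wfp and unicef.
    return (val, 0.75 * val)

print(example_utility())
```

Working on a local copy leaves the supplied wall_time unchanged, so evaluating the function repeatedly gives the same result for the same inputs.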