Version 1 (modified by richp, 8 years ago)
cobalt.conf
NAME
cobalt.conf - configuration parameters for Cobalt components
SYNOPSIS
/etc/cobalt.conf
DESCRIPTION
The Cobalt configuration file is an "ini-style" configuration file. It has sections for all Cobalt components and clients in a given instance. The general format of a section is:
[section]
Values that are lists are ":"-delimited. If a key is defined multiple times in a section, the last definition in that section is the value used. Comments may be added by beginning a line with a '#'. Comments must not be inline with key-value pairs. The sections that follow describe the various configuration sections and their options.
If a configuration value definition is mandatory, that will be noted.
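The parsing rules above can be sketched with Python's standard configparser module. This is an illustration under stated assumptions, not Cobalt's own loader: the parser options are chosen to mimic the rules described here, and the script paths and values in SAMPLE are hypothetical.

```python
import configparser

# Hypothetical configuration text; the paths and values are made up.
SAMPLE = """\
# Full-line comments start with '#'.
[cqm]
filters = /opt/site/check_project.py:/opt/site/check_size.py
force_kill_delay = 300
force_kill_delay = 120
"""


def load(text):
    """Parse ini-style text using the rules described in this page."""
    parser = configparser.ConfigParser(
        strict=False,                  # allow a repeated key; the last one wins
        delimiters=("=",),             # keep ':' free for list values
        comment_prefixes=("#",),       # only full-line '#' comments
        inline_comment_prefixes=None,  # no inline comments after a value
    )
    parser.read_string(text)
    return parser


def get_list(parser, section, key):
    """Split a ':'-delimited value into a list, dropping empty items."""
    raw = parser.get(section, key, fallback="")
    return [item for item in raw.split(":") if item]


config = load(SAMPLE)
print(get_list(config, "cqm", "filters"))
print(config.getint("cqm", "force_kill_delay"))
```

Because force_kill_delay appears twice in the sample, the second definition is the one returned.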
General Sections
[components]
- service-location
- The URL and port (url:port) of the service-locator component (slp). The default port is 8256. This must be specified for a given Cobalt instance.
- python
- The path to the Python interpreter to use. If omitted, the default is /usr/bin/python.
[communication]
SSL configuration for Cobalt. These must be specified per-install.
- key
- Key to use for SSL communication. May be generated via openssl(1).
- cert
- This is a locally stored certificate for authenticating the key.
- ca
- Certificate authority for the cert entry. This is typically the same as cert.
- password
- Required to be set in configuration file. This is a shared secret for all Cobalt daemons and clients to use, and is the password required for Cobalt's internal XMLRPC communication.
[statefiles]
Options for Cobalt's statefile persistence.
- location
- Path to where the statefiles are stored.
[system]
Common system configuration settings. These apply to all types of systems.
- size
- Maximum size of a given system in nodes.
[forker]
Common options for Cobalt forker components.
- ignore_setgroup_errors
- Default false. If set to true, then setuid/setgid failures will not kill jobs. This is necessary for running local instances or subinstances of Cobalt, as well as for running any non-root simulation mode. If this is set and the forker components are not running as root, every job will run as the user that the forker component is running as.
- save_me_interval
- The minimum interval that Cobalt will wait between saving statefiles for this component, in seconds. By default the interval is 10.0 seconds. Under periods of high load on the component, the interval between statefiles may be longer.
[logger]
This section handles Cobalt component logging and default levels. Valid logging levels in this section are DEBUG, INFO, NOTICE, WARNING, ERROR and CRITICAL.
- to_syslog
- If true, send logging data to the syslog daemon.
- syslog_level
- Only send messages to syslog at this level or higher. The default level is INFO.
- syslog_location
- Location of logfile.
- syslog_facility
- Logger facility to send logs to. The default is local0.
- to_console
- Send logging data to console or stdout/stderr as appropriate. This defaults to true.
- console_level
- Only send messages to the console at this level or higher. The default level is INFO.
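A minimal sketch of the "this level or higher" rule used by syslog_level and console_level. It assumes the levels are ordered from least to most severe exactly as they are listed in this section; the names LEVELS and should_emit are hypothetical and are not part of Cobalt.

```python
# Assumed severity order, lowest first, as listed in the [logger] section.
LEVELS = ["DEBUG", "INFO", "NOTICE", "WARNING", "ERROR", "CRITICAL"]


def should_emit(message_level, threshold="INFO"):
    """Return True if message_level is at or above the configured threshold."""
    return LEVELS.index(message_level.upper()) >= LEVELS.index(threshold.upper())


# With the default threshold of INFO, DEBUG messages are dropped.
for level in LEVELS:
    print(level, should_emit(level))
```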
[bgsched]
- default_reservation_policy
- If set, this is the score accrual policy that will be used on reservation queues. The default policy is "default" (fifo).
- db_flush_interval
- The minimum interval between sends of messages to the database component. use_db_logging must be set to true for this to have an effect. The default interval is 10 seconds.
- log_dir
- The directory to place reservation accounting logs.
- overflow_file
- This is a file location to use for holding database messages should use_db_logging be set to true, but the CobaltDB writer component is unavailable for an extended period of time. If this file is present, then on cdbwriter startup, messages from this file will be pushed to the component and added to the database, followed by in-memory pending messages.
- max_queued_messages
- This is the number of messages to keep in memory before flushing to the overflow_file. If set to -1, the component will never flush to the overflow file. If this is not set, then the overflow file will not be used.
- save_me_interval
- The minimum interval that Cobalt will wait between saving statefiles for this component, in seconds. By default the interval is 10.0 seconds. Under periods of high load on the component, the interval between statefiles may be longer.
- schedule_jobs_interval
- This is the minimum interval between iterations of the scheduling loop. The default time is 10 seconds.
- utility_file
- Location of file for site-defined utility functions.
- use_db_logging
- If true, send messages to CobaltDB, or cache the messages that would be sent if the CobaltDB writer is currently unavailable for later writing. The default is false.
[cqm]
These are options for the queue-manager component, cqm. Cqm handles queueing and overall job tracking operations.
- filters
- A colon-delimited list of paths to scripts to run. These are run by the clients that work with cqm(8), specifically qsub(1), qalter(1), and qmove(1). The scripts are invoked from the clients and must return an exit status of 0 before the job or job modification is passed into cqm. They are intended as site-specific validation scripts. Scripts receive job parameters as key=value pairs as arguments, and any key=value pairs written to stdout will modify job parameters accordingly. For instance, a non-default initial score of 500 may be written to stdout as score=500. If a job would fail to pass the filter entirely, then the script should return a nonzero exit status. A note as to which filter failed should be presented to the user. Note that cqadm(1), as an admin-level command, does not run these filters. Since the filters are invoked as a part of client invocation, any change to this parameter on a running Cobalt instance will have an immediate effect without signaling or restart.
- job_prescripts
- A colon-delimited list of scripts to run when the job is scheduled, but prior to job invocation. These are run once per job, whether or not it is preempted. Nonzero exit statuses in these scripts are fatal to a job starting up.
- job_postscripts
- A colon-delimited list of scripts to run after the job has ended. These are run once per job, whether or not it is preempted. Nonzero exit statuses in these scripts have no effect on a job.
- resource_prescripts
- A colon-delimited list of scripts to run when the job is scheduled, but prior to job invocation. These are run once per task, prior to resuming from preemption. Nonzero exit statuses in these scripts are fatal to a job starting up.
- resource_postscripts
- A colon-delimited list of scripts to run after the job has ended. These are run after each preemption step. Nonzero exit statuses at the end of a job in these scripts have no effect on a job.
- dep_frac
- The floating-point fraction of a job's score that a dependent job inherits. This sets a default value and may be overridden on a per-job basis by the schedctl(1) command. The default is 0.5.
- scale_dep_frac
- If set to true, the dependency fraction inherited by jobs will be scaled by the ratio of the size of the dependent job's resources to the size of the resources of the job it is inheriting score from. This only applies to dependent jobs that are smaller than the job they are inheriting from. For instance, a 4-node job depending on an 8-node job would inherit half the score fraction that an 8-node job depending on an 8-node job would.
- mailserver
- The address of the mailserver to use for sending admin emails and requested user emails for startup and termination notification.
- force_kill_delay
- The length of time, in seconds, to wait between sending a SIGTERM and a SIGKILL to a job. The default is 300 seconds.
- log_dir
- The directory to place job accounting logs.
- overflow_file
- This is a file location to use for holding database messages should use_db_logging be set to true, but the CobaltDB writer component is unavailable for an extended period of time. If this file is present, then on cdbwriter startup, messages from this file will be pushed to the component and added to the database, followed by in-memory pending messages.
- max_queued_messages
- This is the number of messages to keep in memory before flushing to the overflow_file. If set to -1, the component will never flush to the overflow file. If this is not set, then the overflow file will not be used.
- save_me_interval
- The minimum interval that Cobalt will wait between saving statefiles for this component, in seconds. By default the interval is 10.0 seconds. Under periods of high load on the component, the interval between statefiles may be longer.
- utility_file
- Location of file for site-defined utility functions.
- use_db_logging
- If true, send messages to CobaltDB, or cache the messages that would be sent if the CobaltDB writer is currently unavailable for later writing. The default is false.
- poll_process_groups_interval
- The interval in seconds between queries to the system component for process group status.
- use_db_jobid_generator
- If true, use CobaltDB to generate a unique jobid. This may be used to ensure unique jobids across multiple Cobalt instances on related resources. Default false.
- progress_interval
- The minimum time in seconds between job statemachine steps. Default 10 seconds.
- max_walltime
- If set, defines a general maximum requested walltime for all queues. May be overridden by setting the MaxWalltime property on a given queue. If this is not set, then there is no default limit on the length of time a user job may request, unless explicitly set as a part of a given queue.
- compute_utility_interval
- The minimum time in seconds to wait between score calculation iterations. The default is 10 seconds.
- cqstat_header
- A colon-delimited list of display headers to use in qstat(1)'s default display. A default set of headers will be used if this is not set.
- cqstat_header_full
- A colon-delimited list of display headers to use with qstat(1)'s -f flag. If not set, a default set of display headers is used. This does not change the -f -l combination for display.
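The filters entry in this section describes a small contract: key=value arguments in, optional key=value lines on stdout, and the exit status as the verdict. The sketch below illustrates that contract only. The parameter name nodecount, the limit of 1024 nodes, and the use of stderr for the rejection note are assumptions made for the example, not values documented here.

```python
import sys


def parse_job_args(argv):
    """Turn ['nodecount=512', 'queue=default'] into a dict of strings."""
    params = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if sep:  # ignore arguments that are not key=value pairs
            params[key] = value
    return params


def run_filter(argv, max_nodes=1024):
    """Return (exit_status, stdout_lines) for a hypothetical size check."""
    params = parse_job_args(argv)
    try:
        nodes = int(params.get("nodecount", "0"))  # 'nodecount' is assumed
    except ValueError:
        return 1, []
    if nodes > max_nodes:
        return 1, []            # nonzero status: the job is rejected
    return 0, ["score=500"]     # key=value on stdout modifies the job


def main(argv=None):
    """Entry point: print modifications, report a rejection, return status."""
    status, lines = run_filter(sys.argv[1:] if argv is None else argv)
    for line in lines:
        print(line)
    if status != 0:
        print("site_filter: job rejected by node-count check", file=sys.stderr)
    return status
```

When installed as a script, the file would end with a call such as sys.exit(main()) so that the return value becomes the exit status seen by the client.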
[cdbwriter]
- log_dir
- The directory to place cdbwriter message overflow files.
- user
- The user to connect to DB2. It is recommended to use a user identity that only has access to the Cobalt database. This user requires read, write, and update permissions on the Cobalt database.
- pwd
- This is the password that the user will use to connect to the Cobalt database.
- database
- The name of the database in DB2 to connect to that contains the Cobalt database.
- schema
- The name of the DB2 schema where the Cobalt database resides. Multiple schemas may exist in the same database, which is useful for handling multiple, related, Cobalt instances.
- save_me_interval
- The minimum interval that Cobalt will wait between saving statefiles for this component, in seconds. By default the interval is 10.0 seconds. Under periods of high load on the component, the interval between statefiles may be longer.
Cluster System Sections
[cluster_system]
- simulation_mode
- Set the cluster_system component to run in a simulation mode. In this mode, the cluster system will not actually run jobs on target nodes in its configuration, but will instead run the simulation_executable, which will provide statistics on what would have run. Otherwise, the system component will track and allocate resources as though it were actually running on a multi-node cluster, with the configuration specified in the hostfile entry. This defaults to false.
- simulation_executable
- Instead of running pre and postscripts, run the specified executable. This must be specified if running in simulation_mode. Output from this script is logged to the cluster_system component's logs.
- run_remote
- If set to false, do not attempt to run pre/postscripts on remote resources. The default is true.
- hostfile
- This is a list of hostnames for nodes that the cluster system component can schedule. Nodes may be added or removed, and the list of available nodes is updated at restart.
- epilogue
- This is a colon-delimited set of scripts to run on a per-node basis on task termination on a resource. If any script returns a non-zero exit status, the node will be marked down, and no new jobs will be scheduled on that resource.
- epilogue_timeout
- The amount of time in seconds to wait for each script to complete. If the script has not completed and exited with a status of 0 before this timeout is reached, that node will be marked down.
- prologue
- Not currently used. Per-node scripts are currently launched as a part of the cqm(8) resource_prologue
- prologue_timeout
- This is not currently used within the cluster system component
- allocation_timeout
- This is the time in seconds to wait when resources are allocated, but have not had a job started on them. This usually occurs when a user deletes a job while it is starting up. After this timeout has elapsed the resources will be returned to the pool of available nodes, and a new job may be scheduled on the resources. The default timeout is 300 seconds.
- drain_mode
- This sets the backfill mode to use and may be one of backfill, drain_only, or first_fit. The first_fit mode will run the highest scored job that can immediately run on resources available. The drain_only mode will run the highest scored job, if sufficient resources are available or it will start draining nodes and then run the job once sufficient resources are available. The backfill mode will run and drain resources as the drain_only mode, but will also attempt to run jobs on the empty, but draining nodes in a score-order first-fit manner. It is recommended that backfill be used if draining is permitted for improved utilization of cluster resources.
- minimum_backfill_window
- This is the minimum amount of backfill time to set for a set of resources that are being cleaned by post-job epilogue scripts. The default is 300 seconds.
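The epilogue and epilogue_timeout entries in this section describe a simple rule: every script must exit with status 0 within the timeout, otherwise the node is marked down. The sketch below shows that rule with Python's subprocess module. It is not the cluster_system implementation; the function name is hypothetical, and the commands in the usage lines are stand-ins for real epilogue scripts.

```python
import subprocess
import sys


def node_stays_up(commands, timeout_seconds):
    """Run each epilogue command in order.

    Returns False (mark the node down) as soon as one command exits
    nonzero or fails to finish within timeout_seconds.
    """
    for command in commands:
        try:
            result = subprocess.run(command, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            return False  # script did not complete in time
        if result.returncode != 0:
            return False  # script reported a failure
    return True


# Stand-in "scripts": tiny Python one-liners with known exit statuses.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
bad = [sys.executable, "-c", "raise SystemExit(3)"]
print(node_stays_up([ok], timeout_seconds=30))
print(node_stays_up([ok, bad], timeout_seconds=30))
```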
BlueGene/P Sections
[bgpm]
- mmcs_server_ip
- The IP address of the BlueGene mmcs_server.
- mpirun
- The location of the BlueGene mpirun binary. This is typically /bgsys/drivers/ppcfloor/bin/mpirun
[bgsystem]
- kernel
- If true, allow the use of alternative kernels
- bootprofiles
- This is a path to the directory that holds the alternate kernel subdirectories. If alternate kernel support is being used, then this must be set.
- This is the location where symlinks to the current profiles of partitions should be made. Cobalt will autogenerate these symlinks as a part of the boot process on an as-needed basis.
- bgtype
- The type of BlueGene being run on. For BlueGene/P this should be set to 'bgp'.
BlueGene/Q Sections
[bgpm]
- runjob
- The location of the BlueGene runjob binary. This is typically /bgsys/drivers/ppcfloor/bin/runjob
[bgsystem]
- allow_alternate_kernels
- If set to true, allow alternate kernels to be run by users using the --kernel or --io_kernel flags to qsub(1). This defaults to false.
- bootprofiles
- This is a path to the directory that holds the alternate kernel subdirectories. If alternate kernel support is being used, then this must be set.
- This is the location where symlinks to the current profiles of partitions should be made. Cobalt will autogenerate these symlinks as a part of the boot process on an as-needed basis.
- default_kernel
- The default compute-node kernel image to use. This name should be a directory found at the path indicated by . This value is set to 'default' by default.
- default_kernel_options
- A list of options to pass to the default kernel image.
- ion_default_kernel
- The default IO-node kernel image to use. This name should be a directory found at the path indicated by . This value is set to 'default' by default.
- ion_default_kernel_options
- A list of options to pass to the default kernel image.
- subblock_prefix
- This is a location prefix to attach to subblock names. Usually this is the resource's prefix for the Cobalt instance. The default for subblock use is "COBALT".
- subblock_config
- Sets a configuration for subblock use. This is a key-value list of the form: "[blockname1:min_size1],[blockname2:min_size2],..."
- ignore_subblock_sizes
- A colon-delimited list of sizes to skip when generating pseudoblocks for automatic subblock use.
- terminal_boot_timeout
- Sets an automatic timeout in seconds for block boots initiated by Cobalt's boot_block(1) command. The default is 300 seconds.
- bgtype
- The type of BlueGene being run on. For BlueGene/Q this should be set to 'bgq'.
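As an illustration of the subblock_config value format described in this section, the sketch below turns such a string into a mapping of block name to minimum size. Whether the square brackets are literal characters or only notation is not stated here, so the parser accepts both forms; that tolerance, the function name, and the block names in the usage line are assumptions.

```python
def parse_subblock_config(value):
    """Return {block_name: min_size} from a subblock_config string.

    Accepts entries written as 'name:size' or '[name:size]', separated
    by commas. Raises ValueError on an entry without a ':' separator.
    """
    result = {}
    for entry in value.split(","):
        entry = entry.strip().strip("[]")  # brackets are treated as optional
        if not entry:
            continue
        name, sep, size = entry.rpartition(":")
        if not sep:
            raise ValueError(f"expected name:min_size, got {entry!r}")
        result[name.strip()] = int(size)
    return result


# Hypothetical block names and sizes.
print(parse_subblock_config("[BLOCK-A:32],[BLOCK-B:64]"))
```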
ENVIRONMENT
- COBALT_CONFIG_FILES
- If set, Cobalt will use the configuration pointed to by this path.
FILES
- /etc/cobalt.conf
- This is the default location for the configuration file used by all Cobalt daemons and clients. Due to the potential for abuse of the XMLRPC interfaces, access to this file should be carefully controlled. This file does not need to be writable under normal conditions, and only needs to be readable by the user used by Cobalt's setgid wrappers. By default, this is the cobalt user.
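A short sketch of how a site tool might locate the configuration file and check that it is not open to other users, following the ENVIRONMENT and FILES entries above. It treats COBALT_CONFIG_FILES as a single path, as described here; the function names and the choice to flag any read or write permission for "other" are assumptions for illustration.

```python
import os
import stat

DEFAULT_CONFIG = "/etc/cobalt.conf"


def config_path(environ=None):
    """Return the path from COBALT_CONFIG_FILES, or the default location."""
    environ = os.environ if environ is None else environ
    return environ.get("COBALT_CONFIG_FILES") or DEFAULT_CONFIG


def mode_is_too_open(mode):
    """Return True if 'other' users can read or write a file with this mode."""
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def check_config_permissions(path):
    """Return True if the file at path is closed to 'other' users."""
    return not mode_is_too_open(os.stat(path).st_mode)


print(config_path({}))
print(mode_is_too_open(0o644))
```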
SEE ALSO
slp(8), bgpm(8), bgsched(8), cqm(8)
This document was created by man2html, using the manual pages.
Time: 18:13:41 GMT, June 18, 2015