ALICE VO-Box LDAP Configuration Reference

 

 

General information

Except for the username that is used to run AliEn from the VO-Box, all configuration is taken from two sources: the LDAP configuration database and a local file called ~alicesgm/.alien/alice.conf. The relative priority of these two sources is set by the value of the localconfig item in ou=<site name>,ou=Sites,o=alice,dc=cern,dc=ch:

 

  • localconfig = ' ' is the default: all of the configuration is taken from the central DB and the local file is ignored altogether.

  • localconfig = overwrite: the local file has priority, i.e. if a configuration item is defined in both places, the one from the local file is used.

  • localconfig = add: the configuration is taken from the central DB; however, configuration items that are not defined in the central DB can be defined in the local file.
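
The three modes above can be sketched as a merge of two dictionaries. This is only an illustration of the priority rules, not the AliEn implementation: the function name, the dictionaries and the EXAMPLE_ITEM key are made up, and the sketch ignores the fact that some items (services, LOG_DIR) cannot be set locally at all.

```python
def effective_config(ldap_cfg, local_cfg, localconfig=""):
    """Combine LDAP and local-file items according to the localconfig mode."""
    mode = localconfig.strip()
    if mode == "":
        # Default: the local file is ignored altogether.
        return dict(ldap_cfg)
    if mode == "overwrite":
        # The local file wins wherever both sources define an item.
        return {**ldap_cfg, **local_cfg}
    if mode == "add":
        # LDAP wins; the local file only supplies items LDAP does not define.
        return {**local_cfg, **ldap_cfg}
    raise ValueError(f"unknown localconfig value: {localconfig!r}")


# Made-up values, for illustration only.
ldap_cfg = {"TTL": "172800", "maxjobs": "200"}
local_cfg = {"TTL": "86400", "EXAMPLE_ITEM": "x"}

print(effective_config(ldap_cfg, local_cfg))               # LDAP items only
print(effective_config(ldap_cfg, local_cfg, "overwrite"))  # TTL from the local file
print(effective_config(ldap_cfg, local_cfg, "add"))        # TTL from LDAP, EXAMPLE_ITEM added
```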

For a production site, except in special cases, all of the configuration should be taken from LDAP, i.e. the local file is not needed.

In that case the local file, if present, is still parsed, but its settings are not applied and warning messages appear in the service logs:

 

Jul 14 15:42:50 info Reading the configuration file from /home/alicesgm/.alien/alice.conf
Jul 14 15:42:50 info The local configuration is not allowed to define services
Jul 14 15:42:50 info The local configuration is not allowed to define services
Jul 14 15:42:50 info The local configuration is not allowed to define services
Jul 14 15:42:50 info The local configuration is not allowed to define services
Jul 14 15:42:50 info The local configuration is not allowed to override LOG_DIR

The LDAP configuration is described below; for details on the local file syntax, see the VO-Box installation HowTo.

The central AliEn configuration DB is on ldap://aliendb06a.cern.ch:8389, with base DN o=alice,dc=cern,dc=ch.
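
To inspect what is currently stored for a site, the entry can be read with a standard LDAP client. The sketch below only builds the command line for the OpenLDAP ldapsearch tool and does not run it. It assumes that the server allows anonymous read access and that the site name is the first component of the site DN; "Torino" is just an example site name.

```python
import shlex

LDAP_URI = "ldap://aliendb06a.cern.ch:8389"  # server given in the text above
BASE_DN = "o=alice,dc=cern,dc=ch"            # base DN given in the text above


def site_dn(site):
    """DN of a site entry, assuming the ou=<site>,ou=Sites,<base DN> layout."""
    return f"ou={site},ou=Sites,{BASE_DN}"


def ldapsearch_argv(site):
    """Argument list for an anonymous ldapsearch on one site subtree."""
    # -x: simple (anonymous) bind, -LLL: plain LDIF output,
    # -H: server URI, -b: search base.
    return ["ldapsearch", "-x", "-LLL", "-H", LDAP_URI, "-b", site_dn(site)]


print(shlex.join(ldapsearch_argv("Torino")))
```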

Items not described below should usually be left empty in the LDAP DB.

 

CE configuration

By convention, CE services on AliEn nodes are named after their batch system (e.g. LSF, PBS, ...). For LCG VO-Boxes the name is LCG, so the CE is called something like ALICE::Torino::LCG. Most CREAM CEs, however, are called something like ALICE::Torino::Torino-CREAM.

 

General items

 

host

the full hostname of the VO-Box (e.g. alibox.to.infn.it)

port

leave empty

name

Should be LCG for normal sites, <site name>-CREAM for CREAM CEs (e.g. Torino-CREAM)

type

WMS for submission through WMS, CREAM for direct submission to CREAM CEs. Changed in v2-17!

maxqueuedjobs

Number of jobs to be kept queued at site

maxjobs

Maximum number of total jobs at the site (i.e. running + queued): should match the number of available slots + maxqueuedjobs.

TTL

usually set to 172800, unless local restrictions reduce the maximum wall time for jobs

si2kPrice

for now, set at 1.

installMethod

leave empty.
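
As a worked example, the general items for a site could be assembled as follows. This is a sketch under assumptions: the function and its parameters are hypothetical, the slot and queue figures are made up, and the <site>-CREAM name is inferred from the ALICE::Torino::Torino-CREAM example above.

```python
def ce_general_items(site, vobox_host, slots, maxqueuedjobs, cream=False):
    """Return the general items of a CE entry as a dict of strings."""
    return {
        "host": vobox_host,                     # full hostname of the VO-Box
        "port": "",                             # leave empty
        "name": f"{site}-CREAM" if cream else "LCG",
        "type": "CREAM" if cream else "WMS",
        "maxqueuedjobs": str(maxqueuedjobs),
        "maxjobs": str(slots + maxqueuedjobs),  # running + queued
        "TTL": "172800",                        # lower it if the site limits wall time
        "si2kPrice": "1",
        "installMethod": "",                    # leave empty
    }


# Made-up figures: 300 job slots, 20 jobs kept queued.
for key, value in ce_general_items("Torino", "alibox.to.infn.it", 300, 20).items():
    print(f"{key}: {value}")
```

With these figures maxjobs comes out as 320, i.e. the 300 slots plus the 20 queued jobs.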

 

LRMS commands

For each operation on a given JobAgent (submission, status query, etc.) these entries define the command to be used and the options to be passed. If left empty, commands default to the gLite 3.1 WMS or CREAM commands, depending on the CE type entry. In normal setups, leave everything empty.

 

submitcmd
Submission command.
submitarg
Leave empty. Changed in v2-17!
statuscmd
Status query command.
statusarg
Leave empty.
killcmd
Job cancel command.
killarg
Leave empty.
matchcmd
List-match command.
matcharg
Leave empty.

 

The 'Environment' entry

These entries are passed to the AliEn CE running on the VO-Box via environment variables. Only the entries documented here are supposed to be set in this way, and setting the same entries in the VO-Box shell environment is strongly discouraged. Environment variables set in LDAP take priority over local (shell) environment variables and over the $ALIEN_HOME/.Environment and ~/.alien/Environment files.

The 'Environment' entry may contain the following definitions, in the form NAME=value (e.g. something like 'CE_MINWAIT=0'):

 

CE_LCGCE
Mandatory. List of CEs to be used. CE names are in the <host>:<port>/jobmanager-<lrms>-<queue> form used by the GlueCEUniqueId entry in the LCG Information System, e.g. t2-ce-01.to.infn.it:2119/jobmanager-lcgpbs-alice. The list is comma-separated, with CEs that see the same resources grouped in parentheses to avoid double counting (see the note below for more details).
CE_RBLIST
Mandatory. Comma-separated list of WMS hostnames to be used; hostnames can be grouped in parentheses to specify load-balancing sets (see the note below for more details). Changed in v2-17!

All of the following are optional and have meaningful and reasonable defaults, so set them only if you know what you're doing:

 

CE_SITE_BDII
Full URI of the local BDII to query, including base DN. If unset, the Resource BDII running on each CE is queried; this is fine at most sites. The format is ldap://<host>:<port>,<base DN>, so to query an old-fashioned GRIS set this to something like ldap://infn-ce-01.ct.trigrid.it:2135,mds-vo-name=local,o=grid.
CE_RANKING
Ranking expression to be set in the JDL. This is useful only in multi-CE sites; possible values include e.g. CE_RANKING=1 or CE_RANKING=-GlueCEStateWaitingJobs.
CE_GETRUNNING
Hook for an external command that returns the number of running jobs, overriding the internal queries to the Information System.
CE_GETWAITING
Hook for an external command that returns the number of waiting jobs, overriding the internal queries to the Information System.
CE_PROXYDURATION
Duration of the proxy requested at renewal. Default is 172800 seconds.
CE_PROXYTHRESHOLD
A proxy is renewed if its TTL is lower than this limit. Default is 165600 seconds.
CE_RBINTERVAL
Time after which the system tries to revert to the default WMS in multi-WMS systems. This is relevant only in WMS sites. Default is 7200 seconds.
CE_DELEGATIONINTERVAL
Time after which the proxy delegation to the CREAM CE is renewed. This is relevant only in CREAM sites. Default is 72000 seconds.

Note: AliEn_WMS, CE_MINWAIT and DEL_PROXY are obsolete and no longer used as of AliEn v2-17. These entries should be removed. Changed in v2-17!
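
The rules in this section can be checked with a small script before a set of definitions is put into LDAP. The sketch below is not AliEn code. It assumes that the definitions are available as one NAME=value string each; the helper names and the wms.example.org hostname are made up. The defaults and the list of obsolete names are the ones given above.

```python
DEFAULTS = {
    "CE_PROXYDURATION": 172800,
    "CE_PROXYTHRESHOLD": 165600,
    "CE_RBINTERVAL": 7200,
    "CE_DELEGATIONINTERVAL": 72000,
}
OBSOLETE = {"AliEn_WMS", "CE_MINWAIT", "DEL_PROXY"}
MANDATORY = ("CE_LCGCE", "CE_RBLIST")


def parse_environment(definitions):
    """Parse NAME=value strings; return (settings, problems)."""
    settings, problems = dict(DEFAULTS), []
    for definition in definitions:
        # Split on the first '=' only: a BDII base DN contains '=' itself.
        name, sep, value = definition.partition("=")
        name = name.strip()
        if not sep or not name:
            problems.append(f"not in NAME=value form: {definition}")
        elif name in OBSOLETE:
            problems.append(f"obsolete since v2-17, remove: {name}")
        elif name in DEFAULTS:
            settings[name] = int(value)  # durations are given in seconds
        else:
            settings[name] = value.strip()
    for name in MANDATORY:
        if name not in settings:
            problems.append(f"mandatory entry missing: {name}")
    return settings, problems


def needs_renewal(proxy_ttl_seconds, settings):
    """A proxy is renewed when its remaining lifetime drops below the threshold."""
    return proxy_ttl_seconds < settings["CE_PROXYTHRESHOLD"]


env = [
    "CE_LCGCE=t2-ce-01.to.infn.it:2119/jobmanager-lcgpbs-alice",
    "CE_RBLIST=wms.example.org",  # hypothetical hostname
    "CE_SITE_BDII=ldap://infn-ce-01.ct.trigrid.it:2135,mds-vo-name=local,o=grid",
    "CE_MINWAIT=0",               # obsolete, reported as a problem
]
settings, problems = parse_environment(env)
print(settings["CE_SITE_BDII"])
print(problems)
```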

 

WMS configuration logic

The name of the WMS to be used is passed to the submission command through a configuration file. Such configuration files, with extension .vo.conf, are generated automatically in ~alicesgm/alien-logs/. Each file contains one or more WMS endpoints, among which the submission command picks at random to distribute the load across servers. On top of this, a failover mechanism in the AliEn code switches to a different configuration file (and thus to a different set of WMSs) should the default ones fail, and tries to revert to the default file after a given amount of time (configurable via the CE_RBINTERVAL entry). The CE_RBLIST entry is a comma-separated list of WMSs (hostnames only); WMSs that should go into the same configuration file are grouped in parentheses. So for example

CE_RBLIST=(wms1.cern.ch,wms2.cern.ch),(wms1.cnaf.infn.it,wms2.cnaf.infn.it),wms.in2p3.fr

will generate three configuration files. The default one contains the two CERN WMSs, used in load-balancing mode; the second choice is the pair of CNAF WMSs, again used in load-balancing mode; the third choice is the lone French WMS. So the logic is: try to submit with the first configuration file (the LCG system chooses one of its WMSs for each submission); if this fails, try the second, and so on. After CE_RBINTERVAL seconds, try to revert to the default file. This entry is not formally mandatory, in the sense that the service will start anyway if it is not defined; in that case, however, the default WMS defined in the VO-Box is used, predictably leading to unpredictable results. A probably redundant note, but one never knows: the WMSs mentioned above (wms1.cern.ch and so on) are only examples and do not exist in the real world.
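
The grouping and failover logic can be sketched as follows. This is an illustration, not the AliEn code: the function and class names are made up, the time is passed in explicitly to keep the example deterministic, and measuring the revert interval from the most recent switch is an assumption.

```python
import re


def parse_groups(value):
    """Split '(a,b),(c,d),e' into [['a', 'b'], ['c', 'd'], ['e']]."""
    groups = []
    # Each match is either a parenthesised group or a single ungrouped name.
    for grouped, single in re.findall(r"\(([^()]*)\)|([^,()]+)", value):
        members = [m.strip() for m in (grouped or single).split(",") if m.strip()]
        if members:
            groups.append(members)
    return groups


class WmsSelector:
    """Tracks which WMS group (i.e. which configuration file) is in use."""

    def __init__(self, groups, rbinterval=7200):
        self.groups, self.rbinterval = groups, rbinterval
        self.current, self.switched_at = 0, None

    def group_for(self, now):
        # After rbinterval seconds on a fallback group, go back to the default.
        if self.current != 0 and now - self.switched_at >= self.rbinterval:
            self.current, self.switched_at = 0, None
        return self.groups[self.current]

    def report_failure(self, now):
        # Move on to the next group, if there is one left.
        if self.current + 1 < len(self.groups):
            self.current += 1
            self.switched_at = now


rblist = "(wms1.cern.ch,wms2.cern.ch),(wms1.cnaf.infn.it,wms2.cnaf.infn.it),wms.in2p3.fr"
selector = WmsSelector(parse_groups(rblist))
print(selector.group_for(now=0))     # default group: the two CERN WMSs
selector.report_failure(now=100)
print(selector.group_for(now=200))   # second choice: the two CNAF WMSs
print(selector.group_for(now=7300))  # CE_RBINTERVAL elapsed: default group again
```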

 

CE configuration logic

The CE_LCGCE entry is a comma-separated list of CE service endpoints. CEs can be grouped (using parentheses) to indicate that they are redundant CEs serving the same set of Worker Nodes, which all publish the same number of running and waiting jobs. In this case all CEs in the group are queried and the maximum number reported is used for the whole group; the values for all groups (and single CEs) are then summed to compute the total number of waiting and running jobs at the site. Thus, something like CE_LCGCE=(CE1,CE2),CE3 means that CE1 and CE2 are redundant CEs serving the same set of Worker Nodes, whereas CE3 is a completely separate resource (at the same site). So the number of running jobs will be max(CE1,CE2)+CE3.
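
The counting rule (maximum within a group, sum across groups) can be written out as a short sketch. The function names and the job counts are made up for illustration; in a real setup the per-CE numbers would come from the Information System.

```python
import re


def parse_groups(value):
    """Split '(a,b),c' into [['a', 'b'], ['c']]."""
    groups = []
    # Each match is either a parenthesised group or a single ungrouped CE.
    for grouped, single in re.findall(r"\(([^()]*)\)|([^,()]+)", value):
        members = [m.strip() for m in (grouped or single).split(",") if m.strip()]
        if members:
            groups.append(members)
    return groups


def site_total(ce_lcgce, jobs_per_ce):
    """Sum over groups of the largest count reported within each group."""
    return sum(
        max(jobs_per_ce[ce] for ce in group)
        for group in parse_groups(ce_lcgce)
    )


# Made-up numbers: CE1 and CE2 see the same Worker Nodes, CE3 is separate.
running = {"CE1": 40, "CE2": 42, "CE3": 10}
print(site_total("(CE1,CE2),CE3", running))  # max(40, 42) + 10 = 52
```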

All the CEs are inserted in the JDL requirements, leaving the task of distributing jobs across them to the WMS. In sites with a large number of CEs (like CERN), the default ranking expression (usually some variation on EstimatedResponseTime) may not fit our needs; in this case, it can be overridden by setting e.g. CE_RANKING=1 (all CEs have equal probability of getting a job, regardless of the existing load).

 

SE Configuration

 

FTD Configuration

 

PackMan Configuration

 

MonALISA Configuration