ALICE VO-Box setup and configuration

This document describes how to install and configure the site VO-Box to support ALICE VO services. The VO-Box is a node, expected to be provided by the site, on which long-lived agents and services are deployed. Deploying and supporting these agents and services on the VO-Box is the responsibility of the VO. The general requirements for the node and a description of the agents and services are given below. Please see the Installation Guide topic for details on how to download precompiled AliEn packages.

Please beware there is outdated information on these pages! Please consult the grid team for help.

 

1 Prerequisites

 

1.1 Machine setup

 

  • General requirements
    • SL6 or CentOS/EL7, 64-bit Linux. The machine usually needs to be a WLCG VOBOX
    • Hardware: min 4 GB RAM, any standard CPU, 20 GB of space for logs, 5 GB CVMFS cache

 

  • Network connectivity

 

Port       Access (in addition to outgoing)             Service
=================================================================================================
8098/TCP   incoming from site WN                        JAliEn/Java Serialized Object stream
8097/TCP   incoming from site WN                        JAliEn/WebSocketS
8084/TCP   incoming from CERN and the site WN           ClusterMonitor
1093/TCP   incoming from World                          MonALISA FDT server, SE tests
8884/UDP   incoming from the site WN and site SE nodes  Monitoring info
9930/UDP   incoming from the site SE nodes              Xrootd metrics
ICMP       incoming and outgoing                        network topology for file placement and access
9991/TCP   incoming from CERN                           PackMan (only if not using CVMFS)
=================================================================================================

 

In general, the assumption is that the outgoing connectivity from the VO-box and the WNs is unrestricted.

 

CERN has multiple networks, any of which may be used for central services now or in the future (please mind the mask!):

  • 128.141.0.0/16
  • 128.142.0.0/16
  • 137.138.0.0/16  ←  part of central services are here
  • 188.184.0.0/15  ←  part of central services are here
  • 185.249.56.0/22
  • 192.65.196.0/23
  • 192.91.242.0/24
  • 194.12.128.0/18

For IPv6, only these networks apply:

  • 2001:1458::/32
  • 2001:1459::/32
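
The remark about the mask matters because a /15 such as 188.184.0.0/15 spans two /16 blocks (188.184.x.x and 188.185.x.x). As a rough illustration for the IPv4 networks listed above, the following sketch checks in plain POSIX shell whether an address falls inside a given network. The function names ip_to_int and in_cidr are made up for this example; it does not handle IPv6 and assumes octets are written without leading zeros.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Return 0 if address $1 lies inside network $2 (written as a.b.c.d/len).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    len=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Illustrative addresses only: two inside the /15, one just outside it.
for ip in 188.184.1.1 188.185.1.1 188.186.1.1; do
    if in_cidr "$ip" 188.184.0.0/15; then
        echo "$ip: inside 188.184.0.0/15"
    else
        echo "$ip: outside 188.184.0.0/15"
    fi
done
```

A firewall rule written for 188.184.0.0/16 instead of /15 would therefore miss half of that range.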

 

The VO-Box usually should be preinstalled as a standard WLCG VO-Box, following the instructions given at:

 

 

This procedure sets up a standard gLite UI, with the following additions (in particular provided by lcg-vobox RPM):

 

  • Only one local user account alicesgm (or equivalent), with no special privileges. Please DO NOT configure pool accounts for the SGM user on the VO-Box!
  • Access via gsissh, with selected users from the ALICE LCG VO mapped to the alicesgm account (YAIM handles that)
  • A proxy renewal service running, for the automatic renewal of registered proxies via the MyProxy mechanism (idem)
  • A host certificate, issued by one of the trusted LCG Certification Authorities. The machine also needs to be registered as a trusted host in the CERN MyProxy server, myproxy.cern.ch.

TIP: To have the machine registered as trusted host in myproxy.cern.ch, send an email with the host certificate DN to Maarten.Litmaath@cern.ch. You can get the host certificate DN by issuing

 

VO-Box>openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject

Additionally, specifically for ALICE, the following configuration details are required:

 

  • The home directory should not be mounted via NFS from some server (for performance reasons and because lock files may be kept there)
  • The experiment software is provided on the VO-box and Worker nodes through CVMFS. See  'Setup CVMFS'.
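
One possible way to check the NFS requirement above is to look at the filesystem type of the home directory. This is only a sketch: it relies on GNU stat (its -f -c %T options print the filesystem type name), and the helper name is_nfs_type is invented for the example.

```shell
#!/bin/sh
# Return 0 if the given filesystem type name denotes NFS (nfs, nfs4, ...).
is_nfs_type() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# GNU stat: -f reports on the filesystem, -c %T prints its type name.
fstype=$(stat -f -c %T "$HOME" 2>/dev/null)
if is_nfs_type "$fstype"; then
    echo "WARNING: $HOME is on NFS ($fstype)"
else
    echo "home directory filesystem: ${fstype:-unknown}"
fi
```

Run it as the alicesgm user so that $HOME points at the directory in question.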

 

ATTENTION: do NOT bother with the rest of this page!
Instead, contact the grid team for further instructions.

1.2 Preparing for site administrator role

The person that will act as a VO-Box administrator needs no special privileges on the VO-Box machine. She will need a valid LCG X509 certificate, issued by one of the trusted CAs. Details on how to obtain a certificate can be found in the AliEn.HowToGetCertificate document.

 

  • The certificate subject must be registered in the LCG ALICE VO. Then, it must be registered as a user in AliEn. Please follow the instructions.
  • The user then needs some VO-specific privileges on both the LCG and AliEn sides: ask for her certificate subject to be mapped to the alicesgm LCG user, and for her AliEn user to be 'allowed to become aliprod', by sending a message to project-lcg-vo-alice-admin@cern.ch.

ALERT! Please note that the manager's certificate files are not to be installed on the VO-Box!

 

2 Accessing the machine

Access to the VO-Box is via gsissh, from any gLite User Interface. A gLite User Interface can be set up on any machine with AFS:

 

LCG-UI> source /afs/cern.ch/project/gd/LCG-share/sl5/etc/profile.d/grid_env.[csh|sh]

Please make sure that your certificate files are in the ${HOME}/.globus directory, then connect to the VO-Box by issuing

 

LCG-UI> voms-proxy-init -voms alice:/alice/Role=lcgadmin
LCG-UI> gsissh -p 1975 your-VO-Box

Please note that, while these are the default values, in principle the gsisshd port and the local account name may be different (ask whoever installed the gLite VO-Box).

 

3 Testing the gLite installation

Please note that these are just the basic tests needed to check the main functionalities. Do not hesitate to ask for help through alice-lcg-task-force@cern.ch

 

3.1 Access tests

 

  • From any gLite-UI, try to connect to the VO-Box via gsissh in verbose mode to check possible errors and connection status of the machine:

 

LCG-UI>gsissh -v -p 1975 your-VO-Box

If this does not work, or shows any problems, there are many possible causes: gsisshd not running on the machine, firewall or network problems (e.g. reverse DNS must work), a wrong grid-mapfile on the machine, CAs or CRLs not up to date, etc.

 

3.2 Proxy renewal tests

For details about all this proxy renewal machinery, please check the AliEn.HowToManageVOBoxProxies page.

 

  • Check that the proxy is visible in the /tmp area of the machine

 

VO-Box>grid-proxy-info

The output should look like

 

subject : /C=IT/O=INFN/OU=Personal Certificate/L=Torino/CN=Stefano Bagnasco/CN=proxy/CN=proxy
issuer : /C=IT/O=INFN/OU=Personal Certificate/L=Torino/CN=Stefano Bagnasco/CN=proxy
identity : /C=IT/O=INFN/OU=Personal Certificate/L=Torino/CN=Stefano Bagnasco
type : full legacy globus proxy
strength : 512 bits
path : /tmp/x509up_p760.fileJUVI2k.1
timeleft : 11:26:16
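
To turn the timeleft line of such output into a number that a script can compare, something along these lines may help. It is a sketch that assumes the "timeleft : HH:MM:SS" layout shown in the sample above; the function name proxy_seconds_left and the one-hour threshold are arbitrary choices for the example.

```shell
#!/bin/sh
# Read grid-proxy-info style output on stdin and print the remaining
# lifetime in seconds, taken from the "timeleft : HH:MM:SS" line.
proxy_seconds_left() {
    awk '$1 == "timeleft" { split($NF, t, ":"); print t[1] * 3600 + t[2] * 60 + t[3] }'
}

# Example with the timeleft line from the sample output above;
# on a VO-Box one would pipe grid-proxy-info into the function instead.
left=$(printf 'timeleft : 11:26:16\n' | proxy_seconds_left)
if [ "$left" -lt 3600 ]; then
    echo "proxy expires in less than an hour ($left s)"
else
    echo "proxy valid for another $left s"
fi
```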

 

  • Check that the proxy can be registered and queried through the vobox-proxy tool:

 

VO-Box> vobox-proxy register -t 48
VO-Box> vobox-proxy query
VO-Box> vobox-proxy unregister
VO-Box> vobox-proxy query

 

Check that you don't have any errors in the output, except for the last command that should indicate that a proxy was not found.

If the register command complains about an unknown option "-t" or the absence of a VO name, please update your VOBOX to the latest release!

Note: the proxy renewal service on the VO-Box uses a root cron job to generate a proxy for the machine, then uses that proxy to authenticate to the MyProxy server in order to renew the registered proxies.

If problems occur at this step, check that the proxy renewal service for alice is running and that the machine's own proxy is in order. Possible causes and further checks:

 

  • The machine is not properly registered in the myproxy server (see above).

 

  • The proxy which the machine uses to authenticate to the MyProxy server is expired, non-existing or wrong in some way. The relevant file is /var/lib/vobox/alice/renewal-proxy.pem. You can check it by issuing

 

VO-Box>grid-proxy-info -f /var/lib/vobox/alice/renewal-proxy.pem

If this is the case, check that the crond service is running, and that there is a file called /etc/cron.d/alice-box-proxyrenewal.

 

  • The proxy renewal service is down. Check it by issuing

 

VO-Box>/etc/init.d/alice-box-proxyrenewal status

Root privileges used to be needed to properly start or stop it, but the latest VOBOX versions (e.g. 3.2.15) do not have that restriction any longer.

 

  • Check the VO-Box RPM installation. Check that all the files under /var/lib/vobox/alice are there with the proper permissions. The output of `ls -l /var/lib/vobox/alice` should look like

 

drwx------. 2 alicesgm alice 4096 Jan 23 00:00 agents
drwx------. 2 alicesgm alice 4096 Jan 23 16:49 data
drwxr-xr-x. 2 alicesgm alice 4096 Jan 23 13:57 etc
drwx------. 2 alicesgm alice 4096 Jan 23 00:00 info-provider
drwxr-x---. 2 alicesgm alice 4096 Feb  9 15:13 lock
drwxr-x---. 2 alicesgm alice 4096 Jan 23 00:00 log
drwxr-x---. 2 alicesgm alice 4096 Feb 21 22:15 proxy_repository
-r--------. 1 alicesgm alice 4888 Feb 21 20:20 renewal-proxy.pem
drwx------. 2 alicesgm alice 4096 Jan 23 00:00 start
drwx------. 2 alicesgm alice 4096 Jan 23 00:00 stop
drwx------. 2 alicesgm alice 4096 Jan 23 16:49 tmp
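
Comparing the permissions by eye is error-prone, so a small helper can do it. The sketch below transcribes the modes from the listing above into octal and compares them with what GNU stat reports (stat -c %a); the function names check_mode and check_vobox_tree are hypothetical, and ownership is not checked.

```shell
#!/bin/sh
# Print OK/BAD/MISSING for one path; return non-zero unless the mode matches.
check_mode() {
    actual=$(stat -c %a "$1" 2>/dev/null) || { echo "MISSING $1"; return 1; }
    if [ "$actual" = "$2" ]; then
        echo "OK      $1 ($actual)"
    else
        echo "BAD     $1 (mode $actual, expected $2)"
        return 1
    fi
}

# Expected octal modes, transcribed from the listing above (name:mode).
check_vobox_tree() {
    base=$1
    status=0
    for entry in agents:700 data:700 etc:755 info-provider:700 lock:750 \
                 log:750 proxy_repository:750 renewal-proxy.pem:400 \
                 start:700 stop:700 tmp:700; do
        check_mode "$base/${entry%:*}" "${entry#*:}" || status=1
    done
    return $status
}

# Only run the check where the directory actually exists.
if [ -d /var/lib/vobox/alice ]; then
    check_vobox_tree /var/lib/vobox/alice || echo "some entries need attention"
fi
```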

 

3.3 ALICE-specific tests

 

  • Try to submit a job to your CREAM CE:

VO-Box> cat cream-test.jdl

Type          = "Job";
JobType       = "Normal";
Executable    = "/bin/hostname";
StdOutput     = "hello.out";
StdError      = "hello.err";
InputSandbox  = {"/etc/group"};
OutputSandbox = {"hello.out","hello.err"};
OutputSandboxBaseDestURI = "gsiftp://localhost";

VO-Box> glite-ce-job-submit -a -r your-CREAM-CE:8443/cream-bbb-qqq -o test.id cream-test.jdl

VO-Box> glite-ce-job-status -i test.id

VO-Box> glite-ce-job-output -i test.id     # when the job is marked done

 

Here "bbb" stands for the batch system (e.g. "pbs" or "lsf") and "qqq" is the queue to which ALICE jobs should be submitted (e.g. "alice").
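
As a small illustration of how the endpoint string is put together, the following sketch composes it from the host name, batch system and queue. The function name cream_endpoint is made up for this example; the port 8443 and the cream-bbb-qqq pattern are taken from the command above.

```shell
#!/bin/sh
# Compose a CREAM CE endpoint: <host>:8443/cream-<batch system>-<queue>
cream_endpoint() {
    printf '%s:8443/cream-%s-%s\n' "$1" "$2" "$3"
}

# Placeholder host name, PBS batch system, queue "alice".
cream_endpoint your-CREAM-CE pbs alice
```

The result can be passed to the -r option of glite-ce-job-submit.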

 

  • Check the relevant job information from your CREAM CE using the lcg-infosites tool like this:

VO-Box> lcg-infosites -is your-CREAM-CE ce            # if it has a queue for ALICE

VO-Box> lcg-infosites -is your-CREAM-CE voview -v 1   # should always work

 

 

 

4 AliEn Installation

 

Please beware that AliEn normally is started from CVMFS, so do not bother with this!

If you're just upgrading AliEn to a new release, and more or less know what you're doing, short instructions with no comments (useful for cut-and-paste to the command line) are provided below.

 

4.1 Downloading Installer

 

See the Download section for step-by-step download instructions.

 

4.2 Selecting Installer Options

The installer requires the dialog tool, which may not be installed by default. The following will guide you through the different questions posed by the installer:

 

  • Select latest stable version (or latest development version if you like to be up to date with developments)
  • Answer "NO" to autodetect & reuse (this allows the script to copy and install all necessary external software in the $VO_ALICE_SW_DIR directory).
  • Answer "YES" to use default location to store downloaded file (~alicesgm/.alien/cache).
  • Check whether the autodetected architecture is OK.
  • Select $VO_ALICE_SW_DIR/alien as path for installation. You will need about 400 MB of disk space. Please note that the default installation directory (/opt/alien) is not OK for gLite VO-Boxes (alicesgm usually has no write privileges there)
  • Select 'site', 'LCG' and 'monitor' components.

At the end of the process (it may take a while), AliEn should be installed in the right path. The next step is configuration.

It is best, however, to also look at the installer's log file, ~alicesgm/.alien/cache/install.log, and search it for errors.

 

5 AliEn Configuration

AliEn takes its configuration from two sources:

 

  • The central LDAP database ldap://aliendb5.cern.ch:8389/o=alice,dc=cern,dc=ch
  • A local file ${HOME}/.alien/alice.conf

The second is deprecated and can be an empty file. Thus, the procedure for site configuration is currently the following:

 

  1. The site manager asks the central services manager Latchezar.Betev@cern.ch to enter the site information in the AliEn LDAP. The information is:
    • Name of the site (e.g. INFN-TORINO)
    • Domain name (e.g. to.infn.it)
    • Name of the city (e.g. Torino)
    • Name and e-mail of the administrator (e.g. Stefano.Bagnasco@to.infn.it)
    • Hostname of the VO-Box (e.g. alibox.to.infn.it)
    • Site location - latitude, longitude - determined through this locator
    • Name and ALICE queue in the CREAM CE (for example: cecream.ca.infn.it:8443/cream-lsf-alice)
  2. Once the configuration is ready, the AliEn services can be started on the VO-box. You may need to ask for help from the AliEn/site experts through this list: alice-lcg-task-force@cern.ch

 

5.1 Environment setup

First, make sure a path to the AliEn executable is in the user $PATH. The preferred method is creating links to the relevant executables in the ~/bin directory, and adding that directory to the $PATH:

 

mkdir -p ~alicesgm/bin
ln -s $VO_ALICE_SW_DIR/alien/bin/alien ~alicesgm/bin
ln -s $VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh ~alicesgm/bin
cat >> ~alicesgm/.bashrc <<'EOF'
export PATH=~alicesgm/bin:$PATH
EOF

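Appending to .bashrc with cat >> adds the export line again every time the commands are repeated, for example after a reinstallation. A possible way around that, sketched here with a hypothetical helper called append_once, is to append the line only if the file does not contain it yet. The demonstration writes to a scratch file rather than the real ~alicesgm/.bashrc.

```shell
#!/bin/sh
# Append line $1 to file $2 unless an identical line is already there.
append_once() {
    line=$1
    file=$2
    # -x: whole-line match, -F: fixed string, -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file: the second call changes nothing.
demo=$(mktemp)
append_once 'export PATH=~alicesgm/bin:$PATH' "$demo"
append_once 'export PATH=~alicesgm/bin:$PATH' "$demo"
cat "$demo"
rm -f "$demo"
```
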
A file called ~alicesgm/.alien/Environment is automatically sourced by AliEn every time it is executed, to set up the proper environment. Please note that it has priority over any variable set in the user shell environment, which will be overridden by the value set in the file. Create the file and edit it to look like

 

export ALIEN_USER=...

where ... is the AliEn username of the person that will manage the VO-Box (i.e. that will start the services), who has requested the alicesgm and aliprod roles as described above.

The standard WLCG VO-Box installation goes through all the scripts defined in /var/lib/vobox/alice/start at boot time, and in /var/lib/vobox/alice/stop while shutting down. To have the system start and stop AliEn services at boot time, links to the scripts are needed:

 

VO-Box> ln -s /cvmfs/alice.cern.ch/bin/aliend ~/bin

VO-Box> ln -s ~/bin/aliend /var/lib/vobox/alice/start/S50-Alice.sh
VO-Box> ln -s ~/bin/aliend /var/lib/vobox/alice/stop/K50-Alice.sh

For this to work, you also have to create two files:

 

  • ~alicesgm/.alien/etc/aliend/startup.conf with the following line:

 

ALIEN_ORGANISATIONS="ALICE"

 

  • ~alicesgm/.alien/etc/aliend/ALICE/startup.conf with:

 

AliEnUser="alicesgm"
AliEnCommand="$HOME/bin/lcgAlien.sh"   # see note below
AliEnServices="Monitor CE CMreport MonaLisa"

If the .alien/etc/aliend directory tree does not exist, please create it:

 

VO-Box> mkdir -p ~alicesgm/.alien/etc/aliend/ALICE
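
The two startup.conf files described above can also be written by a short script. The following is a sketch only: write_aliend_conf is a hypothetical helper, it takes the home directory as its argument, and it overwrites any existing startup.conf files without asking. The file contents are the ones given above; the quoted here-documents keep $HOME literal in the file.

```shell
#!/bin/sh
# Create both startup.conf files below <home>/.alien/etc/aliend.
write_aliend_conf() {
    base=$1/.alien/etc/aliend
    mkdir -p "$base/ALICE" || return 1
    cat > "$base/startup.conf" <<'EOF'
ALIEN_ORGANISATIONS="ALICE"
EOF
    cat > "$base/ALICE/startup.conf" <<'EOF'
AliEnUser="alicesgm"
AliEnCommand="$HOME/bin/lcgAlien.sh"
AliEnServices="Monitor CE CMreport MonaLisa"
EOF
}

# Dry run into a scratch directory; pass the alicesgm home directory for real use.
scratch=$(mktemp -d)
write_aliend_conf "$scratch"
find "$scratch" -name startup.conf
rm -rf "$scratch"
```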

 

The lcgAlien.sh script is not yet available in /cvmfs/alice.cern.ch/bin but can be copied from a recent AliEn version hosted in CVMFS. For example:

 

VO-Box> cp /cvmfs/alice.cern.ch/x86_64-2.6-gnu-4.1.2/Packages/AliEn/v2-19-223/scripts/lcg/lcgAlien.sh ~/bin

 

6 Starting the services

6.1 Registering the proxy for renewal

Before starting the AliEn services on the VO-Box, the administrator will need a proxy registered in a MyProxy server. This is needed by the VO-Box to get and renew the delegated proxies the CE uses for submission.

The server currently used for all VO-Boxes is myproxy.cern.ch. Please make sure that the $MYPROXY_SERVER environment variable points to it.

In order to register the proxy, from any gLite User Interface on which the administrator has her personal certificates installed, please issue

 

LCG-UI> myproxy-init -s myproxy.cern.ch -d -n -t 48 -c 3000

Please check myproxy-init --help for details on options and command line arguments. The argument to "-c" gives the desired number of hours for which the uploaded proxy should be valid (4 months in this case): it can be as large as the remaining lifetime of your certificate, usually many months.

Now, in order to have the proxy on the VO-Box periodically renewed by the service running on the VO-Box (this is part of the standard gLite VO-Box installation), you will need to register it with that service. Once logged on the VO-Box via gsissh (i.e. with a valid proxy), please issue

 

VO-Box> vobox-proxy register -t 48   [ --proxy-safe 3600 --myproxy-safe 864000 --email your-email ]

Note: The optional arguments allow you to be notified some time (e.g. 10 days) before your uploaded myproxy expires. If the command complains about an unknown option "-t" or the absence of the VO specification, please upgrade your VOBOX to the latest release.
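
The options above mix units: "-c" for myproxy-init is given in hours, while the --proxy-safe and --myproxy-safe values are in seconds. A quick conversion, sketched below with two throwaway helper functions (their names are not part of any tool), shows what the example values amount to. The division is integer division, so fractions of a day are dropped.

```shell
#!/bin/sh
# Integer conversions for the lifetimes used in the commands above.
hours_to_days()   { echo $(( $1 / 24 )); }
seconds_to_days() { echo $(( $1 / 86400 )); }

echo "-c 3000 (hours)           = $(hours_to_days 3000) days"
echo "--myproxy-safe 864000 (s) = $(seconds_to_days 864000) days"
echo "--proxy-safe 3600 (s)     = $(( 3600 / 60 )) minutes"
```

So 3000 hours is 125 days, roughly the 4 months mentioned above, and 864000 seconds is the 10-day warning period.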

Please note that sending mail messages from the VO-Box may be impossible due to local restrictions. Please check this with the site administrators (the mail is not mandatory for a running VO-Box, just practical).

You can then check the status and lifetime of all the registered proxies by issuing

 

VO-Box> vobox-proxy query -dn all

The delegated proxy is periodically renewed by the VO-Box service, but the AliEn CE service also renews it if needed just before submitting a bunch of JobAgents; we want to remove that code, though, so please register your VOBOX proxy with the "-t 48" arguments.

Before proceeding further, you should now be able to check the basic AliEn installation. See section 8.1, Testing the basic installation, for how to do it.

 

6.2 Starting the services

ALERT! Before (re)starting an AliEn service, make sure your proxy is defined correctly:

 

VO-Box> export X509_USER_PROXY=/var/lib/vobox/alice/proxy_repository/.....

 

Note: In this section, the services are started/stopped individually by hand, to allow the administrator to verify the outcome. The services may also be started/stopped by issuing

 

VO-Box> ~/bin/aliend [start|stop|restart|status]

In the following, issuing alien instead of lcgAlien.sh will log in on AliEn (or start an AliEn service) using the proxy used to log on the VO-Box. To select the right renewable proxy automatically, please use ~/bin/lcgAlien.sh (it should be in the path, if you followed these instructions).

The standard way of starting, stopping or checking the status of an AliEn service is

 

VO-Box> lcgAlien.sh Start<ServiceName>
VO-Box> lcgAlien.sh Stop<ServiceName>
VO-Box> lcgAlien.sh Status<ServiceName>

where <ServiceName> can be, on the VO-Box, any of Monitor (for the ClusterMonitor), PackMan, MonaLisa, CE, SE. These commands will almost never fail, even if the service cannot be properly started. To check the outcome of a start, check the log file. It is located in the directory set in the LOG_DIR configuration item, which should be ~alicesgm/alien-logs (in any case, the log file name is printed in the last line of the output of the start commands).

Since you will almost always check the log file after starting a service, the full command to start e.g. the CE will probably look something like the following, which will produce a running printout of the log file. You can stop it by hitting Ctrl-C when you're satisfied the service is properly running (see section 8, Testing the installation, below).

 

VO-Box> lcgAlien.sh StartCE; tail -F ~/ALICE/alien-logs/CE.log

When trying to understand what went wrong, it may be useful to start the service with the --debug 10 option. This writes much more output to the logs, so make sure to restart the service without the option when you have finished debugging.

On the VO-Box, AliEn services are started using a proxy which is different from the one used to log in. If the command dies with an Error setting proxy (repeated one or more times), then either you forgot to register the proxy in the database (see above), or there is something wrong with it.

Note: In principle the order in which you start the services is not fixed; however, some services try to talk to each other upon startup. The following sequence tries to minimize the complaints from services trying to talk to not-yet-started ones:

 

  • Cluster Monitor: alien StartMonitor
  • CMreport: alien StartCMreport
  • Package Manager: alien StartPackMan
  • MonALISA monitoring: alien StartMonaLisa
  • Computing Element: alien StartCE
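
The start order listed above can be kept in one place with a small dry-run helper. This sketch only prints the commands; the function name print_start_commands is invented, and the default of lcgAlien.sh as the launcher is an assumption based on the recommendation earlier in this section (pass alien as the argument to get the commands exactly as listed).

```shell
#!/bin/sh
# Print the start commands in the order listed above, without running them.
# $1: launcher to use (assumed default: lcgAlien.sh)
print_start_commands() {
    cmd=${1:-lcgAlien.sh}
    for svc in Monitor CMreport PackMan MonaLisa CE; do
        echo "$cmd Start$svc"
    done
}

print_start_commands
```

Printing rather than executing leaves room to check each log file before moving on, and to hold back the CE until the other services look healthy.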

Please note that if you're restarting the services while a production is running, as soon as the CE comes up the VO-Box will start getting jobs and submitting them. You should at least run a couple of checks before starting the CE (see below).

 

7 Final steps

A couple more things need to be done before the site gets jobs from running productions.

 

  • Before any job submission, the AliEn queue must be opened for a given site. To have it opened, please send email to alice-lcg-task-force@cern.ch and ask to open a queue for your site. This is needed also to test job submission, not just to run production jobs.
    • The queue can get locked again automatically upon certain error conditions. This shows up as a message in the CE log.
  • To be included in the running production, after testing the installation please send another email to alice-lcg-task-force@cern.ch and ask for your site to be included in the production partition.

Please note that even though all testing instructions are collected together in the next section, you're supposed to test the installation before asking to be added to the production partition!

8 Testing the installation

8.1 Testing the basic installation

The most basic thing to be tested is the ability to run AliEn and to connect to the AliEn central database:

 

VO-Box>alien login

The first lines of output should inform about what file is used for reading configuration options. Then the site ClassAd (a lengthy list of installed packages, close SEs and some more information that describes the local site) is printed, and at the end the AliEn prompt should appear; it should look something like

 

[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/sbagnasc/ >

If this does not work, there's something basically wrong either with the installation, the configuration or (the most common occurrence) the authentication. It would be useless to try to start the services before fixing this.

Again, it may be useful to use the --debug 10 option to help find out what's wrong.

One thing that can be easily checked here is whether your site is part of the production partition; if this is the case, the following lines should show up in the site ClassAd:

 

GridPartitions =
{
"Production"
};
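
Since the ClassAd printout is long, a filter can look for the partition entry. The sketch below assumes the layout shown above (a GridPartitions line followed by a brace-delimited list); the function name in_production_partition is hypothetical, and the demonstration feeds it the sample lines rather than real alien output.

```shell
#!/bin/sh
# Read a site ClassAd on stdin; exit 0 if "Production" appears inside
# the GridPartitions block, non-zero otherwise.
in_production_partition() {
    awk '
        /GridPartitions/          { inside = 1 }
        inside && /"Production"/  { found = 1 }
        inside && /[}]/           { inside = 0 }
        END                       { exit(found ? 0 : 1) }
    '
}

# Demonstration on the sample lines above.
if printf 'GridPartitions =\n{\n"Production"\n};\n' | in_production_partition; then
    echo "site is in the Production partition"
else
    echo "site is not in the Production partition"
fi
```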

 

8.2 Testing the Package Manager

The last lines of the log file ~alicesgm/alien-logs/PackMan.log should look like

 

Mar 24 11:46:09 info Initializing Service
Mar 24 11:46:12 info Warning: cannot create envelope sealing engine = setting backdoor
Mar 24 11:46:12 info 9359 Removing old lock files
Mar 24 11:46:12 ApMon[INFO]: Added destination 193.206.184.58:8884: with default options.
Mar 24 11:46:13 info Starting PackMan on alibox.to.infn.it:9991

Then you can test the running service by issuing

 

VO-Box>alien login -exec packman list

which should print a list of the packages defined for your site.

If the proxy with which the PackMan was started expired, and subsequently you register a new one, any communication with the PackMan will fail and you will have to restart it.

Please refer to the PackMan HowTo for more details.

8.3 Testing the Cluster Monitor

There are no particular tests for this service - if the log file does not contain errors, then the service is OK.
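
A quick way to scan a service log is sketched below. It assumes that problem lines contain the word "error" in some capitalisation, which is a guess rather than a documented log format, and log_is_clean is a made-up helper name. The demonstration uses two lines from the PackMan sample in section 8.2.

```shell
#!/bin/sh
# Return 0 if no line of log file $1 contains "error" (case-insensitive),
# 1 if at least one does, 2 if the file cannot be read.
log_is_clean() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    ! grep -qi 'error' "$1"
}

# Demonstration on a scratch file with two lines from the PackMan sample.
demo=$(mktemp)
printf '%s\n' 'Mar 24 11:46:09 info Initializing Service' \
              'Mar 24 11:46:13 info Starting PackMan on alibox.to.infn.it:9991' > "$demo"
if log_is_clean "$demo"; then
    echo "no error lines found"
else
    echo "check the log"
fi
rm -f "$demo"
```

On the VO-Box the argument would be the relevant file in the log directory, for instance the ClusterMonitor log.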

 

8.4 Testing the MonALISA Monitoring

 

There are no particular tests for this service - if the log file does not contain errors and you can see the site dot on the MonALISA map, then the service is OK.

8.5 Testing the ComputingElement

 

In order to be able to submit jobs, the CE queue needs to be open. You can check by issuing

 

VO-Box> alien login --exec queueinfo <SiteName>

where <SiteName> is something like Alice::LCG::Torino. The output will be quite unreadable, but look for the beginning of a line looking like:

 

Alice::LCG::Torino open down 1150897453 0 23 0 0 1 0

The second field is what you're looking for. If it's open you're OK; if it's locked you need to ask the administrators (e.g. Latchezar.Betev@cern.ch) to open it up. There are several methods to submit a job. Please check HowToUseAliEn for more information.
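
To pick the queue state out of the queueinfo output, a filter like the following could be used. It is a sketch that assumes the site name is the first whitespace-separated field of the line, as in the sample above; queue_status is a hypothetical helper name, and the demonstration runs on the sample line instead of live output.

```shell
#!/bin/sh
# Read queueinfo output on stdin and print the second field of the
# line whose first field equals the site name given as $1.
queue_status() {
    awk -v site="$1" '$1 == site { print $2 }'
}

# Demonstration on the sample line above.
printf 'Alice::LCG::Torino open down 1150897453 0 23 0 0 1 0\n' \
    | queue_status Alice::LCG::Torino
```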

 

Job submission can be done this way:
[LCG UI]$ alien login -exec submit <JDL file>
or
[AliEn prompt] > submit <JDL file>
The JobID should appear at the end of the command output.

One way to check the status of a submitted job is the "ps" command (please use "ps --help" for more information):
[AliEn prompt] > ps -id <JobID>

 


Shortcut: upgrading or reinstalling AliEn

The following is provided with no comments, so that it can be easily cut and pasted to the command line. Please refer to the discussion above for details.

 

$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StopCE
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StopSE
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StopMonitor
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StopPackMan
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StopMonaLisa
ps aux
rm alien-installer
wget http://alien.cern.ch/alien-installer
chmod a+x alien-installer
rm -rf .alien/cache
./alien-installer
vobox-proxy register -t 48 --proxy-safe 3600 --myproxy-safe 864000 --email your-address
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StartPackMan; tail -F /home/alicesgm/alien-logs/PackMan.log
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StartMonitor; tail -F /home/alicesgm/alien-logs/ClusterMonitor.log
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StartSE; tail -F /home/alicesgm/alien-logs/SE.log
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StartMonaLisa; tail -F /home/alicesgm/alien-logs/MonaLisa.log
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh login --exec packman listInstalled
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh login --exec status
$VO_ALICE_SW_DIR/alien/scripts/lcg/lcgAlien.sh StartCE; tail -F /home/alicesgm/alien-logs/CE.log

 

# echo "Hello world" >/tmp/myfile
# alien -exec add myNewFile /tmp/myfile
# alien login -exec packman list
# echo "Executable=\"date\";Requirements=other.CE==\" \" ">/tmp/jdl
# alien -exec add myJDL /tmp/jdl
# alien login -exec submit myJDL