EOS installation guide and notes

EOS installation guide and more information

This page helps admins install EOS on storage servers and collects some information (commands, etc.) for configuring or tuning the installed instance.

WARNING: parts of these instructions may have become STALE !!!

Better rely on the official EOS docs and ask for advice on the:

  • EOS Community Forum

  • ALICE LCG TF list

Installation

Traditionally, the installation is done on SL5/SL6 and CentOS 7 based machines, using the 'eos-deploy' script, which can be downloaded here: eos-deploy

You can also follow the official EOS documentation, but you might miss some ALICE particularities: EOS docs

0) Cleanup

eos-deploy will remove the eos packages, install eos-cleanup and run it, which returns most things to the initial state needed for the installation.

1) Make sure there is no XRootD V4.X installed if you go for the aquamarine release; if there is, uninstall it:
rpm -e `rpm -qa | grep xrootd | awk '{printf("%s ", $1); }'`

2) Mask xrootd* and libmicrohttpd* in the EPEL, base and updates (e.g. CentOS-*) yum repositories:

eos-deploy should warn you automatically if the repos are not OK. If they are not, edit:

/etc/yum.repos.d/epel.repo
/etc/yum.repos.d/epel-testing.repo

and add to each section:

exclude=xrootd*,libmicrohttpd*

This must be done on all EOS-related servers (both MGMs and FSTs).
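
The manual edit can also be scripted. The following is a minimal sketch, assuming GNU sed (as shipped with SL/CentOS); the function name mask_repo is only illustrative. It adds the exclude line under every [section] header and leaves a file alone if the line is already present anywhere in it, so partially masked files still need a manual check.

```shell
#!/bin/sh
# Sketch: add the exclude line under every [section] header of a yum repo file.
# Assumes GNU sed. mask_repo is a hypothetical helper name, not part of eos-deploy.
EXCLUDE_LINE='exclude=xrootd*,libmicrohttpd*'

mask_repo() {
    repo_file="$1"
    # Skip files that already contain the line, so re-running is harmless.
    if grep -qF "$EXCLUDE_LINE" "$repo_file"; then
        return 0
    fi
    # Append the exclude line right after each line of the form [name].
    sed -i "/^\[.*\]\$/a $EXCLUDE_LINE" "$repo_file"
}

# Usage (as root, on every MGM and FST):
#   mask_repo /etc/yum.repos.d/epel.repo
#   mask_repo /etc/yum.repos.d/epel-testing.repo
```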

3) Run eos-deploy (as root)

The latest versions of the script create an 'eos-deploy.timestamp.config' file with the values you entered in the wizard.

Some notes:

# You should have host certificates in the managers, under /etc/grid-security with:

Permissions 600 on hostcert.pem

Permissions 400 on hostkey.pem
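
The two permission settings above can be applied as follows. This is a small sketch; the function name is illustrative and the directory defaults to /etc/grid-security as stated above.

```shell
#!/bin/sh
# Sketch: set the host certificate and key permissions listed above.
# set_hostcert_perms is a hypothetical helper name.
set_hostcert_perms() {
    cert_dir="${1:-/etc/grid-security}"
    chmod 600 "$cert_dir/hostcert.pem" || return 1
    chmod 400 "$cert_dir/hostkey.pem" || return 1
    # Print the resulting modes for a quick visual check.
    ls -l "$cert_dir/hostcert.pem" "$cert_dir/hostkey.pem"
}

# Usage (as root, on each manager):
#   set_hostcert_perms
```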

# For the ALICE case, the instance name should contain alice, e.g.: 'alicesitename'

# Make sure you have ssh access to the FSTs (and to the second MGM, if you use one)

# When asked how many filesystems and for a regexp, use the number of disks in your FSTs. See example:

volume-n02-v01/cern-01 32T 2.5T 30T 8% /data1

volume-n02-v02/cern-02 32T 2.4T 30T 8% /data2

volume-n02-v03/cern-03 32T 2.6T 30T 8% /data3

Here, you would give eos-deploy '3' as the number of filesystems and 'data' as the regexp.
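
To double-check the number before answering the wizard, the mount points can be counted from the df output. This is a sketch only: it assumes the regexp is matched against the mount point (the last df column), which is a guess about how eos-deploy uses it, and count_fs is an illustrative name.

```shell
#!/bin/sh
# Sketch: count df lines whose mount point (last column) matches a regexp.
# Assumption: eos-deploy matches the regexp against the mount point.
count_fs() {
    pattern="$1"
    awk -v pat="$pattern" '$NF ~ pat { n++ } END { print n + 0 }'
}

# Usage on an FST:
#   df -h | count_fs data
```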

# Ports

1094: XRootD MGM port (only on MGMs)
1095: XRootD FST port (only on FSTs)
1096: XRootD SYNC port (only on MGMs)
1097: XRootD MQ port (only on MGMs)

In principle not used by ALICE:

443: https X509 port (only on HTTPS gateways or MGM)
8443: https KRB5 port (only on HTTPS gateways or MGM)
8000: http port (only on MGMs)
8001: http port (only on FSTs)

The script opens the needed ports for you, but the change is temporary. Please make those network changes permanent.
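
One possible way to make the openings permanent is sketched below. It assumes a CentOS 7 host running firewalld and that all listed ports are TCP; on SL5/SL6 with plain iptables the rule set would have to be saved instead. The helper only prints the commands so that they can be reviewed before being run as root, and it leaves out the HTTP(S) ports that ALICE does not use.

```shell
#!/bin/sh
# Sketch: print firewalld commands that permanently open the EOS ports for a role.
# Assumptions: CentOS 7 with firewalld, TCP only. Function names are illustrative.
ports_for_role() {
    case "$1" in
        mgm) echo "1094 1096 1097" ;;   # MGM, SYNC and MQ ports
        fst) echo "1095" ;;             # FST port
        *)   echo "unknown role: $1" >&2; return 1 ;;
    esac
}

print_firewall_cmds() {
    ports=$(ports_for_role "$1") || return 1
    for port in $ports; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    # Permanent rules only become active after a reload.
    echo "firewall-cmd --reload"
}

# Usage: review the output, then pipe it to a root shell:
#   print_firewall_cmds mgm
#   print_firewall_cmds fst | sh
```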

After running the script, you should have all necessary eos services running.

  • Some helpful EOS commands

# You can login to eos with:

[root@mmmartinmgm1 ~]# eos -b
EOS Console [root://localhost] |/>

This gives you the EOS console, where you can run eos commands. The same commands can also be run from a regular shell by prefixing them with 'eos'.

# You can see your filesystems/disks:

EOS Console [root://localhost] |/> fs ls

#…………………………………………………………………………………………………………………………
# host (#…) # id # path # schedgroup # geotag # boot # configstatus # drain # active
#…………………………………………………………………………………………………………………………
mmmartinfst1.cern.ch (1095) 1 /var/eos/fs/0 default.0 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 2 /var/eos/fs/1 default.1 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 3 /var/eos/fs/2 default.2 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 4 /var/eos/fs/3 default.3 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 5 /var/eos/fs/0 default.0 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 6 /var/eos/fs/1 default.1 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 7 /var/eos/fs/2 default.2 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 8 /var/eos/fs/3 default.3 booted rw nodrain online

# Your spaces (default is the one created by the script)

EOS Console [root://localhost] |/> space ls
#——————————————————————————————————————————————————————————————————
# type # name # groupsize # groupmod #N(fs) #N(fs-rw) #sum(usedbytes) #sum(capacity) #capacity(rw) #nom.capacity #quota #balancing # threshold # converter # ntx # active #intergroup
#——————————————————————————————————————————————————————————————————
spaceview default 0 24 8 8 18.08 G 67.36 G 26.56 G 0 off off 20 off 2 0 off

# The same for the groups:

EOS Console [root://localhost] |/> group ls
#———————————————————————————————————————
# type # name # status #nofs #dev(filled) #avg(filled) #sig(filled) #balancing # bal-shd #drain-shd
#———————————————————————————————————————
groupview default.0 on 2 0.74 26.84 0.74 idle 0 0
groupview default.1 on 2 0.74 26.84 0.74 idle 0 0
groupview default.2 on 2 0.74 26.84 0.74 idle 0 0
groupview default.3 on 2 0.74 26.84 0.74 idle 0 0

# Common case: you want to use RAIN6 (the EOS-internal RAID6 equivalent), which needs at least 6 stripes, but you don't have 6 servers.

You can put several filesystems of the same server into one group, for example by making each group twice the size. You will need to reorganize the layout of the storage.

Example: 3 servers with 4 fs:

EOS Console [root://localhost] |/> fs ls
#…………………………………………………………………………………………………………………………
# host (#…) # id # path # schedgroup # geotag # boot # configstatus # drain # active
#…………………………………………………………………………………………………………………………
mmmartinfst1.cern.ch (1095) 1 /var/eos/fs/0 default.0 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 2 /var/eos/fs/1 default.1 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 3 /var/eos/fs/2 default.2 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 4 /var/eos/fs/3 default.3 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 5 /var/eos/fs/0 default.0 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 6 /var/eos/fs/1 default.1 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 7 /var/eos/fs/2 default.2 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 8 /var/eos/fs/3 default.3 booted rw nodrain online

mmmartinfst3.cern.ch (1095) 9 /var/eos/fs/0 default.0 booted rw nodrain online
mmmartinfst3.cern.ch (1095) 10 /var/eos/fs/1 default.1 booted rw nodrain online
mmmartinfst3.cern.ch (1095) 11 /var/eos/fs/2 default.2 booted rw nodrain online
mmmartinfst3.cern.ch (1095) 12 /var/eos/fs/3 default.3 booted rw nodrain online

We want:

EOS Console [root://localhost] |/> fs ls
#…………………………………………………………………………………………………………………………
# host (#…) # id # path # schedgroup # geotag # boot # configstatus # drain # active
#…………………………………………………………………………………………………………………………
mmmartinfst1.cern.ch (1095) 1 /var/eos/fs/0 default.0 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 2 /var/eos/fs/1 default.0 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 3 /var/eos/fs/2 default.1 booted rw nodrain online
mmmartinfst1.cern.ch (1095) 4 /var/eos/fs/3 default.1 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 5 /var/eos/fs/0 default.0 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 6 /var/eos/fs/1 default.0 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 7 /var/eos/fs/2 default.1 booted rw nodrain online
mmmartinfst2.cern.ch (1095) 8 /var/eos/fs/3 default.1 booted rw nodrain online

mmmartinfst3.cern.ch (1095) 9 /var/eos/fs/0 default.0 booted rw nodrain online
mmmartinfst3.cern.ch (1095) 10 /var/eos/fs/1 default.0 booted rw nodrain online
mmmartinfst3.cern.ch (1095) 11 /var/eos/fs/2 default.1 booted rw nodrain online
mmmartinfst3.cern.ch (1095) 12 /var/eos/fs/3 default.1 booted rw nodrain online

For that, we use this command for each filesystem that has to move:

EOS Console [root://localhost] |/eos/alicemiguel/grid/> fs mv 'fs_id' default.'group_id'
success: moved filesystem 'id' into space default.'gid'
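
For the 3-server example above, the target group of each filesystem follows a simple rule: filesystems 0 and 1 of every server go to default.0, filesystems 2 and 3 to default.1. The sketch below prints the corresponding 'fs mv' commands. It assumes that the fs ids are consecutive, start at 1 and are numbered server by server as in the listing; the function name is illustrative. Moving a filesystem into the group it already belongs to is redundant, so the output is best reviewed before running it.

```shell
#!/bin/sh
# Sketch: print "eos fs mv" commands that place per_group filesystems of each
# FST into the same scheduling group.
# Assumption: fs ids are 1..N, assigned server by server as in the example.
print_regroup_cmds() {
    n_hosts="$1"      # number of FSTs
    fs_per_host="$2"  # filesystems per FST
    per_group="$3"    # filesystems of one FST that share a group
    id=1
    while [ "$id" -le $((n_hosts * fs_per_host)) ]; do
        local_idx=$(( (id - 1) % fs_per_host ))   # 0-based index within its FST
        group=$(( local_idx / per_group ))
        echo "eos fs mv $id default.$group"
        id=$((id + 1))
    done
}

# Usage for the example (3 servers, 4 fs each, 2 fs per server and group):
#   print_regroup_cmds 3 4 2
```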

Then we set the number of stripes:

|eos> attr set sys.forced.stripes=6 /eos/instancename/foldername

That last command also offers other possibilities. You can always use the help of each command.

EOS Console [root://localhost] |/> attr ls

to see all possibilities.

# File information

EOS Console [root://localhost] |/eos/alicemiguel/grid/> file info passwd
File: ‘/eos/alicemiguel/grid/passwd’ Flags: 0640
Size: 1372
Modify: Wed Feb 4 17:32:57 2015 Timestamp: 1423067577.602053000
Change: Wed Feb 4 17:32:58 2015 Timestamp: 1423067578.87207739
CUid: 0 CGid: 0 Fxid: 00000006 Fid: 6 Pid: 11 Pxid: 0000000b
XStype: none XS: ETAG: 1610612736:1423067577
plain Stripes: 1 Blocksize: 4k LayoutId: 00100001
#Rep: 1
# fs-id #…………………………………………………………………………………………………………………..
# host # schedgroup # path # boot # configstatus # drain # active # geotag
#…………………………………………………………………………………………………………………..
0 3 mmmartin-fst1.cern.ch default.2 /var/eos/fs/2 booted rw nodrain online
*******

# Copy a file

EOS Console [root://localhost] |/eos/alicemmmartin/test/> cp /etc/hosts /eos/alicemmmartin/test/
doing stat of /etc/hosts
[eos-cp] going to copy 1 files and 159 B
append: /etc/hosts hosts

[eoscp] hosts Total 0.00 MB |====================| 100.00 % [0.0 MB/s]
[eos-cp] copied 1/1 files and 159 B in 0.08 seconds with 2101 B/s

# EOSHA

This service checks every 10 s that the MGM is running. If the check fails, it sends an email to the admin configured in /etc/sysconfig/eos.

# Add new FST

On the FST:

eosfstregister mgm-host-name /regex/path default:nfs

e.g.:

eosfstregister mmmartinmgm1.cern.ch /data default:4

where nfs is the number of filesystems you have.

You have to enable the new node on the MGM:

eos node set fst-host-name:1095 on

e.g.:

eos node set mmmartinfst1.cern.ch:1095 on
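
The two steps can be put together for a given host. The sketch below only prints the commands, with a note on where each one has to run. It uses nothing beyond the two commands shown above, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: print the commands needed to add one FST, using the forms shown above.
# print_add_fst_cmds is a hypothetical helper name.
print_add_fst_cmds() {
    mgm_host="$1"     # MGM host name
    fst_host="$2"     # new FST host name
    path_prefix="$3"  # path prefix of the data mounts, e.g. /data
    nfs="$4"          # number of filesystems on the FST
    echo "# on $fst_host:"
    echo "eosfstregister $mgm_host $path_prefix default:$nfs"
    echo "# on $mgm_host:"
    echo "eos node set $fst_host:1095 on"
}

# Usage:
#   print_add_fst_cmds mmmartinmgm1.cern.ch mmmartinfst1.cern.ch /data 4
```

Afterwards, 'eos fs ls' on the MGM should list the new filesystems.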

  • Some testing

06:18:10 # eos cp /var/tmp/1G /eos/aliceornl/test/
doing stat of /var/tmp/1G
[eos-cp] going to copy 1 files and 1.05 GB
append: /var/tmp/1G 1G

[eoscp] 1G Total 1000.00 MB |====================| 100.00 % [403.5 MB/s]
[eos-cp] copied 1/1 files and 1.05 GB in 2.64 seconds with 396.77 MB/s
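
The page does not say how /var/tmp/1G was created. A file of the same size (1000 MiB, i.e. 1048576000 bytes, the byte count reported by eoscp in the read test) could be produced as sketched below, assuming GNU dd. Zero-filled data is just one choice and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: create a zero-filled test file of size_mb MiB (default 1000 MiB).
# Assumes GNU dd (bs=1M). make_test_file is a hypothetical helper name.
make_test_file() {
    out="$1"
    size_mb="${2:-1000}"
    dd if=/dev/zero of="$out" bs=1M count="$size_mb" 2>/dev/null
}

# Usage:
#   make_test_file /var/tmp/1G
#   eos cp /var/tmp/1G /eos/aliceornl/test/
```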

_________________________________________________________

09:47:20 # eoscp -b 100000000 root://localhost//eos/aliceornl/test/1G /dev/null
[eoscp] Total 1000.00 MB |====================| 100.00 % [754.4 MB/s]
[eoscp] #################################################################
[eoscp] # Date : ( 1431881243 ) Sun May 17 09:47:23 2015
[eoscp] # auth forced= krb5= gsi=
[eoscp] # Source Name [00] : root://localhost//eos/aliceornl/test/1G
[eoscp] # Destination Name [00] : /dev/null
[eoscp] # Data Copied [bytes] : 1048576000
[eoscp] # Realtime [s] : 1.390000
[eoscp] # Eff.Copy. Rate[MB/s] : 754.371250
[eoscp] # Write Start Position : 0
[eoscp] # Write Stop Position : 1048576000

_________________________________________________________

xrdcp root://localhost//eos/aliceornl/test/1G /dev/null -f
[xrootd] Total 1000.00 MB |====================| 100.00 % [699.1 MB/s]

If there is an issue with permissions, use ruid=0 parameter to map to ‘nobody’, like:

xrdcp "root://localhost//eos/aliceornl/test/1G?ruid=0" /dev/null -f

_________________________________________________________

root@alice-eos-01.ornl.gov:/eos/aliceornl/test
10:29:49 # time for name in `seq 1 1000`; do touch empty.$name; done

real 0m11.058s
user 0m0.409s
sys 0m0.882s

_________________________________________________________

EOS Console [root://localhost] |/eos/aliceornl/test/> mkdir /test
EOS Console [root://localhost] |/eos/aliceornl/test/> test mkdir 1000
info: doing directory test with loop =1000
[ mkdir] startstop : 771.510
= mkdir= startstop : 771.510
EOS Console [root://localhost] |/eos/aliceornl/test/> test rmdir 1000
info: doing directory test with loop =1000
[ rmdir] startstop : 711.595
= rmdir= startstop : 711.595

! Important: BACKUP of MGM !

You should back up

/var/eos/md/

daily. You should also back up the final config once:

/var/eos/config/default.eoscf

Alternatively, run a standby MGM and use the EOS sync service for real-time replication of the /var/eos/md files.
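
A daily backup could look like the sketch below, for example when called from cron. The destination directory and the function name are hypothetical. Whether an archive taken while the MGM is writing is consistent is not checked here.

```shell
#!/bin/sh
# Sketch: archive the namespace directory and copy the config file.
# backup_mgm and the destination directory are hypothetical; adapt to your site.
backup_mgm() {
    md_dir="$1"       # e.g. /var/eos/md
    config_file="$2"  # e.g. /var/eos/config/default.eoscf
    dest_dir="$3"     # backup location, ideally on another machine or volume
    stamp=$(date +%Y%m%d)
    mkdir -p "$dest_dir" || return 1
    # -C makes the archive paths relative to the parent of md_dir.
    tar -czf "$dest_dir/eos-md-$stamp.tar.gz" \
        -C "$(dirname "$md_dir")" "$(basename "$md_dir")" || return 1
    # Keep a copy of the config next to the archive, preserving timestamps.
    cp -p "$config_file" "$dest_dir/"
}

# Usage (as root on the MGM, e.g. from a daily cron job):
#   backup_mgm /var/eos/md /var/eos/config/default.eoscf /backup/eos
```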