- Submitting jobs
- Checking the status of the job
- Getting the output
- JDL syntax
- Splitting jobs
- Running Alice production with AliEn
- An example of batch analysis
- Using AliEn from ROOT
- Support
Submitting jobs
To be able to submit a job to AliEn, you first have to log in by typing alien login:
[pcepalice45] ~ > alien login
Connecting to database alien_system in aliendb.cern.ch:3307 ...
Sep 10 16:41 notice (Re)making connection to ClusterMonitor:pcepalice45.cern.ch:8084
Sep 10 16:41 notice (Re)making connection to CPUServer: aliendb.cern.ch:8083
[
Requirements = ( other.Type == "Job" );
CloseSE =
{
"Alice::CERN::Castor",
"Alice::CERN::File",
"Alice::CERN::scratch",
"Alice::CERN::filepsaiz"
};
Host = "pcepalice45.cern.ch";
Type = "machine"
]
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ >
This gives you another AliEn prompt. Here you can enter all the commands that we have seen before, plus several commands to submit jobs and check their status. You have to create the definition of the job that you want to submit. This is done using the Job Description Language (JDL). The next chapter explains the fields that you can specify in the JDL.
There are three ways in which you can pass a JDL to AliEn:
- If the JDL that you want to submit is already registered in the catalogue, typing submit with its path from the AliEn prompt is enough:
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > cat /jdl/SaveFile.jdl
Executable = "SaveFile";
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > submit /jdl/SaveFile.jdl
Jul 09 17:50 info Submitting job '/bin/SaveFile '...
Jul 09 17:50 info Input Box: {}
Jul 09 17:50 info Command submitted!!
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ >
In this case, the JDL from the catalogue can also be a template, with one or more arguments that will be filled in at submission time. For instance,
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > cat /alice/jdl/AliRoot_PPR.jdl
Executable = "AliRoot.sh";
Packages="AliRoot";
Validate = 1;
Arguments = "--round $1 --run $2 --event $3 --version $4 --grun \"G\",\"F\"";
InputFile= {"LF:/alice/simulation/$1/$4/$2/Config.C",
"LF:/alice/simulation/$1/$4/$2/grun.C"};
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > submit /alice/jdl/AliRoot_PPR.jdl 2002-02 00076 1 V3.08.03
Jul 09 17:51 info Submitting job '/Alice/bin/AliRoot.sh --round 2002-02 --run 00076 --event 1 --version V3.08.03 ...'
Jul 09 17:51 info Input Box: {Config.C grun.C}
Jul 09 17:51 info Command submitted!!
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ >
- If you have the JDL in a local file (not registered in AliEn), you can redirect the file into submit:
[pcepalice45] ~ > echo 'Executable = "SaveFile";' >jdlfile
[pcepalice45] ~ > alien login
Connecting to database alien_system in aliendb.cern.ch:3307 ...
Jul 09 17:52 info Contacting the local host monitor at pcepalice45.cern.ch:8084
Jul 09 17:52 warning Starting remotequeue...
Jul 09 17:52 warning Starting remotequeue...
Error contacting Logger as AliEn/Services/Logger in aliendb.cern.ch:8089[
Requirements = ( other.Type == "Job" );
CloseSE =
{
"Alice::CERN::Castor",
"Alice::CERN::File",
"Alice::CERN::scratch",
"Alice::CERN::filepsaiz"
};
Host = "pcepalice45.cern.ch";
Type = "machine"
]
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > submit < jdlfile
Jul 09 17:50 info Submitting job '/bin/SaveFile '...
Jul 09 17:50 info Input Box: {}
Jul 09 17:50 info Command submitted!!
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ >
- You can enter the JDL from stdin, doing submit << EOF from the AliEn prompt:
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > submit << EOF
Enter the input for the job (end with EOF)
Executable = "SaveFile";
EOF
Thanks!!
Jul 09 17:58 info Submitting job '/bin/SaveFile '...
Jul 09 17:58 info Input Box: {}
Jul 09 17:58 info Command submitted!!
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ >
At any point, you can also kill and/or resubmit a job.
Checking the status of the job
Once the job is submitted, you can check its status either from the MonALISA web page, or from the AliEn prompt by typing top. From here you can also get the queueId of your job. If there are too many jobs in the system, you can pass arguments to top to display only your jobs, or jobs in a certain status:
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > top
Sep 10 16:55 notice (Re)making connection to ClusterMonitor:pcepalice45.cern.ch:8084
JobId Status Command name Exechost
66812 QUEUED /Alice/bin/Analysis.sh aliprod@pdsflx002.nersc.gov
75492 WAITING /bin/SaveFile
75493 RUNNING /Alice/bin/AliRoot.sh aliprod@lxplus051.cern.ch
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > top -user psaiz
Sep 10 16:55 notice (Re)making connection to ClusterMonitor:pcepalice45.cern.ch:8084
JobId Status Command name Exechost
75492 WAITING /bin/SaveFile
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ >
When your job finishes, and if you have specified your email address in the JDL, you will get an email like the following:
Subject: AliEn-Job 66813 finished with status DONE
AliEn-Job 66813 finished with status DONE
You can see the output produced by the job in http://alien.cern.ch/Alien/main?task=job&jobID=66813
The job produced the following files: stdout, stderr, tpc.clusters.root, tpc.tracks.root
You can get the output from the AliEn prompt typing:
get /proc/66813/stdout
get /proc/66813/stderr
get /proc/66813/tpc.clusters.root
get /proc/66813/tpc.tracks.root
You can also get the files from the shell prompt typing:
alien --exec get /proc/66813/stdout
alien --exec get /proc/66813/stderr
alien --exec get /proc/66813/tpc.clusters.root
alien --exec get /proc/66813/tpc.tracks.root
Please make sure to copy any file that you want, since these are temporary files and will be deleted at some point.
If you have any problem, please contact us
Getting the output
When the job finishes, it will register the stdout, stderr and any other file specified in the job in the /proc directory of AliEn. There will be a subdirectory /proc/<jobId>/, in which all the files will appear. Just by going to this directory and getting the files, you can see the output of your job.
[pcepalice45] ~ > alien
Connecting to database alien_system in aliendb.cern.ch:3307 ...
[aliendb.cern.ch:3307] /alice/cern.ch/user/p/psaiz/ > cd /proc/55720
Current path is: /proc/55720/
[aliendb.cern.ch:3307] /proc/55720/ > ls -al
drwxr-xr-x psaiz admin 0 Jul 09 17:52 .
drwxr-xr-x admin admin 0 Jul 09 17:52 ..
-rwxr-xr-x psaiz z2 15 Jul 09 23:54 message.out
-rwxr-xr-x psaiz z2 168 Jul 09 23:54 stderr
-rwxr-xr-x psaiz z2 1401 Jul 09 23:54 stdout
[aliendb.cern.ch:3307] /proc/55720/ > cat stdout
Test: ClusterMonitor is at alice01.phys.uu.nl:8084
Execution machine: alice01.phys.uu.nl
This script tests writing a file in the mss
Working Directory is /home/aliprod/alien-job-55720
Making a local copy in AliEn::MSS::File (in /data/)
Configuration done!!
Printing something in a file
Saving the output
Registering /home/aliprod/alien-job-55720/message.out as message.out
Saving /home/aliprod/alien-job-55720/message.out to /message.out in File
Jul 09 23:54 info Saving the file /home/aliprod/alien-job-55720/message.out to /data///message.out
Jul 09 23:54 debug Copying the file /home/aliprod/alien-job-55720/message.out, /data///message.out
Jul 09 23:54 info Done and file://alice01.phys.uu.nl/data///message.out
Connecting to database alien_system in aliendb.cern.ch:3307 ...
Jul 09 23:54 debug Initializing Local Cache Manager
Jul 09 23:54 debug Contacting the SE at alien.cern.ch:8092
Jul 09 23:54 debug Checking the permission w on /proc/55720/
Jul 09 23:54 debug DEBUG LEVEL 3 In UserInterface selectDatabase DONE (host 9)
Jul 09 23:54 debug Checking the ownership of /proc/55720/ (I am psaiz)
Jul 09 23:54 debug DEBUG LEVEL 3 In UserInterface selectDatabase DONE (host 9)
Jul 09 23:54 debug Owner: psaiz
Jul 09 23:54 debug Checking ownership
File /proc/55720/message.out inserted in the catalog
Output saved successfully!!
Deleting the working directory /home/aliprod/alien-job-55720
[aliendb.cern.ch:3307] /proc/55720/ > exit
JDL syntax
These are the possible fields that you can specify in the JDL:
- Executable: This is the only compulsory field in the JDL. It gives the name of the LFN that will be executed. It has to be a file in either /bin or /<VO>/bin (e.g. /alice/bin)
- Arguments: These will be passed to the executable
- Packages: This constrains the execution of the job to be done in a site where the package is installed. You can also require a specific version of a package. For example, you can put Packages="AliRoot", and it will require the current version of AliRoot, or Packages="AliRoot::3.07.02"
- InputFile: These files will be transported to the node where the job will be executed. They can be either LFNs or PFNs.
- InputData: It requires that the job is executed in a site close to the files specified here. You can specify patterns, like "LF:/alice/simulation/2003-02/V3.09.06/00143/*/tpc.tracks.root", and then all the LFNs that satisfy this pattern will be included.
- OutputFile: The files that will be registered in the catalogue when the job finishes. The syntax is: <file name>[@<SE specification>[[no_archive][,disk=N][,tape=K]*]]. If no_archive has been specified, the file will not be put in the default archive. See SE Specification for details.
- OutputArchive: Zip archives that should contain the output files. The syntax is: <archive name>:<file pattern>[,<file pattern>]*[@[no_links_registration][,disk=N][,tape=K]*]. For instance, "root_archive.zip:*.root", "log_files:stdout,stderr,*.log@no_links_registration". The name of the file can be a pattern with *. If no_links_registration has been specified, the individual files in the archive will not be registered in the catalogue. See SE Specification for details.
- Validate: Once the job finishes, another job will be sent to validate the output of the original job. The validation procedure depends on the executable that is called. For instance, for the executable AliRoot.sh, the validation is AliRoot.sh.Validate
- Email: If you want to receive an email when the job finishes.
- Split: If you want to split your job in several jobs. It can have the following values: file, directory, event, se. Check splitting jobs .
- MasterResubmitThreshold: UNDER DEVELOPMENT. If you submit a job that will be split in several subjobs, you can set up an automatic resubmission in case some of the subjobs fail. MasterResubmitThreshold is the maximum number (or percentage) of failed jobs that you accept (if you want it to be a percentage, just add the symbol '%').
- ResubmitType: UNDER DEVELOPMENT Here, you can specify the jobs that have to be resubmitted. The possible options are:
- all: All the jobs that finished with error or expired
- system: All the jobs that finished with a system error (for instance, downloading some of the input files, or failure setting up some of the packages). It will not resubmit jobs that were not validated (ERROR_V).
- <list of statuses>: jobs that are in any of the statuses specified in the list. For instance, if you set it to 'ERROR_IB,ERROR_E', only the jobs that fail getting the input box or during the execution will be resubmitted.
- MaxFailed: Maximum number of subjobs that can fail. If there are more failures, all the waiting jobs will be killed. It can also be a percentage of the total number of jobs.
- MaxInitFailed: Maximum number of initial subjobs that can fail. As soon as one of the subjobs finishes correctly, this field no longer has any effect.
- TTL (Time To Live): Determines the amount of elapsed time (in seconds) the job is allowed to run. The maximum value is 24 hours (TTL="86400"). The TTL should be kept at the lowest reasonable value to allow multiple jobs of the same user to be executed by the same job agent. At the same time, it should be sufficiently long to assure the task is not killed prematurely.
- MaxWaitingTime: Maximum time that the job stays waiting in the queue to be executed. It can be defined in seconds (by default, or using 's' or 'S'), minutes (using 'm' or 'M') or hours (using 'h' or 'H'), for example, MaxWaitingTime="24h". The biggest possible value is two weeks (1209600 seconds). Only from version 2.20.
Any field (apart from the executable) can be a list of items. For instance, you can require more than one package, using: Packages= {"AliRoot", "ROOT"};
Each line has to end with a semicolon (';')
Any line starting with a hash ('#') will be treated as a comment and ignored
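As a hypothetical illustration of these rules (all field values below are placeholders, not a recipe for a real production), a JDL combining several of the fields above could look like:

```
# analysis.jdl - a hypothetical example; all values are placeholders
Executable = "Analysis.sh";
# more than one package can be required as a list
Packages = {"AliRoot", "ROOT"};
TTL = "3600";
Split = "file";
MasterResubmitThreshold = "10%";
ResubmitType = "system";
Email = "your.email@address";
```

Note that every assignment ends with a semicolon, the comment line starts with a hash, and Packages is a list.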
Splitting jobs
AliEn can split one jdl into several jobs. Splitting is useful for example when your job is going to analyse a large number of input files, and the analysis can be done independently for each file.
Splitting jobs, combined with patterns in the specification of the InputData, makes it possible to describe a big production with one single JDL.
If the field 'split' is not mentioned in the JDL, AliEn will not split the job.
The user submits only one JDL, with the Split field set to one of the possible values. Then, AliEn will separate the job into different tasks. All the subtasks will have all the JDL fields of the original task, except the InputData, which will be a subset of all the InputData. From now on, we will refer to the JDL submitted by the user as the master job, and to each of the subtasks as a subjob.
The possibilities for the split are:
- file: there will be one subjob per file in the InputData
- directory: all the files of one directory will be analysed in the same subjob
- SE: all the files in the same SE will be analysed in the same subjob
- event: all the files with the same name of the last subdirectory will be analysed in the same subjob
- userdefined: Check the field SplitDefinitions
- production:<#start>-<#finish>: this kind of split does not require any InputData. It will submit the same JDL several times (from #start to #finish).
For instance, if in the InputData we specify:
InputData={"LF:/alice/simulation/2002-04/V3.08.Rev.01/00001/*/galice.root",
"LF:/alice/simulation/2002-04/V3.08.Rev.01/00001/*/galiceSDR.root",
"LF:/alice/simulation/2003-02/V3.09.06/00143/*/tpc.tracks.root"};
The following table explains how many subjobs will be generated with the different values of Split:
Split value: Jobs
- file: one subjob per each galice.root, plus one per each galiceSDR.root, plus one per each tpc.tracks.root
- directory: one subjob per each pair of galice.root and galiceSDR.root, plus one per each tpc.tracks.root
- event: one subjob per each group of galice.root, galiceSDR.root and tpc.tracks.root
- default: only one job for all the InputData
- userdefined: it will complain that the field SplitDefinitions is not defined (check SplitDefinitions)
At any moment, you can query the status of all the subjobs of a split job with the command top -split <masterJobId>, where <masterJobId> is the jobId of the master job. Once all the subjobs have finished, their output will be registered in the catalogue in the directory of the master job, /proc/<masterJobId>/. Also, if you kill the master job, all of its subjobs will be killed.
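The grouping behind this table can be sketched as follows. This is a hypothetical illustration in Python, not actual AliEn code, and the resolved LFNs are invented stand-ins for what the wildcard patterns above might expand to:

```python
import os
from collections import defaultdict

# Invented stand-ins for the LFNs the wildcard patterns could resolve to.
lfns = [
    "/alice/simulation/2002-04/V3.08.Rev.01/00001/00001/galice.root",
    "/alice/simulation/2002-04/V3.08.Rev.01/00001/00002/galice.root",
    "/alice/simulation/2002-04/V3.08.Rev.01/00001/00001/galiceSDR.root",
    "/alice/simulation/2002-04/V3.08.Rev.01/00001/00002/galiceSDR.root",
    "/alice/simulation/2003-02/V3.09.06/00143/00001/tpc.tracks.root",
]

def split_jobs(lfns, mode):
    """Group input LFNs into subjobs according to the Split value."""
    if mode == "file":                      # one subjob per file
        return [[l] for l in lfns]
    keys = {
        "directory": lambda l: os.path.dirname(l),                # same directory together
        "event": lambda l: os.path.basename(os.path.dirname(l)),  # same last subdir name
    }
    if mode not in keys:                    # default: a single job with everything
        return [lfns]
    groups = defaultdict(list)
    for l in lfns:
        groups[keys[mode](l)].append(l)
    return list(groups.values())
```

With these five files, "file" yields five subjobs, "directory" yields three (the two run directories plus the tpc one), and "event" yields two (event 00001 across both productions, and event 00002).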
There are some optional fields that you can define in the JDL and that make the job splitting even more powerful:
- SplitArguments: This is an array of strings. Each subjob will be submitted as many times as there are items in this array, and the subjobs will have the corresponding element of the array as their Arguments. For instance, if we submit a JDL like:
Executable="myexec";
Split="file";
InputData={"<file1>", "<file2>"};
SplitArguments= {"simulate", "reconstruct", "analyze"};
that JDL will be split in six different JDLs:
Executable="myexec";
InputData={"<file1>"};
Arguments= "simulate";

Executable="myexec";
InputData={"<file2>"};
Arguments= "simulate";

Executable="myexec";
InputData={"<file1>"};
Arguments= "reconstruct";

Executable="myexec";
InputData={"<file2>"};
Arguments= "reconstruct";

Executable="myexec";
InputData={"<file1>"};
Arguments= "analyze";

Executable="myexec";
InputData={"<file2>"};
Arguments= "analyze";
In the SplitArguments you can specify patterns, which will be replaced by the name of the InputData. All the patterns have the format #alien<entry><name>#, where <entry> can be 'first', 'last' or 'all', and <name> can be 'fulldir', 'filename' or 'dir'. So, for instance, the pattern #alienfirstfulldir# will be replaced by the LFN of the first InputData in this subjob; the pattern #alienlastdir# will be replaced by the name of the directory of the last entry of the InputData in this subjob.
- SplitMaxInputFileNumber: Defines the maximum number of files in each of the subjobs. For instance, if you split per SE but put 'SplitMaxInputFileNumber=10', you can be sure that no subjob will have more than ten input files.
- SplitMaxInputFileSize: Similar to the previous one, but puts the limit on the size of the files.
- SplitDefinitions: This is a list of JDLs. If the user defines them, AliEn will take those JDLs as the subjobs, and all of them will behave as if they were subjobs of the original job (for instance, if the original job gets killed, all of them will get killed, and once all of the subjobs finish, their output will be copied to the master job).
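The six-JDL expansion shown above can be sketched as follows. This is a hypothetical illustration in Python, not AliEn code, and <file1>/<file2> are placeholder LFNs:

```python
# Hypothetical sketch of how Split="file" combined with SplitArguments
# expands a master JDL into subjob JDLs, as in the six-JDL example above.
def expand(master):
    subjobs = []
    for arg in master.get("SplitArguments", [""]):   # one pass per argument
        for lfn in master["InputData"]:              # Split="file": one subjob per input file
            subjobs.append({"Executable": master["Executable"],
                            "InputData": [lfn],
                            "Arguments": arg})
    return subjobs

master = {"Executable": "myexec",
          "InputData": ["<file1>", "<file2>"],       # placeholder LFNs
          "SplitArguments": ["simulate", "reconstruct", "analyze"]}
# 2 input files x 3 arguments -> 6 subjob JDLs
```

Each subjob keeps all fields of the master except InputData, which shrinks to a single file, and Arguments, which takes one element of SplitArguments.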
Running Alice production with AliEn
To generate Monte Carlo events and reconstruct these events to produce data in the ESD format, you need to prepare a few files:
- Config.C: this is a root macro where you select the transport model and its parameters, the event generator and its parameters, and the active detectors and their geometries. Several examples can be found in $ALICE_ROOT/macros.
- sim.C and rec.C : these two root macros wrap the AliSimulation and AliReconstruction classes. They are the steering macros which are called from the aliroot prompt and launch either the simulation or the reconstruction process.
- sim.C
void sim() {
AliSimulation simu;
TStopwatch timer;
timer.Start();
simu.Run();
timer.Stop();
timer.Print();
}
- rec.C
void rec() {
AliReconstruction reco;
TStopwatch timer;
timer.Start();
reco.Run();
timer.Stop();
timer.Print();
}
- simRun.C: this is the root macro that launches all the tasks to be performed by every job. An example is given next:
// #define VERBOSEARGS
// simrun.C
{ // extract the run and event variables given with --run --event
int nrun = 0;
int nevent = 0;
int seed = 0;
char sseed[1024];
char srun[1024];
char sevent[1024];
sprintf(srun,"");
sprintf(sevent,"");
for (int i=0; i<gApplication->Argc(); i++) {
#ifdef VERBOSEARGS
printf("Arg %d: %s\n",i,gApplication->Argv(i));
#endif
if (!(strcmp(gApplication->Argv(i),"--run")))
nrun = atoi(gApplication->Argv(i+1));
sprintf(srun,"%d",nrun);
if (!(strcmp(gApplication->Argv(i),"--event")))
nevent = atoi(gApplication->Argv(i+1));
sprintf(sevent,"%d",nevent);
}
seed = nrun * 100000 + nevent;
sprintf(sseed,"%d",seed);
if (seed==0) {
fprintf(stderr,"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n");
fprintf(stderr,"!!!! WARNING! Seeding variable for MC is 0 !!!!\n");
fprintf(stderr,"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n");
} else {
fprintf(stdout,"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n");
fprintf(stdout,"!!! MC Seed is %d \n",seed);
fprintf(stdout,"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n");
}
// set the seed environment variable
gSystem->Setenv("CONFIG_SEED",sseed);
gSystem->Setenv("DC_RUN",srun);
gSystem->Setenv("DC_EVENT",sevent);
gSystem->Exec("cp $ROOTSYS/etc/system.rootrc .rootrc");
gSystem->Exec("aliroot -b -q sim.C > sim.log 2>&1");
gSystem->Exec("aliroot -b -q rec.C > rec.log 2>&1");
}
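The seeding arithmetic in simRun.C is worth spelling out. As a sketch mirroring the macro in Python (not part of the production code):

```python
def mc_seed(nrun, nevent):
    """Mirror of simRun.C: the MC seed is derived from the run and event numbers."""
    return nrun * 100000 + nevent

# A job with --run 76 --event 1 gets seed 7600001, so different
# run/event pairs (with event < 100000) get distinct seeds; run 0,
# event 0 gives seed 0, which triggers the macro's WARNING banner.
```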
- xxx.jdl: this is the Job Description Language (JDL) file where directives are given for the job execution, such as the software versions, where in the AliEn catalogue to find the input files (those listed previously), where to put the results, the requirements for the resource broker, the number of events to process, etc. An example is given below:
Executable = "aliroot";
Jobtag={"comment:PDC05 flow events", "round::2005", "tag:v4-03-04", "type:Flow"};
Packages={"AliRoot::v4-03-04", "GEANT3::v1-3"};
TTL = "86400";
Validationcommand ="/alice/cern.ch/user/a/aliprod/production_2005/flow/configs/validation.sh";
Requirements = ( other.CE == "Alice::LCG::Catania" );
InputFile= {"LF:/alice/cern.ch/user/a/aliprod/pdc05_flow/production/configs/Config.C",
"LF:/alice/cern.ch/user/a/aliprod/pdc05_flow/production/configs/simrun.C",
"LF:/alice/cern.ch/user/a/aliprod/pdc05_flow/production/configs/sim.C",
"LF:/alice/cern.ch/user/a/aliprod/pdc05_flow/production/configs/rec.C",
"LF:/alice/cern.ch/user/a/aliprod/pdc05_flow/production/configs/CheckESD.C"};
OutputArchive={"root_archive:*.root@Alice::CERN::Castor2", "log_archive:*.log,stdout,stderr@Alice::CERN::se01"};
OutputDir="/alice/cern.ch/user/a/aliprod/production_2005/flow/output_allfiles/$1/#alien_counter_03i#";
splitarguments="simrun.C --run $1 --event #alien_counter#";
split="production:1-50";
Workdirectorysize={"1500MB"};
Once your files are ready and tested on one event, you have to add all the previously listed files to the AliEn catalogue, in a directory of your choice.
> alien
Nov 22 11:29:18 info Warning: cannot create envelope sealing engine = setting backdoor
[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/schutz/ > cd analysis/test/
[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/schutz/analysis/test/ > add test.jdl file test.jdl Alice::Cern::Castor
Nov 22 11:31:38 info access: warning - we are using the backdoor ....
Nov 22 11:31:38 info Registering the file
file://ccali.in2p3.fr/afs/in2p3.fr/group/alice/schutz/alien/test.jdl in Alice::CCIN2P3::cclcgalice
Nov 22 11:31:38 info Initializing Service
Nov 22 11:31:38 ApMon[INFO]: Added destination 193.48.99.73:8884: with default options.
Nov 22 11:31:38 info Contacting SE SE_Alice::CCIN2P3::cclcgalice, and tell it to
pick up file://ccali.in2p3.fr/afs/in2p3.fr/group/alice/schutz/alien/test.jdl
Nov 22 11:31:38 info Copying a file into an SE
Nov 22 11:31:42 info Getting the file
file://cclcgalice.in2p3.fr/sps/alice/01/13077/2bd8a68a-5b43-11da-9dd6-00096b58e933.1132655501 of size 1161
Nov 22 11:31:42 info File /alice/cern.ch/user/s/schutz/analysis/test/test.jdl inserted in the catalog
[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/schutz/analysis/test/ > ls
test.jdl
You are now all set to submit the production.
ccali28:~> alien login
Nov 17 14:43:01 info Error contacting the local SE
Nov 17 14:43:01 info Warning: cannot create envelope sealing engine = setting backdoor
Nov 17 14:43:01 notice Starting remotequeue...
Nov 17 14:43:01 info
[
Requirements = ( other.Type == "Job" );
CloseSE =
{
"Alice::CCIN2P3::cclcgalice"
};
CE = "Alice::CCIN2P3::BQS";
Host = "cclcgalice.in2p3.fr";
LocalDiskSpace = 9000000;
WNHost = "ccali28.in2p3.fr";
Memory = 2054924;
TTL = "112800";
Type = "machine";
Uname = "2.4.21-32.0.1.ELsmp";
FreeMemory = 68024;
Swap = 4192912;
GridPartitions =
{
"SC05"
};
FreeSwap = 3769436
]
[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/schutz/ > submit /alice/cern.ch/user/s/schutz/analysis/test/test.jdl 99
Nov 22 11:41:36 info Submitting job '/alice/bin/aliroot '...
Nov 22 11:41:37 info Checking if the packages GEANT3::v1-3 AliRoot::v4-03-04 are defined in the system
Nov 22 11:42:03 info The PackMan has the following packages:
admin@AliRoot::v4-03-05
admin@GEANT::v1-1
aliprod@ROOT::4.03.02
VO@AliRoot::4.02.07
VO@AliRoot::v4-02-Rev-01
VO@AliRoot::v4-03-03
VO@AliRoot::v4-03-04
VO@GEANT3::2.1
VO@GEANT3::v1-1
VO@GEANT3::v1-3
VO@ROOT::4.03.02
VO@ROOT::v4-04-02
VO@ROOT::v5-02-00
Nov 22 11:42:03 info Job is going to be splitted for production, running from 1 to 5
Nov 22 11:42:03 info Input Box: {CheckESD.C Config.C rec.C sim.C simrun.C}
Nov 22 11:42:04 info Command submitted (job 57943)!!
[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/schutz/analysis/test/ >
Remember the job id: it is mandatory for following the execution of your production. This is also the id you have to provide to the expert in the unlikely case that something goes wrong.
[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/schutz/ > ps -rs -id 57943
schutz 57943 RS /alice/bin/aliroot
schutz -57944 ST /alice/bin/aliroot
schutz -57945 ST 00 /alice/bin/aliroot
schutz -57946 ST 00 /alice/bin/aliroot
schutz -57947 ST 00 /alice/bin/aliroot
schutz -57948 ST 00 /alice/bin/aliroot
[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/schutz/ > ps -WAs -id 57943
57943 -production:1-5-subjobs-- /alice/bin/aliroot RS 0 :0 :0 .0
-57944 Alice::Prague::PBS goliasx85.farm.particle.cz goliasx31.farm.particle.cz /alice/bin/aliroot ST 0 :0 :4 .30
-57945 Alice::Prague::PBS goliasx85.farm.particle.cz goliasx31.farm.particle.cz /alice/bin/aliroot ST 0 :0 :4 .33
-57946 Alice::LCG::Torino egee-wn122.torinoegee alibox.to.infn.it /alice/bin/aliroot ST 0 :0 :0 .43
-57947 Alice::LCG::Torino egee-wn110.torinoegee alibox.to.infn.it /alice/bin/aliroot ST 0 :0 :0 .44
-57948 Alice::LCG::Torino egee-wn122.torinoegee alibox.to.infn.it /alice/bin/aliroot ST 0 :0 :0 .33
If for any reason an entire job or one or more subjobs failed, you can resubmit it and only the failed jobs will be reprocessed:
[aliendb5.cern.ch:3307] /alice/cern.ch/user/s/schutz/ > masterJob 57943 resubmit
The command masterJob has several more functionalities: providing information on all subjobs, killing jobs, merging, etc. (see man masterJob).
An example of batch analysis
One of the jobs that can be submitted in AliEn is an Analysis job. This job calls aliroot with a macro specified by the user. If you want to see the actual code of this command, take a look at the file /alice/bin/Analysis.sh in the catalogue. With it, you can analyse any event generated during the productions, or any other file registered in the catalogue. The job will be executed at the place where the data is kept, thereby minimizing the transfer of big files.
To submit one of these jobs, you will have to specify in your jdl:
- Executable="Analysis.sh";
- Packages="AliRoot";
- InputFile="LF:/alice/cern.ch/user/j/jchudoba/AliTPCTracking.C"; (1)
- Arguments="--macro AliTPCTracking.C "; (2)
- InputData={"LF:/alice/simulation/2002-04/V3.08.Rev.01/00001/00100/galice.root", "LF:/alice/simulation/2002-04/V3.08.Rev.01/00001/00100/galiceSDR.root"}; (3)
- OutputFile={"tpc.tracks.root","tpc.clusters.root"}; (4)
- Email="your.email@address";
(1) This is one of the most important fields. It is the macro created by the user. If the user has already registered the macro in the catalogue, it will be in the previous format. If the file is not in the catalogue, it will be of the form "PF:<local file>".
(2) The arguments that will be passed to aliroot and to your macro.
(3) The data that is going to be analysed. Only sites that have a copy of those files will be able to execute the job. All these files will be copied to the working directory of the job.
(4) The files that have to be saved.
For instance:
[aliendb.cern.ch:3307] /alice/bin/ > submit << EOF
Enter the input for the job (end with EOF)
Executable="Analysis.sh";
Packages="AliRoot";
InputFile="LF:/alice/cern.ch/user/j/jchudoba/AliTPCTracking.C";
Arguments="--macro AliTPCTracking.C ";
InputData={"LF:/alice/simulation/2002-04/V3.08.Rev.01/00001/00100/galice.root",
"LF:/alice/simulation/2002-04/V3.08.Rev.01/00001/00100/galiceSDR.root"};
OutputFile={"tpc.tracks.root","tpc.clusters.root"};
Email=" pablo.saiz@alien.cern.ch";
EOF
Thanks!!
Sep 10 18:09 info Submitting job '/Alice/bin/Analysis.sh --macro AliTPCTracking.C '...
Sep 10 18:09 info Input Box: {galiceSDR.root galice.root AliTPCTracking.C}
Sep 10 18:09 notice (Re)making connection to ClusterMonitor:pcepalice45.cern.ch:8084
Sep 10 18:10 info Command submitted!!
[aliendb.cern.ch:3307] /alice/bin/ > top
Sep 10 18:10 notice (Re)making connection to ClusterMonitor:pcepalice45.cern.ch:8084
JobId Status Command name Exechost
66812 QUEUED /Alice/bin/Analysis.sh aliprod@pdsflx002.nersc.gov
75492 WAITING /bin/SaveFile
75493 RUNNING /Alice/bin/AliRoot.sh aliprod@lxplus051.cern.ch
75502 WAITING /Alice/bin/AliRoot.sh
[aliendb.cern.ch:3307] /alice/bin/ >
Once the job is executed, you get a mail like
AliEn-Job 75502 finished with status DONE
You can see the output produced by the job in http://alien.cern.ch/Alien/main?task=job&jobID=75502
The job produced the following files: stdout, stderr, tpc.clusters.root, tpc.tracks.root
You can get the output from the AliEn prompt typing:
get /proc/75502/stdout
get /proc/75502/stderr
get /proc/75502/tpc.clusters.root
get /proc/75502/tpc.tracks.root
You can also get the files from the shell prompt typing:
alien --exec get /proc/75502/stdout
alien --exec get /proc/75502/stderr
alien --exec get /proc/75502/tpc.clusters.root
alien --exec get /proc/75502/tpc.tracks.root
Please, make sure to copy any file that you want, since those are temporary files, and will be deleted at some point.
If you have any problem, please contact us
And if you execute the commands in the mail, you can get the output:
[pcepalice45] ~ > alien --exec get /proc/75502/stdout
Sep 10 19:27 info Getting the file soap://lxplus051.cern.ch:8084/tmp/AliEn/log/proc/75502/stdout?URI=ClusterMonitor
And the file is /tmp/AliEn/cache/stdout.10633
[pcepalice45] ~ > cat /tmp/AliEn/cache/stdout.10633
Test: ClusterMonitor is at lxplus051.cern.ch:8084
Execution machine: lxbatch520.cern.ch
Working Directory is /pool/lsf/aliprod/672990/alien-job-75502
Setting Package AliEn ROOT 3.03.07
...
Executing Analysis with macro AliTPCTracking.C
Arguments
Constant Field Map1 created: map= 1, factor= 1.000000
WELCOME to ALICE
Processing AliTPCTracking.C()...
...
Using AliEn from ROOT
Once you have produced data in the ESD format, you are ready to start the analysis phase. Again, several files need to be edited. The first one is the analysis macro. This can be any root macro, but it is highly recommended that you create an analysis class that derives from a ROOT TSelector. The skeleton of such a macro can be generated in a root session:
root [0] TTree t
root [1] t->MakeSelector("esdAna")
Info in <TTreePlayer::MakeClass>: Files: esdAna.h and esdAna.C generated from TTree: t
(Int_t)0
You need to edit the header and implementation files to customize them for the ESD tree:
In esdAna.h add the following lines, at the appropriate place:
#include "AliESD.h"
// Declaration of leave types
AliESD *fESD; //!
In esdAna.C add the following lines, at the appropriate place:
esdAna::~esdAna()
delete fESD ;
esdAna::SlaveBegin()
// here you insert all the definitions of the histograms you want to increment
esdAna::Init()
fChain->SetBranchAddress("ESD", &fESD);
esdAna::Process()
fChain->GetTree()->GetEntry(entry);
Int_t ntracks = fESD->GetNumberOfTracks();
printf("Number of tracks %d\n",ntracks);
for (Int_t tr = 0; tr < ntracks; tr++) {
// your algorithm
}
esdAna::SlaveTerminate()
// here you save all the objects (histograms, you have created)
esdAna::Terminate()
// Here you can put all the actions, such as display, you want to perform at the end of the analysis of ALL events
The next thing to do is to test the macro on one event. You do this in interactive mode from the root prompt, by connecting to the AliEn catalogue, accessing one of the event files you have produced earlier, and processing it:
root [0] TGrid::Connect("alien://pcapiserv01.cern.ch:9000", "schutz") ;
Password:
*******************************************************************
* Welcome to the ALICE VO at alien://pcapiserv01.cern.ch:9000
* ApiService by Derek Feichtinger/Andreas-J.Peters
* Running with Server V 1.0.5
root [1] TChain * cESDTree = new TChain("esdTree") ;
root [2] file = "alien:///alice/cern.ch/users/s/schutz/analysis/test/output/Esd.root?se=ALICE::CERN::Castor" ;
root [3] cESDTree->Add(file) ;
root [4] cESDTree->Process("esdAna.C+")
Quick Reference to use dynamic SE discovery with failover:
Specifying in a JDL...
To store a job's output file according to the VO's default policy (for ALICE 'disk=2'):
OutputFile = {"myfile.root"};
To save a number of copies depending on QoS flags, using dynamic discovery and failover:
OutputFile = {"myfile.root@disk=2,tape=1"};
Using the add command...
Default policy (for ALICE 'disk=2')
add fileLFN filePFN
Specify count of QoS flags, using dynamic discovery and failover:
add fileLFN filePFN disk=2,tape=1
Detailed Feature Reference of the SE specification discovery:
Specifying the Output in a JDL:
OutputArchive = {"archive:file1,file2@<STORAGETAGS>,<JobOutPutOptions>"};
OutputFile = {"file3,file4,file5@<STORAGETAGS>,<JobOutPutOptions>"};
<JobOutPutOptions> may be a comma separated combination of:
"no_links_registration" The files in the archive will not be registered in the catalogue
(you will only see the archive, and in case have to extract files manually).
"no_archive" files normally go to an archive, even if they are specified as files. with
this parameter they will actually be stored as files
Specifying a file to add or upload (a JobAgent uses upload to handle its output files):
upload file-PFN # for UploadOptions see 'help upload'
add file-LFN file-PFN # for AddOptions see 'help add'
The storage element specification by <STORAGETAGS>:
<STORAGETAGS> may be a comma-separated combination of:
<QoSFLAG>=N          # get the best N SEs for a QoS type (dynamic SE discovery)
<SELIST>             # a list of storage elements (use / do not use)
<SELIST>, select=N   # a list of SEs to select N out of it
<QoSFLAG>=N : The SE Rank Cache will be asked for the topmost N
entries for a given QoS,
e.g. disk=3
or tape=1
or disk=1,tape=1
If any SE fails, the Rank Cache will be asked to
deliver an adapted failover list of SEs, and AliEn will not give up
until all known and working SEs have been tried.
<SELIST> : A list of SEs, whose entries can be positive (implicit)
or negative. Negative entries exist in order to exclude SEs
from auto discovery (see the examples below),
e.g. ALICE::NIHAM::FILE, ALICE::SARA::DCACHE, ALICE::SaoPaulo::DPM
or ALICE::NIHAM::FILE, !ALICE::SARA::DCACHE, ALICE::SaoPaulo::DPM
or !ALICE::NIHAM::FILE, !ALICE::SARA::DCACHE, !ALICE::SaoPaulo::DPM
A positive <SELIST> specification is considered static:
there is no auto discovery or replacement based on availability
or ranking, and no retry or failover in case of error. Also make sure you have
permission to write to a storage element before specifying it explicitly, as access
may be restricted to certain users (see Exclusive SEs).
→ If you specify an SE that does not work, the job will finish with
either status <SAVED_WARNING> or even <ERROR_SV>. In the first case the JobAgent
could store the output in at least one location; in the latter it could not store
it in any location and the job output is lost.
select=N : Use any N elements of the list (optional combination with <SELIST>),
e.g. ALICE::Catania::SE, ALICE::NIHAM::DCACHE, ALICE::SaoPaulo::DPM, select=2
If N < the number of positive entries in <SELIST>, all possibilities are tried
in case of error. Apart from that fallback within the user specification,
it is static.
Remarks:
The definition of QoS flags is independent of the code; the actual flags are stored
in the LDAP entry of an SE. The ALICE VO currently uses 'disk' and 'tape' as the flags
accessible to users (see the upcoming documentation about exclusive SE users).
If there is no user-side SE specification, an LDAP entry defines the default policy
for upload and add, which is currently 'disk=2' for the ALICE VO.
Whichever method you use, you may specify at most 9 replicas/copies in total. A user
specification is evaluated and truncated once it exceeds this limit.
Some examples for JDLs:
Store an output file according to the VO's default policy (for ALICE 'disk=2'):
OutputFile = {"file*"};
Specify a count per QoS flag, using dynamic discovery and failover:
OutputFile = {"fileA*@disk=4", "fileB*@disk=2,tape=1"};
Specify static SEs (without availability or performance ranking, and no failover!):
OutputFile = {"file*@ALICE::EARTH::SE,ALICE::VENUS::SE"};
Specify static SEs with select N out of the list:
OutputFile = {"file*@ALICE::MARS::SE,ALICE::JUPITER::SE,ALICE::PLUTO::SE,select=2"};
Specify SEs to exclude from dynamic discovery:
OutputFile = {"file*@!ALICE::MARS::SE,!ALICE::PLUTO::SE,disk=2"};
Specify static SEs plus a count of QoS flags with dynamic discovery:
OutputFile = {"file*@ALICE::PLUTO::SE,ALICE::JUPITER::SE,!ALICE::MARS::SE,disk=1"};
Full combination of possible features:
OutputFile = {"file*@ALICE::MARS::SE,ALICE::VENUS::SE,select=1,!ALICE::JUPITER::SE,disk=2,tape=3"};
Finally, the entries are not required to be in any particular order:
OutputFile = {"file*@tape=3,!ALICE::JUPITER::SE,select=1,ALICE::MARS::SE,ALICE::VENUS::SE,disk=2"};
For add and upload at the AliEn prompt, the syntax and semantics are the same.
Some examples for the command add:
Default policy (for ALICE 'disk=2'):
add fileLFN filePFN
Specify a count per QoS flag, using dynamic discovery and failover:
add fileLFN filePFN disk=2,tape=1
Full combination of possible features:
add fileLFN filePFN ALICE::PLUTO::SE,ALICE::JUPITER::SE,!ALICE::MARS::SE,disk=1
Support
If you still have any questions, comments, or problems, please check the FAQ or contact us. Thanks!
The AliEn team