<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<title>ECALELF: Dataformats</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<link href="doxygen.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<!-- Generated by Doxygen 1.6.1 -->
<div class="navigation" id="top">
<div class="tabs">
<ul>
<li><a href="index.html"><span>Main Page</span></a></li>
<li class="current"><a href="pages.html"><span>Related Pages</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="annotated.html"><span>Classes</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
<li><a href="dirs.html"><span>Directories</span></a></li>
</ul>
</div>
</div>
<div class="contents">
<h1><a class="anchor" id="DATAFORMATS">Dataformats </a></h1><p>The calibration in 2012 with electrons (single- and double-electron samples) will be performed using the ALCARECO data format, which contains only the collections relevant for the ECAL calibration with electrons.</p>
<p>If you have not yet followed the instructions in the Calibration/README file, please do so first.</p>
<p>ALCARECO and ALCARAW private production logic.</p>
<ul>
<li>EcalUncalElectron (ALCARAW): special data format for rereco purposes<ul>
<li>in CMSSW from the 7_X_X release</li>
<li>no official ALCARAW has been produced so far</li>
<li>currently produced privately starting from RAW-RECO Z and W skims</li>
</ul>
</li>
<li>EcalCalElectron (ALCARECO): Reduced format for RECO data<ul>
<li>can be produced starting from RECO or AOD standard formats</li>
<li>this is the standard format after rereco of data using ECALELF (starting from ALCARAW)</li>
<li>now in CMSSW and produced centrally with a ZW skim applied (WP90 selection); the ALCARECO format has since been updated, so the CMSSW version is deprecated</li>
<li>can also be produced privately starting from RAW-RECO Z and W skims</li>
</ul>
</li>
</ul>
<p>Production is configured through the following dataset list files:</p>
<ul>
<li>alcaraw_datasets.dat</li>
<li>alcareco_datasets.dat</li>
<li>alcarereco_datasets.dat</li>
<li>ntuple_datasets.dat</li>
</ul>
<ul>
<li>For each line one crab task is created.</li>
<li>The columns are not exactly the same in every file.</li>
<li>Lines can be commented out with a # at the beginning of the line.</li>
<li>Columns are separated by a TAB character.</li>
</ul>
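<p>The file conventions listed above (one task per line, # comments, TAB-separated columns) can be sketched in a few lines of Python. This is only an illustration, not part of the ECALELF scripts: the function name read_dataset_lines is hypothetical, and skipping blank lines is an assumption not stated in the text.</p>
<div class="fragment"><pre class="fragment">
```python
import io


def read_dataset_lines(stream):
    """Return the non-comment rows of a *_datasets.dat file as lists of columns."""
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        # Skip lines commented with a leading '#'.
        # Skipping blank lines is an assumption made for this sketch.
        if not line.strip() or line.startswith("#"):
            continue
        # Columns are separated by TAB.
        rows.append(line.split("\t"))
    return rows


# Shortened example: a commented header followed by one dataset line.
example = io.StringIO(
    "# RUNRANGE\tDATASETPATH\tDATASETNAME\n"
    "190456-193621\t"
    "/DoubleElectron/Run2012A-ZElectron-13Jul2012-v1/RAW-RECO\t"
    "DoubleElectron-ZSkim-RUN2012A-13Jul-v1\n"
)
rows = read_dataset_lines(example)
for row in rows:
    print(row)
```
</pre></div>
<p>Each row that survives the filter corresponds to one crab task.</p>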
<h2><a class="anchor" id="alcarawDatasetsFormat">
alcaraw_datasets.dat</a></h2>
<div class="fragment"><pre class="fragment">
# RUNRANGE DATASETPATH DATASETNAME STORE_PATH_BASE USER_REMOTE_DIR_BASE VALIDITY PERIOD
190456-193621 /DoubleElectron/Run2012A-ZElectron-13Jul2012-v1/RAW-RECO DoubleElectron-ZSkim-RUN2012A-13Jul-v1 caf group/alca_ecalcalib/ecalelf/alcaraw VALID RUN2012ABC,Cal_Nov2012
</pre></div><p>For each line a crab task is created.</p>
<ul>
<li>RUNRANGE: only events in the indicated run range are processed</li>
<li>DATASETPATH: This is the real DBS dataset path (full name)</li>
<li>DATASETNAME: This is a short name which will be used from now on to refer to the specified dataset. The DATASETNAME is used in the construction of the output folders!</li>
<li>STORE_PATH_BASE: the storage element where the output is stored. It can be caf or T2_IT_Rome (if others are needed, please contact cms-ecalelf-devel)</li>
<li>USER_REMOTE_DIR_BASE: the directory on the storage element, under the store/ folder. Files are usually stored on the T2_CH_CERN EOS system in the ALCA group folders: group/alca_ecalcalib/ecalelf/alcaraw (private production). It can also be "database", which indicates that the dataset is published and available in DBS (e.g. official ALCARECO)</li>
<li>VALIDITY (only meaningful for ALCARAW): for each line in this file, the corresponding ALCARAW files are present on EOS. It may be necessary to exclude some ALCARAW datasets from the rereco process because they are obsolete or superseded by others. For example, a new ALCARAW can be produced on the same run range using a newer central rereco, so that the RECO quantities in the ALCARAW, like the eleID and iso variables, have better conditions. Possible values are:<ul>
<li>VALID (good for rereco)</li>
<li>INVALID (not used for rereco). In this way it is not necessary to remove the files from EOS, and alcaraw_datasets.dat always contains the list of folders and files present on EOS.</li>
</ul>
</li>
<li>PERIOD: when running a rereco, you usually want to process more than one line (more run ranges). To make life easier, a comma-separated list of "periods" can be associated with each line. In this way, when launching a rereco, you specify just the period and all the lines associated with it are rerecoed (if valid!)</li>
</ul>
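<p>As a rough illustration of how the VALIDITY and PERIOD columns work together, the sketch below parses alcaraw_datasets.dat lines and keeps only those that would be rerecoed for a given period. The names AlcarawEntry, parse_alcaraw_line and select_for_rereco are hypothetical and are not taken from the ECALELF scripts. The INVALID line is made up for the example by editing the documented line.</p>
<div class="fragment"><pre class="fragment">
```python
from collections import namedtuple

# Column order taken from the commented header of alcaraw_datasets.dat.
AlcarawEntry = namedtuple(
    "AlcarawEntry",
    "runrange datasetpath datasetname store_path_base "
    "user_remote_dir_base validity periods",
)


def parse_alcaraw_line(line):
    """Split one TAB-separated line into named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 7:
        raise ValueError("expected 7 TAB-separated columns, got %d" % len(cols))
    # PERIOD is a comma-separated list of period names.
    return AlcarawEntry(*cols[:6], periods=cols[6].split(","))


def select_for_rereco(entries, period):
    """Keep the entries tagged with the period and marked VALID."""
    return [e for e in entries if e.validity == "VALID" and period in e.periods]


valid_line = "\t".join([
    "190456-193621",
    "/DoubleElectron/Run2012A-ZElectron-13Jul2012-v1/RAW-RECO",
    "DoubleElectron-ZSkim-RUN2012A-13Jul-v1",
    "caf",
    "group/alca_ecalcalib/ecalelf/alcaraw",
    "VALID",
    "RUN2012ABC,Cal_Nov2012",
])
# Hypothetical second line: same dataset, but flagged INVALID.
invalid_line = valid_line.replace("\tVALID\t", "\tINVALID\t")

entries = [parse_alcaraw_line(valid_line), parse_alcaraw_line(invalid_line)]
selected = select_for_rereco(entries, "Cal_Nov2012")
print(len(selected), selected[0].runrange)
```
</pre></div>
<p>The INVALID entry stays in the file, and therefore in the parsed list, but is left out of the selection. This matches the idea that the file always lists what is present on EOS.</p>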
<h2><a class="anchor" id="alcarecoDatasetsFormat">
alcareco_datasets.dat</a></h2>
<p>This file lists all the datasets for which the ALCARECO is produced, with the same syntax as alcaraw_datasets.dat. They can be produced centrally in ALCARECO format (prompt or rerecos) or produced by the user from AOD, RECO or MC (AODSIM). An official ALCARECO has to be indicated by using the word "database" instead of group/alca_ecalcalib/ecalelf/alcareco, and by using ZAlcaSkim in the dataset name to keep track of the origin of the ALCARECO (for example, in case you want to produce the same privately): e.g. DoubleElectron-ZAlcaSkim-RUN2012D-22Jan-v1 (centrally produced) instead of DoubleElectron-ZSkim-RUN2012D-22Jan-v1 (private production).</p>
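<p>The naming convention described above can be checked mechanically. The following is a minimal sketch, assuming that only the Z-skim names given in the text need to be covered. The function names are hypothetical and no such check is known to exist in ECALELF.</p>
<div class="fragment"><pre class="fragment">
```python
def is_official_alcareco(user_remote_dir_base):
    """Official (centrally produced) ALCARECO is flagged with the word 'database'."""
    return user_remote_dir_base == "database"


def follows_naming_convention(datasetname, user_remote_dir_base):
    """True when 'ZAlcaSkim' appears in the name exactly for official entries.

    Assumption: only the Z-skim naming quoted in the text is handled here.
    """
    official = is_official_alcareco(user_remote_dir_base)
    has_alca_tag = "ZAlcaSkim" in datasetname
    return official == has_alca_tag


central_ok = follows_naming_convention(
    "DoubleElectron-ZAlcaSkim-RUN2012D-22Jan-v1", "database")
private_ok = follows_naming_convention(
    "DoubleElectron-ZSkim-RUN2012D-22Jan-v1",
    "group/alca_ecalcalib/ecalelf/alcareco")
mismatch = follows_naming_convention(
    "DoubleElectron-ZSkim-RUN2012D-22Jan-v1", "database")
print(central_ok, private_ok, mismatch)
```
</pre></div>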
<h2><a class="anchor" id="alcarerecoDatasetsFormat">
alcarereco_datasets.dat</a></h2>
<p>This file is filled automatically by the rereco scripts.</p>
<p>One example line is: </p>
<div class="fragment"><pre class="fragment">
# RUNRANGE DATASETPATH DATASETNAME STORE_PATH_BASE USER_REMOTE_DIR ALCARAW_REMOTE_DIR TAG
190456-193621 /DoubleElectron/Run2012A-ZElectron-13Jul2012-v1/RAW-RECO DoubleElectron-ZSkim-RUN2012A-13Jul-v1 caf.cern.ch group/alca_ecalcalib/ecalelf/alcarereco group/alca_ecalcalib/ecalelf/alcaraw Cal_Nov2012_ICEle_v1
</pre></div><p>In this file there are two additional fields:</p>
<ul>
<li>ALCARAW_REMOTE_DIR: this is the output directory of the ALCARAW production step. If for any reason the standard ALCARAW is produced elsewhere, this should be reported here by the rereco script (you should provide this information with the script option)</li>
<li>TAG: this is the "rereco name". It is explained in more detail in the rereco section</li>
</ul>
<p>OUTPUT DIRECTORIES: The remote directory in the storage element is set as follows: </p>
<div class="fragment"><pre class="fragment">
USER_REMOTE_DIR=$USER_REMOTE_DIR_BASE/${ENERGY}/${DATASETNAME}/${DATASETNAME}-${RUNRANGE:-allRange}
</pre></div><p>where ENERGY is 7TeV for 2011 datasets and 8TeV for 2012 datasets.</p>
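<p>For readers less familiar with shell parameter expansion, the directory layout above can be written out in Python. This is a sketch only: the function name build_user_remote_dir and the ENERGY_BY_YEAR table are hypothetical helpers. The fallback to allRange mirrors the ${RUNRANGE:-allRange} expansion, which applies when RUNRANGE is unset or empty.</p>
<div class="fragment"><pre class="fragment">
```python
# ENERGY values as stated in the text: 7TeV for 2011, 8TeV for 2012.
ENERGY_BY_YEAR = {2011: "7TeV", 2012: "8TeV"}


def build_user_remote_dir(base, year, datasetname, runrange=""):
    """Compose the remote directory from its four ingredients."""
    energy = ENERGY_BY_YEAR[year]
    # Same behaviour as ${RUNRANGE:-allRange}: empty or missing means "allRange".
    run_label = runrange if runrange else "allRange"
    return "{}/{}/{}/{}-{}".format(base, energy, datasetname, datasetname, run_label)


with_range = build_user_remote_dir(
    "group/alca_ecalcalib/ecalelf/alcaraw",
    2012,
    "DoubleElectron-ZSkim-RUN2012A-13Jul-v1",
    "190456-193621",
)
without_range = build_user_remote_dir(
    "group/alca_ecalcalib/ecalelf/alcaraw",
    2012,
    "DoubleElectron-ZSkim-RUN2012A-13Jul-v1",
)
print(with_range)
print(without_range)
```
</pre></div>
<p>Note that DATASETNAME appears twice in the path: once as a folder of its own and once as the prefix of the run-range folder.</p>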
</div>
<hr size="1"/><address style="text-align: right;"><small>Generated on 24 Jun 2014 for ECALELF by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.6.1 </small></address>
</body>
</html>