In this chapter we describe facilities available in Gaudi to create and retrieve n-tuples. We discuss how Event Collections, which can be considered an extension of n-tuples, can be used to make preselections of event data. Finally, we explore some possible tools for the interactive analysis of n-tuples.
User data - so-called n-tuples - are very similar to event data. Of course, the scope may be different: a row of an n-tuple may correspond to a track, an event or a complete run. Nevertheless, user data must be accessible by interactive tools such as PAW or ROOT.
Gaudi n-tuples allow structures to be formatted freely. Later, during the running phase of the program, data are accumulated and written to disk.
The transient image of an n-tuple is stored in a Gaudi data store which is connected to the n-tuple service. Its purpose is to store user created objects that have a lifetime of more than a single event.
As with the other data stores, all access to data is via a service interface. In this case it is via the INTupleSvc interface which extends the IDataProviderSvc interface. In addition the interface to the n-tuple service provides methods for creating n-tuples, saving the current row of an n-tuple or retrieving n-tuples from a file. The n-tuples are derived from DataObject in order to be storable, and are stored in the same type of tree structure as the event data. This inheritance allows n-tuples to be loaded and located on the store with the same smart pointer mechanism as is available for event data items (c.f. Chapter 6).
The Algorithm base class defines a member function
INTupleSvc* ntupleSvc()
which returns a pointer to the INTupleSvc interface.
The n-tuple service provides methods for the creation and manipulation of n-tuples and the location of n-tuples within the persistent store.
The top level directory of the n-tuple transient data store is called "/NTUPLES". The next directory layer is connected to the different output streams: e.g. "/NTUPLES/FILE1", where FILE1 is the logical name of the requested output file for a given stream. There can be several output streams connected to the service. When using HBOOK for persistency, "FILE1" corresponds to the top level RZ directory of the file (...the name given to HROPEN). From then on the tree structure is reflected by normal RZ directories. Caveat: HBOOK only accepts directory names of fewer than 8 characters! It is recommended to keep directory names shorter than 8 characters even when using another technology (e.g. ROOT) for persistency, so that the code remains independent of the persistency choice.
When defining an n-tuple the following steps must be performed:
1. Define the n-tuple tags in the algorithm's header file.
2. Book the n-tuple on the proper output stream and declare the tags to it.
3. Fill the tags during event processing.
4. Commit (write) the current row.
In the following an attempt is made to explain the different steps. Please note that when using HBOOK for persistency, the n-tuple number must be unique and, in particular, that it must be different from any histogram number. This is a limitation imposed by HBOOK. It is recommended to keep this number unique even when using another technology (e.g. ROOT) for persistency, to make the code independent of the persistency choice.
When creating an n-tuple it is necessary to first define the tags to be filled in the n-tuple. Typically the tags belong to the filling algorithm and hence should be provided in the Algorithm's header file. Currently the following data types are supported: bool, long, float and double. double types (Fortran REAL*8) need special attention if using HBOOK for persistency: the n-tuple structure must be defined in a way that aligns double types to 8 byte boundaries, otherwise HBOOK will complain. In addition PAW cannot understand double types. Listing 38 illustrates how to define n-tuple tags:
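For illustration, a minimal sketch of such tag definitions in the Algorithm's header file might look as follows (the member names are purely illustrative):

#include "GaudiKernel/NTuple.h"

// Data members of the filling algorithm: one tag per n-tuple column
NTuple::Tuple*        m_ntuple;   // the n-tuple itself
NTuple::Item<long>    m_ntrk;     // scalar item, also used below as index for the array
NTuple::Item<float>   m_energy;   // another scalar item
NTuple::Array<float>  m_px;       // variable size array, indexed by m_ntrk
NTuple::Matrix<float> m_hits;     // two-dimensional array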
When booking the n-tuple, the previously defined tags must be declared to the n-tuple. Before booking, the proper output stream (file) must be accessed. The target directory is defined automatically.
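As a sketch, and assuming an output stream with logical name FILE1 and a hypothetical n-tuple identifier 100, booking and tag declaration might look like this:

// In Algorithm::initialize()
NTuplePtr nt( ntupleSvc(), "/NTUPLES/FILE1/100" );
if ( !nt ) {   // not yet booked
  nt = ntupleSvc()->book( "/NTUPLES/FILE1/100", CLID_ColumnWiseTuple, "Example tuple" );
  if ( nt ) {
    // Declare the previously defined tags to the n-tuple
    StatusCode sc = nt->addItem( "Ntrack", m_ntrk, 0, 5000 );   // index item with allowed range
    sc = nt->addItem( "Energy", m_energy );
    sc = nt->addIndexedItem( "px", m_ntrk, m_px );              // array indexed by Ntrack
    m_ntuple = nt;
  }
}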
Tags which are not declared to the n-tuple are invalid and will cause an access violation at run-time.
Tags are usable just like normal data items: scalar items can be assigned and read like ordinary variables, while arrays and matrices are accessed by index.
There is no implicit bounds checking possible without a rather big overhead at run-time. Hence it is up to the user to ensure the arrays do not overflow.
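A sketch of the filling code, assuming the tags booked above and a hypothetical track container; the bounds check against the declared range is done by hand:

// In Algorithm::execute()
m_energy = float( totalEnergy );          // hypothetical event quantity
long n = 0;
for ( TrackVector::const_iterator i = tracks.begin(); i != tracks.end(); ++i ) {
  if ( n >= 5000 ) break;                 // explicit bounds check - there is no implicit protection
  m_px[n] = float( (*i)->px() );          // fill one array element
  ++n;
}
m_ntrk = n;                               // the index item must reflect the number of filled elements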
When all entries are filled, the row must be committed, i.e. the record of the n-tuple must be written.
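The commit itself is a single call to the n-tuple service, for example (assuming m_ntuple holds the pointer obtained at booking time):

// Commit the current row of the n-tuple
StatusCode sc = ntupleSvc()->writeRecord( m_ntuple );
if ( sc.isFailure() ) {
  return StatusCode::FAILURE;   // or report the problem via the message service
}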
Although n-tuples are intended for interactive analysis, they can also be read back by a regular program. An example of reading back such an n-tuple is given in Listing 41. Notice line 8, where an example is given of preselecting rows of the n-tuple according to given criteria. This option is only possible if it is supported by the underlying database used to make the n-tuple persistent. Currently it is possible to preselect rows from n-tuples written in ROOT format and from relational databases accessed through ODBC. Note that the syntax of the query is also affected by the underlying technology: while an ODBC database will accept any SQL query, the ROOT implementation understands only the "And" and "Or" SQL operators - but it does understand the full C++ syntax (an example is given in section 10.3.2).
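A minimal sketch of reading such an n-tuple back, without the row preselection, and assuming the same tuple path and column names as above:

// Attach to the n-tuple on the input stream and loop over its rows
NTuplePtr nt( ntupleSvc(), "/NTUPLES/FILE1/100" );
if ( nt ) {
  NTuple::Item<long> ntrk;
  if ( nt->item( "Ntrack", ntrk ).isSuccess() ) {   // attach to an existing column
    NTuple::Tuple* tuple = nt;
    while ( ntupleSvc()->readRecord( tuple ).isSuccess() ) {
      // ... use ntrk and any other attached items of the current row ...
    }
  }
}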
Conversion services exist to convert n-tuple objects into a form suitable for persistent storage in a number of storage technologies. In order to use this facility it is necessary to add the following line in the job options file:
NTupleSvc.Output = {"FILE1 DATAFILE='tuples.hbook' TYP='HBOOK' OPT='NEW'", "FILE2 ...", ... "FILEN ..."}; |
where <tuples.hbook> should be replaced by the name of the file to which you wish to write the n-tuple. FILE1 is the logical name of the output file - it could be any other string. A similar option, NTupleSvc.Input, exists for n-tuple input.
The following is a complete overview of all possible options:
These database technologies are supported through their ODBC interface. They were tested privately on local installations. However, all these types need special setup to grant access to the database.
Except for the HBOOK data format, you need to specify the use of the DbCnv package in your CMT requirements file and to load explicitly in the job options the DLLs DbConverters and either RootDb or OdbcDb. These DLLs contain the specific database access driver implementations.
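For example, when writing n-tuples in ROOT format, the job options might contain something like the following sketch (the exact DLL list may depend on your installation):

ApplicationMgr.DLLs += { "DbConverters", "RootDb" };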
Connect this file directly to an existing conversion service. This option, however, needs special care: it should only be used to replace default services.
Datafiles (e.g. Microsoft Access databases) may be password protected. In this case the authentication string allows the connection to such databases; it is the string that must be passed to ODBC, for example: AUTH='SERVER=server_host;UID=user_name;PWD=my_password;'
For all options only the first three characters are significant: DATAFILE=<...>, DATABASE=<...> or simply DATA=<...> all lead to the same result.
The handling of row wise n-tuples does not differ. However, only individual items (class NTuple::Item) can be filled, no arrays and no matrices. Since the persistent representation of row wise n-tuples in HBOOK is done by floats only, the first row of each row wise n-tuple contains the type information - when looking at a row wise n-tuple with PAW make sure to start at the second event!
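Booking a row wise n-tuple only differs in the class identifier; a sketch, reusing the hypothetical identifiers from above and a hypothetical scalar member m_pt of type NTuple::Item<float>:

NTuplePtr nt( ntupleSvc(), "/NTUPLES/FILE1/10" );
if ( !nt ) {
  nt = ntupleSvc()->book( "/NTUPLES/FILE1/10", CLID_RowWiseTuple, "Row wise example" );
  if ( nt ) {
    StatusCode sc = nt->addItem( "pt", m_pt );   // only scalar NTuple::Item tags are allowed
  }
}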
Event collections or, to be more precise, event tag collections, are used to minimize data access by performing preselections based on small amounts of data. Event tag data contain flexible event classification information according to the physics needs. This information could either be stored as flags indicating that the particular event has passed some preselection criteria, or as a small set of parameters which describe basic attributes of the event. Fast access is required for this type of event data.
Event tag collections can exist in several versions:
Starting from this definition, an event tag collection can be interpreted as an n-tuple which gives access to the data used to create it. Using this approach, any n-tuple which allows access to the data is an event collection.
Event collections allow pre-selections of event data. These pre-selections depend on the underlying storage technology.
The first stage pre-selection is based on scalar components of the event collection. It is not necessarily executed on your computer but possibly on a database server, e.g. when using ORACLE. Only the accessed columns are read from the event collection. If the criteria are fulfilled, the n-tuple data are returned to the user process. Preselection criteria are set through the job options, as described in section 10.3.2.
The second stage pre-selection is triggered for all items which passed the first stage pre-selection criteria. For this pre-selection, which is performed on the client computer, all data in the n-tuple can be used. The further preselection is implemented in a user defined function object (functor) as described in section 10.3.2. Gaudi algorithms are called only when this pre-selector also accepts the event, and normal event processing can start.
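A sketch of such a functor, assuming it derives from NTuple::Selector as in the Gaudi EvtCollection example; the item name and cut are illustrative, and the class must also be declared to an object factory so that the framework can create it by name:

#include "GaudiKernel/NTuple.h"

class EvtCollectionSelector : public NTuple::Selector {
  NTuple::Item<long> m_ntrack;
public:
  EvtCollectionSelector( IInterface* svc ) : NTuple::Selector( svc ) { }
  // Attach to the relevant items of the event collection
  virtual StatusCode initialize( NTuple::Tuple* nt ) {
    return nt->item( "Ntrack", m_ntrack );
  }
  // Called for each candidate that passed the first stage selection
  virtual bool operator()( NTuple::Tuple* /* nt */ ) {
    return m_ntrack > 80;   // accept the event only if the cut is fulfilled
  }
};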
Event collections are written to the data file using a Gaudi sequencer. A sequencer calls a series of algorithms, as discussed in section 5.5. The execution of the series may terminate at any point (in which case the event is not selected for the collection) if one of the algorithms in the sequence fails to pass a filter.
The event data is accessed using a special n-tuple tag of the type
NTuple::Item<IOpaqueAddress*> m_evtAddress
It is defined in the algorithm's header file in addition to any other ordinary n-tuple tags, as described in section 10.2.2.1. When booking the n-tuple, the address tag must be declared like any other tag, as shown in Listing 42. It is recommended to use the name "Address" for this tag.
NTuplePtr nt(ntupleSvc(), "/NTUPLES/EvtColl/Collection");
// ... book the n-tuple ...
// Add an event address column
StatusCode status = nt->addItem("Address", m_evtAddress);
The usage of this tag is identical to any other tag, except that it only accepts variables of type IOpaqueAddress* - the information necessary to retrieve the event data. At fill time the address of the event must be supplied to the Address item; otherwise the n-tuple may be written, but the information needed to retrieve the corresponding event data later will be lost. Listing 43 also demonstrates the setting of a filter to steer whether the event is written out to the event collection.
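A sketch of the corresponding code in the selector algorithm's execute() method, assuming the event is retrieved from the event data store and m_ntrk is an ordinary n-tuple item; the cut is illustrative, and the actual write is left to the EvtCollectionStream at the end of the sequence:

// Retrieve the event and store its opaque address in the collection
SmartDataPtr<DataObject> event( eventSvc(), "/Event" );
if ( !event ) return StatusCode::FAILURE;
m_evtAddress = event->registry()->address();   // needed to re-read the event later

// Steer whether this event enters the collection
bool accept = ( m_ntrk > 80 );                 // illustrative selection
setFilterPassed( accept );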
The event collection is written out by an EvtCollectionStream, which is the last member of the event collection Sequencer. Listing 44 (which is taken from the job options of the EvtCollection example) shows how to set up such a sequence, consisting of a user written Selector algorithm (which could for example contain the code in Listing 43) and of the EvtCollectionStream.
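In outline, the corresponding job options might look like the sketch below; the algorithm and sequence names are illustrative, and the properties configuring which n-tuple the EvtCollectionStream writes out are omitted (see the EvtCollection example for the complete Listing 44):

ApplicationMgr.TopAlg += { "Sequencer/EvtColl" };
EvtColl.Members        = { "Selector", "EvtCollectionStream/Collection" };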
Reading event collections as the input for further event processing in Gaudi is transparent. The main change is the specification of the input data to the event selector:
EventSelector.Input = {
  "COLLECTION='Collection' ADDRESS='Address' DATAFILE='MyEvtCollection.root' TYP='ROOT' SEL='(Ntrack>80)' FUN='EvtCollectionSelector'"
};
These tags need some explanation:
ADDRESS: the name of the n-tuple item holding the opaque address of the event, as declared when the event collection was booked. The default name is "Address". Please use this default value when writing, conventions are useful!
SEL: the first stage pre-selection criteria, e.g. (NTrack>200 AND Energy>200) or, equivalently, (NTrack>200 && Energy>200). Note that the ROOT implementation understands 'AND' instead of '&&' as well as 'OR' instead of '||'. Other SQL operators are not supported.
FUN: the name of the user defined selection functor implementing the second stage pre-selection (here EvtCollectionSelector).

n-tuples are of special interest to the end-user, because they can be accessed using commonly known tools such as PAW, ROOT or Java Analysis Studio (JAS). In the past it was not a particular strength of the software used in HEP to plug into many possible persistent data representations: except for JAS, only proprietary data formats are understood. For this reason the choice of the output format of the data depends on the preferred analysis tool/viewer. In the following an overview is given of the possible data formats.
In the examples below the output of the GaudiExample/NTuple.write program was used.
This data format is used by PAW. PAW can understand this and only this data format. Files of this type can be converted to the ROOT format using the h2root data conversion program. The use of PAW in the long term is deprecated.
This data format is used by the interactive ROOT program. Using the helper library TBlobShr located in the package DbCnv it is possible to interactively analyse the n-tuples written in ROOT format. However, access is only possible to scalar items (int, float, ...), not to arrays.
Analysis is possible by directly plotting variables:
root [1] gSystem->Load("D:/mycmt/DbCnv/v3/Win32Debug/libTBlobShr");
root [2] TFile* f = new TFile("tuple.root");
root [3] TTree* t = (TTree*)f->Get("<local>_MC_ROW_WISE_2");
root [4] t->Draw("pz");
or by using a ROOT macro interpreted by ROOT's C/C++ interpreter (see for example the code fragment interactive.C shown in Listing 47):
root [0] gSystem->Load("D:/mycmt/DbCnv/v3/Win32Debug/libTBlobShr");
root [1] .L ./v8/NTuples/interactive.C
root [2] interactive("./v8/NTuples/tuple.root");
More detailed explanations can be found in the ROOT tutorials (http://root.cern.ch).
Open DataBase Connectivity (ODBC), developed by Microsoft, allows a very wide range of relational databases to be accessed using the same callable interface. A Gaudi interface to store and retrieve data from ODBC tables was developed; it allows the entire range of MS Office applications to access these data. The small Visual Basic program in Listing 48 shows how to fill an Excel spreadsheet using n-tuple data from an Access database. Apparently access to ODBC compliant databases is also possible from ROOT, but this was not tested.