In this chapter we describe facilities available in Gaudi to create and retrieve n-tuples. We discuss how Event Collections, which can be considered an extension of n-tuples, can be used to make preselections of event data. Finally, we explore some possible tools for the interactive analysis of n-tuples.
User data - so-called n-tuples - are very similar to event data. Of course, the scope may be different: a row of an n-tuple may correspond to a track, an event or a complete run. Nevertheless, user data must be accessible by interactive tools such as PAW or ROOT.
Gaudi n-tuples allow structures to be formatted freely. Later, during the running phase of the program, data are accumulated and written to disk.
The transient image of an n-tuple is stored in a Gaudi data store which is connected to the n-tuple service. Its purpose is to store user created objects that have a lifetime of more than a single event.
As with the other data stores, all access to data is via a service interface. In this case it is via the INTupleSvc interface which extends the IDataProviderSvc interface. In addition the interface to the n-tuple service provides methods for creating n-tuples, saving the current row of an n-tuple or retrieving n-tuples from a file. The n-tuples are derived from DataObject in order to be storable, and are stored in the same type of tree structure as the event data. This inheritance allows n-tuples to be loaded and located on the store with the same smart pointer mechanism as is available for event data items (c.f. Chapter 6).
The Algorithm base class defines a member function
INTupleSvc* ntupleSvc()
which returns a pointer to the INTupleSvc interface.
The n-tuple service provides methods for the creation and manipulation of n-tuples and the location of n-tuples within the persistent store.
The top level directory of the n-tuple transient data store is called "/NTUPLES". The next directory layer is connected to the different output streams: e.g. "/NTUPLES/FILE1", where FILE1 is the logical name of the requested output file for a given stream. There can be several output streams connected to the service. When using HBOOK for persistency, "FILE1" corresponds to the top level RZ directory of the file (...the name given to HROPEN). From then on the tree structure is reflected by normal RZ directories. Caveat: HBOOK only accepts directory names of fewer than 8 characters! It is recommended to keep directory names shorter than 8 characters even when using another technology (e.g. ROOT) for persistency, so that the code remains independent of the persistency choice.
When defining an n-tuple the following steps must be performed:
1. Define the n-tuple tags in the algorithm's header file.
2. Book the n-tuple on the proper output stream and declare the tags to it.
3. Fill the tags during event processing.
4. Commit (write) the current row.
In the following an attempt is made to explain the different steps. Please note that when using HBOOK for persistency, the n-tuple number must be unique and, in particular, that it must be different from any histogram number. This is a limitation imposed by HBOOK. It is recommended to keep this number unique even when using another technology (e.g. ROOT) for persistency, to make the code independent of the persistency choice.
When creating an n-tuple it is necessary to first define the tags to be filled in the n-tuple. Typically the tags belong to the filling algorithm and hence should be provided in the Algorithm's header file. Currently the following data types are supported: bool, long, float and double. double types (Fortran REAL*8) need special attention if using HBOOK for persistency: the n-tuple structure must be defined in a way that aligns double types to 8 byte boundaries, otherwise HBOOK will complain. In addition PAW cannot understand double types. Listing 38 illustrates how to define n-tuple tags:
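For illustration, a minimal sketch of such tag definitions in the Algorithm's header file might look as follows (the member names are purely illustrative):

#include "GaudiKernel/NTuple.h"

// Data members of the filling algorithm: one tag per n-tuple column
NTuple::Tuple*        m_ntuple;   // the n-tuple itself
NTuple::Item<long>    m_ntrk;     // scalar item, also used below as index for the array
NTuple::Item<float>   m_energy;   // another scalar item
NTuple::Array<float>  m_px;       // variable size array, indexed by m_ntrk
NTuple::Matrix<float> m_hits;     // two-dimensional array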
When booking the n-tuple, the previously defined tags must be declared to the n-tuple. Before booking, the proper output stream (file) must be accessed. The target directory is defined automatically.
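As a sketch, and assuming an output stream with logical name FILE1 and a hypothetical n-tuple identifier 100, booking and tag declaration might look like this:

// In Algorithm::initialize()
NTuplePtr nt( ntupleSvc(), "/NTUPLES/FILE1/100" );
if ( !nt ) {   // not yet booked
  nt = ntupleSvc()->book( "/NTUPLES/FILE1/100", CLID_ColumnWiseTuple, "Example tuple" );
  if ( nt ) {
    // Declare the previously defined tags to the n-tuple
    StatusCode sc = nt->addItem( "Ntrack", m_ntrk, 0, 5000 );   // index item with allowed range
    sc = nt->addItem( "Energy", m_energy );
    sc = nt->addIndexedItem( "px", m_ntrk, m_px );              // array indexed by Ntrack
    m_ntuple = nt;
  }
}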
Tags which are not declared to the n-tuple are invalid and will cause an access violation at run-time.
Tags are usable just like normal data items: scalar items can be assigned and read like ordinary variables, while arrays and matrices are accessed by index.
There is no implicit bounds checking possible without a rather big overhead at run-time. Hence it is up to the user to ensure the arrays do not overflow.
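A sketch of the filling code, assuming the tags booked above and a hypothetical track container; the bounds check against the declared range is done by hand:

// In Algorithm::execute()
m_energy = float( totalEnergy );          // hypothetical event quantity
long n = 0;
for ( TrackVector::const_iterator i = tracks.begin(); i != tracks.end(); ++i ) {
  if ( n >= 5000 ) break;                 // explicit bounds check - there is no implicit protection
  m_px[n] = float( (*i)->px() );          // fill one array element
  ++n;
}
m_ntrk = n;                               // the index item must reflect the number of filled elements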
When all entries are filled, the row must be committed, i.e. the record of the n-tuple must be written.
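The commit itself is a single call to the n-tuple service, for example (assuming m_ntuple holds the pointer obtained at booking time):

// Commit the current row of the n-tuple
StatusCode sc = ntupleSvc()->writeRecord( m_ntuple );
if ( sc.isFailure() ) {
  return StatusCode::FAILURE;   // or report the problem via the message service
}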
Although n-tuples are intended for interactive analysis, they can also be read back by a regular program. An example of reading back such an n-tuple is given in Listing 41. Notice line 8, where an example is given of preselecting rows of the n-tuple according to given criteria. This option is only possible if it is supported by the underlying database used to make the n-tuple persistent. Currently it is possible to preselect rows from n-tuples written in ROOT format and from relational databases accessed through ODBC. Note that the syntax of the query is also affected by the underlying technology: while an ODBC database will accept any SQL query, the ROOT implementation understands only the "And" and "Or" SQL operators - but it does understand the full C++ syntax (an example is given in section 10.3.2).
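A minimal sketch of reading such an n-tuple back, without the row preselection, and assuming the same tuple path and column names as above:

// Attach to the n-tuple on the input stream and loop over its rows
NTuplePtr nt( ntupleSvc(), "/NTUPLES/FILE1/100" );
if ( nt ) {
  NTuple::Item<long> ntrk;
  if ( nt->item( "Ntrack", ntrk ).isSuccess() ) {   // attach to an existing column
    NTuple::Tuple* tuple = nt;
    while ( ntupleSvc()->readRecord( tuple ).isSuccess() ) {
      // ... use ntrk and any other attached items of the current row ...
    }
  }
}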
Conversion services exist to convert n-tuple objects into a form suitable for persistent storage in a number of storage technologies. In order to use this facility it is necessary to add the following line in the job options file:
NTupleSvc.Output = {"FILE1 DATAFILE='tuples.hbook' TYP='HBOOK' OPT='NEW'", "FILE2 ...", ... "FILEN ..."}; |
where <tuples.hbook> should be replaced by the name of the file to which you wish to write the n-tuple. FILE1 is the logical name of the output file - it could be any other string. A similar option, NTupleSvc.Input, exists for n-tuple input.
The following is a complete overview of all possible options:
These database technologies are supported through their ODBC interface. They were tested privately on local installations. However, all these types need special setup to grant access to the database.
Except for the HBOOK data format, you need to specify the use of the DbCnv package in your CMT requirements file and to load explicitly in the job options the DLLs DbConverters and either RootDb or OdbcDb. These DLLs contain the specific database access driver implementations.
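For example, when writing n-tuples in ROOT format, the job options might contain something like the following sketch (the exact DLL list may depend on your installation):

ApplicationMgr.DLLs += { "DbConverters", "RootDb" };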
Connect this file directly to an existing conversion service. This option, however, needs special care: it should only be used to replace default services.
Datafiles (e.g. Microsoft Access databases) may be password protected. In this case the authentication string allows the connection to such databases; it is the string that must be passed to ODBC, for example: AUTH='SERVER=server_host;UID=user_name;PWD=my_password;'
For all options only the first three characters are significant: DATAFILE=<...>, DATABASE=<...> or simply DATA=<...> all lead to the same result.
The handling of row wise n-tuples does not differ. However, only individual items (class NTuple::Item) can be filled, no arrays and no matrices. Since the persistent representation of row wise n-tuples in HBOOK is done by floats only, the first row of each row wise n-tuple contains the type information - when looking at a row wise n-tuple with PAW make sure to start at the second event!
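Booking a row wise n-tuple only differs in the class identifier; a sketch, reusing the hypothetical identifiers from above and a hypothetical scalar member m_pt of type NTuple::Item<float>:

NTuplePtr nt( ntupleSvc(), "/NTUPLES/FILE1/10" );
if ( !nt ) {
  nt = ntupleSvc()->book( "/NTUPLES/FILE1/10", CLID_RowWiseTuple, "Row wise example" );
  if ( nt ) {
    StatusCode sc = nt->addItem( "pt", m_pt );   // only scalar NTuple::Item tags are allowed
  }
}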
Event collections or, to be more precise, event tag collections, are used to minimize data access by performing preselections based on small amounts of data. Event tag data contain flexible event classification information according to the physics needs. This information could either be stored as flags indicating that the particular event has passed some preselection criteria, or as a small set of parameters which describe basic attributes of the event. Fast access is required for this type of event data.
Event tag collections can exist in several versions:
Starting from this definition, an event tag collection can be interpreted as an n-tuple which gives access to the data used to create it. Using this approach, any n-tuple which allows access to the data is an event collection.
Event collections allow pre-selections of event data. These pre-selections depend on the underlying storage technology.
The first stage pre-selection is based on scalar components of the event collection. It is not necessarily executed on your computer but possibly on a database server, e.g. when using ORACLE. Only the accessed columns are read from the event collection. If the criteria are fulfilled, the n-tuple data are returned to the user process. Preselection criteria are set through the job options, as described in section 10.3.2.
The second stage pre-selection is triggered for all items which passed the first stage pre-selection criteria. For this pre-selection, which is performed on the client computer, all data in the n-tuple can be used. The further preselection is implemented in a user defined function object (functor) as described in section 10.3.2. Gaudi algorithms are called only when this pre-selector also accepts the event, and normal event processing can start.
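A sketch of such a functor, assuming it derives from NTuple::Selector as in the Gaudi EvtCollection example; the item name and cut are illustrative, and the class must also be declared to an object factory so that the framework can create it by name:

#include "GaudiKernel/NTuple.h"

class EvtCollectionSelector : public NTuple::Selector {
  NTuple::Item<long> m_ntrack;
public:
  EvtCollectionSelector( IInterface* svc ) : NTuple::Selector( svc ) { }
  // Attach to the relevant items of the event collection
  virtual StatusCode initialize( NTuple::Tuple* nt ) {
    return nt->item( "Ntrack", m_ntrack );
  }
  // Called for each candidate that passed the first stage selection
  virtual bool operator()( NTuple::Tuple* /* nt */ ) {
    return m_ntrack > 80;   // accept the event only if the cut is fulfilled
  }
};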
Event collections are written to the data file using a Gaudi sequencer. A sequencer calls a series of algorithms, as discussed in section 5.5. The execution of the series may terminate at any point (in which case the event is not selected for the collection) if one of the algorithms in the sequence fails to pass a filter.
The event data is accessed using a special n-tuple tag of the type
NTuple::Item<IOpaqueAddress*> m_evtAddress
It is defined in the algorithm's header file in addition to any other ordinary n-tuple tags, as described in section 10.2.2.1. When booking the n-tuple, the address tag must be declared like any other tag, as shown in Listing 42. It is recommended to use the name "Address" for this tag.
NTuplePtr nt(ntupleSvc(), "/NTUPLES/EvtColl/Collection");
// ... book the n-tuple ...
// Add an event address column
StatusCode status = nt->addItem("Address", m_evtAddress);
The usage of this tag is identical to any other tag, except that it only accepts variables of type IOpaqueAddress* - the information necessary to retrieve the event data. At fill time the address of the event must be supplied to the Address item; otherwise the n-tuple may be written, but the information needed to retrieve the corresponding event data later will be lost. Listing 43 also demonstrates the setting of a filter to steer whether the event is written out to the event collection.
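A sketch of the corresponding code in the selector algorithm's execute() method, assuming the event is retrieved from the event data store and m_ntrk is an ordinary n-tuple item; the cut is illustrative, and the actual write is left to the EvtCollectionStream at the end of the sequence:

// Retrieve the event and store its opaque address in the collection
SmartDataPtr<DataObject> event( eventSvc(), "/Event" );
if ( !event ) return StatusCode::FAILURE;
m_evtAddress = event->registry()->address();   // needed to re-read the event later

// Steer whether this event enters the collection
bool accept = ( m_ntrk > 80 );                 // illustrative selection
setFilterPassed( accept );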
The event collection is written out by an EvtCollectionStream, which is the last member of the event collection Sequencer. Listing 44 (which is taken from the job options of the EvtCollection example) shows how to set up such a sequence, consisting of a user written Selector algorithm (which could for example contain the code in Listing 43) and of the EvtCollectionStream.
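In outline, the corresponding job options might look like the sketch below; the algorithm and sequence names are illustrative, and the properties configuring which n-tuple the EvtCollectionStream writes out are omitted (see the EvtCollection example for the complete Listing 44):

ApplicationMgr.TopAlg += { "Sequencer/EvtColl" };
EvtColl.Members        = { "Selector", "EvtCollectionStream/Collection" };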
Reading event collections as the input for further event processing in Gaudi is transparent. The main change is the specification of the input data to the event selector:
EventSelector.Input = {
  "COLLECTION='Collection' ADDRESS='Address' DATAFILE='MyEvtCollection.root' TYP='ROOT' SEL='(Ntrack>80)' FUN='EvtCollectionSelector'"
};
These tags need some explanation:
ADDRESS: the name of the n-tuple item holding the opaque address of the event, as declared when the event collection was booked. The default name is "Address". Please use this default value when writing, conventions are useful!
SEL: the first stage pre-selection criteria, e.g. (NTrack>200 AND Energy>200) or, equivalently, (NTrack>200 && Energy>200). Note that the ROOT implementation understands 'AND' instead of '&&' as well as 'OR' instead of '||'. Other SQL operators are not supported.
FUN: the name of the user defined selection functor implementing the second stage pre-selection (here EvtCollectionSelector).

n-tuples are of special interest to the end-user, because they can be accessed using commonly known tools such as PAW, ROOT or Java Analysis Studio (JAS). In the past it was not a particular strength of the software used in HEP to plug into many possible persistent data representations: except for JAS, only proprietary data formats are understood. For this reason the choice of the output format of the data depends on the preferred analysis tool/viewer. In the following an overview is given of the possible data formats.
In the examples below the output of the GaudiExample/NTuple.write program was used.
This data format is used by PAW. PAW can understand this and only this data format. Files of this type can be converted to the ROOT format using the h2root data conversion program. The use of PAW in the long term is deprecated.
This data format is used by the interactive ROOT program. Using the helper library TBlobShr located in the package DbCnv it is possible to interactively analyse the n-tuples written in ROOT format. However, access is only possible to scalar items (int, float, ...), not to arrays.
Analysis is possible by directly plotting variables:
root [1] gSystem->Load("D:/mycmt/DbCnv/v3/Win32Debug/libTBlobShr");
root [2] TFile* f = new TFile("tuple.root");
root [3] TTree* t = (TTree*)f->Get("<local>_MC_ROW_WISE_2");
root [4] t->Draw("pz");
or by using a ROOT macro interpreted by ROOT's C/C++ interpreter (see for example the code fragment interactive.C shown in Listing 47):
root [0] gSystem->Load("D:/mycmt/DbCnv/v3/Win32Debug/libTBlobShr");
root [1] .L ./v8/NTuples/interactive.C
root [2] interactive("./v8/NTuples/tuple.root");
More detailed explanations can be found in the ROOT tutorials (http://root.cern.ch).
Open DataBase Connectivity (ODBC), developed by Microsoft, allows a very wide range of relational databases to be accessed using the same callable interface. A Gaudi interface to store and retrieve data from ODBC tables was developed; it allows the entire range of MS Office applications to access these data. The small Visual Basic program in Listing 48 shows how to fill an Excel spreadsheet using n-tuple data from an Access database. Apparently access to ODBC compliant databases is also possible from ROOT, but this was not tested.