GAUDI User Guide

Chapter 6
Accessing data

6.1  Overview

The data stores are a key component in the application framework. All data which comes from persistent storage, or which is transferred between algorithms, or which is to be made persistent must reside within a data store. In this chapter we look at how to access data within the stores, and also at the DataObject base class and some classes related to it.

We also cover how to define your own data types and the steps necessary to save newly created objects to disk files. The writing of the converters necessary for the latter is covered in chapter .

6.2  Using the data stores

There are four data stores currently implemented within the Gaudi framework: the event data store, the detector data store, the histogram store and the n-tuple store. They are described in chapters , 8 , and 10 respectively. The stores themselves are no more than logical constructs; the actual access to the data is via the corresponding services. Both the event data service and the detector data service implement the same IDataProviderSvc interface, which can be used by algorithms to retrieve and store data. The histogram and n-tuple services implement extended versions of this interface (IHistogramSvc, INTupleSvc), which offer methods for creating and manipulating histograms and n-tuples in addition to the data access methods provided by the other two stores.

Only objects of a type derived from the DataObject base class may be placed directly within a data store. Within the store the objects are arranged in a tree structure, just like a Unix file system. As an example consider Figure 6 which shows a part of the LHCb transient data model. An object is identified by its position in the tree expressed as a string such as: "/Event", or "/Event/MC/MCParticles". In principle the structure of the tree, i.e. the set of all valid paths, may be deduced at run time by making repeated queries to the event data service, but this is unlikely to be useful in general since the structure will be largely fixed.

Figure 6 The structure of a part of the LHCb event data model.
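The path-based addressing described above can be illustrated with a small standalone sketch. It is not Gaudi code: Node, splitPath(), makePath() and findPath() are hypothetical names, and a real store node also holds the DataObject itself. The sketch only shows how a string such as "/Event/MC/MCParticles" maps onto a walk down a tree, one component at a time, much like resolving a Unix file path.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node: children are keyed by their local name,
// like the entries of a Unix directory.
struct Node {
  std::map<std::string, std::unique_ptr<Node>> children;
};

// Split "/Event/MC/MCParticles" into {"Event", "MC", "MCParticles"}.
std::vector<std::string> splitPath(const std::string& path) {
  std::vector<std::string> parts;
  std::istringstream in(path);
  std::string item;
  while (std::getline(in, item, '/')) {
    if (!item.empty()) parts.push_back(item);  // skip the empty leading component
  }
  return parts;
}

// Walk down from the root, creating missing nodes on the way.
Node* makePath(Node& root, const std::string& path) {
  Node* current = &root;
  for (const std::string& name : splitPath(path)) {
    std::unique_ptr<Node>& child = current->children[name];
    if (!child) child.reset(new Node());
    current = child.get();
  }
  return current;
}

// Walk down from the root; return nullptr if any component is missing.
Node* findPath(Node& root, const std::string& path) {
  Node* current = &root;
  for (const std::string& name : splitPath(path)) {
    auto it = current->children.find(name);
    if (it == current->children.end()) return nullptr;
    current = it->second.get();
  }
  return current;
}
```

Intermediate nodes such as "/Event/MC" exist as soon as a deeper path has been created, which is why the set of valid paths could in principle be discovered by querying the tree.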

 

As stated above, all interactions between the data stores and algorithms should be via the IDataProviderSvc interface. The key methods of this interface are shown in Listing 7; the API reference should be consulted for the complete version.

Listing 7 Some of the key methods of the IDataProviderSvc interface.

 
StatusCode findObject(const std::string& path, DataObject*& pObject);
StatusCode findObject(DataObject* node, const std::string& path,
                      DataObject*& pObject);
StatusCode retrieveObject(const std::string& path, DataObject*& pObject);
StatusCode retrieveObject(DataObject* node, const std::string& path,
                          DataObject*& pObject);

StatusCode registerObject(const std::string& path, DataObject* pObject);
StatusCode registerObject(DataObject* node, const std::string& path,
                          DataObject* pObject);

The first four methods are for obtaining a pointer to an object in the store. How the object got into the store, whether it was read in from a persistent store or added to the store by an algorithm, is irrelevant.

The find and retrieve methods come in two versions: one version uses a full path name as an object identifier; the other takes a pointer to a previously retrieved object and the path of the object to look for below that node in the tree.

Additionally, the "find" and "retrieve" methods differ in one important respect: the "find" method will look in the store to see if the object is present (i.e. in memory) and, if it is not, will return a null pointer. The "retrieve" method, however, will attempt to load the object from a persistent store (database or file) if it is not found in memory. Only if it is not found in the persistent data store either will the method return a null pointer (and, of course, a failure status code).
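The find/retrieve distinction can be modelled with a toy store. In this sketch ToyStore, Status and the loader callback are hypothetical stand-ins, not the IDataProviderSvc API; the loader plays the role of the persistent store (database or file).

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are not the Gaudi classes.
struct DataObject { virtual ~DataObject() = default; };
enum class Status { Success, Failure };

class ToyStore {
public:
  // The loader plays the role of the persistent store: it returns a new
  // object for a known path, or nullptr if nothing is stored there.
  using Loader = std::function<std::unique_ptr<DataObject>(const std::string&)>;
  explicit ToyStore(Loader loader) : m_loader(std::move(loader)) {}

  // Like findObject(): only looks at objects already in memory.
  Status findObject(const std::string& path, DataObject*& pObject) const {
    auto it = m_memory.find(path);
    pObject = (it == m_memory.end()) ? nullptr : it->second.get();
    return pObject ? Status::Success : Status::Failure;
  }

  // Like retrieveObject(): on a miss, falls back to the persistent store.
  Status retrieveObject(const std::string& path, DataObject*& pObject) {
    if (findObject(path, pObject) == Status::Success) return Status::Success;
    std::unique_ptr<DataObject> loaded = m_loader(path);
    if (!loaded) return Status::Failure;  // pObject is already nullptr here
    pObject = loaded.get();
    m_memory[path] = std::move(loaded);   // the store keeps ownership
    return Status::Success;
  }

private:
  std::map<std::string, std::unique_ptr<DataObject>> m_memory;
  Loader m_loader;
};
```

A findObject() call for an object that has not yet been loaded fails, while retrieveObject() loads it on demand; after that, findObject() succeeds as well.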

6.3  Using data objects

Whatever the concrete type of the object you have retrieved from the store, the pointer you hold is a pointer to a DataObject, so before you can do anything useful with that object you must cast it to the correct type, for example:

 

 
DataObject* pObject = 0;
StatusCode sc = eventDataService()->retrieveObject("/Event/MC/MCParticles",
                                                   pObject);
if( sc.isFailure() )
  return sc;

// A dynamic_cast on a pointer does not throw on a type mismatch;
// it returns a null pointer, which must be checked.
MCParticleVector* tv = dynamic_cast<MCParticleVector*>(pObject);
if( 0 == tv ) {
  // Print out an error message and exit
  return StatusCode::FAILURE;
}
// tv may now be manipulated.

where after the dynamic cast all of the methods of the MCParticleVector class become available. In the event that the object returned from the store does not match the type to which you try to cast it, a dynamic_cast to a pointer type returns a null pointer rather than throwing an exception (only a dynamic_cast to a reference type throws, with std::bad_cast). If you do not check for the null pointer, your program will crash as soon as it is dereferenced, probably with an obscure message.
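The difference between the pointer and reference forms of dynamic_cast is standard C++ behaviour and can be checked in isolation. The classes below are minimal hypothetical stand-ins for DataObject and two store types; only the casting behaviour matters here.

```cpp
#include <typeinfo>  // std::bad_cast

// Hypothetical stand-ins: a polymorphic base and two unrelated derived types.
struct DataObject { virtual ~DataObject() = default; };
struct MCParticleVector : DataObject {};
struct MCVertexVector : DataObject {};

// Pointer form: a mismatch yields a null pointer; nothing is thrown.
bool castsToParticleVector(DataObject* pObject) {
  return dynamic_cast<MCParticleVector*>(pObject) != nullptr;
}

// Reference form: a mismatch throws std::bad_cast.
bool referenceCastThrows(DataObject& object) {
  try {
    static_cast<void>(dynamic_cast<MCParticleVector&>(object));
    return false;
  } catch (const std::bad_cast&) {
    return true;
  }
}
```

This is why the pointer result is tested against zero rather than wrapped in a try/catch block.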

As mentioned earlier, a certain amount of run-time investigation may be done into what data is available in the store. For example, suppose that we have various sets of testbeam data and each data set was taken with a different number of detectors. If the raw data is saved on a per-detector basis, the number of sets will vary. The code fragment in Listing 8 illustrates how an algorithm may loop over the data sets without knowing a priori how many there are.

 

Listing 8 Code fragment for looping over the objects below a node in the store

std::string objectPath = "/Event/RawData";
DataObject* pObject = 0;
StatusCode sc;

sc = eventDataService()->retrieveObject(objectPath, pObject);
if( sc.isFailure() )
  return sc;

IDataDirectory* dir = pObject->directory();
IDataDirectory::DirIterator it;
for( it = dir->begin(); it != dir->end(); it++ ) {
  // Retrieve each sub-object relative to the parent node
  DataObject* pDo = 0;
  sc = eventDataService()->retrieveObject(pObject, (*it)->localPath(), pDo);
  if( sc.isSuccess() ) {
    // Do something with pDo
  }
}

The last two methods shown in Listing 7 are for registering objects into the store. Suppose that an algorithm creates objects of type UDO from, say, objects of type MCParticle and wishes to place these into the store for use by other algorithms. Code to do this might look something like:

 

Listing 9 Registering of objects into the event data store

UDO* pO;            // Pointer to an object of type UDO (derived from DataObject)
StatusCode sc;

pO = new UDO;
sc = eventDataService()->registerObject("/Event/Recon/tmp", "OK", pO);

// THE NEXT LINE IS AN ERROR, THE OBJECT NOW BELONGS TO THE STORE
delete pO;

UDO autopO;
// ERROR: AUTOMATIC OBJECTS MAY NOT BE REGISTERED
sc = eventDataService()->registerObject("/Event/Recon/tmp", "notOK", &autopO);

Once an object is registered into the store, the algorithm which created it relinquishes ownership; in other words, the algorithm must not delete the object. This is also true for objects which are contained within other objects, such as those derived from or instantiated from the ObjectVector class (see the following section). Furthermore, objects which are to be registered into the store must be created on the heap, i.e. with the new operator.
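The ownership rule can be expressed in modern C++ by passing a std::unique_ptr into the store, which makes the transfer explicit. This is a sketch under that assumption only: the IDataProviderSvc interface in this chapter takes a raw DataObject pointer, and OwningStore and the instance-counting UDO below are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical user-defined type that counts its live instances,
// so that the moment of deletion can be observed.
struct UDO {
  static int live;
  UDO() { ++live; }
  ~UDO() { --live; }
};
int UDO::live = 0;

// Hypothetical store: registering moves ownership into the store, which
// deletes the object when it is cleared (e.g. at the end of an event).
class OwningStore {
public:
  bool registerObject(const std::string& path, std::unique_ptr<UDO> obj) {
    if (!obj || m_objects.count(path) != 0) {
      return false;  // rejected: 'obj' is destroyed when this call returns
    }
    m_objects[path] = std::move(obj);
    return true;
  }
  UDO* find(const std::string& path) const {
    auto it = m_objects.find(path);
    return it == m_objects.end() ? nullptr : it->second.get();
  }
  void clear() { m_objects.clear(); }

private:
  std::map<std::string, std::unique_ptr<UDO>> m_objects;
};
```

After registration the caller's unique_ptr is empty, so there is nothing left for it to delete; and an automatic UDO cannot be passed directly, because it is not a std::unique_ptr<UDO>.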

6.4  Object containers

As mentioned before, all objects which can be placed directly within one of the stores must be derived from the DataObject class. There is, however, another (indirect) way to store objects within a store. This is by putting a set of objects (themselves not derived from DataObject and thus not directly storable) into an object which is derived from DataObject and which may thus be registered into a store.

An object container base class is implemented within the framework, and a number of templated object container classes may be implemented in the future. For the moment, two "concrete" container classes are implemented: ObjectVector<T> and ObjectList<T>. These classes are based upon the STL classes and provide mostly the same interface. Unlike the STL containers, which are essentially designed to hold objects, the container classes within the framework contain only pointers to objects, thus avoiding a lot of memory-to-memory copying.

A further difference from the STL containers is that the type T cannot be anything you like: it must be a type derived from the ContainedObject base class (see Figure 7). In this way all "contained" objects have a pointer back to their containing object. This is required, in particular, by the converters for dealing with links between objects. A consequence of this is that container objects may not contain other container objects (without the use of multiple inheritance).

Figure 7 The relationship between the DataObject, ObjectVector and ContainedObject classes.

 

As mentioned above, objects which are contained within one of these container objects may not be located, or registered, individually within the store. Only the container object may be located via a call to findObject() or retrieveObject(). Thus with regard to interaction with the data stores a container object and the objects that it contains behave as a single object.

The intention is that "small" objects such as clusters, hits, tracks, etc. are derived from the ContainedObject base class and that in general algorithms will take object containers as their input data and produce new object containers of a different type as their output.

The reason behind this is essentially one of optimization. If all objects were treated on an equal footing, then there would be many more accesses to the persistent store to retrieve very small objects. By grouping objects together like this we are able to have fewer accesses, with each access retrieving bigger objects.

There is an example of the use of one of these containers in the SimpleAlgorithm class of the example application; a fragment of that code is reproduced in Listing 10.

6.5  Using object containers

The code fragment below shows the creation of an object container. This container can contain pointers to objects of type MCTrackingHit and only to objects of this type (including derived types). An object of the required type is created on the heap (i.e. via a call to new) and is added to the container with the standard STL call.

 

 
ObjectVector <MCTrackingHit>  hitContainer; 
MCTrackingHit*   h1 = new MCTrackingHit; 
hitContainer.push_back(h1);

After the call to push_back() the hit object "belongs" to the container. If the container is registered into the store, the hits that it contains will go with it. Note in particular that if you delete the container you will also delete its contents, i.e. all of the objects pointed to by the pointers in the container.

Removing an object from a container may be done in two semantically different ways, the difference being whether or not the object is also deleted when it is removed from the container. Removal with deletion may be achieved in several ways (following the previous code fragment):

 

 
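// Alternatives: each one removes the element AND deletes it
hitContainer.pop_back();                       // remove the last element
hitContainer.erase( hitContainer.end() - 1 );  // remove via an iterator
delete h1;                                     // removal is automatic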

The method pop_back() removes the last element in the container, whereas erase() may be used to remove any element via an iterator. In the code fragment above it is also used to remove the last element.

Deleting a contained object, the third option above, will automatically trigger its removal from the container. This is done by the destructor of the ContainedObject base class.

If you wish to remove an object from the container without destroying it (the second possible semantic) use the release() method:

 

 
hitContainer.release(h1);

Since the fate of a contained object is so closely tied to that of its container, life would become more complex if objects could belong to more than one container. Suppose that an object belonged to two containers, one of which was deleted. Should the object be deleted and removed from the second container, or not deleted? To avoid such issues, an object is allowed to belong to a single container only.

If you wish to move an object from one container to another, you must first remove it from one and then add it to the other. However, the first operation is done implicitly for you when you add an object to a second container:

 

 
container1.push_back(h1); // Add to first container

container2.push_back(h1); // Move to second container
                          // Internally invokes release().

Since the object h1 has a link back to its container, the push_back() method is able to first follow this link and invoke the release() method to remove the object from the first container, before adding it into the second.
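The container semantics of this section (ownership of the contents, deletion on removal, release() without deletion, and the implicit move on push_back()) can be sketched with two simplified classes. ToyContained and ToyContainer are hypothetical and deliberately not templated; they illustrate the back-pointer mechanism rather than reproduce the real ContainedObject and ObjectVector<T> implementations.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for ContainedObject and ObjectVector.
class ToyContainer;

class ToyContained {
public:
  ToyContained() = default;
  ToyContained(const ToyContained&) = delete;
  ToyContained& operator=(const ToyContained&) = delete;
  virtual ~ToyContained();  // removes the object from its container
  ToyContainer* parent() const { return m_parent; }

private:
  friend class ToyContainer;
  ToyContainer* m_parent = nullptr;  // back pointer to the owning container
};

class ToyContainer {
public:
  ToyContainer() = default;
  ToyContainer(const ToyContainer&) = delete;
  ToyContainer& operator=(const ToyContainer&) = delete;

  // The container owns its contents: destroying it deletes them.
  ~ToyContainer() {
    while (!m_items.empty()) delete m_items.back();  // each delete releases itself
  }

  // Adding an object that belongs to another container moves it here.
  void push_back(ToyContained* obj) {
    if (obj->m_parent != nullptr) obj->m_parent->release(obj);
    obj->m_parent = this;
    m_items.push_back(obj);
  }

  // Removal without deletion: the caller becomes responsible for 'obj'.
  void release(ToyContained* obj) {
    auto it = std::find(m_items.begin(), m_items.end(), obj);
    if (it == m_items.end()) return;
    m_items.erase(it);
    obj->m_parent = nullptr;
  }

  // Removal with deletion of the last element.
  void pop_back() {
    if (!m_items.empty()) delete m_items.back();
  }

  std::size_t size() const { return m_items.size(); }

private:
  std::vector<ToyContained*> m_items;
};

// Deleting a contained object removes it from its container first.
inline ToyContained::~ToyContained() {
  if (m_parent != nullptr) m_parent->release(this);
}
```

Because every removal path goes through release(), an object can never be listed in two containers, and deleting it directly cannot leave a dangling pointer behind in its container.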

In general your first exposure to object containers is likely to be when retrieving data from the event data store. The sample code in Listing 10 is from the SimpleAlgorithm class of the example. It shows how once you have retrieved an object container from the store you may iterate over its contents, just as with an STL vector. Note that the typedef is simply to save typing!

 

Listing 10 Use of the ObjectVector templated class.

1: typedef ObjectVector<MCParticle> MCParticles;
2: MCParticles *tracks;
3: MCParticles::iterator it;
4:
5: for( it = tracks->begin(); it != tracks->end(); it++ ) {
6:   // Get the energy of the track and histogram it
7:   double energy = (*it)->fourMomentum().e();
8:   m_hEnergyDist->fill( energy, 1. );
9: }

The variable tracks is set to point to an object in the event data store of type ObjectVector<MCParticle> with a dynamic cast (not shown above). An iterator (i.e. a pointer-like object for looping over the contents of the container) is defined on line 3, and this is used within the loop to point consecutively to each of the contained objects. In this case the objects contained within the ObjectVector are of type "pointer to MCParticle". The iterator returns each object in turn and, in the example, the energy of the object is used to fill a histogram.

6.6  Data access checklist

A little reminder:

6.7  Defining new data types

Most of the data types which will be used within Gaudi will be used by everybody and thus packaged and documented centrally. However, for your own private development work you may wish to create objects of your own types, which of course you can always do with C++ (or Java). If you wish to place these objects within a store, either to pass them between algorithms or to have them later saved into a database or file, then you must derive your type from either the DataObject or the ContainedObject base class.

Consider the example below:

 

 
static const CLID CLID_UDO = 135; // Collaboration-wide unique number

class UDO : public DataObject {
public:
  UDO() : DataObject(), m_n(0) {
  }

  static const CLID& classID() { return CLID_UDO; }
  virtual const CLID& clID() const { return CLID_UDO; }

  int n() const { return m_n; }
  void setN(int n) { m_n = n; }

private:
  int m_n;
};

This defines a class UDO which, since it derives from DataObject, may be registered into, say, the event data store. (The class itself is not very useful, as its sole attribute is a single integer and it has no behaviour.)

The thing to note here is that if the appropriate converter is supplied, as discussed in , then this class may also be saved into a persistent store (e.g. a ROOT file or an Objectivity database) and read back at a later date. In order for the persistency to work two things are required: the unique class identifier number (CLID_UDO in the example), and the clID() method which returns this identifier.

Types which are derived from ContainedObject are implemented in the same way. The only point to notice is that it is the static classID() method which must be implemented. This is because contained objects may only reside in the store when they belong to a container, e.g. an ObjectVector<T> which is registered into the store. The class identifier of a concrete object container class is calculated (at run time) from the type of the objects which it contains. Since the container may be empty, there may be no contained object on which a virtual method could be called, so a static method is required.
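One possible way to derive a container's class identifier from the type it holds is sketched below. The scheme (adding the element type's identifier to a fixed container offset), the offset value, the ToyHit identifier 210 and the CLID typedef are all assumptions for illustration, not the framework's actual assignments. The point is only that a static classID() can be evaluated without any contained object existing.

```cpp
// Hypothetical identifiers for illustration only; CLID is assumed to be an
// unsigned integer type and the values below are not real assignments.
typedef unsigned int CLID;
const CLID CLID_ToyObjectVector = 0x10000;  // assumed offset for vector containers

// A contained type only needs a static classID().
struct ToyHit {
  static const CLID& classID() {
    static const CLID id = 210;
    return id;
  }
};

// The container's identifier is derived from its element type. Because
// classID() is static, no element has to exist: an empty container works.
template <class T>
struct ToyObjectVector {
  static const CLID& classID() {
    static const CLID id = CLID_ToyObjectVector + T::classID();
    return id;
  }
};
```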

6.8  Smart references and Smart reference vectors

Smart references and Smart reference vectors should be used when referencing objects in the transient data store. They provide safe data access and automate the loading of referenced data on demand. Imagine a situation in which MC particles are already loaded but MC vertices are not, and an algorithm dereferences a variable pointing to the origin vertex. If a smart reference is used, the MC vertices will be loaded automatically, and only after that will the variable be dereferenced. If a plain C++ pointer were used instead, the program would crash.
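The on-demand loading behind a smart reference can be sketched with a small class template that stores a location and a loader instead of a raw pointer. ToySmartRef, its Loader callback and the location string are hypothetical; the real SmartRef<T> resolves its target through the data store and its link machinery, which is not reproduced here.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for SmartRef<T>: it keeps a location and a loader,
// and resolves the target only when it is first dereferenced.
template <class T>
class ToySmartRef {
public:
  using Loader = std::function<T*(const std::string&)>;

  ToySmartRef(std::string location, Loader loader)
      : m_location(std::move(location)), m_loader(std::move(loader)) {}

  // Resolve on demand; the result is cached once the target is found.
  T* get() const {
    if (m_target == nullptr) m_target = m_loader(m_location);
    return m_target;
  }
  T* operator->() const { return get(); }
  T& operator*() const { return *get(); }
  explicit operator bool() const { return get() != nullptr; }

private:
  std::string m_location;
  Loader m_loader;
  mutable T* m_target = nullptr;  // not owned: the store owns the object
};
```

Nothing is loaded when the reference is constructed; the first use of -> or * triggers the load, and later uses reuse the cached pointer.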

Smart references and Smart reference vectors are declared inside a class as:

 

 
private:
  /// Smart reference to origin vertex
  SmartRef<MCVertex>        m_originMCVertex;
  /// Vector of smart references to decay vertices
  SmartRefVector<MCVertex>  m_decayMCVertices;

Smart references are used with the same syntax as plain C++ pointers:

 

 
SmartDataPtr<MCParticleVector> particles( eventDataService(),
                                          EventModel::MC::Particles );
MCParticleVector::const_iterator iter;

for( iter = particles->begin(); iter != particles->end(); iter++ ) {
  MCVertex* originVtx = (*iter)->originMCVertex();
  if( 0 != originVtx ) {
    std::cout << "Origin vertex = " << *(*iter) << std::endl;
  }
}

All LHCbEvent data types use Smart references and Smart reference vectors to reference each other.

6.9  Saving data to a persistent store

Suppose that you have defined your own data type as discussed in Section 6.7. Suppose furthermore that you have an algorithm which uses, say, SicB data to create instances of your object type, which you then register into the transient event store. How can you save these objects for use at a later date?

You must do the following:

Register your object in the store as usual, typically in the execute() method of your algorithm.

 
// myAlg implementation file

StatusCode myAlg::execute() {
  // Create a UDO object and register it into the event data store
  UDO* p = new UDO();
  StatusCode sc = eventDataService()->registerObject("/Event/myStuff/myUDO", p);
  return sc;
}

In order to actually trigger the conversion and saving of the objects at the end of the current event processing it is necessary to inform the application manager. This requires some options to be specified in the job options file:

 
ApplicationMgr.OutStream  = { "DstWriter" }; 
 
DstWriter.ItemList         = { "/Event#1", "/Event/myStuff#1"};
DstWriter.EvtDataSvc       = "EventDataSvc"; 
DstWriter.EvtConversionSvc = "RootEventCnvSvc"; 
DstWriter.OutputFile       = "result.root";

The first option tells the application manager that you wish to create an output stream called "DstWriter". You may create as many output streams as you like and give them whatever names you prefer.

For each output stream object which you create, you must set several properties. The ItemList option specifies the list of paths to the objects which you wish to write to this output stream. The number after the "#" symbol denotes the number of directory levels below the specified path which should be traversed. The EvtDataSvc option specifies in which transient data service the output stream should search for the objects in the ItemList. The EvtConversionSvc option specifies the conversion service which should be used to convert the objects, and the output file or database name is set with the OutputFile option.
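The "#" convention in ItemList entries can be illustrated with a small parser. parseItem() and selected() are hypothetical helpers, not part of the framework; treating the named object itself as selected and treating an entry without "#" as depth 0 are assumptions about the semantics, not documented behaviour.

```cpp
#include <string>

// One ItemList entry such as "/Event/myStuff#1" split into its parts.
struct ItemSpec {
  std::string path;  // path of the top object to write
  int depth;         // number of levels below 'path' to traverse
};

// Hypothetical helper. An entry without '#' is assumed to mean depth 0.
ItemSpec parseItem(const std::string& item) {
  const std::string::size_type hash = item.rfind('#');
  if (hash == std::string::npos) return {item, 0};
  return {item.substr(0, hash), std::stoi(item.substr(hash + 1))};
}

// Hypothetical helper: true if 'candidate' is 'spec.path' itself or lies
// at most 'spec.depth' levels below it.
bool selected(const ItemSpec& spec, const std::string& candidate) {
  if (candidate.compare(0, spec.path.size(), spec.path) != 0) return false;
  if (candidate.size() == spec.path.size()) return true;  // the object itself
  if (candidate[spec.path.size()] != '/') return false;   // e.g. "/Event/myStuffX"
  int levels = 0;
  for (std::string::size_type i = spec.path.size(); i < candidate.size(); ++i) {
    if (candidate[i] == '/') ++levels;
  }
  return levels <= spec.depth;
}
```

With "/Event/myStuff#1", an object registered at "/Event/myStuff/myUDO" is one level below the named path and would therefore be written.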

In addition to this the event persistency service must be set up correctly by specifying all necessary conversion services in the job options as shown in the much more comprehensive example distributed with the framework (Rio.Example1).

6.10  The SmartDataPtr/SmartDataLocator utilities

Using the data services is simple, but the extensive status checking they require tends to make the code difficult to read. It would be much more convenient to access data items in the store just like accessing objects through a C++ pointer. The C++ language does not support this kind of construct intrinsically; however, smart pointers can be used to shield the user from the internals of the data services.

6.10.1  Using SmartDataPtr/SmartDataLocator objects

The SmartDataPtr and SmartDataLocator are smart pointers that differ in how they access the data store: whereas the SmartDataPtr first checks whether the object is present in the store and, if it is not, loads it, the locator only checks for the presence of the object and does not attempt to load it.

Both objects are responsible for delivering the object to the user in the requested type. They both use the data service to get hold of the requested object. Since both objects have similar behaviour and the same user interface, only the SmartDataPtr is discussed in the following. An example use of the SmartDataPtr class is shown below.

Listing 11 Use of a SmartDataPtr object.

StatusCode myAlgo::execute() {
  MsgStream log(messageService(), name());
  SmartDataPtr<Event> evt(eventDataService(), "/Event");
  if ( evt ) {
    // Print the run and event number
    log << MSG::INFO << " Run:" << evt->run()
        << " Event:" << evt->event() << endreq;
    // Get hold of MonteCarlo particles
    SmartDataPtr<MCParticleVector> particles(evt, "MC/MCParticles");
    if ( particles ) {
      for(MCParticleVector::iterator i = particles->begin();
          i != particles->end(); i++ ) {
        log << MSG::INFO << "Px:" << (*i)->fourMomentum().px() << endreq;
      }
      return StatusCode::SUCCESS;
    }
  }
  // Print an error message in case of failure
  log << MSG::ERROR << "Error accessing event" << endreq;
  return StatusCode::FAILURE;
}

When using the SmartDataPtr class just think of it as a normal C++ pointer having a constructor.
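The difference between the two utilities can be sketched with a toy data service and a single smart-pointer template whose constructor either retrieves (loading on demand) or only locates. ToyDataSvc, ToySmartData, ToyDataPtr and ToyDataLocator are hypothetical; the real SmartDataPtr and SmartDataLocator have a richer interface, described in the reference documentation.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; not the Gaudi classes or interfaces.
struct DataObject { virtual ~DataObject() = default; };

struct ToyDataSvc {
  std::map<std::string, std::unique_ptr<DataObject>> memory;      // loaded objects
  std::map<std::string, std::unique_ptr<DataObject>> persistent;  // "on disk"

  DataObject* find(const std::string& path) {
    auto it = memory.find(path);
    return it == memory.end() ? nullptr : it->second.get();
  }
  DataObject* retrieve(const std::string& path) {
    if (DataObject* inMemory = find(path)) return inMemory;
    auto it = persistent.find(path);
    if (it == persistent.end()) return nullptr;
    DataObject* loaded = it->second.get();
    memory[path] = std::move(it->second);  // "load": move into memory
    persistent.erase(it);
    return loaded;
  }
};

// The access policy selects retrieve (SmartDataPtr-like) or find (locator-like).
enum class Access { Retrieve, Locate };

template <class T, Access A>
class ToySmartData {
public:
  ToySmartData(ToyDataSvc& svc, const std::string& path) {
    DataObject* p = (A == Access::Retrieve) ? svc.retrieve(path) : svc.find(path);
    m_ptr = dynamic_cast<T*>(p);  // null if missing or of the wrong type
  }
  explicit operator bool() const { return m_ptr != nullptr; }
  T* operator->() const { return m_ptr; }
  T& operator*() const { return *m_ptr; }

private:
  T* m_ptr = nullptr;
};

template <class T> using ToyDataPtr = ToySmartData<T, Access::Retrieve>;
template <class T> using ToyDataLocator = ToySmartData<T, Access::Locate>;
```

As in Listing 11, the smart pointer is tested for validity before use; a missing object and an object of the wrong type both show up as a null pointer.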

The SmartDataPtr and SmartDataLocator offer a number of possible constructors and operators to cover a wide range of needs when accessing data stores. Check the online reference documentation [2] for up-to-date information concerning the interface of these utilities.