GAUDI User Guide

Chapter 2
The framework architecture

2.1  Overview

In this chapter we would like to briefly re-visit some of those issues addressed in the Architecture Design Document which are of particular interest to physicists wishing to use the framework. We also try to define a few of the words that are thrown around in the rest of the document.

A (more) complete view of the architecture, along with a discussion of the main design choices and the reasons for these choices, may be found in reference [4].

 

 

2.2  Why architecture?

The basic "requirement" of the physicists in the collaboration is a set of programs for doing event simulation, reconstruction, visualisation, etc. and a set of tools which facilitate the writing of analysis programs. Additionally a physicist wants something that is easy to use and (though he or she may claim otherwise) is extremely flexible. The purpose of the Gaudi application framework is to provide software which fulfils these requirements, but which additionally addresses a larger set of requirements, including the use of some of the software online.

If the software is to be easy to use it must require a limited amount of learning on the part of the user. In particular, once learned there should be no need to re-learn just because technology has moved on. (You do not need to re-take your licence every time you buy a new car.) Thus one of the principal design goals was to insulate users (physicist developers and physicist analysts) from irrelevant details such as what software libraries we use for data I/O, or for graphics. We have done this by developing an architecture. An architecture consists of the specification of a number of components and their interactions with each other. A component is a "block" of software which has a well specified interface and functionality. An interface is a collection of methods along with a statement of what each method actually does, i.e. its functionality.

We may summarise the main benefits we gain from this approach:

Flexibility
This approach gives flexibility because components may be plugged together in different ways to perform different tasks.

Simplicity
Software for using, for example, an object database is in general fairly complex and time-consuming to learn. Most of the detail is of little interest to someone who just wants to read data or store results. A "data access" component would have an interface which provides the user with only the required functionality. Additionally, the interface would be the same regardless of the underlying storage technology.

Robustness
As stated above, a component can hide the underlying technology. As well as offering simplicity, this has the additional advantage that the underlying technology may be changed without the user even needing to know.
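
To make the "data access" idea concrete, here is a minimal sketch in C++. It is not the Gaudi interface: IDataAccess, MemoryStore, RecordListStore and saveAndReload are hypothetical names invented for this illustration. It only shows user code that is written once against an interface and keeps working when the storage technology behind it is swapped.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the only view of storage that user code sees.
class IDataAccess {
public:
  virtual ~IDataAccess() {}
  virtual void store(const std::string& key, double value) = 0;
  virtual bool read(const std::string& key, double& value) const = 0;
};

// One possible backend: an associative container held in memory.
class MemoryStore : public IDataAccess {
public:
  void store(const std::string& key, double value) override { m_data[key] = value; }
  bool read(const std::string& key, double& value) const override {
    std::map<std::string, double>::const_iterator it = m_data.find(key);
    if (it == m_data.end()) return false;
    value = it->second;
    return true;
  }
private:
  std::map<std::string, double> m_data;
};

// A second backend standing in for a different technology: a flat record list.
class RecordListStore : public IDataAccess {
public:
  void store(const std::string& key, double value) override {
    for (auto& rec : m_records) {
      if (rec.first == key) { rec.second = value; return; }
    }
    m_records.push_back(std::make_pair(key, value));
  }
  bool read(const std::string& key, double& value) const override {
    for (const auto& rec : m_records) {
      if (rec.first == key) { value = rec.second; return true; }
    }
    return false;
  }
private:
  std::vector<std::pair<std::string, double> > m_records;
};

// User code depends only on the interface, never on a concrete backend.
double saveAndReload(IDataAccess& storage, double value) {
  storage.store("result", value);
  double reloaded = 0.0;
  storage.read("result", reloaded);
  return reloaded;
}
```

Passing either a MemoryStore or a RecordListStore to saveAndReload gives the same result, which is the sense in which the technology may change without the user needing to know.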

It is intended that almost all software written by physicists in the collaboration, whether for event generation, reconstruction or analysis, will be in the form of specialisations of a few specific components. Here, specialisation means taking a standard component and adding to its functionality while keeping the interface the same. Within the application framework this is done by deriving new classes from one of the base classes:

DataObject
Algorithm
Converter

In the rest of this chapter we will briefly consider the first two of these components and in particular the subject of the "separation" of data and algorithm. They will be covered in more depth in chapters 5 and 6. The third base class, Converter, exists more for technical necessity than anything else and will be discussed in chapter 12. Following this we give a brief outline of the main components that a physicist developer will come into contact with.

2.3  Data versus code

Broadly speaking, tasks such as physics analysis and event reconstruction consist of the manipulation of mathematical or physical quantities: points, vectors, matrices, hits, momenta, etc., by algorithms which are generally specified in terms of equations and natural language. The mapping of this type of task into a programming language such as FORTRAN is very natural, since there is a very clear distinction between "data" and "code". Data consists of variables such as:

 
      integer n 
      real p(3)

and code which may consist of a simple statement or a set of statements collected together into a function or procedure:

 
      real function innerProduct(p1, p2) 
      real p1(3),p2(3) 
      innerProduct = p1(1)*p2(1) + p1(2)*p2(2) + p1(3)*p2(3) 
      end

Thus the physical and mathematical quantities map to data and the algorithms map to a collection of functions.

A priori, we see no reason why moving to a language which supports the idea of objects, such as C++, should change the way we think of doing physics analysis. Thus the idea of having essentially mathematical objects, such as vectors and points, which are distinct from the more complex beasts that manipulate them, such as fitting algorithms, is still valid. This is the reason why the Gaudi application framework makes a clear distinction between "data" objects and "algorithm" objects.

Anything which has as its origin a concept such as hit, point, vector, trajectory, i.e. a clear "quantity-like" entity should be implemented by deriving a class from the DataObject base class.

On the other hand anything which is essentially a "procedure", i.e. a set of rules for performing transformations on more data-like objects, or for creating new data-like objects should be designed as a class derived from the Algorithm base class.

Furthermore, you should not have objects derived from DataObject performing long, complex algorithmic procedures. The intention is that these objects are "small".

Tracks which fit themselves are of course possible: you could have a constructor which took a list of hits as a parameter; but they are silly. Every track object would now have to contain all of the parameters used to perform the track fit, making it far from a simple object. Track-fitting is an algorithmic procedure; a track is probably best represented by a point and a vector, or perhaps a set of points and vectors. They are different.

 

2.4  Main components

The principal functionality of an algorithm is to take input data, manipulate it and produce new output data. Figure 1 shows how a concrete algorithm object interacts with the rest of the application framework to achieve this.

Figure 1 The main components of the framework as seen by an algorithm object.

 

The figure shows the four main services that algorithm objects use:

The event data store
The detector data store
The histogram service
The message service

In addition, a fifth service, the job options service (see Chapter 11), is used by the Algorithm base class, but is not usually explicitly seen by a concrete algorithm.

Each of these services is provided by a component and the use of these components is via an interface. The interface used by algorithm objects is shown in the figure, e.g. for both the event data and detector data stores it is the IDataProviderSvc interface. In general a component implements more than one interface. For example, the event data store implements another interface, IDataManager, which is used by the application manager to clear the store before a new event is read in.

An algorithm's access to data, whether the data is coming from or going to a persistent store or another algorithm, is always via one of the data store components. The IDataProviderSvc interface allows algorithms to access data in the store and to add new data to the store. It is discussed further in chapter 6 where we consider the data store components in more detail.
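
The idea that algorithms exchange data only through a store can be sketched with a toy example. This is not the IDataProviderSvc interface, which is described in chapter 6: ToyDataStore, its put, get and clear methods, the Energy class and the path "/Event/Energy" are all hypothetical and chosen only for illustration.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Empty stand-in for the framework base class (assumption for this sketch).
class DataObject { public: virtual ~DataObject() {} };

// A small data object carrying a single quantity.
class Energy : public DataObject {
public:
  explicit Energy(double gev) : m_gev(gev) {}
  double gev() const { return m_gev; }
private:
  double m_gev;
};

// Toy store: objects are registered under a path and owned by the store.
class ToyDataStore {
public:
  // Takes ownership; refuses to overwrite an existing path.
  bool put(const std::string& path, std::unique_ptr<DataObject> object) {
    if (m_objects.count(path) != 0) return false;
    m_objects[path] = std::move(object);
    return true;
  }
  // Returns the object at the path, or a null pointer if there is none.
  DataObject* get(const std::string& path) const {
    auto it = m_objects.find(path);
    return it == m_objects.end() ? nullptr : it->second.get();
  }
  // Empties the store, as would be done before a new event is read in.
  void clear() { m_objects.clear(); }
private:
  std::map<std::string, std::unique_ptr<DataObject> > m_objects;
};

// "Producer" algorithm step: adds new data to the store.
bool produceEnergy(ToyDataStore& store) {
  return store.put("/Event/Energy", std::unique_ptr<DataObject>(new Energy(42.0)));
}

// "Consumer" algorithm step: reads the data, knowing nothing about the producer.
// Returns -1.0 if the expected object is not in the store.
double consumeEnergy(const ToyDataStore& store) {
  const Energy* energy = dynamic_cast<const Energy*>(store.get("/Event/Energy"));
  return energy ? energy->gev() : -1.0;
}
```

The two functions never refer to each other; the store is their only point of contact, so either could be replaced by a converter reading from or writing to persistent storage.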

The histogram service is another type of data store intended for the storage of histograms and other "statistical" objects, i.e. data objects with a lifetime longer than a single event. Access is via the IHistogramSvc interface, which is an extension to the IDataProviderSvc interface and is discussed in chapter 9. The n-tuple service is similar, with access via the INtupleSvc extension to the IDataProviderSvc interface, as discussed in Chapter 10.

In general an algorithm will be configurable: it will require certain parameters, such as cut-offs, upper limits on the number of iterations, convergence criteria, etc., to be initialised before the algorithm may be executed. These parameters may be specified at run time via the job options mechanism. This is done by the job options service. Though it is not explicitly shown in the figure, this component makes use of the IProperty interface which is implemented by the Algorithm base class.
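
The mechanism can be pictured with a toy sketch. The IToyProperty interface below is a deliberately reduced stand-in and not the real IProperty declaration; IterativeFit, applyOptions and the property names "MaxIterations" and "Tolerance" are hypothetical. The point is only that a configuring component can set parameters on an algorithm through an interface, without knowing the concrete class.

```cpp
#include <map>
#include <string>

// Reduced stand-in for a property interface (assumption for this sketch).
class IToyProperty {
public:
  virtual ~IToyProperty() {}
  // Returns false if the component has no property with this name.
  virtual bool setProperty(const std::string& name, double value) = 0;
};

// A configurable algorithm with built-in defaults.
class IterativeFit : public IToyProperty {
public:
  IterativeFit() : m_maxIterations(10), m_tolerance(1e-3) {}

  bool setProperty(const std::string& name, double value) override {
    if (name == "MaxIterations") { m_maxIterations = static_cast<int>(value); return true; }
    if (name == "Tolerance")     { m_tolerance = value; return true; }
    return false;
  }
  int maxIterations() const { return m_maxIterations; }
  double tolerance() const { return m_tolerance; }
private:
  int m_maxIterations;
  double m_tolerance;
};

// Plays the role of a job options service: pushes run-time options
// into a component and reports how many were accepted.
int applyOptions(IToyProperty& target, const std::map<std::string, double>& options) {
  int applied = 0;
  for (const auto& option : options) {
    if (target.setProperty(option.first, option.second)) ++applied;
  }
  return applied;
}
```

Options that the algorithm does not recognise are simply not applied here; a real service would presumably report them.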

During its execution an algorithm may wish to make reports on its progress or on errors that occur. All communication with the outside world should go through the message service component via the IMessageSvc interface. Use of this interface is discussed in Chapter 11.
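
As a rough picture of why reports go through a service rather than directly to the terminal, consider the following sketch. It does not reproduce the IMessageSvc interface: IToyMessageSvc, ToyLevel, CollectingMessageSvc and fitStep are hypothetical names. The algorithm code only states what happened; the service decides what to keep and where it goes.

```cpp
#include <string>
#include <vector>

// Hypothetical severity levels, ordered from least to most severe.
enum ToyLevel { TOY_DEBUG = 0, TOY_INFO = 1, TOY_ERROR = 2 };

// Reduced stand-in for a message service interface (assumption for this sketch).
class IToyMessageSvc {
public:
  virtual ~IToyMessageSvc() {}
  virtual void report(ToyLevel level, const std::string& source,
                      const std::string& text) = 0;
};

// One implementation: keeps messages at or above a threshold in memory.
// Another implementation could print them or write them to a log file.
class CollectingMessageSvc : public IToyMessageSvc {
public:
  explicit CollectingMessageSvc(ToyLevel threshold) : m_threshold(threshold) {}

  void report(ToyLevel level, const std::string& source,
              const std::string& text) override {
    if (level < m_threshold) return;  // filtered by the service, not the algorithm
    m_lines.push_back(source + ": " + text);
  }
  const std::vector<std::string>& lines() const { return m_lines; }
private:
  ToyLevel m_threshold;
  std::vector<std::string> m_lines;
};

// Algorithm code reports unconditionally and leaves filtering to the service.
void fitStep(IToyMessageSvc& messages, bool converged) {
  messages.report(TOY_DEBUG, "ToyFit", "starting iteration");
  if (!converged) messages.report(TOY_ERROR, "ToyFit", "fit did not converge");
}
```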

As mentioned above, by virtue of its derivation from the Algorithm base class, any concrete algorithm class implements the IAlgorithm and IProperty interfaces. IProperty is usually used only by the job options service.

Top-level algorithms, i.e. algorithm objects created by the application manager, are controlled via the IAlgorithm interface. This consists essentially of the methods: initialize(), execute(), and finalize().
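
The control flow implied by these three methods can be sketched as below. The interface is a simplified stand-in: the method names come from the text, but the bool return type, the IToyAlgorithm name, the EventCounter class and the runJob function (which plays the role of the application manager) are assumptions made for this illustration.

```cpp
#include <vector>

// Simplified stand-in for the algorithm interface (return type is an assumption).
class IToyAlgorithm {
public:
  virtual ~IToyAlgorithm() {}
  virtual bool initialize() = 0;  // once, before the first event
  virtual bool execute() = 0;     // once per event
  virtual bool finalize() = 0;    // once, after the last event
};

// A concrete algorithm that simply counts the events it has seen.
class EventCounter : public IToyAlgorithm {
public:
  EventCounter() : m_initialized(false), m_events(0) {}
  bool initialize() override { m_events = 0; m_initialized = true; return true; }
  bool execute() override {
    if (!m_initialized) return false;  // refuse to run before initialize()
    ++m_events;
    return true;
  }
  bool finalize() override { m_initialized = false; return true; }
  int events() const { return m_events; }
private:
  bool m_initialized;
  int m_events;
};

// Plays the role of the application manager: drives every algorithm
// through its life cycle using only the interface.
bool runJob(const std::vector<IToyAlgorithm*>& algorithms, int nEvents) {
  for (IToyAlgorithm* algorithm : algorithms) {
    if (!algorithm->initialize()) return false;
  }
  for (int event = 0; event < nEvents; ++event) {
    for (IToyAlgorithm* algorithm : algorithms) {
      if (!algorithm->execute()) return false;
    }
  }
  for (IToyAlgorithm* algorithm : algorithms) {
    if (!algorithm->finalize()) return false;
  }
  return true;
}
```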

The figure also shows that a concrete algorithm may make use of additional objects internally to aid it in its function. These private objects do not need to inherit from any particular base class so long as they are only used internally. These objects are under the complete control of the algorithm object itself and so care is required to avoid memory leaks etc.
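
One way of taking that care, assuming a compiler with C++11 support, is to let the algorithm hold its private helper through std::unique_ptr, so that the helper is destroyed together with the algorithm. The Smoother and SmoothingAlgorithm classes are hypothetical, and the instance counter exists only to make the ownership visible.

```cpp
#include <memory>

// Private helper: no framework base class is needed because it is
// only ever used inside the algorithm that owns it.
class Smoother {
public:
  Smoother() { ++s_alive; }
  ~Smoother() { --s_alive; }
  Smoother(const Smoother&) = delete;
  Smoother& operator=(const Smoother&) = delete;

  // Average of the previous smoothed value and the new measurement.
  double smooth(double previous, double current) const {
    return 0.5 * (previous + current);
  }
  static int alive() { return s_alive; }  // number of live helpers
private:
  static int s_alive;
};
int Smoother::s_alive = 0;

// The algorithm owns its helper; no explicit delete is required.
class SmoothingAlgorithm {
public:
  SmoothingAlgorithm() : m_helper(new Smoother()), m_last(0.0) {}
  double execute(double value) {
    m_last = m_helper->smooth(m_last, value);
    return m_last;
  }
private:
  std::unique_ptr<Smoother> m_helper;  // released when the algorithm is destroyed
  double m_last;
};
```

With a raw pointer member the same design would need a matching delete in the destructor, which is exactly the kind of detail that is easy to forget.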

We have used the terms "interface" and "implements" quite freely above. Let us be more explicit about what we mean. We use the term interface to describe a pure virtual C++ class, i.e. a class with no data members, and no implementation of the methods that it declares. For example:

 

 
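class PureAbstractClass {
public:
  virtual ~PureAbstractClass() {}  // allows deletion through the interface
  virtual void method1() = 0;      // "= 0": declared here, implemented elsewhere
  virtual void method2() = 0;
};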

is a pure abstract class or interface. We say that a class implements such an interface if it is derived from it, for example:

 

 
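class ConcreteComponent : public PureAbstractClass {
public:
  void method1() { }  // implementation of the first interface method
  void method2() { }  // implementation of the second interface method
};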

A component which implements more than one interface does so via multiple inheritance. However, since the interfaces are pure abstract classes, the usual problems associated with multiple inheritance do not occur.
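
A short sketch may help here. In the spirit of the event data store example given earlier, one component offers a reading interface to algorithms and a clearing interface to a manager. The names IReadable, IClearable, ToyEventStore, countVisible and startNewEvent are hypothetical. Because neither interface has data members or method bodies, inheriting from both introduces no duplicated state and no ambiguity.

```cpp
#include <cstddef>
#include <vector>

// Interface offered to algorithms: read-only view.
class IReadable {
public:
  virtual ~IReadable() {}
  virtual std::size_t size() const = 0;
};

// Interface offered to a manager: housekeeping only.
class IClearable {
public:
  virtual ~IClearable() {}
  virtual void clear() = 0;
};

// One component, two interfaces, via multiple inheritance.
class ToyEventStore : public IReadable, public IClearable {
public:
  void add(double value) { m_values.push_back(value); }
  std::size_t size() const override { return m_values.size(); }
  void clear() override { m_values.clear(); }
private:
  std::vector<double> m_values;
};

// Client that only needs to read sees only IReadable.
std::size_t countVisible(const IReadable& store) { return store.size(); }

// Client that only manages the store sees only IClearable.
void startNewEvent(IClearable& store) { store.clear(); }
```

Each client is compiled against the one interface it needs, so neither can accidentally use the other's functionality.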

Within the framework every component, i.e. services and algorithms, has two qualities:

In addition, as discussed above, a component may implement several interfaces. These interfaces are identified by a unique number which is available via a global constant of the form IID_InterfaceType, for example IID_IDataProviderSvc. Using these it is possible to enquire what interfaces a particular component implements.
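
How such an enquiry might work can be sketched with toy classes. Nothing below is the framework's own declaration: the ToyInterfaceID type, the constants IID_IToyReader, IID_IToyWriter and IID_IToyPrinter, their numeric values, and the queryInterface signature are assumptions that merely follow the naming pattern described above.

```cpp
// Hypothetical identifier type and constants following the IID_ naming pattern.
typedef unsigned long ToyInterfaceID;
const ToyInterfaceID IID_IToyReader  = 101;
const ToyInterfaceID IID_IToyWriter  = 102;
const ToyInterfaceID IID_IToyPrinter = 103;  // not implemented by ToyComponent

class IToyReader {
public:
  virtual ~IToyReader() {}
  virtual int read() const = 0;
};

class IToyWriter {
public:
  virtual ~IToyWriter() {}
  virtual void write(int value) = 0;
};

class ToyComponent : public IToyReader, public IToyWriter {
public:
  ToyComponent() : m_value(0) {}
  int read() const override { return m_value; }
  void write(int value) override { m_value = value; }

  // Returns the address of the requested interface, or a null pointer
  // if this component does not implement it. The caller must cast the
  // result back to the interface type that matches the identifier.
  void* queryInterface(ToyInterfaceID id) {
    if (id == IID_IToyReader) return static_cast<IToyReader*>(this);
    if (id == IID_IToyWriter) return static_cast<IToyWriter*>(this);
    return nullptr;
  }
private:
  int m_value;
};

// Convenience check built on top of queryInterface.
bool implements(ToyComponent& component, ToyInterfaceID id) {
  return component.queryInterface(id) != nullptr;
}
```

The cast to the specific interface type before returning matters with multiple inheritance, since each interface sub-object may live at a different address within the component.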

 

2.5  Package structure

For large software systems, such as ours, it is clearly important to decompose the system into hierarchies of smaller and more manageable entities. This decomposition can have important consequences for implementation-related issues, such as compile time, link dependencies, configuration management, etc. For this we need to introduce the concept of a package: a grouping of related components into a cohesive physical entity. A package is also the minimal unit of software release.

We have decomposed the LHCb data processing software into the packages shown in Figure 2. At the lowest level we find Gaudi, which is the framework itself and depends only on some basic standard packages (STL, ...). At the second level there are the packages for the specific LHCb event and detector data models. These packages depend on the framework and CLHEP. At the same level we have a specific implementation of the histogram persistency service based on HBOOK (HbookCnv). When other implementations exist, they will be added as packages. At the next level we will have packages consisting of implementations of event and detector data persistency services and converters. In this release we have implementations based on SicB and ZEBRA (SicbCnv) and on ROOT I/O (RootCnv). Later on, the algorithms that will constitute the core of the data processing applications (trigger, reconstruction, analysis, etc.) will form a number of independent packages. Finally, at the top level we find the applications.

Figure 2 Current package structure of the LHCb software