Chapter 4
Getting started
4.1 Overview
In this chapter we walk through one of the example applications (RandomNumber) which are distributed with the framework. We look briefly at the different files and go over the steps needed to compile and execute the code. We also outline where various subjects are covered in more detail in the remainder of the document. Finally we cover briefly the other example applications which are distributed and say a few words on what each one is intended to demonstrate.
4.2 Creating a job
Traditionally, a "job" is the running of a program on a specified set of input data to produce a set of output data, usually in batch.
For the example applications supplied this is essentially a two-step process. First the executable must be produced; then the necessary environment variables must be set and the required job options specified, as illustrated in Figure 4.1.
Figure 4.1 Creating a job from the AlgSequencer example application
The example applications consist of a number of "source code" files which together allow you to generate an executable program. These are:
· The main program.
· Header and implementation files for each of the concrete algorithm classes.
· A CMT requirements file.
· The set of Gaudi libraries.
In order for the job to run as desired you must provide the correct configuration information for the executable. This is done via entries in the job options file.
4.3 The main program
The main program is needed to bootstrap the job. It can be completely general, and can be reused by all Gaudi applications. An example main program, from the package GaudiExamples, is shown in Listing 4.1.
Include files
These are needed for the creation of the application manager and Smart interface pointers.
Application Manager instantiation
Line 12 instantiates an ApplicationMgr object. The application manager is essentially the job controller. It is responsible for creating and correctly initialising all of the services and algorithms required, for looping over the input data events and executing the algorithms specified in the job options file, and for terminating the job cleanly.
Retrieval of Interface pointers
The code on lines 14 and 15 retrieves the pointers to the IProperty and IAppMgrUI interfaces of the application manager.
Setting the application manager's properties
The only property which needs to be set explicitly in the main program is the name of the job options file which contains all of the other configuration information needed to run the job. In this example, the name is the first argument of the program and defaults to "../options/job.opts" (line 23); it is set on line 25.
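The choice of options file described above can be sketched in isolation. This is a minimal sketch and not the code of Listing 4.1: the helper name jobOptionsPath is hypothetical, and only the default path is taken from the text.

```cpp
#include <string>

// Hypothetical helper (not part of Gaudi): returns the job options file name.
// The first program argument is used if one was given; otherwise the default
// quoted in the text is returned.
std::string jobOptionsPath(int argc, const char* const argv[]) {
  const std::string defaultOpts = "../options/job.opts";
  return (argc > 1 && argv[1] != nullptr) ? std::string(argv[1]) : defaultOpts;
}
```

In a real main program the returned string would then be handed to the application manager as the value of its job options property.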
Program execution
All of the code before line 28 is essentially for setting up the job. Once this is done, a call to appMgr->run() is all that is needed to start the job proper! The steps that occur within this method are discussed briefly in section 4.6.
4.4 Configuring the job
The application framework makes use of a job options file for job configuration. Part of the job options file of an example application is shown in Listing 4.2.
The format of an options file is discussed fully in Chapter 11. Options may be set both for algorithms and services and the list of available options for standard components is given in Appendix B. Here we look briefly at a few of the more commonly used options.
4.4.1 Defining the algorithms to be executed
The option ApplicationMgr.TopAlg (line 7) is a list of algorithms that will be created and controlled directly by the application manager, the so-called top-level algorithms. The syntax is a list of the form:
ApplicationMgr.TopAlg = { "Type1/Name1", "Type2/Name2" };
The line above instructs the application manager to create two top level algorithms. One of type Type1 called "Name1" and one of type Type2 called "Name2".
In the case where the name of the algorithm is the same as the algorithm's type (i.e. class), only the class name is necessary. In the example, an instance of the class "ReadAlg" will be created with name "ReadAlg".
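The "Type/Name" convention can be illustrated with a small parsing sketch. The function splitTypeAndName is hypothetical and is not the framework's own parser; it only models the rule that the instance name defaults to the type when no name is given.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not part of Gaudi): splits one TopAlg entry of the
// form "Type/Name" into its type and its instance name. If no '/' is present
// the instance name defaults to the type, as described in the text.
std::pair<std::string, std::string> splitTypeAndName(const std::string& entry) {
  assert(!entry.empty());  // an empty entry names no algorithm
  const std::string::size_type slash = entry.find('/');
  if (slash == std::string::npos) {
    return {entry, entry};
  }
  return {entry.substr(0, slash), entry.substr(slash + 1)};
}
```

With this sketch, "Type1/Name1" yields the pair (Type1, Name1), while "ReadAlg" yields (ReadAlg, ReadAlg).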
4.4.2 Defining the job input
Event data input is controlled by an EventSelector. The EventSelector uses a storage technology dependent data persistency service to load the data into the transient event data store, with the help of converters which are able to convert the data from the technology dependent persistent representation, to the technology independent representation in the transient data store.
In order to set up this mechanism, one needs a number of job options:
- Line 14 defines the input data file, and the persistency technology (ROOT I/O in this example).
- Line 6 tells the application manager to create a new event conversion service, to be called RootEvtCnvSvc. Note that this is just a name chosen for our convenience: the service is of type DbEventCnvSvc and does not (yet) know that it will deal with ROOT technology. The configuration of RootEvtCnvSvc to use the ROOT I/O technology is done in line 22.
- Line 19 tells the event persistency service (EventPersistencySvc created by the application manager by default) to use the RootEvtCnvSvc to do the conversion between persistent and transient data representations.
- Line 5 tells the application manager which additional libraries to load in order to find the required conversion service. In this example, the GaudiDb library contains the DbEventCnvSvc class, the GaudiRootDb library contains the ROOT specific database drivers.
- Finally, the options on lines 15 and 16 tell the EventSelector to start reading sequentially from the first event in the file, for five events.
In the special case where no event input is required (e.g. for event generation), one can replace the above options by the two options:
ApplicationMgr.EvtMax = 20; // events to be processed (default is 10)
ApplicationMgr.EvtSel = "NONE"; // do not use any event input
A discussion of event I/O can be found in Chapter 10. Converters and the conversion process are described in Chapter 13.
4.4.3 Defining job output
One can consider three types of job output: event data (including event collections and n-tuples), statistical data (histograms) and printout. Here we discuss only the simplest (printout); histograms are discussed in Chapter 9, event data in Section 6.10.1, event collections in Section 10.3.1.
Printout in Gaudi is handled by the message service (described in Chapter 11), which allows the amount of printout to be controlled according to severity level. The global threshold for printout is set by the option on line 10; in this example only messages of severity level WARNING or above will be printed. This can be overridden for individual algorithms or services, as in line 11, where the threshold for EventSelector is set to DEBUG.
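The interplay between the global threshold and a per-component threshold can be modelled as follows. This is a sketch under stated assumptions: the Level enum and the OutputFilter class are stand-ins invented for illustration and are not the message service interface; only the level names and their ordering by severity are taken from the framework.

```cpp
#include <map>
#include <string>

// Severity levels in increasing order of severity. The enum itself is a
// stand-in for this sketch.
enum class Level { VERBOSE = 1, DEBUG, INFO, WARNING, ERROR, FATAL };

// Hypothetical model of threshold filtering (not the Gaudi message service).
class OutputFilter {
public:
  explicit OutputFilter(Level global) : m_global(global) {}

  // Override the global threshold for one named algorithm or service.
  void setThreshold(const std::string& source, Level level) {
    m_perSource[source] = level;
  }

  // A message is printed if its level is at or above the threshold that
  // applies to its source: the per-source one if set, else the global one.
  bool isPrinted(const std::string& source, Level level) const {
    const auto it = m_perSource.find(source);
    const Level threshold = (it != m_perSource.end()) ? it->second : m_global;
    return level >= threshold;
  }

private:
  Level m_global;
  std::map<std::string, Level> m_perSource;
};
```

With a global threshold of WARNING and a DEBUG threshold for EventSelector, an INFO message from any other component is suppressed, whereas a DEBUG message from EventSelector is printed.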
4.5 Algorithms
The subject of specialising the Algorithm base class to do something useful will be covered in detail in Chapter 5. Here we will limit ourselves to looking at an example HelloWorld class.
4.5.1 The HelloWorld.h header file
The HelloWorld class definition is shown in Listing 4.3.
· The class is derived from the Algorithm base class, as all specialised algorithm classes must be. This implies that the Algorithm.h file must be included (line 6).
· All derived algorithm classes must provide a constructor with the parameters shown in line 9. The first parameter is the name of the algorithm and is used amongst other things to locate any options that may have been specified in the job options file.
· The HelloWorld class has three (private) data members, defined in lines 18 to 20. These are properties that can be set via the job options file.
· The three methods on lines 12 to 14 must be implemented, since they are pure virtual in the base class.
4.5.2 The HelloWorld implementation file
The implementation file contains the actual code for the constructor and for the methods initialize(), execute() and finalize(). It also contains two lines of code for the HelloWorld factory, which we will discuss in section 5.3.1.
The constructor
The constructor must call the base class constructor, passing on its two arguments. As usual, member variables should be initialised. Here we declare and initialise the member variables that we wish to be set by the job options service. This is done by calling the declareProperty() method.
Initialisation
The application manager invokes the sysInitialize() method of the algorithm base class which, in turn, invokes the initialize() method of the base class, the setProperties() method, and finally the initialize() method of the concrete algorithm class. As a consequence all of an algorithm's properties will have been set before its initialize() method is invoked, and all of the standard services such as the message service are available. This is discussed in more detail in Chapter 5.
Looking at the code in the example (Listing 4.4) we see that we are now able to print out the values of the algorithm's properties, using the message service and the MsgStream utility class. A local MsgStream object is created (line 3), which uses the Algorithm's standard message service via the msgSvc() accessor, and the algorithm's name via the name() accessor. The use of these is discussed in more detail in Chapter 11.
Note that the job will stop if the initialize() method of any algorithm does not return StatusCode::SUCCESS. This is to avoid processing with a badly configured application.
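The order of calls during initialisation, and the rule that one failure stops the job, can be sketched with a simplified model. All class names here (Status, AlgorithmModel, HelloWorldModel) are stand-ins invented for this sketch, not the Gaudi classes; in particular the model constructor takes only a name, whereas the real constructor takes two arguments.

```cpp
#include <string>
#include <vector>

// Stand-in for StatusCode::SUCCESS and FAILURE; not the Gaudi class.
enum class Status { SUCCESS, FAILURE };

// Simplified stand-in for the Algorithm base class. It models only the call
// order described in the text.
class AlgorithmModel {
public:
  explicit AlgorithmModel(const std::string& name) : m_name(name) {}
  virtual ~AlgorithmModel() = default;

  // Called by the application manager: base class set-up first, then the
  // properties, and only then the concrete algorithm's initialize().
  Status sysInitialize() {
    m_calls.push_back("base initialize");
    m_calls.push_back("setProperties");
    return initialize();
  }

  const std::string& name() const { return m_name; }
  const std::vector<std::string>& calls() const { return m_calls; }

protected:
  virtual Status initialize() = 0;
  std::vector<std::string> m_calls;  // records the order of the calls

private:
  std::string m_name;
};

class HelloWorldModel : public AlgorithmModel {
public:
  using AlgorithmModel::AlgorithmModel;

protected:
  Status initialize() override {
    m_calls.push_back("HelloWorld initialize");
    return Status::SUCCESS;
  }
};

// Initialises the algorithms in order and reports whether the job may
// proceed: the first failure stops the job.
bool initializeAll(const std::vector<AlgorithmModel*>& algs) {
  for (AlgorithmModel* alg : algs) {
    if (alg->sysInitialize() != Status::SUCCESS) return false;
  }
  return true;
}
```

Because setProperties is recorded before the concrete initialize(), the model reflects why an algorithm can already print its property values from within initialize().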
Execution
The execute() method is called by the application manager once for every event. This is where most of the real action should take place. The trivial HelloWorld class just prints out a message... Note that the method must return StatusCode::SUCCESS on successful completion. If a particular algorithm returns a FAILURE status code more than a (configurable) maximum number of times, the application manager will decide that this algorithm is badly configured and jump to the finalisation stage before all events have been processed.
Finalisation
The finalize() method is called at the end of the job. In this trivial example a message is printed.
4.6 Job execution
From the main program and the CMT requirements file we can make an executable, as explained in section 3.5. This executable together with the file of job options form a job which may be submitted for batch or run interactively. Figure 4.2 shows a trace of an example program execution. The diagram is not intended to be complete, merely to illustrate a few of the points mentioned earlier in the chapter.
Figure 4.2 A sequence diagram showing a part of the execution of an example program.
1. The application manager instantiates the required services and initialises them. The message service is initialised first to allow the other services to use it, and the job options service second so that the other services may be configured at run time.
2. The algorithms which have been declared to the application manager within the job options (via the TopAlg option) are created. We denote these algorithms "top-level" as they are the only ones controlled directly by the application manager. For illustration purposes we instantiate an EmptyAlgorithm and a HistoAlgorithm.
3. The top-level algorithms are initialised. Their properties (if they have any) are set and they may make use of the message service. If any algorithm fails to initialise, the job is stopped.
4. The application manager now starts to loop over events. After each event is read, it executes each of the top level algorithms in order. The order of execution of the algorithms is the order in which they appear in the TopAlg option. This will continue until the required number of events has been processed, unless one or more of the algorithms return a FAILURE status code more than the maximum number of times, in which case the application manager will jump to the finalisation stage before all events have been processed.
5. After the required data sample has been read the application manager finalises each top level algorithm.
6. Services are finalised.
7. All objects are deleted and resources freed. The program terminates.
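The event loop of step 4, including the early exit on repeated failures, can be sketched as follows. This is a model written for illustration and not the application manager's code: the types TopAlgModel and JobSummary, the function runEventLoop and the parameter maxFailures are all hypothetical names.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of a top-level algorithm (not a Gaudi class).
struct TopAlgModel {
  std::string name;
  std::function<bool(int)> execute;  // returns true on success for an event
  int failures = 0;                  // FAILURE codes returned so far
};

struct JobSummary {
  int eventsProcessed = 0;    // events for which the whole sequence ran
  bool stoppedEarly = false;  // true if the failure limit ended the loop
};

// Loops over events, running the algorithms in the order given (the TopAlg
// order). If any algorithm fails more than maxFailures times the loop ends
// early, and the caller would proceed to the finalisation stage.
JobSummary runEventLoop(std::vector<TopAlgModel>& algs, int evtMax,
                        int maxFailures) {
  JobSummary summary;
  for (int evt = 0; evt < evtMax; ++evt) {
    for (TopAlgModel& alg : algs) {
      if (!alg.execute(evt) && ++alg.failures > maxFailures) {
        summary.stoppedEarly = true;
        return summary;
      }
    }
    ++summary.eventsProcessed;
  }
  return summary;
}
```

In this model a single failure does not stop the job; only exceeding the limit does, which mirrors the behaviour described for the application manager.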
4.7 Examples distributed with Gaudi
A number of examples are included in the current release of the framework, in the GaudiExamples package. The package has some sub-directories in addition to the standard ones shown in Figure 16.3. The options sub-directory contains files of standard job options common to many examples. These files are included in the job options of the specific examples when necessary. The specific job options files can be found in the home sub-directory.
The code of the examples is in sub-directories of the src directory, one sub-directory per example. The intention is that each example demonstrates how to make use of some part of the functionality of the framework. The list of available examples is shown in Table 4.1.
4.8 Additional LHCb specific examples
The examples described so far are rather simple and do not contain any specific knowledge about the LHCb event and detector data. A set of LHCb specific examples is provided in the Ex group of packages, as listed in Table 4.2.
All examples share a single main program and some default job options, which can be found in the GaudiConf package.
4.8.1 Simple Physics Analysis Example
The algorithms in the examples of Table 4.2 use many of the Gaudi services that someone doing physics analysis would want to use: histograms, ntuples, creating and retrieving private transient data, retrieving particle properties (like mass values), etc. Detailed examples of how to use the specific services are provided in the topical examples, but in the SimpleAnalysisExample they are combined together. Tools that support more elegant and more complex physics analysis are under development, and their concrete implementation will be part of DaVinci, the OO Physics Analysis Program. A trivial algorithm similar to the SimpleAnalysisAlgorithm, but implemented using tools, is provided in the ToolAnalysisExample.
The SimpleAnalysisAlgorithm is an example in which pi+ pi- invariant masses are computed while requiring the component particles to satisfy some simple kinematic and quality cuts. Private containers of the particles satisfying successive cuts are created and filled (charged particles, detection in the silicon, best particle ID). Invariant masses are computed and the corresponding histograms are filled for all combinations of the final private containers, for combinations with Pt of both pions greater than a cut value and for combinations with impact parameter of both pions greater than a cut value. The Pt and impact parameter cut values are properties of the algorithm and as such can be specified in the jobOptions, where the number is taken in Gaudi Units. CLHEP vector classes are used to evaluate transverse momentum and invariant masses as well as to calculate the impact parameter. When nominal mass values are required they are retrieved via the ParticlePropertySvc. Since a primary vertex is required, a "dummy" algorithm RecPrimaryVertex retrieves the Monte Carlo primary vertex and uses the quantities to fill a MyVertex object (/Event/MyAxVertices), which is then retrieved by the SimpleAnalysisAlgorithm. Since the MyVertex object is created and registered in the Transient Event store by the RecPrimaryVertex algorithm, the order of RecPrimaryVertex and SimpleAnalysisAlgorithm in the jobOptions file is very important: RecPrimaryVertex must run first. A protection is put in place so that the SimpleAnalysisAlgorithm will return a failure code if not all of the necessary input data exist in the store.
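The kinematic quantities mentioned above can be written out explicitly. The following is a minimal sketch and not the SimpleAnalysisAlgorithm code: the example itself uses CLHEP classes, whereas the FourMomentum struct and the three functions here are stand-ins, and the z axis is assumed to be the beam direction.

```cpp
#include <cmath>

// Minimal four-momentum stand-in (the example itself uses CLHEP classes).
struct FourMomentum {
  double px, py, pz, e;
};

// Transverse momentum: the momentum component perpendicular to the z axis.
double pt(const FourMomentum& p) { return std::hypot(p.px, p.py); }

// Invariant mass of a two-particle combination, from m^2 = E^2 - |p|^2 of
// the summed four-momentum.
double invariantMass(const FourMomentum& a, const FourMomentum& b) {
  const double e = a.e + b.e;
  const double px = a.px + b.px;
  const double py = a.py + b.py;
  const double pz = a.pz + b.pz;
  const double m2 = e * e - (px * px + py * py + pz * pz);
  return m2 > 0.0 ? std::sqrt(m2) : 0.0;  // guard against rounding below zero
}

// True if both pions pass the transverse momentum cut.
bool passesPtCut(const FourMomentum& a, const FourMomentum& b, double ptCut) {
  return pt(a) > ptCut && pt(b) > ptCut;
}
```

All quantities must be expressed in the same units; in the real algorithm the cut values from the job options are interpreted in Gaudi Units.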
When doing physics analysis on Monte Carlo data, it is necessary to compare the reconstructed decay with the Monte Carlo truth in order to calculate efficiencies. The MCDecayFinder algorithm is an example of how to find any one-step decay. The parent of the decay and the list of its direct descendants are properties of the algorithm and can be specified in the jobOptions file. If no decay is specified in the jobOptions this example will look for a B0->pi+pi- decay. The Algorithm will retrieve the particle Geant3 ID from the ParticlePropertySvc (the identifying particleID in MCParticles) and search the MCParticles to find the requested parent and check that it has the correct type and number of decay products. If a decay is found, kinematic variables are stored in an ntuple that can be accessed by PAW. In addition the Algorithm uses the Message service with DEBUG or INFO levels to print a summary of its behaviour for each event as well as for the job.
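The matching of a one-step decay can be sketched as follows. This is an illustration under stated assumptions and not the MCDecayFinder code: McParticleModel is a hypothetical stand-in for a Monte Carlo particle, and the integer IDs used with it are arbitrary placeholders rather than real Geant3 codes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a Monte Carlo particle: its particle ID and the
// IDs of its direct decay products. Not the MCParticle class.
struct McParticleModel {
  int id;
  std::vector<int> daughterIds;
};

// True if the particle is the requested parent and its direct descendants
// match the requested list in both type and number (order is ignored).
bool matchesDecay(const McParticleModel& p, int parentId,
                  std::vector<int> wanted) {
  if (p.id != parentId || p.daughterIds.size() != wanted.size()) return false;
  std::vector<int> found = p.daughterIds;
  std::sort(found.begin(), found.end());
  std::sort(wanted.begin(), wanted.end());
  return found == wanted;
}

// Counts the one-step decays of the requested kind among the particles.
int countDecays(const std::vector<McParticleModel>& particles, int parentId,
                const std::vector<int>& wanted) {
  return static_cast<int>(std::count_if(
      particles.begin(), particles.end(), [&](const McParticleModel& p) {
        return matchesDecay(p, parentId, wanted);
      }));
}
```

Sorting both lists before comparing them makes the match independent of the order in which the decay products are stored, while still requiring the correct multiplicity of each type.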