To gain some experience with the GRID, we started a mini-project between CERN, RAL and Liverpool.

Objectives

  1. Globus is installed and tested at CERN, RAL and Liverpool.
  2. Members of the GRID working group are given access to their respective testbeds. Members cross-check that they can run jobs on each other's machines (a minimal check of this kind is sketched after this list).
  3. We make sure that SICBMC can be run at CERN, RAL and Liverpool. (SICBDST requires input tapes; we propose to defer it until later.) The presence of AFS may turn out to be a requirement, to guarantee that we all run the same executable.
  4. We verify that the data produced by SICBMC can be shipped back to CERN. Between CERN and RAL this can be done using the "TAPE" command; perhaps a modified version can be used for Liverpool.
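
As a concrete starting point for objective 2, a cross-site check could look like the sketch below. It assumes the standard Globus command-line tools (grid-proxy-init and globus-job-run) are installed on the submitting machine; the fully qualified host names are assumptions based on the testbed machines in the status table.

    # Create a proxy certificate from the user's GRID credentials;
    # all subsequent Globus commands authenticate with this proxy.
    grid-proxy-init

    # Run a trivial command on each remote testbed machine to confirm
    # that authentication and job submission work from this site.
    globus-job-run pcrd25.cern.ch /bin/hostname
    globus-job-run csflnx01.rl.ac.uk /bin/hostname
    globus-job-run hep52.ph.liv.ac.uk /bin/hostname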

Status

  1. Installation of Globus 1.1.3
        CERN: available as of September 2000.
        RAL: available and working as of July 2000, thanks to Andrew Samsun.
        Liverpool: available as of September 2000.
  2. Globus script for SICBMC
        CERN: --
        RAL: running, thanks to Glenn Patrick.
        Liverpool: --
  3. Globus accounts for GRID WG members
        CERN: Chris B, Glenn P, Eric vH.
        RAL: Chris B, Glenn P, Eric vH.
        Liverpool: Girish P, Chris B, Glenn P, Eric vH.
  4. Testing the Globus script that copies the SICBMC executable and runs it on the respective machines
        lxplus009 -> pcrd25: ok
        lxplus009 -> csflnx01.rl.ac.uk: executable runs, script not ok
        lxplus009 -> hep52.ph.liv.ac.uk: executable runs, script not ok
        csflnx01.rl.ac.uk -> csflnx01.rl.ac.uk: ok
        csflnx01.rl.ac.uk -> pcrd25: ??
        csflnx01.rl.ac.uk -> hep52.ph.liv.ac.uk: ??
        hep52.ph.liv.ac.uk -> rest: ??
  5. Testing a Globus script that executes SICBMC where the executable was previously copied
        lxplus009 -> pcrd25: ok
        lxplus009 -> csflnx01.rl.ac.uk: ok
        lxplus009 -> hep52.ph.liv.ac.uk: ok
  6. Sending output data back to the originating computer
        pcrd25 -> lxplus009: ok
        RAL -> RAL: ok

Conclusions

  1. It took a long time before Globus 1.1.3 was working at CERN.
  2. Several scripts were created: myglobus.sh (lxplus009 -> pcrd25), ralglobus.sh (lxplus009 -> csflnx01.rl.ac.uk), ralmyglobus.sh (csflnx01.rl.ac.uk -> pcrd25), mapglobus.sh (lxplus009 -> hep52.ph.liv.ac.uk).
  3. The objective of these scripts is to copy the sicbmc executable, database, cards file and job script to the remote computer, run a one-event minimum bias job there, and send the log and data files back to the computer from which the job was submitted (the intended workflow is sketched after this list).
  4. The script that runs sicbmc (/afs/cern.ch/user/e/evh/public/sicbmc.job) sets things up in a machine-independent way and copies all the required files and executables from AFS (this means that you need AFS access to run my scripts). I verified that the executable runs correctly on pcrd25 and csflnx01.
  5. The globus-rcp command is very unreliable. Small files are copied correctly, but for larger files only part of the file arrives, with no error message. Often, on a second invocation, we get the message:

    "GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)"

  6. When doing a file transfer, Globus uses the tar command and therefore requires a lot of temporary disk space.
  7. The script myglobus.sh ran correctly between lxplus009 and pcrd25: no error messages, and the output file and output dataset were transferred correctly.
  8. When running between lxplus009 and csflnx01, I got the following two fatal messages (repeated several times):

    "GRAM Job submission failed because the job manager cannot find the user proxy (error code 20)"
    "GRAM Job submission failed because the job manager failed to open stdout (error code 73)"

    On the Globus discussion list, there is a message describing a possible solution.

  9. Running between csflnx01 and lxplus009, ralmyglobus.sh hangs, perhaps because I have insufficient disk space at RAL.
  10. When copying the executables via ftp and issuing only the globusrun command, the scripts managed to run SICBMC and transfer the data back to the place of submission. For production this is what we want to do anyhow, to save network resources (this is the variant sketched below).
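
The intended workflow of these scripts (conclusions 3 and 10), including a guard against the silent truncation by globus-rcp noted in conclusion 5, could be sketched as follows. This is a minimal sketch, not the actual myglobus.sh: the remote work directory, file names and the size check are assumptions, and globus-rcp is assumed to take rcp-style host:path arguments.

    #!/bin/sh
    # Hypothetical host and paths, for illustration only.
    REMOTE=csflnx01.rl.ac.uk
    RDIR=/scratch/evh

    grid-proxy-init                 # authenticate once per session

    # Stage the executable, database and cards file. Since globus-rcp
    # can copy only part of a large file without any error message,
    # compare local and remote sizes after each copy.
    for f in sicbmc.exe sicbmc.cards dbase.dat; do
        globus-rcp $f $REMOTE:$RDIR/$f
        lsize=`wc -c < $f | awk '{print $1}'`
        rsize=`globus-job-run $REMOTE /usr/bin/wc -c $RDIR/$f | awk '{print $1}'`
        if [ "$lsize" != "$rsize" ]; then
            echo "$f was truncated by globus-rcp; copy it via ftp instead" >&2
            exit 1
        fi
    done

    # Run a one-event minimum bias job remotely; stdout goes to a log
    # file on the remote machine via the RSL stdout attribute.
    globusrun -r $REMOTE \
        "&(executable=$RDIR/sicbmc.exe)(directory=$RDIR)(stdout=$RDIR/sicbmc.log)"

    # Ship the log and the output dataset back to the submitting machine.
    globus-rcp $REMOTE:$RDIR/sicbmc.log .
    globus-rcp $REMOTE:$RDIR/sicbmc.dst .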

Further work to do

  1. Make a single script out of the four listed above. Can anybody help me with this?
  2. Modify sicbmc.job so that it copies the data back to CERN, writes it to tape and updates the bookkeeping database.
  3. Try to use local batch systems through Globus (PBS, LSF, MAP etc.); see the sketch below.
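
For item 3, GRAM addresses a site's batch system through the job manager named in the resource contact. A minimal sketch, assuming the sites have configured such job managers (the contact strings and host names below are assumptions, not tested values):

    # The default contact uses the fork job manager; appending a job
    # manager name routes the job into the local batch system instead.
    globus-job-run csflnx01.rl.ac.uk/jobmanager-pbs /bin/hostname
    globus-job-run lxplus009.cern.ch/jobmanager-lsf /bin/hostname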


This page last edited by EvH on December 20, 2000.