Information management tools and procedures: a discussion document

LHCb Technical Note                                                                                                                    

Issue: Version 1                                                                                                                        
Revision: 1                                                                                                                             
Reference: LHCb COMP 98-xxx                                                                                               
Created: 18th April 1998                                                                                                                 
Last modified: Tuesday, 3 May 1998                                                                                       
Prepared By: LHCb Computing Group                                                                                            
Eric van Herwijnen

Abstract

Document Status Sheet

Table 1 Document Status Sheet

1. Document Title: [Project Name Qualification] User Requirements Document
2. Document Reference Number: [Document Reference Number]
3. Issue 4. Revision 5. Date 6. Reason for change

Table of Contents

1. Introduction

1.1. Definition of the "LHCb Information Management" project

1.2. Type of information to be communicated

1.3. Needs for notification

1.4. Confidentiality

1.5. Tools for accessing information

1.6. Current status

2. Paper documentation

3. TDR production policy

4. Web pages

4.1. Security

4.2. PIE

4.3. Document archive

4.4. Support and maintenance items

 

  1. Introduction

  Communication within a collaboration needs tools for a variety of functions, and those tools must be easy to use. This note attempts to identify possible extensions and improvements to the current situation (as described in the notes "Introduction to documentation for LHCb", "Design of the LHCb web" and "Webmaster's manual") and to lay out a plan for future developments in this area.

    1.1. Definition of the "LHCb Information Management" project

As you can see from this document, CERN has already made a considerable investment in the LHCb information management infrastructure. Therefore, before any of this work is redone or any new work is started, including the items listed below, we should first discuss:

    1. Whether it would be easier to improve the current solution.
    2. Whether it is worth the effort to put something new in place.
    3. Who will do the work of putting the new solution in place.
    4. Who will maintain the new system.

I suggest that the LHCb Information Management project comprises the following areas:

    1. Definition of a policy for forthcoming important reports by the collaboration (TDR's).
    2. The Web server. As described in the previous chapters, the LHCb Web server fulfills the functions of an HTML page server, a document archive, a mail server, a news-group server and the collaboration database. I suggested to the collaboration that these functions be united on a single machine to minimize maintenance, to give a consistent user interface, and to maximize our independence from other services for these critical functions.
    3. Procedures connected to Information management.
    4. Recommendation of Information management tools.
    1.2. Type of information to be communicated

There are several types of information to be distributed, which correspond to various wishes for delivery:

    • Announcements of meetings, typically with an agenda. Valid up to the meeting. Should reach everyone interested in the meeting, but not necessarily the whole collaboration every time. Should be readable on all platforms. Applies to all sorts of events, not only meetings.
    • Minutes of meetings. Valid forever, may not need to be distributed, but just made available e.g. on the Web, and announced. Classified by type of meeting.
    • Technical documents, notes, figures, blueprints. Not distributed, but should be easy to print. However, they need to be classified so that retrieval is easy.
    • Discussion list. Informal exchange of ideas, for sub-groups.
    • General News items.
    • People news.
        1.3. Needs for notification

          Notification is the part of an information system that tells the user that new information has arrived, either addressed to him (mail) or of interest to him (news). Notification can be done at ‘login’ time (number or list of unseen documents), but could also be done in real time. If a maximum delay for informing the user is guaranteed, notification can then be used for meeting reminders, last-minute changes to an agenda and so on. But with high traffic, notification can be a nuisance, so its level should be under the user’s control.

          Information can be pertinent only for a given subset of the collaboration, for example Sub-detector related messages. This is mainly for notification, as it is also useful to be able to get access to the whole information in the collaboration. Registering and de-registering to a category of information should be easy.

          Notification implies the existence of a tag that specifies whether or not a piece of information has been seen by a user. This is a 2-dimensional array (users by documents), which can be big!
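
          As a rough illustration of the size concern, the seen/unseen tag need not be stored as a dense array. The Python sketch below keeps only the (user, document) pairs actually marked as seen, and restricts notification to the categories a user has registered for. The class and method names (`NotificationTracker`, `register`, `deregister`, `post`, `mark_seen`, `unseen`) and the sample users and documents are hypothetical; they do not describe any existing LHCb tool.

```python
from collections import defaultdict


class NotificationTracker:
    """Sparse seen/unseen bookkeeping (hypothetical sketch, not an LHCb tool)."""

    def __init__(self):
        self.subscriptions = defaultdict(set)  # user -> categories registered for
        self.documents = {}                    # document id -> category
        self.seen = defaultdict(set)           # user -> ids of documents already seen

    def register(self, user, category):
        self.subscriptions[user].add(category)

    def deregister(self, user, category):
        self.subscriptions[user].discard(category)

    def post(self, doc_id, category):
        self.documents[doc_id] = category

    def mark_seen(self, user, doc_id):
        self.seen[user].add(doc_id)

    def unseen(self, user):
        """Documents in the user's categories that the user has not seen yet."""
        return sorted(
            doc_id
            for doc_id, category in self.documents.items()
            if category in self.subscriptions[user]
            and doc_id not in self.seen[user]
        )


# Illustrative data only.
tracker = NotificationTracker()
tracker.register("alice", "sub-detector")
tracker.post("agenda-001", "sub-detector")
tracker.post("minutes-001", "computing")
tracker.post("agenda-002", "sub-detector")
tracker.mark_seen("alice", "agenda-001")
print(tracker.unseen("alice"))  # ['agenda-002']
```

          With this layout the storage grows with the number of documents actually read, not with the product of users and documents, and registering or de-registering for a category is a single set operation.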

        1.4. Confidentiality

          It is clear that some documents should be restricted to members of the collaboration, but not all, as some 'commercial' plots or documents should be world accessible. Protection by a generic password is not very solid, as the password is known by everyone who has ever worked in the collaboration, including part-time workers such as technical staff in the home labs. It is also very difficult to change without either a lot of complaints or a prior broadcast of the new password, which then makes the change partly useless.

        1.5. Tools for accessing information

      We should use a commercial tool for displaying information. The information can be sent to the local host (unknown delay to arrive, but easy to access after), or directly accessed on a central server, with the problems of bad/slow/unavailable network. There are three existing systems, some advantages and disadvantages of which are listed below.

        1. Web server and browser.
        • No notification.
        • Difficult to maintain a list of unseen messages, as the user name is not (usually) known by the Web server.
        • Document posting requires a special procedure.
        • User authentication usually by a 'generic' password. Protection depends on the Web server.
        • Can access old messages, in any class.
        • Old/obsolete messages archived by a Webmaster.
        • Central server.
        2. Mail messages.
        • Notification OK.
        • Seen/Unseen flag exists in some mail programs (e.g. Netscape).
        • Document posting via a mailing-list server. Maintenance
        • User authentication by the user login.
        • Cannot access old messages, nor lists one has not subscribed to.
        • Old messages removed by the user when he wants.
        • Unknown time for delivery.
        3. News messages on a dedicated news server.
        • Notification OK.
        • Unseen tag by a local file on the reader.
        • Document posting by the news protocol.
        • User authentication by a 'generic' password.
        • Can access old messages, in any class.
        • Old messages archived by their expiration date, either specified when posting or handled by the news server.
        • Central server. But can be easily replicated, at the expense of more 'risks' on confidentiality.

       

      As an example, the client sides of these three systems are integrated in Netscape, so their look and their ability to display graphics are quite similar.

        1.6. Current status

      The current LHCb system is based essentially on MAIL, with a part on the Web. Subscription to the lists is done by editing BWHO, which requires a password that one tends to forget. The way to post information (the address of the correct mailing list) is not widely known, and the selection of the correct distribution list is far from adequate. As a result, private mailing lists are built and used.

      It is also not easy to see whether a message was sent to YOU as a single person, or to you as a member of a distribution list. Lastly, multipart messages are not properly propagated on the mailing lists, but this can clearly be fixed.

      Several attempts have been made to obtain all the functions in one tool: the newswatcher to inform users of new Web pages, a private news server, and a news-like interface on a dedicated Web server.

      Information in LHCb is delivered using:

        1. Paper documents.
        2. Web pages.
        3. Email.
        4. Discussion groups and News.

      We believe this situation will continue to exist in the foreseeable future.

      2. Paper documentation
        1. We need to make a list of what to document.
        2. A procedure needs to be defined for TDR's. Items to consider are:
        • Choice of a single text formatter.
        • A consistent style of LHCb documents for the LHCC.
        • The necessity of an editorial board.
        • Use of the Web to distribute the latest version of contributions.
        • How to avoid problems printing PostScript.
        3. If we continue to use TeX for Technical Notes, then we need to make a LaTeX template to ensure a consistent style of LHCb technical notes.
        4. We should request a standard NICE NT installation of TeX.
        5. Considering the problems we have with PostScript, we should explore the use of an additional format for document archival.
        6. Do we need a quality control procedure for non-public documents?
        7. We need a procedure for release notes.
        8. Design a directory/subweb structure for archiving various document types on the Web.
        9. Create a single place for storing recommended figures for publications and presentations.
        10. Define/describe the procedure for publication in peer reviewed journals.
        11. Identify where training is required, in what tools/technology and who for.

       

      3. TDR production policy
        1. Selection/recommendation of system.
        2. Training requirements.
        3. Support.

       

       

      4. Web pages

      Under this point, the following issues should be discussed:

        1. Migration to FrontPage 98. Design of pages - review of structure of the Web.
        2. Should we move the Web server to the ALNTS1 server?
        3. FrontPage training requirements. Teach people how to make consistent pages and to integrate them with the Web server. For LHCb documents, people should use the LHCb server rather than their own server. Make it clear when something should be put on the Web.
        4. Performance issues. The Microsoft server is the slowest in benchmarks. Should we consider moving to Apache or Netscape? When?
        5. Configure test Web server.
        4.1. Security
        1. Protection of pages on the Web with the userid lhcb may not be sufficient.
        2. How should the BWHO userid evolve?
        3. Do we need secure session management as offered by the Netscape Web server?
      4.2. PIE
        1. When should we move to PIE (in any case we can't before August)?
        2. How will PIE coexist with BWHO?
        3. Phase-out plan for BWHO?
      4.3. Document archive

        1. Archival of LHCb notes and other documents.
        2. Interface with other CERN projects, e.g. library preprint server.
        4.4. Support and maintenance items

      Who can take over this work from me?

      Annex A. Discussion by O. Callot on why we need a news server

      Mailing lists have a STRONG disadvantage: Access to information produced before you registered to the list, or to lists you have not registered to, is IMPOSSIBLE. MAIL should then be coupled with a permanent repository, storing all messages posted in the list, with tools to select them as a mail reader would do.

      Also, creating new mailing lists is not easy, and the procedure for registering to or canceling from mailing lists is not fully mastered by all the collaborators. And some people do not like to have all the meeting announcement traffic in the middle of their private mail.

      A better solution is a private news server. A generic password makes it possible to restrict access to members of the collaboration. Posting, reading old news or news on other 'lists' is easy, and so is creating new lists. Notification, by scanning the interesting lists every 5-10 mins, is feasible. Note that an LHCb news server exists; try it on axaonl.cern.ch with the "lhcb" user and the usual password. Of course this does not provide very solid functionality for long term archives, i.e. if all messages are kept, a new user will see thousands of messages and will have difficulty finding his way.

      The only good long term repository seems then to be the Web. However, adding a document to the Web should be easy, essentially automatic for all the traffic on the mailing lists. Tools to retrieve a message with some search string should be available, an Alta-Vista like search engine would be superb... An implementation using LEP tools is available at http://axaonl.cern.ch/VAXNEWS using username "lhcb/xxxx" (xxxx being a string which identifies you) and the usual LHCb password.
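
      The retrieval requirement can be illustrated with a very small Python sketch: every archived message is indexed by the words it contains, and a query returns the messages that contain all of the query words. The `MessageArchive` class, its methods and the list names in the example are hypothetical. This is not how the LEP tools or the VAXNEWS interface work; it is only an illustration of searching archived list traffic.

```python
import re
from collections import defaultdict


class MessageArchive:
    """Toy archive with a word index (hypothetical sketch)."""

    def __init__(self):
        self.messages = {}             # message id -> (list name, text)
        self.index = defaultdict(set)  # lower-case word -> message ids

    def add(self, msg_id, list_name, text):
        """Store a message and index each word it contains."""
        self.messages[msg_id] = (list_name, text)
        for word in re.findall(r"\w+", text.lower()):
            self.index[word].add(msg_id)

    def search(self, query):
        """Return ids of messages containing every word of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        hits = set(self.messages)
        for word in words:
            hits &= self.index.get(word, set())
        return sorted(hits)


# Illustrative data only; the list names are made up.
archive = MessageArchive()
archive.add(1, "lhcb-computing", "Agenda for the computing meeting")
archive.add(2, "lhcb-general", "Minutes of the collaboration meeting")
archive.add(3, "lhcb-computing", "Release notes for the simulation")
print(archive.search("meeting"))            # [1, 2]
print(archive.search("computing meeting"))  # [1]
```

      If `add` were called automatically for every message passing through the mailing lists, the archive would fill itself without any action from the poster, which is the "essentially automatic" behaviour asked for above.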

      Posting is more of a problem. The Web protocol does not allow easy posting of documents; a few lines of plain text is the maximum one can reasonably do. Mail can specify the category by using different mail addresses (as now), but cannot specify other parameters like an expiration date. One could imagine forcing the text sent to the mail distributor to start with some technical lines, like list name and expiration, but this is not very practical. News is the best solution for posting, even if the specification of an expiration date is not possible in some newsreaders like Netscape.
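
      To make the "technical lines" idea concrete, here is a minimal Python sketch of how a mail distributor might read such lines from the start of a message. The `List:` and `Expires:` keywords, the YYYY-MM-DD date format and the function names `parse_posting` and `is_expired` are assumptions made for illustration only; no such syntax is defined for the LHCb mailing lists.

```python
from datetime import date, datetime


def parse_posting(body):
    """Split leading 'Key: value' technical lines from the message text.

    Assumed (hypothetical) syntax: optional 'List:' and 'Expires:' lines,
    the latter as YYYY-MM-DD, followed by a blank line and the text.
    """
    params = {"list": None, "expires": None}
    lines = body.splitlines()
    consumed = 0
    for line in lines:
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if not sep or key not in params:
            break  # first line that is not a technical line: text starts here
        if key == "expires":
            params["expires"] = datetime.strptime(value.strip(), "%Y-%m-%d").date()
        else:
            params["list"] = value.strip()
        consumed += 1
    # Skip one separating blank line, if present.
    if consumed < len(lines) and not lines[consumed].strip():
        consumed += 1
    return params, "\n".join(lines[consumed:])


def is_expired(params, today):
    """A posting with no expiration date never expires."""
    return params["expires"] is not None and params["expires"] < today


# Illustrative message only.
message = "List: computing\nExpires: 1998-05-20\n\nMeeting moved to room 13."
params, text = parse_posting(message)
print(params["list"], params["expires"], text)
print(is_expired(params, date(1998, 6, 1)))  # True
```

      The sketch also shows why the approach is fragile: a mistyped keyword is silently treated as the first line of the text, and the poster has to remember the syntax for every message.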