DGI-2 Work Package 5

Development projects

Job-centric monitoring with AMon


DGI-2 work package 5, "Development Projects," implements the requirements formulated by the communities for new developments and for enhancements to existing components.

The activities concentrate on four areas: accounting, Customer Relationship Management (CRM), monitoring and GridSphere. The results of the individual work packages will be presented to the communities through published deliverables and through workshops dedicated to the individual focal points. This gives the communities the opportunity to discuss the work packages and provide feedback, which will then be taken into account in further activities.

Accounting in the D-Grid records how the individual users utilise the resources provided. It covers the resources of the D-Grid's backbone as well as those of the communities. D-Grid accounting is based on the Distributed Grid Accounting System (DGAS), which was originally developed in the DataGrid project. DGAS was adapted to the requirements of the D-Grid so that it records the use of compute resources independently of the middleware through which the usage was initiated. A client collects the accounting information from the batch system of each site and transfers it to the central accounting server. The connection between the client and the central accounting server is encrypted in order to protect the data against attacks on integrity and confidentiality.
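The client step described above, turning a batch-system entry into a middleware-independent usage record, can be sketched as follows. This is a minimal illustration, not DGAS code: it assumes a Torque/PBS-style accounting log line, and the `UsageRecord` structure and function names are hypothetical. The encrypted transfer to the accounting server is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageRecord:
    """Hypothetical middleware-independent usage record for one finished job."""
    job_id: str
    user: str
    queue: str
    cpu_seconds: int
    wall_seconds: int


def hms_to_seconds(value: str) -> int:
    """Convert a PBS-style 'HH:MM:SS' duration into seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_pbs_end_record(line: str) -> Optional[UsageRecord]:
    """Parse one Torque/PBS accounting log line.

    Returns a UsageRecord for job-end ('E') records and None for all other
    record types (queued, started, deleted, ...).
    """
    # Layout: 'MM/DD/YYYY HH:MM:SS;<type>;<job id>;<key=value key=value ...>'
    _timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    if record_type != "E":
        return None
    fields = dict(item.split("=", 1) for item in message.split() if "=" in item)
    return UsageRecord(
        job_id=job_id,
        user=fields["user"],
        queue=fields.get("queue", ""),
        cpu_seconds=hms_to_seconds(fields["resources_used.cput"]),
        wall_seconds=hms_to_seconds(fields["resources_used.walltime"]),
    )
```

For an end record containing `resources_used.cput=01:02:03 resources_used.walltime=02:00:00`, the sketch yields 3723 CPU seconds and 7200 wall-clock seconds. Because the record no longer refers to the submitting middleware, the same server-side processing applies regardless of how the job reached the batch system.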

Access to the accounting data requires strong authentication with X.509 certificates and follows a hierarchical model of roles. The primary focus of this work package is on developing new accounting metrics for further relevant resource types. Examples include accounting for the usage of storage resources, of software subject to licensing and of data that is subject to fees.
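A hierarchical role model of this kind can be illustrated with a small access check. The roles, their ordering and the record fields below are hypothetical assumptions chosen for illustration; the actual D-Grid role definitions are not given here. The principal's identity is assumed to be the subject DN of an already verified X.509 certificate.

```python
from dataclasses import dataclass
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical role hierarchy: a higher role inherits the rights of lower ones."""
    USER = 1        # own records only
    VO_MANAGER = 2  # additionally all records of the manager's VO
    SITE_ADMIN = 3  # additionally all records of the admin's site
    GRID_ADMIN = 4  # all records


@dataclass
class Principal:
    dn: str          # subject DN taken from the authenticated X.509 certificate
    role: Role
    vo: str = ""
    site: str = ""


@dataclass
class AccountingRecord:
    owner_dn: str
    vo: str
    site: str
    cpu_hours: float


def may_read(principal: Principal, record: AccountingRecord) -> bool:
    """Return True if the principal's role grants read access to the record."""
    if record.owner_dn == principal.dn:
        return True  # every role may read its own usage
    if principal.role >= Role.VO_MANAGER and principal.vo and record.vo == principal.vo:
        return True
    if principal.role >= Role.SITE_ADMIN and principal.site and record.site == principal.site:
        return True
    return principal.role >= Role.GRID_ADMIN


def visible_records(principal: Principal, records: list) -> list:
    """Filter a list of AccountingRecord objects down to what the principal may see."""
    return [r for r in records if may_read(principal, r)]
```

Using an ordered enum keeps the hierarchy in one place: granting a wider scope means adding a role and one comparison rather than duplicating checks.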

The Customer Relationship Management (CRM) work package analyses the data recorded by accounting. CRM makes it possible to link the individual pieces of data in context and interpret them. The information correlated in this way provides insights into trends in user behaviour, enabling resource providers to handle changing requirements and react to them in a timely manner. In the Grid context, the aim of CRM is to give resource providers the means to optimise their services for the communities by interpreting the available information. Knowledge of user behaviour should also facilitate the development of new services and the optimisation of resource utilisation.
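As a simple illustration of deriving a trend from accounting data, the sketch below sums CPU hours per community and month and reports the relative change between the two most recent months. The input format of `(community, "YYYY-MM", cpu_hours)` tuples and the month-over-month metric are assumptions for illustration; the source does not specify which analyses CRM actually performs.

```python
from collections import defaultdict


def monthly_usage(records):
    """Sum CPU hours per (community, month) from (community, 'YYYY-MM', hours) tuples."""
    totals = defaultdict(float)
    for community, month, cpu_hours in records:
        totals[(community, month)] += cpu_hours
    return dict(totals)


def growth_by_community(records):
    """Relative change in CPU hours between each community's two latest months with data.

    Communities with fewer than two months of data, or with zero usage in the
    earlier month, are omitted.
    """
    per_community = defaultdict(dict)
    for (community, month), hours in monthly_usage(records).items():
        per_community[community][month] = hours
    growth = {}
    for community, months in per_community.items():
        ordered = sorted(months)  # 'YYYY-MM' strings sort chronologically
        if len(ordered) < 2:
            continue
        previous, latest = months[ordered[-2]], months[ordered[-1]]
        if previous > 0:
            growth[community] = (latest - previous) / previous
    return growth
```

With records giving one community 100 CPU hours in January and 200 in February, and another 400 and then 300, the result is `{"astro": 1.0, "hep": -0.25}` (community names hypothetical): a provider could read the first as rising demand and the second as declining demand.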

The GridSphere portal framework provides a web portal that makes it quick and easy for developers to create web applications that run in GridSphere portal containers. Grid functionality in GridSphere is provided by the GridPortlet application, which builds on the GridSphere portal framework and can be used to add grid functionality to any GridSphere-based application. The GridSphere core provides a comprehensive set of portlets for configuring and administering the portal. The layout portlet creates the portal pages for guests and for logged-in users. The administration portlet handles the creation and administration of users and groups as well as the general portal settings. The aim of the development work is to improve the usability of the GridSphere portal framework and to extend it so that the grid infrastructure becomes easier to use for current and future user groups. Among other things, this includes integrating GridSphere into existing environments (e.g. Shibboleth and X.509 certificates) and improving support for Globus Toolkit 4.

The Grid is designed so that a single user can process a large number of jobs simultaneously on several, potentially different, computer systems. The user therefore needs an effective tool to track whether the jobs are running as intended. For this reason, job-centric monitoring with AMon collects a range of monitoring data about the job itself and about the computer system on which it runs. So that users can benefit from this data, it is prepared and presented in an intuitive, graphical web interface while the jobs are running. AMon can quickly provide information about jobs whose processing has failed or that show unusual execution behaviour. For example, the circular chart shown above gives an overview of the current status of the jobs. However, a more detailed analysis is useful not only for cancelled jobs. AMon therefore provides filters that help users identify irregularly behaving jobs as quickly as possible. The filters analyse the monitoring data and categorise potential errors. For example, they make it possible to identify jobs that have insufficient main memory or that have made no progress for a long period of time. The filter results are displayed using traffic-light colour coding.
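The filter idea can be sketched as a set of independent checks that each map monitoring data to a traffic-light colour. The monitored fields (`mem_used_mb`, `swap_used_mb`, `seconds_since_progress`) and all thresholds are hypothetical assumptions; AMon's actual metrics and limits are not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Light(Enum):
    GREEN = 0   # no indication of a problem
    YELLOW = 1  # worth a closer look
    RED = 2     # likely problem


@dataclass
class JobSample:
    """Hypothetical snapshot of monitoring data for one running job."""
    job_id: str
    mem_used_mb: float             # resident memory used by the job
    mem_available_mb: float        # main memory available to the job on its node
    swap_used_mb: float            # swap in use, a sign that main memory is too small
    seconds_since_progress: float  # time since the job's CPU time last increased


def memory_filter(s: JobSample, warn_ratio: float = 0.8) -> Light:
    """Flag jobs that swap or are close to exhausting main memory."""
    if s.swap_used_mb > 0 or s.mem_used_mb >= s.mem_available_mb:
        return Light.RED
    if s.mem_used_mb >= warn_ratio * s.mem_available_mb:
        return Light.YELLOW
    return Light.GREEN


def progress_filter(s: JobSample, warn_after: float = 600, alarm_after: float = 3600) -> Light:
    """Flag jobs whose CPU time has not increased for a long period."""
    if s.seconds_since_progress >= alarm_after:
        return Light.RED
    if s.seconds_since_progress >= warn_after:
        return Light.YELLOW
    return Light.GREEN


def classify(s: JobSample):
    """Run all filters; the overall light is the most severe individual result."""
    results = {"memory": memory_filter(s), "progress": progress_filter(s)}
    overall = max(results.values(), key=lambda light: light.value)
    return overall, results
```

Keeping each filter separate means the interface can show which category of potential error triggered a colour, while the overall light gives the quick at-a-glance status.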

Thanks to the recorded monitoring data, suspect jobs can be compared with other jobs or analysed in detail. This makes it possible, for example, to distinguish problematic jobs from "normal" ones, so that the user can react to problems at an early stage.
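One way to compare a job with its peers is a robust outlier score. The sketch below swaps in a median/MAD (median absolute deviation) z-score on a per-job metric such as CPU efficiency (CPU time divided by wall-clock time); this technique, the metric and the threshold of 3.5 are assumptions for illustration, not AMon's documented method.

```python
import math
from statistics import median


def robust_z(value: float, peers: list) -> float:
    """Robust z-score of `value` relative to `peers`, based on median and MAD."""
    med = median(peers)
    mad = median([abs(p - med) for p in peers])
    if mad == 0:
        # All peers identical: any deviation is infinitely unusual.
        return 0.0 if value == med else math.copysign(math.inf, value - med)
    # 1.4826 scales the MAD to the standard deviation of a normal distribution.
    return (value - med) / (1.4826 * mad)


def flag_outliers(efficiency: dict, threshold: float = 3.5) -> list:
    """Return the IDs of jobs whose metric deviates strongly from all other jobs."""
    flagged = []
    for job_id, value in efficiency.items():
        peers = [v for other, v in efficiency.items() if other != job_id]
        if len(peers) < 3:
            continue  # too few peers for a meaningful comparison
        if abs(robust_z(value, peers)) > threshold:
            flagged.append(job_id)
    return sorted(flagged)
```

The median and MAD are used instead of mean and standard deviation because a single badly behaving job would otherwise distort the reference it is being compared against. For CPU efficiencies of 0.95, 0.93, 0.97, 0.94 and 0.10, only the last job is flagged.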

All graphical displays are interactive: users can zoom in or click to display detailed information. AMon is already being used successfully with gLite. A monitoring architecture for recording job-related data is currently being developed for Globus Toolkit 4 so that the benefits of AMon will also be available with this middleware.