F&L Grid

A generic infrastructure for backup and recovery

Figure: Backup and recovery prototype

The aim of the F&L Grid project is to develop a generic, service-oriented grid for research and teaching. Its services are to be provided via the DFN infrastructure, which connects universities and research institutions in Germany. As an example service, the University of Marburg developed a backup and recovery service.

Within this service, the Karlsruhe Computing Centre and T-Systems SfR act as providers, while the DFN acts as service intermediary between the institutions involved. To this end, the DFN developed a business model that brings providers and users of the service together in a straightforward way.

The backup and recovery prototype consists primarily of a client component and a server component. An overview of the implementation of the F&L Grid prototype follows.

Communication between the two components uses two separate channels:

  • Control channel: The control channel transfers commands for coordinating backup and recovery.
  • Data channel: The data channel transfers the actual data using the Secure Copy Protocol (SCP).
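
The description does not specify the message format used on the control channel. As a minimal sketch, assuming a line-based text protocol in which each job is sent as one tab-separated line, a control message could be encoded and decoded as follows. `ControlMessage`, `JobType`, and the wire format are hypothetical names chosen for illustration, not the prototype's actual protocol.

```java
import java.util.Objects;

class ControlChannelDemo {
    public static void main(String[] args) {
        ControlMessage job = new ControlMessage(JobType.BACKUP, "alice", "/home/alice/results");
        String wire = job.encode();
        System.out.println(wire.replace('\t', ' '));                  // BACKUP alice /home/alice/results
        System.out.println(ControlMessage.decode(wire).equals(job));  // true
    }
}

/** Hypothetical job types carried on the control channel. */
enum JobType { BACKUP, RECOVERY }

/**
 * Hypothetical control-channel message: one job per line, fields separated by tabs.
 * The wire format is an illustrative assumption.
 */
record ControlMessage(JobType type, String user, String path) {

    ControlMessage {
        Objects.requireNonNull(type, "type");
        // Tabs and newlines would break the line-based framing, so reject them up front.
        for (String field : new String[] {user, path}) {
            if (field == null || field.contains("\t") || field.contains("\n")) {
                throw new IllegalArgumentException("invalid field: " + field);
            }
        }
    }

    /** Serialises the job, e.g. "BACKUP\talice\t/home/alice/results". */
    String encode() {
        return type + "\t" + user + "\t" + path;
    }

    /** Parses a line produced by encode(). */
    static ControlMessage decode(String line) {
        String[] parts = line.split("\t", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got " + parts.length);
        }
        return new ControlMessage(JobType.valueOf(parts[0]), parts[1], parts[2]);
    }
}
```

Rejecting tabs and newlines keeps the framing unambiguous; escaping them would be the alternative if paths may legitimately contain such characters.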

On the client side, the job submission client lets the user define a backup or recovery job via a graphical user interface. As soon as a job has been defined, it is sent to the job submission server via the control channel. For a backup job, the data is transferred by the ScpSender; for a recovery job, the ScpReceiver is used. The CertificateReader manages the required certificates.
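
The split between the two data-channel components can be sketched as a dispatch on the job type: a backup copies local data to the fat client, a recovery copies it back. This sketch assumes the components drive the standard `scp` command-line tool (`-r` for recursive copies, `-i` for the private key file). The `Transfer` interface, `Transfers.forJob`, the host name, and the key path are hypothetical, and the identity file merely stands in for whatever credentials the CertificateReader actually supplies; only the class names `ScpSender` and `ScpReceiver` come from the prototype description.

```java
import java.io.IOException;
import java.util.List;

class ClientDispatchDemo {
    public static void main(String[] args) {
        Transfer transfer = Transfers.forJob(JobKind.BACKUP);
        List<String> cmd = transfer.command("/home/alice/results", "alice",
                "fatclient.example.org", "/cache/alice/results", "/home/alice/.ssh/id_ed25519");
        // Print instead of executing; Transfers.run(cmd) would run it against a real host.
        System.out.println(String.join(" ", cmd));
    }
}

/** Kind of job defined in the GUI (hypothetical enum). */
enum JobKind { BACKUP, RECOVERY }

/** Shared shape of ScpSender and ScpReceiver in this sketch (hypothetical interface). */
interface Transfer {
    List<String> command(String localPath, String user, String host,
                         String remotePath, String identityFile);
}

/** Backup: copy local data to the fat client. */
class ScpSender implements Transfer {
    @Override
    public List<String> command(String localPath, String user, String host,
                                String remotePath, String identityFile) {
        // -r copies directories recursively; -i selects the private key file.
        return List.of("scp", "-r", "-i", identityFile, localPath,
                user + "@" + host + ":" + remotePath);
    }
}

/** Recovery: copy data from the fat client back to the local machine. */
class ScpReceiver implements Transfer {
    @Override
    public List<String> command(String localPath, String user, String host,
                                String remotePath, String identityFile) {
        // Same flags as the sender, with source and destination swapped.
        return List.of("scp", "-r", "-i", identityFile,
                user + "@" + host + ":" + remotePath, localPath);
    }
}

/** Chooses the data-channel component for a job and runs the resulting command. */
final class Transfers {
    static Transfer forJob(JobKind kind) {
        return switch (kind) {
            case BACKUP -> new ScpSender();
            case RECOVERY -> new ScpReceiver();
        };
    }

    /** Runs the command and returns scp's exit code (0 on success). */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```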

On the fat client side (the server component), the job submission server waits for incoming backup or recovery jobs. As soon as a new job arrives, an additional worker thread is created to process it. The SSH daemon, a standard component of the operating system, sends or receives the data during backup or recovery. Files transferred by the client are first stored in the fat client's cache and, in the case of TSM (IBM Tivoli Storage Manager), then written to the TSM server via a TSM access point. Because the connection is made via a generic interface, other access points are possible, depending on the backup solution in use.
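
The thread-per-job model and the generic backend interface can be sketched together. In this sketch, a `BlockingQueue` stands in for the control-channel listener, and each job's file is assumed to be in the cache already (placed there by the SSH daemon). `AccessPoint`, `RecordingAccessPoint`, `BackupJob`, and `JobSubmissionServer`'s methods are hypothetical. A TSM access point would be one more `AccessPoint` implementation; its calls to the TSM server are not shown because the prototype's TSM interface is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

class JobServerDemo {
    public static void main(String[] args) {
        RecordingAccessPoint backend = new RecordingAccessPoint();
        JobSubmissionServer server = new JobSubmissionServer(backend);
        server.submit(new BackupJob("alice", "/cache/alice/results.tar"));
        server.submit(new BackupJob("bob", "/cache/bob/run1.dat"));
        server.serve(2);
        // Workers finish in any order, so sort before printing.
        System.out.println(backend.stored.stream().sorted().toList());
    }
}

/** Generic backend interface (hypothetical name); a TSM access point would implement it. */
interface AccessPoint {
    void store(String user, String cachedFile) throws Exception;
}

/** Stand-in backend that only records what it would have written. */
class RecordingAccessPoint implements AccessPoint {
    final List<String> stored = new CopyOnWriteArrayList<>();

    @Override
    public void store(String user, String cachedFile) {
        stored.add(user + ":" + cachedFile);
    }
}

/** A job whose file already sits in the fat client's cache. */
record BackupJob(String user, String cachedFile) {}

class JobSubmissionServer {
    // Stands in for the control-channel listener: incoming jobs land here.
    private final BlockingQueue<BackupJob> incoming = new LinkedBlockingQueue<>();
    private final AccessPoint backend;

    JobSubmissionServer(AccessPoint backend) {
        this.backend = backend;
    }

    void submit(BackupJob job) {
        incoming.add(job);
    }

    /**
     * Waits for the given number of jobs and starts one worker thread per job.
     * A real server would loop indefinitely; the bound keeps the sketch terminating.
     */
    void serve(int jobs) {
        List<Thread> workers = new ArrayList<>();
        try {
            for (int i = 0; i < jobs; i++) {
                BackupJob job = incoming.take(); // blocks until a job arrives
                Thread worker = new Thread(() -> process(job), "worker-" + i);
                workers.add(worker);
                worker.start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    private void process(BackupJob job) {
        try {
            backend.store(job.user(), job.cachedFile()); // hand the cached file to the backend
        } catch (Exception e) {
            System.err.println("backup failed for " + job.user() + ": " + e);
        }
    }
}
```

Starting a fresh thread per job follows the description above; under heavy load, a bounded `java.util.concurrent.ExecutorService` would cap the number of concurrent transfers at the cost of queuing jobs.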

The backup and recovery service has the following attributes:

  • Minimal invasiveness: The backup and recovery service does not interfere with any of the providers’ existing processes.
  • Easy to use: Scientists can save or recover experimental results without requiring the assistance of an administrator or a help desk. This distinguishes it from many commercial solutions.
  • User-based backup and recovery: Many commercial solutions only support a node-based approach to backup and recovery, because the file systems on different nodes can vary considerably. The major disadvantage of the node-based approach, however, is that users who work on several nodes can only recover the data of the node they are currently using, not all of their data. To solve this problem, a user-based approach was chosen that enables cross-node data recovery.
  • Simple installation: The necessary software is delivered via Java Web Start, so no separate client installation is required; only a Java Runtime Environment (JRE) must be installed. Client software updates are installed automatically on the users’ computers.
  • Sustainability: The DFN will broker the backup and recovery service, and the named providers will offer it on a long-term basis, beyond the end of the project.
  • Exchangeable backend: Accessing the backend through a generic interface avoids tying the service to a specific commercial backup and recovery system. This makes it possible to replace that system in the future.
  • Independent of operating system: Because the service is implemented in Java, it can be used on various operating systems without maintaining separate versions of the service or the client software.
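
The difference between the node-based and user-based approaches comes down to how the backup catalogue is keyed. The following sketch assumes a catalogue indexed by user, in which each entry remembers the node it came from; `BackupEntry` and `UserCatalogue` are hypothetical names. A node-based view filters on the current node, while the user-based view returns a user's backups from every node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CatalogueDemo {
    public static void main(String[] args) {
        UserCatalogue catalogue = new UserCatalogue();
        catalogue.add(new BackupEntry("alice", "node1", "/home/alice/sim.out"));
        catalogue.add(new BackupEntry("alice", "node2", "/scratch/alice/run42"));
        catalogue.add(new BackupEntry("bob", "node1", "/home/bob/notes.txt"));

        // Working on node1, a node-based view only sees node1's backups.
        System.out.println(catalogue.recoverableFromNode("alice", "node1").size()); // 1
        // The user-based view sees everything alice backed up, on any node.
        System.out.println(catalogue.recoverableForUser("alice").size());           // 2
    }
}

/** One backed-up item, tagged with the node it came from. */
record BackupEntry(String user, String node, String path) {}

/** Hypothetical backup catalogue indexed by user instead of by node. */
class UserCatalogue {
    private final Map<String, List<BackupEntry>> byUser = new TreeMap<>();

    void add(BackupEntry entry) {
        byUser.computeIfAbsent(entry.user(), u -> new ArrayList<>()).add(entry);
    }

    /** Node-based recovery: only what the user saved from the given node. */
    List<BackupEntry> recoverableFromNode(String user, String node) {
        return byUser.getOrDefault(user, List.of()).stream()
                .filter(e -> e.node().equals(node))
                .toList();
    }

    /** User-based recovery: everything the user saved, across all nodes. */
    List<BackupEntry> recoverableForUser(String user) {
        return List.copyOf(byUser.getOrDefault(user, List.of()));
    }
}
```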

A potential business model, published by the DFN, is aimed primarily at universities and research institutions. The existing authentication and authorisation infrastructure (DFN-AAI) will be extended with a component that allows providers of backup and recovery (B&R) systems to connect to the DFN-AAI in the service provider role. The DFN ensures that the corresponding middleware components remain available in the long term, thereby giving potential providers and users of B&R systems transparent market access. Via this platform, users commission their preferred providers directly to supply a B&R service. If users require it, the DFN can also draft a framework agreement whose conditions serve the interests of users who do not want to select a service provider themselves.