Alumni Project

Particle Physics Data Grid (PPDG): Terabyte-Scale Multi-file Replication
PIs: Richard Mount, SLAC; Miron Livny, Wisconsin; Harvey Newman, Caltech

Summary

Replicating thousands of files is a tedious, error-prone, but extremely important task in High Energy and Nuclear Physics applications. Groups of physicists need the data generated by experiments or simulations close to their own facilities in order to take advantage of distributed computational resources and to avoid repeated downloading of files. Automating file replication requires automatic space acquisition and reuse, as well as monitoring the progress of thousands of files through three steps: staging from the source mass storage system, transfer over the network, and archiving at the target mass storage system. We have used Storage Resource Manager (SRM) technology to achieve robust file replication for the STAR experiment. The SRM monitors the staging, transfer, and archiving of files, and recovers from transient failures.

Members of the STAR experiment [1] and the Scientific Data Management group [2] at Lawrence Berkeley National Laboratory (LBNL) have collaborated on deploying Hierarchical Resource Managers (HRMs) to automate data transport between the RHIC Computing Facility [3] (RCF) storage system at Brookhaven National Laboratory and the storage system at the National Energy Research Scientific Computing Center [4] (NERSC) at LBNL. Data is carried over the Energy Sciences production network (ESnet) between these two DOE laboratories.

HRM is an implementation of the Storage Resource Manager (SRM) service [5]. It provides an interface to multiple types of storage systems (HPSS [6] in this case) as well as cache management of the disk buffers used for staging. A single request to an HRM can transfer a thousand or more files, and the number of files transferred simultaneously at each stage can be specified to optimize throughput. HRMs also recover from transient failures of the HPSS systems and of the network without any human intervention.
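The behavior described above (one request covering many files, a configurable number of files in flight at each stage, and automatic retry on transient failures) can be illustrated with a minimal Python sketch. This is not the HRM or SRM interface: the names `TransientError`, `retry`, and `replicate`, the retry count, and the per-stage limits are all hypothetical, and the stage actions are placeholders for whatever actually stages, transfers, or archives a file.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable storage or network failure."""


def retry(action, file_name, max_attempts=5, delay_s=0.0):
    """Call action(file_name), retrying on TransientError up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action(file_name)
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(delay_s)


def replicate(file_names, stages, limits):
    """Run every file through the given stages in order.

    stages: list of (stage_name, action) pairs, e.g. stage, transfer, archive.
    limits: dict mapping stage_name to the max number of files in that stage.
    Returns a dict mapping each file name to "done" or "failed: <stage_name>".
    """
    # One semaphore per stage caps how many files occupy that stage at once.
    gates = {name: threading.Semaphore(limits[name]) for name, _ in stages}

    def process(file_name):
        for stage_name, action in stages:
            with gates[stage_name]:
                try:
                    retry(action, file_name)
                except TransientError:
                    # Retries exhausted: report the failing stage and stop.
                    return file_name, f"failed: {stage_name}"
        return file_name, "done"

    # Enough workers to keep every stage full; a thread holds one gate at a time.
    with ThreadPoolExecutor(max_workers=sum(limits.values())) as pool:
        return dict(pool.map(process, file_names))


if __name__ == "__main__":
    attempted = set()

    def flaky_transfer(file_name):
        # Simulate a transient network fault on the first attempt for each file.
        if file_name not in attempted:
            attempted.add(file_name)
            raise TransientError(file_name)

    def no_op(file_name):
        return None

    names = [f"file_{i:04d}" for i in range(1000)]
    status = replicate(
        names,
        [("stage", no_op), ("transfer", flaky_transfer), ("archive", no_op)],
        {"stage": 4, "transfer": 8, "archive": 4},
    )
    print(sum(1 for s in status.values() if s == "done"), "of", len(names), "done")
```

Using one semaphore per stage, rather than one global limit, mirrors the idea that the source tape system, the network, and the target archive each have their own sensible degree of parallelism.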
GridFTP from the Globus Toolkit7 is used for the WAN stage of the transfer, as illustrated in the diagram below.
Since STAR began taking data two years ago, tens of TB have been transferred at rates of about 1 TB/week using ad hoc methods (with considerable effort). In tests with the new grid-enabled implementation, rates of up to 8 MB/sec have been achieved for the wide-area-network stage. After resolving some end-point configuration issues, we expect that rates of 3-4 TB/week will be easily achieved during the 2003 data-taking run for STAR. This application shows the result of collaborative work between computer scientists and an experiment group already in the middle of its data-taking run.

The diagram below shows the graphical tool that can be used to check the status and monitor the progress of several file transfer requests, where each request covers many files.

[1] www.star.bnl.gov

(from PPDG News Update, Sept. 25, 2002)
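As a rough consistency check on the rates quoted in this summary, the short Python sketch below converts between MB/sec and TB/week. It assumes decimal units (1 TB = 10^6 MB), which the summary does not state. The function names and the `duty_cycle` parameter (the fraction of the week the link actually sustains the rate) are hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds
MB_PER_TB = 1_000_000                # assumption: decimal units


def tb_per_week(mb_per_sec, duty_cycle=1.0):
    """Convert a sustained rate in MB/sec to TB/week."""
    return mb_per_sec * SECONDS_PER_WEEK * duty_cycle / MB_PER_TB


def mb_per_sec_needed(tb_week):
    """Average rate in MB/sec needed to move tb_week terabytes in one week."""
    return tb_week * MB_PER_TB / SECONDS_PER_WEEK


if __name__ == "__main__":
    print(f"8 MB/sec sustained  = {tb_per_week(8):.2f} TB/week")
    for target in (1, 3, 4):
        print(f"{target} TB/week requires {mb_per_sec_needed(target):.2f} MB/sec on average")
```

Under these assumptions, 8 MB/sec sustained around the clock corresponds to about 4.8 TB/week, so the expected 3-4 TB/week amounts to running at roughly 60-85% of the peak rate observed in tests. The earlier 1 TB/week corresponds to an average of about 1.7 MB/sec.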