Getting the Science out of the Data

A SciDAC Collaboratory providing scientific data management to help scientists spend more time studying their results and less time managing their data

Ian Foster (project webpage)
Argonne National Laboratory

The five-year duration of SciDAC-2 will see DOE science firmly enter the petascale era. Both simulation science (e.g., climate, computational chemistry, fusion, astrophysics) and experimental science (e.g., high energy physics, nuclear physics, light sources, fusion) are poised to produce enormous quantities of data. However, this data is useful only if it can be effectively accessed and analyzed. Thus, we must move data to where it is needed, enable analysis to occur near the data, or both. Each task is challenging in a petascale environment, not only because of the sheer size of the data but also because numerous shared resources, including CPUs, storage, and networks, must be coordinated. The Center for Enabling Distributed Petascale Science (CEDPS) will address these tasks. Specifically, it will produce technical innovations designed to allow for:

  • rapid and dependable data placement within a distributed high-performance environment,
  • the convenient construction of scalable services that provide for the reliable and high-performance processing of computation and data analysis requests from many remote clients, and
  • the troubleshooting of ultra-high-performance distributed activities from the perspective of both performance and functionality.

Working in close collaboration with DOE application science communities, we have defined the CEDPS work program to address these challenges. CEDPS will first design and develop powerful services and tools for data placement and science service construction and then, in collaboration with DOE application projects, deploy and evaluate them.

DOE computational and experimental facilities will soon be producing petabytes of data per year, in fields as diverse as astrophysics, biology, chemistry, combustion, fusion, high energy physics, nanoscience, and nuclear physics. Application communities, often large and distributed, must be able to access this data so they can translate it into knowledge. The tasks that CEDPS will address have been defined in close consultation with leading DOE science application groups, and the resulting services and tools will be deployed, applied, and evaluated in close collaboration with major DOE projects in high energy and nuclear physics, combustion, astrophysics, fusion, biology, and other sciences.

Center for Enabling Technology: Distributed Computing

Project Title: Center for Enabling Distributed Petascale Science

Principal Investigator: Ian Foster
Affiliation: Argonne National Laboratory

Project Webpage:

Participating Institutions and Co-Investigators:
Argonne National Laboratory - Ian Foster (PI), Kate Keahey, Rajkumar Kettimuthu, and Ravi Madduri
Fermi National Accelerator Laboratory - Andrew Baranovski
Lawrence Berkeley National Laboratory - Joshua Boverhof and Dan Gunter
University of Wisconsin-Madison - Miron Livny
University of Southern California - Ann Chervenak and Carl Kesselman

Funding Partners: Office of Science, Office of Advanced Scientific Computing Research

Budget and Duration: Approximately $2.4 million per year for five years [1]


[1] Subject to acceptable progress review and the availability of appropriated funds.

