Getting the Science out of Data
The Scientific Data Management Center for Enabling Technologies

Helping scientists spend more time studying their results and less time managing their data

Arie Shoshani
Lawrence Berkeley National Laboratory

With the increasing volume and complexity of data produced by ultra-scale simulations and high-throughput experiments, understanding the science is often hampered by the lack of comprehensive, end-to-end data management solutions that span initial data acquisition through final analysis and visualization. The first SciDAC investments succeeded in bringing an initial set of advanced data management technologies to DOE application scientists in astrophysics, climate, fusion, and biology. Equally important was the establishment of collaborations with these scientists to better understand their science as well as their forthcoming data management and data analytics challenges.

Building on these early successes, this project will improve the scientific data management framework to address the needs of petascale science. Specifically, the center will enhance and extend existing tools to allow for more interactivity and fault tolerance when managing scientists’ workflows, for better parallelism and feature extraction capabilities in their data analytics operations, and for greater efficiency and functionality in users’ interactions with local parallel file systems and remote storage. These improvements will prepare the scientific data management framework for the scalability and complexity challenges presented by hardware and applications at the petascale, and are complemented by targeted data management efforts under partnerships with application and computer scientists.

The scientific community has identified managing scientific data as one of its most important emerging needs because of the sheer volume and increasing complexity of the data being collected. Effectively generating, managing, and analyzing this information requires a comprehensive, end-to-end approach to data management that encompasses every stage from initial data acquisition to final analysis. Many Department of Energy scientific domains face common data management problems and benefit from shared technology solutions.

Center for Enabling Technology: Computer Science

Project Title: The Scientific Data Management Center for Enabling Technologies

Principal Investigator: Arie Shoshani
Affiliation: Lawrence Berkeley National Laboratory

Project Webpage:

Participating Institutions and Co-Investigators:
Argonne National Laboratory - Bill Gropp, Rob Ross, Rajeev Thakur
Lawrence Berkeley National Laboratory - Arie Shoshani (PI), Ekow Otoo, Doron Rotem, and Kesheng (John) Wu
Lawrence Livermore National Laboratory - Terence Critchlow, Chandrika Kamath
Oak Ridge National Laboratory - Scott Klasky, Nagiza Samatova, Jeff Vetter
Pacific Northwest National Laboratory - George Chin, Jarek Nieplocha
North Carolina State University - Mladen Vouk
Northwestern University - Alok Choudhary, Wei-Keng Liao
University of California, Davis - Bertram Ludäscher
University of California, San Diego - Ilkay Altintas
University of Utah - Steve Parker, Claudio Silva

Funding Partners: Office of Science, Office of Advanced Scientific Computing Research

Budget and Duration: Approximately $3.3 million per year for five years [1]

Other Media:
SCIENTIFIC DATA MANAGEMENT CENTER: From Data to Discovery, article in Issue 2 of SciDAC Review


[1] Subject to acceptable progress review and the availability of appropriated funds.

