Mining Science Data

Chandrika Kamath
Lawrence Livermore National Laboratory

Data from scientific simulations, observations, and experiments is now measured in terabytes and will soon reach the petabyte regime. The size of the data, together with its complexity, makes it difficult to find useful information in it. This is, of course, disconcerting to scientists, who wonder about the science still undiscovered in the data. The Sapphire scientific data mining project at Lawrence Livermore National Laboratory [www.llnl.gov/casc/sapphire] has been addressing this concern by applying data mining techniques to problems ranging in size from a few megabytes to a hundred terabytes in a variety of domains. In this poster, I will describe how we are using data mining techniques to separate signals in climate simulations, identify key features for edge-harmonic oscillations, classify orbits in a Poincaré plot, and track features of interest in experimental images.