Alumni Project

PERC Performance Modeling

PI:

Summary

PERC is pioneering a framework for automated, time-tractable performance modeling, and is using the resulting understanding to improve application performance and to inform machine procurement and design. PERC researchers are developing several distinct but related performance modeling and analysis strategies, including machine profiles, application signatures, statistical modeling, and performance bound analysis. The performance modeling activity includes outreach activities. These activities accelerate both the development of SciDAC application codes that run efficiently on high performance computing systems and PERC research in performance tools, modeling, and optimization.

The performance modeling activity of the PERC project is:

• Exploring a range of modeling methodologies, from statistical methods to advanced “convolution” schemes.
• Developing a framework with tools to enable automated, time-tractable performance modeling.
• Using this framework to model SciDAC applications and improve performance.
• Conducting research into the factors that affect performance.
• Raising the bar for performance prediction by doing blind predictions and reporting error.
• Extending the framework to model long-running applications and to extrapolate the performance of future systems.

The components of the PERC framework:

Statistical Black Box Modeling: semi-automatic generation of analytic performance models with non-linear regression based on performance measurements.
Machine Profiles: characterizations of the rates at which a machine can carry out fundamental operations, abstracted from any particular application.
Application Signatures: characterizations of an application, independent of any specific host machine, resulting in detailed summaries of the fundamental operations to be carried out on its behalf.
Convolution Methods: algebraic mappings of the Application Signature onto the Machine Profile to arrive at a set of performance predictions or bounds.

A study using statistical black box modeling for the Parallel Ocean Program (POP) showed that the scaling behavior and limits of various code phases differ substantially across architectures.

For Machine Profiles, a set of low-level benchmark “probes” has been developed to measure a set of key machine performance metrics. All large DOE machines have been measured with the probes. Figure 1 below shows MAPS probe results for an IBM Power3 processor. Even before the modeling step is carried out, MAPS curves can reveal the basic capabilities of a machine's memory subsystem.

For Application Signatures, tools have been developed to enable tracing of applications and to summarize their operations.
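
The convolution idea described above, mapping an application signature onto a machine profile, can be illustrated with a minimal sketch. This is not the PERC tooling: the class names, the `convolve` and `sweep` functions, the operation classes, and all numeric values are hypothetical and chosen only for illustration. The sketch assumes each class of fundamental operation has a single sustained rate on a given machine, and it brackets run time between a fully overlapped case (the slowest class dominates) and a fully serialized case (the per-class times add up).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MachineProfile:
    """Hypothetical machine profile: sustained rate per operation class."""
    name: str
    rates: Dict[str, float]  # operations (or bytes) per second


@dataclass(frozen=True)
class ApplicationSignature:
    """Hypothetical application signature: work per operation class."""
    name: str
    op_counts: Dict[str, float]  # operations (or bytes) in total


def convolve(signature: ApplicationSignature,
             profile: MachineProfile) -> Tuple[float, float]:
    """Return (lower, upper) run-time bounds in seconds.

    lower: all operation classes overlap perfectly, so the slowest dominates.
    upper: no overlap at all, so the per-class times are summed.
    """
    times = []
    for op_class, count in signature.op_counts.items():
        if op_class not in profile.rates:
            raise KeyError(
                f"profile {profile.name!r} has no rate for {op_class!r}")
        times.append(count / profile.rates[op_class])
    if not times:
        return (0.0, 0.0)
    return (max(times), sum(times))


def sweep(signature: ApplicationSignature,
          profiles: List[MachineProfile]) -> Dict[str, Tuple[float, float]]:
    """Evaluate one signature against several candidate machines."""
    return {p.name: convolve(signature, p) for p in profiles}


# Made-up inputs: one toy application and two toy machines.
SIGNATURE = ApplicationSignature(
    "toy_app", {"flops": 4e9, "memory_bytes": 8e9})
MACHINES = [
    MachineProfile("machine_a", {"flops": 2e9, "memory_bytes": 1e9}),
    MachineProfile("machine_b", {"flops": 4e9, "memory_bytes": 2e9}),
]

for machine, (lower, upper) in sweep(SIGNATURE, MACHINES).items():
    print(f"{machine}: between {lower:.1f} s and {upper:.1f} s")
```

Because the signature is gathered once and each evaluation is a handful of divisions, sweeping over many candidate machines is cheap in this toy setting. A real model would need per-access-pattern memory rates and communication terms, which this sketch leaves out.
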
There is special emphasis on acquiring and summarizing details of the application's memory and communication patterns; these are often on the critical path for performance, yet historically have not been well supported by tracing tools. Application signatures have been gathered using the PERC tools for several SciDAC application codes, including Terascale Optimal PDE Simulation (TOPS), the General Atomic and Molecular Electronic Structure System (GAMESS), and the Parallel Ocean Program (POP). Recent progress has sped up the application tracing process required to obtain application signatures by an order of magnitude and added the ability to trace I/O. This has rendered large-scale application modeling tractable.

Convolution models have been successfully developed to rapidly estimate the performance of the applications listed above, and the results have been used to understand their performance sensitivity to machine attributes and to improve their performance. The rapidity of our modeling approach contrasts with traditional methods, which have rendered full-application performance prediction intractable. Rapidity enables parameter sweeping to explore multiple algorithms and/or target machines for an application. Likewise, our automation for gathering and combining machine and application information contrasts with approaches that require in-depth analysis of codes by teams of computer and domain scientists.

Convolution-based performance estimates have now been extended to model I/O. They have proven accurate across a diverse set of architectures and machines using Power3, Power4, and Alpha processors, and can predict the performance of full-scale applications to within a few percent in almost all tested cases. As an independent test of this assertion, a series of blind predictions was carried out and then independently verified on the DoD HPCMO workload. Applications modeled were NLOM, HYCOM, COBALT, COBALT60, and CTH.
Performance was predicted on various production-sized problems at many processor counts across several DOE and DoD machines with less than 11% average error.

The tools that embody these methods are published (so people can see our models) and publicly available from the PERC web site, http://perc.nersc.gov. Modeling methods have been conveyed to the wider community via a series of tutorials (SC'02 and SC'03, PTOOLs, SciComp, and SIAM). The details of the performance modeling methodology and research results are published in workshops and conferences such as SC'02, ICCS03, and ParCo03.

The PERC modeling activities are further coordinated with the computational science projects and with other Integrated Software Infrastructure Centers (ISICs). These collaborations provide motivating applications for modeling and are intended to provide feedback to inform code tuning, to guide applications to the machines best suited for them, and to inform the procurement process.

For further information please contact: