Alumni Project

Net100 – Developing Network-Aware Operating Systems

Tom Dunigan, Oak Ridge National Laboratory

Summary

Many high-performance distributed applications require high network throughput but achieve only a fraction of the available bandwidth. The goal of the Net100 project is to transparently tune legacy scientific network applications using knowledge about the network path and incorporating the latest improvements in network transport protocols. Tuning can be done selectively based on network path or policy. With no changes to the network application, we have demonstrated an order-of-magnitude improvement in network performance.

1. Introduction

The Net100 project is funded by the Department of Energy (DOE) Office of Science and is a collaboration between DOE national labs and university researchers. The goal of the project is to reduce the time-to-solution of distributed scientific applications. DOE has a large investment in high-speed (gigabit) networks that interconnect supercomputers, experimental facilities, and researchers at national labs and universities. Many of the key scientific applications (e.g., climate modeling, high energy physics, astrophysics, and fusion) require high network data rates to transfer data across the country. Despite the gigabits per second of available network bandwidth, many of these applications use only tens of megabits per second. The Net100 project seeks to remove bottlenecks and speed up these transfers.

2. Methodology

Common causes of poor network performance are improperly tuned network settings and network protocol limitations. The Net100 project seeks to measure and understand end-to-end network and application performance and to apply that knowledge to tune network protocols and applications. The project improves network performance by automatically configuring network control parameters and by tuning network protocols to avoid loss of data packets and to speed recovery when packets are lost.
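To make "network control parameters" concrete, the following minimal Python sketch looks at one such parameter, the TCP socket buffer size. It is an illustration only and not the Net100 implementation: the text does not name the parameters Net100 adjusts, and Net100 tunes flows inside the operating system without touching the application, whereas this sketch sets the buffers by hand. The function names and the example figures (1 Gbit/s path, 100 ms round-trip time, 64 KiB window) are assumptions chosen for illustration.

```python
import socket


def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bytes that must be in flight to keep a path of this speed and delay full."""
    return round(bandwidth_bits_per_s * rtt_s / 8)


def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


def make_tuned_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of the given size.

    Hypothetical helper: the operating system may clamp the request
    to a system-wide maximum.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


if __name__ == "__main__":
    # Assumed example path: 1 Gbit/s with a 100 ms round-trip time.
    bandwidth = 1e9
    rtt = 0.100

    needed = bandwidth_delay_product_bytes(bandwidth, rtt)
    untuned = window_limited_throughput_bps(64 * 1024, rtt)

    print(f"buffer needed to fill the path: {needed} bytes")
    print(f"ceiling with a 64 KiB window:   {untuned / 1e6:.1f} Mbit/s")

    sock = make_tuned_socket(needed)
    # Read back what the operating system actually granted.
    print("granted send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    sock.close()
```

Under these assumed numbers, a small fixed window caps throughput at a few megabits per second regardless of link speed, which is one way a gigabit path can end up far underused. Reading the value back with `getsockopt` matters because the granted buffer can differ from the requested one.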
The project consists of three major components. First, new ways to manage the transfer of data across networks (protocols) are being developed and evaluated. Second, a collection of network sensors and probes has been deployed across the Internet to collect "current conditions" on various network paths. Third, a modified operating system includes the new protocols and uses the sensor data to tune designated network applications.

3. Accomplishments

The project has deployed numerous network probes and sensors at national labs, universities, and research institutions in Europe. Data gathered by these sensors is used to transparently tune network flows by adjusting network control parameters and selecting among several new protocol options that have been developed and deployed by the project team.
In collaboration with other network researchers around the world, we have incorporated network tuning options from Sally Floyd (HighSpeed TCP) and from Tom Kelly (Scalable TCP). These extensions allow a network flow to recover from data loss much faster than the standard Internet protocol (TCP). Figure 1 illustrates the improvement these two options make over the standard protocol. The figure shows three different tests between a host on the US west coast and one in Europe, in which packets are lost about 3 seconds into the transfer. Kelly's option resumes full speed in just 4 seconds, whereas the standard protocol takes nearly half an hour.
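The gap in recovery time follows from how each protocol regrows its congestion window after a loss. The Python sketch below is a simplified model and not the Net100 code or a reproduction of Figure 1. It assumes a single loss event, one window update per round trip, and no slow start or delayed acknowledgments. Standard TCP is modeled as halving the window and adding one segment per round trip. Scalable TCP is modeled with its commonly cited constants, a 12.5% reduction and roughly 1% growth per round trip. HighSpeed TCP is not modeled. The path figures (1 Gbit/s, 150 ms round-trip time, 1500-byte segments) and all function names are assumptions for illustration.

```python
def path_window_segments(bandwidth_bits_per_s: float, rtt_s: float,
                         segment_bytes: int) -> int:
    """Window, in segments, needed to fill the path (bandwidth-delay product)."""
    return round(bandwidth_bits_per_s * rtt_s / (segment_bytes * 8))


def recovery_rtts_standard(full_window: int) -> int:
    """Round trips for standard TCP to regain the full window after one loss."""
    cwnd = full_window / 2          # multiplicative decrease: halve the window
    rtts = 0
    while cwnd < full_window:
        cwnd += 1                   # additive increase: one segment per round trip
        rtts += 1
    return rtts


def recovery_rtts_scalable(full_window: int, a: float = 0.01, b: float = 0.125) -> int:
    """Round trips for Scalable TCP to regain the full window after one loss.

    a and b are the commonly cited Scalable TCP constants; a per-ACK
    increase of a is approximated as growth by a factor (1 + a) per round trip.
    """
    cwnd = full_window * (1 - b)    # reduce the window by 12.5%
    rtts = 0
    while cwnd < full_window:
        cwnd *= 1 + a               # grow by about 1% per round trip
        rtts += 1
    return rtts


if __name__ == "__main__":
    # Assumed example path, not the one measured in the text.
    rtt = 0.150
    window = path_window_segments(1e9, rtt, 1500)

    for name, rtts in (("standard TCP", recovery_rtts_standard(window)),
                       ("Scalable TCP", recovery_rtts_scalable(window))):
        print(f"{name}: {rtts} round trips, about {rtts * rtt:.1f} s")
```

In this model the standard protocol's recovery time grows in proportion to the window, so it gets worse as paths become faster and longer, while Scalable TCP's recovery takes the same number of round trips at any window size. The absolute times depend on the assumed path and will not match the measured figures in the text.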
Even in the absence of data loss on the network, many legacy network applications fail to use the available network bandwidth because of poorly configured network control parameters. Figure 2 illustrates the data throughput over time of a legacy network code. With no changes to the legacy application, the Net100 software automatically tunes the network control parameters for the application and improves the data transfer rate by an order of magnitude over the untuned case.

The Net100 project has introduced several novel enhancements to network monitoring and tuning. The Net100 software is able to tune each network application based on the network path and the quality of service desired. Tuning information is gathered in real time from Net100 monitors and probes located at critical points in the Internet. Notably, tuning is accomplished without modification to the network applications.

In this final year of the project, we will continue to push our network enhancements through the standards process and to port the implementation to other operating systems. Recently, we deployed our Net100 accelerants on two DOE supercomputers, the Cray X1 and the SGI Altix. Finally, we are improving the user interface to make it easier for scientists to speed up their network applications.

For further information on this subject contact: