iCAIR Press Releases

Lambda Join Demonstration Wins Award at Supercomputing 02 Conference
November 21, 2002

Baltimore, Maryland -- Project DataSpace, in a collaborative project with researchers from Chicago, Ottawa and Amsterdam, has won the SuperComputing '02 High Performance Bandwidth Challenge Award for Innovative, High Speed, Data Correlation--Best Use of Emerging Infrastructure. The group includes researchers from the National Center for Data Mining at the University of Illinois at Chicago (UIC), CANARIE, and SARA, who have been working together over the past year to produce real-time merging of data over lambda networks. At SC02, they presented the first demonstration of the technology, with impressive results.

For the past two decades, database researchers have optimized the ability of databases to join two tables in a database by a common key, such as an employee or product ID. Database joins are one of the key technologies that make data processing practical.
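As a rough illustration of what a join by a common key means, the following minimal Python sketch builds a hash index on one table and probes it with the rows of the other. The function name hash_join, the sample tables, and the employee_id column are assumptions made for this example only; they are not taken from the demonstration.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on a shared key column (illustrative only)."""
    # Build phase: index every row of the left table by its key value.
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)

    # Probe phase: for each right row, emit one merged row per matching left row.
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data.
employees = [
    {"employee_id": 1, "name": "Ada"},
    {"employee_id": 2, "name": "Grace"},
]
sales = [
    {"employee_id": 2, "amount": 300},
    {"employee_id": 1, "amount": 120},
    {"employee_id": 2, "amount": 50},
]

rows = hash_join(employees, sales, "employee_id")
```

Each sales row is paired with the employee row that shares its employee_id, so rows holds three merged records in the order of the sales table.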

As more and more data is distributed over the internet, the ability to join data located in two different global locations is becoming critical. There are two fundamental problems: finding efficient protocols to move data over long distances and finding efficient algorithms to merge two data streams.

At Supercomputing '02, significant progress was made on both fronts.

A stream of data was moved at over 2.8 Gb/s across SURFnet between a cluster of computers at SARA Computing and Networking Services in Amsterdam and a cluster of computers at StarLight in Chicago. At the same time, a stream of data was moved at over 2 Gb/s across Canada's CA*net4 network between a computer cluster at CANARIE in Ottawa and a UIC computer cluster at StarLight in Chicago. Both streams used SABUL, a new protocol for high-performance data transport developed by the National Center for Data Mining/Laboratory for Advanced Computing at the University of Illinois at Chicago.

At the same conference, using computer clusters at the StarLight facility in Chicago, two streams of data were merged at over 500 Mb/s per node in the three-node cluster. These so-called "lambda joins" are an important component for distributed data mining applications. The algorithm for joining two lambda streams was developed by scientists at the National Center for Data Mining at the University of Illinois at Chicago.
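The lambda join algorithm itself is not described in this release. As a generic illustration of merging two data streams by key, the sketch below shows a textbook sort-merge join in Python. It assumes that each stream yields (key, value) pairs already sorted by key, with each key appearing at most once per stream; the name merge_join and the sample data are hypothetical.

```python
def merge_join(stream_a, stream_b):
    """Yield (key, a_value, b_value) for keys found in both sorted streams.

    Textbook sort-merge join, not the NCDM lambda join algorithm.
    Assumes both inputs are sorted by key and keys are unique per stream.
    """
    it_a, it_b = iter(stream_a), iter(stream_b)
    a = next(it_a, None)
    b = next(it_b, None)
    while a is not None and b is not None:
        if a[0] < b[0]:
            a = next(it_a, None)   # key only in stream A so far; advance A
        elif a[0] > b[0]:
            b = next(it_b, None)   # key only in stream B so far; advance B
        else:
            yield (a[0], a[1], b[1])
            a = next(it_a, None)
            b = next(it_b, None)


# Hypothetical streams standing in for data arriving from two sites.
site_one = [(1, "x"), (3, "y"), (5, "z")]
site_two = [(2, "p"), (3, "q"), (5, "r"), (7, "s")]

merged = list(merge_join(site_one, site_two))
```

Because the function consumes both inputs one record at a time, it never needs to hold either stream in memory, which is the property that makes this style of join a natural fit for streamed data.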

"Lambda data joins are an excellent early example of how CA*net4's lightpath provisioning facility can be used to help build new and innovative distributed services," according to Bill St. Arnaud, Senior Director for Advanced Networks at CANARIE.

To many network engineers, the terms lambda and lightpath are used interchangeably to describe a low-layer, end-to-end dedicated communications channel with effectively guaranteed bandwidth. Using protocols such as SABUL, it is now possible to use lambdas to move large data sets over long distances as fast as the data can be pulled from disk. Using lambda joins, it is now possible to merge two such streams and look for patterns.

"With lambda joins, it is now practical to look for correlation in data even if the data is scattered around the world," said Robert Grossman, Director of the National Center for Data Mining at the University of Illinois at Chicago and President of the Two Cultures Group.

This demonstration was awarded one of the three Qwest Bandwidth Challenge Awards presented at this year's Supercomputing 02 Conference.

For more information, contact:

Shirley Connelly, Associate Director, NCDM
312-413-2176, connelly@uic.edu.

Robert Grossman, Director, NCDM
312-413-2176, grossman@uic.edu.

National Center for Data Mining
The National Center for Data Mining (NCDM) at the University of Illinois at Chicago (UIC) was established in 1998 to serve as a national resource for high performance and distributed data mining. The Center sponsors research projects, standards, testbeds, and outreach. The Center is coordinating the development of the Predictive Model Markup Language (PMML), the standard for data mining models, and sponsoring the Terra Wide Data Mining Testbed, a worldwide testbed for high performance and distributed data mining. For more information about NCDM, see www.ncdm.uic.edu.

SURFnet operates and innovates the national research network, to which two hundred institutions in higher education and research in the Netherlands are connected. To remain in the lead, SURFnet puts in a sustained effort to improve the infrastructure and to develop new applications to give users faster and better access to new Internet services. For more information please visit www.surfnet.nl. For SARA, see www.sara.nl.

SARA Computing and Networking Services
SARA is the Dutch National Supercomputing Facility. SARA provides High Performance Computing and Networking Services and Visualization (including Virtual Reality) facilities to Dutch academic and research institutions, and to commercial businesses. SARA is a not-for-profit foundation. SARA handles the day-to-day operational management of the SURFnet network.

CANARIE is Canada's advanced Internet development organization, a not-for-profit corporation supported by its members, project partners and the Government of Canada. CANARIE's mission is to accelerate Canada's advanced Internet development and use by facilitating the widespread adoption of high-performance, end-user enabled networks and by stimulating the development of new, next-generation products, applications and services to run on them. Following a $110M funding agreement with Industry Canada, CANARIE, Inc. designed, developed and is operating CA*net4, Canada's national research and innovation network. For more information, visit www.canarie.ca.

StarLight(sm), the optical STAR TAP(sm) initiative, is an advanced optical infrastructure and proving ground for network services optimized for high-performance applications. Operational since summer 2001, StarLight is a 1GigE and 10GigE switch/router facility for high-performance access to participating networks and will ultimately become a true optical switching facility for wavelengths. StarLight is being developed by the Electronic Visualization Laboratory (EVL) at the University of Illinois at Chicago (UIC), the International Center for Advanced Internet Research (iCAIR) at Northwestern University, and the Mathematics and Computer Science Division at Argonne National Laboratory, in partnership with Canada's CANARIE and Holland's SURFnet. For more information please visit www.startap.net/starlight.
