Lambda Join Demonstration Wins Award at Supercomputing 02 Conference
November 21, 2002
Baltimore, Maryland -- Project DataSpace, in collaboration with researchers from Chicago, Ottawa and Amsterdam, has won the SuperComputing '02 High Performance Bandwidth Challenge Award for Innovative, High Speed, Data Correlation--Best Use of Emerging Infrastructure. The group includes researchers from the National Center for Data Mining at the University of Illinois at Chicago (UIC), CANARIE, and SARA, who have worked together over the past year to merge data in real time over lambda networks. At SC02, they presented the first demonstration of the technology, with impressive results.
For the past two decades, database researchers have optimized how databases join two tables on a common key, such as an employee or product ID. Database joins are one of the key technologies that make data processing practical.
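As a minimal illustration of joining two tables on a common key, the sketch below builds an in-memory index on one table and probes it with the other (a textbook hash join). The function name `hash_join`, the column name `employee_id`, and the sample rows are hypothetical and do not come from the demonstration.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    # Build phase: index every left-hand row by its key value.
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)

    # Probe phase: look up each right-hand row and merge matching rows.
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample tables sharing the key "employee_id".
employees = [
    {"employee_id": 1, "name": "Ada"},
    {"employee_id": 2, "name": "Grace"},
]
salaries = [
    {"employee_id": 2, "salary": 120},
    {"employee_id": 3, "salary": 95},
]

print(hash_join(employees, salaries, "employee_id"))
```

Only employee 2 appears in both tables, so the join returns a single combined row.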
As more and more data is distributed over the internet, the ability to join data located in two different global locations is becoming critical. There are two fundamental problems: finding efficient protocols to move data over long distances and finding efficient algorithms to merge two data streams.
At Supercomputing '02, significant progress was made on both fronts.
A stream of data was moved at over 2.8 Gb/s across SURFnet between a cluster of computers at SARA Computing and Networking Services in Amsterdam and a cluster of computers at StarLight in Chicago. At the same time, a second stream was moved at over 2 Gb/s across Canada's CA*net4 network between a computer cluster at CANARIE in Ottawa and a UIC computer cluster at StarLight in Chicago. Both streams used SABUL, a new protocol for high performance data transport developed by the National Center for Data Mining/Laboratory for Advanced Computing at the University of Illinois at Chicago.
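To put these rates in perspective, the short calculation below estimates how long a data set would take to cross each link at the reported throughput. The one-terabyte data set size is a hypothetical figure chosen for illustration, and the rates are the "over 2.8 Gb/s" and "over 2 Gb/s" lower bounds quoted above, so actual transfers could be faster.

```python
def transfer_seconds(size_bytes, rate_bits_per_second):
    """Time to move size_bytes at a sustained rate given in bits per second."""
    return size_bytes * 8 / rate_bits_per_second


ONE_TERABYTE = 10**12  # hypothetical data set size, in bytes

# Reported lower-bound rates for the two links, in bits per second.
links = [("Amsterdam-Chicago", 2.8e9), ("Ottawa-Chicago", 2.0e9)]

for label, rate in links:
    minutes = transfer_seconds(ONE_TERABYTE, rate) / 60
    print(f"{label}: {minutes:.1f} minutes per terabyte")
```

At these rates a terabyte moves in roughly 48 minutes from Amsterdam and roughly 67 minutes from Ottawa.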
At the same conference, using computer clusters at the StarLight facility in Chicago, two streams of data were merged at over 500 Mb/s per node in the three-node cluster. These so-called "lambda joins" are an important component for distributed data mining applications. The algorithm for joining two lambda streams was developed by scientists at the National Center for Data Mining at the University of Illinois at Chicago.
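This release does not describe the lambda join algorithm itself. As a rough illustration of what merging two data streams on a key involves, the sketch below uses a textbook sort-merge join, which is not NCDM's method. It assumes each stream arrives as (key, value) pairs in ascending key order with no repeated keys; the stream names and sample records are hypothetical.

```python
def merge_join(left, right):
    """Yield (key, left_value, right_value) for keys present in both streams.

    Assumes each stream yields (key, value) pairs in ascending key order
    with no repeated keys. Only one record per stream is held at a time.
    """
    left_iter, right_iter = iter(left), iter(right)
    l = next(left_iter, None)
    r = next(right_iter, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_iter, None)   # left key is behind: advance left
        elif l[0] > r[0]:
            r = next(right_iter, None)  # right key is behind: advance right
        else:
            yield l[0], l[1], r[1]      # keys match: emit the joined record
            l = next(left_iter, None)
            r = next(right_iter, None)


# Hypothetical key-sorted streams standing in for the two incoming feeds.
amsterdam_stream = iter([(1, "a"), (3, "c"), (4, "d")])
ottawa_stream = iter([(2, "x"), (3, "y"), (4, "z")])

for record in merge_join(amsterdam_stream, ottawa_stream):
    print(record)
```

Because each stream is read once and in order, this style of join can run as the data arrives rather than after both data sets have been stored locally.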
"Lambda data joins are an excellent early example of how CA*net4's lightpath provisioning facility can be used to help build new and
innovative distributed services," according to Bill St. Arnaud, Senior Director for Advanced Networks at CANARIE.
Many network engineers use the terms lambda and lightpath interchangeably to describe a low-layer, end-to-end dedicated communications channel with effectively guaranteed bandwidth. Using protocols such as SABUL, it is now possible to use lambdas to move large data sets over long distances as fast as the data can be pulled from disk. Using lambda joins, it is now possible to merge two such streams and look for patterns.
"With lambda joins, it is now practical to look for correlation in data even if the data is scattered around the world," said Robert Grossman, Director of the National Center for Data Mining at the University of Illinois at Chicago and President of the Two Cultures Group.
This demonstration received one of the three Qwest Bandwidth Challenge Awards presented at this year's Supercomputing '02 Conference.
For more information, contact:
Shirley Connelly, Associate Director, NCDM
Robert Grossman, Director, NCDM
National Center for Data Mining
The National Center for Data Mining (NCDM) at the University of
Illinois at Chicago (UIC) was established in 1998 to serve as a national
resource for high performance and distributed data mining. The Center
sponsors research projects, standards, testbeds, and outreach. The
Center is coordinating the development of the Predictive Model Markup
Language (PMML), the standard for data mining models, and sponsoring the
Terra Wide Data Mining Testbed, a worldwide testbed for high performance
and distributed data mining. For more information about NCDM, see
SURFnet
SURFnet operates and innovates the national research network, to which
two hundred institutions in higher education and research in the
Netherlands are connected. To remain in the lead, SURFnet puts in a
sustained effort to improve the infrastructure and to develop new
applications to give users faster and better access to new Internet
services. For more information please visit www.surfnet.nl. For SARA,
SARA Computing and Networking Services
SARA is the Dutch National Supercomputing Facility. SARA provides High
Performance Computing and Networking Services and Visualization
(including Virtual Reality) facilities to Dutch academic and
research institutions and to commercial businesses. SARA is a
not-for-profit foundation. SARA does the day-to-day operational
management of the SURFnet network.
CANARIE
CANARIE is Canada's advanced Internet development organization, a
not-for-profit corporation supported by its members, project partners
and the Government of Canada. CANARIE's mission is to accelerate
Canada's advanced Internet development and use by facilitating the
widespread adoption of high-performance, end-user enabled networks and
by stimulating the development of new, next-generation products,
applications and services to run on them. Following a $110M funding
agreement with Industry Canada, CANARIE Inc. designed, developed and now
operates CA*net4, Canada's national research and innovation network.
For more information, visit www.canarie.ca.
StarLight
StarLight(sm), the optical STAR TAP(sm) initiative, is an advanced
optical infrastructure and proving ground for network services optimized
for high-performance applications. Operational since summer 2001,
StarLight is a 1GigE and 10GigE switch/router facility for
high-performance access to participating networks and will ultimately
become a true optical switching facility for wavelengths. StarLight is
being developed by the Electronic Visualization Laboratory (EVL) at the
University of Illinois at Chicago (UIC), the International Center for
Advanced Internet Research (iCAIR) at Northwestern University, and the
Mathematics and Computer Science Division at Argonne National
Laboratory, in partnership with Canada's CANARIE and Holland's SURFnet.
For more information please visit www.startap.net/starlight.