PHOTO:Proceedings of the National Academy of Sciences
|
As the old adage goes, a picture is worth a thousand
words. But when you're talking about a picture of the
Internet, it's worth a whole lot more (about 100
trillion words according to Google's director of
research, Peter Norvig). Creating a complete map of the
Internet is a goal that has remained largely elusive
because of the ever-growing, highly complex nature of
the Net. Now, through the efforts of researchers in
Israel—from Bar-Ilan University, in Ramat Gan, Hebrew
University of Jerusalem, and Tel-Aviv University—the
fuzzy image of the Internet has just gotten a bit
clearer. The Israeli team recently published a paper in
the Proceedings of the National Academy of
Science in which they construct a new, more
accurate picture of the Internet using a combination of
graph-theory analysis and distributed computing.
A network like the Internet is composed of various
nodes, or devices, such as computers, routers, PDAs, and
so on, that are linked by one or more physical or
virtual paths. To determine the structure of the
Internet, researchers had to map out all these nodes and
links and their relationships to one another. The main
problem previous Internet cartographers faced was
insufficient information. Data about network structure
had been acquired through a limited number of
observation points—computers using a software tool
called traceroute to determine the path taken by data
packets as they moved from source to destination through
a network. The trouble was there were too few of these
data points to generate a complete picture—a task
comparable to a handful of people walking around and
trying to map out the entire world.
In order to overcome this limitation, a team of
researchers at Tel Aviv University headed by Yuval
Shavitt developed the Distributed
Internet Measurement and Simulation
(DIMES) project. DIMES gathered network
structure information through a technique known as
distributed computing. Volunteers download an agent
program that runs in the background onto their personal
computers—in a similar fashion to SETI@home
or Folding@home—gathering
network data without interfering with system processes.
This data, consisting of information related to 20 000
network nodes, is then processed by using a technique of
graph theory called k-shell
decomposition.
According to Shai Carmi of Bar-Ilan University, who
performed much of the data analysis, k-shell decomposition
breaks up the nodes in the network into nested
hierarchical shells. It starts by finding all the nodes
having only one connection. Those nodes are removed from
the network and assigned the outermost shell, the
1-shell. Next, the nodes having two or fewer remaining connections
are removed to the 2-shell. The process continues until
all the nodes have been indexed. Using this technique,
the team categorized the various shells as belonging to
the network's crust, core, or nucleus—the
best-connected nodes.
The nucleus consisted of more than one type of node.
It included nodes like ATT Worldnet, having more than
2500 direct connections, and nodes like Google, which
links to only 50 other nodes. The reason Google is in
the nucleus is that it connects almost exclusively to
the best-connected nodes in the Internet.
The size of the dots on the map identifies how many
connections they make, while their k-shell index is
indicated by their color and position.