Although transistors continue to shrink and more of them fit on each microchip, they have stopped getting faster, because running them at higher speeds would make them too hot to work. To keep improving electronics, chipmakers are instead giving chips more processing units, or cores, that execute computations in parallel.
The way in which a chip distributes its operations can make a big difference to performance. In 2013, MIT electrical engineer and computer scientist Daniel Sanchez and his colleagues showed a way to distribute data around the memory banks of multicore chips that could improve speed by about 18 percent on average.
Now, in simulations of a 64-core chip, Sanchez and his colleagues find that a new way to distribute both computations and data on such a chip can boost computational speed by 46 percent and reduce power consumption by 36 percent.
The researchers studied "place and route" algorithms that chipmakers use to minimize the distances between circuit components on microchips. When these algorithms are used to allocate computations and data on a 64-core chip, they take several hours to arrive at a solution. Sanchez and his colleagues developed their own algorithm, which does the job in milliseconds and is more than 99 percent as efficient. "We observed the complex way in which the standard algorithms were operating and we essentially figured out a way to simplify it," Sanchez says.
Their new approach first spreads data across the chip. Next, it distributes computational tasks, or threads, across the chip so that they sit close to the data they use. Finally, it refines the data placement based on where the threads ended up. "The energy associated with data movement is quite significant, so it's crucial for large-scale multicores that data travel as little as possible throughout the chip," Sanchez says.
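The three-step idea (spread data, place threads near their data, then refine the data placement) can be illustrated with a toy model. The Python sketch below is not the researchers' algorithm; it is a simple greedy heuristic that assumes a hypothetical 4x4 mesh of tiles, each with one core and one memory bank, uses mesh hop counts weighted by made-up access counts as the measure of data movement, and caps each bank at a fixed number of data items. All names (place, hops, bank_capacity and so on) are illustrative only.

```python
from itertools import product

# Hypothetical toy chip: a MESH x MESH grid of tiles, each holding one core
# and one memory bank. Distance is the Manhattan hop count on the 2D mesh.
# None of these parameters come from the research described above.
MESH = 4
TILES = list(product(range(MESH), range(MESH)))


def hops(a, b):
    """Number of mesh hops between tiles a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def cost(accesses, thread_at, data_at):
    """Total data movement: access count times hops, summed over all pairs."""
    return sum(n * hops(thread_at[t], data_at[d])
               for (t, d), n in accesses.items())


def thread_cost(t, core, accesses, data_at):
    """Movement cost if thread t runs on `core`, given the data placement."""
    return sum(n * hops(core, data_at[d])
               for (tt, d), n in accesses.items() if tt == t)


def item_cost(d, bank, accesses, thread_at):
    """Movement cost if data item d lives in `bank`, given thread placement."""
    return sum(n * hops(thread_at[t], bank)
               for (t, dd), n in accesses.items() if dd == d)


def place(threads, data, accesses, bank_capacity=1):
    """Greedy three-step placement (a simplified illustration only)."""
    assert len(threads) <= len(TILES), "one thread per core"
    assert len(data) <= len(TILES) * bank_capacity, "not enough bank space"

    # Step 1: spread data across the banks round-robin, ignoring threads.
    load = {tile: 0 for tile in TILES}
    data_at = {}
    for i, d in enumerate(data):
        bank = TILES[i % len(TILES)]
        data_at[d] = bank
        load[bank] += 1

    # Step 2: place threads, heaviest first, on the free core with the
    # lowest access-weighted distance to the data they use.
    def weight(t):
        return sum(n for (tt, _), n in accesses.items() if tt == t)

    free = set(TILES)
    thread_at = {}
    for t in sorted(threads, key=weight, reverse=True):
        core = min(sorted(free),
                   key=lambda c: thread_cost(t, c, accesses, data_at))
        thread_at[t] = core
        free.remove(core)
    before = cost(accesses, thread_at, data_at)

    # Step 3: refine data placement given the threads. Each item moves to
    # the cheapest bank with spare room; staying put is always allowed,
    # so the total cost can only stay the same or drop.
    for d in data:
        best = data_at[d]
        for bank in TILES:
            if (load[bank] < bank_capacity
                    and item_cost(d, bank, accesses, thread_at)
                    < item_cost(d, best, accesses, thread_at)):
                best = bank
        load[data_at[d]] -= 1
        load[best] += 1
        data_at[d] = best
    after = cost(accesses, thread_at, data_at)

    return thread_at, data_at, before, after


# Usage with hypothetical threads, data items, and access counts.
threads = ["t0", "t1", "t2"]
data = ["a", "b", "c", "d"]
accesses = {("t0", "a"): 10, ("t0", "b"): 2, ("t1", "c"): 7,
            ("t2", "d"): 5, ("t2", "a"): 1}
thread_at, data_at, before, after = place(threads, data, accesses,
                                          bank_capacity=2)
print(before, after)  # prints: 5 3
```

In this example the refinement step pays off because item b, first placed one hop away from the thread that uses it, moves into that thread's own tile once the bank there has room. A real system would also have to account for bank capacity in bytes, contention, and how often to rerun the placement, none of which this toy model captures.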
The system requires a monitor that occupies about 1 percent of the chip's area, an overhead Sanchez considers worthwhile given the significant improvements in chip performance. One advantage of this strategy is that "as system size increases, the benefits you get are significantly higher, both because the distance between cores becomes relatively larger, and because the algorithm has many more choices to do the right thing to maximize performance," Sanchez adds.
So far, no company has applied the research team's work to multicore chips, "but I would expect it to be useful for multicore designs now in the pipeline," Sanchez says. He and his students Nathan Beckmann and Po-An Tsai presented their findings Feb. 10 at the IEEE International Symposium on High-Performance Computer Architecture in San Francisco.