Local computation with resistive grids
In the "living" retina, the output of the photoreceptors is fed into a tightly interconnected network of cells that spreads the signal horizontally within the retina
[see "cellular neural network"].
A kindred strategy has been adopted in many neuromorphic vision chips, where the photoreceptor outputs are fed into a 2-D square or hexagonal resistive grid.
Resistive networks are an essential part of the neuromorphic engineer's repertoire, because they implement particular filtering operations. Assume, for instance, that the value of the battery, E, attached to each node in a rectangular resistive grid [Fig. 2] is proportional to the (actual or logarithmically compressed) image intensity at that location. Then the voltage, V, at each node of the network can be considered to be an average resulting from the current between battery and node and the currents across all four horizontal resistances, R, connecting the node to its neighbors. Decreasing R increases the current within the grid, leading to more averaging (smoothing). Increasing R or increasing the conductance G that links the input to the network has the opposite effect: it couples V more closely to the battery, resulting in less smoothing. At steady state, the relationship between the input E and the output V (the voltage in the grid) can be expressed as a convolution of the input with a filter function that depends on the exact network configuration and on R and G [Fig. 2, inset]. The degree of smoothing, or low-pass filtering, is determined by the product of R and G.
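The steady state described here can be imitated numerically by writing Kirchhoff's current law at every node, G(E - V) plus the sum over neighbors of (V_neighbor - V)/R equal to zero, and solving the resulting linear system. The following is only a minimal sketch under stated assumptions: the function name, the 15-by-15 grid, the open-circuit treatment of the grid edges, and the values chosen for R and G are all illustrative and do not come from the chip shown in the figures.

```python
import numpy as np

def resistive_grid_steady_state(E, R, G):
    """Steady-state node voltages of a rectangular resistive grid.

    E : 2-D array of battery voltages (the input image)
    R : resistance between neighboring nodes
    G : conductance linking each battery to its node
    Assumption: edge nodes simply have fewer neighbors (open boundary).
    """
    E = np.asarray(E, dtype=float)
    rows, cols = E.shape
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            A[i, i] = G  # current path from the battery into the node
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    j = rr * cols + cc
                    A[i, i] += 1.0 / R  # current path to one neighbor
                    A[i, j] -= 1.0 / R
    # Kirchhoff's current law at every node: A @ V = G * E
    V = np.linalg.solve(A, G * E.ravel())
    return V.reshape(rows, cols)

# A single bright pixel; the output is the grid's filter function.
E = np.zeros((15, 15))
E[7, 7] = 1.0
V_low_R = resistive_grid_steady_state(E, R=0.2, G=1.0)   # small R*G: strong smoothing
V_high_R = resistive_grid_steady_state(E, R=5.0, G=1.0)  # large R*G: weak smoothing
print(V_low_R[7, 7], V_high_R[7, 7])
```

With the smaller R the response to the bright pixel is lower at its center and spread more widely, which is the behavior the text attributes to a small product of R and G. A dense solve is used only for brevity; it would not be a sensible choice for a full-size image.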
The filtering properties of an analog very large-scale integration (VLSI) resistive grid are demonstrated in Fig. 3. Because the parasitic capacitances are small, the steady-state voltage distribution is reached in less than 5 µs, which is one key advantage of analog computation for early vision. The theoretical insights on how to use resistive grids in early vision processes were pioneered at the Massachusetts Institute of Technology (MIT), Cambridge, by Berthold Horn, Tomaso Poggio, and one of us (Koch).
Implementing filtering by means of convolution on a digital computer is straightforward but can be expensive. The spatial extent of a filter, the region over which the filter function is non-zero, is called its support and is expressed in pixels, m. (The Fig. 2 filter's support is about seven pixels.) Implementing the 2-D filters used in early vision algorithms on a digital processor requires 4m additions, multiplications, or divisions per pixel. Thus, the total computational cost of filtering grows linearly with both the number of pixels and the filter support, rendering most early vision algorithms expensive in terms of machine cycles. This becomes particularly painful when using larger filters, that is, with more blurring. Blurring a single 1000-by-1000-pixel picture with a 2-D filter that is 11 pixels across takes about 44 million operations (4 × 11 operations for each of 1 million pixels). In analog hardware, conversely, implementing a large filter can be done by simply adjusting R or G. The convergence time of the resistive net depends only weakly on image size.
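The arithmetic behind the 44-million figure is easy to reproduce. The short sketch below simply takes the 4m operations-per-pixel estimate from the text at face value; the function name and its parameters are illustrative only.

```python
def digital_filter_ops(width, height, support_m, ops_factor=4):
    """Operation count for filtering one image on a digital processor.

    Uses the per-pixel estimate quoted in the text: ops_factor * support_m
    additions, multiplications, or divisions for every pixel.
    """
    return ops_factor * support_m * width * height

ops = digital_filter_ops(1000, 1000, 11)
print(f"{ops:,}")  # 44,000,000
```

Under the same estimate, doubling the support doubles the digital cost, whereas in the resistive grid a wider filter only requires a different setting of R or G.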
Interestingly, using resistances and batteries to compute would not have been news to engineers in the 1940s and '50s. At the time, digital computers were still too cumbersome for many practical problems, and engineers resorted to analog computers that occupied entire rooms. Today, analog computers are making a limited comeback, allowing applications that require much computation to be carried out in a highly parallel manner on a single chip.
Early vision algorithms must be able to smooth over surfaces; but they must also detect discontinuities in the image and prevent smoothing from occurring there. This capability, called segmentation or nonlinear, data-dependent smoothing, comes naturally to resistive nets. In the early '80s, many vision algorithms were introduced that used a binary variable, termed a line process, to represent discontinuities in color, motion, texture, or depth (surfaces at different distances from the imaging system).
In its simplest form, nonlinear smoothing assumes that if the depth (or color or motion) of two neighboring locations is very similar (that is, if the difference between the two is below some threshold), both pixels represent a portion of a scene lying on the same surface out there in the world and should have the same depth. Therefore, any difference between the two locations is due to noise and should be reduced by smoothing. Conversely, if the difference between the depth at neighboring pixels is above some threshold, presumably the two pixels are lying on two planes of different depths (in this example). The implication is that a discontinuity has been detected that would be smeared if averaging were used to smooth the image.
These high-level constraints have been embodied in a two-terminal multitransistor device, termed a resistive fuse by its inventor, John Harris, now at the University of Florida, Gainesville. If the voltage across the device is small, it acts like a resistance, conducting a current proportional to the voltage gradient. If the voltage gradient is above a threshold, the fuse kicks in and the current drops to zero, preventing any smoothing [Fig. 4]. These devices can be added at little cost to every node in the resistive grid, substantially enhancing performance.
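The effect of a fuse on smoothing can be illustrated with a one-dimensional chain of nodes. This is a rough sketch under loud assumptions: the fuse is idealized as a hard cutoff (ohmic below the threshold, open above it), whereas a real device has a smooth characteristic; the settling of the network is imitated by simple time stepping; and the values of R, G, the threshold, the step size, and the noise level are arbitrary choices made for illustration.

```python
import numpy as np

def fuse_current(dv, R=1.0, v_th=0.5):
    """Idealized resistive fuse: current dv/R below threshold, zero above."""
    return np.where(np.abs(dv) < v_th, dv / R, 0.0)

def relax_with_fuses(E, R=1.0, G=0.2, v_th=0.5, dt=0.2, n_steps=500):
    """Let a 1-D chain of nodes settle; neighbors are linked by fuses.

    E is the input (battery) voltage at each node. G links each input
    to its node. dt and n_steps are assumed simulation parameters.
    """
    E = np.asarray(E, dtype=float)
    V = E.copy()
    for _ in range(n_steps):
        # i_link[i] is the current flowing from node i+1 into node i
        i_link = fuse_current(np.diff(V), R, v_th)
        i_net = G * (E - V)    # current from each battery into its node
        i_net[:-1] += i_link   # node i gains the link current
        i_net[1:] -= i_link    # node i+1 loses it
        V = V + dt * i_net
    return V

# Noisy step: two flat "surfaces" at depths 0 and 1.
rng = np.random.default_rng(0)
E = np.concatenate((np.zeros(10), np.ones(10))) + rng.uniform(-0.05, 0.05, 20)

V_fuse = relax_with_fuses(E)                   # fuses open at the step
V_linear = relax_with_fuses(E, v_th=np.inf)    # plain resistors, no fuses
print(V_fuse[10] - V_fuse[9], V_linear[10] - V_linear[9])
```

In this toy setting the fused chain keeps the jump between nodes 9 and 10 close to its full height while reducing the noise on either side, and the plain resistive chain smears the same jump across several nodes.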
The cellular neural network (CNN) chip is another way of using analog hardware to implement such image-processing operations as smoothing and discontinuity detection. Its chief advantage over a resistive net is that it can be programmed to carry out a wide range of local mathematical operations. This flexibility comes at a price, though, for the basic pixel is very large, and the chip, while very fast, is quite power hungry.