Letting machines listen and understand



As we move towards ever more ubiquitous sensing and computing, power becomes more and more important. There is perhaps no better example of where this matters than the voice-activated devices on our desks, in our pockets and distributed around our homes. As we saw last year, keyword spotting in particular is currently a target for all kinds of neuromorphic technologies.

The silicon cochlea

The 2020 winner of the Misha Mahowald Prize for Neuromorphic Engineering is Professor Shih-Chii Liu and her team, for their work on low-latency, low-power sensors for detecting speech. The dynamic audio sensors that Liu and her colleagues at the Institute of Neuroinformatics (INI) have developed could ultimately address this market. At their heart is a silicon cochlea designed to mimic biology. First, the incoming sound is filtered into frequency channels using a set of analog bandpass filters, the output of which is then half-wave rectified. Together, this emulates the function of the hair cells in the ear.
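The filter-and-rectify stage can be sketched in a few lines of Python. This is an illustrative software model only, not the team's analog circuitry: the bandpass is built crudely as the difference of two one-pole lowpasses, and the filter coefficients are arbitrary assumptions.

```python
import numpy as np

def lowpass(x, alpha):
    """One-pole IIR lowpass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y = np.zeros(len(x))
    acc = 0.0
    for n, v in enumerate(x):
        acc += alpha * (v - acc)
        y[n] = acc
    return y

def cochlea_channel(x, alpha_lo=0.05, alpha_hi=0.5):
    """One frequency channel: a crude bandpass (difference of two
    lowpasses with different cutoffs) followed by half-wave
    rectification, keeping only the positive half of the wave,
    loosely mimicking a hair cell's response."""
    band = lowpass(x, alpha_hi) - lowpass(x, alpha_lo)
    return np.maximum(band, 0.0)  # half-wave rectify
```

A real filter bank would run many such channels in parallel, each tuned to a different frequency band.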

In a conventional audio system, the sound is first digitized using an analog-to-digital converter (ADC), and features are then extracted using a digital fast Fourier transform (FFT) and bandpass filtering (BPF). These are processed by a digital signal processor (DSP) running voice activity detection (VAD) or automatic speech recognition algorithms. In the INI-Zurich dynamic audio sensor, by contrast, the features are taken directly from the analog filtered signals, and their changes are encoded, in parallel, into asynchronous spike trains (events), which are then processed.
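For contrast, the conventional digital front end described above might be sketched as follows. The window length, hop size and windowing function are illustrative assumptions, not the parameters of any particular ASIC.

```python
import numpy as np

def fft_features(x, frame_len=256, hop=128):
    """Conventional pipeline sketch: slice the digitized signal into
    overlapping windows and take FFT magnitudes, yielding the
    spectrogram-like features a DSP would consume."""
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        windowed = x[start:start + frame_len] * np.hanning(frame_len)
        frames.append(np.abs(np.fft.rfft(windowed)))
    return np.array(frames)  # shape: (n_frames, frame_len // 2 + 1)
```

Note that this pipeline computes every frame at a fixed rate whether or not the input is changing, which is exactly the redundancy the event-driven approach avoids.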

As happens in biology, the various channels are then prepared for processing in the brain. In the ear, ganglion cells encode signals as an influx of chemical ions; in the silicon cochlea, they are transformed into electrical spikes. This can be done using either a classic integrate-and-fire circuit or an asynchronous delta modulator (ADM), which compares the signal to two thresholds and sends the appropriate events as they are crossed, thus acting as a feature extractor. Because unchanging signals are ignored, the amount of redundant information transmitted to the next stage is reduced.
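The delta-modulator idea can be sketched in discrete time. The real ADM is a continuous-time analog circuit; the threshold value and event format here are illustrative assumptions.

```python
def adm_encode(samples, delta=0.1):
    """Simplified asynchronous delta modulator: emit an UP event each
    time the signal rises by more than `delta` above the last event
    level, DOWN each time it falls by more than `delta` below it.
    A constant input produces no events at all."""
    events = []
    level = samples[0]  # reference level, updated at each event
    for t, x in enumerate(samples):
        while x - level > delta:   # crossed the upper threshold
            level += delta
            events.append((t, "UP"))
        while level - x > delta:   # crossed the lower threshold
            level -= delta
            events.append((t, "DOWN"))
    return events
```

The key property is visible in the loop: a flat signal generates nothing, so the downstream processor only sees activity when something changes.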

From an energy standpoint, if nothing happens, the silicon cochlea barely expends any energy, but as activity increases, so does the number of spikes. Depending on the application, this can be a huge benefit (if there is a lot of listening but very little action) or no benefit at all (when there is relevant material to be decoded all the time).

However, as an audio sensor operating in the low-µW regime, the chip could offer system designers a valuable option for increasing energy efficiency. It also allows for a very high dynamic range: because the circuit operates in continuous time, the spikes can be almost arbitrarily close together or far apart.

Speech recognition

An essential part of this work has been to demonstrate usefulness. Specifically, the event streams produced by the silicon cochlea can be used in real-world applications such as voice activity detection, the first step in keyword recognition. Liu and her team succeeded in doing this by using the event output to create 2D data frames: histograms of the incoming spikes, by frequency, arranged over the 5 ms of the frame. Called cochleagrams, these can be fed into a neural network and their meaning decoded from there.
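The framing step can be sketched as follows. The channel count and number of time bins per 5 ms frame are illustrative assumptions, not the team's actual configuration.

```python
import numpy as np

def cochleagram(events, n_channels=64, frame_ms=5.0, n_time_bins=5):
    """Accumulate (time_ms, channel) spike events from one frame into
    a channels-by-time-bins histogram, the 2D input a neural network
    classifier can consume."""
    bin_ms = frame_ms / n_time_bins
    frame = np.zeros((n_channels, n_time_bins), dtype=int)
    for t_ms, ch in events:
        b = int(t_ms / bin_ms)
        if 0 <= b < n_time_bins and 0 <= ch < n_channels:
            frame[ch, b] += 1  # count this spike in its (channel, time) cell
    return frame
```

Because the histogram simply counts events, silent stretches produce near-empty frames, preserving the sparsity of the sensor output all the way to the classifier input.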

According to Liu, “The use of deep networks on a sensor is of great interest to the IEEE ISSCC community and very timely given the huge current interest in audio edge computing.” There have been many articles on low-power ASICs for keyword detection, she says, but these use classic spectrogram-like features. One of her goals “is to show that hybrid solutions (mixed-signal analog-digital designs) could lead to even lower power design solutions with lower latency responses.”

Last year INI released a video showing the system recognizing numbers (you can see Liu from around 2:06). It is far from foolproof, but it is also still relatively early in the development of the system. The team, which has included Minhao Yang, Chang Gao, Enea Ceolini, Adrian Huber, Jithendar Anumula, Ilya Kiselev and Daniel Neil over the years, has also experimented with sensor fusion: Liu and her colleagues combined audio and visual information to perform more reliable classification [1]. They published initial design rules for choosing when analog sensors are beneficial and when to stick with digital [2].

Misha Mahowald, one of the inventors of address-event representation, and after whom the neuromorphic engineering prize is named.

Another ongoing effort has been to improve the energy efficiency and performance of the dynamic audio sensor. Part of this has involved examining the implementation of individual functions, from source-follower-based bandpass filters to the design of analog feature extractors.

Reducing the effect of variability in analog electronics has been another important area of research. To help with that, they built a hardware emulator that they could use to test for these problems much faster, they say, than with commercial software like Cadence Virtuoso. By training the binary neural network they use for classification in software rather than on hardware, they were able to accurately predict classification performance on a range of real test chips [3]. They are now looking to add noise to the system as a proxy for variability to make the design process even more robust.

Mahowald Prize

Liu was one of the first researchers in neuromorphic engineering; she not only worked in Carver Mead’s lab at Caltech (where Mahowald had also worked), but was a founding member of the Institute of Neuroinformatics when many of the group left California for Zurich.

Upon winning the award, Liu said, “It is a great honor for us to receive this award, especially with so many good neuromorphic engineering researchers.” The work drew on decades of early silicon cochlea designs by Dick Lyon, Carver Mead, Lloyd Watts, Rahul Sarpeshkar, Eric Vittoz and Andre van Schaik.

On the importance of neuromorphic engineering, she says, “Even at the end of Moore’s Law, digital computation will be at least a thousand times behind the energy efficiency of biology. Thus, the potential efficiency of hybrid analog-digital electronic systems such as the DAS becomes more important than ever.”

– Sunny Bains teaches at University College London, is the author of Explaining the Future: How to Research, Analyze and Report on Emerging Technologies, and is currently writing a book on neuromorphic engineering.

References

[1] D. Neil and S.-C. Liu, “Effective sensor fusion with event-based sensors and deep network architectures,” in Proceedings – IEEE International Symposium on Circuits and Systems (ISCAS), July 2016, pp. 2282–2285, doi: 10.1109/ISCAS.2016.7539039.

[2] S.-C. Liu, B. Rueckauer, E. Ceolini, A. Huber and T. Delbruck, “Event-driven sensing for efficient perception: vision and audition algorithms,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 29–37, November 2019, doi: 10.1109/MSP.2019.2928127.

[3] M. Yang, S.-C. Liu, M. Seok and C. Enz, “Ultra-low-power intelligent acoustic sensing using cochlea-inspired feature extraction and DNN classification.”

[4] M. Yang, C.-H. Chien, T. Delbruck and S.-C. Liu, “A 0.5 V 55 µW 64×2-channel binaural silicon cochlea for event-driven stereo-audio sensing,” IEEE Journal of Solid-State Circuits, vol. 51, no. 11, pp. 2554–2569, November 2016, doi: 10.1109/JSSC.2016.2604285.
