Chloe Tebbutt | 5/10/15
If you have ever used Google Voice Search in the Google app and wished it worked in noisy places, or responded faster, your wish has been granted.
Google recently announced new acoustic models that not only require far fewer computational resources, but are also:
· More accurate – can better assess sounds
· Robust to noise – can be used in noisy places
· Faster to respond to voice search queries – quicker results
With these upgrades, search within the Google app is now around 300 milliseconds faster, though depending on the device, it may feel closer to 500 milliseconds faster for some users.
So, what do these upgrades entail?
In a recent interview, the Google Speech Team shared a few details of how the upgrades came about.
The first change to Google Voice Search came in 2012, when the 30-year-old core technology for modelling the sounds of a language – the Gaussian Mixture Model (GMM), long the industry standard – was replaced with Deep Neural Networks (DNNs). DNNs were better able to recognise the words or commands spoken by the user, which led to a dramatic improvement in speech recognition accuracy.
The recent upgrade is built on better neural network acoustic models that use Connectionist Temporal Classification (CTC) and sequence discriminative training methods. These models are an extension of recurrent neural networks (RNNs) that deliver higher accuracy, particularly in noisy environments, and are remarkably fast.
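To give a flavour of what CTC does, here is a minimal, illustrative sketch – not Google's implementation – of the standard CTC "collapsing" rule: the network emits one label (or a blank) per audio frame, and the final transcript is obtained by merging consecutive repeats and then dropping blanks. The function name and examples are made up for illustration.

```python
def ctc_collapse(labels, blank="-"):
    """Collapse a per-frame CTC label sequence into a transcript:
    first merge consecutive repeated labels, then drop the blank symbol."""
    out = []
    prev = None
    for lab in labels:
        if lab != prev:        # merge repeated frame labels
            if lab != blank:   # drop the blank token
                out.append(lab)
        prev = lab
    return "".join(out)

# Per-frame labels for the word "hey" spread over 8 audio frames:
print(ctc_collapse(["h", "h", "-", "e", "e", "-", "y", "y"]))  # hey

# A blank between repeats keeps genuine double letters, as in "hello":
print(ctc_collapse(["h", "e", "l", "-", "l", "o"]))  # hello
```

Because the blank lets the network "say nothing" on most frames, it does not have to align each sound to an exact frame, which is part of what makes these models fast and robust.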
According to the team, the type of RNN being used by Google for its acoustic models can memorise information better than the DNN and model “temporal dependencies”. To reduce computations, the models have also been trained to take in larger audio chunks while enhancing recognition in noisy settings by adding artificial noise to the training data.
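The "larger audio chunks" idea can be sketched as simple frame stacking: instead of feeding the network one feature frame at a time, several consecutive frames are concatenated and the sequence is subsampled, so the network runs far fewer steps. This is a generic illustration of the technique, not Google's code; the stack and stride values are assumptions.

```python
import numpy as np

def stack_frames(features, stack=3, stride=3):
    """Concatenate `stack` consecutive feature frames and subsample by
    `stride`, so the network takes one step per chunk instead of one
    step per frame."""
    chunks = []
    for start in range(0, len(features) - stack + 1, stride):
        chunks.append(np.concatenate(features[start:start + stack]))
    return np.array(chunks)

# 30 frames of 40-dimensional features -> 10 stacked chunks of 120 dims
feats = np.random.randn(30, 40)
stacked = stack_frames(feats)
print(stacked.shape)  # (10, 120)
```

Processing a third as many (wider) inputs is one straightforward way to cut the number of network evaluations per second of audio.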
To achieve these improvements, Google’s team had to tune the different models to strike an optimal balance between prediction quality and latency.
How it works
Conventional speech recognisers break the waveform spoken by the user into tiny consecutive “frames” of roughly 10 milliseconds of audio. The new, improved acoustic models are instead based on RNNs with feedback loops, which use memory cells and a complex gating mechanism to retain information over time.
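The 10-millisecond framing step mentioned above can be sketched in a few lines. This is a generic illustration (function name and parameters are assumptions, and real recognisers typically use overlapping windows rather than this simple non-overlapping split):

```python
def frame_waveform(samples, sample_rate=16000, frame_ms=10):
    """Split raw audio samples into consecutive frames of `frame_ms`
    milliseconds each; any trailing partial frame is dropped."""
    frame_len = sample_rate * frame_ms // 1000   # samples per frame
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len]
            for i in range(n_frames)]

# One second of 16 kHz audio -> 100 frames of 160 samples each
audio = [0.0] * 16000
frames = frame_waveform(audio)
print(len(frames), len(frames[0]))  # 100 160
```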
Basically, these improvements allow Google to recognise spoken words with greater accuracy, even in noisy settings, while demanding lower computational resources to assess sounds in real-time.
Happy Google voice searching!
The new acoustic models are already being used for voice searches and commands in the Google app on Android and iOS, as well as for dictation on Android devices. So you can now expect a better user experience when asking questions you want searched on the web, or when asking for directions to the nearest filling station.