Intel is proposing a better voice recognition experience for users: a headset concept that would serve up answers without the cloud. A recent Quartz interview with Michael A. Bell, general manager of Intel's New Devices Group, has drawn attention to the Jarvis headset, which wraps around the back of the wearer's ear, connects to a smartphone, listens to commands and answers in its own voice, all without relying on a cloud connection.
Intel favors taking voice control out of the cloud because of time, and time affects the user experience. As it stands, the user stands or sits waiting for a response, and the length of the wait depends on the speed of the connection, among other things. Intel would like to shave off the time it takes to ship the voice command off to cloud servers.
As Geek.com explained, the mobile device currently sends your command off to a server farm, where it is translated into a command that the device can understand. Even with fast connections, there is still a delay that could be avoided "if the hardware itself could parse your language and turn it into commands."
Enter Jarvis. Requests would be handled locally, not on server farms, by a combination of processor and software in the headset that could interpret the human voice. The wearable device itself would process commands and provide the personal assistant functionality.
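The latency argument can be made concrete with a rough sketch. The Python below is only an illustration: the class, the function and every number in it are hypothetical, chosen to show the shape of the trade-off, and are not measurements from Intel, Geek.com or any real assistant.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Time one voice command spends in each stage, in milliseconds."""
    network_round_trip_ms: float  # time to ship audio out and get a reply back
    recognition_ms: float         # time spent turning speech into a command

    def total_ms(self) -> float:
        return self.network_round_trip_ms + self.recognition_ms


def time_saved_ms(cloud: LatencyBudget, local: LatencyBudget) -> float:
    """How much sooner the local device answers than the cloud path."""
    return cloud.total_ms() - local.total_ms()


# Hypothetical figures: a server farm recognizes faster, but pays for the trip.
cloud = LatencyBudget(network_round_trip_ms=200.0, recognition_ms=50.0)
# Hypothetical figures: a headset recognizes more slowly, but never leaves the device.
local = LatencyBudget(network_round_trip_ms=0.0, recognition_ms=150.0)

print(f"cloud: {cloud.total_ms()} ms, local: {local.total_ms()} ms")
print(f"saved by staying local: {time_saved_ms(cloud, local)} ms")
```

With these made-up figures the local path wins even though its recognition step is slower, because it skips the network round trip entirely. With no wireless connection at all, the cloud path has no answer to give.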
In the Quartz interview, Bell offered a brief but effective reason for liking the idea of Jarvis: "How annoying is it when you're in Yosemite and your personal assistant doesn't work because you can't get a wireless connection?" he asked.
After seeing Intel's Jarvis idea, however, one Reddit contributor, who said he works for a Microsoft division that does voice recognition work, offered some thoughtful observations. His comments suggest that building a voice recognition system that works well without the cloud is a real challenge. "On one hand, not having to use the cloud means that you no longer have the latency ding of cloud based reco, and that you have more privacy. Reco happens nearly instantaneously and creates a really great experience. At least, it does when it works. That's the major problem." He noted that in the cloud "we don't have the storage/RAM concerns, so we can have reco engines for individual accents." With so much variation within a single language, he said, "often times a single reco engine for the entire language simply doesn't cut it. A single cell phone simply can't store all of the information needed to adapt to every single accent."
Another plus for cloud-based recognition is that the sheer amount of input from so many different people means the engine can be adapted over time. "We can teach it as we go," he said. "With local reco, while their personal phone may learn to adapt, the system as a whole doesn't."
Until storage on phones is on the terabyte scale, he stated, "we really won't see local reco being equally accurate than cloud based reco."