
AI’s role in next-generation voice recognition

Brian Fuller
August 15, 2017
3 minute read time.

It was one thing when some of Amazon’s voice-enabled Alexa devices picked up children’s voices and then ordered goods online. It was another thing altogether when families watching television coverage of that story found that their Amazon devices ordered those same products because they heard the reference on the news report. Ah, the unintended consequences of powerful voice recognition and artificial intelligence!

This anecdote highlights the power of speech recognition technology, 60 years after Bell Labs’ Audrey device and 50 years after IBM showed off its Shoebox machine.

Speech recognition has vastly improved over the decades, thanks to electronics innovation and artificial intelligence advances. Yet, even amazing applications like voice-activated assistants are, in some ways, still in their adolescence, partly because of the complexity of human language and speech.

Comprehending complexity

Consider this: speech is a fundamental form of human connection, the means by which we communicate, articulate, vocalize, recognize, understand, and interpret one another. But here is where the complexity comes in: there are thousands of languages and even more dialects, and each of us has a unique vocabulary. Researchers from an independent American-Brazilian research project found that native English-speaking adults understood an average of 22,000 to 32,000 vocabulary words and learned about one word a day, while non-native English-speaking adults knew an average of 11,000 to 22,000 English words and learned about 2.5 words a day.

While English speakers might use upwards of 30,000 words, most embedded speech-recognition systems use a vocabulary of fewer than 10,000 words. Accents and dialects increase the vocabulary size needed for a recognition system to be able to correctly capture and process a wide range of speakers within a single language.

You can see that the state of speech-recognition and artificial intelligence still has a way to go to match human capability. To close that gap, we’ll be looking for advancements in voice recognition technologies that resolve existing accuracy and security issues and can fully operate as an embedded solution.

Voice recognition meets artificial intelligence

With the continually improving computing power and shrinking size of mobile processors, large-vocabulary engines that support natural speech are now available as an embedded option for OEMs. The footprint of such an engine has been shrunk and optimized, making it an even more attractive option as OEMs begin to leverage artificial intelligence more heavily. Effective speaker recognition requires segmentation of the audio stream, detection and/or tracking of speakers, and identification of those speakers. The recognition engine then fuses the results of these stages into a single score, making decisions easier to reach. For the engine to function at its full potential, and to allow users to speak naturally and be understood even in a noisy environment, pre-processing techniques are integrated to improve the quality of the audio input to the recognition system.
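The stages above can be sketched in a few lines. This is a minimal, illustrative toy, not Recognition Technologies' or Arm's actual API: the segment scores, the averaging-based fusion, and the pre-processing stub are all assumptions made for demonstration.

```python
# Sketch of the speaker-recognition flow: pre-process the audio,
# segment the stream, score each segment against enrolled speakers,
# then fuse per-segment scores into one decision.
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float   # segment start time (seconds)
    end_s: float     # segment end time (seconds)
    scores: dict     # enrolled speaker id -> match score for this segment

def denoise(samples):
    """Pre-processing stub: a real system would apply noise suppression
    and echo cancellation here to improve recognition accuracy."""
    return samples

def fuse(segments):
    """Fuse per-segment scores by averaging, then pick the best speaker."""
    totals, counts = {}, {}
    for seg in segments:
        for speaker, score in seg.scores.items():
            totals[speaker] = totals.get(speaker, 0.0) + score
            counts[speaker] = counts.get(speaker, 0) + 1
    averages = {spk: totals[spk] / counts[spk] for spk in totals}
    best = max(averages, key=averages.get)
    return best, averages[best]

# Toy example: two segments, each scored against two enrolled speakers.
segments = [
    Segment(0.0, 1.5, {"alice": 0.82, "bob": 0.40}),
    Segment(1.5, 3.0, {"alice": 0.74, "bob": 0.55}),
]
speaker, confidence = fuse(segments)
print(speaker, round(confidence, 2))  # -> alice 0.78
```

Averaging is the simplest possible fusion rule; production engines typically weight segments by duration and signal quality before combining them.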

The other key to improved voice recognition technology is distributed computing. We have reached this amazing point in voice recognition (yes, even counting the accidental Amazon orders!) thanks to the cloud, but cloud technology has limitations in real-time enterprise environments that require user privacy, security, and reliable connectivity. The world is moving quickly to a new model of collaborative embedded-cloud operation, called an embedded glue layer, that promotes uninterrupted connectivity and directly addresses emerging cloud challenges for the enterprise.


With an embedded glue layer, capturing and processing user voice or visual data can be performed locally, without complete dependence on the cloud. In its simplest form, the glue layer acts as an embedded service that collaborates with the cloud-based service to provide native on-device processing. The glue layer allows mission-critical voice tasks, where user or enterprise security, privacy, and protection are required, to be processed natively on the device while ensuring continuous availability. Non-mission-critical tasks, such as natural language processing, can be processed in the cloud using low-bandwidth, textual data as the mode of bilateral transmission. The embedded recognition glue layer provides nearly the same scope as a cloud-based service, albeit as a native process.
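That routing decision is the heart of the glue layer. The sketch below shows the idea under stated assumptions: the task names, the mission-critical set, and the handler functions are all hypothetical, not part of any Arm or Recognition Technologies interface.

```python
# Hypothetical glue-layer dispatch: mission-critical voice tasks are
# processed natively on the device; non-critical tasks are sent to the
# cloud as low-bandwidth recognized text rather than raw audio.

MISSION_CRITICAL = {"voice_authentication", "wake_word", "emergency_command"}

def process_on_device(task, text):
    # Native processing: neither audio nor text leaves the device.
    return f"[device] {task}: {text}"

def process_in_cloud(task, text):
    # Only the recognized text is transmitted, keeping bandwidth low.
    return f"[cloud] {task}: {text}"

def glue_layer_dispatch(task, recognized_text):
    """Route a task based on its privacy and availability requirements."""
    if task in MISSION_CRITICAL:
        return process_on_device(task, recognized_text)
    return process_in_cloud(task, recognized_text)

print(glue_layer_dispatch("voice_authentication", "open the vault"))
print(glue_layer_dispatch("nlp_intent", "what's the weather tomorrow"))
```

Because the mission-critical path never touches the network, it keeps working when connectivity drops, which is the continuous-availability property the post describes.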

This approach to voice recognition technology will not only revolutionize applications but devices as well, and it’s on our doorstep, just like those packages.

This white paper from Recognition Technologies and Arm offers excellent technical insight into the architecture and design approach that’s making the gateway a more powerful, efficient place for voice recognition. And read more about Arm's artificial intelligence technologies.

Check out the Recognition Technologies and Arm white paper.

