Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method performed by a robot, the method comprising: operating, by the robot, an audio processing system to detect utterance of voice commands in a set of first commands that control behavior of the robot and in a set of second commands that control behavior of the robot, wherein the audio processing system is configured to detect utterance of voice commands in the set of first commands using a local recognition model of the robot that does not require communication over a network, and the audio processing system is configured to detect utterance of commands in the set of second commands through communication with a server over a network; executing, by the robot, one or more applications that are configured to respond to voice commands in a set of application commands that include commands different from the first commands and second commands; receiving, by the robot, audio data for an utterance; and processing, by the robot using the audio processing system, the audio data for the utterance to evaluate the utterance with respect to each of the set of first commands, the set of second commands, and the set of application commands, wherein the processing is performed according to a predetermined hierarchy that prioritizes detection of the first commands first, the second commands second, and the application commands third, wherein the audio processing system is configured to detect utterance of the first commands, the second commands, and the application commands during execution of the one or more applications by the robot.
A robot operates an audio processing system to detect and respond to voice commands, distinguishing between three types of commands: first commands, second commands, and application commands. The first commands are detected locally using a recognition model that does not require network communication, ensuring low-latency and offline functionality. The second commands are detected by communicating with a remote server over a network, allowing for more complex or dynamic command recognition. The robot also executes applications that respond to application commands, which are distinct from the first and second commands. When the robot receives audio data from an utterance, the audio processing system evaluates the utterance against all three command sets in a predetermined hierarchy. First, it checks for first commands, then second commands, and finally application commands. This prioritization ensures that critical or time-sensitive commands are processed first, while less urgent or application-specific commands are handled afterward. The system operates continuously during application execution, allowing seamless integration of voice control across different robot functions. This approach balances local processing efficiency with the flexibility of cloud-based recognition and application-specific command handling.
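The predetermined hierarchy described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the claim operates on audio data, whereas this sketch matches an already-transcribed phrase, and the names `Tier`, `COMMAND_SETS`, and `resolve` are hypothetical. The first and second command phrases are borrowed from claims 5 and 6; the application phrases are invented.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Priority order from claim 1; a lower value means higher priority."""
    FIRST = 1        # detected by the robot's local recognition model
    SECOND = 2       # detected through a server over a network
    APPLICATION = 3  # commands the running applications respond to


# Hypothetical command sets. "stop" appears in two sets on purpose so the
# hierarchy has a conflict to resolve.
COMMAND_SETS = {
    Tier.FIRST: {"stop", "let go", "status"},
    Tier.SECOND: {"move", "pick up", "bring"},
    Tier.APPLICATION: {"next slide", "stop"},
}


def resolve(transcript: str) -> Optional[Tier]:
    """Return the highest-priority tier whose set contains the phrase."""
    phrase = transcript.strip().lower()
    for tier in sorted(COMMAND_SETS):  # FIRST, then SECOND, then APPLICATION
        if phrase in COMMAND_SETS[tier]:
            return tier
    return None  # the utterance is not a known command
```

With these sets, `resolve("stop")` returns `Tier.FIRST` even though an application also defines "stop", which is the effect of evaluating the first commands ahead of the application commands.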
2. The method of claim 1, wherein the audio processing system of the robot is configured to complete evaluation of a received voice command to determine whether the received voice command is one of the first commands within a predetermined amount of time from utterance of the voice command; and wherein the audio processing system of the robot is not guaranteed to complete evaluation of the received voice command to determine whether the received voice command is one of the second commands within a predetermined amount of time from utterance of the voice command.
3. The method of claim 1, wherein the audio processing system is configured to detect utterance of commands in the set of application commands through communication with a second server over the network.
4. The method of claim 1, wherein the voice commands in the set of first commands include a reserved word.
In the method of claim 1, the set of first commands, the ones the robot detects with its local recognition model and without any network communication, includes a reserved word. The claim does not define the reserved word further; claim 13 describes the reserved word of the system claims as a wake word for the robot. Because the reserved word belongs to the first command set, it is detected on the robot itself and sits at the top of the predetermined hierarchy, ahead of the server-recognized second commands and the application commands.
5. The method of claim 1, wherein the voice commands in the set of first commands include a stop command, a let go command, or a status command.
6. The method of claim 1, wherein the voice commands in the set of second commands include a move command, a pick up command, or a bring command.
7. The method of claim 1, wherein processing the audio data for the utterance comprises: processing the audio data for the utterance to detect the first set of commands using the local recognition model while concurrently using a server-based recognition process to process the audio data for the utterance to detect the set of second commands and the set of application commands.
In the method of claim 1, the robot processes the audio data for an utterance along two paths at the same time. The local recognition model on the robot checks the audio against the set of first commands, while a server-based recognition process concurrently checks the same audio against the set of second commands and the set of application commands. Running both paths in parallel means the robot does not have to wait for the server before recognizing a first command, and it does not have to finish the local check before server-based recognition of the other two sets begins.
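The concurrent processing in claim 7 can be sketched with Python's `asyncio`. This is a hedged illustration, not the patent's implementation: both recognizers are stand-in functions that match text phrases, the server round trip is simulated with a short sleep, and every name (`recognize_locally`, `recognize_on_server`, `process`) is hypothetical. Cancelling the server request after a local match is a design choice made for this sketch; the claim does not say what happens to the server path.

```python
import asyncio
from typing import Optional, Tuple

FIRST_COMMANDS = {"stop", "let go", "status"}
# Hypothetical server-side vocabulary mapping a phrase to its command set.
REMOTE_COMMANDS = {
    "move": "second",
    "pick up": "second",
    "bring": "second",
    "next slide": "application",
}


async def recognize_locally(utterance: str) -> Optional[str]:
    """Stand-in for the on-robot model: no network involved."""
    await asyncio.sleep(0)  # yield control; real inference would run here
    return utterance if utterance in FIRST_COMMANDS else None


async def recognize_on_server(utterance: str) -> Optional[Tuple[str, str]]:
    """Stand-in for the server round trip, simulated with a delay."""
    await asyncio.sleep(0.05)
    tier = REMOTE_COMMANDS.get(utterance)
    return (tier, utterance) if tier else None


async def process(utterance: str) -> Optional[Tuple[str, str]]:
    """Start both recognizers together; a first-command match takes priority."""
    local_task = asyncio.create_task(recognize_locally(utterance))
    server_task = asyncio.create_task(recognize_on_server(utterance))
    local_hit = await local_task
    if local_hit is not None:
        server_task.cancel()  # assumption: the server result is not needed
        return ("first", local_hit)
    return await server_task
```

For example, `asyncio.run(process("stop"))` returns `("first", "stop")` without waiting for the simulated server delay, while `asyncio.run(process("bring"))` returns `("second", "bring")` once the server path completes.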
8. A system comprising: a robot that includes: one or more physically moveable components, one or more microphones, a control system configured to actuate the one or more physically moveable components, an audio processing system that includes: a local recognition model configured to detect, without communication over a network and using the one or more microphones, utterance of voice commands in a set of first commands that control behavior of the robot, a command selection module configured to distinguish between voice commands in the set of first commands, voice commands in a set of second commands that control behavior of the robot, and voice commands in a set of application commands that include commands different from the first commands and second commands, and wherein the command selection module prioritizes detection of the first commands first, the second commands second, and the application commands third, and a hotword model configured to recognize utterance of a reserved word; and a remote server communicatively coupled to the robot, wherein the remote server is configured to provide a speech recognition service configured to detect voice commands in the set of second commands and voice commands in the set of application commands; wherein the robot is configured to receive audio data for an utterance and process the audio data for the utterance by (i) using the local recognition model evaluate the audio data for the utterance with respect to each of the set of first commands, and (ii) using the speech recognition service provided by the remote server to evaluate the audio data for the utterance with respect to the set of second commands and the set of application commands.
This claim covers a system made up of a robot and a remote server. The robot includes one or more physically moveable components, one or more microphones, a control system that actuates the moveable components, and an audio processing system. The audio processing system has three parts: a local recognition model that uses the microphones to detect first commands, which control the robot's behavior, without any network communication; a command selection module that distinguishes among first commands, second commands (which also control the robot's behavior), and application commands, prioritizing detection of first commands first, second commands second, and application commands third; and a hotword model that recognizes utterance of a reserved word. The remote server is communicatively coupled to the robot and provides a speech recognition service that detects second commands and application commands. When the robot receives audio data for an utterance, it (i) uses the local recognition model to evaluate the audio against the first commands and (ii) uses the server's speech recognition service to evaluate the audio against the second commands and the application commands.
9. The system of claim 8, wherein the remote server includes: a speech recognizer configured to generate a transcription of audio data corresponding to a voice command in the set of second commands or a voice command in the set of application commands; and a semantic analysis module configured to generate a semantic interpretation of audio data corresponding to a voice command in the set of second commands or a voice command in the set of application commands.
In the system of claim 8, the remote server includes two components. A speech recognizer generates a transcription of the audio data for a second command or an application command. A semantic analysis module generates a semantic interpretation of that audio data, that is, a representation of what the command means rather than only the words spoken. The claim applies both components to the server-recognized command sets; first commands are still detected by the robot's local recognition model.
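The two server-side stages in claim 9 can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the claim does not specify how transcription or semantic interpretation works, so `transcribe` is a placeholder that simply decodes bytes as text, and `interpret` is a toy prefix matcher. The names `Interpretation`, `ACTIONS`, `transcribe`, and `interpret` are hypothetical; the action phrases come from claim 15.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interpretation:
    """A toy semantic interpretation: what to do, and to what."""
    action: str
    target: str


ACTIONS = ("pick up", "move", "bring")  # example second commands (claim 15)


def transcribe(audio: bytes) -> str:
    """Placeholder recognizer: treats the audio bytes as UTF-8 text.

    A real speech recognizer would decode an audio signal instead.
    """
    return audio.decode("utf-8").strip().lower()


def interpret(transcription: str) -> Optional[Interpretation]:
    """Split a transcription into an action and the remainder of the phrase."""
    for action in ACTIONS:
        if transcription.startswith(action + " "):
            target = transcription[len(action) + 1:].strip()
            return Interpretation(action, target)
    return None  # no known action at the start of the phrase
```

Chaining the two stages, `interpret(transcribe(b"Pick up the red cup"))` yields `Interpretation(action="pick up", target="the red cup")`.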
10. The system of claim 8, wherein the audio processing system is configured to detect utterance of commands in the set of second commands through communication with the remote server over the network.
In the system of claim 8, the robot's audio processing system detects second commands by communicating with the remote server over the network. In other words, recognition of the second commands (claim 15 gives move, pick up, and bring as examples) depends on the server's speech recognition service, whereas first commands are detected on the robot without a network connection.
11. The system of claim 8, wherein the audio processing system is configured to detect utterance of commands in the set of application commands through communication with the remote server over the network.
In the system of claim 8, the robot's audio processing system detects application commands by communicating with the remote server over the network. Application commands are the commands that applications running on the robot respond to, and they include commands different from the first and second commands. Under this claim they are recognized through the same remote server that claim 8 describes, rather than by the robot's local recognition model.
12. The system of claim 8, wherein the audio processing system of the robot is configured to complete evaluation of a received voice command to determine whether the received voice command is one of the first commands within a predetermined amount of time from utterance of the voice command; and wherein the audio processing system of the robot is not guaranteed to complete evaluation of the received voice command to determine whether the received voice command is one of the second commands within a predetermined amount of time from utterance of the voice command.
In the system of claim 8, the robot's audio processing system is configured to finish deciding whether a received voice command is one of the first commands within a predetermined amount of time from the utterance. No such guarantee applies to second commands: the system is not guaranteed to finish deciding whether a command is one of the second commands within a predetermined amount of time. This is consistent with where recognition happens. First commands are evaluated by the local recognition model on the robot, while second commands rely on a remote server reached over a network, whose response time the robot does not control.
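The difference between a bounded and an unbounded evaluation can be sketched in Python. This is a hedged illustration only: a timeout does not by itself guarantee that evaluation finishes in time, it merely detects when a bound is missed. The 0.2 second bound, the simulated delays, and all names (`LOCAL_DEADLINE_S`, `check_first`, `check_second`) are assumptions; the claim does not give a value for the predetermined amount of time.

```python
import asyncio

LOCAL_DEADLINE_S = 0.2  # hypothetical "predetermined amount of time"
FIRST_COMMANDS = {"stop", "let go", "status"}
SECOND_COMMANDS = {"move", "pick up", "bring"}


async def evaluate_first(phrase: str, model_delay_s: float) -> bool:
    """Stand-in for local evaluation; the delay simulates inference time."""
    await asyncio.sleep(model_delay_s)
    return phrase in FIRST_COMMANDS


async def evaluate_second(phrase: str, server_delay_s: float) -> bool:
    """Stand-in for server evaluation; the delay simulates the network."""
    await asyncio.sleep(server_delay_s)
    return phrase in SECOND_COMMANDS


def check_first(phrase: str, model_delay_s: float = 0.01) -> bool:
    """Evaluate against the first commands under the time bound.

    Raises asyncio.TimeoutError if the bound is missed.
    """
    async def bounded() -> bool:
        return await asyncio.wait_for(
            evaluate_first(phrase, model_delay_s), timeout=LOCAL_DEADLINE_S
        )
    return asyncio.run(bounded())


def check_second(phrase: str, server_delay_s: float = 0.3) -> bool:
    """Evaluate against the second commands with no time bound at all."""
    return asyncio.run(evaluate_second(phrase, server_delay_s))
```

In this sketch `check_second("bring")` succeeds even though it takes longer than `LOCAL_DEADLINE_S`, whereas `check_first("stop", model_delay_s=0.5)` raises `asyncio.TimeoutError` because the local path exceeded its bound.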
13. The system of claim 8, wherein the reserved word is a wake word for the robot.
In the system of claim 8, the reserved word that the hotword model recognizes is a wake word for the robot. The claim adds nothing further about how the wake word is used. In voice interfaces generally, a wake word is a spoken word or phrase that signals the device to start attending to a command.
14. The system of claim 8, wherein the voice commands in the set of first commands include a stop command, a let go command, or a status command.
In the system of claim 8, the set of first commands, the ones detected on the robot by the local recognition model, includes a stop command, a let go command, or a status command. These are examples of commands that the robot can recognize without a network connection and that take the highest priority in the command hierarchy.
15. The system of claim 8, wherein the voice commands in the set of second commands include a move command, a pick up command, or a bring command.
In the system of claim 8, the set of second commands, the ones detected through the remote server's speech recognition service, includes a move command, a pick up command, or a bring command. These rank below the first commands and above the application commands in the command selection module's priority order.
16. One or more computer-readable storage devices storing a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: operating, by the robot, an audio processing system to detect utterance of voice commands in a set of first commands that control behavior of the robot and in a set of second commands that control behavior of the robot, wherein the audio processing system is configured to detect utterance of voice commands in the set of first commands using a local recognition model of the robot that does not require communication over a network, and the audio processing system is configured to detect utterance of commands in the set of second commands through communication with a server over a network; executing, by the robot, one or more applications that are configured to respond to voice commands in a set of application commands that include commands different from the first commands and second commands; and receiving, by the robot, audio data for an utterance; and processing, by the robot using the audio processing system, the audio data for the utterance to evaluate the utterance with respect to each of the set of first commands, the set of second commands, and the set of application commands, wherein the processing is performed according to a predetermined hierarchy that prioritizes detection of the first commands first, the second commands second, and the application commands third, wherein the audio processing system is configured to detect utterance of the first commands, the second commands, and the application commands during execution of the one or more applications by the robot.
This claim covers one or more computer-readable storage devices that store a computer program. When executed by one or more computers, the program performs the same operations as the method of claim 1. The audio processing system handles three types of voice commands: first commands, which control the robot's behavior and are detected with a local recognition model without network communication; second commands, which also control the robot's behavior and are detected through communication with a server over a network; and application commands, which applications running on the robot respond to and which include commands different from the first and second commands. When the robot receives audio data for an utterance, the audio processing system evaluates the utterance against all three command sets according to a predetermined hierarchy: first commands first, second commands second, and application commands third. All three sets can be detected while the applications are running.
17. The one or more computer-readable storage devices of claim 16, wherein the audio processing system of the robot is configured to complete evaluation of a received voice command to determine whether the received voice command is one of the first commands within a predetermined amount of time from utterance of the voice command; and wherein the audio processing system of the robot is not guaranteed to complete evaluation of the received voice command to determine whether the received voice command is one of the second commands within a predetermined amount of time from utterance of the voice command.
In the storage devices of claim 16, the robot's audio processing system is configured to finish deciding whether a received voice command is one of the first commands within a predetermined amount of time from the moment the command is spoken. The same is not guaranteed for second commands: the system may take longer than any predetermined time to decide whether a command is one of the second commands. First commands therefore get a bounded evaluation time, while second commands, which are detected through a server over a network, may experience variable delays.
18. The one or more computer-readable storage devices of claim 16, wherein the audio processing system is configured to detect utterance of commands in the set of application commands through communication with a second server over the network.
In the storage devices of claim 16, the audio processing system detects application commands by communicating with a second server over the network. The wording "a second server" indicates a server other than the one used to detect the second commands, so under this claim the application commands and the second commands can be recognized by different network services.
19. The one or more computer-readable storage devices of claim 16, wherein the voice commands in the set of first commands include a reserved word.
In the storage devices of claim 16, the set of first commands, the ones detected with the robot's local recognition model and without network communication, includes a reserved word. The claim does not define the reserved word further; claim 13 describes the reserved word of the system claims as a wake word for the robot. Because the reserved word is part of the first command set, it is detected locally and takes priority over the second commands and the application commands in the predetermined hierarchy.
20. The one or more computer-readable storage devices of claim 16, wherein the voice commands in the set of first commands include a stop command, a let go command, or a status command.
In the storage devices of claim 16, the set of first commands includes a stop command, a let go command, or a status command. The claim names the commands without defining them. Read plainly, a stop command halts what the robot is doing, a let go command releases something the robot is holding, and a status command asks the robot to report its current state. Because these belong to the first command set, they are detected by the local recognition model without network communication and are prioritized ahead of the second commands and the application commands.
January 19, 2021