Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method, comprising: obtaining, using a processor, audio data collected by a slave device; obtaining, using the processor, contextual data characterizing a voice environment where the audio data is collected by the slave device, the contextual data including data characterizing a space where the slave device is located and including contextual parameters generated using historical audio data collected by the slave device according to a frequency of occurrence of a voice input entry in the historical audio data that has a relevance with a context in a historical collection period; and obtaining, using the processor, a recognition result of recognizing the audio data based on the contextual data, including: determining a function of the space where the slave device is located according to the contextual data, the function indicating an intended use of the space; and recognizing the audio data based on the function of the space to obtain the recognition result, the recognition result belonging to a topic associated with the intended use of the space, wherein obtaining the contextual data further comprises: determining the contextual data at (n+1)th moment according to topic information mapped by audio data collected at (n)th moment, n being a positive integer, wherein a pause time between the (n)th moment of collecting the audio data and the (n+1)th moment is less than a pre-set pause time.
This invention relates to voice recognition that uses context about where audio is collected. Conventional recognizers often misinterpret speech because they ignore the environment in which it was spoken. In the claimed method, a processor obtains audio data collected by a slave device, together with contextual data describing the voice environment. The contextual data covers two things: the space where the slave device is located, and contextual parameters derived from the device's historical audio, based on how often voice input entries relevant to a given context occurred during a past collection period. From the contextual data the processor determines the function of the space, meaning its intended use, and recognizes the audio based on that function, so the recognition result falls under a topic associated with that use. In a kitchen, for example, the result would belong to the food and beverage topic (see claim 18). Context also carries over between utterances: the contextual data at moment n+1 is determined from the topic that the audio collected at moment n mapped to, provided the pause between the two moments is shorter than a preset pause time.
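The topic carry-over rule at the end of claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SPACE_TOPIC` table, the `Utterance` type, the `context_topic` function, and the 5-second threshold are all hypothetical, since the claim says only that the pause time is "pre-set" and does not prescribe any data structure.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from a space's intended use to a recognition topic.
# The two entries mirror the examples given in claim 18.
SPACE_TOPIC = {
    "kitchen": "food and beverage",
    "chemistry conference room": "chemistry",
}

# Assumed value; the claim only says the pause time is "pre-set".
PRESET_PAUSE_SECONDS = 5.0


@dataclass
class Utterance:
    collected_at: float          # time the audio was collected, in seconds
    topic: Optional[str] = None  # topic that the recognized audio mapped to


def context_topic(space: str, previous: Optional[Utterance], now: float) -> Optional[str]:
    """Choose the topic used as context for audio collected at time `now`.

    If the previous utterance (moment n) mapped to a topic and the pause
    before the current audio (moment n+1) is shorter than the preset pause
    time, reuse that topic. Otherwise fall back to the topic associated
    with the intended use of the space.
    """
    if previous is not None and previous.topic is not None:
        if now - previous.collected_at < PRESET_PAUSE_SECONDS:
            return previous.topic
    return SPACE_TOPIC.get(space)
```

With these assumptions, an utterance that follows a chemistry-related utterance within the pause window keeps the chemistry topic even in a kitchen, while a later utterance falls back to the kitchen's own topic.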
2. The method according to claim 1, wherein: obtaining the audio data collected by the slave device includes: receiving, by a master device, the audio data sent from the slave device via a first connection mode; and obtaining the contextual data corresponding to the slave device includes: sending, via a second connection mode, the audio data and the contextual data to a server; and receiving, via the second connection mode, the recognition result returned from the server after the server recognizes the audio data and the contextual data, wherein a maximum communication distance of the first connection mode is less than a maximum communication distance of the second connection mode, the first connection mode being a local network transmission mode, and the second connection mode being a mobile data signal transmission mode.
This claim adds a master device and a server to the method of claim 1 and specifies two connection modes. The slave device sends the audio data it collects to the master device over a first connection mode, which is a local network transmission mode. The master device then sends the audio data and the contextual data to a server over a second connection mode, a mobile data signal transmission mode, and receives the recognition result back over that same mode after the server has performed recognition. The maximum communication distance of the first mode is shorter than that of the second, so the slave device only needs to reach the nearby master device, while the master device handles the longer-range link to the server.
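The data flow on the master device can be sketched as follows. This is a hypothetical outline, not the patent's implementation: `handle_slave_audio`, `fake_server`, and the request layout are invented for illustration, and the transport for the second connection mode is passed in as a plain callable so the example stays self-contained.

```python
from typing import Callable, Dict


def handle_slave_audio(
    audio: bytes,
    contextual_data: Dict[str, str],
    send_to_server: Callable[[dict], str],
) -> str:
    """Master-side step, run once audio has arrived over the first (local) connection.

    Bundles the audio with the contextual data, forwards both over the
    second (longer-range) connection, and returns the server's recognition result.
    """
    request = {"audio": audio, "context": contextual_data}
    return send_to_server(request)


def fake_server(request: dict) -> str:
    """Stand-in for the remote recognizer; a real server would decode the audio."""
    topic = request["context"].get("topic", "general")
    return f"<{len(request['audio'])} bytes recognized with topic '{topic}'>"
```

Calling `handle_slave_audio(b"abcd", {"topic": "chemistry"}, fake_server)` exercises the whole path with the stub in place of a real server.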
3. The method according to claim 1, wherein: obtaining the contextual data corresponding to the slave device includes: receiving, from the slave device among at least two slave devices, attribute data characterizing device attributes of the slave device, and determining the contextual data based on the attribute data.
This claim covers how the contextual data is obtained when several slave devices are present. The slave device that collected the audio is one of at least two slave devices, and it supplies attribute data characterizing its own device attributes. The contextual data is then determined from that attribute data. The claim does not list specific attributes; a device type or an installation location would be plausible examples. Because each slave device reports its own attributes, audio from different devices can be recognized against different contexts.
4. The method according to claim 3, wherein determining the contextual data based on the attribute data includes: determining the contextual data, based on the attribute data and a predetermined correspondence relationship between the attribute data and the contextual data.
This claim narrows claim 3 by specifying how attribute data becomes contextual data: through a predetermined correspondence relationship between the two. In other words, the mapping is fixed in advance, so a given set of device attributes always yields the same contextual data. The claim does not say how the relationship is stored; a lookup table is one straightforward possibility. For example, attribute data identifying a device as installed in a kitchen could map to contextual data indicating a kitchen space.
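One simple way to picture a predetermined correspondence relationship is a lookup table keyed by attribute. The sketch below assumes that form; the `CORRESPONDENCE` table, its entries, and the `contextual_data_for` function are hypothetical and are not taken from the patent.

```python
from typing import Dict, Optional

# Hypothetical correspondence table, fixed in advance:
# (attribute name, attribute value) -> contextual data.
CORRESPONDENCE = {
    ("device_type", "range hood"): {"space": "kitchen"},
    ("device_type", "projector"): {"space": "conference room"},
}


def contextual_data_for(attributes: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the contextual data for the first attribute found in the table.

    Returns None when no reported attribute has an entry, leaving the caller
    to decide how to recognize audio without device-derived context.
    """
    for item in attributes.items():
        if item in CORRESPONDENCE:
            return CORRESPONDENCE[item]
    return None
```

Because the table is fixed, the same attribute data always produces the same contextual data, which is the property the word "predetermined" suggests.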
5. The method according to claim 1, wherein obtaining the recognition result of recognizing the audio data based on the contextual data further includes: for the audio data containing one or more homophone entries corresponding to a plurality of recognition results, selecting a recognition result matched with the contextual data as a final recognition result of the one or more homophone entries.
6. The method according to claim 1, wherein obtaining the recognition result of recognizing the audio data based on the contextual data further includes: for correcting the recognition result of the audio data, selecting a correction result matched with the contextual data as a final recognition result of the audio data.
This claim applies the contextual data to correcting recognition results. Speech recognition can produce wrong output because of background noise, accents, or similar-sounding words. When the recognition result of the audio data is being corrected, the method selects the correction result that matches the contextual data and uses it as the final recognition result. Claim 5 applies the same idea to homophones: when one or more homophone entries in the audio correspond to several possible recognition results, the one matching the contextual data is chosen. For example, in a kitchen, where the topic is food and beverage, "flour" would be preferred over "flower".
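Selecting among homophone candidates can be illustrated with a small Python sketch. It combines claim 5 with the frequency-of-occurrence parameters from claim 1, but the scoring rule (pick the candidate heard most often under the current topic) is an assumption made for illustration, as are the names `build_topic_counts` and `select_candidate` and the sample history.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def build_topic_counts(history: Sequence[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each recognized entry occurred under each topic.

    `history` is a sequence of (topic, entry) pairs taken from past recognition
    results; this layout is a hypothetical stand-in for "historical audio data".
    """
    counts: Dict[str, Counter] = {}
    for topic, entry in history:
        counts.setdefault(topic, Counter())[entry] += 1
    return counts


def select_candidate(candidates: Sequence[str], topic: str, counts: Dict[str, Counter]) -> str:
    """Pick the candidate that occurred most often under `topic`.

    On a tie, including the case where no candidate was ever seen under the
    topic, the earliest candidate wins, so the recognizer's own ranking is kept.
    """
    topic_counts = counts.get(topic, Counter())
    return max(candidates, key=lambda candidate: topic_counts[candidate])
```

The same selection step could serve claim 6 if the candidates are alternative corrections rather than homophones; the patent text does not say whether the two share an implementation.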
7. A device, comprising: a first device, configured to obtain audio data collected by a slave device; a second device, configured to obtain contextual data characterizing a voice environment where the audio data is collected by the slave device, the contextual data including data characterizing a space where the slave device is located and including contextual parameters generated using historical audio data collected by the slave device according to a frequency of occurrence of a voice input entry in the historical audio data that has a relevance with a context in a historical collection period; and a third device, configured to obtain a recognition result of recognizing the audio data based on the contextual data, by: determining a function of the space where the slave device is located according to the contextual data, the function indicating an intended use of the space; and recognizing the audio data based on the function of the space to obtain the recognition result, the recognition result belonging to a topic associated with the intended use of the space, wherein the second device is further configured to: determine the contextual data at (n+1)th moment according to topic information mapped by audio data collected at (n)th moment, n being a positive integer, wherein a pause time between the (n)th moment of collecting the audio data and the (n+1)th moment is less than a pre-set pause time.
8. The device according to claim 7, wherein: the first device receives the audio data from the slave device among at least two slave devices; and the second device receives, from the slave device, attribute data characterizing device attributes of the slave device, and determines the contextual data based on the attribute data.
9. The device according to claim 8, wherein: the second device determines the contextual data, based on the attribute data and a predetermined correspondence relationship between the attribute data and the contextual data.
This claim narrows the device of claim 8. The second device, which receives attribute data from the slave device, determines the contextual data from that attribute data using a predetermined correspondence relationship between attribute data and contextual data. This is the device counterpart of method claim 4: the mapping from device attributes to context is defined in advance rather than inferred at recognition time. The claim does not specify how the relationship is stored.
10. The device according to claim 8, wherein: for the audio data containing one or more homophone entries corresponding to a plurality of recognition results, the third device selects a recognition result matched with the contextual data as a final recognition result of the one or more homophone entries.
11. A device, comprising: a communication interface; and a processor, operatively coupled to the communication interface, wherein: the processor, under a predetermined execution instruction, uses the communication interface to: obtain audio data collected by a slave device; obtain contextual data characterizing a voice environment where the audio data is collected by the slave device, the contextual data including data characterizing a space where the slave device is located and including contextual parameters generated using historical audio data collected by the slave device according to a frequency of occurrence of a voice input entry in the historical audio data that has a relevance with a context in a historical collection period; and obtain a recognition result of recognizing the audio data based on the contextual data, by: determining a function of the space where the slave device is located according to the contextual data, the function indicating an intended use of the space; and recognizing the audio data based on the function of the space to obtain the recognition result, the recognition result belonging to a topic associated with the intended use of the space; wherein the processor is configured to: determine the contextual data at (n+1)th moment according to topic information mapped by audio data collected at (n)th moment, n being a positive integer, wherein a pause time between the (n)th moment of collecting the audio data and the (n+1)th moment is less than a pre-set pause time.
12. The device according to claim 11, wherein: the communication interface includes a first communication interface and a second communication interface different from the first communication interface, wherein: the first communication interface receives the audio data sent from the slave device via a first connection method; and the second communication interface sends, via a second connection mode, the audio data and the contextual data to a server, and receives, via the second connection mode, the recognition result returned from the server after the server recognizes the audio data and the contextual data, wherein a maximum communication distance of the first connection mode is less than a maximum communication distance of the second connection mode.
This claim is the device counterpart of method claim 2. The communication interface of the claim 11 device comprises two different interfaces. The first receives the audio data sent from the slave device via a first connection. The second sends the audio data and the contextual data to a server via a second connection mode and receives the recognition result back over that mode once the server has performed recognition. The maximum communication distance of the first connection is shorter than that of the second, which suits deployments where the slave device is close to the device but the server is far away. Unlike claim 2, this claim does not require the first connection to be a local network or the second to be a mobile data connection.
13. The device according to claim 11, wherein: the communication interface receives the audio data from the slave device among at least two slave devices; and receives, from the slave device, attribute data characterizing device attributes of the slave device; and the processor determines the contextual data based on the attribute data.
This claim is the device counterpart of method claim 3. The communication interface receives the audio data from a slave device that is one of at least two slave devices, and also receives attribute data characterizing that slave device's attributes. The processor determines the contextual data from the attribute data. For example, if the attributes indicate a microphone installed in a particular room, the contextual data can reflect that room, and recognition under claim 11 then proceeds on the basis of the room's intended use.
14. The device according to claim 13, wherein the processor further: determines the contextual data, based on the attribute data and a predetermined correspondence relationship between the attribute data and the contextual data.
This claim is the device counterpart of method claim 4. The processor determines the contextual data from the attribute data together with a predetermined correspondence relationship between attribute data and contextual data. The relationship is fixed in advance; the claim does not describe updating it or say where it is stored.
15. The device according to claim 11, wherein the processor further: when the audio data contains one or more homophone entries corresponding to a plurality of recognition results, selects a recognition result matched with the contextual data as a final recognition result of the one or more homophone entries.
16. The device according to claim 11, wherein the processor further: when correcting the recognition result of the audio data, selects a correction result matched with the contextual data as a final recognition result of the audio data.
17. The method according to claim 1, wherein: the space is selected from at least one of a kitchen or conference rooms of different departments.
18. The method according to claim 1, wherein: the space is a kitchen and the topic associated with the intended use of the space is food and beverage; or the space is a conference room of a chemical department, and the topic associated with the intended use of the space is chemistry.
March 23, 2021