US-20250384883-A1

Electronic Devices with Voice Command and Contextual Data Processing Capabilities

Published: December 18, 2025

Assignee: not available

Inventors: not available

Technical Abstract

An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. (canceled)

. An electronic device, comprising:

. The electronic device of, wherein the media file includes a photo.

. The electronic device of, wherein the media file includes a video.

. The electronic device of, wherein:

. The electronic device of, wherein causing processing of the voice command and the media file includes processing the voice command at the electronic device.

. The electronic device of, the one or more programs further including instructions for:

. The electronic device of, wherein the first predefined set of words associated with the electronic device includes fewer words than a second predefined set of words associated with a second electronic device.

. The electronic device of, the one or more programs further including instructions for:

. The electronic device of, wherein causing performance of the action includes performing one or more actions at the electronic device.

. The electronic device of, wherein causing the action to be performed includes:

. The electronic device of, wherein causing performance of the action includes performing one or more actions at a second electronic device.

. The electronic device of, wherein performing the one or more actions at the second electronic device includes:

. The electronic device of, the one or more programs further including instructions for:

. A method, comprising:

. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs including instructions for:

Detailed Description

Complete technical specification and implementation details from the patent document.

This invention relates generally to electronic devices, and more particularly, to electronic devices such as portable electronic devices that can capture voice commands and contextual information.

Electronic devices such as portable electronic devices are becoming increasingly popular. Examples of portable devices include handheld computers, cellular telephones, media players, and hybrid devices that include the functionality of multiple devices of this type. Popular portable electronic devices that are somewhat larger than traditional handheld electronic devices include laptop computers and tablet computers.

Portable electronic devices such as handheld electronic devices may have limited speech recognition capabilities. For example, a cellular telephone may have a microphone that can be used to receive and process cellular telephone voice commands that control the operation of the cellular telephone.

Portable electronic devices generally have limited processing power and are not always actively connected to remote databases and services of interest. Conventional devices are often not contextually aware. These shortcomings can make it difficult to use conventional portable electronic devices for sophisticated voice-based control functions.

It would therefore be desirable to be able to provide improved systems for electronic devices such as portable electronic devices that handle voice-based commands.

A portable electronic device such as a handheld electronic device is provided. The electronic device may have a microphone that is used to receive voice commands. The electronic device may use the microphone to record a user's voice. The recording of the user's voice may be stored as a digital audio file in storage associated with the electronic device.

When the electronic device receives a voice command, the electronic device may store information about the current state of the electronic device and its operating environment as contextual information (metadata). With one suitable arrangement, stored contextual information may include information about the operational state of the electronic device such as which applications are running on the device and their status. The electronic device may determine which portions of the information on the state of the device are relevant to the voice command and may store only the relevant portions. If desired, the electronic device may determine which contextual information is most relevant by performing a speech recognition operation on the recorded voice command to look for specific keywords.
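The pairing of a recorded command with a snapshot of device state can be sketched as a small data structure. This is a minimal illustration only: the names `VoiceCommandRecord` and `snapshot_context`, and the keys `running_apps` and `app_status`, are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class VoiceCommandRecord:
    """A captured voice command plus the device state at capture time."""
    audio_clip: bytes                                  # recording of the user's voice
    context: Dict[str, Any] = field(default_factory=dict)
    recognized_code: Optional[str] = None              # optional on-device recognition result


def snapshot_context(device_state: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the parts of the (hypothetical) device state that describe running apps."""
    return {
        "running_apps": list(device_state.get("running_apps", [])),
        "app_status": dict(device_state.get("app_status", {})),
    }


# Example: the state dictionary below is invented for illustration.
state = {
    "running_apps": ["media_player"],
    "app_status": {"media_player": "playing"},
    "battery": 0.8,  # not copied: unrelated to application state
}
record = VoiceCommandRecord(audio_clip=b"\x00\x01", context=snapshot_context(state))
```

Copying the state at capture time matters because the command may be processed much later, after the device has moved on to another application.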

The electronic device may process voice commands locally, or voice command processing may be performed remotely. For example, the electronic device may transmit one or more recorded voice commands and associated contextual information to computing equipment such as a desktop computer. Captured voice commands and contextual information may also be uploaded to server computing equipment over a network. The electronic device may transmit recorded voice commands and the associated contextual information at any suitable time such as when instructed by a user, as each voice command is received, immediately after each voice command is received, whenever the electronic device is synched with appropriate computing equipment, or other suitable times.

After a recorded voice command and associated contextual information have been transferred to a desktop computer, remote server, or other computing equipment, the computing equipment may process the voice command using a speech recognition operation. The computing equipment may use the results of the speech recognition operation and any relevant contextual information together to respond to the voice command properly. For example, the computing equipment may respond to the voice command by displaying search results or performing other suitable actions. If desired, the computing equipment may convey information back to the electronic device in response to the voice command.

In a typical scenario, a user may make a voice command while directing the electronic device to record the voice command. The user may make the voice command while the electronic device is performing a particular operation with an application. For example, the user may be using the electronic device to play songs with a media application. While listening to a song, the user may press a record button on the electronic device to record the voice command “find more like this.” The voice command may be processed by the electronic device (e.g., to create a code representative of the spoken command) or may be stored in the form of an audio clip by the electronic device. At an appropriate time, such as when the electronic device is connected to a host computer or a remote server through a communications path, the code or the audio clip corresponding to the spoken command may be uploaded for further processing. Contextual information such as information on the song that was playing in the media application when the voice command was made may be uploaded with the voice command.

A media playback application on a computer such as the iTunes program of Apple Inc. may take an appropriate action in response to an uploaded voice command and associated contextual data. As an example, the media playback application may present a user with recommended songs for purchase. The songs that are recommended may be songs that are similar to the song that was playing on the electronic device when the user captured the audio clip voice command “find more like this.”

The computer to which the voice command audio clip is uploaded may have greater processing power available than that available on a handheld electronic device, so voice processing accuracy may be improved by offloading voice recognition operations to the computer from the handheld electronic device in this way. The computer to which the audio clip is uploaded may also have access to more extensive data than would be available on a handheld electronic device, such as the contents of a user's full home media library. The computer that receives the uploaded command may also have access to online resources such as an online server database. This database may have been difficult or impossible for the user to access from the handheld device when the voice command was captured.

If desired, the contextual information that is captured by the electronic device in association with a captured voice command may include audio information. For example, a user may record a spoken phrase. Part of the spoken phrase may represent a voice command and part of the spoken phrase may include associated contextual information. As an example, a user may be using a mapping application on a handheld electronic device. The device may be presenting the user with a map that indicates the user's current position. The user may press a button or may otherwise instruct the handheld electronic device to record the phrase “I like American restaurants in this neighborhood.” In response, the electronic device may record the spoken phrase. The recorded phrase (in this example) includes a command portion (“I like”) that instructs the mapping application to create a bookmark or other indicator of the user's preference. The recorded phrase also includes the modifier “American restaurants” to provide partial context for the voice command. Additional contextual information (i.e., the phrase “in this neighborhood”) and accompanying position data (e.g., geographic coordinates from global positioning system circuitry in the device) may also be supplied in conjunction with the recorded voice command. When uploaded, the audio clip voice command and the associated audio clip contextual information can be processed by speech recognition software and appropriate actions taken.
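The split of a spoken phrase into a command portion, a modifier, and a contextual phrase can be illustrated on an already-transcribed string. The sketch below assumes speech recognition has produced text and handles only the single phrase shape used in the example above; the function name `parse_preference_phrase`, the regular expression, and the coordinate values are hypothetical.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Matches only the example phrase shape: "I like <modifier> in this neighborhood".
_PHRASE = re.compile(
    r"^(?P<command>i like)\s+(?P<modifier>.+?)\s+(?P<where>in this neighborhood)$",
    re.IGNORECASE,
)


def parse_preference_phrase(
    transcript: str,
    position: Optional[Tuple[float, float]] = None,
) -> Optional[Dict[str, Any]]:
    """Split a transcript into command, modifier, and spoken context."""
    match = _PHRASE.match(transcript.strip().rstrip("."))
    if match is None:
        return None
    return {
        "command": match["command"].lower(),   # e.g. create a bookmark
        "modifier": match["modifier"],         # what the preference is about
        "context_phrase": match["where"],      # spoken contextual information
        "position": position,                  # coordinates captured by the device
    }


parsed = parse_preference_phrase(
    "I like American restaurants in this neighborhood.",
    position=(37.0, -122.0),  # illustrative coordinates only
)
```

The spoken phrase “in this neighborhood” is only meaningful together with the position data, which is why the two are kept in the same result.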

Further features of the invention, its nature and various advantages will be more apparent from the accompanying drawings and the following detailed description of the preferred embodiments.

The present invention relates to using voice commands to control electronic systems.

Voice commands may be captured with an electronic device and uploaded to computing equipment for further processing. Electronic devices that may be used in this type of environment may be portable electronic devices such as laptop computers or small portable computers of the type that are sometimes referred to as ultraportables. Portable electronic devices may also be somewhat smaller devices. Examples of smaller portable electronic devices include wrist-watch devices, pendant devices, headphone and earpiece devices, and other wearable and miniature devices. With one suitable arrangement, the portable electronic devices may be wireless electronic devices.

The wireless electronic devices may be, for example, handheld wireless devices such as cellular telephones, media players with wireless communications capabilities, handheld computers (also sometimes called personal digital assistants), global positioning system (GPS) devices, and handheld gaming devices. The wireless electronic devices may also be hybrid devices that combine the functionality of multiple conventional devices. Examples of hybrid portable electronic devices include a cellular telephone that includes media player functionality, a gaming device that includes a wireless communications capability, a cellular telephone that includes game and email functions, and a portable device that receives email, supports mobile telephone calls, has music player functionality and supports web browsing. These are merely illustrative examples.

An illustrative environment in which a user may interact with system components using voice commands is shown in the accompanying drawings. A user in the system may have an electronic device such as a user device. The user device may be used to receive voice commands (e.g., to record a user's voice). If the device has sufficient processing power, the voice commands may be partly or fully processed by the user device (e.g., using a speech recognition engine). If desired, the voice commands may be transmitted by the user device to computing equipment over a communications path. Voice commands may also be conveyed to remote services over a network (e.g., directly over a path between the device and the network, or indirectly by way of the computing equipment).

When the user device transmits voice commands to the computing equipment, the user device may include contextual information along with the voice commands. The user device, the computing equipment, and the services may be connected through a network such as a communications network. The network may be, for example, a local area network, a wide area network such as the Internet, a wired network, a wireless network, or a network formed from multiple networks of these types. The user device may connect to the communications network through a wired or wireless communications path or may connect to the network via the computing equipment. In one embodiment of the invention, the user device may transmit voice commands and contextual information to the computing equipment through the communications network. The user device may also transmit voice commands and contextual information to the computing equipment directly via a communications path. This path may be, for example, a universal serial bus (USB) path or any other suitable wired or wireless path.

The user device may have any suitable form factor. For example, the user device may be provided in the form of a handheld device, a desktop device, or even integrated as part of a larger structure such as a table or wall. With one particularly suitable arrangement, which is sometimes described herein as an example, the user device may be provided with a handheld form factor. For example, the device may be a handheld electronic device. Illustrative handheld electronic devices that may be provided with voice command recording capabilities include cellular telephones, media players, media players with wireless communications capabilities, handheld computers (also sometimes called personal digital assistants), global positioning system (GPS) devices, handheld gaming devices, and other handheld devices. If desired, the user device may be a hybrid device that combines the functionality of multiple conventional devices. Examples of hybrid handheld devices include a cellular telephone that includes media player functionality, a gaming device that includes a wireless communications capability, a cellular telephone that includes game and email functions, and a handheld device that receives email, supports mobile telephone calls, supports web browsing, and includes media player functionality. These are merely illustrative examples.

The computing equipment may include any suitable computing equipment such as a personal desktop computer, a laptop computer, a server, etc. With one suitable arrangement, the computing equipment is a computer that establishes a wired or wireless connection with the user device. The computing equipment may be a server (e.g., an internet server), a local area network computer with or without internet access, a user's own personal computer, a peer device (e.g., another user device), any other suitable computing equipment, or a combination of multiple pieces of computing equipment. The computing equipment may be used to implement applications such as media playback applications (e.g., iTunes® from Apple Inc.), a web browser, a mapping application, an email application, a calendar application, etc.

Computing equipment (e.g., one or more servers) may be associated with one or more online services.

The communications path and the other paths in the system, such as the path between the device and the equipment, the path between the device and the network, and the paths between the network and the services, may be based on any suitable wired or wireless communications technology. For example, the communications paths in the system may be based on wired communications technology such as coaxial cable, copper wiring, fiber optic cable, universal serial bus (USB®), IEEE 1394 (FireWire®), paths using serial protocols, paths using parallel protocols, and Ethernet paths. Communications paths in the system may, if desired, be based on wireless communications technology such as satellite technology, radio-frequency (RF) technology, wireless universal serial bus technology, and Wi-Fi® (IEEE 802.11) or Bluetooth® wireless link technologies. Wireless communications paths in the system may also include cellular telephone bands such as those at 850 MHz, 900 MHz, 1800 MHz, and 1900 MHz (e.g., the main Global System for Mobile Communications or GSM cellular telephone bands), one or more proprietary radio-frequency links, and other local and remote wireless links. Communications paths in the system may also be based on wireless signals sent using light (e.g., using infrared communications) or sound (e.g., using acoustic communications).

The communications path may be used for one-way or two-way transmissions between the user device and the computing equipment. For example, the user device may transmit voice commands and contextual information to the computing equipment. After receiving voice commands and contextual information from the user device, the computing equipment may process the voice commands and contextual information using a speech recognition engine. The engine may be provided as a standalone software component or may be integrated into a media playback application or other application. If desired, the computing equipment may transmit data signals to the user device. The equipment may, for example, transmit information to the device in response to voice commands transmitted by the device to the equipment. For example, when a voice command transmitted by the device includes a request to search for information, the equipment may transmit search results back to the device.

The communications network may be based on any suitable communications network or networks such as a radio-frequency network, the Internet, an Ethernet network, a wireless network, a Wi-Fi® network, a Bluetooth® network, a cellular telephone network, or a combination of such networks.

The services may include any suitable online services. The services may include a speech recognition service (e.g., a speech recognition dictionary), a search service (e.g., a service that searches a particular database or that performs Internet searches), an email service, a media service, a software update service, an online business service, etc. The services may communicate with the computing equipment and the user device through the communications network.

In typical use, the user device may be used to capture voice commands from a user during the operation of the user device. For example, the user device may receive one or more voice commands during a media playback operation (e.g., during playback of a music file or a video file). The user device may then store information about its current operational state as contextual information. The user device may record information related to the current media playback operation. Other contextual information may be stored when other applications are running on the device. For example, the user device may store information related to a web-browsing application, the location of the user device, or other appropriate information on the operating environment for the device. Following the reception of a voice command, the user device may, if desired, perform a speech recognition operation on the voice command. The user device may utilize contextual information about the state of the user device at the time the voice command was received during the associated speech recognition operation.

In addition to or in lieu of performing a local speech recognition operation on the voice command using its own engine, the user device may forward the captured voice command audio clip and, if desired, contextual information to the computing equipment for processing. The computing equipment may use its speech recognition engine to implement speech recognition capabilities that allow the computing equipment to respond to voice commands that the user device might otherwise have difficulties in processing. For example, if the user device were to receive a voice command to “find Italian restaurants near me,” the user device might not be able to execute the voice command immediately for reasons such as an inability to perform adequate speech processing due to a lack of available processing power, an inability to perform a search requested by a voice command due to a lack of network connectivity, etc. In this type of situation, the device may save the voice command (e.g., as a recorded audio file of a user's voice) and relevant contextual information (e.g., the current location of the user device) for transmission to the computing equipment for further processing of the voice command. The device may transmit voice commands and contextual information to the computing equipment at any suitable time (e.g., when the device is synched with the computing equipment, as the voice commands are received by the device, whenever the device is connected to a communications network, etc.). These transmissions may take place simultaneously or as two separate but related transmissions.
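The save-now, transmit-later behavior described above can be sketched as a small queue that holds captured commands until a connection is available. This is an illustrative sketch, not the patent's implementation: the class name `PendingCommandQueue` and the `send` callback (which stands in for whatever transport the device uses and returns `True` on success) are assumptions.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict


class PendingCommandQueue:
    """Holds captured voice commands and context until they can be uploaded."""

    def __init__(self, send: Callable[[Dict[str, Any]], bool]) -> None:
        self._send = send                      # hypothetical transport callback
        self._pending: Deque[Dict[str, Any]] = deque()

    def capture(self, audio_clip: bytes, context: Dict[str, Any]) -> None:
        """Store a command together with its contextual information."""
        self._pending.append({"audio": audio_clip, "context": context})

    def flush(self) -> int:
        """Upload queued items in order; stop at the first failed send."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)


# Usage with a fake transport that only succeeds while "connected".
uploaded = []
link = {"connected": False}


def fake_send(item: Dict[str, Any]) -> bool:
    if not link["connected"]:
        return False
    uploaded.append(item)
    return True


queue = PendingCommandQueue(fake_send)
queue.capture(b"clip-1", {"location": (37.0, -122.0)})
sent_offline = queue.flush()      # no connection yet, nothing is sent
link["connected"] = True          # e.g. the device is synched or joins a network
sent_online = queue.flush()
```

Calling `flush` on sync, on network connection, or right after each capture corresponds to the different transmission times the text lists.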

With one suitable arrangement, the device may save all available contextual information. With another arrangement, the device may perform either a cursory or a full speech recognition operation on voice commands to determine what contextual information is relevant and then store only the relevant contextual information. As an example, the user device may search for the words “music” and “location” in a voice command to determine whether the contextual information stored in association with the voice command should include information related to a current media playback operation or should include the current location of the user device (e.g., which may be manually entered by a user or may be determined using a location sensor).
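The keyword check in this example can be sketched as a lookup from spotted words to context fields. The sketch assumes a cursory recognition pass has already produced a text transcript; the table `KEYWORD_TO_CONTEXT_KEYS`, the function name, and the field names `now_playing` and `location` are hypothetical.

```python
import re
from typing import Any, Dict

# Hypothetical mapping from spotted keywords to the context fields they make relevant.
KEYWORD_TO_CONTEXT_KEYS = {
    "music": ("now_playing",),
    "location": ("location",),
}


def select_relevant_context(transcript: str, full_context: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the context fields whose trigger keyword appears in the transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    relevant: Dict[str, Any] = {}
    for keyword, keys in KEYWORD_TO_CONTEXT_KEYS.items():
        if keyword in words:
            for key in keys:
                if key in full_context:
                    relevant[key] = full_context[key]
    return relevant


full_context = {
    "now_playing": {"title": "Track One", "artist": "Artist A"},  # invented values
    "location": (37.0, -122.0),
}
kept = select_relevant_context("find more music like this", full_context)
```

Storing only the matched fields trades completeness for smaller uploads; saving everything, as in the first arrangement, avoids missing context when the keyword pass misrecognizes a word.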

An illustrative user device in accordance with an embodiment of the present invention is shown in the accompanying drawings. The user device may be any suitable electronic device such as a portable or handheld electronic device.

The user device may handle communications over one or more wireless communications bands such as local area network bands and cellular telephone network bands.

The device may have a housing. A display may be attached to the housing using a bezel. The display may be a touch screen liquid crystal display (as an example).

The device may have a microphone for receiving voice commands. Openings in the device may, if desired, form microphone and speaker ports. With one suitable arrangement, the device may have speech recognition capabilities (e.g., a speech recognition engine that can be used to receive and process voice commands from a user). The device may also have audio capture and playback capabilities. The device may be able to receive voice commands from a user and other audio through a microphone (e.g., formed as part of one or more ports such as these openings). One of the ports may be, for example, a speaker port. If desired, the device may activate its audio recording and/or speech recognition capabilities (e.g., the device may begin recording audio signals associated with a user's voice with a microphone) in response to user input. For example, the device may present an on-screen selectable option to the user to activate speech recognition functionality. The device may also have a user input device such as a button that is used to receive user input to activate speech recognition functionality.

The user device may have other input-output devices. For example, the user device may have other buttons. Input-output components such as a port and one or more input-output jacks (e.g., for audio and/or video) may be used to connect the device to the computing equipment and external accessories. A button may be, for example, a menu button. The port may contain a 30-pin data connector (as an example). Suitable user input interface devices for the user device may also include buttons such as alphanumeric keys, power on-off, power-on, power-off, voice memo, and other specialized buttons, a touch pad, pointing stick, or other cursor control device, or any other suitable interface for controlling the user device. In the illustrated example, the display screen is shown as being mounted on the front face of the user device, but the display screen may, if desired, be mounted on the rear face of the user device, on a side of the user device, on a flip-up portion of the user device that is attached to a main body portion of the user device by a hinge (for example), or using any other suitable mounting arrangement. The display may also be omitted.

Although shown schematically as being formed on the top face of the user device in the illustrated example, buttons and other user input interface devices may generally be formed on any suitable portion of the user device. For example, a button or other user interface control may be formed on the side of the user device. Buttons and other user interface controls can also be located on the top face, rear face, or other portion of the user device. If desired, the user device can be controlled remotely (e.g., using an infrared remote control, a radio-frequency remote control such as a Bluetooth® remote control, etc.). With one suitable arrangement, the device may receive voice commands and other audio through a wired or wireless headset or other accessory. The device may also activate its speech recognition functionality in response to user input received through a wired or wireless headset (e.g., in response to a button press received on the headset).

The device may use the port to perform a synchronization operation with the computing equipment. With one suitable arrangement, the device may transmit voice commands and contextual information to the computing equipment. For example, during a media playback operation, the device may receive a voice command to “find more music like this.” If desired, the device may upload the voice command and relevant contextual information (e.g., the title and artist of the media file that was playing when the voice command was received) to the computing equipment. The computing equipment may receive and process the voice command and relevant contextual information and may perform a search for music that is similar to the media file that was playing when the voice command was received. The computing equipment may then respond by displaying search results, purchase recommendations, etc.
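How the computing equipment might combine a recognized command with the uploaded context can be sketched with a toy catalog. Everything here is an assumption made for illustration: the catalog entries are invented, "similar" is reduced to "same genre as the track that was playing", and the function names are hypothetical. The patent does not specify a similarity method.

```python
from typing import Any, Dict, List, Optional

# Invented catalog for illustration only.
CATALOG: List[Dict[str, str]] = [
    {"title": "Track One", "artist": "Artist A", "genre": "jazz"},
    {"title": "Track Two", "artist": "Artist B", "genre": "jazz"},
    {"title": "Track Three", "artist": "Artist C", "genre": "rock"},
]


def find_track(title: str, artist: str, catalog: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Look up the track named in the uploaded context."""
    for track in catalog:
        if track["title"] == title and track["artist"] == artist:
            return track
    return None


def respond_to_command(
    transcript: str, context: Dict[str, Any], catalog: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return recommendations for a 'find more ... like this' command."""
    text = transcript.lower()
    if "find more" not in text or "like this" not in text:
        return []  # other commands are out of scope for this sketch
    playing = find_track(context.get("title", ""), context.get("artist", ""), catalog)
    if playing is None:
        return []
    # Assumed similarity rule: same genre, excluding the track that was playing.
    return [t for t in catalog if t["genre"] == playing["genre"] and t is not playing]


recommendations = respond_to_command(
    "find more music like this",
    {"title": "Track One", "artist": "Artist A"},
    CATALOG,
)
```

Without the uploaded title and artist, the phrase “like this” has no referent, which is the point of sending context alongside the command.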

The device may receive data signals from the computing equipment in response to uploading voice commands and contextual information. The data received by the device from the equipment in response to voice commands and contextual information may be used by the device to carry out requests associated with the voice commands. For example, after processing the voice command and contextual information, the computing equipment may transmit results associated with the voice command to the user device, which may then display the results.

A schematic diagram of an embodiment of an illustrative user device is shown in the accompanying drawings. The user device may be a mobile telephone, a mobile telephone with media player capabilities, a media player, a handheld computer, a game player, a global positioning system (GPS) device, a combination of such devices, or any other suitable electronic device such as a portable device.

The user device may include storage. The storage may include one or more different types of storage such as hard disk drive storage, nonvolatile memory (e.g., flash memory or other electrically-programmable-read-only memory), volatile memory (e.g., battery-based static or dynamic random-access-memory), etc. The storage may be used to store voice commands and contextual information about the state of the device when voice commands are received. Processing circuitry may be used to control the operation of the user device. The processing circuitry may be based on a processor such as a microprocessor and other suitable integrated circuits. With one suitable arrangement, the processing circuitry and storage are used to run software on the user device, such as speech recognition applications, internet browsing applications, voice-over-internet-protocol (VOIP) telephone call applications, email applications, media playback applications, operating system functions (e.g., operating system functions supporting speech recognition capabilities), etc. The processing circuitry and storage may be used in implementing analog-to-digital conversion functions for capturing audio and may be used to implement speech recognition functions.

Input-output devices may be used to allow data to be supplied to the user device and to allow data to be provided from the user device to external devices. The display screen, button, microphone port, speaker port, and dock connector port are examples of input-output devices.

Input-output devices can include user input devices such as buttons, touch screens, joysticks, click wheels, scrolling wheels, touch pads, key pads, keyboards, microphones, cameras, etc. A user can control the operation of the user device by supplying commands through the user input devices. Display and audio devices may include liquid-crystal display (LCD) screens or other screens, light-emitting diodes (LEDs), and other components that present visual information and status data. Display and audio devices may also include audio equipment such as speakers and other devices for creating sound. Display and audio devices may contain audio-video interface equipment such as jacks and other connectors for external headphones, microphones, and monitors.

Wireless communications devices may include communications circuitry such as radio-frequency (RF) transceiver circuitry formed from one or more integrated circuits, power amplifier circuitry, passive RF components, one or more antennas, and other circuitry for handling RF wireless signals. Wireless signals can also be sent using light (e.g., using infrared communications circuitry).

The user device can communicate with external devices such as accessories and computing equipment over communications paths. The paths may include wired and wireless paths (e.g., bidirectional wireless paths). Accessories may include headphones (e.g., a wireless cellular headset or audio headphones) and audio-video equipment (e.g., wireless speakers, a game controller, or other equipment that receives and plays audio and video content).

The computing equipment may be any suitable computer, such as the computing equipment that connects to the user device or the computing equipment associated with online services, both described above. With one suitable arrangement, the computing equipment is a computer that has an associated wireless access point (router) or an internal or external wireless card that establishes a wireless connection with the user device. The computer may be a server (e.g., an internet server), a local area network computer with or without internet access, a user's own personal computer, a peer device (e.g., another user device), or any other suitable computing equipment. The computing equipment may be associated with one or more online services. A communications link may be used to connect the device to such computing equipment.

Wireless communications devices may be used to support local and remote wireless links. Examples of local wireless links include infrared communications, Wi-Fi® (IEEE 802.11), Bluetooth®, and wireless universal serial bus (USB) links.

If desired, wireless communications devices may include circuitry for communicating over remote communications links. Typical remote link communications frequency bands include the cellular telephone bands at 850 MHz, 900 MHz, 1800 MHz, and 1900 MHz, the global positioning system (GPS) band at 1575 MHz, and data service bands such as the 3G data communications band at 2170 MHz (commonly referred to as UMTS or Universal Mobile Telecommunications System). In these illustrative remote communications links, data is transmitted over links that are one or more miles long, whereas in short-range links, a wireless signal is typically used to convey data over tens or hundreds of feet.

A schematic diagram of an embodiment of illustrative computing equipment is shown in the accompanying drawings. The computing equipment may include any suitable computing equipment such as a personal desktop computer, a laptop computer, a server, etc. and may be used to implement the computing equipment that connects to the user device and/or the computing equipment associated with online services, both described above. The computing equipment may be a server (e.g., an internet server), a local area network computer with or without internet access, a user's own personal computer, a peer device (e.g., another user device), other suitable computing equipment, or combinations of multiple pieces of such computing equipment. The computing equipment may be associated with one or more services such as the online services described above.

The computing equipment may include storage such as hard disk drive storage, nonvolatile memory, volatile memory, etc. Processing circuitry may be used to control the operation of the computing equipment. The processing circuitry may be based on one or more processors such as microprocessors, microcontrollers, digital signal processors, application specific integrated circuits, and other suitable integrated circuits. The processing circuitry and storage may be used to run software on the computing equipment such as speech recognition applications, operating system functions, audio capture applications, other applications with voice recognition and/or audio capture functionality, and other software applications.

Input-output circuitry may be used to gather user input and other input data and to allow data to be provided from the computing equipment to external devices. The input-output circuitry can include devices such as mice, keyboards, touch screens, microphones, speakers, displays, televisions, wired communications circuitry, and wireless communications circuitry.

Illustrative steps involved in using an electronic device such as the user device to gather voice commands and contextual information are shown in the accompanying drawings.
