US-20260164177-A1

Methods, Systems, and Media for Providing Spatial Audio by Synchronizing Output Audio from One or More Speaker Devices to Input Audio from a Directional Audio Source

Published: June 11, 2026
Assignee: not available in USPTO data we have
Inventors: Dongeek Shin
Technical Abstract

Methods, systems, and media for providing spatial audio by synchronizing output audio from one or more speaker devices to input audio from a directional audio source are provided. In some implementations, a method for providing spatial audio is provided that includes: receiving, from a microphone of a plurality of microphones associated with an electronic speaker device that is not paired with a media device, a plurality of microphone signals that are responsive to input audio being provided by the media device; performing directional processing on the plurality of microphone signals to produce an audio signal that corresponds with the input audio being provided by the media device; amplifying the audio signal to generate an amplified signal; and, while receiving the plurality of microphone signals, generating output audio using at least one speaker associated with the electronic speaker device in response to the amplified signal.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1

A method, the method comprising: receiving, from a microphone of a plurality of microphones associated with an electronic speaker device that is not paired with a media device, a plurality of microphone signals that are responsive to input audio being provided by the media device; performing directional processing on the plurality of microphone signals to produce an audio signal that corresponds with the input audio being provided by the media device; amplifying the audio signal to generate an amplified signal; and while receiving the plurality of microphone signals, generating output audio using at least one speaker associated with the electronic speaker device in response to the amplified signal.

2

The method of claim 1, wherein generating the output audio using the at least one speaker associated with the electronic speaker device in response to the amplified signal further comprises selecting a speaker from a plurality of speakers that are included in the electronic speaker device.

3

The method of claim 1, wherein generating the output audio using the at least one speaker associated with the electronic speaker device in response to the amplified signal further comprises: determining a beamformed signal from the amplified signal that corresponds with a listening zone in an environment that includes the media device and the electronic speaker device; and causing a plurality of speakers within the electronic speaker device to produce the output audio corresponding to the beamformed signal.

4

The method of claim 1, wherein performing directional processing further comprises performing a first weighted delay and sum calculation on the plurality of microphone signals to separate the input audio being provided by the media device from background audio signals.

5

The method of claim 4, wherein the first weighted delay and sum calculation further comprises complex valued weights comprising an amplitude and a phase.

6

The method of claim 4, wherein the first weighted delay and sum calculation further comprises weights determined in a second weighted delay and sum calculation.

7

The method of claim 1, wherein the input audio comprises a set of audio tones produced from one or more locations, and wherein the method further comprises: in response to performing directional processing on the plurality of microphone signals to produce the audio signal, causing a plurality of weights to be determined, wherein the plurality of weights correspond to the one or more locations; and causing the plurality of weights to be stored in the electronic speaker device.

8

An electronic speaker device that provides spatial audio, wherein the electronic speaker device comprises: a plurality of microphones; a plurality of speakers; and a hardware processor that is configured to: receive, from a microphone of the plurality of microphones associated with the electronic speaker device that is not paired with a media device, a plurality of microphone signals that are responsive to input audio being provided by the media device; perform directional processing on the plurality of microphone signals to produce an audio signal that corresponds with the input audio being provided by the media device; amplify the audio signal to generate an amplified signal; and while receiving the plurality of microphone signals, generate output audio using at least one speaker associated with the electronic speaker device in response to the amplified signal.

9

The electronic speaker device of claim 8, wherein generating the output audio using the at least one speaker associated with the electronic speaker device in response to the amplified signal further comprises causing the hardware processor to select a speaker from a plurality of speakers that are included in the electronic speaker device.

10

The electronic speaker device of claim 8, wherein generating the output audio using the at least one speaker associated with the electronic speaker device in response to the amplified signal further comprises: determining a beamformed signal from the amplified signal that corresponds with a listening zone in an environment that includes the media device and the electronic speaker device; and causing a plurality of speakers within the electronic speaker device to produce the output audio corresponding to the beamformed signal.

11

The electronic speaker device of claim 8, wherein the hardware processor is further configured to perform a first weighted delay and sum calculation on the plurality of microphone signals to separate the input audio being provided by the media device from background audio signals.

12

The electronic speaker device of claim 11, wherein the first weighted delay and sum calculation further comprises complex valued weights comprising an amplitude and a phase.

13

The electronic speaker device of claim 11, wherein the first weighted delay and sum calculation further comprises weights determined in a second weighted delay and sum calculation.

14

The electronic speaker device of claim 8, wherein the input audio comprises a set of audio tones produced from one or more locations, and wherein the hardware processor is further configured to: in response to performing directional processing on the plurality of microphone signals to produce the audio signal, cause a plurality of weights to be determined, wherein the plurality of weights correspond to the one or more locations; and cause the plurality of weights to be stored in the electronic speaker device.

15

A non-transitory computer-readable medium containing computer executable instructions that, when executed by a processor, cause the processor to execute method for providing spatial audio, the method comprising: receiving, from a microphone of a plurality of microphones associated with an electronic speaker device that is not paired with a media device, a plurality of microphone signals that are responsive to input audio being provided by the media device; performing directional processing on the plurality of microphone signals to produce an audio signal that corresponds with the input audio being provided by the media device; amplifying the audio signal to generate an amplified signal; and while receiving the plurality of microphone signals, generating output audio using at least one speaker associated with the electronic speaker device in response to the amplified signal.

16

The non-transitory computer-readable medium of claim 15, wherein generating the output audio using the at least one speaker associated with the electronic speaker device in response to the amplified signal further comprises selecting a speaker from a plurality of speakers that are included in the electronic speaker device.

17

The non-transitory computer-readable medium of claim 15, wherein generating the output audio using the at least one speaker associated with the electronic speaker device in response to the amplified signal further comprises: determining a beamformed signal from the amplified signal that corresponds with a listening zone in an environment that includes the media device and the electronic speaker device; and causing a plurality of speakers within the electronic speaker device to produce the output audio corresponding to the beamformed signal.

18

The non-transitory computer-readable medium of claim 15, wherein performing directional processing further comprises performing a first weighted delay and sum calculation on the plurality of microphone signals to separate the input audio being provided by the media device from background audio signals.

19

The non-transitory computer-readable medium of claim 18, wherein the first weighted delay and sum calculation further comprises complex valued weights comprising an amplitude and a phase.

20

The non-transitory computer-readable medium of claim 18, wherein the first weighted delay and sum calculation further comprises weights determined in a second weighted delay and sum calculation.

21

The non-transitory computer-readable medium of claim 15, wherein the input audio comprises a set of audio tones produced from one or more locations, and wherein the method further comprises: in response to performing directional processing on the plurality of microphone signals to produce the audio signal, causing a plurality of weights to be determined, wherein the plurality of weights correspond to the one or more locations; and causing the plurality of weights to be stored in the electronic speaker device.

Detailed Description

Complete technical specification and implementation details from the patent document.

The disclosed subject matter relates to methods, systems, and media for providing spatial audio by synchronizing output audio from one or more speaker devices to input audio from a directional audio source.

Modern audio-visual experiences include features such as room-wide spatial audio while streaming content from a television device. One common way to achieve this is to use external speakers connected to the television device, either using a wired connection or a wireless connection (e.g., Bluetooth).

Such configurations, however, are not always available to users. For example, the television device may not be capable of wireless pairing. As another example, the television device may have a maximum number of speaker connection ports (either wired or wireless) that limits the user's ability to connect enough speakers for a spatial audio experience. Additionally, wireless speaker connections can, at times, be unreliable (e.g., where packets are dropped due to interference), thereby creating a frustrating audio streaming experience for the user.

Accordingly, it is desirable to provide new methods, systems, and media for providing spatial audio by synchronizing output audio from one or more speaker devices to input audio from a directional audio source.

In accordance with some implementations of the disclosed subject matter, methods, systems, and media for providing spatial audio by synchronizing output audio from one or more speaker devices to input audio from a directional audio source are provided.

In accordance with some implementations of the disclosed subject matter, a method for providing spatial audio is provided, the method comprising: receiving, from a microphone of a plurality of microphones associated with an electronic speaker device that is not paired with a media device, a plurality of microphone signals that are responsive to input audio being provided by the media device; performing directional processing on the plurality of microphone signals to produce an audio signal that corresponds with the input audio being provided by the media device; amplifying the audio signal to generate an amplified signal; and, while receiving the plurality of microphone signals, generating output audio using at least one speaker associated with the electronic speaker device in response to the amplified signal.

In some implementations, generating the output audio using the at least one speaker associated with the electronic device in response to the amplified signal further comprises selecting a speaker from a plurality of speakers that are included in the electronic speaker device.

In some implementations, generating the output audio using the at least one speaker associated with the electronic device in response to the amplified signal further comprises: determining a beamformed signal from the amplified signal that corresponds with a listening zone in an environment that includes the media device and the electronic speaker device; and causing a plurality of speakers within the electronic speaker device to produce the output audio corresponding to the beamformed signal.

In some implementations, performing directional processing further comprises performing a first weighted delay and sum calculation on the plurality of microphone signals to separate the input audio being provided by the media device from background audio signals. In some implementations, the first weighted delay and sum calculation further comprises complex valued weights comprising an amplitude and a phase. In some implementations, the first weighted delay and sum calculation further comprises weights determined in a second weighted delay and sum calculation.

In some implementations, the input audio comprises a set of audio tones produced from one or more locations, and wherein the method further comprises: in response to performing directional processing on the plurality of microphone signals to produce the audio signal, causing a plurality of weights to be determined, wherein the plurality of weights correspond to the one or more locations; and causing the plurality of weights to be stored in the electronic speaker device.

In accordance with some implementations of the disclosed subject matter, an electronic speaker device that provides spatial audio is provided, wherein the electronic speaker device comprises a plurality of microphones, a plurality of speakers, and a hardware processor that is configured to: receive, from a microphone of the plurality of microphones associated with the electronic speaker device that is not paired with a media device, a plurality of microphone signals that are responsive to input audio being provided by the media device; perform directional processing on the plurality of microphone signals to produce an audio signal that corresponds with the input audio being provided by the media device; amplify the audio signal to generate an amplified signal; and, while receiving the plurality of microphone signals, generate output audio using at least one speaker associated with the electronic speaker device in response to the amplified signal.

In accordance with some implementations of the disclosed subject matter, a non-transitory computer-readable medium containing computer executable instructions that, when executed by a processor, cause the processor to execute a method for providing spatial audio is provided, the method comprising: receiving, from a microphone of a plurality of microphones associated with an electronic speaker device that is not paired with a media device, a plurality of microphone signals that are responsive to input audio being provided by the media device; performing directional processing on the plurality of microphone signals to produce an audio signal that corresponds with the input audio being provided by the media device; amplifying the audio signal to generate an amplified signal; and, while receiving the plurality of microphone signals, generating output audio using at least one speaker associated with the electronic speaker device in response to the amplified signal.

In accordance with some implementations of the disclosed subject matter, a system for providing spatial audio is provided, the system comprising: means for receiving, from a microphone of a plurality of microphones associated with an electronic speaker device that is not paired with a media device, a plurality of microphone signals that are responsive to input audio being provided by the media device; means for performing directional processing on the plurality of microphone signals to produce an audio signal that corresponds with the input audio being provided by the media device; means for amplifying the audio signal to generate an amplified signal; and, while receiving the plurality of microphone signals, means for generating output audio using at least one speaker associated with the electronic speaker device in response to the amplified signal.

In accordance with some implementations, mechanisms (which can include methods, systems, and media) for providing spatial audio by synchronizing output audio from one or more speaker devices to input audio from a directional audio source are provided.

Mechanisms are presented for unpaired smart speaker devices and/or other smart home devices that can be configured to provide room-scale spatial audio. By using a microphone array that is associated with each of multiple smart speaker devices, the mechanisms can beamform audio that is output from a media device speaker or other suitable audio output device associated with a media device and can configure one or more of the smart speaker devices to play the audio back in unison towards a user that is consuming the content being presented by the media device, thereby creating an immersive spatial audio experience.

For example, the mechanisms described herein can use beamforming for the one or more microphones in a microphone array within a smart speaker device or any other suitable smart home device to focus on and isolate audio from an audio source (e.g., a media device). Without the use of beamforming, the smart speaker would also amplify sounds in the environment that do not come from the audio source (e.g., background sounds in a room in which the media device and the smart speaker devices are located).

In some implementations, the mechanisms can beamform incoming audio by using a weighted delay-and-sum principle to amplify and/or accept sounds coming from the direction of interest and reject sounds or portions of an audio signal that are not coming from the direction of interest. A calculation using the weighted delay-and-sum principle can use waveforms from multiple microphones and can multiply each waveform by a complex-valued weight. The weights (amplitude and/or phase) can be tuned to represent different distances from the audio source and/or to reject unwanted audio that is not coming from the direction of interest. In continuing this example, each weighted waveform can be included in a summation to produce a beamformed audio waveform, which can then be amplified and/or played from one or more speakers within the smart speaker device.
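The weighted delay-and-sum calculation described above can be illustrated with a minimal narrowband sketch. It is not the patent's implementation: it assumes a uniform linear microphone array, a single-frequency plane wave represented by one complex phasor per microphone, and hypothetical function names (steering_weights, mic_phasors, delay_and_sum). Each weight has an amplitude (1/N) and a phase that cancels the arrival-time difference for the direction of interest.

```python
import cmath
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at room temperature


def steering_weights(mic_positions_m, direction_deg, freq_hz):
    """Complex weights (amplitude 1/N, compensating phase) for a linear array.

    direction_deg is measured from broadside; positions lie along one axis.
    """
    theta = math.radians(direction_deg)
    n = len(mic_positions_m)
    weights = []
    for x in mic_positions_m:
        # Arrival-time offset of a plane wave at position x, relative to x = 0.
        tau = x * math.sin(theta) / SPEED_OF_SOUND_M_PER_S
        weights.append(cmath.exp(-2j * math.pi * freq_hz * tau) / n)
    return weights


def mic_phasors(mic_positions_m, direction_deg, freq_hz):
    """Simulated unit-amplitude phasor at each microphone for one plane wave."""
    theta = math.radians(direction_deg)
    return [
        cmath.exp(2j * math.pi * freq_hz * x * math.sin(theta) / SPEED_OF_SOUND_M_PER_S)
        for x in mic_positions_m
    ]


def delay_and_sum(phasors, weights):
    """Multiply each microphone signal by its weight and sum the results."""
    return sum(w * p for w, p in zip(weights, phasors))


positions = [0.0, 0.05, 0.10, 0.15]  # four microphones, 5 cm apart (assumed)
weights = steering_weights(positions, 30.0, 1000.0)
on_target = delay_and_sum(mic_phasors(positions, 30.0, 1000.0), weights)
off_target = delay_and_sum(mic_phasors(positions, -60.0, 1000.0), weights)
```

In this sketch, sound arriving from the steered direction sums coherently to unit magnitude, while sound from another direction sums with mismatched phases and comes out attenuated.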

The beamforming approach and calculation can also be used to generate output audio that is synchronized with the audio being output by the media device (e.g., a television device). For example, in the implementation in which a smart speaker device has multiple output speakers (e.g., one or more woofers and/or one or more tweeters), the weighted delay-and-sum principle can again be used to send weighted signals to each speaker and thus produce output audio in a desired direction.

These and other features for providing spatial audio by synchronizing output audio from one or more speaker devices to input audio from a directional audio source are described further in connection with FIGS. 1-5.

Turning to FIG. 1, an example flow diagram of an illustrative process 100 for providing spatial audio by synchronizing output audio from one or more electronic speaker devices to input audio from a directional audio source in accordance with some implementations of the disclosed subject matter is shown. In some implementations, process 100 can run on a server, such as server 402, and/or a user device, such as user devices 406, described below in connection with FIG. 4. In some implementations, process 100 can run on any device that includes at least a microphone array and/or a speaker array (e.g., a smart speaker device, an assistant device, a smart home device, etc.).

In some implementations, at 102, process 100 can receive, from a plurality of microphones associated with an electronic speaker device, a plurality of microphone signals. In some implementations, each signal in the plurality of microphone signals can be generated by a microphone in a plurality of microphones. In some implementations, the plurality of microphones can be responsive to input audio. For example, in some implementations, a media device, such as a television device, can be located in the same room as an electronic speaker device that is executing process 100, as discussed below in connection with FIGS. 2A and 2B, and a microphone array within the electronic speaker device can create microphone signals from audio detected within the room, including, but not limited to, any audio being output by the media device. In a more particular example, the electronic speaker device can be located near a media device that is presenting media content while not being paired or otherwise associated with the media device, where the one or more microphones in the microphone array of the electronic speaker device can detect audio that can include the audio being output by the media device that is presenting the media content. In another more particular example, the electronic speaker device can be located near a media device that is presenting media content while not being paired or otherwise associated with the media device, where a mobile device that is executing a media setup application can instruct the media device to provide output audio (e.g., an audio clip) for detection by one or more microphones in the microphone array of the electronic speaker device.

In some implementations, at 104, process 100 can perform directional processing on the plurality of microphone signals to produce an audio signal. In some implementations, directional processing can include any suitable modeling or calculation. For example, as discussed below in connection with FIGS. 3A and 3B, directional processing can include a weighted delay-and-sum calculation on the microphone signals in some implementations. In some implementations, directional processing can separate background sounds (e.g., talking, HVAC noise, pet noises, traffic noise, etc.) from sounds being produced by a nearby media device and/or any other suitable audio source.

In some implementations, at 106, process 100 can amplify the audio signal to generate an amplified signal. In some implementations, process 100 can amplify the audio signal using any suitable hardware and/or mechanisms. For example, in some implementations, process 100 can use a spectral representation (e.g., a frequency domain representation, a waveform, etc.) of the audio signal and increase the strength of the signal (e.g., adjust DC offset, increase amplitude, etc.) uniformly, above or below a specific frequency, in a specific frequency band, and/or using any other criteria.
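As an illustration of amplifying within a specific frequency band, the sketch below transforms a short block of samples to the frequency domain, scales the bins inside a band, and transforms back. It is only one possible reading of the paragraph above: the naive discrete Fourier transform, the function names, and the test signal are assumptions chosen to keep the example self-contained, and a real device would more likely use an FFT or filter bank.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (O(N^2)); adequate for a short demo block."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def amplify_band(samples, sample_rate_hz, low_hz, high_hz, gain):
    """Scale every frequency bin whose frequency lies in [low_hz, high_hz]."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies of a real signal.
        freq_hz = min(k, n - k) * sample_rate_hz / n
        if low_hz <= freq_hz <= high_hz:
            spectrum[k] *= gain
    return idft(spectrum)


fs = 64  # assumed sample rate, chosen so each bin is exactly 1 Hz wide
block = [math.sin(2 * math.pi * 4 * t / fs) + math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
boosted = amplify_band(block, fs, 8.0, 12.0, 2.0)  # doubles only the 10 Hz tone
```

Setting the band to cover every bin reduces this to uniform amplification, and an open-ended band corresponds to amplifying above or below a specific frequency.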

In some implementations, process 100 can beamform incoming audio that is detected by one or more microphones of a microphone array in an electronic speaker device by using a weighted delay-and-sum algorithm that can amplify and/or accept sounds coming from the nearby media device that is in the direction of interest and reject sounds or portions of an audio signal that are not coming from the nearby media device that is in the direction of interest. For example, a calculation using the weighted delay-and-sum algorithm can use waveforms from multiple microphones in the microphone array and can multiply each waveform by a complex-valued weight. The weights based on amplitude and/or phase can be tuned to represent different distances from the audio source (e.g., the media device) and/or to reject unwanted audio that is not coming from the media device that is in the direction of interest.

In some implementations, at 108, process 100 can determine a beamformed signal from the amplified signal. In some implementations, process 100 can use a weighted delay-and-sum algorithm to form a beamformed signal from the amplified signal. In some implementations, process 100 can determine the direction of the output audio at 108 through any suitable beamforming technique. For example, in some implementations, a room with the electronic speaker device executing process 100 can have a preset target listening zone, as discussed below in connection with FIGS. 2A and 2B. In another example, a mobile device that is executing a media setup application can prompt the user to provide a verbal command or any other suitable input audio to determine a target listening zone within the environment of the media device and the one or more electronic speaker devices.
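On the output side, one simple way to aim sound at a listening zone is to delay each speaker so that every speaker's wavefront reaches the zone at the same instant. The sketch below assumes known 2D coordinates for the speakers and the zone and uses a hypothetical helper name (steering_delays); the description leaves the actual beamforming technique open.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at room temperature


def steering_delays(speaker_positions_m, zone_position_m):
    """Per-speaker delays (seconds) that align arrival times at the zone.

    The farthest speaker plays immediately; nearer speakers wait for the
    extra travel time the farthest one needs.
    """
    distances = [math.dist(p, zone_position_m) for p in speaker_positions_m]
    farthest = max(distances)
    return [(farthest - d) / SPEED_OF_SOUND_M_PER_S for d in distances]


speakers = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]  # assumed driver layout, metres
zone = (0.0, 2.0)                                # assumed listening-zone centre
delays = steering_delays(speakers, zone)
```

Applying these delays (optionally with per-speaker amplitude weights) before playback is the transmit-side counterpart of the delay-and-sum calculation used on the microphone signals.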

It should be noted that, in some implementations, process 100 can determine and/or manage at least two audio paths (e.g., one beamformed signal received from the media device and one beamformed signal transmitted to a target listening zone in which a user of the media device is consuming media content).

In some implementations, at 110, process 100 can calculate a delay for the output audio to be provided by the electronic speaker device that is relative to the input audio detected from the media device. In some implementations, process 100 can use any suitable information, such as a source-to-speaker distance (e.g., a media device-to-speaker device distance), to calculate any suitable delay value. In some implementations, process 100 can use a previously calculated delay value (e.g., from previous executions of process 100). For example, in some implementations, an electronic speaker device can be 1 meter away from a media device (the audio source), and process 100 can use a speed of sound of 343 meters-per-second (m/s) to calculate an audio delay of approximately 3 milliseconds at 110. In some implementations, process 100 can use any suitable mechanism to determine the source-to-speaker distance.
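The arithmetic behind the 1 meter example is distance divided by the speed of sound. The short sketch below reproduces it; the function names and the 48 kHz sample rate used to express the delay in samples are assumptions for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # value used in the example above


def propagation_delay_ms(distance_m, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Time for sound to travel distance_m, in milliseconds."""
    return 1000.0 * distance_m / speed_m_per_s


def delay_in_samples(distance_m, sample_rate_hz, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """The same delay expressed as a whole number of audio samples."""
    return round(sample_rate_hz * distance_m / speed_m_per_s)


delay_ms = propagation_delay_ms(1.0)     # about 2.9 ms, i.e. roughly 3 ms
delay_n = delay_in_samples(1.0, 48_000)  # 48 kHz is an assumed sample rate
```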

In some implementations, a mobile device that is executing a media setup application can determine a delay for the output audio to be provided by the electronic speaker device that is relative to the input audio detected from the media device. For example, the mobile device that is executing the media setup application can instruct the media device to transmit an audio sample that is detected by the microphone array of the electronic speaker device, where the electronic speaker device executing process 100 can calculate the delay for the output audio and can generate output audio based on the detected audio sample from the media device and where the microphone array of the electronic speaker device can detect the audio sample and the generated output audio that is played back with the calculated delay to determine whether the audio sample and the generated output audio are in synchronization.
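The description does not say how synchronization is judged. One common technique, shown here purely as an assumption, is to cross-correlate the detected audio sample with the detected playback and treat the lag of the correlation peak as the residual offset. The function names and the tolerance parameter are hypothetical.

```python
def best_lag(reference, captured, max_lag):
    """Lag (in samples) at which captured best lines up with reference.

    A positive result means captured trails reference by that many samples.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(captured):
                score += ref_value * captured[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def in_sync(reference, captured, max_lag, tolerance_samples):
    """True when the estimated offset is within the allowed tolerance."""
    return abs(best_lag(reference, captured, max_lag)) <= tolerance_samples


sample = [0.0, 0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
late_playback = [0.0, 0.0, 0.0] + sample[:-3]  # same clip, three samples late
offset = best_lag(sample, late_playback, max_lag=5)
```

If the estimated offset exceeds the tolerance, the stored delay value could be adjusted by that offset and the check repeated.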

In some implementations, at 112, process 100 can select at least one speaker from a plurality of speakers associated with the electronic speaker device to provide an output. For example, in some implementations, an electronic speaker device can have a speaker array that is incorporated within the electronic speaker device, where the speaker array can produce output audio to any suitable direction. Continuing this example, in some implementations, process 100 can select a portion of the speaker array that is facing a particular direction to produce output audio. In some implementations, process 100 can select all of the speakers in the speaker array to provide output audio (e.g., in multiple directions).

Additionally or alternatively, in some implementations in which the electronic speaker device has multiple speakers of varying types, process 100 can select particular speakers within the electronic speaker device for providing an output audio signal based on audio capabilities. For example, process 100 can select tweeters on electronic speaker devices that are determined to have an audio path from the media device that is less than a threshold value and can select woofers on electronic speaker devices that are determined to have an audio path from the media device that is greater than the threshold value. In another example, process 100 can select particular speakers from electronic speaker devices based on their relative position to the media device (e.g., whether the electronic speaker device is located on the left side or the right side of the media device).
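The threshold rule in the example above can be written as a small selection function. The data structure, field names, and the 2 meter threshold below are hypothetical, and the description does not say what happens when the audio path exactly equals the threshold; this sketch arbitrarily assigns that case to the woofer.

```python
from dataclasses import dataclass


@dataclass
class SpeakerDevice:
    name: str             # hypothetical label for the device
    path_length_m: float  # audio path from the media device
    side: str             # "left" or "right" of the media device


def select_driver(device, threshold_m):
    """Tweeter for paths shorter than the threshold, woofer otherwise.

    The equal-to-threshold case is unspecified in the text; treating it as
    a woofer here is an assumption.
    """
    return "tweeter" if device.path_length_m < threshold_m else "woofer"


def devices_on_side(devices, side):
    """Names of devices located on one side of the media device."""
    return [d.name for d in devices if d.side == side]


devices = [
    SpeakerDevice("speaker-a", 1.2, "left"),
    SpeakerDevice("speaker-b", 3.5, "right"),
]
plan = {d.name: select_driver(d, threshold_m=2.0) for d in devices}
```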

In some implementations, at 114, process 100 can produce output audio using the speakers selected at 112 and the amplified signal from 106. Additionally or alternatively, in some implementations, process 100 can use the beamformed signal from 108 and the speakers selected at 112 to produce output audio. In some implementations, process 100 can produce output audio to an output direction by sending the beamformed signal to speakers in the output direction, as discussed below in connection with FIG. 2B.

Additionally, at 114, process 100 can include any suitable audio delay, such as an audio delay calculated at 110. In some implementations, output audio produced at 114 can be synchronized to incoming audio that is provided by the media device (e.g., the output from the media device that is presenting media content).

At 116, process 100 can loop to 102 in some implementations. In some implementations, process 100 can loop at any suitable rate, and for any suitable number of iterations. In some implementations, process 100 can operate continuously while the electronic speaker device is powered on.

In some implementations, process 100 can end at any suitable time and through any suitable mechanism. For example, in some implementations, process 100 can end at 104 when directional processing produces an audio signal substantially different from a previous iteration of process 100 (e.g., the media device is turned off, the media device is playing a commercial, etc.). In another example, in some implementations, process 100 can end in response to detecting that a user has left the listening zone or a proximity of the media device and/or the electronic speaker device, and/or directs the electronic speaker device to end process 100 (e.g., powers off the electronic speaker device, selects an option from an options menu, etc.), and/or has any other suitable user interaction with the electronic speaker device.
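The receive, process, amplify, play, and loop sequence described above can be pictured as a simple loop. The skeleton below is a sketch only: every stage is passed in as a caller-supplied function, all names are hypothetical, and the beamforming, delay, and speaker-selection steps are folded into the play stage for brevity.

```python
def run_process(read_mics, directional, amplify, play, should_stop, max_iterations=None):
    """Run the capture-to-playback loop until a stop condition is met.

    Returns the number of blocks that were played.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        mic_signals = read_mics()         # receive microphone signals
        audio = directional(mic_signals)  # directional processing
        if should_stop(audio):            # e.g., the source changed or went silent
            break
        play(amplify(audio))              # amplify, then produce output audio
        iterations += 1                   # loop back to receiving
    return iterations


played = []
count = run_process(
    read_mics=lambda: [1.0, 3.0],                       # stand-in for two microphones
    directional=lambda sigs: sum(sigs) / len(sigs),     # stand-in for beamforming
    amplify=lambda audio: 2.0 * audio,
    play=played.append,
    should_stop=lambda audio: False,
    max_iterations=3,
)
```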

It should be understood that at least some of the above-described blocks of process 100 can be executed or performed in any order or sequence not limited to the order and sequence shown in and described in connection with FIG. 1. Also, some of the above blocks of process 100 can be executed or performed substantially simultaneously where appropriate or in parallel to reduce latency and processing times. Additionally or alternatively, some of the above described blocks of process 100 can be omitted.

Turning to FIG. 2A, an example illustration of a room 200 with audio-visual equipment in accordance with some implementations of the disclosed subject matter is shown. In some implementations, room 200 can include equipment such as media device 202, electronic speaker devices 210 and 220, and listening zone 230.

In some implementations, media device 202 can be any suitable display device (e.g., a cathode-ray tube television, a liquid crystal display panel, a computer monitor, an organic light emitting diode (OLED) panel, a projection system, etc.) having any suitable size and/or any suitable resolution. In some implementations, media device 202 can be connected to any suitable peripheral device(s) (e.g., a set-top box, a cable box, a media rendering device, a gaming console, a DVD player, a Blu-ray Disc player, a home theater, a soundbar speaker, etc.). In some implementations, media device 202 can be placed at any suitable location within room 200.

In some implementations, media device 202 can be playing any suitable media content, which results in media device 202 producing audio waves 204 and 206. In some implementations, audio waves 204 and 206 can be any suitable audio at any suitable audio frequencies. In some implementations, media device 202 can produce audio waves 204 and 206 using any suitable mechanism, including mechanisms not shown in FIG. 2A. For example, in some implementations, media device 202 can produce audio using built-in speakers, peripheral speakers, and/or any other suitable speaker arrangement.

In some implementations, audio waves 204 and 206 can travel in any suitable direction. For example, as shown in FIG. 2A, audio wave 204 can originate from the left-hand viewing side of media device 202 and travel in a radial direction outwards in some implementations. Similarly, in another example, as shown in FIG. 2A, audio wave 206 can originate from the right-hand viewing side of media device 202 and travel in a radial direction outwards in some implementations.

Note that although room 200 in FIG. 2A shows audio waves 204 and 206, in some implementations, any suitable number of audio waves can be produced by media device 202.

In some implementations, electronic speaker devices 210 and 220 can be any suitable computing device. In some implementations, electronic speaker devices 210 and 220 can include a microphone array, a speaker array, and/or any other suitable hardware, as discussed below in connection with FIGS. 3A and 5. In some implementations, electronic speaker devices 210 and 220 can be networked speakers (e.g., using Wi-Fi, Ethernet, etc.). In some implementations, electronic speaker devices 210 and 220 can be associated with a user account that can be used to store any suitable information relating to electronic speaker devices 210 and 220.

In some implementations, electronic speaker devices 210 and 220 can refrain from using any type of wired or wireless connection to connect with media device 202. For example, in some implementations, electronic speaker device 210 and/or electronic speaker device 220 can have a Bluetooth antenna or any other suitable wireless antenna and can be unpaired or otherwise not associated with media device 202.

In some implementations, electronic speaker devices 210 and 220 can be placed at any suitable positions within room 200. For example, in some implementations, electronic speaker device 210 can be positioned in one corner of the room on a side table. Continuing this example, electronic speaker device 220 can be positioned in an opposite corner on a table of a different height, in some implementations.

Note that although room 200 in FIG. 2A shows electronic speaker devices 210 and 220, in some implementations, any suitable number of electronic speaker devices can be included in room 200. For example, in some implementations, a media setup application can allow a user to select particular electronic speaker devices to generate an audio output that is synchronized with the output being provided by media device 202.

In some implementations, listening zone 230 can be any suitable location within room 200. As illustrated in FIG. 2A, listening zone 230 can include a seating area (e.g., a couch) and/or any other suitable furniture. In some implementations, listening zone 230 can be defined through any suitable mechanism. For example, in some implementations, a user can associate additional devices (e.g., phone, remote, tablet, computer, etc.) not shown in FIG. 2A with the same user account with which electronic speaker devices 210 and 220 are associated. Continuing this example, in some implementations, listening zone 230 can be associated with the user account.

In a more particular example, a media setup application executing on a mobile device can allow a user to define listening zone 230 and to select the electronic speaker devices, or any other suitable devices having one or more microphones and/or one or more speakers, that are to produce spatial audio. Continuing this example, the media setup application executing on the mobile device can allow a user to indicate a region within listening zone 230 in which the user is positioned to consume media content from media device 202.

In some implementations, listening zone 230 can be activated by a user device (e.g., phone, remote, tablet, computer, etc.) having any suitable location capability (e.g., GPS, Bluetooth) entering a set of pre-defined coordinates that are stored and/or otherwise associated with the user account. In some implementations, listening zone 230 can be activated by a button, menu selection, and/or any other suitable user interaction on a user device. In some implementations, listening zone 230 can be activated by a voice command. For example, in some implementations, a user can direct a voice command (e.g., a wake word, a wake phrase, etc.) to electronic speaker devices 210 and/or 220, followed by any suitable voice instruction to initiate listening zone 230.

Turning to FIG. 2B, an example illustration of a room 250 with audio-visual equipment in accordance with some implementations of the disclosed subject matter is shown. In some implementations, room 250 can include equipment such as media device 202, electronic speaker devices 210 and 220, and the listening zone discussed in connection with room 200 in FIG. 2A above. In some implementations, room 250 can additionally include a beamforming zone 212, an output audio 214, and a beamforming zone 216 associated with electronic speaker device 210, and/or a beamforming zone 222, an output audio 224, and a beamforming zone 226 associated with electronic speaker device 220.

In some implementations, beamforming zones 212 and 222 can correspond to a region in room 250 where microphones within electronic speaker devices 210 and 220 (respectively) can process audio from media device 202. For example, in some implementations, beamforming zone 212 can correspond to a region where microphones within electronic speaker device 210 can process audio wave 204 into microphone signals. In another example, in some implementations, beamforming zone 222 can correspond to a region where microphones within electronic speaker device 220 can process audio wave 206 into microphone signals. In some implementations, beamforming zones 212 and 222 can be any suitable size, shape, and/or volume. For example, in some implementations, beamforming zone 212 can additionally include audio wave 206 and beamforming zone 222 can include audio wave 204.

In some implementations, beamforming zones 212 and 222 can be determined during the execution of process 100. For example, in some implementations, beamforming zones 212 and 222 can be used at 104 to perform directional processing on the plurality of microphone signals.

In some implementations, beamforming zones 212 and 222 can be determined during a particular execution of process 100 to calibrate the location of media device 202 with respect to electronic speaker devices 210 and 220. In some implementations, beamforming zones 212 and 222 can be described by weights within a weighted delay-and-sum calculation. In some implementations, the weights describing beamforming zones 212 and 222 can be stored on the respective electronic speaker devices 210 and 220 and/or in any other suitable device. In some implementations, beamforming zones 212 and 222 can be described by any suitable mathematical representation and/or calculation.
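To make the weighted delay-and-sum description concrete, the following minimal sketch combines several microphone channels in the time domain. It assumes real-valued weights, non-negative integer sample delays, and equal-length channels; the function name and these simplifications are illustrative assumptions and are not taken from the disclosure.

```python
def delay_and_sum(signals, delays, weights):
    """Weighted delay-and-sum over equal-length microphone channels.

    signals: list of sample lists, one per microphone
    delays:  non-negative integer delay (in samples) applied to each channel
    weights: real-valued weight applied to each channel
    """
    if not (len(signals) == len(delays) == len(weights)):
        raise ValueError("signals, delays, and weights must have equal length")
    n = len(signals[0])
    out = [0.0] * n
    for channel, delay, weight in zip(signals, delays, weights):
        if len(channel) != n:
            raise ValueError("all channels must have the same length")
        # Shift the channel later by `delay` samples, scale it, and accumulate.
        for t in range(delay, n):
            out[t] += weight * channel[t - delay]
    return out
```

With delays chosen so that sound from the direction of interest lines up across channels, that sound adds coherently while sound from other directions does not. A stored set of delays and weights is one way a beamforming zone could be represented.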

In some implementations, electronic speaker device 210 can produce output audio 214 in any suitable direction. In some implementations, electronic speaker device 210 can produce output audio 214 synchronized to audio wave 204 and/or audio wave 206 from media device 202. Similarly, electronic speaker device 220 can produce output audio 224 in any suitable direction. In some implementations, electronic speaker device 220 can produce output audio 224 synchronized to audio wave 204 and/or audio wave 206 from media device 202.

In some implementations, output audio 214 and 224 can be produced from an amplified and/or beamformed signal, such as the signals discussed at 106 and 108 of process 100 described in connection with FIG. 1 above.

In some implementations, beamforming zones 216 and 226 can correspond to a region in room 250 where a speaker array within electronic speaker devices 210 and 220 (respectively) can produce output audio. For example, in some implementations, beamforming zone 216 can correspond to a region where a speaker array within electronic speaker device 210 can produce output audio 214 synchronized to audio wave 204. In some implementations, beamforming zone 216 can additionally be directed towards listening zone 230 and/or any other suitable direction. In another example, in some implementations, beamforming zone 226 can correspond to a region where a speaker array within electronic speaker device 220 can produce output audio 224 synchronized to audio wave 206. In some implementations, beamforming zone 226 can be directed towards listening zone 230 and/or any other suitable direction.

In some implementations, beamforming zones 216 and 226 can be determined during execution of process 100. For example, in some implementations, beamforming zones 216 and 226 can be calculated at 108 of process 100 as described in connection with FIG. 1 above.

In some implementations, beamforming zones 216 and 226 can be determined during a particular execution of process 100 to calibrate the location of listening zone 230 with respect to electronic speaker devices 210 and 220. In some implementations, beamforming zones 216 and 226 can be described by weights within a weighted delay-and-sum calculation. In some implementations, the weights describing beamforming zones 216 and 226 can be stored on the respective electronic speaker devices 210 and 220, and/or stored on any suitable device. In some implementations, beamforming zones 216 and 226 can be described by any suitable mathematical representation and/or calculation.

Turning to FIG. 3A, an example illustration 300 of an electronic speaker device in accordance with some implementations is shown. As illustrated, example 300 includes electronic speaker device 210, a microphone array 310, and a speaker array 320. In some implementations, electronic speaker device 210 can be any suitable speaker device, such as that described in connection with FIGS. 2A and 2B above.

In some implementations, microphone array 310 can include any suitable number of microphones. In some implementations, microphone array 310 can create any suitable microphone signals, as described at 102 of process 100 in connection with FIG. 1 above.

As illustrated, microphone array 310 can include microphones 311, 312, and 313 in some implementations. In some implementations, microphones 311, 312, and 313 can be positioned within microphone array 310 in any suitable arrangement, location, and/or orientation. In some implementations, microphone array 310 and/or microphones 311, 312, and 313 can include any other suitable hardware such as audio drivers, as discussed below in connection with FIG. 5.

In some implementations, speaker array 320 can include any suitable number of speakers. In some implementations, speaker array 320 can create any suitable output audio from an amplified and/or beamformed signal, as described at 102 of process 100 in connection with FIG. 1 above.

As illustrated, speaker array 320 includes speakers 321 and 322 in some implementations. In some implementations, speakers 321 and 322 can be positioned within speaker array 320 in any suitable arrangement, location, and/or orientation. In some implementations, speaker array 320 and/or speakers 321 and 322 can include any other suitable hardware such as audio drivers, as discussed below in connection with FIG. 5.

Turning to FIG. 3B, an example equation 350 of a beamforming calculation in accordance with some implementations is shown. As illustrated, equation 350 represents an application of a weighted delay-and-sum principle to microphone signals 360 using variable weights 370 to calculate an audio signal 380.

In some implementations, microphone signals 360 can be electronic signals produced in microphone array 310. For example, in some implementations, microphone 311 can produce a corresponding microphone signal represented in equation 350 as x_311. Continuing this example, in some implementations, microphones 312 and 313 can produce corresponding microphone signals represented in equation 350 as x_312 and x_313, respectively.

In some implementations, microphone signals 360 can be of any suitable duration. In some implementations, microphone signals 360 can be any suitable bit depth and/or audio quality.

In some implementations, weights 370 can be any suitable real-valued (positive or negative) and/or complex-valued number. In some implementations, each microphone signal can have a unique weight associated with the microphone signal in equation 350. For example, in some implementations, microphone signal x_311 from microphone 311 can use weight w_311. In some implementations, weights 370 can be complex-valued and can be represented in equation 350 with an amplitude 372 and a phase 374. In some implementations, weights 370 can be used in process 100 as part of performing directional processing to produce an audio signal 380. For example, in some implementations, process 100 can adjust the values (e.g., amplitude and phase) of weights 370 so that microphone signals from microphone 311 have a stronger influence on audio signal 380 than microphone signals from microphone 313. In some implementations, any of the individual weights can be set or otherwise initialized to a value of 0. In some implementations, process 100 can use any suitable mechanism (e.g., machine learning models, audio models of a living room, prior calibration values, etc.) to adjust the values of weights 370.
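The amplitude-and-phase form of the weights can be illustrated with a short sketch that treats each microphone signal as a single complex value (for example, one frequency bin). This narrowband simplification, the function names, and the example numbers are assumptions made for illustration only.

```python
import cmath
import math


def make_weight(amplitude, phase):
    """Build a complex weight from an amplitude and a phase in radians."""
    return cmath.rect(amplitude, phase)


def combine(mic_values, weights):
    """Weighted sum of per-microphone complex values."""
    if len(mic_values) != len(weights):
        raise ValueError("one weight is required per microphone value")
    return sum(w * x for w, x in zip(weights, mic_values))


# Hypothetical example: three microphones observe the same tone with phase
# offsets of 0, pi/2, and pi. Weights with the opposite phases realign them.
mic_values = [cmath.rect(1.0, 0.0), cmath.rect(1.0, math.pi / 2), cmath.rect(1.0, math.pi)]
aligned = [make_weight(1 / 3, 0.0), make_weight(1 / 3, -math.pi / 2), make_weight(1 / 3, -math.pi)]
audio_value = combine(mic_values, aligned)
```

Setting a weight to 0 removes that microphone's contribution entirely, and raising one amplitude relative to the others gives that microphone a stronger influence on the result.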

In some implementations, audio signal 380 can be any suitable audio waveform. In some implementations, audio signal 380 can be the output from equation 350. In some implementations, audio signal 380 can be an isolated audio waveform from a direction of interest (e.g., a television show) with background sounds removed (e.g., talking, pet noises, etc.).

A practical application of equation 350 can include a weight and a microphone signal for each microphone in microphone array 310, with each weight multiplying the associated microphone signal to produce the audio signal, as shown in Equation 1 below, where three terms in the weighted delay-and-sum are shown because there are three microphones in microphone array 310:
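Equation 1 is not reproduced in this text. A form consistent with the surrounding description (one weight per microphone signal for microphones 311, 312, and 313, and complex weights with an amplitude and a phase) would be the following. The symbols y, a_i, and \varphi_i are assumed names, and any delay term is left out because the description mentions only the weighted sum.

```latex
y = w_{311}\,x_{311} + w_{312}\,x_{312} + w_{313}\,x_{313},
\qquad
w_i = a_i\,e^{j\varphi_i}, \quad i \in \{311, 312, 313\}
```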

In some implementations, equation 350 can additionally represent a weighted delay-and-sum beamforming calculation to produce output audio that can be played by an array of speakers such as speaker array 320. For example, in some implementations, process 100 can use equation 350 with a desired output audio as audio signal 380. In this example, in some implementations, process 100 can use weights 370 to determine speaker signals that can be sent to individual speakers 321 and 322. In some implementations, equation 350 can include any other suitable terms (e.g., a delay term) that can be used to determine speaker signals.
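On the output side, the same idea can be sketched as deriving one feed per speaker from a single desired output audio signal, with each feed being a weighted and delayed copy. The function name, the real-valued weights, and the integer sample delays are assumptions made for this example.

```python
def speaker_feeds(output_audio, delays, weights):
    """Return one feed per speaker: a weighted, delayed copy of output_audio.

    delays:  non-negative integer delay (in samples) for each speaker
    weights: real-valued gain for each speaker
    """
    if len(delays) != len(weights):
        raise ValueError("one delay and one weight are required per speaker")
    n = len(output_audio)
    feeds = []
    for delay, weight in zip(delays, weights):
        feed = [0.0] * n
        # Samples before `delay` stay silent; the rest are scaled copies.
        for t in range(delay, n):
            feed[t] = weight * output_audio[t - delay]
        feeds.append(feed)
    return feeds
```

For two speakers, a call such as `speaker_feeds(audio, [0, 1], [1.0, 0.5])` sends the signal unchanged to the first speaker and a one-sample-late, half-amplitude copy to the second, which shifts where the combined sound is strongest.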

Turning to FIG. 4, an illustrative example 400 of hardware for providing spatial audio by synchronizing output audio from one or more speaker devices to input audio from a directional audio source in accordance with some implementations is shown. As illustrated, hardware 400 can include a server 402, a communication network 404, and/or one or more user devices 406, such as user devices 408 and 410.

Server 402 can be any suitable server(s) for storing information, data, programs, media content, and/or any other suitable content. In some implementations, server 402 can perform any suitable function(s). For example, in some implementations, server 402 can perform calculations shown in equation 350 as discussed above in connection with FIG. 3B.

Communication network 404 can be any suitable combination of one or more wired and/or wireless networks in some implementations. For example, communication network can include any one or more of the Internet, an intranet, a wide-area network (WAN), a local-area network (LAN), a wireless network, a digital subscriber line (DSL) network, a frame relay network, an asynchronous transfer mode (ATM) network, a virtual private network (VPN), and/or any other suitable communication network. User devices 406 can be connected by one or more communications links (e.g., communications links 412) to communication network 404 that can be linked via one or more communications links (e.g., communications links 414) to server 402. The communications links can be any communications links suitable for communicating data among user devices 406 and server 402 such as network links, dial-up links, wireless links, hard-wired links, any other suitable communications links, or any suitable combination of such links.

User devices 406 can include any one or more user devices suitable for use with process 100. In some implementations, user device 406 can include any suitable type of user device, such as speakers (with or without voice assistants), mobile phones, tablet computers, wearable computers, laptop computers, desktop computers, smart televisions, media players, game consoles, vehicle information and/or entertainment systems, and/or any other suitable type of user device.

Although server 402 is illustrated as one device, the functions performed by server 402 can be performed using any suitable number of devices in some implementations. For example, in some implementations, multiple devices can be used to implement the functions performed by server 402.

Although two user devices 408 and 410 are shown in FIG. 4 to avoid overcomplicating the figure, any suitable number of user devices (including only one user device) and/or any suitable types of user devices can be used in some implementations.

Server 402 and user devices 406 can be implemented using any suitable hardware in some implementations. For example, in some implementations, devices 402 and 406 can be implemented using any suitable general-purpose computer or special-purpose computer and can include any suitable hardware. For example, as illustrated in example hardware 500 of FIG. 5, such hardware can include hardware processor 502, memory and/or storage 504, an input device controller 506, an input device 508, display/audio drivers 510, display and audio output circuitry 512, communication interface(s) 514, an antenna 516, and a bus 518.

Hardware processor 502 can include any suitable hardware processor, such as a microprocessor, a micro-controller, digital signal processor(s), dedicated logic, and/or any other suitable circuitry for controlling the functioning of a general-purpose computer or a special-purpose computer in some implementations. In some implementations, hardware processor 502 can be controlled by a computer program stored in memory and/or storage 504. For example, in some implementations, the computer program can cause hardware processor 502 to perform functions described herein.

Memory and/or storage 504 can be any suitable memory and/or storage for storing programs, data, documents, and/or any other suitable information in some implementations. For example, memory and/or storage 504 can include random access memory, read-only memory, flash memory, hard disk storage, optical media, and/or any other suitable memory.

Input device controller 506 can be any suitable circuitry for controlling and receiving input from one or more input devices 508 in some implementations. For example, input device controller 506 can be circuitry for receiving input from a touchscreen, from a keyboard, from a mouse, from one or more buttons, from a voice recognition circuit, from one or more microphones, from a camera, from an optical sensor, from an accelerometer, from a temperature sensor, from a near field sensor, and/or any other type of input device. For example, input devices 508 can be a series of microphones such as microphone array 310.

Display/audio drivers 510 can be any suitable circuitry for controlling and driving output to one or more display/audio output devices 512 in some implementations. For example, display/audio drivers 510 can be circuitry for driving a touchscreen, a flat-panel display, a cathode ray tube display, a projector, a speaker or speakers, and/or any other suitable display and/or presentation devices. For example, display/audio drivers 510 can be used in connection with microphone array 310 and/or speaker array 320. In another example, in some implementations, display/audio drivers 510 can include circuitry for amplifying audio signals at 106 of process 100, as described in connection with FIG. 1 above.

Communication interface(s) 514 can be any suitable circuitry for interfacing with one or more communication networks, such as network 404 as shown in FIG. 4. For example, interface(s) 514 can include network interface card circuitry, wireless communication circuitry, and/or any other suitable type of communication network circuitry.

Antenna 516 can be any suitable one or more antennas for wirelessly communicating with a communication network (e.g., communication network 404) in some implementations. In some implementations, antenna 516 can be omitted.

Bus 518 can be any suitable mechanism for communicating between two or more components 502, 504, 506, 510, and 514 in some implementations.

Any other suitable components can be included in hardware 500 in accordance with some implementations.

In some implementations, any suitable computer readable media can be used for storing instructions for performing the functions and/or processes described herein. For example, in some implementations, computer readable media can be transitory or non-transitory. For example, non-transitory computer readable media can include media such as non-transitory forms of magnetic media (such as hard disks, floppy disks, etc.), non-transitory forms of optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), non-transitory forms of semiconductor media (such as flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.

Although the invention has been described and illustrated in the foregoing illustrative implementations, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the invention can be made without departing from the spirit and scope of the invention, which is limited only by the claims that follow. Features of the disclosed implementations can be combined and rearranged in various ways.

Patent Metadata

Filing Date

November 14, 2022

Publication Date

June 11, 2026

Inventors

Dongeek Shin

Cite as: Patentable. “METHODS, SYSTEMS, AND MEDIA FOR PROVIDING SPATIAL AUDIO BY SYNCHRONIZING OUTPUT AUDIO FROM ONE OR MORE SPEAKER DEVICES TO INPUT AUDIO FROM A DIRECTIONAL AUDIO SOURCE” (US-20260164177-A1). https://patentable.app/patents/US-20260164177-A1