Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A system comprising: an audio playback device configured to drive an audio reproduction device at a volume level; an audio capture device configured to convert sound waves into an audio input; and an environment sensing device configured to detect, based on the audio input, environmental conditions surrounding a user of the audio playback device, the environmental conditions including a loudness estimation indicative of a level of background noise included in the audio input and an audio content classification indicative of a presence of speech in the audio input, determine, according to the environmental conditions, a playback action to alter the volume level being provided by the audio playback device, and provide, to the audio playback device, an adjustment to the volume level in accordance with the playback action.
A system adjusts audio playback volume based on environmental conditions. It includes an audio playback device, a microphone, and an environment sensing device. The microphone captures audio input. The environment sensing device analyzes the audio input to determine background noise level (loudness estimation) and whether speech is present (audio content classification). Based on these environmental conditions, the system determines if the volume should be changed and sends an adjustment signal to the audio playback device.
2. The system of claim 1 , wherein the environment sensing device is further configured to determine the playback action according to a mapping of environmental conditions to playback actions, wherein the mapping specifies to lower the volume level when: (i) the audio content classification indicates the presence of speech and the loudness estimation indicates a loudness condition below a predefined sound pressure level, or (ii) the audio content classification indicates a presence of background noise and the loudness estimation indicates a loudness condition above the predefined sound pressure level.
The system from the previous claim chooses the playback action from a mapping of environmental conditions to actions. The mapping lowers the volume in either of two cases: (1) speech is detected and the loudness is below a predefined sound pressure level, or (2) background noise is detected and the loudness is above that same level. In practice, the volume drops when someone speaks in otherwise quiet surroundings or when the surroundings become noisy.
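The mapping in this claim can be sketched as a small decision function. This is a minimal illustration, not the patented implementation: the function name, the string labels, and the 70 dB default for the predefined sound pressure level are all assumptions, since the claim gives no numeric value.

```python
def playback_action(content: str, spl_db: float, threshold_db: float = 70.0) -> str:
    """Return "lower" or "keep" for one observation of the environment.

    content      -- "speech" or "noise" (hypothetical labels for the
                    audio content classification)
    spl_db       -- estimated sound pressure level of the audio input
    threshold_db -- the predefined sound pressure level; 70.0 is a
                    placeholder, not a value from the patent
    """
    # Case (i): speech is present and the surroundings are quiet.
    if content == "speech" and spl_db < threshold_db:
        return "lower"
    # Case (ii): background noise is present and the surroundings are loud.
    if content == "noise" and spl_db > threshold_db:
        return "lower"
    # The claim only specifies when to lower; everything else is left alone here.
    return "keep"
```

Under these assumptions, speech at 50 dB and noise at 85 dB both yield "lower", while quiet noise or speech in loud surroundings yields "keep".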
3. The system of claim 1 , wherein the audio capture device is integrated into one of: (i) the audio playback device, or (ii) the audio reproduction device.
The system from the first claim integrates the microphone into either the audio playback device or the audio reproduction device (e.g. speaker or headphones). This means the microphone can be built directly into the device playing the audio or the device outputting the audio, creating a compact and integrated design.
4. The system of claim 1 , wherein the environment sensing device is further configured to: determine an average absolute amplitude of the audio input; identify a sound pressure level according to the average absolute amplitude according to a sound-input-level characterization of the audio capture device; count a number of samples of the audio input that exceed a pre-determined loudness threshold over a predetermined period of time; and determine that the loudness estimation corresponds to one of (i) a high loudness condition when the count exceeds a threshold value, (ii) a low loudness condition when the count does not exceed the threshold value, and (iii) a silence condition when the audio input includes substantially no sound information.
The system from the first claim estimates the background noise level in three steps: (1) calculating the average absolute amplitude of the audio input; (2) converting that amplitude to a sound pressure level using the microphone's sound-input-level characterization; (3) counting the audio samples that exceed a preset loudness threshold over a set period of time. The loudness is categorized as high if the count exceeds a threshold value, low if it does not, or silence if the input contains substantially no sound.
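The three steps above can be sketched in Python. The sketch rests on several assumptions: samples are floats normalized to the range -1 to 1, the microphone characterization is modeled as a single calibration point (ref_amplitude corresponds to ref_spl_db), and every numeric default is a hypothetical placeholder because the claim states no values. All names are illustrative.

```python
import math

def estimate_loudness(samples, sample_threshold=0.1, count_threshold=100,
                      silence_floor=1e-4, ref_amplitude=1.0, ref_spl_db=94.0):
    """Return (condition, spl_db); condition is "high", "low" or "silence"."""
    if not samples:
        return "silence", None
    # Step 1: average absolute amplitude of the audio input.
    avg_abs = sum(abs(s) for s in samples) / len(samples)
    if avg_abs < silence_floor:
        # Substantially no sound information in the input.
        return "silence", None
    # Step 2: map amplitude to sound pressure level using the (assumed)
    # single-point calibration of the capture device.
    spl_db = ref_spl_db + 20.0 * math.log10(avg_abs / ref_amplitude)
    # Step 3: count samples whose magnitude exceeds the loudness threshold.
    count = sum(1 for s in samples if abs(s) > sample_threshold)
    condition = "high" if count > count_threshold else "low"
    return condition, spl_db
```

A real device would likely use a measured calibration curve rather than one reference point; the single point keeps the example short.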
5. The system of claim 1 , wherein the environment sensing device is further configured to: pass the audio input through a band pass filter to select for first formants of speech; estimate a pitch of the audio input; count a number of samples of the audio input in which the pitch is within a range for a first formant of speech over a predetermined period of time; and determine that the audio content classification corresponds to one of (i) a speech and noise condition when the count exceeds a first threshold value, (ii) a speech condition when the count exceeds a second threshold value but does not exceed the first threshold value, and (iii) a noise condition when the count does not exceed the second threshold value.
The system from the first claim classifies audio content in three steps: (1) band-pass filtering the audio to select the first formants of speech (the lowest resonant frequencies of the vocal tract); (2) estimating the pitch of the audio; (3) counting the samples whose pitch falls within the first-formant range over a predetermined period of time. If the count exceeds the higher (first) threshold, the classification is "speech and noise". If it exceeds the lower (second) threshold but not the first, the classification is "speech". Otherwise, the classification is "noise".
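The counting and thresholding step can be sketched as follows. This covers only step (3) and the final decision; band-pass filtering and pitch estimation are assumed to have happened upstream and to have produced one pitch estimate per frame (None where no pitch was found). The 300 to 900 Hz range is a hypothetical stand-in for the first-formant range, and the default thresholds borrow the approximate 65% and 40% figures from claim 6. All names are illustrative.

```python
def classify_content(pitches_hz, first_threshold=0.65, second_threshold=0.40,
                     formant_range_hz=(300.0, 900.0)):
    """Classify a window of pitch estimates as speech, noise, or both.

    pitches_hz -- one pitch estimate (Hz) per frame, or None for no pitch
    """
    if not pitches_hz:
        return "noise"
    low, high = formant_range_hz
    # Count the estimates that fall inside the assumed first-formant range.
    in_range = sum(1 for p in pitches_hz if p is not None and low <= p <= high)
    fraction = in_range / len(pitches_hz)
    if fraction > first_threshold:
        return "speech and noise"
    if fraction > second_threshold:
        return "speech"
    return "noise"
```

Expressing the count as a fraction of the window keeps the thresholds independent of the window length.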
6. The system of claim 5 , wherein the environment sensing device is further configured to estimate the pitch using an average magnitude difference function (AMDF), the first threshold value is approximately 65% of the samples, and the second threshold value is approximately 40% of the samples.
The system from the previous claim estimates audio pitch using the Average Magnitude Difference Function (AMDF). The "speech and noise" threshold is set at approximately 65% of the samples, while the "speech" threshold is approximately 40% of the samples. A count above roughly 65% is therefore classified as "speech and noise", a count between roughly 40% and 65% as "speech", and a count at or below roughly 40% as "noise".
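For readers unfamiliar with AMDF, here is a minimal sketch. For each candidate lag, AMDF averages the absolute difference between the signal and a copy of itself shifted by that lag; the lag with the smallest average is taken as the pitch period. The search range of 80 to 400 Hz and all names are assumptions for illustration and do not come from the patent.

```python
import math

def amdf_pitch(frame, sample_rate, min_hz=80.0, max_hz=400.0):
    """Estimate pitch in Hz with the average magnitude difference function.

    Returns None when the frame is too short for the search range.
    """
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(len(frame) - 1, int(sample_rate / min_hz))
    best_lag, best_value = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        # Mean absolute difference between the frame and its shifted copy.
        value = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None

# Usage: a synthetic 200 Hz tone sampled at 8 kHz (period = 40 samples).
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
print(amdf_pitch(tone, 8000, min_hz=120.0))  # 200.0 for this tone
```

AMDF has near-equal minima at integer multiples of the true period, so the example narrows the search range to exclude the two-period lag; a practical estimator would need some guard against such octave errors.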
7. The system of claim 1 , wherein the environment sensing device is further configured to: perform recognition of the speech on the audio input; compare the recognized speech with user-customizable text; and mute the volume level when a match is detected of the recognized speech with the user-customizable text.
The system from the first claim recognizes speech from the microphone's audio input and compares it to user-defined text. If there's a match (e.g., a trigger word or phrase), the volume is muted. This allows the user to create custom voice commands to control the playback volume.
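The comparison step can be sketched as a word-level phrase match. Speech recognition itself is assumed to be done by an external recognizer that returns a text transcript; the function names and the normalization rules (lowercasing, stripping punctuation) are illustrative assumptions, not details from the patent.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into words."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()

def should_mute(recognized_text, trigger_phrases):
    """Return True if any user-defined phrase occurs in the transcript."""
    words = normalize(recognized_text)
    for phrase in trigger_phrases:
        target = normalize(phrase)
        if not target:
            continue  # ignore empty or punctuation-only phrases
        # Slide a window over the transcript and compare whole words,
        # so "excuse" does not match "excused".
        for i in range(len(words) - len(target) + 1):
            if words[i:i + len(target)] == target:
                return True
    return False
```

With the trigger list ["excuse me"], the transcript "Hey, excuse me, sir!" would mute playback, while "nothing here" would not.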
8. The system of claim 7 , wherein the environment sensing device is further configured to, when the environment sensing device is moving, further determine (i) a first speed changing position in which the environment sensing device is moving at a rate indicative of travel without a vehicle, and (ii) a second speed changing position in which the device is moving at a speed indicative of travel within the vehicle.
The system from the previous claim also uses motion data to determine context. When the device is moving, it determines if the speed suggests the user is walking (speed indicative of travel without a vehicle) or in a car (speed indicative of travel within the vehicle). This context might be used to adjust the voice command sensitivity or the muting behavior.
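One simple way to make the walking versus vehicle distinction is a threshold on estimated speed. The sketch below assumes a speed estimate in metres per second is already available; both thresholds and the labels are hypothetical, since the claim does not give numbers.

```python
def travel_mode(speed_m_s, moving_floor_m_s=0.2, vehicle_threshold_m_s=7.0):
    """Label a speed as "static", "on foot" or "vehicle".

    Both thresholds are placeholders chosen for illustration only.
    """
    if speed_m_s < moving_floor_m_s:
        return "static"
    # At or above the vehicle threshold, assume travel within a vehicle.
    if speed_m_s >= vehicle_threshold_m_s:
        return "vehicle"
    return "on foot"
```

A single threshold will misclassify cases such as cycling or slow traffic, so a real system would probably combine speed with other cues.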
9. The system of claim 1 , wherein the environment sensing device is further configured to: receive accelerometer positional data; determine, based on the accelerometer position data, a position change of one of: (i) a static position in which the environment sensing device is not moving, and (ii) a changing position in which the environment sensing device is moving; and determine the playback action to be performed further according to a determined position change estimation of the environment sensing device.
The system from the first claim uses accelerometer data to detect movement. It determines whether the device is stationary or moving. This position change information is then used to further refine the playback action. For example, the volume might be automatically reduced when the device starts moving.
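The static versus changing decision could be made from the spread of accelerometer magnitudes over a short window. This is a sketch under assumptions: readings are (x, y, z) tuples in m/s², the 0.5 m/s² threshold is a placeholder, and the function name is illustrative. The patent does not specify how the decision is computed.

```python
import math
from statistics import pstdev

def position_change(accel_samples, threshold=0.5):
    """Return "static" or "changing" for a window of (x, y, z) readings."""
    # Magnitude removes dependence on how the device is oriented.
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel_samples]
    if len(magnitudes) < 2:
        return "static"  # not enough data to see any variation
    # A device at rest reads a nearly constant magnitude (gravity only),
    # so a large spread suggests movement.
    return "changing" if pstdev(magnitudes) > threshold else "static"
```

The result could then feed the playback decision alongside the loudness and content classifications.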
10. The system of claim 1 , further comprising a pressure sensor, wherein the environment sensing device is further configured to mute the volume level when the pressure sensor indicates a change in movement.
The system from the first claim includes a pressure sensor. The system mutes the volume when the pressure sensor detects a change in movement (e.g., a change in pressure indicating that the user is starting to move). This pressure sensor provides an additional environmental input.
11. A method comprising: detecting, based on audio input from an audio capture device, environmental conditions surrounding a user of an audio playback device driving an audio reproduction device at a volume level, the environmental conditions including a loudness estimation indicative of a level of background noise included in the audio input and an audio content classification indicative of presence of speech in the audio input; determining, according to the environmental conditions, a playback action to alter the volume level being provided by the audio playback device; and providing, to the audio playback device, an adjustment to the volume level according to the playback action.
A method adjusts audio playback volume based on environmental conditions. It detects environmental conditions (background noise level and presence of speech) using audio input from a microphone. Based on these conditions, it determines a playback action to change the volume and then adjusts the volume accordingly.
12. The method of claim 11 , further comprising determining the playback action according to a mapping of environmental conditions to playback actions, wherein the mapping specifies to lower the volume level when: (i) the audio content classification indicates the presence of speech and the loudness estimation indicates a loudness condition below a predefined sound pressure level, or (ii) the audio content classification indicates a presence of noise and the loudness estimation indicates a loudness condition above the predefined sound pressure level.
The method from the previous claim determines the playback action according to a mapping of environmental conditions to actions. Specifically, it lowers the volume if either: (1) speech is detected and the background noise is quiet, or (2) background noise is detected and the noise is loud.
13. The method of claim 11 , further comprising: determining an average absolute amplitude of the audio input; identifying a sound pressure level according to the average absolute amplitude according to a sound-input-level characterization of the audio capture device; counting a number of samples of the audio input that exceed a pre-determined loudness threshold over a predetermined period of time; and determining that the loudness estimation corresponds to one of (i) a high loudness condition when the count exceeds a threshold value, (ii) a low loudness condition when the count does not exceed the threshold value, and (iii) a silence condition when the audio input includes substantially no sound information.
The method from the eleventh claim calculates background noise level by: (1) determining the average amplitude of the audio input; (2) relating average amplitude to sound pressure level; (3) counting the number of audio samples exceeding a threshold. Loudness is categorized as high if the count exceeds a limit, low if it doesn't, or silence if there's nearly no sound.
14. The method of claim 11 , further comprising: passing the audio input through a band pass filter to select for first formants of speech; estimating a pitch of the audio input; counting a number of samples of the audio input in which the pitch is within a range for a first formant of speech over a predetermined period of time; and determining that the audio content classification corresponds to one of (i) a speech and noise condition when the count exceeds a first threshold value, (ii) a speech condition when the count exceeds a second threshold value but does not exceed the first threshold value, and (iii) a noise condition when the count does not exceed the second threshold value.
The method from the eleventh claim classifies audio content by: (1) filtering the audio to isolate speech formants; (2) estimating audio pitch; (3) counting the samples where the pitch falls within the formant range. Based on the count being above certain thresholds, the audio is classified as "speech and noise", "speech", or "noise".
15. The method of claim 11 , further comprising one or more of: performing recognition of the speech on the audio input, comparing the recognized speech with the user-customizable text, and muting the volume level when a match is detected of the recognized speech with the user-customizable text; and muting the volume level when data received from a pressure sensor indicates a change in movement of the audio playback device.
The method from the eleventh claim includes one or both of: (1) recognizing speech in the audio input, comparing it to user-defined text, and muting the volume if there is a match; (2) muting the volume if a pressure sensor indicates a change in movement of the audio playback device. This provides speech-based muting, movement-based muting, or both.
16. A non-transitory computer-readable medium comprising computer instructions that, when executed by a processor of an audio playback device, cause the audio playback device to perform operations including to: detect, based on audio input from an audio capture device, environmental conditions surrounding a user of an audio playback device driving an audio reproduction device at a volume level, the environmental conditions including a loudness estimation indicative of a level of background noise included in the audio input and an audio content classification indicative of presence of speech in the audio input; determine, according to the environmental conditions, a playback action to alter the volume level being provided by the audio playback device; and provide an adjustment to the volume level in accordance with the playback action.
A computer-readable storage medium contains instructions that, when executed, cause an audio playback device to adjust volume based on environmental conditions. The device detects environmental conditions (background noise level, presence of speech) using audio input. Based on these, it decides if the volume should be changed and adjusts the volume.
17. The medium of claim 16 , further comprising instructions configured to cause the audio playback device to determine the playback action according to a mapping of environmental conditions to playback actions, wherein the mapping specifies to lower the volume level when: (i) the audio content classification indicates the presence of speech and the loudness estimation indicates a loudness condition below a predefined sound pressure level, or (ii) the audio content classification indicates a presence of noise and the loudness estimation indicates a loudness condition above the predefined sound pressure level.
The storage medium from the sixteenth claim includes instructions to determine the playback action based on a mapping of environmental conditions to actions. Specifically, the mapping lowers the volume if either: (1) speech is present and the loudness is below a predefined sound pressure level, or (2) noise is present and the loudness is above that level.
18. The medium of claim 16 , further comprising instructions configured to cause the audio playback device to: determine an average absolute amplitude of the audio input; identify a sound pressure level according to the average absolute amplitude according to a sound-input-level characterization of the audio capture device; count a number of samples of the audio input that exceed a pre-determined loudness threshold over a predetermined period of time; and determine that the loudness estimation corresponds to one of (i) a high loudness condition when the count exceeds a threshold value, (ii) a low loudness condition when the count does not exceed the threshold value, and (iii) a silence condition when the audio input includes substantially no sound information.
The storage medium from the sixteenth claim contains instructions to calculate the background noise level by: (1) determining the average amplitude of the audio input; (2) relating average amplitude to sound pressure level; (3) counting audio samples above a predetermined loudness. Loudness is categorized as high, low, or silence based on threshold comparisons.
19. The medium of claim 16 , further comprising instructions configured to cause the audio playback device to: pass the audio input through a band pass filter to select for first formants of speech; estimate a pitch of the audio input; count a number of samples of the audio input in which the pitch is within a range for a first formant of speech over a predetermined period of time; and determine that the audio content classification corresponds to one of (i) a speech and noise condition when the count exceeds a first threshold value, (ii) a speech condition when the count exceeds a second threshold value but does not exceed the first threshold value, and (iii) a noise condition when the count does not exceed the second threshold value.
The storage medium from the sixteenth claim contains instructions to classify audio content by: (1) filtering for speech formants; (2) estimating audio pitch; (3) counting samples with pitch in the formant range. Audio is classified as "speech and noise", "speech", or "noise" depending on the count relative to defined thresholds.
20. The medium of claim 16 , further comprising instructions configured to cause the audio playback device to one or more of: perform recognition of the speech on the audio input, compare the recognized speech with user-customizable text, and mute the volume level when a match is detected of the recognized speech with the user-customizable text; and mute the volume level when data received from a pressure sensor indicates a change in movement of the audio playback device.
The storage medium from the sixteenth claim includes instructions for one or both of: (1) recognizing speech, comparing the recognized speech to user-defined text, and muting if there is a match; (2) muting if a pressure sensor indicates a change in movement of the audio playback device.
December 19, 2017