US-12641365-B2

Control apparatus and speaker control method

Published: May 26, 2026
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have
Technical Abstract

A control apparatus includes a position information acquiring unit and a control unit. The position information acquiring unit acquires information (position information) indicating a position of a person in a store. The position information acquiring unit generates the position information by, for example, acquiring and processing, in real time, image data generated by an image capture apparatus installed in the store. The control unit controls a plurality of speakers in the store independently of each other by using the position information generated or acquired by the position information acquiring unit. The target of this control includes at least one of selection of sound data to be supplied, a timing of supplying sound data, and a sound volume.

Patent Claims

1. A control apparatus comprising:

2. The control apparatus according to, wherein

3. The control apparatus according to, wherein

4. The control apparatus according to, wherein

5. The control apparatus according to, wherein

6. A control apparatus comprising:

7. A control method performed by a computer, the control method comprising:

Detailed Description

This application is a National Stage Entry of PCT/JP2021/010859 filed on Mar. 17, 2021, the contents of all of which are incorporated herein by reference, in their entirety.

The present invention relates to a control apparatus, a speaker control method, and a system.

In a store, a plurality of speakers are often installed to play music or output voice guidance. In most cases, the same voice is output from all of these speakers.

Note that Patent Document 1 discloses a navigation apparatus for guiding a beneficiary by using a sound. The navigation apparatus first outputs a trigger sound from a speaker and detects the reaction of a person who heard the trigger sound. Based on that reaction, the navigation apparatus determines whether the person is a beneficiary. The navigation apparatus then outputs a guiding sound to the beneficiary. The guiding sound is a sound for guiding the beneficiary to a sweet spot of a parametric speaker. When the beneficiary moves to the sweet spot, the navigation apparatus outputs a guiding sound from the parametric speaker.

In a store, providing information to a person such as a customer is effective. The inventors of the present application have investigated a technique for making it easy for a person to recognize information in a case where the information is provided to the person by a voice. One object of the present invention is to make it easy for a person in a store to recognize information provided by a voice.

The present invention provides a control apparatus including:

The present invention provides a control method including,

The present invention provides a program causing a computer to include:

The present invention provides a system including the above-described control apparatus, and the above-described plurality of speakers.

The present invention makes it easy for a person in a store to recognize information provided by a voice.

Hereinafter, example embodiments according to the present invention are described by using the drawings. Note that, in all drawings, a similar constituent element is indicated by a similar reference sign, and description thereof will not be repeated as appropriate.

The accompanying drawing illustrates a usage environment of a control apparatus according to the present example embodiment. The control apparatus is a part of a system. The system includes a plurality of speakers in addition to the control apparatus. The control apparatus controls the plurality of speakers (two speakers in the illustrated example) independently of each other. The control performed independently here includes at least one of selection of sound data to be supplied, a timing of supplying sound data, and a sound volume.

The plurality of speakers are installed away from each other in the same store. In a case where the store has a plurality of floors, the plurality of speakers are disposed on the same floor. Further, a sensor for generating information (hereinafter described as position information) indicating a position of a person is provided in the store. One example of the sensor is an image capture apparatus. In this case, the position information is generated by processing image data generated by the image capture apparatus. The control apparatus then controls the plurality of speakers by using the position information.

In the present example embodiment, both speakers have directivity. One example of such a speaker is a parametric speaker. In a case where position information indicates that a person is present in a range (hereinafter described as a partial area) where a sound from one of the speakers can be heard, the control apparatus supplies first sound data to that speaker. Likewise, in a case where position information indicates that a person is present in the partial area where a sound from the other speaker can be heard, the control apparatus supplies second sound data to the other speaker. The first sound data and the second sound data indicate, for example, information relating to a product. As one example, the first sound data are information relating to a product disposed near the partial area of the one speaker, and the second sound data are information relating to a product disposed near the partial area of the other speaker.

Note that the partial areas include, for example, at least one of an area in front of a cash register counter, an area in front of a terminal to be operated by a customer, and an area in front of a predetermined product shelf. Examples of a terminal to be operated by a customer include a kiosk terminal, an automated teller machine (ATM), and a self-service POS terminal.

In the illustrated example, two image capture apparatuses are provided in the store, each having one of the partial areas as its image capture range. In a case where image data generated by the image capture apparatus covering one partial area include a person, the control apparatus supplies first sound data to the speaker associated with that partial area. Likewise, in a case where image data generated by the image capture apparatus covering the other partial area include a person, the control apparatus supplies second sound data to the speaker associated with the other partial area. Note that the frame rate of the image data generated by the image capture apparatuses is, for example, five frames per second or more, but may be higher or lower than this.

Note that, in a case where a person is present in a specific region of the image data generated by one of the image capture apparatuses, the control apparatus may supply the first sound data to the associated speaker. Likewise, in a case where a person is present in a specific region of the image data generated by the other image capture apparatus, the control apparatus may supply the second sound data to the other speaker.

Further, in a case where both partial areas are included in the image capture range of a single image capture apparatus, the control apparatus may determine, by processing image data generated by that image capture apparatus, whether a person is present in each of the partial areas.
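
As a minimal sketch of how a single image capture apparatus could serve both partial areas, the following Python example tests detected person positions against rectangular regions of the image. The `Rect` class, the area names, and the use of one (x, y) point per detected person are assumptions made for illustration; the specification does not prescribe a particular detection method or region shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned region in image pixel coordinates (hypothetical representation)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def occupied_areas(person_points, partial_areas):
    """Return the names of the partial areas that contain at least one person.

    person_points: iterable of (x, y) image positions, one per detected person.
        The person detector itself is outside the scope of this sketch.
    partial_areas: dict mapping an area name to its Rect in the image.
    """
    points = list(person_points)
    return {
        name
        for name, rect in partial_areas.items()
        if any(rect.contains(x, y) for x, y in points)
    }


# Hypothetical layout: two partial areas seen by one camera.
AREAS = {
    "area_a": Rect(0, 0, 100, 100),
    "area_b": Rect(200, 0, 300, 100),
}
```

Calling `occupied_areas([(50, 50)], AREAS)` reports only `area_a` as occupied, so only the speaker associated with that area would receive sound data.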

Further, the sensor for generating position information may be a human sensor such as an infrared sensor. In this case, human sensors are provided at a plurality of positions in the store. When a human sensor detects a person, it outputs its own identification information to the control apparatus. The control apparatus stores the identification information of each human sensor in association with information indicating the detection range of that human sensor. Therefore, the control apparatus can generate the above-described position information by using the received identification information of the human sensor.
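
The lookup described above can be sketched as a simple table from sensor identification information to detection range. The identifiers, the area names, and the choice to ignore unregistered identifiers are illustrative assumptions, not details taken from the specification.

```python
# Hypothetical association between human-sensor IDs and their detection ranges.
SENSOR_DETECTION_RANGES = {
    "sensor-01": "area_a",
    "sensor-02": "area_b",
}


def position_info_from_sensor_ids(received_ids, ranges=SENSOR_DETECTION_RANGES):
    """Translate received sensor IDs into the set of areas where a person was detected.

    IDs that are not registered in the table are ignored (an assumption of this sketch).
    """
    detected = set()
    for sensor_id in received_ids:
        area = ranges.get(sensor_id)
        if area is not None:
            detected.add(area)
    return detected
```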

A further drawing illustrates one example of a functional configuration of the control apparatus. In the illustrated example, the control apparatus includes a position information acquiring unit and a control unit.

The position information acquiring unit acquires position information. In the illustrated example, the position information acquiring unit also functions as an image acquiring unit and an image processing unit, and generates the above-described position information by acquiring and processing, in real time, image data generated by an image capture apparatus installed in the store. Note that, in a case where the above-described plurality of human sensors are used in place of the image capture apparatuses, the position information acquiring unit generates the position information by using the received identification information of the human sensor.

Note that, in a case where an image acquiring unit and an image processing unit are provided outside the control apparatus, the position information acquiring unit acquires position information generated by the image processing unit.

The control unit controls a plurality of speakers in the store independently of each other by using the position information generated or acquired by the position information acquiring unit. As described above, the target of this control (hereinafter described as a control target) includes at least one of selection of sound data to be supplied, a timing of supplying sound data, and a sound volume.

For example, in a case where position information indicates that a person is present in the partial area associated with one of the speakers, the control unit supplies first sound data to that speaker. Likewise, in a case where position information indicates that a person is present in the partial area associated with the other speaker, the control unit supplies second sound data to the other speaker.
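
The per-speaker decision made by the control unit might look like the following sketch. It assumes that position information has already been reduced to a set of occupied partial areas; the speaker names, area names, and sound data labels are placeholders introduced here.

```python
# Hypothetical association: speaker -> (associated partial area, sound data to supply).
SPEAKER_TABLE = {
    "speaker_a": ("area_a", "first_sound_data"),
    "speaker_b": ("area_b", "second_sound_data"),
}


def select_sound_data(occupied_areas, speaker_table=SPEAKER_TABLE):
    """Decide, for each speaker independently, which sound data to supply.

    A value of None means that no sound data is supplied to that speaker.
    """
    plan = {}
    for speaker, (area, sound) in speaker_table.items():
        plan[speaker] = sound if area in occupied_areas else None
    return plan
```

Because each entry is evaluated on its own, a person in one partial area does not affect what the other speaker outputs.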

Further, in some cases an attribute of a person can be determined by the above-described image processing. An attribute of a person is, for example, at least one of a gender, an age group, clothes, a pose, a gesture, a perspiring state (e.g., whether the person is perspiring), and information that serves as master data for face recognition (hereinafter described as face recognition information). Examples of a pose and a gesture are a pose and/or an action made when the person feels cold, and a pose and/or an action made when the person feels hot. In this case, the above-described position information further includes the attribute of the person. Then, when the control unit selects sound data (e.g., the above-described first sound data or second sound data) to be supplied to a speaker, the control unit may use an attribute of a person present in the partial area associated with the speaker. In other words, the control unit may change the sound data to be output to a speaker according to an attribute of a person. Further, in a case where face recognition information is included in the attribute of a person, specific sound data may be output to a specific person by using the face recognition information.

Sound data to be supplied to a speaker are stored in a sound data storage unit. The control unit reads the sound data from the sound data storage unit and supplies the sound data to a speaker. In the illustrated example, the sound data storage unit is a part of the control apparatus. However, the sound data storage unit may be located outside the control apparatus.

Note that the control apparatus may have a common mode, in which the same control is performed for a plurality of speakers, and an independent mode, in which the plurality of speakers are controlled independently of each other. In the independent mode, the control apparatus controls the above-described control target for each of the plurality of speakers independently by using the above-described position information.
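
One way to picture the two modes is a single planning function that branches on the mode. This is only a sketch: the `Mode` enum, the parameter names, and the representation of position information as a set of occupied areas are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    COMMON = "common"            # same control for every speaker
    INDEPENDENT = "independent"  # each speaker controlled by position information


def plan_outputs(mode, speaker_areas, per_speaker_sound, common_sound, occupied_areas):
    """Return a dict mapping each speaker to the sound data it should receive.

    speaker_areas: dict speaker -> associated partial area.
    per_speaker_sound: dict speaker -> sound data used in the independent mode.
    common_sound: sound data supplied to all speakers in the common mode.
    occupied_areas: set of partial areas in which a person is present.
    """
    if mode is Mode.COMMON:
        return {speaker: common_sound for speaker in speaker_areas}
    return {
        speaker: per_speaker_sound[speaker] if area in occupied_areas else None
        for speaker, area in speaker_areas.items()
    }
```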

A further drawing illustrates a first example of data stored in the sound data storage unit. In this example, the sound data storage unit stores, for each of a plurality of speakers, sound data to be supplied to the speaker. This allows the control unit to read sound data from the sound data storage unit for each speaker.

Another drawing illustrates a second example of data stored in the sound data storage unit. In this example, the sound data storage unit stores sound data for each attribute of a person. More specifically, the sound data storage unit stores, for each of a plurality of speakers, sound data to be supplied to the speaker and an attribute associated with the sound data. This allows the control unit to read, from the sound data storage unit, sound data according to an attribute of a person entering a partial area, and to supply the sound data to the speaker associated with that partial area. For example, the control unit selects sound data including information relating to a product according to an age group or a gender. Further, in a case where the attribute includes at least one of a pose, a gesture, and a perspiring state, the control unit may select sound data including information relating to a product according to the attribute. For example, in a case where it can be presumed from at least one of a gesture and a perspiring state that the person feels hot, the control unit selects sound data relating to a cool product (a beverage or a food product). Further, in a case where it can be presumed that the person feels cold, the control unit selects sound data relating to a warm product (a beverage or a food product). Note that the control unit may control the sound volume of a speaker according to the age group of a person entering the partial area associated with that speaker. For example, the control unit increases the sound volume as the age group becomes older.
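
The attribute-keyed storage and the age-dependent volume could be sketched as two lookup tables. Everything concrete below (the `thermal_state` key, the age group labels, the volume values, and the fallback entry) is a hypothetical choice for illustration; the specification only states that sound data are stored per attribute and that volume increases with age group.

```python
# Hypothetical contents of the sound data storage unit: speaker -> attribute -> sound data.
SOUND_BY_ATTRIBUTE = {
    "speaker_a": {
        "feels_hot": "cool_product_info",
        "feels_cold": "warm_product_info",
        "default": "default_product_info",
    },
}

# Hypothetical volume levels (0.0 to 1.0) that increase with the age group.
VOLUME_BY_AGE_GROUP = {"under_40": 0.5, "40_to_64": 0.6, "65_plus": 0.8}
DEFAULT_VOLUME = 0.5


def choose_sound_and_volume(speaker, attributes):
    """Pick sound data and volume for one speaker from a person's attributes.

    attributes: dict that may contain "thermal_state" ("feels_hot" or "feels_cold",
        presumed from a gesture or perspiring state) and "age_group".
    """
    table = SOUND_BY_ATTRIBUTE[speaker]
    sound = table.get(attributes.get("thermal_state"), table["default"])
    volume = VOLUME_BY_AGE_GROUP.get(attributes.get("age_group"), DEFAULT_VOLUME)
    return sound, volume
```

A table-driven design keeps the selection rules in data, which matches the idea of reading sound data from a storage unit rather than hard-coding them in the control unit.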

A further drawing illustrates a hardware configuration example of the control apparatus. The control apparatus includes a bus, a processor, a memory, a storage device, an input/output interface, and a network interface.

The bus is a data transmission path along which the processor, the memory, the storage device, the input/output interface, and the network interface mutually transmit and receive data. However, a method of mutually connecting the processor and the like is not limited to bus connection.

The processor is a processor achieved by a central processing unit (CPU), a graphics processing unit (GPU), or the like.

The memory is a main storage apparatus achieved by a random access memory (RAM) or the like.

The storage device is an auxiliary storage apparatus achieved by a hard disk drive (HDD), a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. The storage device stores a program module for achieving each function of the control apparatus (e.g., the position information acquiring unit and the control unit). The processor achieves the function associated with each program module by reading the program module into the memory and executing it. Further, the storage device also functions as the sound data storage unit.

The input/output interface is an interface for connecting the control apparatus to various types of input/output equipment. For example, the control apparatus may communicate with an image capture apparatus and a speaker via the input/output interface. Note that, although not illustrated, the number of input/output interfaces may be increased or decreased according to the number of pieces of input/output equipment to be connected. For example, in the illustrated example, four input/output interfaces may be provided.

The network interface is an interface for connecting the control apparatus to a network. The network is, for example, a local area network (LAN) or a wide area network (WAN). The network interface may connect to the network by wireless connection or by wired connection. The control apparatus may communicate with an image capture apparatus and a speaker via the network interface.

A further drawing is a flowchart illustrating one example of processing performed by the control apparatus. The control apparatus repeatedly performs the illustrated processing.

When the image capture apparatuses generate image data, they immediately transmit the image data to the control apparatus. At this time, each image capture apparatus also transmits information identifying itself. When the position information acquiring unit of the control apparatus acquires image data (an acquisition step), the position information acquiring unit generates the above-described position information by processing the image data. At this time, the position information acquiring unit also generates information indicating an attribute of a person as needed (a generation step). The control unit controls the speakers independently of each other by using the information generated in the generation step (a control step). Details of the control performed here are as described above.
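
The repeated acquire, generate, and control sequence can be sketched as a loop over incoming frames. The stand-in functions below replace real image processing and real speaker output, and all identifiers (`cam_a`, `area_a`, `speaker_a`, the `has_person` flag) are hypothetical.

```python
# Hypothetical associations between cameras, partial areas, and speakers.
CAMERA_TO_AREA = {"cam_a": "area_a", "cam_b": "area_b"}
AREA_TO_SPEAKER = {"area_a": "speaker_a", "area_b": "speaker_b"}


def generate_position_info(camera_id, image):
    """Stand-in for image processing: report whether a person is in the camera's area."""
    return {
        "area": CAMERA_TO_AREA[camera_id],
        "person_present": bool(image.get("has_person")),
    }


def control_speaker(position_info):
    """Stand-in for the control unit: decide the action for the associated speaker."""
    speaker = AREA_TO_SPEAKER[position_info["area"]]
    return speaker, "play" if position_info["person_present"] else "idle"


def process_frames(frames):
    """Run the three steps for each (camera_id, image) pair as it arrives."""
    actions = []
    for camera_id, image in frames:                               # acquisition
        position_info = generate_position_info(camera_id, image)  # generation
        actions.append(control_speaker(position_info))            # control
    return actions
```

The camera identifier travels with each frame so that the generation step can tell which partial area the image belongs to.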

As described above, according to the present example embodiment, the control apparatus controls a plurality of speakers installed in a store independently of each other according to a position of a person in the store. The control performed here includes at least one of selection of sound data to be supplied, a timing of supplying sound data, and a sound volume. Therefore, a person in the store can easily recognize information provided by a voice.

A further drawing illustrates a usage environment of a control apparatus according to the present example embodiment. This usage environment is similar to the usage environment described above, except that at least one non-directional speaker is provided in the store in addition to the two speakers having directivity.

The non-directional speaker does not have directivity and can convey a voice over a wider range than the directional speakers. The range that a voice from the non-directional speaker reaches also includes the partial areas.

A control unit of the control apparatus controls each directional speaker and the non-directional speaker independently of each other according to the presence or absence of a person in the partial area associated with that directional speaker.

For example, when position information indicates that no person is present in either of the partial areas, the control unit supplies sound data to the non-directional speaker. The sound data are different from both the first sound data to be supplied to one directional speaker and the second sound data to be supplied to the other directional speaker. The sound data to be supplied to the non-directional speaker may be a voice input to a microphone by a salesperson at that time, or may be read from a sound data storage unit. Further, the control unit does not supply sound data to either of the directional speakers.

Then, when position information indicates that a person is present in at least one of the partial areas, the control unit stops the supply of sound data to the non-directional speaker, or lowers the sound volume of the non-directional speaker. Further, when a person is present in the partial area associated with one directional speaker, the control unit supplies the first sound data to that speaker, and when a person is present in the partial area associated with the other directional speaker, the control unit supplies the second sound data to that speaker.
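
The interplay between the directional speakers and the non-directional speaker can be sketched as follows. The speaker names, the sound data labels, the full volume of 1.0, and the default lowered volume of 0.2 are assumptions for illustration; passing `None` as the lowered volume models stopping the supply instead of lowering the volume.

```python
from typing import Optional

# Hypothetical association: directional speaker -> (partial area, sound data).
DIRECTIONAL_SPEAKERS = {
    "speaker_a": ("area_a", "first_sound_data"),
    "speaker_b": ("area_b", "second_sound_data"),
}


def plan_with_wide_speaker(occupied_areas, lowered_volume: Optional[float] = 0.2):
    """Plan outputs for the directional speakers and one non-directional speaker.

    Directional speakers map to sound data or None. The non-directional speaker
    ("wide_speaker") maps to a (sound data, volume) pair, or None when stopped.
    """
    plan = {}
    for speaker, (area, sound) in DIRECTIONAL_SPEAKERS.items():
        plan[speaker] = sound if area in occupied_areas else None

    if not occupied_areas:
        # Nobody is in a partial area: only the non-directional speaker plays.
        plan["wide_speaker"] = ("general_sound_data", 1.0)
    elif lowered_volume is None:
        # Somebody entered a partial area: stop the non-directional speaker.
        plan["wide_speaker"] = None
    else:
        # Somebody entered a partial area: lower the non-directional speaker.
        plan["wide_speaker"] = ("general_sound_data", lowered_volume)
    return plan
```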

Note that one of the two directional speakers may be omitted. In this case, the control apparatus controls the one remaining speaker having directivity and the speaker not having directivity. Further, in this case, one of the image capture apparatuses may also be omitted.

According to the present example embodiment, when a person has not entered either of the partial areas, the person can recognize a voice output from the speaker not having directivity. At this time, the directional speakers do not output a voice. Further, when the person has entered one of the partial areas, the person can recognize a voice output from the directional speaker associated with that partial area. At this time, the speaker not having directivity does not output a voice, or its sound volume is lowered. Therefore, the person in the store can easily recognize information provided by the voice. Further, information can be provided exclusively to a person entering a partial area.
