Legal claims defining the scope of protection, as filed with the USPTO.
1. A system comprising: an array of cameras positioned above a space, each camera of the array of cameras configured to capture a video of a portion of the space, the space containing a person; a first camera client configured to, for each frame of a first video received from a first camera of the array of cameras: determine a bounding area around the person shown in that frame of the first video; and generate a timestamp of when that frame of the first video was received by the first camera client; a second camera client configured to, for each frame of a second video received from a second camera of the array of cameras: determine a bounding area around the person shown in that frame of the second video; and generate a timestamp of when that frame of the second video was received by the second camera client; a camera server separate from the first and second camera clients, the camera server configured to: for each frame of the first video, assign, based at least on the timestamp of when that frame was received by the first camera client, coordinates defining the bounding area around the person shown in that frame to one of a plurality of time windows; for each frame of the second plurality of frames, assign, based at least on the timestamp of when that frame was received by the second camera client, coordinates defining the bounding area around the person shown in that frame to one of the plurality of time windows; for a first time window of the plurality of time windows: calculate, based at least on the coordinates that (1) define bounding areas around the person shown in the first plurality of frames and (2) are assigned to the first time window, a combined coordinate for the person during the first time window for the first video from the first camera; and calculate, based at least on the coordinates that (1) define bounding areas around the person shown in the second plurality of frames and (2) are assigned to the first time window, a combined coordinate for the person during the first time window for the second video from the second camera; and determine, based at least on the combined coordinate for the person during the first time window for the first video from the first camera and the combined coordinate for the person during the first time window for the second video from the second camera, a position of the person within the space during the first time window; a plurality of weight sensors positioned within the space; a weight server separate from the first and second camera clients and the camera server, the weight server configured to determine, based at least on a signal produced by a first weight sensor of the plurality of weight sensors, that an item positioned above the first weight sensor was removed; and a central server separate from the first and second camera clients, the camera server, and the weight server, the central server configured to determine, based at least on the position of the person within the space during the first time window, that the person removed the item.
2. The system of claim 1, further comprising: an array of light detection and ranging (LiDAR) sensors positioned above the space; and a LiDAR server separate from the first and second camera clients, the camera server, the weight server, and the central server, the LiDAR server configured to determine a position of the person within the space based at least on a coordinate received from a LiDAR sensor of the array of LiDAR sensors.
3. The system of claim 1, wherein the determination that the first person removed the item is based at least on (1) a distance between the position of the person in the space and a position of the first weight sensor in the space, and (2) a time of when the signal produced by the first weight sensor was received by the weight server falling within the first time window.
4. The system of claim 1, wherein the first camera client is further configured to receive a height of the person.
5. The system of claim 1, wherein: the first camera client implements a first clock used to generate the timestamps of when the frames of the first video were received by the first camera client; the second camera client implements a second clock used to generate the timestamps of when the frames of the second video were received by the second camera client; and the camera server implements a third clock, the first, second, and third clocks are synchronized using a clock synchronization protocol.
6. The system of claim 5, wherein the weight server implements a fourth clock that is synchronized with the first, second, and third clocks using the clock synchronization protocol.
7. The system of claim 1, wherein the combined coordinate for the person during the first time window for the first video from the first camera comprises an average of the coordinates that (1) define bounding areas around the person shown in the frames of the first video, and (2) are assigned to the first time window.
8. The system of claim 1, wherein the array of cameras is arranged in a grid such that a camera that is communicatively coupled to the first camera client is not directly adjacent in the same row or the same column of the grid to another camera that is communicatively coupled to the first camera client.
9. The system of claim 1, further comprising: a rack within the space, the rack comprising a shelf and a base comprising a drawer, the base positioned vertically lower than the shelf, the shelf divided into a first region and a second region, the first weight sensor of the plurality of weight sensors positioned within the first region and configured to produce a first signal based at least on a weight within the first region experienced by the first weight sensor, a second weight sensor of the plurality of weight sensors positioned within the second region and configured to produce a second signal based at least on a weight within the second region experienced by the second weight sensor, each weight sensor of the plurality of weight sensors comprising a plurality of load cells; and a circuit board positioned within the drawer, the circuit board communicatively coupled to the first weight sensor and the second weight sensor, the circuit board configured to: receive the first and second signals; communicate, to the weight server, a signal indicating the weight experienced by the first weight sensor; and communicate, to the weight server, a signal indicating the weight experienced by the second weight sensor.
10. The system of claim 1, wherein the camera server is further configured to request a first frame of the first video from the first camera client and a second frame of the second video from the second camera client, the determination of the position of the person is further based at least on the first and second frames.
11. A method comprising: capturing, by each camera of an array of cameras positioned above a space, a video of a portion of the space, the space containing a person; for each frame of a first video received from a first camera of the array of cameras: determining, by a first camera client, a bounding area around the person shown in that frame of the first video; and generating, by the first camera client, a timestamp of when that frame of the first video was received by the first camera client; for each frame of a second video received from a second camera of the array of cameras: determining, by a second camera client, a bounding area around the person shown in that frame of the second video; and generating, by the second camera client, a timestamp of when that frame of the second video was received by the second camera client; for each frame of the first video and based at least on the timestamp of when that frame was received by the first camera client, assigning by a camera server separate from the first and second camera clients, coordinates defining the bounding area around the person shown in that frame to one of a plurality of time windows; for each frame of the second plurality of frames and based at least on the timestamp of when that frame was received by the second camera client, assigning by the camera server coordinates defining the bounding area around the person shown in that frame to one of the plurality of time windows; for a first time window of the plurality of time windows: calculating by the camera server, based at least on the coordinates that (1) define bounding areas around the person shown in the first plurality of frames and (2) are assigned to the first time window, a combined coordinate for the person during the first time window for the first video from the first camera; and calculating by the camera server, based at least on the coordinates that (1) define bounding areas around the person shown in the second plurality of frames and (2) are assigned to the first time window, a combined coordinate for the person during the first time window for the second video from the second camera; and determining by the camera server, based at least on the combined coordinate for the person during the first time window for the first video from the first camera and the combined coordinate for the person during the first time window for the second video from the second camera, a position of the person within the space during the first time window; producing, by a plurality of weight sensors positioned within the space, signals indicative of weights experienced by the plurality of weight sensors; determining, by a weight server separate from the first and second camera clients and the camera server, based at least on a signal produced by a first weight sensor of the plurality of weight sensors, that an item positioned above the first weight sensor was removed; and determining, by a central server separate from the first and second camera clients, the camera server, and the weight server, that the person removed the item based at least on the position of the person within the space during the first time window.
12. The method of claim 11, further comprising determining, by a light detection and ranging (LiDAR) server separate from the first and second camera clients, the camera server, the weight server, and the central server, a position of the person within the space based at least on a coordinate received from a LiDAR sensor of the array of LiDAR sensors.
13. The method of claim 11, wherein the determination that the first person removed the item is based at least on (1) a distance between the position of the person in the space and a position of the first weight sensor in the space, and (2) a time of when the signal produced by the first weight sensor was received by the weight server falling within the first time window.
14. The method of claim 11, further comprising receiving a height of the person by the first camera client.
15. The method of claim 11, further comprising: implementing, by the first camera client, a first clock used to generate the timestamps of when the frames of the first video were received by the first camera client; implementing, by the second camera client, a second clock used to generate the timestamps of when the frames of the second video were received by the second camera client; and implementing, by the camera server, a third clock, wherein the first, second, and third clocks are synchronized using a clock synchronization protocol.
16. The method of claim 15, further comprising implementing, by the weight server, a fourth clock that is synchronized with the first, second, and third clocks using the clock synchronization protocol.
17. The method of claim 11, wherein the combined coordinate for the person during the first time window for the first video from the first camera comprises an average of the coordinates that (1) define bounding areas around the person shown in the frames of the first video, and (2) are assigned to the first time window.
18. The method of claim 11, wherein the array of cameras is arranged in a grid such that a camera that is communicatively coupled to the first camera client is not directly adjacent in the same row or the same column of the grid to another camera that is communicatively coupled to the first camera client.
19. The method of claim 11, further comprising: receiving, by a circuit board, a first signal from the first weight sensor of the plurality of weight sensors and a second signal from the second weight sensor of the plurality of weight sensors, the circuit board positioned within a drawer of a rack positioned within the space, the rack comprising a shelf and a base comprising the drawer, the base positioned vertically lower than the shelf, the shelf divided into a first region and a second region, the first weight sensor of the plurality of weight sensors positioned within the first region, wherein the first signal is based at least on a weight within the first region experienced by the first weight sensor, a second weight sensor of the plurality of weight sensors positioned within the second region, wherein the second signal is based at least on a weight within the second region experienced by the second weight sensor, and wherein each weight sensor of the plurality of weight sensors comprises a plurality of load cells; and communicating, by the circuit board to the weight server, a signal indicating the weight experienced by the first weight sensor; and communicating, by the circuit board to the weight server, a signal indicating the weight experienced by the second weight sensor.
20. The method of claim 11, further comprising requesting, by the camera server, the first frame of the first video from the first camera client and a second frame of the second video from the second camera client, wherein the determination of the position of the person is further based at least on the first and second frames.
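For readers who find the data flow in claims 1, 3, and 7 easier to follow as code, the following Python sketch illustrates one possible reading of it: timestamped bounding areas are assigned to time windows, averaged per camera, combined across cameras into a position, and then compared against a weight-sensor event. It is an illustrative sketch only, not the patented implementation. The window length, the distance threshold, the box format `(x_min, y_min, x_max, y_max)`, the assumption that all coordinates are already in one shared floor frame, and every function name are hypothetical choices that the claims do not specify.

```python
from collections import defaultdict
from math import hypot
from statistics import mean

WINDOW_SECONDS = 0.5   # assumed window length; not stated in the claims
MAX_DISTANCE = 1.0     # assumed distance threshold; not stated in the claims


def window_index(timestamp, window_seconds=WINDOW_SECONDS):
    """Map a receive timestamp (in seconds) to an integer time-window index."""
    return int(timestamp // window_seconds)


def assign_to_windows(detections, window_seconds=WINDOW_SECONDS):
    """Group bounding boxes by time window.

    detections: iterable of (timestamp, (x_min, y_min, x_max, y_max)).
    Returns a dict mapping window index -> list of boxes.
    """
    windows = defaultdict(list)
    for timestamp, box in detections:
        windows[window_index(timestamp, window_seconds)].append(box)
    return windows


def combined_coordinate(boxes):
    """Average the box coordinates element-wise (the average of claim 7)."""
    return tuple(mean(values) for values in zip(*boxes))


def box_center(box):
    """Center point of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def person_position(combined_boxes):
    """Assumed fusion rule: average the centers of the per-camera combined boxes."""
    centers = [box_center(box) for box in combined_boxes]
    return (mean(c[0] for c in centers), mean(c[1] for c in centers))


def person_removed_item(position, sensor_position, signal_time, window,
                        max_distance=MAX_DISTANCE,
                        window_seconds=WINDOW_SECONDS):
    """Attribute a removal when the weight signal falls in the window and the
    person is within max_distance of the sensor (one reading of claim 3)."""
    in_window = window_index(signal_time, window_seconds) == window
    distance = hypot(position[0] - sensor_position[0],
                     position[1] - sensor_position[1])
    return in_window and distance <= max_distance


# Hypothetical detections from two camera clients: (timestamp, box).
camera_1 = [(10.1, (0.0, 0.0, 2.0, 2.0)), (10.3, (2.0, 2.0, 4.0, 4.0)),
            (10.6, (10.0, 10.0, 12.0, 12.0))]
camera_2 = [(10.2, (2.0, 2.0, 4.0, 4.0))]

window = window_index(10.1)
combined_1 = combined_coordinate(assign_to_windows(camera_1)[window])
combined_2 = combined_coordinate(assign_to_windows(camera_2)[window])
position = person_position([combined_1, combined_2])
removed = person_removed_item(position, (3.0, 2.5), 10.45, window)
```

In this sketch the third frame from the first camera lands in the next window and does not affect the position for the first one. A real deployment would also need the synchronized clocks recited in claims 5 and 6 for timestamps from different machines to be comparable.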
February 8, 2022