US-20260002791-A1

Semantic Map Updating and Object Searching Using IoT Device Cameras

Published: January 1, 2026

Assignee: not available in USPTO data we have

Inventors: Song Wang, Mengnan Wang, Kevin W. Beck, Russell S. VanBlon

Technical Abstract

In one aspect, a first device includes a processor assembly and storage accessible to the processor assembly. The storage includes instructions executable by the processor assembly to access a semantic map, receive input from a first camera on an Internet of things (IoT) device, and update the semantic map based on the input.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. An apparatus, comprising: a processor assembly; and storage accessible to the processor assembly and comprising instructions executable by the processor assembly to: present a graphical user interface (GUI) on a display, the GUI comprising a first option that is selectable a single time to enable the apparatus to, for multiple future instances, update one or more semantic maps using one or more Internet of things (IoT) device cameras; create a first semantic map using simultaneous localization and mapping (SLAM); in a first instance, receive input from a first camera on a first IoT device; and in the first instance, update, based on the input and based on the first option being selected from the GUI, the first semantic map.

2. The apparatus of claim 1, wherein the GUI comprises a second option different from the first option, the second option being selectable to select a preferred labeling process for labeling objects indicated in one or more semantic maps.

3. The apparatus of claim 2, wherein the second option is selectable to use a second GUI as the preferred labeling process, the second GUI being different from the first GUI.

4. The apparatus of claim 2, wherein the second option is selectable to use audible input as the preferred labeling process.

5. The apparatus of claim 1, wherein the GUI comprises a second option different from the first option, the second option being selectable to select a specific device to use for labeling objects indicated in one or more semantic maps.

6. The apparatus of claim 5, wherein the second option is selectable to use a smartphone as the specific device.

7. The apparatus of claim 5, wherein the second option is selectable to use a headset as the specific device.

8. The apparatus of claim 1, wherein the GUI comprises a second option different from the first option, the second option being selectable to select a specific IoT device with camera to use for updating one or more semantic maps based on camera input from the specific IoT device.

9. The apparatus of claim 8, wherein the second option is selectable to use a television as the specific IoT device with camera to use for updating one or more semantic maps.

10. The apparatus of claim 1, wherein the GUI comprises a second option different from the first option, the second option being selectable to select a first particular trigger to use for updating one or more semantic maps.

11. The apparatus of claim 10, wherein the GUI comprises a third option different from the first and second options, the third option being selectable to select a second particular trigger to use for updating one or more semantic maps, the first particular trigger being a recurring period of time ending, the second particular trigger being a determination that part of a real-world space represented in a particular semantic map has changed.

12. The apparatus of claim 11, wherein the second option comprises an input box at which the recurring period of time can be set.

13. The apparatus of claim 1, comprising the display.

14. The apparatus of claim 1, wherein the apparatus is different from the first IoT device.

15. A method, comprising: presenting a graphical user interface (GUI) on a display, the GUI comprising a first option that is selectable a single time to enable an apparatus to, for multiple future instances, update one or more semantic maps using one or more cameras; creating, at the apparatus, a first semantic map using simultaneous localization and mapping (SLAM); in a first instance, receiving input from a first camera on a first device; and in the first instance, updating, based on the input and based on the first option being selected from the GUI, the first semantic map.

16. The method of claim 15, wherein the GUI comprises a second option different from the first option, the second option being selectable to select a specific device with camera to use for updating one or more semantic maps based on camera input from the specific device.

17. The method of claim 15, wherein the GUI comprises a second option different from the first option, the second option being selectable to select a first particular trigger to use for updating one or more semantic maps.

18. An apparatus, comprising: at least one computer readable storage medium (CRSM) that is not a transitory signal, the at least one CRSM comprising instructions executable by a processor assembly to: present a graphical user interface (GUI) on a display, the GUI comprising a first option that is selectable a single time to enable the processor assembly to, for multiple future instances, update one or more semantic maps using one or more cameras; create a first semantic map of objects in three-dimensional space using simultaneous localization and mapping (SLAM); in a first instance, receive input from a first camera on a device accessible to the processor assembly; in the first instance, update, based on the input and based on the first option being selected from the GUI, the first semantic map with respect to at least one of the objects.

19. The apparatus of claim 18, wherein the GUI comprises a second option different from the first option, the second option being selectable to select a specific device with camera to use for updating one or more semantic maps based on camera input from the specific device.

20. The apparatus of claim 18, wherein the GUI comprises a second option and a third option each different from the first option, the second option being selectable to select a first particular trigger to use for updating one or more semantic maps, the third option being selectable to select a second particular trigger to use for updating one or more semantic maps, the second particular trigger being different from the first particular trigger.

Detailed Description

Complete technical specification and implementation details from the patent document.

The disclosure below relates to technically inventive, non-routine solutions that are necessarily rooted in computer technology and that produce concrete technical improvements. In particular, the disclosure below relates to techniques for updating a semantic map using one or more Internet of things (IoT) device cameras.

As recognized herein, electronic semantic maps can indicate the three-dimensional (3D) locations of various real-world objects within a real-world space. However, as also recognized herein, those objects might move or change over time, and real-time electronic tracking of those objects is typically not possible without attaching an electronic tracking tag like a GPS tag to each object. But this is oftentimes not feasible or scalable. Moreover, even when used, electronic tags can lead to undue constraints on processing resources and too much power being consumed in the tracking. But failure to use such tags can lead to the semantic map becoming outdated relatively fast. There are currently no adequate solutions to the foregoing computer-related, technological problem.

Accordingly, in one aspect a first device includes a processor assembly and storage accessible to the processor assembly. The storage includes instructions executable by the processor assembly to access a semantic map and receive input from a first camera on an Internet of things (IoT) device. The instructions are also executable to update, based on the input, the semantic map.

In certain example implementations, the instructions may be executable to command the IoT device to provide the input responsive to at least one trigger. In various examples, the trigger may include a recurring period of time ending, receipt of a user command to update the semantic map, and/or a determination using object recognition that part of a real-world space represented in the semantic map has changed.
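The trigger logic described above can be illustrated with a minimal sketch. It is not taken from the disclosure: the `TriggerState` fields, the trigger names, and the function `fired_triggers` are hypothetical, and the sketch assumes that the change determination from object recognition is supplied as a boolean by some other component.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TriggerState:
    """Hypothetical inputs for deciding whether to request camera input."""
    now: float                # current time, in seconds
    last_update: float        # time of the last semantic-map update, in seconds
    period: Optional[float]   # recurring period in seconds, or None if disabled
    user_requested: bool      # a user command to update the map was received
    scene_changed: bool       # object recognition flagged a change in the space


def fired_triggers(state: TriggerState) -> List[str]:
    """Return the names of all satisfied triggers (empty list means no update)."""
    fired = []
    if state.period is not None and state.now - state.last_update >= state.period:
        fired.append("period_elapsed")
    if state.user_requested:
        fired.append("user_command")
    if state.scene_changed:
        fired.append("scene_changed")
    return fired
```

A caller might command the IoT device to provide input whenever the returned list is non-empty, which matches the "and/or" relationship between the triggers in the text.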

Additionally, in some examples the instructions may be executable to, during creation of the semantic map, identify one or more real-world devices that each include a camera. The one or more real-world devices may include the IoT device. The instructions may then be executable to save data indicating the one or more real-world devices that each include a camera and use the data to command the IoT device to provide the input.

Also, if desired the instructions may be executable to present a user interface (UI), where the UI indicates an object in the semantic map that has not been identified via object recognition. In these examples, the instructions may then be executable to receive user input indicating a label for the object and update the semantic map with the label. So, for example, the instructions may be executable to present a prompt for a user to use a second camera to capture images of the object from different angles, receive the images of the object and generate three-dimensional (3D) data for the object based on the images, and update the semantic map with the 3D data. The second camera may be the same as or different from the first camera. The UI may include a graphical user interface and/or an audible user interface.

In various example implementations, the first device may even include the camera. Also in various example implementations, the first device may be the same as or different from the IoT device.

In another aspect, a method includes accessing, via a first device, a semantic map. The method also includes receiving input from a first camera on a second device and updating the semantic map based on the input.

In certain examples, the second device may be an Internet of things (IoT) device. E.g., the IoT device may include a television, a smartphone, a tablet computer, a laptop computer, a headset, a stand-alone camera, a digital assistant device, a cooking appliance, an electronic door lock, and/or an electronic doorbell.

If desired, in various example implementations the method may include updating the semantic map using one or more cameras at recurring periods of time.

In still another aspect, at least one computer readable storage medium (CRSM) that is not a transitory signal includes instructions executable by a processor assembly to access a semantic map, receive input from a first camera on a device accessible to the at least one processor, and update, based on the input, the semantic map.

Thus, in certain examples the instructions may be executable to request a label from a user for an object that cannot be recognized via object recognition, where the object is represented in the semantic map. Here the instructions may also be executable to receive user input indicating the label and update the semantic map with the label.

Also, if desired in various example embodiments the instructions may be executable to trigger the first camera to generate the input for updating the semantic map, where the first camera may be triggered responsive to user command and/or a recurring period of time ending.

The details of present principles, both as to their structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:

Among other things, the detailed description below allows for electronically tracking real-world objects over time and also placing real-world objects at designated locations when a user is unfamiliar with a space. Thus, systems and methods for tracking and placement are provided for both electronic/trackable objects and non-electronic objects that are located in a mappable space.

Accordingly, in one particular aspect, principles set forth below can be used for tracking and placement of an object via semantic mapping and user-assisted/intuitive labeling of objects of importance. Mapping of a space may be accomplished through scanning via cameras to create a point cloud for various objects as well as 3D coordinates for the objects/points in the cloud themselves. Additionally, object recognition and/or other artificial intelligence (AI)-based systems can be used with the scanning process such that objects may be identified from a data training set as part of the scan and then labelled accordingly inside the space/map to achieve semantic understanding. Thus, the AI-enhanced semantic map may not only create a 3D feature-rich map but also contain data like instances of objects recognized, their names, and their respective locations inside the mappable space. Utilizing semantic understanding, a device may thus be used to track and place objects relevant to the user.

For instance, the following approach may be used in non-limiting embodiments. For object recognition, semantic mapping technology that includes a predefined database of trained objects for recognition may be used. If an object to be identified is already included in the database, its information may be autonomously included in the semantic map.

For objects that are not recognized autonomously, user-specified objects and labels can be added through intuitive means before/during/after the time of space mapping. A purpose-designed application/UI as well as voice commands input through appropriate devices can therefore be leveraged for model scanning and processing to add to the existing database of recognized objects with a user's labels. The semantic map can then be further updated as new objects are scanned, trained, and recognized at any stage of the mapping process. Objects added pre-mapping may be used to improve an already-created database of objects that can be recognized, and objects scanned and labeled during mapping can be added in real time to the semantic map as it forms. Objects added to the database after the semantic map has been created can further update the semantic understanding of the existing map.

Sensor fusion may also be used for the updated map. Thus, sensor fusion may be used to allow multiple devices with cameras and/or IMU sensors and that are within the mappable space to constantly or periodically update the map. This might be particularly useful for spaces such as homes and offices where objects of importance to a user might not always stay stationary. Accordingly, sensors from an AR glass, indoor camera, mobile phones, etc. can be used to work together and scan the space at different angles and times. The computations can then be offloaded via the cloud for a server to process (and/or processed locally), and therefore each device's scanned data and time stamps can be analyzed to create a “real time” version of the semantic map of the space that contains all the relevant objects and their relative locations inside the space.
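As a rough illustration of analyzing each device's scanned data and time stamps, the sketch below keeps only the newest observation of each labeled object. This newest-wins merge is a deliberate simplification and not the sensor fusion of the disclosure, which would also reconcile camera poses and IMU data. The `Observation` type and `fuse_latest` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Observation:
    """One sighting of an object by one device (hypothetical structure)."""
    label: str
    position: Tuple[float, float, float]  # coordinates in the shared map frame
    timestamp: float                      # capture time, in seconds
    device: str                           # which device reported the sighting


def fuse_latest(observations: Iterable[Observation]) -> Dict[str, Observation]:
    """Keep the most recent observation per label across all devices."""
    latest: Dict[str, Observation] = {}
    for obs in observations:
        current = latest.get(obs.label)
        if current is None or obs.timestamp > current.timestamp:
            latest[obs.label] = obs
    return latest
```

The sketch assumes all devices already report positions in a common coordinate frame and use comparable clocks, both of which a real system would have to establish.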

Present principles may also be used to locate particular real-world objects within the mappable space. For instance, to locate an object, the user can use a purpose-designed application or voice command to locate the object if it exists in the semantic map's object recognition database. Additionally, since the semantic map for the space can contain data of unrecognized objects and their respective locations in its feature map, the user can also sort through images of objects derived from the map through the purpose-designed application to look for untrained/unrecognized objects and their last known locations. Voice recognition of an oral description of the object as provided by the user can also be used to further narrow down this search.

Present principles may also be used to help with real-world object placement via map route planning. Route planning inside a mappable space is possible for recognized/mapped objects, and so a map created for a space can be visually presented to the user through the purpose-built viewing application so that the user can then place objects recognized by the semantic map at preferred locations inside that map. Utilizing any AR/mobile device that has access to the semantic map, a navigational path may therefore be created from the AR/mobile device's current location to the desired location for each object for placement by the user at the designated location. This might be particularly useful when a user wants to place objects inside an unfamiliar space but needs help navigating, and as such might be used by moving companies, cleaners, etc.
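The disclosure does not specify a route-planning algorithm. As one possible illustration, the sketch below runs a breadth-first search over a 2D occupancy grid, which is an assumed simplification of the 3D map (0 marks a free cell, 1 a blocked cell). The function name `shortest_path` and the grid representation are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def shortest_path(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Breadth-first search on a grid; returns the cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            step: Optional[Cell] = cell
            while step is not None:
                path.append(step)
                step = came_from[step]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None
```

Here `start` would correspond to the AR/mobile device's current location and `goal` to the designated placement location, both projected onto the grid.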

What's more, note that hardware that may be used to implement present principles in some example instances might include only fisheye cameras and IMUs that already reside in the mobile phone and/or head-mounted display, reducing hardware infrastructure and processing constraints that might otherwise burden implementation of present principles.

Thus, present principles may be used to locate objects easily and intuitively at their last known locations, and/or to place objects correctly in a space unfamiliar to the user through semantic mapping.

Accordingly, in non-limiting examples, a semantic map may be generated using SLAM (and/or other mapping technologies for 3D electronic space mapping), where the SLAM map is enhanced with object recognition capability (e.g., object name/type tags) in the map to render a semantic map. Each recognized object in the semantic map may have known coordinates within the map, and the map may be searchable by object to locate the object within the map.
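A minimal sketch of such a searchable map follows, assuming each object is stored with coordinates and an optional label. The `MapObject` and `SemanticMap` classes and their methods are hypothetical and only illustrate lookup by label, listing of unrecognized objects, and adding a user-supplied label.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MapObject:
    """One object in the map; label is None if recognition did not identify it."""
    object_id: int
    position: Tuple[float, float, float]
    label: Optional[str] = None


@dataclass
class SemanticMap:
    objects: List[MapObject] = field(default_factory=list)

    def find(self, label: str) -> List[MapObject]:
        """Return all objects whose label matches, ignoring case and outer spaces."""
        wanted = label.strip().lower()
        return [o for o in self.objects
                if o.label is not None and o.label.lower() == wanted]

    def unlabeled(self) -> List[MapObject]:
        """Return objects that still need a label from the user."""
        return [o for o in self.objects if o.label is None]

    def set_label(self, object_id: int, label: str) -> bool:
        """Attach a user-provided label; returns False if the id is unknown."""
        for o in self.objects:
            if o.object_id == object_id:
                o.label = label
                return True
        return False
```

For example, `SemanticMap.find("car keys")` would return the stored entries, and therefore the coordinates, of any object tagged with that name.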

Prior to delving further into the details of the instant techniques, note with respect to any computer systems discussed herein that a system may include server and client components, connected over a network such that data may be exchanged between the client and server components. The client components may include one or more computing devices including televisions (e.g., smart TVs, Internet-enabled TVs), computers such as desktops, laptops and tablet computers, so-called convertible devices (e.g., having a tablet configuration and laptop configuration), and other mobile devices including smart phones. These client devices may employ, as non-limiting examples, operating systems from Apple Inc. of Cupertino, CA, Google Inc. of Mountain View, CA, or Microsoft Corp. of Redmond, WA. A Unix® or similar such as Linux® operating system may be used, as may a Chrome or Android or Windows or macOS operating system. These operating systems can execute one or more browsers such as a browser made by Microsoft or Google or Mozilla or another browser program that can access web pages and applications hosted by Internet servers over a network such as the Internet, a local intranet, or a virtual private network.

As used herein, instructions refer to computer-implemented steps for processing information in the system. Instructions can be implemented in software, firmware or hardware, or combinations thereof and include any type of programmed step undertaken by components of the system; hence, illustrative components, blocks, modules, circuits, and steps are sometimes set forth in terms of their functionality.

A processor may be any single- or multi-chip processor that can execute logic by means of various lines such as address lines, data lines, and control lines and registers and shift registers. Moreover, any logical blocks, modules, and circuits described herein can be implemented or performed with a system processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic device such as an application specific integrated circuit (ASIC), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor can also be implemented by a controller or state machine or a combination of computing devices. Thus, the methods herein may be implemented as software instructions executed by a processor, suitably configured application specific integrated circuits (ASIC) or field programmable gate array (FPGA) modules, or any other convenient manner as would be appreciated by those skilled in the art. Where employed, the software instructions may also be embodied in a non-transitory device that is being vended and/or provided, and that is not a transitory, propagating signal and/or a signal per se. For instance, the non-transitory device may be or include a hard disk drive, solid state drive, or CD ROM. Flash drives may also be used for storing the instructions. Additionally, the software code instructions may also be downloaded over the Internet (e.g., as part of an application (“app”) or software file). Accordingly, it is to be understood that although a software application for undertaking present principles may be vended with a device such as the system 100 described below, such an application may also be downloaded from a server to a device over a network such as the Internet. An application can also run on a server and associated presentations may be displayed through a browser (and/or through a dedicated companion app) on a client device in communication with the server.

Software modules and/or applications described by way of flow charts and/or user interfaces herein can include various sub-routines, procedures, etc. Without limiting the disclosure, logic stated to be executed by a particular module can be redistributed to other software modules and/or combined together in a single module and/or made available in a shareable library. Also, the user interfaces (UI)/graphical UIs described herein may be consolidated and/or expanded, and UI elements may be mixed and matched between UIs.

Logic, when implemented in software, can be written in an appropriate language such as but not limited to hypertext markup language (HTML)-5, Java®/JavaScript, C# or C++, and can be stored on or transmitted from a computer-readable storage medium such as a hard disk drive (HDD) or solid state drive (SSD), a random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), a hard disk drive or solid state drive, compact disk read-only memory (CD-ROM) or other optical disk storage such as digital versatile disc (DVD), magnetic disk storage or other magnetic storage devices including removable thumb drives, etc.

In an example, a processor can access information over its input lines from data storage, such as the computer readable storage medium, and/or the processor can access information wirelessly from an Internet server by activating a wireless transceiver to send and receive data. Data typically is converted from analog signals to digital by circuitry between the antenna and the registers of the processor when being received and from digital to analog when being transmitted. The processor then processes the data through its shift registers to output calculated data on output lines, for presentation of the calculated data on the device.

Components included in one embodiment can be used in other embodiments in any appropriate combination. For example, any of the various components described herein and/or depicted in the Figures may be combined, interchanged or excluded from other embodiments.

“A system having at least one of A, B, and C” (likewise “a system having at least one of A, B, or C” and “a system having at least one of A, B, C”) includes systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.

The term “circuit” or “circuitry” may be used in the summary, description, and/or claims. As is well known in the art, the term “circuitry” includes all levels of available integration, e.g., from discrete logic circuits to the highest level of circuit integration such as VLSI, and includes programmable logic components programmed to perform the functions of an embodiment as well as processors (e.g., special-purpose processors) programmed with instructions to perform those functions.

Now specifically in reference to FIG. 1, an example block diagram of an information handling system and/or computer system 100 is shown that is understood to have a housing for the components described below. Note that in some embodiments the system 100 may be a desktop computer system, such as one of the ThinkCentre® or ThinkPad® series of personal computers sold by Lenovo (US) Inc. of Morrisville, NC, or a workstation computer, such as the ThinkStation®, which are sold by Lenovo (US) Inc. of Morrisville, NC; however, as apparent from the description herein, a client device, a server or other machine in accordance with present principles may include other features or only some of the features of the system 100. Also, the system 100 may be, e.g., a game console such as XBOX®, and/or the system 100 may include a mobile communication device such as a mobile telephone, notebook computer, and/or other portable computerized device.

As shown in FIG. 1, the system 100 may include a so-called chipset 110. A chipset refers to a group of integrated circuits, or chips, that are designed to work together. Chipsets are usually marketed as a single product (e.g., consider chipsets marketed under the brands INTEL®, AMD®, etc.).

In the example of FIG. 1, the chipset 110 has a particular architecture, which may vary to some extent depending on brand or manufacturer. The architecture of the chipset 110 includes a core and memory control group 120 and an I/O controller hub 150 that exchange information (e.g., data, signals, commands, etc.) via, for example, a direct management interface or direct media interface (DMI) 142 or a link controller 144. In the example of FIG. 1, the DMI 142 is a chip-to-chip interface (sometimes referred to as being a link between a “northbridge” and a “southbridge”).

The core and memory control group 120 includes a processor assembly 122 (e.g., one or more single core or multi-core processors, etc.) and a memory controller hub 126 that exchange information via a front side bus (FSB) 124. A processor assembly such as the assembly 122 may therefore include one or more processors acting independently or in concert with each other to execute an algorithm, whether those processors are in one device or more than one device. Additionally, as described herein, various components of the core and memory control group 120 may be integrated onto a single processor die, for example, to make a chip that supplants the “northbridge” style architecture.

The memory controller hub 126 interfaces with memory 140. For example, the memory controller hub 126 may provide support for DDR SDRAM memory (e.g., DDR, DDR2, DDR3, etc.). In general, the memory 140 is a type of random-access memory (RAM). It is often referred to as “system memory.”

The memory controller hub 126 can further include a low-voltage differential signaling interface (LVDS) 132. The LVDS 132 may be a so-called LVDS Display Interface (LDI) for support of a display device 192 (e.g., a CRT, a flat panel, a projector, a touch-enabled light emitting diode (LED) display or other video display, etc.). A block 138 includes some examples of technologies that may be supported via the LVDS interface 132 (e.g., serial digital video, HDMI/DVI, display port). The memory controller hub 126 also includes one or more PCI-express interfaces (PCI-E) 134, for example, for support of discrete graphics 136. Discrete graphics using a PCI-E interface has become an alternative approach to an accelerated graphics port (AGP). For example, the memory controller hub 126 may include a 16-lane (x16) PCI-E port for an external PCI-E-based graphics card (including, e.g., one or more GPUs). An example system may include AGP or PCI-E for support of graphics.

In examples in which it is used, the I/O hub controller 150 can include a variety of interfaces. The example of FIG. 1 includes a SATA interface 151, one or more PCI-E interfaces 152 (optionally one or more legacy PCI interfaces), one or more universal serial bus (USB) interfaces 153, a local area network (LAN) interface 154 (more generally a network interface for communication over at least one network such as the Internet, a WAN, a LAN, a Bluetooth network using Bluetooth 5.0 communication, etc. under direction of the processor(s) 122), a general purpose I/O interface (GPIO) 155, a low-pin count (LPC) interface 170, a power management interface 161, a clock generator interface 162, an audio interface 163 (e.g., for speakers 194 to output audio), a total cost of operation (TCO) interface 164, a system management bus interface (e.g., a multi-master serial computer bus interface) 165, and a serial peripheral flash memory/controller interface (SPI Flash) 166, which, in the example of FIG. 1, includes basic input/output system (BIOS) 168 and boot code 190. With respect to network connections, the I/O hub controller 150 may include integrated gigabit Ethernet controller lines multiplexed with a PCI-E interface port. Other network features may operate independent of a PCI-E interface. Example network connections include Wi-Fi as well as wide-area networks (WANs) such as 4G and 5G cellular networks.

The interfaces of the I/O hub controller 150 may provide for communication with various devices, networks, etc. For example, where used, the SATA interface 151 and/or PCI-E interface 152 provide for reading, writing or reading and writing information on one or more drives 180 such as HDDs, SSDs or a combination thereof, but in any case the drives 180 are understood to be, e.g., tangible computer readable storage mediums that are not transitory, propagating signals. The I/O hub controller 150 may also include an advanced host controller interface (AHCI) to support one or more drives 180. The PCI-E interface 152 allows for wireless connections 182 to devices, networks, etc. The USB interface 153 provides for input devices 184 such as keyboards (KB), mice and various other devices (e.g., cameras, phones, storage, media players, etc.).

In the example of FIG. 1, the LPC interface 170 provides for use of one or more ASICs 171, a trusted platform module (TPM) 172, a super I/O 173, a firmware hub 174, BIOS support 175 as well as various types of memory 176 such as ROM 177, Flash 178, and non-volatile RAM (NVRAM) 179. With respect to the TPM 172, this module may be in the form of a chip that can be used to authenticate software and hardware devices. For example, a TPM may be capable of performing platform authentication and may be used to verify that a system seeking access is the expected system.

The system 100, upon power on, may be configured to execute boot code 190 for the BIOS 168, as stored within the SPI Flash 166, and thereafter processes data under the control of one or more operating systems and application software (e.g., stored in system memory 140). An operating system may be stored in any of a variety of locations and accessed, for example, according to instructions of the BIOS 168.

As also shown in FIG. 1, the system 100 may include a camera 191. The camera 191 may gather one or more images and provide the images and related input (e.g., metadata like an image timestamp) to the processor assembly 122. The camera may be a thermal imaging camera, an infrared (IR) camera, a digital camera such as a webcam, a three-dimensional (3D) camera, and/or a camera otherwise integrated into the system 100 and controllable by the processor assembly 122 to gather still images and/or video.

Additionally, though not shown for simplicity, in some embodiments the system 100 may include a gyroscope that senses and/or measures the orientation of the system 100 and provides related input to the processor assembly 122, an accelerometer that senses acceleration and/or movement of the system 100 and provides related input to the processor assembly 122, and/or a magnetometer that senses and/or measures directional movement of the system 100 and provides related input to the processor assembly 122. These three components may form part of an inertial measurement unit (IMU) in certain examples, where the IMU may be used in conjunction with one or more cameras (like the camera 191) to generate a three-dimensional (3D) point cloud and/or map of an area using simultaneous localization and mapping (SLAM) and/or other techniques consistent with present principles. Thus, coordinates for different objects and other 3D real-world features may be stored as part of the map. Object recognition may then be executed using the images to identify the names and/or object types for various objects recognized from the area via the camera input. The names and/or types may then be used as labels to label various objects shown in the point cloud/SLAM map to thus render a semantic map that indicates both 3D visual appearances and locations for the objects as well as tags corresponding to the labels identifying the objects.
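The labeling step can be sketched as follows, under several assumptions that are not in the disclosure: the point cloud has already been segmented into per-object clusters, object recognition has produced a name for some cluster ids, and the centroid of a cluster is used as the object's representative coordinate. The names `centroid` and `tag_clusters` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def centroid(points: List[Point]) -> Point:
    """Mean of a non-empty list of 3D points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def tag_clusters(clusters: Dict[int, List[Point]],
                 recognized: Dict[int, str]) -> List[dict]:
    """Attach a label (or None) and a representative coordinate to each cluster."""
    entries = []
    for cluster_id, points in sorted(clusters.items()):
        entries.append({
            "id": cluster_id,
            "position": centroid(points),
            "label": recognized.get(cluster_id),  # None when not recognized
        })
    return entries
```

Entries whose label is `None` correspond to objects that object recognition could not identify and that a user might later be asked to label.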

Still further, the system 100 may include an audio receiver/microphone that provides input from the microphone to the processor assembly 122 based on audio that is detected, such as via a user providing audible input to the microphone. Also, the system 100 may include a global positioning system (GPS) transceiver that is configured to communicate with satellites to receive/identify geographic position information and provide the geographic position information to the processor assembly 122. However, it is to be understood that another suitable position receiver other than a GPS receiver may be used in accordance with present principles to determine the location of the system 100.

It is to be understood that an example client device or other machine/computer may include fewer or more features than shown on the system 100 of FIG. 1. In any case, it is to be understood at least based on the foregoing that the system 100 is configured to undertake present principles.

Turning now to FIG. 2, an example real-world area 200 is shown, which in this case is a living room of a personal residence. Shown in the area 200 are a television 202 with a built-in camera 204, a couch 206, a user 208 holding a smartphone 210 with camera 212, and a gift bag 214. Also shown is a coffee table 216, a stand-alone digital assistant device 218 sitting on top of the table 216 (e.g., an Amazon Alexa device, Google Assistant, or a Lenovo Assistant device), and a set of car keys 220. Note that the device 218 may have its own camera 222.

Also note that the TV 202, smartphone 210, assistant device 218, and any other electronic smart devices in the area 200 may communicate over a network such as a Wi-Fi network, the Internet, a Bluetooth network, an ultra-wideband network, etc. in accordance with present principles. It is to also be understood that each of these devices may include at least some of the features, components, and/or elements of the system 100 described above. Indeed, any of the devices disclosed herein may include at least some of the features, components, and/or elements of the system 100 described above. Also note that these devices may communicate with an Internet-based cloud storage server accessible to the devices within the area 200 so that the devices in the area 200 can access a semantic map and/or other data described below as stored at the server, depending on implementation. The semantic map and/or other data may additionally or alternatively be stored at one of the devices 202, 210, 218, or other local device themselves.

Turning now to FIG. 3, an example illustration of a semantic map 300 is shown. As described above in reference to FIG. 1, the semantic map may be generated by first creating a SLAM map. Creating a SLAM map may involve using cameras imaging the area 200 from different angles to thus generate 3D point/feature data from the area 200. Object recognition-generated labels 302 may then be added to the SLAM map to render the 3D semantic map 300. The labels may therefore be stored as metadata for the 3D semantic map 300 and/or be included as part of the map 300 itself. In either case, the labels may be rendered over the associated object itself as the person views the semantic map on a display of a device such as a laptop, smartphone, or even augmented or virtual reality headset.

As also shown in FIG. 3, the semantic map 300 is presented as part of a graphical user interface (GUI) 310 presented on a display. In addition to including the map 300, the GUI 310 includes a list 312 of one or more objects that the device/system was unable to recognize during generation of the semantic map 300. The list 312 is accompanied by a prompt 314 for an end-user to select one of the items from the list 312 to label the associated object with a user-generated label. FIG. 3 therefore shows that objects generally designated as objects “A” and “B” are included in the list 312, and graphical indicators 318, 320 for each respective object are also overlaid on the semantic map 300 to show the user both a visual image of the object and its relative location within the map 300 (and hence area 200 itself). Note that the indicators 318, 320 include not just corollary “A” and “B” designations for the respective objects but also respective arrows pointing to the respective objects in the map 300 to further highlight them to the user.

The user may thus pick and choose some or all of the objects from the list for which to provide/generate labels. Note that the user might only choose to label objects that the user considers important in certain non-limiting embodiments, since labeling each and every computer-unknown object from a given area might be tedious and not altogether necessary.

In any case, responsive to touch or voice input to select one of the items 316 from the list 312 or map 300 itself as presented on the device display, a text input field 330 may be rendered and/or selected for a user to then enter the user's desired label for the selected object. The user may then use voice input, a hard or soft keyboard, or other input means to enter the user-designated label into the input field 330. The user may then select the save selector 332 to save the entered label and apply the label to the object in the map 300 itself so that the map/map metadata indicates the user-designated label for the associated object. In the present instance, the user has labeled object “A” as “car keys”.
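The labeling flow described above can be illustrated with a minimal Python sketch. The names `MapObject`, `SemanticMap`, `unlabeled`, and `apply_label` are hypothetical and do not come from the disclosure. The sketch assumes each mapped object carries 3D coordinates plus an optional label stored as metadata, with a missing label standing in for an object that object recognition could not identify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class MapObject:
    """One object in the map: a 3D position plus an optional label."""
    object_id: str
    position: Point3D
    label: Optional[str] = None  # None: object recognition found no label


@dataclass
class SemanticMap:
    """Hypothetical container pairing mapped objects with their labels."""
    objects: Dict[str, MapObject] = field(default_factory=dict)

    def add(self, obj: MapObject) -> None:
        self.objects[obj.object_id] = obj

    def unlabeled(self) -> List[MapObject]:
        """Objects that would appear in the list of unrecognized items."""
        return [o for o in self.objects.values() if o.label is None]

    def apply_label(self, object_id: str, label: str) -> None:
        """Save a user-designated label as metadata for one object."""
        cleaned = label.strip()
        if not cleaned:
            raise ValueError("label must not be empty")
        self.objects[object_id].label = cleaned


semantic_map = SemanticMap()
semantic_map.add(MapObject("tv", (0.0, 2.0, 1.0), label="television"))
semantic_map.add(MapObject("A", (1.5, 1.0, 0.5)))
semantic_map.add(MapObject("B", (3.0, 0.5, 0.0)))

pending_ids = [o.object_id for o in semantic_map.unlabeled()]
semantic_map.apply_label("A", "car keys")  # the user labels object "A"
remaining_ids = [o.object_id for o in semantic_map.unlabeled()]
```

Keeping the label as an optional field on each object mirrors the option of storing labels as map metadata; a real system might instead keep labels in a separate store keyed by object identifier.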

Now suppose the system does not currently have enough 3D data on the car keys in the semantic map 300 for the keys to be rendered at different angles in the map 300 and/or identified from different angles and locations. FIG. 4 shows that a prompt 400 may be presented on the display of the user's device. As shown, the prompt 400 may indicate that more images of “car keys” are being requested by the system. The prompt 400 may further instruct the user to press the start selector 410 and then show the keys to the user's smartphone camera (or another device camera) from different angles. Thus, the user might select the start selector 410 and then hold the keys up to his/her smartphone camera and then rotate the keys around 360 degrees in both the horizontal and vertical planes within the camera's field of view for the system/smartphone to generate a 3D point cloud of the keys for recognition of the keys in the future (e.g., inclusion of those 3D points/features into the map 300 itself).

FIG. 5 thus shows another GUI 500 that may be presented during this process of generating digital 3D points for the key for inclusion in the semantic map 300 (e.g., responsive to selection of the start selector 410). As shown, a viewfinder 510 of the current live feed from the smartphone camera is shown, with the live feed showing the keys 220. Responsive to the system/smartphone having enough data points on the keys based on the user's rotation of the keys in front of the smartphone camera as discussed above, the GUI 500 may change to include a green check mark 520 indicating that enough 3D points have been identified to successfully identify the keys 220 at a later time using IoT device cameras (e.g., regardless of viewing angle to the keys and orientation of the keys themselves). An indication 530 may also be presented indicating that the 3D point cloud data for the keys is being saved (e.g., as part of the 3D semantic map).

Now suppose that while the map 300 itself is initially generated, or upon updating of the map, a certain portion of the area 200 represented in the map 300 does not have enough camera coverage (e.g., none of the cameras described above in reference to FIG. 2 currently show that portion of the area 200 in their field of view (FOV)). This can lead to a hole in the semantic map 300 where no or an insufficient amount of 3D data points exist. FIG. 6 therefore shows a GUI 600 that may be presented in this situation.

As shown in FIG. 6, the GUI 600 may include a prompt 610 that not enough image/camera coverage exists of the northwest corner of the living room area 200. The prompt 610 may also instruct the end-user to move the Lenovo Assistant device 218 or another device with a camera over toward the northwest corner to image that portion of the area 200. Once the user has done so, the user may select the selector 620 to command the device/system to generate images of the northwest corner and use those images to generate 3D feature data of the corner for inclusion in the SLAM map/semantic map itself.
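One way such a hole might be detected is sketched below in Python. This is an illustrative assumption rather than the disclosed method: the floor of the area is divided into square grid cells, the 3D points falling over each cell are counted, and any cell with fewer points than a threshold is reported. The function name `find_coverage_holes` and all parameters are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point3D = Tuple[float, float, float]
Cell = Tuple[int, int]  # (column, row) index of a floor-grid cell


def find_coverage_holes(
    points: Iterable[Point3D],
    width: float,
    depth: float,
    cell_size: float,
    min_points: int,
) -> List[Cell]:
    """Return floor-grid cells that hold fewer than min_points 3D points."""
    cols = math.ceil(width / cell_size)
    rows = math.ceil(depth / cell_size)
    counts: Counter = Counter()
    for x, y, _z in points:
        # Ignore points that fall outside the mapped floor area.
        if 0 <= x < width and 0 <= y < depth:
            counts[(int(x // cell_size), int(y // cell_size))] += 1
    return [
        (col, row)
        for row in range(rows)
        for col in range(cols)
        if counts[(col, row)] < min_points
    ]


# A 4 m x 4 m room split into four 2 m cells; only one point covers cell (0, 1).
cloud = [
    (0.5, 0.5, 0.0), (1.0, 1.5, 0.2), (1.9, 0.1, 1.0),
    (2.5, 0.5, 0.0), (3.0, 1.0, 0.0), (3.9, 1.9, 0.0),
    (2.2, 2.2, 0.0), (3.5, 3.5, 0.0),
    (0.5, 3.0, 0.0),
]
holes = find_coverage_holes(cloud, width=4.0, depth=4.0, cell_size=2.0, min_points=2)
```

A reported cell could then be translated into a human-readable region (such as a named corner of the room) for the prompt shown to the user.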

Now suppose that, at a later time, the user is going to leave his/her house but cannot find the keys 220. The user may open an application (“app”) such as a semantic map app, IoT device map, home user experience (UX) app, etc. to help him/her locate the keys 220 using the map 300 itself. Thus, responsive to the app being opened or based on navigation of other screens within the app, the GUI 700 of FIG. 7 may be presented. As shown, the GUI 700 may include a text input field 710 into which the user may enter the name (label) of the object he/she is seeking to locate, either through voice input or text input or other type of input. In the present example, the user has entered “car keys”.

FIG. 7 also shows that a real-time version 720 of the semantic map 300 may be presented as part of the GUI 700. The real-time version 720 may be an updated semantic map derived from the map 300 but with certain object representations moved within the updated semantic map 720 to locations within the map 720 corresponding to the current real-world locations of the corresponding real-world objects themselves as may have been identified using IoT device cameras (e.g., using live feeds from one or more of the cameras 204, 212, and/or 222). Note that the semantic map 720 may be presented on the display at an angle/FOV matching the current angle/FOV of the user themselves according to the user's current position and viewing angle into the real-world space itself (which may be determined using ultrawideband location tracking, IMU input and dead reckoning, GPS, etc.). Thus, based on the user entering the desired object into the field 710 to locate it, which might be the car keys or even another object for which the user did not specify a user-specific label themselves, the GUI 700 may be updated to include a graphical indication 730 in the form of a star and arrow pointing toward the current location of the keys 220 according to the user's current perspective so that the user may easily locate the keys in the real-world. In the present instance, the keys 220 have fallen off the table 216 and are sitting underneath it as represented in the updated semantic map 720 itself.

As another example means to help locate the keys 220, in addition to or in lieu of the GUI 700, the GUI 800 of FIG. 8 may be presented in the app. Here the user may navigate to the GUI 800 using any of the same methods described above in reference to navigation to the GUI 700 but to sort through individual images of objects as derived from the updated semantic map 720 (or map 300) and/or gathered from IoT camera input. The user may thus look, via the GUI 800, for untrained/unrecognized objects, user-labeled objects, and/or system-recognized objects to ascertain their last known locations within the area 200. Here, the GUI 800 presents the individual object images in the form of thumbnails 810 that are accompanied by respective text identifiers indicating the respective label for each object (if available). Depending on desired implementation, the user may then locate the corresponding real-world object itself by appreciating its current location from the thumbnail, by selecting the thumbnail to present a larger version of the thumbnail to appreciate the object's current location from the larger version, and/or by selecting the thumbnail to command the device/app to present an updated semantic map and indication of the object (like the map 720 and graphical indication 730 of FIG. 7). Additionally or alternatively, the user may provide voice input of a description of the object, and voice recognition may then be executed to further narrow down the user's search for the desired object (e.g., if multiple candidates exist, like multiple sets of car keys) and possibly present the map 720 and indication 730 once the intended object has been located.

Present principles may also be used for real-world object placement via map route planning. For example, suppose a cleaner or mover has been instructed by the premises' owner to move the couch 206 within the area 200 or to place a new couch within the area 200. The owner may manipulate the map 300 with his/her smart device to move object representations about and thus render an updated map where the logical position of the couch in the updated map does not yet correspond to the actual (current) real-world position of the couch but rather a desired future location of the couch. The owner may then send or otherwise grant access to the updated map to the cleaner or mover so that the map can be presented on the cleaner/mover device's display along with navigational assistance for placing the couch at the owner's desired location within the area 200.

FIG. 9 therefore shows a GUI 900 with this updated semantic map 910. The map 910 shows a 3D location 920 designated by the owner for where the couch should be placed, and graphical navigational assistance 930 is also provided in the form of arrows directing the user to that spot. Similar but audible navigational assistance may also be provided, if desired. The assistance 930 may thus indicate how to enter the area (e.g., a direction from which to enter, such as through a front door) as well as indicate a path to where the couch should be placed and the real-world location itself at which to place the couch.

Thus, present principles make route planning inside a mappable space possible for recognized/labeled objects. The map created for that space can be visually presented to the user through the (e.g., purpose-built) viewing application, and users can then place objects recognized by the semantic map at preferred locations as indicated inside that map. This may be done utilizing any augmented reality device, mobile device, or other device that has access to the semantic map so that a navigational path may be created from the current location of the accessing device to the desired location of each object for placement/positioning by the user. This may be particularly useful when a user wants to place objects inside an unfamiliar space but needs help navigating the space, and it can be particularly useful for moving companies, cleaners, etc.
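As a rough illustration of how a navigational path from the accessing device to a desired placement location might be computed, the following Python sketch runs a breadth-first search over a 2D occupancy grid. The grid representation, the `plan_route` name, and the use of breadth-first search are all assumptions made for this example; the disclosure does not specify a path-planning algorithm.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column) in the floor grid


def plan_route(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path over free cells ('.'); '#' is an obstacle.

    Assumes the start cell is free. Returns None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor links back to the start, then reverse.
            path: List[Cell] = []
            step: Optional[Cell] = cell
            while step is not None:
                path.append(step)
                step = came_from[step]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "." and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical floor plan: the two '#' cells stand in for a coffee table.
floor_plan = [
    "....",
    ".##.",
    "....",
]
route = plan_route(floor_plan, start=(0, 0), goal=(2, 3))

# A wall of obstacles with no gap makes the goal unreachable.
blocked = plan_route([".#.", ".#.", ".#."], start=(0, 0), goal=(0, 2))
```

The resulting list of cells could be rendered as arrows over the semantic map or read out as audible directions.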

Now referring to FIG. 10, it shows example logic that may be executed by a device such as the system 100 and/or processor assembly 122 consistent with present principles. The logic of FIG. 10 may therefore be executed by one or multiple devices (e.g., client device and remotely-located server) in any appropriate combination. Note that while the logic of FIG. 10 is shown in flow chart format, other suitable logic may also be used.

Beginning at block 1000, the device(s) may create a semantic map and, during creation of the semantic map, identify one or more real-world devices in the mapped area that each include at least one camera (e.g., using communication with those devices to identify their specifications as indicating camera inclusion, using object recognition to identify the cameras themselves, etc.). Also at block 1000, the device may store the semantic map and other data (e.g., data indicating the one or more real-world devices that each include a camera, data indicating computer-derived labels for respective objects shown in the map as determined using object recognition, etc.). From block 1000 the logic may then proceed to block 1010.

At block 1010 the device may prompt an end-user for labels for any objects that the device was unable to identify via object recognition to then, at block 1020, receive user input of those labels and save those labels. This process may operate as already described above in reference to FIGS. 3-5. From block 1020 the logic may then proceed to block 1030.

At block 1030 the device may access the semantic map again at a later time and proceed to decision diamond 1040 where the device may determine whether one or more triggers exist to update the semantic map. The trigger(s) may include a recurring/threshold period of time ending, receipt of a user command to update the semantic map, and/or a determination using object recognition that part of a real-world space represented in the semantic map has changed (e.g., as might occur if a user is already using the semantic map for navigational assistance as described above). A negative determination may cause the logic to continue making the determination at diamond 1040 until an affirmative determination is made, or the logic might revert back to a previous block to proceed again therefrom, depending on implementation.
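The trigger check described above can be sketched in Python as follows. `TriggerState` and `should_update` are hypothetical names, and the sketch assumes the three triggers are tracked as a last-update timestamp, an optional recurring period, and two boolean flags.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TriggerState:
    """Hypothetical inputs to the update decision."""
    last_update: datetime
    period: Optional[timedelta] = None  # recurring period, None if disabled
    user_requested: bool = False        # user commanded an update
    change_detected: bool = False       # object recognition saw a change


def should_update(state: TriggerState, now: datetime) -> bool:
    """Return True if any enabled trigger calls for a semantic map update."""
    period_elapsed = (
        state.period is not None
        and now - state.last_update >= state.period
    )
    return period_elapsed or state.user_requested or state.change_detected


start = datetime(2026, 1, 1, 8, 0)
every_two_hours = TriggerState(last_update=start, period=timedelta(hours=2))

too_early = should_update(every_two_hours, start + timedelta(hours=1))
period_due = should_update(every_two_hours, start + timedelta(hours=2))
on_command = should_update(TriggerState(last_update=start, user_requested=True), start)
```

A caller might evaluate `should_update` in a polling loop, which corresponds to repeating the determination until it becomes affirmative.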

Then once an affirmative determination is made at diamond 1040, the logic may proceed to block 1050. Responsive to the affirmative determination at diamond 1040, at block 1050 the device may command one or more Internet of things (IoT) devices already identified as having cameras at block 1000 to generate and provide updated images of the area indicated in the semantic map itself. The IoT devices may include, as non-limiting examples, a television, a smartphone, a tablet computer, a laptop computer, a headset, a stand-alone camera, a digital assistant device, a cooking appliance, an electronic door lock, and/or an electronic doorbell.

At block 1060 the device may thus receive the input (e.g., images) from one or more of the IoT device cameras to, at block 1070, update the semantic map based on the input so that the updated semantic map indicates the current real-time locations of the real-world objects within the area. Additionally, the updated semantic map may remove 3D data for objects that are determined to no longer be present in the area, and may include 3D data for additional objects that are currently located in the area but were not there when the initial semantic map was created at block 1000. Also note that the camera(s) used to update the semantic map may be the same as or different from the camera(s) used to create the initial semantic map at block 1000.
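The update step can be sketched in Python under two simplifying assumptions that are not stated in the disclosure: each object is identified by a unique label, and the latest camera input covers the whole area, so that an object missing from the detections is treated as no longer present. The function name `apply_detections` is hypothetical.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]


def apply_detections(
    current: Dict[str, Point3D],
    detections: Dict[str, Point3D],
) -> Tuple[Dict[str, Point3D], Dict[str, List[str]]]:
    """Return the updated label-to-position map and a summary of changes.

    Assumes `detections` reflects full coverage of the area, so objects
    absent from it are removed from the updated map.
    """
    moved = sorted(
        label for label in current
        if label in detections and current[label] != detections[label]
    )
    removed = sorted(label for label in current if label not in detections)
    added = sorted(label for label in detections if label not in current)
    updated = dict(detections)  # copy so the caller's input is not aliased
    return updated, {"moved": moved, "removed": removed, "added": added}


initial_map = {
    "couch": (1.0, 0.0, 0.0),
    "car keys": (2.0, 1.0, 0.5),   # on the coffee table
    "gift bag": (3.0, 3.0, 0.0),
}
latest = {
    "couch": (1.0, 0.0, 0.0),
    "car keys": (2.0, 1.0, 0.0),   # now on the floor under the table
    "blue cup": (0.5, 2.0, 0.5),
}
updated_map, changes = apply_detections(initial_map, latest)
```

With partial camera coverage, a more careful implementation would only remove objects whose last known location falls inside a region that the cameras actually imaged.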

From block 1070 the logic may then proceed to block 1080, though block 1080 may additionally or alternatively be executed as part of block 1000 or immediately thereafter. In any case, at block 1080 the device may execute an unidentified object labeling process as set forth above with respect to FIGS. 3-5. Thus, as an example, at block 1080 the device may present a user interface indicating an object in the initial or updated semantic map that has not been identified via object recognition, receive user input indicating a label for the object, and update the semantic map with the label. The user interface might include a GUI as described above and/or an audible user interface (e.g., audible prompts provided to the user via a speaker on the user's device, to which the user may also respond audibly as detected via a microphone on the user's device).

Also at block 1080, the device may present a prompt for a user to use a camera to capture images of the unidentified object from different angles, receive the images of the unidentified object in response, generate three-dimensional (3D) data for the unidentified object based on those images, and update the semantic map with that 3D data as also already described above.

Continuing the detailed description in reference to FIG. 11, it shows an example settings GUI 1100 that may be presented on the display of a device configured to undertake present principles (e.g., a device executing a semantic mapping app consistent with present principles). The GUI 1100 may be presented to set or enable one or more settings of the device/app, and may be navigated to through a device or app menu for example. Also note that each of the example options discussed below may be selected by directing touch or cursor input to the associated check box adjacent to the respective option.

As shown in FIG. 11, the GUI 1100 includes an option 1102 that may be selected to set or configure the device/app to undertake present principles. Thus, option 1102 may be selected a single time to set or enable the device/app to, in multiple future instances, update a semantic map as described above (and/or to perform other functions described above with respect to the figures above).

If desired, the GUI 1100 may also include a setting 1104 at which the user may select a preferred labeling process. The user may thus select option 1106 to label unidentified objects via a GUI, and option 1108 to label unidentified objects via audible interaction with the system as discussed above.

Also if desired, the GUI 1100 may include a setting 1110 for the end-user to select one or more specific devices to use for labeling objects that have not been identified via object recognition. Per the example shown, an option 1112 may be selected for the user to select his/her smartphone as the labeling device, and option 1114 may be selected for the user to select his/her AR headset/glasses to use as the labeling device.

Additionally, in some example embodiments the GUI 1100 may include a setting 1116 at which the user may select one or more devices with cameras to authorize as devices to use for semantic map updates. Per the example shown, an option 1118 may be selected to use a stand-alone digital assistant device (e.g., a Lenovo Assistant), option 1120 may be selected to use a television with its own camera, and an option 1122 may be selected to use the user's own smartphone.

Still further, the GUI 1100 may include a setting 1124 at which the user may select one or more triggers to use for semantic map updates. Accordingly, the GUI 1100 may include a first option 1126 that may be selected for semantic map updates, where the first option sets the device/app to update a semantic map at a recurring period of time (e.g., responsive to the recurring period of time ending). The recurring period of time itself may be specified by entering numerical input to input box 1128, and in the present instance has been set at two hours such that a semantic map is updated every two hours.

As also shown in FIG. 11, a second option 1130 may be included for the setting 1124. The second option 1130 may be selected to update a semantic map responsive to a determination using object recognition that part of a real-world space represented in the semantic map has changed. For instance, if the user or another person were using the semantic map and a smartphone to help place a couch at a particular location within the space as described above, and the smartphone determined based on the map that other objects are not currently at locations indicated in the map since their current locations as determined from camera input do not match map locations, the smartphone may trigger a semantic map update.

Additionally, if desired the setting 1124 may include a selector 1132 that may be selectable to update a semantic map immediately and responsive to the user's command via selection of the option 1132. Thus, the user is provided a way to update the semantic map at will at a time of his/her choosing.

Moving on from FIG. 11, it may therefore be appreciated that, among other things, principles set forth above may be used for tracking and/or placement of objects. A user might therefore say “Tell me where my blue cup is” and the device may audibly navigate the user to the blue cup using a current version of a semantic map with the real-time locations of various objects within the corresponding space. Present principles may also be used to place furniture somewhere specific or show a cleaner a specific item to clean. A mover might also use the semantic map to guide them to where to set certain objects back at locations desired by the space's owner or renter (e.g., by presenting a previous, saved version of the semantic map) so that the mover can set things back a certain way according to a previous configuration of the room. A user can thus change the logical position of an object in the map, but not its real-world position, to then instruct the mover to actually move the object to the desired location in the real world.

Augmented reality (AR) and virtual reality (VR) implementations are also envisioned. As such, the GUIs and other aspects described above may be presented at AR/VR headsets and other types of headsets.

Route planning inside mappable spaces can also be provided, where the semantic map may be used to route a user to a current location of an object he/she is seeking. Thus, the semantic map may be accessed from the cloud and loaded into any desired smart device to direct the user to the item he/she is seeking.

Additionally, not only can a system operating consistent with present principles request that a user change locations of a certain camera to help get adequate coverage of a real-world space for semantic mapping to recreate the space via the map, but if the camera is subsequently moved again by the user, the user may be provided with a GUI that reminds the user to move the camera back to the previous location so that semantic map updates can be executed with adequate camera coverage as well.

Before concluding, note that any of the GUIs discussed above may be presented on a headset display transparently or semi-transparently using alpha-blending, and/or may be presented opaquely on a smartphone display or other non-transparent display (such as a non-transparent virtual reality headset display). Audible user interfaces may also be implemented on a variety of different device types, including headsets and mobile devices.

Also before concluding, note that objects may be labeled audibly as well as through a GUI if desired. For instance, an audible prompt may be presented to label a given object by current location rather than character, and then a user may provide an audible response as detected via a microphone and processed using speech recognition and natural language understanding to then apply a location-based label indicated in the audible input.

It may now be appreciated that additional electronic tags for each object need not be used for tracking objects within a space, providing a technical improvement as additional tracking devices and communication channels need not necessarily be used. Accordingly, present principles provide for an improved computer-based user interface that increases the functionality and ease of use of the devices disclosed herein. The disclosed concepts are rooted in computer technology for computers to carry out their functions.

It is to be understood that whilst present principles have been described with reference to some example embodiments, these are not intended to be limiting, and that various alternative arrangements may be used to implement the subject matter claimed herein. Components included in one embodiment can be used in other embodiments in any appropriate combination. For example, any of the various components described herein and/or depicted in the Figures may be combined, interchanged or excluded from other embodiments.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G01C G01C21/3856 G01C21/3623

Patent Metadata

Filing Date

September 2, 2025

Publication Date

January 1, 2026

Inventors

Song Wang

Mengnan Wang

Kevin W. Beck

Russell S. VanBlon
