US-20260057619-A1

Augmented Reality for Rendering Virtual Representations of Objects in Video Conferencing Platforms

Published: February 26, 2026

Assignee: not available in USPTO data we have

Inventors: Juntao Feng, Wenchong Lin, Bo Ling, Chong Lv, Xingguo Zhu

Technical Abstract

A system for implementing augmented reality in connection with a conferencing platform obtains, via an imaging device, interaction space image information indicative of an interaction space. The system generates, based on the interaction space image information, virtual interaction space information corresponding to a virtual interaction space comprising a virtual three-dimensional (3D) structure representing the interaction space. The system obtains virtual object information for displaying a virtual object within the virtual interaction space. The system provides, for output, rendering information configured to cause a computing device to present, via a display device, the virtual object within the virtual interaction space.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method, comprising: obtaining, via an imaging device, interaction space image information indicative of an interaction space; generating, based on the interaction space image information, virtual interaction space information corresponding to a virtual interaction space associated with a video conference, the virtual interaction space comprising a virtual three-dimensional (3D) structure representing the interaction space; obtaining, via an artificial intelligence content generation (AICG) system, virtual object information for displaying a virtual object within the virtual interaction space, wherein the virtual object information comprises new content generated by the AICG system based on input data associated with user interactions and environmental contexts; and providing, for output and based on the virtual interaction space information and the virtual object information, rendering information configured to cause a computing device to present, via a display device, the virtual object within the virtual interaction space.

The method of claim 1, wherein obtaining the virtual object information comprises: obtaining the virtual object information via at least one of the imaging device or an additional imaging device, the virtual object comprising a virtual representation of a physical object.

The method of claim 1, wherein obtaining the virtual object information comprises: obtaining the virtual object information via a content service.

The method of claim 1, further comprising: obtaining, from a plurality of users, a plurality of concurrent user inputs indicative of user interaction with a virtual object; and prioritizing the plurality of concurrent user inputs based on at least one of predefined criteria or a user.

The method of claim 1, further comprising: obtaining a gesture indication indicative of detection, in user image information associated with the video conference, of a hand gesture made by a participant; and providing, for output and based on the gesture indication, additional rendering information configured to cause the computing device to present, via the display device, a modification of the virtual object.

The method of claim 1, further comprising: obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant, wherein providing the additional rendering information comprises providing the additional rendering information configured to cause the computing device to present the 3D virtual participant depiction in a virtual location within the virtual interaction space based on a location of the participant.

The method of claim 1, further comprising: obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant, wherein providing the additional rendering information comprises providing the additional rendering information configured to cause the computing device to present the 3D virtual participant depiction in a virtual pose based on a pose of the participant.

The method of claim 1, further comprising: obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant, wherein providing the additional rendering information comprises providing the additional rendering information configured to cause the computing device to present the 3D virtual participant depiction as having a virtual facial expression based on a facial expression of the participant.

The method of claim 1, further comprising: obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant, wherein providing the additional rendering information comprises providing the additional rendering information configured to cause the computing device to present a virtual action performed by the 3D virtual participant depiction based on an action performed by the participant.

11. A non-transitory computer-readable medium storing instructions operable to cause one or more processors to perform operations comprising: obtaining, via an imaging device, interaction space image information indicative of an interaction space; generating, based on the interaction space image information, virtual interaction space information corresponding to a virtual interaction space associated with a video conference, the virtual interaction space comprising a virtual three-dimensional (3D) structure representing the interaction space; obtaining, via an artificial intelligence content generation (AICG) system, virtual object information for displaying a virtual object within the virtual interaction space, wherein the virtual object information comprises new content generated by the AICG system based on input data associated with user interactions and environmental contexts; and providing, for output, and based on the virtual interaction space information and the virtual object information, rendering information configured to cause a computing device to present, via a display device, the virtual object within the virtual interaction space.

The non-transitory computer-readable medium of claim 11, wherein the virtual object comprises a virtual representation of a white board.

The non-transitory computer-readable medium of claim 11, wherein the virtual object comprises a virtual representation of a white board, and wherein the virtual representation of the white board is configured to be manipulated by a participant via an input device of the computing device.

The non-transitory computer-readable medium of claim 11, wherein the virtual object comprises a virtual representation of a physical object.

The non-transitory computer-readable medium of claim 11, wherein the virtual object information comprises content provided by a content service.

The non-transitory computer-readable medium of claim 11, the operations further comprising: obtaining, from a plurality of users, a plurality of concurrent user inputs indicative of user interaction with a virtual object; and prioritizing the plurality of concurrent user inputs based on at least one of predefined criteria or a user role.

The non-transitory computer-readable medium of claim 11, the operations further comprising: obtaining a gesture indication indicative of detection, in user image information associated with the video conference, of a hand gesture made by a participant; and providing, for output and based on the gesture indication, additional rendering information configured to cause the computing device to present, via the display device, a modification of a virtual position of the virtual object.

18. A system, comprising: one or more memories; and one or more processors configured to execute instructions stored in the one or more memories to: obtain, via an imaging device, interaction space image information indicative of an interaction space; generate, based on the interaction space image information, virtual interaction space information corresponding to a virtual interaction space associated with a video conference, the virtual interaction space comprising a virtual three-dimensional (3D) structure representing the interaction space; obtain, via an artificial intelligence content generation (AICG) system, virtual object information for displaying a virtual object within the virtual interaction space, wherein the virtual object information comprises new content generated by the AICG system based on input data associated with user interactions and environmental contexts; and provide, for output, rendering information configured to cause a computing device to present, via a display device and based on the virtual 3D structure, the virtual object within the virtual interaction space.

The system of claim 18, wherein the one or more processors are further configured to execute the instructions stored in the one or more memories to: obtain an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and provide, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant.

The system of claim 18, wherein the one or more processors are further configured to execute the instructions stored in the one or more memories to provide, for output, additional rendering information configured to cause the computing device to present an additional virtual object within the virtual interaction space, wherein the additional virtual object is configured to interact with the virtual object.

Detailed Description

Complete technical specification and implementation details from the patent document.

This disclosure generally relates to video conferencing platforms, and, more specifically, to augmented reality for video conferencing platforms.

Conferencing software is frequently used across various industries to support video-enabled conferences between participants in multiple locations. A participant (which may be referred to interchangeably herein as a “user”) refers to a human that is involved in a video conference, whether or not the human is taking any action during the video conference. In some cases, each of the conference participants separately connects to the conferencing software from their own remote locations. In other cases, one or more of the conference participants may be physically located in and connect to the conferencing software from a conference room or similar physical space (e.g., in an office setting) while other conference participants connect to the conferencing software from one or more remote locations. Conferencing software thus enables people to conduct video conferences without requiring them to be physically present with one another. Conferencing software may be available as a standalone software product or it may be integrated within a software platform, such as a unified communications as a service (UCaaS) platform.

Online conferences have become increasingly prevalent in recent years, offering a convenient and efficient way for individuals to collaborate remotely. The current state of video conferencing technology primarily involves two-dimensional (2D) video and audio communication, where participants view each other and interact through flat screens. While advancements have been made in improving video quality, audio clarity, and integrating collaborative tools, these systems often lack the ability to provide a fully immersive experience. Traditional video conferencing systems fail to capture the depth and spatial dynamics of physical interactions, leading to a less engaging and interactive experience. Participants are thus confined to interacting within the limited scope of 2D environments, which can hinder effective communication and collaboration, particularly in scenarios that require a more dynamic and spatially aware interaction.

Implementations of this disclosure address problems such as these by introducing a new approach to online conferencing that leverages augmented reality (AR) technology to create a more engaging and interactive environment for video conferencing. The disclosure focuses on generating a virtual representation of an interaction space. The interaction space is a real-world, physical space such as a conference room, an office, a room in a house, a park bench, a backyard, or any other physical space in which a human may locate themselves during a video conference. The virtual representation of the interaction space may be referred to as a “virtual interaction space” and may include a three-dimensional (3D) structure representing the interaction space.

Some implementations include populating the virtual interaction space with 3D virtual objects that can be manipulated and interacted with by participants of a video conference. In this way, implementations facilitate an AR conference experience that may enhance the conference experience by providing a more immersive and dynamic environment, thereby facilitating a richer and more collaborative experience in which participants can demonstrate ideas, share visual aids, and interact with digital representations in real-time, which may foster clearer communication and a deeper understanding of the presented content. Additionally, the system may support the introduction of 3D virtual participant depictions, enabling participants to express emotions and actions more naturally, thereby creating a more lifelike and engaging presence in the virtual interaction space. The teachings of this disclosure thus significantly improve the immersive experience of video conferencing, making remote communication more effective and interactive.

According to the implementations of this disclosure, an interaction space (e.g., any 3D space within which one or more humans may be located and interact with one or more other humans via a video conferencing platform) may be imaged using 2D and/or 3D imaging techniques and a virtual representation of the interaction space may be generated. The interaction space may include a physical conference room and/or an office, among other non-limiting examples, and may generally refer to an indoor or outdoor space or combination of such spaces. With the captured 3D structure of the interaction space, the virtual representation of the interaction space can be decorated with different 3D virtual objects in realistic geometric positions for entertainment and information transmission. This decoration may be viewed by participants of the video conference at one or more times throughout the video conference. As the virtual objects are projected into the virtual interaction space, an AR conference experience is created. The virtual objects may be controlled, modified, and/or otherwise manipulated by any number of different participants in the video conference. In some implementations, a virtual object may be designed with one or more effects that can be triggered by a participant. For instance, the conferencing platform may employ gesture recognition to facilitate selection of virtual objects, prompting the virtual objects to move, rotate, or to trigger effects such as expanding collapsed information. In some implementations, participants may interact with virtual objects by clicking on the virtual objects.
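The gesture-triggered behavior described above (moving, rotating, or expanding a virtual object) can be sketched as a small dispatch routine. This is only an illustrative sketch: the gesture labels `drag`, `twist`, and `tap`, the `VirtualObject` fields, and the `apply_gesture` function are hypothetical names that do not appear in the disclosure, and a real system would receive gesture labels from a gesture recognition component rather than as plain strings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VirtualObject:
    # Hypothetical state for a virtual object placed in the virtual interaction space.
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    yaw_degrees: float = 0.0
    expanded: bool = False  # whether collapsed information is currently shown


def apply_gesture(obj: VirtualObject, gesture: str, amount=None) -> VirtualObject:
    """Apply a recognized gesture to a virtual object (labels are assumptions)."""
    if gesture == "drag":
        # Move the object by a 3D offset.
        dx, dy, dz = amount
        x, y, z = obj.position
        obj.position = (x + dx, y + dy, z + dz)
    elif gesture == "twist":
        # Rotate the object about its vertical axis, wrapping at 360 degrees.
        obj.yaw_degrees = (obj.yaw_degrees + amount) % 360.0
    elif gesture == "tap":
        # Toggle an effect such as expanding collapsed information.
        obj.expanded = not obj.expanded
    else:
        raise ValueError(f"unrecognized gesture: {gesture!r}")
    return obj


whiteboard = VirtualObject()
apply_gesture(whiteboard, "drag", (1.0, 0.0, 0.0))
apply_gesture(whiteboard, "twist", 370.0)
apply_gesture(whiteboard, "tap")
```

Keeping the mapping from gesture to modification in one function makes it straightforward to add further effects without touching the recognition side.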

In some implementations, participants may introduce their 3D virtual participant depictions into the virtual representation of the interaction space. Utilizing pose tracking and face tracking technology, a participant may be able to control the expressions and actions of a 3D virtual participant depiction of the participant as if the participant were physically present in the interaction space. Various 3D effects can be combined into the system for participants to experience, allowing for a wide range of creative implementations. Previously created virtual objects can be downloaded from the internet or captured using a 3D imaging device. In some implementations, the virtual objects may be generated using an artificial intelligence content generation (AICG) system.

Camera calibration may be conducted using intrinsic parameters of the camera that may be provided to the system. Additionally, camera calibration may be performed using computer vision algorithms to obtain intrinsic and extrinsic parameters of the interaction space camera. These parameters facilitate accurate projection of the virtual objects into the virtual interaction space during an online conference as well as realistic interaction with the 3D structure of the virtual interaction space by the virtual objects. An AI-based interaction toolbox may be provided for the participants to interact with virtual objects using hand gestures, allowing control over the virtual objects' positions, poses, and virtual effects.
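To illustrate how intrinsic and extrinsic parameters support projection of a virtual object into the camera view, the following minimal sketch applies the standard pinhole camera model. It assumes no lens distortion and zero skew, and the function name `project_point` and the example parameter values are hypothetical; the disclosure does not specify a particular projection routine.

```python
def project_point(point_world, K, R, t):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    K is a 3x3 intrinsic matrix (skew assumed zero), R a 3x3 rotation and
    t a 3-vector translation (the extrinsic parameters). Returns None when
    the point lies at or behind the camera plane.
    """
    # World coordinates -> camera coordinates using the extrinsic parameters.
    cam = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        return None
    # Camera coordinates -> pixel coordinates using the intrinsic parameters.
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    u = fx * cam[0] / cam[2] + cx
    v = fy * cam[1] / cam[2] + cy
    return (u, v)


# Example values (assumptions): 800-pixel focal length, principal point at
# the center of a 640x480 image, camera at the world origin.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

center_pixel = project_point([0.0, 0.0, 2.0], K, R, t)
offset_pixel = project_point([0.5, 0.0, 2.0], K, R, t)
```

A point two meters straight ahead lands on the principal point, while a point half a meter to the side shifts horizontally in proportion to the focal length divided by the depth.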

In some examples of the present disclosure, implementations may include or otherwise use one or more artificial intelligence or machine learning (collectively, AI/ML) systems having one or more models trained for one or more purposes. Use or inclusion of such AI/ML systems, such as for implementation of certain features or functions, may be turned off by default, where a user, an organization, or both must opt-in to utilize the features or functions that include or otherwise use an AI/ML system. User or organizational consent to use the AI/ML systems or features may be provided in one or more ways, for example, as explicit permission granted by a user prior to using an AI/ML feature, as administrative consent configured by administrator settings, or both. Users for whom such consent is obtained can be notified that they will be interacting with one or more AI/ML systems or features, for example, by an electronic message (e.g., delivered via a chat or email service or presented within a client application or webpage) or by an on-screen prompt, which can be applied on a per-interaction basis. Those users can also be provided with an easy way to withdraw their user consent, for example, using a form or like element provided within a client application, webpage, or on-screen prompt to allow individual users to opt-out of use of the AI/ML systems or features.
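The opt-in behavior described above can be sketched as a simple consent gate. This is a hedged illustration only: `ConsentPolicy`, `ConsentState`, and `ai_features_enabled` are hypothetical names, and the sketch assumes the platform is configured to require user consent, organizational consent, or both.

```python
from dataclasses import dataclass
from enum import Enum


class ConsentPolicy(Enum):
    # Which opt-ins a deployment requires (an assumed configuration knob).
    USER = "user"
    ORGANIZATION = "organization"
    BOTH = "both"


@dataclass
class ConsentState:
    # Both flags default to False, so AI/ML features are off by default.
    user_opt_in: bool = False
    org_opt_in: bool = False

    def withdraw_user_consent(self) -> None:
        # Models the opt-out path offered to individual users.
        self.user_opt_in = False


def ai_features_enabled(state: ConsentState, policy: ConsentPolicy) -> bool:
    """Return True only when the consents required by the policy are present."""
    if policy is ConsentPolicy.USER:
        return state.user_opt_in
    if policy is ConsentPolicy.ORGANIZATION:
        return state.org_opt_in
    return state.user_opt_in and state.org_opt_in


state = ConsentState()
enabled_by_default = ai_features_enabled(state, ConsentPolicy.USER)
state.user_opt_in = True
enabled_after_opt_in = ai_features_enabled(state, ConsentPolicy.USER)
enabled_without_org = ai_features_enabled(state, ConsentPolicy.BOTH)
state.withdraw_user_consent()
enabled_after_withdrawal = ai_features_enabled(state, ConsentPolicy.USER)
```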

For example, a software platform, such as a UCaaS platform, may provide artificial intelligence (AI) functionality for use with the software services thereof. Use of the AI functionality may enhance the user experience by automating processes, answering prompted questions with minimal or no disruption to an active communication session, or introducing capabilities previously unavailable to software service users. Such AI functionality may be implemented using one or more machine learning models, which may be trained to process specific types of input and produce specific types of output. For example, machine learning functionality enabled for use during a video conference may be implemented using a large language model (LLM) trained to obtain user requests as natural language prompts and to produce output responsive to the user requests in the same language as that in which the prompts are obtained. In one non-limiting example, a video conference participant who joins the video conference after it began may submit a user request to an LLM to ask for a summary of the discussion that occurred during the video conference before the participant joined. The LLM may evaluate a real-time transcription of the video conference (e.g., produced using automated speech recognition or a like tool) to present output concisely summarizing that discussion.

Machine learning models may be implemented for use in a variety of use cases (e.g., language processing, image feature extraction, cyberthreat detection, or recommendation production), using a variety of approaches (e.g., supervised learning, unsupervised learning, or reinforcement learning), and in a variety of structures (e.g., a neural network, decision tree, linear regression, vector machine, Bayesian network, genetic algorithm, or deep learning system).

To enhance privacy and safety, as well as provide other benefits, the AI/ML processing system may be prevented from using a user's or organization's personal information (e.g., audio, video, chat, screen-sharing, attachments, or other communications-like content (such as poll results, whiteboards, or reactions)) to train any AI/ML models and instead only use the personal information for inference operations of the AI/ML processing system. Instead of using the personal information to train AI/ML models, AI/ML models may be trained using one or more commercially licensed data sets that do not contain the personal information of the user or organization.

To describe some implementations in greater detail, reference is first made to examples of hardware and software structures used to implement a system for implementing AR in connection with a conferencing platform. FIG. 1 is a block diagram of an example of an electronic computing and communications system 100, which can be or include a distributed computing system (e.g., a client-server computing system), a cloud computing system, a clustered computing system, or the like.

The system 100 includes one or more customers, such as customers 102A through 102B, which may each be a public entity, private entity, or another corporate entity or individual that purchases or otherwise uses software services, such as of a UCaaS platform provider. Each customer can include one or more clients. For example, as shown and without limitation, the customer 102A can include clients 104A through 104B, and the customer 102B can include clients 104C through 104D. A customer can include a customer network or domain. For example, and without limitation, the clients 104A through 104B can be associated or communicate with a customer network or domain for the customer 102A and the clients 104C through 104D can be associated or communicate with a customer network or domain for the customer 102B.

A client, such as one of the clients 104A through 104D, may be or otherwise refer to one or both of a client device or a client application. Where a client is or refers to a client device, the client can comprise a computing system, which can include one or more computing devices, such as a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, or another suitable computing device or combination of computing devices. Where a client instead is or refers to a client application, the client can be an instance of software running on a customer device (e.g., a client device or another device). In some implementations, a client can be implemented as a single physical unit or as a combination of physical units. In some implementations, a single physical unit can include multiple clients.

The system 100 can include a number of customers and/or clients or can have a configuration of customers or clients different from that generally illustrated in FIG. 1. For example, and without limitation, the system 100 can include hundreds or thousands of customers, and at least some of the customers can include or be associated with a number of clients.

The system 100 includes a datacenter 106, which may include one or more servers. The datacenter 106 can represent a geographic location, which can include a facility, where the one or more servers are located. The system 100 can include a number of datacenters and servers or can include a configuration of datacenters and servers different from that generally illustrated in FIG. 1. For example, and without limitation, the system 100 can include tens of datacenters, and at least some of the datacenters can include hundreds or another suitable number of servers. In some implementations, the datacenter 106 can be associated or communicate with one or more datacenter networks or domains, which can include domains other than the customer domains for the customers 102A through 102B.

The datacenter 106 includes servers used for implementing software services of a UCaaS platform. The datacenter 106 as generally illustrated includes an application server 108, a database server 110, and a telephony server 112. The servers 108 through 112 can each be a computing system, which can include one or more computing devices, such as a desktop computer, a server computer, or another computer capable of operating as a server, or a combination thereof. A suitable number of each of the servers 108 through 112 can be implemented at the datacenter 106. The UCaaS platform uses a multi-tenant architecture in which installations or instantiations of the servers 108 through 112 are shared amongst the customers 102A through 102B.

In some implementations, one or more of the servers 108 through 112 can be a non-hardware server implemented on a physical device, such as a hardware server. In some implementations, a combination of two or more of the application server 108, the database server 110, and the telephony server 112 can be implemented as a single hardware server or as a single non-hardware server implemented on a single hardware server. In some implementations, the datacenter 106 can include servers other than or in addition to the servers 108 through 112, for example, a media server, a proxy server, or a web server.

The application server 108 runs web-based software services deliverable to a client, such as one of the clients 104A through 104D. As described above, the software services may be of a UCaaS platform. For example, the application server 108 can implement all or a portion of a UCaaS platform, including conferencing software, messaging software, and/or other intra-party or inter-party communications software. The application server 108 may, for example, be or include a unitary Java Virtual Machine (JVM).

In some implementations, the application server 108 can include an application node, which can be a process executed on the application server 108. For example, and without limitation, the application node can be executed in order to deliver software services to a client, such as one of the clients 104A through 104D, as part of a software application. The application node can be implemented using processing threads, virtual machine instantiations, or other computing features of the application server 108. In some such implementations, the application server 108 can include a suitable number of application nodes, depending upon a system load or other characteristics associated with the application server 108. For example, and without limitation, the application server 108 can include two or more nodes forming a node cluster. In some such implementations, the application nodes implemented on a single application server 108 can run on different hardware servers.

The database server 110 stores, manages, or otherwise provides data for delivering software services of the application server 108 to a client, such as one of the clients 104A through 104D. In particular, the database server 110 may implement one or more databases, tables, or other information sources suitable for use with a software application implemented using the application server 108. The database server 110 may include a data storage unit accessible by software executed on the application server 108. A database implemented by the database server 110 may be a relational database management system (RDBMS), an object database, an XML database, a configuration management database (CMDB), a management information base (MIB), one or more flat files, other suitable non-transient storage mechanisms, or a combination thereof. The system 100 can include one or more database servers, in which each database server can include one, two, three, or another suitable number of databases configured as or comprising a suitable database type or combination thereof.

In some implementations, one or more databases, tables, other suitable information sources, or portions or combinations thereof may be stored, managed, or otherwise provided by one or more of the elements of the system 100 other than the database server 110, for example, the client 104A or the application server 108.

The telephony server 112 enables network-based telephony and web communications from and/or to clients of a customer, such as the clients 104A through 104B for the customer 102A or the clients 104C through 104D for the customer 102B. For example, one or more of the clients 104A through 104D may be voice over internet protocol (VOIP)-enabled devices configured to send and receive calls over a network 114. The telephony server 112 includes a session initiation protocol (SIP) zone and a web zone. The SIP zone enables a client of a customer, such as the customer 102A or 102B, to send and receive calls over the network 114 using SIP requests and responses. The web zone integrates telephony data with the application server 108 to enable telephony-based traffic access to software services run by the application server 108. Given the combined functionality of the SIP zone and the web zone, the telephony server 112 may be or include a cloud-based private branch exchange (PBX) system.

The SIP zone receives telephony traffic from a client of a customer and directs same to a destination device. The SIP zone may include one or more call switches for routing the telephony traffic. For example, to route a VOIP call from a first VOIP-enabled client of a customer to a second VOIP-enabled client of the same customer, the telephony server 112 may initiate a SIP transaction between a first client and the second client using a PBX for the customer. However, in another example, to route a VOIP call from a VOIP-enabled client of a customer to a client or non-client device (e.g., a desktop phone which is not configured for VOIP communication) which is not VOIP-enabled, the telephony server 112 may initiate a SIP transaction via a VOIP gateway that transmits the SIP signal to a public switched telephone network (PSTN) system for outbound communication to the non-VOIP-enabled client or non-client phone. Hence, the telephony server 112 may include a PSTN system and may in some cases access an external PSTN system.
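The two routing examples above can be summarized as a small decision function. The sketch below is illustrative only: `Endpoint`, `Route`, and `choose_route` are hypothetical names, and any case the paragraph does not describe (for instance, a call to a VOIP-enabled client of a different customer) is deliberately reported as unspecified rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    CUSTOMER_PBX = "customer_pbx"            # SIP transaction via the customer's PBX
    VOIP_GATEWAY_TO_PSTN = "voip_gateway"    # SIP signal handed to a PSTN system
    UNSPECIFIED = "unspecified"              # not covered by the two examples


@dataclass(frozen=True)
class Endpoint:
    address: str
    customer_id: Optional[str]  # None for a non-client device
    voip_enabled: bool


def choose_route(caller: Endpoint, callee: Endpoint) -> Route:
    """Pick a route for a call placed by a VOIP-enabled client."""
    if not caller.voip_enabled:
        raise ValueError("this sketch only covers calls from VOIP-enabled clients")
    if not callee.voip_enabled:
        # Destination cannot speak VOIP: go through a VOIP gateway to the PSTN.
        return Route.VOIP_GATEWAY_TO_PSTN
    if callee.customer_id is not None and callee.customer_id == caller.customer_id:
        # Both ends are VOIP-enabled clients of the same customer.
        return Route.CUSTOMER_PBX
    return Route.UNSPECIFIED


alice = Endpoint("sip:alice@example.test", "customer-a", True)
bob = Endpoint("sip:bob@example.test", "customer-a", True)
desk_phone = Endpoint("+15550100", None, False)

internal_route = choose_route(alice, bob)
outbound_route = choose_route(alice, desk_phone)
```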

The telephony server 112 includes one or more session border controllers (SBCs) for interfacing the SIP zone with one or more aspects external to the telephony server 112. In particular, an SBC can act as an intermediary to transmit and receive SIP requests and responses between clients or non-client devices of a given customer with clients or non-client devices external to that customer. When incoming telephony traffic for delivery to a client of a customer, such as one of the clients 104A through 104D, originating from outside the telephony server 112 is received, an SBC receives the traffic and forwards it to a call switch for routing to the client.

In some implementations, the telephony server 112, via the SIP zone, may enable one or more forms of peering to a carrier or customer premise. For example, Internet peering to a customer premise may be enabled to ease the migration of the customer from a legacy provider to a service provider operating the telephony server 112. In another example, private peering to a customer premise may be enabled to leverage a private connection terminating at one end at the telephony server 112 and at the other end at a computing aspect of the customer environment. In yet another example, carrier peering may be enabled to leverage a connection of a peered carrier to the telephony server 112.

In some such implementations, an SBC or telephony gateway within the customer environment may operate as an intermediary between the SBC of the telephony server 112 and a PSTN for a peered carrier. When an external SBC is first registered with the telephony server 112, a call from a client can be routed through the SBC to a load balancer of the SIP zone, which directs the traffic to a call switch of the telephony server 112. Thereafter, the SBC may be configured to communicate directly with the call switch.

The web zone receives telephony traffic from a client of a customer, via the SIP zone, and directs same to the application server 108 via one or more Domain Name System (DNS) resolutions. For example, a first DNS within the web zone may process a request received via the SIP zone and then deliver the processed request to a web service which connects to a second DNS at or otherwise associated with the application server 108. Once the second DNS resolves the request, it is delivered to the destination service at the application server 108. The web zone may also include a database for authenticating access to a software application for telephony traffic processed within the SIP zone, for example, a softphone.

The clients 104A through 104D communicate with the servers 108 through 112 of the datacenter 106 via the network 114. The network 114 can be or include, for example, the Internet, a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), or another public or private means of electronic computer communication capable of transferring data between a client and one or more servers. In some implementations, a client can connect to the network 114 via a communal connection point, link, or path, or using a distinct connection point, link, or path. For example, a connection point, link, or path can be wired, wireless, use other communications technologies, or a combination thereof.

The network 114, the datacenter 106, or another element, or combination of elements, of the system 100 can include network hardware such as routers, switches, other network devices, or combinations thereof. For example, the datacenter 106 can include a load balancer 116 for routing traffic from the network 114 to various servers associated with the datacenter 106. The load balancer 116 can route, or direct, computing communications traffic, such as signals or messages, to respective elements of the datacenter 106.

For example, the load balancer 116 can operate as a proxy, or reverse proxy, for a service, such as a service provided to one or more remote clients, such as one or more of the clients 104A through 104D, by the application server 108, the telephony server 112, and/or another server. Routing functions of the load balancer 116 can be configured directly or via a DNS. The load balancer 116 can coordinate requests from remote clients and can simplify client access by masking the internal configuration of the datacenter 106 from the remote clients.

In some implementations, the load balancer 116 can operate as a firewall, allowing or preventing communications based on configuration settings. Although the load balancer 116 is depicted in FIG. 1 as being within the datacenter 106, in some implementations, the load balancer 116 can instead be located outside of the datacenter 106, for example, when providing global routing for multiple datacenters. In some implementations, load balancers can be included both within and outside of the datacenter 106. In some implementations, the load balancer 116 can be omitted.

FIG. 2 is a block diagram of an example internal configuration of a computing device 200 of an electronic computing and communications system. In one configuration, the computing device 200 may implement one or more of the client 104A, the application server 108, the database server 110, or the telephony server 112 of the system 100 shown in FIG. 1.

The computing device 200 includes components or units, such as a processor 202, a memory 204, a bus 206, a power source 208, peripherals 210, a user interface 212, a network interface 214, other suitable components, or a combination thereof. One or more of the memory 204, the power source 208, the peripherals 210, the user interface 212, or the network interface 214 can communicate with the processor 202 via the bus 206.

The processor 202 is a central processing unit, such as a microprocessor, and can include single or multiple processors having single or multiple processing cores. Alternatively, the processor 202 can include another type of device, or multiple devices, configured for manipulating or processing information. For example, the processor 202 can include multiple processors interconnected in one or more manners, including hardwired or networked. The operations of the processor 202 can be distributed across multiple devices or units that can be coupled directly or across a local area or other suitable type of network. The processor 202 can include a cache, or cache memory, for local storage of operating data or instructions.

The memory 204 includes one or more memory components, which may each be volatile memory or non-volatile memory. For example, the volatile memory can be random access memory (RAM) (e.g., a DRAM module, such as DDR SDRAM). In another example, the non-volatile memory of the memory 204 can be a disk drive, a solid state drive, flash memory, or phase-change memory. In some implementations, the memory 204 can be distributed across multiple devices. For example, the memory 204 can include network-based memory or memory in multiple clients or servers performing the operations of those multiple devices.

The memory 204 can include data for immediate access by the processor 202. For example, the memory 204 can include executable instructions 216, application data 218, and an operating system 220. The executable instructions 216 can include one or more application programs, which can be loaded or copied, in whole or in part, from non-volatile memory to volatile memory to be executed by the processor 202. For example, the executable instructions 216 can include instructions for performing some or all of the techniques of this disclosure. The application data 218 can include user data, database data (e.g., database catalogs or dictionaries), or the like. In some implementations, the application data 218 can include functional programs, such as a web browser, a web server, a database server, another program, or a combination thereof. The operating system 220 can be, for example, Microsoft Windows®, Mac OS X®, or Linux®; an operating system for a mobile device, such as a smartphone or tablet device; or an operating system for a non-mobile device, such as a mainframe computer.

The power source 208 provides power to the computing device 200. For example, the power source 208 can be an interface to an external power distribution system. In another example, the power source 208 can be a battery, such as where the computing device 200 is a mobile device or is otherwise configured to operate independently of an external power distribution system. In some implementations, the computing device 200 may include or otherwise use multiple power sources. In some such implementations, the power source 208 can be a backup battery.

The peripherals 210 include one or more sensors, detectors, or other devices configured for monitoring the computing device 200 or the environment around the computing device 200. For example, the peripherals 210 can include a geolocation component, such as a global positioning system location unit. In another example, the peripherals can include a temperature sensor for measuring temperatures of components of the computing device 200, such as the processor 202. In some implementations, the computing device 200 can omit the peripherals 210.

The user interface 212 includes one or more input interfaces and/or output interfaces. An input interface may, for example, be a positional input device, such as a mouse, touchpad, touchscreen, or the like; a keyboard; or another suitable human or machine interface device. An output interface may, for example, be a display, such as a liquid crystal display, a cathode-ray tube, a light emitting diode display, or other suitable display.

The network interface 214 provides a connection or link to a network (e.g., the network 114 shown in FIG. 1). The network interface 214 can be a wired network interface or a wireless network interface. The computing device 200 can communicate with other devices via the network interface 214 using one or more network protocols, such as using Ethernet, transmission control protocol (TCP), internet protocol (IP), power line communication, an IEEE 802.X protocol (e.g., Wi-Fi, Bluetooth, or ZigBee), infrared, visible light, general packet radio service (GPRS), global system for mobile communications (GSM), code-division multiple access (CDMA), Z-Wave, another protocol, or a combination thereof.

FIG. 3 is a block diagram of an example of a software platform 300 implemented by an electronic computing and communications system, for example, the system 100 shown in FIG. 1. The software platform 300 is a UCaaS platform accessible by clients of a customer of a UCaaS platform provider, for example, the clients 104A through 104B of the customer 102A or the clients 104C through 104D of the customer 102B shown in FIG. 1. The software platform 300 may be a multi-tenant platform instantiated using one or more servers at one or more datacenters including, for example, the application server 108, the database server 110, and the telephony server 112 of the datacenter 106 shown in FIG. 1.

The software platform 300 includes software services accessible using one or more clients. For example, a customer 302 as shown includes four clients: a client 304 (e.g., a desk phone), a client 306 (e.g., a computer), a client 308 (e.g., a mobile device), and a client 310 (e.g., a shared device). The desk phone is a desktop unit configured to at least send and receive calls and includes an input device for receiving a telephone number or extension to dial to and an output device for outputting audio and/or video for a call in progress. The computer is a desktop, laptop, or tablet computer including an input device for receiving some form of user input and an output device for outputting information in an audio and/or visual format. The mobile device is a smartphone, wearable device, or other mobile computing aspect including an input device for receiving some form of user input and an output device for outputting information in an audio and/or visual format. The desk phone, the computer, and the mobile device may generally be considered personal devices configured for use by a single user. The shared device is a desk phone, a computer, a mobile device, or a different device which may instead be configured for use by multiple specified or unspecified users.

Each of the clients 304 through 310 includes or runs on a computing device configured to access at least a portion of the software platform 300. In some implementations, the customer 302 may include additional clients not shown. For example, the customer 302 may include multiple clients of one or more client types (e.g., multiple desk phones or multiple computers) and/or one or more clients of a client type not shown in FIG. 3 (e.g., wearable devices or televisions other than as shared devices). For example, the customer 302 may have tens or hundreds of desk phones, computers, mobile devices, and/or shared devices.

The software services of the software platform 300 generally relate to communications tools but are in no way limited in scope. As shown, the software services of the software platform 300 include telephony software 312, conferencing software 314, messaging software 316, and other software 318. Some or all of the software 312 through 318 uses customer configurations 320 specific to the customer 302. The customer configurations 320 may, for example, be data stored within a database or other data store at a database server, such as the database server 110 shown in FIG. 1.

The telephony software 312 enables telephony traffic between ones of the clients 304 through 310 and other telephony-enabled devices, which may be other ones of the clients 304 through 310, other VOIP-enabled clients of the customer 302, non-VOIP-enabled devices of the customer 302, VOIP-enabled clients of another customer, non-VOIP-enabled devices of another customer, or other VOIP-enabled clients or non-VOIP-enabled devices. Calls sent or received using the telephony software 312 may, for example, be sent or received using the desk phone, a softphone running on the computer, a mobile application running on the mobile device, or using the shared device that includes telephony features.

The telephony software 312 further enables phones that do not include a client application to connect to other software services of the software platform 300. For example, the telephony software 312 may receive and process calls from phones not associated with the customer 302 to route that telephony traffic to one or more of the conferencing software 314, the messaging software 316, or the other software 318.

The conferencing software 314 enables audio, video, and/or other forms of conferences between multiple participants, such as to facilitate a conference between those participants. In some cases, the participants may all be physically present within a single location, for example, a conference room, in which case the conferencing software 314 may facilitate a conference between only those participants and using one or more clients within the conference room. In some cases, one or more participants may be physically present within a single location and one or more other participants may be remote, in which case the conferencing software 314 may facilitate a conference between all of those participants using one or more clients within the conference room and one or more remote clients. In some cases, the participants may all be remote, in which case the conferencing software 314 may facilitate a conference between the participants using different clients for the participants. The conferencing software 314 can include functionality for hosting, presenting, scheduling, joining, or otherwise participating in a conference. The conferencing software 314 may further include functionality for recording some or all of a conference and/or documenting a transcript for the conference.

The messaging software 316 enables instant messaging, unified messaging, and other types of messaging communications between multiple devices, such as to facilitate a chat or other virtual conversation between users of those devices. The unified messaging functionality of the messaging software 316 may, for example, refer to email messaging which includes a voicemail transcription service delivered in email format.

The other software 318 enables other functionality of the software platform 300. Examples of the other software 318 include, but are not limited to, device management software, resource provisioning and deployment software, administrative software, third party integration software, and the like. In one particular example, the other software 318 can include AR software such as, for example, an offline preparation component and/or an online processing component. In some such cases, the conferencing software 314 can include the other software 318.

The software 312 through 318 may be implemented using one or more servers, for example, of a datacenter such as the datacenter 106 shown in FIG. 1. For example, one or more of the software 312 through 318 may be implemented using an application server, a database server, and/or a telephony server, such as the servers 108 through 112 shown in FIG. 1. In another example, one or more of the software 312 through 318 may be implemented using servers not shown in FIG. 1, for example, a meeting server, a web server, or another server. In yet another example, one or more of the software 312 through 318 may be implemented using one or more of the servers 108 through 112 and one or more other servers. The software 312 through 318 may be implemented by different servers or by the same server.

Features of the software services of the software platform 300 may be integrated with one another to provide a unified experience for users. For example, the messaging software 316 may include a user interface element configured to initiate a call with another user of the customer 302. In another example, the telephony software 312 may include functionality for elevating a telephone call to a conference. In yet another example, the conferencing software 314 may include functionality for sending and receiving instant messages between participants and/or other users of the customer 302. In yet another example, the conferencing software 314 may include functionality for file sharing between participants and/or other users of the customer 302. In some implementations, some or all of the software 312 through 318 may be combined into a single software application run on clients of the customer, such as one or more of the clients 304 through 310.

FIG. 4 is a block diagram of an example of a conferencing system 400 for delivering conferencing software services in an electronic computing and communications system, for example, the system 100 shown in FIG. 1. The conferencing system 400 includes a thread encoding tool 402, a switching/routing tool 404, and conferencing software 406. The conferencing software 406, which may, for example, be the conferencing software 314 shown in FIG. 3, is software for implementing conferences (e.g., video conferences) between users of clients and/or phones, such as clients 408 and 410 and phone 412. For example, the clients 408 or 410 may each be one of the clients 304 through 310 shown in FIG. 3 that runs a client application associated with the conferencing software 406, and the phone 412 may be a telephone which does not run a client application associated with the conferencing software 406 or otherwise access a web application associated with the conferencing software 406. The conferencing system 400 may in at least some cases be implemented using one or more servers of the system 100, for example, the application server 108 shown in FIG. 1. Although two clients and a phone are shown in FIG. 4, other numbers of clients and/or other numbers of phones can connect to the conferencing system 400.

Implementing a conference includes transmitting and receiving video, audio, and/or other data between clients and/or phones, as applicable, of the conference participants. Each of the client 408, the client 410, and the phone 412 may connect through the conferencing system 400 using separate input streams to enable users thereof to participate in a conference together using the conferencing software 406. The various channels used for establishing connections between the clients 408 and 410 and the phone 412 may, for example, be based on the individual device capabilities of the clients 408 and 410 and the phone 412.

The conferencing software 406 includes a user interface tile for each input stream received and processed at the conferencing system 400. A user interface tile as used herein generally refers to a portion of a conferencing software user interface which displays information (e.g., a rendered video) associated with one or more conference participants. A user interface tile may, but need not, be generally rectangular. The size of a user interface tile may depend on one or more factors including the view style set for the conferencing software user interface at a given time and whether the one or more conference participants represented by the user interface tile are active speakers at a given time. The view style for the conferencing software user interface, which may be uniformly configured for all conference participants by a host of the subject conference or which may be individually configured by each conference participant, may be one of a gallery view in which all user interface tiles are similarly or identically sized and arranged in a generally grid layout or a speaker view in which one or more user interface tiles for active speakers are enlarged and arranged in a center position of the conferencing software user interface while the user interface tiles for other conference participants are reduced in size and arranged near an edge of the conferencing software user interface. In some cases, the view style or one or more other configurations related to the display of user interface tiles may be based on a type of video conference implemented using the conferencing software 406 (e.g., a participant-to-participant video conference, a contact center engagement video conference, or an online learning video conference, as will be described below).
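As a rough illustration of the gallery view described above, the following sketch computes a grid of identically sized tiles for a given number of participants. The near-square heuristic and the function name `gallery_grid` are assumptions made for illustration; the text only states that tiles are arranged in a generally grid layout.

```python
import math
from typing import Tuple


def gallery_grid(tile_count: int) -> Tuple[int, int]:
    """Return (rows, columns) for a near-square grid holding tile_count tiles.

    Assumption: columns = ceil(sqrt(n)) and rows = ceil(n / columns), which
    keeps all tiles the same size and the grid as close to square as possible.
    """
    if tile_count <= 0:
        return (0, 0)
    columns = math.ceil(math.sqrt(tile_count))
    rows = math.ceil(tile_count / columns)
    return (rows, columns)
```

A speaker view would instead reserve a larger central region for active speakers and place the remaining tiles along an edge; that layout is not sketched here.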

The content of the user interface tile associated with a given participant may be dependent upon the source of the input stream for that participant. For example, where a participant accesses the conferencing software 406 from a client, such as the client 408 or 410, the user interface tile associated with that participant may include a video stream captured at the client and transmitted to the conferencing system 400, which is then transmitted from the conferencing system 400 to other clients for viewing by other participants (although the participant may optionally disable video features to suspend the video stream from being presented during some or all of the conference). In another example, where a participant accesses the conferencing software 406 from a phone, such as the phone 412, the user interface tile for the participant may be limited to a static image showing text (e.g., a name, telephone number, or other identifier associated with the participant or the phone 412) or other default background aspect since there is no video stream presented for that participant.

The thread encoding tool 402 receives video streams separately from the clients 408 and 410 and encodes those video streams using one or more transcoding tools, such as to produce variant streams at different resolutions. For example, a given video stream received from a client may be processed using multi-stream capabilities of the conferencing system 400 to result in multiple resolution versions of that video stream, including versions at 90p, 180p, 360p, 720p, and/or 1080p, amongst others. The video streams may be received from the clients over a network, for example, the network 114 shown in FIG. 1, or by a direct wired connection, such as using a universal serial bus (USB) connection or like coupling aspect. After the video streams are encoded, the switching/routing tool 404 directs the encoded streams through applicable network infrastructure and/or other hardware to deliver the encoded streams to the conferencing software 406. The conferencing software 406 transmits the encoded video streams to each connected client, such as the clients 408 and 410, which receive and decode the encoded video streams to output the video content thereof for display by video output components of the clients, such as within respective user interface tiles of a user interface of the conferencing software 406.
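The multi-resolution encoding step can be illustrated by computing which variant resolutions would be produced for a given source stream. This is a sketch under stated assumptions rather than a description of the thread encoding tool itself: the function name `variant_resolutions` is hypothetical, and the choices to skip variants taller than the source, preserve the source aspect ratio, and round widths up to an even number are assumptions, not statements from the text. Only the list of heights (90p through 1080p) comes from the paragraph above.

```python
from typing import List, Sequence, Tuple

# Variant heights named in the text ("90p, 180p, 360p, 720p, and/or 1080p").
RESOLUTION_LADDER: Tuple[int, ...] = (90, 180, 360, 720, 1080)


def variant_resolutions(
    source_width: int,
    source_height: int,
    ladder: Sequence[int] = RESOLUTION_LADDER,
) -> List[Tuple[int, int]]:
    """List (width, height) pairs for each variant stream to encode."""
    variants = []
    for height in ladder:
        if height > source_height:
            continue  # assumption: never upscale beyond the source resolution
        # Preserve the source aspect ratio.
        width = round(source_width * height / source_height)
        width += width % 2  # assumption: keep widths even, as many codecs expect
        variants.append((width, height))
    return variants
```

For a 1280x720 source, this yields four variants (90p through 720p) and omits 1080p, since producing it would require upscaling.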

A user of the phone 412 participates in a conference using an audio-only connection and may be referred to as an audio-only caller. To participate in the conference from the phone 412, an audio signal from the phone 412 is received and processed at a VOIP gateway 414 to prepare a digital telephony signal for processing at the conferencing system 400. The VOIP gateway 414 may be part of the system 100, for example, implemented at or in connection with a server of the datacenter 106, such as the telephony server 112 shown in FIG. 1. Alternatively, the VOIP gateway 414 may be located on the user-side, such as in a same location as the phone 412. The digital telephony signal is a packet switched signal transmitted to the switching/routing tool 404 for delivery to the conferencing software 406. The conferencing software 406 outputs an audio signal representing a combined audio capture for each participant of the conference for output by an audio output component of the phone 412. In some implementations, the VOIP gateway 414 may be omitted, for example, where the phone 412 is a VOIP-enabled phone.

A conference implemented using the conferencing software 406 may be referred to as a video conference in which video streaming is enabled for the conference participants thereof. The enabling of video streaming for a conference participant of a video conference does not require that the conference participant activate or otherwise use video functionality for participating in the video conference. For example, a conference may still be a video conference where none of the participants joining using clients turns on their video stream for any portion of the conference. In some cases, however, the conference may have video disabled, such as where each participant connects to the conference using a phone rather than a client, or where a host of the conference selectively configures the conference to exclude video functionality.

FIG. 5 is a block diagram of an example of an AR system for incorporating AR in a video conference. The AR system includes or otherwise uses an offline preparation component 500 and an online processing component 502. Together, the offline preparation component 500 and the online processing component 502 provide AR functionality to a conferencing system 504 for incorporation in video conferences. Any one or more aspects of the offline preparation component 500 and/or the online processing component 502 may be implemented on one or more computing devices (e.g., the computing device 200 shown in FIG. 2). The conferencing system 504 may be, be similar to, include, or be included in, the conferencing system 400 shown in FIG. 4. In some implementations, any two or more of the offline preparation component 500, the online processing component 502, and the conferencing system 504 may be integrated with one another.

As shown, the offline preparation component 500 may include an imaging device 506, a 3D modeling component 508, and an AIGC component 510. The imaging device 506, the 3D modeling component 508, and/or the AIGC component 510 may be implemented in a single device (e.g., a client 408 or a client 410 shown in FIG. 4, a client 306, a client 308, and/or a client 310 shown in FIG. 3) or in multiple devices.

The imaging device 506 may be any type of imaging device configured to obtain imaging information associated with an interaction space and/or an object. The imaging device 506 may be, be similar to, include, or be included in, a video camera installed in an interaction space (e.g., a conference room), a camera of a mobile device, and/or any other imaging device capable of obtaining imaging information of the interaction space. The imaging device may be a 3D imaging device, a 2D imaging device, a radio detection and ranging (radar) device, a light detection and ranging (LiDAR) device, and/or an ultrasound device, among other examples. In some implementations, the imaging device 506 may represent more than one imaging device and/or one or more types of imaging device.

The imaging information may include interaction space image information indicative of a physical interaction space. Interaction space image information may include any information about a physical environment such as a conference room, an office, or any other type of room, which may be used to generate a virtual representation thereof. For example, interaction space image information may include video data, digital image data, radar data, LiDAR data, and/or any other data indicative of one or more features of a physical environment such as a conference room.

The imaging information may include object image information indicative of a physical object. The physical object may be located within the interaction space or in another location. For example, the object image information may be obtained via the same imaging device 506 from which the interaction space image information is obtained and/or may be obtained from an additional imaging device 506. Object image information may include any information about a physical object that may be used to generate a virtual representation thereof (referred to as a “virtual object”).

The 3D modeling component 508 may include tools, such as programs, subprograms, functions, routines, subroutines, operations, executable instructions, and/or the like for generating 3D models. The 3D modeling component 508 may be configured to generate virtual interaction space information corresponding to a virtual representation of an interaction space. For example, the 3D modeling component 508 may include one or more AI components configured to facilitate generation of the virtual interaction space information. The 3D modeling component 508 may obtain, via the imaging device 506, interaction space image information indicative of a physical interaction space. The 3D modeling component 508 may generate, based on the interaction space image information, virtual interaction space information corresponding to a virtual representation of the physical interaction space. The virtual representation may include a virtual 3D structure corresponding to the real physical 3D structure of the physical interaction space.

The 3D modeling component 508 may be configured to generate virtual object information for presenting a virtual object. In some implementations, the virtual object information may be a virtual representation of a physical object. For example, the 3D modeling component 508 may obtain object image information via the imaging device 506 and may generate the virtual object information based on the object image information. In some implementations, the virtual object information may be obtained via a content service such as a stock image distribution service, an internet search engine, and/or any other source of content.

In some implementations, the virtual object information may be obtained via the AIGC component 510. The AIGC component 510 may employ generative AI to generate the virtual object information. Generative AI refers to a class of algorithms that can create new content by learning patterns from existing data. These algorithms utilize deep learning techniques, such as neural networks, to generate outputs that can mimic the style, structure, and characteristics of the input data. Generative AI can produce a wide array of content types, including text, images, audio, and 3D models. Generative AI may be employed by the AIGC component 510 to generate virtual object information corresponding to 3D virtual objects that may be used for AR applications within a video conference environment. The AIGC component 510 may leverage generative AI models to analyze and interpret input data from various sources, including user interactions and environmental contexts, to create virtual objects that seamlessly integrate with the real-world environment displayed during the video conference. These virtual objects can be dynamically generated and manipulated in real-time, enhancing the interactive experience and providing participants with novel ways to communicate and collaborate. The AIGC component 510 may utilize advanced neural network architectures to ensure that the generated objects exhibit high fidelity and are contextually relevant, thereby improving the overall effectiveness and engagement of AR in video conferencing scenarios.

In some implementations, the 3D modeling component 508 may obtain a set of imaging device parameters associated with the imaging device 506 and may use the set of imaging device parameters to facilitate generation of the virtual room information and/or the virtual object information. For example, the set of imaging device parameters may include any one or more parameters associated with the operation of the imaging device 506, the location of the imaging device 506, the capabilities of the imaging device 506, and/or the position of the imaging device 506, among other examples. The imaging device parameters may include, for example, an imaging device position, an imaging device field of view, an imaging device zoom level, an imaging device location (e.g., relative to one or more other aspects of an interaction space and/or a geographic location), an imaging device exposure value, and/or any number of other parameters. The 3D modeling component 508 may use any number of different types of imaging algorithms, which may include AI techniques, to facilitate generating the virtual interaction space information and/or the virtual object information.
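As a minimal illustration of how a set of imaging device parameters might be held and used, the following Python sketch groups several of the parameters named above into one record and derives a pinhole-model focal length from the field of view. The class name, field names, and units are assumptions made for illustration only and are not taken from the disclosure.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImagingDeviceParameters:
    """Hypothetical record of parameters named in the description."""
    position_m: Tuple[float, float, float]  # position in the interaction space, metres
    horizontal_fov_deg: float               # field of view at the current zoom level
    zoom_level: float                       # stored as reported by the device
    exposure_value: float
    image_width_px: int

    def focal_length_px(self) -> float:
        """Pinhole-model focal length in pixels implied by the field of view."""
        half_fov = math.radians(self.horizontal_fov_deg) / 2.0
        return (self.image_width_px / 2.0) / math.tan(half_fov)


params = ImagingDeviceParameters(
    position_m=(0.0, 1.5, 0.0),
    horizontal_fov_deg=90.0,
    zoom_level=1.0,
    exposure_value=0.0,
    image_width_px=1920,
)
print(round(params.focal_length_px(), 1))
```

Under these assumed values, a 90-degree horizontal field of view across 1920 pixels corresponds to a focal length of about 960 pixels, which a modeling step could then use when relating image coordinates to the 3D structure.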

In some implementations, the 3D modeling component 508 may use the set of imaging device parameters to facilitate establishing a virtual location of the virtual object within a virtual representation (referred to as a “virtual interaction space”) of a physical interaction space. The 3D modeling component 508 may provide for output rendering information configured to cause a computing device to present, via a display device, a virtual representation of an interaction space and a virtual object. The rendering information may be configured to cause the computing device to display the virtual object based on the virtual 3D structure. The rendering information may be configured to cause the computing device to display the virtual object in a manner that makes the virtual object appear to interact with a virtual 3D structure of the virtual interaction space. For example, the virtual object may appear to sit on a table or a chair in the virtual interaction space.

During a video conference, the offline preparation component 500 may provide, to the online processing component 502, the virtual interaction space information, the virtual object information, and/or the rendering information. In some implementations, the online processing component 502 may be configured to generate the rendering information based on the virtual interaction space information and/or the virtual object information. As shown, the online processing component 502 may include a user rendering component 512, a virtual object rendering component 514, a user interaction component 516, and a composite rendering component 518. The user rendering component 512, the virtual object rendering component 514, the user interaction component 516, and/or the composite rendering component 518 may be implemented in a single device (e.g., the client 408 or the client 410 shown in FIG. 4, the client 306, the client 308, and/or the client 310 shown in FIG. 3) or in multiple devices. The user rendering component 512, the virtual object rendering component 514, the user interaction component 516, and/or the composite rendering component 518 may include tools, such as programs, subprograms, functions, routines, subroutines, operations, executable instructions, and/or the like.

The user rendering component 512 may include hardware and/or software configured to generate user rendering information to facilitate rendering a virtual representation of a participant in a video conference. The user rendering component 512 may be configured to obtain video data from a video camera at the participant's location and may render the video data within a video conference user interface. The user rendering component 512 may be configured to obtain participant tracking information associated with a participant of the video conferencing platform. For example, the user rendering component 512 may be configured to obtain video data via an imaging device associated with the participant. The imaging device may be a conference room camera, a laptop camera, a mobile device camera, and/or any number of other types of imaging devices.

The user rendering component 512 may include software for generating a human analysis AI model based at least in part on the video data. For example, the user rendering component 512 may perform modeling (e.g., visual modeling, motion modeling, audio modeling, etc.) and/or motion tracking operations associated with perceptible qualities associated with a participant. This feature may require authorization of the participant, an account administrator, and/or another user prior to use. In some implementations, the user rendering component 512 may obtain video data associated with a participant and generate, based at least in part on the video data, user rendering information for rendering a 3D virtual participant depiction associated with a participant. The user rendering component 512 may provide the generated user rendering information to the composite rendering component 518.

The virtual object rendering component 514 is configured to generate data representations of virtual objects. For example, the virtual object rendering component 514 may be configured to obtain virtual object information and to generate virtual object rendering information based on the virtual object information. The virtual object rendering component 514 processes input parameters related to various virtual objects and produces corresponding rendering information. The generated virtual object information encompasses details such as position, orientation, scale, and texture attributes of the virtual objects. In some implementations, the virtual object rendering component 514 may be configured to obtain virtual object information from the 3D modeling component 508. In some implementations, the virtual object rendering component 514 may be configured to obtain virtual object information from the AIGC component 510. In some implementations, the virtual object rendering component 514 may be configured to obtain virtual object information from a content service such as a stock image distribution service, an internet search engine, and/or any other source of content. Once created, this information is transferred to the composite rendering component 518, where it is integrated with additional visual data to produce a final rendered scene.

The user interaction component 516 may include hardware and/or software configured to generate user interaction information to facilitate participant interaction with one or more features of a video conference. Overall, the user interaction component 516 may enhance the interactive experience within virtual interaction spaces by enabling intuitive and responsive manipulation of virtual objects, thereby fostering a more engaging and immersive AR conferencing experience.

For example, the user interaction component 516 may be configured to generate user interaction information to facilitate manipulation, by a participant, of virtual objects in a video conference. The user interaction component 516 may include hardware and software elements designed to detect, interpret, and respond to user inputs, thereby enabling dynamic manipulation and engagement with virtual objects during a video conference. The user interaction component 516 may include gesture recognition capabilities which detect and interpret participant gestures using data obtained from an imaging device. These gestures can be used to select, move, rotate, or trigger effects on virtual objects presented within the virtual representation of the physical conference room. For example, a participant may employ specific hand signals to reposition a virtual object, change its orientation, or initiate an animation sequence associated with the object. The user interaction component 516 may employ AI techniques, incorporating machine learning models to enhance the accuracy and responsiveness of gesture recognition. The AI techniques may include real-time processing of video data to track hand movements, identify key gesture patterns, and translate these patterns into corresponding virtual object manipulations. In some implementations, the AI models may be trained using a dataset of common gestures to ensure robust interaction capabilities.

In addition to gesture recognition, the user interaction component 516 may support other forms of user input such as touch-based interactions, voice commands, and click events. For instance, a touch interface could allow participants to interact with virtual objects via swiping, pinching, or tapping gestures on a touchscreen device. Similarly, voice recognition technology can enable participants to issue verbal commands to manipulate virtual objects, enhancing the accessibility and ease of use of the system.

The user interaction component 516 may be configured to manage and synchronize interactions among multiple participants. For example, the user interaction component 516 may include a coordination mechanism to handle input from different participants, ensuring that changes made by one participant are accurately reflected in the virtual representation for all participants. The coordination mechanism may also implement rules to manage concurrent interactions, such as prioritizing inputs based on predefined criteria or user roles.
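One possible form of such a rule is sketched below in Python: when several participants act on the same virtual object at once, the input from the higher-priority role is kept, with the earlier timestamp used as a tie-breaker. The role names, the priority order, and the `InteractionEvent` fields are hypothetical and are chosen only to illustrate role-based prioritization.

```python
from dataclasses import dataclass

# Assumed role ordering for illustration; lower number means higher priority.
ROLE_PRIORITY = {"host": 0, "presenter": 1, "attendee": 2}


@dataclass(frozen=True)
class InteractionEvent:
    participant: str
    role: str
    object_id: str
    action: str
    timestamp: float  # seconds since the start of the conference


def resolve_concurrent(events):
    """Keep one winning event per virtual object.

    The higher-priority role wins; the earlier timestamp breaks ties.
    """
    winners = {}
    for event in events:
        key = (ROLE_PRIORITY[event.role], event.timestamp)
        current = winners.get(event.object_id)
        if current is None or key < (ROLE_PRIORITY[current.role], current.timestamp):
            winners[event.object_id] = event
    return winners


events = [
    InteractionEvent("participant_a", "attendee", "virtual_cat", "move", 1.0),
    InteractionEvent("participant_b", "host", "virtual_cat", "rotate", 1.2),
]
winners = resolve_concurrent(events)
print(winners["virtual_cat"].action)
```

In this sketch the host's rotation is applied even though the attendee's move arrived first; a different implementation could equally prioritize by arrival order or lock an object to its first manipulator.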

The user interaction component 516 may interface with the composite rendering component 518 to ensure that user interactions are seamlessly integrated into the rendered virtual environment. When a user interaction, such as a detected gesture or voice command, is identified and processed, the user interaction component 516 may generate corresponding interaction data. This data may then be provided to the composite rendering component 518, which may update the visual presentation of the virtual conference environment accordingly.

The user interaction component 516 may be configured to generate user interaction information to facilitate adjustment, by a participant, of one or more display parameters, communication parameters, and/or any other adjustable feature of a video conference. The user interaction component 516 may be configured to obtain user interaction information associated with a participant of the video conferencing platform. In another example, the user interaction component 516 may be configured to generate user interaction information to facilitate rendering a virtual representation of a participant in a video conference. The virtual representation of the participant may be a video image of the participant or a 3D virtual participant depiction of the participant. In some implementations, the user interaction component 516 may obtain video data associated with a participant and generate, based at least in part on the video data, user rendering information for rendering a 3D virtual participant depiction associated with the participant.

To generate user interaction information, the user interaction component 516 may be configured to obtain video data via an imaging device associated with the participant. The imaging device may be a conference room camera, a laptop camera, a mobile device camera, and/or any number of other types of imaging devices. The user interaction component 516 may include software for generating an AI model based at least in part on the video data. The AI model may be user interaction information and/or may be used to generate user interaction information. For example, the user interaction component 516 may perform modeling (e.g., visual modeling, motion modeling, audio modeling, etc.) and/or motion tracking operations associated with perceptible qualities associated with a participant. This feature may require authorization of the participant, an account administrator, and/or another user prior to use. The user interaction component 516 may provide the generated user interaction information to the composite rendering component 518.

The composite rendering component 518 may include hardware and/or software configured to generate composite rendering information to facilitate rendering any number of different aspects of a video conference. The composite rendering component 518 may be configured to obtain user rendering information from the user rendering component 512, virtual object rendering information from the virtual object rendering component 514, and user interaction information from the user interaction component 516. The composite rendering component 518 may be configured to generate composite rendering information based on the user rendering information, the virtual object rendering information, and/or the user interaction information. The composite rendering information may be configured to cause a computing device to present, via a display device, a virtual representation of a 3D interaction space and a virtual object. The rendering information may be configured to cause the computing device to display the virtual object based on the virtual 3D structure. The rendering information may be configured to cause the computing device to display the virtual object in a manner that makes the virtual object appear to interact with a virtual 3D structure of the virtual interaction space.

The composite rendering component 518 may include hardware and/or software elements configured to manage and process diverse rendering inputs. This processing may enable the accurate integration of participant representations, virtual objects, and real-time interactions within a virtual 3D structure that mirrors the 3D interaction space. For example, the user rendering information may include video streams and 3D virtual participant depiction information generated based on modeling and/or motion tracking operations associated with perceptible qualities associated with a participant, while virtual object rendering information may include details such as the position, orientation, texture, and animation of virtual objects. To seamlessly integrate these disparate data sources, the composite rendering component 518 may implement rendering algorithms that ensure that the spatial relationships between virtual objects and the virtual representation of the 3D interaction space are accurately maintained. Additionally, the composite rendering component 518 may support real-time adjustments, allowing for smooth transitions and interactions as participants manipulate virtual objects through gestures, clicks, or voice commands.

The composite rendering component 518 interfaces with the user interaction component 516 to apply interaction data, such as gesture indications or voice command inputs, to the rendering process. This interaction data triggers corresponding visual updates, ensuring that changes made by participants are promptly reflected in the virtual conference environment. For instance, when a participant performs a gesture to rotate a virtual object, the composite rendering component 518 updates the object's orientation in real-time, providing an immediate visual response.
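The update described above can be pictured with a short Python sketch in which interaction data is applied to the state of a virtual object before the next frame is rendered. The `VirtualObject` fields and the dictionary layout of the interaction data are assumptions for illustration; the disclosure does not specify a data format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VirtualObject:
    name: str
    position: Tuple[float, float, float]  # location in the virtual interaction space
    yaw_deg: float = 0.0                  # orientation about the vertical axis


def apply_interaction(obj: VirtualObject, interaction: dict) -> VirtualObject:
    """Apply one piece of interaction data (hypothetical format) to an object."""
    kind = interaction["type"]
    if kind == "rotate":
        # Wrap the angle so the orientation stays within [0, 360).
        obj.yaw_deg = (obj.yaw_deg + interaction["delta_deg"]) % 360.0
    elif kind == "move":
        obj.position = tuple(interaction["target"])
    else:
        raise ValueError(f"unsupported interaction type: {kind}")
    return obj


cat = VirtualObject("virtual_cat", position=(0.0, 0.75, 0.0), yaw_deg=350.0)
apply_interaction(cat, {"type": "rotate", "delta_deg": 30.0})
apply_interaction(cat, {"type": "move", "target": (1.2, 0.45, 0.3)})
print(cat)
```

Keeping the object state separate from the gesture recognizer in this way means the same update path could serve gestures, clicks, and voice commands alike.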

In some implementations, the composite rendering component 518 may include synchronization mechanisms that coordinate rendering activities across multiple user devices. The synchronization mechanisms may facilitate ensuring consistency in the presentation of the virtual environment for all conference participants, regardless of the device or platform they are using. By maintaining synchronized states of virtual objects and representations of participants (e.g., 3D virtual participant depictions), the composite rendering component 518 may enhance the collaborative experience, enabling participants to interact with each other and the virtual environment as though they were physically co-located.

The conferencing system 504 interfaces with the offline preparation component 500 and the online processing component 502 to facilitate the production of an immersive and interactive 3D conferencing experience incorporating AR features. The offline preparation component 500 captures and processes data from the physical environment, generating the foundational elements required for the 3D experience. Subsequently, the online processing component 502 manages real-time data transmission and rendering, allowing dynamic interaction within the 3D space. The conferencing system 504 orchestrates the synchronization and integration of these components, ensuring a seamless transition from offline environment capture to online interactive rendering, thereby producing a cohesive and engaging 3D conferencing experience.

FIG. 6 is a schematic block diagram illustrating an example of an offline preparation process for implementing AR in connection with a conferencing system. The process may be performed by an offline preparation component 600. The offline preparation component 600 may be, be similar to, include, or be included in, the offline preparation component 500 shown in FIG. 5. The offline preparation component 600 may be configured to generate a virtual representation of a 3D interaction space that forms the basis of an AR video conferencing experience.

The offline preparation component 600 may receive video data 602 captured by a conferencing camera. The conferencing camera may be any type of video camera suitable for capturing video footage of a 3D interaction space (e.g., a conference room or an office), such as fixed-position cameras installed in the conference room or mobile cameras, including cell phone cameras. The captured video data 602 serves as input for the offline preparation component 600. The offline preparation component 600 may receive video data 604 associated with a conference room (or other 3D interaction space), capturing the overall environment and spatial dynamics of the room. The offline preparation component 600 may receive video data 606 associated with one or more objects. The video data 606 may include information associated with specific objects within or outside the conference room that are to be integrated into the virtual interaction space. Additionally, the offline preparation component 600 may receive AIGC 608, which may include AI-generated virtual objects. The virtual objects may be generated using generative AI techniques, creating new virtual objects based on learned patterns from existing data.

The video data 602, the video data 604, the video data 606, and/or the AIGC 608 may be processed by a 3D analysis AI model 610. The 3D analysis AI model 610 may perform any number of functions, including the identification and management of objects within the scene. For instance, it may recognize extraneous items, such as a misplaced chair, and exclude them from the virtual interaction space. The 3D analysis AI model 610 may analyze spatial relationships and dimensions, forming a detailed understanding of the structure of the interaction space and the locations of objects therein.

The offline preparation component 600 may include a set of camera calibration tools 612 that may be hardware and/or software employed to perform camera calibration, which may be used to facilitate accurate 3D modeling. For example, the offline preparation component 600 may obtain intrinsic parameters like focal length and lens distortion, and extrinsic parameters like the camera's position and orientation relative to the conference room. The video data 602, the video data 604, the video data 606, and/or the AIGC 608 may be processed by the camera calibration tools 612 and/or may be used for calibration by the camera calibration tools 612. By calibrating the camera, the offline preparation component 600 may ensure that virtual objects will align correctly with the real-world environment.
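To show why intrinsic and extrinsic parameters matter for alignment, the following Python sketch projects a 3D point in the room onto the image plane using the standard pinhole camera model. It is a simplified illustration under stated assumptions: lens distortion is omitted, the extrinsics are expressed as a rotation matrix and translation vector mapping room coordinates to camera coordinates, and the function and parameter names are hypothetical.

```python
def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D point (room coordinates) to pixel coordinates.

    rotation:    3x3 row-major matrix, room frame to camera frame (extrinsic)
    translation: 3-vector, room frame to camera frame (extrinsic)
    fx, fy:      focal lengths in pixels (intrinsic)
    cx, cy:      principal point in pixels (intrinsic)
    Lens distortion is deliberately ignored in this sketch.
    """
    x_cam, y_cam, z_cam = (
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if z_cam <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x_cam / z_cam + cx, fy * y_cam / z_cam + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point(
    point_world=(0.5, 0.25, 2.0),
    rotation=identity,
    translation=(0.0, 0.0, 0.0),
    fx=1000.0, fy=1000.0, cx=960.0, cy=540.0,
)
print(pixel)
```

If the calibrated parameters are wrong, the same virtual point lands on the wrong pixel, which is the misalignment that calibration is intended to prevent.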

The video data 602, the video data 604, the video data 606, and/or the AIGC 608 may be processed by a 3D reconstruction AI model 614. The 3D reconstruction AI model 614 may use the analyzed video data 602, 604, and/or 606 to reconstruct a detailed 3D model 620 of the interaction space and a detailed 3D model 622 of the virtual objects.

The video data 602, the video data 604, the video data 606, and/or the AIGC 608 may be processed using one or more semantic segmentation and object detection processes 616, which may categorize different features within the scene. For instance, the offline preparation component 600 may detect tables, chairs, and screens and segment these objects for accurate representation within the virtual environment and/or removal from the virtual environment.

A set of conferencing camera parameters 618 may be derived from the camera calibration tools 612. The conferencing camera parameters 618 may be used to ensure that all virtual objects and interaction space dimensions are aligned correctly with the actual physical setup. The calibration parameters include details about the camera's intrinsic attributes, such as aperture and focal depth, as well as extrinsic attributes like positioning and angle relative to the interaction space.

The room and objects identified through the 3D analysis and semantic segmentation may be reconstructed into a 3D model 620 of the interaction space and a 3D model 622 of one or more objects. The 3D model 620 of the interaction space includes spatial and structural details captured from the video data 602 and/or the video data 604. The 3D model 622 may be used for replicating the specific characteristics and dimensions of individual objects that may be interacted with during the video conference.

As shown, the offline preparation component 600 generates virtual rendering information 624. This information includes data that is used for rendering virtual objects accurately within the virtual interaction space during an online conference. The virtual rendering information 624 may be configured to ensure that virtual objects are projected correctly with respect to geometric and visual perspectives, providing a seamless and realistic AR experience. For example, the virtual rendering information 624 may include shadow rendering, ensuring that virtual objects cast realistic shadows in the virtual interaction space, and occlusion processing, enabling virtual objects to appear in front of or behind physical objects based on their positions.
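Occlusion processing of the kind mentioned above is commonly handled with a per-pixel depth comparison, and the following Python sketch illustrates that general idea. It assumes, purely for illustration, that a depth value is available for every pixel of the physical scene and of the rendered virtual object; the disclosure does not state how depth is obtained, and the function name and list-of-lists image layout are hypothetical.

```python
def composite_with_occlusion(real_rgb, real_depth, virtual_rgb, virtual_depth):
    """Show a virtual pixel only where it is closer to the camera than the real scene.

    All inputs are row-major lists of lists with identical dimensions.
    A virtual depth of None means the virtual object does not cover that pixel.
    """
    output = []
    for real_row, rd_row, virt_row, vd_row in zip(
        real_rgb, real_depth, virtual_rgb, virtual_depth
    ):
        row = []
        for real_px, rd, virt_px, vd in zip(real_row, rd_row, virt_row, vd_row):
            virtual_is_visible = vd is not None and vd < rd
            row.append(virt_px if virtual_is_visible else real_px)
        output.append(row)
    return output


# One-row example: the virtual object is in front, behind, and absent.
frame = composite_with_occlusion(
    real_rgb=[["room", "room", "room"]],
    real_depth=[[2.0, 2.0, 2.0]],
    virtual_rgb=[["cat", "cat", None]],
    virtual_depth=[[1.0, 3.0, None]],
)
print(frame)
```

In the example, the virtual pixel at depth 1.0 hides the room, while the one at depth 3.0 is hidden by it, so the object appears partly behind the physical scene.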

FIG. 7 is a schematic block diagram illustrating an example of an online process for implementing AR in connection with a conferencing system. The online process may be implemented using an online processing component such as, for example, the online processing component 502 shown in FIG. 5.

The remote client frame 700 captures the user data, which is then processed by a human analysis AI model 702. The human analysis AI model 702 may generate user interaction information 704, which may include, for example, expression vectors, action vectors, and/or 3D virtual participant depiction information, which may be utilized for creating and animating a 3D virtual participant depiction associated with a participant.

The conference room camera frame 706 gathers contextual data from the interaction space. This data may be analyzed by a lightweight 3D analysis AI model 708, which may be used to facilitate an object detection and/or occlusion computation 710. The results of this analysis may be used for accurately integrating virtual elements into the real-world context captured by the conference room camera.

In some implementations, a frame matting component 712 may receive inputs from the conference room camera frame 706 and may use frame matting techniques to distinguish between foreground and background elements. This processed information may be fed into a real-time rendering process 714, which may consolidate various inputs to generate the AR environment.
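Once a matte has separated foreground from background, the layers are typically recombined with standard alpha compositing. The Python sketch below shows that blend for a single pixel. It is offered only as a general illustration of how a matte can be consumed downstream; the disclosure does not specify the matting or blending method, and the function name is hypothetical.

```python
def composite_pixel(foreground, background, alpha):
    """Blend one RGB pixel: alpha * foreground + (1 - alpha) * background.

    alpha is the matte value for the pixel, from 0.0 (pure background)
    to 1.0 (pure foreground).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return tuple(
        round(alpha * f + (1.0 - alpha) * b)
        for f, b in zip(foreground, background)
    )


# A soft matte edge: 25% participant, 75% virtual scene behind them.
blended = composite_pixel((200, 100, 0), (0, 100, 200), 0.25)
print(blended)
```

Fractional matte values along edges such as hair are what allow a foreground participant to blend smoothly with virtual elements placed behind them.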

An AI-based interaction toolbox 716 may permit participants to interact with the virtual elements using hand gestures, enhancing interactivity. A conferencing host frame 718 may be used for managing the overall session, while virtual rendering information 720 provides data for creating virtual elements. Rendering information 720 and virtual rendering information 722 may be used in conjunction with the real-time rendering process 714 to produce the final visual output. This information may facilitate maintaining high-quality visual fidelity. A photorealistic rendering component 724 may employ advanced rendering techniques to enhance the realism of virtual elements, ensuring they seamlessly blend with the real-world background.

FIGS. 8A-8C are diagrams illustrating an example of a virtual interaction space 800. The virtual interaction space 800 may be a representation of a 3D interaction space such as, for example, a conference room. The virtual interaction space 800 includes a virtual table 802 at the center of the scene. The virtual table 802 may be, for example, a rendered video image of a table in the corresponding 3D interaction space. Arranged around the virtual table 802 are virtual chairs 804, 806, and 808. The virtual chairs 804, 806, and 808 may be rendered video images of chairs located in the 3D interaction space. In some implementations, the virtual table 802, the virtual chair 804, the virtual chair 806, and/or the virtual chair 808 may be virtual objects displayed in the virtual interaction space 800 but not appearing in the corresponding 3D interaction space.

In FIG. 8A, a video representation 810 of a participant is shown as being seated in the virtual chair 808. A 3D virtual participant depiction 812 is shown as being seated in the virtual chair 806. In FIG. 8A, the video representation 810 of the participant is depicted interacting with some virtual objects 814 on the virtual table 802, indicating engagement within the virtual interaction space 800. This interaction underlines the collaborative and social aspects of the virtual environment. FIG. 8A also shows a virtual object 816 (in this case, a virtual cat) sitting on the virtual table 802.

FIG. 8B shows another perspective of the virtual interaction space 800, in which the virtual object 816 has been relocated to the virtual chair 804. The relocation of the virtual object 816 may be performed, for example, in response to receiving a user input (e.g., a gesture) that is interpreted as an instruction to relocate the virtual object 816.

FIG. 8C introduces more elements to the virtual interaction space 800. For example, an additional 3D virtual participant depiction 818 has been placed in the scene and is shown as being seated in the virtual chair 804. The virtual object 816 has been relocated back to the virtual table 802. A wall-mounted virtual white board 820 is shown as being attached to a wall of the virtual interaction space 800. The virtual white board 820 may be interacted with by users (e.g., participants in the video conference). For example, participants may be able to interact with content 822 shown as being written on the virtual white board. The user interaction with the content 822 may include adding to the content 822, selecting the content 822, removing the content 822, editing the content 822, and/or the like. The inclusion of the virtual white board 820 also may facilitate an enhanced interactive capability beyond typical real-time communication, extending to collaborative decision-making and idea sharing.

To further describe some implementations in greater detail, reference is next made to examples of techniques which may be performed by or using a system for implementing AR in connection with a video conferencing platform. FIG. 9 is a flowchart of an example of a technique 900 for implementing AR in connection with a video conferencing platform. The technique 900 can be executed using computing devices, such as the systems, hardware, and software described with respect to FIGS. 1-8C. The technique 900 can be performed, for example, by executing a machine-readable program or other computer-executable instructions, such as routines, instructions, programs, or other code. The steps, or operations, of the technique 900, or another technique, method, process, or algorithm described in connection with the implementations disclosed herein can be implemented directly in hardware, firmware, software executed by hardware, circuitry, or a combination thereof. For example, one or more aspects of the technique 900 may be performed by the offline preparation component 500 shown in FIG. 5, the online processing component 502 shown in FIG. 5, the conferencing system 504 shown in FIG. 5, the conferencing system 400 shown in FIG. 4, the client 408 and/or the client 410 shown in FIG. 4, the software platform 300 shown in FIG. 3, the client 306 shown in FIG. 3, the client 308 shown in FIG. 3, the client 310 shown in FIG. 3, the computing device 200 shown in FIG. 2, and/or one or more aspects of the system 100 shown in FIG. 1, among other examples.

For simplicity of explanation, the technique 900 is depicted and described herein as a series of steps or operations. However, the steps or operations of the technique 900 in accordance with this disclosure can occur in various orders and/or concurrently. Additionally, other steps or operations not presented and described herein may be used. Furthermore, not all illustrated steps or operations may be required to implement a technique in accordance with the disclosed subject matter.

The step 902 includes obtaining interaction space image information indicative of an interaction space. The interaction space image information may be obtained via an imaging device.

The step 904 includes generating virtual interaction space information corresponding to a virtual interaction space associated with a video conference, the virtual interaction space comprising a virtual 3D structure representing the interaction space.

The step 906 includes obtaining virtual object information for displaying a virtual object within the virtual interaction space. In some implementations, obtaining the virtual object information comprises obtaining the virtual object information via at least one of the imaging device or an additional imaging device, the virtual object comprising a virtual representation of a physical object. In some implementations, obtaining the virtual object information comprises obtaining the virtual object information via a content service. In some implementations, obtaining the virtual object information comprises obtaining the virtual object information via an artificial intelligence content generation system.

The step 908 includes providing, for output and based on the virtual interaction space information and the virtual object information, rendering information configured to cause a computing device to present, via a display device, the virtual object within the virtual interaction space.
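The four steps above can be pictured as a simple data flow, sketched below in Python. Every function, class, and field name here is hypothetical, and each function body is a stand-in that returns placeholder data rather than performing real capture, reconstruction, or rendering; the sketch only shows how the output of each step feeds the next.

```python
from dataclasses import dataclass

ALLOWED_SOURCES = {"imaging_device", "content_service", "aigc"}


@dataclass
class VirtualInteractionSpace:
    surfaces: dict  # surface name -> height in metres; stand-in for a 3D structure


@dataclass
class VirtualObject:
    name: str
    source: str


def obtain_interaction_space_image_information():
    """Step 902 (stub): placeholder for frames captured by an imaging device."""
    return {"frames": ["frame-0"], "detected_surfaces": {"table": 0.75}}


def generate_virtual_interaction_space(image_info):
    """Step 904 (stub): build a virtual 3D structure from the image information."""
    return VirtualInteractionSpace(surfaces=dict(image_info["detected_surfaces"]))


def obtain_virtual_object(name, source):
    """Step 906 (stub): the object may come from any of the described sources."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source}")
    return VirtualObject(name=name, source=source)


def provide_rendering_information(space, obj, surface):
    """Step 908 (stub): describe where the object should appear in the space."""
    return {
        "object": obj.name,
        "anchor_surface": surface,
        "anchor_height_m": space.surfaces[surface],
    }


image_info = obtain_interaction_space_image_information()
space = generate_virtual_interaction_space(image_info)
cat = obtain_virtual_object("virtual_cat", "aigc")
rendering = provide_rendering_information(space, cat, "table")
print(rendering)
```

As the description notes, the steps need not run in this order or on a single device; the linear arrangement here is only for readability.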

In some implementations, the technique 900 further includes obtaining a gesture indication indicative of detection, in user image information associated with the video conference, of a hand gesture made by a participant; and providing, for output and based on the gesture indication, additional rendering information configured to cause the computing device to present, via the display device, a modification of the virtual object. In some implementations, the technique 900 further includes obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information (e.g., user rendering information) configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant. In some implementations, the technique 900 further includes obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information (e.g., user rendering information) configured to cause the computing device to present, via the display device, the 3D virtual participant depiction in a virtual location within the virtual representation of the interaction space based on a location of the participant. In some implementations, providing the additional rendering information may include providing the additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction in a virtual pose within the virtual interaction space based on a pose of the participant.

In some implementations, the technique 900 further includes obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information (e.g., user rendering information) configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant, wherein providing the additional rendering information comprises providing the additional rendering information configured to cause the computing device to present a virtual action performed by the 3D virtual participant depiction within the virtual interaction space based on an action of the participant.

In some implementations, the technique 900 further includes providing for output additional rendering information configured to cause the computing device to present an additional virtual object within the virtual interaction space, where the additional virtual object is configured to occlude at least a portion of the virtual object. In some implementations, the technique 900 further includes additional rendering information configured to cause the computing device to present, via the display device, the additional virtual object within the virtual interaction space, where the additional virtual object is configured to interact with the virtual object.
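One way to picture occlusion between two virtual objects is a depth comparison over overlapping screen footprints. The following is a minimal sketch under stated assumptions: the `ScreenBox` type, its fields, and the functions `overlaps`, `occludes`, and `draw_order` are hypothetical names introduced for illustration, and the disclosure does not say how occlusion is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScreenBox:
    """Hypothetical screen-space footprint of a virtual object plus its depth."""
    left: float
    top: float
    right: float
    bottom: float
    depth: float  # assumed distance from the virtual camera; smaller is nearer


def overlaps(a: ScreenBox, b: ScreenBox) -> bool:
    """True when the two footprints share some screen area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def occludes(front: ScreenBox, back: ScreenBox) -> bool:
    """True when `front` is nearer the camera and covers part of `back`."""
    return front.depth < back.depth and overlaps(front, back)


def draw_order(boxes: List[ScreenBox]) -> List[ScreenBox]:
    """Far-to-near ordering, so nearer objects are drawn over farther ones."""
    return sorted(boxes, key=lambda box: box.depth, reverse=True)
```

Drawing in far-to-near order is a simple stand-in for occlusion; a renderer with a depth buffer would resolve it per pixel instead.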

Some implementations include a method, comprising: obtaining, via an imaging device, interaction space image information indicative of an interaction space; generating, based on the interaction space image information, virtual interaction space information corresponding to a virtual interaction space associated with a video conference, the virtual interaction space comprising a virtual 3D structure representing the interaction space; obtaining virtual object information for displaying a virtual object within the virtual interaction space; and providing, for output and based on the virtual interaction space information and the virtual object information, rendering information configured to cause a computing device to present, via a display device, the virtual object within the virtual interaction space.
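The four steps of this method can be sketched as a small pipeline. This is an illustration only: every class and function name below is an assumption made for the sketch, and the axis-aligned bounding box is a deliberately crude stand-in for the virtual 3D structure, since the disclosure does not specify data formats or a reconstruction algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class InteractionSpaceImage:
    """Hypothetical image information: 3D points assumed to be recovered
    from frames captured by the imaging device."""
    points: List[Point3D]


@dataclass
class VirtualInteractionSpace:
    """Hypothetical virtual 3D structure: an axis-aligned bounding box."""
    min_corner: Point3D
    max_corner: Point3D


@dataclass
class VirtualObject:
    name: str
    position: Point3D


@dataclass
class RenderingInfo:
    space: VirtualInteractionSpace
    objects: List[VirtualObject]


def generate_virtual_space(image: InteractionSpaceImage) -> VirtualInteractionSpace:
    """Step 2: derive the virtual 3D structure from the image information."""
    if not image.points:
        raise ValueError("image information contains no points")
    xs, ys, zs = zip(*image.points)
    return VirtualInteractionSpace((min(xs), min(ys), min(zs)),
                                   (max(xs), max(ys), max(zs)))


def place_within_space(obj: VirtualObject,
                       space: VirtualInteractionSpace) -> VirtualObject:
    """Clamp the object's position so it lies inside the virtual space."""
    position = tuple(min(max(c, lo), hi) for c, lo, hi
                     in zip(obj.position, space.min_corner, space.max_corner))
    return VirtualObject(obj.name, position)


def provide_rendering_info(space: VirtualInteractionSpace,
                           obj: VirtualObject) -> RenderingInfo:
    """Step 4: bundle the space and the placed object for a client to render."""
    return RenderingInfo(space, [place_within_space(obj, space)])


# Steps 1 and 3 are represented by constructing the inputs directly.
image = InteractionSpaceImage([(0.0, 0.0, 0.0), (4.0, 3.0, 2.5)])
space = generate_virtual_space(image)
info = provide_rendering_info(space, VirtualObject("whiteboard", (5.0, 1.0, 1.0)))
```

Here the whiteboard is requested slightly outside the reconstructed room, so the sketch clamps it to the nearest point inside the space before emitting the rendering information.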

In some implementations, obtaining the virtual object information comprises obtaining the virtual object information via at least one of the imaging device or an additional imaging device, the virtual object comprising a virtual representation of a physical object.

In some implementations, obtaining the virtual object information comprises obtaining the virtual object information via a content service.

In some implementations, obtaining the virtual object information comprises obtaining the virtual object information via an artificial intelligence content generation system.
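The three alternative sources of virtual object information described above (an imaging device, a content service, and an artificial intelligence content generation system) could be selected through a simple dispatch table. The sketch below is hypothetical throughout: the `ObjectSource` enum, the provider functions, and the returned dictionaries are placeholders, not interfaces from the disclosure or from any real service.

```python
from enum import Enum
from typing import Callable, Dict


class ObjectSource(Enum):
    IMAGING_DEVICE = "imaging_device"    # virtual representation of a physical object
    CONTENT_SERVICE = "content_service"  # content provided by a content service
    AI_GENERATION = "ai_generation"      # content produced by an AI generation system


def from_imaging_device(request: str) -> dict:
    # Placeholder: a real system would capture and reconstruct a physical object.
    return {"source": ObjectSource.IMAGING_DEVICE.value, "content": f"scan of {request}"}


def from_content_service(request: str) -> dict:
    # Placeholder: a real system would query an external content service.
    return {"source": ObjectSource.CONTENT_SERVICE.value, "content": f"asset for {request}"}


def from_ai_generation(request: str) -> dict:
    # Placeholder: a real system would call a content generation model.
    return {"source": ObjectSource.AI_GENERATION.value, "content": f"generated {request}"}


PROVIDERS: Dict[ObjectSource, Callable[[str], dict]] = {
    ObjectSource.IMAGING_DEVICE: from_imaging_device,
    ObjectSource.CONTENT_SERVICE: from_content_service,
    ObjectSource.AI_GENERATION: from_ai_generation,
}


def obtain_virtual_object_info(source: ObjectSource, request: str) -> dict:
    """Route the request to whichever source was selected."""
    return PROVIDERS[source](request)
```

Keeping the sources behind one function means the rendering step does not need to know where the virtual object information came from.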

In some implementations, the method further comprises: obtaining a gesture indication indicative of detection, in user image information associated with the video conference, of a hand gesture made by a participant; and providing, for output and based on the gesture indication, additional rendering information configured to cause the computing device to present, via the display device, a modification of the virtual object.

In some implementations, the method further comprises: obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant, wherein providing the additional rendering information comprises providing the additional rendering information configured to cause the computing device to present the 3D virtual participant depiction in a virtual location within the virtual interaction space based on a location of the participant.

In some implementations, the method further comprises: obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant, wherein providing the additional rendering information comprises providing the additional rendering information configured to cause the computing device to present the 3D virtual participant depiction as having a virtual facial expression based on a facial expression of the participant.

In some implementations, the method further comprises: obtaining an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and providing, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant, wherein providing the additional rendering information comprises providing the additional rendering information configured to cause the computing device to present a virtual action performed by the 3D virtual participant depiction based on an action performed by the participant.
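The preceding implementations each tie one attribute of the 3D virtual participant depiction (location, facial expression, action) to the corresponding attribute of the participant. A minimal sketch of that mapping follows; the type names, the string labels, and the uniform `scale` factor between physical and virtual coordinates are assumptions for illustration, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ParticipantObservation:
    """Hypothetical per-frame observation of a participant."""
    location: Tuple[float, float]  # assumed floor position in the physical space
    facial_expression: str         # assumed label, e.g. "smile"
    action: str                    # assumed label, e.g. "wave"


@dataclass
class ParticipantDepiction:
    """Hypothetical state of the 3D virtual participant depiction."""
    virtual_location: Tuple[float, float]
    virtual_facial_expression: str
    virtual_action: str


def update_depiction(observation: ParticipantObservation,
                     scale: float = 1.0) -> ParticipantDepiction:
    """Mirror the participant's observed state onto the virtual depiction."""
    x, y = observation.location
    return ParticipantDepiction(
        virtual_location=(x * scale, y * scale),
        virtual_facial_expression=observation.facial_expression,
        virtual_action=observation.action,
    )
```

For example, `update_depiction(ParticipantObservation((1.5, 2.0), "smile", "wave"), scale=2.0)` places the depiction at twice the physical coordinates while copying the expression and action labels unchanged.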

Some implementations include a non-transitory computer-readable medium storing instructions operable to cause one or more processors to perform operations comprising: obtaining, via an imaging device, interaction space image information indicative of an interaction space; generating, based on the interaction space image information, virtual interaction space information corresponding to a virtual interaction space associated with a video conference, the virtual interaction space comprising a virtual 3D structure representing the interaction space; obtaining virtual object information for displaying a virtual object within the virtual interaction space; and providing, for output and based on the virtual interaction space information and the virtual object information, rendering information configured to cause a computing device to present, via a display device, the virtual object within the virtual interaction space.

In some implementations, the virtual object comprises a virtual representation of a white board.

In some implementations, the virtual object comprises a virtual representation of a white board, and wherein the virtual representation of the white board is configured to be manipulated by a user via an input device of the computing device.

In some implementations, the virtual object comprises a virtual representation of a physical object.

In some implementations, the virtual object information comprises content provided by a content service.

In some implementations, the virtual object information comprises artificial intelligence generated content.

In some implementations, the operations further comprise: obtaining a gesture indication indicative of detection, in user image information associated with the video conference, of a hand gesture made by a participant; and providing, for output and based on the gesture indication, additional rendering information configured to cause the computing device to present, via the display device, a modification of a virtual position of the virtual object.
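A gesture-driven change to the virtual position could look like the sketch below. It assumes that gesture detection has already produced a label and a displacement; the `GestureIndication` and `VirtualObjectState` types, the `"drag"` label, and the `apply_gesture` function are hypothetical and are not defined by the disclosure.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vector3 = Tuple[float, float, float]


@dataclass(frozen=True)
class GestureIndication:
    """Hypothetical result of detecting a hand gesture in user image information."""
    kind: str              # assumed label such as "drag"
    displacement: Vector3  # assumed hand movement mapped into virtual-space units


@dataclass(frozen=True)
class VirtualObjectState:
    position: Vector3


def apply_gesture(state: VirtualObjectState,
                  gesture: GestureIndication) -> VirtualObjectState:
    """Return the modified state; gestures other than "drag" leave it unchanged."""
    if gesture.kind != "drag":
        return state
    moved = tuple(p + d for p, d in zip(state.position, gesture.displacement))
    return replace(state, position=moved)
```

The updated state would then be sent as additional rendering information so that each client presents the virtual object at its new position.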

Some implementations include a system, comprising: one or more memories; and one or more processors configured to execute instructions stored in the one or more memories to: obtain, via an imaging device, interaction space image information indicative of an interaction space; generate, based on the interaction space image information, virtual interaction space information corresponding to a virtual interaction space associated with a video conference, the virtual interaction space comprising a virtual 3D structure representing the interaction space; obtain virtual object information for displaying a virtual object within the virtual interaction space; and provide for output rendering information configured to cause a computing device to present, via a display device and based on the virtual 3D structure, the virtual object within the virtual interaction space.

In some implementations, the one or more processors are further configured to execute the instructions stored in the one or more memories to: obtain an indication of a request, from a participant, to display a 3D virtual participant depiction within the virtual interaction space; and provide, for output and based on the indication, additional rendering information configured to cause the computing device to present, via the display device, the 3D virtual participant depiction within the virtual interaction space, wherein the 3D virtual participant depiction comprises a 3D representation of the participant.

In some implementations, the one or more processors are further configured to provide for output additional rendering information configured to cause the computing device to present an additional virtual object within the virtual interaction space, wherein the additional virtual object is configured to interact with the virtual object.

As used herein, unless explicitly stated otherwise, any term specified in the singular may include its plural version. For example, “a computer that stores data and runs software” may include a single computer that stores data and runs software or two computers: a first computer that stores data and a second computer that runs software. Also, “a computer that stores data and runs software” may include multiple computers that together store data and run software. At least one of the multiple computers stores data, and at least one of the multiple computers runs software.

As used herein, the term “computer-readable medium” encompasses one or more computer-readable media. A computer-readable medium may include any storage unit (or multiple storage units) that stores data or instructions that are readable by processing circuitry. A computer-readable medium may include, for example, at least one of a data repository, a data storage unit, a computer memory, a hard drive, a disk, or a random access memory. A computer-readable medium may include a single computer-readable medium or multiple computer-readable media. A computer-readable medium may be a transitory computer-readable medium or a non-transitory computer-readable medium.

As used herein, the term “memory subsystem” includes one or more memories, where each memory may be a computer-readable medium. A memory subsystem may encompass memory hardware units (e.g., a hard drive or a disk) that store data or instructions in software form. Alternatively, or in addition, the memory subsystem may include data or instructions that are hard-wired into processing circuitry.

As used herein, processing circuitry includes one or more processors. The one or more processors may be arranged in one or more processing units, for example, a central processing unit (CPU), a graphics processing unit (GPU), or a combination of at least one of a CPU or a GPU.

As used herein, the term “engine” may include software, hardware, or a combination of software and hardware. An engine may be implemented using software stored in the memory subsystem. Alternatively, an engine may be hard-wired into processing circuitry. In some cases, an engine includes a combination of software stored in the memory subsystem and hardware that is hard-wired into the processing circuitry.

The implementations of this disclosure can be described in terms of functional block components and various processing operations. Such functional block components can be realized by a number of hardware or software components that perform the specified functions. For example, the disclosed implementations can employ various integrated circuit components (e.g., memory elements, processing elements, logic elements, look-up tables, and the like), which can carry out a variety of functions under the control of one or more microprocessors or other control devices. Similarly, where the elements of the disclosed implementations are implemented using software programming or software elements, the systems and techniques can be implemented with a programming or scripting language, such as C, C++, Java, JavaScript, assembler, or the like, with the various algorithms being implemented with a combination of data structures, objects, processes, routines, or other programming elements.

Functional aspects can be implemented in algorithms that execute on one or more processors. Furthermore, the implementations of the systems and techniques disclosed herein could employ a number of conventional techniques for electronics configuration, signal processing or control, data processing, and the like. The words “mechanism” and “component” are used broadly and are not limited to mechanical or physical implementations, but can include software routines in conjunction with processors, etc. Likewise, the terms “system” or “tool” as used herein and in the figures, but in any event based on their context, may be understood as corresponding to a functional unit implemented using software, hardware (e.g., an integrated circuit, such as an ASIC), or a combination of software and hardware. In certain contexts, such systems or mechanisms may be understood to be a processor-implemented software system or processor-implemented software mechanism that is part of or callable by an executable program, which may itself be wholly or partly composed of such linked systems or mechanisms.

Implementations or portions of implementations of the above disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be a device that can, for example, tangibly contain, store, communicate, or transport a program or data structure for use by or in connection with a processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device.

Other suitable mediums are also available. Such computer-usable or computer-readable media can be referred to as non-transitory memory or media and can include volatile memory or non-volatile memory that can change over time. The quality of memory or media being non-transitory refers to such memory or media storing data for some period of time or otherwise based on device power or a device power cycle. A memory of an apparatus described herein, unless otherwise specified, does not have to be physically contained by the apparatus, but is one that can be accessed remotely by the apparatus, and does not have to be contiguous with other memory that might be physically contained by the apparatus.

While the disclosure has been described in connection with certain implementations, it is to be understood that the disclosure is not to be limited to the disclosed implementations but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures as is permitted under the law.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06T G06T19/6 G06F G06F3/17 G06T15/0

Patent Metadata

Filing Date: August 26, 2024

Publication Date: February 26, 2026

Inventors: Juntao Feng, Wenchong Lin, Bo Ling, Chong Lv, Xingguo Zhu
