Provided are methods, systems, and computer storage media for determining a command (e.g., intent) of an image based on image data features. A task associated with the determined command is generated based on a portion of the image data features. Task entities corresponding to the task are determined. The task and the corresponding task entities are generated and configured for use in a computer productivity application. Accordingly, present embodiments provide an improved technique for generating command-specific tasks and task entities that may be integratable for use in a computer productivity application to enhance functionality of a computer productivity application and reduce computational resources utilized by manually creating these tasks and task entities.
Legal claims defining the scope of protection, as filed with the USPTO.
. At least one computer-storage media having computer-executable instructions embodied thereon that, as a result of being executed by a computing system having a processor and memory, cause the processor to:
. The computer-storage media of, wherein the position profile indicates a set of coordinates within the image associated with alphanumeric characters of the set of alphanumeric characters and the non-alphanumeric-character object.
. The computer-storage media of, wherein determining the command and the set of entities associated with the command further comprises determining a relationship between a first alphanumeric character of the set of alphanumeric characters and non-alphanumeric-character object by at least comparing a first coordinate of the set of coordinates corresponding to the first alphanumeric character and a second coordinate of the set of coordinates corresponding to the non-alphanumeric-character object.
. The computer-storage media of, wherein the relationship indicates a subset of alphanumeric characters of the set of alphanumeric characters, including the alphanumeric character, correspond to a first entity of the set of entities.
. The computer-storage media of, wherein the command is determined based on a set of words included in the set of alphanumeric characters.
. The computer-storage media of, wherein the command comprises at least one of: a recipe, a scheduled event, a list of items, and an action to be completed.
. The computer-storage media of, wherein the second output obtained from the second machine learning model includes image data features.
. The computer-storage media of, wherein the command is determined based on the image data features.
. A system comprising:
. The system of, wherein the second output further comprises image data features extracted from the image, the image data features including visual features associated with the image and spatial features indicative of the relationships between the alphanumeric characters of the set of alphanumeric characters to the non-alphanumeric-character object.
. The system of, wherein the position profile further comprises a coordinate set corresponding to the alphanumeric characters relative to the non-alphanumeric-character object.
. The system of, wherein the processor further performs operations comprising providing the command and the set of entities to an application.
. The system of, wherein providing the command and the set of entities to the application causes the application, without a user interaction, to generate a calendar event based on the command and the set of entities.
. The system of, wherein providing the command and the set of entities to the application causes the application to generate a shopping list, where the set of entities correspond to items within the shopping list.
. A method, comprising:
. The method of, further comprising determining a task based on the command, the task executable by an application.
. The method of, wherein the task causes the application to generate at least one of: a calendar event, a to-do list, a shopping list, a reminder, a meeting invite.
. The method of, wherein determining the command and the set of entities associated with the command further comprises causing a third machine learning model to generate the command and the set of entities by at least providing the first output, the second output, and the position profile as an input.
. The method of, wherein the image comprises a frame of a digital video.
. The method of, wherein the image is captured by a camera of a user device and transmitted to a computer system responsible for extracting the text layout information and the text sequence from the image.
Complete technical specification and implementation details from the patent document.
This application is a continuation of U.S. patent application Ser. No. 17/566,995, filed Dec. 31, 2021, the entire contents of which are incorporated herein in their entirety.
Computer-implemented technologies can assist users in employing various productivity tools. Example productivity tools include computer applications or services such as calendar applications, notification systems, reminders, task-managing services, shopping lists, scheduling tools, recipe organizers, and the like. Existing productivity tools are dependent upon receipt of user inputs and/or control in order to perform operations that assist the user. As an example, a user may want to create a calendar entry based on an image; for instance, the user may see a promotion for a music concert or other event and want to remember it so that they can buy tickets or attend it. Creating a calendar entry in the user's electronic calendar application typically requires a number of operations to be performed by the user on their user computing device. For instance, the operations may require a first input by the user to open the calendar application operating on their computing device, a second input indicative of creation of a new calendar entry, a third input indicative of text describing the new calendar entry name (for example, the music concert), a fourth input indicative of a start time, a fifth input indicative of an end time, a sixth input indicative of a description for the calendar entry, and so forth. Consequently, due to the number of operations required, many users choose not to use the assistive technology and instead may snap a picture or screenshot of the image (for example, the concert promotion) in hopes of remembering it or of creating a calendar entry at a later time.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The technologies described in this disclosure are directed toward computerized systems and methods for providing assistance to a user based on image data. For example, an intent or computer-implemented command (hereinafter “command”) may be inferred from a digital image received by a computer system, and based on the command and the image, a task and corresponding task entity (for example, a person, place, date, or other entity associated with the task) may be determined and utilized to assist the user. In particular, an aspect of the present disclosure may include receiving an image that includes alphanumeric characters and/or non-alphanumeric character objects. Image data may be extracted from the image, based on the alphanumeric characters and/or the non-alphanumeric character objects, to determine at least one image data feature. Based on at least a portion of the image data features, a command is determined. In some instances, a context associated with the image is further determined. A task that corresponds to the determined command or context then may be generated (or otherwise determined and provided) based on the command, a portion of the image data features, and/or a context associated with the image. In some implementations, the task may include at least one task entity that is determined based on the task and/or the image data features.
In this manner, the various implementations described herein provide a personalized technique to computing systems employing computer productivity applications by providing computer-generated tasks and task entities based on a predicted command associated with an image. Whereas conventional approaches fail to determine and generate tasks based on a predicted command, and instead may require extensive user interaction and/or control to generate a task in order to assist the user, aspects of the present disclosure can determine the command and generate associated tasks and task entities based on an image, while reducing client-side interactions necessary to arrive at the intended task. Accordingly, present embodiments provide improved technologies for generating command-specific tasks that can be integrated for use in computer productivity applications to enhance functionality of a computer productivity application and reduce computational resources utilized by manually creating these tasks and/or manually specifying task entities.
The subject matter of aspects of the present disclosure is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described. Each method described herein may comprise a computing process that may be performed using any combination of hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory. The methods may also be embodied as computer-usable instructions stored on computer storage media. The methods may be provided by a standalone application, a service or hosted service (standalone or in combination with another hosted service), or a plug-in to another product, to name a few.
Aspects of this disclosure are directed toward computer technologies for providing assistance to a user based on image information. For example and at a high level, an image, such as a photo or drawing created or provided by a user, may be received by a computer system and processed to ultimately determine a task to be performed by the computer system. The task may correspond to a command of the user, such as, from the earlier example of a promotion for a music concert, creating, via an electronic calendar application, a reminder to the user to purchase tickets or attend the concert. Accordingly, a command may be inferred from a digital image or image information received by a computer system, and based on the command and the image information, a task may be generated or determined. Further, in some instances, one or more corresponding task entities are also determined; for example, the date of the concert, the venue, the band(s) playing, or other entities associated with the task. The task then may be utilized by computer assistance technology to automatically assist the user, such as by performing the task or performing operations to facilitate performance of the task. Thus, continuing the example of the concert, this may include, without limitation: providing a reminder to book tickets to the concert or automatically reserving or purchasing the tickets; creating a calendar entry for the concert in an electronic calendar of the user; facilitating transportation to the concert (such as scheduling a ride-share app or taxi); or other operations associated with the task.
In another example, suppose the image received by the computer system is a shopping list (for instance, suppose a user snapped a photo via their mobile device of their shopping list that was written on a paper sticky note). By employing aspects of the present disclosure, the computer system may determine that the user is attempting to perform an action (for instance, generate a shopping list of items or ingredients for later retrieval to purchase) based on the image of the shopping list. Thus, the computer system may infer an intent or command indicative of generating a shopping list. Based on this command (or in some implementations based on image data features and/or an image context), a task corresponding to the inferred command may be generated (or otherwise determined). Continuing the example of the image of the shopping list, the computer system may generate a shopping list (which, in this example, corresponds to the task), which may comprise creating a new shopping list or updating an existing shopping list of the user to include the items on the received image of the shopping list.
As further described herein, some implementations of the present disclosure include determining a task entity based on a determined task, command, and/or one or more image data features, and generating a task that includes the task entity. Thus, continuing the example of the image of the shopping list, task entities may comprise, without limitation, items on the shopping list, quantities of items, and the name of the store or the type of store. Based on determining the user's shopping list task and image data features, the computer system can determine, for example, specific items on the shopping list (examples of task entities). Thereafter, the computer system may generate a shopping list (in this example, the task) that includes the shopping items (in this example, task entities). The generated shopping list may be configured for consumption by a computer application such as a productivity tool, for instance, for use in a shopping application or to-do application. In one implementation, the generated shopping list task and items (task entities) can be utilized to automatically purchase the items for the user.
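As an illustration only, the relationship between a task and its task entities in the shopping list example might be represented as follows. This is a minimal sketch under stated assumptions: the class names `TaskEntity` and `Task`, their fields, and the helpers `build_shopping_list_task` and `merge_into_existing` are hypothetical and are not taken from the disclosure. The sketch assumes the items have already been extracted from the image.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskEntity:
    """One item of data that goes with a task (hypothetical structure)."""
    kind: str   # e.g. "item", "quantity", "store"
    value: str


@dataclass
class Task:
    """A task generated for a command, carrying its task entities."""
    command: str                                   # e.g. "shopping_list"
    entities: List[TaskEntity] = field(default_factory=list)


def build_shopping_list_task(items: List[str], store: Optional[str] = None) -> Task:
    """Wrap already-extracted shopping items (and an optional store) in a Task."""
    entities = [TaskEntity(kind="item", value=item.strip())
                for item in items if item.strip()]
    if store:
        entities.append(TaskEntity(kind="store", value=store))
    return Task(command="shopping_list", entities=entities)


def merge_into_existing(existing: Task, new: Task) -> Task:
    """Update an existing list with new entities, skipping case-insensitive duplicates."""
    seen = {(e.kind, e.value.lower()) for e in existing.entities}
    merged = list(existing.entities)
    for entity in new.entities:
        key = (entity.kind, entity.value.lower())
        if key not in seen:
            merged.append(entity)
            seen.add(key)
    return Task(command=existing.command, entities=merged)
```

The merge helper reflects the option of updating an existing shopping list rather than creating a new one; a productivity application could consume either result.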
Accordingly, in one aspect and as further described herein, an image may be received by a computer system that includes alphanumeric characters and/or non-alphanumeric character objects. For example, the image may comprise a photo taken by a mobile device or other computing device, a screen capture created via a user device, a drawing created by a user, or an image received via an electronic communication, such as email, SMS, or a chat application, or while browsing the Internet, such as an online advertisement. In some implementations, the image may comprise one or more frames from a video source, such as a digital video or computer animation. An alphanumeric character may include, by way of example, a number, letter, symbol, glyph, and/or any suitable character capable of communicating a message. A non-alphanumeric character object may include a shape, a graphic element such as a line, a photograph or image (or portion thereof) of a person, object, or location, and/or any other suitable object other than an alphanumeric character.
Image data may be extracted from the image, based on the alphanumeric characters and/or the non-alphanumeric character objects. The extracted image data may be processed to determine one or more image data features that correspond to a property or characteristic of the image or an aspect of the image. Thus, the term “image data feature” may refer to a property or characteristic of the image or an aspect of the image. Image data features may be related to or independent of one another, and/or may be recorded using scalar numeric values, binary values, non-binary values, and so forth. Image data features may include visual features associated with the image, as well as spatial features indicative of a position and relationship of the alphanumeric characters and/or the non-alphanumeric character objects with respect to each other. In one embodiment, the coordinates of one or more alphanumeric characters and one or more non-alphanumeric character objects may be compared against each other to determine relationships between the alphanumeric characters and/or the non-alphanumeric character objects. By way of example and without limitation, the image data features may include machine-encoded text information, size and/or position information, color information, relative location or proximity to other text or objects, and so forth, corresponding to the image and its corresponding alphanumeric characters and/or non-alphanumeric character objects.
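The coordinate comparison described above can be sketched as follows. This is an illustrative example under stated assumptions, not the disclosed implementation: it assumes each alphanumeric region and each non-alphanumeric character object has an axis-aligned bounding box in pixel coordinates, and it treats "relationship" as proximity between box centers. The names `Box`, `center_distance`, and `nearest_object`, and the `max_distance` threshold, are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (origin at top-left)."""
    label: str
    x: float
    y: float
    width: float
    height: float

    @property
    def center(self) -> tuple:
        return (self.x + self.width / 2, self.y + self.height / 2)


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def nearest_object(text: Box, objects: List[Box], max_distance: float) -> Optional[Box]:
    """Return the closest non-text object within max_distance, else None."""
    candidates = [(center_distance(text, obj), obj) for obj in objects]
    candidates = [c for c in candidates if c[0] <= max_distance]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]
```

For instance, a text region reading "7 PM" that sits next to a clock icon would be associated with that icon rather than with a distant logo. Other spatial cues, such as overlap or alignment, could be substituted for center distance.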
Based on at least a portion of the image data features, a command is determined. In some instances, a context associated with the image is further determined. By way of example and without limitation, an image context may comprise information about the date and/or time the image was taken or created, location information in the image or available regarding the time the image was created or received, a computing application used to capture, create, or receive the image, metadata about the image such as filename, information from related images such as images created near the same time, location, similar images or images with similar subject matter or by the same user, or other contextual information associated with an image. Some implementations utilize command logic to determine the command, as described herein. A task that corresponds to the determined command or context then may be generated (or otherwise determined) based on: the command, a portion of the set of image data features, and/or the image context. In some implementations, the task may include one or more task entities that are determined, based on the task and/or the set of image data features. In some implementations, task logic may be utilized to determine the task and/or task entities, as described herein.
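One way to picture the image context described above is as a small record per image, with related images found by capture time. The sketch below is illustrative only: the `ImageContext` fields and the `related_by_time` helper are hypothetical names, and selecting related images by a fixed time window is an assumption rather than a method stated in the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class ImageContext:
    """Contextual information about an image (hypothetical field names)."""
    filename: str
    captured_at: datetime
    location: Optional[str] = None     # e.g. a place name, if available
    source_app: Optional[str] = None   # application used to capture or receive the image


def related_by_time(target: ImageContext,
                    others: List[ImageContext],
                    window: timedelta) -> List[ImageContext]:
    """Return images captured within `window` of the target, nearest first."""
    nearby = [
        other for other in others
        if other.filename != target.filename
        and abs(other.captured_at - target.captured_at) <= window
    ]
    return sorted(nearby, key=lambda other: abs(other.captured_at - target.captured_at))
```

Location or subject-matter similarity could be handled with analogous helpers.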
Accordingly, embodiments described herein provide improved technologies to computer systems for providing assistance or productivity services by, among other aspects, enabling a computer system to infer a command from an image and determine or generate a task corresponding to the command for use by a computer productivity application or service. In this way, embodiments provide new, enhanced functionality for these computer productivity applications or services and also reduce computational resources that would be required from manual operation or creation of these tasks, and/or manually specifying task entities.
As described previously, conventional computer productivity tools are dependent upon receipt of user inputs and/or control in order to perform operations that assist the user. For instance, the above example of the music concert required the user to perform a number of operations via an electronic calendar application. Consequently, due to the number of operations required, and sometimes further due to the familiarity the user needs to have with operating a particular computer productivity application or service, many users choose not to use these assistive technologies and instead resort to simpler operations such as taking a picture or screenshot, or jotting a note or a sketch on paper, in hopes of remembering it and their purpose behind creating the image at a later time.
Further, such involved processes by the user also require the computing device to be with the user and may further require a computing device to remain in active operation during the performance by the user of the various operations. Remaining in active operation during this process may drain the computing device's battery and reduce the overall productivity the computing device is able to provide. Further, these productivity computing applications and services are often configured to only accept certain types of data as input, such as typed text or a static image (for example, a picture of an item for a shopping list), and thus require the user to further input a description in order to capture the command or determine a task associated with the data provided by the user. Consequently, many users again opt out of using these conventional productivity tools in view of their various limitations.
More specifically, many users opt for more time-efficient alternatives to managing productivity in lieu of using certain conventional computer productivity tools. For example, a user may post physical sticky notes as reminders or for a shopping list. Similarly, a user may take a screenshot (for example, of a computer-generated image) or a picture (for example, snapped from the camera of the computing device) of the sticky notes, or of an event description to remember attending an event in lieu of entering the event into a calendar application. Additionally or alternatively, the user may take a screenshot or snap a picture of a recipe the user wants to save, an item the user intends to purchase, or any content the user otherwise intends to follow up on. The user may reference the screenshot or picture in the future to remind themselves of the event. However, this alternative picture-taking approach has shortfalls. First, the image remains unstructured data that is unable to be utilized by a computer productivity application to assist the user. Thus, the screenshot or picture may get lost in the user's picture library, or the user may altogether fail to remember to reference the screenshot or picture. Second, unlike the calendar, reminder, task manager, shopping list, or other productivity applications, for example, the mere screenshot or picture fails to notify the user regarding the intention the user had when capturing the image, such as reminding the user about a calendar event (for example, via a computer-generated reminder) or to purchase the item in the picture (for example, via a computer shopping list application).
Although conventional productivity tools may be able to receive and store an image taken by the user, these technologies lack functionality to determine the command from the image in order to assist the user by carrying out operations to achieve the command. Rather these conventional technologies first require dedicated input and/or control operations to be performed by the user in order to guide the productivity tool to provide the assistance. For instance, in the above example of the image promoting the music concert, the user intended to remember the event and date so that the user can purchase tickets and attend. But a conventional electronic calendar application requires the user to perform numerous operations to remember the music concert and date, as described previously.
Similarly, although optical character recognition (OCR) could be applied to an image or screenshot or picture to extract text-specific information, conventional applications of OCR fail to account for many image data features and/or the context of the image data, that is, other data represented in the image such as the font size(s), font style(s), color(s), and other image elements such as borders, pictures/icons, and so forth, as well as the relative positions between the fonts and/or other image elements or image contextual information. Instead, conventional OCR technologies: (1) are limited to employing OCR for a particular task, such as extracting text; (2) fail to account for non-alphanumeric characters in an image; and (3) do not consider contextual image information for determining an intent or command (for example, a predicted action requested to be taken from the image).
With this in mind, some aspects of the disclosed subject matter are generally related to extracting useful information from an image (hereinafter referred to as “image data”) in order to determine a command and then subsequently to determine tasks or task entities (that is, particular items of data that go with the task, such as the shopping list items and the store for a shopping list; in this example, the shopping items and the store are the task entities, and a shopping list of items to purchase is a task). Particular types of tasks may have different task entities. In some instances, the task entities can be extracted and/or determined from the image data and image context (e.g., time the image was taken, location, application used, and so forth) using task logic, as described herein.
As used herein, “command” or “intent” may be used interchangeably to refer to content being requested or an action to be completed. The command may be context-specific. Using command logic (such as the command logicof), command may be based on information derivable from alphanumeric characters (e.g., text associated with a natural language processing (NLP) transcription) or non-alphanumeric character objects. In some contexts, commands may refer to the intentions that can be determined from alphanumeric characters or non-alphanumeric character objects. For example, a list of words may include “1 cup of flour,” “½ teaspoon of baking soda,” “¼ teaspoon of salt,” and so forth. Additionally, the image may include a non-alphanumeric-character object, such as a gingerbread man. Based on this list of words and/or the object, a computing system may determine the command to be a recipe or a shopping list. “Task entity” may refer to data specific to the command and that correspond to a task. Taking generation of a shopping list as an example of a command, the shopping list items and the store may be the task entities for a task that is a shopping list. Similarly, taking generation of a recipe as an exemplary command, the largest sized text indicative of the title of the recipe, as well as the smaller text indicative of specific ingredients may correspond to the task entities for a task that is a recipe.
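To make the word-list example concrete, the sketch below scores text lines against a few cue patterns and picks the best-matching command. It is a deliberately simple rule-based stand-in, not the command logic of the disclosure, which may rely on trained models and on non-alphanumeric character objects as well. The `COMMAND_CUES` table, the command labels, and both function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical cue patterns per command; a real system might learn these instead.
COMMAND_CUES: Dict[str, List[str]] = {
    "recipe": [r"\bcups?\b", r"\bteaspoons?\b", r"\btablespoons?\b",
               r"\bpreheat\b", r"\bbake\b"],
    "scheduled_event": [r"\b\d{1,2}:\d{2}\b", r"\b(?:am|pm)\b",
                        r"\btickets?\b", r"\bdoors open\b"],
    "shopping_list": [r"\bbuy\b", r"\bgrocer(?:y|ies)\b", r"\bshopping\b"],
}


def score_commands(lines: List[str]) -> Dict[str, int]:
    """Count, per command, how many lines match at least one cue pattern."""
    scores = {command: 0 for command in COMMAND_CUES}
    for line in lines:
        text = line.lower()
        for command, patterns in COMMAND_CUES.items():
            if any(re.search(pattern, text) for pattern in patterns):
                scores[command] += 1
    return scores


def infer_command(lines: List[str]) -> Optional[str]:
    """Return the highest-scoring command, or None when nothing matches."""
    scores = score_commands(lines)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With the ingredient lines from the example above, every line matches a recipe cue, so the sketch selects the recipe command. Ties are resolved by table order, which is an arbitrary choice made for brevity.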
Turning now to, a block diagram is provided showing an example operating environment in which some embodiments of the present disclosure may be employed. It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, and groupings of functions) can be used in addition to or instead of those shown, and some elements may be omitted altogether for the sake of clarity. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, some functions may be carried out by a processor or processing circuitry executing instructions stored in memory.
Among other components not shown, example operating environmentincludes a number of user devices, such as user devicesandthrough; a number of data sources, such as data sourcesandthrough; server; displaysandthrough; and network. It should be understood that environmentshown inis an example of one suitable operating environment. Each of the components shown inmay be implemented via any type of computing device, such as computing devicedescribed in connection to, for example. These components may communicate with each other via network, which may include, without limitation, one or more local area networks (LANs) and/or wide area networks (WANs). In exemplary implementations, networkcomprises the Internet and/or a cellular network, amongst any of a variety of possible public and/or private networks employing any suitable communication protocol.
It should be understood that any number of user devices, servers, and data sources may be employed within the operating environment within the scope of the present disclosure. Each may comprise a single device or multiple devices cooperating in a distributed environment. For instance, the server may be provided via multiple devices arranged in a distributed environment that collectively provide the functionality described herein. Additionally, other components not shown may also be included within the distributed environment.
User devicesandthroughcan be client devices on the client-side of operating environment, while servercan be on the server-side of operating environment. Servercan comprise server-side software designed to work in conjunction with client-side software on user devicesandthroughto implement any combination of the features and functionalities discussed in the present disclosure. This division of operating environmentis provided to illustrate one example of a suitable environment, and there is no requirement for each implementation that any combination of serverand user devicesandthroughremain as separate entities. The displaysandthroughmay be integrated into the user devicesandthrough. In one embodiment, the displaysandthroughare touchscreen displays.
User devicesandthroughmay comprise any type of computing device capable of use by a user. For example, in one embodiment, user devicesthroughmay be the type of computing devicedescribed in relation to. By way of example and not limitation, a user device may be embodied as a personal computer (PC), a laptop computer, a mobile device, a smartphone, a tablet computer, a smart watch, a wearable computer, a personal digital assistant (PDA), a music player or an MP3 player, a global positioning system (GPS) or device, a video player, a handheld communications device, a gaming device or system, an entertainment system, a vehicle computer system, an embedded system controller, a camera, a remote control, a bar code scanner, a computerized measuring device, an appliance, a consumer electronic device, a workstation, or any combination of these delineated devices, or any other suitable computer device.
Data sourcesandthroughmay comprise data sources and/or data systems, which are configured to make data available to any of the various constituents of operating environment, or systemdescribed in connection to. (For instance, in one embodiment, one or more data sourcesthroughprovide (or make available for accessing) the task and task entities generated by the command classification and task engineofand deployed by the task and entity deploying engineof.) Data sourcesandthroughmay be discrete from user devicesandthroughand server. Alternatively, the data sourcesthroughmay be incorporated and/or integrated into at least one of those components. In one embodiment, one or more of data sourcesthroughmay be integrated into, associated with, and/or accessible to one or more of the user device(s),, oror server. Examples of computations performed by severor user devices, and/or corresponding data made available by data sourcesa throughare described further in connection to systemof.
Operating environmentcan be utilized to implement one or more of the components of system, described in association with. Operating environmentalso can be utilized for implementing aspects of process flowsanddescribed in, respectively. Turning to, depicted is a block diagram illustrating an example systemin which some embodiments of this disclosure are employed. Systemrepresents only one example of a suitable computing system architecture. Other arrangements and elements can be used in addition to or instead of those shown, and some elements may be omitted altogether for the sake of clarity. Further, as with operating environment, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location.
Example systemincludes network, which is described in connection to, and which communicatively couples components of systemincluding command classification and task engine(which includes image collector, image partitioning engine, command classification engine, feature training module, entity determining engine, task generating engine), model generating engine(which includes model initializer, model trainer, model evaluator, and model deploying engine), and storage(which includes command logicand task logic), and task and entity deploying engine. The command classification and task engineand the model generating enginemay be embodied as a set of compiled computer instructions or functions, program modules, computer software services, or an arrangement of processes carried out on one or more computer systems, such as computing devicedescribed in connection to, for example.
In one embodiment, the functions performed by components of the system are associated with one or more applications, services, or routines. In one embodiment, certain applications, services, or routines may operate on one or more user devices, may operate on one or more servers, may be distributed across one or more user devices and servers, or may be implemented in a cloud-based system. Moreover, in some embodiments, these components of the system may be distributed across a network, including one or more servers and client devices, in the cloud, or may reside on a user device. Moreover, these components, the functions performed by these components, or the services carried out by these components may be implemented at appropriate abstraction layer(s) of the computing system(s), such as the operating system layer, application layer, hardware layer, and so forth. Alternatively, or in addition, the functionality of these components and/or the embodiments of the disclosure described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth. Additionally, although functionality is described herein with reference to specific components shown in the example system, it is contemplated that in some embodiments functionality of these components can be shared or distributed across other components.
Continuing with the example system, the command classification and task engine is generally responsible for determining the command associated with an image and generating a task that includes at least one task entity, as described herein. In this manner, embodiments disclosed herein may improve the functionality of productivity tools and reduce the time a computing device spends in operation to generate a task. The image collector of the command classification and task engine may be configured to receive or access an image, such as a photograph, a screenshot, a saved document, and/or any content formatted in any suitable manner. Example image formats include, but are not limited to, Joint Photographic Experts Group (JPEG/JFIF), Exchangeable image file format (Exif), Tagged Image File Format (TIFF), Graphics Interchange Format (GIF), BMP file format (Windows bitmap), Portable Network Graphics (PNG), Portable Pixmap (PPM), WebP, BAT, and the like.
In some embodiments, the image may be received in response to a user taking a picture (via a camera device of the computing device), in response to the user uploading the image to a software application associated with the command classification engine, or by any other suitable means for communicating an image to the command classification and task engine. For example, a GUI presented to the user may receive a first user input indicative of an option to upload an image and may receive a second user input indicative of a selection of an image to be uploaded. Alternatively or additionally, it should be understood that the image collector may integrate (e.g., via a suitable application programming interface (API)) with a photos application, a camera application, and the like of a computing device, such that an image may be directly communicated from the photos application, the camera application, and the like, to the command classification and task engine by way of the image collector. In some embodiments, the image collector receives the image as raw data.
The image partitioning engine is configured to divide the image into computer-recognizable components. In some embodiments, the image partitioning engine is configured to determine alphanumeric characters and non-alphanumeric-character objects. The image partitioning engine may determine each alphanumeric character (such as each letter, symbol, character, and the like), or may determine a sequence of alphanumeric characters as a phrase or word. The image partitioning engine may determine a non-alphanumeric-character object, such as a person or an item, and its corresponding content. For example, the image partitioning engine may determine a person in the image, and may determine the face of the person, the body of the person, a color of the clothes worn by the person, and so forth. In one embodiment, the image partitioning engine is able to assign genus-species relationships between the alphanumeric characters and their subcomponents and between the non-alphanumeric-character objects and their subcomponents. For example, the image partitioning engine may determine a sequence of alphanumeric characters (e.g., a genus) and each individual character in the sequence (e.g., the species of the genus), and relate the sequence to its respective individual characters via a genus-species relationship.
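The genus-species relationship described above can be pictured as a small tree. The following is only a sketch under stated assumptions: the `ImageElement` class, its fields, and the `word_element` helper are hypothetical names introduced for illustration and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class ImageElement:
    """Hypothetical node for one partitioned element of an image."""
    label: str
    kind: str  # e.g. "word", "character", "person", "face"
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a species (child) to this genus and return the child."""
        self.children.append(child)
        return child

    def species(self):
        """Yield the labels of all descendants, depth-first."""
        for child in self.children:
            yield child.label
            yield from child.species()


def word_element(word):
    """Build a genus (the word) whose species are its individual characters."""
    genus = ImageElement(word, "word")
    for character in word:
        genus.add(ImageElement(character, "character"))
    return genus


# A non-alphanumeric-character object can use the same structure.
person = ImageElement("person", "person")
person.add(ImageElement("face", "face"))
person.add(ImageElement("body", "body"))
```

Under these assumptions, `list(word_element("Milk").species())` lists the four characters of the word in order, and `list(person.species())` lists the person's subcomponents.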
Furthermore, the image partitioning engine may determine a position of the alphanumeric characters and the non-alphanumeric-character objects. In some embodiments, the image partitioning engine may determine coordinates of the alphanumeric characters and the non-alphanumeric-character objects relative to the entire image, relative to each other, and the like. In one embodiment, the image partitioning engine may divide the image into any number of partitions. For example, the image partitioning engine may divide the image into a grid (for example, a 100 by 100 grid having 10,000 grid elements) having uniform or non-uniform grid elements. The image partitioning engine may assign x-coordinates (for example, horizontal coordinates) and y-coordinates (for example, vertical coordinates). In one embodiment, the axes of the x-coordinates and the y-coordinates may be perpendicular to each other. Coordinates of each of the alphanumeric characters and non-alphanumeric-character objects may be determined based on the grid and/or the x and y coordinates.
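As an illustration of the grid-based coordinates described above, the sketch below maps a pixel-space bounding box to a cell of a uniform 100 by 100 grid. The `BoundingBox` type, the `grid_cell` function, and the choice of the box center as the reference point are assumptions made for this example; the disclosure does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical pixel-space box; (left, top) is the upper-left corner."""
    left: float
    top: float
    width: float
    height: float


def grid_cell(box, image_width, image_height, columns=100, rows=100):
    """Return the (x, y) grid coordinates of the cell containing the box center."""
    center_x = box.left + box.width / 2
    center_y = box.top + box.height / 2
    # Floor division picks the cell; clamping keeps a center that lies exactly
    # on the right or bottom edge inside the last column or row.
    x = min(int(center_x * columns // image_width), columns - 1)
    y = min(int(center_y * rows // image_height), rows - 1)
    return x, y


word_box = BoundingBox(left=120, top=40, width=200, height=30)
cell = grid_cell(word_box, image_width=1000, image_height=800)
```

A non-uniform grid would replace the floor division with a lookup against explicit cell boundaries.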
In one embodiment, the image partitioning engine is configured to generate B-Box Embeddings of the alphanumeric characters and the non-alphanumeric-character objects identified in the image. Generating the B-Box Embedding may include determining a height, mean elevation, elevation, distance from an edge of the image, and/or other properties for the alphanumeric character and the non-alphanumeric-character object in the image to generate a respective position or position profile for the alphanumeric character and the non-alphanumeric-character object. The position profile may include a set of coordinates associated with the alphanumeric character and the non-alphanumeric-character object. The set of coordinates may be relative to the entire image or relative to the set of coordinates of other alphanumeric characters and/or other non-alphanumeric-character objects. In this manner, the alphanumeric characters and the non-alphanumeric-character objects identified in the image may be better related to each other, as well as to the entire image, to better determine the image data. Indeed, a computing system may be better able to determine image data based on a relationship between the position profile of the alphanumeric characters and that of the non-alphanumeric-character objects.
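The position profile can be sketched as a handful of normalized measurements taken from a bounding box. The field names below, the normalization by image size, and the `relative_offset` helper are assumptions chosen for illustration; the disclosure does not define the exact contents of a B-Box Embedding.

```python
def position_profile(box, image_width, image_height):
    """Build a hypothetical position profile from a (left, top, width, height) box.

    All values are fractions of the image size, so profiles from images of
    different resolutions are comparable.
    """
    left, top, width, height = box
    return {
        "height": height / image_height,
        "center_x": (left + width / 2) / image_width,
        "center_y": (top + height / 2) / image_height,  # stands in for elevation
        "distance_from_left": left / image_width,
        "distance_from_top": top / image_height,
    }


def relative_offset(profile_a, profile_b):
    """Return the (dx, dy) offset of element B relative to element A."""
    return (
        profile_b["center_x"] - profile_a["center_x"],
        profile_b["center_y"] - profile_a["center_y"],
    )


text_profile = position_profile((100, 200, 200, 100), 1000, 1000)
object_profile = position_profile((500, 200, 100, 100), 1000, 1000)
offset = relative_offset(text_profile, object_profile)
```

Here the two elements share a vertical center, so the vertical offset is zero and the horizontal offset is positive, which suggests the object sits to the right of the text on the same line.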
The image partitioning engine may determine and generate image data. In some embodiments, the image partitioning engine may determine and/or generate the image data based on the partitions of the image and/or based on the position profile of the alphanumeric characters and of the non-alphanumeric-character objects. For example, the image partitioning engine may process the raw image data and generate image data as discussed below. The image data may include machine-encoded text information, position information, color information, and so forth, corresponding to the image and its corresponding alphanumeric characters and/or non-alphanumeric-character objects. The image partitioning engine may extract image data for the image based on the alphanumeric characters and the non-alphanumeric-character objects. Extracting image data may include determining the partitioned elements in the image (e.g., the alphanumeric characters and the non-alphanumeric-character objects) and a position profile for the alphanumeric characters and the non-alphanumeric-character objects. As discussed below with respect to the model generating engine, the image data may be processed by the model generating engine to train and generate a machine learning model.
Continuing with the example system, the command classification engine is configured with computing logic, such as the command logic, to determine the command of the image. The command classification engine may determine the command based on the command logic. In some embodiments, the command classification engine determines a command of the image based on the image data. For example, the command classification engine may employ optical character recognition (OCR) methodologies to determine a context and meaning of text (e.g., alphanumeric characters) identified in the image. The command logic may define logic for using OCR. In some embodiments, the command classification engine may employ a machine learning model that is trained and generated by the model generating engine. Example machine learning models include a neural network model, a logistic regression model, a support vector machine model, and the like.
The command classification engine may determine the command based on a machine learning model that is trained based on a set of image data features. The feature training module may be configured with computing logic, such as the command logic, to determine and generate image data features that may be used to train the machine learning model. In one embodiment, the feature training module may determine the image data features used to train the machine learning model via any suitable process. For example, the feature training module may determine the image data features via any suitable engineering process, which may include at least one of the following steps: brainstorming or testing features, deciding which features to create, creating the features, testing the impact of the created features on a task or training data, and iteratively improving features. Image data features may be engineered by the feature training module using any suitable computations, including, but not limited to, (1) numerical transformation (e.g., taking fractions or scaling), (2) employing a category encoder to categorize data, (3) clustering techniques, (4) group aggregation values, (5) principal component analysis, and the like. In some embodiments, the feature training module may assign different levels of significance to the image data, such that image data features that have a higher level of significance are weighted more heavily when the model trainer trains the machine learning model. In this manner, the model trainer may prioritize and/or rank image data features to improve command determination.
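Three of the computations listed above (numerical scaling, category encoding, and significance weighting) can be sketched in a few lines. The function names, the min-max scaling choice, and the one-hot encoding are assumptions made for this example, not the engineering process the disclosure requires.

```python
def min_max_scale(values):
    """Numerical transformation: rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # avoid dividing by zero on constant input
    return [(value - low) / (high - low) for value in values]


def one_hot(value, categories):
    """Category encoder: mark which known category the value belongs to."""
    return [1.0 if value == category else 0.0 for category in categories]


def apply_weights(features, weights):
    """Scale each feature by its (hypothetical) level of significance."""
    return [feature * weight for feature, weight in zip(features, weights)]


# Example: character heights in pixels, plus the kind of one element.
scaled_heights = min_max_scale([2, 4, 6])
kind_vector = one_hot("digit", ["letter", "digit", "symbol"])
weighted = apply_weights([1.0, 0.5], [2.0, 1.0])
```

Multiplying a feature by a larger weight is only one way to express significance; a trainer could equally apply per-feature regularization or sampling.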
The command classification engine may employ any suitable classification or prediction algorithm to classify and/or predict the command of an image, for example, based on the image data features. The command classification engine may classify the command as a user request to generate content based on the image. Example content may include a calendar event, a recipe, a shopping list, a reminder, or a message (for example, an email, a text message, a social media post, and the like). Therefore, the command classification engine may determine, based on the image, that a user wishes to generate a calendar event, a recipe, a shopping list, a reminder, a message, and so forth. It should be understood that the embodiments disclosed herein may be broadly applied to predict any suitable intent or computer command other than those described in this paragraph.
The task generating engine may determine at least one task that corresponds to the command determined by the command classification engine. In some embodiments, the task generating engine employs task logic to determine the task. The task logic may define intent-specific instructions for determining the task. The intent-specific instructions may include a subset of the entire task logic, thereby improving the speed by which the task generating engine is able to generate the task. For example, in response to the command classification engine classifying the command as generating content associated with food, the task generating engine may employ task logic associated with food. In this example, the task logic associated with food may indicate a recipe or a shopping list. Based on the image data features, the task generating engine may determine that the task associated with this command corresponds to a shopping list.
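The idea of consulting only the intent-specific subset of the task logic can be sketched as a dictionary lookup followed by a simple scoring rule. The `TASK_LOGIC` table, its keywords, and the keyword-overlap scoring are hypothetical; they stand in for whatever task logic an embodiment actually stores.

```python
# Hypothetical task logic: each command maps to candidate tasks and the
# keywords that hint at each one.
TASK_LOGIC = {
    "food": [
        ("shopping_list", {"buy", "grocery", "store"}),
        ("recipe", {"cup", "tbsp", "bake", "mix"}),
    ],
    "schedule": [
        ("calendar_event", {"am", "pm", "meeting"}),
        ("reminder", {"remember", "due"}),
    ],
}


def determine_task(command, tokens):
    """Pick the task for a command using only that command's subset of the logic."""
    rules = TASK_LOGIC.get(command, [])  # other commands' rules are never scanned
    words = {token.lower() for token in tokens}
    best_task, best_score = None, 0
    for task, keywords in rules:
        score = len(words & keywords)
        if score > best_score:
            best_task, best_score = task, score
    return best_task


task = determine_task("food", ["Grocery", "store", "milk", "eggs"])
```

Because the lookup narrows the rules before any scoring happens, the cost of choosing a task grows with the size of one command's rule set rather than with the whole table.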
Continuing with the example system, the entity determining engine may determine task entities associated with the task generated by the task generating engine. In some embodiments, the entity determining engine may employ the task logic to determine task entities that are specific to a particular task. Continuing the food example above, where the task is a recipe, the entity determining engine may determine the ingredients and the title of the recipe based on the task. For example, the image data may include certain alphanumeric characters in close proximity to each other (based on the corresponding position profile) indicative of food and corresponding units (and numbers) of measurement. The entity determining engine may communicate these task entities to the task generating engine so that the task (e.g., the recipe) is generated with the task entities (e.g., the ingredients, their units of measurement, and the title). The task entities may have been included as alphanumeric characters or non-alphanumeric-character objects in the image. Example tasks and task entities are depicted in, and described below with respect to, the accompanying figures.
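The proximity-based grouping described above can be sketched by collecting tokens that share roughly the same vertical coordinate and then reading each row as a quantity, a unit, and an ingredient name. The token format, the `tolerance` value, and the `UNITS` set are assumptions made for this example.

```python
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "ml"}  # hypothetical unit vocabulary


def group_into_rows(tokens, tolerance=10):
    """Group (text, x, y) tokens whose y is within `tolerance` of a row's first token."""
    rows = []
    for text, x, y in sorted(tokens, key=lambda token: (token[2], token[1])):
        if rows and abs(y - rows[-1]["y"]) <= tolerance:
            rows[-1]["items"].append((x, text))
        else:
            rows.append({"y": y, "items": [(x, text)]})
    # Within each row, read the tokens left to right.
    return [[text for _, text in sorted(row["items"])] for row in rows]


def to_entity(row):
    """Interpret one row as a quantity, an optional unit, and an ingredient name."""
    has_quantity = bool(row) and row[0].replace(".", "", 1).isdigit()
    quantity = row[0] if has_quantity else None
    rest = row[1:] if has_quantity else row
    unit = rest[0] if rest and rest[0].lower() in UNITS else None
    name = " ".join(rest[1:] if unit is not None else rest)
    return {"quantity": quantity, "unit": unit, "name": name}


tokens = [("flour", 120, 52), ("2", 20, 50), ("cups", 60, 51),
          ("1", 20, 90), ("egg", 60, 91)]
entities = [to_entity(row) for row in group_into_rows(tokens)]
```

With these inputs the first row yields quantity "2", unit "cups", and name "flour", and the second row yields quantity "1", no unit, and name "egg".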
Although the example system was discussed in the context of determining one command, one corresponding task, and a few task entities, it should be understood that any number of commands, corresponding tasks, and task entities may be determined and/or generated. For example, two commands may be determined, such that a user (e.g., via the user device) may make a selection indicating for which of the two commands a corresponding task and task entities should be generated. Similarly, a user may select both commands such that both tasks and corresponding entities are generated based on the embodiments disclosed herein.
The task and entity deploying engine may be configured with computing logic to configure the generated task and entities for use in any suitable abstraction layer, for example, of the user device. In some embodiments, the task and entity deploying engine may receive the task from the task generating engine and the task entity from the entity determining engine. Based on the command of the task, the task, or the task entity, the task and entity deploying engine may deploy the task and the task entity to an associated software application, such as any suitable computer productivity application. For example, in response to determining the command to correspond to generating a shopping list, the task and entity deploying engine may format and deploy the task and task entity to a reminder software application or a to-do productivity software application. In some embodiments, the task and entity deploying engine may communicate with any software application via any suitable API or other communication means. Although this example includes the task and entity deploying engine formatting, configuring, and communicating the task and task entity for use in a software application of an application layer, it should be understood that the task and entity deploying engine may format, configure, and communicate the task to any suitable abstraction layer, such as an operating system layer, another application layer, or a hardware layer.
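Formatting a task and its entities for a to-do style application might look like the sketch below. The payload shape, the field names, and the `format_for_todo_app` function are purely hypothetical; a real productivity application would define its own API and schema.

```python
import json


def format_for_todo_app(task, entities):
    """Serialize a task and its entities into a hypothetical to-do payload."""
    payload = {
        "title": task["title"],
        "type": task["type"],
        "items": [{"text": entity, "done": False} for entity in entities],
    }
    # sort_keys makes the serialized form stable, which helps when comparing payloads.
    return json.dumps(payload, sort_keys=True)


shopping_task = {"title": "Groceries", "type": "shopping_list"}
serialized = format_for_todo_app(shopping_task, ["milk", "eggs"])
```

The serialized string could then be handed to whichever communication means the target application exposes.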
Continuing with the example system, the model generating engine may train and generate a machine learning model that may be employed by the command classification and task engine. The model initializer may select and initialize a machine learning model. As discussed above, example machine learning models include a neural network model, a logistic regression model, a support vector machine model, and the like. Initializing the machine learning model may include causing the model initializer to determine model parameters and provide initial conditions for the model parameters. In one embodiment, the initial conditions for the model parameters may include a coefficient for the model parameter.
The model trainer may train the machine learning model determined by the model initializer. As part of training the machine learning model, the model trainer may receive outputs from the model initializer. In some embodiments, the model trainer may receive the type of machine learning model, the loss function associated with the machine learning model, the parameters used to train the machine learning model, and the initial conditions for the model parameters. Example loss functions include a standard cross entropy loss function, a focal loss function, a dice loss function, and a self-adjusting loss function, to name a few. The model trainer may iteratively train the machine learning model. In one embodiment, training the machine learning model may include employing an optimizer that causes the machine learning model to continue to be trained using the training data until certain conditions are met, for example, as determined by the model evaluator. Alternatively, the model trainer may feed one set of training data to the machine learning model to generate a predicted output that is used by the model evaluator.
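The iterative training described above can be illustrated with a small logistic regression model trained by gradient descent on a cross entropy loss. The model choice, the zero initial conditions, the learning rate, and the stopping rule (`target_loss` or `max_epochs`) are assumptions for this sketch; the toy data is made up.

```python
import math


def predict_probability(weights, bias, features):
    """Logistic regression: probability that the label is 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy_loss(weights, bias, examples):
    """Mean binary cross entropy over (features, label) pairs."""
    total = 0.0
    for features, label in examples:
        p = predict_probability(weights, bias, features)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(examples)


def train(examples, learning_rate=0.5, target_loss=0.1, max_epochs=5000):
    """Run gradient descent until the loss condition is met or epochs run out."""
    weights = [0.0] * len(examples[0][0])  # initial conditions for the parameters
    bias = 0.0
    for _ in range(max_epochs):
        if cross_entropy_loss(weights, bias, examples) <= target_loss:
            break
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in examples:
            error = predict_probability(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        n = len(examples)
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias, cross_entropy_loss(weights, bias, examples)


# Made-up, linearly separable toy data: two features per example.
EXAMPLES = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
weights, bias, final_loss = train(EXAMPLES)
```

In an embodiment the stopping condition could instead come from a separate evaluator, as the text notes; the loss threshold here simply keeps the sketch self-contained.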
Example training data includes any labeled data or unlabeled data. For example, training data may include computing device information (such as charging data, date/time, or other information derived from a computing device), user-activity information (for example: app usage; online activity; searches; browsing certain types of webpages; listening to music; taking pictures; voice data such as automatic speech recognition; activity logs; communications data including calls, texts, instant messages, and emails; website posts; other user data associated with communication events; other user interactions with a user device, and so forth) including user activity that occurs over more than one user device, user history, session logs, application data, contacts data, calendar and schedule data, notification data, social network data, news (including popular or trending items on search engines or social networks), online gaming data, ecommerce activity (including data from online accounts such as Microsoft®, Amazon.com®, Google®, eBay®, PayPal®, video-streaming services, gaming services, or Xbox Live®), user-account(s) data (which may include data from user preferences or settings associated with a personalization-related (e.g., “personal assistant” or “virtual assistant”) application or service), home-sensor data, appliance data, global positioning system (GPS) data, vehicle signal data, traffic data, weather data (including forecasts), wearable device data, other user device data (which may include device settings, profiles, network-related information (e.g., network name or ID, domain information, workgroup information, other network connection data, Wi-Fi network data, or configuration data, data regarding the model number, firmware, or equipment, device pairings, such as where a user has a mobile phone paired with a Bluetooth headset, for example, or other network-related information)), gyroscope data, accelerometer data, payment or credit card usage data (which may 
include information from a user's PayPal account), purchase history data (such as information from a user's Xbox Live, Amazon.com or eBay account), other data that may be sensed or otherwise detected, data derived based on other data (for example, location data that can be derived from Wi-Fi, cellular network, or IP (internet protocol) address data), calendar items specified in user's electronic calendar, and nearly any other data that may be used to train a machine learning model, as described herein.
The model evaluator may evaluate the accuracy of the machine learning model trained by the model trainer. In some embodiments, the model evaluator is configured to assess the accuracy of the model based on a loss (e.g., error) determined based on the loss function. The model evaluator may validate the machine learning model. In some embodiments, the model evaluator may validate the machine learning model based on training data used for validation purposes instead of training purposes. In some embodiments, the training data used by the model evaluator to validate the machine learning model may correspond to training data different from the training data used by the model trainer to train the machine learning model. In some embodiments, the training data received via the model generating engine may be split into training data used by the model trainer and training data used by the model evaluator. In one embodiment, the training data used by the model evaluator may be unlabeled, while the training data used by the model trainer may be labeled.
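Splitting the received data into a portion for the trainer and a portion for the evaluator can be sketched as a seeded shuffle followed by a cut. The `split_training_data` name, the 20 percent validation fraction, and the fixed seed are assumptions for illustration.

```python
import random


def split_training_data(examples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (training, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    validation_count = round(len(shuffled) * validation_fraction)
    cut = len(shuffled) - validation_count
    return shuffled[:cut], shuffled[cut:]


training_set, validation_set = split_training_data(list(range(10)))
```

Every example lands in exactly one of the two sets, so the evaluator never scores the model on data the trainer has already seen.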
The model evaluator may validate the machine learning model based on a score function. The score function may facilitate determining probabilistic scores for a classification machine learning model or estimated averages for regression problems, to name a couple of examples. It should be understood that the score function may include any suitable algorithm applied to training data to uncover probabilistic insights indicative of the accuracy of the machine learning model. In some embodiments, the model evaluator may employ a score function to determine whether the machine learning model is at or above a validation threshold value indicative of an acceptable model validation metric. The model validation metric may include a percent accuracy or fit associated with applying the machine learning model trained by the model trainer to the training data. If the model evaluator determines that the machine learning model fails to meet the model validation metric, then the model trainer may continue to train the machine learning model. On the other hand, if the model evaluator determines that the machine learning model passes validation, the model deploying engine may deploy the machine learning model, for example, to the user device.
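The validation decision described above reduces to comparing a score against a threshold. In the sketch below, the score function is plain accuracy and the threshold of 0.9 is arbitrary; both, along with the function names and the toy classifier, are assumptions for illustration.

```python
def accuracy(predict, validation_examples):
    """Score function: fraction of validation examples predicted correctly."""
    correct = sum(
        1 for features, label in validation_examples if predict(features) == label
    )
    return correct / len(validation_examples)


def next_step(predict, validation_examples, validation_threshold=0.9):
    """Return ("deploy" or "continue_training", score) based on the threshold."""
    score = accuracy(predict, validation_examples)
    decision = "deploy" if score >= validation_threshold else "continue_training"
    return decision, score


def toy_predict(features):
    """Hypothetical stand-in for a trained classifier."""
    return 1 if features[0] > 0.5 else 0


validation_examples = [([0.1], 0), ([0.9], 1), ([0.7], 1), ([0.4], 1)]
decision, score = next_step(toy_predict, validation_examples)
```

The toy classifier gets three of the four examples right, so with a 0.9 threshold the decision is to keep training; lowering the threshold to 0.7 would flip it to deployment.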
In some embodiments, the model deploying engine may receive a machine learning model determined to be sufficiently trained. The model deploying engine may deploy a trained machine learning model to the command classification and task engine. As discussed herein, the command classification and task engine may use the trained machine learning model deployed via the model deploying engine to perform the functionality described herein.
The task and entity deploying engine may deploy the command classification and task engine, its outputs, and/or the machine learning model generated by the model generating engine to any suitable computing device (e.g., the client device), via any suitable abstraction layer. For example, the task and entity deploying engine may transmit the command classification and task engine, its outputs, and/or the machine learning model to the operating system layer, application layer, hardware layer, and so forth, associated with a client device or client account. In one embodiment, the command classification and task engine, the model generating engine, or any of their components may integrate with an existing software application, such as a computer productivity application. For example, the command classification and task engine, the model generating engine, or any of their components may be installed as a plug-in (for example, a plug-in extension) to a web-based application, a browser, or the computer productivity application.
When the task and entity deploying engine transmits the command classification and task engine, its outputs, and/or the machine learning model to the operating system layer of a computing device (e.g., of a client device), the task and task entities may be easily accessible to the computing device. A user may select an image or snap a picture to which the embodiments described herein will be applied. For example, the user may select an image or snap a picture for which the command, task, and task entity may be determined and generated, as discussed herein. In this manner, a computing device may include out-of-the-box software that classifies or predicts a command, as well as determines and generates tasks and task entities, as discussed herein. Alternatively, the computing device may access the functionality described herein as any suitable software-as-a-service (SaaS) offering or by any other means.
September 25, 2025