US-20250316105-A1

Attention Embedded Transformer Network Driven Document Data Extraction

Published: October 9, 2025
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

Attention embedded transformer network driven document data extraction is provided. For example, a system integrates one or more processors with a data repository to identify a document of a first type received from a client device. The system determines a portion of the document based on a boundary established by a digital overlay. The system generates, via a trained machine learning model, a query using the portion of the document determined based on the boundary, wherein the query is designed to facilitate an extraction of data relating to the first type. The system inputs the query into a trained attention embedded transformer network model to extract data from the document, the extracted data including at least the extraction of data relating to the first type. The system displays, via the client device, the extracted data.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system, comprising:

2. The system of, wherein the one or more processors are further configured to:

3. The system of, wherein the one or more processors are further configured to:

4. The system of, wherein the one or more processors are further configured to:

5. The system of, wherein the one or more processors are further configured to:

6. The system of, wherein the one or more processors are further configured to:

7. The system of, wherein the one or more processors are further configured to:

8. The system of, wherein the one or more processors are further configured to:

9. The system of, the action performed on the document is at least one of:

10. The system of, wherein the one or more processors are further configured to:

11. The system of, wherein the one or more processors are configured to:

12. The system of, wherein the one or more processors are further configured to:

13. The system of, wherein the second trained machine learning model is a trained attention embedded transformer network model.

14. A method, comprising:

15. The method of, comprising:

16. The method of, comprising:

17. The method of, comprising:

18. The system of, comprising:

19. A non-transitory computer-readable medium comprising instructions embodied thereon, the instructions to cause a processor to:

20. The non-transitory computer-readable medium of, comprising the instructions embodied thereon to cause the processor to:

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is generally related to computing technology, and particularly to document data extraction using machine learning or attention embedded transformer networks to improve computing performance.

Heterogeneous computing systems can store, retrieve, and process different types of data across different systems. Computing systems perform high volumes of document data extraction. Due to documents of the same type having different formats, documents of the same entity having different types, and the large number of requests received by computing systems to perform document data extraction on documents from a plurality of domains, document data extraction can be computing resource intensive, error prone due to the non-uniformities, and cause systematic problems including latency, traffic congestion, or delay.

This technology is directed to templated document data extraction via trained machine learning models, including, for example, trained attention embedded transformer network models. For example, aspects of the technical solutions described herein can identify and extract information from a plurality of document types. One or more of the plurality of document types correspond to a domain of a plurality of domains. Some aspects of the technical solutions herein facilitate templating documents of a first type into a uniform format.

Aspects of the technical solutions described herein facilitate generating and using a template to extract data from a document received via a client device. For example, aspects of the technical solutions herein identify a document of a first type received from a client device. In some embodiments, aspects of the technical solutions herein determine a domain of a plurality of domains associated with the first type. Aspects of the technical solutions described herein determine a portion of the document based on a boundary. The boundary can be established by a digital overlay. Aspects of the technical solutions described herein generate training data sets to be used in model training by creating one or more new documents from the document. Aspects of the technical solutions described herein utilize trained machine learning models or trained attention embedded transformer network models to extract and template information from the document into a new format associated with the first type. Aspects of the technical solutions described herein enable multiple client devices associated with an entity to produce uniform results when extracting information from documents of the first type.

Acquiring the requisite amount of training data to enable systems to accurately extract data from documents associated with a single domain, let alone a plurality of domains, can be computationally intensive. Producing uniform outputs of document data extraction within an entity can also be computationally intensive. Computationally intensive can refer to or include, for example, utilizing excessive or large amounts of processing or memory that exceed certain predetermined thresholds; for example, the amount of processing or memory used to accurately train the model to extract data from documents relating to the plurality of domains. Computationally intensive can refer to or include computational costs used to extract information from the document, or the computational costs used to provide uniform outputs among client devices associated with an entity. Computational costs can include factors such as network bandwidth, time, memory, electric power, and processing power. The computational cost can also be indicated in terms of computations being performed, for example, the number of floating point operations (FLOPs), or the number of multiply-and-accumulate operations (MACs or MACCs).
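As a rough illustration of expressing computational cost in MACs and floating point operations, the sketch below estimates both for a single dense (fully connected) layer. The layer dimensions and the convention that one multiply-and-accumulate counts as two floating point operations are assumptions chosen for illustration; neither comes from the application.

```python
def dense_layer_macs(in_features: int, out_features: int, batch: int = 1) -> int:
    """Multiply-and-accumulate operations for y = W @ x (bias ignored)."""
    return batch * in_features * out_features


def macs_to_flops(macs: int) -> int:
    """Assumed convention: one MAC is one multiply plus one add, so 2 FLOPs."""
    return 2 * macs


# Hypothetical layer size, used only to make the arithmetic concrete.
macs = dense_layer_macs(in_features=768, out_features=3072)
flops = macs_to_flops(macs)
print(f"MACs: {macs:,}  FLOPs: {flops:,}")
```

Counting operations this way gives a hardware-independent figure that can be compared across models, whereas time or electric power depend on the machine running them.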

These document data extractions can cause latency, error, traffic congestion, or delay across a system including the client devices and data processing systems due to the size of the data, intricacies of the data, and variety of formats of the documents. Furthermore, document data extraction can be prone to error and is not easily extensible to new formats of documents, new types of documents, or different domains associated with types of documents. Due to the large volume of documents with differing formats and types, different domains associated with the types, and the scale of heterogeneous computing systems, it can be challenging to extract information from different documents and template the information into a new, uniform format without excessive latency, inaccuracy, or generating erroneous computing actions. These technical challenges further prevent one or more client devices from receiving extracted and standardized information due to extensive network traffic and reduced throughput, thereby affecting the efficiency of the system overall. In addition to the systematic problems created by document data extraction, it can be technically challenging to perform document data extraction on documents of different formats, different types, associated with different domains, and to generate outputs that have uniform formats from one or more client devices.

Technical solutions are provided herein to address such technical challenges. The technical solutions identify a document of a first type received from a client device. The technical solutions facilitate use of trained machine learning models (e.g., trained attention embedded transformer network models) to classify, extract, and template information from the document. The technical solutions described herein facilitate classification, extraction, and templating of documents of a variety of formats, of a variety of types, of a variety of domains, as well as documents that may be received from a variety of disparate sources. In some embodiments, the technical solutions described herein identify a format of the document. The technical solutions described herein determine a portion of the document based on a boundary established by a digital overlay. In some aspects, the technical solutions described herein identify one or more labels of the portion. The technical solutions described herein facilitate use of trained machine learning models to extract and template information extracted from the portion or the document. In some aspects, the technical solutions herein determine a schema for the document data extraction and template the document data extraction according to the schema. Thus, by utilizing trained machine learning models to classify documents, extract information from documents, and template the information in a standardized manner, the technical solutions described herein reduce the computational cost (e.g., network bandwidth, time, memory, electric power, and processing power) associated with extracting information from a document by a data processing system compared to previous document data extraction systems. 
Accordingly, the technical solutions described herein are rooted in computing technology, and provide improvements to computing technology, particularly systems that identify documents, determine portions of documents, extract information from portions of documents, and ensure the information is templated in a uniform manner.

At least one aspect of the technical solutions described herein is directed to a system. The system includes one or more processors, coupled with memory. The one or more processors identify a document of a first type. The document of the first type is received from a client device. The one or more processors establish a boundary of a portion of the document based on a digital overlay. The one or more processors select a portion of the document based on the boundary. The one or more processors generate, using a trained machine learning model, a query using the portion of the document, wherein the query is designed to facilitate an extraction of data. The data to be extracted is based on the document being of the first type. The one or more processors extract the data from the document of the first type by inputting the query to a second trained machine learning model.
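The step of selecting a portion of the document based on a boundary can be sketched as a simple crop. This is a minimal sketch, assuming the digital overlay yields a rectangular boundary in pixel coordinates and the page is a grid of pixel values; the `Boundary` type and `select_portion` function are hypothetical names, and the application does not limit the boundary to a rectangle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Boundary:
    """Hypothetical rectangular boundary drawn by an overlay (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def select_portion(page: List[List[int]], b: Boundary) -> List[List[int]]:
    """Return the pixels inside the boundary, clamped to the page extent."""
    height = len(page)
    width = len(page[0]) if page else 0
    top = max(0, min(b.top, height))
    bottom = max(top, min(b.bottom, height))
    left = max(0, min(b.left, width))
    right = max(left, min(b.right, width))
    return [row[left:right] for row in page[top:bottom]]


# A 4x5 toy "page" whose pixel value encodes its row and column.
page = [[r * 10 + c for c in range(5)] for r in range(4)]
portion = select_portion(page, Boundary(left=1, top=1, right=4, bottom=3))
print(portion)
```

Clamping keeps an overlay that extends past the page edge from raising an error; the portion returned is what a query generator would receive in place of the whole page.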

In some aspects of the technical solutions described herein, the one or more processors determine a validation score for the extracted data. The one or more processors display the extracted data via the client device in response to the validation score being above a threshold.

In some aspects of the technical solutions described herein, the one or more processors determine the validation score using the trained machine learning model. The trained machine learning model receives the extracted data as an input.

In some aspects of the technical solutions described herein, the one or more processors determine, using the second trained machine learning model, the validation score, wherein the second trained machine learning model receives the extracted data as an input.

In some aspects of the technical solutions described herein, the one or more processors determine, via the trained machine learning model, a first validation score, wherein the trained machine learning model receives the extracted data as a first input. The one or more processors determine, via the second trained machine learning model, a second validation score, wherein the second trained machine learning model receives the extracted data as a second input. The one or more processors display the extracted data in response to a determination that the first validation score and the second validation score are both above the threshold.

In some aspects of the technical solutions described herein, the one or more processors determine a validation score for the extracted data. The one or more processors extract new data from the document of the first type by inputting the query into the second trained machine learning model in response to a determination that the validation score is below a threshold. The one or more processors determine a new validation score for the extracted new data. The one or more processors replace the extracted data with the extracted new data, in response to a determination that the new validation score is above the threshold.
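The validate, re-extract, and replace sequence can be sketched as follows. This is an illustrative sketch only: the `extract` and `validate` callables stand in for the trained models, and the completeness-based score and the threshold value are assumptions, since the application does not specify how the validation score is computed.

```python
from typing import Callable, Dict, Tuple

Extraction = Dict[str, str]


def extract_with_validation(
    query: str,
    extract: Callable[[str], Extraction],
    validate: Callable[[Extraction], float],
    threshold: float,
) -> Tuple[Extraction, float]:
    """Extract once; if the score is below the threshold, extract again and
    keep the new data only when its score is above the threshold."""
    data = extract(query)
    score = validate(data)
    if score < threshold:
        new_data = extract(query)
        new_score = validate(new_data)
        if new_score > threshold:
            data, score = new_data, new_score
    return data, score


# Stand-ins for the models: the first attempt misses a field, the second does not.
attempts = iter([
    {"Earnings": "", "Hours": "40"},
    {"Earnings": "3,129", "Hours": "40"},
])


def fake_extract(query: str) -> Extraction:
    return next(attempts)


def completeness(data: Extraction) -> float:
    """Assumed score: fraction of fields that are non-empty."""
    return sum(1 for v in data.values() if v) / len(data)


data, score = extract_with_validation("payroll fields", fake_extract, completeness, 0.9)
print(data, score)
```

In this run the first extraction scores 0.5, so the query is submitted again and the second result replaces the first.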

In some aspects of the technical solutions described herein, the one or more processors determine a domain of a plurality of domains of the document according to the first type. The one or more processors template the extracted data according to an ontological library corresponding to the domain determined.
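Templating against an ontological library can be pictured as mapping source-specific labels onto one canonical vocabulary per domain. The sketch below assumes the library is a plain lookup table; the `ONTOLOGY_LIBRARIES` contents and the `template_extraction` name are hypothetical and are not taken from the application.

```python
from typing import Dict

# Hypothetical ontological library: lower-cased source label -> canonical term.
ONTOLOGY_LIBRARIES: Dict[str, Dict[str, str]] = {
    "payroll": {
        "gross pay": "Earnings",
        "earnings": "Earnings",
        "taxes withheld": "Withholdings",
        "withholdings": "Withholdings",
        "hrs": "Hours",
        "hours": "Hours",
    },
}


def template_extraction(extracted: Dict[str, str], domain: str) -> Dict[str, str]:
    """Rename extracted labels to the domain's canonical terms.

    Labels missing from the library are passed through unchanged.
    """
    library = ONTOLOGY_LIBRARIES[domain]
    templated: Dict[str, str] = {}
    for label, value in extracted.items():
        canonical = library.get(label.strip().lower(), label)
        templated[canonical] = value
    return templated


# Two sources describing the same payroll content with different terms.
source_a = {"Gross Pay": "3,129", "Hrs": "40"}
source_b = {"Earnings": "3,129", "Hours": "40"}
print(template_extraction(source_a, "payroll"))
print(template_extraction(source_b, "payroll"))
```

Both sources produce the same templated output, which is the uniformity across sources that the passage describes.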

In some aspects of the technical solutions described herein, the one or more processors create at least one new document by an action performed on the document. The one or more processors input a first training data set to a machine learning model to train the machine learning model, wherein the first training data set includes the at least one new document and the document.

In some aspects of the technical solutions described herein, the action performed on the document by the one or more processors includes a rotation, an inversion, a rescaling, a blurring, a sharpening, a modification of a quantitative aspect, or a modification of a qualitative aspect.
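A few of the listed actions (rotation, inversion, and rescaling) can be sketched on a tiny grayscale image held as nested lists. This is a minimal sketch under the assumption that a document is an 8-bit grayscale pixel grid; the function names are hypothetical, and a production system would more likely use an imaging library.

```python
from typing import List

Image = List[List[int]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate by reversing the row order and then transposing."""
    return [list(row) for row in zip(*img[::-1])]


def invert(img: Image, max_value: int = 255) -> Image:
    """Invert intensities (dark becomes light and vice versa)."""
    return [[max_value - p for p in row] for row in img]


def rescale_nearest(img: Image, factor: int) -> Image:
    """Upscale by an integer factor by repeating each pixel and each row."""
    return [
        [p for p in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def build_training_set(document: Image) -> List[Image]:
    """The original document plus the new documents derived from it."""
    return [
        document,
        rotate_90_clockwise(document),
        invert(document),
        rescale_nearest(document, 2),
    ]


document = [[0, 64], [128, 255]]
training_set = build_training_set(document)
for variant in training_set:
    print(variant)
```

One source document yields four training examples here, which is how creating new documents can reduce the amount of original training data that must be acquired.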

In some aspects of the technical solutions described herein, the one or more processors create at least one new document, wherein the new document is a rotation of the document. The one or more processors input a first training data set to a machine learning model, wherein the first training data set includes the at least one new document and the document.

In some aspects of the technical solutions described herein, the one or more processors determine a domain of a plurality of domains corresponding to the first type of document, the plurality of domains including: payroll, tax, benefits, human resources, time management, or performance management.

In some aspects of the technical solutions described herein, the one or more processors receive, via the client device, an indication of the first type of document.

In some aspects of the technical solutions described herein, the second trained machine learning model is a trained attention embedded transformer network model.

At least one aspect of the technical solution described herein is directed to a method. The method includes identifying, by one or more processors, a document of a first type received from a client device. The method includes establishing, by the one or more processors, a boundary of a portion of the document based on a digital overlay. The method includes selecting, by the one or more processors, the portion of the document based on the boundary. The method includes generating, by the one or more processors, a query by inputting the portion of the document into a trained machine learning model, wherein the query is designed to facilitate an extraction of data, wherein the data to be extracted is based on the document being of the first type. The method includes extracting, by the one or more processors, the data from the document of the first type by inputting the query into a second trained machine learning model.

In some aspects of the technical solutions described herein, the method includes displaying, by the one or more processors, the extracted data.

In some aspects of the technical solutions described herein, the method includes determining, by the one or more processors, the extracted data in response to determining that the validation score is above a threshold.

In some aspects of the technical solutions described herein, the method includes determining, by the one or more processors, a validation score for the extracted data. The method includes extracting, by the one or more processors, new data from the document of the first type by inputting the query into the second machine learning model in response to determining that the validation score is below a threshold. The method includes determining, by the one or more processors, a new validation score for the extracted new data. The method includes replacing, by the one or more processors, the extracted data with the extracted new data in response to determining that the new validation score is above the threshold.

In some aspects of the technical solutions described herein, the method includes creating, by the one or more processors, at least one new document through an action performed on the document. The method includes inputting, by the one or more processors, a first training data set to a machine learning model to train the machine learning model, wherein the first training data set includes the at least one new document and the document.

In some aspects of the technical solutions described herein, the method includes receiving, by the one or more processors, an indication of the first type of document from the client device.

At least one aspect of the technical solutions described herein is directed to a non-transitory computer-readable medium. The non-transitory computer readable medium includes instructions to cause a processor to identify a document of a first type received from a client device. The instructions cause the processor to generate, using a trained machine learning model, a query using the document, wherein the query is designed to facilitate an extraction of data relating to the first type. The instructions cause the processor to extract the data from the document of the first type by inputting the query into a second trained machine learning model.

In some aspects of the technical solutions described herein, the instructions cause the processor to determine a validation score for the extracted data. The instructions cause the processor to display the extracted data in response to a determination that the validation score is above a threshold.

In some aspects of the technical solutions described herein, the instructions cause the processor to determine a portion of the document based on a boundary established by a digital overlay. The instructions cause the processor to generate, via a trained machine learning model, the query using the portion of the document. The instructions cause the processor to display the extracted data.

Following below are more detailed descriptions of various concepts related to, and implementations of, methods, apparatuses, and systems to template document data extraction. The various concepts introduced above and discussed in greater detail below can be implemented in any of numerous ways.

The technical solutions described herein describe a system, method, and computer readable medium to automatically template document data extraction for a plurality of document types in a plurality of domains using standardized ontologies. Aspects of the technical solutions described herein are generally directed to digitizing documents using templated document data extraction and machine learning. For example, aspects of the technical solutions described herein receive one or more labeled parts of a document and extract information from the content of the labeled parts. Extracting information from documents is an intensive and error-prone process. A technical problem exists such that current document data extraction systems cannot accurately perform document data extraction for more than one domain. Furthermore, existing document data extraction systems require individualized training to competently perform their tasks. This training requires large amounts of data. In addition to the existing limits regarding the domains for which existing systems can provide templated document data extraction, the existing document data extraction systems lack the technical ability to standardize outputs among multiple client devices. Accordingly, existing document data extraction systems experience technical difficulties related to acquisition of training data and the breadth of document types for which the system can provide document data extraction. Further, existing document data extraction systems are only trained to extract the data and lack the technical capabilities of providing a uniform structured output across a variety of client devices. Therefore, where multiple client devices associated with an entity extract data from documents within the same domain, or of the same type, the lack of standardization techniques creates inconsistencies among the outputs generated by the client devices. 
These inconsistencies can cause crucial information to be lost or can require additional computational costs to correct.

For example, a client device attempts to extract data from documents ranging across a variety of domains. The client device may use several document data extraction systems to extract data from the different domains. Each system used by the client device in this extraction requires intensive training. In addition to lacking the technical capabilities to extract data from multiple domains, current systems lack the technical ability to structure the output in a standardized form. Thus, when two client devices extract document data from different sources (e.g., different structures, such as, for example, different corporations), where both sources are of the same domain type, discrepancies due to the different sources can cause errors to propagate into the output. For example, these discrepancies can be differences in a format or structure of the documents, or a use of different ontological terms to describe the content to be extracted from the document. Alternatively or in addition, when a single client device extracts document data from different sources, where both sources are of the same domain type, discrepancies due to the different sources can cause errors to propagate into the output. For example, an error can be a lack of uniform ontological terms between the outputs, thus preventing uniform outputs of document data extraction. Due to these technical challenges, document data extraction systems are labor intensive, error prone, and lack the ability to structure outputs in a standard form using standardized ontological terms associated with a domain of the first type of the document.

The technical solutions described herein identify a document of a first type received from a client device. The technical solutions described herein determine a domain of the document according to the first type. Using a portion of the document based on a boundary established by a digital overlay, the technical solutions herein generate, via a trained machine learning model, a query. The query is designed to facilitate an extraction of data relating to the first type when input by the technical solutions described herein into a second trained machine learning model (e.g., an attention embedded transformer network model). The technical solutions described herein generate an output including at least the extraction of data relating to the first type. The technical solutions described herein template the extracted data (e.g., the output) using an ontological library corresponding to the domain. The technical solutions described herein display the extracted data. By identifying a type of the document, determining a domain of the document corresponding to the first type, generating a query using a portion of the document based on a boundary established via a digital overlay, and generating an output from the query, the technical solutions described herein can extract data from documents of a variety of domains and provide standardized outputs when compared to systems that do not use this templated document data extraction.

An accompanying figure depicts an example system that facilitates templated document data extraction according to one or more aspects of the technical solutions described herein. The system includes a data processing system, a database, at least one client device (sometimes hereinafter referred to as the client device(s) or client service(s)), a server, and a network. The data processing system can include an application, a pre-processor, a query generator, a data extractor, a validator module, a model trainer, or a data repository. The data processing system can include additional components. The application, the pre-processor, the query generator, the data extractor, the validator module, the model trainer, and the data repository each may communicate with the database or the client device via the network.

The data processing system can interface with, communicate with, or otherwise receive information from or provide information to one or more of the client devices, the server, or the database. The data processing system can include at least one logic device such as a server. The server can be a computing device having a processor to communicate via a network. The data processing system can include or interface with at least one server. The server can be a computation resource, server, processor, or memory. For example, the data processing system can include a plurality of computation resources or processors. The server can facilitate communications between the data processing system, the database, or the client device via the network.

The network can be a wireless or wired connection for enabling the data processing system to communicate. The data processing system can communicate with internal subcomponents (described herein), or external components (e.g., the server, the database, or the client device, among others) via the network. The data processing system can, for example, store data about the system in the data repository. The data processing system can, for example, receive the data transmitted from the database. The network can include a hardwired connection (e.g., copper wire or fiber optics) or a wireless connection (e.g., wide area network (WAN), controller area network (CAN), local area network (LAN), or personal area network (PAN)). For example, the network can include Wi-Fi, Bluetooth, BLE, or other communication protocols for transferring data over networks as described herein.

The data repository includes a model artifact store. The model artifact store is a type of memory, database, or other structure for storing data structures (e.g., machine learning models or attention embedded transformer network models). In some aspects of the technical solutions described herein, the model artifact store is characterized by a duration of time for which the data structures within the model artifact store are stored. For example, data structures or information within the model artifact store are maintained, stored, or otherwise present within the model artifact store for a period of time. In some aspects of the technical solutions described herein, the model artifact store is characterized by a location of the model artifact store. For example, the model artifact store is located within the data processing system or the client device. In this manner, the model artifact store is referred to or considered local storage. In some embodiments, accessing the model artifact store (by the data processing system, the client device, among others) enables the data processing system or the client device to access the information within the model artifact store (such as the machine learning models or the attention embedded transformer network models) with less time, latency, or computational power than accessing the information (such as the machine learning models or the attention embedded transformer network models) from a remote database, such as the database. In some aspects of the technical solutions described herein, the model artifact store is characterized by a type of model stored within the model artifact store.
For example, the model artifact store stores models of one or more types, including machine learning models (hereinafter generally referred to as machine learning model(s) or trained machine learning model(s)) or attention embedded transformer network models (hereinafter generally referred to as attention embedded transformer network model(s) or trained attention embedded transformer network model(s)). The model artifact store provides models to the query generator or to the data extractor.
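The behavior described above, a local store that holds models for a period of time and avoids repeated trips to a remote database, can be sketched as a small time-limited cache. The `ModelArtifactStore` class below, its `ttl_seconds` parameter, and the `fetch_from_database` stand-in are all hypothetical; the application does not specify how retention or retrieval is implemented.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class ModelArtifactStore:
    """Hypothetical local store that keeps each model for a fixed duration."""

    def __init__(
        self,
        fetch_remote: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_remote = fetch_remote
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, name: str) -> Any:
        """Return a locally held model, or fetch and store it if absent or expired."""
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        model = self._fetch_remote(name)
        self._entries[name] = (now, model)
        return model


# Stand-in for the slower remote database; records each remote access.
remote_calls: List[str] = []


def fetch_from_database(name: str) -> str:
    remote_calls.append(name)
    return f"<model:{name}>"


fake_now = [0.0]  # controllable clock so the example does not need to sleep
store = ModelArtifactStore(fetch_from_database, ttl_seconds=60, clock=lambda: fake_now[0])

first = store.get("query-generator")   # fetched remotely
second = store.get("query-generator")  # served from the local store
fake_now[0] = 61.0
third = store.get("query-generator")   # retention period elapsed, fetched again
print(remote_calls)
```

Only two of the three requests reach the remote database, which is the latency saving the passage attributes to local storage.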

The database is or includes a system or computing device including the data. The database is or includes a system or computing device including the model artifact store. The database is or includes a storage or data repository to store the data or the model artifact store. The database is located remotely from the data processing system or the client device. For example, the database corresponds to or is maintained by an outside entity such as a government, individual, company, or non-profit organization. In some aspects of the technical solutions described herein, the database is maintained, owned, or operated by the same entity as an entity maintaining, owning, or operating the data processing system. In some aspects of the technical solutions described herein, the database is accessed by an approved computing system, such as the data processing system operating under the same entity as the database. Although a single database is depicted, the database can include multiple databases.

The database maintains, includes, stores, or otherwise hosts the data or the model artifact store. The data is any set of information aggregated, accumulated, calculated, generated, or otherwise available to an entity. The data includes information about an entity. The entity includes an individual, such as an employee of an organization as described herein, or a grouping of people, such as an organization, corporation, or educational institution. The information includes data like name, address, social security number, salary, personally identifying information, demographic information, familial information, benefits information, or other such information. The data includes information about an entity such as location of the entity (e.g., an address, physical or coordinate location, a geofence associated with the entity), employees of the entity, tax information, financial information, proprietary information, among other information. For example, the database can be an external computing system maintaining a data repository of payroll, tax, benefits, human resources, time management, or performance management for an entity.

The data includes a plurality of values. The values can be alpha-numeric. In some cases, the values are displayable on a screen, such as that of the client device, the database, or the data processing system. For example, the data can include strings such as “First Name”, “Earnings”, “Withholdings”, “Deductions”, “Net Pay Allocations”, “Reimbursements”, “Hours”, “Rate”, “375.00,” or “0.65.” The data can include auditory values, such as a sound or vocal recording. The data can include colored or color-coded values. The data can include time-related values, such as current time, elapsed time, clock-in time, among others. The data can include images. The images can contain multiple types of information. For example, an image can be a payroll journal that includes information such as “Employee Name”, “Payroll Type”, “Earnings”, “Withholdings”, “Deductions”, “Net Pay Allocations”, or “Social Security Number.” The values of the data can include any combination of values (e.g., the data can be multi-modal). For example, a first value of the data includes an image and a string, and a second value of the data includes an image and an auditory value. The values of the data can relate to each other. In an example, a value of “Earnings” corresponds to a value of “3,129.” Some values of the data can be null or zero values.
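To make the mix of labels, numeric strings, and null values concrete, the sketch below normalizes raw string values such as those quoted above. The `parse_value` helper and its rules (strip thousands separators, treat empty strings as null, leave non-numeric text unchanged) are assumptions for illustration, not behavior described in the application.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional, Union

Value = Optional[Union[Decimal, str]]


def parse_value(raw: str) -> Value:
    """Convert numeric strings to Decimal, empty strings to None, else keep text."""
    text = raw.strip()
    if not text:
        return None
    try:
        return Decimal(text.replace(",", ""))
    except InvalidOperation:
        return text


# Label/value pairs of the kind described in the text.
raw_record: Dict[str, str] = {
    "First Name": "Alex",
    "Earnings": "3,129",
    "Rate": "0.65",
    "Reimbursements": "",
}
record = {label: parse_value(value) for label, value in raw_record.items()}
print(record)
```

`Decimal` is used rather than `float` so that amounts such as "375.00" keep their exact decimal representation.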

The database arranges the values of the data in a specified manner, such as a table, a list, or another defined data structure. The database includes different values within the data. For example, the database maintains first data corresponding to demographics of a computer science company, with different values and different arrangements of those values than second data corresponding to tax withholdings for employees of a public education institution.

The data includes different attributes, such as a file type, data type, vendor type, or other such attributes. The data is included in, denoted by, or transmitted as an electronic file type. Examples of electronic file types include comma separated values (CSV), Excel files (XLS or XLSM), data interchange format (DIF), or JavaScript Object Notation (JSON), among others. The data can be associated with or stored as a file type. The file type determines or relates to data structures associated with the data. In some aspects of the technical solutions described herein, the data is encrypted by the database, such as by Advanced Encryption Standard (AES), Rivest-Shamir-Adleman (RSA), or another encryption standard. The data can be decrypted by the database, or by another system enabled for access to the data, such as the data processing system. In some aspects of the technical solutions described herein, one or more client devices request access to the data, or request the data itself, from the database via the data processing system, through another computing system, or directly.
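To illustrate how a file type can determine the data structure used to read the data, the following Python sketch loads either a CSV or a JSON payload into the same list of rows. The function name `load_rows` and the assumption that a JSON payload is an array of flat objects are illustrative choices, not details taken from the description. Encryption and the XLS and DIF formats are left out to keep the sketch within the standard library.

```python
from __future__ import annotations

import csv
import io
import json


def load_rows(payload: str, file_type: str) -> list[dict[str, str]]:
    """Parse a payload into rows keyed by column name.

    `file_type` is a hypothetical attribute carried alongside the data;
    only "csv" and "json" are handled in this sketch.
    """
    kind = file_type.lower()
    if kind == "csv":
        # The first CSV line is treated as the header row.
        reader = csv.DictReader(io.StringIO(payload))
        return [dict(row) for row in reader]
    if kind == "json":
        # Assumes a JSON array of flat objects, one object per row.
        records = json.loads(payload)
        return [{str(k): str(v) for k, v in record.items()} for record in records]
    raise ValueError(f"unsupported file type: {file_type!r}")


csv_payload = "Employee Name,Earnings\nA. Smith,3129\n"
json_payload = '[{"Employee Name": "A. Smith", "Earnings": 3129}]'

rows_from_csv = load_rows(csv_payload, "csv")
rows_from_json = load_rows(json_payload, "JSON")
```

Normalizing both formats to one row structure means downstream components do not need to know which file type the data arrived in.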

The client device is or includes any computing device such as a laptop, a desktop computer, a smart phone, a tablet, etc. A user may operate, display, or otherwise execute an application via the client device. The client device can be coupled with storage or memory. In some aspects of the technical solutions described herein, the client device is operated by a user associated with an organization to perform various tasks associated with the organization. The client device executes the application. The application is any platform for performing various tasks associated with the organization, such as a low-code platform, no-code platform, software-as-a-service (SaaS) platform, web application, web browser, or desktop application, among others. In some aspects of the technical solutions described herein, the application includes one or more user-interfaces, such as a Graphical User Interface (GUI), Command-Line Interface (CLI), Voice User Interface (VUI), Touchscreen Interface, Menu-driven Interface, Natural Language Interface, Multi-modal Interface, or Document Labeling Interface, among others. It should be understood that this listing of user-interfaces is exemplary and is not intended to be construed as exhaustive or limiting.

The data processing system interfaces with, communicates with, or otherwise receives information from or provides information to the database and the client device, among others. The data processing system includes or interfaces with at least one logic device such as a server. The server is a computing device having a processor to communicate via the network. For example, the data processing system includes a plurality of computation resources or processors. The server facilitates communications between the data processing system, the database, the client device, and the network.

In an illustrative example, the application identifies a document of a first type received from a client device. The application establishes a boundary of a portion of the document based on a digital overlay. The pre-processor augments the portion of the document. The query generator generates, via a trained machine learning model, a query using the portion of the document. The query generator designs the query to facilitate an extraction of data relating to the first type. The data extractor inputs the query into a second trained machine learning model to generate an output, the output comprising at least the extraction of data based on the document being of the first type. The application displays, via the client device, the extracted data. The second trained machine learning model can be a trained attention embedded transformer network model. The model trainer trains one or more machine learning models, or one or more attention embedded transformer network models, to perform the functionalities described herein.
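The order of operations in the illustrative example can be sketched in Python as a small pipeline. Everything here is a stand-in under stated assumptions: the document is modeled as rows of text rather than an image, the boundary is a pair of row indices, and the four step functions are simple string operations in place of the pre-processor, the trained machine learning model, and the attention embedded transformer network model, none of which are specified at this level of detail. The names `Document` and `run_extraction` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    doc_type: str    # the "first type", e.g. selected on the client device
    rows: list[str]  # toy stand-in for the document image


def run_extraction(document, boundary, select_portion, augment,
                   generate_query, extract) -> dict[str, str]:
    """Chain the steps in the order given in the illustrative example."""
    portion = select_portion(document, boundary)         # application: apply boundary
    portion = augment(portion)                           # pre-processor
    query = generate_query(portion, document.doc_type)   # query generator
    return extract(query, document)                      # data extractor


# Placeholder steps; real components would wrap trained models.
def select_portion(doc: Document, boundary: tuple[int, int]) -> list[str]:
    first, last = boundary
    return doc.rows[first:last + 1]


def augment(portion: list[str]) -> list[str]:
    return [row.strip() for row in portion]


def generate_query(portion: list[str], doc_type: str) -> dict:
    labels = [row.split(":", 1)[0] for row in portion if ":" in row]
    return {"doc_type": doc_type, "fields": labels}


def extract(query: dict, doc: Document) -> dict[str, str]:
    found = {}
    for row in doc.rows:
        label, sep, value = row.partition(":")
        if sep and label.strip() in query["fields"]:
            found[label.strip()] = value.strip()
    return found


doc = Document("payroll journal",
               ["  Employee Name: A. Smith ", "Earnings: 3,129", "Notes: none"])
extracted = run_extraction(doc, (0, 1), select_portion, augment,
                           generate_query, extract)
```

Passing the steps in as callables mirrors the statement that the components can be separate microservices or a single component: the orchestration stays the same whichever way they are deployed.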

The application, the pre-processor, the query generator, the data extractor, the validator module, or the model trainer can each include at least one processing unit or other logic device such as a programmable logic array engine, or module configured to communicate with the data repository or database. The application, the pre-processor, the query generator, the data extractor, the validator module, or the model trainer can be separate components, separate microservices, a single component, or part of the data processing system. The system and its components, such as the data processing system, include hardware elements, such as one or more processors, logic devices, or circuits.

The data processing system includes one or more microservices configured to be executed by the one or more processors of the data processing system. Each microservice communicates with the other microservices to perform a function. In some embodiments, each microservice is located on a separate server, or one or more microservices are located on the same server. In some aspects of the technical solutions described herein, each microservice corresponds to a processor of the data processing system, or one or more microservices have their functionalities executed by the same processors. In some aspects of the technical solutions described herein, subcomponents of the data processing system, such as the application, the pre-processor, the query generator, the data extractor, the validator module, or the model trainer, can each be or include a microservice. In some embodiments of the technical solutions described herein, the application can be hosted on the client device. In some embodiments, the microservices can operate or execute on the application. For example, in some aspects of the technical solutions described herein, the operations of the data processing system can operate on or be performed by the application operating on the client device.

In some embodiments, the application can perform one or more of the functionalities of the data processing system, the pre-processor, the query generator, the data extractor, the validator module, or the model trainer. For example, the application can perform some or all of the functionalities of the pre-processor, the query generator, or the data extractor, or the application can include the pre-processor, the query generator, or the data extractor. In some aspects of the technical solutions described herein, the application can include one or more of the subcomponents of the data processing system, such as one or more of the pre-processor, the query generator, the data extractor, the validator module, or the model trainer.

The data processing system includes an application designed, constructed, and operational to identify a document received from a client device and to determine a portion of the document based on a boundary established by a digital overlay input by a client device. The application is any combination of hardware or software for identifying types of documents and for determining portions of documents according to boundaries. In some aspects of the technical solutions described herein, the first type of the document can be selected from a predetermined list by the client device. In some aspects of the technical solutions described herein, the type of document can be input by a user of a client device. In some aspects of the technical solutions described herein, the document is an image. In some aspects of the technical solutions described herein, the type of document identified by the application corresponds to a domain of a plurality of domains. For example, the application receives an image from a client device. The application identifies a first type corresponding to the image via an input from the client device. The first type can correspond to a domain of a plurality of domains and can be selected from a predetermined list or can be input via the client device. For example, the predetermined list can include: W-2, Form 1040, Schedule A, Schedule B, Schedule C, Schedule D, 19, Hiring Forms, Onboarding Documents, Performance Evaluations, Exit Interview Forms, Employment Application, Form 840, Form 941, Form 944, Form 1095, Form 1099, Wage and Tax Statement, SF 52, SF 59, SF 61, SF 71, SF 75, SF 3102, Daily To-Do List, Checklist, Time Log, Activity Log, Eisenhower Matrix, or Shift, among others. It should be understood that this listing of document types is exemplary and is not to be construed as exhaustive or limiting.
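One way to picture the type identification described above is a lookup that accepts either a selection from the predetermined list or a free-text entry. The Python sketch below is illustrative only: the function name `identify_type` is hypothetical, and the mapping from document types to domains is an assumption, since the description states that a type corresponds to a domain without giving the mapping. Only a few entries from the exemplary list are included.

```python
from __future__ import annotations

from typing import Optional

# Assumed type-to-domain mapping, for illustration only. The domain names
# echo the payroll, tax, and human resources examples used for the database.
PREDETERMINED_TYPES: dict[str, str] = {
    "W-2": "tax",
    "Form 1040": "tax",
    "Form 941": "tax",
    "Performance Evaluations": "human resources",
    "Time Log": "time management",
}


def identify_type(selection: str,
                  allow_free_text: bool = True) -> tuple[str, Optional[str]]:
    """Resolve a client-device input to a (document type, domain) pair.

    A match against the predetermined list is case-insensitive. Free-text
    input that is not on the list is returned with no domain, unless
    free-text entry has been disabled.
    """
    cleaned = selection.strip()
    if not cleaned:
        raise ValueError("no document type provided")
    for known_type, domain in PREDETERMINED_TYPES.items():
        if known_type.lower() == cleaned.lower():
            return known_type, domain
    if allow_free_text:
        return cleaned, None
    raise ValueError(f"{cleaned!r} is not in the predetermined list")


listed = identify_type(" w-2 ")
free_text = identify_type("Custom Intake Form")
```

Returning the canonical spelling from the list, rather than the raw input, keeps later steps such as query generation keyed on a single name per type.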

The application identifies a document of a first type received from a client device. The application establishes a boundary of a portion of the document based on a digital overlay. The application selects a portion of the document based on the boundary. The application creates the digital overlay using: image editing software (Adobe Photoshop, Adobe Illustrator, Sketch, Figma, Adobe XD, Affinity Designer, GNU Image Manipulation Program (GIMP), or Canva, among others), or drawing APIs and libraries (JavaFX, Qt, GIMP Toolkit (GTK), wxWidgets, Cairo, Skia, HTML5 Canvas, Simple and Fast Multimedia Library (SFML), or OpenTK, among others), among others. It should be understood that this listing of image editing software and drawing APIs is exemplary and is not to be construed as exhaustive or limiting. The application can create the digital overlay via a shape, a mask, a filter, a color filter, highlighting, non-destructive editing, or framing, among others. The application preserves the content and coordinates of the image when creating the digital overlay by employing techniques such as layers, alpha channels, masking, or accurate coordinate transformations, among others. It should be understood that this listing of techniques is exemplary and is not to be construed as exhaustive or limiting. The application can convert a format of the document. The application displays an output of the data extractor (e.g., the extracted data). The extracted data can be data extracted from the portion of the document.
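The idea of selecting a portion by a boundary while preserving the content and coordinates of the image can be sketched with a rectangular overlay and a simple coordinate translation. This Python example is a minimal illustration under several assumptions: the image is a plain list of pixel rows rather than a real image format, the overlay is an axis-aligned rectangle given as (left, top, right, bottom) with exclusive right and bottom edges, and the function names are hypothetical. It does not use any of the editing tools or drawing libraries listed above.

```python
from __future__ import annotations

Box = tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def clamp_box(box: Box, width: int, height: int) -> Box:
    """Clip an overlay rectangle so it lies within the image bounds."""
    left, top, right, bottom = box
    left, right = sorted((max(0, min(left, width)), max(0, min(right, width))))
    top, bottom = sorted((max(0, min(top, height)), max(0, min(bottom, height))))
    return left, top, right, bottom


def select_portion(image: list[list[int]], overlay: Box):
    """Copy the pixels inside the overlay without modifying the image.

    Returns the copied portion and its (x, y) offset in the original, so
    positions found in the portion can be mapped back to the document.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    left, top, right, bottom = clamp_box(overlay, width, height)
    portion = [row[left:right] for row in image[top:bottom]]
    return portion, (left, top)


def to_document_coords(x: int, y: int, offset: tuple[int, int]) -> tuple[int, int]:
    """Translate a position inside the portion back to document coordinates."""
    return x + offset[0], y + offset[1]


# A 5x4 toy image whose pixel value encodes its own position (row*10 + col).
image = [[r * 10 + c for c in range(5)] for r in range(4)]
portion, offset = select_portion(image, (1, 1, 4, 3))
doc_x, doc_y = to_document_coords(0, 0, offset)
```

Because the portion is a copy and the offset is kept alongside it, the original image is left untouched, which is one simple way to approximate the non-destructive behavior the text describes.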

Patent Metadata

Filing Date

Unknown

Publication Date

October 9, 2025

Inventors

Unknown

Cite as: Patentable. “ATTENTION EMBEDDED TRANSFORMER NETWORK DRIVEN DOCUMENT DATA EXTRACTION” (US-20250316105-A1). https://patentable.app/patents/US-20250316105-A1