Transaction Auditing Using Token Extraction and Model Matching

Published September 23, 2025

Assignee: not available in USPTO data

Technical Abstract

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for an automated compliance audit comprising: identifying a first set of one or more selected token types that have been selected by a first reimbursing entity for validation of reimbursement requests and a second set of one or more selected token types that have been selected by a second reimbursing entity for validation of reimbursement requests, wherein the second reimbursing entity is different from the first reimbursing entity and the second set of one or more selected token types is different from the first set of one or more selected token types; training at least one machine learning model for the first reimbursing entity using historical receipt text and historical reimbursement request data values to automatically identify and extract tokens of the first set of one or more selected token types; training at least one machine learning model for the second reimbursing entity using historical receipt text and historical reimbursement request data values to automatically identify and extract tokens of the second set of one or more selected token types; receiving first reimbursement request data values for a first reimbursement request associated with a first reimbursing entity; receiving first receipt text extracted from a first receipt submitted with the first reimbursement request; automatically extracting first token values for the first set of selected token types from the first receipt text using the at least one machine learning model for the first reimbursing entity, wherein automatically extracting the first token values includes: identifying first tokens in the first receipt text; and for each respective identified token in the identified first tokens: determining features of the identified token; determining a token type of the identified token by determining that the token type of the identified token is included in the first set of selected token types, based on the features determined for the identified token and a confidence score that indicates a likelihood that the identified token has the determined token type; and extracting a token value for the identified token from the first receipt text; comparing the first extracted token values to the first reimbursement request data values, wherein the comparing includes: identifying, in the first reimbursement request data values and for each selected token type, a request value for the selected token type; and comparing, for each selected token type, the first extracted token value for the selected token type to the first reimbursement request data value for the selected token type; generating an audit alert in response to determining that an extracted token value for a first selected token type does not match a corresponding first reimbursement request data value for the first selected token type; providing the audit alert to the first reimbursing entity; receiving second reimbursement request data values for a second reimbursement request associated with the second reimbursing entity; receiving second receipt text extracted from a second receipt submitted with the second reimbursement request; automatically extracting second token values for the second set of selected token types from the second receipt text using the at least one machine learning model for the second reimbursing entity; comparing the second extracted token values to the second reimbursement request data values in the second reimbursement request; and accepting the second reimbursement request based on the second extracted token values matching corresponding second reimbursement request data values in the second reimbursement request and having token types in the second set of one or more selected token types.

2. The method of claim 1, wherein the first set of selected token types include date, amount, currency, vendor name, vendor location and expense amount.

3. The method of claim 1, further comprising forwarding the first receipt text and the first reimbursement request data values for secondary processing when the confidence score for the extracted value for an identified token type is less than a first predefined confidence threshold.

4. The method of claim 1, wherein the first receipt text is extracted from an image of the first receipt.

5. The method of claim 1, wherein the features include keywords.

6. The method of claim 1, wherein the features include text format or layout.

7. The method of claim 1, further comprising updating the at least one machine learning model for the first reimbursing entity based on the first reimbursement request.

8. A system for an automated compliance audit comprising: one or more computers; and a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving, from a first user device of a first user of a first reimbursement entity, while the first user is editing a first reimbursement request, first reimbursement request data values for the first reimbursement request; identifying a first set of one or more selected token types that have been selected by the first reimbursing entity for validation; receiving first receipt text extracted from a first receipt submitted with the first reimbursement request; automatically extracting first token values for the first set of selected token types from the first receipt text using at least one machine learning model for the first reimbursing entity that is trained using historical receipt text and historical reimbursement request data values, wherein automatically extracting the first token values includes: identifying first tokens in the first receipt text; and for each respective identified token in the identified first tokens: determining features of the identified token; determining a token type of the identified token by determining that the token type of the identified token is included in the first set of selected token types, based on the features determined for the identified token and a confidence score that indicates a likelihood that the identified token has the determined token type; and extracting a token value for the identified token from the first receipt text; comparing the first extracted token values to the first reimbursement request data values, wherein the comparing includes: identifying, in the first reimbursement request data values and for each selected token type, a request value for the selected token type; and comparing, for each selected token type, the first extracted token value for the selected token type to the first reimbursement request data value for the selected token type; generating an audit alert in response to determining that an extracted token value for a first selected token type does not match a corresponding first reimbursement request data value for the first selected token type; providing the audit alert to the first reimbursing entity for presentation to the first user on the first user device while the first user is editing the first reimbursement request; receiving second reimbursement request data values for a second reimbursement request associated with a second reimbursing entity, wherein the second reimbursing entity is different from the first reimbursing entity; identifying a second set of one or more selected token types that have been selected by the second reimbursing entity for validation, wherein the second set of one or more selected token types includes a first token type that has not been selected by the first reimbursing entity; receiving second receipt text extracted from a second receipt submitted with the second reimbursement request; automatically extracting second token values for the second set of selected token types from the second receipt text using the at least one machine learning model for the second reimbursing entity; comparing the second extracted token values to the second reimbursement request data values in the second reimbursement request; and accepting the second reimbursement request based on the second extracted token values matching corresponding second reimbursement request data values in the second reimbursement request and having token types in the second set of one or more selected token types.

9. The system of claim 8, wherein the first set of selected token types include date, amount, currency, vendor name, vendor location and expense amount.

10. The system of claim 8, wherein the operations further comprise forwarding the first receipt text and the first reimbursement request data values for secondary processing when the confidence score for the extracted value for an identified token type is less than a first predefined confidence threshold.

11. The system of claim 8, wherein the first receipt text is extracted from an image of the first receipt.

12. The system of claim 8, wherein the features include keywords.

13. The system of claim 8, wherein the features include text format or layout.

14. The system of claim 8, wherein the operations further comprise updating the at least one machine learning model for the first reimbursing entity based on the first reimbursement request.

15. A computer program product encoded on a non-transitory storage medium, the product comprising non-transitory, computer readable instructions for causing one or more processors to perform operations comprising: identifying a first set of one or more selected token types that have been selected by a first reimbursing entity for validation of reimbursement requests and a second set of one or more selected token types that have been selected by a second reimbursing entity for validation of reimbursement requests, wherein the second reimbursing entity is different from the first reimbursing entity and the second set of one or more selected token types is different from the first set of one or more selected token types; training at least one machine learning model for the first reimbursing entity using historical receipt text and historical reimbursement request data values to automatically identify and extract tokens of the first set of one or more selected token types; training at least one machine learning model for the second reimbursing entity using historical receipt text and historical reimbursement request data values to automatically identify and extract tokens of the second set of one or more selected token types; receiving, from a first user device of a user of the first reimbursement entity, while the first user is editing a first reimbursement request, first reimbursement request data values for the first reimbursement request; receiving first receipt text extracted from a first receipt submitted with the first reimbursement request; automatically extracting first token values for the first set of selected token types from the first receipt text using the at least one machine learning model for the first reimbursing entity, wherein automatically extracting the first token values includes: identifying first tokens in the first receipt text; and for each respective identified token in the identified first tokens: determining features of the identified token; determining a token type of the identified token by determining that the token type of the identified token is included in the first set of selected token types, based on the features determined for the identified token and a confidence score that indicates a likelihood that the identified token has the determined token type; and extracting a token value for the identified token from the first receipt text; comparing the first extracted token values to the first reimbursement request data values, wherein the comparing includes: identifying, in the first reimbursement request data values and for each selected token type, a request value for the selected token type; and comparing, for each selected token type, the first extracted token value for the selected token type to the first reimbursement request data value for the selected token type; generating an audit alert in response to determining that an extracted token value for a first selected token type does not match a corresponding first reimbursement request data value for the first selected token type; providing the audit alert to the first reimbursing entity for presentation to the first user on the first user device while the first user is editing the first reimbursement request; receiving second reimbursement request data values for a second reimbursement request associated with the second reimbursing entity; receiving second receipt text extracted from a second receipt submitted with the second reimbursement request; automatically extracting second token values for the second set of selected token types from the second receipt text using the at least one machine learning model for the second reimbursing entity; comparing the second extracted token values to the second reimbursement request data values in the second reimbursement request; and accepting the second reimbursement request based on the second extracted token values matching corresponding second reimbursement request data values in the second reimbursement request and having token types in the second set of one or more selected token types.

16. The computer program product of claim 15, wherein the first set of selected token types include date, amount, currency, vendor name, vendor location and expense amount.

17. The computer program product of claim 15, wherein the operations further comprise forwarding the first receipt text and the first reimbursement request data values for secondary processing when the confidence score for the extracted value for an identified token type is less than a first predefined confidence threshold.

18. The computer program product of claim 15, wherein the first receipt text is extracted from an image of the first receipt.

19. The computer program product of claim 15, wherein the features include keywords.

20. The computer program product of claim 15, wherein the features include text format or layout.
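
Claim 1 gives each reimbursing entity its own set of selected token types, compares the values extracted from a receipt with the request's data values, raises an audit alert on a mismatch, and accepts the request when the selected values match. The Python sketch below illustrates only that comparison step, under stated assumptions: the entity names and the `SELECTED_TOKEN_TYPES` mapping are hypothetical, the `extracted` dictionary stands in for the output of the entity's trained model, and values are compared as exact strings, which a real system would likely replace with type-aware normalization.

```python
from dataclasses import dataclass, field

# Hypothetical per-entity configuration. The claims require only that each
# reimbursing entity selects its own (different) set of token types.
SELECTED_TOKEN_TYPES: dict[str, set[str]] = {
    "entity_a": {"date", "amount", "vendor_name"},
    "entity_b": {"amount", "currency"},
}


@dataclass
class AuditResult:
    accepted: bool
    alerts: list[str] = field(default_factory=list)


def audit_request(entity: str,
                  extracted: dict[str, str],
                  request_values: dict[str, str]) -> AuditResult:
    """Compare token values extracted from a receipt with request data values.

    `extracted` stands in for the entity's trained model output
    (token type -> extracted value). Only the entity's selected types are checked.
    """
    alerts = []
    for token_type in sorted(SELECTED_TOKEN_TYPES[entity]):
        receipt_value = extracted.get(token_type)
        request_value = request_values.get(token_type)
        # Assumption: a missing extracted value is treated as a mismatch.
        if receipt_value is None or receipt_value != request_value:
            alerts.append(
                f"{token_type}: receipt={receipt_value!r} request={request_value!r}"
            )
    # Accept only when every selected type matched; otherwise return the alerts.
    return AuditResult(accepted=not alerts, alerts=alerts)
```

For example, if `entity_a` submits a request for `45.00` against a receipt whose extracted amount is `42.00`, the result is not accepted and carries the single alert `amount: receipt='42.00' request='45.00'`. Token types the entity did not select (such as `date` for `entity_b`) are ignored entirely.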
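
Claim 2 lists date, amount, currency, vendor name, vendor location and expense amount as selected token types. Deciding whether an extracted value "matches" a request value is not trivial for these types, since `$1,234.50` and `1234.5` denote the same amount. One possible approach, sketched here with hypothetical helper names and assumed date and number formats (the claims do not say how matching is performed), is to normalize each value according to its token type before comparing:

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Callable, Optional

# Assumed accepted date layouts (US month/day order for the slash form).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(text: str) -> Optional[date]:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None  # unparseable dates never match


def normalize_amount(text: str) -> Optional[Decimal]:
    # Assumes "," is a thousands separator and "." the decimal point.
    cleaned = text.strip().lstrip("$€£").replace(",", "")
    try:
        return Decimal(cleaned)  # Decimal("42.0") == Decimal("42.00")
    except InvalidOperation:
        return None


def normalize_text(text: str) -> str:
    # Case-insensitive comparison with runs of whitespace collapsed.
    return " ".join(text.split()).casefold()


# Hypothetical mapping from token type to its normalizer.
NORMALIZERS: dict[str, Callable[[str], object]] = {
    "date": normalize_date,
    "amount": normalize_amount,
    "expense_amount": normalize_amount,
    "currency": lambda s: s.strip().upper(),
    "vendor_name": normalize_text,
    "vendor_location": normalize_text,
}


def values_match(token_type: str, receipt_value: str, request_value: str) -> bool:
    normalize = NORMALIZERS.get(token_type, normalize_text)
    a, b = normalize(receipt_value), normalize(request_value)
    return a is not None and a == b
```

Under these assumptions `values_match("date", "03/01/2025", "2025-03-01")` is `True`; a European receipt using `1.234,50` would need a locale-aware amount parser instead.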
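
Claim 3 forwards the receipt text and request data values for secondary processing when the confidence score for an extracted token falls below a predefined threshold. A minimal sketch of that routing decision, assuming a hypothetical `ExtractedToken` record and an arbitrary threshold of 0.80 (the claims give no value):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedToken:
    token_type: str
    value: str
    confidence: float  # model's estimated likelihood that the token has this type


# Hypothetical value: the claims say only "a first predefined confidence threshold".
CONFIDENCE_THRESHOLD = 0.80


def route(tokens: list[ExtractedToken]) -> tuple[str, list[str]]:
    """Return the processing path and the token types that fell below threshold."""
    low = [t.token_type for t in tokens if t.confidence < CONFIDENCE_THRESHOLD]
    if low:
        # Any low-confidence token sends the whole request to secondary processing.
        return "secondary_processing", low
    return "automated_audit", []
```

Returning the low-confidence token types alongside the route lets a secondary reviewer see which fields the model was unsure about.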
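
Claims 5 and 6 state that the features used to determine a token's type may include keywords and text format or layout. The sketch below shows one way such features could produce a token type and a confidence score. It is a toy linear scorer with a softmax, not the trained model the claims describe, and the keyword lists, regular expressions, and weights are all hypothetical:

```python
import math
import re

# Hypothetical keyword lists and weights; the claims state only that token
# features may include keywords and text format or layout.
KEYWORDS = {"amount": ("total", "amount", "due", "balance"), "date": ("date",)}
AMOUNT_RE = re.compile(r"^[$€£]?\d+(?:,\d{3})*\.\d{2}$")
DATE_RE = re.compile(r"^(?:\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2})$")
WEIGHTS = {
    "amount": {"kw_amount": 2.0, "looks_like_amount": 3.0, "rel_line_pos": 1.0},
    "date": {"kw_date": 2.0, "looks_like_date": 3.0},
    "other": {"bias": 1.0},  # fallback class for tokens with weak evidence
}


def token_features(lines: list[str], line_idx: int, token: str) -> dict[str, float]:
    line = lines[line_idx].lower()
    # Keyword features: does the token's line contain a cue word for each type?
    feats = {f"kw_{t}": float(any(w in line for w in words))
             for t, words in KEYWORDS.items()}
    # Format features: does the token itself look like an amount or a date?
    feats["looks_like_amount"] = float(bool(AMOUNT_RE.match(token)))
    feats["looks_like_date"] = float(bool(DATE_RE.match(token)))
    # Layout feature: relative vertical position (0 = top line, 1 = bottom line).
    feats["rel_line_pos"] = line_idx / max(len(lines) - 1, 1)
    feats["bias"] = 1.0
    return feats


def classify(feats: dict[str, float]) -> tuple[str, float]:
    """Score each type linearly, then softmax-normalize into a confidence."""
    scores = {t: sum(w * feats.get(f, 0.0) for f, w in ws.items())
              for t, ws in WEIGHTS.items()}
    total = sum(math.exp(s) for s in scores.values())
    best = max(scores, key=scores.get)
    return best, math.exp(scores[best]) / total
```

With `lines = ["Blue Bottle Coffee", "Date 2025-03-01", "Latte 4.50", "TOTAL $12.50"]`, the token `$12.50` on the last line is typed as `amount` with a confidence above 0.98, because the keyword, the amount format, and its position at the bottom of the receipt all agree; a plain word such as `Latte` falls back to `other`.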
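
Claim 1 trains each entity's model on historical receipt text together with historical request data values, and claim 7 updates the model using a new reimbursement request. One plausible reading is that the submitted request values act as labels for the matching tokens on the receipt. The hypothetical `KeywordModel` below (a word counter, not a real machine learning model) illustrates that weak-labeling idea with an incremental update:

```python
from collections import Counter, defaultdict


class KeywordModel:
    """Toy per-entity model: counts words that share a receipt line with each token type.

    Assumption: a receipt word equal to a request value (case-insensitive)
    is taken to be a token of that request field's type.
    """

    def __init__(self) -> None:
        self.counts: defaultdict[str, Counter[str]] = defaultdict(Counter)

    def update(self, receipt_lines: list[str], request_values: dict[str, str]) -> None:
        for line in receipt_lines:
            words = line.lower().split()
            for token_type, value in request_values.items():
                target = value.lower()
                if target in words:
                    # The other words on the labeled line become context keywords.
                    self.counts[token_type].update(w for w in words if w != target)

    def top_keywords(self, token_type: str, n: int = 3) -> list[str]:
        return [w for w, _ in self.counts[token_type].most_common(n)]
```

After updates from two receipts whose total lines read `TOTAL 12.50` and `Total due 8.00`, with request amounts `12.50` and `8.00`, `top_keywords("amount", 2)` returns `["total", "due"]`. Exact word matching is a deliberately crude labeling heuristic.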
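
Claim 8 adds that the audit alert is presented to the user while the request is still being edited, and that the second entity's selected set may include a token type the first entity did not select. A minimal sketch of per-field validation during editing, using a hypothetical `DraftValidator` class; skipping fields whose receipt value has not been extracted is a design choice, not something the claims specify:

```python
from typing import Optional


class DraftValidator:
    """Re-checks one field at a time while a user edits a reimbursement request.

    `extracted` stands in for token values already extracted from the attached
    receipt by the entity's model.
    """

    def __init__(self, selected_types: set[str], extracted: dict[str, str]) -> None:
        self.selected = set(selected_types)
        self.extracted = dict(extracted)

    def on_field_edit(self, token_type: str, new_value: str) -> Optional[str]:
        if token_type not in self.selected:
            return None  # this entity did not select the type for validation
        receipt_value = self.extracted.get(token_type)
        if receipt_value is None:
            return None  # assumption: nothing to compare yet; left to the final audit
        if receipt_value != new_value:
            return (f"{token_type} {new_value!r} does not match "
                    f"receipt value {receipt_value!r}")
        return None
```

For example, with `amount` extracted as `12.50`, editing the amount to `15.00` yields the alert `amount '15.00' does not match receipt value '12.50'`, while editing a field the entity did not select returns `None`.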

Patent Metadata

Filing Date

Unknown

Publication Date

September 23, 2025

Inventors

Michael Stark

Jesper Lind
