12300011

System and Method for Classification of Unstructured Text Data

Published: May 13, 2025
Assignee: not available in the USPTO data we have
Patent Claims
17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system for classification of unstructured text data relating to a legal query, the system comprising: a session interface to receive session data relating to the legal query; a text interface to receive unstructured text data from a user; a text pre-processor to apply one or more text pre-processing functions to the unstructured text data to output a structured numeric representation of the unstructured text data; at least one machine learning classifier to map the structured numeric representation of the unstructured text data to one or more classes within a defined set of classes; a classifier optimizer to process the session data to generate configuration data for the at least one machine learning classifier, the configuration data indicating a subset of the defined set of classes that are valid given the session data; a database of training data for the at least one machine learning classifier; a token count processor to determine a minimum token count for the unstructured text data based on at least the session data and data stored within the database of training data; and a data input optimizer to validate the unstructured text data received at the text interface prior to application of the at least one machine learning classifier, wherein the at least one machine learning classifier is applied responsive to the length of the unstructured text data exceeding the minimum token count.
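Illustrative sketch (not part of the claims): claim 1 gates the classifier on a minimum token count derived from the session data and the training database. The claim does not say how that count is computed, so the rule below (shortest matching training sample, with a floor) is an assumption, as are the names `Session`, `TRAINING_TOKEN_COUNTS`, `minimum_token_count` and `should_classify`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical session data; the field names are illustrative only."""
    role: str
    status: str


# Hypothetical training store: (session role, token count of a past query).
TRAINING_TOKEN_COUNTS = [("tenant", 12), ("tenant", 20), ("landlord", 8)]


def minimum_token_count(session, training_counts, floor=5):
    """Derive a threshold from past queries that share the session role.

    Assumption: the threshold is the shortest matching training sample,
    never lower than `floor`. The claim itself does not fix this rule.
    """
    matching = [n for role, n in training_counts if role == session.role]
    return max(floor, min(matching)) if matching else floor


def should_classify(text, session, training_counts):
    """Gate: run the classifier only when the token count exceeds the minimum."""
    tokens = text.split()  # whitespace tokens, a simplification
    return len(tokens) > minimum_token_count(session, training_counts)
```

In this sketch a tenant query of exactly 12 tokens is still held back, because the claim requires the length to exceed the minimum rather than merely reach it.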

2. The system of claim 1, wherein the session data comprises selections by the user of a sequence of field values from a respective sequence of defined field value sets, and wherein the data input optimizer is configured to order values within each of the defined field value sets based on the database of training data.

3. The system of claim 1, wherein the session data indicates one or more of: a user account type; a legal query role; a legal query status; and a desired legal query outcome.

4. A system for classification of unstructured text data relating to a legal query, the system comprising: a session interface to receive session data relating to the legal query; a text interface to receive unstructured text data from a user; a text pre-processor to apply one or more text pre-processing functions to the unstructured text data to output a structured numeric representation of the unstructured text data; at least one machine learning classifier to map the structured numeric representation of the unstructured text data to one or more classes within a defined set of classes, wherein the at least one machine learning classifier comprises a first machine learning classifier and a second machine learning classifier; a classifier optimizer to process the session data to generate configuration data for the at least one machine learning classifier, the configuration data indicating a subset of the defined set of classes that are valid given the session data; and a validation engine to receive validation data associated with a validation of the one or more classes as determined by a first classification applied by the first machine learning classifier, wherein, responsive to receiving validation data indicating an invalid set of classes for the first classification, the validation engine is configured to instruct the second machine learning classifier to perform a second classification, and wherein the validation engine is configured to perform a validation of the classes determined by the second classification.

5. The system of claim 4, comprising: a manual classification interface to receive data indicating a manual selection of the classes from the defined set of classes by the user, wherein, responsive to receiving data from the user indicating an invalid set of classes for the second classification, the validation engine is configured to present the user with at least a subset of the defined set of classes for manual selection; and a data storage device to store outputs from one or more of the first machine learning classifier, the second machine learning classifier and the manual classification interface and at least one of the unstructured text data and the structured numeric representation of the unstructured text data as training data for one or more of the first and second machine learning classifiers.
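Illustrative sketch (not part of the claims): claims 4 and 5 together describe a cascade in which a first classifier is tried, a second classifier is tried if the first result is rejected, and the user selects classes manually if both are rejected, with the outcome stored as training data. The function below is one possible reading; `classify_with_fallback`, the callable-based classifiers and the list used as a store are all hypothetical.

```python
def classify_with_fallback(features, first, second, is_valid, manual_select, store):
    """Try the first classifier, then the second, then manual selection.

    `first` and `second` map a feature vector to a class label,
    `is_valid` stands in for the validation engine, and `manual_select`
    stands in for the manual classification interface.
    """
    for source, classifier in (("first", first), ("second", second)):
        label = classifier(features)
        if is_valid(label):
            # Keep the accepted result as a future training sample.
            store.append((tuple(features), label, source))
            return label, source
    # Both automatic classifications were rejected: ask the user.
    label = manual_select()
    store.append((tuple(features), label, "manual"))
    return label, "manual"
```

This version stores only the accepted result. The claim language ("one or more of") would also allow rejected classifier outputs to be stored.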

6. The system of claim 4, wherein the first machine learning classifier is of a first type and the second machine learning classifier is of a second type, the first and second types being different.

7. The system of claim 6, wherein the first machine learning classifier is a logistic regression classifier, and the second machine learning classifier is a support vector machine classifier.

8. The system of claim 4, wherein the session data indicates one or more of: a user account type; a legal query role; a legal query status; and a desired legal query outcome.

9. A system for classification of unstructured text data relating to a legal query, the system comprising: a session interface to receive session data relating to the legal query; a text interface to receive unstructured text data from a user; a text pre-processor to apply one or more text pre-processing functions to the unstructured text data to output a structured numeric representation of the unstructured text data; at least one machine learning classifier to map the structured numeric representation of the unstructured text data to one or more classes within a defined set of classes; and a classifier optimizer to process the session data to generate configuration data for the at least one machine learning classifier, the configuration data indicating a subset of the defined set of classes that are valid given the session data, wherein the at least one machine learning classifier comprises a domain machine learning classifier and a sub-domain machine learning classifier, the domain and sub-domain machine learning classifiers being of a common type and each receiving the structured numeric representation of the unstructured text data, wherein parameters for the sub-domain machine learning classifier are loaded based on a domain class output by the domain machine learning classifier.
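Illustrative sketch (not part of the claims): claim 9 uses two classifiers of a common type on the same numeric input, where the sub-domain classifier's parameters are loaded according to the domain class. The toy linear scorer, the class names and the weight tables below are assumptions chosen only to make the control flow concrete.

```python
def linear_classify(features, params):
    """Return the class whose weight vector has the highest dot product."""
    return max(
        params,
        key=lambda cls: sum(w * x for w, x in zip(params[cls], features)),
    )


# Hypothetical parameters for the domain classifier.
DOMAIN_PARAMS = {"housing": [1.0, 0.0, 0.0], "employment": [0.0, 1.0, 0.0]}

# Hypothetical sub-domain parameter sets, keyed by domain class.
SUBDOMAIN_PARAMS = {
    "housing": {"eviction": [0.0, 0.0, 1.0], "repairs": [0.0, 0.0, -1.0]},
    "employment": {"dismissal": [0.0, 0.0, 1.0], "wages": [0.0, 0.0, -1.0]},
}


def classify_hierarchical(features):
    """Classify the domain first, then load sub-domain parameters for it."""
    domain = linear_classify(features, DOMAIN_PARAMS)
    sub_params = SUBDOMAIN_PARAMS[domain]  # parameters selected by domain class
    return domain, linear_classify(features, sub_params)
```

Both stages call the same `linear_classify` function with different parameters, which is one way to read "of a common type".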

10. A system for classification of unstructured text data relating to a legal query, the system comprising: a session interface to receive session data relating to the legal query; a text interface to receive unstructured text data from a user; a text pre-processor to apply one or more text pre-processing functions to the unstructured text data to output a structured numeric representation of the unstructured text data; at least one machine learning classifier to map the structured numeric representation of the unstructured text data to one or more classes within a defined set of classes; a classifier optimizer to process the session data to generate configuration data for the at least one machine learning classifier, the configuration data indicating a subset of the defined set of classes that are valid given the session data, wherein the one or more text pre-processing functions comprise: a tokenizer to parse the unstructured text data as a sequence of character data symbols and to output data indicating one or more groups of character data symbols, and one or more of: a stemming function to map a plurality of tokens from the tokenizer to at least single stem token; a lemmatization function to map a plurality of tokens from the tokenizer to at least single grammar unit token; a stop token removal function to remove one or more tokens from the tokenizer that are defined in a data structure of stop tokens; and a character filter to remove character data symbols that match a predefined set of character data symbols, wherein at least the tokenizer is configured to: partition the unstructured text data into sets of grouped character symbols based on one or more punctuation character symbols; match sets of grouped character symbols against entries in a dictionary data structure; and replace matched sets of grouped character symbols with a numeric value representing an index in the dictionary data structure.
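Illustrative sketch (not part of the claims): the pre-processing chain in claim 10 can be pictured as a character filter, a punctuation-based tokenizer, stop token removal, stemming and a dictionary lookup that replaces each matched token with its index. The vocabulary, stop list and crude suffix stripping below are assumptions; a production system would likely use an established stemmer or lemmatizer.

```python
import re

DICTIONARY = ["landlord", "evict", "rent", "notice"]  # hypothetical vocabulary
INDEX = {term: i for i, term in enumerate(DICTIONARY)}
STOP_TOKENS = {"the", "my", "a", "me"}  # hypothetical stop token structure
SUFFIXES = ("ing", "ed", "s")  # crude suffix stripping, not a real stemmer


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Map raw text to a list of dictionary indices."""
    text = re.sub(r"[0-9]", "", text.lower())  # character filter: drop digits
    # Tokenizer: partition on whitespace and punctuation symbols.
    groups = [g for g in re.split(r"[\s.,;:!?]+", text) if g]
    kept = [stem(g) for g in groups if g not in STOP_TOKENS]
    # Replace matched tokens with their dictionary index; drop the rest.
    return [INDEX[t] for t in kept if t in INDEX]
```

For example, `preprocess("My landlord evicted me, the rent notices!")` yields `[0, 1, 2, 3]` with the vocabulary above.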

11. The system of claim 10, wherein the text pre-processor is configured to output a bag of words vector for the unstructured text data indicating frequencies of matched sets of grouped character symbols and perform a term frequency inverse document frequency (TF-IDF) computation to output a TF-IDF vector; optionally comprising a dimensionality reduction component configured to receive the TF-IDF vector and to reduce a size of the vector.
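Illustrative sketch (not part of the claims): claim 11 turns the matched dictionary indices into a bag of words vector and then a TF-IDF vector. The claim does not fix a TF-IDF variant, so the smoothed inverse document frequency used below is an assumption, as are the function names and the document-frequency inputs.

```python
import math
from collections import Counter


def bag_of_words(token_ids, vocab_size):
    """Count how often each dictionary index occurs in the text."""
    counts = Counter(token_ids)
    return [counts.get(i, 0) for i in range(vocab_size)]


def tfidf(bow, doc_freq, n_docs):
    """Weight term frequencies by a smoothed inverse document frequency.

    Assumed variant: tf = count / total, idf = ln((1 + N) / (1 + df)) + 1.
    """
    total = sum(bow) or 1  # avoid division by zero for empty text
    return [
        (count / total) * (math.log((1 + n_docs) / (1 + df)) + 1.0)
        for count, df in zip(bow, doc_freq)
    ]
```

With equal term frequencies, a term that appears in fewer training documents receives the larger weight, which is the effect the TF-IDF step is meant to produce. The optional dimensionality reduction step is not shown.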

12. A method of classifying unstructured text data relating to a legal query, the method comprising: receiving session data for the legal query from a user; processing the session data to determine configuration data for at least one machine learning classifier, the configuration data indicating a subset of a defined set of classes that are valid given the session data; receiving unstructured text data from the user; pre-processing the unstructured text data to provide a structured numeric representation of the unstructured text data; configuring at least one machine learning classifier using the configuration data; and mapping the structured numeric representation of the unstructured text data to one or more classes using the at least one machine learning classifier, wherein the session data comprises selections by the user of a sequence of field values from a respective sequence of defined field value sets, and the method comprises, prior to receiving the unstructured text data from the user: determining a minimum token count for the unstructured text data based on at least the session data and a database of training data for the at least one machine learning classifier, wherein at least one of the pre-processing and mapping are only performed once the unstructured text data is determined to contain a number of tokens that is above the minimum token count.
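Illustrative sketch (not part of the claims): the configuration step in claim 12 restricts the classifier to the subset of classes that are valid for the session. One simple way to picture this is a lookup from session data to an allowed set, followed by picking the best-scoring class within that set. The role-based rule table, class names and function names are hypothetical.

```python
ALL_CLASSES = ["eviction", "repairs", "dismissal", "wages"]

# Hypothetical rule table: session role -> classes valid for that role.
VALID_BY_ROLE = {
    "tenant": {"eviction", "repairs"},
    "employee": {"dismissal", "wages"},
}


def configuration_for(session_role):
    """Return the valid class subset; unknown roles keep every class."""
    return VALID_BY_ROLE.get(session_role, set(ALL_CLASSES))


def classify(scores, valid_classes):
    """Pick the highest-scoring class among the valid ones only."""
    candidates = {c: s for c, s in scores.items() if c in valid_classes}
    return max(candidates, key=candidates.get)
```

In this sketch a tenant session cannot be mapped to an employment class even when that class has the highest raw score.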

13. The method of claim 12, wherein said mapping is performed using a first machine learning classifier of a first type and the method further comprises: validating, using validation data, the classes as determined by the first machine learning classifier; responsive to the validating indicating an invalid set of classes: mapping the structured numeric representation of the unstructured text data to one or more classes using a second machine learning classifier of a second type, the second type being different to the first type, and validating, using validation data, the classes as determined by the second machine learning classifier; responsive to the validating indicating an invalid set of classes as determined by the second machine learning classifier: receiving, from the user, data indicating a manual selection of the classes from a pre-defined set of classes; outputting the classes as determined by a validated first or second classification or the manual selection; and storing outputs from one or more of the first machine learning classifier, the second machine learning classifier and the manual selection and at least one of the unstructured text data and the structured numeric representation of the unstructured text data in the database of training data.

14. The method of claim 13, comprising: parameterising each of the first and second machine learning classifiers with a first set of parameters to provide a mapping to a set of domain classes; and parameterising each of the first and second machine learning classifiers with a second set of parameters to provide a mapping to a set of sub-domain classes, wherein the second set of parameters are selected based on an output of the mapping to the set of domain classes.

15. The method of claim 14, comprising: responsive to a successful validation of a domain class and a sub-domain class, generating unstructured text data based on the domain class and the sub-domain class; and validating, using validation data received from the user, the unstructured text data to confirm the domain class and the sub-domain class.

16. The method of claim 14, comprising: obtaining training data comprising text-output data samples, each text-output data sample comprising at least one of unstructured text data and a structured numeric representation of the unstructured text data as input data and domain and sub-domain classifications as output data, wherein the text-output data samples are split into validated text-output data samples and invalidated text-output data samples based on validation data received from one or more users; determining the first set of parameters by training the first and second machine learning classifiers using the domain classifications as output for the training data; and determining the second set of parameters by training the first and second machine learning classifiers using the sub-domain classifications as output for the training data.

17. The method of claim 16, wherein the determining of one or more of the first and second sets of parameters is performed responsive to a set of new text-output data samples in the training data exceeding a pre-defined threshold.
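Illustrative sketch (not part of the claims): claim 17 re-determines the classifier parameters once the number of new training samples exceeds a pre-defined threshold. The class below is a minimal way to express that trigger; `RetrainTrigger`, its `retrain` callback and the choice to clear the pending batch after retraining are assumptions.

```python
class RetrainTrigger:
    """Collect new samples and retrain once their count exceeds a threshold."""

    def __init__(self, threshold, retrain):
        self.threshold = threshold
        self.retrain = retrain  # hypothetical callback that fits the classifiers
        self.pending = []

    def add_sample(self, sample):
        """Record a sample; return True if this call triggered retraining."""
        self.pending.append(sample)
        if len(self.pending) > self.threshold:
            batch, self.pending = self.pending, []
            self.retrain(batch)
            return True
        return False
```

With a threshold of 2, the first two samples are only recorded and the third triggers retraining on all three.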

Patent Metadata

Filing Date: Unknown

Publication Date: May 13, 2025

Inventors: Fraser J. Matcham, Vasilis Kotsos, Markos Mentzelopoulos

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “System and Method for Classification of Unstructured Text Data” (12300011). https://patentable.app/patents/12300011
