Generating rules to automatically extract linguistic patterns from documents is provided. A first plurality of linguistic pattern extraction rules corresponding to a user-selected text example from a document is generated according to a first abstraction rule of a plurality of abstraction rules. Each respective linguistic pattern extraction rule of the first plurality of linguistic pattern extraction rules has a first identified level of abstraction. The first plurality of linguistic pattern extraction rules, ordered by the first identified level of abstraction, is presented in a first list to a user via a user interface. The user's selection of one particular linguistic pattern extraction rule from the first list is received via the user interface. The linguistic pattern extraction rule selected by the user is applied to the document to automatically extract user-desired linguistic patterns similar to the user-selected text example.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
6. The computer-implemented method of claim 5, wherein the first abstraction rule abstracts a set of tokens corresponding to the user-selected text example based on at least one of a user dictionary or defined parts of speech, and wherein the second abstraction rule abstracts a particular type of token corresponding to the user-selected text example a defined number of times, and wherein the third abstraction rule abstracts tokens corresponding to the user-selected text example based on heuristics.
This claim narrows the computer-implemented method of claim 5 by specifying how each of three abstraction rules generalizes a user-selected text example. The first rule abstracts a set of tokens based on a user dictionary, defined parts of speech, or both, so a literal word can be replaced by its dictionary category or its grammatical class. The second rule abstracts a particular type of token a defined number of times, which gives controlled generalization of repeated elements. The third rule abstracts tokens based on heuristics. Each rule yields candidate extraction rules at a different level of abstraction, and the user can influence the result by supplying a dictionary or adjusting the abstraction parameters.
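The three abstraction rules can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the `Token` class, the function names, the sample dictionary, the part-of-speech tags, and the single digit heuristic are all assumptions made for the example, and reading "a defined number of times" as "at most `limit` replacements, left to right" is one possible interpretation of the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    pos: str  # part-of-speech tag, supplied by hand in this sketch


# Hypothetical user dictionary: lowercase surface form -> category label.
USER_DICTIONARY = {"acme": "COMPANY", "globex": "COMPANY"}


def abstract_by_dictionary_or_pos(tokens, dictionary, pos_tags):
    """First rule (sketch): use the dictionary category if present,
    otherwise the POS tag when that tag is in pos_tags."""
    out = []
    for tok in tokens:
        category = dictionary.get(tok.text.lower())
        if category is not None:
            out.append(f"<{category}>")
        elif tok.pos in pos_tags:
            out.append(f"<{tok.pos}>")
        else:
            out.append(tok.text)
    return out


def abstract_token_type_n_times(tokens, pos, limit):
    """Second rule (sketch): abstract tokens of one type, at most
    `limit` times, scanning left to right."""
    out, used = [], 0
    for tok in tokens:
        if tok.pos == pos and used < limit:
            out.append(f"<{pos}>")
            used += 1
        else:
            out.append(tok.text)
    return out


def abstract_by_heuristics(tokens):
    """Third rule (sketch): one assumed heuristic, digit strings become <NUM>."""
    return ["<NUM>" if tok.text.isdigit() else tok.text for tok in tokens]


# User-selected text example: "Acme acquired Globex for 500 million"
EXAMPLE = [
    Token("Acme", "PROPN"), Token("acquired", "VERB"), Token("Globex", "PROPN"),
    Token("for", "ADP"), Token("500", "NUM"), Token("million", "NUM"),
]

print(abstract_by_dictionary_or_pos(EXAMPLE, USER_DICTIONARY, {"VERB"}))
print(abstract_token_type_n_times(EXAMPLE, "PROPN", 1))
print(abstract_by_heuristics(EXAMPLE))
```

Each function returns a token pattern in which abstracted positions appear as angle-bracket placeholders; a real system would compile such patterns into executable extraction rules.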
7. The computer-implemented method of claim 5, wherein each respective linguistic pattern extraction rule has a corresponding rule abstraction score that is based on a sum of token abstraction scores of a set of tokens in a given linguistic pattern extraction rule for ordering linguistic pattern extraction rules in a given list.
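The scoring described in claim 7 can be illustrated with a short sketch. The claim states only that a rule's abstraction score is based on a sum of its token abstraction scores and that the score is used for ordering; the per-token score values, the `(kind, value)` rule representation, and the ascending sort direction below are assumptions chosen for the example.

```python
# Assumed per-token abstraction scores: more general token kinds score higher.
TOKEN_SCORES = {"literal": 0, "dictionary": 1, "pos": 2, "wildcard": 3}


def rule_abstraction_score(rule):
    """Sum the token abstraction scores of every token in the rule."""
    return sum(TOKEN_SCORES[kind] for kind, _value in rule)


def order_rules(rules):
    """Order candidate rules from least to most abstract (assumed direction)."""
    return sorted(rules, key=rule_abstraction_score)


# Three hypothetical candidate rules for the example "Acme acquired Globex".
literal_rule = [("literal", "Acme"), ("literal", "acquired"), ("literal", "Globex")]
mixed_rule = [("dictionary", "COMPANY"), ("literal", "acquired"), ("dictionary", "COMPANY")]
general_rule = [("dictionary", "COMPANY"), ("pos", "VERB"), ("dictionary", "COMPANY")]

ordered = order_rules([general_rule, literal_rule, mixed_rule])
for rule in ordered:
    print(rule_abstraction_score(rule), rule)
```

Because `sorted` is stable, rules with equal scores keep their original relative order, which gives a predictable list to present to the user.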
9. The computer-implemented method of claim 1, wherein the computer annotates or highlights linguistic patterns extracted from documents.
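As a minimal sketch of the highlighting step, the function below wraps already-extracted character spans in marker strings. The function name, the `[[` and `]]` markers, and the assumption that spans are non-overlapping `(start, end)` offsets are all illustrative; the claim does not say how annotations are rendered.

```python
def highlight_spans(text, spans, open_mark="[[", close_mark="]]"):
    """Wrap each (start, end) span of text in markers.

    Spans are assumed to be non-overlapping, with end exclusive.
    Processing from right to left keeps earlier offsets valid.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + open_mark + text[start:end] + close_mark + text[end:]
    return text


document = "Acme acquired Globex"
# Offsets of the two company names, as an extraction rule might report them.
print(highlight_spans(document, [(0, 4), (14, 20)]))
```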
20. The computer program product of claim 19, wherein the first abstraction rule abstracts a set of tokens corresponding to the user-selected text example based on at least one of a user dictionary or defined parts of speech, and wherein the second abstraction rule abstracts a particular type of token corresponding to the user-selected text example a defined number of times, and wherein the third abstraction rule abstracts tokens corresponding to the user-selected text example based on heuristics.
This claim mirrors claim 6 for the computer program product of claim 19, specifying how each of three abstraction rules generalizes a user-selected text example. The first abstraction rule abstracts a set of tokens based on a user dictionary, defined parts of speech, or both. The second abstraction rule abstracts a particular type of token a defined number of times, which limits how much of that token type is generalized. The third abstraction rule abstracts tokens based on heuristics. Each rule yields candidate extraction rules at a different level of abstraction, and the user-supplied dictionary and the defined parts of speech make the process adjustable to different kinds of text.
June 21, 2021
November 1, 2022