Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A computer-implemented method for generating questions from a source document, comprising: selecting passages of text from the source document based on a first criterion; transforming the selected text passages based on coreference analysis; selecting fragments of text in the transformed text passages based on matching combined semantic-syntactic patterns from a pattern library; and automatically generating the questions by transforming the selected text fragments.
This invention relates to automated question generation from text documents. The method creates questions from unstructured text by combining semantic and syntactic analysis. It first selects passages from a source document according to a first criterion (claim 2 gives relevance to the document's subject matter as one such criterion). The selected passages are then transformed using coreference analysis, so that pronouns and other references are replaced by the expressions they refer to and each passage can be read on its own. Next, the transformed text is searched for fragments that match combined semantic-syntactic patterns stored in a pattern library. Finally, the matched fragments are transformed into questions. The questions are generated automatically, which makes the approach suited to applications such as educational tools, content analysis, and automated assessment systems.
2. The method of claim 1, wherein the first criterion is that, to be selected, a given passage of text has a similarity greater than or equal to a content relevance threshold, relative to at least one subject matter descriptor of the source document.
This invention relates to text analysis and information retrieval, specifically improving the selection of relevant text passages from a source document based on content relevance. The problem addressed is the need to accurately identify and extract passages that are highly relevant to a given subject matter, ensuring that the selected content aligns closely with the intended focus of the source document. The method involves evaluating text passages from a source document against predefined subject matter descriptors. A passage is selected if it meets a first criterion: its similarity to at least one subject matter descriptor must be greater than or equal to a specified content relevance threshold. This ensures that only passages with sufficient relevance to the subject matter are chosen. The method may also include additional criteria, such as evaluating the passage's coherence, readability, or other linguistic features, to further refine the selection process. The goal is to enhance the precision of information retrieval by filtering out irrelevant or marginally relevant content, thereby improving the quality of extracted information for downstream applications like summarization, analysis, or knowledge extraction. The approach is particularly useful in automated systems where manual review of text relevance is impractical.
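The threshold test of claim 2 can be illustrated with a minimal Python sketch. The claim does not say how similarity is measured, so the bag-of-words cosine similarity, the function names, and the threshold value of 0.5 used here are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (a deliberately crude representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_passages(passages, descriptors, threshold):
    """Keep a passage if its similarity to at least one descriptor is >= threshold."""
    descriptor_vectors = [bag_of_words(d) for d in descriptors]
    selected = []
    for passage in passages:
        vector = bag_of_words(passage)
        if any(cosine_similarity(vector, d) >= threshold for d in descriptor_vectors):
            selected.append(passage)
    return selected


passages = ["photosynthesis converts light energy", "the weather was nice"]
descriptors = ["photosynthesis light"]  # hypothetical subject matter descriptor
selected = select_passages(passages, descriptors, threshold=0.5)
```

Note the `>=` comparison, which mirrors the claim's "greater than or equal to" wording, and the `any(...)`, which mirrors "at least one subject matter descriptor".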
3. The method of claim 1, wherein the transformation of a given selected text passage based on coreference analysis further comprises: identifying, in the given text passage, coreferences comprising one or more of: anaphora, cataphora, endophora, or exaphora; classifying the coreferences according to a plurality of coreference types; and for an instant type which is a most easily resolved one of the coreference types: selecting the coreferences classified as the instant type; analyzing at least the given text passage to determine most probable resolutions of the selected coreferences; and replacing one or more of the selected coreferences with their most probable resolutions.
This invention relates to natural language processing (NLP) and text transformation, specifically the resolution of coreferences in text passages. Coreferences, such as pronouns, repeated nouns, or indirect references, often create ambiguity, making it difficult for machines or readers to tell which entity an expression refers to. The method identifies coreferences in a given text passage, including anaphora (references back to something mentioned earlier), cataphora (references forward to something mentioned later), endophora (references to something within the text), and exaphora (more commonly spelled exophora: references to something outside the text). These coreferences are classified into multiple types. The method then takes the type that is easiest to resolve (for example, simple pronouns), selects the coreferences of that type, and analyzes at least the passage itself to determine the most probable resolution of each one. Finally, it replaces one or more of those coreferences with their most probable resolutions, so that the transformed passage states explicitly what each reference points to. Within the question-generation method of claim 1, this makes the passage easier to match against patterns and helps produce questions that can be understood without the surrounding text.
4. The method of claim 3, further comprising, subsequent to the replacing: regenerating the given text passage; selecting new coreferences classified as a second one of the coreference types which is a most easily resolved type after the instant type; analyzing at least the given text passage to determine most probable resolutions of the new selected coreferences; and replacing the new selected coreferences with their most probable resolution.
This invention relates to natural language processing (NLP) and text analysis, specifically improving coreference resolution in written passages. Coreference resolution involves identifying and linking expressions that refer to the same entity, such as pronouns or repeated nouns. The problem addressed is the difficulty in resolving coreferences in a single pass, particularly when certain types of coreferences are harder to resolve than others. The method involves a multi-step iterative process. Initially, a given text passage is analyzed to identify coreferences, which are classified into different types based on their complexity or ease of resolution. The most easily resolvable coreferences are selected first and replaced with their most probable resolution. This step ensures that simpler references are handled before more complex ones, reducing ambiguity in subsequent steps. After replacing the initial set of coreferences, the text passage is regenerated to reflect these changes. The process then repeats, selecting the next most easily resolvable coreference type and resolving them in the updated text. This iterative approach continues until all coreferences are resolved, progressively simplifying the text by addressing easier references first. The method improves accuracy by reducing the complexity of the resolution task at each stage, leveraging prior resolutions to inform subsequent ones. This approach is particularly useful in applications requiring high-precision text analysis, such as machine translation, summarization, or question-answering systems.
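The easiest-type-first ordering described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Mention` class, the type labels, the ease ranking, and the candidate probabilities are all hypothetical stand-ins for what a real coreference analyzer would produce, and the sketch does not re-analyze the regenerated passage between rounds as a full system would.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str         # surface form of the coreference in the passage
    kind: str         # coreference type label (hypothetical)
    candidates: dict  # candidate resolution -> probability (hypothetical scores)


# Assumed ranking of coreference types, easiest to resolve first.
EASE_ORDER = ("personal_pronoun", "definite_noun_phrase")


def resolve_in_order(passage, mentions, ease_order=EASE_ORDER):
    """Resolve one coreference type at a time, starting with the easiest."""
    for kind in ease_order:
        for mention in mentions:
            if mention.kind != kind:
                continue
            # Pick the most probable resolution for this coreference.
            best = max(mention.candidates, key=mention.candidates.get)
            # Replace the first occurrence; the updated passage feeds the next round.
            passage = passage.replace(mention.text, best, 1)
    return passage


passage = "Marie Curie won the Nobel Prize. She shared the prize with Pierre Curie."
mentions = [
    Mention("the prize", "definite_noun_phrase", {"the Nobel Prize": 0.9, "the medal": 0.1}),
    Mention("She", "personal_pronoun", {"Marie Curie": 0.95, "Pierre Curie": 0.05}),
]
resolved = resolve_in_order(passage, mentions)
```

Although the definite noun phrase is listed first in `mentions`, the pronoun is resolved first because its type ranks earlier in `EASE_ORDER`.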
5. The method of claim 3, wherein a sequence of the selecting coreferences, the analyzing, and the replacing is repeated for a plurality of successive instant types of the coreference types.
This invention relates to natural language processing (NLP), specifically resolving coreferences in text. Coreferences, such as pronouns or repeated nouns, must be replaced with what they refer to before a passage can be read unambiguously out of context. Claim 3 resolves a single coreference type, the one that is easiest to resolve. This claim repeats the same sequence for several coreference types in succession: coreferences of the current type are selected, the passage is analyzed to determine their most probable resolutions, and the coreferences are replaced with those resolutions; the sequence then starts again with the next type. Working through the types one after another can let each round benefit from the references already resolved in earlier rounds, which helps with chains of references that depend on one another. The approach is relevant wherever clear, self-contained text is needed, such as document summarization, machine translation, or automated content generation.
6. The method of claim 1, wherein at least one of the combined semantic-syntactic patterns in the pattern library comprises a plurality of nodes representing respective syntactic parts of a text fragment, at least one of the nodes having a corresponding semantic attribute, and at least one pair of the nodes connected by a relationship attribute.
This invention relates to natural language processing, specifically methods for analyzing text using combined semantic-syntactic patterns. The problem addressed is the need for more accurate and context-aware text analysis by integrating both syntactic structure and semantic meaning. The method involves a pattern library containing semantic-syntactic patterns, where each pattern includes multiple nodes representing different syntactic parts of a text fragment. At least one node in the pattern has a corresponding semantic attribute, which provides contextual meaning. Additionally, at least one pair of nodes is connected by a relationship attribute, defining how the syntactic parts interact semantically. This allows the system to capture both the grammatical structure of the text and its underlying meaning, improving tasks like information extraction, sentiment analysis, and machine translation. The patterns in the library can be applied to input text to identify and interpret complex linguistic structures, enabling more nuanced understanding. By combining syntactic parsing with semantic annotations, the method enhances the precision of natural language processing systems, particularly in handling ambiguous or context-dependent language. This approach is useful in applications requiring deep linguistic analysis, such as automated content analysis, chatbots, and document summarization.
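One way to picture such a pattern is as a small graph. The Python sketch below is an illustration under stated assumptions: the class names, the attribute labels (such as "NP" or "PERSON"), and the `is_well_formed` check are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    syntactic: tuple = ()  # e.g. ("NP", "subject"); labels are illustrative
    semantic: tuple = ()   # e.g. ("PERSON",); may be empty


@dataclass
class Relation:
    source: str  # name of the first node in the pair
    target: str  # name of the second node in the pair
    label: str   # the relationship attribute, e.g. "agent"


@dataclass
class Pattern:
    nodes: list
    relations: list

    def is_well_formed(self):
        """Check the three structural conditions recited in claim 6."""
        names = {node.name for node in self.nodes}
        has_plural_nodes = len(self.nodes) >= 2
        has_semantic = any(node.semantic for node in self.nodes)
        has_relation = any(
            r.source in names and r.target in names and r.source != r.target
            for r in self.relations
        )
        return has_plural_nodes and has_semantic and has_relation


pattern = Pattern(
    nodes=[
        Node("subject", syntactic=("NP",), semantic=("PERSON",)),
        Node("verb", syntactic=("VERB",)),
    ],
    relations=[Relation("subject", "verb", "agent")],
)
```

Here only the subject node carries a semantic attribute, which is enough to satisfy the "at least one of the nodes" wording.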
7. The method of claim 6, wherein a given one of the combined semantic-syntactic patterns has a degree in the range 10-20 inclusive, wherein the degree comprises a sum of numbers of: nodes, semantic attributes, syntactic attributes, and relationship attributes, over all the nodes of the given combined semantic-syntactic pattern.
This invention relates to natural language processing, specifically the combined semantic-syntactic patterns used to select text fragments for question generation. This claim limits the size of a pattern through a value called its degree. The degree is the sum of the numbers of nodes, semantic attributes, syntactic attributes, and relationship attributes, counted over all the nodes of the pattern. Under this claim, a given pattern has a degree between 10 and 20 inclusive. A plausible reading is that a pattern in this range is detailed enough to pick out a specific kind of fragment while remaining small enough to match efficiently, although the claim itself states only the numerical range.
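The degree calculation is simple arithmetic and can be sketched in a few lines of Python. The dictionary layout and attribute labels are assumptions made for illustration, and counting each relationship attribute once per connected pair of nodes is one possible reading of the claim's "over all the nodes" wording.

```python
def pattern_degree(pattern):
    """Sum the counts of nodes, semantic, syntactic, and relationship attributes."""
    nodes = pattern["nodes"]
    return (
        len(nodes)
        + sum(len(node.get("semantic", [])) for node in nodes)
        + sum(len(node.get("syntactic", [])) for node in nodes)
        + len(pattern.get("relations", []))  # assumption: one count per connected pair
    )


def degree_in_claimed_range(pattern, low=10, high=20):
    """Check the inclusive 10 to 20 range recited in claim 7."""
    return low <= pattern_degree(pattern) <= high


example = {
    "nodes": [
        {"name": "subject", "syntactic": ["NP", "nsubj"], "semantic": ["PERSON"]},
        {"name": "verb", "syntactic": ["VERB"], "semantic": ["CREATION"]},
        {"name": "object", "syntactic": ["NP"], "semantic": []},
    ],
    "relations": [
        ("subject", "verb", "agent"),
        ("verb", "object", "theme"),
    ],
}
degree = pattern_degree(example)
```

For `example`, the count is 3 nodes + 2 semantic attributes + 4 syntactic attributes + 2 relationship attributes, giving a degree of 11, which falls inside the claimed range.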
8. The method of claim 6, wherein the matching comprises: comparing the nodes, the corresponding semantic attribute(s), and the relationship attribute(s) of a given one of the combined semantic-syntactic patterns with a given text fragment to determine a matching score; and selecting the given text fragment as matching the given combined semantic-syntactic pattern if the matching score is greater than or equal to a matching threshold.
This invention relates to natural language processing and semantic analysis, specifically improving the accuracy of matching text fragments to predefined semantic-syntactic patterns. The problem addressed is the difficulty in precisely identifying and extracting meaningful information from unstructured text due to variations in syntax and semantics. The method involves analyzing text by comparing nodes, semantic attributes, and relationship attributes of combined semantic-syntactic patterns against text fragments. Each pattern consists of nodes representing linguistic elements (e.g., words, phrases) and attributes defining their semantic and syntactic relationships. The matching process calculates a score by evaluating how well a text fragment aligns with a pattern's nodes and attributes. If the score meets or exceeds a predefined threshold, the text fragment is deemed a match. The method enhances prior approaches by incorporating both semantic and syntactic features, improving precision in identifying relevant text segments. This is particularly useful in applications like information extraction, question answering, and text classification, where accurate pattern matching is critical. The threshold ensures only high-confidence matches are selected, reducing false positives. The technique can be applied in various domains, including legal, medical, and technical documentation, where structured extraction of unstructured data is essential.
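The score-and-threshold test can be sketched in Python as follows. The claim does not define how the matching score is computed, so the scoring rule here (the fraction of a pattern's features that also appear in the fragment), the tuple encoding of features, and the threshold values are assumptions for illustration only.

```python
def matching_score(pattern_features, fragment_features):
    """Fraction of the pattern's features that are also present in the fragment."""
    if not pattern_features:
        return 0.0
    return len(pattern_features & fragment_features) / len(pattern_features)


def matches(pattern_features, fragment_features, threshold):
    """Select the fragment if its score is greater than or equal to the threshold."""
    return matching_score(pattern_features, fragment_features) >= threshold


# Hypothetical features: node types, a semantic attribute, and a relationship attribute.
pattern_features = {
    ("node", "subject", "NP"),
    ("node", "verb", "VERB"),
    ("semantic", "subject", "PERSON"),
    ("relation", "subject", "verb", "agent"),
}
fragment_features = {
    ("node", "subject", "NP"),
    ("node", "verb", "VERB"),
    ("semantic", "subject", "ORGANIZATION"),  # semantic attribute differs
    ("relation", "subject", "verb", "agent"),
}
score = matching_score(pattern_features, fragment_features)
```

With three of four pattern features present, the score is 0.75, so the fragment is selected at a threshold of 0.75 and rejected at 0.8.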
9. The method of claim 1, wherein the transformation of a given selected text fragment is dependent on a first pattern of the pattern library matched by the given selected text fragment, and results in a first one of the questions.
This claim covers how a selected text fragment is turned into a question. The fragment was selected because it matched a pattern in the pattern library, and the transformation applied to it depends on which pattern it matched. In the claim, "a first pattern" and "a first one of the questions" are labels that later claims refer back to; they do not mean the pattern that comes first in the library. In effect, each pattern is associated with its own way of rewriting a matching fragment as a question, so fragments that match different patterns can yield different kinds of questions. This is useful for educational tools, content analysis, and interactive applications such as quizzes and learning platforms, where it reduces manual effort while keeping the output structured.
10. The method of claim 9, further comprising: selecting a selector based on the first pattern; and applying the selector to the given selected text fragment to determine a correct answer for the first question.
This invention relates to automated question-answering systems that process text fragments to derive answers. The problem addressed is the challenge of accurately selecting and applying the right extraction method to identify correct answers from given text fragments, particularly when multiple patterns or selectors may apply. The method involves analyzing a text fragment to identify a first pattern, which defines a structure or characteristic of the text relevant to answering a question. Based on this pattern, a selector is chosen from a set of available selectors, where each selector is designed to extract information according to specific rules or criteria. The selected selector is then applied to the text fragment to determine the correct answer for the first question. This ensures that the extraction process is tailored to the structure of the text, improving accuracy. The method may also involve preprocessing the text fragment to enhance its suitability for pattern matching, such as normalizing formatting or removing irrelevant content. Additionally, the system may handle multiple questions by iteratively applying the same or different selectors to the same or different text fragments, depending on the patterns detected. The overall approach improves the reliability of automated question-answering systems by dynamically adapting the extraction process to the input text.
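The link between a matched pattern, the question transformation, and the answer selector can be sketched as a lookup table in Python. The pattern names, the fragment fields, and the question templates are hypothetical; the sketch assumes the fragment has already been parsed into named parts.

```python
# Hypothetical rule table: each pattern id maps to a question transformation
# and a selector that extracts the correct answer from the same fragment.
PATTERN_RULES = {
    "agent_created_work": {
        "to_question": lambda f: f"Who {f['verb']} {f['object']}?",
        "selector": lambda f: f["subject"],
    },
    "event_in_year": {
        "to_question": lambda f: f"In what year did {f['subject']} {f['verb_base']} {f['object']}?",
        "selector": lambda f: f["year"],
    },
}


def generate_question_and_answer(fragment, pattern_id):
    """Apply the transformation and the selector chosen by the matched pattern."""
    rule = PATTERN_RULES[pattern_id]
    return rule["to_question"](fragment), rule["selector"](fragment)


fragment_a = {"subject": "Ada Lovelace", "verb": "wrote", "object": "the first published algorithm"}
fragment_b = {
    "subject": "Marie Curie",
    "verb_base": "win",
    "object": "the Nobel Prize in Physics",
    "year": "1903",
}
qa_a = generate_question_and_answer(fragment_a, "agent_created_work")
qa_b = generate_question_and_answer(fragment_b, "event_in_year")
```

Keeping the transformation and the selector in the same table entry reflects that both are chosen based on the matched pattern.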
11. The method of claim 10, further comprising generating internal distractors for the first question by performing one or more of: applying negation to the correct answer; applying shifting to the correct answer; or applying a pattern-substitution on the correct answer.
This invention relates to generating internal distractors for multiple-choice questions in educational or assessment systems. The problem addressed is the need for automated generation of plausible but incorrect answer choices (distractors) that effectively test a user's knowledge while avoiding obvious or irrelevant options. The method involves creating distractors by modifying the correct answer through negation, shifting, or pattern-substitution. Negation involves reversing the meaning of the correct answer, such as changing "always" to "never." Shifting adjusts numerical or ordinal values, like increasing or decreasing a number by a fixed amount. Pattern-substitution replaces parts of the correct answer with similar but incorrect patterns, such as swapping synonyms or altering syntax. These techniques ensure distractors are both plausible and challenging, improving the quality of automated question generation. The method may be applied in adaptive learning platforms, standardized testing, or educational software to enhance assessment accuracy and efficiency.
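Two of these techniques, negation and shifting, can be sketched in Python using the interpretation given in the paragraph above. The word pairs, the shift amounts, and the function names are assumptions for illustration; pattern-substitution is omitted because the claim does not describe it in enough detail to sketch.

```python
import re

# Hypothetical table of word pairs with opposite meanings.
NEGATION_PAIRS = {
    "always": "never",
    "never": "always",
    "increases": "decreases",
    "decreases": "increases",
}


def negate(answer):
    """Swap the first word that has a known opposite; return None if there is none."""
    words = answer.split()
    for i, word in enumerate(words):
        if word in NEGATION_PAIRS:
            words[i] = NEGATION_PAIRS[word]
            return " ".join(words)
    return None


def shift(answer, delta):
    """Add delta to the first integer in the answer; return None if there is none."""
    if not re.search(r"\d+", answer):
        return None
    return re.sub(r"\d+", lambda m: str(int(m.group()) + delta), answer, count=1)


def internal_distractors(answer, deltas=(-10, 10)):
    """Collect distinct distractors derived from the correct answer itself."""
    candidates = [negate(answer)] + [shift(answer, d) for d in deltas]
    distractors = []
    for candidate in candidates:
        if candidate and candidate != answer and candidate not in distractors:
            distractors.append(candidate)
    return distractors


year_distractors = internal_distractors("in 1903")
trend_distractors = internal_distractors("pressure always increases")
```

Only the first negatable word is swapped, because swapping both "always" and "increases" would produce "never decreases", which means the same as the correct answer.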
12. The method of claim 10, further comprising: identifying text, in the source document or in an external corpus, having a syntactic match with the correct answer and a non-synonymous semantic similarity between vectors representing the identified text and the correct answer; and transforming the identified text into a distractor.
This invention relates to automated question-answering systems, specifically improving the generation of distractors (incorrect answer choices) for multiple-choice questions. The problem addressed is the difficulty in creating plausible but incorrect answer options that effectively test a user's knowledge while avoiding trivial or easily identifiable wrong answers. The method involves analyzing a source document or an external corpus to identify text segments that closely match the correct answer in syntactic structure but differ semantically. Syntactic matching ensures the distractor maintains a similar grammatical form to the correct answer, making it appear plausible. Semantic similarity is measured using vector representations of the text, where non-synonymous similarity ensures the distractor is meaningfully different from the correct answer. This approach prevents the generation of synonyms or trivial variations, which would not effectively challenge the user. The identified text is then transformed into a distractor, enhancing the quality of multiple-choice questions by providing more sophisticated and deceptive incorrect options. The method improves the robustness of automated question generation systems by leveraging both syntactic and semantic analysis to create more effective distractors.
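The two-part filter (same syntactic form, similar but not synonymous meaning) can be sketched in Python. The two-dimensional vectors, the part-of-speech labels, and the similarity band of 0.5 to 0.9 are invented for illustration; a real system would use embeddings from a trained model and thresholds tuned on data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_distractors(answer, candidates, low=0.5, high=0.9):
    """Keep candidates with the same syntactic form and a similarity inside [low, high)."""
    distractors = []
    for candidate in candidates:
        if candidate["pos"] != answer["pos"]:
            continue  # no syntactic match
        similarity = cosine(answer["vec"], candidate["vec"])
        # Below `low` the text is unrelated; at or above `high` it is treated as a synonym.
        if low <= similarity < high:
            distractors.append(candidate["text"])
    return distractors


answer = {"text": "Paris", "pos": "PROPN", "vec": (1.0, 0.0)}
candidates = [
    {"text": "Lyon", "pos": "PROPN", "vec": (0.8, 0.6)},      # related, not a synonym
    {"text": "Lutetia", "pos": "PROPN", "vec": (0.99, 0.1)},  # near-synonym, rejected
    {"text": "Tuesday", "pos": "PROPN", "vec": (0.0, 1.0)},   # unrelated, rejected
    {"text": "the French capital", "pos": "NP", "vec": (0.8, 0.6)},  # wrong form
]
distractors = pick_distractors(answer, candidates)
```

The upper bound is what keeps synonyms of the correct answer from being offered as wrong choices.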
13. One or more computer-readable media storing instructions which cause the method of claim 1 to be performed, when the instructions are executed by one or more computer processors.
This claim covers the method of claim 1 in the form of stored software. It protects one or more computer-readable media storing instructions that, when executed by one or more computer processors, cause the method of claim 1 to be performed: selecting passages from the source document based on a first criterion, transforming them based on coreference analysis, selecting fragments that match combined semantic-syntactic patterns from a pattern library, and automatically generating questions from those fragments. The claim adds no new processing steps. Its apparent purpose is to extend protection from the act of performing the method to the media that carry a program implementing it.
14. A method of generating questions from a source document, comprising: at one or more computers: receiving the source document; progressively refining text of the source document in phases to generate the questions, the phases comprising: a first selection phase, a first transformation phase at which semantic content of individual sentences is localized, a second selection phase based on matching combined semantic-syntactic patterns from a pattern library, and a second transformation phase at which the questions are generated; and outputting the generated questions; wherein one or more of the combined semantic-syntactic patterns in the pattern library comprises a plurality of nodes representing respective syntactic parts of a text fragment, at least one of the nodes having a corresponding semantic attribute, and at least one pair of the nodes connected by a relationship attribute; and wherein a given one of the one or more combined semantic-syntactic patterns has a degree of at least 10, wherein the degree comprises a sum of numbers of: nodes, semantic attributes, syntactic attributes, and relationship attributes, over all the nodes of the given combined semantic-syntactic pattern.
This invention relates to automated question generation from source documents, addressing the challenge of creating meaningful and contextually relevant questions from unstructured text. The method involves a multi-phase process executed by one or more computers. First, the source document is received and processed through a series of refinement phases. In the initial selection phase, relevant text segments are identified. The first transformation phase then localizes the semantic content of individual sentences, ensuring that the meaning is preserved and contextually accurate. A second selection phase follows, where combined semantic-syntactic patterns from a predefined pattern library are matched to the refined text. These patterns consist of nodes representing syntactic parts of text fragments, with at least one node having a semantic attribute and at least one pair of nodes connected by a relationship attribute. The patterns must have a degree of at least 10, calculated as the sum of nodes, semantic attributes, syntactic attributes, and relationship attributes across all nodes in the pattern. Finally, in the second transformation phase, the matched patterns are used to generate the final set of questions, which are then outputted. This approach ensures that the generated questions are both syntactically and semantically coherent, leveraging structured patterns to enhance accuracy and relevance.
15. The method of claim 14, wherein the second selection phase comprises reverse parsing by matching an input to the second selection phase against the pattern library of combined semantic-syntactic patterns.
This claim narrows the second selection phase of claim 14. In that phase, the input (the text passed on from the preceding phases) is reverse parsed: it is matched against the pattern library of combined semantic-syntactic patterns. The claim does not define the term further. One reading is that, in contrast to conventional parsing, the text is not analyzed from the ground up into a structure; instead, known structures from the library, each combining grammatical form (syntax) with meaning (semantics), are tested against the text to find the fragments that fit them. Fragments that match a pattern are selected and passed to the second transformation phase, where the questions are generated.
16. The method of claim 14, wherein at least one of the combined semantic-syntactic patterns spans more than one sentence.
This claim narrows the pattern library of claim 14: at least one of the combined semantic-syntactic patterns spans more than one sentence. A pattern limited to a single sentence cannot capture a relationship that is expressed across sentence boundaries, for example when one sentence uses a term that was defined in the previous sentence. A multi-sentence pattern allows such a fragment to be selected as a unit, so that a question can be generated from information spread over several sentences. The claim does not specify how such patterns are built or how many sentences they may cover. This is particularly relevant for documents such as legal contracts, technical manuals, or scientific papers, where ideas are often developed over several connected sentences.
17. The method of claim 14, wherein the first selection phase comprises determining respective content values for portions of the source document, and selecting, based on the content values, a first one of the portions as output of the first selection phase.
This invention relates to document processing, specifically a method for selecting portions of a source document based on content values. The problem addressed is efficiently identifying and extracting relevant sections from a document, such as for summarization, analysis, or other processing tasks. The method involves a multi-phase selection process. In the first selection phase, the system analyzes the source document to determine content values for different portions of the document. These content values may represent factors like relevance, importance, or informational density. Based on these values, the system selects one portion of the document as the output of the first phase. This selected portion is then used in subsequent phases to further refine or process the document. The content values may be derived from various features, such as keyword frequency, semantic analysis, or structural cues within the document. The selection process ensures that the most significant or relevant portion is identified early in the workflow, improving efficiency and accuracy in downstream tasks. This approach is particularly useful in applications like automated summarization, document filtering, or content extraction, where identifying key sections is critical.
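The first selection phase can be sketched in Python as scoring each portion and keeping the best one. The claim does not define how a content value is computed, so the measure used here (the share of words that belong to a set of key terms) and the function names are assumptions for illustration.

```python
def content_value(portion, key_terms):
    """Share of the portion's words that are key terms (a hypothetical measure)."""
    words = [w.strip(".,;:").lower() for w in portion.split()]
    if not words:
        return 0.0
    return sum(w in key_terms for w in words) / len(words)


def first_selection(portions, key_terms):
    """Return the portion with the highest content value."""
    return max(portions, key=lambda p: content_value(p, key_terms))


portions = [
    "The cell membrane controls transport.",
    "See the figure on the next page.",
]
key_terms = {"cell", "membrane", "transport"}  # hypothetical subject terms
best_portion = first_selection(portions, key_terms)
```

Here the first portion scores 3 key terms out of 5 words (0.6) and the second scores 0, so the first portion becomes the output of the phase.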
18. The method of claim 17, wherein the first selection phase, the first transformation phase, the second selection phase, and the second transformation phase are performed in that order.
This claim fixes the order of the four phases of claim 14 (as narrowed by claim 17). The first selection phase, in which a portion of the source document is selected based on content values, comes first. The first transformation phase, in which the semantic content of individual sentences is localized, comes second. The second selection phase, in which text is matched against the pattern library of combined semantic-syntactic patterns, comes third. The second transformation phase, in which the questions are generated, comes last. Each phase therefore works on the output of the phase before it, so that the text is progressively refined from a whole document down to individual questions.
19. The method of claim 18, further comprising one or more pairs of an additional selection phase and an additional transformation phase.
This claim adds further phases to the ordered method of claim 18. In addition to the first and second selection phases and the first and second transformation phases, the method includes one or more extra pairs of phases, each pair consisting of an additional selection phase and an additional transformation phase. The claim does not state what the additional phases select or transform, or where in the sequence they occur. The effect is that the progressive refinement of the source text into questions can run through more than two rounds of selecting and transforming.
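The alternating selection and transformation phases of claims 14, 18, and 19 can be sketched in Python as an ordered list of functions, each consuming the output of the one before it. Everything inside the phases is a toy stand-in: the word-count filter, the hard-coded subject used to replace "It", and the regular expression standing in for a semantic-syntactic pattern are assumptions for illustration, not the patented techniques.

```python
import re
from functools import reduce


def first_selection(document):
    """Toy selection: split into sentences and keep those with at least four words."""
    sentences = re.split(r"(?<=\.)\s+", document.strip())
    return [s for s in sentences if len(s.split()) >= 4]


def first_transformation(sentences, subject="The mitochondrion"):
    """Toy localization: replace a leading 'It' with a hard-coded subject."""
    return [re.sub(r"^It\b", subject, s) for s in sentences]


# A regular expression standing in for one combined semantic-syntactic pattern.
PATTERN = re.compile(r"^The \w+ is (?P<rest>.+)\.$")


def second_selection(sentences):
    """Keep the fragment captured from each sentence that matches the pattern."""
    return [m.group("rest") for s in sentences if (m := PATTERN.match(s))]


def second_transformation(fragments):
    """Turn each selected fragment into a question."""
    return [f"What is {fragment}?" for fragment in fragments]


def run_phases(document, phases):
    """Apply the phases in order, feeding each phase the previous phase's output."""
    return reduce(lambda data, phase: phase(data), phases, document)


PHASES = [first_selection, first_transformation, second_selection, second_transformation]

document = "The mitochondrion is the powerhouse of the cell. It is found in most eukaryotes. Wow."
questions = run_phases(document, PHASES)
```

Because the phases are held in a list, an additional selection and transformation pair of the kind recited in claim 19 could be added by inserting two more functions with compatible inputs and outputs.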
20. A computer-implemented method for generating one or more questions, comprising: receiving one or more source documents; analyzing the source document(s); wherein the analyzing comprises determining respective first numerical content value scores for portions of the source document(s); first selecting one or more of the portions based upon the respective first numerical content value scores; resolving coreferences in the first selected portion(s); first transforming the first selected portion(s) into one or more second version documents comprising transformations of the resolved coreferences; reverse parsing the second version document(s) using a pattern library to second select one or more fragments of the second version document(s); second transforming the second selected fragment(s) into the question(s).
This invention relates to automated question generation from text documents. The method addresses the challenge of creating meaningful questions from unstructured text by analyzing document content, resolving ambiguities, and systematically transforming selected text fragments into question formats. The process begins by receiving one or more source documents and analyzing their content. During analysis, portions of the documents are assigned numerical content value scores based on their informational value or relevance. High-scoring portions are then selected for further processing. Coreferences within these selected portions are resolved to eliminate ambiguities, ensuring clarity in the generated questions. The resolved text is transformed into intermediate documents where coreferences are replaced with their resolved forms. Next, the intermediate documents are reverse parsed using a pattern library to identify and select specific text fragments suitable for question generation. These fragments undergo a final transformation process that converts them into one or more questions. The pattern library likely contains predefined linguistic structures or templates that guide the conversion of text fragments into grammatically correct questions. This approach automates the question generation process while maintaining semantic coherence and relevance to the original document content. The method is particularly useful for applications requiring automated content analysis, such as educational tools, knowledge extraction systems, or interactive document interfaces.
April 7, 2020