Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A data-driven spell checking computer system for a language, the computer system including one or more processors, the computer system comprising: an error-correct patterns generator that learns types and forms of the language's morphological patterns from an annotated corpus, and analyzes types of errors using the morphological patterns to generate error encodings, which are a string that specifies the positions of changes and change types in error patterns; an error-correct patterns database managed by the one or more processors; and a correction candidates generator performed by the one or more processors, wherein the error-correct patterns generator generates the error patterns based on the analysis of the types of errors, wherein the error patterns contain at least one of the language's characters and at least one affix position symbol, the error-correct patterns database stores the error patterns and the error encodings, the correction candidates generator generates correction candidates by, for a particular word error, matching all the error patterns, having a length equal to the word error, against the word error, and generating all the correction candidates according to the matched error patterns' error encodings.
This invention relates to a data-driven spell checking system that corrects misspelled words, particularly errors in a word's morphological structure. The system learns the types and forms of the language's morphological patterns (for example, how prefixes and suffixes attach to stems) from an annotated corpus. An error-correct patterns generator uses these patterns to analyze the types of errors found in the corpus and produces two outputs: error patterns and error encodings. An error pattern contains at least one of the language's characters and at least one affix position symbol, so a single pattern can stand for a whole family of misspelled words rather than one specific word. An error encoding is a string that records the positions of the changes and the type of each change needed to turn the erroneous form into the correct one. Both are stored in an error-correct patterns database managed by the system's processors. To correct a particular word error, a correction candidates generator matches every stored error pattern whose length equals the length of the word error against that word, then generates all correction candidates according to the error encodings of the patterns that matched. Because the patterns and encodings are learned from annotated data rather than written by hand, the corrections reflect the morphology of the language the corpus was annotated for.
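The matching and candidate-generation steps can be illustrated with a short Python sketch. The patent does not publish its pattern or encoding formats, so every concrete detail here is an assumption: `_` stands in for the affix position symbol and is assumed to match any single character, an error encoding is a hypothetical `;`-separated list of `OP:POS[:CHAR]` items (S = substitute, I = insert, D = delete, T = transpose with the next character, N = no change), and the database is a plain dictionary keyed by pattern length. English words are used only to keep the example readable; the function names are illustrative.

```python
from typing import Dict, List, Tuple

SLOT = "_"  # hypothetical affix position symbol: assumed to match any one character

# Hypothetical database layout: pattern length -> list of (error pattern, error encoding).
PatternDB = Dict[int, List[Tuple[str, str]]]


def pattern_matches(pattern: str, word: str) -> bool:
    """Literal characters must match exactly; SLOT matches any character."""
    return len(pattern) == len(word) and all(
        p == SLOT or p == c for p, c in zip(pattern, word)
    )


def apply_encoding(word: str, encoding: str) -> str:
    """Apply a hypothetical 'OP:POS[:CHAR]' encoding ('N' means no change).

    Operations are applied from the highest position down so that edits at
    later positions do not shift the indices of earlier ones.
    """
    chars = list(word)
    ops = [item.split(":") for item in encoding.split(";") if item and item != "N"]
    for op in sorted(ops, key=lambda o: int(o[1]), reverse=True):
        kind, pos = op[0], int(op[1])
        if kind == "S":    # substitution
            chars[pos] = op[2]
        elif kind == "I":  # insertion before position pos
            chars.insert(pos, op[2])
        elif kind == "D":  # deletion
            del chars[pos]
        elif kind == "T":  # transposition of positions pos and pos + 1
            chars[pos], chars[pos + 1] = chars[pos + 1], chars[pos]
    return "".join(chars)


def generate_candidates(word: str, db: PatternDB) -> List[str]:
    """Try only the patterns whose length equals the word error's length."""
    candidates: List[str] = []
    for pattern, encoding in db.get(len(word), []):
        if pattern_matches(pattern, word):
            fixed = apply_encoding(word, encoding)
            if fixed not in candidates:  # keep order, drop duplicates
                candidates.append(fixed)
    return candidates


# Hypothetical example entries, not taken from the patent.
EXAMPLE_DB: PatternDB = {
    9: [("____yness", "S:4:i")],  # "y" typed where "i" belongs before "-ness"
    6: [("____ng", "I:4:i")],     # "i" missing from "-ing"
}

print(generate_candidates("walkng", EXAMPLE_DB))  # ['walking']
```

Note that `____ng` also matches a correct word such as `string` and would yield `striing`; a validity check on the candidates, such as the root check of claim 2, is what removes results like that.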
2. The data-driven computer system of claim 1, wherein generated correction candidates are checked to determine if they contain a valid root, the correction candidates generator outputting correction candidates that have a valid root.
Claim 2 adds a validation step to the system of claim 1. After the correction candidates generator produces candidates for a word error, each candidate is checked to determine whether it contains a valid root, and only candidates with a valid root are output. Pattern matching alone can produce strings that fit a learned error pattern but are not real words; this filter removes candidates that have a plausible affix structure but no valid root, so fewer spurious suggestions reach the user. The claim does not specify how root validity is determined.
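Since the claim leaves the root check open, the sketch below assumes the simplest possible version: strip any known prefix and suffix (or none) and look the remainder up in a root lexicon. The lexicon, the affix lists and the function names are hypothetical, and real Arabic root extraction is harder than this because roots are often interleaved with pattern letters inside the stem.

```python
from typing import List, Set


def has_valid_root(word: str, roots: Set[str], prefixes: Set[str], suffixes: Set[str]) -> bool:
    """Return True if stripping some known prefix and suffix leaves a known root.

    The empty string is always tried as well, so unaffixed words are covered.
    """
    for prefix in prefixes | {""}:
        if not word.startswith(prefix):
            continue
        for suffix in suffixes | {""}:
            if not word.endswith(suffix):
                continue
            core = word[len(prefix):len(word) - len(suffix)]
            if core and core in roots:
                return True
    return False


def filter_by_root(candidates: List[str], roots: Set[str],
                   prefixes: Set[str], suffixes: Set[str]) -> List[str]:
    """Keep only the correction candidates that contain a valid root."""
    return [c for c in candidates if has_valid_root(c, roots, prefixes, suffixes)]


# Hypothetical lexicon and affix lists for illustration.
ROOTS = {"walk", "string"}
PREFIXES = {"re"}
SUFFIXES = {"ing", "ed", "s"}

print(filter_by_root(["walking", "striing"], ROOTS, PREFIXES, SUFFIXES))  # ['walking']
```

Trying every prefix and suffix combination keeps the check independent of which affixes the candidate actually carries, at the cost of a few extra string comparisons per candidate.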
3. The data-driven computer system of claim 1, wherein the error-correct patterns generator for a word error, morphologically analyzes the corresponding correction to get one or more word error's morphological patterns; generates one or more error patterns using the pairs of morphological patterns and the corresponding word error; and generates the error encoding and correction codes.
Claim 3 describes how the error-correct patterns generator of claim 1 learns from a single word error in the annotated corpus. It takes the correction paired with that word error and morphologically analyzes it, obtaining one or more morphological patterns (for example, a split of the correct word into prefix, stem and suffix). Each morphological pattern is then paired with the word error to produce one or more error patterns, which capture how the misspelled form deviates from the correct morphological structure. Finally, the generator produces the error encoding, which records where the changes occur and what kind they are, together with correction codes describing the corrections to apply to the word error. Analyzing the correct form, rather than the misspelled one, gives the generator a well-formed morphological structure to compare the error against.
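Pairing a morphological analysis with a word error can be sketched as masking the stem. The sketch assumes the morphological analyzer (not shown) returns a `(prefix, stem, suffix)` split of the correct word, uses `_` as a hypothetical affix position symbol, and handles only errors that leave the stem intact; `make_error_pattern` and `pattern_pair` are illustrative names, not the patent's.

```python
from typing import Optional, Tuple

SLOT = "_"  # hypothetical affix position symbol standing in for one stem character


def make_error_pattern(word_error: str, stem: str) -> Optional[str]:
    """Turn a word error into an error pattern by masking the correction's stem.

    Only the first intact occurrence of the stem is masked; errors that
    damage the stem itself are out of scope for this sketch.
    """
    start = word_error.find(stem)
    if start < 0:
        return None
    return word_error[:start] + SLOT * len(stem) + word_error[start + len(stem):]


def pattern_pair(word_error: str, analysis: Tuple[str, str, str]) -> Optional[Tuple[str, str]]:
    """Pair the error pattern with the corrected morphological pattern.

    `analysis` is the assumed (prefix, stem, suffix) split of the correct word.
    """
    prefix, stem, suffix = analysis
    error_pattern = make_error_pattern(word_error, stem)
    if error_pattern is None:
        return None
    return error_pattern, prefix + SLOT * len(stem) + suffix


print(pattern_pair("rewalkd", ("re", "walk", "ed")))  # ('re____d', 're____ed')
```

Because the stem is replaced by placeholders, the resulting pair applies to every stem of the same length that takes the same affixes, not just to the one word seen in the corpus.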
4. The data-driven computer system of claim 3, wherein the generated error patterns, error encodings and correction codes are stored in the error-correct patterns database.
Claim 4 specifies where the output of claim 3 goes: the error patterns, error encodings and correction codes produced for each word error are stored in the error-correct patterns database. This is the same database the correction candidates generator consults in claim 1, so everything learned from the annotated corpus is available when new word errors are corrected.
5. The data-driven computer system of claim 3, wherein the error encoding includes at least one of no change, transposition of two characters, insertion, deletion, and substitution.
Claim 5 lists the kinds of change an error encoding can describe: no change (the character is already correct), transposition of two characters (two characters appear in the wrong order, as in typing "hte" for "the"), insertion (a missing character must be added), deletion (an extra character must be removed), and substitution (a wrong character must be replaced by the right one). These are the standard edit operations used to relate a misspelled word to its correction. Combined with the positions recorded in the encoding, they tell the correction candidates generator exactly how to transform a matched word error into a correction candidate.
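One way to produce an encoding limited to these five change types is to align the word error with its correction using the optimal string alignment variant of Damerau-Levenshtein distance and read the operations off the backtrace. This is a substitute technique chosen for illustration, not a procedure stated in the patent, and the `OP:POS[:CHAR]` output format (positions counted from 0 in the word error, `N` for no change) is likewise hypothetical.

```python
def derive_encoding(error: str, correction: str) -> str:
    """Derive a hypothetical error encoding via optimal string alignment."""
    n, m = len(error), len(correction)
    # d[i][j] = edit distance between error[:i] and correction[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if error[i - 1] == correction[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete error[i-1]
                          d[i][j - 1] + 1,          # insert correction[j-1]
                          d[i - 1][j - 1] + cost)   # keep or substitute
            if (i > 1 and j > 1 and error[i - 1] == correction[j - 2]
                    and error[i - 2] == correction[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose

    # Walk back from the bottom-right cell, recording one operation per step.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and error[i - 1] == correction[j - 1]
                and d[i][j] == d[i - 1][j - 1]):
            i, j = i - 1, j - 1                      # characters already agree
        elif (i > 1 and j > 1 and error[i - 1] == correction[j - 2]
              and error[i - 2] == correction[j - 1]
              and d[i][j] == d[i - 2][j - 2] + 1):
            ops.append(f"T:{i - 2}")
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(f"S:{i - 1}:{correction[j - 1]}")
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(f"I:{i}:{correction[j - 1]}")
            j -= 1
        else:
            ops.append(f"D:{i - 1}")
            i -= 1
    return ";".join(reversed(ops)) if ops else "N"


print(derive_encoding("hte", "the"))  # T:0
```

The backtrace prefers keeping characters, then transposition, then substitution, insertion and deletion, so when several minimal alignments exist the encoding favors the fewest structural changes.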
6. The data-driven computer system of claim 1, wherein the error-correct patterns database is created by: storing error patterns generated by the error-correct patterns generator; for each correct stem, examining all possible combinations of prefixes and suffixes, where at least one of the combinations has an error, adding combinations that satisfy the condition that the stem is compatible with the correct affixes to the database; and storing the error patterns with and correction information using dictionaries according to the length of the error pattern.
Claim 6 describes how the error-correct patterns database is built. First, the error patterns produced by the error-correct patterns generator are stored. Next, for each correct stem, the system examines all possible combinations of prefixes and suffixes in which at least one affix is in error, and adds to the database only those combinations for which the stem is compatible with the correct affixes. This keeps out patterns whose correction would attach an affix the stem cannot actually take. Finally, the error patterns and their correction information are stored in dictionaries organized by the length of the error pattern. Because the correction candidates generator only compares a word error with patterns of the same length, this organization lets it retrieve just the relevant group of patterns instead of scanning the whole database.
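The database-building procedure can be sketched as follows, assuming that each correct stem comes with the sets of prefixes and suffixes it is compatible with, and that the known misspellings of each affix have already been collected. Both inputs, the `_` stem placeholder and the function name are hypothetical stand-ins for what the error-correct patterns generator would supply. Each error pattern maps to the corrected pattern(s) it should become, grouped in dictionaries keyed by pattern length.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

SLOT = "_"  # hypothetical placeholder for one stem character


def build_pattern_db(
    stems: Dict[str, Tuple[Set[str], Set[str]]],
    prefix_variants: Dict[str, List[str]],
    suffix_variants: Dict[str, List[str]],
) -> Dict[int, Dict[str, Set[str]]]:
    """Return {pattern length: {error pattern: {corrected pattern, ...}}}."""
    db: Dict[int, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for stem, (prefixes, suffixes) in stems.items():
        slots = SLOT * len(stem)
        # Only affixes the stem is compatible with are considered as corrections.
        for prefix, suffix in product(sorted(prefixes), sorted(suffixes)):
            # Each affix may appear correctly or as one of its known misspellings.
            for bad_p, bad_s in product([prefix, *prefix_variants.get(prefix, [])],
                                        [suffix, *suffix_variants.get(suffix, [])]):
                if bad_p == prefix and bad_s == suffix:
                    continue  # both affixes correct: not an error pattern
                pattern = bad_p + slots + bad_s
                db[len(pattern)][pattern].add(prefix + slots + suffix)
    return {length: dict(patterns) for length, patterns in db.items()}


# Hypothetical example input.
stems = {"walk": ({"", "re"}, {"ing", "ed"})}
db = build_pattern_db(stems, {"re": ["ree"]}, {"ing": ["ng", "img"], "ed": ["d"]})
print(db[6])  # {'____ng': {'____ing'}}
```

Keying the outer dictionary by length means a lookup for a six-character word error only touches `db[6]`, which is the property the length-based matching relies on.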
7. The data-driven computer system of claim 1, wherein the language is Arabic.
Claim 7 limits the spell checking system of claim 1 to Arabic. Arabic words are commonly built by attaching prefixes and suffixes to a stem (for example, the definite article ال is written as a prefix and possessive pronouns are attached as suffixes), so errors in those affixes can be captured as learned error patterns containing affix position symbols and corrected using their error encodings.
8. The data-driven computer system of claim 1, wherein the language is a Semitic language.
Claim 8 recites the system of claim 1 where the language is any Semitic language, a broader group than the Arabic-only claim 7 that also includes Hebrew and Aramaic. Semitic languages share a root-based morphology in which words are formed from consonantal roots combined with patterns and affixes, which is the kind of structure the system learns from an annotated corpus and encodes in its error patterns.
9. The data-driven computer system of claim 1, wherein the correction candidates generator generates correction candidates by, for a particular word error, matching all the error patterns, having a length equal to the word error, against the word error in order to minimize the number of correction candidates in which the correction candidates contain a correct correction.
Claim 9 states the purpose of the length-based matching in claim 1. By comparing a word error only against error patterns of exactly the same length, the correction candidates generator keeps the set of correction candidates as small as possible while still ensuring that the set contains the correct correction. Patterns of other lengths are never tried, which avoids wasted comparisons and spares the user from sorting through irrelevant suggestions.
10. A non-transitory computer-readable storage medium storing a spell checking program and a database, when executed by a computer, the spell checking program performs an error-correct patterns generation process, including: learning types and forms of the language's morphological patterns from an annotated corpus; analyzing types of errors using the morphological patterns to generate error encodings, which are a string that specifies the positions of changes and change types in an error pattern; generating the error patterns based on the analysis of the types of errors, wherein the error patterns contain at least one of the language's characters and at least one affix position symbol; and storing the error patterns and the error encodings in the database, and a correction candidate generation process, including: inputting a particular word error; and generating correction candidates for the word error by, matching all the stored error patterns, having a length equal to the word error, against the word error, and generating all the correction candidates according to the matched error patterns' error encodings.
Claim 10 covers the same invention as claim 1 in the form of a non-transitory computer-readable storage medium that stores a spell checking program and a database. When executed by a computer, the program performs two processes. In the error-correct patterns generation process, it learns the types and forms of the language's morphological patterns from an annotated corpus, analyzes types of errors using those patterns to generate error encodings (strings that specify the positions and types of the changes in an error pattern), generates error patterns containing at least one of the language's characters and at least one affix position symbol, and stores the error patterns and error encodings in the database. In the correction candidate generation process, it takes a particular word error as input, matches all stored error patterns of the same length against it, and generates all correction candidates according to the error encodings of the matched patterns, reconstructing the likely correct forms of the word.
11. The computer-readable storage medium of claim 10, wherein the language is Arabic.
Claim 11 limits the storage medium of claim 10 to Arabic: the stored program learns Arabic morphological patterns from an annotated Arabic corpus and corrects Arabic word errors using the resulting error patterns and error encodings.
12. A spell checking method performed by one or more processors, wherein the one or more processors manage a database that stores error patterns, the method performed by the one or more processors comprising: learning types and forms of the language's morphological patterns from an annotated corpus; analyzing types of errors using the morphological patterns to generate error encodings, which are a string that specifies the positions of changes and change types in an error pattern; generating the error patterns based on the analysis of the types of errors, wherein the error patterns contain at least one of the language's characters and at least one affix position symbol; storing the error patterns and the error encodings in the database; inputting a particular word error; and generating correction candidates for the word error by, matching all the stored error patterns, having a length equal to the word error, against the word error, and generating all the correction candidates according to the matched error patterns' error encodings.
Claim 12 claims the invention as a spell checking method performed by one or more processors that manage a database of error patterns. The steps mirror the system of claim 1: learn the types and forms of the language's morphological patterns from an annotated corpus; analyze types of errors using those patterns to generate error encodings, which are strings specifying the positions and types of changes in an error pattern; generate error patterns containing at least one of the language's characters and at least one affix position symbol; store the error patterns and error encodings in the database; input a particular word error; and generate correction candidates by matching every stored error pattern of the same length as the word error against it and applying the error encodings of the matched patterns. Because each candidate comes from a learned transformation tied to the word's morphology, the method is well suited to languages in which words carry many prefixes and suffixes.
13. The method of claim 12, wherein the generating of correction candidates further comprises checking the generated correction candidates to determine if they contain a valid root; and outputting correction candidates that have a valid root.
Claim 13 adds a root check to the method of claim 12. After correction candidates are generated for a word error, each one is checked to determine whether it contains a valid root, and only the candidates that have a valid root are output. This removes candidates that fit a learned error pattern but do not correspond to a real word of the language, so only plausible corrections are presented to the user.
14. The method of claim 12, wherein the learning includes: for a word error, morphologically analyzing the corresponding correction to get one or more word error's morphological patterns; and the analyzing the types of errors includes: generating one or more error patterns using the pairs of morphological patterns and the corresponding word error; generating the error encoding; and generating corrections corresponding to the error encoding that need to be applied to the word error, denoted by correction codes.
Claim 14 spells out the learning and error analysis steps of claim 12, in parallel with claim 3. During learning, for each word error in the annotated corpus, the corresponding correction is morphologically analyzed to obtain one or more morphological patterns. During error analysis, each morphological pattern is paired with the word error to generate one or more error patterns; the error encoding is generated; and the corrections corresponding to that encoding, which must be applied to the word error, are generated and recorded as correction codes. Because these steps are driven by annotated examples of real errors and their corrections, the method learns its error patterns from data instead of relying on hand-written correction rules, and a pattern learned from one error can then be applied to new words with the same morphological structure.
15. The method of claim 14, wherein the generated error patterns, error encodings and correction codes are stored in the database.
Claim 15 adds to the method of claim 14 that the generated error patterns, error encodings and correction codes are stored in the database, paralleling claim 4. The stored entries are what the method later matches against new word errors when generating correction candidates.
16. The method of claim 14, wherein the error encoding includes at least one of no change, transposition of two characters, insertion, deletion, and substitution.
Claim 16 specifies, for the method of claim 14, that the error encoding includes at least one of the following change types: no change, transposition of two characters, insertion, deletion, and substitution. These are the same edit operations listed in claim 5 for the system, and together with the positions recorded in the encoding they describe how to turn a word error into its correction.
17. The method of claim 12, wherein the database is created by: storing the generated error patterns; for each correct stem, examining all possible combinations of prefixes and suffixes, where at least one of the combinations has an error, adding combinations that satisfy the condition that the stem is compatible with the correct affixes to the database; and storing the error patterns with and correction information using dictionaries according to the length of the error pattern.
Claim 17 describes how the database used by the method of claim 12 is created, in parallel with claim 6. The generated error patterns are stored; then, for each correct stem, every combination of prefixes and suffixes in which at least one affix is in error is examined, and the combinations for which the stem is compatible with the correct affixes are added to the database. This ensures the database holds only error patterns whose corrections are valid for the stem. The error patterns and their correction information are stored in dictionaries organized by the length of the error pattern, so when a word error is checked, only the patterns of matching length need to be retrieved.
18. The method of claim 12, wherein the language is Arabic.
Claim 18 limits the method of claim 12 to Arabic, so the morphological patterns are learned from an annotated Arabic corpus and the resulting error patterns and error encodings are used to correct Arabic word errors.
19. The method of claim 12, wherein the language is a Semitic language.
Claim 19 recites the method of claim 12 where the language is any Semitic language, such as Arabic, Hebrew or Aramaic. These languages attach prefixes and suffixes to root-derived stems, which is the structure the method captures in its learned error patterns.
Unknown
December 24, 2019