Systems and methods are disclosed for an approximate string searching technique that finds match results differing from the search string by one or more characters. A cost is computed to measure the extent of the character differences, and a match is recognized if the cost is below a threshold. The match is determined based on an inferred state machine, whose states are iteratively generated in computer memory for successive characters in the input text. States are added to represent the modifications to the string needed to account for character differences and to track the costs of those modifications. States are removed when their costs become excessive. Advantageously, the search process never generates the full state machine in memory, retaining only a selected set of best states to continue the approximate match process. The technique thus enables a practicable implementation of approximate searching that can tolerate an arbitrary number of character deviations.
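The state-tracking idea described above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes unit costs for insertions, deletions, and substitutions, represents each state as a (pattern position, accumulated cost) pair, and treats a match as any state that reaches the end of the pattern at or below the cost limit. The names `approx_search`, `cost_limit`, and `max_states` are hypothetical, and the optional `max_states` cap is only one possible reading of "a selected set of best states".

```python
def approx_search(pattern, text, cost_limit, max_states=None):
    """Return (end_index, cost) pairs for approximate matches of pattern in text.

    Assumption: every insertion, deletion, or substitution costs 1.
    A state maps a pattern position j (characters consumed) to its lowest cost.
    """
    m = len(pattern)
    matches = []

    def relax(states, j, cost):
        # Keep a state only if it is within the limit and improves on the known cost.
        if cost <= cost_limit and cost < states.get(j, cost_limit + 1):
            states[j] = cost

    def add_deletions(states):
        # Skipping a pattern character without consuming text costs 1.
        for j in range(m):
            if j in states:
                relax(states, j + 1, states[j] + 1)

    states = {0: 0}
    add_deletions(states)

    for i, ch in enumerate(text):
        new_states = {0: 0}  # a match may start at any text position
        for j, cost in states.items():
            relax(new_states, j, cost + 1)  # extra character in the text
            if j < m:
                # Exact match costs 0, substitution costs 1.
                relax(new_states, j + 1, cost + (0 if pattern[j] == ch else 1))
        add_deletions(new_states)
        if m in new_states:
            matches.append((i, new_states[m]))
        if max_states is not None and len(new_states) > max_states:
            # Hypothetical cap: keep the cheapest, most advanced states only.
            best = sorted(new_states.items(), key=lambda kv: (kv[1], -kv[0]))
            new_states = dict(best[:max_states])
        states = new_states
    return matches
```

Only the live states for the current text character are held in memory; states whose cost exceeds `cost_limit` are never stored. Searching for `"hello"` in `"say helo world"` with a cost limit of 1 reports one match ending at index 7 with cost 1 (the missing `l`).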
Legal claims defining the scope of protection, as filed with the USPTO.
3. The method of claim 1, wherein the configuration interface is configured to receive a cost function for calculating the costs accumulated in individual ones of the states.
4. The method of claim 1, wherein the cost limit is specified based at least in part on a length of the string.
12. The system of claim 11, wherein the configuration interface is configured to receive a cost function for calculating the costs accumulated in individual ones of the states.
13. The system of claim 11, wherein the cost limit is specified based at least in part on a length of the string.
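As a rough illustration of the configuration described in these claims, the sketch below shows one way a caller might supply a cost function and a length-dependent cost limit. Everything here is an assumption for illustration: the `SearchConfig` class, the `case_tolerant_cost` function, the edit-kind strings, and the `chars_per_error` rule do not come from the claims.

```python
from dataclasses import dataclass
from typing import Callable


def case_tolerant_cost(kind: str, expected: str, actual: str) -> float:
    """Hypothetical cost function: case-only substitutions are cheaper."""
    if kind == "substitute" and expected.lower() == actual.lower():
        return 0.5
    return 1.0  # insertions, deletions, and other substitutions


@dataclass
class SearchConfig:
    # Cost function applied to each modification recorded in a state.
    cost_fn: Callable[[str, str, str], float] = case_tolerant_cost
    # Assumed rule: allow one unit of cost per this many pattern characters.
    chars_per_error: int = 4
    # Never drop the limit below this floor, even for short patterns.
    min_limit: int = 1

    def cost_limit(self, pattern: str) -> int:
        """Derive the cost limit from the length of the search string."""
        return max(self.min_limit, len(pattern) // self.chars_per_error)
```

Under these assumed defaults, a 5-character search string gets a cost limit of 1 and an 11-character string gets a limit of 2, so longer strings tolerate more deviations.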
September 8, 2022
September 3, 2024