{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-9785833","patent":{"patent_number":"US-9785833","title":"System and method for textual near-duplicate grouping of documents","assignee":null,"inventors":[],"filing_date":"2016-04-01T00:00:00.000Z","publication_date":"2017-10-10T00:00:00.000Z","cpc_codes":["G06F","G06F","G06V","G06V","G06V"],"num_claims":20,"abstract":"A method for efficiently grouping electronic documents that are likely textual near-duplicates includes processing first and second electronic documents to determine respective sets of character sequence counts. The processing may include, for each document, identifying a plurality of non-contiguous character sequences expressed within the document text, with each character sequence including at least one character from each of at least two different words in the text, and determining character sequence counts for each unique character sequence within the identified character sequences. The method also includes generating one or more similarity metrics, at least by comparing the sets of character sequence counts determined for the first and second electronic documents. The method may also include using the similarity metric(s) to calculate a similarity score, and assigning, based on the similarity score, the second electronic document to a same document group as the first electronic document."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"System and method for textual near-duplicate grouping of documents","description":"A method for efficiently grouping electronic documents that are likely textual near-duplicates includes processing first and second electronic documents to determine respective sets of character seque","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-9785833","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-9785833","citation_suggestion":"Patentable. \"System and method for textual near-duplicate grouping of documents\" (US-9785833). https://patentable.app/patents/US-9785833","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-9785833","json":"https://patentable.app/api/llm-context/US-9785833","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-06-06T05:59:26.196Z"}