{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-9842110","patent":{"patent_number":"US-9842110","title":"Content based similarity detection","assignee":null,"inventors":[],"filing_date":"2014-03-19T00:00:00.000Z","publication_date":"2017-12-12T00:00:00.000Z","cpc_codes":["G06F","G06N"],"num_claims":18,"abstract":"Content Based Similarity Detection. A computer implemented method includes computing a hash of each word in a collection of books to produce a numerical integer token using a reduced representation and computing an Inverse Document Frequency (IDF) vector comprising the number of books the token appears in, for every token in the collection of books. The method also includes creating a token occurrence count vector for each book in the collection and normalizing the token occurrence count vector using the IDF vector to create a Term Frequency-Inverse Document Frequency (TF-IDF) vector. Further, the method includes reducing each TF-IDF vector by using random projections to obtain a final signature representing each book in the collection, reducing each TF-IDF vector by using random projections to obtain a final signature representing each book in the collection and using a trained machine learning algorithm, determining whether each of the list of candidate books is similar to the target book."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Content based similarity detection","description":"Content Based Similarity Detection. A computer implemented method includes computing a hash of each word in a collection of books to produce a numerical integer token using a reduced representation an","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-9842110","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-9842110","citation_suggestion":"Patentable. \"Content based similarity detection\" (US-9842110). https://patentable.app/patents/US-9842110","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-9842110","json":"https://patentable.app/api/llm-context/US-9842110","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-06-06T06:58:56.102Z"}