{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-11475356","patent":{"patent_number":"US-11475356","title":"Data processing method, electronic device and computer readable storage method for deduplication of a training dataset","assignee":null,"inventors":[],"filing_date":"2019-05-03T00:00:00.000Z","publication_date":"2022-10-18T00:00:00.000Z","cpc_codes":["G06N","G06F"],"num_claims":20,"abstract":"A data processing method includes: obtaining a first subset and at least a second subset in a training dataset for learning, the first subset and the at least a second subset having a same size; determining a set of substrings based on data strings in the first subset and the at least a second subset, the substrings being suffix substrings of the data strings and being sorted in a lexicographical order; and determining a grain for deduplication of the training dataset from a set of longest common prefix (CLP) lengths of adjacent substrings in the set of substrings, for use in the deduplication. Thereby, different grains of duplicating procedures for different training datasets can be predicted automatically, and universality and flexibility of GPUaaS can be achieved. In addition, the deduplication rate can be improved, network resource waste can be reduced and system efficiency can be enhanced."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Data processing method, electronic device and computer readable storage method for deduplication of a training dataset","description":"A data processing method includes: obtaining a first subset and at least a second subset in a training dataset for learning, the first subset and the at least a second subset having a same size; deter","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-11475356","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-11475356","citation_suggestion":"Patentable. \"Data processing method, electronic device and computer readable storage method for deduplication of a training dataset\" (US-11475356). https://patentable.app/patents/US-11475356","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-11475356","json":"https://patentable.app/api/llm-context/US-11475356","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-05-31T06:31:13.053Z"}