{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-11481682","patent":{"patent_number":"US-11481682","title":"Dataset management in machine learning","assignee":null,"inventors":[],"filing_date":"2020-05-07T00:00:00.000Z","publication_date":"2022-10-25T00:00:00.000Z","cpc_codes":["G06N","G06F"],"num_claims":20,"abstract":"A method, a computer system, and a computer program product for managing a dataset of training samples, labeled by class, during training of a machine learning model is provided. Embodiments of the present invention may include training the model on a sequence of increasing-sized sets of the training samples and testing performance of the model after training with each set to obtain class-specific performance metrics corresponding to each set size. Embodiments of the present invention may include generating class-specific learning curves from the performance metrics for the plurality of classes. Embodiments of the present invention may include extrapolating the learning curves. Embodiments of the present invention may include optimizing a function of the predicted performance metrics to identify a set of augmentation actions to augment the dataset for further training of the model. Embodiments of the present invention may include providing an output indicative of the set of augmentation actions."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Dataset management in machine learning","description":"A method, a computer system, and a computer program product for managing a dataset of training samples, labeled by class, during training of a machine learning model is provided. Embodiments of the pr","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-11481682","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-11481682","citation_suggestion":"Patentable. \"Dataset management in machine learning\" (US-11481682). https://patentable.app/patents/US-11481682","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-11481682","json":"https://patentable.app/api/llm-context/US-11481682","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-05-30T21:47:38.152Z"}