{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-8503797","patent":{"patent_number":"US-8503797","title":"Automatic document classification using lexical and physical features","assignee":null,"inventors":[],"filing_date":"2008-09-05T00:00:00.000Z","publication_date":"2013-08-06T00:00:00.000Z","cpc_codes":["G06F","G06V","G06V"],"num_claims":48,"abstract":"An automatic document classification system is described that uses lexical and physical features to assign a class ciεC{c1, c2, . . . , ci} to a document d. The primary lexical features are the result of a feature selection method known as Orthogonal Centroid Feature Selection (OCFS). Additional information may be gathered on character type frequencies (digits, letters, and symbols) within d. Physical information is assembled through image analysis to yield physical attributes such as document dimensionality, text alignment, and color distribution. The resulting lexical and physical information is combined into an input vector X and is used to train a supervised neural network to perform the classification."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Automatic document classification using lexical and physical features","description":"An automatic document classification system is described that uses lexical and physical features to assign a class ciεC{c1, c2, . . . , ci} to a document d. The primary lexical features are the result","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-8503797","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-8503797","citation_suggestion":"Patentable. \"Automatic document classification using lexical and physical features\" (US-8503797). https://patentable.app/patents/US-8503797","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-8503797","json":"https://patentable.app/api/llm-context/US-8503797","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-05-30T13:41:25.108Z"}