US-8543906

Probabilistic learning method for XML annotation of documents

Published: September 24, 2013
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

A document processor includes a parser that parses a document using a grammar having a set of terminal elements for labeling leaves, a set of non-terminal elements for labeling nodes, and a set of transformation rules. The parsing generates a parsed document structure including terminal element labels for fragments of the document and a nodes tree linking the terminal element labels and conforming with the transformation rules. An annotator annotates the document with structural information based on the parsed document structure.

Patent Claims
4 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A document processor stored in a non-transitory medium comprising: a probabilistic classifier that classifies fragments of an input document respective to a set of terminal elements by assigning probability values for the fragments corresponding to elements of the set of terminal elements; a parser that defines a parsed document structure associating the input document fragments with terminal elements connected by links of non-terminal elements conforming with a probabilistic grammar defining transformation rules operating on elements selected from the set of terminal elements and a set of non-terminal elements, the parsed document structure being used to organize the input document, the parser including a joint probability optimizer that optimizes the parsed document structure respective to a joint probability of (i) the probability values of the associated terminal elements and (ii) probabilities of the connecting links of non-terminal elements derived from the probabilistic grammar; a classifier trainer that trains the probabilistic classifier respective to a set of training documents having pre-classified fragments; and a grammar derivation module that derives the probabilistic grammar from the set of training documents, each training document having a pre-assigned parsed document structure associating fragments of the training document with terminal elements connected by links of non-terminal elements.
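The joint probability optimization recited in claim 1 can be illustrated with a small sketch. The code below is not the patented implementation: it assumes a toy grammar in binary (Chomsky normal form) rules, hypothetical element names (Doc, Title, Body, Para), and a plain Viterbi-style CYK search. The names BINARY_RULES and best_parse are invented for illustration. Leaf scores come from the classifier's probability values and internal nodes are scored by the grammar's rule probabilities, so the search maximizes their product.

```python
import math

# Hypothetical toy grammar: (parent, left, right) -> rule probability.
BINARY_RULES = {
    ("Doc", "Title", "Body"): 1.0,
    ("Body", "Para", "Body"): 0.4,
    ("Body", "Para", "Para"): 0.6,
}

def best_parse(leaf_probs, rules, root):
    """Return (joint_probability, tree) for the most probable parse, or None.

    leaf_probs[i] maps each terminal element to the classifier's
    probability that fragment i carries that label.
    """
    n = len(leaf_probs)
    if n == 0:
        return None
    # chart[(i, j)][symbol] = (log probability, backpointer)
    chart = {}
    for i, dist in enumerate(leaf_probs):
        chart[(i, i + 1)] = {
            label: (math.log(p), None) for label, p in dist.items() if p > 0
        }
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = {}
            for k in range(i + 1, j):
                left_cell, right_cell = chart[(i, k)], chart[(k, j)]
                for (parent, left, right), p in rules.items():
                    if left in left_cell and right in right_cell:
                        score = (math.log(p)
                                 + left_cell[left][0]
                                 + right_cell[right][0])
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (k, left, right))
            chart[(i, j)] = cell
    if root not in chart[(0, n)]:
        return None

    def build(i, j, symbol):
        back = chart[(i, j)][symbol][1]
        if back is None:
            return (symbol, i)  # leaf: (terminal element, fragment index)
        k, left, right = back
        return (symbol, build(i, k, left), build(k, j, right))

    return math.exp(chart[(0, n)][root][0]), build(0, n, root)

# Made-up classifier output for three fragments. Taken alone, the
# classifier prefers "Title" for fragment 1, but no rule allows it there.
LEAF_PROBS = [
    {"Title": 0.7, "Para": 0.3},
    {"Title": 0.6, "Para": 0.4},
    {"Title": 0.1, "Para": 0.9},
]
result = best_parse(LEAF_PROBS, BINARY_RULES, "Doc")
```

In this example the second fragment is labeled Para in the best parse even though the classifier alone favors Title, which is the kind of correction a joint optimization over classifier and grammar probabilities makes possible.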

2. The document processor as set forth in claim 1, wherein the probabilistic grammar is a probabilistic context-free grammar and the joint probability optimizer employs a modified inside/outside optimization.
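Claim 2 does not spell out how the inside/outside optimization is modified, so the following is only a sketch of the standard inside pass for a probabilistic context-free grammar, with one assumption made explicit: the leaf cells are seeded with classifier probabilities rather than with lexical rule probabilities. The grammar, the element names, and the function name inside_probabilities are hypothetical.

```python
# Hypothetical ambiguous toy grammar: (parent, left, right) -> probability.
RULES = {
    ("Doc", "Title", "Body"): 1.0,
    ("Body", "Para", "Para"): 0.5,
    ("Body", "Body", "Para"): 0.25,
    ("Body", "Para", "Body"): 0.25,
}

def inside_probabilities(leaf_probs, rules):
    """Compute inside probabilities beta[(i, j, symbol)].

    beta[(i, j, A)] is the total probability, summed over all derivations,
    that symbol A spans fragments i..j-1. Leaf cells are seeded with the
    classifier's probability values (an assumption of this sketch).
    """
    n = len(leaf_probs)
    beta = {}
    for i, dist in enumerate(leaf_probs):
        for label, p in dist.items():
            beta[(i, i + 1, label)] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), p in rules.items():
                    contribution = (p
                                    * beta.get((i, k, left), 0.0)
                                    * beta.get((k, j, right), 0.0))
                    if contribution > 0.0:
                        key = (i, j, parent)
                        beta[key] = beta.get(key, 0.0) + contribution
    return beta

# Made-up classifier output for four fragments.
FRAGMENTS = [
    {"Title": 0.8, "Para": 0.2},
    {"Para": 1.0},
    {"Para": 1.0},
    {"Para": 1.0},
]
beta = inside_probabilities(FRAGMENTS, RULES)
```

Unlike a search for the single best tree, the inside pass sums over every derivation: here the three trailing Para fragments can be grouped into a Body in two ways, and both contribute to the probability of Doc over the whole input.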

3. The document processor as set forth in claim 1, wherein the computer is further programmed to implement: an XML document converter that converts the input document to an XML document having an XML structure generated in accordance with the parsed document structure.
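As a rough illustration of the conversion described in claim 3, the sketch below turns a parsed document structure into XML with Python's standard xml.etree.ElementTree module. The nested-tuple tree representation, the element names, and the function name tree_to_xml are assumptions made for this example, not details taken from the patent.

```python
import xml.etree.ElementTree as ET

def tree_to_xml(node):
    """Convert a parse tree into an XML element.

    A leaf is (terminal_label, fragment_text); an internal node is
    (non_terminal_label, child, child, ...). This shape is hypothetical.
    """
    label, *rest = node
    elem = ET.Element(label)
    if len(rest) == 1 and isinstance(rest[0], str):
        elem.text = rest[0]  # leaf: wrap the document fragment
    else:
        for child in rest:
            elem.append(tree_to_xml(child))
    return elem

# Made-up parsed document structure.
PARSED = (
    "Doc",
    ("Title", "Annual Report"),
    ("Body",
     ("Para", "First paragraph."),
     ("Para", "Second paragraph.")),
)
xml_text = ET.tostring(tree_to_xml(PARSED), encoding="unicode")
```

Each non-terminal element becomes an enclosing XML element and each terminal element wraps the text of its fragment, so the XML nesting mirrors the parsed document structure.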

4. The document processor as set forth in claim 3, wherein the XML document includes a DTD based on the probabilistic grammar.
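The claims do not say how the DTD is obtained from the grammar. One plausible mapping, shown here purely as an assumption, is to emit one element declaration per non-terminal element whose content model lists the right-hand sides of its rules as alternatives, and to declare each terminal element as character data. Rule probabilities are dropped because a DTD cannot express them. The grammar and the function name grammar_to_dtd are hypothetical.

```python
# Hypothetical grammar: (parent, (children...)) -> rule probability.
GRAMMAR = {
    ("Doc", ("Title", "Body")): 1.0,
    ("Body", ("Para", "Para")): 0.6,
    ("Body", ("List", "Para")): 0.4,
}
TERMINALS = {"Title", "Para", "List"}

def grammar_to_dtd(rules, terminals):
    """Emit DTD element declarations for a grammar (probabilities ignored)."""
    alternatives = {}
    for parent, children in rules:
        sequence = "(" + ", ".join(children) + ")"
        alternatives.setdefault(parent, []).append(sequence)
    lines = []
    for parent, alts in alternatives.items():
        # One rule: a plain sequence. Several rules: a choice of sequences.
        model = alts[0] if len(alts) == 1 else "(" + " | ".join(alts) + ")"
        lines.append(f"<!ELEMENT {parent} {model}>")
    for terminal in sorted(terminals):
        lines.append(f"<!ELEMENT {terminal} (#PCDATA)>")
    return "\n".join(lines)

dtd_text = grammar_to_dtd(GRAMMAR, TERMINALS)
```

XML expects DTD content models to be deterministic, so a grammar whose alternatives start with the same element (for example a recursive binary rule set) would need to be rewritten before a mapping like this yields a usable DTD.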

Patent Metadata

Filing Date

June 29, 2005

Publication Date

September 24, 2013

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Probabilistic learning method for XML annotation of documents” (US-8543906). https://patentable.app/patents/US-8543906
