9613024

System and Methods for Creating Datasets Representing Words and Objects

Published: April 4, 2017
Assignee: not available in the USPTO data we have
Technical Abstract

Patent Claims
20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer system for producing a dataset for representing a term or an object, the system comprising: one or more processors operable to
receive a first group of text contents comprising a plurality of text units;
receive, or identify from the text contents, a first term comprising a word or a phrase;
identify a text unit comprising a sentence or a phrase containing the first term and one or more second terms each comprising a word or a phrase;
identify a relation between the first term and one or more second terms in the text unit using a machine-based algorithm based on occurrence, or location, or attributes associated with the first term or the one or more second terms;
determine one or more numerical values to represent the relation or the strength of the relation between the first term and the corresponding one or more second terms;
collect one or more of the one or more numerical values into a group of numerical values;
associate the group of numerical values to the first term to form a dataset;
output the dataset as a representation of the first term or an object represented by the first term based on relations between the first term and terms other than the first term.
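As a rough illustration of the steps recited in claim 1, the following Python sketch builds a dataset for a first term from sentence-level co-occurrence counts. It is only one possible reading, not the patented implementation: the function names (`split_sentences`, `tokenize`, `build_dataset`), the naive sentence splitter, and the choice of a simple co-occurrence count as the numerical value are all assumptions made for the example.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercase alphabetic tokens only; a real system would handle phrases too.
    return re.findall(r"[a-z]+", sentence.lower())

def build_dataset(text, first_term):
    """Associate a first term with counts of the terms that share a sentence with it."""
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        if first_term in tokens:
            # Each co-occurring term is counted once per text unit (sentence).
            counts.update(set(tokens) - {first_term})
    # The group of numerical values is associated with the first term.
    return {first_term: dict(counts)}

text = "The camera has a sharp lens. The camera battery is small. The lens is heavy."
dataset = build_dataset(text, "camera")
print(dataset["camera"]["lens"])  # number of sentences containing both terms
```

In this sketch the sentence "The lens is heavy." contributes nothing, because it does not contain the first term.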

2. The system of claim 1, wherein the one or more processors are further operable to collect, based on the relation or based on the numerical values, one or more of the one or more second terms into a group of second terms; associate the group of second terms to the first term to form the dataset.

3. The system of claim 2, wherein the dataset is further used for providing a representation of the first term by other terms, or providing a representation of an object represented by the first term, wherein the object comprises a physical object or a conceptual object, wherein the group of second terms represent properties associated with the object.

4. The system of claim 2, wherein at least one of the one or more second terms in the dataset is associated with one of the numerical values.

5. The system of claim 4, wherein the at least one of the one or more second terms is collected based on the one of the numerical values.

6. The system of claim 4, wherein the function of the one of the numerical values includes representing the strength of association between the at least one of the one or more second terms and the first term, or between a property or attribute represented by the at least one of the one or more second terms and the object represented by the first term.

7. The system of claim 1, wherein the one or more numerical values are determined based on the number of text units that contain the first term or the one or more second terms, or the number of occurrences of the first term or the one or more second terms in the text units.

8. The system of claim 7, wherein the one or more numerical values are determined further by dividing the one or more numerical values by the total number of text units in the first group.
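The counting and normalization described in claims 7 and 8 can be sketched as follows. This is a hedged example: the function name `normalized_cooccurrence` is hypothetical, text units are modeled as sets of terms, and counting the units that contain both terms is just one reading of the claim language.

```python
def normalized_cooccurrence(text_units, first_term, second_term):
    """Count text units containing both terms, divided by the total number of units."""
    if not text_units:
        return 0.0
    # Assumption: a "text unit" is represented as a set of the terms it contains.
    both = sum(1 for unit in text_units
               if first_term in unit and second_term in unit)
    return both / len(text_units)

units = [
    {"camera", "lens", "sharp"},
    {"camera", "battery"},
    {"lens", "heavy"},
    {"camera", "lens"},
]
score = normalized_cooccurrence(units, "camera", "lens")
print(score)  # 2 of the 4 units contain both terms
```

Dividing by the total number of text units keeps values comparable across text collections of different sizes.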

9. The system of claim 1, wherein the one or more numerical values are determined based on the location of the first term or the one or more second terms in the text units.
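One way a location-based value such as the one in claim 9 might be computed is from token distance. The claim does not specify a formula, so the inverse-distance weighting below, along with the function name `proximity_scores`, is purely an illustrative assumption.

```python
def proximity_scores(tokens, first_term):
    """Score each other term by 1 / (token distance to the nearest first-term occurrence)."""
    positions = [i for i, tok in enumerate(tokens) if tok == first_term]
    scores = {}
    if not positions:
        return scores
    for i, tok in enumerate(tokens):
        if tok == first_term:
            continue
        distance = min(abs(i - p) for p in positions)
        # Keep the best (closest) score if a term appears more than once.
        scores[tok] = max(scores.get(tok, 0.0), 1.0 / distance)
    return scores

tokens = ["the", "camera", "has", "a", "sharp", "lens"]
scores = proximity_scores(tokens, "camera")
print(scores["has"], scores["lens"])  # adjacent terms score higher than distant ones
```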

10. The system of claim 1, wherein the one or more numerical values are determined based on whether the text unit is a phrase, a sentence, a paragraph, or a document containing a plurality of sentences or paragraphs.

11. The system of claim 1, wherein the one or more numerical values are determined based on a grammatical attribute associated with the first term, wherein the grammatical attribute includes at least a subject or a predicate of a sentence, or a head or a modifier of a multi-word phrase, or a sub-component of a multi-word phrase.
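A grammatical-attribute weighting along the lines of claim 11 could look like the sketch below. It assumes the text units have already been tagged with roles by some upstream parser, which is not shown. The `ROLE_WEIGHTS` table, its values, and the function name are hypothetical and are not taken from the patent.

```python
# Hypothetical weights: how much a text unit counts, given the first term's role in it.
ROLE_WEIGHTS = {"subject": 1.0, "predicate": 0.5, "modifier": 0.25}

def role_weighted_scores(tagged_units, first_term):
    """Accumulate, per second term, the weight of the first term's grammatical role."""
    scores = {}
    for unit in tagged_units:
        roles = dict(unit)  # term -> grammatical role within this text unit
        if first_term not in roles:
            continue
        weight = ROLE_WEIGHTS.get(roles[first_term], 0.0)
        for term in roles:
            if term != first_term:
                scores[term] = scores.get(term, 0.0) + weight
    return scores

# Pre-tagged input (assumption): each unit is a list of (term, role) pairs.
tagged = [
    [("camera", "subject"), ("sharp", "predicate")],
    [("lens", "subject"), ("camera", "modifier")],
]
scores = role_weighted_scores(tagged, "camera")
print(scores)
```

Here "sharp" receives more weight than "lens" because the first term is the subject of the unit in which "sharp" appears, but only a modifier in the other.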

12. A computer system for producing a dataset for representing a term or information related to an object, the system comprising: one or more processors operable to
receive a first group of text contents comprising a plurality of text units;
receive, or identify from the text contents, a first term comprising a word or a phrase;
identify a text unit comprising a sentence or a phrase containing the first term and one or more second terms each comprising a word or a phrase;
identify a relation between the first term and one or more second terms in the text unit using a machine-based algorithm based on occurrence, or location, or attributes associated with the first term or the one or more second terms;
determine a strength measure of the relation between the first term and the corresponding one or more second terms;
collect, based on the relation and the strength measure, one or more of the one or more second terms into a group of second terms;
associate the group of second terms to the first term to form a dataset; and
output the dataset as a representation of the first term by other terms associated with the first term, or information associated with an object represented by the first term, wherein the object comprises a physical or conceptual entity, wherein the group of second terms represent properties associated with the object.
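The collection step in claim 12, where second terms are gathered based on a strength measure, can be sketched as a simple threshold filter. The threshold rule, the example strength values, and the function name `select_second_terms` are assumptions for illustration. The claim itself does not say how the strength measure is applied.

```python
def select_second_terms(strengths, threshold):
    """Keep second terms whose strength meets the threshold, strongest first."""
    ranked = sorted(strengths.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, strength in ranked if strength >= threshold]

# Hypothetical strength measures for terms related to the first term "camera".
strengths = {"lens": 0.5, "battery": 0.25, "tripod": 0.125}
dataset = {"camera": select_second_terms(strengths, threshold=0.25)}
print(dataset)
```

The resulting dataset represents the first term by a group of other terms, which under the claim can stand for properties of the object the first term names.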

13. The system of claim 12, wherein the one or more processors are further operable to produce a first score to represent the strength measure based on the occurrence, location, or attributes associated with the first term or the one or more second terms, wherein the group of second terms are collected based on the first score.

14. The system of claim 13, wherein the first score is produced based on the number of text units that contain the first term or the one or more second terms, or the number of occurrences of the first term or the one or more second terms in the text units.

15. The system of claim 14, wherein the first score is produced further by dividing the first score by the total number of text units in the first group.

16. The system of claim 13, wherein the first score is produced based on the location of the first term or the one or more second terms in the text units.

17. The system of claim 13, wherein the first score is produced based on whether the text unit is a phrase, a sentence, a paragraph, or a document containing a plurality of sentences or paragraphs.

18. The system of claim 13, wherein the first score is produced based on a grammatical attribute associated with the first term, wherein the grammatical attribute includes at least a subject or a predicate of a sentence, or a head or a modifier of a multi-word phrase, or a sub-component of a multi-word phrase.

19. The system of claim 13, wherein the function of the first score includes representing the strength of association between the at least one second term and the first term, or between a property or attribute represented by the at least one second term and the object represented by the first term.

20. The system of claim 13, wherein the first score is produced based on the occurrence or attributes associated with the one or more second terms in text units that do not contain the first term, or based on the number of text units that contain the one or more second terms but do not contain the first term.
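Claim 20 brings in text units that do not contain the first term. One plausible use, sketched below, is to discount second terms that are equally common elsewhere, so generic words score low. The subtraction formula and the function name `contrast_score` are assumptions. The claim only says the score is based on such units, not how.

```python
def contrast_score(text_units, first_term, second_term):
    """Fraction of first-term units containing the second term,
    minus the same fraction among units without the first term."""
    with_first = [u for u in text_units if first_term in u]
    without_first = [u for u in text_units if first_term not in u]

    def fraction(units):
        if not units:
            return 0.0
        return sum(1 for u in units if second_term in u) / len(units)

    return fraction(with_first) - fraction(without_first)

# Text units modeled as sets of terms (assumption).
units = [
    {"camera", "lens"},
    {"camera", "the"},
    {"the", "dog"},
    {"the", "cat"},
]
print(contrast_score(units, "camera", "lens"))  # specific to "camera": positive
print(contrast_score(units, "camera", "the"))   # common everywhere: negative
```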

Patent Metadata

Filing Date: Unknown
Publication Date: April 4, 2017
Inventors: Guangsheng Zhang

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “SYSTEM AND METHODS FOR CREATING DATASETS REPRESENTING WORDS AND OBJECTS” (9613024). https://patentable.app/patents/9613024
