US-11373043

Technique for generating and utilizing virtual fingerprint representing text data

Published: June 28, 2022

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

According to an embodiment of the present disclosure, a method for generating and utilizing a text fingerprint, performed by a computing device, is disclosed. The method comprises the steps of: dividing text data into one or more segments based on a predetermined text segmentation algorithm; determining a mapping value assigned to one or more subsegments that form each divided segment based on a predetermined mapping algorithm; generating a coordinate value for each of the one or more segments based on the determined mapping value; and generating the virtual fingerprint having a phonetic feature for the text data based on the generated coordinate value. As a result, whether multiple pieces of text data are similar to each other can be easily determined, because each piece of text data has a unique virtual fingerprint based on its pronunciation, in the same way that every person has a unique fingerprint.
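The four steps in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `^` and `$` markers, the pronunciation groups in `GROUPS`, the step sizes, the sliding window of two letters per segment, and the function names. Letters outside `GROUPS` are not handled and would raise a `KeyError`.

```python
START, END = "^", "$"  # hypothetical start/end subsegment markers

# Hypothetical pronunciation groups; letters in one group are treated as sounding alike.
GROUPS = [["b", "p"], ["d", "t"], ["g", "k"], ["a", "e"], ["i", "o", "u"]]


def build_mapping(groups, within_step=1, between_step=4):
    """Give similar-sounding letters close values and other letters distant ones."""
    mapping = {START: 0}
    value = between_step
    for group in groups:
        for letter in group:
            mapping[letter] = value
            value += within_step
        value += between_step - within_step  # widen the gap before the next group
    mapping[END] = value
    return mapping


MAPPING = build_mapping(GROUPS)


def segment(text, m=2):
    """Pad with start/end markers, then slide a window of m subsegments."""
    padded = [START] + list(text) + [END]
    return [tuple(padded[i:i + m]) for i in range(len(padded) - m + 1)]


def coordinates(text, mapping=MAPPING):
    """One 2-D point per segment: (value of first letter, value of second)."""
    return [tuple(mapping[s] for s in seg) for seg in segment(text, 2)]


def fingerprint(text, mapping=MAPPING):
    """Rasterize the points and the lines joining consecutive points."""
    size = max(mapping.values()) + 1
    grid = [[0] * size for _ in range(size)]
    points = coordinates(text, mapping)
    for x, y in points:
        grid[y][x] = 1
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for k in range(steps + 1):  # simple linear interpolation along the line
            x = round(x0 + (x1 - x0) * k / steps)
            y = round(y0 + (y1 - y0) * k / steps)
            grid[y][x] = 1
    return grid
```

With this toy mapping, `coordinates("bat")` yields `[(0, 4), (4, 19), (19, 10), (10, 30)]`, while `coordinates("pat")` differs from it by one unit in the first two points only, so the two drawn shapes nearly coincide.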

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A non-transitory computer readable medium storing a computer program, wherein when the computer program is executed by one or more processors of a computing system, the computer program causes the one or more processors of the computer system to perform a method for generating and utilizing a virtual fingerprint representing text data, wherein the method comprises: dividing the text data into one or more segments based on a predetermined text segmentation algorithm; determining a mapping value assigned to one or more subsegments that form a segment based on a predetermined mapping algorithm; generating a coordinate value for each of the one or more segments based on the determined mapping value; generating the virtual fingerprint having a phonetic feature for the text data based on the generated coordinate value; generating a labeled training data comprising a similarity result data for two or more text data and data for virtual fingerprints having a phonetic feature generated corresponding to the two or more text data; and training a deep neural network by using the labeled training data.

2. The non-transitory computer readable medium according to claim 1, wherein the generating the virtual fingerprint having a phonetic feature comprises: determining points in N-dimension space based on coordinate values generated for each of the one or more segments; and generating the virtual fingerprint having the phonetic feature for the text data by connecting the determined points.

3. The non-transitory computer readable medium according to claim 2, wherein the generating the virtual fingerprint having the phonetic feature further comprises: determining a size of a dimension for representing the virtual fingerprint having the phonetic feature based on a size value of a segment divided from the text data by the text segmentation algorithm.

4. The non-transitory computer readable medium according to claim 2, wherein a first dimension to an (N−1)th dimension among the N-dimension space corresponds to an N−1 dimension coordinate value assigned to subsegments of the segment, and an Nth dimension among the N-dimension space corresponds to a one-dimension coordinate value assigned with the divided segment as a unit.

5. The non-transitory computer readable medium according to claim 2, wherein the generating the virtual fingerprint having a phonetic feature comprises: generating the virtual fingerprint having the phonetic feature by differently indicating connections between points corresponding to coordinate values of the segments, based on order information of the segments divided from the text data.

6. The non-transitory computer readable medium according to claim 2, wherein the generating the virtual fingerprint having the phonetic feature comprises at least one of: generating the virtual fingerprint having the phonetic feature by gradually changing at least one of thickness or color intensity of a connection line that connects the determined points, based on at least one of thickness or color intensity determined for the determined points; or generating the virtual fingerprint having the phonetic feature by differently indicating the color intensity of a center of the connection line that connects the points and the color intensity of a periphery of the connection line that connects the points.

7. The non-transitory computer readable medium according to claim 5, wherein the generating the virtual fingerprint having the phonetic feature comprises: generating the virtual fingerprint having the phonetic feature by indicating connections between the segments by applying a higher weight to a connection line between segments with a preceding order, than a connection line between segments with a succeeding order, based on order information of the segments divided from the text data.

8. The non-transitory computer readable medium according to claim 7, wherein the indicating the connections between the segments comprises at least one of: indicating thickness of a connection line with a higher weight to have more thickness than thickness of a connection line with a lower weight; or indicating color intensity of a connection line with a higher weight to have more color intensity than color intensity of a connection line with a lower weight, and wherein at least one of a value of the thickness or a value of the color intensity is determined based on length information of the text data.

9. The non-transitory computer readable medium according to claim 5, wherein the generating the virtual fingerprint having the phonetic feature comprises: generating the virtual fingerprint having the phonetic feature by indicating connections between the segments by applying a weight to a connection line between segments with the most preceding order, based on order information of the segments divided from the text data.

10. The non-transitory computer readable medium according to claim 1, wherein the predetermined segmentation algorithm determines as a unit of a segmentation the number of subsegments that one segment divided from the text data has, wherein the predetermined segmentation algorithm adds a start subsegment before an initial segment of the text data, and adds an end subsegment after a final subsegment, and wherein the predetermined segmentation algorithm divides the text data into segments by forming M subsegments comprising the start subsegment and the end subsegment into one segment.

11. The non-transitory computer readable medium according to claim 1, wherein the predetermined mapping algorithm assigns a unique mapping value per a subsegment as a unit or per a combination of subsegments as a unit, based on a pronunciation form of letters constituting a language to which the text data belongs, and wherein the predetermined mapping algorithm further assigns the unique mapping value to a start subsegment added before an initial subsegment of the text data and an end subsegment added after a final subsegment.

12. The non-transitory computer readable medium according to claim 11, wherein the predetermined mapping algorithm further: sets a difference between mapping values of subsegments corresponding to the letters to have a first difference value, when a similarity level of a pronunciation falls inside a predetermined range; and sets a difference between mapping values of subsegments corresponding to the letters to have a second difference value, when a similarity level of a pronunciation falls outside a predetermined range; and wherein the first difference value is smaller than the second difference value.

13. The non-transitory computer readable medium according to claim 1, wherein the method further comprises: generating a plurality of text data by dividing a sentence data with a pronunciation as a unit or with a semantic as a unit based on a sentence segmentation algorithm, when the sentence data is received; and transforming the sentence data into a virtual fingerprint having N channels, by stacking virtual fingerprints having a phonetic feature generated corresponding to the plurality of text data, on N-dimension.

14. The non-transitory computer readable medium according to claim 1, wherein the method further comprises: comparing virtual fingerprints having a phonetic feature generated for each of a plurality of text data, by concatenating a first virtual fingerprint having a phonetic feature for a first text data of the plurality of text data with a second virtual fingerprint having a phonetic feature for a second text data of the plurality of text data and by using the concatenated virtual fingerprint; and determining a pronunciation similarity level of the plurality of text data based on a comparison result.

15. The non-transitory computer readable medium according to claim 14, wherein the comparing the virtual fingerprints having the phonetic feature comprises: applying a first color of R (Red), G (Green) or B (Blue) to the first virtual fingerprint having the phonetic feature for the first text data of the plurality of text data; applying a second color of R, G or B to the second virtual fingerprint having the phonetic feature for the second text data of the plurality of text data, wherein the first color is different from the second color; and comparing the virtual fingerprints having the phonetic feature, based on at least one of a color intensity or a color weight, by concatenating the first virtual fingerprint and the second virtual fingerprint to which a color is applied.

16. The non-transitory computer readable medium according to claim 14, wherein the comparing the virtual fingerprints having the phonetic feature comprises: comparing the virtual fingerprints based on a pixel value included in the virtual fingerprint, wherein the comparing the virtual fingerprints having the phonetic feature comprises at least one of: calculating Euclidean distance value between the first virtual fingerprint having the phonetic feature for the first text data of the plurality of text data and the second virtual fingerprint having the phonetic feature for the second text data of the plurality of text data; or calculating Cosine distance value between the first virtual fingerprint and the second virtual fingerprint.

17. The non-transitory computer readable medium according to claim 1, wherein the method further comprises: after training the deep neural network, receiving an input for two or more text data; generating virtual fingerprints having phonetic features for the two or more text data; determining information related to the result of the comparison for the generated virtual fingerprints having the phonetic features, by a network function of a trained deep neural network; and determining to output the information related to the result of the comparison determined by the network function.

18. A computing apparatus for implementing a method for generating and utilizing a text fingerprint comprising: one or more processors; and a memory storing instructions executable by the one or more processors; wherein the one or more processors are configured to: divide text data into one or more segments based on a predetermined text segmentation algorithm; determine a mapping value assigned to one or more subsegments that form a segment based on a predetermined mapping algorithm; generate a coordinate value for each of the one or more segments based on the determined mapping value; generate a virtual fingerprint having a phonetic feature for the text data based on the generated coordinate value; generate a labeled training data comprising a similarity result data for two or more text data and data for virtual fingerprints having a phonetic feature generated corresponding to the two or more text data; and train a deep neural network by using the labeled training data.

19. A method for generating and utilizing a text fingerprint comprising: dividing text data into one or more segments based on a predetermined text segmentation algorithm; determining a mapping value assigned to one or more subsegments that form a segment based on a predetermined mapping algorithm; generating a coordinate value for each of the one or more segments based on the determined mapping value; generating the virtual fingerprint having a phonetic feature for the text data based on the generated coordinate value; generating a labeled training data comprising a similarity result data for two or more text data and data for virtual fingerprints having a phonetic feature generated corresponding to the two or more text data; and training a deep neural network by using the labeled training data.
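The pixel-based comparison described in claims 14 to 16 can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: fingerprints are taken to be equally sized 2-D grids of pixel values, the function names are hypothetical, placing the first fingerprint in the red channel and the second in the green channel is one arbitrary choice among the R, G, B options, and returning a cosine distance of 1.0 for an all-zero fingerprint is a convention chosen here.

```python
import math


def flatten(grid):
    """Turn a 2-D grid of pixel values into a flat list of floats."""
    return [float(p) for row in grid for p in row]


def euclidean_distance(a, b):
    """Euclidean distance between two equally sized fingerprints."""
    u, v = flatten(a), flatten(b)
    if len(u) != len(v):
        raise ValueError("fingerprints must have the same size")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_distance(a, b):
    """Cosine distance, defined as 1 minus the cosine similarity."""
    u, v = flatten(a), flatten(b)
    if len(u) != len(v):
        raise ValueError("fingerprints must have the same size")
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumed convention for an empty fingerprint
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def stack_channels(a, b):
    """Combine two fingerprints into one image of (R, G, B) pixels.

    The first fingerprint fills the red channel and the second the green
    channel, so overlapping strokes show up as pixels with both set.
    """
    return [[(pa, pb, 0) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


first = [[1, 0], [0, 1]]
second = [[1, 0], [0, 0]]
print(euclidean_distance(first, second))  # 1.0
print(stack_channels(first, second))
```

Smaller distances indicate more similar fingerprints. In the stacked image, a pixel such as `(1, 1, 0)` marks a position drawn in both fingerprints, while `(1, 0, 0)` marks one drawn only in the first.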

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06N G06F G06V

Patent Metadata

Filing Date

December 28, 2017

Publication Date

June 28, 2022
