Disclosed are a method and an apparatus for constructing a binary feature dictionary. The method may include: extracting binary features from a corpus; calculating a preset statistic of each binary feature; and selecting a preset number of binary features in sequence according to the preset statistic to constitute the binary feature dictionary.
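As a rough illustration of the three steps in the abstract, the following Python sketch treats each pair of adjacent terms in an already-segmented sentence as a binary feature (as in claim 2). It scores each feature with a T-statistic (as in claim 3) and keeps the top `size` features in descending order of score. The claims do not give the T-statistic formula, so the sketch assumes the common collocation t-test, t = (P(w1 w2) - P(w1)P(w2)) / sqrt(P(w1 w2) / N), where N is the number of bigram positions in the corpus. The function names, the tie-break rule, and the pre-tokenized input are likewise illustrative assumptions rather than details taken from the patent.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Bigram = Tuple[str, str]


def extract_bigrams(sentence: List[str]) -> List[Bigram]:
    """Treat every pair of adjacent terms in a segmented sentence as one binary feature."""
    return list(zip(sentence, sentence[1:]))


def t_statistics(corpus: Iterable[List[str]]) -> Dict[Bigram, float]:
    """Score each bigram with the collocation t-test (assumed formula, not from the claims)."""
    unigram_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in corpus:
        unigram_counts.update(sentence)
        bigram_counts.update(extract_bigrams(sentence))

    n_tokens = sum(unigram_counts.values())
    n_bigrams = sum(bigram_counts.values())
    scores: Dict[Bigram, float] = {}
    for (w1, w2), count in bigram_counts.items():
        observed = count / n_bigrams  # P(w1 w2)
        # P(w1) * P(w2): probability of the pair if the two terms were independent
        expected = (unigram_counts[w1] / n_tokens) * (unigram_counts[w2] / n_tokens)
        # sample variance approximated by P(w1 w2), as in the usual collocation t-test
        scores[(w1, w2)] = (observed - expected) / math.sqrt(observed / n_bigrams)
    return scores


def build_dictionary(corpus: Iterable[List[str]], size: int) -> List[Bigram]:
    """Keep the `size` highest-scoring bigrams; ties are broken by the pair itself."""
    scores = t_statistics(corpus)
    ranked = sorted(scores, key=lambda bigram: (-scores[bigram], bigram))
    return ranked[:size]


if __name__ == "__main__":
    corpus = [
        ["new", "york", "is", "big"],
        ["new", "york", "city"],
        ["the", "city", "is", "big"],
    ]
    print(build_dictionary(corpus, 3))  # [('is', 'big'), ('new', 'york'), ('the', 'city')]
```

Sorting by descending score and then by the term pair keeps the selection deterministic when two features tie, as ("is", "big") and ("new", "york") do in the toy corpus.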
1. A method for constructing a binary feature dictionary, comprising: extracting binary features from a corpus; calculating a preset statistic of each of the binary features; selecting a preset number of the binary features in sequence according to the preset statistic to constitute the binary feature dictionary; extracting the selected binary features included in the binary feature dictionary from word segments of a semantic similarity model as training data of the semantic similarity model; and performing a neural network training according to the training data to generate the semantic similarity model.
2. The method according to claim 1, wherein extracting the binary features from the corpus comprises: determining two adjacent terms in the corpus as a binary feature.
3. The method according to claim 1, wherein the preset statistic is a T-statistic.
4. An apparatus for constructing a binary feature dictionary, comprising: one or more processors; a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to: extract binary features from a corpus; calculate a preset statistic of each of the binary features; select a preset number of the binary features in sequence according to the preset statistic to constitute the binary feature dictionary; extract the selected binary features included in the binary feature dictionary from word segments of a semantic similarity model as training data of the semantic similarity model; and perform a neural network training according to the training data to generate the semantic similarity model.
5. The apparatus according to claim 4, wherein the one or more processors extract binary features from the corpus by performing an act of: determining two adjacent terms in the corpus as a binary feature.
6. The apparatus according to claim 4, wherein the preset statistic calculated by the one or more processors is a T-statistic.
7. A non-transitory computer readable storage medium, wherein when instructions in the storage medium are executed by a processor of a terminal, the terminal is caused to perform a method, the method comprising: extracting binary features from a corpus; calculating a preset statistic of each of the binary features; selecting a preset number of the binary features in sequence according to the preset statistic to constitute a binary feature dictionary; extracting the selected binary features included in the binary feature dictionary from word segments of a semantic similarity model as training data of the semantic similarity model; and performing a neural network training according to the training data to generate the semantic similarity model.
8. The method according to claim 2, wherein the preset statistic is a T-statistic.
9. The apparatus according to claim 5, wherein the preset statistic calculated by the one or more processors is a T-statistic.
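Claims 1, 4 and 7 go on to use the dictionary to filter the word segments of the semantic similarity model's training text. The following is a minimal sketch of that filtering step. It assumes the dictionary is held as a set of (term, term) tuples and that training samples are (query, candidate, label) triples of segmented text. The names `dictionary_features` and `to_training_example`, the "_" joiner, and the record layout are all hypothetical, and the claims do not say how these features are combined with any other model inputs.

```python
from typing import Dict, List, Set, Tuple, Union

Bigram = Tuple[str, str]


def dictionary_features(segments: List[str], dictionary: Set[Bigram]) -> List[str]:
    """Return the adjacent-term pairs of a segmented text that are in the dictionary.

    Each kept pair is joined with "_" so it can be looked up as one vocabulary
    entry; the joiner is a hypothetical choice, not taken from the claims.
    """
    features = []
    for w1, w2 in zip(segments, segments[1:]):
        if (w1, w2) in dictionary:
            features.append(f"{w1}_{w2}")
    return features


def to_training_example(query: List[str], candidate: List[str], label: int,
                        dictionary: Set[Bigram]) -> Dict[str, Union[List[str], int]]:
    """Build one hypothetical (query, candidate, label) training record."""
    return {
        "query": dictionary_features(query, dictionary),
        "candidate": dictionary_features(candidate, dictionary),
        "label": label,
    }


if __name__ == "__main__":
    vocab = {("new", "york"), ("is", "big")}
    example = to_training_example(["new", "york", "is", "big"], ["new", "york", "city"], 1, vocab)
    print(example)  # {'query': ['new_york', 'is_big'], 'candidate': ['new_york'], 'label': 1}
```

Pairs outside the dictionary, such as ("york", "city") here, are simply dropped. In a full pipeline the kept features would then be mapped to embedding indices for the neural network training step, which this sketch does not cover.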
December 22, 2016
November 10, 2020