Disclosed are techniques for automatically extracting discovered topics and/or sub-topics from determined discourse clusters for the generation of a language model that is applicable to interpreting commands received from a digital assistant device. An electronic document corpus can be generated having a plurality of documents that are clustered based on entropy, among other things. The clusters can be associated with a corresponding plurality of cluster attractors that are generally representative of a context of the documents included therein. The documents within each of the document clusters can be analyzed, so that clusters determined to be representative of a hierarchical discourse community can be identified and logically merged. The merged clusters can be analyzed, such that topics and/or sub-topics can be determined and extracted therefrom, for indexing and storage, among other things. In this way, more efficient searching of the electronic document corpus to interpret received inputs, such as commands received via a digital assistant device, can be facilitated.
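The flow summarized above (representative phrases per cluster, logical relationships between clusters, merged groups, then topics and sub-topics) can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The abstract does not say how relationships are defined or how topics are separated from sub-topics, so the Jaccard overlap measure, the 0.3 threshold, the "shared by two or more clusters" topic rule, and every name and sample phrase below are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: representative phrases already determined per cluster.
CLUSTER_PHRASES = {
    "c1": ["play music", "playlist", "song"],
    "c2": ["play music", "song", "album"],
    "c3": ["order pizza", "delivery"],
}


def jaccard(a, b):
    """Overlap of two phrase collections (assumed relationship measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_clusters(cluster_phrases, threshold=0.3):
    """Merge clusters whose phrase overlap meets the (assumed) threshold."""
    parent = {cid: cid for cid in cluster_phrases}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(cluster_phrases, 2):
        if jaccard(cluster_phrases[a], cluster_phrases[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for cid in cluster_phrases:
        groups.setdefault(find(cid), []).append(cid)
    return sorted(sorted(members) for members in groups.values())


def extract_topics(group, cluster_phrases):
    """Assumed rule: phrases shared across clusters are topics, the rest sub-topics."""
    counts = Counter(p for cid in group for p in set(cluster_phrases[cid]))
    min_count = 2 if len(group) > 1 else 1
    topics = sorted(p for p, c in counts.items() if c >= min_count)
    sub_topics = sorted(p for p, c in counts.items() if c < min_count)
    return {"topics": topics, "sub_topics": sub_topics}


def build_topic_store(cluster_phrases, threshold=0.3):
    """Return a dict keyed by group id, standing in for the data store."""
    store = {}
    for i, group in enumerate(group_clusters(cluster_phrases, threshold)):
        entry = extract_topics(group, cluster_phrases)
        entry["clusters"] = group
        store[f"g{i}"] = entry
    return store
```

Calling `build_topic_store(CLUSTER_PHRASES)` merges `c1` and `c2` into one group and leaves `c3` on its own. A real system would presumably use a richer relationship measure than phrase overlap; the sketch only shows the shape of the data moving between steps.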
Legal claims defining the scope of protection, as filed with the USPTO.
1. A non-transitory computer storage medium storing computer-useable instructions that, when used by at least one computing device, cause the at least one computing device to: obtain a set of determined representative phrases for each electronic document cluster in a generated plurality of electronic document clusters, wherein each electronic document cluster in the generated plurality of electronic document clusters includes a portion of electronic documents of a plurality of electronic documents, each electronic document of the plurality of electronic documents being associated with one of a plurality of stored command templates; define a plurality of logical relationships amongst the generated plurality of electronic document clusters; determine a plurality of contextually similar electronic document groups from the generated plurality of electronic document clusters based on the defined plurality of logical relationships, each contextually similar electronic document group including a corresponding portion of the generated plurality of electronic document clusters; determine a set of cluster tags for each contextually similar electronic document group of the determined plurality of contextually similar electronic document groups; for each contextually similar electronic document group of the determined plurality of contextually similar electronic document groups, extract a set of topics and corresponding sub-topics from the determined corresponding set of cluster tags; and store the extracted sets of topics and corresponding sub-topics to a data store, each stored set of topics and corresponding sub-topics being associated with one of the determined plurality of contextually similar electronic document groups.
2. The medium of claim 1, the instructions further cause the at least one computing device to: generate a searchable index based on the stored sets of topics and corresponding sub-topics; determine that a portion of the generated plurality of document clusters is relevant to a command received from a remote computing device based on the generated searchable index; and provide a determined result to the remote computing device based on the determined relevant portion of the generated plurality of document clusters as a response to the received command.
3. The medium of claim 2, wherein the generated result corresponds to the determined relevant portion of the generated plurality of document clusters.
4. The medium of claim 3, wherein the generated result includes at least one of a plurality of action datasets mapped to the determined relevant portion of the generated plurality of document clusters.
5. The medium of claim 1, wherein each electronic document of the plurality of electronic documents is generated based on other electronic documents retrieved from at least one remote data store.
6. The medium of claim 5, wherein each other electronic document is retrieved based on a query, the query being generated based on one of the plurality of stored command templates.
7. A computer-implemented method for extracting topics and/or sub-topics from merged document clusters, the method comprising: obtain, by a computing device, a set of determined representative phrases for each electronic document cluster in a generated plurality of electronic document clusters, wherein each electronic document cluster in the generated plurality of electronic document clusters includes a portion of electronic documents of a plurality of electronic documents, each electronic document of the plurality of electronic documents being associated with one of a plurality of stored command templates; define, by the computing device, a plurality of logical relationships amongst the generated plurality of electronic document clusters; determine, by the computing device, a plurality of contextually similar electronic document groups from the generated plurality of electronic document clusters based on the defined plurality of logical relationships, each contextually similar electronic document group including a corresponding portion of the generated plurality of electronic document clusters; determine, by the computing device, a set of cluster tags for each contextually similar electronic document group of the determined plurality of contextually similar electronic document groups; for each contextually similar electronic document group of the determined plurality of contextually similar electronic document groups, extract, by the computing device, a set of topics and corresponding sub-topics from the determined corresponding set of cluster tags; and store, by the computing device, the extracted sets of topics and corresponding sub-topics to a data store, each stored set of topics and corresponding sub-topics being associated with one of the determined plurality of contextually similar electronic document groups.
8. The method of claim 7, the instructions further cause the at least one computing device to: generate, by the computing device, a searchable index based on the stored sets of topics and corresponding sub-topics; determine, by the computing device, that a portion of the generated plurality of document clusters is relevant to a command received from a remote computing device based on the generated searchable index; and providing, by the computing device, a determined result to the remote computing device based on the determined relevant portion of the generated plurality of document clusters as a response to the received command.
9. The method of claim 8, wherein the generated result corresponds to the determined relevant portion of the generated plurality of document clusters.
10. The method of claim 9, wherein the generated result includes at least one of a plurality of action datasets mapped to the determined relevant portion of the generated plurality of document clusters.
11. The method of claim 7, wherein each electronic document of the plurality of electronic documents is generated based on other electronic documents retrieved from at least one remote data store.
12. The method of claim 11, wherein each other electronic document is retrieved based on a query, the query being generated based on one of the plurality of stored command templates.
13. A system comprising: at least one processor; and at least one storage device storing computer-useable instructions that, when used by the at least one processor, cause the at least one processor to: obtain a set of determined representative phrases for each electronic document cluster in a generated plurality of electronic document clusters, wherein each electronic document cluster in the generated plurality of electronic document clusters includes a portion of electronic documents of a plurality of electronic documents, each electronic document of the plurality of electronic documents being associated with one of a plurality of stored command templates; define a plurality of logical relationships amongst the generated plurality of electronic document clusters; determine a plurality of contextually similar electronic document groups from the generated plurality of electronic document clusters based on the defined plurality of logical relationships, each contextually similar electronic document group including a corresponding portion of the generated plurality of electronic document clusters; determine a set of cluster tags for each contextually similar electronic document group of the determined plurality of contextually similar electronic document groups; for each contextually similar electronic document group of the determined plurality of contextually similar electronic document groups, extract a set of topics and corresponding sub-topics from the determined corresponding set of cluster tags; and store the extracted sets of topics and corresponding sub-topics to a data store, each stored set of topics and corresponding sub-topics being associated with one of the determined plurality of contextually similar electronic document groups.
14. The system of claim 13, wherein the instructions further cause the at least one computing device to: generate a searchable index based on the stored sets of topics and corresponding sub-topics; determine that a portion of the generated plurality of document clusters is relevant to a command received from a remote computing device based on the generated searchable index; and provide a determined result to the remote computing device based on the determined relevant portion of the generated plurality of document clusters as a response to the received command.
15. The system of claim 14, wherein the generated result corresponds to the determined relevant portion of the generated plurality of document clusters.
16. The system of claim 15, wherein the generated result includes at least one of a plurality of action datasets mapped to the determined relevant portion of the generated plurality of document clusters.
17. The system of claim 13, wherein each electronic document of the plurality of electronic documents is generated based on other electronic documents retrieved from at least one remote data store.
18. The system of claim 17, wherein each other electronic document is retrieved based on a query, the query being generated based on one of the plurality of stored command templates.
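The index-and-lookup steps recited in claims 2, 8, and 14 (generate a searchable index from the stored topics and sub-topics, then determine which groups are relevant to a received command) can be illustrated with a small Python sketch. It is not the claimed implementation. The claims do not specify an index structure or a relevance measure, so the token-level inverted index, the token-overlap score, and all names and sample data below are hypothetical.

```python
from collections import defaultdict

# Hypothetical stored topics and sub-topics, keyed by group id.
TOPIC_STORE = {
    "g0": {"topics": ["play music", "song"], "sub_topics": ["album", "playlist"]},
    "g1": {"topics": ["delivery", "order pizza"], "sub_topics": []},
}


def build_index(topic_store):
    """Build an inverted index mapping each token to the groups that contain it."""
    index = defaultdict(set)
    for group_id, entry in topic_store.items():
        for phrase in entry["topics"] + entry["sub_topics"]:
            for token in phrase.lower().split():
                index[token].add(group_id)
    return index


def relevant_groups(command, index):
    """Rank groups by how many distinct command tokens they match (assumed score)."""
    scores = defaultdict(int)
    for token in set(command.lower().split()):
        for group_id in index.get(token, ()):
            scores[group_id] += 1
    # Highest score first; ties broken by group id for a stable order.
    return sorted(scores, key=lambda g: (-scores[g], g))
```

With this sample data, `relevant_groups("play a song from my playlist", build_index(TOPIC_STORE))` returns `["g0"]`, and a command that shares no tokens with any stored phrase returns an empty list. Mapping a matched group to an action dataset, as in claims 4, 10, and 16, would be a further lookup that is not shown here.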
December 27, 2018
February 23, 2021