A method for an app search engine leveraging user reviews is provided. The method includes receiving an app search query from a user, determining a plurality of relevant apps based on the received app search query, and extracting app descriptions and user reviews associated with the plurality of relevant apps from an app database. The method also includes preprocessing the extracted app descriptions and user reviews of each of the plurality of relevant apps to generate a text corpus and creating a topic-based language model for each of the plurality of relevant apps based on the generated text corpus. Further, the method includes ranking a list of relevant apps using the topic-based language model and providing the ranked app list for the user.
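The preprocessing step described above can be illustrated with a minimal sketch. It assumes that each app record holds one description string and a list of review strings, and that normalization means lowercasing, stripping accents and punctuation, and dropping stop words. The names `STOP_WORDS`, `normalize`, and `build_corpus`, and the record layout, are hypothetical and not taken from the source.

```python
import re
import unicodedata

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "for", "this"}


def normalize(text):
    """Lowercase, strip accents and punctuation, and drop stop words."""
    # Decompose accented characters, then discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only runs of ASCII letters and digits as tokens.
    tokens = re.findall(r"[a-z0-9]+", stripped.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_corpus(apps):
    """Map each app id to token lists for its description and its reviews.

    `apps` is assumed to look like:
    {"app1": {"description": "...", "reviews": ["...", "..."]}}
    """
    corpus = {}
    for app_id, record in apps.items():
        corpus[app_id] = {
            "description": normalize(record["description"]),
            "reviews": [normalize(review) for review in record["reviews"]],
        }
    return corpus
```

Descriptions and reviews are kept as separate token lists here because the claimed model estimates topics for the two sources separately.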
Legal claims defining the scope of protection, as filed with the USPTO.
2. The method according to claim 1, wherein providing the ranked app list for the user further includes: formatting the ranked app list to be viewable by a mobile device used by the user.
3. The method according to claim 1, wherein preprocessing the extracted app descriptions and user reviews of each of the plurality of relevant apps to generate a text corpus further includes: normalizing content of the text corpus to a canonical form.
4. The method according to claim 1, wherein: the app score indicates strength of association between the query words and the app.
5. The method according to claim 1, wherein: provided that each document $d$ contains $N_d$ words and the whole document collection builds a word vocabulary $V$, the topic-based language model for each app $a$ in each topic $z$ is defined by:

$$p_{lda}(w \mid a) = \sum_{z=1}^{K} p(w \mid z, W_d, \hat{Z}_d, \beta)\, p(z \mid a, \hat{Z}_d, \hat{Z}_r, \alpha_d, \alpha_r) \propto \sum_{z=1}^{K} \frac{\hat{N}_{w|z} + \beta}{\hat{N}_{z} + V\beta} \cdot \frac{\hat{N}_{z|d} + K\alpha_d + \hat{N}_{z|r} + K\alpha_r \dfrac{\hat{N}_{z|d} + \alpha_d}{N_d + K\alpha_d}}{N_d + K\alpha_d + \sum_{z} \hat{N}_{z|r} + K\alpha_r}$$

wherein $\alpha$ and $\beta$ are symmetric prior vectors; $w$ is a certain word in the app search query; $W_d$ is all the words in descriptions of all apps; $K$ is a total number of all shared topics; $\hat{N}$ with a subscript is an estimated number of words satisfying the subscript condition; and $\hat{Z}_d$ and $\hat{Z}_r$ are topics for the app descriptions and the user reviews estimated from app latent Dirichlet allocation (AppLDA), respectively.
7. The system according to claim 6, wherein: the ranked app list is formatted to be viewable by a mobile device used by the user.
8. The system according to claim 6, wherein: the preprocessing module is further configured to normalize content of the text corpus to a canonical form.
9. The system according to claim 6, wherein: the app score indicates strength of association between the query words and the app.
10. The system according to claim 6, wherein: provided that each document $d$ contains $N_d$ words and the whole document collection builds a word vocabulary $V$, the topic-based language model for each app $a$ in each topic $z$ is defined by:

$$p_{lda}(w \mid a) = \sum_{z=1}^{K} p(w \mid z, W_d, \hat{Z}_d, \beta)\, p(z \mid a, \hat{Z}_d, \hat{Z}_r, \alpha_d, \alpha_r) \propto \sum_{z=1}^{K} \frac{\hat{N}_{w|z} + \beta}{\hat{N}_{z} + V\beta} \cdot \frac{\hat{N}_{z|d} + K\alpha_d + \hat{N}_{z|r} + K\alpha_r \dfrac{\hat{N}_{z|d} + \alpha_d}{N_d + K\alpha_d}}{N_d + K\alpha_d + \sum_{z} \hat{N}_{z|r} + K\alpha_r}$$

wherein $\alpha$ and $\beta$ are symmetric prior vectors; $w$ is a certain word in the app search query; $W_d$ is all the words in descriptions of all apps; $K$ is a total number of all shared topics; $\hat{N}$ with a subscript is an estimated number of words satisfying the subscript condition; and $\hat{Z}_d$ and $\hat{Z}_r$ are topics for the app descriptions and the user reviews estimated from app latent Dirichlet allocation (AppLDA), respectively.
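As an illustration of how the topic-based language model in claims 5 and 10 could be evaluated once the count estimates are available, the following sketch computes the unnormalized score for one query word and one app. It rests on one reading of the claimed formula: the word-given-topic term is (N̂_{w|z} + β) / (N̂_z + Vβ), and the topic-given-app term has numerator N̂_{z|d} + Kα_d + N̂_{z|r} + Kα_r · (N̂_{z|d} + α_d) / (N_d + Kα_d) and denominator N_d + Kα_d + Σ_z N̂_{z|r} + Kα_r. The function name, argument names, and data layout are hypothetical, and the counts are assumed to come from an already fitted AppLDA model, which is not shown.

```python
def p_lda_unnormalized(word, n_wz, n_z, n_zd, n_zr, vocab_size,
                       alpha_d, alpha_r, beta):
    """Unnormalized p_lda(word | app) under the reading described above.

    n_wz: list of dicts, n_wz[z][w] = estimated count of word w in topic z
    n_z:  list, n_z[z] = estimated total word count in topic z
    n_zd: list, n_zd[z] = estimated count of topic z in the app's description
    n_zr: list, n_zr[z] = estimated count of topic z in the app's reviews
    """
    num_topics = len(n_z)   # K
    n_d = sum(n_zd)         # N_d: words in the app's description
    n_r = sum(n_zr)         # sum over z of the review topic counts
    score = 0.0
    for z in range(num_topics):
        # Smoothed probability of the word under topic z.
        word_given_topic = (n_wz[z].get(word, 0) + beta) / (n_z[z] + vocab_size * beta)
        # Smoothed share of topic z in the description alone.
        desc_share = (n_zd[z] + alpha_d) / (n_d + num_topics * alpha_d)
        # Topic weight combining description and review counts.
        topic_given_app = (
            n_zd[z] + num_topics * alpha_d
            + n_zr[z] + num_topics * alpha_r * desc_share
        ) / (n_d + num_topics * alpha_d + n_r + num_topics * alpha_r)
        score += word_given_topic * topic_given_app
    return score
```

Because the claim states the relation only up to proportionality, the returned value is meant for comparing apps on the same query word rather than for reading as a calibrated probability.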
May 7, 2015
October 8, 2019