Technologies for detecting bot signals include extracting at least a primary signal and a secondary signal from header data logged by an automated service during a time interval, where the header data is associated with web-based client-server requests received by an online system; generating distribution data representative of the secondary signal when the primary signal matches a first criterion; converting the distribution data to a quantitative score; and, after the time interval, causing a web-based client-server request to be blocked or redirected when the quantitative score matches a second criterion.
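The extraction and distribution steps summarized above can be sketched in Python. This is a minimal sketch, not the patented implementation. It makes several assumptions. Each logged entry is taken to be the raw header block of one request, with one `Name: value` field per line. User-Agent serves as the primary signal and Accept-Language as the secondary signal, which is the pairing the dependent claims name. The first criterion is treated as a minimum request count per primary value, one of the variants the claims recite. The function names `parse_logged_headers`, `extract_signals`, and `build_distributions` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def parse_logged_headers(raw: str) -> Dict[str, str]:
    """Parse logged 'Name: value' lines into a dict keyed by lower-cased name.

    HTTP field names are case-insensitive, so names are normalized.
    Assumption: the log holds header lines only (no request line).
    """
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")  # split at the first colon only
        if sep:  # skip blank or malformed lines
            headers[name.strip().lower()] = value.strip()
    return headers


def extract_signals(
    raw: str,
    primary: str = "User-Agent",
    secondary: str = "Accept-Language",
) -> Tuple[Optional[str], Optional[str]]:
    """Return (primary value, secondary value); None where a header is absent."""
    headers = parse_logged_headers(raw)
    return headers.get(primary.lower()), headers.get(secondary.lower())


def build_distributions(logged: Iterable[str], min_requests: int) -> Dict[str, Counter]:
    """Map each primary value seen at least `min_requests` times during the
    interval to a Counter of the secondary values it arrived with."""
    per_primary: Dict[str, Counter] = {}
    for raw in logged:
        primary_value, secondary_value = extract_signals(raw)
        if primary_value is None:
            continue  # nothing to group this request under
        # A missing secondary header is counted as its own type, "<none>".
        per_primary.setdefault(primary_value, Counter())[secondary_value or "<none>"] += 1
    # First criterion (assumed): a minimum number of requests per primary value.
    return {p: c for p, c in per_primary.items() if sum(c.values()) >= min_requests}
```

For example, given three `ExampleBot/1.0` requests (two with `en-US`, one with `fr-FR`) and one `Mozilla/5.0` request, `build_distributions(logged, min_requests=3)` returns a distribution only for the bot: `{'ExampleBot/1.0': Counter({'en-US': 2, 'fr-FR': 1})}`. Splitting each line at its first colon keeps User-Agent values that themselves contain colons, such as `rv:109.0`, intact.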
1. A method comprising: extracting at least a primary signal and a secondary signal from header data logged by an automated service during a time interval; wherein the header data is associated with web-based client-server requests received by an online system; generating distribution data representative of the secondary signal when the primary signal matches a first criterion; wherein the distribution data comprises count data for each of a plurality of different types of secondary signals; converting the distribution data to a quantitative score; wherein converting the distribution data to a quantitative score comprises: ordering the distribution data in a descending order of magnitude of the count data; computing count ratios for adjacent pairs of count data in the ordered distribution data; computing, as the quantitative score, a first quantitative score when a first subset of the count ratios is greater than or equal to a first count ratio threshold or a second subset of the count ratios is less than or equal to a second count ratio threshold; and computing, as the quantitative score, a second quantitative score when the first subset of the count ratios is less than the first count ratio threshold or the second subset of the count ratios is greater than the second count ratio threshold; wherein an equation used to compute the first quantitative score is different from an equation used to compute the second quantitative score; after the time interval, causing a web-based client-server request to be blocked or redirected when the quantitative score matches a second criterion; wherein the method is performed by one or more computing devices.
2. The method of claim 1, wherein the primary signal comprises User-Agent.
3. The method of claim 1, wherein the secondary signal comprises any one or more of the following: User-Agent, Accept-Language, or other HTTP header fields.
4. The method of claim 1, wherein the primary signal comprises any one or more of the following: User-Agent, Accept-Language, or other HTTP header fields.
5. The method of claim 1, wherein the second criterion comprises a data value that indicates that the web-based client-server request was generated by an abusive bot.
6. The method of claim 1, wherein: the count data for a particular type of secondary signal indicates a number of web-based client-server requests counted during the time interval that include the particular type of secondary signal; and converting the distribution data to a quantitative score comprises computing the quantitative score using an equation that is determined based on whether one or more of the count ratios matches a count ratio threshold.
7. The method of claim 1, comprising using the quantitative score to block or redirect a web scraping bot request or a JavaScript-executing bot request.
8. A method comprising: extracting at least a primary signal and a secondary signal from header data logged by an automated service during a time interval; wherein the header data is associated with web-based client-server requests received by an online system; generating distribution data representative of the secondary signal when the primary signal matches a first criterion; wherein the first criterion comprises a request count threshold that indicates a minimum number of web-based client-server requests that need to be received during the time interval in order to generate the distribution data; converting the distribution data to a quantitative score; determining a request count that indicates a count of web-based client-server requests received during the time interval that have a primary signal matching the first criterion; and, after the time interval, causing at least one web-based client-server request to be blocked or redirected when the request count matches the request count threshold; wherein the method is performed by one or more computing devices.
9. A computer program product comprising: one or more non-transitory computer-readable storage media comprising instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: extracting at least a primary signal and a secondary signal from header data logged by an automated service during a time interval; wherein the header data is associated with web-based client-server requests received by an online system; generating distribution data representative of the secondary signal when the primary signal matches a first criterion; wherein the first criterion comprises a request count threshold that indicates a minimum number of web-based client-server requests that need to be received during the time interval in order to generate the distribution data; converting the distribution data to a quantitative score; determining a request count that indicates a count of web-based client-server requests received during the time interval that have a primary signal matching the first criterion; and, after the time interval, causing at least one web-based client-server request to be blocked or redirected when the request count matches the request count threshold or the quantitative score matches a second criterion.
10. The computer program product of claim 9, wherein the primary signal comprises User-Agent.
11. The computer program product of claim 9, wherein the secondary signal comprises any one or more of the following: User-Agent, Accept-Language, or other HTTP header fields.
12. The computer program product of claim 9, wherein the primary signal comprises any one or more of the following: User-Agent, Accept-Language, or other HTTP header fields.
13. The computer program product of claim 9, wherein the first criterion comprises a request count threshold that indicates a minimum number of web-based client-server requests that need to be received during the time interval in order to generate the distribution data.
14. The computer program product of claim 9, wherein the second criterion comprises a data value that indicates that at least one web-based client-server request was generated by an abusive bot.
15. The computer program product of claim 9, wherein the instructions, when executed by one or more processors, cause the one or more processors to perform operations comprising using the quantitative score to block or redirect a web scraping bot request or a JavaScript-executing bot request.
16. A computer program product comprising: one or more non-transitory computer-readable storage media comprising instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: extracting at least a primary signal and a secondary signal from header data logged by an automated service during a time interval; wherein the header data is associated with web-based client-server requests received by an online system; generating distribution data representative of the secondary signal when the primary signal matches a first criterion; wherein the distribution data comprises count data for each of a plurality of different types of secondary signals; ordering the distribution data in a descending order of magnitude of the count data; computing count ratios for adjacent pairs of count data in the ordered distribution data; converting the distribution data to a quantitative score; computing, as the quantitative score, a first quantitative score when a first subset of the count ratios is greater than or equal to a first count ratio threshold or a second subset of the count ratios is less than or equal to a second count ratio threshold; and computing, as the quantitative score, a second quantitative score when the first subset of the count ratios is less than the first count ratio threshold or the second subset of the count ratios is greater than the second count ratio threshold; wherein an equation used to compute the first quantitative score is different from an equation used to compute the second quantitative score; and, after the time interval, causing a web-based client-server request to be blocked or redirected when the quantitative score matches a second criterion.
17. The computer program product of claim 16, wherein: the count data for a particular type of secondary signal indicates a number of web-based client-server requests counted during the time interval that include the particular type of secondary signal; and converting the distribution data to a quantitative score comprises computing the quantitative score using an equation that is determined based on whether one or more of the count ratios matches a count ratio threshold.
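The count-ratio scoring recited in claims 1, 6, 16 and 17 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The claims leave several details open: which ratios form the first and second subsets, the direction of each ratio, the threshold values, and the two equations themselves. The subset split, the threshold values, both placeholder equations, and the names `quantitative_score` and `should_block_or_redirect` are therefore all hypothetical.

```python
from typing import Mapping

# Hypothetical values: the claims name these thresholds but do not give them.
FIRST_RATIO_THRESHOLD = 3.0
SECOND_RATIO_THRESHOLD = 1.5
SCORE_THRESHOLD = 0.8  # stand-in for the "second criterion"


def quantitative_score(
    distribution: Mapping[str, int],
    t1: float = FIRST_RATIO_THRESHOLD,
    t2: float = SECOND_RATIO_THRESHOLD,
) -> float:
    """Score one primary value's distribution of secondary-signal counts."""
    # Order the count data in descending order of magnitude.
    ordered = sorted((c for c in distribution.values() if c > 0), reverse=True)
    if not ordered:
        raise ValueError("distribution has no positive counts")
    total = sum(ordered)
    # Count ratio for each adjacent pair, taken as larger / next (always >= 1).
    ratios = [ordered[i] / ordered[i + 1] for i in range(len(ordered) - 1)]
    # Assumed subsets: the ratio of the two largest counts, and all the rest.
    first_subset, second_subset = ratios[:1], ratios[1:]
    use_first_equation = (
        not ratios  # only one secondary type was seen
        or first_subset[0] >= t1
        or (bool(second_subset) and all(r <= t2 for r in second_subset))
    )
    if use_first_equation:
        # Placeholder equation 1: share of the most common secondary type.
        return ordered[0] / total
    # Placeholder equation 2: distinct secondary types per request.
    return len(ordered) / total


def should_block_or_redirect(
    distribution: Mapping[str, int],
    score_threshold: float = SCORE_THRESHOLD,
) -> bool:
    """After the interval, act on requests whose primary value scored high."""
    return quantitative_score(distribution) >= score_threshold
```

With these placeholder values, a User-Agent whose 100 requests split `{"en-US": 90, "fr-FR": 5, "de-DE": 5}` scores 0.9 and would be blocked or redirected. A split of `{"en-US": 40, "fr-FR": 30, "de-DE": 20, "es-ES": 10}` takes the second equation and scores 0.04. As literally worded, each of the claim's two conditions is joined by "or", so one distribution could satisfy both. The sketch resolves this by testing the first condition and falling back to the second equation otherwise.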
November 29, 2018
November 3, 2020