Methods and apparatus to meter content exposure using closed caption information are disclosed. An example method comprises developing a keyword database of terms based on program guide information descriptive of programs for a given time period, generating one or more values representing the likelihoods that one or more respective pieces of media content were presented based on a comparison of closed caption text to the keyword database, collecting audience measurement data, and employing the one or more likelihood values to identify a set of reference data for comparison to the audience measurement data to identify presented content.
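The first two steps of the example method (building a keyword database from program guide data, then scoring each program by how much of the closed caption text matches its keywords) can be illustrated with a short Python sketch. It is not the patented implementation: the XML layout (`guide`, `program`, `title`, `description`), the tokenizer, the stop-word list, and the rule that treats a term listed for every program as redundant are all hypothetical. Each likelihood is the program's match count divided by the total match count over all programs, which is one reading of the normalization recited in claims 2 and 3.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical program guide layout; real program guide schemas differ.
GUIDE_XML = """
<guide>
  <program id="news" channel="4" start="20:00">
    <title>Evening News</title>
    <description>Senate debate and election coverage</description>
  </program>
  <program id="cooking" channel="7" start="20:00">
    <title>Home Kitchen</title>
    <description>A bread recipe and oven tips</description>
  </program>
  <program id="sports" channel="9" start="20:00">
    <title>Match Night</title>
    <description>League highlights and the winning goal</description>
  </program>
</guide>
"""

# Illustrative stop-word list (an assumption, not taken from the source).
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "for", "on"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_db(xml_text: str) -> dict[str, set[str]]:
    """Map each program id to the keywords in its title and description."""
    root = ET.fromstring(xml_text.strip())
    db = {}
    for program in root.iter("program"):
        text = " ".join(program.findtext(tag, default="") for tag in ("title", "description"))
        db[program.get("id")] = set(tokenize(text)) - STOP_WORDS
    # Hypothetical redundancy rule: a term listed for every program cannot
    # distinguish between them, so drop it (only meaningful with 2+ programs).
    if len(db) > 1:
        shared = set.intersection(*db.values())
        db = {pid: kws - shared for pid, kws in db.items()}
    return db


def likelihoods(caption_text: str, db: dict[str, set[str]]) -> dict[str, float]:
    """Count caption words matching each program's keywords, then normalize."""
    words = Counter(tokenize(caption_text))
    counts = {pid: sum(n for w, n in words.items() if w in kws) for pid, kws in db.items()}
    total = sum(counts.values())
    if total == 0:
        return {pid: 0.0 for pid in counts}  # captions give no evidence either way
    return {pid: c / total for pid, c in counts.items()}


db = build_keyword_db(GUIDE_XML)
print(likelihoods("The senate vote shaped the election, says the oven maker", db))
# news ≈ 0.667, cooking ≈ 0.333, sports = 0.0
```

Because the values are normalized, they sum to 1 whenever at least one caption word matches; a caption with no matching words yields all zeros, leaving identification entirely to the audience measurement data.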
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method comprising: developing a keyword database of terms based on a program guide descriptive of a plurality of programs for a given time period; generating a plurality of likelihood values for respective ones of the plurality of programs based on comparison of closed caption text associated with a presented program to the keyword database, each likelihood value representing a likelihood that the respective one of the plurality of programs is the presented program, the likelihood values being generated without comparing the collected audience measurement parameter to any reference audience measurement data; collecting an audience measurement parameter for the presented program, the audience measurement parameter usable to identify the presented program; employing the plurality of likelihood values, using a processor, to select a subset of the plurality of programs to form a list of most probable presented programs, wherein the selected subset includes more than one of and fewer than all of the plurality of programs; and sending the list of most probable programs and the collected audience measurement parameter to a collection server, the collection server to compare the collected audience measurement parameter to reference audience measurement data for respective ones of the most probable programs in an order selected based on the likelihood values for respective ones of the most probable programs in the list.
2. A method as defined in claim 1, wherein generating the likelihood values comprises counting matches between the closed caption text and the keyword database for respective ones of the plurality of programs.
3. A method as defined in claim 2, further comprising: computing a sum of the one or more matches for a respective one of the plurality of programs; and dividing each of the matches by the sum.
4. A method as defined in claim 1, wherein the program guide comprises an eXtensible Markup Language (XML) data structure.
5. A method as defined in claim 1, wherein the collected audience measurement parameter comprises at least one of an audio code embedded in the presented program, a video code embedded in the presented program, an audio signature generated from the presented program, or a video signature generated from the presented program.
6. A method as defined in claim 5, wherein the audio code is embedded in the presented program by a broadcaster to identify the presented program.
7. A method as defined in claim 1, wherein the list further includes at least one of a most probable channel or a most probable time.
8. An apparatus comprising: an audience measurement engine to collect an audience measurement parameter for a presented program; an indexing engine to create a keyword database based on data descriptive of a plurality of programs; and a closed caption matcher to: generate likelihood values for respective ones of the plurality of programs based on comparison of closed caption text associated with the presented program to the keyword database, each likelihood value representing a likelihood that the respective one of the plurality of programs is the presented program, the likelihood values being generated without comparing the collected audience measurement parameter to any reference audience measurement data; select a subset of the plurality of programs based on the likelihood values to form a list of most probable presented programs, the list of most probable presented programs including more than one of and fewer than all of the plurality of programs; order the list of most probable presented programs based on respective ones of the likelihood values; and send the ordered list of most probable programs and the collected audience measurement parameter to a collection server, the collection server to compare the collected audience measurement parameter to reference audience measurement data for respective ones of the most probable programs based on the order of the most probable programs in the list to determine an audience presentation statistic, wherein at least one of the audience measurement engine, the indexing engine, or the closed caption matcher is implemented in hardware.
9. An apparatus as defined in claim 8, further comprising a closed caption decoding engine to extract the closed caption text.
10. An apparatus as defined in claim 8, wherein the audience measurement parameter comprises at least one of an audio code embedded in the presented program, a video code embedded in the presented program, an audio signature generated from the presented program, or a video signature generated from the presented program.
11. An apparatus as defined in claim 8, wherein the closed caption matcher is to generate the likelihood values by counting matches between the closed caption text and the keyword database for respective ones of the plurality of programs.
12. An apparatus as defined in claim 8, wherein the indexing engine is to generate the keyword database to remove redundant information.
13. An apparatus as defined in claim 8, wherein the list further includes at least one of a most probable channel or a most probable time.
14. A tangible article of manufacture excluding propagating signals, the article comprising a computer-readable storage medium storing machine readable instructions that, when executed, cause a machine to: develop a keyword database of terms based on a program guide descriptive of a plurality of programs for a given time period; collect audience measurement data for a presented program, the audience measurement data usable to identify the presented program; generate likelihood values for respective ones of the plurality of programs based on comparison of closed caption information associated with the presented program to the keyword database, each likelihood value representing a likelihood that the respective one of the plurality of programs is the presented program, the likelihood values being generated without comparing the collected audience measurement data to any reference audience measurement data; select a subset of the plurality of programs based on the likelihood values to form a list of most probable presented programs, the list of most probable presented programs including more than one of and fewer than all of the plurality of programs; order the list of most probable presented programs based on respective ones of the generated likelihood values; and send the ordered list of most probable programs and the collected audience measurement data to a collection server, the collection server to compare the collected audience measurement data to reference audience measurement data for respective ones of the most probable programs based on the order of the most probable programs in the list to identify the presented program.
15. A tangible article of manufacture as defined in claim 14, wherein the machine readable instructions, when executed, cause the machine to generate the likelihood values by counting matches between the closed caption information and the keyword database for respective ones of the plurality of programs.
16. A tangible article of manufacture as defined in claim 14, wherein the program guide comprises an eXtensible Markup Language (XML) data structure.
17. A tangible article of manufacture as defined in claim 14, wherein the audience measurement data comprises at least one of an audio code, a video code, an audio signature, or a video signature.
18. A tangible article of manufacture as defined in claim 17, wherein the audio code is inserted by a broadcaster to identify the presented program.
19. A tangible article of manufacture as defined in claim 14, wherein the list further includes at least one of a most probable channel or a most probable time.
20. A method comprising: receiving from a content meter an audience measurement parameter for a presented program; receiving from the content meter a list of most probable presented programs, programs in the list being selected from a plurality of programs and ordered based on comparisons of closed caption text associated with the presented program to a keyword database, the ordered list including more than one of and fewer than all of the plurality of programs; and comparing, using a processor, the received audience measurement parameter to reference audience measurement parameters for respective ones of the most probable presented programs until the presented program is identified, the reference audience measurement parameters compared in accordance with the order of the most probable presented programs in the list.
21. A method as defined in claim 20, wherein the received audience measurement parameter comprises at least one of an audio code embedded in the presented program, a video code embedded in the presented program, an audio signature generated from the presented program, or a video signature generated from the presented program.
22. A method as defined in claim 20, wherein the ordered list further includes at least one of a most probable channel or a most probable time.
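Claims 1, 8, 14, and 20 split the work between the content meter, which orders a proper subset of candidate programs by likelihood, and the collection server, which checks reference data in that order. The Python sketch below illustrates that division under stated assumptions: the audience measurement parameter is modeled as an opaque code compared by equality (real audio or video signatures would need a tolerance-based comparison that the claims do not specify), `MeterReport`, `rank_programs`, `most_probable`, and `identify` are hypothetical names, and the most probable channel or time of claims 7, 13, 19, and 22 is assumed to be the guide value with the largest summed likelihood.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class MeterReport:
    """Hypothetical payload sent from the content meter to the collection server."""
    measurement: object          # e.g. an extracted audio code, treated as opaque
    ranked_programs: list[str]   # most probable presented programs, best first


def rank_programs(likelihoods: Mapping[str, float], k: int) -> list[str]:
    """Return more than one but fewer than all programs, highest likelihood first.

    k is clamped to [2, N - 1], which requires at least three programs.
    """
    n = len(likelihoods)
    if n < 3:
        raise ValueError("need at least three programs to form such a subset")
    k = max(2, min(k, n - 1))
    # Highest likelihood first; ties broken by program id for determinism.
    ranked = sorted(likelihoods, key=lambda pid: (-likelihoods[pid], pid))
    return ranked[:k]


def most_probable(likelihoods: Mapping[str, float],
                  guide: Mapping[str, Mapping[str, str]],
                  field: str) -> str:
    """Sum likelihoods per guide field value (e.g. "channel" or "start") and
    return the value with the largest total (a hypothetical aggregation)."""
    totals: dict[str, float] = {}
    for pid, p in likelihoods.items():
        value = guide[pid][field]
        totals[value] = totals.get(value, 0.0) + p
    return max(totals, key=totals.__getitem__)


def identify(report: MeterReport,
             reference: Mapping[str, object],
             matches: Callable[[object, object], bool] = lambda a, b: a == b,
             ) -> tuple[Optional[str], int]:
    """Compare the measurement to reference data in list order.

    Returns (program_id, comparisons_made); program_id is None if no
    listed candidate matches.
    """
    comparisons = 0
    for pid in report.ranked_programs:
        if pid not in reference:
            continue  # no reference data available for this candidate
        comparisons += 1
        if matches(report.measurement, reference[pid]):
            return pid, comparisons
    return None, comparisons


scores = {"news": 0.2, "cooking": 0.5, "sports": 0.25, "movie": 0.05}
guide = {
    "news": {"channel": "4", "start": "20:00"},
    "cooking": {"channel": "7", "start": "21:00"},
    "sports": {"channel": "9", "start": "20:00"},
    "movie": {"channel": "7", "start": "22:00"},
}
report = MeterReport("CODE-42", rank_programs(scores, k=3))
reference = {"news": "CODE-17", "cooking": "CODE-99", "sports": "CODE-42", "movie": "CODE-08"}
print(report.ranked_programs)                   # ['cooking', 'sports', 'news']
print(most_probable(scores, guide, "channel"))  # 7
print(identify(report, reference))              # ('sports', 2)
```

Checking candidates in likelihood order lets the server stop after a few comparisons when the list is well calibrated, rather than comparing against every program in the guide. What the server should do when `identify` returns `None` (for example, fall back to a full search) is not specified in the claims.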
June 11, 2007
April 3, 2012