Legal claims defining the scope of protection, as filed with the USPTO.
1. A non-transitory storage medium storing instructions readable and executable by an electronic data processing device to perform a method operating on an ARPA table for a modeled natural language in which each entry of the ARPA table includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled natural language, and an associated backoff weight value Az.b for the context A, the method comprising: computing by said electronic data processing device a max-ARPA table from the ARPA table by operations including: computing and adding for each entry of the ARPA table an associated maximum backoff weight product value Az.m wherein the computing and adding of the associated maximum backoff weight product values is performed on the entries of the ARPA table in descending n-gram order; and after computing and adding the associated maximum backoff weight product values, computing and adding for each entry of the ARPA table an associated max-backoff value Az.w=w(A,z) where w(A,z)=max_h p(z|hA) is the maximum backoff value for any head h preceding the context A of the n-gram Az and the computing and adding of the associated max-backoff values is performed on the entries of the ARPA table in descending n-gram order; wherein each entry of the max-ARPA table includes an n-gram Az and its associated backoff value Az.p, backoff weight value Az.b, maximum backoff weight product value Az.m, and max-backoff value Az.w; and computing by said electronic data processing device a max-backoff value w(A,z) for an n-gram Az of the modeled natural language that is not in the ARPA table by applying the recursive equation:
\[
w(A,z) =
\begin{cases}
p(A,z) & \text{if } Az \notin T_{mA} \text{ and } A \notin T_{mA} \\
p(A,z) \times A.m & \text{if } Az \notin T_{mA} \text{ and } A \in T_{mA} \\
Az.w & \text{if } Az \in T_{mA}
\end{cases}
\]
where the values A.m and Az.w are obtained from the .m and .w columns of the max-ARPA table T_mA, respectively, and p(A,z) is computed from the .p and .b columns of the max-ARPA table.
2. The non-transitory storage medium as set forth in claim 1 wherein: the computing of the max-ARPA table includes the further operation of sorting the entries of the ARPA table in descending n-gram order prior to computing and adding the maximum backoff weight product values Az.m; the computing and adding of the associated maximum backoff weight product values Az.m is performed from top-to-bottom on the sorted ARPA table whereby the computing and adding of the associated maximum backoff weight product values is performed on the entries of the ARPA table in descending n-gram order; and the computing and adding of the associated max-backoff values Az.w is performed from top-to-bottom on the sorted ARPA table whereby the computing and adding of the associated max-backoff values is performed on the entries of the ARPA table in descending n-gram order.
3. The non-transitory storage medium as set forth in claim 2 wherein the operation of computing and adding for each entry of the sorted ARPA table an associated maximum backoff weight product value A.m comprises performing the algorithm:
    For A in T_sorted
        A.m ← 1
        For x in V s.t. xA in T_sorted:
            A.m ← max(A.m, xA.b × xA.m)
where T_sorted is the sorted ARPA table, the algorithm is performed from top-to-bottom of the sorted ARPA table T_sorted, and V is the vocabulary of the modeled natural language, and the maximum backoff weight product value for entry A is the value A.m computed by performing the algorithm.
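The loop recited in claim 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dict-of-tuples table layout, the function name `add_max_backoff_weight_products`, the use of linear-scale (non-logarithmic) values, and the toy numbers are all hypothetical.

```python
def add_max_backoff_weight_products(table, vocab):
    """Add an 'm' field to every entry, visiting longer n-grams first.

    `table` maps an n-gram (a tuple of tokens) to a dict with keys 'p' and 'b'.
    This layout is an assumption made for illustration only.
    """
    # Descending n-gram order: every listed extension xA is finished before A.
    for ngram in sorted(table, key=len, reverse=True):
        m = 1.0
        for x in vocab:
            longer = (x,) + ngram  # the entry xA, if it is listed
            if longer in table:
                m = max(m, table[longer]["b"] * table[longer]["m"])
        table[ngram]["m"] = m
    return table


# Toy table with made-up linear-scale values (hypothetical data).
toy = {
    ("a",): {"p": 0.4, "b": 0.5},
    ("b",): {"p": 0.6, "b": 1.0},
    ("a", "b"): {"p": 0.7, "b": 2.0},
    ("b", "b"): {"p": 0.2, "b": 0.5},
}
add_max_backoff_weight_products(toy, vocab=["a", "b"])
print({" ".join(k): v["m"] for k, v in toy.items()})
```

In this toy table only the unigram `b` has listed extensions (`a b` and `b b`), so it is the only entry whose `m` rises above the initial value of 1.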
4. The non-transitory storage medium as set forth in claim 3 wherein the operation of computing and adding for each entry Az of the sorted ARPA table an associated max-backoff value Az.w comprises performing the algorithm:
    For Az in T_sorted
        Az.w ← Az.p
        For x in V s.t. xA in T_sorted:
            If xAz in T_sorted:
                Az.w ← max(Az.w, xAz.w)
            Else
                Az.w ← max(Az.w, Az.p × xA.b × xA.m)
from top-to-bottom of the sorted ARPA table.
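The second pass, recited in claim 4, might look like the sketch below. It assumes the `m` values have already been added; the table layout, the function name `add_max_backoff_values`, and the toy values (including the pre-filled `m` fields) are hypothetical and chosen only to exercise both branches of the inner test.

```python
def add_max_backoff_values(table, vocab):
    """Add a 'w' field to every entry; assumes 'p', 'b' and 'm' are present."""
    # Descending n-gram order: xAz, when listed, already has its 'w' value.
    for ngram in sorted(table, key=len, reverse=True):
        context = ngram[:-1]  # A in the n-gram Az
        entry = table[ngram]
        w = entry["p"]
        for x in vocab:
            x_context = (x,) + context  # xA
            if x_context not in table:
                continue
            x_ngram = (x,) + ngram  # xAz
            if x_ngram in table:
                w = max(w, table[x_ngram]["w"])
            else:
                xa = table[x_context]
                w = max(w, entry["p"] * xa["b"] * xa["m"])
        entry["w"] = w
    return table


# Toy table with made-up values; the 'm' fields are assumed precomputed.
toy_w = {
    ("a",): {"p": 0.4, "b": 0.5, "m": 1.0},
    ("b",): {"p": 0.6, "b": 1.0, "m": 2.0},
    ("a", "b"): {"p": 0.7, "b": 2.0, "m": 1.0},
    ("b", "b"): {"p": 0.2, "b": 0.5, "m": 1.0},
}
add_max_backoff_values(toy_w, vocab=["a", "b"])
print({" ".join(k): round(v["w"], 3) for k, v in toy_w.items()})
```

Sorting by length is enough here because each entry only reads `w` from entries one token longer than itself.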
5. The non-transitory storage medium as set forth in claim 1 wherein the method further comprises: computing the backoff value p(z|A) for an n-gram Az of the natural language that is not in the ARPA table by applying the recursive equation:
\[
p(A,z) =
\begin{cases}
p(\mathrm{tail}(A),z) & \text{if } Az \notin T_{mA} \text{ and } A \notin T_{mA} \\
p(\mathrm{tail}(A),z) \times A.b & \text{if } Az \notin T_{mA} \text{ and } A \in T_{mA} \\
Az.p & \text{if } Az \in T_{mA}
\end{cases}
\]
where tail(A) denotes the string A with its first element removed and the values A.b and Az.p are obtained from the .b and .p columns of the max-ARPA table, respectively.
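The backoff recursion of claim 5 can be illustrated with a short recursive function. This is a sketch under assumptions: the table layout and the name `backoff_prob` are hypothetical, values are linear-scale, and raising `KeyError` for a token with no unigram entry is a choice made here because the claim does not say how that case is handled.

```python
def backoff_prob(table, context, z):
    """Return p(z | context) by backing off through shorter contexts."""
    ngram = context + (z,)
    if ngram in table:  # Az is listed: use Az.p directly
        return table[ngram]["p"]
    if not context:  # assumption: an unlisted unigram is an error
        raise KeyError(f"token {z!r} has no unigram entry")
    shorter = backoff_prob(table, context[1:], z)  # p(tail(A), z)
    if context in table:  # A is listed: scale by A.b
        return shorter * table[context]["b"]
    return shorter  # A is not listed: no scaling


# Toy table with made-up linear-scale values (hypothetical data).
toy_p = {
    ("a",): {"p": 0.4, "b": 0.5},
    ("b",): {"p": 0.6, "b": 1.0},
    ("a", "b"): {"p": 0.7, "b": 2.0},
    ("b", "b"): {"p": 0.2, "b": 0.5},
}
print(backoff_prob(toy_p, ("a", "b"), "a"))  # backs off twice, scaling each time
print(backoff_prob(toy_p, ("b", "a"), "b"))  # context "b a" is unlisted
```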
6. The non-transitory storage medium of claim 1 wherein the method further comprises: sampling a language model of the natural language represented by the ARPA table by generating an upper bound on the language model and then sequentially refining the upper bound during the sampling process using max-backoff values w(A,z) computed using the operation (4).
7. The non-transitory storage medium of claim 6 wherein the method further comprises: performing statistical machine translation using the sampling of the natural language.
8. The non-transitory storage medium of claim 6 wherein the method further comprises: performing part-of-speech tagging using the sampling of the natural language.
9. A method operating on an ARPA table for a modeled natural language in which each entry of the ARPA table includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled natural language, and an associated backoff weight value Az.b for the context A, the method comprising: computing a max-ARPA table from the ARPA table by using an electronic data processing device to perform the operations of: sorting the entries of the ARPA table in descending n-gram order to generate a sorted ARPA table; (1) after the sorting, computing and adding for each entry from top-to-bottom of the sorted ARPA table an associated maximum backoff weight product value Az.m; and (2) after performing operation (1), computing and adding for each entry from top-to-bottom of the sorted ARPA table an associated max-backoff value Az.w where Az.w=max_h p(z|hA) is the maximum backoff value for any head h preceding the context A of the n-gram Az; wherein each entry of the max-ARPA table includes an n-gram Az and its associated backoff value Az.p, backoff weight value Az.b, maximum backoff weight product value Az.m, and max-backoff value Az.w; computing by said electronic data processing device a max-backoff value w(A,z) for an n-gram Az that is not in the ARPA table by applying the recursive equation:
\[
w(A,z) =
\begin{cases}
p(A,z) & \text{if } Az \notin T_{mA} \text{ and } A \notin T_{mA} \\
p(A,z) \times A.m & \text{if } Az \notin T_{mA} \text{ and } A \in T_{mA} \\
Az.w & \text{if } Az \in T_{mA}
\end{cases}
\]
where the values A.m and Az.w are obtained from the .m and .w columns of the max-ARPA table T_mA, respectively, and p(A,z) is computed from the .p and .b columns of the max-ARPA table, and the computing of the max-backoff value w(A,z) is performed by the electronic data processing device; and sampling a language model of the natural language represented by the ARPA table by generating an upper bound on the language model and then sequentially refining the upper bound during the sampling process using the computed max-backoff value w(A,z) wherein the generating and sequential refining are performed by the electronic data processing device.
10. The method of claim 9 wherein performing operation (1) comprises performing the algorithm:
    For A in T_sorted
        A.m ← 1
        For x in V s.t. xA in T_sorted:
            A.m ← max(A.m, xA.b × xA.m)
where T_sorted is the sorted ARPA table, the algorithm is performed from top-to-bottom of the sorted ARPA table T_sorted, and V is the vocabulary of the modeled natural language, and the maximum backoff weight product value for entry A is assigned the value A.m computed by performing the algorithm.
11. The method of claim 10 wherein performing operation (2) comprises performing the algorithm:
    For Az in T_sorted
        Az.w ← Az.p
        For x in V s.t. xA in T_sorted:
            If xAz in T_sorted:
                Az.w ← max(Az.w, xAz.w)
            Else
                Az.w ← max(Az.w, Az.p × xA.b × xA.m)
from top-to-bottom of the sorted ARPA table T_sorted.
12. The method of claim 9 further comprising: computing the backoff value p(z|A) for an n-gram Az that is not in the ARPA table by applying the recursive equation:
\[
p(A,z) =
\begin{cases}
p(\mathrm{tail}(A),z) & \text{if } Az \notin T_{mA} \text{ and } A \notin T_{mA} \\
p(\mathrm{tail}(A),z) \times A.b & \text{if } Az \notin T_{mA} \text{ and } A \in T_{mA} \\
Az.p & \text{if } Az \in T_{mA}
\end{cases}
\]
where tail(A) denotes the string A with its first element removed and the values A.b and Az.p are obtained from the .b and .p columns of the max-ARPA table, respectively, and wherein the computing of the backoff value p(z|A) is performed by the electronic data processing device.
13. The method of claim 9 further comprising: performing statistical machine translation or part-of-speech tagging using the sampling of the natural language.
14. An apparatus comprising: a computer programmed to perform a method operating on an ARPA table for a modeled natural language in which each entry of the ARPA table includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled natural language, and an associated backoff weight value Az.b for the context A, the method comprising: (1) computing and adding by said computer for each entry of the ARPA table in descending n-gram order an associated maximum backoff weight product value Az.m; (2) after performing operation (1), computing and adding by said computer for each entry of the ARPA table in descending n-gram order an associated max-backoff value Az.w=max_h p(z|hA) which is the maximum backoff value for any head h preceding the context A of the n-gram Az; and (3) extending by said computer the ARPA table by adding a column storing the associated maximum backoff weight product values Az.m and a column storing the associated max-backoff values Az.w.
15. The apparatus of claim 14 wherein the method further comprises: (0) sorting the entries of the ARPA table in descending n-gram order to generate a sorted ARPA table; wherein each of operation (1) and operation (2) is performed on the sorted ARPA table from top-to-bottom.
16. The apparatus of claim 15 wherein operation (1) comprises performing the algorithm:
    For A in T_sorted
        A.m ← 1
        For x in V s.t. xA in T_sorted:
            A.m ← max(A.m, xA.b × xA.m)
wherein the algorithm is performed from top-to-bottom on the sorted ARPA table T_sorted, V is the vocabulary of the modeled natural language, and the maximum backoff weight product value for entry Az is assigned the value Az.m computed by performing the algorithm.
17. The apparatus of claim 16 wherein operation (2) comprises performing the algorithm:
    For Az in T_sorted
        Az.w ← Az.p
        For x in V s.t. xA in T_sorted:
            If xAz in T_sorted:
                Az.w ← max(Az.w, xAz.w)
            Else
                Az.w ← max(Az.w, Az.p × xA.b × xA.m)
from top-to-bottom on the sorted ARPA table T_sorted.
18. The apparatus of claim 14 wherein the method performed by the computer further comprises: (4) computing a max-backoff value w(A,z) for an n-gram Az that is not in the ARPA table by applying the recursive equation:
\[
w(A,z) =
\begin{cases}
p(A,z) \times A.m & \text{if } Az \notin T \\
Az.w & \text{if } Az \in T
\end{cases}
\]
where T denotes the extended ARPA table, Az.w is obtained from the Az.w column added to the ARPA table, A.m is obtained from the .m column added to the ARPA table if listed and is assigned a default value otherwise, and p(A,z) is computed from the .p and .b columns of the ARPA table.
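The two-case lookup of claim 18 can be sketched as below. The claim does not state the default value for an unlisted A.m; this sketch assumes 1.0, which makes the lookup agree with the three-case form recited in claims 1 and 9. The table layout, the function names `backoff_prob` and `max_backoff`, the `default_m` parameter, and the toy values are hypothetical.

```python
def backoff_prob(table, context, z):
    """Helper: p(z | context) from the .p and .b columns (assumed layout)."""
    ngram = context + (z,)
    if ngram in table:
        return table[ngram]["p"]
    if not context:  # assumption: an unlisted unigram is an error
        raise KeyError(f"token {z!r} has no unigram entry")
    shorter = backoff_prob(table, context[1:], z)
    return shorter * table[context]["b"] if context in table else shorter


def max_backoff(table, context, z, default_m=1.0):
    """Return w(context, z) using the precomputed .m and .w columns."""
    ngram = context + (z,)
    if ngram in table:  # Az is listed: read Az.w
        return table[ngram]["w"]
    # Az is not listed: scale p(A, z) by A.m, or by the assumed default.
    m = table[context]["m"] if context in table else default_m
    return backoff_prob(table, context, z) * m


# Toy extended table with made-up values for all four columns.
toy_t = {
    ("a",): {"p": 0.4, "b": 0.5, "m": 1.0, "w": 0.8},
    ("b",): {"p": 0.6, "b": 1.0, "m": 2.0, "w": 0.7},
    ("a", "b"): {"p": 0.7, "b": 2.0, "m": 1.0, "w": 0.7},
    ("b", "b"): {"p": 0.2, "b": 0.5, "m": 1.0, "w": 0.4},
}
print(max_backoff(toy_t, ("a",), "b"))  # listed entry: reads the .w column
print(max_backoff(toy_t, ("b",), "a"))  # unlisted, context listed: uses b.m
print(max_backoff(toy_t, ("b", "a"), "b"))  # unlisted context: default m
```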
19. The apparatus of claim 18 wherein the method performed by the computer further comprises: sampling a language model of the natural language represented by the ARPA table by generating an upper bound on the language model and then sequentially refining the upper bound during the sampling process using max-backoff values w(A,z) computed using the operation (4).
20. The apparatus of claim 19 wherein the method performed by the computer further comprises: performing statistical machine translation or part-of-speech tagging using the sampling of the natural language.
July 26, 2016