US-6961699

Automated transcription system and method using two speech converting instances and computer-assisted correction

Published: November 1, 2005
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

A system for automating transcription services for one or more users. This system receives a voice dictation file from a current user, which is automatically converted into a first written text based on a set of conversion variables. The same voice dictation file is automatically converted into a second written text based on a second set of conversion variables. The first and second sets of conversion variables have at least one difference, such as different speech recognition programs, different vocabularies, and the like. The system further includes a program for manually editing a copy of the first and second written text to create a verbatim text of the voice dictation file. This verbatim text can be delivered to the current user as transcribed text. The verbatim text can also be fed back into each speech recognition instance to improve the accuracy of each instance with respect to the human voice in the file.
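The requirement that the two sets of conversion variables have "at least one difference" can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `ConversionVariables` fields, the engine and language-model labels, and the numeric values are assumptions chosen only to mirror the kinds of differences the abstract and claims mention (recognition program, language model, audio pre-processing).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConversionVariables:
    """Hypothetical settings for one speech converting instance."""
    engine: str           # which speech recognition program is used
    language_model: str   # e.g. a generalized or a specialized model
    sample_rate_hz: int   # audio pre-processing: sampling rate
    word_size_bits: int   # audio pre-processing: digital word size


def differing_variables(first, second):
    """Return the names of the variables whose values differ between two sets."""
    return [f.name for f in fields(first)
            if getattr(first, f.name) != getattr(second, f.name)]


# Illustrative values only; none of these come from the patent.
first_set = ConversionVariables("engine_a", "general", 22050, 16)
second_set = ConversionVariables("engine_a", "medical", 11025, 16)

diff = differing_variables(first_set, second_set)
if not diff:
    # Two identical instances would be expected to yield the same text,
    # leaving nothing useful to compare.
    raise ValueError("the two sets of conversion variables must differ")
print(diff)  # ['language_model', 'sample_rate_hz']
```

The check matters because the later comparison step relies on the two written texts disagreeing where recognition is uncertain.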

Patent Claims
22 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system for substantially automating transcription services for one or more voice users, comprising
means for receiving a voice dictation file from a current user, said current user being one of said one or more voice users;
first means for automatically converting said voice dictation file into a first written text, said first automatic conversion means having a first set of conversion variables;
second means for automatically converting said voice dictation file into a second written text, said second automatic converting means having a second set of conversion variables, said first and second sets of conversion variables having at least one difference; and
means for manually editing a copy of said first and second written texts to create a verbatim text of said voice dictation file;
wherein said first written text is at least temporarily synchronized to said voice dictation file, and said manual editing means comprises:
means for sequentially comparing a copy of said first written text with said second written text resulting in a sequential list of unmatched words culled from said copy of said first written text, said sequential list having a beginning, an end and a current unmatched word, said current unmatched word being successively advanced from said beginning to said end;
means for incrementally searching for said current unmatched word contemporaneously within a first buffer associated with said first automatic conversion means containing said first written text and a second buffer associated with said sequential list; and
means for correcting said current unmatched word in said second buffer, said correcting means including means for displaying said current unmatched word in a manner substantially visually isolated from other text in said copy of said first written text and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word.

2. The invention according to claim 1 wherein said difference between said first and second sets of conversion variables comprises at least one setting associated with said preexisting speech recognition program.

3. The invention according to claim 2 wherein said editing means further includes means for alternatively viewing said current unmatched word in context within said copy of said first written text.

4. The invention according to claim 2 wherein said first and second automatic speech converting means each comprises a preexisting speech recognition program intended for human interactive use, each of said first and second automatic speech converting means includes means for automating responses to a series of interactive inquiries from said preexisting speech recognition program.

5. The invention according to claim 4 wherein said difference between said first and second sets of conversion variables is said preexisting speech recognition program comprising said first and second automatic speech converting means.

6. The invention according to claim 5 wherein said automatic speech converting means is selected from the group consisting essentially of Dragon Systems' Naturally Speaking, IBM's Via Voice and Philips Corporation's Magic Speech.

7. The invention according to claim 2 wherein said difference between said first and second sets of conversion variables comprises a language model used in association with said preexisting speech recognition program.

8. The invention according to claim 7 wherein a generalized language model is used in said first set of conversion variables and a specialized language model is used in said second set of conversion variables.

9. The invention according to claim 1 wherein said difference between said first and second sets of conversion variables comprises means for pre-processing audio prior to its input to said first automatic conversion means.

10. The invention according to claim 8 wherein said difference between said first and second sets of conversion variables comprises means for pre-processing audio prior to its input to said second automatic conversion means, wherein said first and second pre-processing variable is different.

11. The invention according to claim 10 wherein said pre-processing variables is selected from the group consisting essentially of digital word size, sampling rate, and removing particular harmonic ranges.

12. The invention according to claim 1 wherein said difference between said first and second sets of conversion variables comprises a language model used in association with said preexisting speech recognition program.

13. The invention according to claim 12 wherein a generalized language model is used in said first set of conversion variables and a specialized language model is used in said second set of conversion variables.

14. The invention according to claim 1 wherein said difference between said first and second sets of conversion variables comprises means for pre-processing audio prior to its input to said first automatic conversion means.

15. The invention according to claim 14 wherein said difference between said first and second sets of conversion variables comprises means for pre-processing audio prior to its input to said second automatic conversion means, wherein said first and second pre-processing variable is different.

16. The invention according to claim 1 further including means for training said automatic speech converting means to achieve higher accuracy with said voice dictation file of current user.

17. The invention according to claim 16 wherein said training means comprises a preexisting training portion of a preexisting speech recognition program intended for human interactive use, said training means includes means for automating responses to a series of interactive inquiries from said preexisting training portion of said preexisting speech recognition program.

18. A method for automating transcription services for one or more voice users in a system including at least one speech recognition program, comprising
receiving a voice dictation file from a current voice user;
automatically creating a first written text from the voice dictation file with a speech recognition program using a first set of conversion variables;
automatically creating a second written text from the voice dictation file with a speech recognition program using a second set of conversion variables;
manually establishing a verbatim file through comparison of the first and second written texts; and
returning the verbatim file to the current user,
wherein said step of manually establishing a verbatim file includes the sub-steps of:
sequentially comparing a copy of the first written text with the second written text resulting in a sequential list of unmatched words culled from the copy of the first written text, the sequential list having a beginning, an end and a current unmatched word, the current unmatched word being successively advanced from the beginning to the end;
incrementally searching for the current unmatched word contemporaneously within a first buffer associated with the at least one speech recognition program containing the first written text and a second buffer associated with the sequential list; and
displaying the current unmatched word in a manner substantially visually isolated from other text in the copy of the first written text and playing a portion of the synchronized voice dictation recording from the first buffer associated with the current unmatched word; and
correcting the current unmatched word to be a verbatim representation of the portion of the synchronized voice dictation recording.

19. The invention according to claim 18 further comprising: selecting the first set of conversion variables from available preexisting speech recognition programs; and differently selecting the second set of conversion variables from available preexisting speech recognition programs.

20. The invention according to claim 18 further comprising: selecting the first set of conversion variables from available language models; and differently selecting the second set of conversion variables from available language models.

21. The invention according to claim 18 further comprising preprocessing the voice dictation file before automatically creating a first written text, the preprocessing forming at least a part of the first set of conversion variables.

22. The invention according to claim 21 further comprising preprocessing the voice dictation file differently than the first set of preprocessing conversion variables before automatically creating a second written text, the preprocessing forming at least a part of the second set of conversion variables.
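
The sequential comparison recited in claims 1 and 18 (comparing the first written text with the second to obtain an ordered list of unmatched words from the first) might be approximated as follows. This is only an illustrative sketch, not the patented implementation: it assumes whitespace tokenization and swaps in Python's standard `difflib.SequenceMatcher` as the alignment technique, and the function name `unmatched_words` and the sample sentences are invented for the example.

```python
from difflib import SequenceMatcher


def unmatched_words(first_text, second_text):
    """Return (position, word) pairs from first_text that second_text does not match.

    Assumption: words are split on whitespace. Words that appear only in
    second_text are ignored, since the list is drawn from the first text.
    """
    first = first_text.split()
    second = second_text.split()
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    unmatched = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" both mark words of the first text that
        # have no counterpart at the aligned place in the second text.
        if tag in ("replace", "delete"):
            unmatched.extend((i, first[i]) for i in range(i1, i2))
    return unmatched


first_text = "the patient has a mild fever today"
second_text = "the patient has a wild fever today"

# Advance through the list from beginning to end, showing each current
# unmatched word on its own, isolated from the surrounding text.
for position, word in unmatched_words(first_text, second_text):
    print(f"word {position}: [{word}]")  # word 4: [mild]
```

In a fuller system the word position would presumably also index into the synchronized audio so that the matching portion of the dictation could be played back during correction; that part is omitted here.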

Patent Metadata

Filing Date

February 18, 2000

Publication Date

November 1, 2005

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Automated transcription system and method using two speech converting instances and computer-assisted correction” (US-6961699). https://patentable.app/patents/US-6961699
