US-20260087235-A1

Custom Display Post Processing in Speech Recognition

Published: March 26, 2026
Assignee: not available in USPTO data we have
Technical Abstract

Solutions for custom display post processing (DPP) in speech recognition (SR) use a customized multi-stage DPP pipeline that transforms a stream of SR tokens from lexical form to display form. A first transformation stage of the DPP pipeline receives the stream of tokens, in turn, by an upstream filter, a base model stage, and a downstream filter, and transforms a first aspect of the stream of tokens (e.g., disfluency, inverse text normalization (ITN), capitalization, etc.) from lexical form into display form. The upstream filter and/or the downstream filter alter the stream of tokens to change the default behavior of the DPP pipeline into custom behavior. Additional transformation stages of the DPP pipeline perform further transforms, allowing for outputting final text in a display format that is customized for a specific user. This permits each user to efficiently leverage a common baseline DPP pipeline to produce a custom output.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1-20. (canceled)

21. A system comprising: a processor; and a computer-readable medium storing instructions that are operative upon execution by the processor to: receive a target format document; transform text of the target format document into a stream of tokens each representing an element of human speech in lexical form; receive, by a multi-stage display post processing (DPP) pipeline, the stream of tokens, wherein the DPP pipeline comprises at least an upstream filter, a first base model, a second base model, and a downstream filter; transform, by the first base model, a first aspect of the stream of tokens from the lexical form to a display form; transform, by the second base model, a second aspect of the stream of tokens from the lexical form to a display form; output, by the multi-stage display post processing (DPP) pipeline, a baseline text representing the stream of tokens with the transformed first aspect and the transformed second aspect; determine a difference between baseline text and text of the target format document; based on the determined difference, generate a set of rules for the upstream filter and the downstream filter; provide a user interface (UI) for a user to accept or edit the set of generated rules; freeze a current version of the DPP pipeline; transform, utilizing the current version of the DDP pipeline, an input stream of human speech from a lexical form to a display form; and provide the transformed input stream to the user.

22. The system of claim 21, wherein the instructions that are further operative upon execution by the processor to: perform an explicit punctuation operation on the baseline text before determining the difference between the baseline text and the text of the target format document.

23. The system of claim 21, wherein the instructions are further operative upon execution by the processor to: perform a grammar capitalization operation on the baseline text before determining the difference between the baseline text and the text of the target format document.

24. The system of claim 21, wherein the user enables or disables the upstream filter or the downstream filter or both.

25. The system of claim 21, wherein the instructions are further operative upon execution by the processor to: perform a keyword spotted text removal operation on the baseline text before determining the difference between the baseline text and the text of the target format document.

26. The system of claim 21, wherein the transformed input stream includes a textual transcript of the display form being shown on the UI.

27. The system of claim 21, wherein the instructions are further operative upon execution by the processor to: receive an indication of an error in the transformed input stream; and based on receiving the indication of an error, training the upstream filter or the downstream filter or both, using a trainer.

28. The system of claim 21, wherein the instructions are further operative upon execution by the processor to: alter, by the upstream filter or the downstream filter or both, the stream of tokens before determining the difference between the baseline text and the text of the target format document.

29. A computerized method comprising: receiving a target format document; transforming text of the target format document into a stream of tokens each representing an element of human speech in lexical form; receiving, by a multi-stage display post processing (DPP) pipeline, the stream of tokens, wherein the DPP pipeline comprises at least an upstream filter, a first base model, a second base model, and a downstream filter; transforming, by the first base model, a first aspect of the stream of tokens from the lexical form to a display form; transforming, by the second base model, a second aspect of the stream of tokens from the lexical form to a display form; outputting, by the multi-stage display post processing (DPP) pipeline, a baseline text representing the stream of tokens with the transformed first aspect and the transformed second aspect; determining a difference between baseline text and text of the target format document; based on the determined difference, generating a set of rules for the upstream filter and the downstream filter; providing a user interface (UI) for a user to accept or edit the set of generated rules; freezing a current version of the DPP pipeline; transforming, utilizing the current version of the DPP pipeline, an input stream of human speech from a lexical form to a display form; and providing the transformed input stream to the user.

30. The computerized method of claim 29, further comprising: performing an explicit punctuation operation on the baseline text before determining the difference between the baseline text and the text of the target format document.

31. The computerized method of claim 29, further comprising: performing a grammar capitalization operation on the baseline text before determining the difference between the baseline text and the text of the target format document.

32. The computerized method of claim 29, wherein the user enables or disables the upstream filter or the downstream filter or both.

33. The computerized method of claim 29, further comprising: performing a keyword spotted text removal operation on the baseline text before determining the difference between the baseline text and the text of target format document.

34. The computerized method of claim 29, wherein the transformed input stream includes a textual transcript of the display form being shown on the UI.

35. The computerized method of claim 29, further comprising: receiving indication of an error in the transformed input stream; and based on receiving the indication of an error, training the upstream filter or the downstream filter or both, using a trainer.

36. The computerized method of claim 29, further comprising: altering, by the upstream filter or the downstream filter or both, the stream of tokens before determining a first difference between the baseline text and the text of the target format document.

37. One or more computer storage media having computer-executable instructions stored thereon, which, on execution by a computer, cause the computer to perform operations comprising: receiving a target format document; transforming text of the target format document into a stream of tokens each representing an element of human speech in lexical form; receiving, by a multi-stage display post processing (DPP) pipeline, the stream of tokens, wherein the DPP pipeline comprises at least an upstream filter, a first base model, a second base model, and a downstream filter; transforming, by the first base model, a first aspect of the stream of tokens from the lexical form to a display form; transforming, by the second base model, a second aspect of the stream of tokens from the lexical form to a display form; outputting, by the multi-stage display post processing (DPP) pipeline, a baseline text representing the stream of tokens with the transformed first aspect and the transformed second aspect; determining a difference between baseline text and text of the target format document; based on the determined difference, generating a set of rules for the upstream filter and the downstream filter; providing a user interface (UI) for a user to accept or edit the set of generated rules; freezing a current version of the DPP pipeline; transforming, utilizing the current version of the DPP pipeline, an input stream of human speech from a lexical form to a display form; and providing the transformed input stream to the user.

38. The one or more computer storage media of claim 37, wherein the transformed input stream includes a textual transcript of the display form being shown on the UI.

39. The one or more computer storage media of claim 37, wherein the user enables or disables the upstream filter or the downstream filter or both.

40. The one or more computer storage media of claim 37, wherein the operations further comprise: altering, by the upstream filter or the downstream filter or both, the stream of tokens before determining the difference between the baseline text and the text of the target format document.

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a continuation application of and claims priority to U.S. patent application Ser. No. 18/764,129, entitled “CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION,” filed on Jul. 3, 2024, which is a continuation application of and claims priority to U.S. patent application Ser. No. 17/815,211 (now U.S. Pat. No. 12,061,861), entitled “CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION,” filed on Jul. 26, 2022, which claims priority to PCT Application No. PCT/CN22/90154, filed on Apr. 29, 2022, the disclosures of which are incorporated herein by reference in their entireties.

Speech services typically use a two-phase approach: speech recognition and display post processing (DPP). Speech recognition (SR) outputs the recognized speech in lexical form and DPP transforms the lexical form input to display form (e.g., natural language form) to improve readability. For example, the lexical language form “january one nineteen eighty” (as may be output by SR) is more readily-understandable by humans when presented as “Jan. 1, 1980” in a displayed transcript.

However, different users may prefer different display form versions, such as dates rendered as “Jan. 1, 1980” versus “January 1, 1980” or “1/1/1980” (or even “01/01/1980”). Other categories of lexical to display form transformation, such as disfluency (e.g., removing “uhh” and “um”), capitalization, and punctuation may also be subject to differing user preferences. A one-size-fits-all DPP will therefore not satisfy all user preferences, and generating a different DPP engine for each potential combination of user preferences is resource inefficient.

The disclosed examples are described in detail below with reference to the accompanying drawing figures listed below. The following summary is provided to illustrate some examples disclosed herein. It is not meant, however, to limit all examples to any particular configuration or sequence of operations.

Example solutions for custom display post processing (DPP) in speech recognition (SR) include: receiving, by a customized multi-stage DPP pipeline, a stream of tokens, each token representing an element of human speech in a lexical form; for a first transformation stage of the DPP pipeline, receiving the stream of tokens, in turn, by a first upstream filter, a first base model stage, and a first downstream filter, and: transforming, by the first base model stage, a first aspect of the stream of tokens from lexical form into display form; and altering, by the first upstream filter and/or the first downstream filter, the stream of tokens; receiving, by a second transformation stage of the DPP pipeline, from the first transformation stage, the stream of tokens; transforming, by the second transformation stage, a second aspect of the stream of tokens from lexical form into display form; and based on at least transforming multiple aspects of the stream of tokens, outputting a final text representing the stream of tokens.

Corresponding reference characters indicate corresponding parts throughout the drawings.

The various examples will be described in detail with reference to the accompanying drawings. Wherever preferable, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made throughout this disclosure relating to specific examples and implementations are provided solely for illustrative purposes but, unless indicated to the contrary, are not meant to limit all examples.

A display post processing (DPP) pipeline typically alters speech recognition (SR) output in stages, such as with stages for the various tasks of: disfluency, inverse text normalization (ITN), capitalization, and punctuation. Text normalization is a process of transforming text into a single canonical form that it might not have had before, such as by replacing symbols with certain words, and possibly re-arranging the order and/or deleting punctuation. ITN, in contrast to text normalization, is a process of converting raw spoken output of an SR model into its written form to improve text readability. ITN is used to convert from common oral to common written representations, when they differ. For example, in the words “five dollars”, a number (“5”) replaces one word and a currency symbol (“$”) replaces the other. The order is then swapped, because “$5” is the common written representation, rather than “5$”.
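The “five dollars” example above can be sketched as a single rule-based rewrite. This is only an illustration of the idea, not the disclosed ITN model: the word table `NUMBER_WORDS` and the function name `itn_currency` are hypothetical, and a real ITN stage would cover many more spoken forms.

```python
import re

# Hypothetical word-to-digit table; a real ITN model handles far more forms.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# Matches "<number word> dollar" or "<number word> dollars" as whole words.
CURRENCY = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\s+dollars?\b")

def itn_currency(lexical: str) -> str:
    """Replace the number word with digits and move the currency symbol in front."""
    return CURRENCY.sub(lambda m: f"${NUMBER_WORDS[m.group(1)]}", lexical)

print(itn_currency("it costs five dollars"))  # it costs $5
```

The substitution does both steps described above at once: each word is replaced by its symbol, and the symbol order is swapped so the currency sign leads.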

A normalization category is a context that has a set of common text normalization rules. In the examples just provided, text normalization for currency has a different set of rules than text normalization for dates, and thus are separate normalization categories. Example normalization categories include address, location, time, date, currency, decimal, fraction, email, internet address, and math.

Example solutions for custom display post processing in SR use a customized multi-stage DPP pipeline that transforms a stream of SR tokens from lexical form to display form. A first transformation stage of the DPP pipeline receives the stream of tokens, in turn, by an upstream filter, a base model stage, and a downstream filter, and transforms a first aspect, portion, element, or the like, of the stream of tokens (e.g., disfluency, inverse text normalization (ITN), capitalization, etc.) from lexical form into display form. The upstream filter and/or the downstream filter alter the stream of tokens to change the default behavior of the DPP pipeline into custom behavior. Additional transformation stages of the DPP pipeline perform further transforms, allowing for outputting final text in a display format that is customized for a specific user. This permits each user to efficiently leverage a common baseline DPP pipeline to produce a custom output. The tokens described herein are SR tokens, which are a defined set of digital symbols that are each mapped to a recognized spoken word. In some examples, the tokens may be textual representations of the spoken words.

Aspects of the disclosure improve the operations of computing devices, efficiently tailoring output of complex SR and DPP processes at least by altering, by an upstream filter (upstream from a base DPP model stage) and/or a downstream filter, a stream of tokens representing an element of human speech in a lexical form. By using customized upstream/downstream filter pairs straddling one or more base DPP model stages, a single baseline DPP pipeline can be efficiently leveraged in a technological sense (e.g., efficient use of computing resources) to provide customized DPP for SR tasks, meeting unique user preferences. The upstream/downstream filter pairs straddling each DPP model stage permit altering DPP pipeline behavior, without changing the DPP model stages, by altering input to a stage and/or output from a stage. For example, certain tokens (e.g., words) may be tagged by an upstream filter prior to entering a particular DPP model stage, to instruct that stage to preserve a token, and the tag may be removed later by the downstream filter (which may further change the tagged token). In some examples, the filters are rules-based.

Users are able to customize filters (e.g., by generating scenario-specific rules) according to their own preferences (e.g., as a form of self-service), and link their customized filters to a baseline DPP pipeline to override default behaviors, thereby producing their own customized DPP pipeline. The users are able to accomplish this rapidly, on their own schedules, with reduced computing resource usage, precluding the need to wait for a developer of the DPP pipeline to use additional computing resources to create a custom version. With each user having their own customizations, which may be withheld from dissemination outside that user's account (e.g., a particular organization's resources), user privacy is maintained and network bandwidth use is reduced (thereby improving the functioning of the underlying computing device). In this manner, a company may have its own customizations that are not shared with competitors. As an example, a company may use a certain format for internal project reference identifiers, such as “8B-EV-3”, that the company prefers to keep as proprietary information. A baseline DPP pipeline may transform the spoken words “eight bee ee vee three” to “8 B E V 3”, which is less helpful. Thus, the custom DPP pipeline has utility to the company (the user).

Some examples provide for rapid, simplified filter generation (e.g., streamlining use of computing resources) in which the user supplies a document, that has examples of the preferred display format, to use as a target. A disclosed DPP customization tool is able to generate the filter rules in order to match the customized DPP pipeline's output to the text forms in the target document. That is, the filters learn proper display form, given user-provided in-domain unique content that is specific to that user. Users are also able to use default behavior of the baseline DPP pipeline for selected stages, where desired. Examples provide for a per scenario and/or per class of user customization.

Some examples improve computing stability by permitting a user to specify remaining locked to a particular version of a baseline DPP pipeline (e.g., to preclude the risk that a new version of the baseline DPP pipeline operates differently with the user's customized filters, changing overall behavior). Examples run in multiple computing environments (e.g., cloud, on premises (servers), and on device), and may even span and/or share information among different environments. Customized DPP solutions may also be developed by independent software vendors (ISVs) and solution integrators (SIs).

FIG. 1 illustrates an example arrangement 100 that advantageously provides custom DPP in SR, using a customized multi-stage DPP pipeline 200. A microphone 102 (or microphone array) captures an audio input 104 comprising human speech from a speaker 106. An audio segmenter 108 segments audio input 104 into a plurality of audio segments 110, for example comprising audio segment 111, audio segment 112, audio segment 113, and others.

Plurality of audio segments 110 are provided to an SR component 120. SR component 120 recognizes elements of human speech in audio segments 111-113 and outputs a stream of tokens 130, for example comprising token 131, token 132, token 133, token 134, token 135, token 136, token 137, token 138, and token 139. Each of tokens 131-139 represents an element of human speech in a lexical language form, for example a word. Tokens 131-139 are a defined set of digital symbols that are each mapped to a recognized spoken word. In some examples, tokens 131-139 may be textual representations of the spoken words.

In some examples, such as in cloud deployments and other multi-user environments, a customer identifier 122 links stream of tokens 130 (or audio input 104) to DPP pipeline 200. By linking stream of tokens 130 to DPP pipeline 200, specifically, in multi-user environments, the use of DPP pipeline 200 is limited to only the authorized user (e.g., individual person or a member of an organization) associated with customer identifier 122. In some examples, such as when SR component 120 and DPP pipeline 200 are both deployed within a single-user device, such as a user device 170, linking stream of tokens 130 to DPP pipeline 200 with customer identifier 122 may not be needed.

DPP pipeline 200 transforms lexical form of stream of tokens 130 into display form of a final text 140, and outputs a final text 140 representing stream of tokens 130. Further detail on the composition and operation of an exemplary DPP pipeline 200 is provided in relation to FIG. 2. Final text 140 is provided to a transcription service 150 that outputs a textual transcript 152 for display on a display device 160 (e.g., a video screen or a screen of user device 170). An example sentence “Meet me on 2nd avenue at 4:30 pm” is shown in FIG. 1, in a natural language display form that may be more readable than the lexical form of “meet me on second avenue at four thirty p m”. The process may be implemented by a voice assistant, transcription service, dictation service, or the like.

In some examples, arrangement 100 operates in real-time (or near-real-time) such that final text 140 is output and displayed in a streaming fashion. That is, there is a minimal lag time or latency (e.g., under five seconds) after speaker 106 utters a word into microphone 102 and that word appears within final text 140 on display device 160.

In some examples, everything shown in arrangement 100 between microphone 102 and display device 160, and described thus far, is implemented on user device 170. Some examples of user device 170 are a mobile device, such as a smartphone, a tablet computer, or a notebook computer. In some examples, one or more of audio segmenter 108, SR component 120, and transcription service 150 is located remotely from microphone 102 and/or display device 160, such as in a cloud environment (across a network 930 of FIG. 9) or in an on-premises server.

A DPP pipeline customization tool 400, which is described in further detail in relation to FIG. 4, is used to customize filters of DPP pipeline 200 that are applied to a baseline DPP pipeline 206. In some examples, DPP pipeline 200 is thus a combination of customized filters (for a specific user) and a baseline DPP pipeline 206 that is common to other users.

FIG. 2 illustrates an exemplary DPP pipeline 200. DPP pipeline 200 is a customized multi-stage DPP pipeline that transforms a lexical form 202 of stream of tokens 130 into display form 204 of final text 140. In the illustrated example, DPP pipeline 200 has a global pre-rewrite stage 270, a preserve phrase tagger 280, a transformation stage 210, a preserve phrase tagger 282, a transformation stage 220, a preserve phrase tagger 284, a transformation stage 230, a preserve phrase tagger 286, a transformation stage 240, a preserve phrase tagger 288, a transformation stage 250, an explicit punctuation stage 260, and a global post-rewrite stage 272.

In some examples, transformation stage 210 performs disfluency; transformation stage 220 performs ITN; transformation stage 230 performs reformulation; transformation stage 240 performs capitalization; transformation stage 250 masks or removes objectionable words, such as profanity; and explicit punctuation stage 260 adds in punctuation, such as by replacing words that state a punctuation with the punctuation mark itself (e.g., replacing “comma” with an actual comma in final text 140).

Turning briefly to FIG. 3, an example is provided for each of disfluency, inverse text normalization (ITN), reformulation, capitalization, profanity masking, and explicit punctuation. In FIG. 3, lexical form 202 “well well g - - - - - n too day is may twenty seven period” is provided to a disfluency stage 310 that outputs “well g - - - - - n too day is may twenty seven period”. Disfluencies are interruptions in the regular flow of speech, such as using uh and um, pausing silently, repeating words, or interrupting oneself to correct something said previously.

The output of disfluency stage 310 is fed into an ITN stage 320 that outputs “well g - - - - - n too day is 5/27 period”, according to a user's preferred date representation. The output of ITN stage 320 is fed into a reformulation stage 330 that outputs “well g - - - - - n today is 5/27 period”, reformulating “too day” into a context-correct “today”. The output of reformulation stage 330 is fed into a capitalization stage 340 that outputs “Well g - - - - - n today is 5/27 period”, by capitalizing the recognized start of a sentence. The output of capitalization stage 340 is fed into a profanity stage 350 that outputs “Well *** today is 5/27 period”, masking the profane word “g - - - - - n” with asterisks.

The output of profanity stage 350 is fed into explicit punctuation stage 260 that outputs “Well *** today is 5/27.”, replacing the word “period” with the actual punctuation mark. Explicit punctuation relies on a spoken word for a punctuation mark, and is an optional operation of DPP pipeline 200. Some examples of DPP pipeline 200 use implicit punctuation, in which punctuation marks are inferred from context and pauses between spoken words.
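A minimal sketch of the explicit punctuation step just described follows. The mark table `SPOKEN_MARKS` and the function name `explicit_punctuation` are assumptions made for illustration, and only single-word marks are handled.

```python
# Hypothetical table of spoken punctuation words; only single-word marks here.
SPOKEN_MARKS = {"period": ".", "comma": ","}

def explicit_punctuation(text: str) -> str:
    """Attach the mark to the preceding word whenever a spoken mark is heard."""
    out = []
    for word in text.split():
        mark = SPOKEN_MARKS.get(word.lower())
        if mark and out:
            out[-1] += mark   # glue the mark onto the previous word
        else:
            out.append(word)  # ordinary word, or a mark with nothing before it
    return " ".join(out)

print(explicit_punctuation("Well *** today is 5/27 period"))  # Well *** today is 5/27.
```

A rule this simple would also rewrite “period” when it is meant as an ordinary word, which is one reason a preserve tag or an upstream filter may be needed in front of such a stage.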

In some examples, stages 310-350 of FIG. 3 correspond to transformation stages 210-250, respectively. However, in some examples, a different order or number of transformation stages is used in DPP pipeline 200.

Returning to FIG. 2, a baseline DPP pipeline 206 includes a base model stage 214 (e.g., disfluency or another), a base model stage 224 (e.g., ITN or another), and a base model stage 254 (e.g., profanity or another). In some examples, baseline DPP pipeline 206 also includes a base model stage 234 (e.g., reformulation or another) and a base model stage 244 (e.g., capitalization or another), although in some examples, base model stages 234 and 244 are added with customization into DPP pipeline 200 (thus becoming customized).

In some examples, disfluency, reformulation, capitalization, profanity, and punctuation use rule-based filter models, and ITN uses network models for filters. An ITN filter may comprise a neural network (NN), such as a transformer NN. A transformer NN is configured to solve sequence-to-sequence tasks while handling long-range dependencies (e.g., relatively distant prior inputs), and is thus suitable for classifying long strings of SR tokens. For example, phone numbers in the USA are ten numerical digits, and so span at least ten spoken words. Transformer NNs typically rely on self-attention to compute representations of input and output without using sequence-aligned recurrent NNs (RNNs) or convolution.

Rule-based models use an upstream filter and a downstream filter, which may be viewed as a “rule escaper” and a “rule add-on” respectively, and which may be independently actuated (or omitted). An upstream filter changes the input to a base model stage, so that certain rules are disabled. This may be accomplished by tagging the related phrases in the input lexical form. A downstream filter may be similar to its corresponding base model, although it applies custom rules and removes tags inserted by the upstream filter. For example, specific profanity and capitalization rules may be supplied by a user, and disfluency may be turned on or off. In some examples, the explicit punctuation stage uses a merged model that merges base rules, user-provided add-on rules, and user-provided removal rules, although in other examples, the explicit punctuation stage also uses the upstream and downstream filter arrangement.
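The “rule escaper” and “rule add-on” roles can be sketched with a toy disfluency stage. Everything named here (`Token`, the `preserve` flag, `FILLERS`, `run_stage`) is a hypothetical illustration rather than the disclosed implementation; the point is only that the base stage is left unchanged while the filters alter what it sees and what it emits.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Token:
    text: str
    preserve: bool = False  # hypothetical tag set by the upstream filter

FILLERS = {"um", "uh", "uhh"}

def upstream_filter(tokens, keep_words):
    """Rule escaper: tag tokens the base stage should leave untouched."""
    return [replace(t, preserve=True) if t.text in keep_words else t for t in tokens]

def base_disfluency(tokens):
    """Default behavior: drop fillers and immediate repeats, unless tagged."""
    out = []
    for t in tokens:
        if t.preserve:
            out.append(t)
        elif t.text in FILLERS:
            continue
        elif out and out[-1].text == t.text:
            continue
        else:
            out.append(t)
    return out

def downstream_filter(tokens):
    """Rule add-on: here it only removes the tags inserted upstream."""
    return [replace(t, preserve=False) for t in tokens]

def run_stage(words, keep_words=frozenset()):
    tokens = [Token(w) for w in words.split()]
    tokens = downstream_filter(base_disfluency(upstream_filter(tokens, keep_words)))
    return " ".join(t.text for t in tokens)

print(run_stage("um well well bye bye"))           # well bye
print(run_stage("um well well bye bye", {"bye"}))  # well bye bye
```

In the second call the user has asked to keep “bye”, so the repeated word survives even though the base rule would normally collapse it.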

For ITN, the upstream filter may change the input to the base model stage in support of error correction (e.g., to ensure the lexical text is processed properly), disable certain behavior by tagging select phrases in the input lexical text, and extend functionality by directly applying ITN rules provided by a user. The downstream filter may re-format the output of the base model stage and remove tags applied by the upstream ITN filter. This enables users to leverage the common rules provided in the base model stage and add on their own domain-specific rules. For example, the ITN base model stage provides support for transcribing numbers from various spoken forms, so the ITN downstream filter may easily re-format the numbers into a preferred format without separately implementing number transcription related ITN again (e.g., “1/1/1980” to “01/01/1980”).
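The date re-formatting example can be sketched as a downstream filter that only touches text the base model stage has already converted. The pattern and the function name `pad_dates` are assumptions for illustration.

```python
import re

# Matches dates such as 1/1/1980 that the base ITN stage is assumed to emit.
DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def pad_dates(display_text: str) -> str:
    """Re-format M/D/YYYY into zero-padded MM/DD/YYYY."""
    return DATE.sub(
        lambda m: f"{int(m.group(1)):02d}/{int(m.group(2)):02d}/{m.group(3)}",
        display_text,
    )

print(pad_dates("born on 1/1/1980"))  # born on 01/01/1980
```

Because the filter works on the base stage's output, it does not need to know anything about spoken number forms such as “january one nineteen eighty”.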

DPP pipeline 200 has a global pre-rewrite stage 270 that may change the input that is then fed into the remainder of DPP pipeline 200. For example, global pre-rewrite stage 270 may perform key word spotting (KWS) text removal, rule-based recognition error correction, and/or insert tags into stream of tokens 130, such as a tag 271.

DPP pipeline 200 has a preserve phrase tagger 280 that inserts a preserve phrase tag 281. In some examples, DPP pipeline 200 uses a global preserve phrase tagger that inserts tags to prevent certain phrases from being processed (changed) by any stage of DPP pipeline 200. In some examples, preserve phrase taggers are unique to the immediately-following transformation stage (e.g., preserve phrase tagger 280 is unique to the immediately-following transformation stage 210), inserting tags only relevant to the immediately-following transformation stage. In some examples, the preserve phrase function is provided in each transformation stage by the upstream filter of that transformation stage. At the end of DPP pipeline 200, tagged phrases are restored to their original wording. For example, certain words may be preserved from any of disfluency, ITN, profanity masking, capitalization, and reformulation.
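One possible way to realize a global preserve phrase tagger is to swap each preserved phrase for an opaque placeholder and restore it at the end of the pipeline. This is a sketch under that assumption; the placeholder format `<<P0>>`, the helper names, and the one-rule `toy_itn_stage` are all hypothetical.

```python
import re

def tag_phrases(text, phrases):
    """Swap each preserved phrase for a placeholder; return text and a restore map."""
    restore = {}
    for i, phrase in enumerate(phrases):
        placeholder = f"<<P{i}>>"
        if phrase in text:
            text = text.replace(phrase, placeholder)
            restore[placeholder] = phrase
    return text, restore

def toy_itn_stage(text):
    """Stand-in for a transformation stage: rewrites the word 'three' as '3'."""
    return re.sub(r"\bthree\b", "3", text)

def restore_phrases(text, restore):
    """Put the original wording back at the end of the pipeline."""
    for placeholder, phrase in restore.items():
        text = text.replace(placeholder, phrase)
    return text

def run_with_preserved(text, phrases):
    tagged, restore = tag_phrases(text, phrases)
    return restore_phrases(toy_itn_stage(tagged), restore)

print(run_with_preserved("the three musketeers has three parts", ["three musketeers"]))
# the three musketeers has 3 parts
```

The stage never sees the preserved phrase, so it cannot change it, while the unprotected occurrence of “three” is still converted.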

Transformation stage 210 has an upstream filter 212, base model stage 214, and a downstream filter 216. As indicated, the upstream filter inserts a tag 211, which will be removed by downstream filter 216. A preserve phrase tagger 282 may be next, if DPP pipeline does not use a global preserve phrase tagger or perform preserve phrase tagging within upstream filters. Transformation stage 220 has an upstream filter 222, base model stage 224, and a downstream filter 226. A preserve phrase tagger 284 may be next.

Transformation stage 230 has an upstream filter 232, base model stage 234, and a downstream filter 236. A preserve phrase tagger 286 may be next. Transformation stage 240 has an upstream filter 242, base model stage 244, and a downstream filter 246. A preserve phrase tagger 288 may be next. Transformation stage 250 has an upstream filter 252, base model stage 254, and a downstream filter 256. Explicit punctuation stage 260 is next, followed (in the illustrated example) by a global post-rewrite stage 272.

Global post-rewrite stage 272 rewrites the final output of DPP pipeline 200 into final text 140. In some examples, global post-rewrite stage 272 is a model comprising a set of rewrite rules. A rewrite rule is a pair of two phrases in the form (old phrase→new phrase). Global post-rewrite stage 272 replaces any occurrence of “old phrase” with the corresponding “new phrase” in final text 140. In some examples, the matching algorithm is case insensitive and uses a greedy policy, so that if rewrite rules conflict, the one with the longer “old phrase” will prevail. In some examples, global post-rewrite stage 272 also supports grammar capitalization, such as capitalizing the first letter of a sentence, although this capitalization functionality may be disabled by a user. In some examples, global post-rewrite stage 272 also removes any remaining tags (e.g., tag 271 inserted by global pre-rewrite stage 270).
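The rewrite-rule matching described above (case insensitive, longer “old phrase” wins) can be sketched with a single regular expression. The function name `apply_rewrite_rules` and the example rules are hypothetical, and the sketch assumes every old phrase begins and ends with a letter or digit so that word-boundary matching applies.

```python
import re

def apply_rewrite_rules(text, rules):
    """Apply (old phrase -> new phrase) rules, case-insensitively, longest old phrase first."""
    if not rules:
        return text
    lookup = {old.lower(): new for old, new in rules.items()}
    # Listing longer phrases first makes the alternation prefer them on conflict.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

rules = {"new york": "NY", "new york city": "NYC", "8 b e v 3": "8B-EV-3"}
print(apply_rewrite_rules("Project 8 B E V 3 moves from New York City to new york", rules))
# Project 8B-EV-3 moves from NYC to NY
```

Both “new york” and “new york city” match at the same position in the sample sentence; sorting by length resolves the conflict in favor of the longer phrase, as the greedy policy requires.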

Users can independently toggle upstream and downstream filter operation for each transformation stage, as well as the global stages (pre-rewrite, post-rewrite, and preserve phrase). For a given transformation stage, the transfer function may be represented as one of:

D = Base(L)   Eq. (1)
D = Base(UF(L))   Eq. (2)
D = DF(Base(L))   Eq. (3)
D = DF(Base(UF(L)))   Eq. (4)
D = DF(UF(L))   Eq. (5)

where D represents display form 204, L represents lexical form 202, Base( ) represents the behavior of the base model stage, UF( ) represents the behavior of the upstream filter, and DF( ) represents the behavior of the downstream filter.

Eq. (1) is for both the upstream filter and downstream filter disabled. Eq. (2) is for the upstream filter enabled and the downstream filter disabled. Eq. (3) is for the upstream filter disabled and the downstream filter enabled. Eq. (4) is for both the upstream filter and downstream filter enabled. Eq. (5) is for both the upstream filter and downstream filter enabled, and bypassing the base model stage (which may be accomplished in some examples using tags).
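The five cases above amount to optional function composition around the base model stage. The sketch below illustrates this under stated assumptions: the `make_stage` helper, its flag names, and the list-of-strings token type are hypothetical, and the base model bypass is modeled with a boolean flag rather than with tags.

```python
from typing import Callable, List, Optional

Tokens = List[str]
Step = Callable[[Tokens], Tokens]

def make_stage(base: Step,
               uf: Optional[Step] = None,
               df: Optional[Step] = None,
               uf_on: bool = True,
               df_on: bool = True,
               bypass_base: bool = False) -> Step:
    """Compose one transformation stage from UF, Base, and DF.

    Tokens pass through the upstream filter, then the base model stage,
    then the downstream filter; each part can be switched off.
    """
    def stage(lexical: Tokens) -> Tokens:
        tokens = list(lexical)
        if uf is not None and uf_on:
            tokens = uf(tokens)
        if not bypass_base:
            tokens = base(tokens)
        if df is not None and df_on:
            tokens = df(tokens)
        return tokens
    return stage

# Toy steps that record the order in which they ran.
def base(t): return t + ["Base"]
def uf(t): return t + ["UF"]
def df(t): return t + ["DF"]

both_off = make_stage(base, uf, df, uf_on=False, df_on=False)
uf_only = make_stage(base, uf, df, df_on=False)
df_only = make_stage(base, uf, df, uf_on=False)
both_on = make_stage(base, uf, df)
bypass = make_stage(base, uf, df, bypass_base=True)
print(both_on(["L"]))
```

Because each toy step appends its own name, the output list shows the order of application for each configuration.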

FIG. 4 illustrates DPP pipeline customization tool 400 that enables development and deployment of DPP pipeline 200 of FIG. 2 into arrangement 100. A target format document 440 is fed into DPP pipeline customization tool 400 and used to enable DPP pipeline 200 to learn proper display form for user-provided in-domain unique content that is specific to that user. Text 442 of target format document 440 is converted by a document converter 410 into a stream of tokens 430 in lexical form. Stream of tokens 430 is fed into baseline DPP pipeline 206 which outputs baseline text 444 representing stream of tokens 430.

A differencer 412 determines a first difference 414 and a second difference 416 between baseline text 444 and text 442 of target format document 440. A rule generator 420 generates rules 422 for upstream filter 212 and downstream filter 216, rules 424 for upstream filter 222 and downstream filter 226, and other rules for filters of other transformation stages.
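One way to picture the differencer and rule generator is a word-level diff whose replaced spans become candidate (old phrase, new phrase) rules. The sketch below uses Python's standard `difflib.SequenceMatcher`; treating each replaced span as a rewrite-style rule is an assumption made for illustration, since the disclosure does not describe how differences are mapped to filter rules.

```python
import difflib

def generate_rewrite_rules(baseline_text, target_text):
    """Derive candidate (old phrase, new phrase) rules from a word-level diff."""
    base = baseline_text.split()
    target = target_text.split()
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    rules = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only spans present in both texts but worded differently become rules;
        # pure insertions and deletions are ignored in this sketch.
        if op == "replace":
            rules.append((" ".join(base[i1:i2]), " ".join(target[j1:j2])))
    return rules

rules = generate_rewrite_rules(
    "send it to doctor smith at 3 pm",
    "send it to Dr. Smith at 3 PM",
)
print(rules)
```

In practice such candidates would still go to the user for acceptance or editing, as the text describes for the generated rules.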

A deployment manager 428 deploys the customized filters into deployment environment 450 to produce DPP pipeline 200. In some examples, deployment environment 450 comprises a cloud resource, on premises servers, or user device 170. In some examples, such as multi-user environments, deployment manager 428 uses customer identifier 122 (which is associated with user 406) to ensure that only user 406 is able to access DPP pipeline 200.
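The customer-identifier check can be sketched as a registry keyed by that identifier. The `PipelineRegistry` class, its method names, and the dictionary of components are hypothetical; a real deployment would sit behind authenticated service calls rather than an in-memory dictionary.

```python
class PipelineRegistry:
    """Toy store of customized pipeline components, keyed by customer identifier."""

    def __init__(self):
        self._by_customer = {}

    def deploy(self, customer_id, components):
        # Store a copy so later changes by the caller do not leak in.
        self._by_customer[customer_id] = dict(components)

    def load(self, customer_id):
        """Return the components deployed for this customer identifier only."""
        if customer_id not in self._by_customer:
            raise PermissionError("no pipeline deployed for this customer identifier")
        return dict(self._by_customer[customer_id])

registry = PipelineRegistry()
registry.deploy("customer-a", {"post_rewrite": {"e mail": "email"}})
components = registry.load("customer-a")
```

A request carrying a different identifier never sees another customer's components; in this sketch it raises `PermissionError`.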

User 406 uses a user interface 426 to accept and/or edit the generated rules (e.g., rules 422 and 424), upload authored rules 452 (written without the process described for target format document 440), and/or enter a version indication 454. Version indication 454 enables user 406 to ensure that DPP pipeline 200 will use only a specified version of baseline DPP pipeline 206 (e.g., DPP pipeline 200 is “locked”), in order to ensure stability.

In some examples, user interface 426 (or another user interface in deployment environment 450) enables user 406 to enter indication of an error 456 if user 406 notices an error in textual transcript 152. Indication of an error 456 acts as a feedback signal, which is used by a trainer 418 to improve rules or training data related to that identified error.

FIG. 5 shows a flowchart 500 illustrating exemplary operations that may be performed using arrangement 100 and/or DPP pipeline customization tool 400. In some examples, operations described for flowchart 500 are performed by computing device 900 of FIG. 9. Flowchart 500 comprises operations 502-526 that customize baseline (multi-stage) DPP pipeline 206 into (customized multi-stage) DPP pipeline 200. Flowchart 500 commences with operation 502, which includes receiving, by DPP pipeline customization tool 400, target format document 440.

Operation 504 transforms text 442 of target format document 440 into stream of tokens 430, each token representing an element of human speech in a lexical form.

Operation 506 includes receiving, by baseline DPP pipeline 200, stream of tokens 430 and transforming, by base model stage 214 of baseline DPP pipeline 206, a first aspect of stream of tokens 430 (e.g., disfluency) from lexical form into display form. In some examples, operation 506 also transforms a second aspect of stream of tokens 430 (e.g., ITN) from lexical form into display form with base model stage 224. In operation 508, baseline DPP pipeline 206 outputs baseline text 444 representing stream of tokens 430, based on at least transforming aspects of stream of tokens 430.

Operation 510 determines at least a first difference between baseline text 444 and text 442 of target format document 440, and in some examples, also determines a second difference. Operation 512 generates rules 422 for upstream filter 212 and downstream filter 216, based on at least the difference between baseline text 444 and text 442 of target format document 440. In some examples, operation 512 generates rules 424 for upstream filter 222 and downstream filter 226, based on at least the difference between baseline text 444 and text 442 of target format document 440.

Operation 514 provides a user interface 426 for user 406 to accept the generated rules 422 and/or 424. DPP pipeline customization tool 400 receives acceptance of the generated rules 422 and/or 424 in operation 514. Decision operation 516 determines whether all of the generated rules are accepted as-is, or instead whether they are edited or authored rules 452 are submitted by user 406. If not accepted as-is, operation 518 includes receiving (via user interface 426) authored rules 452 for at least one of: upstream filter 212, base model stage 214, downstream filter 216, upstream filter 222, base model stage 224, downstream filter 226, global pre-rewrite stage 270, global post-rewrite stage 272, or preserved phrase tagger 280. Otherwise (e.g., if accepted as-is), flowchart 500 moves to operation 520.

Operation 520 deploys DPP pipeline 200 by deploying at least one of: upstream filter 212, base model stage 214, downstream filter 216, upstream filter 222, base model stage 224, downstream filter 226, global pre-rewrite stage 270, global post-rewrite stage 272, or preserved phrase tagger 280. In some examples, baseline DPP pipeline 206 is already in-place in deployment environment 450 (i.e., already online), and so only the customized components need to be deployed, in order to update the online default (baseline) functionality. In some examples, at least one of upstream filter 212, base model stage 214, downstream filter 216, upstream filter 222, base model stage 224, downstream filter 226, global pre-rewrite stage 270, global post-rewrite stage 272, or preserved phrase tagger 280 comprises an NN. Some examples limit dissemination of the deployed components of DPP pipeline 200 based on at least customer identifier 122.

Decision operation 522 determines whether user 406 desires a version freeze with a specified version. This is determined by whether user interface 426 receives version indication 454 to continue using an identified version of DPP pipeline 200 (e.g., freezing to the specified version of baseline DPP pipeline 206). As described above, baseline DPP pipeline 206 (the standard, default DPP pipeline) is composed of base stage models (e.g., ITN, capitalization, profanity), and user 406 is able to customize the behavior of baseline DPP pipeline 206. However, the customization of DPP pipeline 200 is defined as a deviation from the specific version of baseline DPP pipeline 206 for which the customized components (e.g., upstream and downstream filters) are developed.

When baseline DPP pipeline 206 changes, due to an update, it is possible that the deviation from the older behavior provides different results than the deviation from the updated behavior. Thus, user 406 may prefer to freeze the behavior of DPP pipeline 200 to a stable, known behavior by freezing the version of baseline DPP pipeline 206 upon which DPP pipeline 200 is based. Thus, the ability to freeze to the current version of baseline DPP pipeline 206 is provided, in some examples.

Further, some examples permit user 406 to select a particular version of baseline DPP pipeline 206 (from a plurality of recent versions) to use. This permits user 406 to freeze DPP pipeline 200 when customizing it, and then later, at some point after learning that baseline DPP pipeline 206 has been updated, user 406 applies the customizations to the updated version. In some scenarios, user 406 may prefer to skip several updates of baseline DPP pipeline 206, and update DPP pipeline 200 on a more relaxed schedule. When several intervening versions of baseline DPP pipeline 206 are available (e.g., between the specified frozen version and the most recent version), user 406 is able to select any of those versions of baseline DPP pipeline 206 to use.

To support this versioning control, operation 524 sets a flag to prohibit automatic updates of baseline DPP pipeline 206. When executing DPP pipeline 200 (during operation 606 of flowchart 600), the deployment environment enforces versioning control by reading the deployed metadata, loading the corresponding base model stages of the specified baseline DPP pipeline 206 (according to version indication 454), and using them together with the customized components (of DPP pipeline 200) to serve SR requests. Otherwise, operation 526 sets a flag to permit automatic updates of baseline DPP pipeline 206. User 406 may now proceed to use DPP pipeline 200 according to flowchart 600 of FIG. 6.
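The version-freeze check at serving time can be sketched as a small lookup against deployed metadata. The `DeploymentMetadata` fields and the `resolve_baseline_version` helper are hypothetical names chosen for illustration; the disclosure only states that a flag and a version indication are stored and read back.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class DeploymentMetadata:
    """Hypothetical metadata stored alongside a customer's deployed components."""
    auto_update: bool                     # flag set when the pipeline is customized
    frozen_version: Optional[str] = None  # version indication, if the user froze one

def resolve_baseline_version(meta: DeploymentMetadata,
                             available: Sequence[str]) -> str:
    """Pick the baseline version whose base model stages should be loaded.

    `available` is assumed to be ordered from oldest to newest.
    """
    if not available:
        raise LookupError("no baseline versions are available")
    # With updates permitted (or no version recorded), follow the newest baseline.
    if meta.auto_update or meta.frozen_version is None:
        return available[-1]
    if meta.frozen_version not in available:
        raise LookupError(f"baseline version {meta.frozen_version!r} is not available")
    return meta.frozen_version

versions = ["1.0", "1.1", "2.0"]
latest = resolve_baseline_version(DeploymentMetadata(auto_update=True), versions)
frozen = resolve_baseline_version(
    DeploymentMetadata(auto_update=False, frozen_version="1.1"), versions)
```

Keeping the decision in metadata, rather than in the customized components themselves, lets the same filters be re-pointed at a newer baseline later.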

FIG. 6 shows a flowchart 600 illustrating exemplary operations that may be performed when using a customized, multi-stage DPP pipeline 200. In some examples, operations described for flowchart 600 are performed by computing device 918 of FIG. 9. Microphone 102 captures audio input 104 comprising human speech, and SR component 120 receives audio input 104 (e.g., as plurality of audio segments 110), in operation 602. SR component 120 performs an SR process on audio input 104 in operation 604, and outputs stream of tokens 130. Each token represents an element of human speech, such as a word or other element.

In some examples, operation 606 links stream of tokens 130 or audio input to DPP pipeline 200 with at least customer identifier 122. In such examples, user 406 may submit customer identifier 122 along with an SR request. Operation 608 includes receiving, by DPP pipeline 200, stream of tokens 130, each token representing an element of human speech in lexical form 202. In some examples, DPP pipeline 200 comprises at least two stages selected from the list consisting of: disfluency, ITN, reformulation, capitalization, profanity, and punctuation. In an example of a multi-user cloud setting, deployment environment 450 loads the latest customized components of DPP pipeline 200 (e.g., identified using customer identifier 122) and assembles DPP pipeline 200 model in real time using the version of baseline DPP pipeline 206 identified in version indication 454 (if the version is frozen).

Operation 610 includes, prior to receiving stream of tokens 130 by transformation stage 210, receiving stream of tokens 130 by global pre-rewrite stage 270, and performing, by global pre-rewrite stage 270, KWS text removal or rule-based recognition error correction. Operation 612 includes, prior to at least one transformation stage of DPP pipeline 200, tagging stream of tokens 130 to preserve a phrase with preserve phrase tag 281.

Operation 614, which comprises operations 616-620, is performed for transformation stage 210 of DPP pipeline 200, in which stream of tokens 130 is received, in turn, by upstream filter 212, base model stage 217, and downstream filter 216. Operation 616 alters stream of tokens 130 by upstream filter 212. In some examples, this includes tagging stream of tokens 130 or changing at least one token of stream of tokens 130.

Operation 618 transforms a first aspect (e.g., disfluency, ITN, or another) of stream of tokens 130 from lexical form into display form, by base model stage 214. Operation 620 alters stream of tokens 130 by downstream filter 216. In some examples, this includes removing a tag added by upstream filter 212, or reformatting output of base model stage 214.
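A toy version of one transformation stage shows how an upstream tag and its downstream removal can change the default behavior of a base model. Everything in this sketch is an assumption for illustration: the `<nochange>` marker, the "one drive" rule, the tiny number table, and in particular the premise that the base model stage skips tagged tokens.

```python
TAG = "<nochange>"  # hypothetical tag marker

NUMBERS = {"one": "1", "two": "2", "three": "3"}  # stand-in ITN table

def upstream_filter(tokens):
    """Tag 'one' when followed by 'drive' so the ITN stage leaves it alone."""
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        out.append(TAG + tok if tok == "one" and nxt == "drive" else tok)
    return out

def base_itn(tokens):
    """Toy base model stage: convert number words, skipping tagged tokens."""
    return [tok if tok.startswith(TAG) else NUMBERS.get(tok, tok) for tok in tokens]

def downstream_filter(tokens):
    """Remove the tag that the upstream filter added."""
    return [tok[len(TAG):] if tok.startswith(TAG) else tok for tok in tokens]

lexical = "save two files to one drive".split()
default_out = base_itn(lexical)
custom_out = downstream_filter(base_itn(upstream_filter(lexical)))
print(" ".join(custom_out))
```

With the filters disabled the toy base stage converts both number words; with them enabled only "two" is converted.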

Decision operation 622 determines whether there is another stage of DPP pipeline 200 (except for explicit punctuation stage 260). If so, flowchart 600 returns to operation 614. In the second pass, operation 614 receives, by transformation stage 220 from transformation stage 210, stream of tokens 130 and transforms, by transformation stage 220, a second aspect of stream of tokens 130 from lexical form into display form. In transformation stage 220, upstream filter 222, base model stage 224, and downstream filter 226 receive stream of tokens 130 in turn. Operation 616 alters stream of tokens 130 by upstream filter 222; operation 618 transforms the second aspect of stream of tokens 130 from lexical form into display form by base model stage 224; and operation 620 alters stream of tokens 130 by downstream filter 226. Other transformation stages 230-250 also transform aspects of stream of tokens 130 according to their respective functionality. After transformation stages 210-250 are complete, operation 624 performs explicit punctuation transformation with explicit punctuation stage 260 (although some examples use the three part transformation stage with the upstream and downstream filters and the base model stage).

After DPP pipeline 200 transforms the second aspect (and other aspects) of stream of tokens 130, global post-rewrite stage 272 receives stream of tokens 130 in operation 626. Operation 626 also includes rewriting, by global post-rewrite stage 272, an output of a final transformation stage of DPP pipeline 200 (e.g., explicit punctuation stage 260 or another transformation stage). In some examples, this includes removing preserve phrase tag 281 (if it had not been removed earlier) from stream of tokens 130, and replacing phrases.
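The overall control flow of operations 610 through 626 (global pre-rewrite, a loop over transformation stages, explicit punctuation, then global post-rewrite) can be sketched as a simple driver. The toy stages below, including the wake-word list and the single-entry ITN table, are placeholders invented for the example and do not reflect real models.

```python
from typing import Callable, List, Sequence

Tokens = List[str]
Stage = Callable[[Tokens], Tokens]

def run_pipeline(tokens: Sequence[str], pre_rewrite: Stage, stages: Sequence[Stage],
                 punctuation: Stage, post_rewrite: Stage) -> str:
    """Pass the token stream through each stage in order and join the result."""
    out = pre_rewrite(list(tokens))
    for stage in stages:  # one pass per transformation stage
        out = stage(out)
    out = punctuation(out)
    return " ".join(post_rewrite(out))

WAKE = ["hey", "computer"]  # hypothetical keyword-spotting phrase

def pre_rewrite(t): return t[len(WAKE):] if t[:len(WAKE)] == WAKE else t
def disfluency(t): return [w for w in t if w not in {"um", "uh"}]
def itn(t): return [{"five": "5"}.get(w, w) for w in t]
def capitalize(t): return [t[0].capitalize()] + t[1:] if t else t
def punctuate(t): return t[:-1] + [t[-1] + "."] if t else t
def post_rewrite(t): return t  # no rewrite rules in this example

final_text = run_pipeline("hey computer um call me at five".split(),
                          pre_rewrite, [disfluency, itn, capitalize],
                          punctuate, post_rewrite)
print(final_text)
```

Each entry in `stages` could itself be a composed upstream filter, base model stage, and downstream filter; the driver does not need to know.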

Operation 628 outputs final text 140, representing stream of tokens 130, based on at least transforming multiple aspects of stream of tokens 130. In some examples, outputting final text 140 comprises outputting final text 140 as a streaming output. In some examples, outputting final text 140 comprises outputting final text 140 as textual transcript 152 and/or on display device 160. In some examples, textual transcript 152 is output on display device 160. In some examples, microphone 102 that captures audio input 104, SR component 120, DPP pipeline 200, and display device 160 are all disposed on a common user device 170 (e.g., a mobile device). In some examples, transforming stream of tokens 130 from lexical form 202 into display form 204 is performed without use of a network connection (e.g., without use of the Internet). In some examples, DPP pipeline 200 is located across computer network 930 from microphone 102 and/or display device 160.

While DPP pipeline 200 is being employed (e.g., while generating textual transcript 152), decision operation 630 determines whether indication of an error 456 is received (e.g., from user 406). If so, operation 632 includes receiving an indication of an error in final text 140 (e.g., receiving indication of an error 456). Operation 634 includes, based on at least the indication of an error in final text 140, training upstream filter 212 and/or downstream filter 216 with trainer 418.

FIG. 7 is a flowchart 700 illustrating exemplary operations associated with arrangement 100. In some examples, operations described for flowchart 700 are performed by computing device 900 of FIG. 9. Flowchart 700 commences with operation 702, which includes receiving, by a customized multi-stage DPP pipeline, a stream of tokens, each token representing an element of human speech in a lexical form.

Operation 704 is performed using operations 706 and 708, and includes, for a first transformation stage of the DPP pipeline, receiving the stream of tokens, in turn, by a first upstream filter, a first base model stage, and a first downstream filter. Operation 706 includes transforming, by the first base model stage, a first aspect of the stream of tokens from lexical form into display form. Operation 708 includes altering, by the first upstream filter, the first downstream filter, or both, the stream of tokens.

Operation 710 includes receiving, by a second transformation stage of the DPP pipeline, from the first transformation stage, the stream of tokens. Operation 712 includes transforming, by the second transformation stage, a second aspect of the stream of tokens from lexical form into display form. Operation 714 includes (based on at least transforming multiple aspects of the stream of tokens) outputting a final text representing the stream of tokens.

FIG. 8 is a flowchart 800 illustrating exemplary operations associated with arrangement 100. In some examples, operations described for flowchart 800 are performed by computing device 900 of FIG. 9. Flowchart 800 commences with operation 802, which includes receiving, by a DPP pipeline customization tool, a target format document. Operation 804 includes transforming text of the target format document into a stream of tokens, each token representing an element of human speech in a lexical form.

Operation 806 includes receiving, by a baseline multi-stage DPP pipeline, the stream of tokens. Operation 808 includes transforming, by a first base model stage of the baseline multi-stage DPP pipeline, the first aspect of the stream of tokens from lexical form into display form. Operation 810 includes transforming, by a second base model stage of the baseline multi-stage DPP pipeline, the second aspect of the stream of tokens from lexical form into display form.

Operation 812 includes, based on at least transforming multiple aspects of the stream of tokens, outputting, by the baseline multi-stage DPP pipeline, a baseline text representing the stream of tokens. Operation 814 includes determining at least a difference between the baseline text and text of the target format document. Operation 816 includes, based on at least the difference between the baseline text and text of the target format document, generating rules for the first upstream filter and the first downstream filter.

An example system comprises: a processor; and a computer-readable medium storing instructions that are operative upon execution by the processor to: receive, by a customized multi-stage display post processing (DPP) pipeline, a stream of tokens, each token representing an element of human speech in a lexical form; for a first transformation stage of the DPP pipeline, receive the stream of tokens (in turn), by a first upstream filter, a first base model stage, and a first downstream filter, and: transform, by the first base model stage, a first aspect of the stream of tokens from lexical form into display form; and alter, by the first upstream filter and/or the first downstream filter, the stream of tokens; receive, by a second transformation stage of the DPP pipeline, from the first transformation stage, the stream of tokens; transform, by the second transformation stage, a second aspect of the stream of tokens from lexical form into display form; and based on at least transforming multiple aspects of the stream of tokens, output a final text representing the stream of tokens.

An example computerized method comprises: receiving, by a customized multi-stage display post processing (DPP) pipeline, a stream of tokens, each token representing an element of human speech in a lexical form; for a first transformation stage of the DPP pipeline, receiving the stream of tokens (in turn), by a first upstream filter, a first base model stage, and a first downstream filter, and: transforming, by the first base model stage, a first aspect of the stream of tokens from lexical form into display form; and altering, by the first upstream filter and/or the first downstream filter, the stream of tokens; receiving, by a second transformation stage of the DPP pipeline, from the first transformation stage, the stream of tokens; transforming, by the second transformation stage, a second aspect of the stream of tokens from lexical form into display form; and based on at least transforming multiple aspects of the stream of tokens, outputting a final text representing the stream of tokens.

One or more example computer storage devices have computer-executable instructions stored thereon, which, on execution by a computer, cause the computer to perform operations comprising: receiving, by a customized multi-stage display post processing (DPP) pipeline, a stream of tokens, each token representing an element of human speech in a lexical form; for a first transformation stage of the DPP pipeline, receiving the stream of tokens, by a first upstream filter, a first base model stage, and a first downstream filter, and: transforming, by the first base model stage, a first aspect of the stream of tokens from lexical form into display form; and altering, by the first upstream filter and/or the first downstream filter, the stream of tokens; receiving, by a second transformation stage of the DPP pipeline, from the first transformation stage, the stream of tokens; transforming, by the second transformation stage, a second aspect of the stream of tokens from lexical form into display form; and based on at least transforming multiple aspects of the stream of tokens, outputting a final text representing the stream of tokens.

Another example computerized method comprises: receiving, by a DPP pipeline customization tool, a target format document; transforming text of the target format document into a stream of tokens, each token representing an element of human speech in a lexical form; receiving, by a baseline multi-stage DPP pipeline, the stream of tokens; transforming, by a first base model stage of the baseline multi-stage DPP pipeline, the first aspect of the stream of tokens from lexical form into display form; transforming, by a second base model stage of the baseline multi-stage DPP pipeline, the second aspect of the stream of tokens from lexical form into display form; based on at least transforming multiple aspects of the stream of tokens, outputting, by the baseline multi-stage DPP pipeline, a baseline text representing the stream of tokens; determining at least a difference between the baseline text and text of the target format document; and based on at least the difference between the baseline text and text of the target format document, generating rules for the first upstream filter and the first downstream filter. This additional example method may further be implemented on a system with a processor and a computer-readable medium, and/or on one or more computer storage devices.

Alternatively, or in addition to the other examples described herein, examples include any combination of the following: transforming the second aspect of the stream of tokens from lexical form into display form comprises, for the second transformation stage, receiving the stream of tokens, by a second upstream filter, a second base model stage, and a second downstream filter; transforming, by the second base model stage, the second aspect of the stream of tokens from lexical form into display form; altering, by the second upstream filter and/or the second downstream filter, the stream of tokens; altering the stream of tokens by the first upstream filter comprises tagging the stream of tokens; altering the stream of tokens by the first upstream filter comprises changing at least one token of the stream of tokens; altering the stream of tokens by the first downstream filter comprises removing a tag added by the first upstream filter; altering the stream of tokens by the first downstream filter comprises reformatting output of the first base model stage; the DPP pipeline comprises at least two stages selected from the list consisting of disfluency, ITN, reformulation, capitalization, profanity, and punctuation; prior to receiving the stream of tokens by the first transformation stage, receiving the stream of tokens by a global pre-rewrite stage; performing, by the global pre-rewrite stage, KWS text removal or rule-based recognition error correction; after transforming the second aspect of the stream of tokens by the second transformation stage, receiving the stream of tokens by a global post-rewrite stage; rewriting, by the global post-rewrite stage, an output of a final transformation stage of the DPP pipeline; prior to at least one transformation stage of the DPP pipeline, tagging the stream of tokens to preserve a phrase with a preserve phrase tag; removing the preserve phrase tag from the stream of tokens; capturing an audio input comprising human speech; receiving, by an SR component, the audio input; performing, by the SR component, an SR process on the audio input; outputting, by the SR component, the stream of tokens; at least one of the first upstream filter, the first base model stage, the first downstream filter, the second upstream filter, the second base model stage, the second downstream filter, the global pre-rewrite stage, the global post-rewrite stage, or the preserved phrase tagger comprises an NN; outputting the final text comprises outputting the final text as a streaming output; outputting the final text as a textual transcript; outputting the final text on a display device; outputting the textual transcript on a display device; the microphone that captures the audio input, the SR component, the DPP pipeline, and the display device are all disposed on a common mobile device; transforming the stream of tokens from lexical form into display form is performed without use of an internet connection; the DPP pipeline is located across a computer network from the microphone that captures the audio input and/or the display device; determining at least a second difference between the baseline text and text of the target format document; based on at least the second difference between the baseline text and text of the target format document, generating rules for the second upstream filter and the second downstream filter; providing a user interface for a user to accept the generated rules or input authored rules; receiving acceptance of the generated rules; receiving authored rules for at least one of: the first upstream filter, the first base model stage, the first downstream filter, the second upstream filter, the second base model stage, the second downstream filter, the global pre-rewrite stage, the global post-rewrite stage, or the preserved phrase tagger; customizing the baseline multi-stage DPP pipeline into the customized multi-stage DPP pipeline; customizing the baseline multi-stage DPP pipeline comprises deploying at least one of: the first upstream filter, the first base model stage, the first downstream filter, the second upstream filter, the second base model stage, the second downstream filter, the global pre-rewrite stage, the global post-rewrite stage, or the preserved phrase tagger; linking the stream of tokens or audio input to the customized multi-stage DPP pipeline with at least a customer identifier; limiting dissemination of the deployed components of the customized multi-stage DPP pipeline based on at least the customer identifier; receiving an indication to continue using an identified version of a DPP pipeline; receiving an indication of an error in the final text; and based on at least the indication of an error in the final text, training the first upstream filter and/or the first downstream filter.

While the aspects of the disclosure have been described in terms of various examples with their associated operations, a person skilled in the art would appreciate that a combination of operations from any number of different examples is also within scope of the aspects of the disclosure.

FIG. 9 is a block diagram of an example computing device 900 for implementing aspects disclosed herein, and is designated generally as computing device 900. In some examples, one or more computing devices 900 are provided for an on-premises computing solution. In some examples, one or more computing devices 900 are provided as a cloud computing solution. In some examples, a combination of on-premises and cloud computing solutions are used. Computing device 900 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the examples disclosed herein, whether used singly or as part of a larger set.

Neither should computing device 900 be interpreted as having any dependency or requirement relating to any one or combination of components/modules illustrated. The examples disclosed herein may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program components including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks, or implements particular abstract data types. The disclosed examples may be practiced in a variety of system configurations, including personal computers, laptops, smart phones, mobile tablets, hand-held devices, consumer electronics, specialty computing devices, etc. The disclosed examples may also be practiced in distributed computing environments when tasks are performed by remote-processing devices that are linked through a communications network.

Computing device 900 includes a bus 910 that directly or indirectly couples the following devices: computer storage memory 912, one or more processors 914, one or more presentation components 916, input/output (I/O) ports 918, I/O components 920, a power supply 922, and a network component 924. While computing device 900 is depicted as a seemingly single device, multiple computing devices 900 may work together and share the depicted device resources. For example, memory 912 may be distributed across multiple devices, and processor(s) 914 may be housed with different devices.

Bus 910 represents what may be one or more busses (such as an address bus, data bus, or a combination thereof). Although the various blocks of FIG. 9 are shown with lines for the sake of clarity, delineating various components may be accomplished with alternative representations. For example, a presentation component such as a display device is an I/O component in some examples, and some examples of processors have their own memory. Distinction is not made between such categories as “workstation,” “server,” “laptop,” “hand-held device,” etc., as all are contemplated within the scope of FIG. 9 and the references herein to a “computing device.” Memory 912 may take the form of the computer storage media referenced below and operatively provide storage of computer-readable instructions, data structures, program modules and other data for the computing device 900. In some examples, memory 912 stores one or more of an operating system, a universal application platform, or other program modules and program data. Memory 912 is thus able to store and access data 912a and instructions 912b that are executable by processor 914 and configured to carry out the various operations disclosed herein.

In some examples, memory 912 includes computer storage media. Memory 912 may include any quantity of memory associated with or accessible by the computing device 900. Memory 912 may be internal to the computing device 900 (as shown in FIG. 9), external to the computing device 900 (not shown), or both (not shown). Additionally, or alternatively, the memory 912 may be distributed across multiple computing devices 900, for example, in a virtualized environment in which instruction processing is carried out on multiple computing devices 900. For the purposes of this disclosure, “computer storage media,” “computer-storage memory,” “memory,” and “memory devices” are synonymous terms for the computer-storage memory 912, and none of these terms include carrier waves or propagating signaling.

Processor(s) 914 may include any quantity of processing units that read data from various entities, such as memory 912 or I/O components 920. Specifically, processor(s) 914 are programmed to execute computer-executable instructions for implementing aspects of the disclosure. The instructions may be performed by the processor, by multiple processors within the computing device 900, or by a processor external to the client computing device 900. In some examples, the processor(s) 914 are programmed to execute instructions such as those illustrated in the flow charts discussed below and depicted in the accompanying drawings. Moreover, in some examples, the processor(s) 914 represent an implementation of analog techniques to perform the operations described herein. For example, the operations may be performed by an analog client computing device 900 and/or a digital client computing device 900.

Presentation component(s) 916 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc. One skilled in the art will understand and appreciate that computer data may be presented in a number of ways, such as visually in a graphical user interface (GUI), audibly through speakers, wirelessly between computing devices 900, across a wired connection, or in other ways. I/O ports 918 allow computing device 900 to be logically coupled to other devices including I/O components 920, some of which may be built in. Example I/O components 920 include, for example but without limitation, a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.

Computing device 900 may operate in a networked environment via the network component 924 using logical connections to one or more remote computers. In some examples, the network component 924 includes a network interface card and/or computer-executable instructions (e.g., a driver) for operating the network interface card. Communication between the computing device 900 and other devices may occur using any protocol or mechanism over any wired or wireless connection. In some examples, network component 924 is operable to communicate data over public, private, or hybrid (public and private) using a transfer protocol, between devices wirelessly using short range communication technologies (e.g., near-field communication (NFC), Bluetooth™ branded communications, or the like), or a combination thereof. Network component 924 communicates over wireless communication link 926 and/or a wired communication link 926a to a remote resource 928 (e.g., a cloud resource) across network 930. Various different examples of communication links 926 and 926a include a wireless connection, a wired connection, and/or a dedicated link, and in some examples, at least a portion is routed through the internet.

Although described in connection with an example computing device 900, examples of the disclosure are capable of implementation with numerous other general-purpose or special-purpose computing system environments, configurations, or devices. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with aspects of the disclosure include, but are not limited to, smart phones, mobile tablets, mobile computing devices, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, gaming consoles, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, mobile computing and/or communication devices in wearable or accessory form factors (e.g., watches, glasses, headsets, or earphones), network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, virtual reality (VR) devices, augmented reality (AR) devices, mixed reality devices, holographic device, and the like. Such systems or devices may accept input from the user in any way, including from input devices such as a keyboard or pointing device, via gesture input, proximity input (such as by hovering), and/or via voice input.

Examples of the disclosure may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices in software, firmware, hardware, or a combination thereof. The computer-executable instructions may be organized into one or more computer-executable components or modules. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. Aspects of the disclosure may be implemented with any number and organization of such components or modules. For example, aspects of the disclosure are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other examples of the disclosure may include different computer-executable instructions or components having more or less functionality than illustrated and described herein. In examples involving a general-purpose computer, aspects of the disclosure transform the general-purpose computer into a special-purpose computing device when configured to execute the instructions described herein.
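As an illustration of how such program modules might be organized, the following minimal Python sketch composes one transformation stage from three modules: an upstream filter, a base model stage, and a downstream filter, in the order given in the abstract. All names here (`Stage`, `toy_itn_model`, `protect_one`, `unmask`) and the masking rule are hypothetical and chosen only for illustration; the toy base model is a simple lookup table standing in for an ITN model and is not the model described in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List

Tokens = List[str]
Module = Callable[[Tokens], Tokens]


def identity(tokens: Tokens) -> Tokens:
    """Default filter: pass tokens through unchanged."""
    return list(tokens)


@dataclass
class Stage:
    """One stage: upstream filter -> base model -> downstream filter."""
    base_model: Module
    upstream: Module = identity
    downstream: Module = identity

    def __call__(self, tokens: Tokens) -> Tokens:
        return self.downstream(self.base_model(self.upstream(tokens)))


# Hypothetical stand-in for an ITN base model: spelled-out numbers to digits.
_NUMBERS = {"one": "1", "two": "2", "three": "3"}


def toy_itn_model(tokens: Tokens) -> Tokens:
    return [_NUMBERS.get(t, t) for t in tokens]


def protect_one(tokens: Tokens) -> Tokens:
    """Hypothetical upstream rule: mask 'one' so the base model skips it."""
    return ["<keep:one>" if t == "one" else t for t in tokens]


def unmask(tokens: Tokens) -> Tokens:
    """Hypothetical downstream rule: restore tokens masked upstream."""
    return [
        t[len("<keep:"):-1] if t.startswith("<keep:") and t.endswith(">") else t
        for t in tokens
    ]


# Same base model in both stages; only the filters differ.
default_stage = Stage(base_model=toy_itn_model)
custom_stage = Stage(base_model=toy_itn_model, upstream=protect_one, downstream=unmask)

lexical = "take one of the two pills".split()
default_text = " ".join(default_stage(lexical))
custom_text = " ".join(custom_stage(lexical))
print(default_text)
print(custom_text)
```

In this sketch the default stage prints `take 1 of the 2 pills`, while the customized stage prints `take one of the 2 pills`. The base model module is untouched in both cases; only the filter modules around it change, which is one way the modular organization described above could support per-user behavior.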

By way of example and not limitation, computer readable media comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable memory implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or the like. Computer storage media are tangible and mutually exclusive to communication media. Computer storage media are implemented in hardware and exclude carrier waves and propagated signals. Computer storage media for purposes of this disclosure are not signals per se. Exemplary computer storage media include hard disks, flash drives, solid-state memory, phase change random-access memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that may be used to store information for access by a computing device. In contrast, communication media typically embody computer readable instructions, data structures, program modules, or the like in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media.

The order of execution or performance of the operations in examples of the disclosure illustrated and described herein is not essential, and may be performed in different sequential manners in various examples. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the disclosure. When introducing elements of aspects of the disclosure or the examples thereof, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. The term “exemplary” is intended to mean “an example of.” The phrase “one or more of the following: A, B, and C” means “at least one of A and/or at least one of B and/or at least one of C.”

Having described aspects of the disclosure in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the disclosure as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the disclosure, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.

Patent Metadata

Filing Date

October 10, 2025

Publication Date

March 26, 2026

Inventors

Wei LIU
Padma VARADHARAJAN
Piyush BEHRE
Nicholas KIBRE
Edward C. LIN
Shuangyu CHANG
Che ZHAO
Khuram SHAHID
Heiko Willy RAHMEL

Citation & reuse

Cite as: Patentable. “CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION” (US-20260087235-A1). https://patentable.app/patents/US-20260087235-A1
