
US-20250298500-A1

Tracking Provenance of Content from a Generative Model

Published: September 25, 2025

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A computing system is provided that includes processing circuitry and associated memory. The processing circuitry is configured to implement a program using portions of the associated memory, to receive, via an edit operation, digital content and provenance metadata associated with the digital content. The processing circuitry implementing the program is further configured to determine, via a provenance determination module, that a textual portion of the digital content is model-generated and originated from a generative model, based on the provenance metadata, and output the digital content to a graphical user interface with a visual indication that the textual portion of the digital content is model-generated.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. A computing system for tracking provenance of model-generated content, comprising:

. The computing system of, wherein the program is further configured to:

. The computing system of, wherein the model-generated digital content and non-model-generated other digital content are incorporated in an aggregated document.

. The computing system of, wherein the graphical user interface includes a document region and a review region and the aggregated document with the model generated content and the non-model generated content is displayed in the document region and a model-generated icon is displayed in the review region adjacent the model generated content.

. The computing system of, wherein a non-model generated icon is displayed in the review region adjacent the non-model generated content.

. The computing system of, wherein the graphical user interface includes a selector configured to receive a user input to selectively enable and disable tracking of model-generated content.

. The computing system of, wherein

. The computing system of, wherein the program includes an attribution tracking module configured with a model attribution threshold, the attribution tracking module being configured to remove a model generated attribution of the model-generated content upon determining that the model generated content has been edited to an extent that the model attribution threshold is no longer met.

. The computing system of, wherein determining includes determining, on a per-character basis, whether each of a plurality of textual characters in the digital content is model generated.

. The computing system of, wherein the provenance metadata includes a character encoding that encodes the indication of the model-generated textual characters on a per-character basis.

. The computing system of, wherein the provenance metadata comprises surrogate character encoding or Unicode code plane encoding indicating the model-generated textual characters.

. The computing system of, wherein

. The computing system of, wherein the program is configured to output the digital content with an indication of the textual portion that is model-generated, at least in part by:

. The computing system of, wherein the program is configured to receive the digital content from:

. The computing system of, wherein the provenance metadata includes a creation date, a model version identifier, and/or prompt used to generate the portion of the digital content.

. The computing system of, wherein the provenance metadata is encrypted and/or digitally signed and stored in a manifest associated with the document.

. The computing system of, wherein the provenance determination module is configured to make the provenance determination by detecting a character encoding indicative of text originating from a generative model, on a per-character basis.

. The computing system of, wherein the provenance determination module is configured to make the provenance determination by detecting metadata of the digital content indicating AI generation.

. The computing system of, wherein the provenance determination module reads attestation data in a manifest that is encrypted and/or signed by a digital signature to verify a source of the model-generated digital content.

. The computing system of, wherein the provenance metadata includes license information.

. The computing system of, wherein the provenance metadata comprises model version, model history, and/or prompts or prompt-related context used to generate the portions of the digital content generated by a generative model.

. The computing system of, wherein the program is configured to display a regeneration option using the model version and the model history, along with the indications of model generated content within the digital content.

. A computerized method for tracking provenance of model generated content, comprising:

. The computerized method of, further comprising:

. The computerized method of, wherein the model-generated digital content and non-model-generated other digital content are incorporated in an aggregated document.

. The computerized method of, wherein the graphical user interface includes a document region and a review region and the aggregated document with the model generated content and the non-model generated content is displayed in the document region and a model-generated icon is displayed in the review region adjacent the model generated content.

. The computerized method of, wherein a non-model generated icon is displayed in the review region adjacent the non-model generated content.

. The computerized method of, wherein the graphical user interface includes a selector configured to receive a user input to selectively enable and disable tracking of model-generated content.

. The computerized method of, wherein

. The computerized method of, wherein the program includes an attribution tracking module configured with a model attribution threshold, the attribution tracking module being configured to remove a model generated attribution of the model-generated content upon determining that the model generated content has been edited to an extent that the model attribution threshold is no longer met.

. The computerized method of, wherein determining includes determining, on a per-character basis, whether each of a plurality of textual characters in the digital content is model generated.

. The computerized method of, wherein the provenance metadata includes a character encoding that encodes the indication of the model-generated textual characters on a per-character basis.

. The computerized method of, wherein the provenance metadata comprises surrogate character encoding or Unicode code plane encoding indicating the model-generated textual characters.

. The computerized method of, wherein

. The computerized method of, wherein the program is configured to output the digital content with an indication of the textual portion that is model-generated, at least in part by:

. The computerized method of, wherein the program is configured to receive the digital content from:

. The computerized method of, wherein the provenance metadata includes a creation date, a model version identifier, and/or prompt used to generate the portion of the digital content.

. The computerized method of, wherein the provenance metadata is encrypted and/or digitally signed and stored in a manifest associated with the document.

. The computerized method of, wherein the provenance determination module is configured to make the provenance determination by detecting a character encoding indicative of text originating from a generative model, on a per-character basis.

. The computerized method of, wherein the provenance determination module is configured to make the provenance determination by detecting metadata of the digital content indicating AI generation.

. The computerized method of, wherein the provenance determination module reads attestation data in a manifest that is encrypted and/or signed by a digital signature to verify a source of the model-generated digital content.

. The computerized method of, wherein the provenance metadata includes license information.

. The computerized method of, wherein the provenance metadata comprises model version, model history, and/or prompts or prompt-related context used to generate the portions of the digital content generated by a generative model.

. The computerized method of, wherein the program is configured to display a regeneration option using the model version and the model history, along with the indications of model generated content within the digital content.

. A computing system, comprising:

. A computerized method, comprising:

Detailed Description

Complete technical specification and implementation details from the patent document.

Generative models are machine learning models that are trained to create digital content, such as computer generated text and images. Recently, large language models (LLMs) and multi-modal models based on generative pre-trained transformers have been developed that produce increasingly useful and human-like output. These models have achieved widespread adoption and the digital content generated by these generative models is increasing rapidly. As a result, many users encounter model-generated content in their daily lives from many different sources. When a user directly interacts with such a model, the user can easily understand they are viewing model-generated content. However, when a user encounters model-generated content that appears in other locations, such as in a document authored by other persons, the user may be completely unaware that the content was generated by a generative model. A technical challenge exists in determining the provenance of such digital content and identifying whether it originates from a digital model, and tracking that provenance as the digital content is manipulated by users via edit operations.

To address the above issues, a computing system is provided that includes processing circuitry and associated memory. The processing circuitry is configured to implement a program using portions of the associated memory, to receive, via an edit operation, digital content and provenance metadata associated with the digital content. The processing circuitry implementing the program is further configured to determine, via a provenance determination module, that a textual portion of the digital content is model-generated and originated from a generative model, based on the provenance metadata, and output the digital content to a graphical user interface with a visual indication that the textual portion of the digital content is model-generated.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

The identification of model-generated content generated via generative models, in particular, generative large language models (LLMs), that produce human-like text presents a technical challenge. Generative models have improved dramatically in recent years, and the digital content provided by such models can be difficult or impossible to distinguish from human- or user-generated content using conventional techniques. For example, one existing technique uses a classifier to analyze text for word patterns that often occur in AI-generated text. However, the use of such a classifier is problematic since generative models are constantly being improved and their output is becoming more human-like. In addition, classifiers developed for the purpose of detecting AI-generated text can be employed as part of the training and refinement of AI text generators via an optimization process that shapes the generators into generating text that is difficult to detect. Further, such classifiers may become more difficult to develop as there is the potential for model collapse scenarios in which training data for such classifiers is polluted by the output of generative models. Model collapse could occur if training data includes too much data that is supposed to be human generated but is actually AI generated. A classifier trained on such a polluted training data set would suffer a loss in prediction accuracy.

Several techniques have been proposed for tracking AI generated content, including AI watermarking techniques that insert special characters or word patterns into AI output, and digitally signed and encrypted manifest techniques that create a secure manifest containing attestations for AI content contained in an associated file. However, such approaches to attaching provenance information or marking with steganographic watermarks do not address the technical challenges of tracking AI content when a user incorporates AI content into documents created using productivity applications, such as word processors, spreadsheet programs, slide deck presentation software, email and messaging applications, etc., in the absence of watermarking and provenance machinery, or when the applications or post-processing enable the stripping of the identifying content. One particular challenge is that during edit operations in which a portion of AI generated content is incorporated into a new document, AI watermark or provenance information can be lost, truncated, or misinterpreted. Another challenge is that information from different AI sources may use different and incompatible AI watermarking or manifest techniques. Another challenge and need is to track and understand AI-generated versus human-generated content and human edits, so as to note and track attribution of input and creativity. As such, a digital artifact such as a document may have multiple AI-based and human contributions and refinements dispersed throughout, where different portions of text have unique histories of edits and refinements.

To address these issues, a computing system is schematically shown for use in determining provenance of model-generated digital content from a generative model, according to one embodiment of the present disclosure. The computing system includes a computing device including processing circuitry and associated memory. The processing circuitry is configured to implement a program using portions of memory to receive, via an edit operation, model-generated digital content and provenance metadata associated with the model-generated digital content. The edit operation may be a copy and paste, cut and paste, insert, import, file open, or other operation that causes content from the generative model to be received by the program and incorporated into a document. The document may be a word processing document, slide presentation, spreadsheet, web page, email or other message, drawing, social media post, etc., created by the user using the program. Typically, a user who is using the program performs the edit operation, although in some cases the edit operation may be performed programmatically by the program. The user may be an authenticated user who has been authenticated and granted access privileges to execute the program by an access control system of an operating system of the computing device.

The document may include both model-generated digital content and other digital content that is not model generated, aggregated in a single document, referred to as an aggregated document. In response to a user command, the program is configured to receive the model-generated digital content and incorporate it into the aggregated document via the edit operation. The edit operation is recorded in a first entry in an edit history associated with the aggregated document. Further, in response to another user command to perform another edit operation, the program is configured to receive other digital content that is not model generated (i.e., non-model generated content) and incorporate it in the aggregated document. The edit operation incorporating the non-model generated content into the aggregated document is also recorded as a second entry in the edit history.
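As a minimal sketch of how an edit history for an aggregated document might be represented, the following Python example records one entry per edit operation. The class names, field names, and sample text are hypothetical illustrations and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class EditEntry:
    """One entry in the edit history (all field names are hypothetical)."""
    text: str
    user: str                      # user who performed the edit operation
    operation: str                 # e.g. "paste", "insert", "type"
    model: Optional[str] = None    # set only when the content is model-generated
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def model_generated(self) -> bool:
        return self.model is not None


@dataclass
class AggregatedDocument:
    """A document mixing model-generated and non-model-generated content."""
    history: List[EditEntry] = field(default_factory=list)

    def apply_edit(self, entry: EditEntry) -> None:
        # Incorporate the content by recording the operation in the edit history.
        self.history.append(entry)

    def text(self) -> str:
        # In this simplified sketch, the document is the concatenation of all entries.
        return "".join(e.text for e in self.history)


doc = AggregatedDocument()
doc.apply_edit(EditEntry("Generated summary. ", user="Bob", operation="paste", model="GPT-4"))
doc.apply_edit(EditEntry("Reviewed by hand.", user="Alice", operation="type"))
```

This sketch only appends content; a fuller model would also need to represent deletions and in-place edits, which the disclosure describes as part of each history entry.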

The program is further configured to determine, via a provenance determination module, that the model-generated digital content is model-generated and originates from the generative model based on the provenance metadata. Detailed techniques for making this determination are described below. Upon making this determination, the program is configured to output the model-generated digital content to a program graphical user interface with a visual indication that the model-generated digital content is model-generated. To keep track of the various edit operations made to the document, the program includes an attribution tracking module configured to attribute edit operations in an edit history to users who performed the edit operations. Each edit operation is listed as an entry in the edit history, and includes the text inserted or deleted, the user making the edit operation, the time of the edit operation, etc. The attribution tracking module can be configured to attribute authorship not only to users, but also to generative models. To accomplish this, an AI tracking setting can be set by a user to instruct the attribution tracking module to selectively track or not track model-generated content included in the document via edit operations, as described below.

In addition to model-generated digital content, the program is also configured to receive other digital content, which may be non-model generated content authored by a user, for example. Following receipt of the other digital content, the program is configured to determine, via the provenance determination module, that a textual portion of the other digital content is not model-generated. Upon making this determination, the program is configured to output the other digital content in a manner that is visually distinct from the displayed model-generated content. In the illustrated example, model-generated content from a GPT-4 model and other digital content from Alice are displayed in the program graphical user interface, and the use of differently labeled icons (GPT-4 vs. Alice) is the visually distinctive manner. Various other forms of visual distinction can be employed, such as different colors, fonts, highlighting, bounding boxes, underlining, side bars, etc., as discussed below.

The program graphical user interface includes a document region and a review region. The model-generated content and the non-model generated content are displayed in the document region. A model-generated icon, labeled with the name (GPT-4) of the generative model, is displayed in the review region adjacent the model-generated content in the document region. Further, a non-model generated icon, labeled with the user (Alice) who authored the non-model generated content, is displayed in the review region adjacent to the non-model generated content in the document region.

The program graphical user interface includes a selector configured to receive a user input to selectively enable and disable tracking of model-generated content. The selector is illustrated by way of example as a button with on and off positions, but may take various other forms. When the user clicks or presses the selector to activate tracking of model-generated content, the AI tracking setting of the attribution tracking module is set to YES, to enable tracking of model-generated content in the manner described above using the provenance determination module. If the setting is set to NO, then edit operations including model-generated digital content are included in the edit history without any indication that the content of the edit operation originated from the generative model, only that it originated from a user.
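The effect of the tracking setting on the edit history can be sketched as follows. This is an illustrative assumption about one possible behavior; the function name `record_edit` and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Optional


def record_edit(history: List[Dict[str, Optional[str]]], text: str, user: str,
                model: Optional[str], ai_tracking: bool) -> None:
    """Append an edit-history entry, honoring the AI tracking setting."""
    history.append({
        "text": text,
        "user": user,
        # With tracking set to NO, the entry is attributed to the user only.
        "model": model if ai_tracking else None,
    })


history: List[Dict[str, Optional[str]]] = []
record_edit(history, "Draft intro.", "Bob", "GPT-4", ai_tracking=True)
record_edit(history, "Draft outro.", "Bob", "GPT-4", ai_tracking=False)
```

With the setting enabled, the first entry carries both the user and the model; with it disabled, the second entry records only that the content originated from the user.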

As illustrated, the attribution tracking module can be configured to attribute the model-generated digital content to a user who performed the edit operation via which the model-generated digital content was received and incorporated into the aggregated document. For example, suppose that Bob is an authenticated user of the computing device, and Bob performed the edit operation (e.g., a copy/paste operation) and copied in the model-generated digital content from a GPT-4 model as the generative model. In such a case, a user icon is displayed adjacent the model icon, to indicate that Bob was the user who performed the edit operation that copied in the GPT-4 generated content.

The attribution tracking module can be configured with a model attribution threshold. Further, the attribution tracking module can be configured to remove a model generated attribution of the model-generated digital content upon determining that the model generated digital content has been edited to an extent that the model attribution threshold is no longer met. In the depicted example, Bob has extensively edited the model-generated digital content to be sufficiently dissimilar to the original model generated content as to have the model attribution removed. Thus, only Bob's name appears in the attribution icon, and the model icon is not displayed. Further, a corresponding entry in the edit history would be modified to remove the model attribution, and the text would henceforth be treated by the program as non-model generated digital content. The model attribution threshold defines a degree to which the model-generated digital content must retain its integrity to be classified as model-generated digital content. In one example, the model attribution threshold can be a number of edits. In another example, the model attribution threshold can be a percentage or density of model-generated words or characters retained following user edits as compared to the originally received model-generated content. In yet another example, the model attribution threshold can be a textual similarity score between the original model-generated digital content received by the program and the edited version of it after being edited by a user or users. As an alternative to utilization of a model attribution threshold, the attribution tracking module may be configured to retain the full edit history and indicate attribution not based on an attribution threshold as described above, but rather directly based on the information in the full edit history. With this approach, a user can see the exact human user or model source for a particular piece of content, even after complex editing has occurred. Exact attribution based on the full edit history can be indicated even for short snippets of text, for example on a character by character basis if desired.
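One of the threshold variants described above, a textual similarity score, can be sketched with Python's standard `difflib` module. The use of `SequenceMatcher`, the function names, and the threshold value of 0.6 are all assumptions chosen for illustration; the disclosure does not specify a particular similarity measure or value.

```python
from difflib import SequenceMatcher


def similarity(original: str, edited: str) -> float:
    """Textual similarity in [0, 1] between original and edited text."""
    return SequenceMatcher(None, original, edited).ratio()


def keeps_model_attribution(original: str, edited: str, threshold: float = 0.6) -> bool:
    """True while the edited text still meets the (hypothetical) threshold."""
    return similarity(original, edited) >= threshold


original = "The quick brown fox jumps over the lazy dog."
light_edit = "The quick brown fox leaps over the lazy dog."

# A light edit leaves most of the original intact, so attribution is retained.
still_attributed = keeps_model_attribution(original, light_edit)
```

A count of edits or a retained-character percentage, the other two variants mentioned, could be substituted for `similarity` without changing the surrounding logic.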

Techniques for determining the provenance of the model generated content will now be described in greater detail. Devices and methods are discussed below that enable tracking of the provenance of model-generated content (e.g., generative LLM character strings and multi-modal content including such strings) using, for example, Unicode or similar encoding schemes to identify which characters in digital content are model-generated. In one example, each character is encoded so as to effectively create a mirror of a commonly used Unicode code point with an AI-specific tag or encoding that indicates a model-generated character. Leveraging this encoding, an application program can identify AI-generated content via the encodings yet continue to render and handle the content in a manner that is compatible with and does not interfere with character rendering by existing software programs. Notably, this encoding method does not add additional characters, such as markup tags, to the model-generated text. This has the technical advantage of improved interoperability with existing programs not configured to recognize such tags, and also prevents interference that such tags may cause when used as training data text for other AI models. The identification of model-generated content in the described manner allows for filtering of AI content, identification of AI-generated content in non-AI contexts and intermixed with user-generated/non-AI-generated content, and verification processes for content of unknown origin. In addition or as an alternative, a manifest-based approach to encoding attestations of model provenance in a digitally signed and encrypted manifest file may be used to represent and store the provenance metadata.
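The idea of a "mirror" code point can be illustrated with a small Python sketch. The sketch assumes, purely for illustration, that each Basic Multilingual Plane character is shifted by a fixed offset into Supplementary Private Use Area-A (Unicode plane 15); the disclosure does not name a specific plane or offset, and the function names are hypothetical. A private-use range would not by itself render correctly in existing software, so the rendering compatibility described above would depend on details that are not shown here.

```python
AI_PLANE_OFFSET = 0xF0000  # hypothetical: start of Supplementary Private Use Area-A


def tag_model_generated(text: str) -> str:
    """Map each BMP character to a mirror code point marking it as model-generated."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("this sketch only mirrors Basic Multilingual Plane characters")
        out.append(chr(cp + AI_PLANE_OFFSET))
    return "".join(out)


def is_model_generated(ch: str) -> bool:
    """True if the character lies in the hypothetical mirror range."""
    return AI_PLANE_OFFSET <= ord(ch) <= AI_PLANE_OFFSET + 0xFFFF


def strip_tags(text: str) -> str:
    """Recover the ordinary characters, leaving untagged characters unchanged."""
    return "".join(
        chr(ord(c) - AI_PLANE_OFFSET) if is_model_generated(c) else c for c in text
    )


# User-typed prefix followed by a tagged, model-generated string.
mixed = "Hi, " + tag_model_generated("Hello world")
```

Because each character maps to exactly one mirror code point, the tagged string has the same number of characters as the original, which matches the stated goal of not adding markup characters to the text.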

A detailed view of the computing system is schematically shown, including a client computing device and a server computing device configured to communicate with the client computing device via a communication network. In some configurations explained in more detail below, the computing system may include an additional server computing device connected to the network and configured to act as an interface between the server computing device and the client computing device.

The computing system features an implementation of the provenance determination module on the client computing device for receiving model-generated digital content and provenance metadata associated with the digital content from the server computing device, according to an exemplary embodiment. In other configurations, the provenance determination module can be executed on the server computing device, or other computing device.

The detailed view also shows that the computing system comprises processing circuitry and associated memory as described above, the processing circuitry being configured to implement the provenance determination module, to receive digital content and provenance metadata associated with the digital content. The content may be received, for example, via a first pathway, corresponding to model-generated digital content in a document received by the program from a third party computing device, or via a second pathway, corresponding to model-generated digital content received first by a model interface graphical user interface (GUI) from a model interface server or generative model server and then by the program, or via a third pathway, corresponding to model-generated digital content received by the program via a copilot module of the program. Other pathways for receipt of the digital content by the program may also be provided.

The processing circuitry may also be configured to determine, on a per-character basis, whether each of a plurality of textual characters in the digital content is model-generated based on the provenance metadata, and output the digital content with indications (e.g., character encoding with an AI indicator) of the textual characters that are model-generated. As shown, the processing circuitry and memory may be incorporated in a client computing device, and the provenance metadata may be generated by a provenance metadata generation module executed on a server computing device from which the digital content was received. Also as shown, the server computing device may comprise a generative model server, or the server computing device may comprise a model interface server configured to execute a model interface, the model interface being configured to interface with the generative model on the generative model server.
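A per-character determination naturally yields one flag per character, which can then be grouped into contiguous runs for output. The following sketch assumes the provenance metadata has already been decoded into a list of booleans; that representation, the function name, and the sample text are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def model_generated_runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for runs of model-generated characters."""
    runs = []
    index = 0
    for is_model, group in groupby(flags):
        length = len(list(group))
        if is_model:
            runs.append((index, index + length))
        index += length
    return runs


text = "Alice wrote this. GPT-4 wrote this."
# Hypothetical per-character provenance metadata: one flag per character.
flags = [False] * 18 + [True] * 17
runs = model_generated_runs(flags)
```

Each run can be sliced out of the text (for example `text[start:end]`) and handed to whatever component applies the indication.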

For example, in the first pathway, the document may be an external document that includes digital content along with provenance metadata (for AI-generated characters in the document). The document may be saved in one of a variety of formats, including a word processing format, email or message format, a spreadsheet format, a database file format, a drawing format, a presentation format, a notepad format, etc., and the provenance metadata may be included in the file format. The document may be received by the program, which may be, for example, a productivity program such as a word processor program, email or messaging program, spreadsheet program, database program, drawing program, presentation generation program, notepad program, etc. The program may receive the document by opening the document, converting the document via an insertion tool, via a copy/paste (or cut/paste) operation carried out via a system clipboard, or some other transfer of the document information from another program. The document may be sent by the third party computing device over the computing network to the client computing device, and stored in the memory of the client computing device prior to being received by the program.
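Where the provenance metadata travels with a document in a manifest, integrity checking might look roughly like the following sketch. It uses an HMAC from the Python standard library as a stand-in: an HMAC is a symmetric keyed hash rather than a public-key digital signature, and encryption of the manifest is omitted entirely. The field names, values, and key are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict


def sign_manifest(manifest: Dict[str, str], key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: Dict[str, str], tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_manifest(manifest, key), tag)


key = b"demo-key-not-for-production"
manifest = {
    "model_version": "example-model-1",      # hypothetical values throughout
    "created": "2025-01-01T00:00:00Z",
    "prompt": "Summarize the meeting notes.",
    "license": "example-license",
}
tag = sign_manifest(manifest, key)
```

Serializing with sorted keys keeps the signed bytes stable regardless of dictionary ordering, so a receiving program can recompute the same tag before trusting the manifest contents.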

As illustrated, the first pathway for receiving the digital content with provenance metadata includes the third party computing device receiving the model-generated digital content with associated provenance metadata via a generative model server or a model interface server, either of which may have received provenance metadata from a provenance metadata generation module. Also, as shown, a provenance determination module may receive the model-generated digital content with associated provenance metadata via the document, determine, for example on a per-character basis, whether each of a plurality of textual characters in the digital content is model-generated based on the provenance metadata, and output the digital content with indications (e.g., character encoding with an AI indicator) of the textual characters that are model-generated.

In a second pathway, a model interface GUI executed on a client computing device may be in communication with the generative model server directly or via a model interface server, to receive model-generated digital content with associated provenance metadata, in response to one or more prompts communicated from the model interface GUI to the model interface server or generative model server. In the illustrated example, one or more prompts inputted by a user into a turn-based chatbot interface are sent directly or indirectly via the model interface server to the generative model. The prompts, when inputted into the generative model executed on the server computing device, cause the generative model to output model-generated digital content. The model-generated digital content is returned to the client computing device and displayed at the model interface GUI in a dialogue box. A provenance metadata generation module is configured to generate provenance metadata, which is also sent from either server computing device to the client computing device. As shown, a provenance determination module may receive the model-generated digital content with associated provenance metadata via the dialogue box (or via a clipboard or cut/paste or copy/paste function from the model interface GUI), determine, for example on a per-character basis, whether each of the plurality of textual characters in the digital content copied from the model interface GUI is model-generated based on the provenance metadata, and output, via a paste function to the destination program, the digital content with indications of the textual characters that are model-generated, examples of which are described below.

In a third pathway, a copilot AI module (e.g., configured as a chatbot) of the program executed on the client computing device may be in communication with the generative model server either directly or indirectly via the model interface server, to receive model-generated digital content with associated provenance metadata. As shown, the provenance determination module may receive the model-generated digital content with associated provenance metadata from the copilot AI module of the program. Similar to the process described for the first and second pathways above, the provenance determination module is configured upon receipt from the copilot module to determine, for example on a per-character basis, whether each of a plurality of textual characters in the digital content is model-generated based on the provenance metadata, and output the digital content with indications of the textual characters that are model-generated. For example, as shown, the provenance determination module may provide the model-generated digital content for display with a visual provenance indication on a document pane of the program, or at another location such as in the pane of the copilot module.

Other pathways may be used, in addition to those discussed above or illustrated in the drawings. As one example, the provenance metadata generation module and the provenance determination module may be executed on a same device. In other examples, the model interface and the generative model may be executed on a same server computing device. Further, it will be appreciated that the form of the client computing device is not limited, and thus the pathways may traverse smartphone or tablet computing devices as the client computing device, in addition to desktop, laptop, or other computing devices. The provenance determination module may be implemented as a portion of the program, a service of an operating system of the client computing device, a web service, a portion of the browser implementing the model interface GUI, a portion of the copilot module, etc.

Continuing with the illustrated example, the provenance determination module may further be configured to output the digital content with indications of the textual characters that are model-generated, at least in part by formatting the digital content for display using the provenance metadata, with the formatted digital content including a visual provenance indication labeling the model-generated portion of the digital content, and outputting the formatted digital content including the visual provenance indication to a display. The visual provenance indication is shown schematically as a star, but it will be appreciated that the actual graphical appearance of the indication is not so limited. As a few examples, the indication may include forms of emphasis such as highlighting, underlining, sidebars, comment bubbles, bold, italic, redlining (or other font color), text box, font size, font, etc., of the content determined to be model-generated. Typically, these indications are displayed on a per-character basis, to visually indicate which characters are model-generated. For example, “Hello world\0” may be a model-generated string. In one embodiment, each of the characters spelling out the words HELLO WORLD, including the space between the two words, may include encoding that, when interpreted by a destination program capable of receiving provenance-tracked content, permits rendering the characters to appear highlighted and/or underlined, along with a visual message that pops up indicating that the highlighted and/or underlined characters are model-generated.
In another example, a user (such as a user of the provenance-tracking capable program) may add other content such that the string becomes “Hello world, my name is Bob.\0”, which is a combination of model-generated content (i.e., the characters in the string “Hello world\0”) and human/user-generated content (i.e., the characters in the string “, my name is Bob.\0”). In one example, the model-generated content may be highlighted (with a pop-up indication that the characters are model-generated) and the user-generated content may be underlined.
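
The mixed “Hello world, my name is Bob.” string can be pictured as one provenance flag per character, grouped into runs before display. The following is a minimal sketch under that assumption; the function names `provenance_runs` and `render_plain`, and the use of double square brackets as a stand-in for highlighting, are illustrative and not taken from the disclosure.

```python
from itertools import groupby

def provenance_runs(text, flags):
    """Split text into (is_model_generated, substring) runs of equal provenance."""
    if len(text) != len(flags):
        raise ValueError("need exactly one provenance flag per character")
    runs, pos = [], 0
    for flag, group in groupby(flags):
        length = sum(1 for _ in group)  # number of consecutive characters sharing this flag
        runs.append((flag, text[pos:pos + length]))
        pos += length
    return runs

def render_plain(runs):
    """Stand-in for GUI formatting: [[...]] marks model-generated runs."""
    return "".join(f"[[{s}]]" if is_model else s for is_model, s in runs)

model_part = "Hello world"        # model-generated characters
user_part = ", my name is Bob."   # characters typed by the user
text = model_part + user_part
flags = [True] * len(model_part) + [False] * len(user_part)

runs = provenance_runs(text, flags)
rendered = render_plain(runs)
```

A real destination program would map each run to its own formatting (for example, highlight versus underline) rather than to bracket characters.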

The provenance determination module may be configured to receive the digital content from any of the following: a copilot module provided in a productivity application, a browser, a social media application, or a game program executed by the processing circuitry; an instance of a generative model associated with a model interface GUI displayed by the processing circuitry; a model interface associated with the model interface GUI displayed by the processing circuitry; a clipboard program executed by the processing circuitry; or a document.

The drawings include a detailed schematic illustration of an example implementation of the generative model. As shown, input may be transmitted from the client computing device or a third party device to the server computing device executing the generative model. The input is typically a prompt such as the prompt described above. The generative model may include a generative pre-trained transformer (GPT), which, upon receiving the input, produces as output the model-generated digital content. It will be appreciated that the generative model can be an LLM having tens of millions to billions of parameters, non-limiting examples of which include GPT-3, BLOOM, and LLaMa-2. The generative model can be a multi-modal generative model configured to receive multi-modal input including natural language text input as a first mode of input and image, video, or audio as a second mode of input, and generate output including natural language text based on the multi-modal input. The output of the multi-modal model may additionally include a second mode of output such as image, video, or audio output. Non-limiting examples of multi-modal generative models include Kosmos-2 and GPT-4 VISUAL. Further, the generative pre-trained transformer can be, for example, the GPT-3, GPT-3.5, or GPT-4 model. Further, the provenance information may indicate not only the general type of model (e.g., GPT-3.5) but also the specific model instance and creation date (e.g., gpt-3.5-turbo-0125 Feb. 1, 2024).

In the depicted example, the input reads, “Summarize the size, location and economy of Istanbul.” Further, the model-generated output reads, “Istanbul is the largest city in Türkiye, straddling the Bosporus Strait and spanning both Europe and Asia . . . ” The model-generated output may be received and processed by the provenance metadata generation module, to thereby add provenance metadata to the output. The provenance metadata generation module may be implemented on the generative model server or, as shown in dashed lines, as part of the model interface executed on the model interface server.

The output of the provenance metadata generation module may include provenance metadata comprising header metadata that is detectable by the provenance determination module of the computing system. Alternatively, metadata of another form may be used. The provenance determination module may be configured to make the provenance determination by detecting header metadata of the digital content indicating AI generation, on a per-character basis. In the depicted example, the exemplary header metadata with AI indicator includes a character map indicating that characters 0-104 are model-generated (and that none of the output characters are non-AI-generated). Although not shown, the header metadata may include an indication that one or more characters are non-AI-generated and/or of unknown origin, where the provenance metadata generation module has determined that the provenance of specific characters indicates they are not AI-generated or that their origin is unknown.
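
As an illustration of how a character map such as “characters 0-104 are model-generated” could be expanded into per-character flags, consider the sketch below. The header layout (a dictionary with an `ai_generated_ranges` key holding inclusive index ranges) is an assumption made for this example; the disclosure does not specify a serialization.

```python
def flags_from_character_map(text_length, model_ranges):
    """Expand inclusive (start, end) index ranges into one boolean flag per character."""
    flags = [False] * text_length
    for start, end in model_ranges:
        if start < 0 or end < start:
            raise ValueError(f"invalid range: {start}-{end}")
        # Clamp the range to the text so an oversized map cannot index past the end.
        for i in range(start, min(end, text_length - 1) + 1):
            flags[i] = True
    return flags

# Hypothetical header: every character of a 105-character output is model-generated.
header = {"ai_generated_ranges": [(0, 104)]}
flags = flags_from_character_map(105, header["ai_generated_ranges"])

# A partially model-generated string: only characters 2 through 4 are flagged.
partial = flags_from_character_map(10, [(2, 4)])
```

Characters of unknown origin would need a third state; a tri-state enum in place of the boolean is one way to extend the sketch.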

In one example configuration, the provenance metadata (such as the exemplary header metadata with AI indicator) includes a creation date, a model version identifier, and/or the prompt used to generate the portion of the digital content. As shown, the exemplary header metadata with AI indicator includes a creation date of “2024-01-15”, a model version identifier specifying the model version used as “GENERICMODEL-1.2.4”, and the prompt used as “Summarize the size, location and economy of Istanbul.” In another embodiment, the provenance metadata may include model version, model history, and/or prompts or prompt-related context used to generate the portions of the digital content generated by a generative model, so that the model-generated content may be regenerated or updated. As shown, the exemplary header metadata with AI indicator includes a model history having three different model versions (i.e., “1.2.4”, “1.2.3”, and “1.2.2”), with each model version having an associated URI (i.e., “URI1”, “URI2”, and “URI3”, respectively). In another example, the provenance metadata comprising the model versions and/or associated URI for each model version, along with the prompt used, enables regeneration of the model-generated content (for example, to update data, statistics, etc. at a later date).
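
The header fields described above can be modeled as a small record. The sketch below uses the example values from the text; the class names, field names, and the choice of JSON as a serialization are assumptions for illustration only.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ModelVersion:
    version: str  # e.g. "1.2.4"
    uri: str      # placeholder URI for that model version, e.g. "URI1"

@dataclass
class ProvenanceHeader:
    creation_date: str
    model_version: str
    prompt: str
    model_history: list = field(default_factory=list)  # list of ModelVersion entries

header = ProvenanceHeader(
    creation_date="2024-01-15",
    model_version="GENERICMODEL-1.2.4",
    prompt="Summarize the size, location and economy of Istanbul.",
    model_history=[
        ModelVersion("1.2.4", "URI1"),
        ModelVersion("1.2.3", "URI2"),
        ModelVersion("1.2.2", "URI3"),
    ],
)

# asdict() recurses into the nested ModelVersion entries, so the result is JSON-ready.
serialized = json.dumps(asdict(header))
restored = json.loads(serialized)
```

Keeping the prompt and the version history together in one record is what makes later regeneration possible without any state outside the document.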

The provenance metadata may be encrypted and/or digitally signed and may further include license information. For example, the exemplary header metadata with AI indicator includes example licensing information (i.e., “LICENSING INFO: SAFELICENSE 1.0”) and a reference to a digital signature. Additionally, the provenance determination module may read attestation data or digital signature data, e.g., included in metadata such as the header metadata, and the output comprising the digital content and provenance metadata may be output to a destination program. The provenance determination module may be configured such that outputting includes providing the provenance metadata and the received digital content, with the identified portions of the digital content that were generated by the generative model, to a program that is capable of receiving provenance-tracked content.
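
The disclosure does not name a signature scheme. As a rough sketch of tamper detection on the metadata, the example below uses an HMAC-SHA256 tag, which is a keyed message authentication code standing in for a true public-key digital signature; the key, the field names, and the canonical-JSON step are all assumptions.

```python
import hashlib
import hmac
import json

def sign_metadata(metadata, key):
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the metadata."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_metadata(metadata, tag, key):
    """Check the tag in constant time; False means the metadata or tag was altered."""
    return hmac.compare_digest(sign_metadata(metadata, key), tag)

key = b"hypothetical-shared-secret"  # illustration only; not a real key-management scheme
metadata = {
    "creation_date": "2024-01-15",
    "model_version": "GENERICMODEL-1.2.4",
    "licensing_info": "SAFELICENSE 1.0",
}
tag = sign_metadata(metadata, key)

tampered = dict(metadata, model_version="GENERICMODEL-9.9.9")
original_ok = verify_metadata(metadata, tag, key)
tampered_ok = verify_metadata(tampered, tag, key)
```

A deployment that needs third parties to verify provenance without sharing a secret would use an asymmetric signature instead.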

Also as depicted, the output of the provenance metadata generation module may comprise provenance metadata that includes a character encoding that encodes the model-generated indication on a per-character basis. The provenance metadata may include character encoding with an AI indicator. The provenance metadata may include surrogate character encoding or Unicode code plane encoding indicating a model-generated character. For example, the AI indicator may include a Unicode encoding with one or more code plane or code page bits flipped (or set) to indicate that the particular character is model-generated, or the AI indicator may include one or more bits in a Unicode surrogate pair (as discussed below in greater detail). The provenance determination module may be configured to make the provenance determination by detecting the character encoding indicative of text originating from a generative model, on a per-character basis. As an alternative to a per-character basis, two or more surrounding surrogates in a single string could be utilized to indicate model-generated content.

One example shows an exemplary browser/application program interaction with the program. In this example, the browser/application program is displaying the model interface GUI and serves as a source program for model-generated content, and the program serves as a destination program capable of receiving provenance-tracked content via an edit operation. As shown, the browser/application program may display, as the model interface GUI, a chat-enabled search platform, which is used to provide the input in the form of a prompt. Further, the browser/application program is configured to receive output in the form of model-generated content and associated provenance metadata from the generative model and provenance metadata generation module. The model interface GUI displays the prompt and the model-generated digital content in the chat interface, while the associated provenance metadata is typically not displayed. Alternatively, in another illustrated example, a selector labeled “View AI Provenance” can be provided in the model interface GUI, to enable a user to selectively display the provenance metadata. Returning to the present example, the chat-enabled search platform may permit copying, via an edit operation using a clipboard or other utility executed on the client computing device, of portions of the model-generated digital content into the program. Following the edit operation, when displayed in the program GUI of the program, the portions of model-generated digital content include formatted digital content including a visual provenance indication labeling the model-generated digital content.

As shown, the program is configured to display a plurality of selectable options associated with the displayed visual indication of the model-generated content. For example, in the depicted example, a drop-down menu is presented that contains a regeneration option, by which a user can select to regenerate the model-generated content, and the view AI provenance selector mentioned above. The regeneration option displays a model history with selectable model version entries populated from metadata such as the header metadata. The user can select a particular model version from the history to cause the original prompt in the header metadata to be transmitted to the selected model version, to regenerate the model-generated content using the selected model version.

In the illustrated example, the AI-generated output comprises, in part, a portion comprising the text “Istanbul is the largest city in Türkiye, straddling the Bosporus Strait and spanning both Europe and Asia. As of 2023, it has a population of over 15 million people.” The method of encoding each character, e.g., via code page Unicode encoding on a per-character basis, allows for new human modifications, additions, or cut and paste to add to and modify the AI-generated content within the program, without losing the provenance metadata associated with the copied or modified content, unless the model attribution threshold discussed above is met. As shown in the program, the copied AI content may be included in, for example, a report, with user/human-generated modifications which are distinguished visually from the AI-generated content. As shown, the copied AI-generated content is displayed highlighted and/or underlined, whereas the human modifications are not highlighted or underlined as being AI-generated. Additional visual indicators of AI-generated content (such as visual provenance indication labeling) are displayed, along with the regeneration option. The regeneration option is illustrated as a drop-down menu, but may take other forms such as a pop-up or static menu. Selecting the regeneration option with the most recent model version (e.g., “1.2.4. [URI1]”) in the model history causes the program to resend the input to the generative model version 1.2.4, to obtain an AI-generated update. In the depicted example, the regenerated content reads, “Istanbul is the largest city in Türkiye, spanning both Europe and Asia across the Bosporus Strait. As of 2024, it has a population of over 16 million people.” As can be seen, the original model-generated content is rephrased slightly and updated to include the latest population figures contained in the selected model.
In this way, a regeneration option provides the user of the program with a way to not only track the provenance of the AI-generated content, but also update and revise the AI-generated content via regeneration using the selected model version from the model history tracked in the provenance metadata associated with the model-generated digital content.
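
The regeneration flow amounts to looking up the selected version in the model history and pairing its URI with the original prompt. A minimal sketch follows; the dictionary layout and the function name `build_regeneration_request` are hypothetical, and no network call is made because the disclosure does not describe a transport.

```python
def build_regeneration_request(header, selected_version):
    """Pair the stored prompt with the URI of the model version the user selected."""
    for entry in header["model_history"]:
        if entry["version"] == selected_version:
            return {"uri": entry["uri"], "prompt": header["prompt"]}
    raise KeyError(f"model version {selected_version!r} not found in model history")

# Hypothetical header metadata using the example values from the text.
header = {
    "prompt": "Summarize the size, location and economy of Istanbul.",
    "model_history": [
        {"version": "1.2.4", "uri": "URI1"},
        {"version": "1.2.3", "uri": "URI2"},
        {"version": "1.2.2", "uri": "URI3"},
    ],
}

request = build_regeneration_request(header, "1.2.4")
```

The returned dictionary is what a program would hand to whatever client it uses to reach the selected model version.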

The drawings illustrate character encoding examples for encoding a capital letter ‘A’. A first example shows a UCS (i.e., Unicode) encoding of a capital letter ‘A’ as “U+0x00041”, wherein there is no encoding indicating AI-generated content.

A second example sets a bit designating the code plane to indicate that the character is AI-generated. In particular, the encoding is “U+0x80041”, where the “8” designates the code plane number (i.e., code plane number eight). Each of the five digits (comprising “80041”) is a hexadecimal value, which may be represented by four bits. Setting bit number four from a zero to a one (to create the binary number 1000, or the hex value eight) requires flipping just one bit, shown here as the most significant bit or bit number four. As discussed further below, an encoding scheme that requires changing just one bit to indicate an AI-generated character has advantages. For example, a bit mask or bitwise OR operator may be used without disturbing other bits in the encoding for the character. In the example shown, bit four is set from a zero to a one (via a provenance metadata generation module), resulting in a hex value of eight in the code plane digit of the encoding. In one embodiment, when a provenance determination module detects the flipped bit in the code plane encoding digit, the provenance determination module interprets the character to be model-generated and sets the flipped bit off (or back to zero) to interpret the character as a normal UCS (Unicode) encoding, as in the first example. This may also be described as mirroring by preserving the earlier code plane. That is, in the example shown, the prior code plane is preserved by setting an AI bit mask (bitwise OR operator) to flip bit number four (from zero to one) to indicate an AI-generated character, and the character is interpreted using the prior code plane (i.e., the normal UCS encoding without the flipped bit number four).
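
The code plane tagging described above reduces to one bitwise OR on the code point and one mask to undo it. The sketch below assumes, for simplicity, that only Basic Multilingual Plane characters are tagged, so a tagged character always lands in plane eight; the constant and function names are illustrative.

```python
AI_PLANE_BIT = 0x80000  # sets the high bit of the plane digit: plane 0 becomes plane 8

def tag_char(ch):
    """Mark one BMP character as model-generated by setting the plane bit."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("this sketch only tags Basic Multilingual Plane characters")
    return chr(cp | AI_PLANE_BIT)

def is_tagged(ch):
    """True when the character sits in code plane eight (assumed reserved for tagging)."""
    return ord(ch) >> 16 == 8

def split_provenance(text):
    """Return (plain_text, flags): clear the tag bit and record which characters had it."""
    plain, flags = [], []
    for ch in text:
        tagged = is_tagged(ch)
        plain.append(chr(ord(ch) & ~AI_PLANE_BIT) if tagged else ch)
        flags.append(tagged)
    return "".join(plain), flags

tagged_a = tag_char("A")                             # U+0041 becomes U+80041
mixed = "".join(tag_char(c) for c in "Hi") + "!"     # "Hi" is model-generated, "!" is not
plain, flags = split_provenance(mixed)
```

Because the OR touches only one bit, the original code point is recovered exactly by masking that bit off, which is the mirroring property the text describes.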

A third example sets a code page bit to indicate that the character (i.e., capital letter ‘A’) is AI-generated. In particular, the encoding (with the indication that the character is model-generated) is “U+0x08041”. In one embodiment, when a provenance determination module detects the flipped bit in the encoding, the provenance determination module interprets the character to be model-generated and sets the flipped bit off (or back to zero) to interpret the character in view of the UCS (Unicode) encoding without the flipped bit. This may also be described as mirroring by preserving the earlier code page. In the example shown, the prior code page is preserved by setting an AI bit mask (bitwise OR operator) to flip a single bit (from zero to one) to indicate an AI-generated character, and the character is interpreted using the prior code page (i.e., without the flipped bit).
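
The code page variant can be sketched the same way, with the mask moved to the high bit of the 16-bit value so that ‘A’ (0x0041) becomes 0x8041. This sketch is restricted to ASCII input and treats only the range 0x8000 to 0x807F as tagged; both restrictions are assumptions made here, since code points in that range are otherwise assigned in Unicode and a real scheme would need a way to avoid misreading them.

```python
AI_PAGE_BIT = 0x8000  # high bit of the 16-bit code page value

def tag_ascii(text):
    """Mark every character of an ASCII string as model-generated."""
    if not text.isascii():
        raise ValueError("this sketch only tags ASCII characters")
    return "".join(chr(ord(c) | AI_PAGE_BIT) for c in text)

def read_tagged(text):
    """Return (plain_text, flags), clearing the page bit on tagged characters."""
    plain, flags = [], []
    for c in text:
        cp = ord(c)
        tagged = 0x8000 <= cp <= 0x807F  # tagged ASCII range assumed by this sketch
        plain.append(chr(cp & ~AI_PAGE_BIT) if tagged else c)
        flags.append(tagged)
    return "".join(plain), flags

tagged_a = tag_ascii("A")
plain, flags = read_tagged(tag_ascii("A") + "b")
```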

A single-byte (eight-bit) encoding scheme allows values from 0-255 (i.e., 256 values for an unsigned byte) to be represented. Multi-byte character set (MBCS) encoding schemes are used to support characters beyond the conventional western character sets (i.e., ASCII and Extended ASCII) and to support global communication. Unicode and the UTF encodings of Unicode may be used for character-by-character encoding. UTF-8 is a common MBCS scheme used on the Internet where only single-byte (ASCII/ANSI) values are conventionally allowed, such as in query strings.
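
To make the multi-byte point concrete, the short sketch below reports how many bytes UTF-8 needs for a given code point, including the tagged values used in the encoding examples. The helper name `utf8_length` is illustrative.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 uses to encode the given Unicode code point."""
    return len(chr(code_point).encode("utf-8"))

ascii_len = utf8_length(0x41)      # 'A', a single-byte ASCII value
latin_len = utf8_length(0xE9)      # 'é', outside ASCII
page_len = utf8_length(0x8041)     # 'A' with the code page bit set
plane_len = utf8_length(0x80041)   # 'A' with the code plane bit set
```

Under UTF-8, an untagged ‘A’ occupies one byte, the code page variant three bytes, and the code plane variant four bytes, so per-character tagging increases the storage size of tagged text.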

A further drawing illustrates an allocation of Unicode code points and a relationship between code pages and code planes. A significant portion of Unicode 2-byte code pages are unused. Unicode extends the encoding of a character to two (or more) byte values to allow encoding of an entire “page” of characters, known as the code page or code point system, allowing for 65,536 encodings of characters in the 2-byte modality. Unicode further extends the code page definitions with code blocks, with defined code planes. As a result, Unicode includes several code pages for “private” or “user-defined” usage. Similarly, fewer than half of the code planes are in use. For example, as shown, code planes 4-13 are unused. Accordingly, encoding AI-generated characters as described with respect to the encoding examples above allows for use of Unicode code points which are presently unallocated/unused.

For text-based content, the system enables mapping or altering code pages and/or code planes on active content strings to provide provenance metadata (or provenance attribution) per character to the string. As mentioned above with respect to the encoding examples, the alteration can be a single bit change in unused code page or code plane bits, each requiring a different encoding implementation, to “tag” each AI-generated character in a fashion that is nearly invisible to the user. In other embodiments, a set of bits may be used, similarly set using an AI bit mask (bitwise OR operator) without disturbing other bits. In one embodiment, for example, a combination of code plane and code page mapping or altering may be used.

A further drawing illustrates an allocation of high and low surrogate code points within the Basic Multilingual Plane (BMP) of the Unicode allocation. The provenance metadata may include surrogate character encoding or Unicode code plane encoding (such as the character encoding with AI indicator described above) indicating a model-generated character. As shown, Unicode code points have integer values that can range from 0 to U+10FFFF (decimal 1,114,111). Some code points are assigned to letters, symbols, or emoji. Others are assigned to actions that control how text or characters are displayed, such as advancing to a new line. Many code points are not yet assigned. The Basic Multilingual Plane (BMP) includes code points in the range U+0000 to U+FFFF and includes high (leading) surrogate code points and low (trailing) surrogate code points. When a high surrogate code point (U+D800 to U+DBFF) is immediately followed by a low surrogate code point (U+DC00 to U+DFFF), the pair is interpreted as a supplementary code point by using the calculation: code point = 0x10000 + ((high surrogate code point - 0xD800) * 0x0400) + (low surrogate code point - 0xDC00). However, if an encoding includes just a single surrogate code point, the unpaired surrogate code point may be unrecognized and ignored. Accordingly, in one embodiment, the provenance metadata generation module may include character encoding with an AI indicator that includes a single unpaired surrogate code point, to indicate the character is AI- or model-generated, and to cause a provenance determination module to identify the character as being AI-generated and to interpret the character based on the Unicode encoding without the unpaired surrogate code point.
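
The surrogate pair calculation and the unpaired surrogate idea can both be sketched in a few lines. The first function implements the stated formula directly. The marker scheme after it is only one possible reading: it assumes the unpaired surrogate U+D800 is placed immediately before each model-generated character, which the disclosure does not specify.

```python
def combine_surrogates(high, low):
    """Apply the UTF-16 formula to a high/low surrogate pair."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("high surrogate out of range")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("low surrogate out of range")
    return 0x10000 + ((high - 0xD800) * 0x0400) + (low - 0xDC00)

AI_MARKER = "\ud800"  # assumed marker: a lone high surrogate preceding a tagged character

def mark(text):
    """Prefix every character with the unpaired surrogate marker."""
    return "".join(AI_MARKER + c for c in text)

def unmark(text):
    """Strip markers and return (plain_text, flags) with one flag per remaining character."""
    plain, flags, pending = [], [], False
    for c in text:
        if c == AI_MARKER:
            pending = True  # the next real character is model-generated
            continue
        plain.append(c)
        flags.append(pending)
        pending = False
    return "".join(plain), flags

code_point = combine_surrogates(0xD83D, 0xDE00)
plain, flags = unmark(mark("Hi") + "!")
```

Python strings may hold lone surrogates in memory, but they cannot be encoded to strict UTF-8, so a real implementation would need a transport that tolerates them.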

A flow diagram shows a computerized method according to one embodiment of the present disclosure. The method may be implemented using the hardware and software components described above, or with other suitable hardware and software components. The method includes, at one step, receiving at a program executed on processing circuitry, via an edit operation, digital content and provenance metadata associated with the digital content. At another step, the method includes determining, via a provenance determination module, that a textual portion of the digital content is model-generated and originated from a generative model, based on the provenance metadata. At another step, the method includes outputting the digital content to a graphical user interface with a visual indication that the textual portion of the digital content is model-generated. As shown, the method can include displaying the graphical user interface to include a selector configured to receive a user input to selectively enable and disable tracking of model-generated content.

At another step, the method includes receiving other digital content. At another step, the method includes determining, via the provenance determination module, that a textual portion of the other digital content is not model-generated. At another step, the method includes outputting the other digital content in a manner visually distinct from the displayed model-generated content. As shown, the model-generated digital content and non-model-generated other digital content can be incorporated in an aggregated document. Further, as shown, the graphical user interface can include a document region and a review region, the aggregated document (with the model-generated content and the non-model-generated content) can be displayed in the document region, and a model-generated icon can be displayed in the review region adjacent the model-generated content. Further, a non-model-generated icon can be displayed in the review region adjacent the non-model-generated content.

At another step, the method includes configuring the program with an attribution tracking module to attribute edit operations in an edit history to users who performed the edit operations, and to attribute the model-generated content to a user who performed the edit operation via which the model-generated content was received and incorporated into the aggregated document. At another step, the method includes configuring the program to include an attribution tracking module with a model attribution threshold, the attribution tracking module being configured to remove a model-generated attribution of the model-generated content upon determining that the model-generated content has been edited to an extent that the model attribution threshold is no longer met. Further, as discussed in detail in the method described below, the determining step can include determining, on a per-character basis, whether each of a plurality of textual characters in the digital content is model-generated. Other features and steps of the other methods described herein can be combined with the steps of this method, as desired.
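
One way to picture the model attribution threshold is as a minimum fraction of characters in a passage that must still be model-generated. That interpretation, the default value of 0.5, and the function names below are assumptions for illustration; this passage does not define how the threshold is measured.

```python
def model_fraction(flags):
    """Fraction of characters in a passage that are still flagged model-generated."""
    return sum(flags) / len(flags) if flags else 0.0

def keep_model_attribution(flags, threshold=0.5):
    """Keep the model-generated attribution only while the fraction meets the threshold."""
    return model_fraction(flags) >= threshold

# 11 model-generated characters followed by 17 user-typed characters.
lightly_edited = [True] * 11 + [False] * 2
heavily_edited = [True] * 11 + [False] * 17

light_kept = keep_model_attribution(lightly_edited)
heavy_kept = keep_model_attribution(heavily_edited)
```

Other measures, such as edit distance from the originally generated text, would fit the same interface.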

Further drawings show a flow diagram for an example method for receiving digital content and provenance metadata associated with the digital content, and outputting the digital content with indications of the textual characters that are model-generated, according to embodiments. The method may be implemented using the hardware and software components described above, or with other suitable hardware and software components. At one step, the method includes transmitting digital content and provenance metadata associated with the digital content from a server computing device to a client computing device over a communication network. At another step, the method further includes communication with an API communication interface, interprocess communication, and file I/O protocols.

At another step, the method further includes receiving digital content and provenance metadata associated with the digital content. At another step, the method further includes determining, on a per-character basis, whether each of a plurality of textual characters in the digital content is model-generated based on the provenance metadata.

Steps of the method may be performed on the client computing device. At one step, the method includes detecting a character encoding indicative of text originating from a generative model, on a per-character basis. At another step, the method includes detecting metadata such as header metadata of the digital content indicating AI generation, on a per-character basis. At another step, the method includes outputting the digital content with indications of the textual characters that are model-generated.

Several steps can be performed at the client computing device as sub-steps of another step. At one step, the method includes formatting the digital content for display using the provenance metadata, the formatted digital content including a visual provenance indication labeling the model-generated portion of the digital content, and outputting the formatted digital content including the visual provenance indication to a display. At another step, the method includes providing the provenance metadata and the received digital content, with the identified portions of the digital content that were generated by the generative model, to a destination program capable of receiving provenance-tracked content.

At another step, the method includes outputting the formatted digital content including the visual provenance indication to a display. At another step, the method includes outputting the provenance metadata and the received digital content, with the identified portions of the digital content that were generated by the generative model, to a local program. At another step, the method includes outputting the provenance metadata and the received digital content, with the identified portions of the digital content that were generated by the generative model, to an external program via a step that includes communication with an API communication interface, interprocess communication, and file I/O protocols.

A subsequent step follows completion of an earlier step and includes displaying a regeneration option on the client computing device, along with the indications of model-generated content within the digital content. The regeneration option permits regeneration of the textual characters that are model-generated using the model version and/or the model history, where the provenance metadata comprises model version, model history, and/or prompts or prompt-related context used to generate the portions of the digital content generated by a generative model.

Patent Metadata

Filing Date

Unknown

Publication Date

September 25, 2025

Inventors

Unknown
