US-20260087702-A1

Systems and methods for generating digital images

Published: March 26, 2026
Assignee: not available in USPTO data we have
Inventors: Danny Wu
Technical Abstract

Described herein is a computer implemented method. The method includes determining a first set of objects, wherein each object in the first set of objects is associated with an object-image and a position and processing the first set of objects to generate a first image-raster. The first image-raster incorporates each object-image that is associated with an object in the first set of objects, and each object-image is positioned in the first image-raster based on the position of the object that the object-image is associated with. The method further includes generating a first digital image by processing the first image-raster using a trained image generation model.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1

A computer implemented method including: determining, by one or more computer processing devices, a first set of objects, wherein each object in the first set of objects is associated with an object-image and a position; processing, by the one or more computer processing devices, the first set of objects to generate a first image-raster, wherein the first image-raster incorporates each object-image that is associated with an object in the first set of objects and each object-image is positioned in the first image-raster based on the position of the object that the object-image is associated with; and generating a first digital image, wherein generating the first digital image includes processing the first image-raster using a first machine learning model, and wherein the first machine learning model is a trained image generation model.

2

The computer implemented method of claim 1, wherein: the method further includes generating a first image generation prompt based on the first image-raster; and generating the first digital image includes processing the first image-raster and the first image generation prompt using the first machine learning model.

3

The computer implemented method of claim 2, wherein: each object in the first set of objects is associated with an object-caption; the method further includes processing the first set of objects to generate a first text-raster, wherein the first text-raster incorporates each object-caption that is associated with an object in the first set of objects and each object-caption is positioned in the first text-raster based on the position of the object that the object-caption is associated with; and the first image generation prompt is generated based on the first image-raster and the first text-raster.

4

The computer implemented method of claim 1, further including: determining a set of text objects, wherein each text object in the set of text objects is associated with a position; processing the set of text objects to generate a corresponding set of text-type design elements, wherein the set of text-type design elements includes a text-type design element corresponding to each text object in the set of text objects, and each text-type design element includes position data that is based on the position of the text object the text-type design element corresponds to; and generating a final digital image based on the first digital image and the set of text-type design elements.

5

The computer implemented method of claim 1, further including: determining a second set of objects, wherein each object in the second set of objects is associated with an object-image and a position; processing the second set of objects to generate a second image-raster, wherein the second image-raster incorporates each object-image that is associated with an object in the second set of objects and each object-image is positioned in the second image-raster based on the position of the object that the object-image is associated with; generating a second digital image, wherein generating the second digital image includes processing the second image-raster using the first machine learning model; and generating a final digital image based on the first digital image and the second digital image.

6

The computer implemented method of claim 5, wherein: the first set of objects is associated with a first predefined layer that is associated with a first layer depth; the second set of objects is associated with a second predefined layer that is associated with a second layer depth; and the final digital image is generated by composing the first digital image and the second digital image together in a depth order that is based on the first and second layer depths.

7

The computer implemented method of claim 1, wherein: the first set of objects includes a first object; the first object is a prompt object that is associated with first prompt text and a first position; and the method further includes identifying an existing image based on the first prompt text and using the existing image as the object-image for the first object.

8

The computer implemented method of claim 1, wherein: the first set of objects includes a first object; the first object is a prompt object that is associated with first prompt text and a first position; and the method further includes generating a new image based on the first prompt text and using the new image as the object-image for the first object.

9

The computer implemented method of claim 1, further including causing the first digital image to be displayed on a display screen.

10

The computer implemented method of claim 1, wherein the first set of objects is determined from a superset of objects, the superset of objects including a plurality of objects that are positioned on a virtual generation surface that is displayed on a display screen.

11

A computer processing system including: one or more processing devices; and one or more non-transitory computer-readable storage media storing instructions, which when executed by the one or more processing devices, cause the one or more processing devices to perform a method including: determining a first set of objects, wherein each object in the first set of objects is associated with an object-image and a position; processing the first set of objects to generate a first image-raster, wherein the first image-raster incorporates each object-image that is associated with an object in the first set of objects and each object-image is positioned in the first image-raster based on the position of the object that the object-image is associated with; and generating a first digital image, wherein generating the first digital image includes processing the first image-raster using a first machine learning model, and wherein the first machine learning model is a trained image generation model.

12

The computer processing system of claim 11, wherein: the method further includes generating a first image generation prompt based on the first image-raster; and generating the first digital image includes processing the first image-raster and the first image generation prompt using the first machine learning model.

13

The computer processing system of claim 12, wherein: each object in the first set of objects is associated with an object-caption; the method further includes processing the first set of objects to generate a first text-raster, wherein the first text-raster incorporates each object-caption that is associated with an object in the first set of objects and each object-caption is positioned in the first text-raster based on the position of the object that the object-caption is associated with; and the first image generation prompt is generated based on the first image-raster and the first text-raster.

14

The computer processing system of claim 11, further including: determining a set of text objects, wherein each text object in the set of text objects is associated with a position; processing the set of text objects to generate a corresponding set of text-type design elements, wherein the set of text-type design elements includes a text-type design element corresponding to each text object in the set of text objects, and each text-type design element includes position data that is based on the position of the text object the text-type design element corresponds to; and generating a final digital image based on the first digital image and the set of text-type design elements.

15

The computer processing system of claim 11, further including: determining a second set of objects, wherein each object in the second set of objects is associated with an object-image and a position; processing the second set of objects to generate a second image-raster, wherein the second image-raster incorporates each object-image that is associated with an object in the second set of objects and each object-image is positioned in the second image-raster based on the position of the object that the object-image is associated with; generating a second digital image, wherein generating the second digital image includes processing the second image-raster using the first machine learning model; and generating a final digital image based on the first digital image and the second digital image.

16

The computer processing system of claim 15, wherein: the first set of objects is associated with a first predefined layer that is associated with a first layer depth; the second set of objects is associated with a second predefined layer that is associated with a second layer depth; and the final digital image is generated by composing the first digital image and the second digital image together in a depth order that is based on the first and second layer depths.

17

The computer processing system of claim 11, wherein: the first set of objects includes a first object; the first object is a prompt object that is associated with first prompt text and a first position; and the method further includes generating a new image based on the first prompt text and using the new image as the object-image for the first object.

18

The computer processing system of claim 11, further including causing the first digital image to be displayed on a display screen.

19

The computer processing system of claim 11, wherein the first set of objects is determined from a superset of objects, the superset of objects including a plurality of objects that are positioned on a virtual generation surface that is displayed on a display screen.

20

One or more non-transitory storage media storing instructions executable by one or more processing devices to cause the one or more processing devices to perform a method including: determining a first set of objects, wherein each object in the first set of objects is associated with an object-image and a position; processing the first set of objects to generate a first image-raster, wherein the first image-raster incorporates each object-image that is associated with an object in the first set of objects and each object-image is positioned in the first image-raster based on the position of the object that the object-image is associated with; and generating a first digital image, wherein generating the first digital image includes processing the first image-raster using a first machine learning model, and wherein the first machine learning model is a trained image generation model.

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a U.S. Non-Provisional application that claims priority to Australian Patent Application No. 2024219980, filed Sep. 23, 2024, which is hereby incorporated by reference in its entirety.

Certain aspects of the present disclosure are directed to systems and methods for generating digital images.

Various computer applications for creating digital images exist.

As one example, design generation applications exist that allow users to create a design by selecting design elements and adding those design elements to a page. Once a design has been created, such applications will typically also provide mechanisms for the design to be displayed and output—e.g. to be saved, shared, published, or otherwise output.

Described herein is a computer implemented method including: determining a first set of objects, wherein each object in the first set of objects is associated with an object-image and a position; processing the first set of objects to generate a first image-raster, wherein the first image-raster incorporates each object-image that is associated with an object in the first set of objects and each object-image is positioned in the first image-raster based on the position of the object that the object-image is associated with; and generating a first digital image, wherein generating the first digital image includes processing the first image-raster using a first machine learning model, and wherein the first machine learning model is a trained image generation model.
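By way of illustration only, the method summarised above can be sketched in Python as follows. The sketch assumes that an object-image is a small grid of RGBA pixels and that an object's position is the (x, y) coordinate of its top-left corner; the names `CanvasObject`, `build_image_raster`, and `generate_digital_image` are hypothetical and do not appear in the disclosure, and the trained image generation model is represented by an arbitrary callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int, int]   # (red, green, blue, alpha)
Raster = List[List[Pixel]]          # rows of pixels

TRANSPARENT: Pixel = (0, 0, 0, 0)


@dataclass
class CanvasObject:
    """Hypothetical object: an object-image plus a position."""
    object_image: Raster
    position: Tuple[int, int]       # assumed: (x, y) of the top-left corner


def build_image_raster(objects: List[CanvasObject], width: int, height: int) -> Raster:
    """Incorporate each object-image into one raster at its object's position."""
    raster = [[TRANSPARENT for _ in range(width)] for _ in range(height)]
    for obj in objects:
        origin_x, origin_y = obj.position
        for dy, row in enumerate(obj.object_image):
            for dx, pixel in enumerate(row):
                x, y = origin_x + dx, origin_y + dy
                inside = 0 <= x < width and 0 <= y < height
                # Skip pixels that fall outside the raster or are fully transparent.
                if inside and pixel[3] > 0:
                    raster[y][x] = pixel
    return raster


def generate_digital_image(image_raster: Raster, model: Callable[[Raster], Raster]) -> Raster:
    """Pass the image-raster to a stand-in for the trained image generation model."""
    return model(image_raster)
```

In this sketch, later objects in the list overwrite earlier ones where they overlap; that ordering rule is an assumption of the example rather than something stated in the summary.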

Also described herein is a computer implemented method including: displaying, on a display, a user interface including a virtual generation surface; detecting a first user interaction adding a first object to the virtual generation surface at a first position, wherein the first object is a prompt object and the first user interaction includes user input that defines first prompt text for the first object; resolving the first object to a first resolved image based on the first prompt text; generating a first layer-image based on the first resolved image, wherein the first layer-image includes first image content that corresponds to the first resolved image, and wherein the first image content is positioned in the first layer-image at a position that is based on the first position of the first object on the virtual generation surface

While the description is amenable to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are described in detail. It should be understood, however, that the drawings and detailed description are not intended to limit the invention to the particular form disclosed. The intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present invention as defined by the appended claims.

In the following description numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessary obscuring.

The present disclosure is directed to systems and methods for creating digital images. In particular, the present disclosure provides mechanisms for a user to create digital images based on actual elements, element prompts, or a combination of actual elements and element prompts.

In the context of the present disclosure, reference to a digital image is reference to an image that can be rendered (e.g. displayed) and saved by a computer processing system. The present disclosure refers to two types of digital images in particular: design-format images (referred to as designs for convenience) and raster-format images (referred to as rasters for convenience).

In the context of the present disclosure, a design is an image that is made up of a set of design elements. Generally speaking, each design element has a size and position and can be selected and manipulated separately from each other design element. For example, by use of an appropriate application, each design element of a design can be separately selected and manipulated (e.g. moved, resized, or otherwise edited). As one example, a design may include two raster image elements and an application may allow a user to select one of those image elements and move, resize, or edit it independently of the other image element.

In the context of the present disclosure, reference to a raster is reference to a raster image. Generally speaking, a raster image is made up of a set of pixel values (e.g. RGB or other colour scheme values). Unlike a design, a raster image does not inherently permit selection and manipulation/editing of individual components in the image (though a raster image may be processed, e.g. via segmentation techniques or the like, to identify pixels that belong to individual segments or objects within the image).

In some cases, a design and a raster may look the same once rendered. This will be the case, for example, if a design is processed (e.g. rasterised) to generate a corresponding raster. In this case, though, the underlying data defining the design (e.g. a set of elements with associated element data) will be different to the underlying data defining the raster (e.g. a set of pixel values).
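The distinction between a design and a raster can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each design element is a solid-colour rectangle; the names `DesignElement`, `rasterise`, and `move_element` are hypothetical. A design element can be moved on its own by editing its data, whereas the rasterised output is only a grid of pixel values with no record of which element produced which pixel.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

RGB = Tuple[int, int, int]


@dataclass(frozen=True)
class DesignElement:
    """Hypothetical design element: separately selectable, with size and position."""
    element_id: str
    x: int
    y: int
    width: int
    height: int
    colour: RGB


def rasterise(design: List[DesignElement], page_w: int, page_h: int,
              background: RGB = (255, 255, 255)) -> List[List[RGB]]:
    """Flatten a design into pixel values; element identity is not retained."""
    pixels = [[background] * page_w for _ in range(page_h)]
    for el in design:
        for y in range(max(el.y, 0), min(el.y + el.height, page_h)):
            for x in range(max(el.x, 0), min(el.x + el.width, page_w)):
                pixels[y][x] = el.colour
    return pixels


def move_element(design: List[DesignElement], element_id: str,
                 dx: int, dy: int) -> List[DesignElement]:
    """Return a new design in which only the named element has been moved."""
    return [
        replace(el, x=el.x + dx, y=el.y + dy) if el.element_id == element_id else el
        for el in design
    ]
```

Moving an element and rasterising again yields a different pixel grid, but the reverse operation (recovering elements from the pixel grid) is not available without further processing such as segmentation.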

In the context of the present disclosure, reference to an actual element is reference to an existing visual element. This may, for example, be a raster element (e.g. a photo or other raster-format element), a graphic element (e.g. a vector graphic element), a text box (e.g. an element used to display text), or an alternative type of visual element. In this disclosure, an actual (existing) visual element may be contrasted with a generated element (which is an element that is generated based on a prompt and/or other data).

In the context of the present disclosure, reference to an element prompt is reference to text that is processed in order to retrieve or generate a visual element.

The techniques disclosed herein are necessarily implemented by one or more computer processing systems. While various system architectures and configurations are possible, the disclosure will be described predominantly in the context of a digital design platform that makes use of a client-server architecture. To this end, FIG. 1 depicts an example networked environment 100 in which various features of the present disclosure may be implemented.

Networked environment 100 includes a server environment 110 which serves one or more client systems such as client system 130. Server environment 110 and client system(s) 130 communicate via one or more communications networks 140 (e.g. the Internet).

Generally speaking, the server environment 110 includes computer processing hardware 112 (discussed below) on which one or more server-side applications execute in order to provide server-side functionality to client applications such as client application 132 (described below).

In the present example, server environment 110 includes a server application 114. In the present example, the server application 114 executes to provide a client application endpoint that is accessible over communications network 140. Generally speaking, the server application 114 functions to receive data from client applications, perform various processing (and processing coordination) functions, and communicate data back to client applications. Where server application 114 serves web browser client applications, the server application 114 will be a web server which receives and responds (for example) to HTTP requests. Where server application 114 serves native client applications, server application 114 will be an application server configured to receive, process, and respond to specifically defined API calls received from those client applications. The server environment 110 may include one or more web server applications and/or one or more application server applications allowing it to interact with both web and native client applications.

In the present example, server application 114 (and/or other applications of server environment 110) facilitates various functions related to designs and images. These may include, for example, design/image creation, editing, storage, organisation, searching, retrieval, viewing, sharing, publishing, and/or other functions related to digital designs and images. The server application 114 (and/or other applications) may also facilitate additional, related functions such as user account creation and management, user group creation and management, user and user group permission management, user authentication, and/or other server side functions.

In the present example, server environment 110 also includes a data storage application 116 which executes to receive and process requests to persistently store and/or retrieve data relevant to the operations performed/services provided by the server environment 110. Such requests may be received from the server application 114, other server environment applications, and/or (in some instances) directly from client applications such as 132. Data relevant to the operations performed/services provided by the server environment 110 may include, for example, user account data, design data (i.e. data describing designs that have been created by users), image data, template design data (e.g. templates that can be used by users to create designs), design element data (e.g. data in respect of existing design elements that users may add to designs), and/or other data relevant to the operation of the server environment 110.

The data storage application 116 may, for example, be a relational database management application or an alternative application for storing and retrieving data from data storage 118. Data storage 118 may be any appropriate data storage device (or set of devices), for example one or more non-transitory computer readable storage devices such as hard disks, solid state drives, tape drives, or alternative computer readable storage devices.

In server environment 110, server application 114 persistently stores data to data storage device 118 via the data storage application 116. In alternative implementations, however, the server application 114 may be configured to directly interact with data storage devices such as 118 to store and retrieve data (in which case a separate data storage application may not be needed). Furthermore, while a single data storage application 116 is described, server environment 110 may include multiple data storage applications. For example, one data storage application 116 may be used for user account data, another for user design data, another for design element data and so forth. In this case, each data storage application may interface with one or more shared data storage devices and/or one or more dedicated data storage devices, and each data storage application may receive/respond to requests from various server-side and/or client-side applications (including, for example, server application 114).

In the present example, server environment 110 includes a first text generation application 120 which takes a text string as input (potentially with additional inputs defining operational parameters) and generates a text string output. In the described embodiments, the first text generation application 120 is (or makes use of) a trained machine learning model (which may be referred to as a first trained machine learning model). For example, the first text generation application 120 may be (or make use of) a large language model (LLM) such as a generative pre-trained transformer (GPT). The first text generation application 120 may be (or make use of) a machine learning model that is specifically trained for the operations described herein (see below), or may be (or make use of) an existing pre-trained machine learning model, for example a model such as ChatGPT, Bard, or an alternative text generation model.

In the present example, server environment 110 includes a second text generation application 122 which takes a text string and one or more images as input (potentially with additional inputs defining operational parameters) and generates a text string output. In the described embodiments, the second text generation application 122 is (or makes use of) a trained machine learning model (which may be referred to as a second trained machine learning model). For example, the second text generation application 122 may be (or make use of) a vision-language model (VLM), for example a GPT-4 model or an alternative model. The second text generation application 122 may be (or make use of) a machine learning model that is specifically trained for the operations described herein (see below) or may be (or make use of) an existing pre-trained machine learning model, for example a model such as ChatGPT.

In alternative embodiments, rather than making use of two separate text generation applications as depicted, a single text generating application may be provided that is trained to generate a text string based on either a text input or combined text and image inputs.

In the present example, server environment 110 also includes an image generation application 124 which takes an image and/or a text string as input (potentially with additional inputs defining operational parameters) and generates an image output (e.g. a raster image). In the described embodiments, image generation application 124 is (or makes use of) a trained image generation machine learning model. For example, image generation application 124 may be (or make use of) a generative adversarial network (GAN) model, a variational autoencoder (VAE) model, a latent diffusion model, a mixed-modal auto-regressive transformer model, or an alternative image generation model. Image generation application 124 may be (or make use of) a machine learning model that is specifically trained for the operations described herein (see below), or may be (or make use of) an existing pre-trained machine learning model, for example a model such as Stable Diffusion, DALL-E, CLIP, Chameleon, or an alternative image generation model.

In the present example, server environment 110 includes an image to text application 126 which takes an image as input (potentially with additional inputs defining operational parameters) and returns a text description (or a caption) that describes the content of the input image. In the described embodiments, image to text application 126 is (or makes use of) a trained machine learning model such as a Bootstrapping Language-Image Pre-training (BLIP) model, a VLM, ChatGPT, or other trained image captioning model.

In the present example, server environment 110 includes a background removal application 128 which takes an image as input (potentially with additional inputs defining operational parameters) and returns what will be referred to as a background-removed image as output. In this context, a background-removed image is a version of an input that either has background pixels removed or includes a mask or other data (e.g. alpha channel data or a transparency layer) that can be used to identify/render background pixels as transparent. Any appropriate background removal application 128 may be used. In the described embodiments, background removal application is a trained machine learning model, for example an object segmentation model (e.g., Segment Anything Model, remove.bg model), a mixed-modal auto-regressive transformer model, or an alternative model.
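As a hedged illustration of the second form of background-removed image described above (an image accompanied by alpha channel data), the following Python sketch combines an RGB image with a foreground mask. The function name `apply_background_mask` and the boolean mask format are assumptions made for this example; in practice the mask would be produced by whichever segmentation or background removal model is used, which is not modelled here.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def apply_background_mask(image: List[List[RGB]],
                          foreground_mask: List[List[bool]]) -> List[List[RGBA]]:
    """Attach an alpha channel so that background pixels render as transparent.

    `foreground_mask[y][x]` is True where the pixel belongs to the foreground.
    The mask is assumed to come from a separate segmentation step.
    """
    if len(image) != len(foreground_mask):
        raise ValueError("image and mask must have the same number of rows")
    result: List[List[RGBA]] = []
    for image_row, mask_row in zip(image, foreground_mask):
        if len(image_row) != len(mask_row):
            raise ValueError("image and mask rows must have the same length")
        result.append([
            (r, g, b, 255 if is_foreground else 0)
            for (r, g, b), is_foreground in zip(image_row, mask_row)
        ])
    return result
```

The colour values of background pixels are retained in this sketch and only their alpha value is set to zero; an implementation that removes background pixels outright would be an equally valid reading of the paragraph above.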

In certain embodiments, each of applications 120, 122, 124, 126, and 128 is (or makes use of) a separate trained machine learning model. In other embodiments, however, the functionality of two or more of these applications may be provided by a single machine learning model. One example of such a model is a mixed-modal auto-regressive transformer model (e.g., Chameleon). In this case, and by way of example, the single trained machine learning model may be used (with appropriate prompts and other inputs) to perform the functionality of the first text generation application 120, the second text generation application 122, the image generation application 124, the image to text application 126, and the background removal application 128.
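The single-model arrangement can be pictured with the following minimal Python sketch, in which one callable stands in for a mixed-modal model and each application role is obtained by supplying a different prompt and set of inputs. The class name `UnifiedApplications`, its method names, the model call signature, and the prompt wording are all hypothetical and are not taken from the disclosure or from any particular model's API.

```python
from typing import Any, Callable, List, Optional, Sequence

# Assumed call signature for the stand-in model: (prompt, images or None) -> output.
Model = Callable[[str, Optional[List[Any]]], Any]


class UnifiedApplications:
    """Hypothetical wrapper exposing five application roles over one model."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def generate_text(self, text: str) -> Any:
        # Role of the first text generation application: text in, text out.
        return self._model(text, None)

    def generate_text_from_images(self, text: str, images: Sequence[Any]) -> Any:
        # Role of the second text generation application: text and images in.
        return self._model(text, list(images))

    def generate_image(self, prompt: str, image: Optional[Any] = None) -> Any:
        # Role of the image generation application: image and/or text in.
        images = [image] if image is not None else None
        return self._model(f"Generate an image: {prompt}", images)

    def caption_image(self, image: Any) -> Any:
        # Role of the image to text application.
        return self._model("Describe the content of this image.", [image])

    def remove_background(self, image: Any) -> Any:
        # Role of the background removal application.
        return self._model("Remove the background from this image.", [image])
```

Keeping the roles as separate methods means calling code does not need to know whether one model or several sit behind them, which matches the way the disclosure treats the two arrangements as interchangeable.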

Furthermore, while applications 120, 122, 124, 126, and 128 are described and depicted as being part of the server environment 110, the functionality provided by one or more of these applications may instead be provided by one or more applications executing at a remote server environment, for example via server(s) and application(s) that offer text generation, image generation, text captioning, and/or background removal as a service.

As noted, the server environment 110 applications run on (or are executed by) computer processing hardware 112. Computer processing hardware 112 includes one or more computer processing systems. The precise number and nature of those systems will depend on the architecture of the server environment 110.

For example, in one implementation each server environment application may run on its own dedicated computer processing system. In an alternative implementation, two or more server environment applications may run on a common/shared computer processing system.

Communication between the applications and computer processing systems of the server environment 110 may be by any appropriate means, for example direct communication or networked communication over one or more local area networks, wide area networks, and/or public networks (with a secure logical overlay, such as a VPN, if required).

In the present example, client system 130 hosts a client application 132 which, when executed by the client system 130, configures the client system 130 to provide client-side functionality/interact with server environment 110 (or, more specifically, the server application 114 and/or other applications provided by the server environment 110).

Client application 132 operates to generate one or more user interfaces which are displayed to a user (via a display). Client application 132 also operates to receive user inputs (via the user interfaces and one or more input devices). Such user inputs are detected and processed by the client application 132 and, in some instances, cause the client application 132 to communicate data to the server environment 110.

The client application 132 may be a general web browser application which accesses the server application 114 via an appropriate uniform resource locator (URL) and communicates with the server application 114 via general world-wide-web protocols (e.g. HTTP, HTTPS, FTP). Alternatively, the client application 132 may be a native application programmed to communicate with server application 114 using defined application programming interface (API) calls and responses.

A given client system such as 130 may have more than one client application 132 installed and executing thereon. For example, a client system 130 may have one or more general web browser applications and a native client application.

The present disclosure describes various operations that are performed by applications of the server environment 110 and client application 132. Generally speaking, however, operations described as being performed by a particular application (e.g. server application 114) could be performed by (or in conjunction with) one or more alternative applications, and/or operations described as being performed by multiple separate applications could in some instances be performed by a single application.

While the embodiments of the present disclosure are described in the context of a client-server architecture, the techniques and processing described could be adapted to be executed in a stand-alone context—e.g. by an application (or set of applications) that run on a computer processing system and can perform all required functionality without need of a server environment or application.

The techniques and operations described herein are performed by one or more computer processing systems.

By way of example, client system 130 may be any computer processing system which is configured (or configurable) by hardware and/or software (e.g. client application 132) to offer client-side functionality. A client system 130 may be a desktop computer, laptop computer, tablet computing device, mobile/smart phone, or other appropriate computer processing system.

Similarly, the applications of server environment 110 are executed by one or more computer processing systems (the computer processing hardware 112). Server environment computer processing systems will typically be server systems, though again may be any appropriate computer processing systems.

FIG. 2 provides a block diagram of a computer processing system 200 configurable to implement embodiments and/or features described herein. System 200 is a general purpose computer processing system. It will be appreciated that FIG. 2 does not illustrate all functional or physical components of a computer processing system. For example, no power supply or power supply interface has been depicted, however system 200 will either carry a power supply or be configured for connection to a power supply (or both). It will also be appreciated that the particular type of computer processing system will determine the appropriate hardware and architecture, and alternative computer processing systems suitable for implementing features of the present disclosure may have additional, alternative, or fewer components than those depicted.

Computer processing system 200 includes at least one processing unit 202. The processing unit 202 may be a single computer processing device (e.g. a central processing unit, graphics processing unit, or other computational device), or may include a plurality of computer processing devices. In some instances, where a computer processing system 200 is described as performing an operation or function, all processing required to perform that operation or function will be performed by processing unit 202. In other instances, processing required to perform that operation or function may also be performed by remote processing devices accessible to and useable (either in a shared or dedicated manner) by system 200.

Through a communications bus 204, the processing unit 202 is in data communication with one or more machine readable storage (memory) devices which store computer readable instructions and/or data which are executed by the processing unit 202 to control operation of the processing system 200. In this example system 200 includes a system memory 206 (e.g. a BIOS), volatile memory 208 (e.g. random access memory such as one or more DRAM modules), and non-transitory memory 210 (e.g. one or more hard disk or solid state drives).

System 200 also includes one or more interfaces, indicated generally by 212, via which system 200 interfaces with various devices and/or networks. Generally speaking, other devices may be integral with system 200, or may be separate. Where a device is separate from system 200, the connection between the device and system 200 may be via wired or wireless hardware and communication protocols, and may be a direct or an indirect (e.g. networked) connection.

Generally speaking, and depending on the particular system in question, devices to which system 200 connects include one or more input devices to allow data to be input into/received by system 200 and one or more output devices to allow data to be output by system 200.

By way of example, where system 200 is a personal computing device such as a desktop or laptop device, it may include a display 218 (which may be a touch screen display and as such operate as both an input and output device), a camera device 220, a microphone device 222 (which may be integrated with the camera device), a cursor control device 224 (e.g. a mouse, trackpad, or other cursor control device), a keyboard 226, and a speaker device 228.

As another example, where system 200 is a portable personal computing device such as a smart phone or tablet it may include a touchscreen display 218, a camera device 220, a microphone device 222, and a speaker device 228.

As another example, where system 200 is a server computing device it may be remotely operable from another computing device via a communication network. Such a server may not itself need/require further peripherals such as a display, keyboard, cursor control device etc. (though may nonetheless be connectable to such devices via appropriate ports).

Alternative types of computer processing systems, with additional/alternative input and output devices, are possible.

System 200 also includes one or more communications interfaces 216 for communication with a network, such as network 140 of environment 100 (and/or a local network within the server environment 110). Via the communications interface(s) 216, system 200 can communicate data to and receive data from networked systems and/or devices.

System 200 stores or has access to computer applications (which may also be referred to as computer software or computer programs). Generally speaking, such applications include computer readable instructions and data which, when executed by the processing unit 202, configure system 200 to receive, process, and output data. Instructions and data can be stored on a non-transitory machine readable medium such as 210 accessible to system 200. Instructions and data may be transmitted to/received by system 200 via a data signal in a transmission channel enabled (for example) by a wired or wireless network connection over an interface such as communications interface 216.

Typically, one application accessible to system 200 will be an operating system application. In addition, system 200 will store or have access to applications which, when executed by the processing unit 202, configure system 200 to perform various computer-implemented processing operations described herein. For example, and referring to the networked environment of FIG. 1 above, server environment 110 includes one or more systems which run server-side applications 114, 116, 120, 122, 124, 126, and 128. Similarly, client system 130 runs a client application 132.

Turning to FIG. 3, an example image generation user interface (UI) 300 will be described. In the present embodiments, UI 300 is generated by client application 132 (referred to as application 132 for convenience) and displayed on a display (e.g. a touch screen or other display such as 218) of the client system 130.

UI 300 includes a generation region 302 which, in this example, a user interacts with in order to generate a digital image. In this example, the generation region 302 includes what will be referred to as a virtual generation surface 304 (or surface 304 for convenience). As described further below, application 132 allows a user to generate a digital image by adding objects (including actual elements and/or element prompts) to the surface 304. In this particular example the surface 304 is displayed with gridlines, however this need not be the case. Generally speaking, a virtual generation surface may be any user interface region or area that a user can add objects to (as discussed below).

UI 300 also includes a preview region 306 which is used to display a digital image 308. In particular, application 132 generates a digital image based on the objects that have been added to the surface 304 and displays that digital image 308 in the preview region 306.

In the present example, a dotted line is shown as dividing the generation and preview regions 302 and 306 (such a line need not actually be displayed). Furthermore, while in the present example the generation and preview regions 302 and 306 are displayed side-by-side (and at the same time) in the same UI, this need not be the case. Instead, the generation region 302 may be displayed in one UI and the preview region 306 may be displayed in a separate UI. In this case, the generation and preview regions 302 and 306 may still be displayed at the same time (albeit in separate UIs), or they could be displayed one at a time: e.g. the application may automatically switch between the UIs or allow a user to select which UI is displayed.

UI 300 also includes an element search region 310. Element search area 310 may be used, for example, to search for design elements that application 132 makes available to a user to assist in creating a digital image. Different types of elements may be made available, for example template text elements (with different text format attributes), vector graphic elements (such as geometric shapes and/or other vector graphics), raster elements (such as stock photos or other raster images), chart elements, table elements, and/or other types of design elements.

In this example, search area 310 includes a search control 312 via which a user can enter and submit search text (e.g. a string of characters). In response to a user submitting search text, client application 132 may perform a search and display previews 314 (e.g. thumbnails or the like) of any search results.

Application 132 may be configured to search for design elements at (and retrieve design elements/previews thereof from) various locations. For example, the search functionality invoked by search control 312 may cause application 132 to search for design elements that are stored in locally accessible memory of the system 130 on which application 132 executes (e.g. memory such as 210 or other locally accessible memory), design elements that are stored at a remote server environment such as 110 (and searched/retrieved via a server application such as 114), and/or design elements stored on other locally or remotely accessible devices.

UI 300 also includes an additional controls area 320 which, in this example, is used to display additional controls. The additional controls may include one or more: permanent controls (e.g. controls such as save, download, print, share, publish, and/or other controls that are frequently used/widely applicable and that application 132 is configured to permanently display); user configurable controls (which a user can select to add to or remove from area 320); and/or one or more adaptive controls (which application 132 may change depending, for example, on the type of design element that is currently selected/being interacted with by a user). For example, if a text element is selected, application 132 may display adaptive controls such as font style, type, size, position/justification, and/or other font related controls. Alternatively, if a vector graphic element is selected, application 132 may display adaptive controls such as fill attributes, line attributes, transparency, and/or other vector graphic related controls. By way of example, a save control 322, share control 324, and publish control 326 may be provided which allow a user to save, share, and/or publish the digital image that is displayed in the preview region 306. In certain embodiments, a generate image control 328 may also be displayed (the operation of which is described below).

Once a digital image has been generated, application 132 may provide various options for outputting that digital image. For example, application 132 may provide a user with options to output a digital image by one or more of: saving the digital image to local memory of system 130 (e.g. memory 210); saving the digital image to a remotely accessible memory device; saving the digital image at a server environment such as 110; sending the digital image to a printer (local or networked) for printing; communicating the digital image to another user (e.g. by email, instant message, or other electronic communication channel); publishing the digital image to a social media platform or other service (e.g. by sending the digital image to a third party server system with appropriate API commands to publish the digital image); and/or by other output means.

As noted above, reference to a design in the present specification is reference to a set of design elements. Various design data formats are possible. In order to illustrate designs (as opposed to rasters), this section provides a simplified example of a design data format. Alternative design data formats (which make use of the same or alternative design attributes) are, however, possible, and the processing described herein can be adapted for alternative design formats.

In the present example, data in respect of a particular design is stored in a design record which includes a set of key-value pairs (e.g. a map or dictionary). Generally speaking, the design record defines certain design-level attributes and includes design element data (or element data for short). To assist with understanding, a partial example of a design record format is as follows:

Attribute | Example
Design ID | “designId”: “abc123”
Dimensions | “dimensions”: {“width”: 1080, “height”: 1080}
Background | “background”: {“mediaID”: “M12345”}
Element data | “elements”: [{element 1}, . . . {element n}]

In this example, the design-level attributes include: a design identifier (which uniquely identifies the design); dimensions (e.g. a design width and height); background (data indicating any background that has been set, for example an identifier of an image that has been set as the background, data indicating a colour or colour gradient that has been set as a background, or data indicating an alternative background); and element data (discussed below).

In this example, a design's element data is a set (in this example an array) of element records. Each element record defines an element that has been added to the design. In this example, the element data is ordered and an element record's position in the set serves to identify the element and the depth or z-index of the element. For example, an element at array index n is positioned above an element at array index n−1 and below an element at array index n+1. Element depth may be alternatively handled, however, for example, by storing depth as an explicit element attribute.
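The ordering rule can be illustrated with a minimal Python sketch. The dictionary mirrors the partial design record format shown above; the per-element "id" keys and the helper names `z_index_of` and `is_above` are hypothetical and are introduced only for illustration.

```python
# Illustrative design record; top-level keys follow the partial example above.
# ASSUMPTION: the "id" keys and both helper functions are hypothetical.
design = {
    "designId": "abc123",
    "dimensions": {"width": 1080, "height": 1080},
    "background": {"mediaID": "M12345"},
    "elements": [
        {"id": "e0", "type": "RASTER"},
        {"id": "e1", "type": "TEXT"},
        {"id": "e2", "type": "RASTER"},
    ],
}


def z_index_of(design_record, element_id):
    """Return an element's depth, which is simply its array index."""
    for index, element in enumerate(design_record["elements"]):
        if element["id"] == element_id:
            return index
    raise KeyError(element_id)


def is_above(design_record, first_id, second_id):
    """True if the first element is drawn above the second element."""
    return z_index_of(design_record, first_id) > z_index_of(design_record, second_id)
```

Storing depth implicitly in the array position keeps records small, at the cost of having to reorder the array whenever an element's depth changes.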

Furthermore, in this particular example a design's background (if any) is defined in a design-level attribute. In alternative examples, a background may be defined via an element record in the design's element data. Where the order of element records is used to define depth a background (if any) may be defined by the first element of the element data (e.g. index 0). Alternatively, the element record in respect of the background may be provided with a particular flag or other attribute indicating it is the background.

Generally speaking, an element record defines an object that has been added to the design—e.g. by copying and pasting, importing from one or more element libraries (e.g. libraries of images, element types, animations, videos, etc.), drawing/creating using one or more design tools (e.g. a text tool, a line tool, a rectangle tool, an ellipse tool, a curve tool, a freehand tool, and/or other design tools), or by otherwise being added to a design.

Different types of design elements may be provided for depending on the system in question. The present disclosure is particularly concerned with visual design elements which may include, for example, vector graphic elements, raster image elements, video elements, text elements, and/or elements of other types of visual media.

In the present example, elements are associated with position and size data. One example of an element record for an element that is used to display a raster image is as follows:

Attribute | Note | E.g.
Type | A value defining the type of the element. | “type”: “RASTER”
Position | Data defining the position of the element: e.g. an (x, y) coordinate pair defining (for example) the top left point of the element. | “position”: (100, 100)
Size | Data defining the size of the element: e.g. a (width, height) pair. | “size”: (500, 400)
Rotation | Data defining any rotation of the element. | “rotation”: 0
Opacity | Data defining any opacity of the element (or element group). | “opacity”: 1
Media identifier | Data indicating the media (e.g. an image) that the element holds/is used to display. | “mediaID”: “M12345”

Different attributes may be appropriate for different types of elements. For example, an element record for an element that is used to display text (e.g. a text-box or a text type element) may also include attributes such as:

Attribute | Note | E.g.
Type | A value defining the type of the element. | “type”: “TEXT”
Position | Data defining the position of the element. | “position”: (100, 100)
Size | Data defining the size of the element. | “size”: (500, 400)
Rotation | Data defining any rotation of the element. | “rotation”: 0
Opacity | Data defining any opacity of the element. | “opacity”: 1
Text | Data defining the actual text characters. | “text”: “Trip”
Attributes | Data defining attributes of the text (e.g. font, font size, font style, font colour, character spacing, line spacing, justification, and/or any other relevant attributes). | “attributes”: {. . .}
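As a rough illustration of how the raster and text element records above share common attributes while differing in type-specific ones, the following Python sketch models them as dataclasses. The class names and the snake_case field names are assumptions made for this sketch; the disclosure describes the records only as key-value data.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class ElementRecord:
    # Attributes common to both example element records above.
    type: str
    position: Tuple[int, int]   # (x, y) of the element's top left point
    size: Tuple[int, int]       # (width, height)
    rotation: float = 0
    opacity: float = 1


@dataclass
class RasterElement(ElementRecord):
    # ASSUMPTION: "media_id" stands in for the "mediaID" attribute.
    media_id: str = ""


@dataclass
class TextElement(ElementRecord):
    text: str = ""
    attributes: dict = field(default_factory=dict)  # font, size, colour, etc.


raster = RasterElement("RASTER", (100, 100), (500, 400), media_id="M12345")
label = TextElement("TEXT", (100, 100), (500, 400), text="Trip")
```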

The storage location for design data (e.g. design records) will depend on implementation. For example, in the networked environment described above design records are (ultimately) stored in/retrieved from the server environment's data storage 118. Alternatively, or in addition, design data may be locally stored on a client system 130 (e.g. in memory 210 thereof).

Turning to FIG. 4, a computer implemented method 400 for generating a digital image will be described. Certain processing blocks of method 400 will be described with reference to the example partial user interface 500 of FIG. 5 (which depicts the surface 304 of example UI 300 described above).

Method 400 will be described with particular processing being performed by particular applications. In alternative embodiments, however, processing that is described as being performed by a particular application may be performed by one or more alternative applications running on the computer processing hardware 112 of the server environment 110, the client system 130, and/or other computer processing systems.

In method 400, client application 132 operates to display a user interface (and various other display objects) and to detect user inputs. Where client application 132 displays a user interface (and/or other display objects) it does so via one or more displays that are connected to (or integral with) client system 130 (e.g. display 218). Where client application 132 operates to receive or detect user input, such input is received or detected via one or more input devices that are connected to (or integral with) client system 130 (e.g. a touch screen display 218, a cursor control device 224, a keyboard 226, and/or an alternative input device).

At 402, client application 132 displays an image generation UI. Method 400 will be described with reference to the example image generation UI 300 described above, but alternative user interfaces are possible.

At 404, client application 132 detects a user interaction that adds an object to a virtual generation surface (e.g. surface 304 of UI 300). This will be referred to as an add-object interaction (and it may include one or more user inputs).

In the present embodiment, client application 132 is configured to permit two types of add-object interactions. These will be referred to as an add-actual-element interaction and an add-element-prompt interaction. These are described in turn below.

In the present embodiment, an add-actual-element interaction is an interaction in which a user adds an object that corresponds to an actual element to a particular position on the surface 304. The actual element may, for example, be an image (which may be a raster or a vector graphic image), a text element, or an alternative visual element. An object corresponding to an actual element may be referred to as an element object. In the present examples, and as described below, an element object may more specifically be an image object (where it corresponds to an image) or a text object (where it corresponds to a text element).

As one example, an add-actual-element interaction may involve a user submitting a search via a search control such as 312. In response, images that match the search are identified (which may include raster and/or vector graphic images) and client application 132 displays previews of those images (such as previews 314). Searching may be performed by the server application 114 and involve communications between the client application 132 and server application 114. The user may then select a particular image (via its preview 314), drag it to a particular position on the surface 304, and drop it on the surface 304 at that position. This causes an object corresponding to the particular image (which may be referred to as an image object) to be generated and (at 406) displayed on the surface 304. In the present embodiment, an image object (that corresponds to a particular image) is generated to take the appearance of the particular image (or a preview of the particular image).

To further illustrate this example, a user may submit a search string such as “dog” via search control 312 (or otherwise browse for existing “dog” elements). In response, “dog” images are identified and client application 132 displays previews 314 thereof in element search region 310. A user can then drag a particular preview onto the generation pane 304.

As another example, an add-actual-element interaction may involve a user searching or browsing for template text elements. In response, different template text elements are identified, for example a level 1 heading text element, a level 2 heading text element, a paragraph text element, or an alternative text element (each different template text element having different default format attributes such as font type, font size, font colour, alignment, and/or other text format attributes). Client application 132 then displays previews 314 of the template text elements in element search region 310. A user can then drag one of those text element previews onto the surface 304. This causes an object corresponding to the selected text element (which may be referred to as a text object) to be generated and (at 406) displayed on the surface 304. Where a text element is added, the corresponding text object that is displayed at 406 may initially include default text associated with the text element (which may, for example, be text such as “Level 1 heading” or the like). A user may then interact with the text object to edit the default text as desired, though need not do so (in which case the default text remains).

As another example, an add-actual-element interaction may involve a user interacting with a particular position on the canvas (e.g. via a specified user input such as a right click, a dwell gesture, or an alternative interaction). This interaction may be referred to as a secondary interaction. In response to the interaction the client application 132 may display a further user interface (or user interface elements) via which a user can search or browse for visual elements and select a particular visual element. In this case, the further user interface may allow a user to search or browse locally accessible visual elements (e.g. stored on local memory such as 210) and/or remotely accessible visual elements (e.g. visual elements available through a remote storage device or content server). On user input selecting a visual element, client application 132 may then add an element object corresponding to the selected visual element at the particular position on the canvas.

In the present embodiment, an add-element-prompt interaction causes an object corresponding to a user prompt (which will be referred to as a prompt object) to be added to the surface 304 at a particular position. In the present embodiments, an add-element-prompt interaction includes an add component (that adds a prompt object to the surface 304) and a define component (which defines the text of the prompt).

As one example, the add component of an add-element-prompt interaction may involve a user interacting with a particular position on the canvas (e.g. via a specified user input such as a left click, a tap gesture, or an alternative interaction). This interaction may be referred to as a primary interaction. In response to the interaction the client application 132 displays (at 406) a prompt object at the particular position on the canvas. The prompt object may, for example, be a text entry box permitting text entry. The define component of the add-element-prompt interaction then involves the user defining specific text for the prompt that is being added. This may involve a user typing text into the prompt object that is displayed, speaking words into a microphone (such as 222) which client application 132 then converts to text and adds to the prompt object that is displayed, or otherwise defining text for the prompt.

At 406, and as noted above, the object that is added via the add-object interaction detected at 404 is displayed on the surface 304. In the present embodiment: an image object (corresponding to an actual image) is displayed as the actual image that the object corresponds to; a text object (corresponding to actual text) is displayed as the actual text that the object corresponds to (which may be default text or user defined text); a prompt object (corresponding to a user prompt) is displayed as the actual prompt text defined by the user.

In the present embodiment, when an object is added to the surface 304 the client application 132 is configured to add the object as the top-most object (i.e. highest z-index or depth) by default and to display the object accordingly.

Once an object has been added to (and displayed on) the surface 304, client application 132 may be configured to permit various interactions with the object. In some instances, a user may interact with an object before the remaining processing of method 400 is performed (in which case the results of such interactions are taken into account in the processing of method 400). In other instances, a user may interact with an object after the remaining processing of method 400 has been performed (and, as such, a digital image is generated and displayed based on the original, pre-interaction version of the object). In this case, the further interaction with the object is processed according to method 900 described below.

As one example, a user may interact with an object to move it (that is, change its two-dimensional position) on the surface 304. This may involve, for example, clicking on or contacting the object (e.g. a primary interaction), dragging it to a new position, and releasing it.

As another example, a user may interact with an object to resize it. This may involve, for example, a user selecting an edge or a handle of a bounding box displayed for the object and moving the edge or handle as desired.

As another example, a user may interact with an object to change its depth. For example, in response to a right click or dwell gesture (e.g. a secondary interaction) on an object, client application 132 may display a menu that provides a user with various options in respect of the object. Such menu options may include depth adjustment options which allow a user to change the depth of the object (e.g. a bring forward option, a send backward option, a bring to front option, a send to back option).
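The four depth adjustment options named above can be sketched as list operations. This is a minimal sketch that assumes the objects on the surface are held in a back-to-front ordered list (index 0 rearmost); that representation and the function names are assumptions for illustration, not something the description prescribes.

```python
# ASSUMPTION: objects are kept in a list ordered back-to-front (index 0 rearmost).
# Each function returns a new list and leaves the input untouched.

def bring_forward(objects, index):
    """Swap the object with the one directly above it, if there is one."""
    result = list(objects)
    if index < len(result) - 1:
        result[index], result[index + 1] = result[index + 1], result[index]
    return result


def send_backward(objects, index):
    """Swap the object with the one directly below it, if there is one."""
    result = list(objects)
    if index > 0:
        result[index], result[index - 1] = result[index - 1], result[index]
    return result


def bring_to_front(objects, index):
    """Move the object to the end of the list (top-most depth)."""
    result = list(objects)
    result.append(result.pop(index))
    return result


def send_to_back(objects, index):
    """Move the object to the start of the list (rearmost depth)."""
    result = list(objects)
    result.insert(0, result.pop(index))
    return result
```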

As another example, where the object is an element object a user may interact with the object to change one or more attributes of the element object. This functionality may, for example, be provided by options in a menu as discussed above (which may be displayed in response to a secondary interaction with an object). The adjustment options available will depend on the type of the actual element that the element object corresponds to. For example, if the object is an image object that corresponds to a vector graphic, then attributes that can be changed may include line and/or fill colour changes of one or more components of the vector graphic (and/or changes to other vector graphic attributes). Alternatively, if the object is an image object that corresponds to a raster, then attributes that can be changed may include attributes corresponding to parameters such as contrast, brightness, saturation, tint, and/or other raster image parameters. As a further example, if the object is a text object, then attributes that can be changed may include attributes corresponding to one or more text format attributes (e.g. font type, font size, font style, font colour, and/or other text format attributes) and/or a change to the actual text that is to be displayed.

As another example, where the object is a prompt object, a user may interact with the prompt object to edit the prompt text—e.g. by clicking or contacting the object and entering text.

To illustrate displaying objects on the surface 304, partial UI 500 of FIG. 5 shows a surface 304 where five objects have (progressively) been added by add-object interactions. These include: image object 502 (added first), which corresponds to an image (e.g. a vector graphic or an image) of a dog that has been added; text object 504 (added second), which corresponds to a text element that has been added (and for which a user has replaced any default text with the text “Happy Birthday”); prompt object 506 (added third), which is a prompt that has been added and that a user has defined the prompt text of “cat” for; prompt object 508 (added fourth), that a user has defined the prompt text of “in a backyard” for; and prompt object 510 (added fifth), which is a prompt that has been added and that a user has defined the prompt text of “ball” for.

In partial UI 500, client application 132 has displayed the objects so that text object 504 (which corresponds to a text element) is visually distinguished from prompt objects 506, 508, and 510. In this specific example, text object 504 is displayed with a dot-dash line bounding box (and no fill) while prompt objects 506, 508, and 510 are displayed with a dash-dash-dash line bounding box (and a partially transparent grey fill). Alternative techniques for visually distinguishing text objects from prompt objects may be used, for example by use of different line colours, different line types, different fill colours, different fill patterns, and/or other visual techniques.

As indicated at 408, if the object that has been added at 404 is a prompt object, processing proceeds to 410. If it is an element object, processing proceeds to 412.

At 410, a prompt object has been added. In this case, the prompt text of the prompt object is resolved to an image, which is referred to as the resolved image for the prompt object. In the present embodiments the resolved image is a raster image (which may include an alpha channel or transparency layer which causes any background of the resolved image to be rendered transparently). Resolution of a prompt object to a resolved image may be done in various ways, examples of which are described below with reference to FIG. 6. Following resolution of the prompt object to a resolved image, processing proceeds to 414.

At 412, an element object has been added. In this case a caption is determined for the element object added at 404.

In the present embodiments, if the element object is a text object, the caption is determined to be the actual text of that text object. This may be default text inherited from the text element the object corresponds to (e.g. “Heading 1” or other default text, depending on the object) or the text entered by a user for the text object.

If the element object is an image object, the caption may be determined in various ways.

In some instances, an element object may correspond to an image that is associated with metadata that includes descriptive text of the image. In this case, that metadata may be used as the caption for the image object. For example, an image of a dog may include a metadata attribute such as “Caption: Dog”. If metadata such as this exists, then that caption may be used for the image object (though in other instances a caption may nonetheless be generated as discussed below if desired—for example to try and provide consistency between captions rather than relying on metadata).

If the image that an image object corresponds to is not associated with relevant metadata, a caption is generated. In the present embodiment, server application 114 coordinates generation of a caption using the image to text application 126 which, as described above, may be a trained image captioning machine learning model. In this case, server application 114 provides the image that the image object corresponds to as input to the image to text application 126 (potentially with a prompt, if required) which returns a caption for the image (e.g. a short textual description of the subject of the image). In certain cases, and depending on the image and the image to text application 126, server application 114 may need to process the image before providing it as input to the image to text application 126. For example, if the image to text application 126 takes a raster as input, but the image in question is a vector graphic, server application 114 is configured to rasterise the vector graphic and then provide the rasterised vector graphic as input to the image to text application 126.
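The caption logic for image objects (use descriptive metadata when present, otherwise rasterise if needed and call a captioning model) can be sketched as follows. This is a hedged illustration only: `caption_model` and `rasterise` are hypothetical callables standing in for the image to text application and a vector rasteriser, and the "caption" metadata key is an assumption.

```python
def caption_for_image(image, metadata, caption_model, rasterise, is_vector=False):
    """Return a caption for an image object.

    ASSUMPTIONS: `caption_model` and `rasterise` are placeholder callables
    supplied by the caller; "caption" is an illustrative metadata key.
    """
    # Prefer descriptive text already attached to the image.
    existing = (metadata or {}).get("caption")
    if existing:
        return existing
    # A captioning model that expects a raster needs vector input converted first.
    if is_vector:
        image = rasterise(image)
    return caption_model(image)


# Usage with stand-in callables (no real model is invoked here).
fake_model = lambda img: "caption of " + img
fake_rasterise = lambda img: "raster(" + img + ")"
```

As the description notes, an implementation could also choose to ignore metadata and always generate a caption, for example to keep captions stylistically consistent.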

In the present embodiment captions are not determined for prompt objects. Rather, if a caption is needed for the prompt object the prompt text itself may be used (or text based thereon).
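The caption rules above can be sketched in Python as follows. This is a minimal illustration only: the dictionary-based object representation and the `caption_model` and `rasterise` callables are hypothetical stand-ins for the image to text application and the rasterisation step, not interfaces defined by the description.

```python
from typing import Callable

# Hypothetical object representation: a dict with a "kind" key
# ("text", "prompt", or "image") plus kind-specific fields.


def determine_caption(
    obj: dict,
    caption_model: Callable[[dict], str],
    rasterise: Callable[[dict], dict],
    model_needs_raster: bool = True,
) -> str:
    """Return the object-caption for an object added to the surface."""
    if obj["kind"] == "text":
        # Text objects: the caption is the actual text.
        return obj["text"]
    if obj["kind"] == "prompt":
        # Prompt objects: the prompt text itself is used.
        return obj["prompt_text"]

    image = obj["image"]
    # Image objects: prefer descriptive metadata when it exists.
    metadata_caption = image.get("metadata", {}).get("caption")
    if metadata_caption:
        return metadata_caption

    # Otherwise generate a caption, rasterising vector graphics first
    # if the (assumed) captioning model only accepts rasters.
    if model_needs_raster and image.get("format") == "vector":
        image = rasterise(image)
    return caption_model(image)
```

Passing the model and the rasteriser in as callables keeps the sketch independent of any particular captioning service.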

Following 410 and 412, an object that has been added to the surface 304 is associated with both an image and a caption. These may be referred to as an object-image and an object-caption. For an image object, the object-image is the image itself and the object-caption is the caption determined at 412; for a text object, the object-image is an image of the actual text and the object-caption is the actual text (or text based thereon) (determined at 412); for a prompt object, the object-image is the resolved image generated for the prompt (at 410) and the object-caption is the prompt text (or text based thereon).

At 414, a layer is determined for the object (e.g. the actual element or the prompt) that has been added at 404. In the present embodiment, determination of an object's layer is performed by the server application 114. This could, however, be done by the client application 132 (or an alternative application).

Server application 114 is configured to determine a particular layer for an object from a set of predefined layers. The set of predefined layers includes two or more predefined layers that have a specific depth (or z-index) order.

By way of example, the set of predefined layers may include three layers: a background layer (at depth 0, or the rearmost layer); a text layer (at depth 1, immediately above the background layer); and a foreground layer (at depth 2, or immediately above the text layer). As an alternative example, the set of predefined layers may include five layers: a far background layer (at depth 0); a near background layer (at depth 1); a midground layer (at depth 2); a text layer (at depth 3); and a foreground layer (at depth 4). As a further alternative example, the set of predefined layers may include two layers: a background layer (at depth 0) and a text layer (at depth 1). Other sets of predefined layers are possible.

Server application 114 may be configured to determine the particular layer for an object in various ways.

In the present embodiments, the set of predefined layers includes a dedicated text layer. In this case, server application 114 assigns any text object (corresponding to a text element) to the text layer. Each other object (e.g. an image object or a prompt object) is processed further to determine a layer and assign the object to that layer.

In one example, server application 114 may make use of a machine learning model (e.g. a classifier) that is programmed to automatically classify non-text objects into one of the predefined layers. Such a model may be trained to classify an object as belonging to a particular layer based on the object-caption, the object-image, or both. In the present embodiment, and as described above: for a prompt object, the object-caption is the text of the prompt as entered by the user (or text based thereon) and the object-image is the object's resolved image; for an image object, the object-caption is the caption determined at 412 and the object-image is the image (or a preview of the image) of the actual element that the image object corresponds to. Where a machine learning model is used to determine an object's layer, any appropriate machine learning architecture may be used (for example a convolutional neural network) and the model may be trained based on an appropriate training dataset that includes numerous images (and/or their associated text) and the predefined layer(s) those images most frequently appear in.

In an alternative example, server application 114 may make use of natural language processing techniques and a set of heuristic rules to determine the layer for an object based on the caption (and/or other text) associated with each object. By way of simple example, and assuming a set of layers that includes a background layer, text layer, and foreground layer as described above, server application 114 may be configured such that: if an object corresponds to a text element, it is associated with the text layer; if an object's associated text indicates the object corresponds to a background element (e.g. based on the specific words and/or the grammar of the associated caption), it is associated with the background layer; otherwise the object is associated with the foreground layer.

As yet a further example, and in the example set of layers above which include a background layer and a text layer, server application 114 may assign all objects corresponding to text elements to the text layer and all other objects to the background layer.
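The heuristic variant for the three-layer example can be sketched as follows. The keyword list, the `SurfaceObject` class, and the layer names are assumptions made for illustration; a practical system might instead use richer natural language processing or the trained classifier mentioned above.

```python
from dataclasses import dataclass

# Hypothetical layer names and depths, following the three-layer example.
LAYER_DEPTHS = {"background": 0, "text": 1, "foreground": 2}

# Hypothetical keyword list standing in for "words that indicate a background".
BACKGROUND_WORDS = {"sky", "beach", "forest", "landscape", "background", "wall", "field"}


@dataclass
class SurfaceObject:
    kind: str     # "text", "image", or "prompt"
    caption: str  # the object-caption


def assign_layer(obj: SurfaceObject) -> str:
    """Return the name of the predefined layer an object is assigned to."""
    if obj.kind == "text":
        return "text"
    # Normalise caption words and look for any background indicator.
    words = {w.strip(".,!?").lower() for w in obj.caption.split()}
    if words & BACKGROUND_WORDS:
        return "background"
    return "foreground"
```

For instance, `assign_layer(SurfaceObject("prompt", "a sunny beach"))` yields `"background"`, while an image object captioned "dog" falls through to the foreground layer.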

In certain embodiments, the depths of objects within a layer (referred to as the intra-layer depth) are determined. In this case, intra-layer depth is based on the depth of the objects on the surface 304. As discussed above, in the present embodiments when an object is added to the surface 304 it is added as the top-most object, however a user may manually change the depth of an object. To illustrate layer and intra-layer depths, for the five objects that have been added to example UI 500 of FIG. 5, the server application 114 may assign layers and intra-layer depths as follows:

Object                  Depth on gen. surface   Layer assigned         Intra-layer depth
502 (added initially)   0                       Foreground (layer 2)   0
504 (added second)      1                       Text (layer 1)         0
506 (added third)       2                       Foreground (layer 2)   1
508 (added fourth)      3                       Background (layer 0)   0
510 (added fifth)       4                       Foreground (layer 2)   2

If, however, a user had adjusted the depth of the “ball” prompt object after adding it (e.g. in a send to back operation), server application 114 would adjust the intra-layer depths for the foreground layer objects as follows:

Object                               Depth on gen. surface   Layer assigned         Intra-layer depth
502 (added initially)                1                       Foreground (layer 2)   1
504 (added second)                   2                       Text (layer 1)         0
506 (added third)                    3                       Foreground (layer 2)   2
508 (added fourth)                   4                       Background (layer 0)   0
510 (added fifth, sent-to-back)      0                       Foreground (layer 2)   0

In alternative embodiments, intra-layer depths need not be tracked/determined.
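Where intra-layer depths are tracked, one simple way to derive them is to walk the objects in surface depth order and count per layer. The sketch below assumes objects are given as `(object_id, surface_depth, layer_name)` tuples, which is an illustrative representation rather than the data format of the description.

```python
from collections import defaultdict


def intra_layer_depths(objects):
    """Map each object id to its depth within its own layer.

    objects: iterable of (object_id, surface_depth, layer_name) tuples.
    """
    next_depth = defaultdict(int)  # next free intra-layer depth per layer
    result = {}
    # Visit objects from rearmost to frontmost on the surface.
    for object_id, _surface_depth, layer in sorted(objects, key=lambda o: o[1]):
        result[object_id] = next_depth[layer]
        next_depth[layer] += 1
    return result
```

Applied to the two tables above, this reproduces the listed intra-layer depths both before and after the send-to-back operation on object 510.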

At 416, server application 114 determines if generation of a new layer-image is required. In the present embodiments, generation of a new layer-image is required if the layer determined at 414 is anything other than the text layer. If generation of a new layer-image is required, processing proceeds to 418. If not, processing proceeds to 420.

At 418, server application 114 generates a layer-image. In particular, server application 114 generates a layer-image for the predefined layer that the object added at 404 has been assigned to (at 414). Generally speaking, generation of a layer-image for a selected layer involves generating a single raster that is based on all objects that have been assigned to the selected layer. An example method 700 for generating a layer-image for a selected layer is described below. Following generation of the layer-image processing proceeds to 420.

At 420, server application 114 generates a digital image that corresponds to the objects that have been added to the surface 304. Generally speaking, the digital image is generated by composing the layer-images that have been generated and (in the present embodiment) the objects that have been assigned to the text layer into a single digital image based on the depth order of the predefined layers. This may be done in various ways.

In the present embodiment, server application 114 creates a design-format image at 420. To do this, server application 114 generates a design element (which will be referred to as a layer-element) corresponding to each non-text layer and generates a set of text-type design elements that includes a text-type design element corresponding to each text object that has been added to the surface 304.

To generate a layer-element that corresponds to a selected non-text layer, and with the example design data format described above, server application 114 creates an element record and associates that element record with the layer-image that has been generated (at 418) for the selected layer (e.g. via the element's “media” attribute). If no objects that have been added to the surface 304 have been assigned to the selected layer then no layer-image will have been generated for that layer and it is ignored. Each layer-image that has been generated should be the same size (e.g. a default design size) and when generating a layer-element server application 114 may set size and position data for the element that causes the layer-image to occupy the entirety of the design. For example, the position data may be (0,0) and the size data may include the width and height of the design itself.

To generate the set of text-type design elements, server application 114 processes each text object that has been added to the surface 304 and generates a corresponding text-type design element. For a selected text object, server application 114 generates the corresponding design element to have a size and position that are based on the size and position of the object in the surface 304. In some instances, and depending on the size of the surface 304, the size and position of a text-type design element will be the same as the size and position of the corresponding object. In other instances, the size and position of a text-type design element will be proportional to the size and position of the corresponding object. Other attributes of the text-type design element (including text formatting attributes, the actual text that is displayed, and any other relevant attributes) are taken from the text object (and, therefore, the original text element that the text object corresponds to). The depth order of the text-type design elements within the set of text-type design elements is based on the depths of the corresponding objects on the surface 304.
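The proportional case can be illustrated with a short sketch. It assumes an object's geometry is an `(x, y, width, height)` tuple and that the surface and design sizes are `(width, height)` tuples; these representations are illustrative assumptions.

```python
def scale_to_design(obj_box, surface_size, design_size):
    """Scale an object's (x, y, width, height) from surface to design units.

    When the surface and design are the same size the box is unchanged;
    otherwise position and size are scaled proportionally per axis.
    """
    sx = design_size[0] / surface_size[0]  # horizontal scale factor
    sy = design_size[1] / surface_size[1]  # vertical scale factor
    x, y, w, h = obj_box
    return (x * sx, y * sy, w * sx, h * sy)
```

For example, a text object at (10, 20) with size 100 by 50 on a 400 by 300 surface maps to (20, 40) with size 200 by 100 in an 800 by 600 design.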

Once layer-elements corresponding to each non-text layer have been generated, and the set of text-type design elements has been generated, server application 114 generates a new design. In the new design, the layer-elements and set of text-type design elements are arranged in depth order.

To illustrate this, consider the example above where there are three predefined layers: background layer (depth 0), text layer (depth 1), and foreground layer (depth 2). For the purposes of this illustration assume that server application has generated: a single layer-element corresponding to the background layer; a set of three text-type elements (T1, T2, and T3) which correspond to three text objects that have intra-layer depths of 0, 1, and 2 respectively; and a single layer-element corresponding to the foreground layer. To generate the design server application may generate an ordered set of elements as follows:
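One way to produce such an ordered set is sketched below. The function name, the use of plain strings as elements, and the dictionary of layer-elements are assumptions for illustration; the ordering rule (layers rearmost first, text elements by intra-layer depth, layers without a layer-image skipped) follows the description above.

```python
def compose_design(layer_order, layer_elements, text_elements, text_layer="text"):
    """Return design elements ordered from rearmost to frontmost.

    layer_order:    layer names, rearmost first.
    layer_elements: {layer_name: element} for non-text layers that have
                    a generated layer-image.
    text_elements:  list of (intra_layer_depth, element) pairs.
    """
    ordered = []
    for layer in layer_order:
        if layer == text_layer:
            # Text elements keep their relative depth inside the text layer.
            ordered.extend(e for _, e in sorted(text_elements, key=lambda t: t[0]))
        elif layer in layer_elements:
            ordered.append(layer_elements[layer])
        # Layers with no layer-image are ignored.
    return ordered
```

With a background layer-element, text elements T1 to T3, and a foreground layer-element, this sketch places the background first, then T1, T2, T3, then the foreground.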

At 422, server application 114 causes the digital image generated at 420 to be displayed by the client application 132. To do this, server application 114 sends the digital image (or data in respect thereof) to the client application 132. On receipt, the client application 132 causes the digital image to be displayed. In this example, the digital image is displayed in the preview region 306 (e.g. as digital image 308).

Once the digital image has been generated and is displayed, a user may perform various actions. For example, client application 132 may provide various user interface controls via which a user can: save the digital image as a design-format image; save the digital image as a raster-format image (in which case client application 132 or server application 114 rasterises the design-format image); share the digital image (as a design-format or raster-format image); publish the digital image (as a design-format or raster-format image); and/or perform other operations on the digital image.

In the example described above, the digital image is generated in such a way that any text elements a user has added to the surface 304 are generated and included as editable text elements (i.e. not as rasterised versions thereof). An advantage of this is that a user may wish to interact with those text elements in the design-format digital image 308 that has been generated and is displayed in the preview region 306. For example, a user may select a particular text element in the digital image 308 and perform various actions such as: move the text element; resize the text element; change the depth of the text element (within the set of text elements or to bring the text element in front of a layer-element or send the text element behind a layer-element); change the text of the text element; change formatting attributes of the text element (e.g. font size, style, type, colour, and/or other format attributes); animate the text element; and/or perform other actions that are relevant to a text element. Notably, such interaction with the digital image that has been generated would not be possible if the image was generated as a raster-format image.

In alternative embodiments, however, server application 114 may generate the digital image at 420 in ways that do not maintain text elements as editable design elements. As one example, instead of generating a set of text-type design elements as described above, server application 114 may instead generate a layer-image for the text layer (i.e. a single raster image including all text elements) and then a single layer-element corresponding to the text layer (the layer-element associated with the text layer's layer-image). In this case, and returning to the above 3-layer example, server application will generate: a single layer-element corresponding to the background layer; a single layer-element corresponding to the text layer; and a single layer-element corresponding to the foreground layer. To generate the design server application would then generate the ordered set of elements as follows:

In addition to, or instead of, interacting with the digital image as displayed at 422, a user may “edit” the digital image by further interactions with the surface 304. For example, a user may add a further object to the surface 304, in which case processing according to process 400 repeats and results in a new digital image being generated at 420 and displayed at 422. Alternatively, a user may interact with an existing object on the surface 304, in which case processing according to a method such as method 900 (described below) may be performed. While adding a further object to the surface 304 and/or interacting with an existing object may be referred to as “editing” the digital image (or as resulting in the digital image being “edited”), such interactions actually cause generation and display of a new digital image.

At 410 of method 400 a prompt object that has been added in an add-object event is resolved to an image (referred to as the resolved image). Turning to FIG. 6, a method 600 for resolving a prompt to a resolved image will be described. In this embodiment, method 600 is performed at the server environment 110. To this end, the client application 132 communicates the prompt text of a prompt object to the server application 114 which coordinates resolution of the prompt to a resolved image. In other embodiments, however, prompt resolution may be performed by the client application 132 itself, or the client application 132 in conjunction with one or more other applications (remote or local to the client system 130).

At 602, server application 114 generates a prompt-expansion prompt: that is, a prompt that will be used to expand the text of the prompt.

In the present embodiment, server application 114 generates a prompt-expansion prompt that includes both a user text component (text that is or is based on the prompt text of the prompt object being processed) and a context component (text which provides additional context that is ultimately used to generate the prompt-expansion prompt). As one example, server application 114 may be configured to generate the prompt-expansion prompt by use of a prompt expansion template which includes the context component and to which the user text component is added. By way of specific example, the prompt expansion template may be a template such as:

“You are a prompt writer. Please generate a prompt of 50 words or less that will be used to generate an image by creating an expanded description of the text “<user text component>”.”

In this example, in order to generate the prompt-expansion prompt the server application 114 substitutes the “<user text component>” text in the template with the actual user text component.
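The substitution step can be sketched in a few lines of Python. The constant and function names are illustrative, and straight quotation marks are used in place of the typographic quotes of the example template.

```python
# Example template from the description, with a placeholder for the user text.
PROMPT_EXPANSION_TEMPLATE = (
    "You are a prompt writer. Please generate a prompt of 50 words or less "
    "that will be used to generate an image by creating an expanded "
    'description of the text "<user text component>".'
)

PLACEHOLDER = "<user text component>"


def build_prompt_expansion_prompt(user_text: str) -> str:
    """Substitute the user text component into the template."""
    return PROMPT_EXPANSION_TEMPLATE.replace(PLACEHOLDER, user_text.strip())
```

Calling `build_prompt_expansion_prompt("cat")` returns the template with the placeholder replaced by the word cat.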

At 604, server application 114 uses the prompt-expansion prompt generated at 602 to generate an expanded prompt. In the present embodiment server application 114 does so by processing the prompt-expansion prompt using the first text generation application 120. As discussed above, the first text generation application 120 takes text as input and generates text (in this particular instance an expanded prompt) as output.

In the present embodiment, server application 114 is configured to cause the first text generation application 120 to generate the expanded prompt using a fixed seed (and to use the same fixed seed each time an expanded prompt is generated). The fixed seed is an input or parameter that causes the first text generation application 120 to generate the same output each time the same input is provided. That is, instead of potentially generating two different expanded prompts in response to the same prompt-expansion prompt, use of the same fixed seed parameter results in the same expanded prompt being generated in response to the same prompt-expansion prompt.

To illustrate prompt expansion, consider an example where the text of a prompt object is “cat” (e.g. as for the object 506 in the example above). In this case (and with the example template above), the server would generate a prompt-expansion prompt of:

“You are a prompt writer. Please generate a prompt of 50 words or less that will be used to generate an image by creating an expanded description of the text “cat”.”

Processing this via the first text generation application 120 may then result in the following expanded prompt being generated:

“A fluffy, orange tabby cat with bright green eyes, lounging on a windowsill bathed in warm sunlight. The cat's fur glows softly in the light, and its tail is curled around its body.”

At 606, the server application 114 processes the expanded prompt using the image generation application 124. This causes the image generation application 124 to generate an initial image (e.g. a raster image) based on the prompt. This image may be referred to as the resolved image.

In the present embodiment, server application 114 is configured to cause the image generation application 124 to generate the resolved image using a fixed seed (and to use the same fixed seed each time a resolved image is generated). The fixed seed input is a parameter that causes the image generation application 124 to generate the same output each time the same input is provided. That is, instead of potentially generating two different resolved images in response to the same expanded prompt, use of the same fixed seed parameter results in the same resolved image being generated in response to the same expanded prompt. The fixed seed used for the image generation application 124 need not be the same fixed seed used for the first text generation application 120.

At 608, and if necessary, the server application 114 processes the resolved image to remove any background. To do this, server application 114 processes the resolved image generated at 606 using background removal application 128. This results in a background-removed version of the resolved image which is then returned/used as the resolved image.
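Steps 602 to 608 can be summarised as a small pipeline. In the sketch below the text generation, image generation, and background removal applications are passed in as hypothetical callables, and the seed values are arbitrary placeholders; nothing here reflects the actual interfaces of those applications.

```python
# Hypothetical fixed seeds; the two values need not be the same.
TEXT_SEED = 1234
IMAGE_SEED = 5678

TEMPLATE = (
    "You are a prompt writer. Please generate a prompt of 50 words or less "
    "that will be used to generate an image by creating an expanded "
    'description of the text "<user text component>".'
)


def resolve_prompt(prompt_text, expand_text, generate_image, remove_background=None):
    """Resolve prompt text to an image.

    expand_text(prompt, seed) -> str      (stand-in for the text generation app)
    generate_image(prompt, seed) -> image (stand-in for the image generation app)
    remove_background(image) -> image     (optional stand-in for background removal)
    """
    # 602: build the prompt-expansion prompt from the template.
    expansion_prompt = TEMPLATE.replace("<user text component>", prompt_text)
    # 604: expand the prompt, always with the same fixed seed.
    expanded_prompt = expand_text(expansion_prompt, TEXT_SEED)
    # 606: generate the resolved image, again with a fixed seed.
    image = generate_image(expanded_prompt, IMAGE_SEED)
    # 608: remove the background only if a remover was supplied.
    if remove_background is not None:
        image = remove_background(image)
    return image
```

Because both seeds are constants, repeated calls with the same prompt text produce the same result whenever the supplied callables are themselves deterministic for a given seed.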

It will be appreciated that resolving the prompt text of a prompt object to an image may be performed in alternative ways.

For example, the server application 114 may resolve the prompt text to an image by performing a search of existing visual elements. Such a search may be based on the prompt text or an expanded prompt (as described at 604). For example, server application 114 may perform a search based on the prompt text (or expanded prompt text) and select a specific visual element (e.g. the visual element with the highest/most favourable search score) which is returned by the search to be the resolved visual image.

As another example, the image generation application 124 may be capable of generating images that do not have any background. In this case, background removal processing at 608 may not be necessary (though server application 114 may be configured to generate an expanded prompt that explicitly includes an instruction to generate an image without a background or with a transparent background).

At 418 of method 400 a new layer-image is generated for a selected layer. Turning to FIG. 7, a method 700 for generating a layer-image will be described. In this embodiment, method 700 is performed at the server environment 110, with server application 114 orchestrating the process. In other embodiments, however, layer-image generation may be performed by the client application 132 itself, or the client application 132 in conjunction with one or more other applications (remote or local to the client system 130).

At 702, server application 114 generates what will be referred to as an image-raster for the selected layer. To generate the image-raster, server application 114 determines all objects that have been assigned to the selected layer and generates a raster that is based on the object-images associated with each of those objects. In the image-raster, the position of each object-image is based on, and corresponds to, the position of the associated object on the surface 304. In the image-raster, the size of each object-image may be determined in various ways. For example, for an object-image that is an actual image object, the size of the object-image may be the size of the image object itself (noting that a user may resize an image object after adding it to the surface 304). For an object-image that is a resolved image generated for a prompt, the object-image may be generated at a fixed size and (optionally) resized. For example, an object-image may be resized based on: the size of a corresponding prompt object's text or bounding box (which a user may resize after adding to the surface 304); heuristic approaches (e.g. predefined heuristic rules based on one or more factors such as object identity, object importance, relative object size, object location, canvas size, etc.); machine learning based approaches; or a combination of such approaches. Furthermore, and as noted, a user may manually resize object-images (or the image objects or prompt objects they correspond to) to override any automatic resizing. An example of generating an image-raster is described further below with reference to FIG. 8.

At 704, server application 114 generates what will be referred to as a text-raster for the selected layer. In order to generate the text-raster, server application 114 determines all objects that have been assigned to the selected layer and generates a raster that is based on the text of the object-captions associated with each of those objects. In the text-raster, each object-caption is used to generate a text item (the text item being the text of the object-caption or text based thereon). The position of each text item is based on the position of the object that the text item corresponds to on the surface 304.

Turning to FIG. 8, an example of generating an image-raster (at 702) and a text-raster (at 704) will be described.

FIG. 8 depicts the surface 304 of FIG. 5, and the example will be in respect of generating image and text-rasters corresponding to the foreground layer (to which objects 502, 506, and 510 have been assigned).

FIG. 8 also depicts an image-raster 800 that corresponds to the foreground layer of surface 304. The image-raster includes object-images 802, 804, and 806 which correspond respectively to foreground layer objects 502, 506, and 510. Object 502 is an image object and therefore the corresponding object-image 802 is (in this example) that image (the actual dog graphic). Object 506 is a prompt object (with the text “cat”), and therefore the corresponding object-image 804 is the resolved image for that prompt (an image of a cat). Object 510 is a prompt object (with the text “ball”), and therefore the corresponding object-image 806 is the resolved image for that prompt (an image of a ball). Image-raster 800 is a single raster and as such although object-images 802, 804, and 806 are individually referenced they are simply pixels of the image-raster, not distinct objects/images.

FIG. 8 also depicts a text-raster 810 that corresponds to the foreground layer of surface 304. The text-raster includes text items 812, 814, and 816 which correspond respectively to foreground layer objects 502, 506, and 510. Object 502 is an image object, and therefore the corresponding text item 812 is based on the object-caption for object 502 (in this example the word “dog”). Object 506 is a prompt object (with the text “cat”), and therefore the corresponding text item 814 is based on the object-caption for object 506 (which, in this example, is the prompt text as entered by the user: the word “cat”). Object 510 is a prompt object (with the text “ball”), and therefore the corresponding text item 816 is based on the object-caption for object 510 (which, in this example, is the prompt text as entered by the user: the word “ball”). Text-raster 810 is a single raster and as such although text items 812, 814, and 816 are individually referenced they are simply pixels of the text-raster, not distinct objects/images.

At 706, server application 114 generates a layer-image generation prompt: that is, a prompt that will (in due course) be used to generate the new layer-image for the selected layer. In the present embodiment server application 114 generates the layer-image generation prompt by using the second text generation application 122 with inputs that include a prompt input and at least one image input.

In the present example, the prompt input that is used to generate the layer-image generation prompt is a predefined text prompt that describes the task to be performed by the second text generation application 122. As one example, such text input may be:

“You are a prompt writer. Create a prompt that reflects a cohesive image that would have all these objects and that reflects the intentions of the prompts.”

In the present example, the image input that is used to generate the layer-image generation prompt is based on the image-raster generated at 702 and text-raster generated at 704. In one implementation, the image input is a single image-text-raster that includes (or combines) both the image-raster and the text-raster. An example of such an image-text-raster is raster 820 of FIG. 8, which is a single raster with the image and text-rasters positioned side-by-side. In alternative embodiments, the image-raster and text-raster may be provided as separate image inputs to the second text generation application 122.
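The construction of the image-raster, the text-raster, and a side-by-side combined raster can be illustrated with a deliberately simplified model. In the sketch below a raster is a grid of single-character cells standing in for pixels, and each object is a dictionary with hypothetical `image`, `caption`, `x`, and `y` keys. A real implementation would draw pixel images and rendered text using an imaging library.

```python
def blank_raster(width, height, fill=" "):
    """Create an empty raster as a list of rows of cells."""
    return [[fill] * width for _ in range(height)]


def paste(raster, block, x, y):
    """Copy a 2D block into the raster with its top-left at (x, y), clipping at the edges."""
    for dy, row in enumerate(block):
        for dx, cell in enumerate(row):
            ry, rx = y + dy, x + dx
            if 0 <= ry < len(raster) and 0 <= rx < len(raster[0]):
                raster[ry][rx] = cell


def build_rasters(objects, width, height):
    """Build (image_raster, text_raster) for the objects of one layer.

    objects are processed in the order given, so later objects overwrite earlier ones.
    """
    image_raster = blank_raster(width, height)
    text_raster = blank_raster(width, height)
    for obj in objects:
        # Object-image placed at the object's surface position.
        paste(image_raster, obj["image"], obj["x"], obj["y"])
        # Object-caption placed at the same position as a one-row text item.
        paste(text_raster, [list(obj["caption"])], obj["x"], obj["y"])
    return image_raster, text_raster


def side_by_side(left, right):
    """Combine two equal-height rasters into one wider raster."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]
```

After pasting, the individual object-images and text items are no longer separate entities: they are just cells of the respective raster, which mirrors the point made about rasters 800 and 810.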

In still further embodiments, the image input to the second text generation application 122 may include only the image-raster or only the text-raster.

To illustrate generation of the layer-image generation prompt, providing the example predefined text described above with combined image/text-raster 820 depicted in FIG. 8 as input to the second text generation application 122 may generate a layer-image generation prompt such as:

“A beautiful photo of a golden retriever dog on the left and a Bengal cat that is jumping and playing with a ball on the right.”

At 708, server application 114 generates a layer-image. In the present embodiment, server application 114 generates the layer-image by using the image generation application 124 with inputs that include the layer-image generation prompt generated at 706 and the image-raster generated at 702. The output of the image generation application 124 is then an image (referred to as the layer-image) that is based on those inputs.

In alternative embodiments, server application 114 may generate the layer-image at 708 based solely on the image-raster generated at 702 or based solely on the text-raster generated at 704. In this case, a layer-image generation prompt need not be generated at 706 or used as input to the image generation application 124 when generating the layer-image. Furthermore: if the layer-image is generated at 708 based solely on an image-raster, a text-raster need not be generated at 704; and if the layer-image is generated at 708 based solely on a text-raster, an image-raster need not be generated at 702.

In the present embodiment, server application 114 is configured to cause the image generation application 124 to generate the layer-image using a fixed seed (and to use the same fixed seed each time a layer-image is generated). As discussed above, a fixed seed is a parameter that causes the image generation application 124 to generate the same output each time the same input is provided.

At 710, and if necessary, the server application 114 processes the layer-image to remove any background. To do this, server application 114 processes the layer-image generated at 708 using background removal application 128. This results in a background-removed version of the layer-image which is returned/used as the layer-image for the selected layer.
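Steps 706 to 710 can be sketched as a short orchestration function. The `write_prompt`, `generate_image`, and `remove_background` callables are hypothetical stand-ins for the second text generation application, the image generation application, and the background removal application, and the seed value is an arbitrary placeholder.

```python
# Predefined task prompt from the example above.
TASK_PROMPT = (
    "You are a prompt writer. Create a prompt that reflects a cohesive image "
    "that would have all these objects and that reflects the intentions of "
    "the prompts."
)

LAYER_IMAGE_SEED = 2024  # hypothetical fixed seed


def generate_layer_image(image_raster, text_raster, write_prompt,
                         generate_image, remove_background=None):
    """Generate a layer-image from the rasters of one layer.

    write_prompt(task_prompt, images) -> str
    generate_image(prompt, image_raster, seed) -> image
    remove_background(image) -> image (optional)
    """
    # 706: derive the layer-image generation prompt from both rasters.
    layer_prompt = write_prompt(TASK_PROMPT, [image_raster, text_raster])
    # 708: generate the layer-image from the prompt and the image-raster.
    layer_image = generate_image(layer_prompt, image_raster, LAYER_IMAGE_SEED)
    # 710: optionally strip the background.
    if remove_background is not None:
        layer_image = remove_background(layer_image)
    return layer_image
```

Here the two rasters are passed as separate image inputs; supplying a single combined raster instead would only change what is placed in the list given to `write_prompt`.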

As will be appreciated, by generating a layer-image in this way the two-dimensional positions of the objects that have been assigned to the layer for which the image is generated are taken into account. To illustrate this, if a user places a first object at the bottom left corner of the generation pane 302 (e.g. a prompt object or an element object that is associated with a “dog” image) and a second object at the top right corner of the generation pane (e.g. a prompt object or an element object that is associated with a “cat” image), and both those objects are assigned to the same layer, then the layer-image that is generated will depict a dog at the bottom left and a cat at the top right. If a user then changes the generation pane 304 so the first object is at the top right and the second object is at the bottom left, then the layer-image that is generated will depict a dog at the top right and a cat at the bottom left. This provides a user with an intuitive way of not only specifying the types of objects that the digital image is to include, but also specifying the relative positions of those objects in the digital image.

In alternative embodiments, and depending on the specific image generation application 124 (and/or the nature of the trained image generation model that application is or uses), actual object coordinates may be used as inputs to the image generation model at 708. For example, two-dimensional coordinates (e.g. (x,y) coordinate pairs) may be determined for each object based on its position on the surface 304 (e.g. coordinate pair indicating a centroid of the object or an alternative defined point such as the top-left corner). The object coordinates may then be used as input to the image generation application. For example, server application 114 may generate (or amend) the layer-generation prompt generated at 706 to incorporate the object coordinates.
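As a minimal sketch of the coordinate-based alternative, the following computes a centroid for each object and appends the coordinates to an existing prompt. The dictionary keys and the wording appended to the prompt are assumptions made for illustration; how a given image generation model expects coordinates to be expressed would depend on that model.

```python
def centroid(x, y, width, height):
    """Centre point of a box whose top-left corner is (x, y)."""
    return (x + width / 2, y + height / 2)


def add_coordinates_to_prompt(prompt, objects):
    """Append each object's caption and centroid to a layer-image generation prompt."""
    parts = []
    for obj in objects:
        cx, cy = centroid(obj["x"], obj["y"], obj["width"], obj["height"])
        parts.append(f'{obj["caption"]} at ({cx:.0f}, {cy:.0f})')
    return f"{prompt} Object positions: " + "; ".join(parts) + "."
```

For a 50 by 50 dog object at (0, 100) and a 40 by 20 cat object at (200, 0), the appended text reads "dog at (25, 125); cat at (220, 10)".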

As noted above, once at least one object has been added to surface 304 and a digital image has been generated (e.g. according to method 400), a user may interact with an existing object on the surface 304. This may be referred to as “editing” the existing digital image that has been generated and displayed and, in most cases, will appear to a user as if they are editing that digital image. From a processing perspective however, interacting with an existing object actually causes a new digital image to be generated and displayed.

Turning to FIG. 9, a computer implemented method 900 for generating a new digital image based on adjustment of an object on the surface 304 will be described. Method 900 is performed after a digital image has been generated and displayed. This may, for example, be after an object has been added to a surface 304 and, in response, a digital image has been generated and displayed (per method 400), or after an existing object on a surface 304 has been adjusted and, in response, a digital image has been generated and displayed (per method 900 itself).

At 902, client application 132 detects a user interaction with an object that has been displayed on a virtual generation surface (e.g. surface 304 of UI 300). This will be referred to as an edit-object interaction (and it may include one or more user inputs).

In the present embodiment, client application 132 is configured to permit various edit-object interactions such as: a delete object interaction (which involves a user selecting an object and deleting it); a move object interaction (which involves a user selecting an object and moving it to a new position on the surface 304); a change depth interaction (which involves a user selecting an object and altering its depth relative to other objects on the surface 304, e.g. by sending back, bringing forward, sending to back, bringing to front); a resize object interaction (which involves a user selecting an object and resizing it uniformly or non-uniformly, e.g. by moving a bounding box edge or handle that is displayed for the object); a change prompt interaction (which involves a user selecting a prompt object and changing the prompt text that has been entered); and a change actual element attribute operation (which involves a user selecting an element object, e.g. an image object or a text object, and changing one or more attributes relevant to that object). The attribute changes that may be made to an element object will depend on the type of the actual element the object corresponds to. For example, for an image object that corresponds to a vector graphic, attributes that can be changed may include line and/or fill colour changes of one or more components of the vector graphic (and/or changes to other vector graphic attributes). Alternatively, for an image object that corresponds to a raster, attributes that can be changed may include attributes corresponding to parameters such as contrast, brightness, saturation, tint, and/or other raster image parameters. As a further example, for a text object, attributes that can be changed may include attributes corresponding to one or more text format attributes (e.g. font type, font size, font style, font colour, and/or other text format attributes) and/or a change to the actual text that is to be displayed by the text object.

At 904, client application 132 updates the display of the surface 304 in accordance with the edit-object interaction. For example: for a delete object interaction, client application 132 deletes the selected object from the surface 304; for a move object, change depth, or resize object interaction, client application 132 moves, resizes, or changes the depth of the selected object in accordance with the user input; for a change prompt interaction, client application 132 displays the new prompt text; and for a change actual element attribute operation, client application 132 updates the appearance of the object according to the attribute change(s) that has/have been made.

As indicated at 906, different processing may be required depending on the type of edit-object interaction. In the present example: if the edit-object interaction is a change depth interaction, processing proceeds to 908; if the edit-object interaction is a change prompt interaction, processing proceeds to 910; otherwise, processing proceeds to 912.
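The branching at 906 can be sketched as a small dispatch function. The `EditKind` enumeration and the function names are hypothetical; the block numbers are those used in the description of method 900, with 908 and 910 both leading on to 912.

```python
from enum import Enum, auto


class EditKind(Enum):
    """Hypothetical labels for the edit-object interactions described above."""
    DELETE = auto()
    MOVE = auto()
    CHANGE_DEPTH = auto()
    RESIZE = auto()
    CHANGE_PROMPT = auto()
    CHANGE_ATTRIBUTE = auto()


def next_block(kind):
    """Return the processing block that follows decision 906."""
    if kind is EditKind.CHANGE_DEPTH:
        return 908  # determine the layer (and intra-layer depth) for the object
    if kind is EditKind.CHANGE_PROMPT:
        return 910  # resolve the edited prompt to a new resolved image
    return 912      # generate new layer-image(s), if required


def trace(kind):
    """List the blocks of method 900 visited for a given interaction."""
    blocks = [902, 904, 906]
    branch = next_block(kind)
    if branch != 912:
        blocks.append(branch)
    blocks += [912, 914, 916]
    return blocks


print(trace(EditKind.CHANGE_PROMPT))
```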

At 908, the edit-object interaction is a change depth interaction. In this case, a layer is determined for the object that has been edited. This processing may be the same as (or similar to) the processing described above with reference to processing block 414. In many cases, changing the depth of an object on the surface 304 will not result in a new layer being determined for the object; however, this may not always be the case. Changing the depth of an object on the surface 304 may, however, result in the object having a new intra-layer depth. If so, and intra-layer depth is maintained, the intra-layer depth of the object is updated (which may involve the intra-layer depths of other objects also being updated to accommodate the update).

Following determination of the layer (and, if determined, intra-layer depth) for the object, processing proceeds to 912.

At 910, the edit-object interaction is a change prompt interaction. In this case, the edited prompt is resolved to a new resolved image for the object. Processing to resolve the edited prompt to a new resolved image may be the same as (or similar to) the processing described above, e.g. by using the edited prompt to generate a new image (as described with reference to method 600) or using the edited prompt to retrieve an existing image. In embodiments that generate an image using a fixed seed (e.g. at 606 above), the same fixed seed that is used to generate a resolved image at 410 is used to generate a resolved image at 910. Following resolution of the new prompt to a new resolved image, processing proceeds to 912.

At 912, and if required, one or more new layer-images are generated. In embodiments where text objects are assigned to a text layer, and the digital design is generated with editable text-type design elements corresponding to each text object, one or more new layer-images will need to be generated unless the object that has been edited is a text object. In embodiments where text objects are not included in the digital design as editable design elements, one or more new layer-images will need to be generated in most (if not all) cases.

In most cases a new layer-image will only need to be generated for the layer that the object that has been edited belongs to. For example, if the object that has been edited belongs to the foreground layer, then a new layer-image for the foreground layer is generated. If the edits to the object result in a new layer being determined for that object (at 908), however, two new layer-images will need to be generated: one for the layer that the object has been newly assigned to and one for the layer that the object was previously assigned to (given the layer-image for the previously assigned layer will have been generated including the object which is no longer assigned to that layer). Processing to generate a (or each) layer-image may be the same as (or similar to) the processing described above with reference to method 700.
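The rule for deciding which layer-images need regenerating after an edit can be sketched as follows. The function name, the string layer names, and the two boolean flags are assumptions for illustration; the logic follows the cases described above for embodiments with and without an editable text layer.

```python
def layers_to_regenerate(previous_layer, current_layer,
                         is_text_object=False, text_is_editable=True):
    """Return the (sorted) layers whose layer-images must be regenerated.

    previous_layer: layer the edited object was assigned to before the edit.
    current_layer:  layer determined for the object after the edit.
    """
    # Editable text objects are rendered as text elements, not layer-images,
    # so editing one does not require any new layer-image.
    if is_text_object and text_is_editable:
        return []
    # A set collapses the common case where the layer is unchanged; when the
    # layer changes, both the old and the new layer are stale.
    return sorted({previous_layer, current_layer})


print(layers_to_regenerate("foreground", "foreground"))
print(layers_to_regenerate("foreground", "background"))
```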

At 914, a new digital image is generated that corresponds to the objects on the surface 304. This includes the object that is edited at 902. Processing to generate the new digital image may be the same as (or similar to) the processing described above with reference to processing block 420.

At 916, the new digital image generated at 914 is displayed in place of the previously generated digital image (e.g. in preview region 306). Processing to display the new digital image may be the same as (or similar to) the processing described above with reference to processing block 422.

Turning to FIG. 10, an example in which several digital images are generated in accordance with the processing described above will be described. FIG. 10 depicts a partial user interface 1000 in several states (state 1000A to 1000F). In each state 1000A to F, partial UI 1000 includes a virtual surface 1002 and digital image 1004 that has been generated based on the objects that have been added to the virtual surface 1002.

In state 1000A, a user has added a single prompt object 1006 with the prompt text “dog” to the surface 1002 (via an add-element-prompt interaction). As a result of this user interaction: the prompt “dog” has been resolved to a resolved image at 410; object 1006 has been assigned to the foreground layer at 414; a new layer-image has been generated for the foreground layer at 418; a digital image has been generated at 420; and the digital image 1004 has been displayed at 422. As can be seen, the image content of the digital image includes a dog 1008 corresponding to object 1006. In this particular instance, and as no object has been assigned to the background layer, server application 114 has not removed the background of the foreground layer-image at 710. As a result, the image 1004 that has been generated includes a “background” 1010 (though at this point the “background” may be part of the foreground layer). In other embodiments, however, the server application may remove the background of a layer-image even if no object has been assigned to the background layer. If this was done, then the “background” 1010 would not be visible in image 1004.

In state 1000B, a user has added a second prompt object 1012 with the prompt text “ball” to the surface 1002. As a result of this user interaction: the prompt “ball” has been resolved to a resolved image at 410; object 1012 has been assigned to the foreground layer at 414; a new layer-image has been generated for the foreground layer at 418; a new digital image has been generated at 420; and the new digital image 1004 has been displayed at 422. As can be seen, the image content of the new digital image includes the dog 1008 corresponding to object 1006 and a ball 1014 corresponding to object 1012.

In state 1000C, a user has added a third prompt object 1016 with the prompt text “in a backyard” to the surface 1002. As a result of this user interaction: the prompt “in a backyard” has been resolved to a resolved image at 410; object 1016 has been assigned to the background layer at 414; a new layer-image has been generated for the background layer at 418; a new digital image has been generated at 420 (incorporating both the pre-existing foreground layer-image and the new background layer-image); and the new digital image 1004 has been displayed at 422. As can be seen, the image content of the new digital image includes the dog 1008 corresponding to object 1006, the ball 1014 corresponding to object 1012, and a background 1018 corresponding to object 1016. As can also be seen, background 1018 has replaced the “background” 1010.

In state 1000D, a user has added a fourth prompt object 1020 with the prompt text “bengal cat” to the surface 1002. As a result of this user interaction: the prompt “bengal cat” has been resolved to a resolved image at 410; object 1020 has been assigned to the foreground layer at 414; a new layer-image has been generated for the foreground layer at 418; a new digital image has been generated at 420 (incorporating both the new foreground layer-image and the pre-existing background layer-image); and the new digital image 1004 has been displayed at 422. As can be seen, the image content of the new digital image includes the dog 1008 corresponding to object 1006, the ball 1014 corresponding to object 1012, the background 1018 corresponding to object 1016, and a cat 1022 corresponding to object 1020.

In state 1000E, a user has: adjusted the position of prompt object 1012 on the surface 1002; and edited the prompt text of prompt object 1020 (from “bengal cat” to “bengal cat jumping”). As a result of these user interactions: the prompt “bengal cat jumping” has been resolved to a resolved image at 910; a new layer-image has been generated for the foreground layer at 912 (taking into account edited objects 1012 and 1020); a new digital image has been generated at 914; and the new digital image 1004 has been displayed at 916. As can be seen, the image content of the new digital image includes: a dog 1008 corresponding to object 1006 (however the angle of the dog's head has now changed compared to the dog in states 1000A-D); a ball 1014 corresponding to object 1012 (noting that the ball has moved compared to its position in states 1000A-D); a background 1018 corresponding to object 1016; and a cat 1022 corresponding to object 1020 (noting that the cat is now jumping).

In state 1000F, a user has added a text object 1024 to the surface 1002. The text element that text object 1024 corresponds to (and, therefore, text object 1024 itself) has format properties that include purple colour text, 14 point size, bold, Comic Sans font. Further, the user has entered the text of “Pets playing” for the text object 1024. As a result of this user interaction: object 1024 has been assigned to the text layer at 414; a new text element has been generated for the text object at 420; a new digital image has been generated at 420 (incorporating, in back-to-front depth order: the pre-existing background layer-image; the new text element; the pre-existing foreground layer-image); and the new digital image 1004 has been displayed at 422. As can be seen, the image content of the new digital image includes: the dog 1008 corresponding to object 1006; the ball 1014 corresponding to object 1012; a background 1018 corresponding to object 1016; a cat 1022 corresponding to object 1020; and an editable text element 1026 corresponding to object 1024.

Methods 400 and 900 as described above operate to generate a digital image in real time (or near real time). That is, as a user interacts with the surface 304 (e.g. by adding objects and/or interacting with existing objects) processing is performed to continually generate and display new digital images in accordance with the user interactions. Turning to FIG. 11, an alternative method 1100 for generating a digital image will be described. In method 1100, rather than automatically generating and displaying new digital images as a user interacts with the surface 304, the system is configured to generate and display a digital image only in response to a specific user command to do so (referred to as a generate-image interaction).

At 1102, client application 132 displays an image generation UI. The image generation UI includes a virtual generation surface such as 304 described above. Client application 132 may also (concurrently) display an image preview region such as 306 in the image generation UI but need not do so.

At 1104, one or more user interactions with the image surface 304 are detected. These may include one or more add-object interactions as described at 404 above and/or one or more edit-object interactions as described at 902 above. In response to detecting a user interaction with the virtual generation surface, client application 132 updates the display of the virtual generation surface in accordance with the user interaction. For example, for an add-object interaction the surface 304 is updated as described at 406 to display the object that is added, and for an edit-object interaction the surface 304 is updated as described at 904.

At 1106, client application 132 detects a generate-image user interaction. This may, for example, be user input activating a generate image control such as control 328.

As generally indicated at 1108, in response to detecting the generate-image user interaction client application 132 generates a digital image based on the state of the surface 304 at the time the generate-image user interaction is detected: that is, based on the objects that are on the surface 304 (and their positions on the surface 304). Generation of the digital image at 1108 involves processing blocks 1110 to 1118.

At 1110, each prompt object that has been added to the surface 304 (if any) is resolved into a resolved image. Processing to resolve a prompt object to a resolved image may be the same as (or similar to) the processing described above, e.g. by using the prompt to generate a new image (as described with reference to method 600) or using the prompt to retrieve an existing image.

At 1112, a caption is determined for each element object that has been added to the surface 304. Processing to determine the caption for an element object may be the same as (or similar to) the processing described above with reference to processing block 412.

At 1114, a layer is determined for each object that has been added to the surface 304. The processing performed to determine the layer for an object may be the same as (or similar to) the processing described above with reference to processing block 414.

At 1116, a layer-image is generated for each relevant predefined layer. In this context, a relevant predefined layer is a predefined layer that has at least one object assigned to it. In embodiments where text objects are assigned to a text layer, and the digital design is generated with editable text-type design elements corresponding to each text object, the text layer is not a relevant layer (and does not have a layer-image generated for it). Processing to generate a layer-image for a relevant predefined layer may be the same as (or similar to) the processing described above with reference to method 700.

At 1118, a digital image is generated based on the layer-images generated at 1116 (and, where relevant, any text objects). Processing to generate the new digital image may be the same as (or similar to) the processing described above with reference to processing block 420.
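Combining layer-images and text elements in depth order can be sketched as a sort followed by back-to-front stacking. The `Item` type, the numeric depth values, and the convention that a smaller depth is further back are assumptions; the example reproduces the back-to-front order given for state 1000F (background layer-image, text element, foreground layer-image).

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical composable item: a layer-image or a text-type element."""
    name: str
    depth: int  # assumed convention: smaller values are further back


def compose_back_to_front(items):
    """Return item names in the order they would be drawn (rearmost first)."""
    # sorted() is stable, so items with equal depth keep their input order.
    return [item.name for item in sorted(items, key=lambda item: item.depth)]


draw_order = compose_back_to_front([
    Item("foreground layer-image", depth=2),
    Item("text element 'Pets playing'", depth=1),
    Item("background layer-image", depth=0),
])
print(draw_order)
```

A real compositor would draw each item over the previous ones in this order, so later (nearer) items occlude earlier ones.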

At 1120, the digital image generated at 1118 is displayed. Processing to display the new digital image may be the same as (or similar to) the processing described above with reference to processing block 422. In embodiments where the design generation UI initially includes a preview region such as 306 (displayed concurrently with the design surface 304), client application 132 displays the digital image that is generated in the preview region 306. In embodiments in which the design generation UI does not initially include a preview region, displaying the design includes displaying a preview region such as 306. In this case the preview region 306 may be displayed together with the surface 304 (i.e. so both are visible at the same time). Alternatively, the preview region 306 may be displayed instead of the design surface 304. In this case the preview region 306 may include a control which, if activated, causes client application 132 to re-display the design surface 304.
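The sequence of processing blocks 1110 to 1118 can be summarised in a short sketch. Every callable passed in (`resolve`, `caption`, `assign_layer`, `make_layer_image`, `compose`) is a hypothetical stand-in for processing described elsewhere in the specification, and the dictionary-based object representation is an assumption. The sketch also omits the special handling of an editable text layer.

```python
def generate_on_command(objects, resolve, caption, assign_layer,
                        make_layer_image, compose):
    """Run blocks 1110-1118 over the objects currently on the surface.

    Note: annotates the object dictionaries in place.
    """
    # 1110: resolve each prompt object to a resolved image.
    for obj in objects:
        if obj["kind"] == "prompt":
            obj["image"] = resolve(obj["prompt"])
    # 1112: determine a caption for each element object.
    for obj in objects:
        if obj["kind"] == "element":
            obj["caption"] = caption(obj["image"])
    # 1114: determine a layer for each object.
    layers = {}
    for obj in objects:
        layers.setdefault(assign_layer(obj), []).append(obj)
    # 1116: generate a layer-image for each layer that has at least one object.
    layer_images = {name: make_layer_image(members)
                    for name, members in layers.items()}
    # 1118: generate the digital image from the layer-images.
    return compose(layer_images)


# Stub callables, for illustration only.
result = generate_on_command(
    [{"kind": "prompt", "prompt": "dog"},
     {"kind": "prompt", "prompt": "in a backyard"}],
    resolve=lambda text: f"img({text})",
    caption=lambda image: f"caption of {image}",
    assign_layer=lambda o: "background" if o.get("prompt", "").startswith("in ") else "foreground",
    make_layer_image=lambda members: "+".join(m["image"] for m in members),
    compose=lambda images: [images[name] for name in sorted(images)],
)
print(result)
```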

Once the digital image has been displayed, a user may interact further with the image (e.g. as described above) and/or the surface 304 (e.g. to add further objects or edit objects, before performing a generate-image user interaction to generate a new image based on an updated surface 304).

The above embodiments facilitate generation of digital images by adding prompt objects (corresponding to user prompts) and/or element objects (corresponding to actual elements) to a surface 304. In alternative implementations, a system may facilitate generation of digital images by adding prompt objects only to a virtual generation surface, or by adding element objects only.

In the above embodiments, the server application 114 is configured to determine a particular layer for each object that is added to the surface 304 (e.g. at 414) and to generate a separate layer-image for each relevant layer (e.g. at 418). In alternative embodiments, a system may be configured to operate without determining different layers for objects and generating separate layer-images for those layers. In this case all objects that are added to the surface 304 are (effectively) treated as being on the same single layer and generation of a layer-image for that layer is generation of the digital image corresponding to the surface 304 (as there is no need to combine different layer-images and/or text objects).

The following sets of numbered clauses describe additional, specific embodiments of the disclosure.

Clause 1. A computer implemented method including: determining a first set of objects, wherein each object in the first set of objects is associated with an object-image and a position; processing the first set of objects to generate a first image-raster, wherein the first image-raster incorporates each object-image that is associated with an object in the first set of objects and each object-image is positioned in the first image-raster based on the position of the object that the object-image is associated with; and generating a first digital image, wherein generating the first digital image includes processing the first image-raster using a first machine learning model, and wherein the first machine learning model is a trained image generation model.

Clause 2. The computer implemented method of clause 1, wherein: the method further includes generating a first image generation prompt based on the first image-raster; and generating the first digital image includes processing the first image-raster and the first image generation prompt using the first machine learning model.

Clause 3. The computer implemented method of clause 2, wherein: each object in the first set of objects is associated with an object-caption; the method further includes processing the first set of objects to generate a first text-raster, wherein the first text-raster incorporates each object-caption that is associated with an object in the first set of objects and each object-caption is positioned in the first text-raster based on the position of the object that the object-caption is associated with; and the first image generation prompt is generated based on the first image-raster and the first text-raster.

Clause 4. The computer implemented method of any one of clauses 1 to 3, further including: determining a set of text objects, wherein each text object in the set of text objects is associated with a position; processing the set of text objects to generate a corresponding set of text-type design elements, wherein the set of text-type design elements includes a text-type design element corresponding to each text object in the set of text objects, and each text-type design element includes position data that is based on the position of the text object the text-type design element corresponds to; and generating a final digital image based on the first digital image and the set of text-type design elements.

Clause 5. The computer implemented method of any one of clauses 1 to 3, further including: determining a second set of objects, wherein each object in the second set of objects is associated with an object-image and a position; processing the second set of objects to generate a second image-raster, wherein the second image-raster incorporates each object-image that is associated with an object in the second set of objects and each object-image is positioned in the second image-raster based on the position of the object that the object-image is associated with; generating a second digital image, wherein generating the second digital image includes processing the second image-raster using the first machine learning model; and generating a final digital image based on the first digital image and the second digital image.

Clause 6. The computer implemented method of clause 5, wherein: the method further includes generating a second image generation prompt based on the second image-raster; and generating the second digital image includes processing the second image-raster and the second image generation prompt using the first machine learning model.

Clause 7. The computer implemented method of clause 5 or clause 6, wherein: the first set of objects is associated with a first predefined layer that is associated with a first layer depth; the second set of objects is associated with a second predefined layer that is associated with a second layer depth; and the final digital image is generated by composing the first digital image and the second digital image together in a depth order that is based on the first and second layer depths.

Clause 8. The computer implemented method of clause 7, further including: determining a set of text objects, wherein each text object in the set of text objects is associated with a position and an object depth; processing the set of text objects to generate a corresponding set of text-type design elements, wherein: the set of text-type design elements includes a text-type design element corresponding to each text object in the set of text objects; each text-type design element is associated with position data that is based on the position of the text object that the text-type design element corresponds to; and each text-type design element is associated with an element depth that is based on the object depth of the text object that the text-type design element corresponds to; and wherein the final digital image is generated by composing the first digital image, the second digital image, and the set of text-type design elements together in a depth order that is based on the first layer depth, the second layer depth, and the element depth associated with each text-type design element.

Clause 9. The computer implemented method of any one of clauses 1 to 8, wherein: the first set of objects includes a first object; the first object is a prompt object that is associated with first prompt text and a first position; and the method further includes determining a first object-image for the first object based on the first prompt text.

Clause 10. The computer implemented method of clause 9, wherein determining the first object-image includes using the first prompt text to identify and retrieve an existing image.

Clause 11. The computer implemented method of clause 9, wherein determining the first object-image includes generating a new image based on the first prompt text.

Clause 12. The computer implemented method of clause 11, wherein generating the new image includes: generating a second image generation prompt based on the first prompt text; and processing the second image generation prompt using a second machine learning model, wherein the second machine learning model is a trained image generation model.

Clause 13. The computer implemented method of clause 12, wherein generating the second image generation prompt includes: generating a prompt-expansion prompt based on the first prompt text; and generating the second image generation prompt by processing the prompt-expansion prompt using a third machine learning model, wherein the third machine learning model is a trained text generation model.

Clause 14. The computer implemented method of any one of clauses 11 to 13, wherein generating the new image includes: generating an initial version of the new image based on the first prompt text; and generating the new image by removing a background of the initial version of the new image.

Clause 15. The computer implemented method of any one of clauses 12 to 14, wherein the first machine learning model and the second machine learning model are the same machine learning model.

Clause 16. The computer implemented method of any one of clauses 1 to 15, wherein: the first set of objects includes a second object; the second object is an image object that is associated with a second object-image; and the second object-image is an existing image.

Clause 17. The computer implemented method of clause 16, further including processing the existing image to generate a second object-caption for the second object, wherein the second object-caption includes text describing a subject of the existing image.

Clause 18. The computer implemented method of any one of clauses 1 to 17, further including causing the first digital image to be displayed on a display screen.

Clause 19. The computer implemented method of any one of clauses 4 to 8, further including causing the final digital image to be displayed on a display screen.

Clause 20. The computer implemented method of any one of clauses 1 to 19, wherein the first set of objects is determined from a superset of objects, the superset of objects including a plurality of objects that are positioned on a virtual generation surface that is displayed on a display screen.

Clause 1. A computer implemented method including: displaying, on a display, a user interface including a virtual generation surface; detecting a first user interaction adding a first object to the virtual generation surface at a first position, wherein the first object is a prompt object and the first user interaction includes user input that defines first prompt text for the first object; resolving the first object to a first resolved image based on the first prompt text; and generating a first layer-image based on the first resolved image, wherein the first layer-image includes first image content that corresponds to the first resolved image, and wherein the first image content is positioned in the first layer-image at a position that is based on the first position of the first object on the virtual generation surface.

Clause 2. The computer implemented method of clause 1, wherein generating a first layer-image includes: determining a first set of objects that belong to a first predefined layer, wherein: each object in the first set of objects is associated with an object-image and a position on the virtual generation surface; and the first set of objects includes the first object which is associated with the first resolved image; processing the first set of objects to generate an image-raster, wherein the image-raster incorporates each object-image that is associated with an object in the first set of objects and each object-image is positioned in the image-raster based on the position of the object that the object-image is associated with; generating a layer-image generation prompt, wherein the layer-image generation prompt is generated based on the image-raster; and generating the first layer-image by processing the image-raster and the layer-image generation prompt using a trained image generation machine learning model.

Clause 3. The computer implemented method of clause 2, wherein: each object in the first set of objects is associated with an object-caption; the method further includes processing the first set of objects to generate a text-raster, wherein the text-raster incorporates each object-caption that is associated with an object in the first set of objects and each object-caption is positioned in the text-raster based on the position of the object that the object-caption is associated with; and the layer-image generation prompt is generated based on the image-raster and the text-raster.

Clause 4. The computer implemented method of any one of clauses 1 to 3, wherein resolving the first object to the first resolved image includes using the first prompt text to identify and retrieve an existing image.

Clause 5. The computer implemented method of any one of clauses 1 to 3, wherein resolving the first object to the first resolved image includes generating a new image based on the first prompt text.

Clause 6. The computer implemented method of clause 5, wherein generating the new image includes: generating a first image generation prompt based on the first prompt text; and processing the first image generation prompt using a first machine learning model, wherein the first machine learning model is a trained image generation model.

Clause 7. The computer implemented method of clause 6, wherein generating the first image generation prompt includes: generating a prompt-expansion prompt based on the first prompt text; and generating the first image generation prompt by processing the prompt-expansion prompt using a second machine learning model, wherein the second machine learning model is a trained text generation model.

Clause 8. The computer implemented method of any one of clauses 5 to 7, wherein generating the new image includes: generating an initial version of the new image based on the first prompt text; and generating the new image by removing a background of the initial version of the new image.

Clause 9. The computer implemented method of any one of clauses 1 to 6, wherein: a second object is positioned on the design generation surface at a second position; the second object is associated with a second object-image; and the first layer-image is generated based on the first resolved image and the second object-image, wherein the first layer-image includes second image content that corresponds to the second object-image and the second image content is positioned in the first layer-image at a position that is based on the second position of the second object on the virtual generation surface.

Clause 10. The computer implemented method of clause 9, wherein: the second object is a prompt object and is associated with second prompt text; and the method further includes processing the second prompt text to generate the second object-image.

Clause 11. The computer implemented method of any one of clauses 1 to 10, wherein: a third object is positioned on the design generation surface at a third position; the third object is associated with a third object-image; and the method further includes: determining that the first object belongs to a first predefined layer; determining that the third object belongs to a second predefined layer that is different to the first predefined layer; generating a second layer-image based on the third object-image, wherein the second layer-image includes third image content that corresponds to the third object-image, and wherein the third image content is positioned in the second layer-image at a position that is based on the third position of the third object on the virtual generation surface; and generating a final digital image based on the first layer-image and the second layer-image.

Clause 12. The computer implemented method of clause 11, wherein: the first predefined layer is associated with a first layer depth; the second predefined layer is associated with a second layer depth; and the final digital image is generated by composing the first layer-image and the second layer-image together in a depth order that is based on the first and second layer depths.

Clause 13. The computer implemented method of clause 12, wherein: a fourth object is positioned on the design generation surface at a fourth position; the fourth object is a text object that is associated with an object depth; the method further includes processing the fourth object to generate a corresponding text-type design element, wherein the text-type design element is associated with position data that is based on the fourth position and an element depth that is based on the object depth; and the final digital image is generated by composing the first layer-image, the second layer-image, and the text-type design element together in a depth order that is based on the first layer depth, the second layer depth, and the element depth associated with the text-type design element.

Clause 14. The computer implemented method of any one of clauses 11 to 13, further including displaying the final digital image.

Clause 15. The computer implemented method of any one of clauses 1 to 14, further including displaying the first layer-image.
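The depth-ordered composition of clauses 11 to 13 can be sketched as a back-to-front paint over a shared canvas. This is a simplified illustration under stated assumptions: single-character "pixels" with `None` for transparency, a convention that a larger depth value is further from the viewer, and a text element that has already been rasterised. The clauses do not fix any of these choices, and the `Layer` and `compose` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical raster: rows of one-character "pixels"; None means transparent.
Raster = List[List[Optional[str]]]


@dataclass
class Layer:
    name: str
    depth: int  # assumption: larger depth = further from the viewer
    raster: Raster


def compose(layers: List[Layer], width: int, height: int) -> Raster:
    """Paint layers back to front so nearer layers cover further ones."""
    canvas: Raster = [[None] * width for _ in range(height)]
    for layer in sorted(layers, key=lambda item: item.depth, reverse=True):
        for y in range(height):
            for x in range(width):
                pixel = layer.raster[y][x]
                if pixel is not None:  # transparent pixels leave the canvas as-is
                    canvas[y][x] = pixel
    return canvas


background = Layer("second layer-image", 2, [["B", "B"], ["B", "B"]])
text = Layer("text-type design element", 1, [["T", "T"], [None, None]])
foreground = Layer("first layer-image", 0, [["F", None], [None, None]])

# Input order does not matter; only the depth values decide what ends up on top.
final_image = compose([foreground, background, text], width=2, height=2)
```

In this example the foreground pixel covers the text at the top-left, the text covers the background across the rest of the top row, and the background shows through everywhere else.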

Clause 1. A computer implemented method including: determining a first set of images, wherein each image in the first set of images is associated with a position; processing the first set of images to generate a first image-raster, wherein the first image-raster incorporates each image in the first set of images and each image in the first set of images is positioned in the first image-raster based on its associated position; generating a first image generation prompt, wherein the first image generation prompt is generated based on the first image-raster; and generating a first digital image, wherein generating the first digital image includes processing the first image-raster and the first image generation prompt using a trained image generation machine learning model.
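The steps of this method (place each image at its position in a raster, derive a prompt from the raster, then pass both to an image generation model) can be sketched as follows. The sketch assumes a toy raster of single-character pixels and treats the prompt generator and the image generation model as caller-supplied functions. All names, including `build_image_raster`, `describe`, and `image_model`, are hypothetical and not drawn from the specification.

```python
from typing import Callable, List, Tuple

# Hypothetical raster: rows of one-character "pixels".
Raster = List[List[str]]
PositionedImage = Tuple[Raster, Tuple[int, int]]  # (image, (left, top))


def build_image_raster(
    images: List[PositionedImage], width: int, height: int, empty: str = "."
) -> Raster:
    """Copy each image into the raster at its (left, top) position, clipping at the edges."""
    raster = [[empty] * width for _ in range(height)]
    for image, (left, top) in images:
        for dy, row in enumerate(image):
            for dx, pixel in enumerate(row):
                x, y = left + dx, top + dy
                if 0 <= x < width and 0 <= y < height:
                    raster[y][x] = pixel
    return raster


def generate_digital_image(
    images: List[PositionedImage],
    width: int,
    height: int,
    describe: Callable[[Raster], str],
    image_model: Callable[[Raster, str], dict],
) -> dict:
    raster = build_image_raster(images, width, height)
    prompt = describe(raster)           # prompt is generated based on the raster
    return image_model(raster, prompt)  # the model receives both raster and prompt


# Stand-ins for the prompt generator and the trained model.
def describe(raster: Raster) -> str:
    symbols = sorted({p for row in raster for p in row if p != "."})
    return "image containing: " + ", ".join(symbols)


def image_model(raster: Raster, prompt: str) -> dict:
    return {"raster": raster, "prompt": prompt}


images = [([["a", "a"]], (0, 0)), ([["b"], ["b"]], (2, 1))]
output = generate_digital_image(images, 3, 3, describe, image_model)
```

Here the stand-in model simply echoes its inputs, which makes it easy to check that each image landed at its associated position before a real model is swapped in.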

Clause 1. A computer processing system including: one or more processing units; and one or more non-transitory computer-readable storage media storing instructions, which when executed by the one or more processing units, cause the one or more processing units to perform a method according to: any one of clauses 1 to 20 of clause set 1; any one of clauses 1 to 14 of clause set 2; and/or clause 1 of clause set 3.

Clause 2. One or more non-transitory storage media storing instructions executable by one or more processing units to cause the one or more processing units to perform a method according to: any one of clauses 1 to 20 of clause set 1; any one of clauses 1 to 14 of clause set 2; and/or clause 1 of clause set 3.

In the above embodiments certain operations are described as being performed by the client system 130 (e.g. under control of the client application 132) and other operations are described as being performed at the server environment 110. Variations are, however, possible. For example, in certain cases an operation described as being performed by client system 130 may be performed at the server environment 110 and, similarly, an operation described as being performed at the server environment 110 may be performed by the client system 130. Generally speaking, however, where user input is required such user input is initially received at client system 130 (by an input device thereof). Data representing that user input may be processed by one or more applications running on client system 130 or may be communicated to server environment 110 for one or more applications running on the server hardware 112 to process. Similarly, data or information that is to be output by a client system 130 (e.g. via display, speaker, or other output device) will ultimately involve that system 130. The data/information that is output may, however, be generated (or based on data generated) by client application 132 and/or the server environment 110 (and communicated to the client system 130 to be output).

The flowcharts illustrated in the figures and described above define operations in particular orders to explain various features. In some cases the operations described and illustrated may be able to be performed in a different order to that shown/described, one or more operations may be combined into a single operation, a single operation may be divided into multiple separate operations, and/or the function(s) achieved by one or more of the described/illustrated operations may be achieved by one or more alternative operations. Still further, the functionality/processing of a given flowchart operation could potentially be performed by (or in conjunction with) different applications running on the same or different computer processing systems.

The present disclosure provides various user interface examples. It will be appreciated that alternative user interfaces are possible. Such alternative user interfaces may provide the same or similar user interface features to those described and/or illustrated in different ways, provide additional user interface features to those described and/or illustrated, or omit certain user interface features that have been described and/or illustrated.

In some instances the present disclosure and/or claims may use the terms “first,” “second,” etc. to identify and distinguish between elements or features. When used in this way, these terms are not used in an ordinal sense and are not intended to imply any particular order. For example, when the terms “first” etc. are used to differentiate features, a first feature could equally be referred to as a second feature without departing from the scope of the described examples. Furthermore, when the terms “first” etc. are used to differentiate features, a second feature could exist without a first feature or a second feature could occur before a first feature.

It will be understood that the embodiments disclosed and defined in this specification extend to alternative combinations of two or more of the individual features mentioned in or evident from the text or drawings. All of these different combinations constitute alternative embodiments of the present disclosure.

The present specification describes various embodiments with reference to numerous specific details that may vary from implementation to implementation. No limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should be considered as a required or essential feature. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Patent Metadata

Filing Date

August 3, 2025

Publication Date

March 26, 2026

Inventors

Danny Wu

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Systems and methods for generating digital images” (US-20260087702-A1). https://patentable.app/patents/US-20260087702-A1
