To adjust an aspect ratio of an image to match the aspect ratio of a display area for presenting the image, a computing device receives an image having a first aspect ratio, and obtains a second aspect ratio for a display area of a display in which to present the image, where the second aspect ratio is different from the first aspect ratio. The computing device extends the image to include one or more additional features which were not included in the image. Additionally, the computing device automatically crops the extended image around an identified region of interest by selecting a portion of the extended image that has an aspect ratio which matches the second aspect ratio of the display area, and provides the cropped image for presentation within the display area of the display.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for adjusting an aspect ratio of an image, the method comprising: receiving, at one or more processors, an image having a first aspect ratio; obtaining, by the one or more processors, a selection of a plurality of second aspect ratios for a plurality of display areas of displays in which to present the image, wherein the plurality of second aspect ratios are different from the first aspect ratio; extending, by the one or more processors, the image to include one or more additional features which were not included in the image using a machine learning model to generate a plurality of extended images having the plurality of second aspect ratios; and providing, by the one or more processors, the plurality of extended images for presentation within the plurality of display areas of the displays.
2. The method of claim 1, further comprising: for one of the plurality of display areas: extending, by the one or more processors, the image beyond at least one dimension of the display area, such that the extended image can be cropped into a plurality of aspect ratios; and automatically cropping, by the one or more processors, the extended image by selecting a portion of the extended image that has an aspect ratio which matches the second aspect ratio of the display area.
3. The method of claim 1, wherein extending the image includes: extending, by the one or more processors, the image using a generative adversarial network (GAN), wherein each of the plurality of extended images includes the image and one or more extended portions.
4. The method of claim 3, wherein the GAN includes a generator for generating extended images and a discriminator for distinguishing between naturally generated and artificially generated images.
5. The method of claim 4, further comprising: training, by the one or more processors, the generator using the naturally generated images; and training, by the one or more processors, the discriminator using a first set of visual features of the naturally generated images, and a second set of visual features of the artificially generated images.
6. The method of claim 4, wherein extending the image using the GAN includes: applying, by the one or more processors, the image to the generator to generate one of the plurality of extended images; applying, by the one or more processors, the extended image to the discriminator to determine whether the discriminator can distinguish the extended image from the naturally generated images; and in response to determining that the discriminator cannot distinguish the extended image from the naturally generated images, using the extended image to adjust the aspect ratio.
7. The method of claim 6, further comprising: in response to determining that the discriminator can distinguish the extended image from the naturally generated images, providing the extended image to the generator for further training.
8. The method of claim 1, wherein extending the image includes: applying, by the one or more processors, a plurality of transformations to the image to generate a plurality of transformed images; applying, by the one or more processors, the plurality of transformed images to the GAN to generate a plurality of transformed, extended images; applying, by the one or more processors, a plurality of respective reverse transformations to the plurality of transformed, extended images to generate a plurality of extended images; and combining, by the one or more processors, the plurality of extended images via a median filter to generate one of the plurality of extended images.
9. A computing device for adjusting an aspect ratio of an image, the computing device comprising: one or more processors; and a non-transitory computer-readable memory coupled to the one or more processors and storing instructions thereon that, when executed by the one or more processors, cause the computing device to: receive an image having a first aspect ratio; obtain a selection of a plurality of second aspect ratios for a plurality of display areas of displays in which to present the image, wherein the plurality of second aspect ratios are different from the first aspect ratio; extend the image to include one or more additional features which were not included in the image using a machine learning model to generate a plurality of extended images having the plurality of second aspect ratios; and provide the plurality of extended images for presentation within the plurality of display areas of the displays.
10. The computing device of claim 9, wherein the instructions further cause the computing device to: for one of the plurality of display areas: extend the image beyond at least one dimension of the display area, such that the extended image can be cropped into a plurality of aspect ratios; and automatically crop the extended image by selecting a portion of the extended image that has an aspect ratio which matches the second aspect ratio of the display area.
11. The computing device of claim 9, wherein the image is extended using a generative adversarial network (GAN), and wherein each of the plurality of extended images includes the image and one or more extended portions.
12. The computing device of claim 11, wherein the GAN includes a generator for generating extended images and a discriminator for distinguishing between naturally generated and artificially generated images.
13. The computing device of claim 12, wherein the instructions further cause the computing device to: train the generator using the naturally generated images; and train the discriminator using a first set of visual features of the naturally generated images, and a second set of visual features of the artificially generated images.
14. The computing device of claim 12, wherein to extend the image using the GAN, the instructions cause the computing device to: apply the image to the generator to generate one of the plurality of extended images; apply the extended image to the discriminator to determine whether the discriminator can distinguish the extended image from the naturally generated images; and in response to determining that the discriminator cannot distinguish the extended image from the naturally generated images, use the extended image to adjust the aspect ratio.
15. The computing device of claim 14, wherein the instructions further cause the computing device to: in response to determining that the discriminator can distinguish the extended image from the naturally generated images, provide the extended image to the generator for further training.
16. The computing device of claim 9, wherein to extend the image, the instructions cause the computing device to: apply a plurality of transformations to the image to generate a plurality of transformed images; apply the plurality of transformed images to the GAN to generate a plurality of transformed, extended images; apply a plurality of respective reverse transformations to the plurality of transformed, extended images to generate a plurality of extended images; and combine the plurality of extended images via a median filter to generate one of the plurality of extended images.
17. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors in a computing device, cause the one or more processors to: receive an image having a first aspect ratio; obtain a selection of a plurality of second aspect ratios for a plurality of display areas of displays in which to present the image, wherein the plurality of second aspect ratios are different from the first aspect ratio; extend the image to include one or more additional features which were not included in the image using a machine learning model to generate a plurality of extended images having the plurality of second aspect ratios; and provide the plurality of extended images for presentation within the plurality of display areas of the displays.
18. The non-transitory computer-readable medium of claim 17, wherein the instructions further cause the one or more processors to: for one of the plurality of display areas: extend the image beyond at least one dimension of the display area, such that the extended image can be cropped into a plurality of aspect ratios; and automatically crop the extended image by selecting a portion of the extended image that has an aspect ratio which matches the second aspect ratio of the display area.
19. The non-transitory computer-readable medium of claim 17, wherein the image is extended using a generative adversarial network (GAN), and wherein each of the plurality of extended images includes the image and one or more extended portions.
20. The non-transitory computer-readable medium of claim 19, wherein the GAN includes a generator for generating extended images and a discriminator for distinguishing between naturally generated and artificially generated images.
Complete technical specification and implementation details from the patent document.
The present application is a continuation of U.S. patent application Ser. No. 18/028,063 entitled “Flexible Image Aspect Ratio Using Machine Learning” and filed on Mar. 23, 2023, which claims priority to PCT/CN2022/094121 filed May 20, 2022, the disclosure of each of which is hereby incorporated by reference herein in its entirety.
The present disclosure relates to adapting fixed-dimension images provided by a content provider, such as an advertiser, to different dimensions and/or aspect ratios.
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
A variety of computing devices support browsers or other applications that present images. These images may be presented within particular display areas in a display. For example, advertisements may be presented within predetermined areas within a display, where each display area has a fixed location, size, and/or aspect ratio within the display. When an aspect ratio of an image differs from the aspect ratio of the display area in which the image is presented, the display area may include white space which is undesirable or other visual artifacts associated with the difference in aspect ratios, such as distortion. Additionally, portions of the image may be cut off or removed to fit within the display area, which may result in the loss of significant visual features within the image such as text, an object, or a portion thereof.
To adapt an image having a first aspect ratio to fit within a display area having a second aspect ratio which is different from the first aspect ratio, an image adjustment system uses a generative adversarial network (GAN), which is a type of machine learning model, for generating image extensions to extend an image to include additional features which were not included in the original image. For example, an original image may include a vehicle in the foreground with a house in the background where the roof of the house is cut off in the image. The image adjustment system may use the GAN to generate an extended version of the image which includes the roof of the house which was not included in the original image.
The GAN may include both a generator and a discriminator. The generator uses an encoder-decoder architecture that takes an input image and a binary mask as input, and generates an extended image as output. The discriminator takes either a naturally generated image or an artificially generated image as input, runs it through a convolutional neural network, and attempts to differentiate artificially generated images from real naturally generated images. The image adjustment system uses a combination of loss functions to train the generator and the discriminator, such as an adversarial loss, a reconstruction loss, and/or a perceptual loss. By using the discriminator, the image adjustment system can identify whether an image is real or generated by the generator. For image extensions, the discriminator identifies not only whether an image looks real, but also whether the generated image extensions look consistent with the known portions of the input image. To ensure this consistency, the image adjustment system trains the discriminator by including known pixels from the input image in the generated image and using descriptors from naturally generated images.
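One way to picture "including known pixels from the input image in the generated image" is a mask-based composite. The following is a minimal sketch in plain Python, not the disclosed implementation: the function name `composite_known_pixels`, the nested-list image representation, and the mask convention (1 marks a known pixel, 0 marks a generated pixel) are all assumptions made for illustration.

```python
def composite_known_pixels(generated, known, mask):
    """Keep the known pixel where mask is 1 and the generated pixel where mask is 0.

    All three arguments are equally sized grids (lists of rows).
    The 1 = known / 0 = generated convention is an assumption of this sketch.
    """
    if not (len(generated) == len(known) == len(mask)):
        raise ValueError("all inputs must have the same number of rows")
    out = []
    for g_row, k_row, m_row in zip(generated, known, mask):
        if not (len(g_row) == len(k_row) == len(m_row)):
            raise ValueError("all rows must have the same length")
        # Per pixel: copy from the original where it is known, else from the generator output
        out.append([k if m == 1 else g for g, k, m in zip(g_row, k_row, m_row)])
    return out


# Toy 2x3 example: the middle column is known, the outer columns are generated.
generated = [[9, 9, 9], [9, 9, 9]]
known = [[0, 5, 0], [0, 6, 0]]
mask = [[0, 1, 0], [0, 1, 0]]
composited = composite_known_pixels(generated, known, mask)
```

Feeding the composited image to the discriminator means any visible seam between the known and generated regions counts against the generator, which is one plausible reading of the consistency requirement described above.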
To reduce artifacts in the generated image, the image adjustment system may perform an augmented inference technique. More specifically, the image adjustment system applies multiple transformations to the input image, such as flipping color channels, slight cropping, flipping left/right, flipping up/down, etc. Then the image adjustment system applies the transformed images to the GAN model, applies reverse transformations to the resulting images, and combines them via a median filter to reduce image artifacts.
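The augmented inference steps above (transform, extend, reverse, combine by median) can be sketched as follows. This is an illustrative sketch only: `augmented_inference`, the flip helpers, and `toy_extend` are hypothetical names, images are plain nested lists of grayscale values, and `toy_extend` is a stand-in for the GAN that pads by one pixel and deliberately leaves an artifact in its top-left corner.

```python
from statistics import median


def identity(img):
    return [row[:] for row in img]


def flip_lr(img):
    return [row[::-1] for row in img]


def flip_ud(img):
    return [row[:] for row in img[::-1]]


# Each pair is (transformation, reverse transformation); flips are their own inverse.
TRANSFORMS = [(identity, identity), (flip_lr, flip_lr), (flip_ud, flip_ud)]


def augmented_inference(image, extend_fn, transforms=TRANSFORMS):
    """Extend every transformed copy, undo the transform, and take a per-pixel median."""
    candidates = [reverse(extend_fn(forward(image))) for forward, reverse in transforms]
    height, width = len(candidates[0]), len(candidates[0][0])
    return [
        [median(c[y][x] for c in candidates) for x in range(width)]
        for y in range(height)
    ]


def toy_extend(img):
    """Stand-in for the GAN (assumption): zero-pad by one pixel, with a corner artifact."""
    width = len(img[0])
    out = [[0] * (width + 2)] + [[0] + row + [0] for row in img] + [[0] * (width + 2)]
    out[0][0] = 100  # simulated artifact
    return out


result = augmented_inference([[1, 2], [3, 4]], toy_extend)
```

Because the artifact lands in a different corner of each reversed candidate, the per-pixel median discards it while the pixels that all candidates agree on are preserved.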
The GAN model may be trained using naturally generated images from the real world, such as scenes and objects. In other implementations, the GAN model is trained using naturally generated images by users on computing devices, such as display ad images, or any suitable combination of these.
After an image has been extended using the GAN, the image adjustment system automatically crops the extended image using a selected aspect ratio, such as the aspect ratio of the display area. For example, an original image may have an aspect ratio of 16:9. The display area for presenting the image within a display may have an aspect ratio of 3:4. To adjust the aspect ratio of the image from 16:9 to 3:4 without distorting the image or losing significant visual features within the image such as text or objects, the image adjustment system extends the image using the GAN. Then the image adjustment system automatically crops the extended image, such that the cropped image has an aspect ratio of 3:4. The image adjustment system may automatically crop the extended image using machine learning techniques. More specifically, the image adjustment system may identify a region of interest within the extended image using machine learning techniques and may crop the extended image around the region of interest using the selected aspect ratio.
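As a rough illustration of the arithmetic behind the 16:9 to 3:4 example, the sketch below computes the smallest canvas with the target aspect ratio that still contains the original image at its original scale. The function name `min_extended_size` and the 1920x1080 pixel dimensions are assumptions chosen for illustration; the disclosure does not prescribe how the extension size is chosen, and the image may well be extended further than this minimum.

```python
import math
from fractions import Fraction


def min_extended_size(width, height, target_w, target_h):
    """Smallest (width, height) with ratio target_w:target_h that contains the image unscaled.

    Sizes are rounded up to whole pixels, so the result can be off the exact
    ratio by a fraction of a pixel when the division is not exact.
    """
    target = Fraction(target_w, target_h)
    if Fraction(width, height) > target:
        # Image is relatively wider than the target: keep the width, grow the height.
        return width, math.ceil(width / target)
    # Image is relatively taller (or equal): keep the height, grow the width.
    return math.ceil(height * target), height


# A 1920x1080 (16:9) image needs a 1920x2560 canvas to reach 3:4,
# i.e. 1480 additional rows of generated content above and/or below.
canvas = min_extended_size(1920, 1080, 3, 4)
extra_rows = canvas[1] - 1080
```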
Then the image adjustment system provides the cropped image for presentation within the display area of the display. In this manner, the cropped image may fill the entire display area without leaving any white space, thereby optimizing screen real estate and improving the user interface. Moreover, by using a GAN to generate the extended portions of the image, the extended image appears more realistic and with fewer visual artifacts compared to alternative implementations such as color padding, blurring, distorting an image, or cutting off certain portions of the image to fit a selected aspect ratio.
One example embodiment of the techniques of this disclosure is a method for adjusting an aspect ratio of an image. The method includes receiving an image having a first aspect ratio, obtaining a second aspect ratio for a display area of a display in which to present the image, where the second aspect ratio is different from the first aspect ratio, and extending the image to include one or more additional features which were not included in the image. The method also includes automatically cropping the extended image around an identified region of interest by selecting a portion of the extended image that has an aspect ratio which matches the second aspect ratio of the display area, and providing the cropped image for presentation within the display area of the display.
Another example embodiment is a computing device for adjusting an aspect ratio of an image. The computing device includes one or more processors and a non-transitory computer-readable memory coupled to the one or more processors and storing thereon instructions. The instructions, when executed by the one or more processors, cause the computing device to receive an image having a first aspect ratio, obtain a second aspect ratio for a display area of a display in which to present the image, where the second aspect ratio is different from the first aspect ratio, and extend the image to include one or more additional features which were not included in the image. The instructions further cause the computing device to automatically crop the extended image around an identified region of interest by selecting a portion of the extended image that has an aspect ratio which matches the second aspect ratio of the display area, and provide the cropped image for presentation within the display area of the display.
Yet another example embodiment is a computer-readable medium storing instructions for adjusting an aspect ratio of an image. The computer-readable medium may be transitory or non-transitory. The instructions, when executed by one or more processors, cause the one or more processors to receive an image having a first aspect ratio, obtain a second aspect ratio for a display area of a display in which to present the image, where the second aspect ratio is different from the first aspect ratio, and extend the image to include one or more additional features which were not included in the image without blurring or color padding the image. The instructions further cause the one or more processors to provide at least a portion of the extended image having an aspect ratio which matches the second aspect ratio of the display area for presentation within the display area of the display.
Generally speaking, the systems and methods of the present disclosure adjust the aspect ratio of an image so that the image fits within the display area of a display without distorting the image or having white space between the image and the respective display area. The display may be a web page or application screen which presents image content from content providers within predetermined display areas. For example, the image content may include advertisements, photographs, etc.
A content provider may provide an input image to a server device, where the input image has a first aspect ratio. The server device may identify the dimensions and/or aspect ratio of the display area for presenting the input image. For example, the display may include several adjacent display areas for presenting image content. The server device may identify the dimensions and/or aspect ratio for one of the adjacent display areas, or the adjacent display areas may have the same dimensions and/or aspect ratios. The aspect ratio for the display area may be a second aspect ratio which is different from the first aspect ratio.
Then the server device may extend the input image to include additional features which were not included in the input image, for example using a first machine learning model such as a GAN. Accordingly, the server device may increase the dimensions of the input image without adjusting the scale of the input image. Next, the server device automatically crops the extended image using the second aspect ratio around an identified region of interest. In some implementations, the server device identifies a region of interest within the extended image using a second machine learning model. Then the server device generates a rectangular box for cropping the extended image using the second aspect ratio. The server device places the rectangular box around the identified region of interest to crop the extended image. Then the server device provides the cropped image as an output image to a client device for presentation within the display area of the display.
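The placement of the rectangular crop box can be sketched as below. This is a minimal sketch under stated assumptions: the region of interest is taken as given (in the disclosure it comes from the second machine learning model), boxes are `(left, top, right, bottom)` pixel tuples, and the hypothetical helper `crop_box_around_roi` picks the largest box of the target ratio, centres it on the region of interest, and shifts it back inside the image where necessary.

```python
def crop_box_around_roi(img_w, img_h, roi, target_w, target_h):
    """Return (left, top, right, bottom) of a target-ratio crop box around the ROI.

    Assumption of this sketch: use the largest box with the target ratio that
    fits in the image, centred on the ROI and clamped to the image bounds.
    """
    # Largest box with ratio target_w:target_h that fits inside the image.
    if img_w * target_h > img_h * target_w:
        box_h = img_h
        box_w = img_h * target_w // target_h
    else:
        box_w = img_w
        box_h = img_w * target_h // target_w

    roi_left, roi_top, roi_right, roi_bottom = roi
    center_x = (roi_left + roi_right) // 2
    center_y = (roi_top + roi_bottom) // 2

    # Centre on the ROI, then clamp so the box stays inside the image.
    left = min(max(center_x - box_w // 2, 0), img_w - box_w)
    top = min(max(center_y - box_h // 2, 0), img_h - box_h)
    return left, top, left + box_w, top + box_h


# Hypothetical 2400x2600 extended image cropped to 3:4 around an ROI.
box = crop_box_around_roi(2400, 2600, (1000, 1200, 1400, 1500), 3, 4)
```

The sketch does not handle a region of interest that is larger than the largest box of the target ratio; in that case some of the region would necessarily fall outside the crop.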
Referring to FIG. 1, an example computing environment for an image adjustment system 100 includes a client device 10 configured to execute an application 122 such as a browser application, one or more content providers 34 configured to provide image content (also referred to herein as “input images”) for presentation in display areas of a display of the browser application 122, and a server device 60 configured to adjust the aspect ratios of the input images to fit within the respective display areas. The client device 10 may be operated by a user. The browser application 122 may receive, interpret, and/or display web page information while also receiving inputs from the user.
The server device 60 can be communicatively coupled to a database 80 that stores, in an example implementation, a first machine learning model such as a GAN model for generating extended images. The training data used as training input for the first machine learning model may include naturally generated images from the real world, such as scenes and objects as well as naturally generated images by users on computing devices, such as display ad images. As used herein, naturally generated images may refer to real-world images or images generated on a computing device by a person. Artificially generated images may refer to images at least partially generated by a computing device without user input to generate the image or portion thereof. The training data for the first machine learning model may also include a first set of visual features for naturally generated images and a second set of visual features for artificially generated images.
Additionally, the database 80 may store a second machine learning model for identifying regions of interest (ROIs) within extended images. The training data used as training input for the second machine learning model may include a set of images, a portion of each image in the set indicated as an ROI for the image, and the remaining portion of each image in the set which was not indicated as the ROI for the image.
More generally, the server device 60 can communicate with one or several databases that store any type of suitable content information or image adjustment information. The content provider 34 can provide image content to the server device 60 via a native application or web browser executing on a client device of the content provider 34. For example, the content provider 34 can upload the image content to the server device 60 via an advertisement application or website. The server device 60 may then identify a web page or application screen for presenting the image content, and a specific display area within the web page or application screen in which to present the image content. The devices operating in the image adjustment system 100 can be interconnected via a communication network 30.
In various implementations, the client device 10 may be a smartphone or a tablet computer. The client device 10 may include a memory 120, one or more processors (CPUs) 116, a graphics processing unit (GPU) 112, an I/O module 114 including a microphone and speakers, a user interface (UI) 132, and one or several sensors 19 including a Global Positioning System (GPS) module. The memory 120 can be a non-transitory memory and can include one or several suitable memory modules, such as random access memory (RAM), read-only memory (ROM), flash memory, other types of persistent memory, etc. The I/O module 114 may be a touch screen, for example. In various implementations, the client device 10 can include fewer components than illustrated in FIG. 1 or, conversely, additional components. In other embodiments, the client device 10 may be any suitable portable or non-portable computing device. For example, the client device 10 may be a laptop computer, a desktop computer, a wearable device such as a smart watch or smart glasses, a virtual reality headset, etc.
The memory 120 stores an operating system (OS) 126, which can be any type of suitable mobile or general-purpose operating system. The OS 126 can include application programming interface (API) functions that allow applications to retrieve sensor readings. For example, a software application configured to execute on the computing device 10 can include instructions that invoke an OS 126 API for retrieving a current location of the client device 10 at that instant. The API can also return a quantitative indication of how certain the API is of the estimate (e.g., as a percentage).
The memory 120 also stores a browser application 122, which is configured to receive, interpret, and/or display web page information while also receiving inputs from the user as mentioned above. The browser application 122 may present web pages via the UI 132, and may present image content provided by the server device 60 in display areas of the UI 132. The memory 120 may also store other applications (not shown) which may be configured to present image content in display areas of the UI 132.
It is noted that although FIG. 1 illustrates the browser application 122 as a standalone application, the functionality of the browser application 122 also can be provided as a plug-in or extension for another software application executing on the client device 10, etc. The browser application 122 generally can be provided in different versions for different respective operating systems. For example, the maker of the client device 10 can provide a Software Development Kit (SDK) including the browser application 122 for the Android™ platform, another SDK for the iOS™ platform, etc.
In some implementations, the server device 60 includes one or more processors 62 and a memory 64. The memory 64 may be tangible, non-transitory memory and may include any types of suitable memory modules, including random access memory (RAM), read-only memory (ROM), flash memory, other types of persistent memory, etc. The memory 64 stores instructions executable on the processors 62 that make up an image aspect ratio adjuster 68, which can generate a first machine learning model for generating extended images and a second machine learning model for identifying ROIs within extended images.
The image aspect ratio adjuster 68 and the browser application 122 can operate as components of an image adjustment system. Alternatively, the image adjustment system can include only server-side components and simply provide the browser application 122 with web information to present, including display areas and image content to present within the display areas. In other words, image adjustment techniques in these embodiments can be implemented transparently to the browser application 122. As another alternative, the entire functionality of the image aspect ratio adjuster 68 can be implemented in the browser application 122. More generally, the image aspect ratio adjuster 68 and the browser application 122 may execute on the client device 10, the server device 60, or any suitable combination of these.
For simplicity, FIG. 1 illustrates the server device 60 as only one instance of a server. However, the server device 60 according to some implementations includes a group of one or more server devices, each equipped with one or more processors and capable of operating independently of the other server devices. Server devices operating in such a group can process requests from the client device 10 individually (e.g., based on availability), in a distributed manner where one operation associated with processing a request is performed on one server device while another operation associated with processing the same request is performed on another server device, or according to any other suitable technique. For the purposes of this discussion, the term “server device” may refer to an individual server device or to a group of two or more server devices.
FIG. 2 illustrates an example web page or application screen 200 which may be presented on the client device 10. The server device 60 may generate the web page or application screen 200 and provide the generated web page or application screen 200 to the client device 10 for display. In other implementations, the server device may provide information to the client device 10 for the client device 10 to generate the web page or application screen 200, such as layout information. In any event, the web page or application screen 200 includes display areas 202-206 for placing image content within the display areas 202-206. Each of the display areas 202-206 has a particular set of dimensions and/or aspect ratio within the web page or application screen 200. The display areas 202-206 may each have a set of borders depicting the boundaries of each display area 202-206. Accordingly, images must fit within the boundaries of the display areas 202-206. If the aspect ratio of an image differs from the aspect ratio of a display area 202-206, white space may be shown between the boundaries of the image and the boundaries of the display area 202-206.
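To make the white-space problem concrete, the following sketch estimates how much of a display area stays empty when an image is scaled to fit without distortion or cropping. The helper name `white_space_fraction` is hypothetical, and the calculation assumes the image is scaled uniformly until it touches the display area on one axis.

```python
from fractions import Fraction


def white_space_fraction(image_w, image_h, area_w, area_h):
    """Fraction of the display area left empty by a uniformly scaled, uncropped image."""
    image_ratio = Fraction(image_w, image_h)
    area_ratio = Fraction(area_w, area_h)
    # The image fills one axis completely; along the other axis it covers
    # the smaller of the two ratio quotients.
    covered = min(image_ratio / area_ratio, area_ratio / image_ratio)
    return 1 - covered


# A 3:4 image in a 1:1 display area leaves a quarter of the area empty.
portrait_in_square = white_space_fraction(3, 4, 1, 1)
# A 16:9 image in a 9:16 display area leaves roughly 68% of the area empty.
landscape_in_portrait = white_space_fraction(16, 9, 9, 16)
```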
FIG. 3 illustrates an example input image 300 which may be provided from a content provider 34 to the server device 60. The input image 300 has an aspect ratio of 3:4. While the input image 300 may also have a set of dimensions which differ from the dimensions of a display area for presenting the input image 300, the input image 300 may be scaled up or down to match the dimensions of the display area without changing the aspect ratio.
To adjust the aspect ratio of the input image 300 to match the aspect ratio of a display area 202-206 for presenting the input image (e.g., from 3:4 to 1:1), the server device 60 extends the input image 300 to include additional features which were not included in the input image 300. The additional features may include greater portions of objects in the input image 300, such as greater portions of people, animals, furniture, buildings, floors, walls, ceilings, trees, plants, water, etc. in the input image 300 which may be cut off in the input image 300. The input image 300 may be extended in all directions so that the extended image includes extended portions to the left, to the right, above, and below the input image 300. This allows for the input image 300 to be cropped using any suitable aspect ratio without having to cut off significant features of the input image 300.
For example, when the input image is an advertisement for a car, the input image may include the car and text describing the car in the center of the input image. The input image may have a landscape aspect ratio (e.g., 16:9) while the display area may have a portrait aspect ratio (e.g., 9:16). If the input image in its current form is cropped to the portrait aspect ratio, portions of the car and the text may be cut off. If the dimensions of the input image are altered from the landscape aspect ratio to the portrait aspect ratio without extending the input image, the car and text may be distorted by stretching the car and text vertically and condensing the car and text horizontally.
In any event, the server device and, more specifically, the image aspect ratio adjuster 68 trains a first machine learning model such as a GAN model to extend images in such a way as to minimize the extent to which the extended images are visually distinguishable from unextended or naturally generated images. The GAN model may include two components: a generator and a discriminator.
The generator generates an extended image by combining an input image 300 with a binary input mask that extends beyond the dimensions of the input image 300 in each direction. The input image 300 and the binary input mask may be combined by concatenating the input image 300 and the binary input mask channel-wise. The generator may generate the extended image using one or more machine learning techniques, such as neural networks, linear regression, polynomial regression, logistic regression, random forests, boosting such as adaptive boosting, gradient boosting, and extreme gradient boosting, nearest neighbors, Bayesian networks, support vector machines, etc.
The discriminator obtains a first set of unextended or naturally generated images and a second set of extended images or artificially generated images, identifies visual features of each set of images to distinguish between the sets of images, and compares the visual features of a new image to the features of each set of images to determine whether the new image is extended or not.
Visual features may be identified by detecting stable regions within an image that are detectable regardless of blur, motion, distortion, orientation, illumination, scaling, and/or other changes in camera perspective. The stable regions may be extracted from the image using a scale-invariant feature transform (SIFT), speeded up robust features (SURF), fast retina keypoint (FREAK), binary robust invariant scalable keypoints (BRISK), or any other suitable computer vision techniques. In some embodiments, keypoints may be located at high-contrast regions of objects within the image, such as edges within an object. A bounding box may be formed around a keypoint and the portion of the image created by the bounding box may be a feature.
The discriminator compares the visual features using one or more machine learning techniques, such as neural networks, linear regression, polynomial regression, logistic regression, random forests, boosting such as adaptive boosting, gradient boosting, and extreme gradient boosting, nearest neighbors, Bayesian networks, support vector machines, etc. The discriminator then provides feedback to the generator indicating whether the discriminator was able to correctly identify an extended image. If the discriminator correctly identifies an extended image, the generator adjusts the generator machine learning model for generating the extended image.
The generator and discriminator may each include neural networks trying to optimize opposing loss functions. The generator tries to maximize the probability that the discriminator will determine that an artificially generated image is naturally generated, while the discriminator tries to minimize this probability. For example, the generator and discriminator may each be trained using a combination of loss functions, such as an adversarial loss function, a reconstruction loss function to minimize the difference between known pixels from the input image and pixels in the artificially generated image, and a perceptual loss function to minimize the difference between an artificially generated image and a naturally generated image.
FIG. 4A schematically illustrates how the GAN model may be trained to generate an extended image from an input image. Some of the blocks in FIG. 4A represent data structures or memory storing these data structures, registers, or state variables (e.g., blocks 402, 406a-406n), other blocks represent hardware and/or software components (e.g., blocks 404, 410, 412), and other blocks represent output data (e.g., blocks 414a-414n). Input signals are represented by arrows labeled with corresponding signal names.
The generative adversarial network 404 may generate an extended image based on a generative machine learning model approach. Broadly defined, a generative machine learning model approach involves training a generative engine to learn the regularities and/or patterns in a set of input data, such that the engine may generate new examples of the input data. As the generative engine is trained on more input data, the engine's generated new examples may increase in similarity to the input data. Thus, a goal of a generative machine learning model approach is to enable the generation of new examples of the input data that are similar to the original input data.
To generate extended images, the generative adversarial network 404 receives training data which may include naturally generated images 406a-406n. The naturally generated images 406a-406n may include natural images from the real world, such as scenes and objects, or images generated by a user on a computing device, such as display ad images. The naturally generated images may be stored in an image database, such as the database 80.
The generative adversarial network 404 may pass the naturally generated images 406a-406n through the generator 410 to generate extended images. More specifically, the generator 410 may analyze visual features of the naturally generated images 406a-406n to identify common features in the naturally generated images 406a-406n. The generator 410 then utilizes the common features when generating extended images or extended portions of input images, for example by training a generator neural network using the common features. For example, the generator 410 may obtain input images and combine the input images with binary input masks using the common features from the naturally generated images 406a-406n.
The binary input mask for a particular input image may have larger dimensions than the input image and may be filled in with zeroes at pixel locations corresponding to the pixel locations of the input image. In this manner, when the binary input mask is combined with the input image, the combined output will include the input image at the pixel locations filled in with zeroes in the binary input mask. The input image and the binary input mask may be combined by concatenating the input image and the binary input mask channel-wise.
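The mask construction and channel-wise concatenation described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name `build_masked_input`, the nested-list image representation, the zero placeholder for unknown pixels, and the use of 1 for mask positions outside the input image are all assumptions made for the example (the description only states that the mask holds zeroes at the input image's pixel locations).

```python
def build_masked_input(image, pad_x, pad_y):
    """Pad an H x W x C image and append a binary mask as an extra channel.

    Assumed convention: the mask is 0 where original pixels sit and 1 in
    the surrounding region that the generator is expected to fill in.
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    combined = []
    for y in range(height + 2 * pad_y):
        row = []
        for x in range(width + 2 * pad_x):
            src_y, src_x = y - pad_y, x - pad_x
            inside = 0 <= src_y < height and 0 <= src_x < width
            # Known pixels are copied; unknown pixels get a zero placeholder.
            pixel = list(image[src_y][src_x]) if inside else [0] * channels
            # Channel-wise concatenation: the mask value becomes the last channel.
            row.append(pixel + [0 if inside else 1])
        combined.append(row)
    return combined


image = [[[10, 20, 30], [40, 50, 60]]]  # hypothetical 1 x 2 RGB image
combined = build_masked_input(image, pad_x=1, pad_y=1)
```

Here `combined` is a 3 x 4 grid of four-channel pixels, with the two original pixels in the middle row carrying a mask value of 0.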
For example, the generator 410 may obtain the dimensions of an input image and generate the dimensions of the binary input mask by increasing the dimensions of the input image by threshold amounts in the horizontal (x-axis) and vertical (y-axis) directions. In some implementations, the generator 410 may increase the dimensions of the input image by the same threshold amount (e.g., 50%) in both the x and y directions. In this manner, the binary input mask will extend by half of the threshold amount to the right, to the left, above, and below the input image. In other implementations, the generator 410 may increase the dimensions of the input image by different threshold amounts in the x and y directions. For example, the generator 410 may determine the threshold amounts in accordance with the aspect ratio of the input image. More specifically, the generator 410 may increase the width of the input image by the product of a threshold amount (e.g., 50%) and the inverse of the aspect ratio (e.g., 9:16 when the aspect ratio is 16:9). The generator 410 may increase the length of the input image by the product of the threshold amount (e.g., 50%) and the aspect ratio (e.g., 16:9).
At pixel locations that extend beyond the input image, the generator 410 may populate the binary input mask in accordance with the common features from the naturally generated images 406a-406n. For example, the generator 410 may apply a combination of the input image with the binary input mask to the generator neural network to generate an extended image 414a.
The generator 410 may then pass the artificially generated images 414a-414n to the discriminator 412. The discriminator 412 may also receive the naturally generated images 406a-406n along with indications of the images that were artificially generated 414a-414n and the images that were naturally generated 406a-406n. The discriminator 412 may analyze visual features of the naturally generated images 406a-406n and visual features of the artificially generated images 414a-414n to generate a machine learning model (e.g., a neural network) for identifying whether an image was generated naturally or artificially. For the discriminator 412 to ensure that extended images are visually indistinguishable from naturally generated images, the discriminator 412 identifies not only whether an entire image is sufficiently similar to naturally generated images but also whether the extended portion of the image is consistent with the input image. To ensure this consistency, known pixels from the input image are included in the extended image to minimize reconstruction loss.
Then in a testing phase, the discriminator 412 analyzes the visual features of an artificially generated image 414a-414n from the generator 410 without knowing whether or not it was naturally generated. When the discriminator 412 determines that the image is artificially generated, the discriminator 412 may return the identified, artificially generated image 414a-414n to the generator 410 in a feedback loop, and/or otherwise indicate to the generator 410 that the artificially generated image 414a-414n was not sufficiently similar to the naturally generated images 406a-406n. The generator 410 may analyze the artificially generated image 414a-414n to determine visual features of the artificially generated image 414a-414n that resulted in the discriminator 412 flagging the artificially generated image 414a-414n. Thus, the generator 410 may alter the generator neural network to avoid a similar flagging result from the discriminator 412. Both the generator 410 and the discriminator 412 may train to optimize a combination of loss functions, such as an adversarial loss function, a reconstruction loss function to minimize the difference between known pixels from the input image and pixels in the artificially generated image, and a perceptual loss function to minimize the difference between an artificially generated image and a naturally generated image.
In this manner, the generative adversarial network 404 may progressively generate extended images that correspond more closely to the naturally generated images 406a-406n. In some implementations, the generator 410 and discriminator 412 are trained progressively, starting with lower resolution images and gradually adding more layers to capture higher-resolution details as training progresses. For example, initially the generator 410 may receive naturally generated images 406a-406n having low resolution and, as a result, may generate low-resolution, artificially generated images 414a-414n. The discriminator 412 may receive the low-resolution, artificially generated images 414a-414n and may compare them to the low-resolution, naturally generated images 406a-406n to train the discriminator machine learning model for identifying whether an image was generated naturally or artificially.
Then the generator 410 may receive progressively higher resolution naturally generated images 406a-406n, and as a result, may generate higher-resolution, artificially generated images 414a-414n. The discriminator 412 may receive the progressively higher-resolution, artificially generated images 414a-414n and may compare them to the progressively higher-resolution, naturally generated images 406a-406n to further train the discriminator machine learning model for identifying whether an image was generated naturally or artificially.
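As a worked illustration of the two sizing options above, the following sketch computes the dimensions of the binary input mask from the dimensions of the input image. The function name `extended_size` and its parameters are hypothetical, "length" is interpreted here as the vertical (y-axis) dimension, and the aspect ratio is taken as width divided by height; none of these choices is stated in the description.

```python
from fractions import Fraction


def extended_size(width, height, threshold=Fraction(1, 2), scale_by_aspect=False):
    """Return (new_width, new_height) after growing by a threshold amount.

    With scale_by_aspect=False, both dimensions grow by the same threshold.
    With scale_by_aspect=True, the width grows by threshold / aspect and
    the height by threshold * aspect (aspect = width / height, an assumed
    reading of the text).
    """
    threshold = Fraction(threshold)
    grow_x = grow_y = threshold
    if scale_by_aspect:
        aspect = Fraction(width, height)
        grow_x = threshold / aspect
        grow_y = threshold * aspect
    return round(width * (1 + grow_x)), round(height * (1 + grow_y))


uniform = extended_size(1600, 900)                        # (2400, 1350)
by_aspect = extended_size(1600, 900, scale_by_aspect=True)  # (2050, 1700)
```

For a 1600 x 900 (16:9) input, the uniform option adds 400 pixels on the left and right and 225 pixels above and below, while the aspect-dependent option grows the shorter dimension more, which moves the extended canvas closer to square.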
When the outputs of the loss functions are within difference thresholds, or the combined output of the combined loss functions is within a combined difference threshold, the image adjustment system 100 may determine that the generative adversarial network 404 has been sufficiently trained and can generate extended images outside of the training/testing phase.
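The stopping test described above might be expressed along the following lines. This is only a sketch: the function name `is_sufficiently_trained`, the dictionary keys, the threshold values, and the choice of an unweighted sum as the "combined output" are assumptions for illustration, since the description does not say how the loss functions are combined.

```python
def is_sufficiently_trained(losses, thresholds, combined_threshold=None):
    """Check per-loss thresholds, or optionally a combined threshold.

    The combined output is assumed to be an unweighted sum of the losses.
    """
    each_within = all(losses[name] <= thresholds[name] for name in losses)
    if combined_threshold is None:
        return each_within
    return each_within or sum(losses.values()) <= combined_threshold


# Hypothetical loss values and thresholds.
thresholds = {"adversarial": 0.5, "reconstruction": 0.1, "perceptual": 0.2}
good = {"adversarial": 0.4, "reconstruction": 0.05, "perceptual": 0.1}
mixed = {"adversarial": 0.6, "reconstruction": 0.05, "perceptual": 0.1}

done_good = is_sufficiently_trained(good, thresholds)
done_mixed = is_sufficiently_trained(mixed, thresholds)
done_mixed_combined = is_sufficiently_trained(mixed, thresholds, combined_threshold=1.0)
```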
To illustrate, and as shown in the example request scenario 420 of FIG. 4B, the generative adversarial network 404 may receive an input image 422. The generative adversarial network 404 may also receive naturally generated images 424 from the database 80 or may obtain the generator neural network trained using the naturally generated images 424. The generative adversarial network 404 may first pass the input image 422 to the generator 410. The generator 410 may analyze the input image 422 using the generator neural network or other machine learning model to generate a binary input mask and combine the binary input mask with the input image 422 to generate an extended image 426.
Then the generator 410 may pass the extended image 426 to the discriminator 412. The discriminator 412 may then analyze the extended image 426 using the discriminator neural network trained using the naturally generated images 424 and artificially generated images. The discriminator 412 may attempt to determine whether the extended image 426 was naturally or artificially generated, for example by applying the visual features of the extended image 426 to the neural network.
If the discriminator 412 analyzes the extended image 426 and determines that the extended image 426 was artificially generated, the discriminator 412 may flag the extended image 426, as described above. However, should the discriminator 412 not flag the extended image 426, the generative adversarial network 404 may determine that the extended image 426 is sufficiently similar to naturally generated images.
It should be appreciated that the generator 410 may generate and the discriminator 412 may flag several renderings of the extended image 426 for any particular input image 422. For example, an extended image 426 may be flagged by the discriminator 412. In that instance, the generator 410 may receive an indication that the extended image 426 was flagged by the discriminator 412, and the generator 410 may generate a subsequent rendering of the extended image 426. In embodiments, this may occur multiple times until the generator 410 generates a rendering of the extended image 426 that is not flagged by the discriminator 412.
FIG. 5 illustrates an example extended image 500 from an input image 300 which may be generated by the GAN model. As shown in FIG. 5, the extended image 500 includes extended portions/additional features which were not included in the input image 300. For example, the extended image 500 includes a larger portion of the floor in front of the woman's foot than in the input image 300. The extended image 500 also includes a larger portion of the wall above the woman's head than in the input image 300. Moreover, the extended image 500 includes larger portions of the area to the left and right of the woman than in the input image 300, including additional portions of background objects.
In some scenarios, an extended image generated by the GAN model may have visual artifacts. To reduce visual artifacts in an extended image in these scenarios, the image aspect ratio adjuster 68 utilizes an augmented inference process.
FIG. 6 illustrates example steps 610-660 of the augmented inference process. The extended image 610 may include visual artifacts 602 resulting in a low image quality. To reduce the visual artifacts and improve the image quality, the image aspect ratio adjuster 68 applies multiple transformations to the input image to generate multiple transformed images, including, for example, a color channel swap 620, a horizontal flip 630, a vertical flip 640, and cropping 650.
Then the image aspect ratio adjuster 68 applies each transformed image to the GAN 404 to generate multiple transformed, extended images. Next, the image aspect ratio adjuster 68 applies reverse transformations to each transformed, extended image to generate multiple extended images. For example, for the extended image that was transformed using a color channel swap 620, the image aspect ratio adjuster 68 applies a reverse color channel swap. For the extended image that was transformed using a horizontal flip 630, the image aspect ratio adjuster 68 flips the image back to its original position, etc. The images 620-650 may depict the states of the respective images after they have been reverse transformed.
Then the image aspect ratio adjuster 68 combines the extended images 620-650 using a median filter to generate a combined, extended image. For example, at each pixel location in the extended images 620-650, the median filter may identify the median pixel value as the pixel value for the combined, extended image at that pixel location. By applying multiple transformations to the input images and combining the extended images in this manner, artifacts are significantly reduced.
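The transform, extend, reverse-transform, and median-combine sequence above can be illustrated with a toy example. Everything here is a hypothetical stand-in: `fake_extend` replaces the trained GAN with a function that zero-pads a grayscale image by one pixel and deliberately plants an artifact in the top-left corner, and only flips (which are their own inverses) are used as transformations.

```python
from statistics import median


def identity(img):
    return [row[:] for row in img]


def hflip(img):
    return [row[::-1] for row in img]


def vflip(img):
    return img[::-1]


def fake_extend(img):
    """Stand-in for the generator: zero-pad by one pixel and plant an
    artifact (value 255) in the top-left corner of the output."""
    width = len(img[0]) + 2
    out = [[0] * width] + [[0] + row + [0] for row in img] + [[0] * width]
    out[0][0] = 255
    return out


def augmented_inference(img, extend, transforms):
    """Extend each transformed copy, undo the transform, then take the
    per-pixel median across the resulting extended images."""
    candidates = [inverse(extend(forward(img))) for forward, inverse in transforms]
    height, width = len(candidates[0]), len(candidates[0][0])
    return [
        [median(c[y][x] for c in candidates) for x in range(width)]
        for y in range(height)
    ]


TRANSFORMS = [(identity, identity), (hflip, hflip), (vflip, vflip)]
image = [[1, 2], [3, 4]]
result = augmented_inference(image, fake_extend, TRANSFORMS)
```

Because the planted artifact lands in a different corner after each reverse transformation, it is an outlier at every pixel location and the median discards it, while the original pixels survive unchanged.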
In addition to generating an extended image, the image aspect ratio adjuster 68 crops the extended image using the aspect ratio of the display area for presenting the image. The server device 60 may obtain the aspect ratio of the display area from the layout of the web page or application screen on which the image will be presented. In other implementations, the server device 60 selects the aspect ratio of the display area. For example, the server device 60 may identify sections of the web page or application screen which include empty space and may generate the display area in one of these sections. In another example, the server device 60 may generate the web page or application screen and may generate the display area.
In any event, to crop the extended image using the aspect ratio of the display area without cutting off significant visual features from the extended image, such as text, an object, or a portion thereof, the image aspect ratio adjuster 68 identifies a region of interest (ROI) within the extended image. The image aspect ratio adjuster 68 may identify the region of interest using one or more machine learning techniques, such as neural networks, linear regression, polynomial regression, logistic regression, random forests, boosting such as adaptive boosting, gradient boosting, and extreme gradient boosting, nearest neighbors, Bayesian networks, support vector machines, etc.
The image aspect ratio adjuster 68 may then generate a second machine learning model for identifying ROIs within extended images. The image aspect ratio adjuster 68 may train the second machine learning model using a set of training images including first portions of the images classified as within an ROI and second portions of the images classified as not being within an ROI. The image aspect ratio adjuster 68 may analyze the first and second portions to identify visual features of each portion and generate the machine learning model based on the visual features in each portion. Then the image aspect ratio adjuster 68 may apply visual features of an extended image to the second machine learning model to identify the ROI within the extended image.
FIG. 7 schematically illustrates how the image aspect ratio adjuster 68 of FIG. 1 identifies the ROI for an extended image in an example scenario. Some of the blocks in FIG. 7 represent hardware and/or software components (e.g., block 702), other blocks represent data structures or memory storing these data structures, registers, or state variables (e.g., blocks 704, 712, 720), and other blocks represent output data (e.g., block 706). Input signals are represented by arrows labeled with corresponding signal names.
The machine learning engine 702 of FIG. 7 may be included within the image aspect ratio adjuster 68 to generate the ROI machine learning model 720. To generate the ROI machine learning model 720, the machine learning engine 702 receives training data including an indication of a first image 722 having a first ROI and a first set of visual features including a first subset of visual features corresponding to the ROI and a second subset of visual features corresponding to the remaining portions of the first image 722. The training data also includes an indication of a second image 724 having a second ROI and a second set of visual features including a first subset of visual features corresponding to the ROI and a second subset of visual features corresponding to the remaining portions of the second image 724. Furthermore, the training data includes an indication of a third image 726 having a third ROI and a third set of visual features including a first subset of visual features corresponding to the ROI and a second subset of visual features corresponding to the remaining portions of the third image 726. Still further, the training data includes an indication of an nth image 728 having an nth ROI and an nth set of visual features including a first subset of visual features corresponding to the ROI and a second subset of visual features corresponding to the remaining portions of the nth image 728.
While the example training data includes indications of four images 722-728, this is merely an example for ease of illustration. The training data may include any number of images assigned ROIs by any number of users.
The machine learning engine 702 then analyzes the training data to generate an ROI machine learning model 720 for identifying an ROI in an image. While the ROI machine learning model 720 is illustrated as a linear regression model, the ROI machine learning model may be another type of regression model such as a logistic regression model, a decision tree, several decision trees, a neural network, a hyperplane, or any other suitable machine learning model.
In any event, in response to receiving an extended image 704, the image aspect ratio adjuster 68 identifies visual features of the extended image 704. The image aspect ratio adjuster 68 then applies the visual features to the ROI machine learning model 720 to identify the ROI 706 in the extended image.
In response to identifying the ROI 706, the image aspect ratio adjuster 68 selects a portion of the extended image for cropping the extended image around the identified ROI using the aspect ratio of the display area. For example, the image aspect ratio adjuster 68 may generate a first box around the ROI. The image aspect ratio adjuster 68 may also generate a second box having the same aspect ratio as the display area for cropping the extended image. The image aspect ratio adjuster 68 may adjust the position of the second box so that the first box fits within the second box. In some implementations, the second box must fit within the boundaries of the extended image.
If the first box which indicates the ROI fits within the second box which has the same aspect ratio as the display area, the image aspect ratio adjuster 68 may automatically crop the extended image using the second box. If the first box does not fit within the second box, the image aspect ratio adjuster 68 may continue moving the position of the second box until the first box fits within the second box. If the image aspect ratio adjuster 68 has moved the second box to each possible position within the boundaries of the extended image and the first box still does not fit, the image aspect ratio adjuster 68 may adjust the scale of the second box without changing the aspect ratio. For example, the image aspect ratio adjuster 68 may increase the size of the second box while maintaining the same width to length ratio. The image aspect ratio adjuster 68 may continue adjusting the position and/or size of the second box until the first box fits within the second box. Then the image aspect ratio adjuster 68 may automatically crop the extended image using the second box.
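The box-fitting procedure above is described as an iterative search over positions and scales. The sketch below swaps in a direct computation that reaches the same kind of result: it finds the smallest box of the target aspect ratio that covers the ROI, centers it on the ROI, and shifts it to stay inside the extended image. The function name `crop_box`, the `(left, top, right, bottom)` coordinate convention, and the choice of the smallest covering box are assumptions for illustration.

```python
from fractions import Fraction


def crop_box(image_w, image_h, roi, aspect_w, aspect_h):
    """Return (left, top, right, bottom) of a crop with aspect ratio
    aspect_w:aspect_h that contains the ROI and lies inside the image,
    or None if no such box exists."""
    x0, y0, x1, y1 = roi
    aspect = Fraction(aspect_w, aspect_h)
    # Smallest box of the target aspect ratio that covers the ROI (first box).
    box_h = max(Fraction(y1 - y0), Fraction(x1 - x0) / aspect)
    box_w = box_h * aspect
    if box_w > image_w or box_h > image_h:
        return None
    # Center the second box on the ROI, then clamp it to the image bounds.
    left = Fraction(x0 + x1, 2) - box_w / 2
    top = Fraction(y0 + y1, 2) - box_h / 2
    left = min(max(left, 0), image_w - box_w)
    top = min(max(top, 0), image_h - box_h)
    return (float(left), float(top), float(left + box_w), float(top + box_h))


centered = crop_box(1200, 1600, (300, 200, 900, 1000), 1, 1)
clamped = crop_box(1200, 1600, (0, 0, 600, 800), 1, 1)
impossible = crop_box(400, 400, (0, 0, 400, 100), 1, 2)
```

In the first call the 600 x 800 ROI is covered by an 800 x 800 square centered on it; in the second the square is shifted so it does not leave the image; in the third no 1:2 box containing the ROI fits inside the image, so the function returns `None`.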
FIG. 8 illustrates an example extended image 800 cropped using a selected aspect ratio (e.g., 1:1) around an identified region of interest. The first box 802 indicates the identified region of interest and the second box 804 indicates the area in which to crop the extended image 800. As shown in FIG. 8, the first box 802 fits within the second box 804, and the second box 804 fits within the boundaries of the extended image 800. Accordingly, the image aspect ratio adjuster 68 may automatically crop the extended image using the second box 804. In this manner, the input image 300 may be transformed from having a first aspect ratio of 3:4 to a second aspect ratio of 1:1 which matches the aspect ratio of the display area for presenting the input image 300.
FIG. 9 illustrates an example input image 900 at each stage 902-908 of the image adjustment process. At first, the input image 900 is provided from the content provider 34 to the server device 60 as an original image 902. The original image 902 has an aspect ratio of 4:3. Then the input image 900 is extended, for example using a GAN, to generate the extended image 904. The extended image 904 includes additional features which were not included in the original image 902. For example, partial bubbles in the original image 902 become full bubbles in the extended image 904, a partial stack of books in the original image 902 becomes a full stack of books in the extended image 904, and partial cartoons of kids in the original image 902 become more complete versions of the kids in the extended image 904.
The server device 60 then selects an aspect ratio 906 (e.g., 1:1) for cropping the extended image 904 around an identified region of interest based on the aspect ratio for a display area for presenting the image 900. The server device 60 may generate a rectangular box with dimensions matching the selected aspect ratio and may place the rectangular box over the extended image 904, such that the identified region of interest fits within the rectangular box. Then the server device 60 crops the extended image 904 using the rectangular box to generate the output image 908. The output image has an aspect ratio of 1:1.
FIG. 10 illustrates an example comparison of input images 1010 to adjusted output images 1020 within respective display areas. For example, an input image 1004 has a different aspect ratio than the display area 1002 in which it is presented. Accordingly, there is white space in the display area 1002 between the boundaries of the input image 1004 and the boundaries of the display area 1002. On the other hand, an output image 1024, generated by extending the input image using a GAN and automatically cropping the extended image around an identified ROI, has the same aspect ratio as the display area 1022 in which it is presented. Accordingly, there is no white space in the display area 1022, and the output image 1024 is not distorted or missing a part of the region of interest.
FIG. 11 illustrates a flow diagram of an example method 1100 for adjusting an aspect ratio of an image. The method can be implemented in a set of instructions stored on a computer-readable memory and executable at one or more processors of the server device 60. For example, the method can be implemented by the image aspect ratio adjuster 68.
At block 1102, the server device 60 receives an input image having a first aspect ratio, for example from a content provider. The input image may be image content such as an advertisement, photograph, etc. for presenting within a display area of a display, such as a web page or application screen similar to the web page or application screen 200 as shown in FIG. 2.
At block 1104, the server device 60 obtains a second aspect ratio for the display area which is different from the first aspect ratio. For example, the server device 60 may obtain the second aspect ratio for the display area from the layout of the web page or application screen on which the input image will be presented. In other implementations, the server device 60 selects the second aspect ratio of the display area. For example, the server device 60 may identify sections of the web page or application screen which include empty space and may generate the display area in one of these sections. In another example, the server device 60 may generate the web page or application screen and may generate the display area.
At block 1106, the server device 60 extends the input image to generate an extended image that includes extended portions/additional features which were not included in the input image without blurring or color padding the image. In some implementations, the server device 60 may extend the input image by generating an extended image having dimensions that exceed the dimensions of the input image by the same threshold amount (e.g., 50%) in both the x and y directions. In this manner, the extended image will extend by half of the threshold amount to the right, to the left, above, and below the input image. In other implementations, the dimensions of the extended image may exceed the dimensions of the input image by different threshold amounts in the x and y directions. For example, the server device 60 may determine the threshold amounts in accordance with the aspect ratio of the input image. More specifically, the width of the extended image may exceed the width of the input image by the product of a threshold amount (e.g., 50%) and the inverse of the aspect ratio (e.g., 9:16 when the aspect ratio is 16:9). The length of the extended image may exceed the length of the input image by the product of the threshold amount (e.g., 50%) and the aspect ratio (e.g., 16:9).
The server device 60 may train a GAN to generate artificially generated images to minimize the perceptible difference between the artificially generated images and naturally generated images, such that they are visually indistinguishable from naturally generated images. The server device 60 may then apply the input image and a binary input mask having the increased dimensions to the GAN to generate the extended image.
At block 1108, the server device 60 automatically crops the extended image around an identified ROI using the second aspect ratio, so that the aspect ratio of the cropped image matches the aspect ratio of the display area. More specifically, the server device 60 may train an ROI machine learning model for identifying ROIs in images. Then the server device 60 may apply the extended image to the ROI machine learning model to identify the ROI in the extended image.
In response to identifying the ROI, the image aspect ratio adjuster 68 selects a portion of the extended image for cropping the extended image around the identified ROI using the aspect ratio of the display area. For example, the server device 60 may generate a first box around the ROI. The server device 60 may also generate a second box having the same aspect ratio as the display area for cropping the extended image. The server device 60 may adjust the position of the second box so that the first box fits within the second box. In some implementations, the second box must fit within the boundaries of the extended image. If the first box which indicates the ROI fits within the second box which has the same aspect ratio as the display area, the server device 60 may automatically crop the extended image using the second box.
In other implementations, the server device 60 does not automatically crop the extended image using the second aspect ratio, and instead extends the input image to generate an extended image having the second aspect ratio without needing to crop the extended image.
Then at block 1110, the server device 60 provides the cropped image or at least a portion of the extended image having the second aspect ratio to a client device 10 for presentation within the display area of the display. The client device 10 may present the cropped image via a browser application 122 or another suitable application for presenting the display.
The following additional considerations apply to the foregoing discussion. Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter of the present disclosure.
Additionally, certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code stored on a machine-readable medium) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
Accordingly, the term hardware should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Hardware modules can provide information to, and receive information from, other hardware. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
The method 1100 may include one or more function blocks, modules, individual functions or routines in the form of tangible computer-executable instructions that are stored in a computer-readable storage medium and executed using a processor of a computing device (e.g., a server device, a personal computer, a smart phone, a tablet computer, a smart watch, a mobile computing device, or other client computing device, as described herein). The computer-readable storage medium may be non-transitory. The method 1100 may be included as part of any backend server (e.g., a map data server, a navigation server, or any other type of server computing device, as described herein), client computing device modules of the example environment, for example, or as part of a module that is external to such an environment. Though the figures may be described with reference to the other figures for ease of explanation, the method 1100 can be utilized with other objects and user interfaces. Furthermore, although the explanation above describes steps of the method 1100 being performed by specific devices (such as a server device 60 or client device 10), this is done for illustration purposes only. The blocks of the method 1100 may be performed by one or more devices or other parts of the environment.
The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
Similarly, the methods or routines described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations.
The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as an SaaS. For example, as indicated above, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., APIs).
Still further, the figures depict some embodiments of the example environment for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for adjusting the aspect ratio of an image through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
September 22, 2025
January 15, 2026