
US-20250308118-A1

Method, Apparatus, Electronic Device and Storage Medium for Generating a Text Video

Published: October 2, 2025

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

The embodiments of the disclosure provide a method, apparatus, electronic device and storage medium for generating a text video. The method comprises: displaying a text editing page including a text input area; in response to a first input instruction for the text editing page, displaying a target text in the text input area, wherein the target text has a first font state in the text input area, the first font state characterizes a font size and/or a row spacing of the target text, and the first font state is determined by a length of the target text; and generating a target video for presenting the target text in the text input area.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. A method of generating a text video, characterized by comprising:

. The method of, characterized in that in response to a first input instruction for the text editing page, displaying a target text in the text input area comprises:

. The method of, characterized in that the area size comprises a lateral size of area and a longitudinal size of area; and determining the first font state according to the total number of characters and an area size of the text input area comprises:

. The method of, characterized in that determining the first font state according to the first longitudinal size and the longitudinal size of area comprises:

. The method of, characterized in that generating a target video for presenting the target text in the text input area comprises:

. The method of, characterized in that before generating a target video for presenting the target text in the text input area, the method further comprises:

. The method of, characterized in that the target text further has a second font state characterizing a font color of the target text;

. The method of, characterized in that the method further comprises:

. An electronic device, characterized by comprising: a processor, and a memory communicatively connected to the processor;

. The electronic device of, characterized in that in response to a first input instruction for the text editing page, displaying a target text in the text input area comprises:

. The electronic device of, characterized in that the area size comprises a lateral size of area and a longitudinal size of area; and determining the first font state according to the total number of characters and an area size of the text input area comprises:

. The electronic device of, characterized in that determining the first font state according to the first longitudinal size and the longitudinal size of area comprises:

. The electronic device of, characterized in that generating a target video for presenting the target text in the text input area comprises:

. The electronic device of, characterized in that before generating a target video for presenting the target text in the text input area, the acts further comprises:

. The electronic device of, characterized in that the target text further has a second font state characterizing a font color of the target text;

. The electronic device of, characterized in that the acts further comprises:

. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium has computer-executable instructions stored thereon, the computer-executable instructions, when executed by a processor, implementing acts comprising:

. The non-transitory computer-readable storage medium of, characterized in that in response to a first input instruction for the text editing page, displaying a target text in the text input area comprises:

. The non-transitory computer-readable storage medium of, characterized in that the area size comprises a lateral size of area and a longitudinal size of area; and determining the first font state according to the total number of characters and an area size of the text input area comprises:

. The non-transitory computer-readable storage medium of, characterized in that determining the first font state according to the first longitudinal size and the longitudinal size of area comprises:

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a continuation of International Application No. PCT/SG2023/050805, filed on Dec. 5, 2023, which claims the benefit of CN patent application No. 202211567282.4 filed on Dec. 7, 2022, both of which are incorporated herein by reference in their entireties.

The embodiments of the present disclosure relate to the technical field of the Internet, and in particular to a method, apparatus, electronic device and storage medium for generating a text video.

Currently, short video platforms are favored by more and more users by virtue of their rich and diverse content. The client of the short video platform is provided with a contribution entrance which is open to ordinary users, creator users can shoot and upload videos for contribution, and then the server of the short video platform pushes the content uploaded by the creator user to viewing users for consumption.

In the prior art, short video platforms can usually only receive video works made by a user and cannot edit and generate text videos. As a result, users must manually turn a pure text work into a video before it can be uploaded to the short video platform, which reduces the video generation efficiency and quality in the video creation process.

The embodiments of the present disclosure provide a method, apparatus, electronic device and storage medium for generating a text video to overcome the problem that videos cannot be generated in the form of pure text.

According to a first aspect, the embodiments of the present disclosure provide a method of generating a text video, comprising:

displaying a text editing page comprising a text input area; in response to a first input instruction for the text editing page, displaying a target text in the text input area, wherein the target text has a first font state in the text input area, the first font state characterizes a font size and/or a row spacing of the target text, and the first font state is determined by a length of the target text; generating a target video for presenting the target text in the text input area.

According to a second aspect, the embodiments of the present disclosure provide an apparatus for generating a text video, comprising:

According to a third aspect, the embodiments of the present disclosure provide an electronic device, comprising:

According to a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions, when executed by a processor, implementing the method of generating a text video according to the first aspect and the possible designs of the first aspect.

According to a fifth aspect, embodiments of the present disclosure provide a computer program product, comprising a computer program, the computer program, when executed by a processor, implements the method of generating a text video according to the first aspect and various possible designs of the first aspect.

The method, apparatus, electronic device and storage medium for generating a text video provided in present embodiments achieve the following by displaying a text editing page comprising a text input area; in response to a first input instruction for the text editing page, displaying a target text in the text input area, wherein the target text has a first font state in the text input area, the first font state characterizes a font size and/or a row spacing of the target text, and the first font state is determined by a length of the target text; generating a target video for presenting the target text in the text input area. By setting the text input area and enabling the font size and/or row spacing of the target text edited in the text input area to dynamically change with the text length, and then converting the target text in the text input area, the generated target video can clearly and comprehensively display all the content of the target text, achieving the purpose of generating a text video based on pure text and improving the video generation efficiency and video quality in the video creation process.

In order to make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative efforts fall within the protection scope of the present disclosure.

It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) involved in this application are all information and data that are authorized by users or are sufficiently authorized by the parties, and collection, use and processing of the related data need to comply with relevant laws and regulations and standards of related countries and regions, and provide corresponding operation entrances for users to choose to authorize or refuse.

The following explains the application scenario of the embodiments of the present disclosure:

The figure is an application scenario diagram of the method of generating a text video provided in embodiments of the present disclosure. The method can be applied to video editing and uploading on the client side of a short video platform. Specifically, as shown in the figure, the method may be applied to a terminal device, for example, a smart phone. The client of the short video platform runs on the terminal device and provides a text editing page. In response to user operations, the terminal device triggers the corresponding components to enter the text editing page. A text input area is set in the text editing page, and users edit characters in the text input area through an input method. For example, after the user clicks on the text input area, the text input area enters the editing state and displays the text input cursor, and the input method interface pops up in the text editing page. The user inputs text in the text input area by operating the input method interface to generate the target text (“X” in the figure represents a character). After the target text input is completed, the input method interface is hidden, and the complete text input area is displayed in the text editing page. When the “Complete” button is then clicked, the target text in the text input area is rendered, the video material is generated, and the video material is uploaded to the server of the short video platform (shown as the platform server in the figure). This completes the process of generating the video work from the pure text work and uploading it to the short video platform. Clicking the “Back” button exits the text editing page, which will not be described further here.

In the prior art, a client of a short video platform can usually only receive a video work made by a user and, after necessary steps such as transcoding and compression, upload the video work to a video pool of a server; the server then pushes videos from the video pool to different users. For videos corresponding to pure text content, however, the client usually does not provide a function page for editing and creation. The reason is that the text length is determined by the content the user edits, so displaying the text with a fixed font size and row spacing makes the font and row spacing either too small or too large. Taking the font as an example: when the target text edited by the user contains 5 Chinese characters and a fixed size three font (for example) is used for display, the font size is appropriate. When the target text contains 500 Chinese characters and the same fixed size three font is still used, the font is too large and the full content of the target text cannot be displayed on one screen. Because a text video displays the text statically (that is, the video always shows the same frame of content), the text video rendered from the target text loses content. Conversely, if a size five font (for example) is used, a target text of 5 Chinese characters is displayed too small, so the font in the text video rendered from the target text is also too small, which degrades the video display effect.

Because the font size and row spacing cannot be adapted automatically, the generated text video cannot display the text content properly, and it is difficult to achieve the display purpose of a text video. Therefore, in the prior art, short video platforms can usually only receive text videos made by users and cannot automatically generate text videos with good display effects.

The embodiments of the present disclosure provide a method of generating a text video to solve the above problems.

Referring to the figure, a flow chart of the method of generating a text video provided in the embodiments of the present disclosure is shown. The method of this embodiment may be applied in a terminal device, and the method of generating a text video includes:

Step S: displaying a text editing page comprising a text input area.

For example, referring to the application scenario diagram, in the client of the short video platform (hereinafter referred to as the client), the text editing page may be started in response to a trigger operation; its specific style is shown in that diagram. Specifically, the function component for triggering the text editing page may be set on the video playing page of the client (that is, the default page of the short video application), or on the video shooting page used for uploading video works, that is, the viewfinder page. After the user's trigger operation on this function component is received, the text editing page is displayed.

Furthermore, a text input area is set in the text editing page. For example, after the terminal device receives the user's click operation on the text input area, the text input area is activated and obtains the input focus. After that, the user may input character information such as words and symbols in this text input area through an input method. The position and size of the text input area are set according to specific needs and are not limited here.

Step S: in response to a first input instruction for the text editing page, displaying a target text in the text input area, wherein the target text has a first font state in the text input area, the first font state characterizes a font size and/or a row spacing of the target text, and the first font state is determined by a length of the target text.

For example, after the text input area obtains the input focus, the terminal device receives the first input instruction input by the user. The first input instruction is used to generate character information of the target text. It may include identifiers corresponding to specific characters and symbols, or identifiers of spelling elements used to form characters, such as Chinese pinyin, letters, etc., which will not be enumerated one by one here. After obtaining the first input instruction, the terminal device converts it into corresponding characters and symbols through the input method in the system and displays them in the above text input area to form the target text. The target text is the text to be finally released.

The target text may include one or more characters and symbols. It should be noted that the symbols input and displayed in the text input area include visible symbols and invisible symbols. Visible symbols include commas, periods and other symbols used for text editing, and so on. Invisible symbols include spaces, empty rows and row breaks, etc. The target text composed of characters and symbols has a paragraph structure, for example, the target text is divided into several paragraphs, and the target text contains empty rows, etc., so that the target text in the text input area has better readability visually.

Furthermore, the target text displayed in the text input area has a first font state, wherein the first font state characterizes the font size and/or row spacing of the target text, and the first font state is determined by the length of the target text. Specifically, when the length of the target text is larger, the font size of the characters and symbols in the target text is smaller, and/or the row spacing between rows in the target text is smaller; conversely, when the length of the target text is smaller, the font size of the characters and symbols in the target text is larger, and/or the row spacing between rows in the target text is larger. In short, when the length of the target text is relatively large, the terminal device compresses the target text in the text input area, so that the text input area may carry more characters and symbols; and when the length of the target text is relatively small, the font is enlarged, so that the content of the target text is more prominent and the visual display effect is better. Of course, it can be understood that there is a certain allowable range for adjusting the first font state (that is, the font size and/or row spacing of the target text). There is a predetermined nonlinear mapping relationship between the first font state and the length of the target text. The specific implementation method will be introduced in subsequent embodiments and will not be described in detail here.

Furthermore, based on the description of the target text above, the length of the target text may be determined by the number of characters in the target text, by the sum of the number of characters and symbols in the target text, or by the overall occupied length of the target text in the text input area. For example, when the target text includes an empty row character, one empty row character occupies one empty row in the text input area. Therefore, when the target text contains empty rows, using the overall occupied length of the target text in the text input area may more accurately measure the actual length of the target text, thereby improving the display effect of the finally generated target video.
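The three ways of measuring text length described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `text_length`, the mode names, and the `chars_per_row` wrapping width are all hypothetical, and treating alphanumeric code points as "characters" is an assumption made for the example.

```python
import math


def text_length(text: str, mode: str = "chars", chars_per_row: int = 20) -> int:
    """Return the length of the target text under one of three measures.

    All names and the default wrapping width are illustrative assumptions.
    """
    if mode == "chars":
        # Count only characters (letters, digits, ideographs), not symbols.
        return sum(1 for ch in text if ch.isalnum())
    if mode == "chars_and_symbols":
        # Count characters plus visible symbols such as commas and periods.
        return sum(1 for ch in text if not ch.isspace())
    if mode == "rows":
        # Overall occupied length in rows: each line wraps at chars_per_row,
        # and an empty row still occupies one row in the text input area.
        return sum(
            max(1, math.ceil(len(line) / chars_per_row))
            for line in text.split("\n")
        )
    raise ValueError(f"unknown mode: {mode}")
```

With a text that contains an empty row, the row-based measure counts that blank row while the two character-based measures ignore it, which is the reason the passage gives for preferring the occupied length.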

The figure is a schematic diagram of displaying the target text in the text input area provided in the embodiments of the present disclosure. As shown in the figure, in this embodiment, the total number of characters in the target text is taken as the length of the target text. Based on the first input instruction, the terminal device continuously displays the corresponding target text in the text input area. When the length N of the target text is equal to 10 (at the first moment), the first font state of the target text is Info_1, characterizing that the font size of each character and symbol in the target text is #4. As the first input instruction is continuously input, the length of the target text continues to increase. When the length N of the target text is equal to 50 (at the second moment), the first font state of the target text is Info_2, characterizing that the font size of each character and symbol in the target text is #5 (one level smaller than #4). Thus, the purpose of dynamically displaying the font size of the target text in the text input area is achieved.

Furthermore, on the basis of the above embodiments, the first font state also includes information characterizing the row spacing. Through the first font state, the row spacing of the target text may be further adjusted. For example, when the length N of the target text is equal to 10, the first font state of the target text is Info_1, characterizing that the font size of each character and symbol in the target text is #4 and the row spacing is 1. When the length N of the target text is equal to 50, the first font state of the target text is Info_2, characterizing that the font size of each character and symbol in the target text is #5 and the row spacing is 0.8.
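The example above can be sketched as a small step-function lookup. Only the two data points (N = 10 gives font size #4 with row spacing 1, and N = 50 gives font size #5 with row spacing 0.8) come from the text; the breakpoint boundaries, the clamping beyond the last breakpoint, and the names `FontState`, `BREAKPOINTS` and `font_state_for_length` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontState:
    font_size: str      # font size label, e.g. "#4"; "#5" is one level smaller
    row_spacing: float  # row spacing factor


# (maximum length, state) pairs. The values at N=10 and N=50 follow the
# example in the text; where each range starts and ends is hypothetical.
BREAKPOINTS = [
    (10, FontState("#4", 1.0)),
    (50, FontState("#5", 0.8)),
]


def font_state_for_length(n: int) -> FontState:
    """Map the text length N to a first font state via a step function."""
    for max_len, state in BREAKPOINTS:
        if n <= max_len:
            return state
    # Assumed clamp: past the last breakpoint, keep the smallest state,
    # reflecting the "allowable range" for adjusting the first font state.
    return BREAKPOINTS[-1][1]
```

A step function is one simple way to realize a nonlinear mapping between length and font state; a real mapping could use more levels or a continuous curve.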

Or, in another possible implementation, the row spacing of the target text may also be determined separately based on the length of the target text. For example, when the length N of the target text is equal to 10, the first font state of the target text is Info_1, characterizing that the row spacing of the target text is 1. When the length N of the target text is equal to 50, the first font state of the target text is Info_2, characterizing that the row spacing of the target text is 0.8.

Step S: generating a target video for presenting the target text in the text input area.

For example, the target text in the text input area has a corresponding first font state. After that, the text input area is rendered to generate a video with a predetermined duration, that is, the target video. The target video is equivalent to a restored display of the target text in the text input area. Therefore, the target video may not only display the text content of the target text, but also restore the first font state of the target text, that is, the size of the characters in the target text and/or the row spacing, so that the target text displayed in the target video is visually consistent with the target text displayed in the text input area. Furthermore, when the user edits the text content in the text input area, they may have an accurate expectation of the display effect of the finally generated target video (that is, the text video). This improves the video quality of the finally generated target video and avoids problems such as the text in the generated text video being too small to read or too large to be fully displayed.

In a possible implementation, as shown in the figure, the specific implementation of step S includes:

Step S: generating, based on the target text, a rendered image comprising the target text having the first font state.

Step S: determining a video duration according to a length of the target text.

Step S: generating the target video according to the video duration and the rendered image.

For example, first, after the input of the target text in the text input area is completed, based on the target text in the text input area, the target text with the first font state in the text input area is converted into an image, that is, a rendered image. There are many specific implementations for converting text into images. For example, a rendered image is generated by taking a screenshot of the text input area; or, taking the target text and the corresponding first font state as input parameters and inputting them into an image converter to generate a corresponding rendered image. The specific implementation steps of converting characters into pictures are prior arts known to those skilled in the art and will not be repeated here.

Furthermore, the corresponding video duration is determined according to the length of the target text, for example, the number of characters in the target text. The finally generated target video needs to match the user's reading time when displaying the target text: the longer the target text, the longer the time the user needs to read it in the target video. Therefore, setting a video duration that matches the length of the target text may improve the display effect of the target video. After that, taking the rendered image as the material and the video duration as the parameter, video conversion is performed to generate a static video, that is, the target video.
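As a rough sketch of the duration step, the video duration can be derived from the character count and an assumed reading speed, then clamped to a sensible range. The reading speed, the minimum and maximum durations, the frame rate, and the function names below are all hypothetical values chosen for illustration; the text only states that the duration should match the length of the target text.

```python
def video_duration_seconds(
    char_count: int,
    chars_per_second: float = 5.0,  # assumed reading speed
    min_seconds: float = 3.0,       # assumed lower bound
    max_seconds: float = 60.0,      # assumed upper bound
) -> float:
    """Estimate a duration that scales with text length, within bounds."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    reading_time = char_count / chars_per_second
    return min(max_seconds, max(min_seconds, reading_time))


def frame_count(duration_seconds: float, fps: int = 30) -> int:
    """Number of times the single rendered image repeats in a static video."""
    return round(duration_seconds * fps)
```

Because the target video is static, the rendered image is simply repeated for `frame_count(...)` frames; longer texts yield more repeats of the same frame rather than different content.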

In this embodiment, by displaying a text editing page which includes a text input area; in response to a first input instruction for the text editing page, displaying a target text in the text input area, wherein the target text has a first font state within the text input area, the first font state characterizes the font size and/or row spacing of the target text, the first font state is determined by the length of the target text; generating a target video for displaying the target text in the text input area. By setting a text input area and enabling the font size and/or row spacing of the target text edited in the text input area to dynamically change with the text length, and then converting the target text in the text input area, the generated target video may clearly and comprehensively display all the content of the target text, achieving the purpose of generating a text video based on pure text, improving the video generation efficiency in the video creation process, and increasing the diversity of video content in the video platform.

The figure is a flow chart of the method of generating a text video provided in the embodiments of the present disclosure. This embodiment further refines step S on the basis of the foregoing embodiments and adds steps to configure background pictures and background music for the target video. The method of generating a text video includes:

Step S: displaying a text editing page, comprising a text input area.

Step S: in response to the first input instruction, generating the target text and obtaining the total number of characters of the target text.

Step S: according to the total number of characters and the area size of the text input area, determining the first font state.

For example, after displaying the editing page and responding to the first input instruction for the editing page, corresponding characters and symbols may be generated according to the information in the first input instruction, and then the target text is generated. The specific implementation steps have been introduced in detail in the foregoing embodiments and will not be repeated here. After that, by calling a statistical function to process the string corresponding to the target text, the total number of characters of the target text may be obtained. In this embodiment, the total number of characters may be the number of characters only, or the sum of the number of characters and the number of symbols.

For example, the text input area of the text editing page has an area size. Taking the text input area as a rectangle as an example, the area size of the text input area may be the length and width of the text input area. Furthermore, the area size of the text input area may represent the area of the text input area. The larger the area of the text input area, the greater the total number of characters that may be displayed. The area size of the text input area is a predetermined fixed value and may be determined according to the screen pixel size of the terminal device, which will not be repeated here.

Further, after determining the total number of characters and the area size of the text input area, the total area of the text input area is first calculated from its area size. Then the ratio of the total area to the total number of characters gives the unit area occupied by each character (and symbol). Finally, the font size and/or the corresponding row spacing, that is, the first font state, may be determined according to the unit area and a predetermined correction coefficient. This implementation does not take into account the blank areas caused by row breaks and empty rows, so it is suitable only for a rough calculation of the first font state.
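The rough calculation above can be sketched as follows. The text gives only the ratio of total area to total characters and a "predetermined correction coefficient"; taking the square root of the unit area to obtain a font size (that is, treating each character cell as roughly square), the coefficient value 0.8, and the minimum and maximum sizes are assumptions made for this example, as is the function name.

```python
import math


def rough_font_size(
    area_width: float,
    area_height: float,
    total_chars: int,
    correction: float = 0.8,  # hypothetical correction coefficient
    min_size: float = 12.0,   # hypothetical allowable range
    max_size: float = 72.0,
) -> float:
    """Estimate a font size from the unit area available per character."""
    if total_chars <= 0:
        # With no text yet, fall back to the largest allowed size.
        return max_size
    unit_area = (area_width * area_height) / total_chars
    # Assumption: a character cell is roughly square, so its side length
    # is the square root of the unit area, scaled by the coefficient.
    size = math.sqrt(unit_area) * correction
    return max(min_size, min(max_size, size))
```

For a 600 by 800 area and 300 characters, the unit area is 1600, the cell side is 40, and the corrected size is 32. As the passage notes, this ignores blank space from row breaks and empty rows, so it only approximates the fit.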

In another possible implementation, the area size includes a lateral size of area and a longitudinal size of area. As shown in the figure, the specific implementation of step S includes:

Step S: determining a number of characters in a single row according to a font width corresponding to a reference font size and the lateral size of area, the number of characters in a single row characterizing a number of characters that can be displayed in one row of the text input area.

Step S: determining a first longitudinal size according to the number of characters in a single row and the total number of characters.

Step S: determining the first font state according to the first longitudinal size and
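The first two steps listed above can be sketched as below. The row height used to convert a row count into a longitudinal size is an assumption (the steps do not say how rows are converted to a size), and the function names are hypothetical. The final step is not sketched.

```python
import math


def chars_per_row(lateral_size: float, font_width: float) -> int:
    """Characters that fit in one row at the reference font size."""
    if font_width <= 0:
        raise ValueError("font_width must be positive")
    # At least one character per row, even in a very narrow area.
    return max(1, math.floor(lateral_size / font_width))


def first_longitudinal_size(
    total_chars: int, per_row: int, row_height: float
) -> float:
    """Height the text would occupy at the reference font size.

    row_height (font height plus spacing) is an assumed parameter.
    """
    rows = math.ceil(total_chars / per_row)
    return rows * row_height
```

For example, with a lateral size of 600 and a reference font width of 32, 18 characters fit per row; 500 characters then need 28 rows, which at an assumed row height of 40 gives a first longitudinal size of 1120.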

