Legal claims defining the scope of protection, as filed with the USPTO.
1. A database data compression method, comprising:
   parsing, by a processor of a data storage device, a first data block to obtain m pieces of first data, wherein the first data is row data or column data, the first data block is obtained based on obtained database data, and m is an integer greater than 1;
   classifying, by a machine learning classifier executed by the processor of the data storage device, the m pieces of first data to obtain n classification categories, wherein each classification category comprises at least one piece of first data, n is an integer greater than or equal to 1, and any two or more pieces of first data classified into a same classification category are predicted to have an association that indicates similar data content;
   transforming, by the processor of the data storage device, one or more pieces of at least one piece of first data in a first classification category to obtain target data comprising concatenating a plurality of pieces of the at least one piece of first data in the first classification category to obtain concatenated data and using the concatenated data as the target data, wherein the first classification category is one of the n classification categories;
   compressing, by the processor of the data storage device, the target data to obtain compressed data; and
   storing, by the processor of the data storage device, the compressed data in a storage medium of the data storage device.
2. The method according to claim 1, wherein the transforming one or more pieces of at least one piece of first data in a first classification category to obtain target data comprises: selecting, as reference data, at least one piece of first data from a plurality of pieces of the at least one piece of first data in the first classification category; predicting at least one piece of non-reference data in the first classification category based on the reference data, to obtain at least one piece of predicted data, wherein the at least one piece of non-reference data is at least one piece of first data that is in the first classification category and that is not the reference data; and obtaining the target data based on the at least one piece of non-reference data and the at least one piece of predicted data.
3. The method according to claim 2, wherein the predicting at least one piece of non-reference data in the first classification category based on the reference data, to obtain at least one piece of predicted data comprises: predicting an association relationship between the reference data and the at least one piece of non-reference data in the first classification category; and predicting the at least one piece of non-reference data based on the reference data and the association relationship, to obtain the at least one piece of predicted data.
4. The method according to claim 3, wherein the predicting an association relationship between the reference data and the at least one piece of non-reference data in the first classification category comprises: predicting the association relationship between the reference data and the at least one piece of non-reference data in the first classification category by using a linear regression equation.
5. The method according to claim 2, wherein the obtaining the target data based on the at least one piece of non-reference data and the at least one piece of predicted data comprises: determining at least one piece of transformation data based on a difference between the at least one piece of non-reference data and the at least one piece of predicted data; and determining the target data based on the at least one piece of transformation data.
6. The method according to claim 2, wherein the obtaining the target data based on the at least one piece of non-reference data and the at least one piece of predicted data comprises: determining at least one piece of transformation data based on a difference between the at least one piece of non-reference data and the at least one piece of predicted data; performing a redundant data cleanup operation on the at least one piece of transformation data to obtain at least one piece of cleaned transformation data; and determining the target data based on the at least one piece of cleaned transformation data.
7. The method according to claim 1, wherein the classifying the m pieces of first data to obtain n classification categories comprises: selecting a preset quantity of characters from each of the m pieces of first data to construct m eigenvectors, wherein one eigenvector corresponds to one piece of first data; performing a clustering operation on the m eigenvectors according to a clustering algorithm, to obtain n clusters; and obtaining the n classification categories based on the n clusters.
8. The method according to claim 1, wherein the classifying the m pieces of first data to obtain n classification categories comprises: inputting the m pieces of first data into a neural network, and outputting a classification category of each of the m pieces of first data, to obtain the n classification categories, wherein the neural network is trained in advance by using a training set.
9. The method according to claim 1, wherein that the first data block is obtained based on obtained database data comprises: using a subset of data in the database data as the first data block, or using the database data as the first data block.
10. A storage device, comprising:
   a memory; and
   at least one processor coupled with the memory, the at least one processor configured to:
   parse a first data block to obtain m pieces of first data, wherein the first data is row data or column data, the first data block is obtained based on obtained database data, and m is an integer greater than 1;
   classify, by a machine learning classifier executed by the at least one processor, the m pieces of first data to obtain n classification categories, wherein each classification category comprises at least one piece of first data, n is an integer greater than or equal to 1, and any two or more pieces of first data classified into a same classification category are predicted to have an association that indicates similar data content;
   transform one or more pieces of at least one piece of first data in a first classification category to obtain target data comprising the at least one processor to concatenate a plurality of pieces of the at least one piece of first data in the first classification category to obtain concatenated data and use the concatenated data as the target data, wherein the first classification category is one of the n classification categories;
   compress the target data to obtain compressed data; and
   store the compressed data in the memory.
11. The device according to claim 10, wherein the at least one processor configured to transform one or more pieces of at least one piece of first data in a first classification category to obtain target data, further comprises the at least one processor configured to: select, as reference data, at least one piece of first data from a plurality of pieces of the at least one piece of first data in the first classification category; predict at least one piece of non-reference data in the first classification category based on the reference data, to obtain at least one piece of predicted data, wherein the at least one piece of non-reference data is at least one piece of first data that is in the first classification category and that is not the reference data; and obtain the target data based on the at least one piece of non-reference data and the at least one piece of predicted data.
12. The device according to claim 11, wherein the at least one processor configured to predict at least one piece of non-reference data in the first classification category based on the reference data, to obtain at least one piece of predicted data, further comprises the at least one processor configured to: predict an association relationship between the reference data and the at least one piece of non-reference data in the first classification category; and predict the at least one piece of non-reference data based on the reference data and the association relationship, to obtain the at least one piece of predicted data.
13. The device according to claim 12, wherein the at least one processor configured to predict an association relationship between the reference data and the at least one piece of non-reference data in the first classification category, further comprises the at least one processor configured to: predict the association relationship between the reference data and the at least one piece of non-reference data in the first classification category by using a linear regression equation.
14. The device according to claim 11, wherein the at least one processor configured to obtain the target data based on the at least one piece of non-reference data and the at least one piece of predicted data, further comprises the at least one processor configured to: determine at least one piece of transformation data based on a difference between the at least one piece of non-reference data and the at least one piece of predicted data; and determine the target data based on the at least one piece of transformation data.
15. The device according to claim 11, wherein the at least one processor configured to obtain the target data based on the at least one piece of non-reference data and the at least one piece of predicted data, further comprises the at least one processor configured to: determine at least one piece of transformation data based on a difference between the at least one piece of non-reference data and the at least one piece of predicted data; perform a redundant data cleanup operation on the at least one piece of transformation data to obtain at least one piece of cleaned transformation data; and determine the target data based on the at least one piece of cleaned transformation data.
16. The device according to claim 10, wherein the at least one processor configured to classify the m pieces of first data to obtain n classification categories, further comprises the at least one processor configured to: select a preset quantity of characters from each of the m pieces of first data to construct m eigenvectors, wherein one eigenvector corresponds to one piece of first data; perform a clustering operation on the m eigenvectors according to a clustering algorithm, to obtain n clusters; and obtain the n classification categories based on the n clusters.
17. The device according to claim 10, wherein the at least one processor configured to classify the m pieces of first data to obtain n classification categories, further comprises the at least one processor configured to: input the m pieces of first data into a neural network, and output a classification category of each of the m pieces of first data, to obtain the n classification categories, wherein the neural network is trained in advance by using a training set.
18. The device according to claim 10, wherein that the first data block is obtained based on obtained database data by the at least one processor, further comprises the at least one processor configured to: use a subset of data in the database data as the first data block, or use the database data as the first data block.
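Illustrative sketch for claim 1 (not part of the claims as filed). The steps of claim 1 (parse, classify, concatenate per category, compress, store) can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, rows are newline-separated text, zlib stands in for the unspecified compressor, a dictionary stands in for the storage medium, and a simple shared-prefix rule stands in for the machine learning classifier that the claim actually requires.

```python
import zlib
from collections import defaultdict


def parse_block(block: str) -> list[str]:
    """Split a text data block into rows (the 'm pieces of first data')."""
    return [line for line in block.splitlines() if line]


def classify(rows: list[str], prefix_len: int = 4) -> dict[str, list[str]]:
    """Placeholder classifier (assumption): group rows by their leading characters.

    The claim recites a machine learning classifier; this rule only
    illustrates the idea of grouping rows with similar content.
    """
    categories = defaultdict(list)
    for row in rows:
        categories[row[:prefix_len]].append(row)
    return dict(categories)


def compress_block(block: str) -> dict[str, bytes]:
    """Concatenate the rows of each category and compress the result."""
    stored = {}
    for label, members in classify(parse_block(block)).items():
        target = "\n".join(members).encode("utf-8")  # concatenated target data
        stored[label] = zlib.compress(target)        # compressed data
    return stored                                    # dict stands in for storage


def decompress_block(stored: dict[str, bytes]) -> list[str]:
    """Recover the rows (grouped by category, so original order is not kept)."""
    rows = []
    for blob in stored.values():
        rows.extend(zlib.decompress(blob).decode("utf-8").split("\n"))
    return rows
```

The intuition is that placing similar rows next to each other before compression gives a general-purpose compressor more repeated content to exploit. Note that this sketch does not record the original row order; a real system would need to keep that mapping.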
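Illustrative sketch for claims 2 to 5 (not part of the claims as filed). Claims 2 to 5 describe picking reference data, predicting the non-reference data from it with a linear regression equation, and deriving transformation data from the difference between actual and predicted values. The sketch below assumes integer column data and uses hypothetical function names; the least-squares fit and the rounding step are choices made here for illustration, not details taken from the claims.

```python
def fit_line(xs: list[int], ys: list[int]) -> tuple[float, float]:
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant reference column: predict the mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def to_residuals(reference: list[int], other: list[int]):
    """Predict `other` from `reference` and keep only the differences."""
    slope, intercept = fit_line(reference, other)
    predicted = [round(slope * x + intercept) for x in reference]
    residuals = [y - p for y, p in zip(other, predicted)]
    return slope, intercept, residuals


def from_residuals(reference, slope, intercept, residuals) -> list[int]:
    """Invert to_residuals: re-predict and add the stored differences back."""
    return [round(slope * x + intercept) + r
            for x, r in zip(reference, residuals)]
```

When the two columns are strongly correlated, the residuals are small and repetitive, which is typically easier to compress than the raw values. The reference column and the two fitted coefficients have to be stored alongside the residuals so the original data can be reconstructed.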
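Illustrative sketch for claim 7 (not part of the claims as filed). Claim 7 builds one vector per piece of data from a preset quantity of characters and clusters those vectors. The sketch below reads "eigenvectors" as feature vectors, which is an interpretation, and makes several assumptions of its own: the first few characters are selected, they are encoded as code points, and a small deterministic k-means stands in for the unspecified clustering algorithm. All names are hypothetical.

```python
def feature_vector(piece: str, width: int = 4) -> list[int]:
    """Code points of the first `width` characters, zero-padded."""
    codes = [ord(c) for c in piece[:width]]
    return codes + [0] * (width - len(codes))


def squared_distance(a, b) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[int]], k: int, iterations: int = 10) -> list[int]:
    """Tiny k-means; centroids start at the first k distinct vectors."""
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def classify_pieces(pieces: list[str], k: int = 2, width: int = 4):
    """Map cluster label -> pieces, i.e. the n classification categories."""
    labels = kmeans([feature_vector(p, width) for p in pieces], k)
    categories = {}
    for piece, label in zip(pieces, labels):
        categories.setdefault(label, []).append(piece)
    return categories
```

Using only a fixed number of characters keeps every vector the same length and keeps the clustering cost independent of row size, at the price of ignoring similarity that appears later in the row.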
August 19, 2025