11360940

Method and Apparatus for Processing Fastq files compressing lossless compression and decompression

Published: June 14, 2022
Assignee: Not available in USPTO data

Patent Claims
15 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A biological sequence data processing method, wherein the method comprises: obtaining characteristic information of each base in a biological sequence fastq file, wherein the characteristic information comprises at least a sequence location and a base type of each base in the biological sequence fastq file; selecting a target base from the bases in the biological sequence fastq file according to a preset rule and the characteristic information of each base; generating a base patch file by using characteristic information of the target base; performing lossless compression on the biological sequence fastq file to obtain a compressed fastq file; performing lossless compression on the base patch file to obtain a compressed patch file; separately decompressing the compressed patch file and the compressed fastq file; determining whether characteristic information of the target base in the decompressed compressed patch file is consistent with characteristic information of the target base in the decompressed compressed fastq file; and in response to determining that the characteristic information of the target base in the decompressed compressed patch file is inconsistent with the characteristic information of the target base in the decompressed compressed fastq file, modifying the characteristic information of the target base in the decompressed compressed fastq file to be the same as the characteristic information of the target base in the decompressed compressed patch file, wherein the characteristic information of the target base in the decompressed compressed fastq file is replaced with the characteristic information of the target base in the decompressed compressed patch file.
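The end-to-end flow of claim 1 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption, not something the claims specify: FASTQ records are taken to be unwrapped 4-line records, a base's "sequence location" is taken to be its 0-based offset in the concatenated read sequences, zlib stands in for the unspecified lossless codec, and the tab-separated patch format and the function names are hypothetical. The preset rule is passed in as a function (claims 2 and 4 describe two such rules). Because lossless compression should reproduce the file exactly, the repair step in this sketch only changes something if the decompressed FASTQ differs from the original, which the demo simulates with a deliberately damaged copy.

```python
import zlib


def read_sequences(text):
    """Return the sequence line of each 4-line FASTQ record (no line wrapping assumed)."""
    lines = text.split("\n")
    return [lines[i] for i in range(1, len(lines), 4)]


def make_patch(text, rule):
    """Select target bases with a preset rule and serialize them as
    'location<TAB>base' lines (hypothetical patch-file format).
    Location = 0-based offset of the base in the concatenated reads (assumption)."""
    seq = "".join(read_sequences(text))
    return "".join(f"{loc}\t{base}\n" for loc, base in enumerate(seq) if rule(loc, base))


def decompress_and_repair(fastq_blob, patch_blob):
    """Decompress both files separately, compare each patched location with the
    decompressed FASTQ, and overwrite the base wherever the two disagree."""
    text = zlib.decompress(fastq_blob).decode("ascii")
    wanted = {}
    for line in zlib.decompress(patch_blob).decode("ascii").splitlines():
        loc, base = line.split("\t")
        wanted[int(loc)] = base
    lines = text.split("\n")
    offset = 0
    for i in range(1, len(lines), 4):  # sequence line of each record
        chars = list(lines[i])
        for j, base in enumerate(chars):
            expected = wanted.get(offset + j)
            if expected is not None and expected != base:
                chars[j] = expected  # the patch file wins on mismatch
        offset += len(chars)
        lines[i] = "".join(chars)
    return "\n".join(lines)


if __name__ == "__main__":
    original = "@r1\nACGN\n+\nIII#\n@r2\nNNTA\n+\n##II\n"
    patch = make_patch(original, lambda loc, base: base == "N")
    fastq_blob = zlib.compress(original.encode("ascii"))
    patch_blob = zlib.compress(patch.encode("ascii"))
    # Simulate a faulty round trip by compressing a corrupted copy of the file.
    damaged_blob = zlib.compress(original.replace("ACGN", "ACGT").encode("ascii"))
    print(decompress_and_repair(damaged_blob, patch_blob) == original)  # True
```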

2. The method according to claim 1, wherein the selecting a target base from the bases in the biological sequence fastq file according to a preset rule and the characteristic information of each base, and the generating a base patch file by using characteristic information of the target base comprise: selecting a base of a target base type from the bases in the biological sequence fastq file as the target base; and generating the base patch file by using a base type and a sequence location of the target base.
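The selection rule of claim 2 reduces to a filter on base type. The sketch below is illustrative only: the default target type "N", the 0-based offset into the concatenated reads used as the sequence location, the "location<TAB>base" patch format, and both function names are hypothetical choices; the claim leaves all of them open.

```python
def select_target_bases(sequences, target_type="N"):
    """Claim 2 rule (sketch): every base whose type equals target_type is a target.
    Returns (location, base type) pairs; location is the 0-based offset of the base
    in the concatenated read sequences (hypothetical convention)."""
    patch = []
    offset = 0
    for seq in sequences:
        for j, base in enumerate(seq):
            if base == target_type:
                patch.append((offset + j, base))
        offset += len(seq)
    return patch


def serialize_patch(patch):
    """Write one 'location<TAB>base' line per target base (assumed patch-file format)."""
    return "".join(f"{loc}\t{base}\n" for loc, base in patch)
```

For the reads "ACGN" and "NNTA", this selects the N bases at locations 3, 4 and 5.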

3. The method according to claim 2, wherein the determining whether characteristic information of the target base in the decompressed compressed patch file is consistent with characteristic information of the target base in the decompressed compressed fastq file comprises: determining whether a base type corresponding to a sequence location in the decompressed compressed patch file is consistent with a base type corresponding to the sequence location in the decompressed compressed fastq file; and wherein the modifying the characteristic information of the target base in the decompressed compressed fastq file to be the same as the characteristic information of the target base in the decompressed compressed patch file comprises: modifying the base type corresponding to the sequence location in the decompressed compressed fastq file to be the same as the base type corresponding to the sequence location in the decompressed compressed patch file.
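The check-and-repair step of claim 3 compares base types location by location. In this sketch, which assumes the same hypothetical location convention (0-based offset into the concatenated reads) and a patch given as (location, base) pairs, the function name is illustrative, and the returned change count is an extra not required by the claim.

```python
def apply_base_patch(sequences, patch):
    """Claim 3 (sketch): for each (location, base) in the patch, compare it with the
    base at the same location in the decompressed reads and overwrite on mismatch.
    Returns the repaired reads and the number of bases changed."""
    wanted = dict(patch)
    repaired, changes, offset = [], 0, 0
    for seq in sequences:
        chars = list(seq)
        for j, base in enumerate(chars):
            target = wanted.get(offset + j)
            if target is not None and target != base:
                chars[j] = target  # base type from the patch file replaces the FASTQ one
                changes += 1
        repaired.append("".join(chars))
        offset += len(seq)
    return repaired, changes
```

Locations that agree with the patch are left untouched, so a clean round trip yields a change count of zero.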

4. The method according to claim 1, wherein the characteristic information further comprises a quality score, and wherein the selecting a target base from the bases in the biological sequence fastq file according to a preset rule and the characteristic information of each base, and the generating a base patch file by using characteristic information of the target base comprise: selecting, from the bases in the biological sequence fastq file, a base satisfying at least one of the following as the target base: a base that is of a target base type and whose quality score is different from a preset threshold, and a base that is not of the target base type and whose quality score is the preset threshold; and generating the base patch file by using a base type, a quality score, and a sequence location of the target base.
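Claim 4's rule selects the exceptions to a pairing between the target base type and the preset quality threshold. The sketch below assumes, purely for illustration, that the target type is "N" and the threshold is the Phred+33 character '#' (quality 2). It also assumes the same hypothetical 0-based location convention as the sketches above, and the function name is invented. Under these assumptions, an N with quality '#' and a non-N base with any other quality are not recorded; only the mismatched cases go into the patch.

```python
def select_exceptions(records, target_type="N", threshold="#"):
    """Claim 4 rule (sketch): a base is a target if
       (a) it is of the target type but its quality differs from the threshold, or
       (b) it is not of the target type but its quality equals the threshold.
    records: (sequence, quality) string pairs of equal length per read.
    Returns (location, base, quality) triples; location is the 0-based offset
    in the concatenated reads (hypothetical convention)."""
    patch = []
    offset = 0
    for seq, qual in records:
        for j, (base, q) in enumerate(zip(seq, qual)):
            is_target_type = base == target_type
            if (is_target_type and q != threshold) or (not is_target_type and q == threshold):
                patch.append((offset + j, base, q))
        offset += len(seq)
    return patch
```

For the reads ("ACGN", "III#") and ("NNTA", "#5#I"), only the N with quality '5' (location 5) and the T with quality '#' (location 6) are selected.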

5. The method according to claim 4, wherein the determining whether characteristic information of the target base in the decompressed compressed patch file is consistent with characteristic information of the target base in the decompressed compressed fastq file comprises: determining whether a base type and a quality score that are corresponding to a sequence location in the decompressed compressed patch file are consistent with a base type and a quality score that are corresponding to the sequence location in the decompressed compressed fastq file; and wherein the modifying the characteristic information of the target base in the decompressed compressed fastq file to be the same as the characteristic information of the target base in the decompressed compressed patch file comprises at least one of: modifying the base type corresponding to the sequence location in the decompressed compressed fastq file to be the same as the base type corresponding to the sequence location in the decompressed compressed patch file; and modifying the quality score corresponding to the sequence location in the decompressed compressed fastq file to be the same as the quality score corresponding to the sequence location in the decompressed compressed patch file.
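Claim 5 extends the comparison to quality scores, and the repair touches whichever of the two fields disagrees ("at least one of"). This is a sketch under the same assumptions as above: the patch is a list of (location, base, quality) triples, location is the hypothetical 0-based offset into the concatenated reads, and the function name is illustrative.

```python
def apply_quality_patch(records, patch):
    """Claim 5 (sketch): at each patched location, compare base type and quality
    score with the decompressed reads and overwrite whichever one disagrees."""
    wanted = {loc: (base, q) for loc, base, q in patch}
    out, offset = [], 0
    for seq, qual in records:
        bases, quals = list(seq), list(qual)
        for j in range(len(bases)):
            if offset + j in wanted:
                base, q = wanted[offset + j]
                if bases[j] != base:
                    bases[j] = base  # repair the base type
                if quals[j] != q:
                    quals[j] = q  # repair the quality score
        out.append(("".join(bases), "".join(quals)))
        offset += len(seq)
    return out
```

Given the patch [(5, "N", "5"), (6, "T", "#")], a damaged read ("NNAA", "#5II") at offset 4 is restored to ("NNTA", "#5#I"): both the base and the quality at location 6 are rewritten, while location 5 already matches.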

6. A biological sequence data processing device, comprising: at least one processor; and a non-transitory computer-readable storage medium coupled to the at least one processor and storing programming instructions for execution by the at least one processor, the programming instructions instruct the at least one processor to perform the following operations: obtaining characteristic information of each base in a biological sequence fastq file, wherein the characteristic information comprises at least a sequence location and a base type of each base in the biological sequence fastq file; selecting a target base from the bases in the biological sequence fastq file according to a preset rule and the characteristic information of each base; generating a base patch file by using characteristic information of the target base; performing lossless compression on the biological sequence fastq file to obtain a compressed fastq file; performing lossless compression on the base patch file to obtain a compressed patch file; separately decompressing the compressed patch file and the compressed fastq file; determining whether characteristic information of the target base in the decompressed compressed patch file is consistent with characteristic information of the target base in the decompressed compressed fastq file; and in response to determining that the characteristic information of the target base in the decompressed compressed patch file is inconsistent with the characteristic information of the target base in the decompressed compressed fastq file, modifying the characteristic information of the target base in the decompressed compressed fastq file to be the same as the characteristic information of the target base in the decompressed compressed patch file, wherein the characteristic information of the target base in the decompressed compressed fastq file is replaced with the characteristic information of the target base in the decompressed compressed patch file.

7. The device according to claim 6, wherein the programming instructions further instruct the at least one processor to perform the following operation steps: selecting a base of a target base type from the bases in the biological sequence fastq file as the target base; and generating the base patch file by using a base type and a sequence location of the target base.

8. The device according to claim 7, wherein the programming instructions further instruct the at least one processor to perform the following operation steps: determining whether a base type corresponding to a sequence location in the decompressed compressed patch file is consistent with a base type corresponding to the sequence location in the decompressed compressed fastq file; and in response to determining that the base type corresponding to the sequence location in the decompressed compressed patch file is inconsistent with the base type corresponding to the sequence location in the decompressed compressed fastq file, modifying the base type corresponding to the sequence location in the decompressed compressed fastq file to be the same as the base type corresponding to the sequence location in the decompressed compressed patch file.

9. The device according to claim 6, wherein the characteristic information further comprises a quality score, and the programming instructions further instruct the at least one processor to perform the following operation steps: selecting, from the bases in the biological sequence fastq file, a base satisfying at least one of the following as the target base: a base that is of a target base type and whose quality score is different from a preset threshold, and a base that is not of the target base type and whose quality score is the preset threshold; and generating the base patch file by using a base type, a quality score, and a sequence location of the target base.

10. The device according to claim 9, wherein the programming instructions further instruct the at least one processor to perform the following operation steps: determining whether a base type and a quality score that are corresponding to a sequence location in the decompressed compressed patch file are consistent with a base type and a quality score that are corresponding to the sequence location in the decompressed compressed fastq file; and in response to determining that the base type and the quality score that are corresponding to the sequence location in the decompressed compressed patch file are inconsistent with the base type and the quality score that are corresponding to the sequence location in the decompressed compressed fastq file, performing at least one of: modifying the base type corresponding to the sequence location in the decompressed compressed fastq file to be the same as the base type corresponding to the sequence location in the decompressed compressed patch file; and modifying the quality score corresponding to the sequence location in the decompressed compressed fastq file to be the same as the quality score corresponding to the sequence location in the decompressed compressed patch file.

11. A non-transitory computer readable storage medium, wherein the computer readable storage medium includes instructions, when at least one processor of a computing device executes the instructions, the computing device performs the following operations: obtaining characteristic information of each base in a biological sequence fastq file, wherein the characteristic information comprises at least a sequence location and a base type of each base in the biological sequence fastq file; selecting a target base from the bases in the biological sequence fastq file according to a preset rule and the characteristic information of each base; generating a base patch file by using characteristic information of the target base; performing lossless compression on the biological sequence fastq file to obtain a compressed fastq file; performing lossless compression on the base patch file to obtain a compressed patch file; separately decompressing the compressed patch file and the compressed fastq file; determining whether characteristic information of the target base in the decompressed compressed patch file is consistent with characteristic information of the target base in the decompressed compressed fastq file; and in response to determining that the characteristic information of the target base in the decompressed compressed patch file is inconsistent with the characteristic information of the target base in the decompressed compressed fastq file, modifying the characteristic information of the target base in the decompressed compressed fastq file to be the same as the characteristic information of the target base in the decompressed compressed patch file, wherein the characteristic information of the target base in the decompressed compressed fastq file is replaced with the characteristic information of the target base in the decompressed compressed patch file.

12. The non-transitory computer readable storage medium according to claim 11, wherein the computing device further performs the following operation steps: selecting a base of a target base type from the bases in the biological sequence fastq file as the target base; and generating the base patch file by using a base type and a sequence location of the target base.

13. The non-transitory computer readable storage medium according to claim 12, wherein the computing device further performs the following operation steps: determining whether a base type corresponding to a sequence location in the decompressed compressed patch file is consistent with a base type corresponding to the sequence location in the decompressed compressed fastq file; and in response to determining that the base type corresponding to the sequence location in the decompressed compressed patch file is inconsistent with the base type corresponding to the sequence location in the decompressed compressed fastq file, modifying the base type corresponding to the sequence location in the decompressed compressed fastq file to be the same as the base type corresponding to the sequence location in the decompressed compressed patch file.

14. The non-transitory computer readable storage medium according to claim 11, wherein the characteristic information further comprises a quality score, and the computing device further performs the following operation steps: selecting, from the bases in the biological sequence fastq file, a base satisfying at least one of the following as the target base: a base that is of a target base type and whose quality score is different from a preset threshold, and a base that is not of the target base type and whose quality score is the preset threshold; and generating the base patch file by using a base type, a quality score, and a sequence location of the target base.

15. The non-transitory computer readable storage medium according to claim 14, wherein the computing device further performs the following operation steps: determining whether a base type and a quality score that are corresponding to a sequence location in the decompressed compressed patch file are consistent with a base type and a quality score that are corresponding to the sequence location in the decompressed compressed fastq file; and in response to determining that the base type and the quality score that are corresponding to the sequence location in the decompressed compressed patch file are inconsistent with the base type and the quality score that are corresponding to the sequence location in the decompressed compressed fastq file, performing at least one of: modifying the base type corresponding to the sequence location in the decompressed compressed fastq file to be the same as the base type corresponding to the sequence location in the decompressed compressed patch file; and modifying the quality score corresponding to the sequence location in the decompressed compressed fastq file to be the same as the quality score corresponding to the sequence location in the decompressed compressed patch file.

Patent Metadata

Filing Date

Unknown

Publication Date

June 14, 2022

Inventors

Zhe LIU
Jun ZHANG

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Method and Apparatus for Processing Fastq files compressing lossless compression and decompression” (11360940). https://patentable.app/patents/11360940

© 2026 Patentable. All rights reserved.