Question

Counting and summing the number of bases in a FASTQ.GZ file

16 months ago

영재 • 0

Happy New Year! I am starting bioinformatics-related work in a new job.

I want to write a Python script that opens a fastq.gz file and counts the total number of each base (A, T, G, C) in the sequences. Given read1 and read2 fastq.gz files, how can I get these statistics for read1 and for read2 separately?
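
The counting step can be sketched with only the standard library. This is a minimal sketch, not a definitive implementation. It assumes standard unwrapped four-line FASTQ records (header, sequence, `+`, quality), so the sequence is the second line of every record. It opens the file with `gzip.open` in text mode and tallies characters with `collections.Counter`. The function name `count_bases` and the file names in the usage comment are hypothetical. Because read1 and read2 are separate files, the same function is simply run once per file.

```python
import gzip
import sys
from collections import Counter


def count_bases(path):
    """Count each base in the sequence lines of a gzipped FASTQ file.

    Assumes unwrapped 4-line records, so sequence lines are at
    0-based line indices 1, 5, 9, ...
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line of each record
                counts.update(line.strip().upper())
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python count_bases.py sample_R1.fastq.gz sample_R2.fastq.gz
    for path in sys.argv[1:]:
        c = count_bases(path)
        print(path, *(f"{b}={c[b]}" for b in "ATGCN"))
```

Headers and quality strings are skipped, so quality characters such as `A` or `C` are never counted as bases.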

In summary, I would like to obtain the following results with a Python script.

The picture below shows the output I want.

fastq python script • 1.0k views

updated 16 months ago by GenoMax 142k • written 16 months ago by 영재 • 0

[Image of the desired output; not preserved]

16 months ago by 영재 • 0

What is the question here? You know what you want to implement and which language to implement it in. So start trying it out and then post your code if you run into issues.

This has already been implemented in existing packages, tools, and command-line hacks, but if you are trying to learn how to do it yourself, go for it.

The following past threads on this topic are listed for reference:

Extract basic info from fastq files in an efficient way?
Counting Number Of Bases In A Fastq File
Number of bases with a certain quality in FASTQ file

16 months ago by GenoMax 142k

score 0 · Answer 1 · 2023-01-05

16 months ago

Ming Tommy Tang ★ 3.9k

Take a look at this Python package: https://www.biorxiv.org/content/10.1101/2022.12.21.521373v1

16 months ago by Ming Tommy Tang ★ 3.9k

Thank you for your reply. Is it impossible to implement this with a basic Python script?

16 months ago by 영재 • 0
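
A basic standard-library script can produce per-file statistics for a read1/read2 pair. This is a minimal sketch, not a definitive implementation. The original image of the desired output is not preserved, so the table layout is an assumption: one tab-separated row per file with total bases, A/T/G/C/N counts, and GC percentage. The sketch assumes unwrapped four-line FASTQ records. It reads the file in binary mode, uses `itertools.islice` to take every fourth line starting at the sequence line, and calls `bytes.count`. The names `fastq_stats` and `main`, and the file names in the usage comment, are hypothetical.

```python
import gzip
import sys
from itertools import islice

BASES = "ATGCN"


def fastq_stats(path):
    """Return (per-base counts, total bases, GC%) for one gzipped FASTQ.

    Assumes unwrapped 4-line records; only sequence lines are counted.
    """
    counts = dict.fromkeys(BASES, 0)
    total = 0
    with gzip.open(path, "rb") as handle:
        # islice(handle, 1, None, 4) yields 0-based lines 1, 5, 9, ... (sequences)
        for seq in islice(handle, 1, None, 4):
            seq = seq.rstrip().upper()
            total += len(seq)
            for base in BASES:
                counts[base] += seq.count(base.encode())
    gc = counts["G"] + counts["C"]
    gc_pct = 100.0 * gc / total if total else 0.0
    return counts, total, gc_pct


def main(paths):
    # Header row, then one tab-separated row per input file (e.g. read1, read2)
    print("file", "total", *BASES, "GC%", sep="\t")
    for path in paths:
        counts, total, gc_pct = fastq_stats(path)
        print(path, total, *(counts[b] for b in BASES), f"{gc_pct:.2f}", sep="\t")


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python fastq_stats.py sample_R1.fastq.gz sample_R2.fastq.gz
    main(sys.argv[1:])
```

Reading in binary mode skips text decoding. For large files, dedicated tools such as those in the threads linked above will usually be faster than pure Python.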