Counting and summing the number of bases in a FASTQ.GZ file
16 months ago
영재 • 0

Happy New Year! I am starting a new job doing bioinformatics-related work.

I want to write a Python script that opens a fastq.gz file and calculates the total count of each base (A, T, G, C) in the sequences. Given read1 and read2 fastq.gz files, how can I get these statistics for each file?

In summary, I would like to obtain the following results with a Python script.

The picture below shows the output I want.

fastq python script • 1.0k views

enter image description here


What is the question here? You know what you want to implement and which language to implement it in. So start trying it out and then post your code if you run into issues.

This has already been implemented in existing packages, tools, and command-line hacks, but if you are trying to learn how to do this, then go for it.

The following past threads on this topic are listed for reference:

Extract basic info from fastq files in an efficient way?
Counting Number Of Bases In A Fastq File
Number of bases with a certain quality in FASTQ file

16 months ago
Ming Tommy Tang ★ 3.9k

Take a look at this Python package: https://www.biorxiv.org/content/10.1101/2022.12.21.521373v1


Thank you for your reply. Is it not possible to implement this with a basic Python script?
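
A basic script along these lines seems feasible using only the Python standard library. The following is a minimal sketch, not a tested production tool. It assumes standard 4-line FASTQ records (sequences not wrapped across lines), and `count_bases` and `summarize` are hypothetical helper names chosen for illustration, not part of any package mentioned in this thread.

```python
import gzip
import sys
from collections import Counter


def count_bases(fastq_gz_path):
    """Count every base character across all reads in a gzipped FASTQ file."""
    counts = Counter()
    # "rt" opens the gzip stream in text mode so lines come back as str.
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Assumption: 4 lines per record, sequence on the 2nd line.
            if line_number % 4 == 1:
                counts.update(line.strip().upper())
    return counts


def summarize(fastq_gz_path):
    """Return per-base counts for A, T, G, C, N plus the total base count."""
    counts = count_bases(fastq_gz_path)
    summary = {base: counts.get(base, 0) for base in "ATGCN"}
    summary["total"] = sum(counts.values())
    return summary


if __name__ == "__main__":
    # Paths are taken from the command line, one summary per file.
    for path in sys.argv[1:]:
        print(path, summarize(path))
```

Saved under a name of your choice (for example `count_bases.py`, a hypothetical filename), it could be run as `python count_bases.py read1.fastq.gz read2.fastq.gz` to print one summary per file. Reading line by line keeps memory use low, though dedicated tools will likely be faster on large files.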
