Difference between total number of reads in fastq file and no of bases/nt sequences in fastq file?
7 days ago
Fizzah • 0

Hello there. I am a beginner in data analysis and want to clarify the concepts of number of reads and number of bases in a fastq file. What is the difference between the total number of reads in a fastq file and the number of bases (nt sequences) in a fastq file? What command will return the total number of reads and the total number of bases?

7 days ago
Mensur Dlakic ★ 14k

Each read represents a short piece of DNA that was sequenced. Reads consist of a certain number of bases. When you add up all the bases from all the reads, you get a total number of bases.
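As a small worked example of that distinction, here is a minimal sketch that assumes a standard FASTQ layout of exactly four lines per record (header, sequence, separator, quality). The file name toy.fastq and its two reads are made up for illustration only.

```shell
# Build a tiny two-read FASTQ file (hypothetical reads, for illustration).
cat > toy.fastq <<'EOF'
@read1
ACGTAC
+
IIIIII
@read2
GGCA
+
IIII
EOF

# The second line of every four-line record is the sequence.
# Count those lines (reads) and sum their lengths (bases) in one pass.
awk 'NR % 4 == 2 { reads++; bases += length($0) } END { print reads, bases }' toy.fastq
# prints "2 10": two reads, 6 + 4 = 10 bases
```

The same awk line should work on a real file as long as no record wraps its sequence over several lines.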

There is a program called stats.sh from the BBtools package that will tell you the number of reads (it will call them scaffolds/contigs) and the total number of bases in each fastq file:

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
All                  1,000           1,000         149,982         149,981   100.00%
100                  1,000           1,000         149,982         149,981   100.00%


If you want a simple way to get these numbers using Linux commands, these two lines will give you a number of reads and total bases, respectively:

awk '{if (NR % 4 == 2) print $0}' myfile.fastq | wc | awk '{print $1}'
awk '{if (NR % 4 == 2) print $0}' myfile.fastq | wc | awk '{print ($3-$1)}'

This assumes that your file is called myfile.fastq and that each record spans exactly four lines. If you are curious, the first part takes the second line of each four-line record, because those lines contain the nucleotide sequence. The wc command in Linux counts lines, words and bytes, and awk selects which of those are printed out: the line count is the number of reads, and the byte count minus the line count (one newline per line) is the number of bases.

Thank you so much for such a detailed answer. I get the point now.

Can you please explain how to run that command? I keep trying to find the total read count using the stats.sh program from BBtools, but I keep failing.

I don't know what the problem is, so it is difficult to help. If you have Java and have downloaded and installed BBtools, it is as simple as:

stats.sh myfile.fastq

It says:

fixu@DESKTOP-KJMSKGU:/mnt/c/bbmap$ stats.sh 19213R-08-01_S16_L002_R1_001.fastq
stats.sh: command not found


I suggest you spend some time learning the basics of Linux, specifically how to set up the $PATH variable in order to tell the system where to look for programs. I am assuming from your command that you unpacked the files in the /mnt/c/bbmap directory, which would mean adding this command to your startup files for the bash shell:

export PATH="/mnt/c/bbmap:$PATH"


Or this one for (t)csh shell:

setenv PATH "/mnt/c/bbmap:${PATH}"


If you want to run it from the directory where it was installed, which is in general not a good idea, you would need to add ./ to the start of your command to tell the system to look for stats.sh in the current directory:

./stats.sh 19213R-08-01_S16_L002_R1_001.fastq
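To check whether a PATH change actually took effect, something like the following may help. It is only a sketch: BBMAP_DIR is a hypothetical variable name, and its default is the /mnt/c/bbmap location assumed earlier in this thread, so adjust it to wherever the files were really unpacked.

```shell
# BBMAP_DIR is a hypothetical variable; the default is the directory assumed in this thread.
BBMAP_DIR="${BBMAP_DIR:-/mnt/c/bbmap}"

# Prepend the directory to PATH for the current shell session only.
export PATH="$BBMAP_DIR:$PATH"

# command -v prints the resolved location if the shell can find the program.
if command -v stats.sh >/dev/null 2>&1; then
    echo "stats.sh found at: $(command -v stats.sh)"
else
    echo "stats.sh not found; check that $BBMAP_DIR contains it and that it is executable"
fi
```

If the check succeeds, running stats.sh from any directory should work for the rest of that session; making it permanent still requires the line in the startup file.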


Please take some time to figure out basic Linux commands and things about system setup, as it is impossible to guide you through all possible problems one command at a time.