Is my estimated genome coverage correct?
5 months ago
apcreyes29 • 0

Hi. I'm trying to calculate the coverage of my sequenced bacterial genome (for NCBI submission), but I'm not sure I got the right answer. According to biostars.org/p/208339 , the formula is C = (R × L) / G, where C is the coverage, R is the total number of reads, L is the read length, and G is the genome size.

The total number of reads (R) for my genome is around 31,000,000, and the read length (L) is 100, according to the sequence length reported by FastQC when I submitted one file of my paired-end reads. The genome size (G) I got after assembly is around 4,000,000 bp.

If I use the formula, the answer would be C = (2 × 100 × 31,000,000) / 4,000,000 = 1,550. I think this may be wrong, because other whole genome sequencing studies report coverage of only around 100x-300x, or even as low as 12x. Is my answer really wrong? Thank you in advance.

NCBI Genome Sequencing • 593 views

While a theoretical formula is quite nice, you should (IMO) plot or look at the actual coverage to confirm. Map your reads to a reference and use e.g. mosdepth to calculate coverage, or just open the BAM file in a genome browser to see the coverage.

5 months ago
Mensur Dlakic ★ 25k

If the total number of reads is 31 million, and if you are sure that most of them map to your genome of interest, the coverage would be 775x (31,000,000 × 100 / 4,000,000). I am not sure why you are multiplying by 2.
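To make the arithmetic explicit, here is a minimal Python sketch of the formula from the question, C = (R × L) / G. The function and variable names are illustrative only and do not come from any library. It assumes that 31,000,000 is the count of individual reads (forward and reverse together); the second call shows what the number would be if that figure were instead the count of read pairs, which is the only case where doubling would apply.

```python
def estimated_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Theoretical coverage C = (R * L) / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Values taken from the question.
total_reads = 31_000_000   # R
read_length = 100          # L, in bp
genome_size = 4_000_000    # G, in bp

# Assumption: total_reads already counts every individual read.
coverage_if_reads = estimated_coverage(total_reads, read_length, genome_size)

# Assumption: total_reads counts read PAIRS, so there are twice as many reads.
coverage_if_pairs = estimated_coverage(2 * total_reads, read_length, genome_size)

print(f"31M individual reads: {coverage_if_reads:.0f}x")  # 775x
print(f"31M read pairs:       {coverage_if_pairs:.0f}x")  # 1550x
```

This is only the theoretical estimate; it also assumes that essentially all reads belong to the genome of interest.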

Separately, it is not unusual to get 775x or 1,500x coverage for single-genome sequencing. If your assembly is fragmented (> 100 contigs), you may want to consider downsampling the reads to 100-200x. It could help to get a more contiguous assembly.
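As a rough guide to how much to downsample, the sketch below works out what fraction of the reads to keep to reach a chosen target coverage. It assumes simple random subsampling, the 775x estimate above, and a target of 150x picked from the middle of the 100-200x range; the function name is hypothetical. The subsampling itself would be done with a separate tool, and for paired-end data both reads of a pair should be kept or dropped together.

```python
def keep_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep so the expected coverage drops to the target."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    # Never ask for more reads than are available.
    return min(1.0, target_coverage / current_coverage)


current = 775.0            # estimated coverage from the formula
target = 150.0             # assumed target, within the 100-200x range
total_reads = 31_000_000   # from the question

fraction = keep_fraction(current, target)
reads_to_keep = round(total_reads * fraction)

print(f"Keep about {fraction:.1%} of the reads ({reads_to_keep:,} reads)")
```

With these numbers, roughly one read in five is kept.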


Thank you for the answer. I understand it better now. I multiplied by 2 because I thought the read length of 100 applied to only one read of each pair. May I ask how to downsample my reads?


I suggest you search this website for "reads normalization" or something similar. There are several ways of doing it. I have used both khmer digital normalization and bbnorm.sh and both work fine. The latter is faster and has fewer steps, so it may be a good starting point.