Question

Is my estimated genome coverage correct?

0

12 months ago

apcreyes29 • 0

Hi. I'm trying to calculate the genome coverage of my sequenced bacterial genome (for NCBI submission), but I'm not sure I got the right answer. According to biostars.org/p/208339, the formula is C = R × L / G, where C is the coverage, R is the total number of reads, L is the read length, and G is the genome size.

The total number of reads (R) for my genome is around 31,000,000, and the read length (L) is 100, according to the sequence length FastQC reported when I submitted one of the paired-end read files. The genome size (G) after assembly is around 4,000,000 bp.

If I use the formula, the answer would be 1,550 = (2 × 100 × 31,000,000) / 4,000,000. I think I may have the wrong answer, as I noticed that other whole genome sequencing studies report genome coverage of only around 100x-300x, or even as low as 12x. Is my answer really wrong? Thank you in advance.

NCBI Genome Sequencing • 1.4k views

ADD COMMENT • link updated 12 months ago by Mensur Dlakic ★ 27k • written 12 months ago by apcreyes29 • 0

0

While a theoretical formula is quite nice, you should (IMO) plot or look at the actual coverage to confirm. Map your reads to a reference and use e.g. mosdepth to calculate coverage, or just open the BAM file in a genome browser to see the coverage.
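To make the "actual coverage" idea concrete, here is a minimal Python sketch that averages per-base depth. It assumes a tab-separated table with three columns (contig, position, depth), such as the one `samtools depth` writes for a single BAM file; mosdepth produces its own output files, which are not parsed here. The function name `mean_depth` and the toy input are hypothetical.

```python
import io
from typing import Iterable


def mean_depth(lines: Iterable[str], genome_size: int) -> float:
    """Mean depth from tab-separated (contig, position, depth) rows.

    Dividing by genome_size rather than by the number of rows means that
    positions missing from the table are counted as depth 0.
    """
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _contig, _pos, depth = line.rstrip("\n").split("\t")
        total += int(depth)
    return total / genome_size


# Toy example: three positions on one contig (assumed layout, not real data).
example = io.StringIO("contig1\t1\t10\ncontig1\t2\t12\ncontig1\t3\t8\n")
print(mean_depth(example, genome_size=3))  # 10.0
```

For a real run, an open file handle on the depth table can be passed in place of the `io.StringIO` object.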

ADD REPLY • link 12 months ago by cmdcolin ★ 3.8k

score 2 · Answer 1 · 2023-04-30

2

12 months ago

Mensur Dlakic ★ 27k

If the total number of reads is 31 million, and if you are sure that most of them map to your genome of interest, the answer would be 775x (31,000,000 × 100 / 4,000,000). Not sure why you are multiplying by 2.
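The arithmetic can be checked with a short Python sketch. The function name `estimated_coverage` is hypothetical; the key assumption is whether 31,000,000 counts every individual read (both paired-end files together) or read pairs, since only the latter would justify the factor of 2.

```python
def estimated_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Return C = R * L / G, where R counts every individual read."""
    return num_reads * read_length / genome_size


# If 31,000,000 is the total across both paired-end files:
print(estimated_coverage(31_000_000, 100, 4_000_000))      # 775.0

# If 31,000,000 were instead the number of read pairs, R doubles:
print(estimated_coverage(2 * 31_000_000, 100, 4_000_000))  # 1550.0
```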

Separately, it is not unusual to get 775x or 1500x coverage for single-genome sequencing. If your assembly is fragmented (> 100 contigs), you may want to consider downsampling the reads to 100-200x. It could help to get a more continuous assembly.

ADD COMMENT • link 12 months ago by Mensur Dlakic ★ 27k

0

Thank you for the answer. I understand it better now. I multiplied it by 2 since I thought the read length of 100 applied to only one read of each pair. May I ask how to downsample my reads?

ADD REPLY • link 12 months ago by apcreyes29 • 0

1

I suggest you search this website for "reads normalization" or something similar. There are several ways of doing it. I have used both khmer digital normalization and bbnorm.sh and both work fine. The latter is faster and has fewer steps, so it may be a good starting point.
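For illustration only, here is a minimal Python sketch of plain random subsampling of paired-end reads. This is a simpler technique than the digital normalization performed by khmer or bbnorm.sh, which even out coverage based on k-mer depth; it is not a reimplementation of either tool. All function names are hypothetical, and the sketch assumes unwrapped four-line FASTQ records with both files listing the pairs in the same order.

```python
import itertools
import random
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, ...]


def fastq_records(handle: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    it = iter(handle)
    while True:
        record = tuple(itertools.islice(it, 4))
        if len(record) < 4:
            return  # end of input, or a truncated trailing record
        yield record


def keep_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of read pairs to keep to go from current to target coverage."""
    return min(1.0, target_coverage / current_coverage)


def subsample_pairs(r1: Iterable[str], r2: Iterable[str],
                    fraction: float, seed: int = 0) -> Iterator[Tuple[Record, Record]]:
    """Keep each read pair with probability `fraction`, so mates stay in sync."""
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2)):
        if rng.random() < fraction:
            yield rec1, rec2


# Going from roughly 775x to roughly 155x means keeping about 20% of the pairs.
print(keep_fraction(775, 155))
```

In practice `r1` and `r2` would be open file handles for the two read files, and the kept records would be written out with `"".join(record)` to two new files.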

ADD REPLY • link 12 months ago by Mensur Dlakic ★ 27k