How to measure NGS depth coverage bias
2
0
4.2 years ago
Anand Rao ▴ 430

Is there a software tool that reports a measure of the degree of non-uniformity in depth of Illumina sequencing coverage across a de novo assembled genome (against which the Illumina reads are mapped back)?

I have the PE read library (2x150bp HiSeq4000), the de novo assembled genome, and the BAM file from mapping the former to the latter, and I have 290 such data points. I am curious to know which of these 290 have more versus less uniform coverage depth across their respective genomes.

I came across a paper (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-5-r51), but it does not name a software tool per se. Some of my assembly woes may mirror those from an earlier post, "Any advice for a de novo genome assembly".

To reiterate: Is there software (like a supplement to something like BBTools' bbnorm) that can help me quickly visualize which of my genome assemblies are built on more uniform coverage depth?

NGS Illumina sequencing depth coverage bias • 3.1k views
3
4.2 years ago
Len Trigg ★ 1.5k

One measure of non-uniformity of coverage is the fold-80 penalty (see https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-1-r1). Essentially, it is the amount of additional coverage (in fold coverage of the genome) required so that 80% of the target bases are covered at the current mean coverage. A value near 1 indicates uniform coverage; larger values indicate more uneven coverage.

The rtg coverage command from RTG Core computes the fold-80 penalty, in addition to other statistics and graphs that can be used to visualize coverage distribution information.
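
To make the fold-80 definition above concrete, here is a minimal sketch that computes it from a list of per-base depths as the mean depth divided by the depth at the 20th percentile. This is one common way to calculate the statistic, not necessarily the exact method used by rtg coverage; the function name `fold80_penalty` and the nearest-rank percentile rule are assumptions made for illustration.

```python
import math


def fold80_penalty(depths):
    """Mean depth divided by the depth that at least 80% of bases meet or exceed.

    `depths` is a list of per-base coverage depths (hypothetical input,
    e.g. parsed from per-base depth output of a coverage tool).
    """
    if not depths:
        raise ValueError("no depths given")
    ordered = sorted(depths)
    mean = sum(ordered) / len(ordered)
    # Nearest-rank 20th percentile: at least 80% of bases have depth >= this value.
    rank = max(1, math.ceil(len(ordered) / 5))
    p20 = ordered[rank - 1]
    if p20 == 0:
        # More than 20% of bases are uncovered; no finite amount of
        # extra sequencing brings them up to the mean by scaling alone.
        return math.inf
    return mean / p20


if __name__ == "__main__":
    even = [30] * 10
    uneven = [10, 10, 30, 30, 30, 30, 30, 30, 30, 30]
    print(fold80_penalty(even))
    print(fold80_penalty(uneven))
```

Running this per assembly would give one number per genome, so the 290 data points could be ranked from most to least uniform.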

2
4.2 years ago

If you map reads to an assembly, you can use BBMap's pileup.sh like this:

pileup.sh in=mapped.sam stats=covstats.txt hist=hist.txt


From the histogram you can visualize the uniformity of the coverage. covstats.txt will contain the average coverage and standard deviation on a per-scaffold basis. The program will also print the overall average coverage and standard deviation to the screen.
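
As a rough sketch of how the histogram could be turned into a single comparable number per assembly, the snippet below computes the mean, standard deviation, and coefficient of variation (stdev / mean) of depth. It assumes the histogram is a two-column, whitespace-separated table of depth and number of bases, with header lines starting with `#`; check your own hist.txt before relying on that layout. The function name `coverage_cv` is hypothetical.

```python
import math


def coverage_cv(hist_lines):
    """Return (mean, stdev, cv) from 'depth<TAB>bases' histogram lines.

    Assumed layout: column 1 = coverage depth, column 2 = number of bases
    at that depth; lines starting with '#' are treated as headers.
    """
    total = 0
    weighted = 0
    weighted_sq = 0
    for line in hist_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        depth_s, count_s = line.split()[:2]
        depth, count = int(depth_s), int(count_s)
        total += count
        weighted += depth * count
        weighted_sq += depth * depth * count
    if total == 0:
        raise ValueError("histogram contains no bases")
    mean = weighted / total
    # Population variance, clamped at 0 to guard against rounding error.
    variance = max(weighted_sq / total - mean * mean, 0.0)
    stdev = math.sqrt(variance)
    cv = stdev / mean if mean > 0 else math.inf
    return mean, stdev, cv


if __name__ == "__main__":
    # Hypothetical file name matching the hist= argument used above.
    with open("hist.txt") as handle:
        print(coverage_cv(handle))
```

A lower coefficient of variation suggests more uniform coverage, and because it is normalized by the mean, it can be compared across assemblies sequenced to different depths.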

1

You've got the stats= flag twice, so could you have meant out=covstats.txt?

1

Fixed, thanks :) It actually doesn't matter (the second stats= overrides the first one). For pileup.sh, covstats, stats, and out are synonymous...