Question: Genome coverage against normalized average depth
0
gravatar for konstantinkul
2.8 years ago by
Russian Federation
konstantinkul90 wrote:

Hello!

Does anybody dealt with this type of graph "A plot of genome coverage against normalized average depth" http://prntscr.com/df9wmc. I really need it for optimization of procedures of WGS. I was trying to figure it out how to generate it, but I couldn't. The description in methods section was not obvious for me (http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-1, Methods, Genome coverage).

"Mapped reads for each replicate data sets were normalized by average genome coverage to 11× and 21× for clinical and non-clinical samples respectively, and then pooled or analyzed independently depending on analysis metrics used. For the clinical samples, reads originating from host contamination were filtered out (by aligning all reads to P. falciparum 3D7 reference genome) prior to analysis, hence the relatively low average genome coverage depth for these samples. Evaluation of coverage depth and evenness was based on cumulative distributions over a normalized overall average depth. A measurement of low-coverage index lci (d) is defined as the integration of the cumulative distribution C(x) from 0 to d giving an overall assessment of the coverage at the low end of distribution: lci(d)=∫d0C(x)dx The value lci (0.5) that gives a measurement of the coverage below half of the average depth in the distribution was used to compare evenness of coverage for each data set."

Especially I can't understand the method of normalization "... a normalized overall average depth". It just df$norm_depth <- with(df, (depth/median(over_all_avr_depth))) or not???

I'll be grateful for any links, suggestions, and scripts in Python or R.

next-gen R • 1.6k views
ADD COMMENTlink modified 2.8 years ago by Brian Bushnell16k • written 2.8 years ago by konstantinkul90
0
gravatar for Brian Bushnell
2.8 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

It looks like the graph is trying to show that as you increase average depth (in other words, the number of reads sequenced), a higher percentage of the genome is covered by at least 1 read. But some preparations have biased distributions and thus need more depth to achieve a given level of genome coverage. You could certainly generate such a graph by mapping your reads, then subsampling the sam file to various depths, and calculating the genome coverage for each subsample (for example, BBMap's pileup.sh tool will calculate the percent genome coverage, and reformat.sh can subsample a sam file to various numbers of reads).

I find this particular graph confusing, though; there does not seem to me to be any point in using normalized rather than raw depth (other than to obfuscate), and the "theoretical" line that is vertical at 1.0 does not make any sense to me.

ADD COMMENTlink written 2.8 years ago by Brian Bushnell16k

Thank you for your reply. I have similar doubts about "theoretical" line. Let me explain my pipeline 1. I create coverage column for each position of ref genome $ bedtools coverage -abam some.bam -b fasta.tab -d > cov.txt extrat last column cut -f 6 cov.txt > genome_depth 2. In R tab <- read.table("genome_depth", header = FALSE)

Then I created frequency table of different depths freq_tab <- as.data.frame(table(tab$V1))

and calculate percent of each length with specific depth value in this table freq_tab$perc <- with(freq_tab, (100*Freq)/length(tab$V1))

Then I apply simple normalisation by dividing each depth by median depth dc <- as.numeric(levels(freq_tab$Var1)) and freq_tap$scale_depth <- with(freq_tab, (dc/median(dc)))

And plot this data plot(freq_tab$scale_depth, cumsum(freq_tab$perc),type="l")

http://prntscr.com/dfawi2

What do you think is this correct way to obtain such graph? When I add the second sample I get this plot http://prntscr.com/dfb05s

They are similar to what I want, but graphics from different samples in the example picture intersect at a certain point which conveniently display slight differences in the slope.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by konstantinkul90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1944 users visited in the last hour