Question: Mapping And Coverage Stats Of Exome Capture Experiment
6.7 years ago by rob234king (UK/Harpenden/Rothamsted Research) wrote:

I have an exome capture data set for wheat, which is hexaploid. I have mapped the reads and called SNPs. The PI wants:

1) A plot of GC content of exons vs read coverage

2) A plot of exon length vs read coverage

3) A figure for the number (%) of reads mapped to exons

4) unique % mapped

5) total % mapped

I can do 4 and 5 but think I'm going to struggle with 1-3. Any ideas (example commands would be useful) on how to get the stats and plots for these quantities using available tools?

exon mapping statistics • 2.9k views
modified 6.7 years ago by DG • written 6.7 years ago by rob234king

If this is what the PI wants, what do _you_ want then? :-)

written 6.7 years ago by Biomonika (Noolean)

You're mapping to the transcriptome, right?

written 6.7 years ago by gammyknee

Nope, mapping to the genome.

written 6.7 years ago by rob234king

6.7 years ago by DG wrote:

You can use the various QC-oriented GATK tools to get most of those. Given a BED file of your capture regions, use:

DepthOfCoverage to get the number of reads mapped to each of your targeted regions (exons). This will also give summary information such as the percentage of bases in each exon covered at different depth cut-offs.

GCContentByInterval will give you the GC content of each of the exons (again using the BED file of targeted exons).
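
If you ever need to sanity-check the GC numbers for a handful of exons by hand, something like the following Python sketch would do. It is not GATK's implementation: the function name `gc_fraction` is made up for illustration, and ignoring ambiguous bases such as N is an assumption on my part that may not match what GCContentByInterval does.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C among unambiguous (A/C/G/T) bases.

    Assumption for this sketch: ambiguous bases (e.g. N) are ignored,
    and an interval with no unambiguous bases gets 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


# Made-up exon sequences, keyed by a hypothetical interval name.
exon_seqs = {
    "exon_1": "ATGCGCGTTA",
    "exon_2": "acgtNNNNacgt",
}
exon_gc = {name: gc_fraction(seq) for name, seq in exon_seqs.items()}
```

The sequences would come from extracting each BED interval out of the reference FASTA with whatever tool you already use.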

You might just want to write a simple script to get the length of each exon based on your BED file. With those bits of summary information you'll probably want to set up some sort of binning strategy for the data and then compile and plot the stats.
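
As a rough idea of what such a script could look like, here is a minimal Python sketch. It assumes a tab-separated BED file (0-based, half-open coordinates) and a per-exon mean coverage table you have already produced; the function names, the interval key format and the coverage values are all made up for illustration.

```python
import io
from collections import defaultdict


def read_bed_lengths(handle):
    """Yield (interval_key, length) per BED record (0-based, half-open)."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end, *_ = line.rstrip("\n").split("\t")
        yield f"{chrom}:{start}-{end}", int(end) - int(start)


def mean_by_bin(pairs, bin_width):
    """Group (x, y) pairs into fixed-width bins of x; return {bin_start: mean y}."""
    bins = defaultdict(list)
    for x, y in pairs:
        bins[(x // bin_width) * bin_width].append(y)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}


# Toy BED content standing in for the real capture-target file.
bed = io.StringIO("chr1A\t100\t250\nchr1A\t400\t520\nchr2B\t0\t90\n")
lengths = dict(read_bed_lengths(bed))

# Made-up mean coverage per exon, keyed the same way as `lengths`.
coverage = {"chr1A:100-250": 30.0, "chr1A:400-520": 20.0, "chr2B:0-90": 10.0}

pairs = [(lengths[key], coverage[key]) for key in lengths]
binned = mean_by_bin(pairs, bin_width=100)  # exon length bin -> mean coverage
```

The same `mean_by_bin` helper works for GC content vs coverage if you pass GC percentages as `x` and pick a suitable bin width. The binned means can then go into whatever plotting tool you prefer.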

written 6.7 years ago by DG

GCContentByInterval worked great for getting exon size and GC content for each exon, all in one file. I'll have a play with DepthOfCoverage tomorrow and update. Thanks.

written 6.7 years ago by rob234king

Perfect. Glad that worked.

written 6.7 years ago by DG

DepthOfCoverage seems to work OK on a reduced file, but it takes forever on anything big, and the numbers don't quite match what I see in IGV. I've got a program from Tobias Rausch, from the DELLY suite, which seems to do a better job than GATK for exon depth: it is quick and the numbers match. So if anyone is looking for coverage over exons, I would suggest looking at DELLY.

modified 6.6 years ago • written 6.6 years ago by rob234king

How many samples do you have? I routinely run DepthOfCoverage on my exome sample data. I would also recommend NOT running DepthOfCoverage on reduced BAMs: I don't think the results are quite the same, and I've certainly gotten different results. It is also worth pointing out that IGV isn't a proxy for the correct answer, as it will include reads with low mapping quality. DepthOfCoverage has default read filters, which you should take into account, as well as a variety of other settings for adjusting its filtering and counting behaviour.
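
To illustrate why a mapping-quality filter alone can make two tools disagree, here is a toy Python sketch. The reads, the exon coordinates and the threshold of 20 are all made up, and this is not how either GATK or IGV is implemented; it only shows the effect of counting with and without a MAPQ cut-off.

```python
def count_overlapping(reads, start, end, min_mapq=0):
    """Count reads overlapping the half-open interval [start, end).

    Each read is a (read_start, read_end, mapq) tuple in 0-based,
    half-open coordinates. Reads with mapq < min_mapq are skipped.
    """
    return sum(
        1
        for r_start, r_end, mapq in reads
        if r_start < end and r_end > start and mapq >= min_mapq
    )


# Made-up reads: two of the three reads over the exon have low MAPQ.
reads = [(90, 190, 60), (120, 220, 0), (150, 250, 3), (300, 400, 60)]
exon = (100, 200)

unfiltered = count_overlapping(reads, *exon)             # every overlapping read
filtered = count_overlapping(reads, *exon, min_mapq=20)  # hypothetical cut-off
```

Here the unfiltered count is 3 and the filtered count is 1, so the first thing to check when numbers differ is which filters each tool applied.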

What do you mean by "forever" in terms of time?

written 6.6 years ago by DG