Mapping And Coverage Stats Of Exome Capture Experiment
10.2 years ago
rob234king ▴ 610

I have an exome capture data set for wheat, which is hexaploid. I have mapped the reads and called SNPs. The PI wants:

1) A plot of GC content of exons vs read coverage

2) A plot of exon length vs read coverage

3) A figure for the number (%) of reads mapped to exons

4) unique % mapped

5) total % mapped

I can do 4 and 5 but think I'm going to struggle with 1-3. Any ideas (example commands would be useful) on how to get these stats and plot them using available tools?

statistics exon mapping • 4.0k views

If this is what the PI wants, what do _you_ want then? :-)


You're mapping to the transcriptome, right?


Nope, mapping to the genome.

10.2 years ago
DG 7.3k

You can use the various GATK QC-oriented tools to get most of those. Given a BED file for your capture regions use:

DepthOfCoverage to get the number of reads mapped to each of your targeted regions (exons). This will also give summary information such as the percentage of bases in each exon covered at different depth cut-offs.

GCContentByInterval will give you the GC content of each of the exons (again using the BED file of targeted exons).
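To make the two per-interval summaries concrete, here is a minimal Python sketch of what they boil down to. It is not GATK's implementation: the function names, the example depths, and the choice to ignore N bases when computing GC are all assumptions made for illustration.

```python
def percent_covered(depths, cutoffs=(1, 10, 20)):
    """Percent of bases in one interval with depth >= each cut-off.

    `depths` is a list of per-base read depths for a single exon
    (hypothetical input; in practice it would come from a coverage tool).
    """
    n = len(depths)
    if n == 0:
        return {c: 0.0 for c in cutoffs}
    return {c: 100.0 * sum(d >= c for d in depths) / n for c in cutoffs}


def gc_fraction(seq):
    """GC fraction of a sequence, ignoring anything that is not A/C/G/T.

    Skipping N bases is an assumption of this sketch.
    """
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


# Made-up per-base depths and sequence for a single short exon
exon_depths = [0, 5, 12, 25, 30, 18, 9, 0]
summary = percent_covered(exon_depths)
mean_depth = sum(exon_depths) / len(exon_depths)
exon_gc = gc_fraction("ATGCGCNN")
```

With one mean depth and one GC value per exon, plot 1 in the question is just a scatter (or binned) plot of those two columns.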

You might just want to write a simple script to get the length of each exon based on your BED file. With those bits of summary information you'll probably want to set up some sort of binning strategy for the data and then compile and plot the stats.
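A rough sketch of that script in Python, assuming a plain three-column BED file (0-based, half-open, so length is end minus start) and a dictionary of mean coverage per exon obtained elsewhere. The function names, the 100 bp bin size, and the example intervals are hypothetical.

```python
import io
from collections import defaultdict


def exon_lengths(bed_handle):
    """Map (chrom, start, end) -> length for each interval in a BED file."""
    lengths = {}
    for line in bed_handle:
        # Skip blank lines and the optional BED header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        lengths[(chrom, start, end)] = end - start
    return lengths


def mean_coverage_by_length_bin(lengths, coverage, bin_size=100):
    """Group exons into length bins and average their coverage per bin."""
    bins = defaultdict(list)
    for key, length in lengths.items():
        if key in coverage:
            bins[(length // bin_size) * bin_size].append(coverage[key])
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}


# Made-up intervals and coverage values, for illustration only
bed = io.StringIO("chr1A\t100\t250\nchr1A\t400\t480\nchr1B\t1000\t1190\n")
lengths = exon_lengths(bed)
coverage = {
    ("chr1A", 100, 250): 30.0,
    ("chr1A", 400, 480): 12.0,
    ("chr1B", 1000, 1190): 40.0,
}
binned = mean_coverage_by_length_bin(lengths, coverage)
```

The same binning function works for GC content if you pass GC percentages instead of lengths and pick a suitable bin size.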


GCContentByInterval worked great for getting exon size and GC content all in one file for each exon. I'll have a play with the Depth of Coverage function tomorrow and update. Thanks


Perfect. Glad that worked.


DepthOfCoverage seems to work OK on a reduced file, but it takes forever on anything big and the numbers don't quite match what I see in IGV. I've got a program from Tobias Rausch, from the DELLY suite, which seems to do a better job than GATK for exon depth: it is quick and the numbers match. So if anyone is looking for coverage over exons, I would suggest looking at DELLY.


How many samples do you have? I routinely run DepthOfCoverage on my exome sample data. Also, I would recommend NOT running DepthOfCoverage on reduced BAMs. I don't think the results are quite the same; I've certainly gotten different results. It is also worth pointing out that IGV isn't a proxy for the correct answer, as it will include reads with low mapping quality. DepthOfCoverage has default read filters, which you should take into account, as well as a variety of other settings for adjusting its filtering and counting behaviour.

What do you mean by "forever" in terms of time?
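To illustrate why an unfiltered view and a filtered depth count can disagree, here is a toy Python sketch. Reads are represented as hypothetical (start, end, mapq) tuples, and the threshold of 20 is an arbitrary example value, not a documented default of any tool.

```python
def depth_at(pos, reads, min_mapq=0):
    """Count reads covering `pos` whose mapping quality is >= min_mapq.

    Each read is a (start, end, mapq) tuple with a half-open [start, end) span.
    """
    return sum(
        1
        for start, end, mapq in reads
        if start <= pos < end and mapq >= min_mapq
    )


# Made-up reads: two confidently placed, two with low mapping quality
reads = [(100, 200, 60), (120, 220, 0), (150, 250, 3), (180, 280, 40)]

unfiltered = depth_at(190, reads)             # every read counts
filtered = depth_at(190, reads, min_mapq=20)  # low-MAPQ reads dropped
```

In a polyploid genome like wheat, where reads from homoeologous copies often map ambiguously, the gap between the two numbers could be substantial.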
