Question: Calculating Exome Coverage
2
gravatar for Coryza
6.9 years ago by
Coryza30
Coryza30 wrote:

*// Edit to make the post more clear (Mapping done via Bowtie2). My problem is that when counting Exome Coverage via coverageBed gives different results than via genomeCoverageBed. So I'm not sure if I'm doing something wrong, or which of the 2 methods is correct.

1) My first step is to build an .bed file of my Illumina Paired-End reads, returning the positions that only fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].

2) My next step is to calculate the coverage of my new datafile via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10993449.0

Nucleotides/Length*100 24.253740909 % Coverage.

3) The next step was to calculate the coverage of my new datafile via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10576907.0

Nucleotides/Length*100 23.3347661863 % Coverage.

Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?

bedtools exome coverage bowtie2 • 7.4k views
ADD COMMENTlink modified 6.9 years ago • written 6.9 years ago by Coryza30
4
gravatar for Charles Warden
6.9 years ago by
Charles Warden8.0k
Duarte, CA
Charles Warden8.0k wrote:

If you already have a .bed file with your targeted regions (in this case, the exome), I think the easiest solution to to use CalculateHsMetrics in Picard (which will also provide additional statistics beyond just on-target coverage)

http://picard.sourceforge.net/command-line-overview.shtml#CalculateHsMetrics

ADD COMMENTlink modified 6.9 years ago • written 6.9 years ago by Charles Warden8.0k

+1. CalculateHSMetrics is awesome (once you figure out the goofy BED-type format it demands). If you decide to use it and have any trouble, post a comment here and I'll help you out when I see the question.

ADD REPLYlink written 6.9 years ago by Dan D7.2k

Dan, I have trouble figuring out the format of interval files. Links lead to that info seem all dead. Please help. Thanks!

ADD REPLYlink written 5.6 years ago by qy0

Picard has been essentially folded in to the GATK. The new documentation is here: https://broadinstitute.github.io/picard/command-line-overview.html

ADD REPLYlink written 5.6 years ago by DG7.2k
2
gravatar for DG
6.9 years ago by
DG7.2k
DG7.2k wrote:

Assuming you are using the same files for both commands (from your examples it wasn't clear) it looks like you are only dealing with a few bases. IIRC genomeCoverageBed only counts the bases on reads that lie within the targeted regions. IntersectBed returns all reads that intersect with the targeted regions in the bedfile, and then you are counting all of those bases. So it would include the bases that lie within intronic regions, etc as long as part of the read overlaps the exons.

ADD COMMENTlink written 6.9 years ago by DG7.2k

The first step is to intersectBed the derived data against the Illumina Targeted Exome file. In that case all the intronic regions are filtered out right? I thought only the regions that fall in the exon positions are returned, and in that case no intronic regions should be in my new file?

ADD REPLYlink written 6.9 years ago by Coryza30

When you run intersectBed it gives you all reads that intersect that bed file. Many of the reads though can span the intron/exon boundary. Intersectbed doesn't just give you the bases that intersect, at least I don't think so. I would use HSMetrics as suggested above, or one of the GATK tools for calculating coverage as you can get much more detailed metrics.

ADD REPLYlink written 6.9 years ago by DG7.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2127 users visited in the last hour
_