Percentage Of Exons Covered
4
0
Entering edit mode
8.3 years ago
hellbio ▴ 440

Hi all,

I have a bed file with the start and end positions of all the exons. How can we find out, what percentage of these exons are covered in the sequenced sample?

Any help would be highly appreciated.

Thanks

exon coverage • 3.5k views
ADD COMMENT
0
Entering edit mode

have you tried bedtools and bedops?

ADD REPLY
2
Entering edit mode
8.3 years ago
Irsan ★ 7.4k

It depends a lot on how you define "covered". My suggestions:

  • An exon is covered when all bases have at least 1 read mapped to it. Or in other words, uncovered exons are those were there are 1 or more bases that have no reads mapped to them. In this case, use bedtools coverageBed -abam file.bam -b exons.bed
  • An exon is covered when the bases have, on average, N reads (I use 5 or more reads) or more mapped to them, where you can choose the threshold N yourself (opposed to bedtools coverageBed ...). Use picard-tools CalculateHsMetrics for this approach (you have to change your bed file a bit, but I'm sure you will manage to do that.
ADD COMMENT
0
Entering edit mode

Well I have used bedtools:

samtools view -b <BAM> | coverageBed -abam stdin -b exons.bed

function and the output looks like,

chr1    60817388        60817507        0       0       119     0.0000000
chr1    91226064        91226183        0       0       119     0.0000000
chr1    101711776       101711895       0       0       119     0.0000000
chr1    108003221       108003340       0       0       119     0.0000000
....

samtools view -b <BAM> | coverageBed -abam stdin -b exons.bed -hist

Few lines of output:

chr1    60817388        60817507        32      1       119     0.0084034
chr1    60817388        60817507        33      2       119     0.0168067
chr1    60817388        60817507        34      3       119     0.0252101

Which output should be used to get the percentage of exons covered?

ADD REPLY
0
Entering edit mode

take the first output. Each line in the output corresponds to an exon. Consider the 7th column (the % of bases in the exon with non-zero coverage). If column 7 = 0, all bases in the exon have non-zero coverage, meaning that all bases have at least 1 read mapped to it. Count the amount of lines with column 7 = 0 and divide that by the total amount of lines:

awk 'BEGIN{FS="\t";totalExons=0;goodExons=0}{totalExons++;if($7==0){goodExons++}}END{print goodExons,"out of ",totalExons," have all bases covered by at least 1 read."}'
ADD REPLY
0
Entering edit mode

Thanks, It gives the % of exons with atleast one read mapped to it. I would also like to find % of exons that have atleast 5,10,20 reads mapped to it. Could you suggest a way to do it or some script?

ADD REPLY
1
Entering edit mode
8.3 years ago
DG 7.2k

There is also the GATK DepthOfCoverage tool, which is what I use to get coverage data for exons with Exome-Seq projects now. I prefer its output over Picards metrics. You can set what you want reported by the tool, from there you can parse the tab delimited output based on whatever thresholds for coverage you want. I tend to look at the percentage of bases covered at at least 5x and 10x to identify regions of low coverage where I may have missed calling variants for instance. And obviously the zero coverage exons are easy to identify in the output.

A typical command line call for me is:

java -Xmx4g -jar /usr/local/bin/GenomeAnalysisTK.jar -T DepthOfCoverage -omitBaseOutput -omitLocusTable -R genome.fa -I A.bam -L Regions.list -o A.coverage -ct 5 -ct 10 -ct 15 -ct 20 -ct 30 -ct 50

GATK DepthOfCoverage

ADD COMMENT
1
Entering edit mode
8.3 years ago

You can try Bamchop: https://github.com/CBMi-BiG/bamchop

ADD COMMENT
0
Entering edit mode
8.3 years ago

Have a look at the bedtools coverage function, and its documentation: http://bedtools.readthedocs.org/en/latest/content/tools/coverage.html

samtools view -b <BAM> | coverageBed -abam stdin -b exons.bed
ADD COMMENT
0
Entering edit mode

Dear Friend, I have a sample whose read count is 51,578,482. After Duplicate removal, i have about 46,393,168 reads with me!! Mapped read count is 46,676,207(99.36%) & on Target read count is 38,221,722 (81.36). Do u think it is a good output? What should be the ideal on target mappability in term of %. I have used Agilent V6+UTR, 150 PE

ADD REPLY
0
Entering edit mode

Your mapped and on-target percentages look good. What people will consider "good" for these statistics really depends on your experiment so there isn't a hard and fast cut-off value to use. But those numbers look fine.

ADD REPLY
0
Entering edit mode

Thanks, Dan, I also would like to ask the permissible duplication allowed? I am Having ~8% duplication of reads in my samples!! I guess this should be just fine. Any valuable comments??

ADD REPLY
0
Entering edit mode

The duplication rate also depends heavily on your experiment but since you are doing exon capture 8% seems reasonable. The metrics you have posted all seem reasonable to me and in line with what I normally expect.

ADD REPLY

Login before adding your answer.

Traffic: 1718 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6