Question

Targeted Sequencing - Calculating The Minimum Exon Coverage


11.4 years ago

Luca Beltrame ▴ 240

I recently received a batch of data from a targeted sequencing experiment (50 genes, human). The sequencing targeted all exons of these 50 genes. It was suggested that I calculate the minimum coverage for each exon, in case the read coverage is not uniform.

As I'm still inexperienced, I was wondering how to calculate this. My idea would be, given that I have a BED file with the sequenced regions:

1. Calculate per-base coverage for each BAM file using bedtools coverage and the BED file for the regions;
2. Group the result by exon coordinates, and take the minimum value for each exon.

Is there a better way, or is my approach totally off? Thanks!
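The two steps above can be sketched in a few lines of Python. This is only an illustration, assuming the per-base output comes from something like bedtools coverage with the -d option on a 3-column BED file, so that each tab-separated row holds the region (chrom, start, end), the position within the region, and the depth in the last column. The function name and the sample rows are made up for the example.

```python
def min_coverage_per_region(lines):
    """Return {(chrom, start, end): minimum depth} from per-base rows.

    Assumes tab-separated rows whose first three fields are the BED
    region and whose last field is the depth at one base of that region.
    """
    minima = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        key = (fields[0], int(fields[1]), int(fields[2]))
        depth = int(fields[-1])
        # Keep the smallest depth seen so far for this region.
        if key not in minima or depth < minima[key]:
            minima[key] = depth
    return minima


# Hypothetical per-base rows for two small regions (not real data).
rows = [
    "1\t100\t103\t1\t25",
    "1\t100\t103\t2\t12",
    "1\t100\t103\t3\t30",
    "2\t500\t502\t1\t0",
    "2\t500\t502\t2\t8",
]

for region, depth in sorted(min_coverage_per_region(rows).items()):
    print(region, depth)
```

For a real file, the same function can be fed an open file handle instead of the list, since it only iterates over lines.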

sequencing coverage exon • 5.5k views

ADD COMMENT • link updated 11.4 years ago by Sean Davis 26k • written 11.4 years ago by Luca Beltrame ▴ 240

score 2 · Answer 1 · 2012-11-14

Minimum exon coverage, though, is probably not the most actionable number to work with. What you really want to calculate is the number of "callable bases". You can do this per gene, per exon, or per sample. One simple but approximate way to look at this is the percentage of bases covered to at least a depth X (e.g., 82% of target bases covered at 30x) after removing duplicates and low-quality bases/reads. Picard CalculateHsMetrics is useful for this kind of thing, but you can also write your own.
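As a rough sketch of the "write your own" route: given the per-base depths over the target (for one exon, one gene, or the whole sample), the fraction of bases at or above a threshold is a one-line computation. The function name and the depths below are hypothetical, and the sketch assumes duplicates and low-quality reads were already filtered out before the depths were computed. It uses "at least X" (>=); change the comparison if you prefer strictly greater.

```python
def fraction_covered(depths, threshold):
    """Fraction of bases whose depth is at least `threshold`.

    `depths` is any iterable of per-base depths over the target bases.
    Returns 0.0 for an empty target rather than dividing by zero.
    """
    depths = list(depths)
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= threshold)
    return covered / len(depths)


# Hypothetical per-base depths for a 10-base target (not real data).
example_depths = [35, 40, 12, 30, 0, 31, 29, 50, 30, 8]

pct = 100 * fraction_covered(example_depths, 30)
print(f"{pct:.0f}% of target bases covered at 30x")
```

Running the same function per exon or per gene only requires grouping the depths by region first.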

score 1 · Answer 2 · 2012-11-14


11.4 years ago

Zev.Kronenberg 12k

It is definitely smart to take a look at the coverage / depth of your exons.

I would suggest checking out BEDTools. It has methods to answer these questions, e.g., given my BAM, what is my % coverage?

ADD COMMENT • link 11.4 years ago by Zev.Kronenberg 12k


In fact I'm using bedtools coverage to get the fraction of exons covered. I was wondering if there was anything else I could do.

ADD REPLY • link 11.4 years ago by Luca Beltrame ▴ 240

score 0 · Answer 3 · 2012-11-14


11.4 years ago

Leszek 4.2k

I came across TEQC, a Bioconductor package for dealing with enrichment sequencing.
You can easily plot coverage distribution histograms (see the manual).
Give it a try!

ADD COMMENT • link 11.4 years ago by Leszek 4.2k


Unfortunately, like many Bioconductor packages, it fails with a completely nondescript error when I use it, and it looks like it doesn't support the genome I aligned to (the 1000 Genomes reference), which uses chromosome names without the "chr" prefix.

ADD REPLY • link 11.4 years ago by Luca Beltrame ▴ 240


I can never understand why people write code where seqid must begin with chr...

ADD REPLY • link 11.4 years ago by Zev.Kronenberg 12k


I fixed it in the end: the BAM handling is broken, but it works if I convert the BAM to a BED file.

ADD REPLY • link 11.4 years ago by Luca Beltrame ▴ 240


Hi Leszek,

I have used TEQC, but I am getting errors. I posted the details here: C: reading a bed file in R

Thanks!

ADD REPLY • link 7.0 years ago by bioinfo8 ▴ 230