Targeted Sequencing - Calculating The Minimum Exon Coverage
3
1
Entering edit mode
8.4 years ago
Luca Beltrame ▴ 220

I recently received a batch of data from a targeted sequencing (50 genes, human) experiment. The sequencing was targeted on all exons of these 50 genes. I was suggested to calculate the minimum coverage for each exon, in case the coverage for the reads is not uniform.

As I'm still inexperienced, I was wondering how to calculate this. My idea would be, given that I have a BED file with the sequenced regions:

  1. Calculate per-base coverage for each BAM file using bedtools coverage and the BED file for the regions;
  2. Group the result by exon coordinate, and choose the minimum value for each coordinate.

Is there a better way, or is my approach totally off? Thanks!

sequencing coverage exon • 4.5k views
ADD COMMENT
2
Entering edit mode
8.4 years ago

Minimum exon coverage, though, is probably not the most actionable number to work with. What you really want to calculate is the number of "callable bases". You can do this per gene, per exon, or per sample. One simple but approximate way to look at this is the percentage of bases covered >X (ie., 82% of target bases covered at 30x) after removing duplicates and low quality bases/reads. Picard CalculateHsMetrics is useful for this kind of thing, but you can also write your own.

ADD COMMENT
1
Entering edit mode
8.4 years ago

It is defiantly smart to take a look at coverage / depth of your exons.

I would suggest checking out BEDTOOLS. It has methods to answer these questions. I.E. Given my bam what is my % coverage....

ADD COMMENT
0
Entering edit mode

In fact I'm using bedtools coverage to get the fraction of exons covered. I was wondering if there was anything else I could do.

ADD REPLY
0
Entering edit mode
8.4 years ago
Leszek 4.1k

I came across TEQC - Bioconductor package for dealing with enrichment sequencing.
Easily, you can plot coverage distribution histograms (manual).
Give it a try!

ADD COMMENT
0
Entering edit mode

Unfortunately, like many Bioconductor packages, it fails with a completely nondescript error when using it, and it looks it doesn't support my aligned genome (the 1000 genomes reference) which uses chromosomes without "chr" prefix.

ADD REPLY
0
Entering edit mode

I can never understand why people write code where seqid must begin with chr...

ADD REPLY
0
Entering edit mode

I fixed it in the end, the BAM handling is broken, it works if I convert the BAM to a BED file.

ADD REPLY
0
Entering edit mode

Hi Leszek,

I have used TEQC, but getting errors. I posted the details here: C: reading a bed file in R

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2134 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6