Question

RNA-Seq: How To Check For Very High Coverage Regions

2

12.0 years ago

Nicolas Rosewick 11k

Hi,

How can I check for very high coverage regions in a BAM file? I have several samples with a good alignment rate (~90%), but when I count the reads for known genes (Ensembl), the results are poor compared to previous data. I suspect rRNA or other contamination. So how can I find high-coverage regions in my data (more than 10000 reads per position)?

Thanks a lot,

N.

rna-seq coverage • 3.5k views

updated 12.0 years ago by Ian 6.1k • written 12.0 years ago by Nicolas Rosewick 11k

0

Various relevant methods are discussed in this thread: "What is the fastest method to determine the number of positions in a BAM file with >N coverage?"

12.0 years ago by Malachi Griffith 20k

score 1 · Answer 1 · 2013-06-25

Hi,

I was going to suggest 'samtools depth', but it doesn't appear in the manual anymore; it might still be functioning, though (it works on my cluster). Essentially, depth gives the number of reads covering each position. You give it a BAM file and, optionally, a region, and get the per-base depth; a reference.fa should not be needed for BAM input. Then awk out the positions with a depth higher than 10000.
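A minimal sketch of the filtering step, written in Python rather than awk. It assumes the usual three-column, tab-separated 'samtools depth' output (chromosome, 1-based position, depth) has been saved to a text file or is piped in line by line. The function name `high_depth_regions` and the `min_depth` parameter are made up for this illustration. Depending on the samtools version, the reported depth may be capped by default, so check the `-d` option in your version's manual before looking for depths above 10000.

```python
def high_depth_regions(lines, min_depth=10000):
    """Yield (chrom, start, end, max_depth) for runs of adjacent positions
    whose depth is strictly greater than min_depth.

    `lines` is any iterable of 'chrom<TAB>pos<TAB>depth' strings, which is
    the assumed layout of `samtools depth` output.
    """
    current = None  # [chrom, start, end, max_depth] of the open run
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        chrom, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if depth <= min_depth:
            continue  # not a high-coverage position
        if current and current[0] == chrom and pos == current[2] + 1:
            # Extend the open run by one base.
            current[2] = pos
            current[3] = max(current[3], depth)
        else:
            # Close the previous run (if any) and start a new one.
            if current:
                yield tuple(current)
            current = [chrom, pos, pos, depth]
    if current:
        yield tuple(current)
```

Passing an open file handle of the depth output to `high_depth_regions` gives one tuple per contiguous high-coverage stretch. These can then be compared against rRNA or mitochondrial annotations.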

Bruce.

score 1 · Answer 2 · 2013-06-25

You could use the GATK DepthOfCoverage walker with a provided BED file. Usually these are per exon but any sort of intervals will do. I believe you can get per position coverage information from that tool as well. Otherwise for intervals you can set what sort of percentage/depth cutoffs you want to report.

score 1 · Answer 3 · 2013-06-25

You can generate an mpileup output with samtools, then use a script to scan through the file and get the top X bases with the highest read coverage. You can use this top X list to narrow down the regions with the highest coverage.

To get the count for each base, read the 4th column of the mpileup output, which is the number of reads covering that position. The read bases themselves are in the 5th column: a '.' (period) stands for a match to the reference on the forward strand and a ',' (comma) for a match on the reverse strand, while mismatches appear as letters. Counting only '.' and ',' would therefore miss the mismatching reads.
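A minimal sketch of the "scan for the top X bases" script, assuming standard tab-separated mpileup lines where column 1 is the chromosome, column 2 the 1-based position and column 4 the read count. The function name `top_covered_positions` and the `top_n` parameter are invented for this example. Note that mpileup limits the per-file depth by default (see `-d` in your version's manual), so that limit probably needs raising when hunting for positions above 10000.

```python
import heapq


def top_covered_positions(lines, top_n=10):
    """Return the top_n (depth, chrom, pos) tuples, highest depth first.

    `lines` is any iterable of mpileup text lines; the depth is taken from
    the 4th column.
    """
    def records():
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            yield int(fields[3]), fields[0], int(fields[1])

    # nlargest keeps only top_n records in memory while streaming the file.
    return heapq.nlargest(top_n, records())
```

Because the lines are streamed, this should work on a whole-genome mpileup file without loading it into memory.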

score 1 · Answer 4 · 2013-06-26

1

12.0 years ago

Ian 6.1k

I would use bedtools coverage:

Summary: Returns the depth and breadth of coverage of features from A on the intervals in B.

bedtools coverage -abam reads.bam -b interesting_intervals.bed > output

Other flags can help you get the output format you want.
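A minimal sketch for ranking the intervals afterwards, assuming the default output layout in which each line is the original interval followed by four extra columns (number of overlapping reads, bases covered, interval length, fraction covered). The function name `rank_intervals_by_reads` is made up for this illustration. Also note that, as far as I know, from bedtools 2.24 onward coverage is computed for the A file rather than the B file, so with newer versions the intervals go to `-a` and the BAM to `-b`.

```python
def rank_intervals_by_reads(lines, top_n=10):
    """Return the top_n (read_count, chrom, start, end) tuples, most reads first.

    `lines` is any iterable of tab-separated `bedtools coverage` output lines.
    The read count is assumed to be the 4th column from the end.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # need at least chrom, start, end plus 4 coverage columns
        read_count = int(fields[-4])
        rows.append((read_count, fields[0], int(fields[1]), int(fields[2])))
    rows.sort(reverse=True)
    return rows[:top_n]
```

Intervals with a huge read count relative to their length (for example rRNA genes or chrM) should float to the top of this list.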
