Use Bedtools to extract stranded read depth from BAM files
1
3
Entering edit mode
7.6 years ago
gtho123 ▴ 260

I am trying to use Bedtools to read-depth counts to generate plots of RNA-Seq read coverage like one sees in genome browsers. I am running into difficulty doing this in a strand specific manner. I know my data is strand-specific and I am informed that the alignment was done using Tophat2 specifying that the library is stranded.

I found that I could do a lot using the coveregeBed function to get most of the way there like so:

coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d > coverage.txt

Because BAM files are quite large I subseted the Bedfile to focus on one gene I am familiar with and was able to produce a plausible graph where the read peaks all matched annotated exons. I then sought to split these counts by their strands by adding the -s and -S flags in sequential coverageBed calls as suggested in the docs like so:

coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d -s > coverage_F.txt
coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d -S > coverage_R.txt

However when I plotted these they were exactly the same (albeit slightly different in shape from the total coverage) which is implausible as I would have thought that any anti-sense transcription should to be lower than the sense expression and not have exactly the same pattern. These are the graphs:

Read_depth_plots

I then tried a different approach where I first interected my subsetted BED file with the BAM file and the called genomeCoverageBed like so:

genomeCoverageBed -ibam SUBSETED.bam -d -split -strand + > coverage_f_geocov.txt 
genomeCoverageBed -ibam SUBSETED.bam -d -splitenter code here -strand - > coverage_r_geocov.txt

This way I got the same result. Crucially the two strand specific read depth files, while identical, do not add up to the total number of reads when I plot it non strand specifically. If I add the forward and reverse strands together and plot them I get this when:

Overlap plot

What is going on and how can I get strand specific read-depth counts?

Thanks in advance

RNA-Seq Bedtools alignment • 3.7k views
ADD COMMENT
0
Entering edit mode
7.6 years ago
michael.ante ★ 3.8k

Hi,

First I'd check some alignment statistics with RSeQC to find out the strand specificity (infer_experiment.py), the alignment itself (bam_stat.py) and here focus on the proper-paired reads vs. the singletons, and the insert size (inner_distance.py).

After checking, I'd use for the coverageBed either only proper-paired reads or only Read1.

Cheers,

Michael

ADD COMMENT

Login before adding your answer.

Traffic: 2398 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6