Question: Use Bedtools to extract stranded read depth from BAM files
2
gravatar for gtho123
14 months ago by
gtho123170
New Zealand
gtho123170 wrote:

I am trying to use Bedtools to read-depth counts to generate plots of RNA-Seq read coverage like one sees in genome browsers. I am running into difficulty doing this in a strand specific manner. I know my data is strand-specific and I am informed that the alignment was done using Tophat2 specifying that the library is stranded.

I found that I could do a lot using the coveregeBed function to get most of the way there like so:

coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d > coverage.txt

Because BAM files are quite large I subseted the Bedfile to focus on one gene I am familiar with and was able to produce a plausible graph where the read peaks all matched annotated exons. I then sought to split these counts by their strands by adding the -s and -S flags in sequential coverageBed calls as suggested in the docs like so:

coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d -s > coverage_F.txt
coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d -S > coverage_R.txt

However when I plotted these they were exactly the same (albeit slightly different in shape from the total coverage) which is implausible as I would have thought that any anti-sense transcription should to be lower than the sense expression and not have exactly the same pattern. These are the graphs:

Read_depth_plots

I then tried a different approach where I first interected my subsetted BED file with the BAM file and the called genomeCoverageBed like so:

genomeCoverageBed -ibam SUBSETED.bam -d -split -strand + > coverage_f_geocov.txt 
genomeCoverageBed -ibam SUBSETED.bam -d -splitenter code here -strand - > coverage_r_geocov.txt

This way I got the same result. Crucially the two strand specific read depth files, while identical, do not add up to the total number of reads when I plot it non strand specifically. If I add the forward and reverse strands together and plot them I get this when:

Overlap plot

What is going on and how can I get strand specific read-depth counts?

Thanks in advance

rna-seq alignment bedtools • 886 views
ADD COMMENTlink modified 14 months ago by michael.ante1.8k • written 14 months ago by gtho123170
0
gravatar for michael.ante
14 months ago by
michael.ante1.8k
Austria/Vienna
michael.ante1.8k wrote:

Hi,

First I'd check some alignment statistics with RSeQC to find out the strand specificity (infer_experiment.py), the alignment itself (bam_stat.py) and here focus on the proper-paired reads vs. the singletons, and the insert size (inner_distance.py).

After checking, I'd use for the coverageBed either only proper-paired reads or only Read1.

Cheers,

Michael

ADD COMMENTlink written 14 months ago by michael.ante1.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1104 users visited in the last hour