For visualization purpose (like in a genome browser), I would like to get the per-base genome-wide coverage of paired-end RNA-seq data. I thought this would be easy, turns out it is not. Here is a picture showing what I want to do :
And here is where I came so far :
- I start with mapped paired-end reads (accepted_hits.bam)
- Then I split my file into two strand specific files according to this tutorial.
- Then I use bedtools bamtobed with the -bedpe option to get a bed file that summarizes my paired reads. From there I can easily retrieve the start and end positions of the fragments and calculate coverage (see code below). Hurray. However I loose information on the introns...
And it is where I'm stuck. I thought of using bedtools bamtobed with the -split option along with -bedpe, but those two options seems not compatible. A possible solution would be to extract the halfs of the reads spanning introns that are not linked to the mate pair, save them as singleton and add them later to the coverage, but I have no idea how to do that. Any suggestion, even for a completely different method ? Thank you !
samtools sort -no <(samtools view -b -f 2 fwd.bam) temp | bamToBed -i stdin -bedpe | cut -f 1,2,6 | sort -k 1,1 > fwd_merged_mates.bed bedtools genomecov -bga -i fwd_merged_mates.bed -g chrom_summary > WT1_fwd.bg