So I am working on rna seq data. I am using gsnap for my analysis(though I have also used tophat). With tophat , a junctions.bed file is produced which actually tells us about the splicing patterns going on.
Using gsnap I am able to produce sam/bam files. Is there a way I can convert these sam/bam files into something similar to junctions.bed files.
My main goal is to see what is the difference between junctions produced from tophat and gsnap. Tophat produces junctions.bed file but gsnap doesnt. So is there a way I can get a junctions file from sam/bam output of gsnap.
Hope to hear from you guys soon
Are you comfortable writing perl scripts?
yes i am comfortable
great, then it is relatively straightforward to obtain the junctions given a SAM/BAM file. All you need is the start position of a read and its CIGAR. Suppose a read starts at 10000 and cigar is 15M20N75M, then the read is present from 10000-10014 and 10035-10109. So, the read's junction coordinates are 10015-10034. It should be a bit more involved to take into account the effect of Insertions and deletions. But this is the basic principle. Let me know if you run into problems.
What if the CIGAR string is something like this 1S14M20N75M