How can I count the number of reads supporting a specific splice junction (e.g. chr1:15000-20000 where 15000 is the splice donor and 20000 is the splice acceptore position) in a RNA-Seq bam file aligned with STAR ?
Doesn't STAR's .junc file provide that count?
Thanks Devon I forget about the SJ.out.tab file. However if someone do not have this file and only have the bam file. Any idea?
Ah, I'd forgotten the exact name, thanks.
If you don't have that then you'd have to follow Irsan's suggestion. Of course, this would likely mean needing to code something since I don't personally know anything that would do it. In theory, that should be possible with a bit of pysam to get the splice junctions. The simplest method would then be to write a script to output only splice junctions and then to pipe that to sort and then uniq -c followed by awk to reformat. The performance might not be optimal, but for an ad hoc solution I figure that'd work.
In the *.Log.final.out the number of introns/splices are recorded but I am not sure if this this is the number of spliced reads. I have checked the manual but couldn't find an answer. A read can be spliced multiple times of course.
Star uses N's in CIGAR strings for reads that were spliced (over introns). You can use that to count total number of spliced reads.
If you want to do this for 1 specific regions I think its easiest to make a sashimi plot with IGV.
What is the possible way, to count the spliced reads (over introns). I want to count it for all the introns. Is there any method to extract that information from IGV? I can see in sashmi plot, it shows the spliced read count for each intron.
Login before adding your answer.
Use of this site constitutes acceptance of our User Agreement and Privacy