Why are there symmetric splice junctions on IGV?
4 months ago
Apex92 ▴ 200

Hi, I have the same problem as described in the earlier post "Interpreting splice junctions on IGV". Could anyone please elaborate on this? The IGV guide says: When available, IGV uses the "XS" tag provided by the alignment to determine strandedness. If missing, strandedness is inferred from the read strand.

My reads are single-end and stranded (single-cell data).

Any help is highly appreciated.

Thank you.

sequencing bam IGV alignment rna-seq • 608 views

what are the stranded counts (see here) for any particular gene?


I have a merged BAM file that I converted to SAM; the file has 10145130 lines (reads). Using the method below (from the link you shared), I get 5071949 reads in forwardStrandReads.sam and 5072949 reads in reverseStrandReads.sam, which almost adds up to the total number of reads in the main SAM file. Given that, do you think the reads are unstranded? I also checked the featureCounts command I used for counting: I had used -s 0 (meaning unstranded), and the "Successfully assigned alignments" percentage is 95%.

samtools view Reads.bam | gawk '(and(16, $2))' > forwardStrandReads.sam
samtools view Reads.bam | gawk '(! and(16,$2))' > reverseStrandReads.sam


you need to look at one gene - any gene - to see if it's stranded or unstranded
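A minimal sketch of that per-gene check, assuming the BAM is coordinate-sorted and indexed. The helper name `count_strands` and the region in the usage line are placeholders, not something from the linked post. It uses portable awk arithmetic rather than gawk's `and()` to test FLAG bit 16.

```shell
# Hypothetical helper: count forward- and reverse-strand alignments in SAM
# records read from stdin (header lines starting with @ are skipped).
count_strands() {
    awk -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            # FLAG bit 0x10 (16) set means the read aligned to the reverse strand
            if (int($2 / 16) % 2 == 1) rev++
            else fwd++
        }
        END { printf "forward=%d reverse=%d\n", fwd, rev }
    '
}

# Usage (the region is a placeholder for the coordinates of one gene):
# samtools view Reads.bam chr1:1000000-1010000 | count_strands
```

For a stranded library, reads over a single gene should fall almost entirely on one strand; a roughly even split within one gene points to unstranded data. Whole-file totals cannot show this, because genes sit on both strands.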


Thank you Jeremy for giving input - I looked at the bam files, my reads are unstranded (I thought they were stranded).


The XS tag is added by the aligner. If it is missing, IGV infers the strand from the read strand, and you probably have about a 50-50 split of positive- and negative-strand reads spanning a particular splice site, which makes the junction appear symmetric. The aligner sets the XS tag by looking at the genome where the read aligns: there is a splice site here and, regardless of the strand of the read itself, if the canonical splice-site dinucleotides (GT/AG) are on the + strand, the aligner marks the read as contributing to splicing of a gene on the + strand. The same logic applies to the negative strand.
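The motif logic above can be sketched as a small lookup. This is an illustration only, not any aligner's actual code: the function name `infer_junction_strand` is made up, and it handles only the canonical GT/AG motif, whereas real aligners also recognise rarer motifs. The two arguments are the first two and last two bases of the intron as read on the genome + strand.

```shell
# Illustrative only: map the intron-boundary dinucleotides, read on the
# genome + strand, to the transcript strand an aligner would report.
# Arguments: first two and last two bases of the intron (upper case).
infer_junction_strand() {
    case "$1/$2" in
        GT/AG) echo "+" ;;   # canonical motif as seen for a + strand gene
        CT/AC) echo "-" ;;   # reverse complement of GT/AG: a - strand gene
        *)     echo "?" ;;   # non-canonical or ambiguous: left undetermined here
    esac
}

infer_junction_strand GT AG   # prints +
infer_junction_strand CT AC   # prints -
```

The strand of the read itself never enters the decision, which is why the tag can assign a strand to a junction even when the library is unstranded.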


Note also that some aligners use the TS tag instead of XS. TS is the newer, official way of indicating strandedness, since XS was never a formalized tag. minimap2 also outputs a lower-case ts tag, which has a different meaning than TS (it is flipped), but that is just extra trivia.
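To see which of these tags a given BAM actually carries, something like the following could be used. It is a rough sketch: the helper name `tally_strand_tags` is hypothetical, and it assumes the tags appear as single-character values such as `XS:A:+` on spliced alignments (CIGAR containing `N`).

```shell
# Hypothetical helper: among spliced alignments (CIGAR contains N), tally
# which strand tag, if any, each record carries. Reads SAM from stdin.
tally_strand_tags() {
    awk -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        $6 !~ /N/ { next }                 # keep only reads spanning a junction
        {
            tag = "none"
            # optional tags start at field 12; stop at the first strand tag
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^(XS|TS|ts):A:[+-]$/) { tag = substr($i, 1, 2); break }
            }
            n[tag]++
        }
        END { printf "XS=%d TS=%d ts=%d none=%d\n", n["XS"], n["TS"], n["ts"], n["none"] }
    '
}

# Usage:
# samtools view Reads.bam | tally_strand_tags
```

If nearly every spliced read lands in `none`, the aligner did not write a strand tag, which matches the case where IGV falls back to the read strand.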


Also note: you say your reads are "stranded", but this may not be the case. Stranded protocols would probably not appear symmetric like this, because the reads would truly indicate the strand of the transcript being sequenced.