Question: STAR using splice junctions from SJ.out to find flanking exons (GRCh37)
18 months ago, stefanos.bamopoulos wrote:

Hello guys,

I have a question I am struggling with, for which I am sure there is a sensible explanation. I have run an RNA-seq analysis with STAR using the GRCh37 reference genome. I used the SJ.out file for some downstream analysis and have now detected some splice junctions that interest me. Using the start and end positions provided in the SJ.out file, I tried to find the upstream and downstream exons flanking my splice junctions. For this purpose I parsed the Homo_sapiens.GRCh37.75.gtf file.

To my surprise I have some splice junctions that don't seem to have an upstream or downstream exon flanking them. To find the flanking exons I looked for upstream exons that end one base before the start of the splice junction and for downstream exons that begin one base after the end of the splice junction. For most (~85%) of the splice junctions I could find at least one upstream and downstream exon, but others don't have any exon in their close proximity, or have an upstream exon, but no downstream exon.
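For reference, the matching rule described above can be sketched in Python roughly as follows. This is only a minimal sketch, not the code actually used here. It assumes that columns 2 and 3 of SJ.out.tab are the first and last base of the intron (1-based) and that GTF exon coordinates are 1-based and inclusive, so an upstream exon must end at start - 1 and a downstream exon must begin at end + 1. The function names and the toy GTF lines are hypothetical, and "upstream"/"downstream" are meant in genomic coordinates, not transcript orientation. Chromosome names must match between the two files (the Ensembl GTF uses "3", not "chr3").

```python
def exon_boundaries(gtf_lines):
    """Collect exon end and exon start coordinates per chromosome from GTF lines."""
    ends, starts = {}, {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        starts.setdefault(chrom, set()).add(start)
        ends.setdefault(chrom, set()).add(end)
    return ends, starts


def flanking_status(chrom, sj_start, sj_end, ends, starts):
    """Return (has_upstream_exon, has_downstream_exon) for one junction."""
    has_up = (sj_start - 1) in ends.get(chrom, set())
    has_down = (sj_end + 1) in starts.get(chrom, set())
    return has_up, has_down


# Toy annotation (made up for illustration): two exons, 100-200 and 301-400.
toy_gtf = [
    '1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1";',
    '1\ttoy\texon\t301\t400\t.\t+\t.\tgene_id "g1";',
]
ends, starts = exon_boundaries(toy_gtf)
print(flanking_status("1", 201, 300, ends, starts))  # (True, True)
print(flanking_status("1", 201, 350, ends, starts))  # (True, False)
```

With a real annotation the same functions would be fed the lines of the GTF file and one call per row of SJ.out.tab.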

Below I provide some examples of splice junctions that don't have flanking exons:

SJ1 (no upstream or downstream exon):

Chromosome: 3, Start: 52027879, End: 52028055

SJ2: (no upstream exon, but has a downstream exon):

Chromosome: 11, Start: 61204812, End: 61205096

SJ3: (no downstream exon, but has an upstream exon):

Chromosome: 2, Start: 44121769, End: 44122506

I cannot find a reasonable explanation for this, as the splice junction information is provided to STAR in the form of the GTF file that I am parsing. My understanding is that the start of the splice junction is the first base of an intron and the end of the splice junction is the last base of that intron, so an exon should be found immediately before and immediately after. Interestingly, only SJ1 in the examples above has neither an upstream nor a downstream exon. The others are flanked by an exon on at least one side.

IMPORTANT NOTE: All splice junctions are reported as annotated in the SJ.out file, meaning there are no de novo splice junctions.

I would greatly appreciate it if someone could point out a logical error or provide a biological explanation for this problem. If you need any additional information, I will be happy to provide it.

Thank you!

Tags: rna-seq, star, grch37, sj.out, gtf
16 months ago, Eric Lim wrote:

What you're observing is called alternative splicing.

SJ1. Splicing within an annotated exon, especially in the UTRs, is fairly common. Some of these events may introduce premature stop codons, which lead to degradation of the transcripts via nonsense-mediated decay. Plenty of genes are regulated this way.

SJ2. The use of an alternative 5' splice site (5'ss). Since that would imply an unusually long exon, the more likely event is probably an exon inclusion, in which case you should observe another alternative splice junction upstream of SJ2. To piece together splicing events from short reads, you sometimes need to look at more than one splice junction.

I didn't look at SJ3, but I assume similar logic would apply.
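To tell these cases apart programmatically, one option is to check whether the exonic base next to each junction end sits exactly on an annotated exon boundary, falls inside an annotated exon (as with SJ1), or hits no exon at all. The following is a rough sketch under the assumption that exons are available as (start, end) tuples in 1-based inclusive coordinates for the relevant chromosome. The function name and labels are made up for illustration.

```python
def classify_junction_end(flank_pos, side, exon_list):
    """Classify one junction end against annotated exons.

    flank_pos: the exonic base adjacent to the intron
               (sj_start - 1 for the left end, sj_end + 1 for the right end).
    side:      "left" or "right" end of the junction in genomic coordinates.
    exon_list: iterable of (start, end) tuples, 1-based inclusive.
    """
    label = "no_exon"
    for start, end in exon_list:
        # The left end should meet an exon end; the right end an exon start.
        boundary = end if side == "left" else start
        if flank_pos == boundary:
            return "exon_boundary"
        if start <= flank_pos <= end:
            label = "inside_exon"  # keep looking in case another exon matches exactly
    return label


# Toy example: a junction 201-300 lying entirely within one long exon 100-400.
long_exon = [(100, 400)]
print(classify_junction_end(200, "left", long_exon))   # inside_exon
print(classify_junction_end(301, "right", long_exon))  # inside_exon
```

A junction whose two ends both come back as "inside_exon" for the same exon would be a candidate for splicing within an annotated exon, while one "exon_boundary" end plus one "inside_exon" end would point toward an alternative splice site.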

Generally speaking, you would need an arbitrary but sensible threshold to remove low-abundance splice junctions. Unless you have specific reasons to look beyond UTRs, I'd also suggest removing splice junctions with both ends mapped outside of annotated UTRs in your reference of choice.
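As an illustration of the abundance threshold, here is a minimal Python sketch that keeps only junctions supported by a minimum number of uniquely mapping reads. It assumes the standard SJ.out.tab layout, in which column 7 holds the number of uniquely mapping reads crossing the junction. The cutoff of 5 and the function name are arbitrary placeholders, not a recommendation, and the toy rows are made up.

```python
def filter_junctions(sj_lines, min_unique_reads=5):
    """Keep junctions with at least min_unique_reads uniquely mapping reads."""
    kept = []
    for line in sj_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or empty lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        unique_reads = int(fields[6])  # column 7: uniquely mapping reads
        if unique_reads >= min_unique_reads:
            kept.append((chrom, start, end, unique_reads))
    return kept


# Toy rows (chrom, start, end, strand, motif, annotated, unique, multi, overhang).
toy_sj = [
    "1\t201\t300\t1\t1\t1\t12\t0\t40",
    "1\t201\t350\t1\t1\t1\t2\t3\t15",
]
print(filter_junctions(toy_sj))  # [('1', 201, 300, 12)]
```

The right cutoff depends on sequencing depth and on how many samples support a junction, so it is worth checking how the number of retained junctions changes across a few values.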

Since I don't fully understand the intricacies of STAR, I'll let others answer some of your STAR-related questions.


