Question: STAR using splice junctions from SJ.out to find flanking exons (GRCh37)
stefanos.bamopoulos wrote, 3.2 years ago:

Hello guys,

I have a question I am struggling with, for which I am sure there is a sensible explanation. I have run an RNA-seq analysis with STAR using the GRCh37 reference genome. I used the SJ.out file for some downstream analysis and have now detected some splice junctions that interest me. Using the start and end positions provided in the SJ.out file, I tried to find the upstream and downstream exons flanking my splice junctions. For this purpose I parsed the Homo_sapiens.GRCh37.75.gtf file.

To my surprise I have some splice junctions that don't seem to have an upstream or downstream exon flanking them. To find the flanking exons I looked for upstream exons that end one base before the start of the splice junction and for downstream exons that begin one base after the end of the splice junction. For most (~85%) of the splice junctions I could find at least one upstream and downstream exon, but others don't have any exon in their close proximity, or have an upstream exon, but no downstream exon.
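For anyone who wants to reproduce the lookup described above, here is a minimal sketch of it in Python. It assumes that the SJ.out start and end are the 1-based first and last bases of the intron and that GTF exon coordinates are 1-based and inclusive. The function names `load_exon_boundaries` and `flanking_status` are made up for this illustration, and "upstream"/"downstream" refer to genomic coordinates, not to transcript orientation.

```python
from collections import defaultdict


def load_exon_boundaries(gtf_lines):
    """Collect exon start and end coordinates per chromosome from GTF lines."""
    exon_starts = defaultdict(set)
    exon_ends = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        exon_starts[chrom].add(start)
        exon_ends[chrom].add(end)
    return exon_starts, exon_ends


def flanking_status(chrom, sj_start, sj_end, exon_starts, exon_ends):
    """Return (has_upstream_exon, has_downstream_exon) for one junction.

    An upstream exon must end exactly one base before the junction start,
    and a downstream exon must begin exactly one base after the junction end.
    """
    has_upstream = (sj_start - 1) in exon_ends.get(chrom, set())
    has_downstream = (sj_end + 1) in exon_starts.get(chrom, set())
    return has_upstream, has_downstream
```

With the GTF opened as a file object, `load_exon_boundaries(handle)` can be called once, and `flanking_status("3", 52027879, 52028055, exon_starts, exon_ends)` would then report the status of SJ1.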

Below I provide some examples of splice junctions that don't have flanking exons:

SJ1 (no upstream or downstream exon):

Chromosome: 3, Start: 52027879, End: 52028055

SJ2: (no upstream exon, but has a downstream exon):

Chromosome: 11, Start: 61204812, End: 61205096

SJ3: (no downstream exon, but has an upstream exon):

Chromosome: 2, Start: 44121769, End: 44122506

I cannot find a reasonable explanation for this, as STAR takes its splice junction annotation from the same GTF file that I am parsing. My understanding is that the start of the splice junction is the first base of an intron and the end of the splice junction is the last base of that intron, so an exon should lie directly before and directly after it. Interestingly, only SJ1 in the examples above lacks both an upstream and a downstream exon. The others are flanked by an exon on at least one side.

IMPORTANT NOTE: All splice junctions are reported as annotated in the SJ.out file, meaning there are no de novo splice junctions.

I would greatly appreciate it, if someone could point out a logical error or provide a biological explanation for this problem. If you need any additional information, I will be happy to provide it.

Thank you!

Tags: rna-seq, star, grch37, sj.out, gtf
Eric Lim (Stoke Therapeutics, Inc) wrote, 3.0 years ago:

What you're observing is called alternative splicing.

SJ1. Splicing within an annotated exon, especially in the UTRs, is fairly common. Some of these events may introduce premature stop codons which lead to the degradation of the transcripts via nonsense mediated decay. Plenty of genes are regulated this way.

SJ2. The use of an alternative 5' splice site (5'ss). Since that would include an unusually long exon, the more likely event is probably an exon inclusion, for which you should observe another alternative splicing upstream of SJ2. In order to piece together splicing events from short reads, sometimes you need to look at more than one splice junction.

I didn't look at SJ3, but I assume similar logic would apply.
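To check whether a junction like SJ1 is a case of splicing within an annotated exon, something along these lines could be used. It is only a sketch: `exons_containing_junction` is a hypothetical helper, and it assumes the exons have already been read from the GTF into `(chrom, start, end)` tuples with 1-based inclusive coordinates.

```python
def exons_containing_junction(chrom, sj_start, sj_end, exons):
    """Return the annotated exons that fully contain the junction.

    exons: iterable of (chrom, start, end) tuples, 1-based inclusive.
    An exon contains the junction when exonic sequence remains on both
    sides of the spliced-out interval.
    """
    return [
        (c, start, end)
        for (c, start, end) in exons
        if c == chrom and start < sj_start and sj_end < end
    ]
```

An empty result means no single annotated exon spans the whole junction, in which case an alternative splice site or a missing annotation is the more likely explanation.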

Generally speaking, you would need an arbitrary but sensible threshold to remove low-abundance splice junctions. Unless you have specific reasons to look beyond UTRs, I'd also suggest removing splice junctions with both ends mapped outside of annotated UTRs in your reference of choice.
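As a rough illustration of the abundance filter, the sketch below keeps only junctions supported by a minimum number of uniquely mapped reads. It relies on the STAR SJ.out.tab column layout (column 6 is the annotated flag, column 7 the uniquely mapping read count, column 8 the multi-mapping read count). The default cutoff of 5 is a placeholder assumption, not a recommendation, and the function names are invented for this example.

```python
def parse_sj_line(line):
    """Parse one tab-separated SJ.out.tab line into a dict."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),         # first base of the intron (1-based)
        "end": int(fields[2]),           # last base of the intron (1-based)
        "strand": int(fields[3]),        # 0 = undefined, 1 = +, 2 = -
        "annotated": fields[5] == "1",   # STAR's annotated flag
        "unique_reads": int(fields[6]),  # uniquely mapping reads across the junction
        "multi_reads": int(fields[7]),   # multi-mapping reads across the junction
    }


def filter_junctions(sj_lines, min_unique_reads=5):
    """Keep junctions with at least min_unique_reads uniquely mapped reads."""
    kept = []
    for line in sj_lines:
        if not line.strip():
            continue  # ignore blank lines
        sj = parse_sj_line(line)
        if sj["unique_reads"] >= min_unique_reads:
            kept.append(sj)
    return kept
```

The right cutoff depends on sequencing depth and on how many samples support a junction, so it is worth trying a few values.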

Since I don't fully understand the intricacies of STAR, I'll let others answer some of your STAR-related questions.

MirkoR (Italy) wrote, 8 months ago:

Quoting the question: "IMPORTANT NOTE: All splice junctions are reported as annotated in the SJ.out file, meaning there are no de novo splice junctions."

Concerning this, did you run STAR in 2-pass mode? If you looked at the SJ.out.tab output of the second pass, this would be expected: the novel junctions found in the first pass are reported as annotated in the output of the second pass. At least, that is my understanding from the STAR manual.
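One way to check this would be to derive the introns implied by the GTF itself and compare them with the junctions that STAR flags as annotated. The sketch below does that under the assumption that each exon line carries a `transcript_id "..."` attribute, as Ensembl GTF files do. `annotated_introns` is a name chosen for this example.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def annotated_introns(gtf_lines):
    """Return a set of (chrom, intron_start, intron_end) implied by the GTF.

    Introns are the gaps between consecutive exons of the same transcript,
    in 1-based inclusive coordinates, matching the SJ.out.tab convention.
    """
    exons_by_transcript = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        key = (fields[0], match.group(1))
        exons_by_transcript[key].append((int(fields[3]), int(fields[4])))

    introns = set()
    for (chrom, _transcript), exons in exons_by_transcript.items():
        exons.sort()  # genomic order, regardless of strand
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            introns.add((chrom, prev_end + 1, next_start - 1))
    return introns
```

A junction that is flagged as annotated in SJ.out.tab but whose `(chrom, start, end)` is missing from this set did not come from the GTF, which would be consistent with it having been inserted from the first pass.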
