RNA-Seq junction reads overhang imbalance
3.4 years ago
pavenhuizen ▴ 90

I'm working with human RNA-Seq data and I'm observing some, in my eyes, weird behavior for certain splice junctions. For context, the data consists of 50nt single-end reads and was mapped using STAR against the human genome with GENCODE V34 annotation.

For some junctions, I'm seeing some "weird" read splitting behavior, where the majority of the split-reads have a short overhang on one side, and on one side only. For example, the mean overhang length on the left side is 5nt and 45nt on the right side. Assuming that reads are more or less randomly distributed across a gene, what could be the reason that for these junctions the split reads all start and split at the same position? Might this be an artifact? Again, assuming a uniform read distribution, I would expect the overhang distribution for a junction to be roughly equal on both sides.
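To make that expectation concrete, here is a minimal sketch of the mean overhang under the uniform assumption. The function name and the 3 nt minimum overhang are illustrative assumptions, not values taken from the actual run.

```python
def expected_overhangs(read_len=50, min_overhang=3):
    """Mean left/right overhang if split positions are uniformly distributed.

    Assumes (for illustration only) that every split from `min_overhang`
    to `read_len - min_overhang` bases on the left side is equally likely.
    """
    lefts = range(min_overhang, read_len - min_overhang + 1)
    mean_left = sum(lefts) / len(lefts)
    return mean_left, read_len - mean_left


if __name__ == "__main__":
    left, right = expected_overhangs()
    print(f"expected mean overhang: left={left}, right={right}")
```

Under this model both sides should average about half the read length, which is why a 5 nt versus 45 nt split looks suspicious.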

Has anyone observed similar behaviour before, and could you tell me more about what I'm observing here?

Thanks in advance!

RNA-Seq STAR splice junctions • 1.7k views

Can you provide an IGV screenshot of this?


The IGV screenshots do not give as clear a picture as the one I get when examining the CIGAR strings of the split reads. Nonetheless, here are three screenshots that will hopefully help a bit. Below each screenshot I list the mean left and right overhang lengths, computed from the CIGAR strings.

chr1:9730526-9730621 - Left v right mean overhang: 42.56 v 5.39

chr1:22660893-22660946 - Left v right mean overhang: 43.93 v 4.07

chr1:35192686-35192765 - Left v right mean overhang: 4.40 v 43.35
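For anyone who wants to reproduce numbers like these, here is a minimal sketch of how per-junction mean overhangs can be derived from the SAM POS and CIGAR fields. The function names are made up for illustration, and the definition of "overhang" (all aligned read bases on each side of the `N` operation, soft clips excluded) is an assumption that may differ from how STAR counts it internally.

```python
import re
from statistics import mean

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED_READ_OPS = {"M", "I", "=", "X"}   # aligned read bases (soft clips excluded)
REF_OPS = {"M", "D", "N", "=", "X"}       # operations that advance the reference


def junction_overhangs(pos, cigar):
    """Yield (intron_start, intron_end, left, right) for each N in the CIGAR.

    `pos` is the 1-based SAM POS; intron coordinates are 1-based inclusive.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    total = sum(n for n, op in ops if op in ALIGNED_READ_OPS)
    ref, left = pos, 0
    for n, op in ops:
        if op == "N":
            yield ref, ref + n - 1, left, total - left
        if op in ALIGNED_READ_OPS:
            left += n
        if op in REF_OPS:
            ref += n


def mean_overhangs(alignments, intron):
    """Mean (left, right) overhang over all reads spanning `intron`.

    `alignments` is an iterable of (pos, cigar); `intron` is (start, end).
    Returns None if no read spans the intron.
    """
    pairs = [(l, r)
             for pos, cigar in alignments
             for s, e, l, r in junction_overhangs(pos, cigar)
             if (s, e) == intron]
    if not pairs:
        return None
    return mean(l for l, _ in pairs), mean(r for _, r in pairs)


if __name__ == "__main__":
    # Toy alignments, not real data.
    reads = [(100, "5M95N45M"), (102, "3M95N47M")]
    print(mean_overhangs(reads, (105, 199)))
```

Plotting the full distribution of left overhangs per junction, rather than only the mean, makes it easier to see whether the reads pile up at a single split position.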


Thanks for uploading this. I must say I am stumped. It does look like the overhang alignments are stopping at a particular base-pair position, as though there is an N in the reference that is causing a soft clip.

3.4 years ago

Are the reads stranded or not? Are you running with default mapping parameters?

There are a couple of things you might be observing here.

  1. It could be that a minimal amount of the read needs to map before the junction, to anchor the read on that side, before STAR looks for an alignment on the other side. If the read is split 50:50 across the junction, then 25nt might not be enough for the anchor side. I don't quite see why this would cause the left/right imbalance though - I do wonder if the sequencing might be partly to blame? For example, if it maps from right to left (which might be the case with second-strand stranded sequencing?), you would need at least a 45nt anchor before it looks for a junction.

  2. The alternative is that it is indeed an artifact - the reads come from somewhere else but happen to map up to the junction at this location, and then STAR goes looking for somewhere to put the remaining part. 5nt is short enough that it can find a place for the remainder by chance, but with more than 5nt it can't, so such a read would end up either unmapped or clipped.

Both of these are just guesses; it's not behavior I've seen myself, but I've not gone looking for it either.


The reads are not stranded and mostly default parameters were used. By default STAR requires a minimum overhang of 3 nt for a splice junction (if I'm not mistaken), so no need for a 45nt anchor, I would think. I unfortunately do not have any additional information about the sequencing strategy, so I cannot comment on that.

Do you have any suggestions on how to control/check for the second scenario you describe? Could I clip the small overhangs and re-align the longer parts to see if they map to other/multiple locations?
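As a rough sketch of that idea: the snippet below takes the SEQ, QUAL and CIGAR fields of a split read, drops the short overhang, and returns the longer part so it can be written to a FASTQ file and re-aligned. The function names and the `max_short` threshold are hypothetical, only reads with exactly one junction are handled, and soft-clipped bases are counted with whichever side they sit on.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
READ_OPS = {"M", "I", "S", "=", "X"}  # operations that consume read bases


def long_side(seq, qual, cigar, max_short=8):
    """Return (seq, qual) of the longer side of a single-junction read.

    Returns None unless the read has exactly one N operation and its
    shorter side is at most `max_short` bases (an arbitrary cutoff).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if sum(op == "N" for _, op in ops) != 1:
        return None
    split = 0  # number of read bases located before the junction
    for n, op in ops:
        if op == "N":
            break
        if op in READ_OPS:
            split += n
    left, right = split, len(seq) - split
    if min(left, right) > max_short:
        return None
    if left >= right:
        return seq[:split], qual[:split]
    return seq[split:], qual[split:]


def fastq_record(name, seq, qual):
    """Format one FASTQ record as a string."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Toy read, not real data: 5 nt left of the junction, 45 nt right of it.
    trimmed = long_side("A" * 5 + "C" * 45, "I" * 50, "5M95N45M")
    if trimmed is not None:
        print(fastq_record("toy_read_trimmed", *trimmed), end="")
```

If the trimmed reads then align equally well to several other loci, or better to a different locus, that would point towards the artifact explanation.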


I would be tempted to post this (with the generating data and some specific pointers to these cases) to the STAR repository. Alex Dobin is very responsive, and at the very least, can likely give you a definitive answer as to why you're seeing this behavior.


I can't quite remember the details of the STAR algo, but yes, you only need 5nt overhang on the far side of the junction, but the other side of the junction must have enough sequence to uniquely map it.
