Question: RNA-Seq junction reads overhang imbalance
12 weeks ago by
pavenhuizen90
pavenhuizen90 wrote:

I'm working with human RNA-Seq data and I'm observing some, in my eyes, weird behavior for certain splice junctions. For context, the data consists of 50nt single-end reads and was mapped using STAR against the human genome with GENCODE V34 annotation.

For some junctions, I'm seeing some "weird" read splitting behavior, where the majority of the split-reads have a short overhang on one side, and on one side only. For example, the mean overhang length on the left side is 5nt and 45nt on the right side. Assuming that reads are more or less randomly distributed across a gene, what could be the reason that for these junctions the split reads all start and split at the same position? Might this be an artifact? Again, assuming a uniform read distribution, I would expect the overhang distribution for a junction to be roughly equal on both sides.
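To make that expectation concrete, here is a minimal sketch of the uniform model. It assumes that every split position leaving at least `min_overhang` nt on each side of the junction is equally likely; the function name and its parameters are illustrative and not part of STAR.

```python
def expected_overhangs(read_len=50, min_overhang=1):
    """Mean (left, right) overhang if split positions are uniform.

    Assumption: every split position that leaves at least `min_overhang`
    nt on each side of the junction is equally likely.
    """
    lefts = range(min_overhang, read_len - min_overhang + 1)
    mean_left = sum(lefts) / len(lefts)
    return mean_left, read_len - mean_left


# 50nt reads with a 3nt minimum overhang on each side
print(expected_overhangs(50, 3))
```

Under this model the mean overhang is 25nt on both sides for 50nt reads, so a mean of about 5nt versus 45nt is far from what uniform sampling would give.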

Has anyone observed a similar kind of behaviour before and could tell me more about what I'm observing here?

Thanks in advance!

splice junctions rna-seq star • 210 views
modified 12 weeks ago by i.sudbery11k • written 12 weeks ago by pavenhuizen90

can you provide an IGV screenshot of this?

written 12 weeks ago by Jeremy Leipzig19k

The IGV screenshots do not give as clear a picture as what I see when examining the CIGAR strings of the split reads. Nonetheless, here are three screenshots which can hopefully help a bit. Below each screenshot I listed the mean length of the left and right overhangs, based on the CIGAR strings.

chr1:9730526-9730621 - Left v right mean overhang: 42.56 v 5.39

chr1:22660893-22660946 - Left v right mean overhang: 43.93 v 4.07

chr1:35192686-35192765 - Left v right mean overhang: 4.40 v 43.35
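For anyone wanting to reproduce numbers like these, the following is a minimal sketch of how left and right overhangs could be derived from CIGAR strings. It is an assumption about the calculation, not the exact script used for the values above: it only considers reads with exactly one `N` operation and counts aligned read bases (`M`, `I`, `=`, `X`) on each side, excluding soft clips. The function names are illustrative.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED_READ_OPS = set("MI=X")  # aligned read bases; soft clips are excluded


def junction_overhangs(cigar):
    """Return (left, right) aligned read bases around the single N in `cigar`.

    Returns None if the CIGAR does not contain exactly one N operation.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if sum(op == "N" for _, op in ops) != 1:
        return None
    split = next(i for i, (_, op) in enumerate(ops) if op == "N")
    left = sum(n for n, op in ops[:split] if op in ALIGNED_READ_OPS)
    right = sum(n for n, op in ops[split + 1:] if op in ALIGNED_READ_OPS)
    return left, right


def mean_overhangs(cigars):
    """Mean (left, right) overhang over all single-junction CIGARs, or None."""
    pairs = [p for p in map(junction_overhangs, cigars) if p is not None]
    if not pairs:
        return None
    n = len(pairs)
    return sum(l for l, _ in pairs) / n, sum(r for _, r in pairs) / n


# Made-up CIGAR strings for illustration only
print(mean_overhangs(["45M95N5M", "44M95N6M", "2S43M95N5M", "50M"]))
```

Whether soft-clipped bases should count towards the overhang is a design choice; excluding them matches the idea that the overhang is the part of the read that actually aligned.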

modified 12 weeks ago • written 12 weeks ago by pavenhuizen90

Thanks for uploading this. I must say I am stumped. It does look like the overhang alignments are stopping at a particular base pair position, as though there is an N in the reference that is causing a soft clip.

written 12 weeks ago by Jeremy Leipzig19k
12 weeks ago by
i.sudbery11k
Sheffield, UK
i.sudbery11k wrote:

Are the reads stranded or not? Are you running with default mapping parameters?

There are a couple of things you might be observing here.

  1. It could be that a minimal amount of the read has to map before the junction, to anchor the read on that side, before STAR looks for an alignment on the other side. If the read is split 50:50 across the junction, then 25nt might not be enough on the anchor side. I don't quite see why this would cause the left/right imbalance though - I do wonder if the sequencing might be partly to blame? Perhaps it maps from right to left (which might be the case with second-strand stranded sequencing?), so you need at least a 45nt anchor before it looks for a junction?

  2. The alternative is that it is indeed an artifact - reads are coming from somewhere else, but map up to the junction in a different place, and then STAR goes looking for somewhere to put the other part. 5nt is short enough that it can find somewhere by chance where the other part will go, but with more than 5nt it can't find anywhere, so that read would be either unmapped or clipped.
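The "by chance" part of the second scenario can be put into rough numbers. The sketch below is a back-of-the-envelope model only: it assumes independent, equiprobable bases (real genomes are not random), ignores splice motifs and alignment scoring, and the number of candidate positions (10,000) is an arbitrary illustrative value, not something taken from STAR.

```python
def p_chance_match(k, n_positions):
    """P(at least one exact match of a k-nt overhang among n_positions).

    Assumes independent, equiprobable bases (a simplification), and
    ignores splice-motif and scoring constraints.
    """
    p_single = 0.25 ** k  # chance that one given position matches all k bases
    return 1.0 - (1.0 - p_single) ** n_positions


# n_positions=10_000 is an arbitrary value chosen for illustration
for k in (3, 5, 10, 15):
    print(k, p_chance_match(k, 10_000))
```

In this simplified model a 5nt overhang matches any single position with probability 1/1024, so a spurious placement is almost guaranteed once there are a few thousand candidate positions, while a 15nt overhang would rarely match by chance.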

Both of these are just guesses; it's not behavior I've seen myself, but I've not gone looking for it either.

written 12 weeks ago by i.sudbery11k

The reads are not stranded and mostly default parameters were used. By default STAR requires a minimum overhang of 3 nt for a splice junction (if I'm not mistaken), so no need for a 45nt anchor, I would think. I unfortunately do not have any additional information about the sequencing strategy, so I cannot comment on that.

Do you have any suggestions on how to control/check for the second scenario you describe? Could I clip the small overhangs and re-align the longer parts to see if they map to other/multiple locations?
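A minimal sketch of that clip-and-realign idea follows, assuming the read sequence, quality string and CIGAR have already been pulled out of the BAM/SAM records. It keeps only the longer side of each single-junction read and formats it as FASTQ so it can be re-mapped; the function names are hypothetical, and soft-clipped bases are simply left attached to their side of the junction.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
SEQ_OPS = set("MIS=X")  # ops that consume bases of the stored read sequence


def long_side(seq, qual, cigar):
    """Return (seq, qual) for the longer side of a single-junction read.

    The read is cut at the N operation; soft-clipped bases stay attached
    to their side. Returns None unless the CIGAR has exactly one N.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if sum(op == "N" for _, op in ops) != 1:
        return None
    split = next(i for i, (_, op) in enumerate(ops) if op == "N")
    cut = sum(n for n, op in ops[:split] if op in SEQ_OPS)
    if cut >= len(seq) - cut:
        return seq[:cut], qual[:cut]
    return seq[cut:], qual[cut:]


def to_fastq(name, seq, qual):
    """Format one record as four-line FASTQ."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Made-up read for illustration: 45nt on the left, 5nt on the right
trimmed = long_side("A" * 45 + "C" * 5, "I" * 50, "45M95N5M")
print(to_fastq("read1_long_side", *trimmed), end="")
```

The trimmed reads could then be re-mapped and compared with their original positions: if many of them land elsewhere or become multi-mappers, that would point towards the artifact scenario.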

written 12 weeks ago by pavenhuizen90

I would be tempted to post this (with the generating data and some specific pointers to these cases) to the STAR repository. Alex Dobin is very responsive, and at the very least, can likely give you a definitive answer as to why you're seeing this behavior.

written 12 weeks ago by Rob4.6k

I can't quite remember the details of the STAR algo, but yes, you only need 5nt overhang on the far side of the junction, but the other side of the junction must have enough sequence to uniquely map it.

written 12 weeks ago by i.sudbery11k