I'm working with human RNA-Seq data and I'm seeing what looks to me like odd behavior at certain splice junctions. For context, the data consists of 50 nt single-end reads, mapped with STAR against the human genome using the GENCODE V34 annotation.
For some junctions, the majority of the split reads have a short overhang on one side, and always on the same side. For example, the mean overhang length is 5 nt on the left side and 45 nt on the right side. Assuming that reads are distributed more or less uniformly across a gene, what could cause the split reads at these junctions to all start and split at the same position? Might this be an artifact? With a uniform read distribution, I would expect the overhang lengths for a junction to be distributed roughly equally on both sides.
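To make "overhang" concrete, here is a minimal sketch of how the per-junction numbers above could be computed from SAM `POS` and `CIGAR` fields. It is an illustration under stated assumptions, not necessarily how STAR itself reports overhangs: the overhang is taken to be the number of aligned bases (`M`, `=`, `X`) in the exon block directly flanking each `N` operation, soft-clipped bases are not counted, and the function names `junction_overhangs` and `summarize` are made up for this example. It also counts distinct read start positions per junction, to check whether the split reads really do all begin at the same place.

```python
import re
from collections import defaultdict
from statistics import mean

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED = set("M=X")          # ops counted as aligned read bases (assumption)
REF_CONSUMING = set("MDN=X")  # ops that advance the reference coordinate


def junction_overhangs(pos, cigar):
    """Return (intron_start, intron_end, left_overhang, right_overhang)
    for every N operation in a CIGAR string. `pos` is the 1-based SAM POS."""
    blocks = [0]   # aligned bases per exon block, blocks are split at N
    introns = []   # (first intron base, last intron base), 1-based
    ref = pos
    for length, op in ((int(n), o) for n, o in CIGAR_RE.findall(cigar)):
        if op == "N":
            introns.append((ref, ref + length - 1))
            blocks.append(0)
        elif op in ALIGNED:
            blocks[-1] += length
        if op in REF_CONSUMING:
            ref += length
    return [(s, e, blocks[i], blocks[i + 1]) for i, (s, e) in enumerate(introns)]


def summarize(reads):
    """reads: iterable of (pos, cigar) tuples.
    Returns {(intron_start, intron_end):
             (mean_left, mean_right, n_distinct_read_starts)}."""
    overhangs = defaultdict(list)
    starts = defaultdict(set)
    for pos, cigar in reads:
        for s, e, left, right in junction_overhangs(pos, cigar):
            overhangs[(s, e)].append((left, right))
            starts[(s, e)].add(pos)
    return {
        j: (mean(l for l, _ in v), mean(r for _, r in v), len(starts[j]))
        for j, v in overhangs.items()
    }
```

Feeding it the `(POS, CIGAR)` pairs of the reads spanning one junction gives the mean left and right overhang plus the number of distinct start positions; a junction where that last number is 1 despite many supporting reads is the pattern described above.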
Has anyone observed similar behavior before, and could you tell me more about what I'm observing here?
Thanks in advance!