Question

STAR sjdbOverhang specification

5 months ago

CTLong ▴ 110

Hi all,

I have a dataset of 30 samples whose read lengths vary slightly (some samples are 75 bp, others are 76 bp). Ideally I would generate two separate indices, with --sjdbOverhang set to 74 and 75 respectively (read length minus 1). Instead, I took the more convenient approach of generating a single STAR index, on the assumption that a one-base-pair difference between my samples is negligible.

But silly me, I made a mistake and set --sjdbOverhang to 77, which is not the ideal value for any of my samples, although it is not far off. Would mapping my dataset against this "suboptimal" STAR index significantly affect subsequent gene-level quantification, and is it really worth regenerating the index and remapping everything? The STAR documentation says that in most cases an --sjdbOverhang of 100 works as well as the ideal value, so my guess is that 77 may be fine?
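For reference, the STAR manual describes the ideal value as read length minus 1, or max(ReadLength) minus 1 when reads vary in length. Below is a minimal sketch of that arithmetic for the read lengths in this question. The helper function and the cohort names are hypothetical, chosen only for illustration.

```python
def ideal_sjdb_overhang(read_lengths):
    """Return max(read length) - 1, the value the STAR manual calls ideal."""
    lengths = list(read_lengths)
    if not lengths:
        raise ValueError("need at least one read length")
    return max(lengths) - 1


# Hypothetical cohort labels; the read lengths are the ones from the question.
cohort_read_lengths = {"cohort_75bp": [75], "cohort_76bp": [76]}

# One index per cohort: each cohort gets its own ideal overhang.
per_cohort = {
    name: ideal_sjdb_overhang(lengths)
    for name, lengths in cohort_read_lengths.items()
}

# One shared index: use the longest read across all cohorts.
shared = ideal_sjdb_overhang(
    length for lengths in cohort_read_lengths.values() for length in lengths
)

print(per_cohort)   # {'cohort_75bp': 74, 'cohort_76bp': 75}
print(shared)       # 75
print(77 - shared)  # 2, how far the index built with 77 is from the shared ideal
```

Under these assumptions, the value of 77 used here is two above the ideal for a single shared index.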

STAR • 506 views


I would think so, as long as all samples are aligned using the same index.

5 months ago by GenoMax 142k

Thanks for the reply. But let's say I map 15 samples against a STAR index built with --sjdbOverhang 74 and the other 15 against one built with --sjdbOverhang 75. Would my samples still be comparable with each other, assuming batch effects are controlled for?

5 months ago by CTLong ▴ 110

For those interested, I found a related GitHub issue on this: https://github.com/alexdobin/STAR/issues/931

Strictly speaking, using the ideal --sjdbOverhang value for each cohort is best, but sticking to one common index should not change the results by much.
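For anyone who does want per-cohort indices, here is a rough sketch of how the two genomeGenerate commands could be assembled. The STAR options are the standard index-generation ones, but the file names, output directory names, thread count, and the helper function are all placeholders (assumptions), so adjust them to your own setup. The snippet only prints the commands and does not run STAR.

```python
import shlex


def star_index_command(genome_dir, fasta, gtf, overhang, threads=8):
    """Build a STAR genomeGenerate command as an argument list (not executed)."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(overhang),
    ]


# Placeholder paths: genome.fa and annotation.gtf are assumed file names.
for overhang in (74, 75):
    cmd = star_index_command(
        genome_dir=f"star_index_oh{overhang}",
        fasta="genome.fa",
        gtf="annotation.gtf",
        overhang=overhang,
    )
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Building the command as a list keeps the two index builds identical apart from the overhang and output directory, which makes it easier to check that nothing else differs between cohorts.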

ADD REPLY • link 5 months ago by CTLong ▴ 110