Hi Biostars! I am coming to you with a relatively simple question, but one that I have surprisingly not found an answer to. I am working with a case cohort of samples that were prepared for RNA-seq as paired-end reads with a read length of 125 bp per mate.
I am now wondering how to handle alignment of the control data I obtained from TCGA, which consists of 75 bp paired-end reads. The controls were sequenced on the same platform, from the same organism and tissue, using a protocol very similar to the one used for the case cohort. I am running the alignment with STAR (I know that RSEM is generally preferable, but unfortunately it is not an option in this case). When preparing the index, I therefore generated it with --sjdbOverhang 124 (read length - 1).
My question is: should I generate a separate index for the control cohort that matches the read length of the controls (i.e. --sjdbOverhang 74, since 75 - 1 = 74) and align the controls to the reference genome with that index, or should I still align the controls against the same index used for the case group? The alignments will subsequently be used for differential gene expression analysis, with read counting carried out using featureCounts (Subread).
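For reference, the read-length-minus-one arithmetic above can be written out as a small Python sketch. It only computes the two candidate --sjdbOverhang values and assembles, without running, a STAR genomeGenerate argument list. The function names, index directory and file paths are hypothetical placeholders, and the sketch does not decide which of the two options is the right one.

```python
# Minimal sketch of the "read length - 1" rule for STAR's --sjdbOverhang,
# applied to the two cohorts described above. Function names and paths
# are hypothetical placeholders, not part of any pipeline.


def sjdb_overhang(read_length: int) -> int:
    """Return the read-length-minus-one value used for --sjdbOverhang."""
    if read_length < 2:
        raise ValueError("read length must be at least 2 bp")
    return read_length - 1


def star_index_args(genome_dir: str, fasta: str, gtf: str, read_length: int) -> list[str]:
    """Build (but do not run) a STAR genomeGenerate argument list."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(sjdb_overhang(read_length)),
    ]


# Read length per mate for each cohort, as stated in the question.
cohorts = {"case": 125, "control_tcga": 75}
for name, length in cohorts.items():
    print(name, sjdb_overhang(length))
# prints:
# case 124
# control_tcga 74

# Index arguments for the case cohort (hypothetical paths).
args = star_index_args("idx_case", "genome.fa", "annotation.gtf", cohorts["case"])
print(" ".join(args))
```

The argument list could be passed to subprocess.run once real paths are filled in; it is kept as data here so the arithmetic can be checked without STAR installed.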
I apologize if this question is very basic or has been answered before, but I have previously only worked with case/control samples from the same source, and I have not found relevant information on handling cases and controls with different read lengths.