Hi all,
I am using the HISAT2-StringTie pipeline to analyse my RNAseq runs. My objective is to calculate gene abundance values (FPKM & TPM) for each run and then use soft clustering algorithms to determine the probability of specific genes falling into 'high' and 'low' expression clusters.
My question relates to whether I should align to the GRCh38_genome index vs. the GRCh38_genome_tran index. I understand the difference between the two indexes, but I do not yet comprehend how aligning to one index or the other will impact my gene abundance values. There appears to be little difference in TPM values when I align a sample to either index.
My explicit questions are:
1 - What practical difference does it make aligning to the GRCh38_genome index vs. the GRCh38_genome_tran index?
2 - If I opted to align to the GRCh38_genome index, would it compromise the validity/quality of my alignment for a reason that I've overlooked?
Thank you for any help, and apologies if my questions seem simple - I'm fairly new to RNAseq analysis.
I would abandon traditional alignment for RNA-seq entirely and switch to lightweight pseudoalignment or selective-alignment tools such as salmon, which typically outperform traditional tools for RNA-seq quantification.
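
Since the goal in the question is gene-level abundance, note that salmon reports transcript-level estimates in its quant.sf output (tab-separated, with Name, Length, EffectiveLength, TPM and NumReads columns). Below is a rough sketch of how those TPM values could be summed per gene. The function name, the toy table and the transcript-to-gene mapping are all made up for illustration; in a real analysis the mapping would come from your annotation, and a dedicated tool such as tximport is commonly used for this step.

```python
import csv
import io
from collections import defaultdict


def gene_tpm_from_quant(quant_file, tx2gene):
    """Sum transcript-level TPM into gene-level TPM.

    quant_file: open text handle on a quant.sf-style table
                (tab-separated, with 'Name' and 'TPM' columns).
    tx2gene:    dict mapping transcript ID -> gene ID
                (hypothetical here; normally built from the annotation).
    """
    totals = defaultdict(float)
    for row in csv.DictReader(quant_file, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript not in the mapping: skipped in this sketch
        totals[gene] += float(row["TPM"])
    return dict(totals)


# Toy input with made-up numbers, standing in for a real quant.sf file.
toy_quant = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800.0\t10.0\t50.0\n"
    "tx2\t2000\t1800.0\t30.0\t300.0\n"
    "tx3\t1500\t1300.0\t5.0\t40.0\n"
)
toy_tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_tpm = gene_tpm_from_quant(toy_quant, toy_tx2gene)
print(gene_tpm)
```

For a real run you would pass an open handle on the quant.sf file instead of the toy table. Summing TPM per gene is the simplest possible aggregation; the resulting gene-level values could then feed into the soft clustering step described in the question.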