Hi everyone,
I’m a bit confused about how to identify novel genes in a model organism, especially when working with RNA-Seq data.
From what I understand:
If I have a complete reference genome and annotation, I can align my RNA-Seq reads using STAR or HISAT2 and quantify known genes.
But if I want to find novel genes (ones not in the annotation), I’m not sure whether STAR alone is enough, or if I need to run StringTie or another transcript assembler on top of STAR’s alignments.
And if there’s no good reference genome, then I believe I need to use Trinity for de novo transcriptome assembly to find novel genes.
My main confusion is:
Can STAR or HISAT2 help discover novel genes, or are they strictly for mapping to known regions?
When exactly should I use Trinity instead of STAR?
If the reference genome exists but the annotation is incomplete, is it better to align with STAR and assemble with StringTie, or skip the reference and go straight to Trinity?
I appreciate any clarification!
Thanks in advance,
I think STAR can detect novel splice junctions and chimeric reads (which is what fusion-gene callers work from), but I'm not sure about novel "genes" as such.
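For what it's worth, here is a rough sketch of the align-then-assemble route the question describes, in case it helps make the division of labour concrete. STAR only aligns reads; StringTie builds transcripts from those alignments; and a comparison step against the existing annotation is one way to flag transcripts that overlap nothing known. The gffcompare step is my own addition (it was not mentioned above), and every file name, the `sample` prefix, the thread count, and the helper names `build_pipeline` and `intergenic_transcripts` are made up for illustration. It assumes uncompressed paired-end FASTQ and an unstranded library, and it only builds the command lines rather than running anything.

```python
import re

def build_pipeline(genome_dir, reads_1, reads_2, ref_gtf, prefix="sample", threads=8):
    """Return STAR, StringTie and gffcompare commands as argument lists.

    Nothing is executed here; all paths are hypothetical placeholders.
    """
    # STAR appends this suffix to --outFileNamePrefix for sorted BAM output.
    bam = f"{prefix}.Aligned.sortedByCoord.out.bam"
    assembled = f"{prefix}.stringtie.gtf"
    star = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", reads_1, reads_2,          # assumes uncompressed FASTQ
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outSAMstrandField", "intronMotif",       # assumes an unstranded library
        "--outFileNamePrefix", f"{prefix}.",
    ]
    # -G guides assembly with the known annotation but still allows new transcripts.
    stringtie = ["stringtie", bam, "-G", ref_gtf, "-o", assembled, "-p", str(threads)]
    # gffcompare labels each assembled transcript with a class code relative to ref_gtf.
    gffcompare = ["gffcompare", "-r", ref_gtf, "-o", f"{prefix}.cmp", assembled]
    return [star, stringtie, gffcompare]

CLASS_CODE = re.compile(r'class_code "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def intergenic_transcripts(gtf_lines):
    """Collect transcript IDs whose gffcompare class code is "u" (intergenic)."""
    ids = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        code = CLASS_CODE.search(fields[8])
        tid = TRANSCRIPT_ID.search(fields[8])
        if code and tid and code.group(1) == "u":
            ids.append(tid.group(1))
    return ids
```

Transcripts with class code "u" are only candidates for novel genes. They would still need filtering (expression level, coding potential, support across replicates) before anyone calls them genes. With no usable reference genome this whole route does not apply, which is where a de novo assembler like Trinity comes in.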