Hi all,
I am rather new to genome/transcriptome analysis, so I apologize if this is a basic question. I have been reading about different analysis methods in the literature, and I'm confused about the difference between "gene prediction" (e.g., AUGUSTUS, BRAKER) and genome-guided "transcript reconstruction" (e.g., Cufflinks, StringTie). Both seem able to take pre-aligned RNA-seq reads (.bam) as input and infer gene or transcript structures on the genome from them. Note that I am NOT referring to de novo transcript assembly algorithms like Trinity.
What I currently believe the difference is: gene prediction is ultimately trying to annotate the features of a genome, so there is pretty much a "right answer" for each organism. Gene prediction also tends to combine many data sources and prediction methods, such as sequence homology and ab initio searches for known nucleotide patterns. Transcript reconstruction, by contrast, is ultimately trying to characterize transcription in the sample at the time it was collected, and its output is typically passed on to downstream analyses such as differential expression. Thus, it focuses more on estimating expression levels and on identifying the different transcript isoforms of each gene.
However, even if that is the case (and I hope I'm not mixing up biological concepts here), since transcripts are produced from genes, aren't the genomic regions that both tasks are trying to locate one and the same? In other words, for organisms with a reference sequence, why use Cufflinks/StringTie at all if you can ostensibly assemble reads more accurately using gene prediction software?
Any corrections, insights, or suggestions for further reading are appreciated!