Question: Best annotation GTFs for transcript detection vs novel transcript discovery
4.6 years ago
United States
trakhtenberg150 wrote:

In my analysis of mouse RNA-seq data I have two goals: one is to identify differentially expressed genes, and the other is to discover novel transcripts. I assume that for the first goal I should use the annotation database with the fewest redundant or erroneous entries. For the second goal, I assume I should use the most comprehensive database possible, even if it contains some redundant or erroneous entries. If my assumptions are correct, which annotation databases should I use?

In terms of accuracy, is GENCODE version M3 the best way to go? If I understand correctly, it includes non-redundant transcripts from all the main sources: (a) all RefSeq RNAs, (b) everything that UCSC Genes adds from GenBank, (c) Ensembl transcripts, both those checked by HAVANA and those that are only predicted, and (d) other databases.

Is there a reason to use a different database (e.g., UCSC Genes) to accomplish the first goal? To accomplish the second goal (discovering novel transcripts), should I also use all of GenBank? And if GENCODE filters Ensembl, should I also use the original Ensembl annotation?
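
To make the manually checked versus only-predicted distinction concrete, here is a minimal Python sketch that tallies transcript records by the source column of a GTF file. It assumes the standard 9-column, tab-separated GTF layout and that column 2 names the annotation source (HAVANA or ENSEMBL in GENCODE files, as far as I can tell). The function name and the example records are made up for illustration.

```python
from collections import Counter


def count_transcripts_by_source(lines):
    """Count 'transcript' records per source (GTF column 2).

    `lines` is any iterable of GTF lines, e.g. an open file handle.
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A well-formed GTF record has 9 tab-separated columns.
        if len(fields) < 9:
            continue
        source, feature = fields[1], fields[2]
        if feature == "transcript":
            counts[source] += 1
    return counts


# Made-up records, not taken from any real annotation release.
example = [
    "##description: made-up example",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tHAVANA\ttranscript\t150\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr2\tENSEMBL\ttranscript\t500\t1200\t.\t-\t.\tgene_id "G2"; transcript_id "T3";',
]

print(count_transcripts_by_source(example))
```

On a real file, the same function could be called on an open handle, e.g. `with open(path) as fh: count_transcripts_by_source(fh)`, where `path` is wherever the downloaded GTF lives.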

