Question: Best annotation GTFs for transcript detection vs novel transcript discovery
Asked 4.6 years ago by trakhtenberg (United States):

In my analysis of mouse RNA-seq data I have two goals: to identify differentially expressed genes and to discover novel transcripts. I assume that for the first goal I should use the annotation database with the fewest redundant or erroneous entries, and that for the second goal I should use the most comprehensive database available, even if it contains some redundant or erroneous entries. If these assumptions are correct, which annotation databases should I use?

In terms of accuracy, is GENCODE version M3 the best choice? If I understand correctly, it includes non-redundant transcripts from all the main sources: (a) all RefSeq RNAs, (b) everything that UCSC Genes adds from GenBank, (c) Ensembl transcripts, both those checked by HAVANA and those that are only predicted, and (d) other databases.

Is there a reason to use a different database (e.g., UCSC Genes) for the first goal? For the second goal (discovering novel transcripts), should I also use all of GenBank? And if GENCODE filters Ensembl, should I also use the original Ensembl annotation?
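
To make the "comprehensive versus non-redundant" comparison concrete, one rough approach is to count the distinct genes and transcripts in each candidate GTF. The following is only a minimal sketch: it assumes standard tab-separated GTF files with `gene_id` and `transcript_id` attributes in the ninth column, and the function name `summarize_gtf` and the file name in the usage comment are hypothetical. It says nothing about which entries are erroneous; it only shows how much each annotation contains.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def summarize_gtf(lines):
    """Return (n_genes, n_transcripts) for an iterable of GTF lines.

    Only records carrying both gene_id and transcript_id are counted,
    so gene-level lines without a transcript_id are ignored.
    """
    transcripts_per_gene = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        attrs = dict(_ATTR.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if gene_id and transcript_id:
            transcripts_per_gene[gene_id].add(transcript_id)
    n_genes = len(transcripts_per_gene)
    n_transcripts = sum(len(ids) for ids in transcripts_per_gene.values())
    return n_genes, n_transcripts


# Hypothetical usage on a downloaded, uncompressed annotation file:
# with open("annotation.gtf") as handle:
#     print(summarize_gtf(handle))
```

Running this on each annotation gives a first impression of how much a more comprehensive set adds. Note that some GTF exports reuse the transcript ID as the `gene_id`, in which case the gene count is not meaningful.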


