Question

What dataset do I use to annotate a transcriptome? Another transcriptome or a genome?



4.8 years ago

pseudacris ▴ 10

When I annotate a transcriptome (or a set of transcripts), does it need to be annotated with another transcriptome, or do I need to use a genome? Should my blastdb be genome, transcriptome, or protein data? My study species is a non-model organism (an amphibian). Is it better to annotate with a closely related species of lower quality or with a model organism of high quality? How do I determine quality from a database download? Which database should I use: NCBI, Ensembl, or UniProt?

alignment RNA-Seq • 977 views

updated 3.9 years ago by lieven.sterck 16k • written 4.8 years ago by pseudacris ▴ 10

score 2 · Answer 1 · 2021-02-26

That depends on your goal.

If you want to annotate the transcripts functionally, you'll be better off using the protein set of a model organism (provided the 'closest' one is not too distantly related). With those you have a better chance of getting meaningful annotations back.
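As a rough illustration of the functional route, here is a minimal Python sketch that assumes you have BLAST+ installed and a protein FASTA downloaded for your chosen reference species. It only builds the `makeblastdb` and `blastx` command lines (a translated search of transcripts against proteins) and picks the top hit per transcript from tabular output. The file names, the e-value cut-off of 1e-5, and the "highest bitscore wins" rule are all assumptions for the example, not recommendations from this thread.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastx_commands(transcripts_fasta, protein_fasta, db_name, out_tsv,
                    evalue="1e-5", threads=4):
    """Return the two BLAST+ command lines as argument lists.

    Nothing is executed here; all paths are hypothetical placeholders.
    """
    # Build a protein BLAST database from the reference protein set.
    make_db = ["makeblastdb", "-in", protein_fasta,
               "-dbtype", "prot", "-out", db_name]
    # Translated search: nucleotide transcripts against the protein database.
    search = ["blastx", "-query", transcripts_fasta, "-db", db_name,
              "-evalue", evalue, "-outfmt", "6",
              "-num_threads", str(threads), "-out", out_tsv]
    return make_db, search


def best_hits(tabular_handle):
    """Keep the highest-bitscore hit per transcript (a simplifying assumption)."""
    best = {}
    reader = csv.DictReader(tabular_handle, fieldnames=OUTFMT6_COLUMNS,
                            delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


if __name__ == "__main__":
    make_db, search = blastx_commands(
        "transcripts.fasta", "model_proteins.fasta", "model_prot", "hits.tsv")
    print(" ".join(make_db))
    print(" ".join(search))

    # Made-up tabular lines, only to show the parsing step.
    fake_hits = (
        "tx1\tprotA\t75.0\t120\t30\t0\t1\t360\t5\t124\t1e-40\t160\n"
        "tx1\tprotB\t90.0\t120\t12\t0\t1\t360\t1\t120\t1e-60\t230\n"
    )
    for transcript, hit in best_hits(io.StringIO(fake_hits)).items():
        print(transcript, hit["sseqid"], hit["bitscore"])
```

With BLAST+ on your PATH, each returned list could be passed to `subprocess.run(cmd, check=True)`. The subject IDs of the best hits are what you would then look up in the source database to retrieve the functional descriptions.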

If you are looking for the gene structures, you have a few more options, as the functional assignments are not that critical there (though the gene models of model organisms are generally of higher quality).

NCBI nr (protein) is the most comprehensive dataset you can get, but the quality is not equal across entries. UniProt and Ensembl are smaller (more dedicated) datasets but have more thorough annotations.

Personally, I would give preference to species that have a genome available, as those are likely to have more complete data.