What dataset do I use to annotate a transcriptome? Another transcriptome or a genome?
4.3 years ago

When I annotate a transcriptome (or a set of transcripts), does it need to be annotated against another transcriptome, or do I need to use a genome? Should my BLAST database be built from genome, transcriptome, or protein data? My study species is a non-model organism (an amphibian). Is it better to annotate against a closely related species with lower-quality data or a model organism with high-quality data? How do I judge the quality of a database download? Which database should I use: NCBI, Ensembl, or UniProt?

alignment RNA-Seq • 851 views
4.3 years ago

That depends on your goal.

If you want to annotate the transcripts functionally, you'll be better off using the protein set of a model organism (provided the 'closest' one is not too distantly related). With those you have a better chance of getting back meaningful annotations.
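As a rough illustration of the functional route, here is a minimal sketch that assembles the two BLAST+ commands involved: building a protein database with `makeblastdb` and searching the transcripts against it with `blastx`. The file names, the database name, and the e-value cutoff are placeholders I made up for the example, not recommendations for your data.

```python
import shlex

def makeblastdb_cmd(protein_fasta, db_name):
    """Command to build a protein BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", protein_fasta, "-dbtype", "prot", "-out", db_name]

def blastx_cmd(transcripts_fasta, db_name, out_tsv, evalue="1e-5", threads=4):
    """Command to search translated transcripts against a protein database.

    -outfmt 6 requests the standard 12-column tabular output.
    """
    return [
        "blastx",
        "-query", transcripts_fasta,
        "-db", db_name,
        "-outfmt", "6",
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        "-out", out_tsv,
    ]

# Hypothetical file names, used only for illustration.
db_cmd = makeblastdb_cmd("model_proteins.fasta", "model_prot_db")
search_cmd = blastx_cmd("transcripts.fasta", "model_prot_db", "hits.tsv")

# Print the commands so they can be inspected or pasted into a shell.
print(shlex.join(db_cmd))
print(shlex.join(search_cmd))
```

The commands are only printed here; to actually run them you would pass each list to `subprocess.run(cmd, check=True)` on a machine with BLAST+ installed.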

If you are looking to get gene structures, you have a few more options, since functional assignments are less critical there (though the gene models for model organisms are generally of higher quality).

NCBI nr (protein) is the most comprehensive dataset you can get, but the quality is not equal across entries. UniProt and Ensembl are smaller, more curated datasets with more thorough annotations.
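Because hit quality varies, one common precaution is to filter the search results and keep only the strongest hit per transcript. The sketch below assumes BLAST's standard 12-column tabular output (`-outfmt 6`); the cutoff value and the example rows are invented for illustration.

```python
import csv
import io

# Column order of BLAST's default tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def best_hits(handle, max_evalue=1e-5):
    """Return {query_id: (subject_id, bitscore)} for the top-scoring hit per query.

    Hits with an e-value above max_evalue are ignored.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(COLUMNS) or row[0].startswith("#"):
            continue  # skip blank, comment, or malformed lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        score = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or score > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], score)
    return best

# Made-up rows: tx1 has two acceptable hits, tx2 only a weak one.
example = (
    "tx1\tprotA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t350\n"
    "tx1\tprotB\t92.0\t210\t17\t0\t1\t630\t1\t210\t1e-80\t420\n"
    "tx2\tprotC\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t30\n"
)
result = best_hits(io.StringIO(example))
print(result)
```

Transcripts that end up with no entry (like `tx2` here) had no hit below the cutoff and would stay unannotated under this scheme.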

Personally, I would give preference to species that have a genome available, as those are likely to have more complete data.


That makes perfect sense, this is very helpful. Thank you very much!
