mRNA / Protein Annotation questions - etiquette and consensus
3.6 years ago
Biogeek ▴ 420

I am debating whether the following is good practice and would appreciate guidance or consensus.

I have aligned mRNA-seq reads to a genome published approximately 3 years ago. From my understanding, the authors ran BLASTP on the predicted proteins to annotate them with a function or name. Many were annotated as hypothetical or had no hit, which is not much use.

1. Given that the proteins were predicted and annotated over 3 years ago, is it advisable to re-annotate these sequences against the most up-to-date UniProtKB release? I would assume this is common practice, as people always want to work with more complete and current resources. Or should I leave it to the original authors of the genome to do this? What is generally accepted?

2. I have the mRNA gene sequences as well as the original predicted proteins in a .fasta file, either of which I can use to re-annotate against UniProtKB (Swiss-Prot and TrEMBL). Does it matter if I use the mRNA rather than the predicted proteins, given that BLASTX translates the query in all six reading frames anyway? Does BLASTP on the predicted proteins yield better results?

Thanks for the insight / consensus

Genome Out-dated Re-annotation blastx blastp • 777 views

Has the genome been incorporated into NCBI or Ensembl? If yes, there could be a better annotation available there than the authors' original annotation.


h.mon, yes it's available on NCBI. In what way would it be 'better'?

Thanks for the insight.


What is the species? Go to https://www.ncbi.nlm.nih.gov/, type the name of the species in the search box, select the Genome database from the pull-down menu, and hit enter.

3.6 years ago

Excellent questions/remarks!

1) That is indeed what you would expect, but it rarely happens, for several reasons. The original authors might not even care anymore (depending on their interest in that particular species). Usually people use the 'official' version (the one associated with the paper), partly to avoid the effort of redoing the annotation themselves and partly because a second annotation can get confusing. Imagine you say that gene X has function Y (based on your update) but the official annotation still says Z: anyone who looks up gene X will see function Z, not your function Y. These annotations are often only updated when a new release of the genome is presented, which is rare unless it's a key/model species.

2) I would personally rely more on the proteins (BLASTP). With the mRNA (BLASTX) approach you might get spurious or 'second-grade' hits from matches in reading frames that do not encode the protein. If BLASTP does not give enough results, you might then consider BLASTX.
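
To make the frame issue concrete, here is a minimal Python sketch of a six-frame translation, assuming the standard genetic code. It only illustrates the idea and is not how BLASTX is implemented; the function names and the toy sequence are made up for the example. For a real mRNA, at most one of the six translations corresponds to the predicted protein, and the other five are the frames where chance matches can arise.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognised codon."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(mrna):
    """Translate a nucleotide sequence in all six reading frames.

    Keys are '+1'..'+3' for the given strand and '-1'..'-3' for the
    reverse complement (hypothetical labels chosen for this sketch).
    """
    seq = mrna.upper().replace("U", "T")
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# Toy sequence (made up): only frame +1 reads as a start-to-stop ORF.
for frame, peptide in six_frame_translations("ATGGCCAAGTAA").items():
    print(frame, peptide)
```

Searching with the predicted protein skips the five extra translations altogether, which is one reason BLASTP results tend to be cleaner when the gene models are reasonable.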