Question: mRNA / Protein Annotation questions - etiquette and consensus
Biogeek wrote (18 months ago):

I am debating whether the following is good practice and could do with some guidance/consensus.

I have aligned mRNA-seq reads to a genome published approx. 3 years ago. From my understanding, the authors ran BlastP on the predicted proteins to annotate them with a function/name. A lot of them were annotated as hypothetical, no hit, etc., which is not much use.

  1. Given that the proteins were predicted and annotated over 3 years ago, is it advisable to re-annotate these sequences against the most up-to-date UniProtKB release? I would assume this is common practice, as people always want to work with more / updated sources of information. Or should I leave it to the original authors of the genome to do this? What is generally accepted?

  2. I have the mRNA gene sequences as well as the original predicted proteins available in a .fasta file, which I can use to re-annotate against UniProtKB (Swiss-Prot and TrEMBL). Does it matter if I use the mRNA rather than the predicted proteins, given that BlastX will search all 6 reading frames anyway? Does BlastP on the predicted proteins yield better results?

Thanks for the insight / consensus

written 18 months ago by Biogeek • modified 18 months ago by lieven.sterck
Has the genome been incorporated into NCBI or Ensembl? If so, a better annotation than the authors' original one may be available there.

written 18 months ago by h.mon • modified 18 months ago

h.mon, yes it's available on NCBI. In what way would it be 'better'?

Also, how would I go about downloading that data?

Thanks for the insight.

written 18 months ago by Biogeek

What is the species? Go to https://www.ncbi.nlm.nih.gov/, type the name of the species in the search box, select the Genome database from the pull-down menu, and hit Enter.

written 18 months ago by h.mon
lieven.sterck (VIB, Ghent, Belgium) wrote (18 months ago):

Excellent questions/remarks!

1) That is indeed what you would expect, but it rarely happens, for several reasons. The original authors might not even care anymore (depending on their interest in that particular species). Usually people will use the 'official' version (the one associated with the paper), partly because it saves them the effort of redoing it themselves, partly because a parallel annotation can get confusing. Imagine you say that gene X has function Y (based on your update) while the official annotation still says Z: anyone who looks up gene X will see function Z, not your function Y. These things are often only updated when a new release of the genome is presented, which happens very rarely (unless it's a key/model species).

2) I would personally rely more on the proteins (BlastP). With the mRNA (BlastX) approach you might get spurious or 'second grade' hits due to matches in the frames that do not encode the protein. If BlastP does not give enough results, you might then consider BlastX.
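
To illustrate point 2, here is a minimal Python sketch of what "six frames" means for a nucleotide query. It is not how BlastX is implemented; it only shows that one mRNA yields six candidate peptide strings, of which at most one is the real protein, while BlastP on a predicted protein has a single sequence to match. The function names and the toy sequence are made up for this example, and the standard genetic code (NCBI translation table 1) is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), laid out in the
# conventional T, C, A, G order for each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(mrna):
    """Hypothetical helper: map frame labels (+1..+3, -1..-3) to peptides."""
    seq = mrna.upper().replace("U", "T")
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    for frame, peptide in six_frame_translations("ATGGCCAAATGA").items():
        print(frame, peptide)
```

Only frame +1 of the toy sequence reads as a start codon followed by an open reading frame and a stop; the other five frames are translations that a database search could still match by chance, which is the source of the 'second grade' hits mentioned above.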

written 18 months ago by lieven.sterck