Question: Automatically naming protein sequences based on homology
3.3 years ago, wl284 wrote:

I have a file of about 600 translated protein sequences that I'd like to name automatically based on their homology. If I BLAST them against the nr database, the hits they return often don't have consistently annotated names. For example, one protein could hit something named 'NADH-ubiquinone reductase' and another 'complex 1 dehydrogenase'; these are the same protein (KEGG: ) but have been named differently depending on who submitted them to NCBI.

Is there some kind of 'official' name that can be assigned to proteins, so that I can quickly screen through them and see whether a particular protein is present, or how many copies of it there are? At the moment, it's difficult to know whether my searches are exhaustive, as I might be looking for a protein under one name when it's called something completely different.

Someone suggested assigning each protein to a group of orthologs, for example using something like OrthoMCL or eggNOG, but I'm struggling to understand how to use these, especially for several hundred sequences.

If anyone could suggest a strategy or give some indication of how to use these ortholog databases for the purpose I've described, I'd greatly appreciate the help. Cheers!

written 3.3 years ago by wl284

The most official protein names I know of are those in the UniProt database. You could BLAST against UniProt and assign the consensus name it returns. But it is unclear to me why you want to assign a name from a BLAST search to your sequences.
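
To make the "consensus name" idea concrete, here is a minimal sketch rather than a definitive method. It assumes the BLAST results were written in tabular form with the columns `qseqid sseqid pident evalue bitscore stitle`, and that the subject titles follow the UniProt FASTA style, where the protein name comes before ` OS=`. The function names, the e-value cutoff, and the "most common name among the top hits" rule are all illustrative choices, not anything UniProt or BLAST prescribes.

```python
import csv
import re
from collections import Counter, defaultdict


def clean_title(stitle):
    """Reduce a UniProt-style FASTA title to the bare protein name."""
    # Drop a leading identifier token such as "sp|P12345|XXXX_HUMAN", if present.
    parts = stitle.split(" ", 1)
    if "|" in parts[0] and len(parts) == 2:
        stitle = parts[1]
    # UniProt titles carry metadata after " OS="; keep only the text before it.
    return re.split(r"\sOS=", stitle, maxsplit=1)[0].strip()


def consensus_names(lines, max_evalue=1e-5, top_n=5):
    """Map each query to the most common name among its best-scoring hits.

    `lines` is any iterable of tab-separated rows (an open file works) with
    at least six columns: qseqid, sseqid, pident, evalue, bitscore, stitle.
    The cutoff and top_n defaults are arbitrary assumptions.
    """
    hits = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        qseqid, _sseqid, _pident, evalue, bitscore, stitle = row[:6]
        if float(evalue) <= max_evalue:
            hits[qseqid].append((float(bitscore), clean_title(stitle)))

    names = {}
    for qseqid, query_hits in hits.items():
        # Highest bit scores first, then vote among the top_n names.
        best = sorted(query_hits, reverse=True)[:top_n]
        votes = Counter(name for _score, name in best)
        names[qseqid] = votes.most_common(1)[0][0]
    return names
```

A table in that layout can be produced with something like `blastp -query proteins.fasta -db uniprot_sprot -outfmt "6 qseqid sseqid pident evalue bitscore stitle" -out hits.tsv`, where the file and database names are placeholders. Queries with no hit below the cutoff are simply absent from the returned dictionary, which makes them easy to list separately.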

written 3.3 years ago by Lluís R.

I'm trying to identify what proteins I have by comparing them to homologues in other species using blast. They have all been sequenced de novo using RNA-seq and there's no genome sequences available from the species I'm working with. There are some key proteins I expect to find in the dataset and I'm identifying them by searching through the blast results. Currently if I don't find one of these proteins, I don't know if it's really not present in the dataset or whether I just haven't found it because the protein is named something else in the blast results.

Hopefully this is a bit clearer?

modified 3.3 years ago • written 3.3 years ago by wl284

Never mind, found the sort of thing I need, eggNOG mapper does the job:
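
For anyone following the same route, here is a rough sketch of how the presence and copy-number screening described above might be done once an annotation table exists. It assumes a tab-separated eggNOG-mapper annotations file whose header line starts with `#query` and which has columns named `KEGG_ko` and `Preferred_name`, with `-` marking missing values; column names differ between eggNOG-mapper versions, so check your own file. The function names are made up for illustration.

```python
from collections import defaultdict


def group_by_annotation(lines, key_column="KEGG_ko"):
    """Group query IDs by the value in one annotation column.

    `lines` is any iterable of text lines (an open file works). Lines
    starting with "##" are treated as comments, and the single line
    starting with "#" is treated as the column header (an assumption
    about the file layout).
    """
    header = None
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            fields[0] = fields[0].lstrip("#")
            header = fields
            continue
        if header is None:
            raise ValueError("no header line found before the first data row")
        record = dict(zip(header, fields))
        value = record.get(key_column, "-")
        if value in ("", "-"):
            continue  # this query has no annotation in the chosen column
        # A query may carry several comma-separated identifiers.
        for key in value.split(","):
            groups[key].append(record["query"])
    return dict(groups)


def copy_numbers(groups, expected):
    """Return how many queries were assigned to each expected identifier."""
    return {key: len(groups.get(key, [])) for key in expected}
```

Passing a list of the identifiers you expect to `copy_numbers` gives a count of 0 for anything not found, so a missing protein is distinguished from one that is present under a different free-text name. Grouping on `Preferred_name` instead of `KEGG_ko` works the same way by changing `key_column`.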

written 3.3 years ago by wl284