Finding Organism Protein Databases
7.3 years ago
kgbenn123 ▴ 20

I need to assemble some model seqs of my protein of interest so I have a control to compare against other sequences. I would like a nice swath of Eukaryotes and Bacteria (at least 12 or 13 species of Eukaryotes and 5 or 6 species of Bacteria), with multiple versions of the protein from each species (e.g. Human Protein_A 1.1, Human Protein_A 1.2, Human Protein_A 2.3, Zebrafish Protein_A 1.1, etc.). I know this is super easy to do with NCBI or UniProt, but my mentor wants me to get the proteins from organism-specific databases. I didn't think this would be a problem until I got about 4 organisms deep.

Human, mouse and zebrafish were pretty easy to find, but beyond those I'm having a lot of trouble finding sites that give me AA seqs for my protein (a very common protein). Even the yeast genome website, which I was told would be a goldmine, has been useless to me.

Basically, is there an easy way to find organism databases, or at least to find sequences that link back to an organism database? UniProt kinda does this, but after 20 hrs of searching over the weekend I gave up on it.

sequence protein database • 1.8k views

Have you looked at HomoloGene or Protein Clusters from NCBI?


I have never used these (I will give them both a shot tonight), but I assume I will still have to go through every sequence to see whether it is cited back to an organism-specific database? Or do these have some feature that lets you narrow your search that way?

EDIT: Homologene was incredibly helpful! I still have the problem of it not coming from an organism-specific website, but these seqs seem to be of pretty high quality and I can at least review the submitters. Thanks!


Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

Since we don't know which organisms you are interested in, it is hard to tell whether organism-specific databases are available. BTW: that is an odd requirement from your mentor and may not be satisfiable in all instances. If you can find the protein at NCBI (e.g. RefSeq) or at UniProt (Swiss-Prot), then that should be good enough evidence that the sequence is real, since both of these are manually curated databases.
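One way to check whether an entry links back to an organism-specific database is to look at its cross-reference (`DR`) lines in the UniProt flat-file format. Below is a minimal sketch of that idea, not an official tool. The `ORGANISM_DBS` set is a hand-picked, incomplete list of resource names, the function name is made up for illustration, and the sample record used afterwards is a placeholder rather than a real entry.

```python
# Hand-picked, partial list of organism-specific resources as they are
# abbreviated on UniProt DR lines (an assumption; extend as needed).
ORGANISM_DBS = {
    "SGD", "MGI", "ZFIN", "FlyBase", "WormBase", "HGNC",
    "RGD", "TAIR", "PomBase", "Xenbase", "dictyBase",
}


def organism_db_xrefs(flat_text):
    """Return (database, identifier) pairs for DR lines that point to
    an organism-specific resource listed in ORGANISM_DBS."""
    hits = []
    for line in flat_text.splitlines():
        # Cross-reference lines start with the two-letter code "DR".
        if not line.startswith("DR   "):
            continue
        # Fields are separated by ';' and the line ends with '.'.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        if len(fields) >= 2 and fields[0] in ORGANISM_DBS:
            hits.append((fields[0], fields[1]))
    return hits
```

For example, with a made-up record (placeholder identifiers only):

```python
sample = (
    "ID   EXAMPLE_YEAST\n"
    "DR   EMBL; X00000; -; Genomic_DNA.\n"
    "DR   SGD; S000000000; EXAMPLE.\n"
)
print(organism_db_xrefs(sample))
```

An entry with no hits either comes from an organism without a dedicated database or simply has no such cross-reference, so an empty result does not mean the sequence is unreliable.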


I'm not even interested in any "specific" organisms, just a good cross-section of Domains. I totally agree though about RefSeq and UniProt. I think the problem is that they may not be the MOST up-to-date on physiology or...I have no idea. I'm just trying to put together diverse, quality seqs that I can use as a control for statistical analysis of unknown seqs.

Anyways, thanks for the help. I'll probably ask my mentor to clarify again or give me a hint of where to find these elusive organism-specific protein databases.

6.7 years ago

A bit late maybe, but I wonder why nobody mentioned Ensembl and Ensembl Genomes. Their advantage over other resources is that you can use the same API regardless of organism, so there is no need to write a separate script for each one.
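To illustrate the "same API for every organism" point, here is a rough sketch against the public Ensembl REST service at rest.ensembl.org, using its `lookup/symbol` and `sequence/id` endpoints. The helper names are my own, and the species and gene symbols you pass in are up to you; whether a given symbol resolves for a given species has to be checked case by case.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(species, symbol):
    """URL that resolves a gene symbol to an Ensembl stable ID."""
    path = f"/lookup/symbol/{quote(species)}/{quote(symbol)}"
    return ENSEMBL_REST + path + "?" + urlencode({"expand": 1})


def protein_url(stable_id):
    """URL that returns the protein sequence(s) for a stable ID.
    multiple_sequences=1 allows one record per translated transcript."""
    params = {"type": "protein", "multiple_sequences": 1}
    return f"{ENSEMBL_REST}/sequence/id/{quote(stable_id)}?" + urlencode(params)


def fetch(url, content_type):
    """Download a URL in the requested format (needs network access)."""
    request = Request(url, headers={"Content-Type": content_type})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def protein_fasta(stable_id):
    """FASTA text with the protein sequences for one stable ID."""
    return fetch(protein_url(stable_id), "text/x-fasta")
```

The only thing that changes between organisms is the `species` string passed to `lookup_url` (for example `homo_sapiens`, `danio_rerio` or `saccharomyces_cerevisiae`), so one loop over a list of species names can cover the whole set. The lookup response is JSON (request it with `application/json`), and the gene `id` it contains can be handed to `protein_fasta`.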
