Question: GenBank, NCBI RefSeq or UniProt Protein Sequences
3
8.6 years ago by
Woa2.8k
United States
Woa2.8k wrote:

I have to construct a protein database of a sequenced organism for a proteomics search. Which of GenBank, NCBI RefSeq and UniProtKB is the best source of protein sequences for this purpose?

Thanks

WoA

proteomics protein refseq uniprot • 9.2k views
ADD COMMENTlink written 8.6 years ago by Woa2.8k
7
8.6 years ago by
Netherlands
Martijn Van Iersel570 wrote:

UniProtKB contains the richest, most accurate, high-quality data. GenBank contains raw data; it can be very redundant, and you might have to do a lot of filtering yourself. RefSeq is not as richly annotated, but at least it contains only non-redundant sequences.
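To give an idea of what the "filtering yourself" could look like for a redundant FASTA download, here is a minimal sketch that collapses entries with exactly identical sequences. The function names and the example records are made up for illustration, and real redundancy filtering may also need to handle fragments and near-identical sequences, which this does not attempt.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_identical(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct sequence to the headers sharing it (exact matches only)."""
    by_seq: Dict[str, List[str]] = {}
    for header, seq in records:
        by_seq.setdefault(seq.upper(), []).append(header)
    return by_seq


# Made-up records: entry_A and entry_B carry the same sequence.
example = """>entry_A hypothetical protein
MKTAYIAK
QRQISFVK
>entry_B same protein, second submission
MKTAYIAKQRQISFVK
>entry_C different protein
MSLLTEVE
""".splitlines()

unique = collapse_identical(read_fasta(example))
print(len(unique), "distinct sequences")
```

Keeping all headers per sequence, rather than discarding duplicates outright, means a peptide hit can still be traced back to every entry it came from.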

So my first choice would be to go with UniprotKB, second RefSeq, and third Genbank. But it also depends on whether the organism you're interested in has sufficient data in each resource.

Would you care to share which organism you're interested in?

ADD COMMENTlink written 8.6 years ago by Martijn Van Iersel570

Many thanks! Can somebody tell me what the difference is between the UniProt "Complete Proteome set" and the combined reviewed (UniProtKB/Swiss-Prot) and unreviewed (UniProtKB/TrEMBL) entries?

For some organisms the difference is negligible, but for others I have seen a difference of around 100 entries.

ADD REPLYlink written 8.6 years ago by Woa2.8k
4
8.6 years ago by
Chris Evelo10.0k
Maastricht, The Netherlands
Chris Evelo10.0k wrote:

You can find the answer to your second question, "what is the difference between Uniprot "Complete Proteome set" and the combined reviewed (UniProtKB/Swiss-Prot) and unreviewed (UniProtKB/TrEMBL) entries?", on the UniProt homepage:

  • Swiss-Prot, which is manually annotated and reviewed.
  • TrEMBL, which is automatically annotated and is not reviewed.

UniProt really is a combination of two resources: SwissProt and trEMBL.

SwissProt is a high-quality database of real proteins because it is highly curated. In fact, it is one of the oldest databases we have, and it is maintained by real protein experts.

trEMBL, on the other hand, is not a database of real proteins at all. It is a database of translated nucleotide sequences from EMBL (hence trEMBL). These may well not exist in real biology, or may simply be wrongly translated (missing an exon, for example). The two were combined for practical reasons, but it is very good to be aware of the difference.
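If you want to see how much of a downloaded UniProtKB FASTA file comes from each resource, the header prefix tells you: reviewed Swiss-Prot entries start with `sp|` and unreviewed TrEMBL entries start with `tr|`. A minimal sketch, with hypothetical function names and partly made-up example headers (the two `tr|` lines are not real entries):

```python
from collections import Counter
from typing import Iterable


def review_status(header: str) -> str:
    """Classify a UniProtKB FASTA header by its database prefix."""
    db = header.lstrip(">").split("|", 1)[0]
    if db == "sp":
        return "reviewed"      # UniProtKB/Swiss-Prot
    if db == "tr":
        return "unreviewed"    # UniProtKB/TrEMBL
    return "unknown"           # not a UniProtKB-style header


def count_by_status(headers: Iterable[str]) -> Counter:
    """Count headers per review status."""
    return Counter(review_status(h) for h in headers)


headers = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens",
    ">tr|X0X0X0|X0X0X0_HUMAN Made-up unreviewed entry OS=Homo sapiens",
    ">tr|X1X1X1|X1X1X1_HUMAN Another made-up unreviewed entry OS=Homo sapiens",
]

counts = count_by_status(headers)
print(counts["reviewed"], "reviewed,", counts["unreviewed"], "unreviewed")
```

Running this over the header lines of a proteome download gives a quick feel for how much of the search database rests on manually reviewed entries.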

ADD COMMENTlink modified 8.6 years ago • written 8.6 years ago by Chris Evelo10.0k

Thanks for your answer. I think I should fetch the "Complete Proteome set" whenever it is available for the organism. However, the "complete proteome" contains only the canonical sequences and not all splice variants. Is there any way to get all the splice variants?

ADD REPLYlink written 8.6 years ago by Woa2.8k

When you go to download the FASTA (assuming that is what you are using), e.g. http://www.uniprot.org/uniprot/?query=organism%3a9606+keyword%3a181&format=*, you get a choice to download the canonical sequence data, or canonical and isoform sequence data. The latter presumably includes splice variants as separate protein entries.
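As far as I know, in the canonical-plus-isoform FASTA the additional isoforms appear as separate records whose accession carries a numeric suffix (for example `P04637-2`), while the canonical sequence keeps the bare accession. Assuming that header layout, a rough sketch for grouping records by their parent entry could look like this (the function names are hypothetical, and the descriptions in the example headers are abbreviated):

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Accession followed by "-<number>" marks an additional isoform (assumed layout).
ISOFORM_RE = re.compile(r"^(?P<base>[A-Z0-9]+)-(?P<num>\d+)$")


def split_accession(header: str) -> Tuple[str, Optional[int]]:
    """Return (base accession, isoform number or None) for a UniProt-style header."""
    fields = header.lstrip(">").split("|")
    # UniProt headers look like db|ACCESSION|ENTRY_NAME ...; fall back to first token.
    accession = fields[1] if len(fields) >= 3 else fields[0].split()[0]
    match = ISOFORM_RE.match(accession)
    if match:
        return match.group("base"), int(match.group("num"))
    return accession, None


def group_isoforms(headers: Iterable[str]) -> Dict[str, List[Optional[int]]]:
    """Group isoform numbers under their base accession (None = canonical record)."""
    groups: Dict[str, List[Optional[int]]] = defaultdict(list)
    for header in headers:
        base, num = split_accession(header)
        groups[base].append(num)
    return dict(groups)


headers = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53",
    ">sp|P04637-2|P53_HUMAN Isoform 2 of Cellular tumor antigen p53",
    ">sp|P04637-3|P53_HUMAN Isoform 3 of Cellular tumor antigen p53",
]

grouped = group_isoforms(headers)
print(grouped)
```

Grouping this way makes it easy to check that a peptide identified only in an isoform record is still reported against the right gene product.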

ADD REPLYlink modified 6 months ago by RamRS26k • written 8.5 years ago by Craig30
1
8.6 years ago by
Larry_Parnell16k
Boston, MA USA
Larry_Parnell16k wrote:

What I would like to see is data that can link to mRNA isoforms. RefSeq allows this. GenBank would be noisy, as Martijn says. The mRNA isoforms can be important because they are expressed at different levels according to cell type, temporal patterns (circadian, developmental), and responses to stimuli. These points could be quite critical to the design of the experiment whose data you'll now analyze, or to the hypotheses addressed.

ADD COMMENTlink written 8.6 years ago by Larry_Parnell16k

Thanks! I'll look into it.

ADD REPLYlink written 8.6 years ago by Woa2.8k
0
8.5 years ago by
Craig30
Craig30 wrote:

For mass spectrometry–based proteomics, the International Protein Index (IPI, http://www.ebi.ac.uk/IPI/IPIhelp.html) has been a popular choice for common organisms. For some reason they don't have yeast, but the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org/) fills in nicely there. However, IPI is closing soon, and they recommend UniProt complete proteome sets (http://www.uniprot.org/faq/15) as a replacement. Overall, UniProt seems to provide good information for pretty much any organism, even if it doesn't have a complete proteome set yet, and it is definitely the most extensive, so I would recommend just going there for everything.

ADD COMMENTlink written 8.5 years ago by Craig30

Thanks Craig. Can you tell me where NCBI NR stands compared to, say, UniProt? Is it less annotated and more redundant (even though they call it NR)?

ADD REPLYlink written 8.5 years ago by Woa2.8k

Unfortunately I have never used NR so I can't answer this question.

ADD REPLYlink written 8.5 years ago by Craig30

The NCBI nr protein database is explained here: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide

ADD REPLYlink written 8.1 years ago by Lhl730