Question: Downloading Human And Other Completely Sequenced Proteomes To Search Homologs
Pappu wrote, 5.0 years ago:

I want to download the human and other completely sequenced proteomes in order to search for homologs. A UniProt search returns ~136,500 sequences for human:

http://www.uniprot.org/uniprot/?query=taxonomy%3A9606&sort=score

Searching these sequences with a single protein sequence returns an implausibly large number of human homologs. Filtering with CD-HIT at 90% sequence identity does not reduce the number of hits much. The ~20,000 reviewed human entries do not include all human proteins. I am wondering whether Ensembl would be a better choice.

Elisabeth Gasteiger (Geneva) wrote, 5.0 years ago:

See also this FAQ: What is the human complete proteome? http://www.uniprot.org/faq/48

hpmcwill (United Kingdom) wrote, 5.0 years ago:

The UniProt complete and reference proteome sets are more appropriate for this kind of search. While UniProtKB contains 136,536 entries describing human proteins, the corresponding reference proteome set contains 68,756 entries (see http://www.uniprot.org/taxonomy/9606).
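
As a rough illustration of fetching a proteome set rather than every entry for a taxon, the sketch below only builds a download URL and does not contact any server. It assumes the current UniProt REST endpoint (rest.uniprot.org/uniprotkb/stream) and the query fields proteome: and reviewed:, none of which appear in this thread, so check them against the UniProt documentation before use. UP000005640 is the identifier of the human reference proteome; build_proteome_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: base URL of the UniProt REST streaming endpoint.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def build_proteome_url(proteome_id: str, reviewed_only: bool = False) -> str:
    """Build a FASTA download URL for one proteome (hypothetical helper)."""
    query = f"(proteome:{proteome_id})"
    if reviewed_only:
        # Restrict to reviewed (Swiss-Prot) entries only.
        query += " AND (reviewed:true)"
    return UNIPROT_STREAM + "?" + urlencode({"query": query, "format": "fasta"})


# Human reference proteome; paste the URL into a browser, curl or wget.
url = build_proteome_url("UP000005640")
print(url)
```

Passing reviewed_only=True would narrow the set to the reviewed entries, which, as noted in the question, do not cover every human protein.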


I am aware of that. As far as I know, human has <30k protein sequences excluding alternative splicing. Ensembl seems to have ~100k human CDS.

Reply written 5.0 years ago by Pappu
Biojl (Barcelona) wrote, 5.0 years ago:

You can download that data from Ensembl. Take the transcript_biotype or gene_biotype tag into account. For human, if you select only gene_biotype=protein_coding you'll end up with 22,836 transcripts in version 75 (BioMart).
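
A minimal sketch of how the biotype tag could be used on a downloaded file, assuming an Ensembl peptide FASTA whose header lines carry gene:<id> and gene_biotype:<value> fields. It keeps only protein_coding genes and, to sidestep alternative splicing, retains the longest protein per gene. The function names and the toy records (P1, G1 and so on) are made up for illustration; real headers should be inspected first, since their layout may differ between releases.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

GENE_RE = re.compile(r"\bgene:(\S+)")
BIOTYPE_RE = re.compile(r"\bgene_biotype:(\S+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_protein_per_gene(
    lines: Iterable[str], biotype: str = "protein_coding"
) -> Dict[str, Tuple[str, str]]:
    """Map gene ID -> (header, sequence) of its longest protein."""
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        gene = GENE_RE.search(header)
        found = BIOTYPE_RE.search(header)
        if gene is None or found is None or found.group(1) != biotype:
            continue  # skip records without tags or with another biotype
        gene_id = gene.group(1)
        # On equal length the first record seen for the gene is kept.
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header, seq)
    return best


# Toy records with invented IDs, not real Ensembl entries.
toy = """\
>P1 pep gene:G1 transcript:T1 gene_biotype:protein_coding
MKT
>P2 pep gene:G1 transcript:T2 gene_biotype:protein_coding
MKTAYIAK
>P3 pep gene:G2 transcript:T3 gene_biotype:TR_D_gene
GT
""".splitlines()

result = longest_protein_per_gene(toy)
for gene_id, (header, seq) in result.items():
    print(gene_id, header.split()[0], len(seq))
```

For a real file, pass an open file handle instead of the toy list. Choosing the longest isoform is only one convention; a canonical-transcript annotation, where available, may be a better representative.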
