Question: Downloading InterPro Matches for All Proteins of a Given Organism/Species
Arjun Krishnan (United States) wrote, 8.3 years ago:

Hi, I want to download all the InterPro matches for one species, say human. So I'm looking for a flat file with the following columns:
<UniProt.ID> <Signature.Accession> <Signature.Database> <InterPro.Entry.ID> <InterPro.Entry.Type>

I have already tried InterPro BioMart (query below), but it is too slow. Even when an HTML sample table is returned for viewing under 'Results', choosing file download and waiting ends after some time (0.5 to 1 hour) with a blank page or an error message.

Dataset
- Protein Matches
Filters
- NCBI Taxonomy ID (Includes 'children') : [ID-list specified]
- Match Status : T
- InterPro Entry Type : Active_site,Binding_site,Conserved_site,Domain,Family,PTM,Repeat
Attributes
- UniProtKB Protein Accession
- Signature Accession
- Source Signature Database
- InterPro Entry ID
- InterPro Entry Type

Does anyone know where/how I can download InterPro data one species at a time?

Apart from BioMart, is there any way other than downloading and parsing the entire set of InterPro matches from their FTP site? And if that is the only option, which file should be used: interpro.xml or match_complete.xml?
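If the FTP route turns out to be necessary, a streaming parse avoids loading the whole file into memory. The sketch below is a minimal illustration under several assumptions: that match_complete.xml holds one `<protein id=...>` element per protein, each containing `<match id=... dbname=... status=...>` children with an `<ipr id=... type=...>` child (check this against the file's DTD before relying on it), that the entry-type strings match those used in the BioMart filter above, and that the set of accessions for the species is obtained separately (for example from UniProt), since it is not clear the file carries taxonomy information. The function name `iter_matches` and the sample XML are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Entry types mirroring the BioMart filter in the question (assumed spelling in the XML).
WANTED_TYPES = {"Active_site", "Binding_site", "Conserved_site",
                "Domain", "Family", "PTM", "Repeat"}


def iter_matches(source, accessions=None):
    """Yield (UniProt ID, signature accession, signature DB, InterPro ID, entry type).

    Assumed layout: <protein id><match id dbname status><ipr id type/></match></protein>.
    `accessions=None` keeps every protein; otherwise only IDs in the given set.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "protein":
            continue  # match/ipr elements are handled via their parent protein
        prot_id = elem.get("id")
        if accessions is None or prot_id in accessions:
            for match in elem.iter("match"):
                if match.get("status") != "T":  # same as the 'Match Status : T' filter
                    continue
                ipr = match.find("ipr")
                if ipr is None or ipr.get("type") not in WANTED_TYPES:
                    continue  # signature not integrated, or unwanted entry type
                yield (prot_id, match.get("id"), match.get("dbname"),
                       ipr.get("id"), ipr.get("type"))
        elem.clear()  # drop the processed protein's children to limit memory use


if __name__ == "__main__":
    # Made-up sample in the assumed layout, only to exercise the parser.
    sample = (b'<interpromatch>'
              b'<protein id="P00001"><match id="PF00001" dbname="PFAM" status="T">'
              b'<ipr id="IPR000001" type="Family"/></match></protein>'
              b'</interpromatch>')
    for row in iter_matches(io.BytesIO(sample)):
        print("\t".join(row))
```

On the real (gzipped) download, the same function could be fed `gzip.open("match_complete.xml.gz", "rb")` together with the species' accession set; the file name here is an assumption about what the FTP site provides.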

Khader Shameer (Manhattan, NY) wrote, 8.3 years ago:

Have you tried UniProt for this?

Use the following steps to download the human proteome and the annotations you need:

  1. Go to http://www.uniprot.org/taxonomy/9606
  2. Click on "Complete Proteome Set"
  3. Click on the "Customize" button next to Results
  4. Add/remove the required fields
  5. Once you are done, click on "Download" at the top right
  6. You can download the annotations in different formats (Tab-delimited, Excel, FASTA, GFF, Flat Text, XML, RDF/XML, List of accessions)
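The same download can be scripted instead of clicked through. This is a minimal sketch, assuming the current UniProt REST streaming endpoint (`https://rest.uniprot.org/uniprotkb/stream`, which postdates the URL above) with its `query`, `format=tsv` and `fields` parameters, and assuming the returned TSV header names the accession column `Entry` and the InterPro column `InterPro`, with IDs separated by `;`. The helper names are hypothetical. Note that UniProt only gives the InterPro entry IDs here; the member-database signature and entry type from the question would still have to come from InterPro itself.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumption: current UniProt REST streaming endpoint, not the one in the original post.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def build_query_url(taxon_id: int, reviewed_only: bool = False) -> str:
    """Build a TSV download URL listing accessions and InterPro cross-references."""
    query = f"organism_id:{taxon_id}"
    if reviewed_only:
        query += " AND reviewed:true"  # restrict to Swiss-Prot entries
    params = {
        "query": query,
        "format": "tsv",
        "fields": "accession,xref_interpro",
    }
    return UNIPROT_STREAM + "?" + urllib.parse.urlencode(params)


def parse_interpro_tsv(text: str) -> list[tuple[str, str]]:
    """Turn the TSV into (UniProt accession, InterPro ID) pairs, one per match."""
    pairs = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # Assumed column names; IDs are ';'-separated, usually with a trailing ';'.
        for ipr in (row.get("InterPro") or "").split(";"):
            ipr = ipr.strip()
            if ipr:
                pairs.append((row["Entry"], ipr))
    return pairs


def download_pairs(taxon_id: int) -> list[tuple[str, str]]:
    """Fetch and parse the whole result set (network access required)."""
    with urllib.request.urlopen(build_query_url(taxon_id)) as resp:
        return parse_interpro_tsv(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up rows in the assumed layout, only to show the parsing step.
    sample = "Entry\tInterPro\nP00001\tIPR000001;IPR000002;\nQ00000\t\n"
    print(parse_interpro_tsv(sample))
    print(build_query_url(9606))  # 9606 = Homo sapiens
```

Splitting the semicolon-separated column gives one line per protein/InterPro pair, which is closer to the flat-file layout asked for in the question than UniProt's one-line-per-protein table.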
It pays to know your taxonomy IDs. They're a useful starting point at both UniProt and NCBI.

(reply by Neilfws, 8.3 years ago)

Wow, I did not realize that UniProt does such a wonderful job of bringing data seamlessly together for query and download! Thanks for your suggestion.

(reply by Arjun Krishnan, 8.3 years ago)