Hi, I would like to download all the InterPro matches for one species, say human. I'm looking for a flat file with the following information:
<UniProt.ID> <Signature.Accession> <Signature.Database> <InterPro.Entry.ID> <InterPro.Entry.Type>
I have already tried InterPro BioMart (query below), but it is too slow. An HTML sample table is returned for viewing in 'Results', but when I choose file download and wait, it stops after some time (0.5 to 1 hour) with a blank page or an error message.
Dataset
- Protein Matches
Filters
- NCBI Taxonomy ID (Includes 'children') : [ID-list specified]
- Match Status : T
- InterPro Entry Type : Active_site,Binding_site,Conserved_site,Domain,Family,PTM,Repeat
Attributes
- UniProtKB Protein Accession
- Signature Accession
- Source Signature Database
- InterPro Entry ID
- InterPro Entry Type
Does anyone know where/how I can download InterPro data one species at a time?
Apart from BioMart, is the only option to download and parse the entire set of InterPro matches from their FTP site? If so, which file should I use, interpro.xml or match_complete.xml?
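If the FTP route turns out to be the only one, the parse itself can at least be streamed so the whole file never sits in memory. Below is a minimal sketch, not a tested recipe against the real download. It assumes the per-protein matches are in match_complete.xml, and that the layout is `<protein id=...>` elements containing `<match id=... dbname=... status=...>` children, each with an optional `<ipr id=... type=...>` child. Those element and attribute names are assumptions and should be checked against the actual file. The function name `iter_matches`, the accession set, and the `SAMPLE` data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample in the ASSUMED layout of match_complete.xml.
SAMPLE = b"""<interpromatch>
  <protein id="P12345" name="EXAMPLE_HUMAN" length="100">
    <match id="PF00001" name="sig1" dbname="PFAM" status="T">
      <ipr id="IPR000001" name="entry1" type="Domain"/>
      <lcn start="1" end="50"/>
    </match>
    <match id="PF00002" name="sig2" dbname="PFAM" status="T"/>
  </protein>
  <protein id="Q99999" name="OTHER_MOUSE" length="80">
    <match id="PF00001" name="sig1" dbname="PFAM" status="T">
      <ipr id="IPR000001" name="entry1" type="Domain"/>
    </match>
  </protein>
</interpromatch>"""


def iter_matches(xml_source, wanted_accessions):
    """Yield (uniprot_id, signature_acc, signature_db, ipr_id, ipr_type).

    xml_source: file path or binary file object.
    wanted_accessions: set of UniProt accessions for the species of interest.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "protein":
            continue
        acc = elem.get("id")
        if acc in wanted_accessions:
            for match in elem.findall("match"):
                if match.get("status") != "T":
                    continue  # keep only true matches
                ipr = match.find("ipr")
                if ipr is None:
                    continue  # signature has no InterPro entry attached
                yield (acc, match.get("id"), match.get("dbname"),
                       ipr.get("id"), ipr.get("type"))
        # Drop this protein's children once it has been processed.
        elem.clear()


if __name__ == "__main__":
    for row in iter_matches(io.BytesIO(SAMPLE), {"P12345"}):
        print("\t".join(row))
```

The set of accessions for the species has to come from somewhere else (for example a UniProt download for that organism), since this sketch filters on accession rather than on taxonomy.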
It pays to know your taxonomy IDs. They're a useful starting point at both UniProt and NCBI.
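To illustrate starting from a taxonomy ID (9606 is human), here is a rough sketch that only builds a download URL for a UniProt query restricted to one organism. The endpoint `https://rest.uniprot.org/uniprotkb/stream`, the `organism_id:` query field, and the `xref_interpro` column name are assumptions about the current UniProt REST interface and should be checked against its documentation. The helper name `uniprot_interpro_url` is made up.

```python
from urllib.parse import urlencode

# ASSUMED endpoint; verify against the UniProt documentation.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_interpro_url(taxonomy_id, reviewed_only=False):
    """Build a URL for a TSV of accessions plus InterPro cross-references."""
    query = f"organism_id:{int(taxonomy_id)}"
    if reviewed_only:
        query += " AND reviewed:true"  # restrict to Swiss-Prot entries
    params = {
        "query": query,
        "format": "tsv",
        "fields": "accession,xref_interpro",  # ASSUMED column names
    }
    return f"{UNIPROT_STREAM}?{urlencode(params)}"


if __name__ == "__main__":
    print(uniprot_interpro_url(9606))
```

A result like this gives UniProt accession to InterPro entry ID only. The signature accession, source database, and entry type would still have to come from InterPro itself.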
Wow, I did not realize that UniProt does such a wonderful job of bringing data seamlessly together for query and download! Thanks for your suggestion.