Protein Domains
10.7 years ago

Let's say I have a set of UCSC names of genes.

Is there a database (or databases) I can query through a RESTful request (or something else I can automate on the command line), that would tell me what protein domains are contained in translated genes, such that I could categorize them?

I don't know much about protein databases, so I apologize in advance for what will likely be a n00b question. Thanks for your help.

protein domain • 2.6k views

The easiest way would be to use BioMart. When you select Attributes, you get the option of choosing information related to protein domains and families, and you can then download the domain-related information from there. It includes Pfam, InterPro, and TIGRFAM data.
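Since the question asks for something that can be automated, here is a minimal Python sketch of how the same BioMart selection might be expressed as an XML query and sent to a BioMart web service. The endpoint URL, the dataset name `hsapiens_gene_ensembl`, the filter name `ucsc`, and the attribute names `pfam` and `interpro` are all assumptions that should be checked against the BioMart instance you use; the UCSC IDs in the usage note are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint of an Ensembl BioMart service; verify before use.
MART_URL = "http://www.ensembl.org/biomart/martservice"


def build_biomart_query(ucsc_ids,
                        dataset="hsapiens_gene_ensembl",
                        id_filter="ucsc",
                        attributes=("ucsc", "pfam", "interpro")):
    """Return a BioMart XML query string for the given UCSC IDs.

    The dataset, filter, and attribute names are assumptions; look up
    the real names in the BioMart web interface for your species.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # One filter restricting the results to the requested identifiers.
    ET.SubElement(ds, "Filter",
                  {"name": id_filter, "value": ",".join(ucsc_ids)})
    # One Attribute element per output column.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def build_request_url(xml_query):
    """URL-encode the XML query as the 'query' parameter of the service."""
    return MART_URL + "?" + urlencode({"query": xml_query})
```

Calling `build_request_url(build_biomart_query(["uc001aaa.3"]))` gives a URL that could be passed to curl or wget; the response should be a tab-separated table with one row per gene/domain pair, assuming the names above are valid.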

10.7 years ago
Benjamin • 0

You could use NCBI's RPS/PSI-BLAST against their Conserved Domain Database (CDD). I don't believe they have a RESTful interface, but they do have a web/CGI interface. You are also free to download the entire CDD and run RPS-BLAST locally, which is what I would recommend.

If you would rather use the web interface, they have a sample Perl script available that demonstrates how to automate the search.

The CDD combines NCBI's own curated (cd) models with models from TIGRFAM, Pfam, COGs, and more.
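For the local route, a rough Python sketch might look like the following. It assumes the BLAST+ `rpsblast` binary is installed, that a CDD database has already been downloaded and formatted, and that the input is a FASTA file of translated (protein) sequences. The file paths and the E-value cutoff are placeholders, and the parser assumes the default 12-column tabular output (`-outfmt 6`).

```python
import csv
import io
from collections import defaultdict


def rpsblast_command(fasta_path, db_path, out_path, evalue=0.01):
    """Build an rpsblast command line that writes tabular output.

    Pass the returned list to subprocess.run() to execute the search.
    """
    return ["rpsblast",
            "-query", fasta_path,
            "-db", db_path,
            "-evalue", str(evalue),
            "-outfmt", "6",
            "-out", out_path]


def domains_by_query(tabular_text, max_evalue=0.01):
    """Group domain hits by query sequence.

    Expects the default -outfmt 6 columns: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue,
    bitscore. Returns {query_id: sorted list of subject (domain) IDs}.
    """
    hits = defaultdict(set)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        qseqid, sseqid, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            hits[qseqid].add(sseqid)
    return {query: sorted(domains) for query, domains in hits.items()}
```

The resulting dictionary can then be inverted (domain to list of genes) to categorize the genes by the domains they contain.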


Thanks, I'll investigate the CGI interface. I can send forms programmatically with curl, wget, etc.

10.7 years ago

You could also try the UniProt REST interface: http://www.uniprot.org/faq/28

You might use the GFF format, e.g. for the human gene LARP4B: http://www.uniprot.org/uniprot/?query=gene%3alarp4b+reviewed%3ayes+organism%3a9606&format=gff
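As a sketch of how this could be scripted, the Python below builds the same kind of query URL for an arbitrary gene name and pulls the domain features out of a GFF response. It assumes the URL scheme shown above is still valid (UniProt's endpoints may have changed since this was written) and that domain annotations appear with `Domain` in the third GFF column; both should be verified against a real response.

```python
from urllib.parse import urlencode

# Base URL taken from the example above; it may have changed since.
BASE_URL = "http://www.uniprot.org/uniprot/"


def uniprot_gff_url(gene, taxon_id=9606, reviewed=True):
    """Build a UniProt query URL that requests GFF output for one gene."""
    terms = [f"gene:{gene}"]
    if reviewed:
        terms.append("reviewed:yes")
    terms.append(f"organism:{taxon_id}")
    return BASE_URL + "?" + urlencode({"query": " ".join(terms),
                                       "format": "gff"})


def domain_features(gff_text, feature_type="Domain"):
    """Return (accession, start, end, attributes) for matching features.

    Assumes standard 9-column, tab-separated GFF with the feature type
    in column 3 and start/end coordinates in columns 4 and 5.
    """
    features = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF header/comment lines
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        features.append((cols[0], int(cols[3]), int(cols[4]), cols[8]))
    return features
```

The URL from `uniprot_gff_url("larp4b")` can be fetched with curl, wget, or `urllib.request.urlopen`, and the response text passed to `domain_features`.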
