Find Proteins Containing A Specific Domain?
13.9 years ago
Rajarshi Guha ▴ 880

Hi, could somebody point me to a database that would let me identify proteins containing a specific domain (say, the PX domain)? I've looked at SMART and it seems to do the trick, but I can't see an easy way to download the results.

protein database • 6.5k views

Thanks for the pointers, everybody. UniProt seems quite handy, and it's good to see that I'll be able to do this via Bioconductor and BioMart.

13.9 years ago

I would recommend Pfam or InterPro rather than SMART for protein-domain analysis.

My personal favorite is Pfam. Here is a link to the Pfam page for the PX domain. Pfam reports that 2400 proteins have PX domains. You can download the sequences here.
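If you download the Pfam sequences as FASTA, a small Python sketch like the one below can count them. It assumes each header begins with an identifier of the form NAME/start-end (one record per domain hit), so it reports both the number of domain records and the number of distinct proteins; the two differ whenever a protein carries more than one PX domain. The function names are hypothetical helpers, not part of any Pfam tool.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_hits_and_proteins(text: str) -> Tuple[int, int]:
    """Return (number of domain records, number of distinct proteins).

    Assumption: headers start with 'SEQUENCE_ID/start-end', so a protein with
    two PX domains appears twice but is counted once as a protein.
    """
    names = [header.split()[0] for header, _ in parse_fasta(text)]
    proteins = {name.split("/")[0] for name in names}
    return len(names), len(proteins)
```

Usage would be something like reading a (hypothetically named) px_domain.fasta file and passing its contents to count_hits_and_proteins.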

13.9 years ago
Andrew Su 4.9k

I'd use BioMart. For example, once you determine that the PX domain has the InterPro ID IPR001683, it's pretty easy to set up the query. You can see how I set it up using this link: http://bit.ly/aM1Ypw
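The same kind of query can also be sent to BioMart's martservice as XML. This is a minimal sketch under several assumptions that are not from the linked query: the Ensembl endpoint URL, the human dataset name "hsapiens_gene_ensembl", and a filter called "interpro" (the filter names a dataset actually offers can be listed with martservice?type=filters&dataset=...). The helper functions are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: Ensembl's public BioMart endpoint.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(interpro_id, dataset="hsapiens_gene_ensembl",
                        attributes=("ensembl_gene_id", "ensembl_peptide_id")):
    """Build BioMart query XML restricted to entries annotated with interpro_id."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="1")
    dataset_el = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Assumption: the InterPro filter is named "interpro" in this dataset.
    ET.SubElement(dataset_el, "Filter", name="interpro", value=interpro_id)
    for attr in attributes:
        ET.SubElement(dataset_el, "Attribute", name=attr)
    return ET.tostring(query, encoding="unicode")


def biomart_url(query_xml):
    """Encode the query XML as the 'query' parameter of a GET request."""
    return MART_URL + "?" + urllib.parse.urlencode({"query": query_xml})


def parse_tsv(text):
    """Split a TSV response into rows, dropping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def run_query(interpro_id):
    """Network call: fetch and parse the result rows."""
    with urllib.request.urlopen(biomart_url(build_biomart_query(interpro_id))) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

For the PX domain this would be run_query("IPR001683"); building the XML with ElementTree rather than string formatting keeps the quoting of attribute values correct.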

13.9 years ago

Hi,

It seems that Swiss-Prot can help here. I did a quick search for 'cytochrome c', followed the link in the 'Domain' section near the bottom of the entry to EMBL-EBI, and here is what I got:

Domain results

The page provides me with a list of other proteins containing this domain.

Does this help? Is there anything else you need?

Cheers

13.9 years ago
Neilfws 49k

All of the suggestions so far are good. My suggestion is to use UniProt.

Simply type "PX" in the query field at the top of the page. On the results page, you'll see a link that begins: "Restrict term "px" to domain (235)...". Click this and you'll see how to formulate this query in the search form: "domain:px".

The "Download" link, top right of page, offers download in multiple formats: sequences, ID list, XML, delimited, spreadsheet and so on.
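The download can also be scripted. This is a sketch assuming the current UniProt REST "stream" endpoint (rest.uniprot.org) and its TSV output with a header row; the query string is a parameter because "domain:px" is the syntax quoted above and the site's current query syntax may differ, so check it in the web search form first. The helper names are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumption: UniProt's REST "stream" endpoint, which returns all hits at once.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_stream_url(query, fields=("accession", "id", "protein_name", "organism_name")):
    """Build a download URL for a UniProtKB query in TSV format."""
    params = {"query": query, "format": "tsv", "fields": ",".join(fields)}
    return UNIPROT_STREAM + "?" + urllib.parse.urlencode(params)


def parse_uniprot_tsv(text):
    """Turn TSV text (first row = column names) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def download(query):
    """Network call: fetch and parse every matching entry."""
    with urllib.request.urlopen(uniprot_stream_url(query)) as resp:
        return parse_uniprot_tsv(resp.read().decode("utf-8"))
```

Parsing with csv.DictReader keys each row by the column headers UniProt returns, so changing the requested fields does not break the parser.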


Neil: AFAIK, Pfam takes its sequences from UniProt, runs hmmpfam on them and assigns domains to each sequence. But UniProt reports only 235 hits here, whereas Pfam reports 2400 (from the full sequence set). Any idea why there is such a huge difference between them?


None at all! Perhaps UniProt applies a more stringent filter? I'll have to look into that.
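One way to start looking into the difference (not something done in this thread) is to export the identifier lists from both sources and compare them as sets. This sketch assumes both lists use the same identifier type (UniProt accessions rather than entry names such as P1_HUMAN) and strips any "/start-end" range and ".version" suffix first, so a protein listed once per domain hit is counted once. The function names are hypothetical.

```python
def normalize(identifier):
    """Strip a '/start-end' range and a '.version' suffix, e.g. 'Q2.3/1-9' -> 'Q2'."""
    return identifier.split("/")[0].split(".")[0]


def compare_accessions(pfam_ids, uniprot_ids):
    """Partition two identifier lists into shared and source-specific sets."""
    pfam = {normalize(i) for i in pfam_ids}
    uniprot = {normalize(i) for i in uniprot_ids}
    return {
        "both": sorted(pfam & uniprot),
        "pfam_only": sorted(pfam - uniprot),
        "uniprot_only": sorted(uniprot - pfam),
    }
```

Looking at a few entries in "pfam_only" would show whether the extra Pfam hits come from repeated domains, unreviewed entries, or weaker matches.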

13.9 years ago

At NCBI, searching the CDD database for PX with E-utilities returns 118 records:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=cdd&term=PX
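A minimal Python sketch of the same ESearch call and of reading its XML reply. ESearch returns only 20 IDs per request by default, so the sketch asks for more with retmax (500 here is an arbitrary choice); if Count exceeds retmax you would page with retstart. The https base URL is assumed, and the helper names are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=500):
    """Build an ESearch URL; retmax raises the default page size of 20 IDs."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def parse_esearch(xml_text):
    """Return (total hit count, list of UIDs on this page) from ESearch XML."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))  # top-level Count = total number of hits
    ids = [el.text for el in root.findall("IdList/Id")]
    return count, ids
```

Fetching esearch_url("cdd", "PX") with urllib.request and passing the body to parse_esearch gives the CDD UIDs to feed into ELink.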

From one of those IDs (e.g. 154983), you can retrieve all the related proteins with ELink:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=cdd&db=protein&id=154983
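Here is a sketch of building that ELink request and collecting the linked protein UIDs from the XML. ELink can return more than one set of links, so the parser groups UIDs by LinkName; the link name used in the example data below is illustrative, not taken from a real response. The https base URL and helper names are assumptions.

```python
import urllib.parse
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom, db, uid):
    """Build an ELink URL listing records in db that are linked to one UID in dbfrom."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return EUTILS + "elink.fcgi?" + urllib.parse.urlencode(params)


def parse_elink(xml_text):
    """Map each LinkName in ELink XML to its list of linked UIDs."""
    root = ET.fromstring(xml_text)
    links = {}
    for linkset_db in root.iter("LinkSetDb"):
        name = linkset_db.findtext("LinkName")
        # Only Link/Id under LinkSetDb are targets; LinkSet/IdList holds the query UID.
        ids = [el.text for el in linkset_db.findall("Link/Id")]
        links.setdefault(name, []).extend(ids)
    return links
```

The resulting protein UIDs can be opened as http://www.ncbi.nlm.nih.gov/protein/<UID>, as in the example below.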

Here, the first protein is http://www.ncbi.nlm.nih.gov/protein/241260146
