Question: Finding Proteins that have NO known Domains
ddofer wrote:

I want to extract a list of proteins from a given organism that have no high-confidence domain predictions from Pfam or similar resources.

(Alternatively, getting a list of domain predictions for a list of proteins would also work.)

I know that HMMER, Pfam and similar services (e.g. CDD) provide tools for domain searches, but I don't know how to work with the output files they email back, and I'm specifically interested only in finding which proteins DON'T have predicted domains.

Is there an easy way to do this? (Even a tool whose output I can copy and paste into a text editor or Excel and then filter by column would do.)
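
One way to get a column-filterable table of domain predictions is to run HMMER locally against the Pfam-A HMM library with `hmmscan --domtblout`, which writes one whitespace-delimited line per domain hit. The sketch below groups those lines by protein. It assumes the standard hmmscan domtblout column order (column 1 is the Pfam model, column 4 is the query protein, column 13 the per-domain i-Evalue, columns 20-21 the envelope coordinates); the function name `domains_per_protein` and the 1e-5 cutoff are illustrative choices, not part of HMMER.

```python
"""Sketch: per-protein domain hits from an `hmmscan --domtblout` table.

Assumed layout (hmmscan domtblout): field 1 = Pfam model (target),
field 4 = protein (query), field 13 = i-Evalue, fields 20-21 = envelope from/to.
"""
from collections import defaultdict


def domains_per_protein(lines, max_ievalue=1e-5):
    """Return {protein_id: [(domain, i_evalue, env_from, env_to), ...]}.

    `max_ievalue` is a hypothetical cutoff on the per-domain independent E-value.
    """
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' header/comment lines
        fields = line.split()  # the free-text description is never indexed
        domain, protein = fields[0], fields[3]
        i_evalue = float(fields[12])
        env_from, env_to = int(fields[19]), int(fields[20])
        if i_evalue <= max_ievalue:
            hits[protein].append((domain, i_evalue, env_from, env_to))
    return dict(hits)
```

For example, after indexing the library with `hmmpress Pfam-A.hmm` and running `hmmscan --domtblout hits.domtbl Pfam-A.hmm proteins.fasta`, calling `domains_per_protein(open("hits.domtbl"))` gives a dict you can print or write out as TSV. Proteins absent from the dict had no domain hit passing the cutoff.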

Thanks! 

Tags: batch protein sequence pfam domain
modified 5.7 years ago by Elisabeth Gasteiger • written 5.7 years ago by ddofer

What is the emailed file output? I think that after you BLAST against a domain database, all the sequences with no hits can be considered sequences without domains. Am I missing something?
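
A minimal sketch of that set-difference idea, assuming the domain search was run with BLAST tabular output (for example `rpsblast` against CDD with `-outfmt 6`, where column 1 is the query ID and column 11 the E-value). The function names and the 1e-3 cutoff are illustrative only, and the sketch assumes the query IDs in the report match the first word of each FASTA header, which BLAST does not always guarantee.

```python
"""Sketch: queries with no hit in a BLAST tabular (-outfmt 6) report.

Assumed layout: tab-separated, field 1 = query ID, field 11 = E-value.
"""


def fasta_ids(lines):
    """Yield sequence IDs (first word after '>') from FASTA lines."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].split()[0]


def queries_without_hits(fasta_lines, blast_lines, max_evalue=1e-3):
    """Return FASTA IDs (input order kept) with no hit at or below `max_evalue`."""
    hit_ids = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # also tolerates the '#' comment lines of -outfmt 7
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            hit_ids.add(fields[0])
    return [seq_id for seq_id in fasta_ids(fasta_lines) if seq_id not in hit_ids]
```

Both arguments can be open file handles, e.g. `queries_without_hits(open("proteins.fasta"), open("hits.tsv"))`.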

modified 6 weeks ago by RamRS • written 5.7 years ago by Prakki Rama

At the time I was working with the HMMER and/or Pfam search results, which are returned as a plain-text email. Yuck.

That said, even with the offline tool, I don't know how to parse the command-line output properly; it just prints it to the screen.
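
For the offline tool, HMMER can write a machine-readable table instead of relying on the screen output: `hmmscan --tblout hits.tbl -o /dev/null Pfam-A.hmm proteins.fasta` writes one whitespace-delimited line per protein/model pair, and adding `--cut_ga` applies Pfam's curated gathering thresholds as a "high confidence" filter. The sketch below assumes that table layout (query name in column 3, full-sequence E-value in column 5); the function name `proteins_without_domains` and the default cutoff are hypothetical.

```python
"""Sketch: proteins with no hit in an `hmmscan --tblout` per-sequence table.

Assumed layout (hmmscan tblout): field 1 = Pfam model name,
field 3 = query protein name, field 5 = full-sequence E-value.
"""


def proteins_without_domains(all_ids, tblout_lines, max_evalue=1e-5):
    """Return IDs from `all_ids` (input order kept) that have no hit with
    full-sequence E-value <= `max_evalue` in the tblout lines."""
    with_hits = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # header/comment lines start with '#'
        fields = line.split()
        query, full_evalue = fields[2], float(fields[4])
        if full_evalue <= max_evalue:
            with_hits.add(query)
    return [pid for pid in all_ids if pid not in with_hits]
```

The `all_ids` list can come from the FASTA headers, e.g. `[l[1:].split()[0] for l in open("proteins.fasta") if l.startswith(">")]`. If you ran with `--cut_ga`, passing `max_evalue=float("inf")` lets the gathering thresholds alone decide. The returned IDs can be pasted straight into a text editor or Excel.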

written 5.6 years ago by ddofer
Elisabeth Gasteiger wrote:

You could query the UniProt Knowledgebase for active entries with no cross-references to InterPro:

active:yes not database:interpro

http://www.uniprot.org/uniprot/?query=+active%3Ayes+not+database%3Ainterpro&sort=score

written 5.7 years ago by Elisabeth Gasteiger

InterPro has many kinds of annotations, though, not just domains...

(Also, I'm working on offline sequences that aren't necessarily in UniProt, or even in NCBI.

As for your database approach, wouldn't it make more sense to just search for proteins with "NOT domain:*"? Your query returns proteins with annotated domains right on the first page of results :P)

written 5.6 years ago by ddofer