Question: Finding Proteins that have NO known Domains
6.5 years ago by
ddofer30 wrote:

I want to extract a list of proteins for a given organism that have no (high-confidence) predicted domains from Pfam or the like.

(Alternatively, getting a list of predictions for a list of proteins would also be good). 


I know HMMER and Pfam and the like (CCD-Hit) have various tools for searching for domains, but I don't know how to work with the emailed file outputs, and I'm specifically interested in just finding which proteins DON'T have predicted domains. 

Is there an easy way to do this? Even a tool whose output I can copy-paste into a text editor or Excel and then filter by column would do.





batch protein sequence pfam domain • 2.0k views
modified 6.5 years ago by Elisabeth Gasteiger • written 6.5 years ago by ddofer

What is the emailed file output? I think that after you BLAST against a domain database, all sequences with no hits can be considered sequences without domains. Am I missing something?

modified 10 months ago by RamRS • written 6.5 years ago by Prakki Rama

I was working then with the HMMER and/or PFAM search results, which are returned as a plaintext email. Yuch. 

That said, even with the offline tool, I don't know how to parse the command-line output properly; it just prints it on screen.

written 6.3 years ago by ddofer
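
For the offline route discussed above, one possible approach is to have HMMER write a tabular file instead of reading the on-screen report, for example `hmmscan --tblout hits.tbl Pfam-A.hmm proteins.fasta`, and then list every FASTA ID that never appears as a query in that table. The sketch below is only an illustration under some assumptions: the file names are placeholders, the function names are made up for this example, the input is hmmscan's `--tblout` format (query name in the third column, full-sequence E-value in the fifth), and the `1e-5` cutoff is an arbitrary stand-in for "high confidence".

```python
def fasta_ids(fasta_lines):
    """Return sequence IDs (first word after '>') in file order."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def proteins_with_hits(tblout_lines, max_evalue=1e-5):
    """Collect query names that have at least one hit at or below max_evalue.

    Assumes hmmscan --tblout layout: whitespace-separated columns,
    '#' comment lines, query name in column 3, E-value in column 5.
    """
    hits = set()
    for line in tblout_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/footer comments and blank lines
        fields = line.split()
        query_name = fields[2]
        evalue = float(fields[4])
        if evalue <= max_evalue:
            hits.add(query_name)
    return hits


def proteins_without_domains(fasta_lines, tblout_lines, max_evalue=1e-5):
    """Return FASTA IDs with no qualifying domain hit, in FASTA order."""
    hits = proteins_with_hits(tblout_lines, max_evalue)
    return [pid for pid in fasta_ids(fasta_lines) if pid not in hits]
```

Both arguments can be open file handles, so calling `proteins_without_domains(open("proteins.fasta"), open("hits.tbl"))` and printing the result one ID per line gives a list that can be pasted into a spreadsheet. Proteins whose only hits are weaker than the cutoff are reported as having no domains, so the threshold is worth adjusting to taste.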
6.5 years ago by
Elisabeth Gasteiger1.8k wrote:

You could query the UniProt Knowledgebase for proteins with no cross-references to InterPro:

active:yes not database:interpro

written 6.5 years ago by Elisabeth Gasteiger

InterPro has many annotations though, not just domains...

(And I'm working on offline sequences which aren't necessarily in UniProt, or even NCBI.)

As for your database approach, wouldn't it make more sense to just search for proteins with "NOT domain:*"? Your query returns proteins with annotated domains right on the first page of results :P

written 6.3 years ago by ddofer