Question: How do I get all sequences for one specific domain from HMMER vs. Pfam output that contains results for many domains?
12 days ago, emilyc wrote:

Hello,

Is there a simple way of extracting all sequences for one specific domain from HMMER vs. Pfam results?

I have used esl-reformat to output a FASTA file containing the sequence of each domain found in my input, but it doesn't say which domain each sequence belongs to. Many domains were found in the original multi-FASTA file, and I only want to extract the sequences corresponding to one of them.

Thanks in advance for any advice,

Emily

pfam hmmer domain • 85 views

I'm not sure what you did or how you did it, but if you run your proteins through InterPro you will get, for each input protein, a list of all its matching domains.

From the output file you can then either grep for a protein ID to get all of its domains, or grep for a domain ID to get all proteins that have that specific domain.
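
A rough sketch of that grep step, assuming the results were saved in InterProScan's tab-separated (TSV) format, where column 1 is the protein ID, column 4 the member database and column 5 the signature accession. The function names and the file name are made up for illustration; adjust the column numbers if your output is laid out differently.

```shell
# Hypothetical helpers; they assume InterProScan TSV columns:
#   $1 = protein ID, $4 = analysis (e.g. Pfam), $5 = signature accession

# List every protein that carries the given Pfam domain (one ID per line).
proteins_with_domain() {
    awk -F'\t' -v id="$1" '$4 == "Pfam" && $5 == id { print $1 }' "$2" | sort -u
}

# List every Pfam domain found in the given protein (one accession per line).
domains_in_protein() {
    awk -F'\t' -v prot="$1" '$4 == "Pfam" && $1 == prot { print $5 }' "$2" | sort -u
}

# Example calls (interpro_results.tsv is a placeholder file name):
#   proteins_with_domain PF00001 interpro_results.tsv
#   domains_in_protein my_protein_1 interpro_results.tsv
```

Matching on whole columns with awk rather than a plain grep avoids accidental hits, for example PF00001 also matching PF000010 or an ID that appears in a description field.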

Comment written 12 days ago by lieven.sterck
12 days ago, Mensur Dlakic (USA) wrote:

There are at least two ways of doing this. The easier one is to go to Pfam directly and download aligned hits for the domain of interest. Let's say that you want hits for 7tm_1 or PF00001:

https://pfam.xfam.org/family/PF00001

Click on Alignments on the left-hand side, and select the database in the Format an alignment section. Let's say you want all UniProt members (~165K sequences) so click on a button that corresponds to it, select the alignment type and click on Generate to download. This works if you are OK with the database choices they have.

If you want to search a custom database, click on Curation & model and download the raw HMM for your domain, in this case PF00001.hmm. Then run hmmsearch (this requires a local HMMER installation):

hmmsearch -E 0.001 --cpu 4 -A out_alignment.stk PF00001.hmm /path/to/your_database.fasta

In this case out_alignment.stk will contain a Stockholm alignment of all the hits, which esl-reformat can convert to other formats as needed. You may want to adjust the E-value threshold and the number of CPUs.
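
If the many-domain results already exist, as in the original question, another possibility is to filter the per-domain table directly instead of searching again. The sketch below assumes the search was run with hmmscan against Pfam-A.hmm and saved with --domtblout, so that column 2 holds the versioned Pfam accession, column 4 the sequence name, and columns 20 and 21 the envelope start and end. The function name and all file names are hypothetical.

```shell
# Hypothetical helper: print "new_name from to source_name" for every hit of
# one Pfam family in an hmmscan --domtblout table.
# Columns used: $2 = Pfam accession (versioned, e.g. PF00001.24),
#               $4 = query sequence name,
#               $20 / $21 = envelope start / end on the sequence.
domain_coords() {
    awk -v acc="$1" '
        /^#/ { next }                             # skip comment lines
        $2 == acc || index($2, acc ".") == 1 {    # match with or without version
            print $4 "/" $20 "-" $21, $20, $21, $4
        }' "$2"
}

# Example use (placeholder file names; needs Easel's esl-sfetch):
#   domain_coords PF00001 hits.domtblout > PF00001.coords
#   esl-sfetch --index proteins.fasta
#   esl-sfetch -Cf proteins.fasta PF00001.coords > PF00001_domains.fasta
```

The four-field lines are meant to match the coordinate file that esl-sfetch -Cf reads, so each extracted subsequence is named after its source sequence and domain coordinates. Check the column numbers against the header of your own table before relying on this.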
