Pfam - get only one representative fasta sequence per family
2.7 years ago
Xylanaser ▴ 80

Hey

Can you help me get only one representative FASTA sequence per Pfam family? Is there a simple way to do that?

cheers

X

pfam fasta protein • 858 views
2.7 years ago
jgreener ▴ 390

It's not trivial. You could use the sequences from the trRosetta Pfam model set, which are representative of the family (download link).

We have a method for getting representative sequences in our paper if you are comfortable with using hmmsearch:

A representative target sequence was found for each family using hmmsearch to search the UniRef90 database with the Pfam HMM and taking the closest subsequence match by E-value.
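As a rough sketch of the "closest match by E-value" step, the snippet below picks the best-scoring domain hit per family from an hmmsearch per-domain table. It assumes the search was run with something like `hmmsearch --domtblout hits.domtbl Pfam-A.hmm uniref90.fasta` (both file names are placeholders) and that the table follows the standard HMMER3 whitespace-delimited `--domtblout` layout. The function name `best_domain_hits` and the choice of the independent E-value (i-Evalue) as the ranking key are my own assumptions, not necessarily what the paper did.

```python
from typing import Dict, Iterable, NamedTuple


class DomainHit(NamedTuple):
    family: str      # query (HMM) name, column 4 of --domtblout
    target: str      # target sequence name, column 1
    i_evalue: float  # independent E-value of this domain, column 13
    ali_from: int    # start of the aligned region on the target, column 18
    ali_to: int      # end of the aligned region on the target, column 19


def best_domain_hits(lines: Iterable[str]) -> Dict[str, DomainHit]:
    """Return the lowest i-Evalue domain hit for each family (hypothetical helper)."""
    best: Dict[str, DomainHit] = {}
    for line in lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        # 22 fixed columns, then a free-text description that may contain spaces.
        fields = line.split(None, 22)
        if len(fields) < 22:
            continue  # malformed or truncated row
        hit = DomainHit(
            family=fields[3],
            target=fields[0],
            i_evalue=float(fields[12]),
            ali_from=int(fields[17]),
            ali_to=int(fields[18]),
        )
        current = best.get(hit.family)
        if current is None or hit.i_evalue < current.i_evalue:
            best[hit.family] = hit
    return best


# Usage (assumes hits.domtbl exists):
# with open("hits.domtbl") as handle:
#     for fam, hit in best_domain_hits(handle).items():
#         print(fam, hit.target, hit.ali_from, hit.ali_to, hit.i_evalue)
```

The `ali_from`/`ali_to` coordinates can then be used to cut the matching subsequence out of the target sequence, giving one real sequence per family.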

Thanks for the idea :)

2.6 years ago
Mensur Dlakic ★ 27k

hmmemit from the HMMER package will extract a consensus sequence for each HMM from Pfam:

hmmemit -c -o model.fasta model.hmm

Not only is it fast (the whole of Pfam can be done in under 2 minutes), but it is also objective, because it gets the sequence directly from the model based on a simple majority rule. Keep in mind that consensus sequences generated this way may not exist in nature, although there will always be some real sequences that are very similar.
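If you run the command on a multi-model file such as Pfam-A.hmm, it can be handy to load the result and check that you really have one sequence per family. Here is a minimal sketch for that; it assumes the output is ordinary FASTA and that each header is the model name followed by `-consensus` (which is what I would expect from `hmmemit -c`, but check your own output). The function name `read_consensus_fasta` is just a placeholder.

```python
from typing import Dict, Iterable


def read_consensus_fasta(lines: Iterable[str], suffix: str = "-consensus") -> Dict[str, str]:
    """Map family name -> consensus sequence from FASTA lines (hypothetical helper)."""
    seqs: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the ID; drop any description.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            # Assumed naming convention: strip the trailing "-consensus" if present.
            if suffix and name.endswith(suffix):
                name = name[: -len(suffix)]
            if name in seqs:
                raise ValueError(f"more than one sequence for family {name!r}")
            seqs[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so concatenate them.
            seqs[name] += line
    return seqs


# Usage (assumes model.fasta was written by the hmmemit command above):
# with open("model.fasta") as handle:
#     consensus = read_consensus_fasta(handle)
# print(len(consensus), "families")
```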
