Question

Performing HMMER search against Pfam versus UniRef100

4.7 years ago

baverso • 0

Hello, I'm using HMMER3 to search a single immunoglobulin sequence against a database with jackhmmer, which iteratively updates the HMM profile and seed family.

My question is about the search space: I could download any relevant database files from Pfam (e.g. V-set), but I already have UniRef100 on my machine.

Would there be any particular benefit to performing my search against V-set rather than UniRef100? I believe that all V-set sequences were taken from UniProt in the first place, so is V-set therefore a subset of UniRef100?

When it comes to searching sequence alignments, what is the benefit of using Pfam or TIGRFAMs versus an entire database (other than computational speed due to the reduced search space)?

I suppose I could run jackhmmer on V-set regardless, concatenate the alignments, and eliminate redundant sequences. Any suggestions? Thank you!

hmmer pfam alignment • 1.6k views

updated 10 months ago by Ram 43k • written 4.7 years ago by baverso • 0

score 3 · Accepted Answer · 2019-08-22

It depends on what your exact goals are. V-set is a subset of UniRef100, but it does not necessarily contain every V-set protein in the present version of UniRef100. The current Pfam version is almost a year old, so its sequences are at most representative of UniRef100 as it was a year ago. If you are interested in a fairly comprehensive set of homologs for your protein, searching the Pfam subset will do the trick. If you are interested in ALL available homologs, you'd likely search a recent UniRef100.
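One way to see how much the two search spaces actually differ is to run jackhmmer against both with `--tblout` and compare the hit lists. Below is a minimal sketch, not a definitive implementation. It only relies on the tabular output having `#` comment lines and the target name in the first column. The function names are made up for illustration, and it assumes the target identifiers from both databases have already been normalized to the same accession scheme, which is usually not the case out of the box.

```python
def read_tblout_targets(text):
    """Collect target names (first column) from HMMER --tblout text."""
    targets = set()
    for line in text.splitlines():
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        targets.add(line.split()[0])
    return targets


def compare_hits(subset_tblout, full_tblout):
    """Compare hits from the Pfam-subset search and the UniRef100 search.

    Assumption: identifiers in both inputs use the same naming scheme.
    """
    subset_hits = read_tblout_targets(subset_tblout)
    full_hits = read_tblout_targets(full_tblout)
    return {
        "shared": subset_hits & full_hits,
        "only_subset": subset_hits - full_hits,
        "only_full": full_hits - subset_hits,
    }
```

If `only_full` is large, the newer UniRef100 release is contributing homologs that the older Pfam subset does not contain.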

At the risk of stating the obvious, also consider this: if your protein has more than one domain - something other than V-set - searching against the V-set sequences will not yield any matches for those additional domains.
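On the idea from the question of searching both sets and merging the results: alignments from separate runs do not share column coordinates, so it is safer to pool the hit sequences unaligned, remove duplicates, and realign afterwards. A rough sketch of the de-duplication step follows, assuming FASTA-like input and treating only exact matches (after stripping gap characters and ignoring case) as redundant. The function names are hypothetical, and clustering by percent identity would need a dedicated tool.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record for each distinct ungapped sequence."""
    seen = set()
    unique = []
    for header, seq in records:
        # Strip alignment gap characters and normalize case before comparing.
        key = seq.replace("-", "").replace(".", "").upper()
        if key in seen:
            continue
        seen.add(key)
        unique.append((header, key))
    return unique
```

The first occurrence wins, so putting the UniRef100 hits first in the pooled input keeps their headers in the output.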