Speed of hmmsearch
6.3 years ago
kentnf

Hi,

I am using hmmbuild to build an HMM file with 89 domain models, and I am searching all Arabidopsis proteins against those 89 domains with hmmsearch:

hmmsearch -Z 2000000 --domZ 89 --cpu 20 -o output.hmmsearch.txt stockholm89.hmm at_pep

The search took less than 2 minutes.

To speed the search up, I selected only the 1,300 proteins I am interested in and ran the same command on them. But that takes about 20 minutes.
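The post does not say how the 1,300 proteins were extracted from at_pep. As a sketch only, assuming a plain-text file with one protein ID per line that matches the first word of each FASTA header (for example AT1G01010.1), awk can pull out the matching records. The function name extract_subset and the file interesting_ids.txt are hypothetical.

```shell
# Hypothetical helper: extract_subset IDS_FILE FASTA_IN > FASTA_OUT
# Keeps every FASTA record whose ID (first word of the header, without ">")
# is listed, one per line, in IDS_FILE.
extract_subset() {
    awk 'FILENAME == ARGV[1] { want[$1] = 1; next }    # first file: load the wanted IDs
         /^>/ { id = substr($1, 2); keep = (id in want) } # header: decide keep or skip
         keep' "$1" "$2"                                # print every line of a kept record
}

# Example (file names are assumptions):
# extract_subset interesting_ids.txt at_pep > at_pep_1300_seqs
```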

I am using the latest version of HMMER. Does anyone know about this problem? Is it a bug in hmmsearch?

Thanks

Are you saying that this took 2 minutes:

hmmsearch -Z 2000000 --domZ 89 --cpu 20 -o output.hmmsearch.txt stockholm89.hmm at_pep_N_seqs


While this took 20 minutes:

hmmsearch -Z 2000000 --domZ 89 --cpu 20 -o output.hmmsearch.txt stockholm89.hmm at_pep_1300_seqs


With at_pep_1300_seqs being a subset of 1,300 of the X sequences in at_pep_N_seqs? If so, that sounds really weird.
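One quick way to confirm the setup before digging further is to count the records in both files, check that every ID in the small file really occurs in the big one, and then time both runs with the shell's `time` keyword. This is a minimal sketch: count_seqs and check_subset are hypothetical names, IDs are assumed to be the first word of each header, and the output file names in the example comments are placeholders.

```shell
# Count FASTA records (header lines starting with ">")
count_seqs() {
    grep -c '^>' "$1"
}

# Hypothetical helper: check_subset FULL_FASTA SUBSET_FASTA
# Prints both record counts and returns non-zero if any ID in the subset
# is missing from the full file.
check_subset() {
    local missing
    echo "full:   $(count_seqs "$1") sequences"
    echo "subset: $(count_seqs "$2") sequences"
    # Collect IDs from the full file, then report subset IDs not seen there
    missing=$(awk 'FILENAME == ARGV[1] { if (/^>/) full[substr($1, 2)] = 1; next }
                   /^>/ { id = substr($1, 2); if (!(id in full)) print id }' "$1" "$2")
    if [ -n "$missing" ]; then
        echo "IDs not found in $1:"
        echo "$missing"
        return 1
    fi
    echo "all subset IDs are present in $1"
}

# Example with the file names used above:
# check_subset at_pep_N_seqs at_pep_1300_seqs
# time hmmsearch -Z 2000000 --domZ 89 --cpu 20 -o full.txt stockholm89.hmm at_pep_N_seqs
# time hmmsearch -Z 2000000 --domZ 89 --cpu 20 -o subset.txt stockholm89.hmm at_pep_1300_seqs
```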

BTW, if you have enough RAM and fast I/O, the approach below can be a lot faster than what you're doing. Split the input file into 20 parts and then run the following (GNU parallel has to be in $PATH):

# Search one split file with a single CPU thread; output is named after the split file
function hmmer() {
    n=$(basename "$1")
    hmmsearch -Z 2000000 --domZ 89 --cpu 1 -o "$n.output.hmmsearch.txt" stockholm89.hmm "$1"
}

# Make the function visible to the shells that GNU parallel starts
export -f hmmer
find /where/the/split/files/are/ -maxdepth 1 -type f -name "*specific2splitFiles" | parallel -j 20 hmmer {}
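The reply does not show how to split the input into 20 parts. Splitting by lines or bytes (for example with `split`) can cut a FASTA record in half, so one option is to deal whole records round-robin into N files with awk, which also keeps the parts roughly equal in record count. This is a sketch: split_fasta is a hypothetical name, and the output names are chosen only so that they match the placeholder pattern *specific2splitFiles in the find command.

```shell
# Hypothetical helper: split_fasta FASTA_IN N OUT_PREFIX
# Deals whole records round-robin into N files named
# OUT_PREFIX.1.specific2splitFiles ... OUT_PREFIX.N.specific2splitFiles
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        # Each header line picks the next output file in turn
        /^>/ { out = sprintf("%s.%d.specific2splitFiles", prefix, (i++ % n) + 1) }
        # Write the header and its sequence lines to the current file
        out != "" { print > out }
    ' "$1"
}

# Example (paths are placeholders):
# split_fasta at_pep 20 /where/the/split/files/are/at_pep
```

Each part is then a complete, valid FASTA file, so every hmmsearch job sees whole sequences.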