nhmmer is the DNA pipeline version of hmmsearch within the HMMER software. Recently, I've been comparing two paired-end Illumina libraries with the following:
- bwa aln, forward/reverse trimmed reads to database of known reference genes
- Parsing the SAM file for mapped reads (samtools -F 4) and analyzing those
I ran the same raw read files through nhmmer and analyzed the intersection of the alignment and HMMER pipelines. When I look at reads that aligned but were not called by HMMER, bit 16 (reverse strand) is set on virtually all of them. Additionally, when the read information is pulled from the SAM file (where it has been appropriately reverse complemented) and re-run through HMMER, nhmmer does find ~70% of them.
So it appears that HMMER doesn't consider the reverse complement of the database being queried, though it does consider the reverse orientation when the query is of the same strandedness with respect to the database. This is in contrast to the manual, which claims that it does search both strands of the database, but this doesn't explain the findings here.
Has anyone had similar experiences, or does someone have inside knowledge on how or if nhmmer considers reverse complements?