I want to annotate my bacterial genomes and metagenomic samples from gut microflora using the Resfams database available from the Dantas lab website http://www.dantaslab.org/resfams but I am not sure how I can apply it to my samples. I am wondering if anyone knows of a script with commands to use the database. I have fastq and fasta files from raw reads and assembled sequences from both whole bacterial genomes as well as metagenomic samples.
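To annotate your assembled contigs with the Resfams models, you'll need to download HMMER and turn your contigs into protein sequences, since the Resfams models are protein profile HMMs. One way is to run a gene predictor like MetaGeneMark, which calls open reading frames and writes out the predicted proteins (a plain 6-frame translation is an alternative). Then run hmmsearch on the fasta of protein sequences, and it will report the sequences that match the Resfams models.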
Since you already have assembled contigs, your commands might look something like:
gmhmmp -m /path/to/MetaGeneMark_v1.mod -A path/to/output/protein.fasta assembled_contig.fasta
grep -v -e "^$" path/to/output/protein.fasta > path/to/clean_protein.fasta
hmmsearch --tblout path/to/output.tblout.scan /path/to/resfams/models.hmm path/to/clean_protein.fasta > /dev/null
Note that the most relevant HMMER output is the tblout file, so I usually redirect the full output to /dev/null to reduce clutter. The intermediate grep step removes blank lines from the MetaGeneMark protein output. You'll then have to parse the tblout file in whatever way is relevant to your interests.
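If it helps, here is a minimal Python sketch of one way to parse the tblout file. It assumes the standard whitespace-delimited hmmsearch --tblout layout (target name, target accession, query name, query accession, full-sequence E-value, full-sequence score, and so on, with comment lines starting with #). With hmmsearch the target is your protein and the query is the Resfams model. The names Hit, parse_tblout and best_hit_per_protein are made up for this example, and keeping the highest-scoring model per protein is just one possible choice, not something Resfams prescribes.

```python
import sys
from collections import namedtuple

# Hypothetical record type for this sketch: one row of the tblout file.
Hit = namedtuple("Hit", "protein model accession evalue score")


def parse_tblout(lines):
    """Turn hmmsearch --tblout lines into a list of Hit records."""
    hits = []
    for line in lines:
        # Skip blank lines and the '#' header/footer comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns followed by a free-text description.
        fields = line.split(None, 18)
        if len(fields) < 18:
            continue  # malformed or truncated row
        hits.append(
            Hit(
                protein=fields[0],      # target name (your predicted protein)
                model=fields[2],        # query name (the HMM that matched)
                accession=fields[3],    # query accession
                evalue=float(fields[4]),  # full-sequence E-value
                score=float(fields[5]),   # full-sequence bit score
            )
        )
    return hits


def best_hit_per_protein(hits):
    """Keep only the highest-scoring model for each protein."""
    best = {}
    for hit in hits:
        current = best.get(hit.protein)
        if current is None or hit.score > current.score:
            best[hit.protein] = hit
    return best


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python parse_tblout.py path/to/output.tblout.scan
    with open(sys.argv[1]) as handle:
        best = best_hit_per_protein(parse_tblout(handle))
    for protein, hit in sorted(best.items()):
        print(protein, hit.model, hit.accession, hit.evalue, hit.score, sep="\t")
```

The output is a tab-separated table with one line per protein, which you can then filter by E-value or score, or join back to your contigs.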