I have around 20 actinobacteria genomes of decent quality, isolated from the gut of species X, that I am mining for biosynthetic gene clusters (BGCs) using antiSMASH and PRISM. I want to cross-reference the BGCs I find with a large but highly fragmented metagenomic dataset of fairly poor quality that we have, also from the species X gut. Because of the fragmentation, the BGCs will not be present in full within the metagenome. What would be the best way to search the metagenome for the BGCs I find in the actinobacteria genomes?
I am thinking that using conserved regions of the BGCs would be best, but how do I determine a conserved region? Or would just searching for key genes of the BGC be a better approach?
Any advice would be awesome :) thank you!
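On the question of how to determine a conserved region: one common approach, assuming you have already aligned homologous sequences (for example the same core biosynthetic gene or protein from several of your genomes, aligned with a tool such as MAFFT or MUSCLE), is to score each alignment column by Shannon entropy and look for runs of low-entropy, low-gap columns. The sketch below does this in plain Python. The function names, window size, entropy threshold and gap threshold are illustrative assumptions, not established cutoffs.

```python
import math
from collections import Counter

GAP_CHARS = "-."


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column.upper() if c not in GAP_CHARS]
    if not residues:
        return float("inf")  # all-gap column: treat as not conserved
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def conserved_windows(alignment, window=10, max_entropy=0.5, max_gap_frac=0.2):
    """Return merged (start, end) column ranges, 0-based and end-exclusive,
    whose mean column entropy is <= max_entropy.

    Columns with more than max_gap_frac gaps get infinite entropy, so any
    window containing them is rejected. Thresholds are assumptions to tune.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        gap_frac = sum(c in GAP_CHARS for c in column) / len(column)
        scores.append(column_entropy(column) if gap_frac <= max_gap_frac else float("inf"))
    regions = []
    for start in range(length - window + 1):
        if sum(scores[start:start + window]) / window <= max_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend an overlapping region
            else:
                regions.append((start, end))
    return regions
```

Because the score is a windowed mean, a returned region can still contain a few variable columns, so it is worth inspecting candidates in the alignment before using them as search queries.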
Would hmmscan be an adequate tool for step 3, searching the HMMs against the predicted genes?
It depends on what exactly you wish to do. This is how I understand it:
- hmmsearch searches one or more HMMs against a database of sequences;
- hmmscan searches one or more sequences against a database of HMMs (the HMM file must first be prepared with hmmpress);
- hmmpfam is the older HMMER2 equivalent of hmmscan, searching sequences against an HMM database.

This post may help you decide.

How would I go about making alignments of a biosynthetic gene cluster and then building an HMM from them? Or, if you mean only specific genes inside the BGC, how should I pick which genes to use? Again, I am unsure what you mean by making alignments.
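A rough sketch of the alignment-to-HMM route for one key BGC gene, assuming HMMER3 is installed and that you have (a) a multiple alignment of that gene, for example from several of your genomes, in a format hmmbuild accepts such as Stockholm, and (b) proteins predicted from the metagenome contigs, for example with Prodigal in metagenome mode (-p meta). It builds a profile with hmmbuild, searches it against the metagenomic proteins with hmmsearch, and parses the --tblout table. The helper names, file names and E-value cutoff are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical helpers: build an HMM from one BGC gene alignment with HMMER3's
# hmmbuild, then search it against metagenome-predicted proteins with hmmsearch.


def hmmer_commands(msa, hmm_out, proteins, tblout, evalue=1e-5, cpu=4):
    """Return the hmmbuild and hmmsearch argument lists."""
    build = ["hmmbuild", str(hmm_out), str(msa)]
    search = [
        "hmmsearch",
        "--tblout", str(tblout),  # one tabular line per target sequence
        "-E", str(evalue),        # reporting threshold (assumed value)
        "--cpu", str(cpu),
        str(hmm_out),
        str(proteins),
    ]
    return build, search


def parse_tblout(text, max_evalue=1e-5):
    """Parse hmmsearch --tblout text into (target, query, evalue, score) tuples."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 whitespace-delimited columns, then a free-text description.
        fields = line.split(maxsplit=18)
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])  # full-sequence values
        if evalue <= max_evalue:
            hits.append((target, query, evalue, score))
    return hits


def search_metagenome(msa, proteins, workdir="hmmer_out", evalue=1e-5):
    """Run hmmbuild + hmmsearch (requires HMMER3 on PATH) and return parsed hits."""
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(msa).stem
    hmm, tbl = out / f"{stem}.hmm", out / f"{stem}.tbl"
    for cmd in hmmer_commands(msa, hmm, proteins, tbl, evalue=evalue):
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return parse_tblout(tbl.read_text(), max_evalue=evalue)
```

A call such as search_metagenome("key_gene.sto", "metagenome_proteins.faa") (hypothetical file names) would return the metagenomic proteins that hit that gene's profile. hmmsearch is used here rather than hmmscan because the setup is a few HMMs against many metagenomic proteins; hmmscan would also work, but it needs the HMM file pressed with hmmpress first, and its E-values are computed relative to the size of the HMM database rather than the sequence database.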