Hey all, I have a large number of metagenomic bacterial assemblies (several thousand) that I need to sift through in order to find a specific gene. I've tried diamond blastx, and I need to make a db for every assembly file that I have, which is infeasible. I was wondering if there was a way I could make a db in diamond from all the assemblies and then blast them, or a way to iterate over all assemblies and run diamond in Python. I was also wondering if HMMER3 may be a good option to use, or whether any other programs would be a good fit for what I am trying to do. Thanks!
I think you could even do this with normal BLAST using tblastx, if it is just a single query anyway. The NT database is itself simply a huge nucleotide collection, and it works fine for a few query sequences.
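As a rough illustration of the tblastx suggestion, here is a minimal Python sketch that builds the command line and parses the tabular output. The file and database names passed in are hypothetical, the e-value cutoff of 1e-5 is an arbitrary assumption, and the column names are the default fields of BLAST+ tabular output (`-outfmt 6`).

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def tblastx_command(query_fasta, db_name, out_path, evalue="1e-5"):
    """Build (but do not run) a tblastx command as an argument list.

    query_fasta, db_name and out_path are placeholders chosen by the caller;
    the e-value threshold is an assumed default, not a recommendation.
    """
    return [
        "tblastx",
        "-query", query_fasta,
        "-db", db_name,
        "-out", out_path,
        "-outfmt", "6",
        "-evalue", evalue,
    ]


def parse_tabular_hits(text):
    """Turn tab-separated hit lines into a list of dicts keyed by column name."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(dict(zip(BLAST_TAB_COLUMNS, fields)))
    return hits
```

The list returned by `tblastx_command` can be handed to `subprocess.run(cmd, check=True)` once BLAST+ is installed and the database exists. All parsed values stay as strings, so convert `evalue` or `bitscore` with `float()` before filtering on them.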
and I need to make a db for every assembly file that I have, which is infeasible.
Why infeasible? Simply concatenate all assembly FASTA files into one huge file and run makeblastdb on it, or make a BLAST database for each assembly file using makeblastdb and combine them using blastdb_aliastool. I don't see any problem with doing that. When doing this in bash by typing
cat *.fasta > big.fna, the only thing that could break is that the command line becomes too long.
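If the command line does become too long, one way around it is to do the concatenation in Python, where the file pattern is expanded inside the process rather than by the shell. The following is only a sketch: the function names and the `big.fna` and `all_assemblies` names in the usage note are made up for illustration, and it assumes uncompressed FASTA files.

```python
import glob
import os
import shutil


def concatenate_fastas(pattern, combined_path):
    """Concatenate every file matching `pattern` into `combined_path`.

    The pattern is expanded inside Python, so no long argument list is
    passed to a shell command. Returns the number of files concatenated.
    """
    combined_abs = os.path.abspath(combined_path)
    # Sort for a reproducible order; skip the output file if it matches too.
    paths = [p for p in sorted(glob.glob(pattern))
             if os.path.abspath(p) != combined_abs]
    with open(combined_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # If the file did not end with a newline, add one so the
                # next file's first header starts on its own line.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return len(paths)


def makeblastdb_command(fasta_path, db_name):
    """Build (but do not run) a makeblastdb command for a nucleotide database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]
```

For example, `concatenate_fastas("*.fasta", "big.fna")` followed by `subprocess.run(makeblastdb_command("big.fna", "all_assemblies"), check=True)` would build a single database from all assemblies, provided BLAST+ is on the PATH.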