Question

Searching through assemblies for specific gene/genecluster

4.3 years ago

anabaena ▴ 10

Hey all, I have a large number of metagenomic bacterial assemblies (several thousand) that I need to sift through in order to find a specific gene. I've tried diamond blastx, but I need to make a db for every assembly file that I have, which is infeasible. I was wondering if there is a way I could make a single db in diamond from all the assemblies and then blast against it, or a way to iterate over all assemblies and run diamond in Python. I was also wondering whether HMMER3 might be a good option, or whether any other programs would be a good fit for what I am trying to do. Thanks!

python metagenomics gene • 574 views

updated 4.3 years ago by Michael 54k • written 4.3 years ago by anabaena ▴ 10

score 2 · Accepted Answer · 2020-01-19

I think you could even do this with regular BLAST using tblastx, if it is just a single query anyway. The NT database is likewise just a huge nucleotide collection, and searching it works fine for a few query sequences.

"and I need to make a db for every assembly file that I have, which is infeasible."

Why infeasible? Simply concatenate all assembly FASTA files into one large file and run makeblastdb on it, or build a BLAST database for each assembly file with makeblastdb and combine them using blastdb_aliastool. I don't see any problem with doing that. If you concatenate in bash by typing cat *.fasta > big.fna, the only thing that could break is that the expanded command line becomes too long when there are several thousand files.
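Since the question asked about Python, here is a minimal sketch of the concatenation route that sidesteps the command-line length limit, because the file list never passes through the shell. The function names, the `*.fasta` pattern, and the database name `all_assemblies` are placeholders chosen for illustration, not anything from the thread. The two helper functions only build the BLAST+ command lines (`makeblastdb` with `-in`, `-dbtype nucl`, `-out`, and `tblastx` with tabular `-outfmt 6`), so nothing is executed until you pass them to `subprocess.run` yourself.

```python
from pathlib import Path


def concatenate_fastas(assembly_dir, combined_path, pattern="*.fasta"):
    """Append every file matching `pattern` in `assembly_dir` to one FASTA.

    Returns the number of files that were concatenated.
    """
    combined_path = Path(combined_path)
    # Build the file list before opening the output, and skip the output
    # itself in case it lives in the same directory and matches the pattern.
    files = sorted(
        p for p in Path(assembly_dir).glob(pattern)
        if p.resolve() != combined_path.resolve()
    )
    with open(combined_path, "wb") as out:
        for fasta in files:
            last_byte = b"\n"
            with open(fasta, "rb") as handle:
                # Copy in 1 MiB chunks so large assemblies are not held in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            # A file without a trailing newline would glue the next header
            # onto its last sequence line, so add one if needed.
            if last_byte != b"\n":
                out.write(b"\n")
    return len(files)


def makeblastdb_cmd(fasta, db_name):
    """Command line for building one nucleotide BLAST database."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "nucl",
            "-out", str(db_name)]


def tblastx_cmd(query, db_name, out_tsv):
    """Command line for a tblastx search with tabular output (-outfmt 6)."""
    return ["tblastx", "-query", str(query), "-db", str(db_name),
            "-outfmt", "6", "-out", str(out_tsv)]
```

With BLAST+ on your PATH, usage would look something like `concatenate_fastas("assemblies", "big.fna")`, then `subprocess.run(makeblastdb_cmd("big.fna", "all_assemblies"), check=True)`, then `subprocess.run(tblastx_cmd("gene.fna", "all_assemblies", "hits.tsv"), check=True)`. One caveat: contig names such as `contig_1` often repeat across assemblies, so you may want to make the sequence IDs unique (for example by prefixing the assembly name) before concatenating, otherwise hits cannot be traced back to their source file.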