Question: Searching through assemblies for a specific gene/gene cluster
anabaena wrote (9 months ago):

Hey all, I have a large number of metagenomic bacterial assemblies (several thousand) that I need to sift through to find a specific gene. I've tried DIAMOND blastx, but I would need to make a database for every assembly file I have, which is infeasible. Is there a way to make a single DIAMOND database from all the assemblies and search that, or a way to iterate over all the assemblies and run DIAMOND from Python? I was also wondering whether HMMER3 might be a good option, or whether any other programs would be a good fit for what I am trying to do. Thanks!
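One way to avoid a DIAMOND database per assembly is to swap the roles: build a single protein database from the gene of interest and search each assembly against it as a nucleotide blastx query. The sketch below is not from the thread; it assumes `diamond` is on the PATH, and the file names `gene.faa`, `assemblies/`, `hits/` and the database name `target_gene` are hypothetical. It only builds and prints the command lines unless `dry_run=False` is passed.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def diamond_commands(query_protein: str, assemblies: Iterable[str],
                     out_dir: str, db_name: str = "target_gene") -> List[List[str]]:
    """Build (but do not run) the DIAMOND command lines.

    The protein sequence(s) of the gene of interest become the database;
    each assembly is then searched as a nucleotide query with blastx.
    """
    commands = [["diamond", "makedb", "--in", query_protein, "-d", db_name]]
    for assembly in assemblies:
        # One tabular result file per assembly, named after the assembly
        out_file = Path(out_dir) / f"{Path(assembly).stem}.tsv"
        commands.append(["diamond", "blastx", "-d", f"{db_name}.dmnd",
                         "-q", str(assembly), "-o", str(out_file)])
    return commands


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print every command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical layout: assemblies in ./assemblies, results in ./hits
    assemblies = sorted(str(p) for p in Path("assemblies").glob("*.fasta"))
    run_all(diamond_commands("gene.faa", assemblies, "hits"), dry_run=True)
```

The output directory (`hits/` here) has to exist before running with `dry_run=False`.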

Michael Dondrup (Bergen, Norway) wrote (9 months ago):

I think you could even do this with plain BLAST using tblastx (or tblastn, if your query is a protein sequence), at least if it is just a single query. The NT database is also simply a huge nucleotide collection, and searching it works fine for a few query sequences.
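The choice between the two translated searches can be sketched as a small command builder. This assumes BLAST+ is installed and that a nucleotide database of the assemblies already exists; the database name `all_assemblies`, the file names and the e-value cutoff of 1e-10 are hypothetical, illustrative values rather than recommendations from the thread.

```python
def blast_search_command(query: str, db: str, out: str,
                         query_is_protein: bool,
                         evalue: float = 1e-10) -> list:
    """Argument list for a translated BLAST+ search against a nucleotide db.

    tblastn: protein query vs. the database translated in six frames.
    tblastx: nucleotide query and database, both translated in six frames.
    """
    program = "tblastn" if query_is_protein else "tblastx"
    # -outfmt 6 gives one tab-separated line per hit
    return [program, "-query", query, "-db", db, "-out", out,
            "-outfmt", "6", "-evalue", str(evalue)]
```

The returned list can be passed directly to `subprocess.run(..., check=True)`.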

and i need to make a db for every assembly file that I have, which is unfeasable.

Why infeasible? Simply concatenate all the assembly FASTA files into one big file and run makeblastdb on it, or make a BLAST database for each assembly file with makeblastdb and combine them with blastdb_aliastool. I don't see any problem with doing that. If you do this in bash with cat *.fasta > big.fna, the only thing that could break is that the command line becomes too long, because the shell expands *.fasta into one argument per file.
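A minimal sketch of the concatenation route in Python, assuming the assemblies are plain-text FASTA files. Python's `glob` expands the file list inside the process, so no over-long command line is ever built. Each header is also prefixed with its file name (the `__` separator is an arbitrary choice) so that a hit in the combined database can be traced back to its assembly through the subject title. `makeblastdb_command` only builds the argument list; the names `big.fna` and `all_assemblies` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator


def prefix_headers(lines: Iterable[str], sample: str,
                   sep: str = "__") -> Iterator[str]:
    """Yield FASTA lines, prefixing every header with the sample name."""
    for line in lines:
        if line.startswith(">"):
            # ">contig_1 len=500" -> ">sampleA__contig_1 len=500"
            yield f">{sample}{sep}{line[1:]}"
        else:
            yield line


def concatenate_assemblies(assemblies: Iterable[str], combined: str) -> int:
    """Stream every assembly into one FASTA file; return the header count."""
    n_headers = 0
    with open(combined, "w") as out:
        for path in assemblies:
            last = "\n"
            with open(path) as fasta:
                for line in prefix_headers(fasta, Path(path).stem):
                    if line.startswith(">"):
                        n_headers += 1
                    out.write(line)
                    last = line
            # Guard against a file without a trailing newline, which would
            # otherwise glue the next file's header onto its last line.
            if not last.endswith("\n"):
                out.write("\n")
    return n_headers


def makeblastdb_command(combined: str, db_name: str) -> list:
    """Argument list for building one nucleotide BLAST database."""
    return ["makeblastdb", "-in", combined, "-dbtype", "nucl", "-out", db_name]
```

For example, `concatenate_assemblies(sorted(glob.glob("assemblies/*.fasta")), "big.fna")` followed by `subprocess.run(makeblastdb_command("big.fna", "all_assemblies"), check=True)`.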

Powered by Biostar version 2.3.0