Searching through assemblies for a specific gene/gene cluster
4.3 years ago
anabaena

Hey all, I have a large number of metagenomic bacterial assemblies (several thousand) that I need to sift through in order to find a specific gene. I've tried DIAMOND blastx, but I would need to make a database for every assembly file I have, which is infeasible. Is there a way to make a single DIAMOND database from all the assemblies and search that, or a way to iterate over all assemblies and run DIAMOND from Python? Would HMMER3 also be a good option, or are there other programs that would be a good fit for what I am trying to do? Thanks!
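A minimal sketch of the per-assembly loop asked about here, assuming the target gene is available as a protein FASTA and the assemblies are nucleotide FASTA files ending in `.fasta` (all file and database names below are hypothetical). With `diamond blastx` each assembly is a translated *query* searched against a protein database, so only one database, built from the target protein, is needed:

```python
import subprocess
from pathlib import Path


def diamond_commands(gene_faa, assembly_dir, out_dir,
                     db_name="target_gene", pattern="*.fasta"):
    """Yield DIAMOND commands: one makedb for the target protein, then one
    blastx run per assembly. The assemblies are blastx queries, so only
    the single target-protein database is ever built."""
    # "target_gene" is a hypothetical name; DIAMOND writes target_gene.dmnd
    yield ["diamond", "makedb", "--in", str(gene_faa), "-d", db_name]
    for asm in sorted(Path(assembly_dir).glob(pattern)):
        out_tsv = Path(out_dir) / f"{asm.stem}.tsv"
        # DIAMOND writes BLAST tabular (format 6) output by default
        yield ["diamond", "blastx", "-d", db_name,
               "-q", str(asm), "-o", str(out_tsv)]


def run_all(gene_faa, assembly_dir, out_dir):
    """Execute the commands one by one; stops on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in diamond_commands(gene_faa, assembly_dir, out_dir):
        subprocess.run(cmd, check=True)
```

For example, `run_all("target_gene.faa", "assemblies", "diamond_hits")` would leave one hit table per assembly in `diamond_hits/`. Building the commands separately from running them makes it easy to print or inspect them first.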

python metagenomics gene
4.3 years ago
Michael

If it is just a single query anyway, I think you could even do this with normal BLAST: tblastn if your query is a protein sequence, or tblastx if it is a nucleotide sequence. The nt database is also simply a huge nucleotide collection, and it works fine for a few query sequences.
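If the query gene is available as a protein sequence, tblastn (protein query against a nucleotide database translated in all six frames) is the matching BLAST+ program; for a nucleotide query, tblastx can be swapped in. A minimal sketch of running it from Python, assuming a nucleotide BLAST database named `all_assemblies` (hypothetical) has already been built from the assemblies; the e-value cutoff and thread count are arbitrary placeholders:

```python
import shutil
import subprocess


def tblastn_command(query_faa, db, out_tsv, evalue=1e-5, threads=4):
    """Build a BLAST+ tblastn command: protein query vs. a nucleotide DB
    translated in all six frames."""
    return [
        "tblastn",
        "-query", str(query_faa),
        "-db", db,
        "-out", str(out_tsv),
        "-outfmt", "6",            # 12-column tabular output
        "-evalue", str(evalue),
        "-num_threads", str(threads),
    ]


def run_search(query_faa, db, out_tsv):
    """Run tblastn; requires BLAST+ on PATH."""
    if shutil.which("tblastn") is None:
        raise FileNotFoundError("tblastn not found on PATH (install BLAST+)")
    subprocess.run(tblastn_command(query_faa, db, out_tsv), check=True)


def parse_outfmt6(lines):
    """Parse default -outfmt 6 lines: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            "pident": float(fields[2]),
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        })
    return hits
```

Usage would be along the lines of `run_search("target_gene.faa", "all_assemblies", "hits.tsv")` followed by `parse_outfmt6(open("hits.tsv"))`; the `subject` field tells you which contig carries the hit.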

and I need to make a db for every assembly file that I have, which is infeasible.

Why infeasible? Simply concatenate all assembly FASTA files into one large file and run makeblastdb on it, or make a BLAST database for each assembly file with makeblastdb and combine them with blastdb_aliastool. I don't see any problem with doing that. When doing this in bash with cat *.fasta > big.fna, the only thing that could break is the command line itself: with several thousand files, the expanded glob can exceed the shell's argument-length limit ("Argument list too long").
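A minimal sketch of the concatenation route done in Python, which sidesteps the shell's argument-length limit because files are opened one at a time. It assumes assemblies end in `.fasta` and prefixes every contig ID with its file name, since assemblers often reuse IDs like `NODE_1` across samples and duplicate IDs would clash under `-parse_seqids`. The `__` separator and the database name are hypothetical choices; check that the resulting IDs are accepted by your makeblastdb version. The per-assembly makeblastdb plus blastdb_aliastool route would avoid the large intermediate file but is not shown here.

```python
from pathlib import Path

SEP = "__"  # hypothetical separator between assembly name and contig ID


def concatenate_assemblies(assembly_dir, out_fasta, pattern="*.fasta"):
    """Concatenate all assemblies into one FASTA without a shell glob,
    prefixing each header with the assembly's file stem.
    Write out_fasta with an extension that does not match `pattern`
    (e.g. big.fna), or it may be picked up as an input."""
    n_files = 0
    with open(out_fasta, "w") as out:
        for path in sorted(Path(assembly_dir).glob(pattern)):
            n_files += 1
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # ">NODE_1 len=500" -> ">sampleA__NODE_1 len=500"
                        line = f">{path.stem}{SEP}{line[1:]}"
                    out.write(line)
                    last = line
            if not last.endswith("\n"):
                out.write("\n")  # keep the next file's header on its own line
    return n_files


def makeblastdb_command(in_fasta, db_name):
    """Build the makeblastdb call for the concatenated nucleotide FASTA."""
    return ["makeblastdb", "-in", str(in_fasta), "-dbtype", "nucl",
            "-parse_seqids", "-out", db_name]


def assembly_of(sseqid):
    """Recover the assembly name from a BLAST subject ID (Python 3.9+)."""
    return sseqid.removeprefix("lcl|").split(SEP, 1)[0]
```

After `concatenate_assemblies("assemblies", "big.fna")`, the database can be built with `subprocess.run(makeblastdb_command("big.fna", "all_assemblies"), check=True)`, and `assembly_of()` maps each hit back to the assembly it came from.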

