Blast and sort contigs by species
8.1 years ago
spirowol • 0

Hello there, I am trying to BLAST and sort thousands of contigs generated from my assemblies. The problem is that my target contigs belong to a bacterium, and the DNA I used for sequencing wasn't pure; instead I have a mixture of contigs from at least two different species, and I'd like to separate them by species once they are identified by BLAST. Has anybody done this before? Thanks

Assembly next-gen blast • 2.9k views
8.1 years ago
pld 5.1k

You can set BLAST to output taxonomic IDs in hits and filter based on that.

If the contaminating species have sequenced genomes, you may want to filter reads mapping to those genomes out before running the assembly over again.


Yes, I already filtered my reads to discard those from unwanted organisms, but since most of the DNA belongs to a large eukaryotic organism (not sequenced yet), I still see host DNA and other bacterial contaminants (whose reads I also filtered before). Does the taxonomic output return the FASTA sequences or just the BLAST ID results?


Using tabular format (6) or a few others, you can set BLAST to output the taxonomic IDs along with the standard fields (query id, subject id, etc). See the BLAST documentation for more detail.
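
As a rough sketch of what that looks like, the snippet below only builds the command line; it does not run BLAST. It assumes BLAST+ `blastn`, a database that carries taxonomy information, and the `staxids` format specifier appended to the standard columns. The file names, the `nt` database and the e-value cutoff are placeholders, and `blastn_command` is a hypothetical helper name.

```python
import shlex

# "std" is the 12 standard tabular columns; "staxids" appends the subject taxonomy IDs.
OUTFMT = "6 std staxids"


def blastn_command(query_fasta, database, out_path, evalue="1e-10"):
    """Return the blastn argument list (all paths and the cutoff are placeholders)."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-outfmt", OUTFMT,
        "-evalue", evalue,
        "-out", out_path,
    ]


cmd = blastn_command("contigs.fasta", "nt", "contigs_vs_nt.tsv")
print(shlex.join(cmd))

# To actually run it (needs BLAST+ installed and a database with taxonomy info):
# import subprocess
# subprocess.run(cmd, check=True)
```

With this format each hit line has 13 tab-separated columns, the last one being the taxid (or several taxids separated by `;`).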

If you want the full subject sequences, it would be fairly trivial to extract them from the database searched using blastdbcmd and a list of sequence IDs from your results.
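
A minimal sketch of that extraction, assuming tabular results where the subject ID is the second column: collect the unique subject IDs, write them one per line, and hand the list to `blastdbcmd -entry_batch`. The helper names and file names are made up for illustration, and the command is only built here, not executed.

```python
def subject_ids(blast_lines):
    """Collect unique subject IDs (column 2 of tabular output) in first-seen order."""
    seen = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            seen.setdefault(fields[1], None)
    return list(seen)


def blastdbcmd_fetch_command(database, id_list_path, out_fasta):
    """Return a blastdbcmd argument list that dumps the listed entries as FASTA."""
    return [
        "blastdbcmd",
        "-db", database,
        "-entry_batch", id_list_path,  # text file with one sequence ID per line
        "-out", out_fasta,
    ]


# Toy rows (only the first two columns matter here).
rows = ["contig1\tsubjA\t99.0", "contig2\tsubjB\t98.0", "contig3\tsubjA\t97.5"]
ids = subject_ids(rows)
print("\n".join(ids))  # this is what would go into the ID list file
print(blastdbcmd_fetch_command("nt", "subject_ids.txt", "subjects.fasta"))
```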


I used -outfmt 6 and I get a list of my contigs that actually hit the desired bacterium, with all the IDs. But I want to recover my BLASTed contigs (the queries), not the subject sequences. The objective is to create two datasets of contigs: one with the contaminant sequences and the other made only of contigs that belong to the target bacterium. The contigs belonging to the target bacterium will be used later for scaffolding and genome finishing.


Taxids are not output by default; you'll need to add them to the output. Run BLAST, then split the results by taxid into those matching contaminating species and those not matching contaminants. After that, use the query IDs in those files to filter your contigs accordingly.
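
One way to sketch those steps in plain Python, under a few assumptions: the BLAST table has the 12 standard columns plus `staxids` as a 13th, each contig is assigned by its single highest-bitscore hit (a simplification, not the only sensible rule), and contig IDs are the FASTA header up to the first whitespace. The function names are hypothetical, and the taxids in the toy data (`562`, `9606`) are placeholders for your own target and contaminant species.

```python
import io


def best_hit_taxids(blast_lines):
    """Map each query ID to the taxid set of its highest-bitscore hit."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # skip lines that lack the staxids column
        query, bitscore = fields[0], float(fields[11])
        taxids = set(fields[12].split(";"))  # multiple taxids are ';'-separated
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, taxids)
    return {query: taxids for query, (_, taxids) in best.items()}


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_contigs(fasta_handle, query_taxids, target_taxids):
    """Sort contigs into 'target', 'other' (best hit elsewhere) and 'no_hit'."""
    groups = {"target": [], "other": [], "no_hit": []}
    for header, seq in read_fasta(fasta_handle):
        words = header.split()
        contig_id = words[0] if words else ""
        taxids = query_taxids.get(contig_id)
        if taxids is None:
            key = "no_hit"
        elif taxids & target_taxids:
            key = "target"
        else:
            key = "other"
        groups[key].append((header, seq))
    return groups


# Toy data standing in for real BLAST output and a real contig file.
blast_rows = [
    "contig1\tsubjA\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t900\t562",
    "contig1\tsubjB\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-50\t200\t9606",
    "contig2\tsubjB\t98.0\t400\t8\t0\t1\t400\t1\t400\t0.0\t700\t9606",
]
fasta = io.StringIO(">contig1 len=8\nACGTACGT\n>contig2\nGGCC\nTTAA\n>contig3\nAAAA\n")

groups = split_contigs(fasta, best_hit_taxids(blast_rows), target_taxids={"562"})
for name, records in groups.items():
    print(name, [header.split()[0] for header, _ in records])
```

Contigs without any hit are kept in their own group rather than being lumped in with the contaminants, since a missing hit does not say which species they came from. Writing each group back out as FASTA is then a matter of printing `>` plus the header and the sequence.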

Another option, again assuming your BLAST database stores the taxonomic IDs, would be to use blastdbcmd to extract the taxids for your hits, then map those taxids to your contigs via that file and filter accordingly. This would avoid having to run BLAST again if you've already run it and didn't collect taxids in your results.
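
A hedged sketch of that lookup: ask `blastdbcmd` for accession and taxid (`%a` and `%T` in its `-outfmt` string) for each subject ID, read the result into a table, and attach a taxid to every existing hit. The helper names and file names are hypothetical, the command is only built and not run, and the subject IDs in your BLAST table may need normalising (for example stripping database prefixes) before they match the accessions that `blastdbcmd` prints.

```python
def taxid_lookup_command(database, id_list_path):
    """Return a blastdbcmd argument list printing 'accession taxid' per entry."""
    return [
        "blastdbcmd",
        "-db", database,
        "-entry_batch", id_list_path,  # one subject ID per line
        "-outfmt", "%a %T",            # %a = accession, %T = taxid
    ]


def parse_taxid_table(lines):
    """Turn 'accession taxid' lines into a dict; malformed lines are skipped."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def annotate_hits(blast_lines, subject_to_taxid):
    """Yield (query_id, subject_id, taxid or None) for each tabular hit."""
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            yield fields[0], fields[1], subject_to_taxid.get(fields[1])


# Toy stand-ins for the blastdbcmd output and an existing 12-column BLAST table.
lookup_output = ["subjA 562", "subjB 9606"]
old_hits = ["contig1\tsubjA\t99.0", "contig2\tsubjB\t98.0", "contig3\tsubjC\t95.0"]

annotated = list(annotate_hits(old_hits, parse_taxid_table(lookup_output)))
for query, subject, taxid in annotated:
    print(query, subject, taxid)
```

Hits whose subject is missing from the lookup table come back with `None`, which makes ID mismatches easy to spot before filtering the contigs.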
