How to get the FASTA source file from BLAST CSV output?
12.2 years ago

I am currently writing a library that uses the -outfmt 10 option of BLAST, which gives you CSV output instead of the human-readable report.

For example:

tblastn -db dmel_a -query somequery.faa -outfmt 10
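A minimal sketch of consuming that output in Python, assuming the default -outfmt 10 column set (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore; recent BLAST+ versions call the first two columns qaccver and saccver). The helper name parse_outfmt10 is hypothetical, and the plain CSV split assumes no field contains a comma, which holds for the default columns but not necessarily for custom ones such as stitle.

```python
import csv
import io

# Assumed default column order for -outfmt 6/7/10; a custom column list changes this.
DEFAULT_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt10(text, fields=DEFAULT_FIELDS):
    """Hypothetical helper: map each -outfmt 10 line to a dict keyed by column name.

    Values are left as strings; convert pident, evalue, etc. as needed.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        rows.append(dict(zip(fields, record)))
    return rows


if __name__ == "__main__":
    sample = "q1,FBgn0000001,98.5,120,2,0,1,120,300,659,1e-50,200\n"
    for hit in parse_outfmt10(sample):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```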

The problem is that I want to access the database's source file so I can extract some sequences after processing. The only way I know to do this is to remove -outfmt 10, run BLAST a second time, and parse the human-readable output for the line that says:

Database: Source.fas

This only works if the user did not set the -title option in makeblastdb. The stitle column of -outfmt 10 only gives the FASTA header line. I also cannot just look for the database name plus .fna, .fas or .faa, because the database can be named differently from its source file.

Is there another way to get the FASTA source file from the BLAST database name? I don't see one in the list of -outfmt options. Or am I blind today?

Tags: fasta, blast, blast-plus
12.2 years ago
Michael 56k

I'm not sure exactly what you want to do, but blastdbcmd from BLAST+ has the -entry and -entry_batch options to extract sequences directly from the database. The default output format is FASTA and can be changed with -outfmt. If you only want the sequences that have hits, you can extract the subject ID column from your BLAST output with the Unix cut command and pass that file to -entry_batch (one identifier per line).
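The cut plus -entry_batch approach can be sketched in Python as follows, assuming the subject ID is the second CSV column (true for the default -outfmt 10 columns) and that the database was built so that IDs can be looked up, which generally means makeblastdb was run with -parse_seqids. The function names are hypothetical; only the blastdbcmd flags -db, -entry_batch and -out come from BLAST+. This retrieves whole subject sequences, not just the aligned regions.

```python
import csv
import io
import subprocess


def unique_subject_ids(text, column=1):
    """Collect IDs from one CSV column (second by default), first-seen order, no duplicates."""
    seen = {}
    for record in csv.reader(io.StringIO(text)):
        if len(record) > column:
            seen.setdefault(record[column], None)
    return list(seen)


def write_batch(ids, handle):
    """Write one identifier per line, the layout -entry_batch expects."""
    for seq_id in ids:
        handle.write(seq_id + "\n")


def blastdbcmd_command(db, batch_path, out_path):
    """Build (but do not run) a blastdbcmd call that dumps the listed entries as FASTA."""
    return ["blastdbcmd", "-db", db, "-entry_batch", batch_path, "-out", out_path]


def fetch_hit_sequences(csv_path, db, batch_path="ids.txt", out_path="hits.fasta"):
    """Hypothetical end-to-end helper; needs BLAST+ on PATH when called."""
    with open(csv_path) as fh:
        ids = unique_subject_ids(fh.read())
    with open(batch_path, "w") as fh:
        write_batch(ids, fh)
    subprocess.run(blastdbcmd_command(db, batch_path, out_path), check=True)
```

For the example in the question, fetch_hit_sequences("hits.csv", "dmel_a") would write the hit subjects to hits.fasta, where hits.csv is a hypothetical file holding the -outfmt 10 output. Deduplicating first keeps each subject from being fetched once per HSP.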

12.2 years ago

Take a look at the blastdbcmd tool (part of the blast+ suite).
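If what you need is the same "Database:" line you were scraping from the human-readable report, blastdbcmd -db <name> -info prints the database metadata without re-running the search. A minimal sketch, assuming the title appears on a line of the form "Database: <title>" in that output. Note the title is whatever makeblastdb recorded (by default the input file name, unless -title was given), so this has the same limitation described in the question. The function names are hypothetical.

```python
import subprocess


def parse_db_title(info_text):
    """Return the text after 'Database:' in blastdbcmd -info output, or None if absent."""
    for line in info_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Database:"):
            return stripped[len("Database:"):].strip()
    return None


def blastdb_title(db):
    """Run blastdbcmd -info on a database and return its recorded title (needs BLAST+ on PATH)."""
    result = subprocess.run(
        ["blastdbcmd", "-db", db, "-info"],
        capture_output=True, text=True, check=True,
    )
    return parse_db_title(result.stdout)
```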
