full sequences from command line tblastn
7.2 years ago
peachila • 0

Hello,

I am a new user of command-line BLAST. I am using a protein query to search a DNA database that I created with makeblastdb. The results look appropriate, but I cannot work out how to get a FASTA file with the complete sequences of the hits.

To be clear, I want one file with information such as the e-value and score in tabular format (which I can already get) and, in addition, a FASTA file with the complete sequences of the hit accession numbers. If possible, I'd like the translated sequences, in amino acids rather than DNA.

My command looks like this: tblastn -query query.fasta -db blastdatabase -outfmt 6 -num_threads 3 -max_target_seqs 2000 -out tblastn_DB.tab

I know it's a simple question but I have not been able to solve it looking in the NCBI BLAST command line cookbook.

Thank you very much!

blast tblastn command-line blast output format • 5.7k views

I like @cschu1981's answer. Translation is harder: unless each sequence is a single ORF, you won't know which frame to translate in. You could look into EMBOSS transeq for translating your sequences. Did you get your DNA database from a public source? If so, there may already be a protein file that you can cross-reference with your database IDs.
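
One way around the frame problem is to let tblastn report the translated hit itself. The sketch below assumes a custom tabular format with the `sseqid`, `sstart`, `send`, `sframe` and `sseq` fields, where `sseq` is the aligned part of the subject as translated by tblastn. The function name `tab_to_fasta` and the file names `tblastn_DB_seq.tab` and `hits_aa.fa` are made up for illustration. Note that this yields only the aligned region of each hit in amino acids, not the complete subject sequence.

```shell
# Convert custom tabular BLAST output to FASTA.
# Expected columns: sseqid sstart send sframe sseq
tab_to_fasta() {
    awk -F '\t' '{
        seq = $5
        gsub(/-/, "", seq)   # drop alignment gap characters
        printf ">%s_%s-%s_frame%s\n%s\n", $1, $2, $3, $4, seq
    }' "$1"
}

# Example (assumes BLAST+ is installed; file names are placeholders):
# tblastn -query query.fasta -db blastdatabase \
#     -outfmt "6 sseqid sstart send sframe sseq" -out tblastn_DB_seq.tab
# tab_to_fasta tblastn_DB_seq.tab > hits_aa.fa
```

Each FASTA header records the subject ID, the hit coordinates and the frame, so a hit can be traced back to its position in the DNA sequence.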

7.2 years ago
cschu181 ★ 2.8k

If you created your blast database with the -parse_seqids option, then it should be quite easy.

# Take the unique subject IDs (column 2) so each sequence is fetched only once
for id in $(cut -f 2 tblastn_DB.tab | sort -u); do
    blastdbcmd -entry "$id" -db blastdatabase >> results.fa
done

This assumes -outfmt 6 without custom fields (i.e. subject id is in field 2) as you state in your question.
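
For a large number of hits, a variant of the loop above can fetch everything in a single blastdbcmd call through its `-entry_batch` option. This is only a sketch under the same assumptions (default `-outfmt 6`, database built with `-parse_seqids`). The function names and the `.ids` helper file are hypothetical.

```shell
# Print the unique subject IDs (column 2 of default -outfmt 6).
unique_subject_ids() {
    cut -f 2 "$1" | sort -u
}

# Fetch all listed sequences with one blastdbcmd call.
# Arguments: tabular BLAST output, database name, output FASTA file.
# Requires BLAST+ and a database built with -parse_seqids.
fetch_full_sequences() {
    unique_subject_ids "$1" > "$3.ids"
    blastdbcmd -db "$2" -entry_batch "$3.ids" -out "$3"
}

# Example:
# fetch_full_sequences tblastn_DB.tab blastdatabase results.fa
```

Because the output file is written with `-out` rather than appended with `>>`, re-running the command replaces results.fa instead of growing it.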
