Question

Is there a format specifier in BLAST+ that returns the full length of the FASTA (subject) sequences in the output, along with other specifiers?

8.4 years ago

Neha shri ▴ 30

Hi,

I am using BLAST+ and have a database of thousands of protein sequences. Could you please tell me how to get the entire length of every sequence in the output file, along with other specifiers? I tried sseqid, but it gives only the IDs, not the entire sequence.

sequence blast alignment • 2.1k views

updated 20 months ago by Ram 43k • written 8.4 years ago by Neha shri ▴ 30

Ram · Answer 1 · 2015-12-02

8.4 years ago

dschika ▴ 320

What about sseq (the aligned part of the subject sequence)? Have you checked the "Formatting options" section of the BLAST help?

Thank you so much :) exactly what I was looking for.

updated 4.4 years ago by Ram 43k • written 8.4 years ago by Neha shri ▴ 30

Ram · Answer 2 · 2015-12-02

I'm not sure I follow exactly what it is you want to achieve...

Do you want to extract the sequences in FASTA format, or do you just want the lengths of the sequences?

If you want the FASTA sequences, my recommendation would be to parse the output file and write the IDs of the sequences you want to extract to a new file. Then you can use blastdbcmd with the ID list as input to extract all the sequences.
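A rough sketch of that two-step approach follows. The file names, the database name `mydb`, and the sample rows are all made up for illustration, and it assumes tabular output (-outfmt 6 or 7 with the default columns) where the subject ID is in column 2. Depending on the BLAST+ version, looking sequences up by ID may also need the database to have been built with makeblastdb -parse_seqids.

```shell
#!/bin/sh
# Hypothetical names; replace with your own files and database.
hits=hits.tsv            # tabular BLAST output, subject ID assumed in column 2
ids=subject_ids.txt      # one subject ID per line
db=mydb                  # BLAST database the search was run against

# Made-up stand-in for real BLAST output, so the ID step can be tried on its own.
printf '# comment line as written by -outfmt 7\nq1\tsp1\t98.5\nq1\tsp2\t71.0\nq2\tsp1\t64.2\n' > "$hits"

# Skip comment lines, keep the subject ID column, and drop duplicates.
grep -v '^#' "$hits" | cut -f 2 | sort -u > "$ids"

# Fetch the full-length sequences for those IDs as FASTA.
# Only attempted when blastdbcmd is on the PATH.
if command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -db "$db" -entry_batch "$ids" -out full_length_hits.fasta \
        || echo "blastdbcmd failed (does the database '$db' exist?)" >&2
fi
```

If your -outfmt string puts the subject ID in a different column, change the `cut -f 2` accordingly.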

If you only want the lengths of the sequences displayed, you can use the -outfmt flag to format your output so that it includes the information you want.

Example:

blastp -db <database> -query <input sequences> -out <output file> -outfmt '7 stitle sacc bitscore pident qlen slen'

slen (subject length) lists the full length of each subject sequence that your query has hit.
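Once slen is in the table, it is easy to work with downstream. Here is a small sketch that assumes the search was run with -outfmt '6 qseqid sseqid pident length slen' (that column order is my choice, not something from the question) and uses made-up rows in place of real results. It prints each subject ID, its full length, and roughly what fraction of it the alignment covers.

```shell
#!/bin/sh
# Assumed search command (placeholders as in the example above):
#   blastp -db <database> -query <input sequences> \
#          -outfmt '6 qseqid sseqid pident length slen' -out hits_with_slen.tsv

# Made-up rows in that column order, standing in for real output.
printf 'q1\tsp1\t98.5\t200\t400\nq1\tsp2\t71.0\t90\t120\n' > hits_with_slen.tsv

# Column 2 = subject ID, column 4 = alignment length, column 5 = full subject length.
# The ratio is only approximate, because alignment length also counts gap columns.
awk -F '\t' '{ printf "%s\t%d\t%.2f\n", $2, $5, $4 / $5 }' hits_with_slen.tsv > subject_lengths.tsv

cat subject_lengths.tsv
```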

Ram · Answer 3 · 2015-12-02

8.4 years ago

5heikki 11k

blastp -help

updated 4.4 years ago by Ram 43k • written 8.4 years ago by 5heikki 11k