Is there a format specifier in BLAST+ that outputs the full-length FASTA sequences (subject) along with other specifiers?
8.4 years ago
Neha shri ▴ 30

Hi,

I am using BLAST+ and have a database of thousands of protein sequences. Could you please tell me how to get the entire sequence of every subject in the output file, along with other specifiers? I tried sseqid, but it outputs only the IDs, not the entire sequence.

sequence blast alignment • 2.1k views
8.4 years ago
dschika ▴ 320

What about sseq (the aligned part of the subject sequence)? Have you checked the "Formatting options" section in the BLAST help?


Thank you so much :) That is exactly what I was looking for.

8.4 years ago
Jenez ▴ 540

I'm not sure I follow exactly what it is you want to achieve...

Do you want to extract the sequences in fasta format or do you just want the lengths of the sequences?

If you want the FASTA sequences, my recommendation would be to parse the output file and write the IDs of the sequences you want to extract to a new file. Then you can use blastdbcmd with the ID list as input to extract all the sequences.
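The two steps above could be sketched roughly as follows. This assumes the BLAST run was saved in the default tabular form (-outfmt 6, where column 2 is the subject ID) and that the database was built with -parse_seqids so that blastdbcmd can look entries up by ID. The file names blast_hits.tsv, ids.txt and hits.fasta, the BLAST_DB variable, and the three sample rows are placeholders for illustration only, not part of the original setup.

```shell
# Made-up sample of default tabular (-outfmt 6) output; use your own file instead.
# Columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
printf 'q1\tsubjB\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n' >  blast_hits.tsv
printf 'q1\tsubjA\t75.0\t180\t45\t2\t5\t184\t10\t189\t1e-60\t210\n' >> blast_hits.tsv
printf 'q2\tsubjB\t88.0\t150\t18\t1\t1\t150\t20\t169\t1e-80\t300\n' >> blast_hits.tsv

# Step 1: collect the unique subject IDs (column 2) into a list file.
cut -f2 blast_hits.tsv | sort -u > ids.txt

# Step 2: pull the full-length sequences for those IDs out of the database.
# BLAST_DB is a hypothetical variable holding your database name; the call is
# skipped when it is unset or when blastdbcmd is not installed.
if command -v blastdbcmd >/dev/null 2>&1 && [ -n "${BLAST_DB:-}" ]; then
    blastdbcmd -db "$BLAST_DB" -entry_batch ids.txt -out hits.fasta
fi
```

If the database was not built with -parse_seqids, the lookup by ID may fail, in which case rebuilding the database with that option is probably the simplest fix.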

If you only want to have the lengths of the sequences displayed, you could use the flag -outfmt to format your output to include the information you want.

Example:

blastp -db <database> -query <input sequences> -out <output file> -outfmt '7 stitle sacc bitscore pident qlen slen'

slen (subject length) reports the full length of each subject sequence that your query hit.
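Note that sseq only holds the aligned part of the subject (including gap characters), while slen is the length of the whole subject sequence, so the two can be compared to see whether a hit covers the full subject. A minimal sketch, assuming the search was run with -outfmt '6 sseqid slen sseq'; the file names hits_sseq.tsv and coverage.tsv and the two sample rows are made up for illustration:

```shell
# Made-up rows in the assumed column order: sseqid, slen, sseq.
printf 'subjA\t10\tMKT-LLV\n' >  hits_sseq.tsv
printf 'subjB\t5\tMKTLL\n'    >> hits_sseq.tsv

# Strip gap characters from sseq, then compare the remaining length with slen.
# Output columns: sseqid, ungapped aligned length, slen, full/partial.
awk -F'\t' -v OFS='\t' '{
    s = $3
    gsub(/-/, "", s)
    print $1, length(s), $2, (length(s) == $2 ? "full" : "partial")
}' hits_sseq.tsv > coverage.tsv
```

Hits marked "partial" are the ones where sseq alone would not give you the entire subject sequence, so extracting them from the database with blastdbcmd is the safer route.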

8.4 years ago
5heikki 11k
blastp -help