Question: How to read this kind of standalone BLAST formatted file?
3.3 years ago
Firingam

I created this file with standalone BLAST by aligning the protein-coding sequences of two isolates of the same organism, using the following command line:

blastn -query fasta1.fasta -subject fasta2.fasta -dust no -parse_deflines -evalue 1e-10 -max_target_seqs 1 -out BTOP

It returned a text file like this:

Query= Sequence_1

Length=6624
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

Sequence_5                                             1528    0.0  


>Sequence_5
Length=6645

 Score = 1528 bits (827),  Expect = 0.0
 Identities = 943/1000 (94%), Gaps = 3/1000 (0%)
 Strand=Plus/Plus

Query  5326  ACCATCCCTTTTGGTATTGCTTTCGCTTTAGGATCTATTGCTTTTTTATTTTTGAAGAAA  5385
             |||||||| ||||| || ||| | || |||   || || |  ||||||||||||||||||
Sbjct  5227  ACCATCCCCTTTGGAATAGCTATTGCGTTAACTTCGATAGTGTTTTTATTTTTGAAGAAA  5286

Query  5386  AAAACCAAATCTACTATTGATCTTTTGCGTGTTATTAATATCCCCAAAAGTGATTATGAT  5445
             |||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
Sbjct  5287  AAAACCAAATCTACTATTGATCTTTTGCGTGTCATTAATATCCCCAAAAGTGATTATGAT  5346

Query  5446  ATACCGACAAAACTTTCACCCAATAGATATATACCTTATACTAGTGGTAAATACAGAGGC  5505
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  5347  ATACCGACAAAACTTTCACCCAATAGATATATACCTTATACTAGTGGTAAATACAGAGGC  5406

Query  5506  AAACGGTACATTTACCTTGAAGGAGATAGTGGAACTGATAGTGGTTACACCGATCATTAT  5565
             ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct  5407  AAACGGTACATTTACCTTGAAGGAGATAGTGGAACAGATAGTGGTTACACCGATCATTAT  5466

Unfortunately, some of the formatting was lost when copying and pasting from the original file, but I think "|" and "-" clearly have a specific meaning. So how can I analyze this type of file? The NCBI page where I found the command line talks about traceback operations (BTOP). I have little experience with this type of file and would like to understand what format it is and how to read it. The NCBI page (https://www.ncbi.nlm.nih.gov/books/NBK279682/) also mentions SAM files, so would it be enough to read it with a function designed to parse SAM files? Thanks in advance.

PS: If needed, I can upload a partial file with masked IDs, since the data are sensitive.

BLAST • 990 views

Please clearly explain what you want to achieve or what your exact issue is.

What you see here is the default BLAST output (i.e. the pairwise alignment output). It shows the start and end coordinates of all alignable sequence parts from the sequences you provided as input. Between those coordinates the alignment itself is depicted (where | stands for a match, for instance).
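To make the coordinates concrete, here is a minimal Python sketch that reads the first Query/Sbjct block from the question. It assumes a plus/plus alignment like this one (on a minus-strand hit the subject coordinates run downwards instead) and counts identical letters by comparing the two rows directly, which is what the | marks in a blastn midline indicate.

```python
# Rows copied from the first block in the question (plus/plus alignment).
query_row = "Query  5326  ACCATCCCTTTTGGTATTGCTTTCGCTTTAGGATCTATTGCTTTTTTATTTTTGAAGAAA  5385"
sbjct_row = "Sbjct  5227  ACCATCCCCTTTGGAATAGCTATTGCGTTAACTTCGATAGTGTTTTTATTTTTGAAGAAA  5286"


def split_row(row):
    """Return (start, aligned letters, end) from a 'Query' or 'Sbjct' row."""
    _label, start, letters, end = row.split()
    return int(start), letters, int(end)


q_start, q_letters, q_end = split_row(query_row)
s_start, s_letters, s_end = split_row(sbjct_row)

# Plus strand: end = start + number of letters (gaps '-' excluded) - 1.
assert q_end == q_start + len(q_letters.replace("-", "")) - 1
assert s_end == s_start + len(s_letters.replace("-", "")) - 1

# A '|' in a blastn midline marks a column where both letters are identical.
matches = sum(q == s for q, s in zip(q_letters, s_letters))
print(f"{matches}/{len(q_letters)} identical columns in this block")  # 47/60
```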

What you are referring to is the SAM-like output that BLAST can also produce if you specifically ask for it, which, unfortunately, you did not: the command on the NCBI help page is different from the one you executed. To get that output you need to add -outfmt "6 qseqid sseqid btop", which writes one tab-separated line per alignment ending in a BTOP column. The -out BTOP you added only redirects the output to a file called BTOP.
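With -outfmt "6 qseqid sseqid btop", each line ends in a BTOP string. A minimal Python sketch for summarising one, assuming the usual BTOP convention: a number is a run of identical positions, a two-character pair is a mismatch written query letter first, and a "-" in a pair marks a gap in that sequence. The BTOP string in the example was derived by hand from the first 60-column block in the question; it illustrates the notation and is not actual BLAST output.

```python
import re

# One BTOP token: a run of digits, or exactly two non-digit characters.
TOKEN = re.compile(r"(\d+)|(\D)(\D)")


def decode_btop(btop):
    """Count identities, mismatches and gaps described by a BTOP string."""
    counts = {"identities": 0, "mismatches": 0, "gaps_in_query": 0, "gaps_in_subject": 0}
    pos = 0
    while pos < len(btop):
        m = TOKEN.match(btop, pos)
        if m is None:
            raise ValueError(f"cannot read BTOP at position {pos}: {btop[pos:]!r}")
        if m.group(1):
            counts["identities"] += int(m.group(1))  # run of identical letters
        elif m.group(2) == "-":
            counts["gaps_in_query"] += 1             # query has a gap here
        elif m.group(3) == "-":
            counts["gaps_in_subject"] += 1           # subject has a gap here
        else:
            counts["mismatches"] += 1                # query letter, then subject letter
        pos = m.end()
    return counts


# Hand-derived BTOP for the first 60 columns of the alignment in the question.
summary = decode_btop("8TC5TA2TA3TA1CT2TG3GAGCAT2TG2TA1CTTG18")
print(summary)
```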


Yes, I know about the output option. So it is SAM output that is returned. I'll try MATLAB's samread and see what comes out!


No, it is not.

What it returned, as in the result you posted, is the default BLAST output (i.e. pairwise output, -outfmt 0), NOT SAM format. To get the SAM-like output you need to add the -outfmt parameter and options as I indicated above.


From what I read at https://open.oregonstate.education/computationalbiology/chapter/command-line-blast/, the pairwise output is not parseable. Is that correct?


Well, "not parseable" is a bit strong, I would say, but parsing it is clearly difficult and perhaps not even advisable.

Especially since there are much better-suited output formats that do allow easy parsing, the tabular one being the most obvious (-outfmt 6 or 7). XML output is also reasonably parseable. And I guess the SAM-like output is parseable too (though I have never used it myself).
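To illustrate why the tabular format is the easy option, here is a minimal Python reader. It assumes the default 12 columns of -outfmt 6 (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore); if you request your own columns, for example -outfmt "6 qseqid sseqid btop", pass the same names as columns=. Lines starting with # (the comment lines that -outfmt 7 adds) are skipped, and the input line in the example is invented.

```python
import io

# Default column order of BLAST+ -outfmt 6 (also used for the hit lines of -outfmt 7).
DEFAULT_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def read_blast_tabular(handle, columns=DEFAULT_COLUMNS):
    """Yield one dict per hit line of -outfmt 6/7 text; '#' comment lines are skipped."""
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"line {line_number}: expected {len(columns)} columns, got {len(fields)}")
        row = dict(zip(columns, fields))
        # Convert the numeric columns that are present; others (e.g. btop) stay strings.
        for name in INT_COLUMNS:
            if name in row:
                row[name] = int(row[name])
        for name in FLOAT_COLUMNS:
            if name in row:
                row[name] = float(row[name])
        yield row


# Hypothetical input: one comment line plus one made-up hit line.
example = "# BLASTN example\nq1\ts1\t99.00\t100\t1\t0\t1\t100\t201\t300\t1e-45\t180\n"
hits = list(read_blast_tabular(io.StringIO(example)))
print(hits[0]["sseqid"], hits[0]["pident"], hits[0]["qstart"], hits[0]["qend"])
```

On minus-strand hits sstart is larger than send, so code that uses these coordinates should not assume start < end.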


It is not easily parsable, but it can be done. You would want to try -outfmt 6 (or 7) (LINK). If you need SAM format output, then magicblast (LINK) is what you want.
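If you do want to parse the pairwise (-outfmt 0) text, a minimal Python sketch could look like this. It assumes Query/Sbjct rows laid out as in the question (label, start, aligned letters, end) and splits the report into alignments at each "Score =" line; everything else (hit list, headers, midlines) is ignored, and real reports contain variations this does not handle.

```python
import re

# Assumed row layout, as in the question: "Query  5326  ACGT...  5385".
ROW = re.compile(r"^(Query|Sbjct)\s+(\d+)\s+(\S+)\s+(\d+)\s*$")


def split_alignments(report):
    """Split -outfmt 0 text into chunks, one per ' Score = ...' line (i.e. per HSP)."""
    return re.split(r"^[ \t]*Score[ \t]*=", report, flags=re.MULTILINE)[1:]


def parse_rows(chunk):
    """Join the Query and Sbjct rows of one alignment into start, end and aligned string."""
    rows = {"Query": [], "Sbjct": []}
    for line in chunk.splitlines():
        m = ROW.match(line)
        if m:
            label, start, letters, end = m.groups()
            rows[label].append((int(start), letters, int(end)))
    return {
        label: {"start": parts[0][0], "end": parts[-1][2],
                "aligned": "".join(letters for _, letters, _ in parts)}
        for label, parts in rows.items() if parts
    }


report = """\
>Sequence_5
Length=6645

 Score = 1528 bits (827),  Expect = 0.0
 Identities = 943/1000 (94%), Gaps = 3/1000 (0%)
 Strand=Plus/Plus

Query  5326  ACCATCCCTTTTGGTATTGCTTTCGCTTTAGGATCTATTGCTTTTTTATTTTTGAAGAAA  5385
Sbjct  5227  ACCATCCCCTTTGGAATAGCTATTGCGTTAACTTCGATAGTGTTTTTATTTTTGAAGAAA  5286

Query  5386  AAAACCAAATCTACTATTGATCTTTTGCGTGTTATTAATATCCCCAAAAGTGATTATGAT  5445
Sbjct  5287  AAAACCAAATCTACTATTGATCTTTTGCGTGTCATTAATATCCCCAAAAGTGATTATGAT  5346
"""

for alignment in (parse_rows(chunk) for chunk in split_alignments(report)):
    q, s = alignment["Query"], alignment["Sbjct"]
    print(f"query {q['start']}-{q['end']}, subject {s['start']}-{s['end']}, {len(q['aligned'])} columns")
```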


"So how can I analyze this type of file?"

What do you mean by that?


For example, the MATLAB swalign function (https://it.mathworks.com/help/bioinfo/ref/swalign.html) returns, for each alignment, the score and the alignment separately. I would like to understand what format BLAST returns so that I can choose a function or strategy to import the data into my development environment (i.e. do the reverse).
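For the pairwise text specifically, the score values that swalign gives you separately can be read from the header lines BLAST prints above each alignment. A minimal Python sketch, assuming headers formatted like the ones in the question; the handling of e-values printed without a leading digit (e.g. "e-134") is a defensive assumption, not something shown in this thread.

```python
import re

# Assumed header layout, as in the question:
#  Score = 1528 bits (827),  Expect = 0.0
#  Identities = 943/1000 (94%), Gaps = 3/1000 (0%)
SCORE = re.compile(r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect(?:\(\d+\))?\s*=\s*([^\s,]+)")
IDENTITIES = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")
GAPS = re.compile(r"Gaps\s*=\s*(\d+)/(\d+)")


def to_float(value):
    # Assumption: some reports may print "e-134" meaning 1e-134.
    return float("1" + value if value.startswith("e") else value)


def parse_hsp_header(text):
    """Return the scores of one alignment as numbers, separate from the alignment rows."""
    score = SCORE.search(text)
    identities = IDENTITIES.search(text)
    if score is None or identities is None:
        raise ValueError("no Score/Identities lines found")
    gaps = GAPS.search(text)
    n_identical, aln_length = int(identities.group(1)), int(identities.group(2))
    return {
        "bit_score": float(score.group(1)),
        "raw_score": int(score.group(2)),
        "evalue": to_float(score.group(3)),
        "identities": n_identical,
        "alignment_length": aln_length,
        "percent_identity": 100.0 * n_identical / aln_length,
        "gaps": int(gaps.group(1)) if gaps else 0,
    }


header = """\
 Score = 1528 bits (827),  Expect = 0.0
 Identities = 943/1000 (94%), Gaps = 3/1000 (0%)
 Strand=Plus/Plus
"""
print(parse_hsp_header(header))
```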
