I produced this file by aligning the protein-coding sequences of two isolates of the same organism with standalone BLAST, using the following command line:
blastn -query fasta1.fasta -subject fasta2.fasta -dust no -parse_deflines -evalue 1e-10 -max_target_seqs 1 -out BTOP
The output is a text file like this:
Query= Sequence_1
Length=6624

                                              Score     E
Sequences producing significant alignments:  (Bits)  Value
Sequence_5                                    1528    0.0

>Sequence_5
Length=6645

 Score = 1528 bits (827), Expect = 0.0
 Identities = 943/1000 (94%), Gaps = 3/1000 (0%)
 Strand=Plus/Plus

Query  5326  ACCATCCCTTTTGGTATTGCTTTCGCTTTAGGATCTATTGCTTTTTTATTTTTGAAGAAA  5385
             |||||||| ||||| || ||| | || |||   || || |  ||||||||||||||||||
Sbjct  5227  ACCATCCCCTTTGGAATAGCTATTGCGTTAACTTCGATAGTGTTTTTATTTTTGAAGAAA  5286

Query  5386  AAAACCAAATCTACTATTGATCTTTTGCGTGTTATTAATATCCCCAAAAGTGATTATGAT  5445
             |||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
Sbjct  5287  AAAACCAAATCTACTATTGATCTTTTGCGTGTCATTAATATCCCCAAAAGTGATTATGAT  5346

Query  5446  ATACCGACAAAACTTTCACCCAATAGATATATACCTTATACTAGTGGTAAATACAGAGGC  5505
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  5347  ATACCGACAAAACTTTCACCCAATAGATATATACCTTATACTAGTGGTAAATACAGAGGC  5406

Query  5506  AAACGGTACATTTACCTTGAAGGAGATAGTGGAACTGATAGTGGTTACACCGATCATTAT  5565
             ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct  5407  AAACGGTACATTTACCTTGAAGGAGATAGTGGAACAGATAGTGGTTACACCGATCATTAT  5466
The layout may be slightly off after copying and pasting from the original file, but I think it is clear that "|" marks a matching position and "-" marks a gap. So how can I analyze this type of file? The NCBI page where I found the command line talks about trace-back operations (BTOP). I have little experience with this type of file, and I would like to understand what format it is and how to read it. The same NCBI page (https://www.ncbi.nlm.nih.gov/books/NBK279682/) also mentions SAM files, so would it be enough to read the file with a function designed for parsing SAM? Thanks in advance.
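To make the question more concrete, here is a minimal Python sketch of the kind of reading I have in mind. It assumes the file is a plain pairwise text report in which each alignment row starts with "Query" or "Sbjct" followed by a start coordinate, the aligned sequence, and an end coordinate, as in the excerpt above. It also assumes a single Plus/Plus alignment per file. The names `parse_pairwise_rows` and `list_differences` are my own and are not part of BLAST; I do not know whether this is the right approach.

```python
import re

# One alignment row: label, start coordinate, aligned sequence, end coordinate.
# The "Query= name" header does not match because of the "=" sign.
ROW = re.compile(r"^(Query|Sbjct)\s+(\d+)\s+([A-Za-z*-]+)\s+(\d+)\s*$")


def parse_pairwise_rows(text):
    """Join the Query and Sbjct rows of one alignment into two strings.

    Assumes a single Plus/Plus alignment in the text (hypothetical
    simplification); returns (q_start, query, s_start, sbjct).
    """
    parts = {"Query": [], "Sbjct": []}
    starts = {"Query": None, "Sbjct": None}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skip headers, match lines and blank lines
        label, start, seq, _end = m.groups()
        if starts[label] is None:
            starts[label] = int(start)
        parts[label].append(seq)
    return (starts["Query"], "".join(parts["Query"]),
            starts["Sbjct"], "".join(parts["Sbjct"]))


def list_differences(q_start, query, s_start, sbjct):
    """Return (query_pos, query_base, sbjct_pos, sbjct_base) per difference.

    A position is None on the side that carries a gap ("-").
    """
    diffs = []
    q_pos, s_pos = q_start, s_start
    for q, s in zip(query, sbjct):
        if q != s:
            diffs.append((q_pos if q != "-" else None, q,
                          s_pos if s != "-" else None, s))
        # Gaps do not consume a coordinate on their own sequence.
        if q != "-":
            q_pos += 1
        if s != "-":
            s_pos += 1
    return diffs
```

With something like this I could call `list_differences(*parse_pairwise_rows(open("BTOP").read()))` to get the substitutions and gaps with their coordinates, but I would prefer an established parser if one exists for this format.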
PS: if needed, I can upload a partial file with masked IDs, since the data is sensitive.