BLAST: how to print mismatch position?
1
0
Entering edit mode
8 months ago
restless.v2 ▴ 20

Hi, in brief i read BLAST manual and i did not find how to print mismatch positions. Does exist a cmd to do that? P.S. I mean exactly mismatch position (like A instead of T) not where query/subject alignment start and stop. for example: there is 1 mismatch at query position 21 --> reference base A / query base T

Thanks David

alignment snp • 623 views
0
Entering edit mode

Why do you need that information? I don't think that the BLAST algorithm really focuses on mismatch positions while looking at HSPs - in fact, I feel like identifying those positions and reporting them would make the algorithm significantly slower.

0
Entering edit mode

Hi RamRS, in BLAST output with come along with simple graphic of the alignment is represented the mismatch. I am looking a command to put this info in tabular file.

Otherwise you could suggest me another program? Thanks a lot

0
Entering edit mode

You're going to need to manually parse BLAST's simple graphic output (which is a pretty crazy thing to do), so I don't think BLAST is ideal. The closest I can think of too your requirement is a CIGAR string for an alignment, so I'd look for tools that do that. Read: https://jef.works/blog/2017/03/28/CIGAR-strings-for-dummies/

2
Entering edit mode

Dear RamRS,

your hint was very precious and pushed forward in my endevour into let BLAST works the way I want. I find in BLAST formatting options for output that can be selected a SAM format, then a CIGAR container! Wow!

It works!

This is my cmd line LOOK at -outfmt "17". It stands for SAM (from -help)

blastn -db NoV_varianti_GII4.fasta -query 2017_join.fasta -outfmt "17" -out 2017_result.txt -perc_identity 98 -subject_besthit -max_target_seqs 1 -num_threads 4

1
Entering edit mode

Excellent job and great effort! Congratulations on figuring out a solution.

1
Entering edit mode

While CIGAR string tells you what kind of change (insertion, match, mismatch, deletion) is at a position, it does not tell you the actual base change. Isn't that what you wanted?

0
Entering edit mode

Hi genomax, you are right, but in my pipeline what I am looking for is at first the position of the mismatch. This because in my alignment I have a virus variant reference library (228bp region) with 40 hot spot positions. Some sequence in library are so similar that differ for only 1 base. To understand if I can validate also an alignment with 1 or 2 or 3 (new variant or PCR bias?) mismatch I need to know at first where the mismatch happens. I need to do that because my analysis is on environmental sample (signal very low) and every read I can take home is good in particular for very low variants. In a second time I will make a reasoning about mismatch type and possible analysis like new virus variant detection. Thank you for your feedback. David

0
Entering edit mode

If you just need to know the first position of the change then you will certainly get that from CIGAR.

0
Entering edit mode

You may be able to do this using biopython's blast parser. Take a look at this page. All the way to the end is a section on HSP's. While this particular example is for blast (old version), the current blast+ parser in python has something analogous.

hsp.bits = 77.0
hsp.expect = 0.16
hsp.frame = ()
hsp.gaps = (5, 76)
hsp.identities = (22, 76)
hsp.match = '+G  K+P       AF V +  E HK    + S+  +E SK+   RWK+++  EK  F   A+  K  +  +   Y'
hsp.num_alignments = 1
hsp.positives = (35, 76)
hsp.query_start = 2
hsp.sbjct = 'EGHVKRPMN-----AFMVWSRGERHKLAQQNPSMQNTEISKQLGCRWKSLTEAEKRPFFQEAQRLKILHREKYPNY'
hsp.sbjct_start = 2
hsp.score = 34.399999999999999
hsp.strand = (None, None)

0
Entering edit mode

Hi genomax, thank your for answer. Now I know that if I can't print a result as i want from a program output I can parse what I can get at best. Thank you.

0
Entering edit mode
7 months ago

I wrote blastn2snp : https://lindenb.github.io/jvarkit/BlastNToSnp.html

\$  java -jar dist/blastn2snp.jar < blast.xml | column -t

#query              hit                                                                                              hit-index  hsp-index  query-POS  hit-POS    STRAND  REF(hit)  ALT(query)  blast.align_length  blast.hit.var  blast.query.var  blast.mid.var
No definition line  Homo sapiens chromosome 6, alternate assembly CHM1_1.1                                           1          9          21         74567818   -       T         A           18                  A              T                .
No definition line  Homo sapiens chromosome 6, alternate assembly HuRef                                              2          9          21         71600901   -       T         A           18                  A              T                .
No definition line  Homo sapiens chromosome 6, GRCh37.p13 Primary Assembly                                           3          9          21         74401398   -       T         A           18                  A              T                .
No definition line  Homo sapiens chromosome 5, alternate assembly CHM1_1.1                                           4          1          7          107821121  -       A         G           28                  T              C                .
No definition line  Homo sapiens chromosome 5, alternate assembly CHM1_1.1                                           4          9          16         14262358   +       G         C           18                  G              C                .

0
Entering edit mode

Pierre Lindenbaum : This is not identifying the first position (coordinate) in the reference where there is a change. This is what OP wants.