Hi, I am trying to find a way to parse an XML file generated by blastp to find the differences between query and subject (mismatches, gaps/indels, etc.). For each difference I need its position in the alignment and the corresponding residues in both the query and the subject, written as A123B, where A is the amino acid in the subject, 123 is the position and B is the amino acid in the query. Is there a package or script that already does this? I found something similar in jvarkit (https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/blast/BlastNToSnp.java), but it only works for blastn, and my Java knowledge is too basic to adapt it to blastp output.
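The A123B convention described above can be sketched in Python, assuming you already have the two aligned strings of one HSP (the Hsp_qseq and Hsp_hseq elements of the BLAST XML) and the subject start coordinate (Hsp_hit-from). The function name call_differences is hypothetical, and numbering every call in subject coordinates is an assumption: for a gap in the subject (an insertion in the query), the reported position is that of the preceding subject residue.

```python
def call_differences(qseq: str, hseq: str, hit_from: int) -> list[str]:
    """Return '<subject residue><subject position><query residue>' for every
    alignment column where query and subject differ ('-' marks a gap).

    Assumption: positions are counted in subject coordinates, starting at
    hit_from (the HSP's Hsp_hit-from value in BLAST XML).
    """
    calls = []
    h_pos = hit_from - 1  # last subject residue consumed so far
    for q, h in zip(qseq, hseq):
        if h != "-":
            h_pos += 1  # only real subject residues advance the position
        if q != h:
            # Covers substitutions, deletions in the query (q == '-')
            # and insertions in the query (h == '-').
            calls.append(f"{h}{h_pos}{q}")
    return calls


print(call_differences("MKV-A", "MRVLA", hit_from=100))  # ['R101K', 'L103-']
```

Comparing residues directly also reports conservative substitutions (shown as `+` in Hsp_midline) as differences; query positions can be tracked the same way from Hsp_query-from if you need them too.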
There are many parsers for BLAST reports. Which language are you working with? If it's a popular one, you will find a parsing library for it. And if you want to write your own parser, I can recommend ANTLR ;D
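In Python, for example, the standard library's xml.etree.ElementTree is enough to pull the aligned sequences out of a report written with `-outfmt 5`; Biopython's Bio.Blast.NCBIXML module is a ready-made alternative. A minimal sketch follows. The Hsp tuple and iter_hsps are hypothetical names; only the element names (Iteration_query-def, Hit_def, Hsp_hit-from, Hsp_qseq, Hsp_hseq) come from the NCBI BLAST XML format.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterator
from typing import NamedTuple


class Hsp(NamedTuple):
    query: str     # Iteration_query-def
    hit: str       # Hit_def
    hit_from: int  # first subject residue covered by the HSP
    qseq: str      # aligned query, with '-' for gaps
    hseq: str      # aligned subject, with '-' for gaps


def iter_hsps(source) -> Iterator[Hsp]:
    """Yield every HSP in a BLAST XML report (-outfmt 5).

    source may be a file path or a binary file object.
    """
    root = ET.parse(source).getroot()
    for iteration in root.iter("Iteration"):  # one Iteration per query
        query = iteration.findtext("Iteration_query-def", default="")
        for hit in iteration.iter("Hit"):
            hit_def = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):
                yield Hsp(
                    query=query,
                    hit=hit_def,
                    hit_from=int(hsp.findtext("Hsp_hit-from")),
                    qseq=hsp.findtext("Hsp_qseq"),
                    hseq=hsp.findtext("Hsp_hseq"),
                )
```

Usage (with `report.xml` as a placeholder name): `for hsp in iter_hsps("report.xml"): print(hsp.hit, hsp.hit_from, hsp.qseq, hsp.hseq)`. The newer BLAST XML2 format (`-outfmt 16`) uses different element names, so this sketch would not read it.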