Edits based on additional data:
Revised my post based on the additional data you provided. In the new picture, column 4 is the total length of the sequence, and column 5 is the length of the mismatched sequence.
Looking at row 3, we have 88 mismatching positions out of a total sequence length of 335. Thus, naively, a first guess could be:
pident0 = (335 - 88) / 335 = 247/335 = 73.7% (0)
Interestingly, equation 0 gives a value that is very close to pident1 and pident2, but not identical to either. Now consider the gap open column. If you subtract that from the length of the matching sequence (247), you now have 245/335, which rounds to 73.1%, exactly the value of pident1. Thus, we write equation 1 as follows:
pident1 = (matching sequence length - gapopen) / total sequence length (1)
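As a quick sanity check of equations 0 and 1 on row 3, here is a minimal Python sketch. The function and parameter names are my own (not anything from the tool), and the gapopen value of 2 is inferred from the drop from 247 to 245, so treat it as an assumption rather than the tool's actual implementation.

```python
def pident_naive(length: int, mismatches: int) -> float:
    """Equation 0: matching positions over total length, as a percentage."""
    return 100 * (length - mismatches) / length


def pident_gap_adjusted(length: int, mismatches: int, gapopen: int) -> float:
    """Equation 1: also subtract the gap-open count from the matching positions."""
    return 100 * (length - mismatches - gapopen) / length


# Row 3 of the table: length 335, 88 mismatches, gapopen assumed to be 2.
print(round(pident_naive(335, 88), 1))            # 73.7
print(round(pident_gap_adjusted(335, 88, 2), 1))  # 73.1
```

Running the same two functions over the other rows is an easy way to confirm that equation 1 reproduces pident1 throughout.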
Because repeating these steps for every other row I checked gives the value of pident1 in each case, I am fairly confident this is the correct answer to your question. However, it does leave one remaining issue:
pident2 = ? (2)
Because the values of pident2 are very close to the values of pident1 in every row, it's highly likely that it is a very similar metric that handles some detail differently. For example, pident2 may treat gapopen differently, or it may have to do with differences between q and s. I leave verification of which, and more importantly, consideration of what biological phenomena might be better represented by pident2 (or vice versa), in your capable hands.
Initial post: (now deprecated)
First, I want to commend you for going to the docs, the GitHub page, and the example before posting - you're setting a great example for others. One last place to check is the manuscript itself, but I checked, and it does not define this term.
Anywho, in the documentation, pident is described as the percentage of all returned matches that are identical matches. However, it actually appears to be the ratio of identical matches to non-identical matches. I say this because, using your example:
- identical matches = 340
- non-identical matches = 1247
- all matches = 340 + 1247 = 1587
- pident (percentage identical) = 27.2
But 340/1587 = 21.4% != 27.2% ... so it is evidently not (identical matches / all matches) x 100.
Rather, it appears to be pident = (identical matches / non-identical matches) x 100 = 340/1247 x 100 ≈ 27.3%, which is close to the reported 27.2.
This is a bit odd because, in the event that identical matches exceed non-identical ones, pident will be > 100%. On the GitHub page you linked above, the maintainers note the software is actively supported. As such, if this does not resolve the question, it could be worth reaching out to them directly, but I think the above goes a long way.
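For anyone who wants to redo the arithmetic from this (deprecated) first attempt, here is a small Python sketch comparing the two candidate definitions. The variable names are mine, and the counts are the ones from your example; it only checks the numbers and says nothing about how the tool actually computes pident.

```python
identical = 340
non_identical = 1247
total = identical + non_identical  # 1587

# Candidate A: identical matches as a share of all matches (the documented reading).
share_of_all = 100 * identical / total

# Candidate B: identical matches relative to non-identical matches.
ratio_to_non_identical = 100 * identical / non_identical

print(f"{share_of_all:.1f}")            # 21.4
print(f"{ratio_to_non_identical:.1f}")  # 27.3
```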
Does that help?