I am calculating pairwise percent identity for hundreds of MSAs. I have been using the Clustal Omega executable for macOS with the following command:
./clustal-omega-1.2.3-macosx -i jock.fasta --percent-id --distmat-out jock.fasta.test -o jock.clustal --full
However, when I double-check against the website, the percent identity values for the same sequences are very different.
For example here is some output from the command line:
D_GUNUNGCOLA.Jockey 100.000000 89.201374
D_elegans.Jockey 89.201374 100.000000
And here it is from the website:
1: D_elegans.Jockey 100.00 99.63
2: D_GUNUNGCOLA.Jockey 99.63 100.00
This is just a small example of a much larger pattern.
The sequences yielding this result are here: https://pastebin.com/wTrWYxqy
The gap characters (--) are there because these two sequences are part of a larger MSA.
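For a quick sanity check that is independent of either tool, percent identity can be computed directly from two aligned rows. The sketch below is only an illustration of how much the treatment of gap columns can move the number; it is not Clustal Omega's own formula, which is not stated in this thread. The function name `percent_identity` and the two result keys are hypothetical, and the short sequences in the usage example are made up.

```python
def percent_identity(seq_a, seq_b, gap_chars="-."):
    """Percent identity of two aligned rows under two denominator conventions.

    Illustrative only: this is an assumption about how identity *might* be
    counted, not the formula used by Clustal Omega or the EBI web service.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    ungapped_columns = 0   # columns where both rows have a residue
    compared_columns = 0   # columns where at least one row has a residue

    for a, b in zip(seq_a.upper(), seq_b.upper()):
        a_gap = a in gap_chars
        b_gap = b in gap_chars
        if a_gap and b_gap:
            # Gap-only column for this pair (residues exist elsewhere in the MSA).
            continue
        compared_columns += 1
        if not a_gap and not b_gap:
            ungapped_columns += 1
            if a == b:
                matches += 1

    return {
        # Gapped columns are dropped from the denominator.
        "ignoring_gaps": 100.0 * matches / ungapped_columns if ungapped_columns else 0.0,
        # Columns with a gap in one row count as mismatches.
        "counting_gaps": 100.0 * matches / compared_columns if compared_columns else 0.0,
    }


if __name__ == "__main__":
    # Made-up rows: identical wherever both have residues, but with gaps.
    print(percent_identity("ACGT--ACGT", "ACGTTTAC--"))
```

With the made-up rows above, the two conventions give 100% and 60% for the same pair, so a tool that realigns the sequences or counts gaps differently could plausibly report quite different values from the same input.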
Any insight is appreciated. I don't have time to enter each MSA into the website and wait for the results, but its values seem to be much more accurate. What I would like to do is reproduce the website results on the command line.
OK, I keep trying to attach or copy the FASTA files in here, but I keep getting an error.
You can post the file at https://pastebin.com or via a GitHub gist. People sometimes share files from Google Drive etc. also.
OK, added. Thank you!
Are you using this web implementation: https://www.ebi.ac.uk/jdispatcher/msa/clustalo ? Are you using all the same options (click on the "more options" link to check all other option values) with your command line version?