Clustal Omega standalone versus website percent identity matrix
4 months ago
sorrymouse ▴ 120

I am calculating the percent identity between sequences in hundreds of MSAs. I was using the Clustal Omega executable for macOS to do so, with the following command:

./clustal-omega-1.2.3-macosx -i jock.fasta --percent-id  --distmat-out jock.fasta.test -o jock.clustal --full
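Because the same command has to be repeated for hundreds of MSAs, a small driver script can loop over the files. Below is a minimal Python sketch that reuses only the options from the command above. It assumes all inputs are `*.fasta` files in one directory; that layout and the helper names `clustalo_command` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Binary name taken from the question; adjust to your local install.
CLUSTALO = "./clustal-omega-1.2.3-macosx"


def clustalo_command(fasta: Path) -> list[str]:
    """Build the same clustalo invocation as in the question for one input file."""
    return [
        CLUSTALO,
        "-i", str(fasta),
        "--percent-id",
        "--distmat-out", f"{fasta}.test",             # e.g. jock.fasta.test
        "-o", str(fasta.with_suffix(".clustal")),     # e.g. jock.clustal
        "--full",
    ]


def run_all(directory: str) -> None:
    """Run clustalo on every *.fasta file in a directory (hypothetical layout)."""
    for fasta in sorted(Path(directory).glob("*.fasta")):
        # check=True stops the loop if clustalo exits with an error
        subprocess.run(clustalo_command(fasta), check=True)
```

For example, `run_all("msas")` would process every FASTA file in a hypothetical `msas/` directory.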

However, when I check the same sequences on the website, the percent identity values are very different.

For example, here is some output from the command line:

D_GUNUNGCOLA.Jockey    100.000000   89.201374
D_elegans.Jockey        89.201374  100.000000

And here it is from the website:

     1: D_elegans.Jockey       100.00   99.63
     2: D_GUNUNGCOLA.Jockey     99.63  100.00

This is just a small example of a much larger pattern.
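To compare the command-line numbers against the website across many files, the `--distmat-out` file can be read back into Python. This sketch assumes the whitespace-separated layout shown in the excerpt above (a sequence name followed by one value per sequence, in row order) and sequence names without spaces. It also skips an optional first line holding only the sequence count, in case a file has one. The function name `parse_distmat` is hypothetical.

```python
def parse_distmat(text: str) -> dict[str, dict[str, float]]:
    """Parse a clustalo --distmat-out file into {row_name: {column_name: value}}.

    Assumes each non-empty row is: name value_1 value_2 ... value_n, with the
    columns in the same order as the rows. A first line containing only an
    integer (a sequence count) is skipped if present.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if rows and len(rows[0]) == 1 and rows[0][0].isdigit():
        rows = rows[1:]  # drop the optional sequence-count line
    names = [row[0] for row in rows]
    matrix: dict[str, dict[str, float]] = {}
    for row in rows:
        values = [float(v) for v in row[1:]]
        matrix[row[0]] = dict(zip(names, values))
    return matrix
```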

The sequences yielding this result are here: https://pastebin.com/wTrWYxqy

The gaps (--) are there because these two sequences were taken from a larger MSA.
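Gap handling is one plausible source of disagreement, because "percent identity" has no single definition. The hypothetical helper below computes two common variants for a pair of aligned rows. The first counts only columns where both rows have a residue. The second also counts residue-versus-gap columns as mismatches, while ignoring columns that are gaps in both rows (such columns exist only because of other sequences in the larger MSA). Whether either variant matches what clustalo or the website computes is an assumption, not something established in this thread.

```python
GAPS = set("-.")


def identity_two_ways(a: str, b: str) -> tuple[float, float]:
    """Percent identity of two aligned rows under two denominators.

    ungapped: identical residues / columns where both rows have a residue.
    gapped:   identical residues / all columns except those that are gaps
              in BOTH rows; a residue opposite a gap counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    # Drop columns that are gaps in both rows (left over from the larger MSA).
    cols = [(x.upper(), y.upper()) for x, y in zip(a, b)
            if not (x in GAPS and y in GAPS)]
    both = [(x, y) for x, y in cols if x not in GAPS and y not in GAPS]
    same = sum(x == y for x, y in both)
    ungapped = 100.0 * same / len(both) if both else 0.0
    gapped = 100.0 * same / len(cols) if cols else 0.0
    return ungapped, gapped
```

For a pair such as `"ACGT--AC-"` and `"ACGTTTA--"`, every residue-residue column matches, yet the two denominators give 100.0 and 62.5, so the choice of formula alone can move the number a lot.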

Any insight is appreciated. I don't have time to submit each MSA to the website and wait for the results, but the website results seem to be much more accurate. What I would like to do is reproduce the website results on the command line.
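As a rough cross-check that does not depend on the website, a percent identity matrix can be computed directly from an already aligned FASTA file. This sketch uses one common definition: identical residues divided by columns where neither sequence has a gap. The thread does not say which formula the website uses, so agreement with it is something to verify, not a given. The function names are hypothetical.

```python
from itertools import combinations

GAPS = set("-.")


def read_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA reader; the sequence name is the first word after '>'."""
    seqs: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def percent_identity(a: str, b: str) -> float:
    """Identical residues / columns where neither row has a gap, times 100."""
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b)
             if x not in GAPS and y not in GAPS]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def identity_matrix(text: str) -> dict[tuple[str, str], float]:
    """Percent identity for every pair of sequences in an aligned FASTA text."""
    seqs = read_fasta(text)
    if len({len(s) for s in seqs.values()}) > 1:
        raise ValueError("input does not look aligned: rows differ in length")
    return {(a, b): percent_identity(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
```

Running `identity_matrix(open("jock.fasta").read())` on each file would then give numbers to set beside both the distmat output and the website matrix.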


OK, I keep trying to attach or paste the FASTA files here, but I keep getting an error.


You can post the file at https://pastebin.com or as a GitHub gist. People sometimes share files from Google Drive and similar services too.


OK, added. Thank you!


Are you using this web implementation: https://www.ebi.ac.uk/jdispatcher/msa/clustalo ? Are you using the same options with your command-line version? (Click the "More options" link to check all the other option values.)

4 months ago
Mensur Dlakic ★ 30k

You don't have the most recent program version (1.2.3 vs 1.2.4). As already pointed out, it is possible that you are using different options locally.

I tested your sequences after writing the above paragraph, and locally I get the same result as you did. While everything I said above stands, there may be some kind of bug here, as the identity between the two sequences is clearly >99% and is correctly reported by the web version.


The original poster is using the pre-compiled binary for macOS, which is still v1.2.3. The web-based version I linked above uses the latest, i.e. 1.2.4.


I'm sure that a 10% difference in sequence divergence is what's different between 1.2.4 and 1.2.3...
