Question

Clustal Omega standalone versus website percent identity matrix

0

6 months ago

sorrymouse ▴ 120

I am calculating percent identity for the sequences in hundreds of MSAs. I was using the Clustal Omega executable for macOS to do so, with the following command:

./clustal-omega-1.2.3-macosx -i jock.fasta --percent-id  --distmat-out jock.fasta.test -o jock.clustal --full

However, when I double-check against the website, the percent identity values are very different for the same sequences.

For example, here is some output from the command line:

D_GUNUNGCOLA.Jockey    100.000000   89.201374
D_elegans.Jockey        89.201374  100.000000

And here it is from the website:

     1: D_elegans.Jockey       100.00   99.63
     2: D_GUNUNGCOLA.Jockey     99.63  100.00

This is just a small example of a much larger pattern.

The sequences yielding this result are here: https://pastebin.com/wTrWYxqy

The gap characters (--) are there because these two sequences are part of a larger MSA.

Any insight is appreciated. I don't have time to enter each MSA into the website and wait for the results, but the website seems to be much more accurate. What I would like to do is reproduce the website results on the command line.

clustal MSA identity percent • 1.1k views

ADD COMMENT • link 6 months ago by sorrymouse ▴ 120

0

OK, I keep trying to attach or paste the FASTA files in here, but I keep getting an error.

ADD REPLY • link 6 months ago by sorrymouse ▴ 120

1

You can post the file at https://pastebin.com or via a GitHub gist. People sometimes also share files from Google Drive and similar services.

ADD REPLY • link 6 months ago by GenoMax 154k

0

OK, added. Thank you!

ADD REPLY • link 6 months ago by sorrymouse ▴ 120

0

Are you using this web implementation: https://www.ebi.ac.uk/jdispatcher/msa/clustalo? Are you using all the same options with your command-line version (click on the "more options" link to check all the other option values)?

ADD REPLY • link 6 months ago by GenoMax 154k

score 2 · Accepted Answer · 2025-04-28

6 months ago

Mensur Dlakic ★ 30k

You don't have the most recent program version (1.2.3 vs 1.2.4). As already pointed out, it is possible that you are using different options locally.

I tested your sequences after I wrote the above paragraph, and I get the same result as you did locally. While everything I said above stands, it is possible that there is some kind of bug here, as the identity between the two sequences is clearly >99% and is correctly reported by the web version.
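For anyone who wants to check the >99% figure without relying on either Clustal Omega build, the identity can be counted directly from the aligned pair. Below is a minimal Python sketch, assuming the two sequences are already aligned to the same length and that identity means matches divided by the number of columns where neither sequence has a gap. That is only one of several common conventions and is not necessarily the one Clustal Omega uses. The function names `parse_fasta` and `percent_identity` are made up for this illustration, and the sequences in the example are toy data, not the ones from the pastebin.

```python
def parse_fasta(lines):
    """Return {name: sequence} from an iterable of FASTA-formatted lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def percent_identity(a, b, gap_chars="-."):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: a and b are rows of the same alignment, so they must have
    equal length. Columns with a gap in either row are skipped entirely.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned pair for illustration only.
    toy = parse_fasta([">seqA", "ACGT--A", ">seqB", "ACGA-TA"])
    print(percent_identity(toy["seqA"], toy["seqB"]))
```

To run it on a real alignment, pass an open file handle to `parse_fasta` and pick the two names of interest. If a hand count like this agrees with the website and not with the local binary, that would support the idea that the local matrix is computed from something other than the final alignment, though this sketch cannot confirm that.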

ADD COMMENT • link 6 months ago by Mensur Dlakic ★ 30k

0

The original poster is using the pre-compiled binary for macOS, which is still v1.2.3. The web-based version I linked above uses the latest release, i.e. 1.2.4.

ADD REPLY • link 6 months ago by GenoMax 154k

0

I'm sure that 10% in sequence divergence is what's different between 1.2.4 and 1.2.3...

ADD REPLY • link 6 months ago by sorrymouse ▴ 120