Question: Make matrix of protein pairwise identities/similarities from multiple protein sequences
al-ash (Japan/Okinawa/OIST) wrote 15 months ago (modified 7 months ago):

Is there an already existing tool to generate a matrix of pairwise protein identities/similarities for an input which consists of multiple protein sequences?

I did not find a working solution for macOS/Unix (MatGAT did not work for me because I could find executables only for Windows).

I'm aware that one solution is to run pairwise alignments for all pairwise combinations of proteins in the input file, parse the results, and arrange them into a table. I'm trying to avoid this for now, as writing such a script would take me a lot of time with my current skills.

UPDATE: To be more specific, I'm looking for % protein sequence identities from global sequence alignments (such as the % similarities/identities reported by https://www.ebi.ac.uk/Tools/psa/emboss_needle/).

Bill Pearson wrote 15 months ago:

Phylip uses its own special interleaved sequence alignment format, which is definitely neither FASTA format nor CLUSTAL format, but you can find programs that will convert between them. The Phylip format is well known and quite old (1980s).

The advantage of Phylip's protdist over Clustal's output is that it gives corrected (scaled) protein distances, not raw similarities/distances. As protein similarity goes down (below 50% identity, which is very high for proteins), the distances go up exponentially, so that a 50% identical sequence might have a distance of PAM70, while a 30% identical sequence could be PAM160, and a 20% identical sequence PAM250. protdist does the conversion from observed protein distances to corrected evolutionary distances, using one of several evolutionary models.
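To illustrate how a corrected distance grows faster than the observed difference, here is a minimal Python sketch using Kimura's formula, D = -ln(1 - p - 0.2 p^2), which is one of the models protdist offers. It is not necessarily the model behind the PAM figures quoted above, and the function name kimura_distance is made up for this example.

```python
import math


def kimura_distance(percent_identity):
    """Kimura-corrected protein distance in expected substitutions per site.

    Illustrative helper (hypothetical name), not part of Phylip itself.
    """
    # p is the observed fraction of aligned positions that differ.
    p = 1.0 - percent_identity / 100.0
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        # The formula is undefined once the sequences are too divergent.
        raise ValueError("sequences too divergent for Kimura's formula")
    return -math.log(inner)


for pid in (90, 70, 50, 30, 20):
    d = kimura_distance(pid)
    # 1 PAM unit = 1 accepted substitution per 100 residues, so PAM = 100 * d.
    print(f"{pid:3d}% identity -> {d:.2f} substitutions/site (about PAM{round(100 * d)})")
```

Under this particular formula, 50%, 30% and 20% identity come out at roughly PAM80, PAM160 and PAM263, so the corrected distance more than triples while the identity only drops from 50% to 20%.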

jrj.healey (United Kingdom) wrote 15 months ago:

Are you looking for something like a position-specific scoring matrix (PSSM)? If so, Biopython can already build this for you.

http://biopython.org/DIST/docs/api/Bio.Align.AlignInfo.PSSM-class.html

Bill Pearson wrote 15 months ago:

The Phylip program package (http://evolution.genetics.washington.edu/phylip/getme-new1.html), which uses an unfortunate format for multiple sequence alignment, includes "protdist", which does exactly what you want, and converts from observed distance to evolutionary distance.


Not having used Phylip before, I'm a bit confused by its documentation. According to http://evolution.genetics.washington.edu/phylip/doc/protdist.html the "program uses protein sequences", which suggested to me that the input is a multi-FASTA file, but from what you wrote it seems that the input is actually a multiple alignment (?). I'm also not sure whether the output can be % identities and/or similarities (please see my updated question; I was apparently not clear enough).

Reply written 15 months ago by al-ash

Clustal can report pairwise identities I believe, but it won’t write you a matrix, you’d still have to parse that out yourself.

Reply written 15 months ago (modified 15 months ago) by jrj.healey

You are right! Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) directly gives a sequence % identity matrix (Result Summary -> Percent Identity Matrix in the web interface).

Reply written 15 months ago by al-ash

I take it all back then! I guess I was half right!

Reply written 15 months ago by jrj.healey
al-ash (Japan/Okinawa/OIST) wrote 7 months ago (modified 7 months ago):

I ended up with the following command-line solution using Clustal Omega, which converts the distance matrix into a percent identity matrix:

clustalo-1.2.4-Ubuntu-x86_64 --full --percent-id --distmat-out=output.distmat -i input.aa.fa
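
If the matrix is needed as a table for further work, a short Python sketch along these lines could load the file written by --distmat-out. It assumes the file starts with a line holding the number of sequences, followed by one whitespace-separated row per sequence (name first, then the values), and that sequence names contain no spaces. The function names are hypothetical, so check them against your own output before relying on this.

```python
def parse_clustalo_matrix(text):
    """Parse the text of a --distmat-out file into (names, matrix).

    Assumed layout: first non-empty line = number of sequences, then one row
    per sequence: name followed by that many numeric values.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    n = int(lines[0][0])
    rows = lines[1:n + 1]
    if len(rows) != n or any(len(fields) != n + 1 for fields in rows):
        raise ValueError("file does not look like a full square matrix")
    names = [fields[0] for fields in rows]
    matrix = [[float(value) for value in fields[1:]] for fields in rows]
    return names, matrix


def matrix_to_tsv(names, matrix):
    """Render the matrix as tab-separated text with a header row."""
    out = ["\t".join([""] + names)]
    for name, row in zip(names, matrix):
        out.append("\t".join([name] + [f"{value:.2f}" for value in row]))
    return "\n".join(out) + "\n"


# Example use with the file produced by the command above:
#   with open("output.distmat") as handle:
#       names, matrix = parse_clustalo_matrix(handle.read())
#   with open("output.identity.tsv", "w") as out:
#       out.write(matrix_to_tsv(names, matrix))
```

The tab-separated output opens directly in a spreadsheet or in R/pandas, with sequence names as both row and column labels.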