Question: Make matrix of protein pairwise identities/similarities from multiple protein sequences
al-ash (Japan/Okinawa/OIST) wrote 2.2 years ago:

Is there an already existing tool to generate a matrix of pairwise protein identities/similarities for an input which consists of multiple protein sequences?

I did not find a working solution for macOS/Unix. The one tool I found, MatGAT, did not work for me because I could only find executables for Windows.

I'm aware that one solution is to run pairwise alignments for all pairwise combinations of proteins in the input file, parse the results, and arrange them into a table. I'm trying to avoid this for now, as writing such a script would take me a lot of time with my current skills.

UPDATE: To be more specific, I'm looking for % protein sequence identities from global sequence alignment (such as the % similarities/identities reported by https://www.ebi.ac.uk/Tools/psa/emboss_needle/).
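
For reference, the all-against-all approach mentioned in the question can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not a replacement for EMBOSS needle: it uses a simple match/mismatch score with a linear gap penalty instead of a substitution matrix with affine gaps, and it counts identity over the full alignment length, gap columns included. The function names and the toy sequences are made up for illustration.

```python
from itertools import combinations


def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity from a Needleman-Wunsch global alignment.

    Assumption: toy scoring (match/mismatch, linear gaps), not BLOSUM62.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return 100.0 * identical / length if length else 0.0


def identity_matrix(seqs):
    """Symmetric percent-identity matrix for a dict of name -> sequence."""
    names = list(seqs)
    size = len(names)
    matrix = [[100.0] * size for _ in range(size)]
    for x, y in combinations(range(size), 2):
        pid = global_identity(seqs[names[x]], seqs[names[y]])
        matrix[x][y] = matrix[y][x] = pid
    return names, matrix


# Hypothetical toy input; real sequences would be read from a FASTA file
toy = {"s1": "MKTAYIAK", "s2": "MKTAYIAK", "s3": "MKTAHIAK"}
names, matrix = identity_matrix(toy)
print("\t".join([""] + names))
for name, row in zip(names, matrix):
    print("\t".join([name] + [f"{v:.1f}" for v in row]))
```

The quadratic dynamic programming is slow in pure Python for long proteins, so for real data an established aligner is the safer choice; the sketch is mainly meant to show how the table is assembled.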

Modified 19 months ago; written 2.2 years ago by al-ash.
Bill Pearson wrote 2.2 years ago:

Phylip uses its own special interleaved sequence alignment format, which is definitely neither FASTA nor CLUSTAL format, but you can find programs that will convert to it. The Phylip format is well known and quite old (1980s).

The advantage of Phylip's protdist over Clustal's output is that it gives corrected (scaled) protein distances, not raw similarities/distances. As protein similarity goes down (below 50% identity, which is very high for proteins), the distances go up exponentially, so that a 50% identical sequence might have a distance of PAM70, while a 30% identical sequence could be PAM160, and a 20% identical one PAM250. protdist converts observed protein distances to corrected evolutionary distances, using one of several evolutionary models.
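
To get a rough feel for how fast the correction grows, here is a minimal sketch using Kimura's approximation, d = -ln(1 - p - 0.2p^2), where p is the observed fraction of differing residues. As far as I know this is one of the models protdist offers, but it is not necessarily the model behind the PAM figures quoted above, and the function name is made up for illustration.

```python
import math


def kimura_protein_distance(identity):
    """Kimura's approximate distance (substitutions per site) from fractional identity."""
    p = 1.0 - identity                 # observed fraction of differing residues
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        # The approximation breaks down at very low identity (p above about 0.85)
        raise ValueError("identity too low for Kimura's approximation")
    return -math.log(inner)


for identity in (0.5, 0.3, 0.2):
    d = kimura_protein_distance(identity)
    # 1 PAM corresponds to 1 substitution per 100 residues
    print(f"{identity:.0%} identity -> about {100 * d:.0f} PAM")
```

With this formula the three identities come out at about 80, 160 and 263 PAM, which is in the same ballpark as the numbers above.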

Joe (United Kingdom) wrote 2.2 years ago:

Are you looking for something like a Position Specific Score Matrix? If so, Biopython can already build one for you.

http://biopython.org/DIST/docs/api/Bio.Align.AlignInfo.PSSM-class.html

Bill Pearson wrote 2.2 years ago:

The Phylip program package (http://evolution.genetics.washington.edu/phylip/getme-new1.html), which uses an unfortunate format for multiple sequence alignment, includes "protdist", which does exactly what you want, and converts from observed distance to evolutionary distance.


Not having used Phylip before, I'm a bit confused by the documentation. According to http://evolution.genetics.washington.edu/phylip/doc/protdist.html the "program uses protein sequences", which would suggest to me that the input is multi-FASTA, but from what you wrote it seems that the input is actually a multiple alignment (?). I'm also not sure that the output can be % identities and/or similarities (please see my updated question; I was apparently not clear enough).

Reply written 2.2 years ago by al-ash

I believe Clustal can report pairwise identities, but it won't write you a matrix; you'd still have to parse that out yourself.

Reply written 2.2 years ago by Joe

You are right! Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) directly gives a sequence percent identity matrix (Result Summary -> Percent Identity Matrix in the web interface).

Reply written 2.2 years ago by al-ash

I take it all back then! I guess I was half right!

Reply written 2.2 years ago by Joe
al-ash (Japan/Okinawa/OIST) wrote 19 months ago:

I ended up with the following command-line solution using Clustal Omega, where --percent-id converts the distance matrix to a percent identity matrix:

clustalo-1.2.4-Ubuntu-x86_64 --full --percent-id --distmat-out=output.distmat -i input.aa.fa
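
If the matrix needs to go into a spreadsheet or a script afterwards, something like the following sketch could turn output.distmat into a tab-separated table. It assumes the file starts with a line holding the number of sequences, followed by one whitespace-separated row per sequence (name first, then the values); I have not checked this against every Clustal Omega version, so treat the layout as an assumption. The function names and the sample values are invented for illustration.

```python
def parse_distmat(text):
    """Parse a matrix assumed to be: a count line, then 'name v1 v2 ...' rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])           # number of sequences
    names, rows = [], []
    for ln in lines[1:n + 1]:
        fields = ln.split()
        names.append(fields[0])
        rows.append([float(v) for v in fields[1:]])
    return names, rows


def to_tsv(names, rows):
    """Render the matrix as tab-separated text with a header row."""
    out = ["\t".join([""] + names)]
    for name, row in zip(names, rows):
        out.append("\t".join([name] + [f"{v:.2f}" for v in row]))
    return "\n".join(out)


# Invented sample in the assumed layout; real use would read output.distmat:
#   with open("output.distmat") as handle: text = handle.read()
sample = """3
seqA 100.000000 87.500000 42.100000
seqB 87.500000 100.000000 40.000000
seqC 42.100000 40.000000 100.000000
"""
names, rows = parse_distmat(sample)
print(to_tsv(names, rows))
```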