Question

[Clustal Omega] Posterior Probability Output File; columns meaning


6.2 years ago

ReWeeda ▴ 120

Hi!

Some context first: I clustered protein structures by annotated function and architecture (the structural domains along the sequence), and now I want to run an MSA for each cluster to highlight any relationships that emerge at the sequence level.

I'm trying different tools (MAFFT, Clustal Omega, MUSCLE, etc.) from the command line and exploring their available options.

Testing Clustal Omega (v1.2.1), I found the option: --posterior-out=<file> Posterior probability output file

It is listed in the command-line --help output but not in the README file (http://www.clustal.org/omega/README), which probably refers to an older version of the tool, since it still describes old commands that are no longer supported.

The output file produced by this option consists of 7 columns, as follows:

1.i  2.name                       3.L1  4.L2  5.sum       6.sum/L1  7.HH
0    3VPG:A|PDBID|CHAIN|SEQUENCE  310   377   304.243683  0.981431  262.894775
1    1T2D:A|PDBID|CHAIN|SEQUENCE  322   377   307.759918  0.955776  252.773529

(These are the first rows of the output file from one of my tests.)

I don't know what columns 5, 6, and 7 represent.

Reading the article about Clustal Omega (doi:10.1038/msb.2011.75) and the one about the alignment engine (HHalign) used by Clustal Omega (doi:10.1093/bioinformatics/bti125) I didn't find information about this output.

I think that column 5 (sum) represents the log-sum-of-odds score discussed in the HHalign article, but I have no idea what column 7 (HH) means.

Can anyone help me?

Thanks in advance!

D.

msa posterior probability clustal omega • 1.7k views


score 1 · Answer 1 · 2018-02-19

In case someone needs this in the future:

The answer was given to me by the "help desk" of the Clustal developers.

Column 5 (sum): the sum, over the residues of a sequence, of the probability that each residue is aligned to its corresponding position in the HMM. These probabilities are computed when the --posterior-out=<file> option is set.

Column 6 (sum/L1): the average probability per position, i.e. column 5 divided by column 3.

Column 7 (HH): an HHalign internal score that measures how well a sequence aligns back onto the overall alignment (higher is better).
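For anyone who wants to work with this file programmatically, here is a minimal Python sketch that parses the rows and checks that column 6 equals column 5 divided by L1. It assumes the file is whitespace-separated as in the excerpt from the question, that data rows start with an integer index, and that the last five fields are always numeric. The names PosteriorRow and parse_posterior_lines are my own and are not part of Clustal Omega. The thread does not explain L1 and L2, so the code just stores them as integers.

```python
from typing import Iterable, List, NamedTuple


class PosteriorRow(NamedTuple):
    """One data row of the --posterior-out file (field names are hypothetical)."""
    index: int
    name: str
    l1: int          # column 3, meaning not explained in the thread
    l2: int          # column 4, meaning not explained in the thread
    prob_sum: float  # column 5: sum of per-residue alignment probabilities
    prob_avg: float  # column 6: average probability per position
    hh: float        # column 7: HHalign internal score (higher is better)


def parse_posterior_lines(lines: Iterable[str]) -> List[PosteriorRow]:
    """Parse whitespace-separated rows, skipping headers and blank lines."""
    rows = []
    for line in lines:
        parts = line.split()
        # Data rows have at least 7 fields and start with an integer index.
        if len(parts) < 7 or not parts[0].isdigit():
            continue
        rows.append(
            PosteriorRow(
                index=int(parts[0]),
                # Take numeric fields from the right so odd names do not shift columns.
                name=" ".join(parts[1:-5]),
                l1=int(parts[-5]),
                l2=int(parts[-4]),
                prob_sum=float(parts[-3]),
                prob_avg=float(parts[-2]),
                hh=float(parts[-1]),
            )
        )
    return rows


# The two rows quoted in the question, including the header line.
SAMPLE = """\
1.i 2.name 3.L1 4.L2 5.sum 6.sum/L1 7.HH
0   3VPG:A|PDBID|CHAIN|SEQUENCE 310 377 304.243683  0.981431    262.894775
1   1T2D:A|PDBID|CHAIN|SEQUENCE 322 377 307.759918  0.955776    252.773529
"""

rows = parse_posterior_lines(SAMPLE.splitlines())
for row in rows:
    # Column 6 should match column 5 / L1 up to the printed precision.
    assert abs(row.prob_sum / row.l1 - row.prob_avg) < 1e-5
    print(row.name, row.prob_avg, row.hh)
```

To use it on a real file, pass an open file handle instead of SAMPLE.splitlines(). Sorting the rows by prob_avg or hh would then be a quick way to spot sequences that fit the alignment poorly.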