Question: Hierarchical Clustering Using Python
gravatar for 1240192
6.9 years ago by
12401920 wrote:

Hi there, new to clustering,don't quiet get the idea in terms of programming. I have a set of files full on non-coding DNA sequences alignments, I found the distance measure for each alignment, they'll be an array. As I've understood now I have to produce the distance matrix (dimensional array) and then perform hierarchical clustering, which would then generate a tree-structured view. The question is how to produce the distance matrix and what are the further steps for successful clustering? Does the matrix have to be similar?

Thanks in advance.

python clustering distance • 5.1k views
ADD COMMENTlink modified 6.9 years ago by Ashutosh Pandey12k • written 6.9 years ago by 12401920
gravatar for Ashutosh Pandey
6.9 years ago by
Ashutosh Pandey12k wrote:

Well what have you described above is the basis of most of the multiple sequence alignment alogrithms such as CLUSTALW. You may use any of these tools to accomplish what you want.

Assuming you have N sequences. You will have to create N x N matrix where each element (cell) will contain the distance between the corresponding sequences. The value of this distance can be calculated by aligning sequences against each other and calculating alignment score or using some other score. Also, it will be a symmetric matrix i.e. distance between seqA and seqB will be same as distance between seqB and seqA. so you only need to compute half of the matrix.

Once you are done with the matrix creation, you can proceed to Hierarchical clustering.

You will have to start with sequences that have the smallest distance between them. You will merge them and will have to come up with a way to create a consensus sequence that represent the two sequences. Then you will have to create the distance matrix again and merge the two sequences with the smallest distance. This will go on until you are finished with the sequences.

I think in your case, using Python to come up with a consensus sequences is a crucial and complicated step.

ADD COMMENTlink written 6.9 years ago by Ashutosh Pandey12k

Thanks for help. Read up a bit more and watched some youtube tutorials. Using Python as I need it for my project. Installed Orange for Python, will figure out how this thing works and look what output I'd have :)

ADD REPLYlink written 6.9 years ago by 12401920
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 544 users visited in the last hour