Hierarchical Clustering Using Python
11.0 years ago
1240192 • 0

Hi there, I'm new to clustering and don't quite get the idea in terms of programming. I have a set of files full of non-coding DNA sequence alignments, and I have computed a distance measure for each alignment, stored in an array. As I understand it, I now have to produce a distance matrix (a two-dimensional array) and then perform hierarchical clustering, which would generate a tree-structured view. The question is: how do I produce the distance matrix, and what are the further steps for successful clustering? Does the matrix have to be symmetric?

Thanks in advance.

python clustering distance
11.0 years ago

What you have described above is the basis of most multiple sequence alignment algorithms, such as CLUSTALW. You may use any of these tools to accomplish what you want.

Assume you have N sequences. You will have to create an N x N matrix in which each element (cell) contains the distance between the corresponding pair of sequences. This distance can be calculated by aligning the sequences against each other and computing an alignment score (a score measures similarity, so it has to be converted into a distance), or by using some other measure. The matrix will also be symmetric, i.e. the distance between seqA and seqB is the same as the distance between seqB and seqA, so you only need to compute half of it.
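A minimal sketch of building such a matrix in Python, assuming NumPy is installed and the sequences are already aligned to the same length. The distance used here, `p_distance` (the proportion of differing sites, skipping gap columns), is only an example stand-in; if you already have your own distance measure, pass it in as `dist`. The loop fills only the upper triangle and mirrors each value, which is the "compute half of the matrix" point above. The four sequences are made-up example data.

```python
import numpy as np

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.
    Assumption: columns where either sequence has a gap ('-') are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    # Assumption: two sequences with no comparable sites get distance 0.0
    return mismatches / compared if compared else 0.0

def distance_matrix(seqs, dist=p_distance):
    """Build a symmetric N x N matrix, computing only the upper triangle."""
    n = len(seqs)
    d = np.zeros((n, n))  # diagonal stays 0: each sequence vs itself
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = dist(seqs[i], seqs[j])
    return d

seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACTA"]
D = distance_matrix(seqs)
print(D)
```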

Once you are done with the matrix creation, you can proceed to Hierarchical clustering.
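If you don't need a consensus sequence at each merge, SciPy can do the clustering directly from the distance matrix. A minimal sketch, assuming SciPy and NumPy are installed: `squareform` turns the square matrix into the condensed vector that `linkage` expects (and checks that it is symmetric with a zero diagonal), `method="average"` is average linkage (UPGMA), and `to_tree` returns the root of the resulting tree. `to_newick` is a hypothetical helper written here to print that tree in Newick format, and the matrix values are made-up example data. Note this swaps the consensus-sequence step for a linkage rule that updates distances by averaging.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform

def to_newick(node, labels):
    """Recursively convert a SciPy ClusterNode into a Newick string (no branch lengths)."""
    if node.is_leaf():
        return labels[node.id]
    left = to_newick(node.get_left(), labels)
    right = to_newick(node.get_right(), labels)
    return f"({left},{right})"

labels = ["seqA", "seqB", "seqC", "seqD"]
# Example symmetric distance matrix with zeros on the diagonal
D = np.array([
    [0.0,   0.125, 0.25,  0.5],
    [0.125, 0.0,   0.125, 0.375],
    [0.25,  0.125, 0.0,   0.25],
    [0.5,   0.375, 0.25,  0.0],
])

condensed = squareform(D)                 # square -> condensed (upper triangle)
Z = linkage(condensed, method="average")  # each row: [idx1, idx2, height, size]
tree = to_tree(Z)                         # root ClusterNode of the hierarchy
newick = to_newick(tree, labels) + ";"
print(newick)
```

If matplotlib is installed, `scipy.cluster.hierarchy.dendrogram(Z, labels=labels)` draws the tree-structured view.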

Start with the two sequences that have the smallest distance between them. Merge them, and come up with a way to create a consensus sequence that represents both. Then recompute the distance matrix and again merge the pair with the smallest distance. This continues until all sequences have been merged into a single cluster.
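To show the shape of that merge loop in plain Python, here is a naive sketch. It replaces the consensus-sequence step with average linkage (UPGMA): instead of building a consensus and recomputing distances from it, the distance from a merged cluster to every other cluster is the size-weighted average of its two parts' distances. `agglomerate` is a hypothetical helper name, ties are broken by whichever pair comes first, and the matrix is made-up example data. It is O(n^3), so it is only meant for small inputs.

```python
def agglomerate(D, labels):
    """Naive agglomerative clustering with average linkage (UPGMA).
    Returns the merges in order as (cluster_a, cluster_b, distance)."""
    clusters = {lab: 1 for lab in labels}  # active cluster label -> number of sequences
    dist = {}                              # frozenset({label1, label2}) -> distance
    for i, a in enumerate(labels):
        for j in range(i + 1, len(labels)):
            dist[frozenset((a, labels[j]))] = D[i][j]

    merges = []
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)     # closest pair of active clusters
        a, b = sorted(pair)
        d_ab = dist.pop(pair)
        new = f"({a},{b})"                 # Newick-style label for the merged cluster
        na, nb = clusters.pop(a), clusters.pop(b)
        # Distance from the new cluster to each remaining one: weighted average
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((new, c))] = (na * d_ac + nb * d_bc) / (na + nb)
        clusters[new] = na + nb
        merges.append((a, b, d_ab))
    return merges

labels = ["seqA", "seqB", "seqC", "seqD"]
D = [
    [0.0,   0.125, 0.25,  0.5],
    [0.125, 0.0,   0.125, 0.375],
    [0.25,  0.125, 0.0,   0.25],
    [0.5,   0.375, 0.25,  0.0],
]
for step in agglomerate(D, labels):
    print(step)
```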

I think in your case, producing the consensus sequences in Python is a crucial and complicated step.
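One simple way to get a consensus from already-aligned sequences is column-wise majority rule. This is a minimal sketch under that assumption, not the only (or best) choice: `majority_consensus` is a hypothetical helper, gaps are counted like any other character, and a column with no single most common character gets `tie_char` (`N` by default). With just two sequences every mismatching column becomes a tie, which is where more careful schemes (e.g. IUPAC ambiguity codes or profile alignment) come in.

```python
from collections import Counter

def majority_consensus(aligned, tie_char="N"):
    """Column-wise majority-rule consensus of equal-length aligned sequences.
    Columns without a unique most common character get tie_char."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*(s.upper() for s in aligned)):
        counts = Counter(column).most_common()
        # A tie between the top two characters means no clear majority
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            out.append(tie_char)
        else:
            out.append(counts[0][0])
    return "".join(out)

print(majority_consensus(["ACGTACGT", "ACGTACGA", "ACGAACGA"]))  # ACGTACGA
```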


Thanks for the help. I read up a bit more and watched some YouTube tutorials. I'm using Python because I need it for my project. I've installed Orange for Python, and I'll figure out how it works and see what output I get :)

