Question

Hierarchical Clustering

12.8 years ago

Sanju ▴ 90

Hi all

I have a dataset of protein sequences in FASTA format, and my aim is to find the non-redundant sequences in it. I have computed the pairwise sequence similarity percentages and stored the results in an Excel sheet. My professor told me to use R to do hierarchical clustering (single-linkage method). I don't want to use any other software for this. I also have to create a dendrogram. How can I do hierarchical clustering of protein sequences in R? Could you give me an R script for this?

I would like to get an R script for:

1) Reading the Excel file
2) Hierarchical clustering (single linkage)
3) Phylogenetic analysis
4) Creating a dendrogram

Please help me.

Tags: r, programming, clustering, tree

Isn't this the same question you already asked? See your earlier post: Hierarchial Clustering

written 12.8 years ago by Lars Juhl Jensen

score 4 · Answer 1 · 2011-07-27

12.8 years ago

Philippe ★ 1.9k

Hi,

To answer your question more precisely, you can use the following functions:

You can read any delimited text file with the read.table() function (an Excel sheet can first be saved as CSV). If your input file is a CSV file, you can use a wrapper such as read.csv() or read.csv2(), whose default parameters might be the ones you need.
Once your data is loaded into R, you can cluster it with the hclust() function. It implements single linkage clustering, which you select through the method argument (method = "single"; the default is "complete"). Note that hclust() expects dissimilarities, so similarity percentages must first be converted to distances (for example 100 - similarity) and passed as a dist object.
Phylogenetic analysis is a broad term. Depending on what you want to do, the ape package might be helpful.
To create a dendrogram, you can pass the output of hclust() directly to plot(). For example: h <- hclust(as.dist(d), method = "single"); plot(h) # d is your matrix of distances; plot() draws the dendrogram
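A minimal end-to-end sketch of the steps listed above, assuming the Excel sheet has been saved as a CSV file holding a square matrix of pairwise percent similarities, with sequence IDs as both the first column and the header row. The toy matrix, the temporary file and the 90% redundancy cutoff are made up for illustration; substitute your own file and threshold.

```r
# Assumption: the similarity sheet was exported as CSV with sequence IDs as
# row and column labels. The values below are invented toy data.
ids <- c("seqA", "seqB", "seqC", "seqD", "seqE")
sim <- matrix(c(100,  95,  40,  38,  42,
                 95, 100,  41,  39,  40,
                 40,  41, 100,  92,  45,
                 38,  39,  92, 100,  44,
                 42,  40,  45,  44, 100),
              nrow = 5, byrow = TRUE, dimnames = list(ids, ids))
csv_file <- tempfile(fileext = ".csv")   # stands in for your exported file
write.csv(sim, csv_file)

# 1) Read the matrix; row.names = 1 uses the first column as row names.
sim_in <- as.matrix(read.csv(csv_file, row.names = 1, check.names = FALSE))

# 2) hclust() works on dissimilarities, so convert % similarity to a distance.
dist_mat <- as.dist(100 - sim_in)

# Single linkage has to be requested explicitly (the default is "complete").
h <- hclust(dist_mat, method = "single")

# 3) Dendrogram: plot() on an hclust object draws the tree.
plot(h, main = "Single-linkage clustering", ylab = "100 - % similarity")

# 4) Hypothetical redundancy cutoff of 90% similarity = distance 10:
#    merges at height <= 10 end up in the same group.
groups <- cutree(h, h = 10)

# Keep one representative per group (here simply the first ID in each group).
representatives <- names(groups)[!duplicated(groups)]
print(representatives)   # → "seqA" "seqC" "seqE"
```

With single linkage, cutting at height 10 puts any two sequences connected through a chain of pairs with at least 90% similarity into the same group, so keeping one sequence per group gives a non-redundant set under that cutoff. Which member to keep (longest, best annotated, ...) is a separate choice; the sketch just keeps the first.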

In general, if you need more information about one of these functions, you can read its help page with help() or ?. For example, for hclust(): ?hclust; help(hclust) # two equivalent ways to open the help page for hclust

Thank you very much for your answer.

written 12.8 years ago by Sanju

Dear friend,

Could you please provide a sample script? I am a beginner in programming. Please help me.

written 12.8 years ago by Sanju

I'm sorry, but I don't think providing a full script will help you. At least 90% of what you ask can be done with the functions I gave you. Looking at these functions, experimenting on your own and reading the help files as I mentioned will let you reach your goal and improve your programming skills. If you are really stuck on one specific point, you can ask for help with it. Also, since your professor asked you to use R, he may be able to help you with any special needs you have.

written 12.8 years ago by Philippe

Dear friend,

I am really sorry for the trouble. I will try.

written 12.8 years ago by Sanju

Ram · Answer 2 · 2011-07-27

12.8 years ago

Assa Yeroslaviz ★ 1.8k

Have a look here:

hierarchial-clustering

Aleksandr Levchuk has already written a script for running BLAST and building a hierarchical clustering with R. (Using the search function would have saved you time.)

Ram · Answer 3 · 2011-07-27

12.8 years ago

Steve Moss 2.3k

MCL clustering is also a very popular solution (http://micans.org/mcl/), but it works on graphs/networks rather than producing the more traditional dendrograms. See the manual here: http://micans.org/mcl/man/mclcm.html.

The algorithm's author (Stijn van Dongen) has also provided an R script here: http://www.bigre.ulb.ac.be/Users/jvanheld/BMOL-F-501/practicals/r_scripts/mcl.R

Not sure if this is useful in this context, but something to consider for future work perhaps?
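To show what MCL does with a similarity graph, here is a toy from-scratch sketch in base R of the Markov Cluster loop: it alternates expansion (matrix power) and inflation (elementwise power plus column renormalisation) on a column-stochastic matrix until it stops changing. The helper name mcl_sketch, its defaults and the final "assign each node to its strongest attractor" step are illustrative assumptions; this is not the mcl program from micans.org and not the mcl.R script linked here.

```r
# Hypothetical helper: toy Markov Cluster (MCL) loop in base R.
# adj: symmetric, non-negative matrix of edge weights (e.g. similarities).
mcl_sketch <- function(adj, expansion = 2, inflation = 2,
                       max_iter = 100, tol = 1e-8) {
  m <- adj + diag(nrow(adj))          # add self-loops, as MCL usually does
  m <- sweep(m, 2, colSums(m), "/")   # make every column sum to 1
  for (i in seq_len(max_iter)) {
    prev <- m
    # Expansion: raise the matrix to a power to spread flow along paths.
    e <- m
    for (k in seq_len(expansion - 1)) e <- e %*% m
    # Inflation: elementwise power, then renormalise the columns; this
    # boosts strong flows and starves weak ones.
    m <- e^inflation
    m <- sweep(m, 2, colSums(m), "/")
    if (max(abs(m - prev)) < tol) break
  }
  # Simplification: assign each node (column) to the row holding most of its
  # flow (its attractor), then relabel attractors as 1, 2, ...
  attractor <- apply(m, 2, which.max)
  match(attractor, unique(attractor))
}

# Toy graph: two groups of three mutually similar sequences, no links between.
adj <- matrix(0, 6, 6)
adj[1:3, 1:3] <- 1
adj[4:6, 4:6] <- 1
diag(adj) <- 0
print(mcl_sketch(adj))   # → 1 1 1 2 2 2
```

The inflation parameter controls granularity: higher values tend to give more, smaller clusters. Unlike hclust(), the result is a flat partition rather than a tree.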

Thank you very much

written 12.8 years ago by Sanju