I am trying to perform k-means clustering on ChIP-seq tag density versus distance from the TSS. My aim is to generate a heat map like the one shown in Fig. 2E of http://www.ncbi.nlm.nih.gov/pubmed/18992931
I generated tag densities using HOMER for 1000 bp on either side of the TSS (2 kb total) in 50 bp bins. It gave me a matrix, which I load into Cluster 3 to perform k-means clustering; I could also use other tools for the clustering. However, all of them freeze and complain about memory. I am not very good with command-line tools. Having said that, I think one solution is to reduce the data in the matrix generated by HOMER. I have tried the filter tools in Cluster 3 but could not reduce the data. Could someone suggest how I can reduce the data before performing k-means clustering? My reads are 50 bp and this factor binds tightly around the TSS, so I selected 1 kb on either side of the TSS.
Thanks
What are the specs of the machine you are running the software on? If every program complains about memory, an easy fix is to add memory ;-) There are many solutions around for clustering big data. 2 kb split into 50 bp bins gives 40 regions. What is the number of TSSs in your matrix?
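A rough size check can help decide whether the matrix itself is the problem. The sketch below counts the bins for a window and bin width, then estimates the memory a dense matrix of 8-byte floats would need. The function names and the row count of 20,000 are placeholders for illustration, not values from this thread.

```python
def n_bins(window_bp, bin_bp):
    """Number of non-overlapping bins that tile a window."""
    return window_bp // bin_bp


def dense_matrix_mb(n_rows, n_cols, bytes_per_value=8):
    """Approximate size in MB of a dense matrix of floating-point values."""
    return n_rows * n_cols * bytes_per_value / 1e6


bins = n_bins(2000, 50)   # 2 kb window in 50 bp bins
example_rows = 20000      # hypothetical number of TSS rows
size_mb = dense_matrix_mb(example_rows, bins)
print(bins, "bins;", round(size_mb, 1), "MB for", example_rows, "rows")
```

If the estimate comes out at a few megabytes, the raw matrix is unlikely to be what exhausts memory, and the clustering tool or its settings may be the better place to look.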
(A: Why does the Homer tool find TSS sites for so many (41,478) genes?). HOMER identifies 41,478 TSSs, x 43 columns, when I use 1000 bp on either side of the TSS with 50 bp bins. I am using Windows 7, 64-bit, with 12 GB RAM and an i7 CPU. I also have access to an iMac.
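One way to reduce the matrix before clustering is to keep only the rows with the most signal, since TSSs with little or no tag density add little to a heat map. Below is a minimal sketch using only the Python standard library. It assumes the matrix is tab-delimited with one header row, an ID in the first column, and numeric bin values in the remaining columns. The function name `filter_matrix` and the default cutoff of 5000 rows are illustrative choices, not part of HOMER or Cluster 3.

```python
import csv


def filter_matrix(in_path, out_path, keep_top=5000):
    """Write the keep_top rows with the largest summed tag density."""
    with open(in_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Skip blank or ID-only lines.
        rows = [row for row in reader if len(row) > 1]

    def total_signal(row):
        # Assumption: column 0 is the gene/TSS ID, the rest are numbers.
        # Empty cells are ignored; non-numeric text would raise ValueError.
        return sum(float(value) for value in row[1:] if value.strip())

    rows.sort(key=total_signal, reverse=True)
    kept = rows[:keep_top]

    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(kept)
    return len(kept)
```

A call such as `filter_matrix("homer_matrix.txt", "homer_matrix_top5000.txt")` (both file names are hypothetical) would write a smaller tab-delimited file that can then be opened in the clustering tool. Ranking by row variance instead of row sum is an equally simple alternative if flat profiles should be removed rather than weak ones.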