id         sno1   sno2   sno3   sno4
gene1      23.42  23.4   88.8   98.21
gene2      0      0      99.7   95.5
gene3      77.4   100    44.4   65.6
gene4      0      0      0      0
gene5      100    100    100    100
:          :
gene16000  58.3   33.8   78.8   56.6
I have 16000 rows (one per gene id) and four columns (one per sample: sno1, sno2, sno3 and sno4), with values given as percentages. I want to compare the four samples: how many genes are present (i.e. 100%) in both sno1 and sno2, in both sno3 and sno4, and in all four samples. Even though I reduced the data to rows containing only 0 (absent) and 100 (present), I still end up with around 1000 rows. I would like to know if there is any statistical technique (like normalization) to reduce the data dimension, so that it will be easier to generate a heatmap.
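For the counting part of the question, here is a minimal sketch in plain Python. It assumes "present" means a value of 100 in every sample being compared; the function name `present_in` and the `threshold` parameter are made up for illustration, and the rows are just the five complete example rows from the table above.

```python
# Toy rows copied from the example table; the real data has about 16000 rows.
samples = ["sno1", "sno2", "sno3", "sno4"]
rows = {
    "gene1": [23.42, 23.4, 88.8, 98.21],
    "gene2": [0, 0, 99.7, 95.5],
    "gene3": [77.4, 100, 44.4, 65.6],
    "gene4": [0, 0, 0, 0],
    "gene5": [100, 100, 100, 100],
}


def present_in(rows, samples, wanted, threshold=100.0):
    """Return gene ids whose value is >= threshold in every sample in `wanted`.

    `threshold` is an assumption: 100 means "present" as described in the question.
    """
    idx = [samples.index(name) for name in wanted]
    return [
        gene
        for gene, values in rows.items()
        if all(values[i] >= threshold for i in idx)
    ]


shared_12 = present_in(rows, samples, ["sno1", "sno2"])
shared_34 = present_in(rows, samples, ["sno3", "sno4"])
shared_all = present_in(rows, samples, samples)

print("sno1 & sno2:", len(shared_12), shared_12)
print("sno3 & sno4:", len(shared_34), shared_34)
print("all samples:", len(shared_all), shared_all)
```

Lowering `threshold` (say to 95) would treat near-complete values as present too; whether that is appropriate depends on how the percentages were produced.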
Why do you want to reduce the number of rows or columns? Do you just want a smaller heatmap? Around 1000 rows and 4 dimensions actually seems very manageable.
Yes, I want a smaller heatmap.
I could be completely off base, but maybe you could aggregate by biological function (using the Gene Ontology). I think it will be difficult to reduce your dataset in an unbiased way.
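A rough sketch of what aggregating by function could look like, in plain Python. The gene-to-category mapping here is entirely hypothetical (the labels `category_A` and `category_B` are placeholders, not real Gene Ontology terms), and each gene is assumed to belong to exactly one category, which is not true of real annotations. Taking the per-sample mean is also just one possible summary.

```python
from collections import defaultdict
from statistics import mean

samples = ["sno1", "sno2", "sno3", "sno4"]
values = {
    "gene1": [23.42, 23.4, 88.8, 98.21],
    "gene2": [0, 0, 99.7, 95.5],
    "gene3": [77.4, 100, 44.4, 65.6],
    "gene4": [0, 0, 0, 0],
    "gene5": [100, 100, 100, 100],
}

# HYPOTHETICAL annotation: placeholder categories, one per gene.
gene_to_category = {
    "gene1": "category_A",
    "gene2": "category_A",
    "gene3": "category_B",
    "gene4": "category_B",
    "gene5": "category_B",
}


def aggregate_by_category(values, gene_to_category):
    """Collapse gene rows into one row per category (mean of each sample column)."""
    grouped = defaultdict(list)
    for gene, row in values.items():
        grouped[gene_to_category[gene]].append(row)
    # zip(*rows) transposes the rows of a category into per-sample columns.
    return {
        category: [mean(column) for column in zip(*rows)]
        for category, rows in grouped.items()
    }


heatmap_rows = aggregate_by_category(values, gene_to_category)
for category, row in heatmap_rows.items():
    print(category, [round(v, 2) for v in row])
```

The heatmap then has one row per category instead of one row per gene, at the cost of hiding gene-level differences within a category.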