My lab is using gene expression data generated by Illumina Human HT-12 v3 Expression BeadChips. As advertised by the company, this product has 48,000+ probes for 25,000 genes. I have never worked with expression data before and would like to cluster genes based on their expression. The data have already been normalized and corrected for batch effects.
The current file format is:
ProbeID Sample1 Sample2
I would like to get the following format:
GeneID Sample1 Sample2
It seems that some genes have more probes than others. Moreover, there can be multiple transcripts for a given gene, so a single gene may map to several probe rows. Could someone please give me a general idea of how to collapse the probe-level values into one row per gene?
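To make the question concrete, here is a minimal Python sketch of the kind of collapsing I have in mind. It assumes a probe-to-gene lookup already exists (the `probe_to_gene` dict below is hypothetical and would presumably be built from the array's annotation file), and the probe and gene IDs are made up. It shows two possible rules, averaging all probes of a gene or keeping the probe with the highest mean expression. I do not know whether either is the appropriate choice, which is part of what I am asking.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(probe_rows, probe_to_gene, method="mean"):
    """Collapse probe-level expression rows into one row per gene.

    probe_rows:    dict of probe ID -> list of expression values (one per sample)
    probe_to_gene: dict of probe ID -> gene ID (hypothetical lookup, assumed
                   to come from the array annotation)
    method:        "mean"     averages all probes of a gene, sample by sample
                   "max_mean" keeps the one probe with the highest mean expression
    """
    # Group the probe rows by the gene they map to.
    by_gene = defaultdict(list)
    for probe_id, values in probe_rows.items():
        gene_id = probe_to_gene.get(probe_id)
        if gene_id is None:
            continue  # probe has no gene annotation, so it is dropped
        by_gene[gene_id].append(values)

    gene_rows = {}
    for gene_id, rows in by_gene.items():
        if method == "mean":
            # zip(*rows) yields one tuple per sample across all probes.
            gene_rows[gene_id] = [mean(column) for column in zip(*rows)]
        elif method == "max_mean":
            gene_rows[gene_id] = list(max(rows, key=mean))
        else:
            raise ValueError(f"unknown method: {method!r}")
    return gene_rows


# Made-up example: two probes for GENE_A, one for GENE_B, two samples.
probe_rows = {
    "PROBE_1": [8.0, 9.0],
    "PROBE_2": [6.0, 7.0],
    "PROBE_3": [5.0, 5.5],
}
probe_to_gene = {"PROBE_1": "GENE_A", "PROBE_2": "GENE_A", "PROBE_3": "GENE_B"}

print(collapse_probes_to_genes(probe_rows, probe_to_gene, method="mean"))
print(collapse_probes_to_genes(probe_rows, probe_to_gene, method="max_mean"))
```

With the toy data above, the "mean" rule would give GENE_A the values 7.0 and 8.0, while the "max_mean" rule would keep PROBE_1's values 8.0 and 9.0.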
Thank you for your time.