I have a file with enhancers in the first column and, in the second column, the name of a transcription factor (TF) for which that enhancer has binding sites. I want to find out which enhancers share binding sites for the same TFs. I made a heatmap in R, but my data set is so large that it is impossible to read off the number of TFs shared by a group of enhancers. How can I accomplish this in R? My data looks like this:
    Enhancer          TF
    Gene1_Enhancer1   Arid3a
    Gene1_Enhancer1   Hoxa4
    Gene1_Enhancer1   Ascl2
    Gene1_Enhancer1   EBP
    Gene1_Enhancer2   ETS1
    Gene2_Enhancer1   ETS1
    Gene2_Enhancer1   EBP
    Gene2_Enhancer1   Arid3a
    Gene2_Enhancer1   Hoxa4
    Gene3_Enhancer1   Arid3a
    Gene3_Enhancer1   Hoxa4
    Gene3_Enhancer1   EBP
    Gene3_Enhancer2   Hoxa7
    Gene4_Enhancer1   Hoxa4
    Gene4_Enhancer1   EBP
    Gene4_Enhancer1   Arid3a
Is there a way to write the output to a text file like the one below, where each group contains one or more enhancers from all 4 genes, together with the TFs they share?
    Group                                                                Common TFs
    Gene1_Enhancer1, Gene2_Enhancer1, Gene3_Enhancer1, Gene4_Enhancer1   Arid3a, EBP, Hoxa4
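To make the goal concrete, here is a rough base R sketch of the grouping I am after, run on the toy data only. It is an illustration under assumptions, not a finished solution: it assumes the gene name is everything before the first underscore in the enhancer name, it takes the non-empty TF sets shared by pairs of enhancers as candidate sets (so it may miss smaller sets shared only by three or more enhancers), and the output file name `enhancer_groups.txt` is just a placeholder.

```r
# Toy data from the question, with enhancer names spelled consistently.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Enhancer TF
Gene1_Enhancer1 Arid3a
Gene1_Enhancer1 Hoxa4
Gene1_Enhancer1 Ascl2
Gene1_Enhancer1 EBP
Gene1_Enhancer2 ETS1
Gene2_Enhancer1 ETS1
Gene2_Enhancer1 EBP
Gene2_Enhancer1 Arid3a
Gene2_Enhancer1 Hoxa4
Gene3_Enhancer1 Arid3a
Gene3_Enhancer1 Hoxa4
Gene3_Enhancer1 EBP
Gene3_Enhancer2 Hoxa7
Gene4_Enhancer1 Hoxa4
Gene4_Enhancer1 EBP
Gene4_Enhancer1 Arid3a")

# One sorted, de-duplicated TF set per enhancer.
tf_sets <- lapply(split(df$TF, df$Enhancer), function(x) sort(unique(x)))

# Assumption: the gene name is the part before the first underscore.
gene_of <- function(enh) sub("_.*$", "", enh)
all_genes <- unique(gene_of(names(tf_sets)))

# Candidate TF sets: every non-empty intersection between two enhancers.
pairs <- combn(names(tf_sets), 2, simplify = FALSE)
cand <- lapply(pairs, function(p) intersect(tf_sets[[p[1]]], tf_sets[[p[2]]]))
cand <- unique(cand[lengths(cand) > 0])

# For each candidate set, collect all enhancers that contain every TF in it,
# and keep the group only if its enhancers cover all genes.
rows <- lapply(cand, function(tfs) {
  has_all <- vapply(tf_sets, function(s) all(tfs %in% s), logical(1))
  members <- names(tf_sets)[has_all]
  if (!setequal(gene_of(members), all_genes)) return(NULL)
  data.frame(Group = paste(members, collapse = ", "),
             Common_TFs = paste(tfs, collapse = ", "),
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, Filter(Negate(is.null), rows))

# Write the groups to a tab-separated text file (placeholder file name).
write.table(result, "enhancer_groups.txt", sep = "\t",
            quote = FALSE, row.names = FALSE)
```

On the toy data this keeps a single group, the four enhancers sharing Arid3a, EBP and Hoxa4, and drops the ETS1 pair because it only covers Gene1 and Gene2. The pairwise step grows quadratically with the number of enhancers, so it may be slow on a large data set.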
Thanks a lot!!!