After performing the differential analysis with limma and mapping the results to the feature data, I got the following data frame:
FDR Probe_ID Gene.Symbol Gene.ID
0.009 1555272_at RSPH10B2///RSPH10B 728194///222967
0.007 1557203_at PABPC1L2B///PABPC1L2A 645974///340529
0.007 1557384_at LOC100506639///ZNF131 100506639///7690
The R code to reproduce the above df is:
df <- data.frame(
  FDR = c(0.009, 0.007, 0.007),
  Probe_ID = c("1555272_at", "1557203_at", "1557384_at"),
  Gene.Symbol = c("RSPH10B2///RSPH10B", "PABPC1L2B///PABPC1L2A", "LOC100506639///ZNF131"),
  Gene.ID = c("728194///222967", "645974///340529", "100506639///7690"))
I want to perform a GSEA using the column df$Gene.Symbol. However, more than one gene symbol can be mapped to a single Probe_ID (separated by "///"), so I split the data frame into one row per gene symbol with:
library(tidyr)  # provides separate_rows() and re-exports %>%
df_split <- as.data.frame(df %>% separate_rows(Gene.Symbol, Gene.ID, sep = "///"))
But I got repeated gene symbols. What is the correct way to resolve this, so that df$Gene.Symbol contains only unique gene symbols? I don't want to use any online tool, because I am hard-coding the microarray pipeline as part of my project.
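To make the problem reproducible, here is a minimal sketch of one possible way to collapse the duplicates: after splitting, keep a single row per gene symbol, namely the one with the lowest FDR. The three rows above do not contain a repeated symbol on their own, so the fourth probe (`hypothetical_at`) and its FDR are made up purely for illustration, and the names `df_demo` and `df_unique` are mine. Keeping the lowest-FDR probe is an assumption about what "correct" means here, not an established rule.

```r
library(dplyr)
library(tidyr)

# Same rows as df above, plus one HYPOTHETICAL probe (not real data)
# that maps to ZNF131 again, so that a duplicate symbol actually occurs.
df_demo <- data.frame(
  FDR = c(0.009, 0.007, 0.007, 0.020),
  Probe_ID = c("1555272_at", "1557203_at", "1557384_at", "hypothetical_at"),
  Gene.Symbol = c("RSPH10B2///RSPH10B", "PABPC1L2B///PABPC1L2A",
                  "LOC100506639///ZNF131", "ZNF131"),
  Gene.ID = c("728194///222967", "645974///340529",
              "100506639///7690", "7690"),
  stringsAsFactors = FALSE
)

df_unique <- df_demo %>%
  # One row per gene symbol / gene ID pair
  separate_rows(Gene.Symbol, Gene.ID, sep = "///") %>%
  # Within each symbol, keep only the row with the smallest FDR;
  # with_ties = FALSE guarantees exactly one row per symbol
  group_by(Gene.Symbol) %>%
  slice_min(FDR, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  as.data.frame()
```

With this, ZNF131 appears once and is represented by 1557384_at rather than by the made-up probe. If the GSEA ranking is based on another column from the limma output (for example the t-statistic), the same pattern should apply with that column in place of FDR.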