Question

How to handle multiple mappings between gene symbols and probe IDs in microarray data


5.4 years ago

sp29 ▴ 50

After performing the differential expression analysis with limma and mapping the results to the feature data, I got the following data frame:

FDR    Probe_ID    Gene.Symbol            Gene.ID
0.009  1555272_at  RSPH10B2///RSPH10B     728194///222967
0.007  1557203_at  PABPC1L2B///PABPC1L2A  645974///340529
0.007  1557384_at  LOC100506639///ZNF131  100506639///7690

The R code to recreate this data frame is:

df <- data.frame(
  FDR = c(0.009, 0.007, 0.007),
  Probe_ID = c("1555272_at", "1557203_at", "1557384_at"),
  Gene.Symbol = c("RSPH10B2///RSPH10B", "PABPC1L2B///PABPC1L2A", "LOC100506639///ZNF131"),
  Gene.ID = c("728194///222967", "645974///340529", "100506639///7690")
)

I want to perform a GSEA using the column df$Gene.Symbol. However, some probe IDs map to more than one gene symbol, so I split the data frame into one row per gene symbol with:

library(dplyr)  # provides %>%
library(tidyr)  # provides separate_rows()
df_split <- as.data.frame(df %>% separate_rows(Gene.Symbol, Gene.ID, sep = "///"))

But this gave me repeated gene symbols. What is the correct way to resolve this so that df$Gene.Symbol contains only unique gene symbols? I don't want to use an online tool, because I am coding the microarray pipeline myself as part of my project.

R Micro-Array Data-Frame Probe-ID Gene • 1.1k views


score 0 · Answer 1 · 2020-07-16

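# Merging: collapse all probes that map to the same gene symbol into one row
library(dplyr)
library(stringr)

annotated <- annotated %>%
  group_by(Gene.symbol) %>%
  # keep only genes whose probes all agree on the direction of change
  filter(n_distinct(sign(logFC)) == 1) %>%
  summarise(
    # average the limma statistics across the probes of each gene
    across(c(logFC, P.Value, adj.P.Val, B, AveExpr, t), mean),
    # concatenate the annotation fields so no probe information is lost
    X = str_c(X, collapse = " | "),
    Gene.title = str_c(Gene.title, collapse = " | "),
    Gene.ID = str_c(Gene.ID, collapse = " | "),
    GenBank.Accession = str_c(GenBank.Accession, collapse = " | ")
  ) %>%
  as.data.frame()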
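If averaging is not wanted, another common option is to keep a single probe per gene symbol. The sketch below is only an illustration under stated assumptions: it uses the data frame from the question, adds one hypothetical probe row (`hypothetical_at`, not in the original data) so that a symbol actually repeats after splitting, and picks the probe with the lowest FDR. Which probe to keep is a judgment call; highest average expression or largest absolute t-statistic are also used. It is written in base R so it runs without extra packages.

```r
# Example data from the question, plus one HYPOTHETICAL probe (last row)
# added only so that a gene symbol repeats after splitting.
df <- data.frame(
  FDR = c(0.009, 0.007, 0.007, 0.020),
  Probe_ID = c("1555272_at", "1557203_at", "1557384_at", "hypothetical_at"),
  Gene.Symbol = c("RSPH10B2///RSPH10B", "PABPC1L2B///PABPC1L2A",
                  "LOC100506639///ZNF131", "ZNF131"),
  Gene.ID = c("728194///222967", "645974///340529",
              "100506639///7690", "7690"),
  stringsAsFactors = FALSE
)

# Split the "///"-delimited fields: one output row per (probe, gene) pair.
symbols <- strsplit(df$Gene.Symbol, "///", fixed = TRUE)
ids <- strsplit(df$Gene.ID, "///", fixed = TRUE)
stopifnot(lengths(symbols) == lengths(ids))  # symbols and IDs must pair up

df_split <- data.frame(
  FDR = rep(df$FDR, lengths(symbols)),
  Probe_ID = rep(df$Probe_ID, lengths(symbols)),
  Gene.Symbol = unlist(symbols),
  Gene.ID = unlist(ids),
  stringsAsFactors = FALSE
)

# Sort by FDR so the most significant probe comes first, then keep the
# first row seen for each gene symbol.
df_split <- df_split[order(df_split$FDR), ]
df_unique <- df_split[!duplicated(df_split$Gene.Symbol), ]
rownames(df_unique) <- NULL
```

Here `df_unique` has one row per gene symbol, and for `ZNF131` the row from `1557384_at` is kept because its FDR is lower than that of the hypothetical probe. For GSEA the ranking statistic (for example `t` or `logFC`) would need to be carried along in the same way as `FDR`.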