Question: Gage duplicate identifiers as row names
bsp017 (Denmark, Copenhagen, UCPH) wrote 21 months ago:

I have a dataset with Entrez gene annotations and log fold change values under different conditions. I would like to do a gene set enrichment analysis with GAGE v2.28.0, using RStudio. However, I'm not sure how to handle the duplicate identifiers in column 1, which I want to use as row.names.

I followed the 'Gene set and data preparation' vignette to make sure my data was in the correct format:

cuff.res<-read.csv("swissport_for_gage2.csv", row.names=1, check.names = F)
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  duplicate 'row.names' are not allowed
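
The error arises because a data frame needs unique row names, while the ID column contains repeated values. As a minimal sketch for inspecting the repeats, the snippet below reads a small made-up inline table in place of the real file (the column names s1 and s2 and all values are illustrative only):

```r
# Toy stand-in for the CSV; IDs and values are illustrative only.
csv_text <- "entrezid,s1,s2
NA,1.3,0.7
9126923,-0.2,-0.3
9126923,0.6,0.2
1234980,-3.8,-3.9"

# Read without row.names so repeated IDs stay in an ordinary column.
toy <- read.csv(text = csv_text, check.names = FALSE)

# Count how often each ID occurs, including missing ones.
id_counts <- table(toy$entrezid, useNA = "ifany")
print(id_counts)

# IDs that appear more than once (missing IDs excluded).
dup_ids <- unique(toy$entrezid[duplicated(toy$entrezid) & !is.na(toy$entrezid)])
print(dup_ids)
```

Running the same two checks on the real file shows how many rows lack an ID and which IDs repeat before deciding how to handle them.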

If I take out "row.names=1" and run gage, I get the following result:

cuff.res<-read.table("swissport_for_gage2.csv", header = T, sep=",")
df1<-na.omit(cuff.res)
new_file1<-as.matrix(df1)
ref.idx=2:3
samp.idx=4:5
keggres = gage(new_file1, gsets=kg.eco.eg$kg.sets, ref = ref.idx, samp = samp.idx)
lapply(keggres, head)
    $greater
                                                      p.geomean stat.mean
    eco00010 Glycolysis / Gluconeogenesis                    NA       NaN
    eco00020 Citrate cycle (TCA cycle)                       NA       NaN
    eco00030 Pentose phosphate pathway                       NA       NaN
    eco00040 Pentose and glucuronate interconversions        NA       NaN
    eco00051 Fructose and mannose metabolism                 NA       NaN
    eco00052 Galactose metabolism                            NA       NaN
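
A likely reason for the NA/NaN results is that the matrix built this way has no gene IDs as row names, so no row can be matched to the IDs in the gene sets. The sketch below checks that overlap on toy data; the IDs, the column names, and the gene set toy_pathway are all made up for illustration:

```r
# Toy data frame with the ID stored as a column, as in the code above.
df <- data.frame(entrezid = c(945000L, 946000L, 947000L),
                 ref1  = c(0.1, -0.2, 0.3),
                 samp1 = c(1.1, -1.2, 0.9))
m <- as.matrix(df)

# Hypothetical gene set keyed by Entrez ID strings.
gsets <- list(toy_pathway = c("945000", "947000"))

# Row names are absent (or just row numbers), so nothing matches.
overlap_before <- sum(rownames(m) %in% unlist(gsets))

# After using the ID column as row names, the genes are found.
rownames(m) <- df$entrezid
overlap_after <- sum(rownames(m) %in% unlist(gsets))

print(c(before = overlap_before, after = overlap_after))
```

The same check can be run on the real data by replacing gsets with kg.eco.eg$kg.sets; an overlap of zero means the enrichment statistics cannot be computed.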

Is there a workaround for this? My input data looks like this:

entrezid    Bg_NB_NS_2  Bg_NB_NP_2  BgGq_NB_NB_2    BgGq_NB_NS_2    BgGq_NB_NP_2    BgGq_NS_NS_2
NA  1.33639 0.735912    -1.87482    -2.36335    -1.9769 -3.69974
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
9126923 -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  1.46519 0.568023    -3.50016    -3.34538    -2.1212 -4.81057
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
9126923 0.655123    0.202802    -2.62253    -2.04046    -2.21114    -2.69559
1234980 -3.81436    -3.91876    0.541314    -0.0579239  0.399745    3.75644
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
1234980 -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
9126923 -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
1175404 -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
877311  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  -3.03675    -3.14115    2.14204 2.69014 1.9918  5.72689

Thanks, James

h.mon (Brazil) wrote 21 months ago:

The Entrez IDs should be the row names of the matrix, so that GAGE knows which gene each row corresponds to:

new_file1<-as.matrix(df1)
rownames(new_file1) <- df1$entrezid
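
Unlike data-frame row names, matrix row names may repeat, so the assignment above works even with duplicate IDs. If one row per gene is preferred, one option is to drop rows without an ID and average the duplicates. This is an assumption about what suits the data, not something the thread or GAGE prescribes. A sketch on toy values (column names and numbers are illustrative):

```r
# Toy data with a missing ID and a duplicated ID.
df1 <- data.frame(entrezid = c(NA, 9126923L, 9126923L, 1234980L),
                  ref1  = c(1.0, 0.2, 0.4, -3.0),
                  samp1 = c(0.5, 1.0, 3.0, 2.0))

# Rows without an Entrez ID cannot be mapped to any gene set, so drop them.
df1 <- df1[!is.na(df1$entrezid), ]

# Average rows that share an ID (assumption: the mean is an acceptable summary).
collapsed <- aggregate(. ~ entrezid, data = df1, FUN = mean)

# Keep only the data columns in the matrix and use the IDs as row names.
expr <- as.matrix(collapsed[, -1])
rownames(expr) <- collapsed$entrezid

print(expr)
```

Because the ID column is no longer part of the matrix, the column indices passed as ref and samp shift down by one compared with the code in the question.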