Question: Gage duplicate identifiers as row names
14 months ago by bsp017
Wales, Bangor, Bangor Uni
bsp017 wrote:

I have a dataset with Entrez gene annotations and log fold change values under different conditions. I would like to do a gene set enrichment analysis with GAGE v2.28.0, using RStudio. However, I'm not sure how to handle duplicate row names in column 1.

I followed the 'Gene set and data preparation' vignette to make sure my data was in the correct format:

cuff.res<-read.csv("swissport_for_gage2.csv", row.names=1, check.names = F)
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  duplicate 'row.names' are not allowed
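
The error arises because row.names=1 requires every value in the first column to be unique, and repeated NA values count as duplicates too. A minimal sketch for checking this before setting row names, using a small inline stand-in for the CSV (the rows are copied from the sample data in this post; the names csv_text, n_missing and dup_ids are illustrative, not part of gage):

```r
# Inline stand-in for swissport_for_gage2.csv (first two sample columns only).
csv_text <- "entrezid,Bg_NB_NS_2,Bg_NB_NP_2
NA,1.33639,0.735912
9126923,-0.2294,-0.333797
9126923,0.655123,0.202802
1234980,-3.81436,-3.91876"

# Read without row.names so duplicated IDs do not trigger the error.
cuff.res <- read.csv(text = csv_text, check.names = FALSE)

ids <- cuff.res$entrezid
n_missing <- sum(is.na(ids))                           # rows with no Entrez ID
dup_ids <- unique(ids[!is.na(ids) & duplicated(ids)])  # IDs that occur more than once

print(n_missing)
print(dup_ids)
```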

If I take out "row.names=1" and run gage, I get the following result:

cuff.res<-read.table("swissport_for_gage2.csv", header = T, sep=",")
df1<-na.omit(cuff.res)
new_file1<-as.matrix(df1)
ref.idx=2:3
samp.idx=4:5
keggres = gage(new_file1, gsets=kg.eco.eg$kg.sets, ref = ref.idx, samp = samp.idx)
lapply(keggres, head)
    $greater
                                                      p.geomean stat.mean
    eco00010 Glycolysis / Gluconeogenesis                    NA       NaN
    eco00020 Citrate cycle (TCA cycle)                       NA       NaN
    eco00030 Pentose phosphate pathway                       NA       NaN
    eco00040 Pentose and glucuronate interconversions        NA       NaN
    eco00051 Fructose and mannose metabolism                 NA       NaN
    eco00052 Galactose metabolism                            NA       NaN
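
One way to diagnose output like this is to check how many row names of the input matrix actually appear in the gene sets; if none do, no gene can be assigned to any pathway. The sketch below uses a toy data frame and a hypothetical gene set list (toy.sets stands in for kg.eco.eg$kg.sets, and its contents are made up for illustration):

```r
# Toy data frame with the same layout as the posted input (two sample columns).
df1 <- na.omit(data.frame(
  entrezid   = c(NA, 9126923, 1234980, 877311),
  Bg_NB_NS_2 = c(1.33639, -0.2294, -3.81436, -0.2294),
  Bg_NB_NP_2 = c(0.735912, -0.333797, -3.91876, -0.333797)
))

# Hypothetical gene sets keyed by Entrez ID strings (assumption for illustration).
toy.sets <- list(setA = c("9126923", "1234980"),
                 setB = c("877311", "1175404"))
set.ids <- unique(unlist(toy.sets))

# Same conversion as in the post: the IDs end up in a data column,
# so the matrix row names are not Entrez IDs.
new_file1 <- as.matrix(df1)

# Number of matrix row names that match any gene set member.
n.overlap <- length(intersect(rownames(new_file1), set.ids))
print(n.overlap)
```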

Is there a workaround for this? My input data looks like this:

entrezid    Bg_NB_NS_2  Bg_NB_NP_2  BgGq_NB_NB_2    BgGq_NB_NS_2    BgGq_NB_NP_2    BgGq_NS_NS_2
NA  1.33639 0.735912    -1.87482    -2.36335    -1.9769 -3.69974
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
9126923 -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  1.46519 0.568023    -3.50016    -3.34538    -2.1212 -4.81057
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
9126923 0.655123    0.202802    -2.62253    -2.04046    -2.21114    -2.69559
1234980 -3.81436    -3.91876    0.541314    -0.0579239  0.399745    3.75644
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
1234980 -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
9126923 -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
1175404 -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
877311  -0.2294 -0.333797   -0.574163   -1.68241    -0.873274   -1.45301
NA  -3.03675    -3.14115    2.14204 2.69014 1.9918  5.72689

Thanks, James

14 months ago by h.mon
Brazil
h.mon wrote:

The Entrez IDs should be the row names of the matrix, so that GAGE knows which gene each row corresponds to:

new_file1<-as.matrix(df1)
rownames(new_file1) <- df1$entrezid
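
A matrix accepts duplicated row names, so the lines above run even when an Entrez ID occurs more than once. If one row per gene is preferred, the duplicates can be collapsed first. The sketch below averages rows that share an ID; averaging is an assumption made here for illustration (keeping a single representative row would be another option), and the object names are hypothetical:

```r
# Toy data frame with the same layout as the posted input (two sample columns).
df1 <- data.frame(
  entrezid   = c(NA, 9126923, 9126923, 1234980, 1234980, 877311),
  Bg_NB_NS_2 = c(1.33639, -0.2294, 0.655123, -3.81436, -0.2294, -0.2294),
  Bg_NB_NP_2 = c(0.735912, -0.333797, 0.202802, -3.91876, -0.333797, -0.333797)
)

# Drop rows without an Entrez ID; they cannot be mapped to a gene set.
df1 <- df1[!is.na(df1$entrezid), ]

ids  <- as.character(df1$entrezid)
vals <- as.matrix(df1[, -1])          # numeric columns only, ID column removed

# rowsum() adds up rows sharing an ID; dividing by the group size gives the mean.
sums     <- rowsum(vals, group = ids)
counts   <- as.vector(table(ids)[rownames(sums)])
expr.mat <- sums / counts             # one row per Entrez ID, IDs as row names

print(expr.mat)
```

Because the ID column is no longer part of the matrix, the column indices passed as ref and samp shift down by one relative to the data frame (for example, 2:3 becomes 1:2).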