Question: Gage GO annotations vs Entrez
12 months ago by bsp017 (Wales, Bangor, Bangor Uni):

Hi all,

I'm using Gage v2.28.2 to analyse gene enrichment pathways in RNA-seq data. Genes were initially annotated with Entrez IDs, which worked fine:

    kg.eco = kegg.gsets("eco")
    kg.eco.eg = kegg.gsets("eco", id.type = "entrez")
    head(kg.eco.eg$kg.sets, 2)
    keggres = gage(p3, gsets=kg.eco.eg$kg.sets, ref = ref.idx, samp = samp.idx, same.dir = F)
    lapply(keggres, head)
    $greater
                                                             p.geomean stat.mean
    eco02010 ABC transporters                             5.636397e-06  4.437748
    eco01100 Metabolic pathways                           4.192560e-07  3.560120
    eco02020 Two-component system                         2.976999e-03  2.635986
    eco02060 Phosphotransferase system (PTS)              1.029505e-02  2.316506
    eco02024 Quorum sensing                               8.269080e-03  2.290890
    eco01120 Microbial metabolism in diverse environments 8.855620e-04  2.186511
                                                                 p.val        q.val
    eco02010 ABC transporters                             1.255645e-09 6.654916e-08
    eco01100 Metabolic pathways                           4.372851e-07 1.158805e-05
    eco02020 Two-component system                         1.382499e-04 2.442415e-03
    eco02060 Phosphotransferase system (PTS)              8.903017e-04 1.053680e-02
    eco02024 Quorum sensing                               9.940376e-04 1.053680e-02
    eco01120 Microbial metabolism in diverse environments 1.307795e-03 1.076939e-02

Input data is the same except row names are changed to GO terms:

head(p3)
                                 Bg_NB25_v_BgNS26_2h_fc Bg_NB25_v_BgNP27_2h_fc
<NA>                                           1.559100               1.492170
GO:0000150|GO:0003677|GO:0006310               1.696600               1.251170
<NA>                                           0.688138               0.403168
GO:0003824                                     0.770600               0.744205
GO:0006355                                     1.185640               1.403170
GO:0008982|GO:0009401|GO:0016020               2.092530               0.818206
                                 Bg_NB31_v_BgNS32_2h_fc Bg_NB31_v_BgNP33_2h_fc
<NA>                                         -0.0207885               0.401330
GO:0000150|GO:0003677|GO:0006310              1.3511800               0.285852
<NA>                                          0.3511800              -0.299110

Then I ran the following commands:

data(go.sets.hs)
data(go.subs.hs)
keggres = gage(p3, gsets=go.sets.hs[go.subs.hs$BP], same.dir = F)

The results contain only NAs, and I'm not sure why:

lapply(keggres, head)
$greater
                                               p.geomean stat.mean p.val q.val
GO:0000002 mitochondrial genome maintenance           NA       NaN    NA    NA
GO:0000003 reproduction                               NA       NaN    NA    NA
GO:0000012 single strand break repair                 NA       NaN    NA    NA
GO:0000018 regulation of DNA recombination            NA       NaN    NA    NA
GO:0000019 regulation of mitotic recombination        NA       NaN    NA    NA
GO:0000022 mitotic spindle elongation                 NA       NaN    NA    NA
                                               set.size Bg_NB31_v_BgNS32_2h_fc
GO:0000002 mitochondrial genome maintenance           0                     NA
GO:0000003 reproduction                               0                     NA
GO:0000012 single strand break repair                 0                     NA
GO:0000018 regulation of DNA recombination            0                     NA
GO:0000019 regulation of mitotic recombination        0                     NA
GO:0000022 mitotic spindle elongation                 0                     NA

There are a total of ~130,000 genes in the original mapping database. Only ~32,000 of these have GO annotations. The vast majority of enriched genes should be bacterial.


Remove the NA row names from the p3 object and try again.
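A minimal sketch of what dropping those rows could look like, assuming p3 is a numeric matrix (the duplicated `<NA>` row names in the head(p3) output suggest it is). The object names `p3_toy` and `p3_clean` and the values are illustrative stand-ins, not the real data:

```r
# Toy stand-in for p3: a numeric matrix whose row names include missing values.
p3_toy <- matrix(
  c(1.56, 1.70, 0.69, 0.77,
    1.49, 1.25, 0.40, 0.74),
  ncol = 2,
  dimnames = list(
    c(NA, "GO:0000150|GO:0003677|GO:0006310", NA, "GO:0003824"),
    c("sample_a_fc", "sample_b_fc")
  )
)

# Keep only rows with a usable row name. This covers both a real NA
# and the literal string "<NA>", since either could print as <NA>.
keep <- !is.na(rownames(p3_toy)) & rownames(p3_toy) != "<NA>"

# drop = FALSE keeps the result a matrix even if a single row survives.
p3_clean <- p3_toy[keep, , drop = FALSE]
```

The same two lines applied to the real p3 would give an object that can be passed to gage() in place of p3.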

written 12 months ago by h.mon

Thanks for the reply. Same result with NA row names removed:

lapply(keggres, head)
$greater
                                                                       p.geomean
GO:0000009 alpha-1,6-mannosyltransferase activity                             NA
GO:0000010 trans-hexaprenyltranstransferase activity                          NA
GO:0000014 single-stranded DNA specific endodeoxyribonuclease activity        NA
GO:0000016 lactase activity                                                   NA
GO:0000026 alpha-1,2-mannosyltransferase activity                             NA
GO:0000030 mannosyltransferase activity                                       NA
                                                                       stat.mean
GO:0000009 alpha-1,6-mannosyltransferase activity                            NaN
GO:0000010 trans-hexaprenyltranstransferase activity                         NaN
GO:0000014 single-stranded DNA specific endodeoxyribonuclease activity       NaN
GO:0000016 lactase activity                                                  NaN
GO:0000026 alpha-1,2-mannosyltransferase activity                            NaN
GO:0000030 mannosyltransferase activity                                      NaN
written 12 months ago by bsp017
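
One thing that may be worth checking: the set.size column in the first GO output is 0 for every set, which suggests that no row name in p3 matched any member of the gene sets. As far as I know, gage matches the row names of the input against the gene IDs listed in each set, and go.sets.hs holds human Entrez Gene IDs, so row names that are GO terms (or pipe-separated lists of them) would not be expected to match. The sketch below illustrates that check on toy data in base R; `gene_sets_toy`, `expr_go`, `expr_id`, `count_overlap` and the IDs are made-up placeholders, not objects from gage:

```r
# Toy gene sets keyed by GO term; the member IDs are made-up placeholders
# standing in for the gene IDs that a real gene set list would contain.
gene_sets_toy <- list(
  "GO:0000002 mitochondrial genome maintenance" = c("1001", "1002", "1003"),
  "GO:0000003 reproduction"                     = c("1002", "1004")
)

# Toy expression matrix with GO terms as row names, as in the question.
expr_go <- matrix(
  c(1.70, 0.77, 1.19, 1.25, 0.74, 1.40),
  ncol = 2,
  dimnames = list(
    c("GO:0000150|GO:0003677|GO:0006310", "GO:0003824", "GO:0006355"),
    c("sample_a_fc", "sample_b_fc")
  )
)

# Same values, but with gene IDs as row names.
expr_id <- expr_go
rownames(expr_id) <- c("1001", "1002", "1004")

# Count how many members of each set appear among the row names.
count_overlap <- function(expr, sets) {
  vapply(sets, function(ids) sum(ids %in% rownames(expr)), integer(1))
}

overlap_go <- count_overlap(expr_go, gene_sets_toy)  # GO-term row names
overlap_id <- count_overlap(expr_id, gene_sets_toy)  # gene-ID row names
```

Running the same count on the real objects, with p3 and go.sets.hs[go.subs.hs$BP] in place of the toy data, would show whether any set has members among the row names. If every count is zero, the row names would need to be gene IDs of the same type and species as the gene sets before gage can compute anything.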