Does anybody have experience with GEO2R?
6.2 years ago
s.kheitan ▴ 40

Hi everyone,

I want to integrate expression values into my network. For this purpose, I selected dataset GSE29801 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE29801). After assigning samples to each group, I kept the P-value adjustment that is selected by default in the GEO2R options, the Benjamini & Hochberg false discovery rate method. After the calculation finished, I saved all the results. However, the supposedly complete results table was not complete!

There were 20330 IDs in this table, whereas the series matrix file of GSE29801 contains 41000 IDs. Almost half of the genes were not included in the results table. Therefore, I am not able to annotate all the nodes in my network with their corresponding P-values.

I don't know what the problem is. I would appreciate any suggestion.

GEO2R P-value • 2.9k views
6.2 years ago

The array contains multiple probes per gene/transcript.

I am not familiar with GEO2R but we can get the same results in R directly:

> library(GEOquery)

# Get the data from GEO. getGEO() returns a list of ExpressionSets
# (one per platform), so take the first element:
> gse = getGEO('GSE29801')[[1]]

# Gene names and IDs can be accessed with fData(gse):
> head(subset(fData(gse), !is.na(GENE) & GENE != ''))
   ID COL ROW         NAME      SPOT_ID CONTROL_TYPE    REFSEQ    GB_ACC      GENE  GENE_SYMBOL
12 12 266 148  A_24_P66027  A_24_P66027        FALSE NM_004900 NM_004900      9582     APOBEC3B
14 14 266 144 A_23_P212522 A_23_P212522        FALSE NM_014616 NM_014616     23200       ATP11B
15 15 266 142 A_24_P934473 A_24_P934473        FALSE            AK092846 100132006 LOC100132006
16 16 266 140   A_24_P9671   A_24_P9671        FALSE NM_001539 NM_001539      3301       DNAJA1
18 18 266 136 A_24_P801451 A_24_P801451        FALSE NM_006709 NM_006709     10919        EHMT2
19 19 266 134  A_32_P30710  A_32_P30710        FALSE NM_000978 NM_000978      9349        RPL23

# How many gene IDs are unique?
> length(fData(gse)$GENE)
[1] 41000
> length(unique(fData(gse)$GENE))
[1] 19753

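Keep in mind that the human genome contains only ~19,000 protein-coding genes, so a chip with 41,000 features cannot be measuring 41,000 distinct genes. Many genes are covered by several probes, and some probes have no gene ID at all.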
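If the goal is one P-value per network node, one possible approach is to collapse the probe-level table to a single row per gene. The sketch below is only an illustration on a made-up toy table: the probe IDs and P-values are invented, the column names `ID`, `GENE` and `adj.P.Val` are assumptions about what your exported table looks like, and keeping the probe with the smallest adjusted P-value is just one of several reasonable choices.

```r
# Toy stand-in for a probe-level result table (all values are made up).
results <- data.frame(
  ID        = c("probe_1", "probe_2", "probe_3", "probe_4", "probe_5"),
  GENE      = c("9582", "9582", "23200", "", NA),
  adj.P.Val = c(0.20, 0.01, 0.03, 0.50, 0.40),
  stringsAsFactors = FALSE
)

# Drop probes that have no gene annotation (NA or empty string).
annotated <- subset(results, !is.na(GENE) & GENE != "")

# Count how many probes map to each gene.
probes_per_gene <- table(annotated$GENE)

# Sort by adjusted P-value, then keep the first (smallest) row per gene.
ordered  <- annotated[order(annotated$adj.P.Val), ]
per_gene <- ordered[!duplicated(ordered$GENE), ]

print(probes_per_gene)
print(per_gene)
```

With a real table you would replace `results` with your own data frame. Whether the minimum P-value, the mean expression across probes, or some other summary is appropriate depends on how you intend to use the values in the network.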
