Question: Problems Converting Gene Names To Numeric Values In R
shaikhfarahdeeba20 wrote:

Hi, I am carrying out differential gene expression analysis using limma. Next I need to do gene set enrichment analysis using GOstats, but there's a problem. These are my differentially expressed genes:

 "1557994_at"       "205933_at"        "1559688_at"
 "232837_at"        "212253_x_at"      "212845_at"
 "233520_s_at"      "236931_at"        "205054_at"
 "237981_at"        "209896_s_at"      "221718_s_at"
 "226648_at"        "208195_at"        "211928_at"

But when I convert the character vector to numeric, I get a warning that NAs were introduced by coercion, and the result looks like this:

 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

How do I solve this problem? Also, when I carry out the analysis, it runs for hours and produces no output.

modified 5.7 years ago by Devon Ryan92k • written 5.7 years ago by shaikhfarahdeeba20

Are you literally just calling as.numeric(d) on a character vector d (just as an example)? That will always produce NA, since there's no obvious conversion between probe IDs like that and numbers. You can do as.numeric(c("1","2","100")), since those are just character representations of numbers, but you have probe IDs.
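To make the difference concrete, here is a minimal base R sketch. The vector d is a shortened copy of the probe IDs from the question, and ids_as_numbers is just an illustrative name.

```r
# Probe IDs are labels, not numbers, so coercion yields NA (with a warning).
d <- c("1557994_at", "205933_at", "212253_x_at")
ids_as_numbers <- suppressWarnings(as.numeric(d))
print(ids_as_numbers)   # NA NA NA

# Strings that spell out numbers convert cleanly.
digits_as_numbers <- as.numeric(c("1", "2", "100"))
print(digits_as_numbers)   # 1 2 100
```

The warning is suppressed here only to keep the output short; in an interactive session it is the same "NAs introduced by coercion" message quoted in the question.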

Is it necessary to convert them into a numeric vector?


Have you read the GOstats documentation (PDF)? Nowhere does it mention conversion of probeset IDs to a numeric value. Perhaps what you want to do is convert to Entrez Gene IDs?

modified 5.7 years ago • written 5.7 years ago by Neilfws48k

How am I supposed to move ahead? I have been trying dis for the past 10 days but couldn't get the result.

I have generated the top 500 genes and saved their row names in a vector rn as

 rn <- rownames(toptable(fit, coef=2, n=500))
 rn
 rn <- as.numeric(rn)
 dat.s <- eset.new[rn,]

I created an object dat.s to store the differentially expressed genes, but I'm getting only NAs.
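A note on the subsetting step above: R can index rows by name, so the as.numeric() call is not needed at all. The sketch below uses a small toy matrix (expr, with made-up values and sample names) as a stand-in for eset.new, whose real class is not shown in the thread; ExpressionSet objects also accept feature names in the first index.

```r
# Toy stand-in for the expression data: 4 probes x 3 samples, made-up values.
probe_ids <- c("1557994_at", "205933_at", "1559688_at", "232837_at")
expr <- matrix(seq_len(12), nrow = 4,
               dimnames = list(probe_ids, c("s1", "s2", "s3")))

# Row names of the selected genes, kept as character (no as.numeric()).
rn <- c("205933_at", "232837_at")

# Index by name; drop = FALSE keeps the result a matrix even for one row.
dat.s <- expr[rn, , drop = FALSE]
print(dat.s)
```

With the real data, this amounts to dropping the rn <- as.numeric(rn) line and keeping dat.s <- eset.new[rn, ].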

Devon Ryan92k wrote:

There are annotation packages for most arrays you'll ever use in R. You'll find that easier than trying to roll your own solution.

>library("hgu133plus2.db")
>d
 "1557994_at"  "205933_at"   "1559688_at"  "232837_at"   "212253_x_at"
 "212845_at"   "233520_s_at" "236931_at"   "205054_at"   "237981_at"
 "209896_s_at" "221718_s_at" "226648_at"   "208195_at"   "211928_at"
>select(hgu133plus2.db, d, "SYMBOL", "PROBEID")
       PROBEID  SYMBOL
1   1557994_at     TTN
2    205933_at  SETBP1
3   1559688_at   GRAPL
4    232837_at  KIF13A
5  212253_x_at     DST
6    212845_at  SAMD4A
7  233520_s_at   CMYA5
8    236931_at    <NA>
9    205054_at     NEB
10   237981_at   CMYA5
11 209896_s_at  PTPN11
12 221718_s_at  AKAP13
13   226648_at  HIF1AN
14   208195_at     TTN
15   211928_at DYNC1H1

Thanks Ryan, but this will only give me the symbols; I have to do the hypergeometric test using GOstats. Plz if u could help on this.

That's just an example. It looks like GOstats is expecting an Entrez ID, so just use ENTREZID instead of SYMBOL. You could even directly get the associated GO terms if you wanted (you'd most likely have to roll your own test function then) by using GO instead.
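A rough sketch of how that could look end to end, assuming the array is an HG-U133 Plus 2.0 and that the Bioconductor packages hgu133plus2.db and GOstats are installed. The helper names probes_to_entrez() and run_go_test() are made up for illustration, and the choice of ontology, cutoff, and gene universe (here, every gene on the array) are assumptions to adjust for the real analysis.

```r
# Sketch only: assumes an HG-U133 Plus 2.0 array and that the Bioconductor
# packages hgu133plus2.db and GOstats are installed.
# probes_to_entrez() and run_go_test() are hypothetical helper names.

probes_to_entrez <- function(probes) {
  map <- AnnotationDbi::select(hgu133plus2.db::hgu133plus2.db,
                               keys = probes,
                               columns = "ENTREZID",   # instead of "SYMBOL"
                               keytype = "PROBEID")
  # Some probes have no Entrez ID (NA), and several probes can share a gene.
  unique(map$ENTREZID[!is.na(map$ENTREZID)])
}

run_go_test <- function(probes, ontology = "BP", cutoff = 0.05) {
  library(GOstats)  # provides the GOHyperGParams class and hyperGTest()
  db <- hgu133plus2.db::hgu133plus2.db
  selected <- probes_to_entrez(probes)
  # Assumption: the universe is every gene represented on the array.
  universe <- probes_to_entrez(AnnotationDbi::keys(db, keytype = "PROBEID"))
  params <- new("GOHyperGParams",
                geneIds = selected,
                universeGeneIds = universe,
                annotation = "hgu133plus2.db",
                ontology = ontology,
                pvalueCutoff = cutoff,
                conditional = FALSE,
                testDirection = "over")
  summary(hyperGTest(params))
}

# Usage, with d holding the probe IDs from the question:
# go_table <- run_go_test(d)
```

Calling AnnotationDbi::select() with the package prefix avoids clashes if another attached package (dplyr, for example) also defines select(). If the data were filtered before fitting, the universe should probably be restricted to the probes that survived filtering rather than the whole array.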

As an aside, you have a full keyboard on your computer. There's no need to use things like "Plz" or "u" or "dis".