Question: How to convert Illumina probe expression data into gene expression data?
5.5 years ago by
Avro140 wrote:

Hi everyone,

My lab is using gene expression data generated by Illumina Human HT-12 v3 Expression BeadChips. As advertised by the company, this product has 48000+ probes for 25000 genes. I have never used expression data before and would like to cluster genes based on their expression. The data has already been normalized and corrected for batch effects.

The current file format is:

ProbeID      Sample1      Sample2   

I would like to get the following format:

GeneID       Sample1      Sample2

It seems that some genes have more probes than others. Moreover, there can be multiple transcripts for a given gene. I was wondering if someone could please give me a general idea about getting the desired format. 

Thank you for your time.


ht-12 illumina • 11k views
modified 5.5 years ago by poisonAlien2.8k • written 5.5 years ago by Avro140
5.5 years ago by
poisonAlien2.8k wrote:


It's easier to do this in R. All you need to do is map each ProbeID to the gene symbol it is annotated to.


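library("illuminaHumanv4.db")  # Bioconductor annotation package; install it if you don't have it
# For HT-12 v3 chips, the matching package is illuminaHumanv3.db (with illuminaHumanv3SYMBOL)

probeID <- c("ILMN_1690170", "ILMN_2410826", "ILMN_1675640", "ILMN_1801246",
             "ILMN_1658247", "ILMN_1740938", "ILMN_1657871", "ILMN_1769520",
             "ILMN_1778401")

# Look up the gene symbol mapped to each probe ID
data.frame(Gene = unlist(mget(x = probeID, envir = illuminaHumanv4SYMBOL)))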
ILMN_1690170 CRABP2
ILMN_2410826   OAS1
ILMN_1675640   OAS1
ILMN_1801246 IFITM1
ILMN_1658247   OAS1
ILMN_1740938   APOE
ILMN_1657871  RSAD2
ILMN_1769520 UBE2L6
ILMN_1778401  HLA-B
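
To get from the ProbeID table to the GeneID layout asked for in the question, the probe-to-symbol map can be joined onto the expression table. The following is only a minimal sketch: the expression values are made up, the data frame name `expr` is hypothetical, and the map is hard-coded from the output above so that the example runs without Bioconductor. With real data the map would come from the mget() lookup.

```r
# Toy expression table in the ProbeID / Sample1 / Sample2 layout (values are made up)
expr <- data.frame(
  ProbeID = c("ILMN_1690170", "ILMN_2410826", "ILMN_1675640", "ILMN_1740938"),
  Sample1 = c(7.2, 9.1, 8.4, 10.3),
  Sample2 = c(7.5, 8.8, 8.9, 10.1),
  stringsAsFactors = FALSE
)

# Probe-to-symbol map, hard-coded from the output above for illustration;
# in practice: unlist(mget(expr$ProbeID, envir = illuminaHumanv4SYMBOL))
probe2gene <- c(ILMN_1690170 = "CRABP2", ILMN_2410826 = "OAS1",
                ILMN_1675640 = "OAS1",   ILMN_1740938 = "APOE")

# Named-vector lookup: one gene symbol per row, NA for probes without a mapping
annotated <- data.frame(
  GeneID = unname(probe2gene[expr$ProbeID]),
  expr[, c("Sample1", "Sample2")],
  stringsAsFactors = FALSE
)
print(annotated)
```

Note that a gene with several probes (OAS1 here) still occupies several rows after this step.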
written 5.5 years ago by poisonAlien2.8k

Thank you very much for your time! I am trying it right now.  I was wondering: when someone wants to cluster genes, don’t they need one expression value for each gene? If so, how can you incorporate the expression of several probes within a gene into one value? 

Thank you!

written 5.5 years ago by Avro140

I am not certain about this, but I would not summarize the different probes of a gene into a single value, since, as you mentioned, they could target different transcripts of the same gene. It's better to continue clustering with the normalized expression values of the individual probes.
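
If a single value per gene is still required, two approaches that are sometimes used are averaging the probes of a gene or keeping only its most variable probe. The sketch below uses base R only; the values are made up, and the names `annotated`, `gene_mean` and `gene_top` are hypothetical. It is an illustration under those assumptions, not a recommendation of either approach.

```r
# Toy annotated table: OAS1 is covered by three probes (values are made up)
annotated <- data.frame(
  GeneID  = c("CRABP2", "OAS1", "OAS1", "OAS1", "APOE"),
  ProbeID = c("ILMN_1690170", "ILMN_2410826", "ILMN_1675640",
              "ILMN_1658247", "ILMN_1740938"),
  Sample1 = c(7, 9, 8, 7, 10),
  Sample2 = c(8, 9, 10, 6.5, 10),
  stringsAsFactors = FALSE
)
samples <- c("Sample1", "Sample2")

# Option A: average all probes that map to the same gene
gene_mean <- aggregate(annotated[, samples],
                       by = list(GeneID = annotated$GeneID), FUN = mean)

# Option B: per gene, keep the probe with the largest variance across samples
probe_var <- apply(annotated[, samples], 1, var)
keep <- unlist(lapply(split(seq_len(nrow(annotated)), annotated$GeneID),
                      function(idx) idx[which.max(probe_var[idx])]))
gene_top <- annotated[keep, ]

print(gene_mean)
print(gene_top)
```

Either way, probes that target different transcripts are merged or discarded, which is the information loss described above.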

written 5.5 years ago by poisonAlien2.8k