Question: How to combine expression values of multiple probes for one gene?
ayanava18 wrote (3.7 years ago):

I am fairly new to R/Bioconductor and microarray analysis.

I have loaded a GEO series matrix file (GSE2990) from the GEO database into R. This dataset contains expression values for 22283 probes, and I would like to obtain gene-level expression values from it. Since in many cases there are multiple probes for an individual gene, I would like to know whether there is a package or R code that can combine the expression values of multiple probes for the same gene. Also, does oneChannelGUI have this feature? [Please note that I wish to work with a processed GEO dataset.]


What array platform is this? Typically, if it's an Illumina BeadArray, the different probes that represent the same gene target different parts of the gene.

Reply written 3.7 years ago by andrew.j.skelton73
Sean Davis (National Institutes of Health, Bethesda, MD) wrote (3.7 years ago):

Take a look at the findLargest() function in the Bioconductor genefilter package.


Hi Sean,

Can you please help me with the exact code I need to try?

I loaded the genefilter library and tried the following, but I am getting these warnings and errors:

> findLargest()
Warning message:
In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
  cannot open compressed file 'C:/Users/Ayanabha/Documents/R/win-library/3.2/survival/DESCRIPTION', probable reason 'No such file or directory'
Error in mget(gN, getAnnMap(map, data)) : 
  error in evaluating the argument 'x' in selecting a method for function 'mget': Error: argument "gN" is missing, with no default
> findLargest(gN,testStat,data="hgu133plus2")
Warning message:
In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
  cannot open compressed file 'C:/Users/Ayanabha/Documents/R/win-library/3.2/survival/DESCRIPTION', probable reason 'No such file or directory'
Error in mget(gN, getAnnMap(map, data)) : 
  error in evaluating the argument 'x' in selecting a method for function 'mget': Error: object 'gN' not found

Reply written 3.7 years ago by ayanava18
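
Both errors in the transcript above come from the arguments rather than from genefilter itself: findLargest() expects gN (a vector of probe IDs) and testStat (one statistic per probe, same length as gN), and neither object was defined. Note also that data = "hgu133plus2" is the function's default; 22283 probes is consistent with the HG-U133A array, so "hgu133a" is probably the annotation to use here. The idea behind the function, keeping for each gene the single probe with the largest statistic, can be sketched in base R. Everything in this sketch (the toy matrix, the probe names, the probe2gene mapping) is made up for illustration and is not taken from GSE2990.

```r
# Toy expression matrix: 5 probes x 3 samples (values are made up).
e <- matrix(c(5.1, 5.3, 5.0,
              7.2, 7.0, 7.4,
              6.0, 6.1, 5.9,
              8.5, 8.4, 8.6,
              4.0, 4.2, 4.1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("probe_", 1:5), paste0("sample_", 1:3)))

# Hypothetical probe -> Entrez ID mapping; in practice this would come from
# the chip annotation package. NA marks a probe without a gene assignment.
probe2gene <- c(probe_1 = "101", probe_2 = "101", probe_3 = "202",
                probe_4 = "202", probe_5 = NA)

# Per-probe statistic used to rank probes within a gene. The mean across
# samples is one possible choice; variance or IQR are others.
testStat <- rowMeans(e)

# For each gene, keep the name of the probe with the largest statistic.
# split() drops probes whose gene ID is NA.
best <- sapply(split(testStat, probe2gene[names(testStat)]),
               function(x) names(which.max(x)))

# One row per gene, taken from its selected probe.
geneExprLargest <- e[best, , drop = FALSE]
rownames(geneExprLargest) <- names(best)
```

With the real data, and assuming hgu133a.db is installed and e holds the probe-level expression matrix, the corresponding call would presumably look like findLargest(rownames(e), rowMeans(e), data = "hgu133a"), which returns the selected probe IDs named by Entrez ID.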
poisonAlien wrote (3.7 years ago):

I see that it's an Affymetrix chip. Here is a snippet that calculates the mean expression of all probesets mapping to the same gene.

 

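# Download and install the required packages.
# (BiocManager replaces the retired biocLite.R installer script.)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("affy", "gcrma", "hgu133a.db"))

library(affy)        # read.affybatch(), exprs()
library(gcrma)       # gcrma()
library(hgu133a.db)  # probeset annotation; provides select() and keys()

# Assuming you have CEL files in the working directory.
# read.affybatch() does not expand wildcards, so list the files explicitly.
aBatch = read.affybatch(filenames = list.files(pattern = "\\.CEL$", ignore.case = TRUE))


# Normalizing with gcrma
gset = gcrma(aBatch)


# Fetch Entrez IDs for all probesets (columns: PROBEID, ENTREZID).
tab = select(hgu133a.db, keys = keys(hgu133a.db), columns = c("ENTREZID"))

e = exprs(gset)
# Merge probesets to genes (by mean expression).
# Result: one row per Entrez ID, one column per sample.
geneExpr = t(sapply(split(tab$PROBEID, tab$ENTREZID), function(ids){
                    colMeans(e[ids,,drop=FALSE])
                }))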

P.S: It's not recommended to do this for many reasons.
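
Since the question concerns an already processed series matrix rather than CEL files, the averaging step can also be applied directly to that matrix, skipping the CEL import and normalization. The following is a minimal base R sketch of that step only. It uses rowsum() divided by the probe count instead of the sapply() call above, and the toy matrix, probe names, and mapping table are all hypothetical stand-ins for the real expression matrix and the output of select().

```r
# Toy processed expression matrix (probes x samples); values are made up.
e <- matrix(c(2, 4, 6,
              4, 6, 8,
              1, 1, 1,
              9, 9, 9),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2", "s3")))

# Hypothetical probe -> gene table, shaped like the two-column result of
# select() (PROBEID, ENTREZID). p4 has no gene and is dropped below.
tab <- data.frame(PROBEID  = c("p1", "p2", "p3", "p4"),
                  ENTREZID = c("11", "11", "22", NA),
                  stringsAsFactors = FALSE)
tab <- tab[!is.na(tab$ENTREZID) & tab$PROBEID %in% rownames(e), ]

# Sum the probes of each gene with rowsum(), then divide each gene's row
# by its number of probes to obtain the mean.
sums     <- rowsum(e[tab$PROBEID, , drop = FALSE], group = tab$ENTREZID)
nProbes  <- as.vector(table(tab$ENTREZID)[rownames(sums)])
geneMean <- sums / nProbes
```

Here geneMean has one row per gene ID and one column per sample. With GEOquery, the probe-level matrix for a series would typically be obtained with something like exprs(getGEO("GSE2990")[[1]]), assuming the first element of the returned list is the platform of interest.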


Can you explain why it's not recommended to do this?

Reply written 20 months ago by DataFanatic130

Because multiple probesets from a single gene could represent multiple isoforms, and by merging them you lose this information.

Reply written 20 months ago by poisonAlien