Question: Extract Expression Data For Candidate Gene List In Microarray
7
gravatar for Kerem
9.7 years ago by
Kerem80
Kerem80 wrote:

Hi all,

I'm struggling my way through beginner's Bioconductor (and R in general) and have a specific problem I hope somebody can help with. I can load CEL files, do RMA processing on them and check QC etc, eventually I end up with a data frame with these data.

I have a few lists of about 200 candidate genes each that are of specific interest to us and I would like to run a script to print the expression data for just those genes to a tab-delim txt file.

Can anybody point me in the right direction to do this. I do have the Affymetrix Probe Set IDs for these, so can use either that or gene symbol.

Apologies if this is a very basic question, I'm very new to this.

Thanks in advance, Kerem.

R bioconductor microarray • 9.8k views
ADD COMMENTlink modified 7.2 years ago by Biostar ♦♦ 20 • written 9.7 years ago by Kerem80

In addition to the answers you might find this document useful: http://cran.r-project.org/doc/manuals/R-data.pdf

ADD REPLYlink modified 13 months ago by RamRS30k • written 9.7 years ago by Michael Dondrup47k
7
gravatar for Christof Winter
9.7 years ago by
Lund, Sweden
Christof Winter990 wrote:

After RMA processing, you should end up with an ExpressionSet object. Let's assume it's called eset, and your probe set IDs are in a character vector called yourProbeSetIDs. Then:

exprs(eset)    # gives a dataframe with expression levels, probe set IDs are the row names
exprs(eset)[yourProbeSetIDs, ]  # gives you of subset dataframe with your probe sets of interest

To write a tab-delimited file:

write.table(yourDataFrame, file="filename.txt", sep="\t", quote=F, row.names=T)

Christof

ADD COMMENTlink modified 13 months ago by RamRS30k • written 9.7 years ago by Christof Winter990
1

It does work. Have a look at indexing with character vectors: http://cran.r-project.org/doc/manuals/R-intro.html#Index-vectors

ADD REPLYlink modified 13 months ago by RamRS30k • written 9.7 years ago by Christof Winter990
1
library(oligo)
geneCELs <- list.celfiles(path, full.names=TRUE)
affyGeneFS <- read.celfiles(geneCELs)
genePS <- rma(affyGeneFS, target="probeset")
featureData(genePS) <- getNetAffx(genePS, "probeset")

so I can do exprs(genePS) and get the probesets, however I want the original probes, before they are "summarised" to the probe-set level.

So something like:

exprs(affyGeneFS)
ADD REPLYlink modified 13 months ago by RamRS30k • written 8.9 years ago by Jimbo120

AFAIK, exprs(eset)[yourProbeSetIDs, ] does not work. It either needs to be probeIndices or a boolean array of length(exprs) between the square brackets.

ADD REPLYlink modified 13 months ago by RamRS30k • written 9.7 years ago by Michael Schubert7.0k

Dear both, Firstly, massive thanks for your help, really appreciated. Eventually got exprs(eset)[yourProbeSetIDs, ] to work. My fault - needed to make the char vector correctly and include the trailing ", " or else it fails. Very useful for me.

However, I've tested with 2 probeIDs and the output file has a couple of problems. First, the header row is missing column title for probeID. Maybe hints I've imported the data wrongly? Second, I want to include gene symbols for the rows. I've got and installed the appropriate .db for my mouse gene st chips. Any pointers?

Thanks again, K.

ADD REPLYlink written 9.7 years ago by Kerem80

Didnt know that point 4 existed, thanks :) @Kerem: you may either reset the headers with colnames()= or take the subset from you dataframe instead of the expression data which would (I assume) preserve the headers. For the 2nd, see the annaffy documentation.

ADD REPLYlink written 9.7 years ago by Michael Schubert7.0k

For mouse, also have a look at the org.Mm.eg.db package.

ADD REPLYlink written 9.7 years ago by Christof Winter990

Anyone know how to get the probe set ids for the expression matrix row names BEFORE summarisation (so like 14029281_1, 14029281_2 etc.). When I load the cel files into R, the original unnormalised expression matrix has row names from 1:number of probes.

ADD REPLYlink written 8.9 years ago by Jimbo120

Can you post your code how you load and process the CEL files?

ADD REPLYlink written 8.9 years ago by Christof Winter990

Sorry, that's rather messy, shall I repost this as a new question?

ADD REPLYlink written 8.9 years ago by Jimbo120

Reposted here: Getting Probe Ids For Expression Matrix Before Summarisation (In R If Possible)

ADD REPLYlink modified 13 months ago by RamRS30k • written 8.9 years ago by Jimbo120
5
gravatar for Michael Schubert
9.7 years ago by
Cambridge, UK
Michael Schubert7.0k wrote:

Here are some pointers:

  • I assume you know how to get the expression values from your dataframe by using exprs(dataframe)
  • From the expression values, you can get a vector of gene identifiers for the whole array by using rownames() or colnames() depending on the orientation of your dataframe

Then you need to check if each ID matches one of your list of interest and finally extract the appropriate dataset. When you are new to programming in general you may want to do this with loops, otherwise there are shortcuts. You can try to following commands and make them fit for your situation:

all = c(1:5)
wanted = c(1,4,5)
x = all %in% wanted
all[x]

Hope that helps :)

ADD COMMENTlink modified 13 months ago by RamRS30k • written 9.7 years ago by Michael Schubert7.0k
1

Thanks Michael, enjoying the programming and this will come in useful. After I read your reply I found https://wiki.brandeis.edu/twiki/bin/view/Bio/BioConductor which has a subsection about 2/3 of the way down the page titled "Filtering Genes" which seems to use similar principles to do this. Also, I've asked a different question in reply to the answer below about including gene symbols in output. Wonder if you have any pointers on this for me? Thanks, Kerem.

ADD REPLYlink written 9.7 years ago by Kerem80
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2258 users visited in the last hour