[Affymetrix microarray] How to group together the probes related to the same gene?
3.6 years ago
cg1440 ▴ 60

I'm trying to reproduce the RMA preprocessing algorithm. In the summarization step, normalized expression values of individual probes are combined together to get the overall expression values of the genes they represent.

My first step would be to take from the matrix containing individual probes (that results from the normalization step) all submatrices, where each submatrix contains a probe set relating to one common gene. Is there a specific package that I can use in order to produce these submatrices?

For reference, I'm working through the workflow "An end to end workflow for differential gene expression using Affymetrix microarrays". Essentially, I'm trying to reproduce everything that is done by this line of code (Step 9): palmieri_eset_norm <- oligo::rma(raw_data, target = "core")

R Microarray Affymetrix • 1.4k views
3.6 years ago

Hi,

One key part of summarisation via RMA is whether or not to summarise at the probe-set or gene level (or, if an exon array, the probe-set or exon level). By using target = 'core', it should summarise expression to the gene level for most genes, but you may still find some genes that appear twice (or more) in the output. Take a look at my quick answer here:

The 'quirks' that exist on the Affy array designs are many... you'd have to spend some time exploring why there may still be duplicate genes in your output even after normalisation and using target = 'core'.

If you still insist on summarisation across genes, then you could take a look at limma::avereps(), but I am not sure that it is needed here.
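To make "summarisation across genes" concrete, here is a minimal base-R sketch of averaging rows that share a gene identifier, which is the general idea behind limma::avereps(). The matrix expr, the vector gene_ids, and the helper average_duplicates() are made up for illustration; this is a stand-in for the concept, not limma's own code, and row ordering may differ from what limma returns.

```r
# Toy summarized matrix in which "geneA" appears on two rows (made-up values).
expr <- matrix(c(7.0, 7.4, 7.2,
                 7.2, 7.6, 7.0,
                 9.1, 9.0, 9.3),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, paste0("array", 1:3)))
gene_ids <- c("geneA", "geneA", "geneB")  # hypothetical row-to-gene annotation

# Hypothetical helper: average all rows that carry the same identifier.
average_duplicates <- function(x, ids) {
  sums <- rowsum(x, group = ids)                    # per-gene column sums
  counts <- as.vector(table(ids)[rownames(sums)])   # rows per gene, same order
  sums / counts                                     # per-gene column means
}

collapsed <- average_duplicates(expr, gene_ids)
print(collapsed)
```

After this step each gene identifier appears on exactly one row.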

My first step would be to take from the matrix containing individual probes (that results from the normalization step) all submatrices, where each submatrix contains a probe set relating to one common gene. Is there a specific package that I can use in order to produce these submatrices?

You should not have to do this, but I admit to not completely understanding what you mean.

Kevin


Thank you for your reply.

but I admit to not completely understanding what you mean.

What I meant is that, from the big matrix containing the normalized expression values of all probes, I want to extract k submatrices (k = number of genes represented on the chip), where each submatrix contains all probes relating to the same gene. By "probeset" I meant the set of probes related to the same gene. At this point I want to summarize at the gene level, not the exon level.

So basically I have the following steps in mind for the summarization stage:

1. Create the k submatrices
2. Apply the Median Polish method on each submatrix to estimate the corresponding gene expression values in each microarray
3. Combine the gene expression values in a bigger matrix, which would be the final output
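The three steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: norm_probes is a made-up probes x arrays matrix, probe_to_gene is a hypothetical annotation vector giving the gene for each probe row (in practice it would come from the chip annotation), and summarize_median_polish() is a name invented here. It uses stats::medpolish() and is not claimed to match oligo::rma() numerically.

```r
# Toy input: 6 probes x 3 arrays of normalized intensities (made-up numbers).
set.seed(1)
norm_probes <- matrix(2^runif(18, min = 6, max = 12), nrow = 6,
                      dimnames = list(paste0("probe", 1:6),
                                      paste0("array", 1:3)))
# Hypothetical probe-to-gene map, one entry per row of norm_probes.
probe_to_gene <- c("geneA", "geneA", "geneA", "geneB", "geneB", "geneB")

summarize_median_polish <- function(probe_matrix, group_ids) {
  log_matrix <- log2(probe_matrix)
  # Step 1: one vector of row indices per gene (the "k submatrices").
  rows_by_gene <- split(seq_len(nrow(log_matrix)), group_ids)
  # Step 2: median polish each submatrix (rows = probes, columns = arrays);
  # the per-array expression estimate is overall effect + column effect.
  per_gene <- lapply(rows_by_gene, function(rows) {
    fit <- stats::medpolish(log_matrix[rows, , drop = FALSE],
                            trace.iter = FALSE)
    fit$overall + fit$col
  })
  # Step 3: stack the per-gene vectors into a genes x arrays matrix.
  gene_matrix <- do.call(rbind, per_gene)
  colnames(gene_matrix) <- colnames(probe_matrix)
  gene_matrix
}

gene_expr <- summarize_median_polish(norm_probes, probe_to_gene)
print(gene_expr)
```

Splitting row indices rather than copying submatrices keeps memory use low when k is in the tens of thousands.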

I apologize if I'm mixing up the terms or if I'm not being very clear; I still don't have a deep understanding of the subject.


I see, but why not use the standard RMA approach and then summarise to gene (if needed) after RMA? Or just follow the tutorial to which you linked?

In Part 8 ( https://www.bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html#8_relative_log_expression_data_quality_analysis ), they perform background correction and log2 transformation, but not quantile normalisation, it seems, after which they plot the medians in an attempt (I suppose) to show how median summarisation functions.
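For orientation, relative log expression is usually computed by subtracting each row's median (across arrays) from that row's log2 values and then looking at the per-array distribution of the deviations. The sketch below assumes that usual definition and uses a made-up matrix log_expr; it is not copied from the workflow.

```r
# Toy log2 expression matrix: 5 probesets x 4 arrays (made-up values).
set.seed(42)
log_expr <- matrix(rnorm(20, mean = 8), nrow = 5,
                   dimnames = list(paste0("ps", 1:5), paste0("array", 1:4)))

# Median of each row across arrays.
row_medians <- apply(log_expr, 1, median)

# Relative log expression: deviation of every value from its row median.
rle_values <- sweep(log_expr, 1, row_medians)

# One box per array; boxes far from zero or unusually wide flag suspect arrays.
boxplot(as.data.frame(rle_values), ylab = "Deviation from row median (log2)")
```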

The RMA approach actually already performs median polish for the summarisation step; although, again, on 'Exon' arrays, this summarisation is to the Exon level, while, on other 'Gene' arrays, the summarisation is to gene level.


why not use the standard RMA approach and then summarise to gene (if needed) after RMA?; or just follow the tutorial to which you linked?

I guess I did not make my point clear; I apologize for that. We have been assigned to write the code of the RMA function ourselves from scratch, i.e. to reproduce the function, and my part is the summarization step.

The RMA approach actually already performs median polish for the summarisation step

Yes, I know that, but again, I have to write my own code that performs this step. That's where I'm coming from.

From the info I collected about this step, I guess it proceeds as follows:

1. log2 transformation of the quantile-normalized expression values
2. Median polishing of probesets (again, sorry if I'm mixing up the terms)
3. Returning an expression matrix similar to the initial one of probe intensities, but containing gene or exon expression values rather than probe values

BUT I couldn't find any resource that explains what actually happens internally in the code.
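As a rough picture of what median polishing does internally for a single probeset, here is a hand-written sketch in base R. It alternately sweeps out row (probe) medians and column (array) medians from the log2 matrix until the residuals stop changing, and reports overall effect + column effect as the expression value per array. The function name median_polish_one(), the toy data, and the stopping rule (modelled on the defaults of stats::medpolish()) are assumptions made for illustration; the RMA implementation used by oligo may differ in details.

```r
# Minimal median polish on one probeset: rows are probes, columns are arrays.
median_polish_one <- function(log_probes, max_iter = 10, eps = 0.01) {
  resid <- log_probes
  overall <- 0
  row_eff <- numeric(nrow(resid))
  col_eff <- numeric(ncol(resid))
  old_sum <- 0
  for (iter in seq_len(max_iter)) {
    # Sweep out row (probe) medians.
    row_delta <- apply(resid, 1, median)
    resid <- resid - row_delta
    row_eff <- row_eff + row_delta
    delta <- median(col_eff)
    col_eff <- col_eff - delta
    overall <- overall + delta
    # Sweep out column (array) medians.
    col_delta <- apply(resid, 2, median)
    resid <- sweep(resid, 2, col_delta)
    col_eff <- col_eff + col_delta
    delta <- median(row_eff)
    row_eff <- row_eff - delta
    overall <- overall + delta
    # Stop once the sum of absolute residuals no longer changes much.
    new_sum <- sum(abs(resid))
    if (new_sum == 0 || abs(new_sum - old_sum) < eps * new_sum) break
    old_sum <- new_sum
  }
  list(expression = overall + col_eff,  # one value per array
       probe_effects = row_eff,
       residuals = resid)
}

# Toy probeset: 4 probes x 3 arrays of normalized intensities (made-up).
probeset <- log2(matrix(c(100, 220, 410,
                          120, 260, 400,
                           90, 180, 520,
                          150, 240, 450), nrow = 4, byrow = TRUE))
fit <- median_polish_one(probeset)
ref <- stats::medpolish(probeset, trace.iter = FALSE)
print(fit$expression)
print(ref$overall + ref$col)
```

Comparing against stats::medpolish() on the same matrix is a cheap way to check a from-scratch implementation.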

I guess the log2 transformation is easy. However, I'm not sure how to proceed with step 2 in how to produce the submatrices of probesets on which I will then apply the median polish. Is there a function in a certain package that can produce these submatrices?

Also, am I even getting it right? Sorry if I'm being naive but I'm only a couple of weeks into this topic and it's still not very clear at this point.
