Question: (Closed) Array Normalization on focussed array in Limma using R
4 months ago, reubenmcgregor88 wrote:

I have been analysing protein array data with hundreds to thousands of proteins using Limma in R.

For normalisation I have been using the following:

y <- normalizeBetweenArrays(log2(exprs), method="quantile")

followed by box plots and density plots for QC, and then linear model fitting for differential expression analysis in Limma.
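For readers who want to try the workflow described above end to end, here is a minimal sketch. It assumes the Bioconductor package limma is installed. The simulated `raw_intensities` matrix, the group labels, and the two-group design are hypothetical stand-ins, not the data or design from this post.

```r
library(limma)

set.seed(1)
n_proteins <- 1000
group <- factor(rep(c("control", "patient"), each = 4))

# Hypothetical raw fluorescence intensities (proteins in rows, arrays in columns)
raw_intensities <- matrix(
  2^rnorm(n_proteins * length(group), mean = 9, sd = 1.5),
  nrow = n_proteins,
  dimnames = list(paste0("protein", seq_len(n_proteins)),
                  paste0(group, seq_along(group)))
)

# Quantile normalisation of the log2 intensities, as in the call above
y <- normalizeBetweenArrays(log2(raw_intensities), method = "quantile")

# QC plots: per-array box plots and density curves
boxplot(y, las = 2, ylab = "log2 intensity")
plotDensities(y, legend = FALSE)

# Linear model for patient vs control, with empirical Bayes moderation
design <- model.matrix(~ group)
fit <- eBayes(lmFit(y, design))
top <- topTable(fit, coef = "grouppatient", number = 10)
print(top)
```

After quantile normalisation the box plots and density curves of all arrays coincide, which is the intended behaviour on a large array of mostly unchanged proteins.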

However, we then chose the most promising 35 proteins and had a "focussed" array synthesised. These were the 35 proteins that were highest in patients vs controls. When we got the data back, I had a think about the analysis: normalising between arrays may be fine when there are many random proteins to bring between-array intensities to similar levels.

However, it seems to me (I am relatively new to array analysis, so I may be wrong) that if we have specifically chosen proteins based on low expression in some samples and high expression in others, this normalisation would not be valid, as the assumption behind the normalisation is that most genes (here, proteins) are expected to vary little between samples. Is this correct?
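The concern can be illustrated with a small base R simulation. This is only a sketch on made-up numbers, not the data from this post. `quantile_normalize` is a simplified, hypothetical helper that assumes no ties or missing values; it is not limma's implementation. Every simulated protein is 2 log2 units higher in patients.

```r
# Simplified quantile normalisation (assumes no ties and no missing values)
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)                       # sort each array (column)
  target <- rowMeans(sorted)                        # shared reference distribution
  ranks  <- apply(x, 2, rank, ties.method = "first")
  out    <- apply(ranks, 2, function(r) target[r])  # map ranks to reference values
  dimnames(out) <- dimnames(x)
  out
}

set.seed(42)
n_proteins  <- 35
n_per_group <- 10
group <- rep(c("control", "patient"), each = n_per_group)

# Hypothetical log2 intensities: protein-specific baseline plus noise
baseline <- rnorm(n_proteins, mean = 10, sd = 1)
y_raw <- matrix(rnorm(n_proteins * 2 * n_per_group, mean = baseline, sd = 0.3),
                nrow = n_proteins)

# Every protein on the focussed array is higher in patients
y_raw[, group == "patient"] <- y_raw[, group == "patient"] + 2
colnames(y_raw) <- paste0(group, "_", seq_along(group))

y_qn <- quantile_normalize(y_raw)

# Average patient-minus-control difference across all proteins
mean_shift <- function(y) {
  mean(y[, group == "patient"]) - mean(y[, group == "control"])
}
shift_raw <- mean_shift(y_raw)  # close to the simulated shift of 2
shift_qn  <- mean_shift(y_qn)   # essentially zero after quantile normalisation
print(c(raw = shift_raw, quantile = shift_qn))
```

Because quantile normalisation gives every array the same set of values, the overall patient-control shift is removed entirely in this toy setting, even though it was the real signal.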

If so what kind of normalisation is more appropriate for this type of analysis?

Any guidance much appreciated.

EDIT: I have been toying with the idea of using:

y <- normalizeBetweenArrays(log2(exprs), method="cyclicloess")

Which may be more appropriate?

EDIT2: The array was a ProtoArray, and the analysis has actually already been done by someone from the provider of the service. However, I managed to repeat their analysis, getting the same values in Limma with the quantile normalisation mentioned above. The issue is that I am questioning whether they simply ran a standard analysis pipeline for large arrays without putting much thought into the different design of this array.

Note: all values are Log2 transformations of the fluorescence data

Histograms and box plots of data pre-normalisation: [images not shown]

Histograms and box plots of data post quantile normalisation: [images not shown]

Histograms and box plots of data post cyclic loess normalisation: [images not shown]

Tags: limma, R, array

You should at least also provide the array manufacturer(s) and array version(s), and any guidance that the vendor gave with regard to normalisation. For example, they usually suggest some normalisation method and/or some program to use. Without any histograms or information on the array(s) used, we also have no sense of the distribution of your raw signal data. A histogram may help.

written 4 months ago by Kevin Blighe

Thanks Kevin, point taken, I have updated the post. As with many posts on here, I have come to the project late and do not have all of the information about what analysis was done etc. We have asked, but they want to charge us extra for more "bioinformatics services". As mentioned above, I have already managed to reproduce the analysis they sent us (i.e. same fold changes and p-values). However, I am questioning that analysis, as we are looking to go to peer review soon and I would rather be sure than take for granted an analysis that I did not do myself.

written 4 months ago by reubenmcgregor88

I see. Yes, I would not automatically assume quantile normalisation to be suited to protein arrays. That is also why I wanted to know the array type: protein measurement platforms differ a lot and require different types of processing (whereas, for cDNA microarrays, the methods are now quite standard).

It seems that robust linear normalisation (RLM) is the most used for ProtoArray, and there is a Bioconductor package:

You may want to quickly re-process with "rlm".

modified 4 months ago • written 4 months ago by Kevin Blighe

Thanks, I will try that. Although, from reading the vignette, it seems this will not solve the problem of having only a few proteins, which we chose specifically for their high variance between cases and controls?

written 4 months ago by reubenmcgregor88

Hello reubenmcgregor88!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/116668/

modified 4 months ago • written 4 months ago by Kevin Blighe

Yes it has, sorry, I did not realise it was frowned upon. I will delete the post here, as it was answered there.

written 4 months ago by reubenmcgregor88

No worries. No, please leave it open. You are lucky that Gordon Smyth responded, though... he is the best person in the world to answer questions on arrays, and his answer far supersedes mine.

written 4 months ago by Kevin Blighe

Yes, happy with Gordon's response. I guess it would be good for others to see his answer too.

written 4 months ago by reubenmcgregor88

Exactly. Now someone finding this post on Biostars will migrate to the answer on Bioconductor. Best wishes with your project.

written 4 months ago by Kevin Blighe

Hello reubenmcgregor88!

Cross posted: https://support.bioconductor.org/p/116668/

For this reason we have closed your question.

modified 4 months ago • written 4 months ago by Kevin Blighe