Question: normalization after merging datasets
9 months ago by
parinv0
pune, India
parinv0 wrote:

I merged three datasets, two from the same platform and one from a different platform. After merging, I performed normalization and tried to visualize the result using a boxplot, but I am not getting a proper boxplot. I used the following code:

# Normalization of the merged data
# 'sum' is the merged data, converted to a SummarizedExperiment
mergenorm <- normalize(sum, norm.method = "quantile", data.type = "ma")
# Extract the expression matrix for the boxplot
JM3 <- assay(mergenorm)
# Boxplot of the normalized data; JM3 is already a matrix, so exprs() is not needed
boxplot(JM3)

boxplot: [image: Rplot, the boxplot obtained from the code above]

Can anyone suggest what went wrong, or whether I should use a different plot?

R • 215 views
ADD COMMENTlink modified 8 months ago by svlachavas680 • written 9 months ago by parinv0

First off, normalizing across different microarray platforms is generally futile: the discrepancies between platforms are simply too large for the values to be comparable.

Second, there is not enough information here for us to help you. I'm assuming this is RNA microarray data, but you should explicitly state that. What are you using to process these? We need a minimal, complete example of how you're dealing with this. What package is the boxplot function from? What is your end goal?

ADD REPLYlink written 9 months ago by jared.andrews077.9k
8 months ago by
Kevin Blighe67k
Republic of Ireland
Kevin Blighe67k wrote:

I agree with Jared's comment above. I think that boxplot() may just be the standard function that comes with base R, though.

pv, although the issue that you want to address is related to the boxplot, it is important to understand your general methodology here. One cannot just take 3 datasets from GEO and then 'hack' them together without justification. Even if the datasets are related to the same microarray type and version, batch effects will still exist.

For what it is worth, the boxplot is simply too crowded, and it looks like there is an extreme number of outliers, which is what one would expect when normalising disparate datasets together.
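
As a rough sketch of how a crowded boxplot could be made easier to read, the base R example below hides the outlier points and the per-sample axis labels and colours each box by dataset. The matrix `expr`, the batch labels and the simulated shifts are all made up for illustration; they are not the poster's data.

```r
# Simulated stand-in for a merged expression matrix (probes x samples).
# Batch names, sample counts and shifts are assumptions for illustration only.
set.seed(1)
batch <- factor(rep(c("GSE_A", "GSE_B", "GSE_C"), times = c(20, 20, 10)))
shift <- c(GSE_A = 0, GSE_B = 0.5, GSE_C = 2)[as.character(batch)]
expr <- sapply(shift, function(s) rnorm(1000, mean = 7 + s, sd = 1.5))
colnames(expr) <- paste0("S", seq_len(ncol(expr)))

# One box per sample, coloured by dataset; outlier points and x labels are hidden.
bp <- boxplot(expr, outline = FALSE, col = as.integer(batch) + 1,
              xaxt = "n", xlab = "Samples (coloured by dataset)",
              ylab = "log2 expression")
legend("topleft", legend = levels(batch),
       fill = seq_along(levels(batch)) + 1)
```

Colouring by dataset makes a residual shift between batches visible at a glance, which is usually the first thing worth checking after a merge.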

For what it is worth, I have given answers in this area previously:

Kevin

ADD COMMENTlink written 8 months ago by Kevin Blighe67k

Thank you. Here I merged 3 microarray datasets: two are from Affymetrix HG-U133_Plus_2 and one is from Affymetrix HG-U133A. I normalized each dataset separately using the affy package and removed batch effects using the limma package. I then created a gene list and merged all three datasets. After merging, I again performed normalization and batch-effect removal. To visualize the normalized data, I used the standard boxplot() function.

I have a list of questions if you can answer them:

  • Is it important to perform normalization after merging data, or can I skip that step and only remove batch effects?
  • Should I convert the data to Z-scores?
  • Can you please elaborate on the Z-score from your previous answer? Why is that important? What difference does it make?

Parinv.

ADD REPLYlink modified 8 months ago • written 8 months ago by parinv0

I am not sure that it's a good idea to apply a batch correction twice... Have you tracked the values of some of the probes to see how they have changed after the 2 batch corrections...? Technically, one should not even have to directly modify the data for batch.

Z-scores are intuitive to apply to data that is already normalised. The Z-transformation converts values to 'standard deviations from the mean'. These are sometimes called 'standard scores' because they are standardised across data-types.
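
To make the Z-transformation concrete, here is a minimal base R sketch on a simulated matrix (the probe and sample names are invented). Each value becomes (value - probe mean) / probe standard deviation, so every probe ends up with mean 0 and standard deviation 1.

```r
# Simulated, already-normalised expression matrix (probes x samples).
set.seed(42)
expr <- matrix(rnorm(5 * 8, mean = 8, sd = 2), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("S", 1:8)))

# scale() standardises columns, so transpose to standardise each probe (row),
# then transpose back to the original probes x samples layout.
z <- t(scale(t(expr)))

# After the transformation each probe has mean ~0 and standard deviation 1.
round(rowMeans(z), 10)
apply(z, 1, sd)
```

Whether to scale by probe (as here) or by sample depends on what is being compared; scaling by probe is the usual choice for heatmaps and clustering.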

With your data, I would process/normalise each dataset separately, filter them for common probes (across the 3), and then merge them together, using batch as a covariate in the limma design.
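
A minimal sketch of that workflow, on simulated data, could look like the following. The three matrices, the helper `make_set()`, the probe IDs and the `condition` factor are all hypothetical stand-ins; the limma calls run only if the package is installed.

```r
# Hypothetical helper: a simulated, already-normalised dataset (probes x samples).
set.seed(7)
make_set <- function(probes, n, prefix) {
  matrix(rnorm(length(probes) * n, mean = 8), nrow = length(probes),
         dimnames = list(probes, paste0(prefix, seq_len(n))))
}
ds1 <- make_set(paste0("p", 1:100), 6, "A")
ds2 <- make_set(paste0("p", 1:100), 6, "B")
ds3 <- make_set(paste0("p", 21:80), 4, "C")  # older array with fewer probes

# Keep only probes present in all three datasets, then merge by column.
common <- Reduce(intersect, list(rownames(ds1), rownames(ds2), rownames(ds3)))
merged <- cbind(ds1[common, ], ds2[common, ], ds3[common, ])

# Batch enters the model as a covariate; the expression values are not altered.
batch <- factor(rep(c("ds1", "ds2", "ds3"), times = c(6, 6, 4)))
condition <- factor(rep(c("control", "case"), times = 8),
                    levels = c("control", "case"))
design <- model.matrix(~ condition + batch)

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::eBayes(limma::lmFit(merged, design))
  print(limma::topTable(fit, coef = "conditioncase"))
}
```

This only works if each batch contains samples from more than one condition; if batch and condition are fully confounded, no model can separate them.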

ADD REPLYlink written 8 months ago by Kevin Blighe67k
8 months ago by
svlachavas680
Greece
svlachavas680 wrote:

First, as Jared mentioned, you should provide detailed information about your experimental design and biological question rather than just posting some code chunks, as others will then be more willing and able to help you.

In conjunction with the above answers, you might also want to check this:

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2641-8

Additionally, if the datasets share the same phenotype and have similar experimental conditions, you can perform more "elegant" DE tests such as roast and mroast (both in limma), testing whether the DEG list from one experiment shows the "same behaviour" in your other dataset, which minimizes the need to merge expression data.
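
As a hedged illustration of the roast idea, the sketch below tests whether a gene list taken from one experiment behaves as a set in a second, simulated dataset. The matrix `expr2`, the list `deg_from_first` and the group labels are invented for the example, and the limma call runs only if the package is installed.

```r
# Simulated second dataset (genes x samples); all names here are hypothetical.
set.seed(11)
expr2 <- matrix(rnorm(200 * 8), nrow = 200,
                dimnames = list(paste0("g", 1:200), paste0("S", 1:8)))
group <- factor(rep(c("control", "case"), each = 4),
                levels = c("control", "case"))
design <- model.matrix(~ group)

# Pretend these genes were the DEGs found in the first experiment,
# and give them a shift in the case samples of the second dataset.
deg_from_first <- paste0("g", 1:20)
expr2[deg_from_first, group == "case"] <-
  expr2[deg_from_first, group == "case"] + 1

# Row indices of the gene set within the second dataset.
idx <- which(rownames(expr2) %in% deg_from_first)

if (requireNamespace("limma", quietly = TRUE)) {
  # Rotation gene set test for the case-vs-control coefficient (column 2).
  res <- limma::roast(expr2, index = idx, design = design,
                      contrast = 2, nrot = 999)
  print(res)
}
```

For several gene lists at once, mroast takes a list of index vectors in place of `idx`.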

Finally, a separate semantic or functional analysis of each dataset might reveal commonly perturbed biological mechanisms.

Efstathios

ADD COMMENTlink written 8 months ago by svlachavas680

Thank you so much for sharing this.

ADD REPLYlink written 8 months ago by parinv0