Question: Data preprocessing with background correction and normalization
Asked 2.8 years ago by landscape95

Hi all, after getting the METABRIC data set, I applied background correction and between-array normalization with the limma package and produced these figures (a frequency histogram and a boxplot overview). At first glance, does this look right for quality control of expression data? Are there any criteria or methods for quality control of microarray expression data?

Your help is really appreciated! Thank you very much!

Here is my code, I plotted the first 150 samples:

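library(limma)
# Background-correct (normexp), then quantile-normalize across arrays
MB_miRNA_processed <- backgroundCorrect(MB_miRNA_processed, method = "normexp", verbose = FALSE)
MB_miRNA_processed <- normalizeBetweenArrays(MB_miRNA_processed, method = "quantile")
# Histogram of all values, then per-sample boxplots for the first 150 samples
hist(as.matrix(MB_miRNA_processed), main = "MB_miRNA_hist")
boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples")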

[Figure: histogram and boxplot of the first 150 samples]

And this is the figure after calling boxplot with outline=FALSE:

boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples", outline = FALSE)

[Figure: boxplot of the first 150 samples with outliers hidden]

AFTER log2 transformation

# Background-correct, log2-transform, then quantile-normalize across arrays
MB_miRNA_processed <- backgroundCorrect(MB_miRNA_processed, method = "normexp", verbose = FALSE)
MB_miRNA_processed <- normalizeBetweenArrays(log2(MB_miRNA_processed), method = "quantile")
hist(as.matrix(MB_miRNA_processed), main = "MB_miRNA_hist")
boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples", outline = FALSE)

[Figure: histogram and boxplot after log2 transformation]

rna-seq • 948 views

Hi landscape95,

As far as I know, METABRIC is a microarray dataset, not RNA-seq. The plots are not very clear, but what you describe seems OK to me. Have you log2-transformed your data?

written 2.8 years ago by Matina

Yes, it is a microarray expression dataset. I haven't log2-transformed my data. What's your opinion?

modified 2.8 years ago • written 2.8 years ago by landscape95
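
A rough way to check which scale a matrix is on can be sketched as follows. This is only a rule of thumb under stated assumptions: `looks_log2_scale` and its threshold of 30 are hypothetical (not part of limma), and the data are simulated rather than taken from METABRIC. The idea is that raw microarray intensities typically run into the hundreds or thousands, whereas log2 values from a 16-bit scanner cannot exceed 16.

```r
# Hypothetical helper (a rule of thumb, not a limma function):
# TRUE if the 99th percentile is small enough to be plausible log2 intensity.
looks_log2_scale <- function(x, upper = 30) {
  q99 <- quantile(as.numeric(x), probs = 0.99, na.rm = TRUE)
  unname(q99 <= upper)
}

set.seed(7)
# Simulated raw intensities: 500 probes x 4 arrays (stand-in for real data)
raw <- matrix(rlnorm(2000, meanlog = 6, sdlog = 1.2), ncol = 4)

on_raw  <- looks_log2_scale(raw)        # raw scale
on_log2 <- looks_log2_scale(log2(raw))  # after log2 transformation
print(c(raw = on_raw, log2 = on_log2))
```

The helper expects a plain numeric matrix, so for a limma object the expression matrix would have to be extracted first.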

I think Kevin is right - maybe sharing the commands you used would be useful. I would plot the log2 normalised expression in the box plot and maybe check how it looks before and after normalisation as well.

written 2.8 years ago by Matina
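
The before-and-after comparison suggested above might look roughly like this. It is a minimal sketch on simulated data (the matrix `raw` and its dimensions are assumptions, not the METABRIC data), using limma's `normalizeBetweenArrays()` on a log2-transformed matrix.

```r
library(limma)

set.seed(1)
# Simulated raw intensities: 500 probes x 6 arrays, each array on a different scale
raw <- sapply(1:6, function(j) rlnorm(500, meanlog = 6 + 0.3 * j, sdlog = 1))
colnames(raw) <- paste0("array", 1:6)

log_raw  <- log2(raw)
log_norm <- normalizeBetweenArrays(log_raw, method = "quantile")

# Side-by-side boxplots on the log2 scale, before and after normalisation
op <- par(mfrow = c(1, 2))
boxplot(log_raw,  outline = FALSE, main = "log2, before normalisation")
boxplot(log_norm, outline = FALSE, main = "log2, after quantile normalisation")
par(op)

# Quantile normalisation forces every array onto the same distribution,
# so the per-array quartiles should agree after normalisation.
col_quartiles <- apply(log_norm, 2, quantile, probs = c(0.25, 0.5, 0.75))
spread <- apply(col_quartiles, 1, function(q) diff(range(q)))
print(round(col_quartiles, 3))
```

If the boxes still sit at visibly different levels in the right-hand panel, the normalisation step was probably not applied to the object being plotted.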

METABRIC, as in, the breast cancer cohort? Can you confirm the array type and also the commands that you have used?

It both does and does not look normalised. There are tonnes of outliers in the box-and-whisker plot on the right, though that may just be because you are using a large point size. You can avoid plotting outliers by passing outline=FALSE to the boxplot() function, which would make the plot easier to check.

written 2.8 years ago by Kevin Blighe
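
To illustrate the point about outliers, here is a small base-R sketch on simulated, right-skewed data (the matrix `x` is an assumption, not the METABRIC data). It shows that `outline = FALSE` only hides the points, and that counting the points beyond the whiskers before and after a log2 transformation can indicate whether the outliers are simply a consequence of plotting on the raw scale.

```r
set.seed(42)
# Simulated expression-like matrix: 1000 probes x 4 samples with a heavy right tail
x <- matrix(rlnorm(4000, meanlog = 2, sdlog = 1), ncol = 4,
            dimnames = list(NULL, paste0("s", 1:4)))

# Same data drawn with and without the outlier points
op <- par(mfrow = c(1, 2))
boxplot(x, main = "outline = TRUE")
boxplot(x, outline = FALSE, main = "outline = FALSE")
par(op)

# Count points beyond the whiskers (1.5 x IQR rule) for each sample
count_outliers <- function(m) {
  apply(m, 2, function(v) length(boxplot.stats(v)$out))
}
n_out_raw <- count_outliers(x)
n_out_log <- count_outliers(log2(x))
print(rbind(raw = n_out_raw, log2 = n_out_log))
```

On skewed data like this, the raw scale should show many more flagged points than the log2 scale.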

Hi @Kevin, thank you! I have updated the information above

written 2.8 years ago by landscape95

Going by your variable name, this is the METABRIC micro-RNA data, right? - it's not all mRNA species? The profile still looks odd. I don't know what Matina thinks.

Can you confirm the exact source (website)?

written 2.8 years ago by Kevin Blighe

Hi Matina, thank you, I have updated the information above.

modified 2.8 years ago • written 2.8 years ago by landscape95