Data preprocessing with background correction and normalization
6.0 years ago
landscape95 ▴ 190

Hi all, after obtaining the METABRIC data set, I applied background correction and between-array normalization with the limma package, and produced the histogram and boxplot overviews below. At first glance, does this look reasonable for quality control of expression data? Is there any standard criterion or method for quality control of microarray expression data?

Your help is really appreciated! Thank you very much!

Here is my code; I plotted the first 150 samples:

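library(limma)
# Background-correct with the normexp method, then quantile-normalize across arrays
MB_miRNA_processed <- backgroundCorrect(MB_miRNA_processed, method = "normexp", verbose = FALSE)
MB_miRNA_processed <- normalizeBetweenArrays(MB_miRNA_processed, method = "quantile")
# Histogram of all values, and per-sample boxplots for the first 150 samples
hist(as.matrix(MB_miRNA_processed), main = "MB_miRNA_hist")
boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples")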

[Figure: histogram and boxplot of the first 150 samples]

And this is the figure after calling boxplot with outline=FALSE:

boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples", outline = FALSE)

[Figure: boxplot of the first 150 samples with outliers hidden]

After log2 transformation:

MB_miRNA_processed <- backgroundCorrect(MB_miRNA_processed, method = "normexp", verbose = FALSE)
# log2-transform the background-corrected intensities before quantile normalization
MB_miRNA_processed <- normalizeBetweenArrays(log2(MB_miRNA_processed), method = "quantile")
hist(as.matrix(MB_miRNA_processed), main = "MB_miRNA_hist")
boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples", outline = FALSE)

[Figure: histogram and boxplot after log2 transformation]

RNA-Seq • 2.2k views
Hi landscape95,

As far as I know, METABRIC is a microarray dataset, not RNA-seq. The plots are not very clear, but what you describe seems OK to me. Have you log2-transformed your data?

Yes, it is a microarray expression dataset. I haven't log2-transformed my data. What's your opinion?

I think Kevin is right; sharing the commands you used would be useful. I would plot the log2-transformed, normalised expression in the box plot, and also check how the data look before and after normalisation.
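
A minimal sketch of that before/after check, assuming limma is installed. The data here are simulated (the matrix `raw` and its sample-specific scale factors are hypothetical stand-ins, not METABRIC values), so only the pattern of the plots carries over. Note that for a plain matrix, normalizeBetweenArrays() assumes the values are already on the log scale, which is why log2() is applied first.

```r
library(limma)

set.seed(1)
# Simulated raw intensities (hypothetical stand-in for the real data):
# 500 probes x 20 samples, each sample multiplied by its own scale factor
n_probes <- 500
n_samples <- 20
sample_scale <- runif(n_samples, 0.5, 2)
raw <- sapply(sample_scale, function(s) rlnorm(n_probes, meanlog = 6, sdlog = 1) * s)
colnames(raw) <- paste0("S", seq_len(n_samples))

# log2-transform, then quantile-normalise between arrays
log_raw <- log2(raw)
log_norm <- normalizeBetweenArrays(log_raw, method = "quantile")

# Side-by-side boxplots before and after normalisation
op <- par(mfrow = c(1, 2))
boxplot(log_raw, outline = FALSE, las = 2, main = "log2, before normalisation")
boxplot(log_norm, outline = FALSE, las = 2, main = "log2, after quantile normalisation")
par(op)

# Spread of the per-sample medians after normalisation
median_spread <- diff(range(apply(log_norm, 2, median)))
print(median_spread)
```

Before normalisation the boxes should sit at visibly different heights; after quantile normalisation they should line up, because every sample is forced onto the same distribution. If the real data do not show this pattern, that is worth investigating before any downstream analysis.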

METABRIC, as in, the breast cancer cohort? Can you confirm the array type and also the commands that you have used?

It does and does not look normalised. There are tonnes of outliers in your box-and-whisker plot on the right, but I don't know if that's just because you are using a large point size. You can avoid plotting outliers by passing outline=FALSE to the boxplot() function. This would just make the plot easier to read when checking everything.
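
As a rough sketch of how to tell whether the outliers are real or just a plotting effect, you can count them per sample instead of judging by eye. This uses only base R; the matrix `expr` is simulated and purely hypothetical, so substitute your own log2 expression matrix.

```r
set.seed(42)
# Simulated log2 expression matrix (hypothetical): 300 probes x 10 samples
expr <- matrix(rnorm(300 * 10, mean = 8, sd = 1.5), nrow = 300,
               dimnames = list(NULL, paste0("S", 1:10)))

# Number of points per sample that fall beyond the boxplot whiskers
# (boxplot.stats() defaults to the same 1.5 x IQR rule as boxplot())
n_outliers <- apply(expr, 2, function(x) length(boxplot.stats(x)$out))
print(n_outliers)

# Same boxplot with the outlier points suppressed
boxplot(expr, outline = FALSE, main = "Outliers hidden (outline = FALSE)")
```

If the counts are a small fraction of the probes per sample, the cloud of points in the original plot is mostly a matter of point size and overplotting.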

Hi @Kevin, thank you! I have updated the information above.

Going by your variable name, this is the METABRIC micro-RNA data, right? So it's not all mRNA species? The profile still looks odd. I don't know what Matina thinks.

Can you confirm the exact source (website)?

Hi Matina, thank you, I have updated the information above.
