Quality control of gene expression data
6.1 years ago
landscape95 ▴ 190

Hello everybody,

I have a microarray gene expression matrix obtained from GEO (GSE40267), which I normalized with the voom function in the limma package. How can I assess the quality of the processed data (by box-plotting or something similar), and what criteria should I use? Could anyone give me some hints on this quality assessment?

Thank you

6.1 years ago

'Quality' in this sense has no defined threshold or boundary. If a sample is still problematic after you have normalised your data, it will usually be very obvious in each of the following:

  • Box-and-whisker plot
  • Violin plot
  • Unsupervised hierarchical clustering
  • PCA

For the first three of these, I have put some code in this answer: Hierarchical Clustering in single-channel agilent microarray experiment

For PCA, take a look at this answer: PCA plot from read count matrix from RNA-Seq
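
To make the box plot comparison concrete, here is a minimal sketch in plain Python (standard library only) of the numbers a box-and-whisker plot draws for each sample, plus a rough check for samples whose median sits far from the rest. The function names, the list-of-rows matrix layout, and the `n_mads` cut-off are all assumptions made for illustration; as noted above, there is no defined threshold, so treat any flag as a prompt to look at the plot, not as a verdict.

```python
import statistics


def sample_summaries(matrix, sample_names):
    """Five-number summary per sample (column).

    matrix: list of rows (one per gene/probe), each row a list with one
    value per sample; None marks a missing value. Hypothetical layout.
    """
    summaries = {}
    for j, name in enumerate(sample_names):
        values = sorted(row[j] for row in matrix if row[j] is not None)
        # quantiles(n=4) returns the three quartile cut points
        q1, median, q3 = statistics.quantiles(values, n=4)
        summaries[name] = {
            "min": values[0],
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": values[-1],
        }
    return summaries


def flag_shifted_samples(summaries, n_mads=3.0):
    """Names of samples whose median lies more than n_mads median absolute
    deviations from the median of all sample medians.

    n_mads is an arbitrary illustrative cut-off, not an accepted standard.
    """
    medians = {name: s["median"] for name, s in summaries.items()}
    centre = statistics.median(medians.values())
    mad = statistics.median(abs(m - centre) for m in medians.values())
    if mad == 0:
        # All but a few medians are identical: flag anything that differs
        return [name for name, m in medians.items() if m != centre]
    return [name for name, m in medians.items() if abs(m - centre) > n_mads * mad]
```

After a successful normalisation the per-sample medians and quartiles should line up closely, so a sample that this flags is one to inspect on the actual box or violin plot.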

------------------------------------------

A sample that has degraded cDNA will likely come back with very low expression values, many of which will be NA because they fell below the background noise (microarray) or will be zero (RNA-seq). Such samples may also show a very low coefficient of variation (CoV). There are, of course, other reasons why a sample may be of low quality, such as technical issues. Note, too, that an outlier sample is not necessarily a poor-quality sample; it may simply be a sample that behaves differently or unexpectedly.
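
The indicators described above (many NA or zero values, low CoV) can be tabulated per sample before any plotting. The following is a rough sketch in plain Python under the same assumed layout: a list of rows with one value per sample and None standing in for NA. The function name and the returned keys are hypothetical, and the CoV is computed on whatever scale the matrix is already on, so compare it between samples and do not read it as an absolute figure.

```python
import statistics


def degradation_indicators(matrix, sample_names):
    """Per-sample fraction of missing values, fraction of zeros, and
    coefficient of variation (standard deviation / |mean|).

    matrix: list of rows (genes/probes), one value per sample, None = NA.
    """
    report = {}
    n_features = len(matrix)
    for j, name in enumerate(sample_names):
        column = [row[j] for row in matrix]
        observed = [v for v in column if v is not None]
        n_missing = n_features - len(observed)
        n_zero = sum(1 for v in observed if v == 0)
        mean = statistics.fmean(observed) if observed else float("nan")
        if len(observed) > 1 and mean != 0:
            cov = statistics.stdev(observed) / abs(mean)
        else:
            # CoV is undefined with fewer than two values or a zero mean
            cov = float("nan")
        report[name] = {
            "fraction_missing": n_missing / n_features,
            "fraction_zero": n_zero / n_features,
            "mean": mean,
            "cov": cov,
        }
    return report
```

A sample whose missing or zero fraction is far higher than that of the other samples, or whose CoV is far lower, is a candidate for the closer inspection described above.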

One tool that you may want to look at is nsFilter, from the Bioconductor genefilter package.

Kevin
