Question

Filtering low-expressed genes in microarray data for WGCNA

3 months ago

Fluke ▴ 10

Hi everyone,

I have two datasets: one is RNA-seq data from TCGA and the other is microarray data. For the RNA-seq data, I filtered out low-expressed genes and normalized with DESeq2:

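library(DESeq2)
# Keep genes with a count of at least 10 in at least 475 samples
dds75 <- dds[rowSums(counts(dds) >= 10) >= 475, ]
vsd75 <- vst(dds75)  # variance-stabilizing transformation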

Everything went well for the RNA-seq data. For the microarray data, I normalized with gcRMA, but I am not sure how to filter out low-expressed genes before running WGCNA, because DESeq2 cannot be applied to microarray data.

microarray WGCNA DESeq2 • 503 views

ADD COMMENT • link updated 12 weeks ago by Ram 43k • written 3 months ago by Fluke ▴ 10

score 1 · Answer 1 · 2024-01-01

3 months ago

Kevin Blighe 87k

If you have followed a standard protocol for normalisation of the microarray, then no filtering is required for the purposes of preparing the data for WGCNA.

If you wish, you can still filter based on low intensity, by following this advice: https://bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html#10_Filtering_based_on_intensity
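As a rough illustration of that kind of intensity filter, here is a minimal sketch in base R. It runs on a simulated log2 matrix rather than real gcRMA output, and the names `intensity_cutoff` and `min_samples`, as well as their values, are assumptions for illustration only; in practice the cutoff would be chosen by looking at a histogram of the per-probe median intensities.

```r
# Simulated stand-in for a normalised log2 expression matrix
# (probes in rows, samples in columns).
set.seed(1)
expr <- matrix(rnorm(1000 * 12, mean = 7, sd = 2), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("s", 1:12)))

# Hypothetical cutoff; inspect hist(apply(expr, 1, median)) to pick a real one.
intensity_cutoff <- 4
# Hypothetical minimum number of arrays, e.g. the size of the smallest group.
min_samples <- 3

# Keep probes whose intensity exceeds the cutoff in at least min_samples arrays.
keep <- rowSums(expr > intensity_cutoff) >= min_samples
expr_filtered <- expr[keep, , drop = FALSE]
dim(expr_filtered)
```

The logic mirrors the count-based DESeq2 filter in the question, only applied to log2 intensities instead of raw counts.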

You should also ensure that you remove the control probes from your dataset prior to running WGCNA.
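A minimal sketch of dropping control probes, assuming the matrix has probe set IDs as row names. On Affymetrix arrays the control probe sets conventionally carry the prefix AFFX, which is what the pattern below relies on; the toy matrix and the object names are made up for illustration.

```r
# Toy matrix with two control probe sets and two regular probe sets.
expr <- matrix(1:12, nrow = 4,
               dimnames = list(c("AFFX-BioB-5_at", "1007_s_at",
                                 "AFFX-r2-Ec-bioC-3_at", "1053_at"),
                               c("s1", "s2", "s3")))

# Flag rows whose ID starts with the AFFX prefix, then drop them.
is_control <- grepl("^AFFX", rownames(expr))
expr_no_ctrl <- expr[!is_control, , drop = FALSE]
rownames(expr_no_ctrl)
```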

Kevin

ADD COMMENT • link 3 months ago by Kevin Blighe 87k

Hi Kevin, first of all, thanks for your suggestion. I followed your advice: I did not filter the genes, and I removed the control probes from the dataset before running WGCNA (as I understand it, control probe names start with AFFY). The problem is that, while running WGCNA, I checked the distribution of the data to detect outliers using:

library(WGCNA)
gsg <- goodSamplesGenes(norm.counts)
summary(gsg)
gsg$allOK
# Drop flagged genes (columns) and samples (rows) only if the check failed
if (!gsg$allOK) {
  if (sum(!gsg$goodGenes) > 0)
    printFlush(paste("Removing genes:", paste(colnames(norm.counts)[!gsg$goodGenes], collapse = ", ")))
  if (sum(!gsg$goodSamples) > 0)
    printFlush(paste("Removing samples:", paste(rownames(norm.counts)[!gsg$goodSamples], collapse = ", ")))
  norm.counts <- norm.counts[gsg$goodSamples, gsg$goodGenes]
}
sampleTree = hclust(dist(norm.counts), method = "average");
byHist = hist(sampleTree$height,main = "Histogram of Height",xlab = "Height")

The distribution is not bell-shaped, so I also plotted the sample dendrogram to check for outliers:

plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="",
 cex.lab = 1.5,cex.axis = 1.5, cex.main = 2)

After that, I excluded the samples above the height cutoff, ran pickSoftThreshold, and got this result: [image not available]. Do you have any suggestions for fixing this problem?
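For readers following along, the height-based exclusion described above can be sketched in base R roughly as follows. This uses simulated data with one planted outlier, and the value of `cut_height` is a hypothetical choice that would normally be read off the dendrogram; it is not the cutoff used in this thread.

```r
set.seed(42)
# Simulated stand-in: 10 samples (rows) x 50 genes (columns), as WGCNA expects.
dat <- matrix(rnorm(10 * 50), nrow = 10,
              dimnames = list(paste0("s", 1:10), NULL))
dat["s10", ] <- dat["s10", ] + 8   # plant one obvious outlier

sampleTree <- hclust(dist(dat), method = "average")

# Hypothetical cutoff, chosen by eye from plot(sampleTree).
cut_height <- 30
clusters <- cutree(sampleTree, h = cut_height)

# Keep only the samples in the largest cluster below the cut.
main_cluster <- as.integer(names(which.max(table(clusters))))
keep <- clusters == main_cluster
dat_clean <- dat[keep, , drop = FALSE]
rownames(dat)[!keep]   # samples that were excluded
```

Keeping the largest cluster below the cut is one common convention; samples that join the tree only above the cut are the ones treated as outliers.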

P.S. I used ReadAffy() and gcrma(Data) to read and normalize the expression data. Next, I mapped the probes using hgu133plus2.db and excluded the unmapped (NA) probes and the control probes. Finally, I used the avereps() function to average duplicated gene IDs.
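To make the last step concrete, here is a small base-R sketch of averaging probes that map to the same gene. It is only meant to show the idea behind the averaging step, not to replace avereps(); the matrix and the probe-to-gene mapping in `symbols` are invented for illustration.

```r
# Toy matrix: probes p1 and p2 map to the same gene.
expr <- matrix(c(2, 4, 6,
                 4, 6, 8,
                 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3"), c("s1", "s2", "s3")))
symbols <- c("TP53", "TP53", "GAPDH")   # hypothetical probe-to-gene mapping

# rowsum() adds up rows that share an ID; dividing by the number of
# probes per gene turns the sums into means.
sums <- rowsum(expr, group = symbols)
n_probes <- as.vector(table(symbols)[rownames(sums)])
gene_expr <- sums / n_probes
gene_expr
```

Note that rowsum() returns the genes in sorted order, so the row order of the result may differ from the input.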

Thanks again for helping me.

ADD REPLY • link 3 months ago by Fluke ▴ 10