Question

Outlier removal for WGCNA

0

Entering edit mode

2.3 years ago

jms2520 ▴ 30

Hello, When working through the WGCNA tutorials on my RNA seq data my complex design lends itself well to using the consensus network analysis to compare each experimental group. After the first run through the script I found that the mean connectivity is really really high and that the samples clustered based on brain region. I re-analyzed the data starting from raw counts and used Deseq2 normalization, and separated the data based on region and ran the consensus network again. The mean connectivity did not change at all which I imagine means there is still a driving factor for the data. (picture below) When looking at the sample clustering, I thought the driving factor would be sex but it looks like it is actually age. So now I split the data again into region and age. (Giving me 8 different data sets) Now when looking at my 8 "sample_clustering" diagrams it seems that there are few data sets with outliers but at all different heights (example of one below). I am struggling with applying different heights to different trees. The tutorial for outlier removal is below but I am struggling to apply it to my particular situation. When I tried to create separate cut heights all of my sample numbers for each set come out as zero.

baseHeight = 16

cutHeights = c(16, 16*exprSize$nSamples[2]/exprSize$nSamples[1]); **#I am very confused what this line is achieving and why we cannot just create a vector of cut heights for each graph**
pdf(file = "Plots/SampleClustering.pdf", width = 12, height = 12);
par(mfrow=c(2,1))
par(mar = c(0, 4, 2, 0))
for (set in 1:nSets)
{
plot(sampleTrees[[set]], main = paste("Sample clustering on all genes in", setLabels[set]),
xlab="", sub="", cex = 0.7);
abline(h=cutHeights[set], col = "red");
}
dev.off();

My code

> baseHeight = 59
> # Adjust the cut height for the male data set for the number of samples
> cutHeights = c(59, 59, 59,75, 59,59, 59, 59);
> # Re-plot the dendrograms including the cut lines
> pdf(file = "New_Plots_SampleClustering.pdf", width = 12, height = 12);
> par(mfrow=c(2,1))
> par(mar = c(0, 4, 2, 0))
> for (set in 1:nSets)
+ {
+   plot(sampleTrees[[set]], main = paste("Sample clustering on all genes in", setLabels[set]),
+        xlab="", sub="", cex = 0.7);
+   abline(h=cutHeights[set], col = "red");
+ }
> dev.off();

> 
> for (set in 1:nSets)
+ {

+   labels = cutreeStatic(sampleTrees[[set]], cutHeight = cutHeights[set])
+   # Keep the largest one (labeled by the number 1)
+   keep = (labels==1)
+   multiExpr[[set]]$data = multiExpr[[set]]$data[keep, ]
+ }
> 
> collectGarbage();
> # Check the size of the leftover data
> exprSize = checkSets(multiExpr)
> exprSize
$nSets
[1] 8

$nGenes
[1] 18736

$nSamples
[1] 0 0 0 0 0 0 0 0

$structureOK
[1] TRUE

enter image description here

RNAseq WGCNA • 1.1k views

ADD COMMENT • link updated 23 months ago by machado.rodrigo • 0 • written 2.3 years ago by jms2520 ▴ 30

0

Entering edit mode

Hello jms2520, could you solve this problem? Because I'm having the same issue since monday that I started with this and I couldn't get it over it

ADD REPLY • link 23 months ago by machado.rodrigo • 0