Correlation methods giving very different results (WGCNA)

4 months ago

ian.will ▴ 30

Hi all, I've come back to WGCNA after some years and have run into a quirky result: my soft power thresholds differ a lot depending on the correlation method I use. This topic has been discussed a fair bit in general, but I was hoping someone might have some wisdom to drop on my particular case.

I have 36 samples of salmon->DESeq2 RNA-seq data from host animals (worms) with a wild-type microbiome, a bacterial pathogen, germ free, or some combination of pathogen strain variants with or without a microbiome. There are 4 biological replicates for each of 7 experimental treatments, plus 8 replicates of germ-free controls in two batches of 4 (batch effect looked minimal).

A PCA of rlog-transformed data shows that the big differences here are pathogen (any kind, with/without microbiome) vs. microbiome-only vs. germ-free control, with some messy sub-clustering within the various pathogen treatment types. This seems biologically reasonable, if not ideal. Ellipses are 95% CI (point size can be ignored).

image: PCA of Samples. Green group seems to cluster better than other pathogen treatments, but WGCNA finds an outlier there

Oddly, when checking my samples with a clustering dendrogram using the WGCNA pipeline, one of my pathogen samples presents itself as an outlier, which was not detected in the PCA or hierarchical clustering I've done previously. I've tried either a pre-filter on the WGCNA input data (TPM >= 0.5 in at least 4 samples) or relying on the gsg$goodGenes check from the WGCNA docs; both yield similar results.
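For reference, the pre-filter described above amounts to something like the following. This is only a minimal sketch in plain Python, not the actual pipeline code: the function name `filter_genes`, the dict-of-lists layout, and the toy values are all hypothetical.

```python
def filter_genes(tpm, min_tpm=0.5, min_samples=4):
    """Keep genes with TPM >= min_tpm in at least min_samples samples.

    tpm: dict mapping gene id -> list of per-sample TPM values
         (hypothetical layout chosen for illustration).
    """
    kept = {}
    for gene, values in tpm.items():
        n_pass = sum(1 for v in values if v >= min_tpm)
        if n_pass >= min_samples:
            kept[gene] = values
    return kept


# Toy data (made up): six samples per gene.
toy = {
    "geneA": [0.0, 0.1, 0.6, 0.7, 0.9, 1.2],  # 4 samples pass the cutoff
    "geneB": [0.0, 0.0, 0.4, 0.6, 0.7, 0.8],  # only 3 samples pass
    "geneC": [5.0, 6.0, 7.0, 8.0, 9.0, 4.0],  # all samples pass
}
filtered = filter_genes(toy)
```

With the toy data, `geneB` is dropped because only three samples reach the cutoff.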

image: WGCNA sample clustering

I've tried variations on data prep that tested VST/RLOG transformations (makes little difference, sensibly), PEARSON/BICOR correlations (big difference), and including-outlier-sample/excluding-outlier-sample (big difference). I'm running signed-hybrid networks.
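To make the link between soft power and mean connectivity concrete, here is a small plain-Python sketch of how I understand the signed-hybrid adjacency: positive correlations are raised to the power, negative ones are set to zero. The function name `mean_connectivity` and the toy correlation matrix are made up for illustration, and this is not WGCNA's own code.

```python
def mean_connectivity(cor, power):
    """Mean connectivity under a signed-hybrid adjacency.

    cor: square correlation matrix as a list of lists.
    Adjacency a_ij = cor_ij ** power if cor_ij > 0, else 0.
    A gene's connectivity is the sum of its adjacencies to all other genes.
    """
    n = len(cor)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and cor[i][j] > 0:
                total += cor[i][j] ** power
    return total / n


# Toy 3-gene correlation matrix (made-up values).
toy_cor = [
    [1.0, 0.8, -0.5],
    [0.8, 1.0, 0.3],
    [-0.5, 0.3, 1.0],
]
k_by_power = {p: mean_connectivity(toy_cor, p) for p in (1, 7, 30)}
```

Because every positive correlation below 1 shrinks when raised to a higher power, mean connectivity can only fall as the power grows, which is why a threshold of 30+ flattens it so badly.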

My understanding was that, if anything, the "bicor" approach should be more robust to outliers because it uses the median/MAD. Counterintuitively (for me), Pearson gives a much more typical soft power threshold of around 7 to reach ~0.9 R^2 and a plateau on the curve. Bicor gives closer to 30+ (while squishing the mean connectivity << 100). Removing the outlier sample allows the bicor method to get to 12 (albeit not quite reaching 0.9 R^2) [not shown].
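To spell out what I mean by median/MAD robustness, here is a minimal plain-Python sketch of the biweight midcorrelation next to Pearson, following the published definition as I understand it (deviations from the median, scaled by 9 x MAD, with zero weight beyond that range). It ignores WGCNA options such as `maxPOutliers` and `pearsonFallback`, the helper names and toy vectors are hypothetical, and it is not the package's implementation.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def _biweight_scaled(v):
    """Median-centre v, apply biweight weights, scale to unit length."""
    med = median(v)
    mad = median([abs(a - med) for a in v])  # raw (unscaled) MAD
    if mad == 0:
        raise ValueError("MAD is zero; this sketch has no fallback")
    weighted = []
    for a in v:
        u = (a - med) / (9 * mad)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0  # outliers get weight 0
        weighted.append((a - med) * w)
    norm = sqrt(sum(t * t for t in weighted))
    return [t / norm for t in weighted]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    return sum(a * b for a, b in zip(_biweight_scaled(x), _biweight_scaled(y)))


# Toy example (made-up values): y tracks x except for one extreme sample.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 2, 3, 4, 5, 6, 7, -50]
r_pearson = pearson(x, y)
r_bicor = bicor(x, y)
```

In this toy case the single extreme value flips the sign of the Pearson correlation, while bicor gives that sample zero weight and stays clearly positive, which is the behaviour I expected to carry over to my data.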

image: Pearson (rlog, all samples included), looks nice

image: Bicor (rlog, all samples included), looks rubbish

Thoughts? Statistically, I think bicor should be the move and I shouldn't just choose Pearson because it looks nice. But I'm also disinclined to drop a sample simply because I don't like it. And I think choosing a massive soft power threshold isn't good either, especially looking at what it does to mean connectivity. I'm also seeing that the WGCNA docs aren't hosted where they used to be a few years ago ... is this method becoming deprecated/unused these days?

Thank you!

WGCNA bicor pearson • 531 views

ADD COMMENT • link updated 4 months ago by Ram 43k • written 4 months ago by ian.will ▴ 30

I suppose a last option would be to pick a few treatment types to analyze (removing highly variable treatments that we might be less interested in) and see if that reduces some noise. I would be very curious about all sample types, but we could do a constrained analysis on control/microbiome/pathogen-isolate-1/pathogen1+microbiome. I don't love it, but it would give us four distinct groups with less variability (by PCA: the two green groups, the reddish group, and the gold group). We would still have 20 samples with this scheme.

ADD REPLY • link 4 months ago by ian.will ▴ 30

When you add an image with a "description", please be aware that the description will not be displayed if the image is, and vice versa. Please do not add important information not mentioned elsewhere in this description.

ADD REPLY • link 4 months ago by Ram 43k