Question: Collapsing biological replicates for co-expression analysis using WGCNA
5 weeks ago, ab4232 wrote:

First, my sincere thanks to all community members. Posts here really help people like us who are new to the field.

Recently I completed an RNA-Seq project consisting of 25 samples with 3 biological replicates each. In brief, because there is no reference genome for the organism of interest, I performed de novo transcriptome assembly followed by redundancy removal, raw read count estimation, and differential expression analysis using DESeq2. Now I need to perform co-expression analysis using the WGCNA package, which I have done only once before, and that was with output from the Tuxedo pipeline.

From DESeq2 I now have normalized, rlog, and variance-stabilized counts, but the count matrix has 75 columns (25 samples x 3 replicates). Earlier, the FPKM values from the Tuxedo pipeline were obtained after collapsing the biological replicates. So I am seeking help or suggestions on how to handle the biological replicates for WGCNA analysis from the DESeq2 output mentioned above. Alternatively, can the biological replicates be collapsed somehow so that co-expression is performed on a final set of 25 samples? Everywhere it is suggested not to use DESeq2's collapseReplicates for biological replicates.

I have been searching for a solution for some time and would really appreciate any help or suggestions on how to proceed. Thanks.

rna-seq deseq2 wgcna • 152 views
modified 5 weeks ago by Kevin Blighe • written 5 weeks ago by ab4232
5 weeks ago, Kevin Blighe wrote:


I would proceed to WGCNA with the variance stabilised expression levels and without any collapsing of replicates.


written 5 weeks ago by Kevin Blighe

Thanks. I really appreciate it. I have two small doubts.

  1. Won't WGCNA treat each biological replicate as an individual sample?
  2. If I would like to perform clustering, such as k-means, to see the expression pattern across the samples, what should I use? It will again be across the biological replicates (25 samples x 3 replicates = 75) rather than across the 25 samples.

Sorry if I am asking a silly question. With tools like cuffdiff, this was not a problem.

modified 5 weeks ago • written 5 weeks ago by ab4232

Are you sure that these are not technical replicates? Could you explain the source of the samples further? Normally we do not collapse biological replicates, but we may collapse technical replicates.

modified 5 weeks ago • written 5 weeks ago by Kevin Blighe

Yes, I am sure they are biological replicates. The samples are plant tissue samples collected at 25 different time points from 3 individuals (same genotype) grown in separate pots under identical conditions.

While getting familiar with packages like DESeq2, I learned not to collapse biological replicates. My confusion arose because cuffdiff's fpkm.tracking file used to have a single FPKM value per sample for a given gene, in spite of the biological replicates given as input. That made it easy to use for tasks like clustering and WGCNA.

I hope I am making some sense.

written 4 weeks ago by ab4232

I see - thank you for elaborating. The problem there is that FPKM expression units should not even be used for clustering purposes, or for any analysis where samples are being compared in any way, in my opinion. The TopHat2 / Cufflinks pipeline (and, these days, HISAT2 / StringTie) is good for performing de novo transcriptome assembly and discovering new transcripts and / or splice isoforms - in this way, it provides a summary metric, a single value, across replicates.

If you have already used DESeq2 with your data, then, for WGCNA, I would use [as input to WGCNA] the regularised log or variance-stabilised expression values that are produced by DESeq2.

modified 4 weeks ago • written 4 weeks ago by Kevin Blighe

I really appreciate you taking the time to reply and help out. Earlier, with FPKM, I used log2(FPKM+1) for clustering etc.

Just one last question. While searching further online, I came across this link. Do you think it is a good approach? I understood it up to the calculation of Spearman's correlation, but I didn't understand how they calculated the weights, especially the line

"The weighting of each replicate is then calculated as the normalized sum of associations between each replicate with the others."

Thanks again.

modified 4 weeks ago • written 4 weeks ago by ab4232
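One possible reading of the quoted sentence can be sketched in plain Python (Python is chosen here only for illustration; the thread itself works in R). The sketch computes Spearman's correlation between every pair of replicates, sums each replicate's correlations with the others, and divides by the grand total so the weights add up to 1. The function names, the toy values, and the clipping of negative correlations to zero are assumptions made for this example and are not taken from the linked page.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def replicate_weights(replicates):
    """Weight = (sum of correlations with the other replicates) / grand total."""
    n = len(replicates)
    sums = []
    for i in range(n):
        # Assumption: negative correlations are clipped to zero.
        sums.append(sum(max(spearman(replicates[i], replicates[j]), 0.0)
                        for j in range(n) if j != i))
    total = sum(sums)
    if total == 0:
        return [1.0 / n] * n  # fall back to a plain average
    return [s / total for s in sums]


def weighted_collapse(replicates):
    """Collapse replicates into one profile using the weights above."""
    w = replicate_weights(replicates)
    n_genes = len(replicates[0])
    return [sum(w[r] * replicates[r][g] for r in range(len(replicates)))
            for g in range(n_genes)]


# Toy data: five genes, three replicates; rep_c disagrees with the other two.
rep_a = [5.0, 7.0, 9.0, 11.0, 13.0]
rep_b = [5.2, 6.9, 9.1, 10.8, 13.3]
rep_c = [9.0, 5.0, 13.0, 7.0, 11.0]
weights = replicate_weights([rep_a, rep_b, rep_c])
collapsed = weighted_collapse([rep_a, rep_b, rep_c])
```

In this toy case the discordant replicate `rep_c` receives the smallest weight, so it contributes least to the collapsed profile.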

Yes, but, FPKM units are produced in a way such that absolutely no cross-sample normalisation occurs. So, even logging or Z-scaling these will still leave bias in the data. You essentially cannot faithfully compare the FPKM value, either logged or unlogged, in one sample versus another.
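A small, hedged illustration of this point in plain Python (used here only for illustration; the thread works in R): FPKM divides each count by gene length and by that sample's own total, so a single strongly changed gene shifts the FPKM of every other gene. The `size_factors` function is a simplified sketch of the median-of-ratios idea behind DESeq2's normalisation, not the package's actual code, and all names and numbers are invented for the example.

```python
from math import exp, log
from statistics import median


def fpkm(counts, lengths_bp):
    """FPKM for one sample: scaled only by that sample's own total count."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def size_factors(samples):
    """Simplified median-of-ratios size factors, one per sample."""
    n_genes = len(samples[0])
    log_geo_means = []
    for g in range(n_genes):
        column = [s[g] for s in samples]
        if all(v > 0 for v in column):
            log_geo_means.append(sum(log(v) for v in column) / len(column))
        else:
            log_geo_means.append(None)  # genes with a zero count are skipped
    factors = []
    for s in samples:
        log_ratios = [log(s[g]) - m
                      for g, m in enumerate(log_geo_means) if m is not None]
        factors.append(exp(median(log_ratios)))
    return factors


# Toy data: four genes; only the last gene differs between the two samples.
lengths = [1000, 2000, 1500, 1000]
sample_1 = [100, 200, 150, 100]
sample_2 = [100, 200, 150, 1000]

fpkm_1 = fpkm(sample_1, lengths)
fpkm_2 = fpkm(sample_2, lengths)
factors = size_factors([sample_1, sample_2])
normalised_1 = [c / factors[0] for c in sample_1]
normalised_2 = [c / factors[1] for c in sample_2]
```

The first gene has the same raw count in both samples, yet its FPKM in `sample_2` is less than half of its FPKM in `sample_1`, because the total of `sample_2` is inflated by the last gene. The median-of-ratios factors are driven by the unchanged genes, so the normalised counts for the first gene remain equal.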

The CMAP analysis shown via the link is a specific use-case. For WGCNA, I would still favour not collapsing the biological replicates. The inquisitive nature within me would do it with and without collapsing, just out of interest. Biology has no rules, and neither therefore does bioinformatics.

written 4 weeks ago by Kevin Blighe
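The "with and without collapsing" comparison could be tried along the following lines. This is a minimal sketch in plain Python rather than WGCNA itself: it averages replicates within each time point, then computes an unsigned soft-threshold adjacency, |cor|^beta, for one pair of genes on both versions of the data. The function names, the toy values, and beta = 6 are placeholders assumed for illustration; in WGCNA the power is normally chosen from the data (for example with pickSoftThreshold), and the input would be the variance-stabilised matrix.

```python
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def collapse_by_group(values, groups):
    """Average one gene's values within each group (first-seen group order)."""
    buckets = {}
    for value, group in zip(values, groups):
        buckets.setdefault(group, []).append(value)
    return [mean(vs) for vs in buckets.values()]


def adjacency(x, y, beta=6):
    """Unsigned soft-threshold adjacency: |correlation| raised to beta."""
    return abs(pearson(x, y)) ** beta


# Toy data: 3 time points x 3 replicates (the real design would be 25 x 3).
groups = ["t1"] * 3 + ["t2"] * 3 + ["t3"] * 3
gene_a = [5.0, 5.4, 4.8, 7.1, 6.7, 7.3, 9.0, 9.5, 8.8]
gene_b = [2.1, 1.8, 2.2, 3.0, 3.4, 2.9, 4.1, 3.8, 4.3]

# Without collapsing: every replicate is its own column.
adj_full = adjacency(gene_a, gene_b)

# With collapsing: one mean value per time point.
collapsed_a = collapse_by_group(gene_a, groups)
collapsed_b = collapse_by_group(gene_b, groups)
adj_collapsed = adjacency(collapsed_a, collapsed_b)
```

Keeping the replicates gives the correlation more columns to work with (75 rather than 25 in the design described above) and retains replicate-to-replicate variation, whereas averaging smooths that variation away before the network is built.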