Question: Effect of gene size on WGCNA
3
7 months ago by
rodd100
London, United Kingdom
rodd100 wrote:

Dear all,

I was recently asked in a lab meeting whether there may be a gene length bias in the results produced by WGCNA, if the input data consists of normalized counts or variance-stabilized transformed data from DESeq2 (which does not correct for gene length). I am not sure how to answer this, as there are several papers that used DESeq2 normalized counts as input for WGCNA, and I thought that this was actually recommended by the authors of WGCNA.

In other words, could my detected expression modules be biased by gene length, with some modules being driven by the length of the genes in them rather than by their actual expression?

modified 6 months ago by i.sudbery7.8k • written 7 months ago by rodd100

That sounds more like a normalization problem than a consequence of how WGCNA works. In my experience with a transcript network analysis, changing the normalization gave slightly different results for the small modules, but not for the largest ones. If the analysis does not take too long, just try different normalization methods and compare the modules.

modified 6 months ago • written 6 months ago by andres.firrincieli590
5
7 months ago by
colin.kern900
United States
colin.kern900 wrote:

Like DEG analysis, WGCNA only ever compares the expression of a gene with itself across samples. That comparison will be proportionally the same whether you normalize by gene length or not, because all the values within each comparison come from the same gene and so share the same length. The modules are formed by finding genes that have the same pattern of expression across samples (via correlation, which is unchanged when all of a gene's values are divided by a constant such as its length), not by comparing the actual expression values of different genes.
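
To make the point above concrete, here is a minimal sketch in Python/NumPy (not WGCNA itself, which is an R package). The counts and gene lengths are made up for illustration. It only checks that the Pearson correlation between two genes is the same before and after dividing each gene by its own length, and that on a log scale the length becomes a per-gene offset that also leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 12

# Hypothetical counts for two genes measured in the same samples.
gene_a = rng.poisson(lam=200, size=n_samples).astype(float)
gene_b = 0.5 * gene_a + rng.poisson(lam=50, size=n_samples)

# Hypothetical gene lengths in kilobases (assumed values).
len_a_kb, len_b_kb = 1.2, 45.0

# Correlation on the unscaled values versus length-scaled values.
r_raw = np.corrcoef(gene_a, gene_b)[0, 1]
r_scaled = np.corrcoef(gene_a / len_a_kb, gene_b / len_b_kb)[0, 1]

# On a log scale, dividing by length subtracts a per-gene constant.
log_a = np.log2(gene_a + 1)
log_b = np.log2(gene_b + 1)
r_log = np.corrcoef(log_a, log_b)[0, 1]
r_log_shifted = np.corrcoef(log_a - np.log2(len_a_kb),
                            log_b - np.log2(len_b_kb))[0, 1]

print(f"raw: {r_raw:.6f}  length-scaled: {r_scaled:.6f}")
print(f"log: {r_log:.6f}  log, length-shifted: {r_log_shifted:.6f}")
```

Each pair of printed values should agree up to floating-point error. This says nothing about whether the precision of the estimates depends on gene length, which is a separate question.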

written 7 months ago by colin.kern900

Thank you, it absolutely makes sense to me now!

modified 7 months ago • written 7 months ago by rodd100
2
6 months ago by
i.sudbery7.8k
Sheffield, UK
i.sudbery7.8k wrote:

In any analysis that is based on the number of reads mapping to a gene, a longer gene will have more reads mapping to it than a shorter gene expressed at the same level.

In DE analysis (such as with DESeq and edgeR), this means that a long gene is more likely to be called significantly differential than a short gene, as there is less noise in its counts.

I don't know of any literature on how WGCNA might be affected by this. However, one can imagine that where expression is more accurately estimated (as it is for longer genes), there is a higher chance of the correlation being significant. So, yes, I might imagine that you would find that longer genes are more likely to be assigned to clusters.

Might even be a short paper in it if you could demonstrate it.
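
As a rough illustration of the intuition above, here is a toy simulation in Python/NumPy. Everything in it is an assumption made for the sketch: Poisson counting noise, a log-normal shared expression pattern, mean counts of 5 and 500 standing in for a "short" and a "long" gene, and a simple log2(count + 1) transform rather than DESeq2's VST. It is not evidence about real data, only a way to see why lower counts could weaken the observed correlation between two genes that are in truth perfectly co-regulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200

# Shared "true" expression pattern across samples (assumed log-normal).
pattern = rng.lognormal(mean=0.0, sigma=0.5, size=n_samples)

def simulate_pair(mean_count):
    """Simulate two perfectly co-regulated genes with the same expected count.

    Both genes follow `pattern`; they differ only by independent Poisson
    counting noise. Returns their log2(count + 1) values.
    """
    lam = mean_count * pattern
    g1 = rng.poisson(lam)
    g2 = rng.poisson(lam)
    return np.log2(g1 + 1.0), np.log2(g2 + 1.0)

# Hypothetical mean counts: few reads (short gene) vs many reads (long gene).
short_pair = simulate_pair(5)
long_pair = simulate_pair(500)

r_short = np.corrcoef(*short_pair)[0, 1]
r_long = np.corrcoef(*long_pair)[0, 1]

print(f"correlation within short-gene pair: {r_short:.3f}")
print(f"correlation within long-gene pair:  {r_long:.3f}")
```

Under these assumptions the long-gene pair shows the higher correlation, because counting noise is a smaller fraction of the signal. Whether this carries over to module assignment in a real WGCNA run would need to be checked on actual data.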

written 6 months ago by i.sudbery7.8k
1

Just published, this paper seems relevant: https://journals.plos.org/plosbiology/article/file?id=10.1371/journal.pbio.3000481&type=printable

They find that: a) gene length bias goes beyond longer genes simply having more reads, and there are sample-specific length biases; b) there are correlations between the expression of short genes, and between the expression of long genes, which means that the independence assumptions of gene enrichment methods are violated; c) none of this is fixed by FPKM normalization.

I'm pretty sure this is relevant to WGCNA, particularly b).

written 6 months ago by i.sudbery7.8k

I should point out that normalising by gene length wouldn't change any of this.

written 6 months ago by i.sudbery7.8k