Hello, I’m new to bioinformatics and would appreciate guidance on the general workflow for WGCNA in disease studies. Any tutorials or resources you can point me to would be very welcome! I watched the Bioinformagician tutorial, but she only demonstrates WGCNA starting from count data.
Expression data: What type of expression data is best for WGCNA? If I start from FASTQ files, should I use VST-transformed counts, TPMs, FPKMs, or something else? My plan is to generate raw counts with nf-core/rnaseq and then apply DESeq2’s variance-stabilizing transformation (VST) before running WGCNA. Is this a reasonable approach?
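To make the expression-data question concrete: whichever transformation is chosen, WGCNA correlates genes across samples and raises those correlations to a soft-thresholding power β. The sketch below is only an illustration under stated assumptions. It uses log2 counts-per-million as a simple stand-in for DESeq2’s VST (it is not the same transformation), the data are synthetic, and β = 6 is an illustrative value; in WGCNA itself β is usually chosen with pickSoftThreshold.

```python
import numpy as np

def log_cpm(counts: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """log2(CPM + pseudocount) for a (samples x genes) raw count matrix.

    A simple stand-in for DESeq2's VST; it is NOT the same transformation.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=1, keepdims=True)  # total counts per sample
    cpm = counts / lib_sizes * 1e6                 # counts per million
    return np.log2(cpm + pseudocount)

def unsigned_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """WGCNA-style unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr is (samples x genes); Pearson correlations are computed between genes.
    Genes with zero variance must be removed first, or their correlations are NaN.
    """
    cor = np.corrcoef(expr, rowvar=False)  # genes x genes correlation matrix
    return np.abs(cor) ** beta

# Synthetic example: 8 samples x 20 genes of Poisson counts (made-up data).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(8, 20))
expr = log_cpm(counts)
adj = unsigned_adjacency(expr, beta=6)
print(adj.shape)  # (20, 20)
```

A signed network would instead use ((1 + cor) / 2) ** beta, which keeps the direction of co-expression rather than treating strong negative and positive correlations alike.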
Sample inclusion: If I have both healthy controls and disease samples, should I include all samples or only the disease samples? I’ve read that WGCNA doesn’t require controls, but I’ve also seen suggestions that a reference group can help. I’m planning to combine datasets. Controls are age- and sex-matched within each dataset, but there’s still variability, especially across datasets. Should I limit WGCNA to the disease samples and apply batch correction?
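On the batch-correction part of this question, here is a minimal sketch of the simplest possible adjustment, so it is clear what such a correction does to the data. This is a naive per-dataset, gene-wise mean centering on log-scale expression; it is not ComBat from the sva package, which uses empirical Bayes estimates and also adjusts scale. The dataset labels "A" and "B" are hypothetical.

```python
import numpy as np

def center_by_batch(expr: np.ndarray, batches: list[str]) -> np.ndarray:
    """Subtract each batch's per-gene mean, then add back the overall per-gene mean.

    expr: (samples x genes) log-scale expression; batches: one label per sample.
    This is a naive location-only adjustment, NOT ComBat.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(batches)
    if labels.shape[0] != expr.shape[0]:
        raise ValueError("need one batch label per sample")
    grand_mean = expr.mean(axis=0)  # per-gene mean over all samples
    out = np.empty_like(expr)
    for b in np.unique(labels):
        mask = labels == b
        # Remove this batch's per-gene offset relative to the overall mean.
        out[mask] = expr[mask] - expr[mask].mean(axis=0) + grand_mean
    return out

# Toy data: 4 samples x 2 genes; batch "B" is shifted up by 10 on every gene.
expr = np.array([[1.0, 10.0], [3.0, 12.0], [11.0, 20.0], [13.0, 22.0]])
adjusted = center_by_batch(expr, ["A", "A", "B", "B"])
# adjusted rows: [6, 15], [8, 17], [6, 15], [8, 17]
```

The caveat matters for the sample-inclusion question: if one dataset contains mostly disease samples and another mostly controls, centering by dataset also removes real disease signal. ComBat’s mod argument lets a condition covariate be protected, but that only works when batch and condition are not fully confounded.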
Preprocessing pipeline: What’s the best pipeline or tool for processing FASTQ files locally for downstream WGCNA? I’m considering FastQC, fastp, HISAT2, and featureCounts. Would you recommend this over nf-core/rnaseq or GenPipes?
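The FastQC, fastp, HISAT2, featureCounts route can be sketched as a dry run that only builds the command lines, which makes the step order and file hand-offs explicit. Assumptions: paired-end reads, a prebuilt HISAT2 index, and hypothetical sample, file, and index names. The flags shown should be checked against the installed tool versions; in particular, --countReadPairs is assumed to need Subread 2.0.2 or later.

```python
from pathlib import Path

def per_sample_commands(sample: str, r1: str, r2: str, hisat2_index: str,
                        outdir: str = "results", threads: int = 4) -> list[list[str]]:
    """Build (but do not run) FastQC -> fastp -> HISAT2 commands for one paired-end sample."""
    out = Path(outdir)
    t1 = str(out / f"{sample}_R1.trimmed.fastq.gz")
    t2 = str(out / f"{sample}_R2.trimmed.fastq.gz")
    return [
        # Raw-read QC; FastQC expects the output directory to exist already.
        ["fastqc", "-o", str(out), r1, r2],
        # Trimming; --detect_adapter_for_pe adds sequence-based adapter detection for PE data.
        ["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2,
         "--detect_adapter_for_pe", "-w", str(threads),
         "-h", str(out / f"{sample}.fastp.html"),
         "-j", str(out / f"{sample}.fastp.json")],
        # Alignment to a prebuilt HISAT2 index (the -x value is the index prefix).
        ["hisat2", "-p", str(threads), "-x", hisat2_index,
         "-1", t1, "-2", t2, "-S", str(out / f"{sample}.sam")],
    ]

def count_command(alignments: list[str], gtf: str, outfile: str = "results/counts.txt",
                  threads: int = 4, strand: int = 0) -> list[str]:
    """One featureCounts call over all samples gives a single gene x sample count matrix.

    strand (-s): 0 unstranded, 1 stranded, 2 reversely stranded; must match the library prep.
    --countReadPairs is assumed to require Subread 2.0.2 or later.
    """
    return ["featureCounts", "-T", str(threads), "-p", "--countReadPairs",
            "-s", str(strand), "-a", gtf, "-o", outfile, *alignments]

# Hypothetical sample, file, and index names for illustration only.
cmds = per_sample_commands("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz", "index/genome")
counts_cmd = count_command(["results/S1.sam"], "annotation.gtf")
for cmd in cmds + [counts_cmd]:
    print(" ".join(cmd))  # inspect first; each could be run with subprocess.run(cmd, check=True)
```

Running featureCounts once over all alignments, rather than per sample, is a design choice that yields one count table ready to load into DESeq2.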
Use of fastp: Even if no adapters are detected, would you still recommend using fastp for consistency across datasets?
Thanks in advance!
No harm in using it. If there are any residual adapters that FastQC did not detect, fastp will remove them.
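To illustrate what removing residual adapters means at the read level, here is a toy exact-match 3′ trimmer. It is not fastp’s algorithm; fastp’s adapter trimming is more elaborate and, for example, can use paired-end overlap analysis. The adapter sequence AGATCGGAAGAGC, the common Illumina TruSeq adapter prefix, is an assumption about the library type.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed: common Illumina TruSeq adapter prefix

def trim_3prime_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Toy exact-match 3' adapter trimmer (no mismatches, single-end logic only).

    Cuts at the first full adapter occurrence; otherwise removes the longest
    adapter prefix (>= min_overlap bases) found at the very end of the read.
    """
    min_overlap = max(min_overlap, 1)  # k = 0 would wrongly trim the whole read
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Try the longest partial adapter first so the maximal overlap is removed.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read

print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCACAC"))  # ACGTACGT (full adapter)
print(trim_3prime_adapter("ACGTACGTAGATC"))              # ACGTACGT (partial adapter at 3' end)
print(trim_3prime_adapter("ACGTACGTAG"))                 # ACGTACGTAG (overlap of 2 < min_overlap)
```

The minimum-overlap rule is why trimmers do not cut every read that happens to end in "AG": very short matches occur by chance, so a threshold trades sensitivity for false trimming.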