4 days ago
Hamada
Hello everyone, I have a question and would appreciate your input. We are planning to perform a differential expression analysis on Head and Neck cancer samples from TCGA. However, I am concerned about possible batch effects arising from tissue heterogeneity, since the dataset includes ~500 samples from different tissue origins. Would it be appropriate to apply Surrogate Variable Analysis (SVA) to account for these potential batch effects before running the RNA-seq differential expression pipeline, or would another approach be more suitable for this type of dataset?
SVA is definitely an option. For inspiration, you might want to look at this exploratory analysis from the DESeq2 developer Mike Love, where he tackles unwanted variation in sequencing data from a larger cohort. It uses RUVSeq rather than SVA, but for the end user the two are similar. From experience, the problem with tools like SVA and RUV is that it is (at least to me) largely arbitrary how many surrogate variables or factors you eventually include in your design. Maybe someone has a best practice here, but in the end, for me, it's always "try, apply, stare at the PCA plot, repeat" until your gut tells you it's "now fine" to proceed with the DE analysis.
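To make that loop a bit more concrete, here is a minimal sketch in Python with NumPy on simulated data. It is not SVA or RUVSeq (those are R/Bioconductor packages); it only regresses out the known design and takes an SVD of the residuals, which is loosely similar in spirit to the residual-based step those tools start from. The function name `residual_factors`, the sample and gene counts, and the effect sizes are all made up for illustration.

```python
import numpy as np

def residual_factors(expr, design):
    """Estimate candidate hidden factors from residual variation.

    expr:   genes x samples matrix (log-scale expression)
    design: samples x covariates matrix of KNOWN variables (with intercept)
    Returns (factors, var_explained): sample-level factors as columns, and
    the fraction of residual variance each one captures.
    """
    # Fit every gene on the known design by least squares
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ coef          # samples x genes
    # The left singular vectors of the residuals are the candidate factors
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return u, s**2 / np.sum(s**2)

# --- simulated toy data (hypothetical sizes and effects) ---
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 40
condition = np.repeat([0, 1], n_samples // 2)   # known variable of interest
hidden = np.tile([0, 1], n_samples // 2)        # unrecorded batch-like factor

expr = rng.normal(0.0, 1.0, size=(n_genes, n_samples))
expr[:50] += 2.0 * condition                    # genes truly changing with condition
expr[100:300] += 1.5 * hidden                   # genes hit by the hidden factor

design = np.column_stack([np.ones(n_samples), condition])
factors, var_explained = residual_factors(expr, design)

# "Stare at it": a scree of residual variance per factor, looking for an elbow
print("residual variance explained:", np.round(var_explained[:5], 3))
print("|cor(factor 1, hidden)|:",
      round(abs(np.corrcoef(factors[:, 0], hidden)[0, 1]), 3))

# If one factor stands out, add it to the design and repeat the check
k = 1
design_adj = np.column_stack([design, factors[:, :k]])
```

In this toy setup the hidden factor is strong and balanced across conditions, so the first residual component stands out clearly. With real TCGA data the elbow is rarely that clean, which is exactly where the choice of `k` starts to feel arbitrary.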