I am trying to relate a factor (sensitivity) to gene expression. I have ~40 samples of breast cancer, each a different cell line, from a few lung cancer subtypes.
When I model my known clinical factors by variance partition to examine the variation explained from each, I see that the vast majority of my variance in gene expression is explained by unknown factors and represented by residuals. It seems my data is very noisy (which is probably to be expected as each sample comes from a different primary tumour).
My question is: should I model these residuals / noise as unwanted variation using RUV or similar? With the aim of increasing variance explained by sensitivity, separating them strongly on PCA by sensitivity, then running DESeq with these in my design formula?
Or do I have to accept that my data is too noisy for DESeq analysis and look elsewhere for markers of sensitivity?
Crosspost on Bioconductor: https://support.bioconductor.org/p/9155039/
How are you calculating these residuals?
variancePartition package in R : https://bioconductor.org/packages/release/bioc/html/variancePartition.html
do you really have 1 sensitive sample and 3 moderate? if so, you may have a bad time.
with regard to the variability, gene expression generally is ... variable
because you are looking at a somatic state, stratifying by, or controlling for, sample genotype may help. the idea is that tumor samples can carry large-scale changes to their chromosomal complement, so stratifying by the most common chromosome-level or chromosome-arm-level changes could (possibly) increase statistical power.
however, if you truly have only one sensitive sample, you won't be able to make use of that info, even if it would otherwise be effective...
I was considering grouping moderate with sensitive, so 4 'sensitive' and remaining resistant. Small sample sizes unfortunately.
Re: stratification by chromosome - how would I use this info to stratify my samples? Would I identify most common chromosome (or chromosome arm) for each sample, and group them in this way? And then identify the variance attributed to each chromosome arm group?
If you have an example paper where this has been used to help me understand, that'd be amazing :)
Re moderate + sensitive - yes, if that's the best you can do, do that; you could also code the response numerically, with sensitive as 1, moderate as 0.5, and resistant as 0.
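To make the two coding options concrete, here is a minimal sketch in plain Python. The labels and log-expression values are invented toy numbers for a single hypothetical gene, and `encode` and `slope` are illustrative helper names, not part of any package. It only shows how the choice of coding changes the estimated effect; it is not a substitute for a proper count model.

```python
# Toy illustration only: labels and expression values are made up.
SCORE = {"resistant": 0.0, "moderate": 0.5, "sensitive": 1.0}

def encode(labels):
    """Map sensitivity labels to the numeric 0 / 0.5 / 1 coding."""
    return [SCORE[label] for label in labels]

def slope(x, y):
    """Ordinary least-squares slope of y on x (change in y per unit of x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

labels = ["resistant", "resistant", "resistant", "resistant",
          "moderate", "moderate", "moderate", "sensitive"]
log_expr = [5.0, 5.2, 4.8, 5.0, 6.0, 6.2, 5.8, 7.0]

# Option 1: graded coding (resistant = 0, moderate = 0.5, sensitive = 1).
graded = encode(labels)
beta_graded = slope(graded, log_expr)

# Option 2: merge moderate into sensitive (anything above 0 becomes 1).
binary = [1.0 if score > 0 else 0.0 for score in graded]
beta_binary = slope(binary, log_expr)

print(f"graded coding slope: {beta_graded:.2f}")
print(f"binary coding slope: {beta_binary:.2f}")
```

With the graded coding the slope is the estimated change from fully resistant to fully sensitive, assuming moderate sits halfway between; with the binary coding it is simply the difference in group means.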
i don't have a citation handy beyond the literature that discusses stratified versus pooled analysis generally, but the idea is the same. there are definitely good videos on the performance of stratified vs. pooled analysis in a variety of contexts that will let you see the concept in action. perhaps another reader will have a domain-specific example, but i don't think one is strictly necessary.
most cancers have such large-scale genomic changes. consider going to cBioPortal, selecting a large study, selecting only samples with mRNA, then viewing the CNA track for that malignancy.
Okay, that makes sense, I'll take a look at some of those videos - thanks for your suggestions :)
Also, I wanted to mention that I have other sets of comparable data from other cancer lineages, that have more even groups (15 sensitive and 15 resistant), if that makes a difference to your answer.
that makes a tremendous difference.
here the answer is, you process both datasets as similarly as possible, then you organize all the data you have into a meta-analysis and analyze all the data you have jointly.
the issue is that there will likely be batch effects that generate spurious (false positive and false negative) results, because they are driven by differences between batches, not real biology.
thus, you need to employ techniques that control for this. there are many options. for instance, you could control for batch as a covariate, and see if that brings the sensitive and resistant samples in line with each other, etc.
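As a rough illustration of why controlling for batch matters, the sketch below pools two invented datasets whose baseline expression differs by batch and whose sensitive/resistant balance also differs. All numbers are toy values, and `center_within` and `slope` are hypothetical helper names. Centering both the response coding and the expression within each batch gives the same sensitivity coefficient as a linear model with batch as a fixed-effect covariate; in DESeq2 the analogous step would be putting batch in the design formula (e.g. `~ batch + sensitivity`), which this plain-Python sketch does not attempt to reproduce.

```python
from collections import defaultdict

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def center_within(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group]
            for value, group in zip(values, groups)]

# Toy data: batch B sits about 3 units higher than batch A, and
# batch A is mostly resistant while batch B is mostly sensitive.
batch     = ["A", "A", "A", "A", "B", "B", "B", "B"]
sensitive = [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
log_expr  = [4.9, 5.1, 5.0, 6.0, 8.0, 9.1, 8.9, 9.0]

# Naive pooling: the batch shift leaks into the sensitivity effect.
naive = slope(sensitive, log_expr)

# Batch-adjusted: remove batch means from both variables first.
adjusted = slope(center_within(sensitive, batch),
                 center_within(log_expr, batch))

print(f"naive pooled effect:   {naive:.2f}")
print(f"batch-adjusted effect: {adjusted:.2f}")
```

In this toy case the naive estimate is inflated because batch and sensitivity are partly confounded; the adjusted estimate recovers the within-batch difference. With real data the same confounding can also hide true effects.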
to date, every analysis i've ever done (whether expression data or otherwise) has benefited from data pooling followed by application of meta-analytic techniques, but it's a lot more work.