5.7 years ago
Teresa
Hi,
I have done an RNA-seq experiment with four groups of samples: samples from Spain with condition X, healthy controls also from Spain, samples from France with condition Y, and healthy controls also from France. If I want to run the differential expression analysis with the DESeq2 package to find the DE genes between condition X and condition Y, how do I have to normalize for the geographical differences in gene expression?
Thanks a lot
You have two types of samples, and they combine two effects: the treatment (X and Y) and the sampling location (Spain and France).
You will not be able to disentangle whether a difference is due to the treatment or to the sampling location. Only with support from the literature, or from what you know about your system, can you discuss whether it is more likely to be one or the other.
Now, with regard to count normalisation, I would proceed the usual way (as if you had 2 samples in condition A and 2 samples in condition B), following the recommendations of this paper, for instance: https://f1000research.com/articles/5-1408/v3
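To make the confounding concrete, here is a minimal sketch in plain Python. The sample sheet is hypothetical (two samples per group, made-up IDs) and only contains the X and Y samples, as in the reasoning above. It builds the indicator columns a design with both country and condition would need, and checks whether they are identical.

```python
# Hypothetical sample sheet: IDs and replicate numbers are illustrative only.
samples = [
    {"id": "s1", "condition": "X", "country": "Spain"},
    {"id": "s2", "condition": "X", "country": "Spain"},
    {"id": "s3", "condition": "Y", "country": "France"},
    {"id": "s4", "condition": "Y", "country": "France"},
]

# Indicator columns of the kind a design with both factors would contain.
condition_y = [1 if s["condition"] == "Y" else 0 for s in samples]
country_france = [1 if s["country"] == "France" else 0 for s in samples]

# Identical columns mean the two effects cannot be estimated separately.
confounded = condition_y == country_france

print("condition column:", condition_y)
print("country column:  ", country_france)
print("perfectly confounded:", confounded)
```

When the two columns are identical, any model that includes both has no way to attribute a difference to one factor rather than the other, which is the statistical form of the problem described above.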
Thanks for your reply.
Just to be sure: wouldn't it be possible to use the counts from the healthy controls from each country to normalize the counts of the samples from the same country with condition A/B, and then compare conditions A and B?
If you follow the link above (f1000research), you will see that they use the group information and the lane information. You could define the groups X, Y, Control_France and Control_Spain (or ControlFR and ControlS). That way you can come up with a contrast matrix that gives the comparisons X vs Y, Control France vs Control Spain, X vs Control Spain and Y vs Control France, I guess.
Design matrix:
Contrast matrix:
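As a rough illustration of what such matrices could look like for the four groups named above, here is a small sketch in plain Python. It assumes two replicates per group and a design without intercept (one indicator column per group); the variable and contrast names are made up for the example. The last contrast, a difference of differences, is not among the comparisons listed above; it is added only as one possible way to express "compare X and Y after subtracting each country's own control".

```python
# Group labels follow the thread; two replicates per group is an assumption.
groups = ["X", "Y", "ControlS", "ControlFR"]
sample_groups = ["X", "X", "Y", "Y",
                 "ControlS", "ControlS", "ControlFR", "ControlFR"]

# Design matrix without intercept: one indicator column per group.
design = [[1 if g == col else 0 for col in groups] for g in sample_groups]


def contrast(plus, minus):
    """Return the contrast vector for the comparison plus - minus."""
    return [int(g == plus) - int(g == minus) for g in groups]


# The four comparisons mentioned in the thread.
contrasts = {
    "X_vs_Y": contrast("X", "Y"),
    "ControlFR_vs_ControlS": contrast("ControlFR", "ControlS"),
    "X_vs_ControlS": contrast("X", "ControlS"),
    "Y_vs_ControlFR": contrast("Y", "ControlFR"),
}

# Added illustration (not from the thread): (X - ControlS) - (Y - ControlFR).
contrasts["X_vs_Y_control_adjusted"] = [
    a - b
    for a, b in zip(contrasts["X_vs_ControlS"], contrasts["Y_vs_ControlFR"])
]

print("columns:", groups)
for row in design:
    print(row)
for name, vector in contrasts.items():
    print(name, vector)
```

Each contrast vector has one weight per design column. In DESeq2, a numeric vector of this kind can be passed to the `contrast` argument of `results()`, provided its order matches the coefficients of the fitted model.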
The main issue may be the number of replicates. How many do you have for each combination of country and treatment (in the example above there are two replicates per condition)?
The normalisation is applied to each sample, independently of the groups, I think.
Another way to proceed is to perform the normalisation separately for each comparison you want to make, by running DESeq2 / edgeR only on the samples you want to compare directly (for example, one run with ControlFR and Y, another run with ControlS and X, etc). Both ways can be done. For example, the paper above uses the first way (normalise everything together, then perform the comparisons you want), while the snakePipes pipeline uses the second way (normalise and compare one group against another). For more information: https://snakepipes.readthedocs.io/en/latest/content/workflows/RNA-seq.html#rna-seq
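The difference between the two strategies can be sketched with a plain-Python version of the median-of-ratios size factors that DESeq2 uses by default. This is a simplified re-implementation for illustration, not the package's code, and the counts and sample names are invented. Note that the function never sees any group label: only the set of samples passed in changes the result.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps a sample name to its list of gene counts (same gene order).
    """
    names = list(counts)
    n_genes = len(next(iter(counts.values())))
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in names]
        # Genes with a zero count in any sample are left out of the reference.
        if min(values) > 0:
            mean_log = sum(math.log(v) for v in values) / len(values)
            log_geo_means.append((g, mean_log))
    return {
        s: math.exp(median(math.log(counts[s][g]) - m for g, m in log_geo_means))
        for s in names
    }


# Invented toy counts: 4 genes, one sample per group.
counts = {
    "X_1": [100, 200, 50, 400],
    "ControlS_1": [110, 180, 60, 380],
    "Y_1": [200, 400, 100, 800],
    "ControlFR_1": [210, 390, 120, 760],
}

# First way: normalise all samples together.
all_together = size_factors(counts)

# Second way: normalise only the samples of one comparison.
subset = size_factors({k: counts[k] for k in ("Y_1", "ControlFR_1")})

print("all samples:", all_together)
print("Y vs ControlFR only:", subset)
```

The size factor of a given sample differs between the two runs because the per-gene reference (the geometric mean across samples) is computed from a different set of samples each time.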