Hello everyone, I would like to perform a differential gene analysis on my dataset. It is composed of 8 pH points 5 different bacterial cocultures 3 biological triplicates = 120 samples. I have performed a de novo metatranscriptome with rnaSPAdes and now I am preparing my data to use Deseq2.
I'd like to carry out two types of analysis:
- Within my samples I have cultures of 2 bacteria and cultures of 3 bacteria (the same 2 + 1 to see the impact of adding this 3rd bacteria). I want to compare the differential gene expression of the 3-bacteria samples with the control, which would be my 2-bacteria sample. But I'm wondering if this is possible, given that the samples that always have one more bacteria (3-bacteria) than the control (2-bacteria) will inevitably have extra genes not present in the control?
- I want to compare gene expression between pH points for each 3-bacteria and 2-bacteria sample.
What tool should I use to obtain the counts of each gene per sample in order to have a matrix of counts for Deseq2? Is RSEM followed by tximport ok?
Do I have to use the RSEM tool (or other) to obtain a matrix of counts by groups of samples that I'm going to analyze together under Deseq2 or can I use it on all 120 samples and then cut up my matrix to use only groups of samples that I want to compare (the groups mentioned above, by groups of bacteria for the same pH and by pH points for the same co-culture) with Deseq2?
More theoretical question: Does it make sense to remove from the count matrix obtained the genes that with Blastx against a database give low % identities or should I do it earlier in the analysis like removing them from the metatranscriptome obtained directly?
Thank you for your reply. I now have my counts by transcript for each sample. So I'd like to use DESeq2 to make comparisons between my samples and my controls. I have a dataset including 2 variables, pH (8 values) and strains consortium (5 values) all in triplicate. My ColDesign for Deseq2 looks like this :
I would like to answer two questions: 1- For a given strains consortium do I have gene expression differences at any pH point compared to my control pH point (J1) ? 2- For a given pH point, do I have gene expression differences between my strains consortium compared to the control consortium (straincontrol) ?
1- I would like to have for straincontrol :
pHJ1_vs_pH4.6
For strain1 :
pHJ1_vs_pH4.6
And those comparisons for each of the strains.
2- I would like to have for pH J1 :
For pH 4.6 :
And those comparisons for each of the pH points.
I would like then generate one heatmap per pH comparison and per strain comparison since in each one I compare the samples to the same control. It would generate 13 heatmaps (8 with the pH and 5 with the strains consortium).
it is possible with DESeq2 to block one of the 2 variables, for example comparing for the consortium of strain1 the differents pH points to the control pH or should I split my data set to keep only the results of counts of genes corresponding to the consortium of strain1 and use DESeq2 with a single variable, the pH?
I tried differents designs such as :
but it doesn't give me the comparisons I want.
You would probably want to test if strain:pH is significant, you can test it with LRT test having the reduced formula strain + pH, this will tell you if the strains influence how pH changes gene expression. If you then want to test a specific strain for pH effect then you can use
results
to get the effect of a specific interaction term or compare one strain:pH to the reference strain:pH, which is different since it will include the pH overall (not strain specific) effect.Thxs for the answer! I tried doing this :
I get these comparisons :
and each strain except the straincontrol agaisnt each pH except J1.
I did this to extract the comparisons I want :
but as I don't have the comparison of each strain against the pHcontrol which is J1 I can't extract the comparisons I really want. And same for the controlstrain. How can I also get the comparisons with J1 and the straincontrol.pHxx ?
I didn't really understand what comparison you're looking for. You can use
contrast=list(..,..)
and put any pair of interacting strain and pH to compare them so you can compare a strain in different pH or two strains in the same pH. You can use thecontrast=c(strain, 'strain1', 'strain2')
to compare the overall effect of one strain against the control (regardless of the pH). If you can't find what you're looking for with this you can always design a vector the same length as theresultsNames
with 1, 0 and -1 for the numerator, not relevant and denominator respectively.Another thing you might want to consider is treating the pH as a continuous feature (and translate J1 to the actual pH onviously), you might want to add the squared pH for a LQ (linear quadratic) model to account for the case that the effect is not entirely linear. If the pH effect is vastly non-linear then forget what I've just said but I doubt it's the case. This will reduce the size of the design matrix.
To make a comparison I want, for exemple this one :
I get this error : Error in checkContrast(contrast, resNames) : all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'
When I run the command :
Comparisons of each strain against the pHcontrol which is J1 are not displayed. I have : strain2.pH6.45, strain2.pH6.26, strain2.pH6.00, strain2.pH5.73, strain2.pH5.37, strain2.pH5.02, strain2.pH4.6 but not strain2.pHJ1. however, I'd like to compare each pH point of the same strain to point J1 of that strain. Like this:
If I do it like this :
Does it give me the DEG between pH 6.45 and pH J1 for the strain2 ? I seen this possibility here : https://github.com/tavareshugo/tutorial_DESeq2_contrasts/blob/main/DESeq2_contrasts.md but I'm really not sure. Sorry I'm new to bioinformatics!
And is it normal to not have controlstrain.pHxx displayed after running the command resultsNames(dds) ?
You can add a
0 + strain + ...
to the formula, this way there won't be an intercept and all the factors should appear in the resultsNames.thxs for the help! with :
I get :
controlstrain, strain2, strain3, strain4, strain5, pH6.45, pH6.26, pH6.00, pH5.73, pH5.37, pH5.02, pH4.6 and eachstrain.eachpH except pHJ1 (control) and controlstrain so I'm still not able to get the DEG of the controlstrain for eachpH against controlpH or maybe I miss something.
That's weird, the controls should be there