Hi all,
I am trying to analyze a dataset of samples that had early or late progression and I am using limma.
My code is as follows:
design<-model.matrix(~ 0 + df$progression, data = df)
colnames(design)<- levels(df$progression)
contr<-makeContrasts(EvsL=Early-Late, LvsE=Late-Early, levels=design)
y_des_df<-voom(y_df, design = design, normalize.method = "quantile")
y_des_df.fit<-lmFit(y_des_df, design = design)
y_des_df.fit<-contrasts.fit(y_des_df.fit, contrasts=contr)
y_des_df.fit<-eBayes(y_des_df.fit) 
topTable_df<-topTable(y_des_df.fit, p.value = 0.05, coef="LvsE", n=Inf, adjust.method = "BH", sort.by="logFC")
head(topTable_df)
And this is my topTable:
GeneID Chr Length Symbol   type_of_gene    logFC  AveExpr         t
1915    1915   6   4090 EEF1A1 protein-coding 13.91672 13.91672 173.13201
4512    4512   M   1542   COX1 protein-coding 12.91041 12.95493 115.00098
7178    7178  13   4814   TPT1 protein-coding 12.75672 12.61048  86.01986
567      567  15   1588    B2M protein-coding 12.69427 12.52030  91.45237
My problem is that the logFC is very similar to my AveExpr and if I look in my count data I see no real difference between different samples, for example:
    Early_1 Early_2 Early_3 Late_4 Late_5 Late_6 
COX1        13.06414        12.60298      13.06414      13.06414 13.06414      12.87005
I assume that there is something wrong with my contrasts but I can't understand what. Is anyone able to help me please?
Many thanks!
That is weird. Maybe it's because your contrast are essentially the same? Early vs Late is the same as Late vs Early (only the logFC direction will change). I think you don't need the contrast at all because your design is simple (AvsB).
Hi Amar, I've tried it completely without contrasts too and the results are the same. Also, when I use a different contrast (or different coefficient in the case of no contrasts) the results look exactly the same!
...a bizarre coincidence, or, like, winning the national lottery? Can you explain a bit more about the data itself. You have not shown your pre-processing steps.
Hi Kevin,
Yes, I know, but not as fun as winning the lottery! :/
I've used feature counts and edgeR, as follows:
That's pretty much it, I'm not sure if you need anything else?
Thanks very much!
Thanks! How does a MDS plot appear?
?
What is the output of
?
Also, what if you avoid quantile normalisation and instead just use:
Thank you Kevin!
Same results without the normalisation!!
And my MDS plot looks like that: (Red-Late, Green-Early)
https://ibb.co/yy0nqF6
The number of genes failing your
is expressedfilter is quite high, which is interesting in itself. Were the raw counts generally low for all samples?Yeah, I can't say that they're massively high...
for example:
Those are genes as rows, right? The number of zeros seems high even after you filtered the data? I have used DESeq2 much more than limma / voom, so, I'm not sure how voom handles zeros.
Oops, sorry, wrong data set! Here's the correct one, that's before the calcNormFactors:
And yes, the rownames are the transcript ids
It has come to the point whereby, at least from my perspective, it would literally require an examination of the input and output of every single command that is being run, i.e., in order to ensure that each is doing what you expect. The fact remains that it could still just be coincidence, I suppose.
Another option: post this on the Bioconductor support forum (and link back to this thread), where Gordon Smyth (limma and EdgeR 'mastermind') will pick it up.
I just tried everything using your exact code but using random data, and was not able to reproduce the problem. This code is reproducible:
Thanks Kevin, I ran it again from scratch, it seems now that my results are not significant, since the toptable is empty.
Thanks for all you help!
I just noticed this thread on Bioconductor, too: https://support.bioconductor.org/p/126919/
can you post both your design matix and your contrasts matrix, please
That's my contrasts matrix:
And that's my design matrix:
What does
dflook like? It looks like this is testing the intercept, which shouldn't actually exist.Hi Devon,
This is how my df looks like:
In the Toptable for COX1, your logFC is equal to the mean of the (
Early_1,Early_2,Early_3) values (which is incorrect; it should equal the mean of the 'late's, minus the mean of the 'early's), whereas the AveExpr value is equal to the mean across all samples (which is correct). Very odd. Are there only two levels in yourprogressioncolumn?Yes, indeed it's very strange! These are the only too levels in my progression column!