I am trying to analyze a dataset of samples that had early or late progression and I am using limma.
My code is as follows:
design<-model.matrix(~ 0 + df$progression, data = df) colnames(design)<- levels(df$progression) contr<-makeContrasts(EvsL=Early-Late, LvsE=Late-Early, levels=design) y_des_df<-voom(y_df, design = design, normalize.method = "quantile") y_des_df.fit<-lmFit(y_des_df, design = design) y_des_df.fit<-contrasts.fit(y_des_df.fit, contrasts=contr) y_des_df.fit<-eBayes(y_des_df.fit) topTable_df<-topTable(y_des_df.fit, p.value = 0.05, coef="LvsE", n=Inf, adjust.method = "BH", sort.by="logFC") head(topTable_df)
And this is my topTable:
GeneID Chr Length Symbol type_of_gene logFC AveExpr t 1915 1915 6 4090 EEF1A1 protein-coding 13.91672 13.91672 173.13201 4512 4512 M 1542 COX1 protein-coding 12.91041 12.95493 115.00098 7178 7178 13 4814 TPT1 protein-coding 12.75672 12.61048 86.01986 567 567 15 1588 B2M protein-coding 12.69427 12.52030 91.45237
My problem is that the logFC is very similar to my AveExpr and if I look in my count data I see no real difference between different samples, for example:
Early_1 Early_2 Early_3 Late_4 Late_5 Late_6 COX1 13.06414 12.60298 13.06414 13.06414 13.06414 12.87005
I assume that there is something wrong with my contrasts but I can't understand what. Is anyone able to help me please?
That is weird. Maybe it's because your contrast are essentially the same? Early vs Late is the same as Late vs Early (only the logFC direction will change). I think you don't need the contrast at all because your design is simple (AvsB).
Hi Amar, I've tried it completely without contrasts too and the results are the same. Also, when I use a different contrast (or different coefficient in the case of no contrasts) the results look exactly the same!
...a bizarre coincidence, or, like, winning the national lottery? Can you explain a bit more about the data itself. You have not shown your pre-processing steps.
Yes, I know, but not as fun as winning the lottery! :/
I've used feature counts and edgeR, as follows:
That's pretty much it, I'm not sure if you need anything else?
Thanks very much!
Thanks! How does a MDS plot appear?
What is the output of
Also, what if you avoid quantile normalisation and instead just use:
Thank you Kevin!
Same results without the normalisation!!
And my MDS plot looks like that: (Red-Late, Green-Early)
The number of genes failing your
is expressedfilter is quite high, which is interesting in itself. Were the raw counts generally low for all samples?
Yeah, I can't say that they're massively high...
Those are genes as rows, right? The number of zeros seems high even after you filtered the data? I have used DESeq2 much more than limma / voom, so, I'm not sure how voom handles zeros.
Oops, sorry, wrong data set! Here's the correct one, that's before the calcNormFactors:
And yes, the rownames are the transcript ids
It has come to the point whereby, at least from my perspective, it would literally require an examination of the input and output of every single command that is being run, i.e., in order to ensure that each is doing what you expect. The fact remains that it could still just be coincidence, I suppose.
Another option: post this on the Bioconductor support forum (and link back to this thread), where Gordon Smyth (limma and EdgeR 'mastermind') will pick it up.
I just tried everything using your exact code but using random data, and was not able to reproduce the problem. This code is reproducible:
Thanks Kevin, I ran it again from scratch, it seems now that my results are not significant, since the toptable is empty.
Thanks for all you help!
I just noticed this thread on Bioconductor, too: https://support.bioconductor.org/p/126919/
can you post both your design matix and your contrasts matrix, please
That's my contrasts matrix:
And that's my design matrix:
dflook like? It looks like this is testing the intercept, which shouldn't actually exist.
This is how my df looks like:
In the Toptable for COX1, your logFC is equal to the mean of the (
Early_3) values (which is incorrect; it should equal the mean of the 'late's, minus the mean of the 'early's), whereas the AveExpr value is equal to the mean across all samples (which is correct). Very odd. Are there only two levels in your
Yes, indeed it's very strange! These are the only too levels in my progression column!