Hi, I am learning how to use limma for bulk RNA-seq data. Below is a snapshot of the metadata file.
My samples are categorised by Sex (Male and Female) and by 6 Age categories (Infant, Young child, Older child, Teens, Young adults and Older adults).
I would like to identify which genes are differentially expressed for each sex and age group. For example, which genes are overexpressed in female infants vs male infants. All the expression values are from controls there is no treatment here.
After reading the limma user guide and several other online resources, the following is the matrix I designed.
<h5>Specifying the linear model</h5>design <- model.matrix(~0+ Group, data = metadata)
colnames(design) <- c("F_infant","F_olderadult" ,"F_olderchild", "F_teens","F_youngadult",
"F_youngchild" , "M_infant", "M_olderadult", "M_olderchild","M_teens",
"M_youngadult","M_youngchild")
fit<-lmFit(eset, design)
<h5>Contrast Matrix</h5>
cont.matrix <- makeContrasts( infants = F_infant-M_infant,
youngchild = F_youngchild - M_youngchild,
olderchild= F_olderchild- M_olderchild,
teens= F_teens-M_teens,
youngadult= F_youngadult - M_youngadult,
olderadult= F_olderadult - M_olderadult,
levels=design)
<h5>Fit the model</h5>
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
results <- decideTests(fit2, p.value = 0.05)
Can you please let me know if this is the correct approach or am I missing something. Any help is really appreciated!
Thanks. LM