I have set of miRNA data from TCGA. I want differential expression analysis using limma R package. Please guide me in details how to proceed. How to take input?
Which type of data did you get? - the open-access raw counts or the aligned BAM files?
Raw data from TCGA database
Okay, I recently conducted an analysis of this exact data from the TCGA but I used DESeq2 to do the normalisation and differential expression.
First, I got a list of all TXT files containing the counts and read them into R using a loop. I then had a data-frame of raw counts. With that, it was easy to use DESeq2.
Do you have to use Limma or was it just a suggestion? I have only used Limma for microarray analyses, but I'm aware that the developers recently upgraded so that it can be used for RNA-seq
No.. I didn't used Limma right now. I gone through the Limma tutorial "theory" only. I want microarray analyses to see the differential expression of genes. Please suggest. Which tools will be better to used to see the differential expression DEseq2, limma and other tools.
If it's TCGA micro-RNA microarray data, then use limma. If it's TCGA micro-RNA-seq, then use DESeq2 or Limma-Voom. It looks like you have microarray data in which case I imagine that you have the normalised counts.
To run limma, you can use the following code:
We assume that you have a counts matrix, countsMatrix, and metadata in an object called metadata. The order of rows in metadata should match the order of colnames in countsMatrix.
In metadata, we have a column 'Condition', which gives details of the condition applied to each sample, e.g., control, treatmentA, treatmentB, etc.
#Set up the comparison matrix
design <- paste(metadata$Condition, sep="")
design <- factor(design)
comparisonmodel <- model.matrix(~0+design)
colnames(comparisonmodel) <- levels(design)
#Create a limma object
project.fitmodel <- lmFit(countsMatrix, comparisonmodel)
project.fitmodel.eBayes <- eBayes(project.fitmodel)
#Make individual contrasts, fit the model, and adjust by empirical Bayes
#Control Vs treatmentA
limControlVtreatA <- makeContrasts(Q1="Control-treatmentA", levels=comparisonmodel)
limControlVtreatA.fitmodel <- contrasts.fit(project.fitmodel.eBayes, limControlVtreatA)
limControlVtreatA.fitmodel.eBayes <- eBayes(limControlVtreatA.fitmodel)
#Control Vs treatmentB
limControlVtreatB <- makeContrasts(Q2="Control-treatmentB", levels=comparisonmodel)
limControlVtreatB.fitmodel <- contrasts.fit(project.fitmodel.eBayes, limControlVtreatB)
limControlVtreatB.fitmodel.eBayes <- eBayes(limControlVtreatB.fitmodel)
#Obtain statistics for each comparison and output results
#output all stats; no P value adjusment
topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="none", number=99999, p.value=1)
#output all stats; adjust by Benjamini-Hochberg
topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="BH", number=99999, p.value=1)
#output only adjust P values < 0.05
topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="BH", number=99999, p.value=0.05)
I have data set miRNA from TCGA. Data set look like this, miRNA_ID, read_count , reads_per_million_miRNA_mapped, and crossed-mapped
miRNA_ID read_count reads_per_million_miRNA_mapped cross-mapped
hsa-let-7a-1 74726 8647.065140 N
hsa-let-7a-2 75087 8688.838961 N
hsa-let-7a-3 75246 8707.237957 N
hsa-let-7b 77259 8940.176186 N
hsa-let-7c 1480 171.261093 N
So, My objective is to different fold differential analysis of miRNA data set. Please guide which tool(s) is better for this data analysis. I seen some paper using limma and edger. So please guide how to take input and proceed this data set. Thanks
@skjobs1234 did you find an appropriate answer for your query? Even I'm performing the similar analysis.
Take read_count and use that as input for Limma-Voom. Presumably you have separate read_count values for different samples?
By the way, it looks like you have micro-RNA-seq data, not microarray.
@kevin Yes it is miRNA-seq data from TCGA
...and you just have a single column of values? Sorry for the late follow-up - I missed your response.
It seems like the data is for single sample. the same read count is available for multiple samples and can be merged. I'm Guessing to go with DESeq2 or EdgeR for differential analysis. can limma be used for same? which one would be more effitient in identifying differential expression?
Yes, you should merge them together into a data-frame, i.e., the read_count columns. Try cbind() or just data.frame(). EdgeR and DESeq2 are more common for differential expression analysis in RNA-seq. Limma was the supreme method for decades for microarrays and the equivalent method for RNA-seq data is Limma-Voom.
Just to add on to what Kevin was getting at,
At this link -- https://github.com/BarryDigby/TCGA_Biolinks/blob/master/TCGA_Biolinks.Rmd
You can download miRNA / gene counts from TCGA via Biolinks and perform DEA with DESeq2. I used PRAD in this example.
There is some code for handling messy metadata in there too, but be careful that it applies to your data as TCGA changes over time.
Login before adding your answer.
Use of this site constitutes acceptance of our User Agreement and Privacy