Question: How to take input TCGA miRNA data in Limma R package for differential expression
gravatar for skjobs1234
3.1 years ago by
skjobs123420 wrote:

I have set of miRNA data from TCGA. I want differential expression analysis using limma R package. Please guide me in details how to proceed. How to take input?

rna-seq next-gen R • 2.4k views
ADD COMMENTlink modified 8 months ago by Barry Digby500 • written 3.1 years ago by skjobs123420

Which type of data did you get? - the open-access raw counts or the aligned BAM files?

ADD REPLYlink written 3.1 years ago by Kevin Blighe66k

Raw data from TCGA database

ADD REPLYlink written 3.1 years ago by skjobs123420

Okay, I recently conducted an analysis of this exact data from the TCGA but I used DESeq2 to do the normalisation and differential expression.

First, I got a list of all TXT files containing the counts and read them into R using a loop. I then had a data-frame of raw counts. With that, it was easy to use DESeq2.

Do you have to use Limma or was it just a suggestion? I have only used Limma for microarray analyses, but I'm aware that the developers recently upgraded so that it can be used for RNA-seq

ADD REPLYlink written 3.1 years ago by Kevin Blighe66k

No.. I didn't used Limma right now. I gone through the Limma tutorial "theory" only. I want microarray analyses to see the differential expression of genes. Please suggest. Which tools will be better to used to see the differential expression DEseq2, limma and other tools.

ADD REPLYlink written 3.1 years ago by skjobs123420

If it's TCGA micro-RNA microarray data, then use limma. If it's TCGA micro-RNA-seq, then use DESeq2 or Limma-Voom. It looks like you have microarray data in which case I imagine that you have the normalised counts.

To run limma, you can use the following code:

We assume that you have a counts matrix, countsMatrix, and metadata in an object called metadata. The order of rows in metadata should match the order of colnames in countsMatrix.

In metadata, we have a column 'Condition', which gives details of the condition applied to each sample, e.g., control, treatmentA, treatmentB, etc.


    #Set up the comparison matrix
    design <- paste(metadata$Condition, sep="")

    design <- factor(design)

    comparisonmodel <- model.matrix(~0+design)

    colnames(comparisonmodel) <- levels(design)

    #Create a limma object
    project.fitmodel <- lmFit(countsMatrix, comparisonmodel)
    project.fitmodel.eBayes <- eBayes(project.fitmodel)


    #Make individual contrasts, fit the model, and adjust by empirical Bayes
    #Control Vs treatmentA
    limControlVtreatA <- makeContrasts(Q1="Control-treatmentA", levels=comparisonmodel)
    limControlVtreatA.fitmodel <-, limControlVtreatA)
    limControlVtreatA.fitmodel.eBayes <- eBayes(limControlVtreatA.fitmodel)

    #Control Vs treatmentB
    limControlVtreatB <- makeContrasts(Q2="Control-treatmentB", levels=comparisonmodel)
    limControlVtreatB.fitmodel <-, limControlVtreatB)
    limControlVtreatB.fitmodel.eBayes <- eBayes(limControlVtreatB.fitmodel)

    #Obtain statistics for each comparison and output results

    #output all stats; no P value adjusment
    topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="none", number=99999, p.value=1)

    #output all stats; adjust by Benjamini-Hochberg
    topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="BH", number=99999, p.value=1)

    #output only adjust P values < 0.05
    topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="BH", number=99999, p.value=0.05)
ADD REPLYlink modified 2.1 years ago • written 3.1 years ago by Kevin Blighe66k

I have data set miRNA from TCGA. Data set look like this, miRNA_ID, read_count , reads_per_million_miRNA_mapped, and crossed-mapped

miRNA_ID    read_count  reads_per_million_miRNA_mapped  cross-mapped 
hsa-let-7a-1    74726   8647.065140                          N
hsa-let-7a-2    75087   8688.838961                          N 
hsa-let-7a-3    75246   8707.237957                          N 
hsa-let-7b      77259   8940.176186                          N 
hsa-let-7c       1480   171.261093                           N

So, My objective is to different fold differential analysis of miRNA data set. Please guide which tool(s) is better for this data analysis. I seen some paper using limma and edger. So please guide how to take input and proceed this data set. Thanks

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by skjobs123420

Hi, @skjobs1234 did you find an appropriate answer for your query? Even I'm performing the similar analysis.

ADD REPLYlink written 2.1 years ago by pradyumna Jayaram200

Take read_count and use that as input for Limma-Voom. Presumably you have separate read_count values for different samples?

ADD REPLYlink written 2.1 years ago by Kevin Blighe66k

By the way, it looks like you have micro-RNA-seq data, not microarray.

ADD REPLYlink written 2.1 years ago by Kevin Blighe66k

@kevin Yes it is miRNA-seq data from TCGA

ADD REPLYlink written 2.1 years ago by pradyumna Jayaram200

...and you just have a single column of values? Sorry for the late follow-up - I missed your response.

ADD REPLYlink written 2.1 years ago by Kevin Blighe66k

It seems like the data is for single sample. the same read count is available for multiple samples and can be merged. I'm Guessing to go with DESeq2 or EdgeR for differential analysis. can limma be used for same? which one would be more effitient in identifying differential expression?

ADD REPLYlink written 2.1 years ago by pradyumna Jayaram200

Yes, you should merge them together into a data-frame, i.e., the read_count columns. Try cbind() or just data.frame(). EdgeR and DESeq2 are more common for differential expression analysis in RNA-seq. Limma was the supreme method for decades for microarrays and the equivalent method for RNA-seq data is Limma-Voom.

ADD REPLYlink written 2.1 years ago by Kevin Blighe66k
gravatar for Barry Digby
8 months ago by
Barry Digby500
National University of Ireland, Galway
Barry Digby500 wrote:

Just to add on to what Kevin was getting at,

At this link --

You can download miRNA / gene counts from TCGA via Biolinks and perform DEA with DESeq2. I used PRAD in this example.

There is some code for handling messy metadata in there too, but be careful that it applies to your data as TCGA changes over time.

ADD COMMENTlink modified 4 days ago • written 8 months ago by Barry Digby500
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 949 users visited in the last hour