How to take input TCGA miRNA data in Limma R package for differential expression
1
1
Entering edit mode
5.2 years ago
skjobs1234 ▴ 40

I have set of miRNA data from TCGA. I want differential expression analysis using limma R package. Please guide me in details how to proceed. How to take input?

next-gen RNA-Seq R rna-seq • 3.5k views
1
Entering edit mode

Which type of data did you get? - the open-access raw counts or the aligned BAM files?

0
Entering edit mode

Raw data from TCGA database

0
Entering edit mode

Okay, I recently conducted an analysis of this exact data from the TCGA but I used DESeq2 to do the normalisation and differential expression.

First, I got a list of all TXT files containing the counts and read them into R using a loop. I then had a data-frame of raw counts. With that, it was easy to use DESeq2.

Do you have to use Limma or was it just a suggestion? I have only used Limma for microarray analyses, but I'm aware that the developers recently upgraded so that it can be used for RNA-seq

0
Entering edit mode

No.. I didn't used Limma right now. I gone through the Limma tutorial "theory" only. I want microarray analyses to see the differential expression of genes. Please suggest. Which tools will be better to used to see the differential expression DEseq2, limma and other tools.

0
Entering edit mode

If it's TCGA micro-RNA microarray data, then use limma. If it's TCGA micro-RNA-seq, then use DESeq2 or Limma-Voom. It looks like you have microarray data in which case I imagine that you have the normalised counts.

To run limma, you can use the following code:

We assume that you have a counts matrix, countsMatrix, and metadata in an object called metadata. The order of rows in metadata should match the order of colnames in countsMatrix.

In metadata, we have a column 'Condition', which gives details of the condition applied to each sample, e.g., control, treatmentA, treatmentB, etc.

    require(limma)

#Set up the comparison matrix

design <- factor(design)

comparisonmodel <- model.matrix(~0+design)

colnames(comparisonmodel) <- levels(design)

#Create a limma object
project.fitmodel <- lmFit(countsMatrix, comparisonmodel)
project.fitmodel.eBayes <- eBayes(project.fitmodel)

names(project.fitmodel.eBayes)

#Make individual contrasts, fit the model, and adjust by empirical Bayes
#Control Vs treatmentA
limControlVtreatA <- makeContrasts(Q1="Control-treatmentA", levels=comparisonmodel)
limControlVtreatA.fitmodel <- contrasts.fit(project.fitmodel.eBayes, limControlVtreatA)
limControlVtreatA.fitmodel.eBayes <- eBayes(limControlVtreatA.fitmodel)

#Control Vs treatmentB
limControlVtreatB <- makeContrasts(Q2="Control-treatmentB", levels=comparisonmodel)
limControlVtreatB.fitmodel <- contrasts.fit(project.fitmodel.eBayes, limControlVtreatB)
limControlVtreatB.fitmodel.eBayes <- eBayes(limControlVtreatB.fitmodel)

#Obtain statistics for each comparison and output results

#output all stats; no P value adjusment

#output all stats; adjust by Benjamini-Hochberg

#output only adjust P values < 0.05

0
Entering edit mode

I have data set miRNA from TCGA. Data set look like this, miRNA_ID, read_count , reads_per_million_miRNA_mapped, and crossed-mapped

miRNA_ID    read_count  reads_per_million_miRNA_mapped  cross-mapped
hsa-let-7a-1    74726   8647.065140                          N
hsa-let-7a-2    75087   8688.838961                          N
hsa-let-7a-3    75246   8707.237957                          N
hsa-let-7b      77259   8940.176186                          N
hsa-let-7c       1480   171.261093                           N


So, My objective is to different fold differential analysis of miRNA data set. Please guide which tool(s) is better for this data analysis. I seen some paper using limma and edger. So please guide how to take input and proceed this data set. Thanks

2
Entering edit mode

Hi, @skjobs1234 did you find an appropriate answer for your query? Even I'm performing the similar analysis.

1
Entering edit mode

Take read_count and use that as input for Limma-Voom. Presumably you have separate read_count values for different samples?

1
Entering edit mode

By the way, it looks like you have micro-RNA-seq data, not microarray.

2
Entering edit mode

@kevin Yes it is miRNA-seq data from TCGA

1
Entering edit mode

...and you just have a single column of values? Sorry for the late follow-up - I missed your response.

2
Entering edit mode

It seems like the data is for single sample. the same read count is available for multiple samples and can be merged. I'm Guessing to go with DESeq2 or EdgeR for differential analysis. can limma be used for same? which one would be more effitient in identifying differential expression?

2
Entering edit mode

Yes, you should merge them together into a data-frame, i.e., the read_count columns. Try cbind() or just data.frame(). EdgeR and DESeq2 are more common for differential expression analysis in RNA-seq. Limma was the supreme method for decades for microarrays and the equivalent method for RNA-seq data is Limma-Voom.

2
Entering edit mode
2.8 years ago
Barry Digby ★ 1.0k

Just to add on to what Kevin was getting at,

There is some code for handling messy metadata in there too, but be careful that it applies to your data as TCGA changes over time.