Question: miRNA differential expression analysis, no significant adjusted p-values
0
gravatar for Alexander Skates
3.9 years ago by
United Kingdom
Alexander Skates360 wrote:

I'm attempting to carry out a differential expression analysis upon a GEO dataset, specifically focusing on the epithelial-mesenchymal transition (EMT). However I've gotten no signifcant results and I don't know where I'm going wrong.

I'm using empirical Bayes in limma to identify differentially expressed micro RNA. The experiment is done using the NCI-60 panel, and I've built the design matrix by categorising each cell line as either "epithelial" or "mesenchymal", based off Figure 1 of this paper (which categorises each cell line on the basis of the ratio of E-cadherin to Vimentin expression).

These are the top results I get after analysis. The lowest adjusted p-value is 0.20. The analysis that I'm trying to replicate got ~100 differentially expressed miRNAs to my 0.

         ID     miRNA_ID    logFC      P.Value adj.P.Val
10946 10946  hsa-miR-141 4.285924 0.0001753329 0.2018081
17427 17427 hsa-miR-200c 4.277219 0.0004007614 0.2306382
27777 27777 hsa-miR-200a 3.738894 0.0016939681 0.5190115
9578   9578 hsa-miR-200b 3.992947 0.0018036889 0.5190115
18900 18900 hsa-miR-200b 3.826707 0.0026484041 0.6096626
11000 11000 hsa-miR-200a 3.366898 0.0035893928 0.6885652

Additionally, the distribution of uncorrected p-values clusters around 1. Which definitely indicates that something is up.

The relevant bits of code I'm using (cell_type_nci60.csv can be found here):

library(limma)
library(GEOquery)

#####################################################################
## Step 2: GEO Preprocessing (miRNA)
#####################################################################

G <- getGEO("GSE26375",GSEMatrix=T) 
eset.geo <- G[[1]] # Get the ExpressionSet object

# Normalization: may not be necessary as GEO datasets should be pre-processed. 
#library(affyPLM) 
#eset.geo <- normalize.ExpressionSet.quantiles(eset.geo)

#####################################################################
## Step 3: Unify the sample names so can build the design matrix
#####################################################################

cleanNames <- function(names) {
  names <- gsub("-", "", names)
  names <- gsub(" ", "", names)
  names <- gsub("/", "", names)
  names <- gsub("\\(TB\\)", "", names) # Bit hacky but oh well
}

# Want to see each cell line, as need to split by epithelial/mesenchymal
# Phenotype data taken from Sun-Mi Park et al (2008)
pheno.cat <- read.csv("Data/cell_type_nci60.csv")
pheno.cat$title <- cleanNamespheno.cat$title)

mirna.names <- cleanNames(sapply(pData(phenoData(eset.geo.n))[,1:2]$title, as.character))
pheno.mirna <- sapplypheno.cat[match(geo.names,pheno.cat$title),3], as.character)

#####################################################################
## Step 4: Set up bits and pieces for limma
#####################################################################

# Design matrix 
design.mirna <- cbind(mi.e=pheno.mirna=="epithelial",
                      mi.m=pheno.mirna=="mesenchymal")*1 # to make it numeric
# Make contrast matrix
c.mirna <- makeContrasts(mi.e-mi.m, levels=design.mirna)

#####################################################################
## Step 5: Perform differential expression analysis
#####################################################################

### miRNA Analysis:

# Fit the linear model
fit.mirna <- lmFit(eset.geo, design.mirna) 
fit2.mirna <- contrasts.fit(fit.mirna, c.mirna) 
fit2.mirna <- eBayes(fit2.mirna) 

# Get the first few results
head(topTable(fit2.mirna, coef=1))

# Output everything
n.mirna <- length(featureNames(eset.mirna))
top.mirna <- topTable(fit2.mirna, coef=1, n=n.mirna, adjust="fdr")

# Generate output
write.table(top.mirna, file="mirna_de.out", quote=F, sep="\t", row.names=F)
ADD COMMENTlink modified 5 months ago by husainbioinfos0 • written 3.9 years ago by Alexander Skates360
1

Make sure you're associating the appropriate samples to the appropriate group in your design matrix. You can also just compare what you get with GEO2R directly in GEO.

ADD REPLYlink written 3.9 years ago by Devon Ryan95k
2
gravatar for Alexander Skates
3.9 years ago by
United Kingdom
Alexander Skates360 wrote:

The issue lay in the fact that the undefined samples acted as noise. Removing all non-epithelial and non-mesenchymal samples gave me better looking results!

ADD COMMENTlink written 3.9 years ago by Alexander Skates360
0
gravatar for husainbioinfos
5 months ago by
pune
husainbioinfos0 wrote:

i have downloaded LUAD miRNA data from GEO which has miRNA_id and read counts ,how can i get p values?as shown in above fig.

ADD COMMENTlink written 5 months ago by husainbioinfos0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 839 users visited in the last hour