Question: CCLE miRNA expression data
Asked by baris.blknt, 14 months ago:


CCLE recently released a miRNA expression dataset. I looked for the normalization method used for this data but couldn't find it. When I sum all miRNA expression values for each sample (cell line), the totals vary widely between cell lines, as you would expect from unnormalized data. Does this data need normalization, and do you have any suggestions?

You can find the expression data at the following link:


Tags: mirna, nanostring, ccle
Answer by shawn.w.foley, 14 months ago:

According to Ghandi et al. (2019, Nature), the miRNAs were measured with NanoString and normalized using the nSolver software. The paper doesn't go into much detail, but the Methods section states:

Samples were divided into 14 batches, and two replicates of the K-562 cell line were included in each batch as a control. Internal positive and negative controls were used for normalization as recommended by NanoString using NanoString nSolver software. We excluded samples that failed NanoString nSolver quality control as well as one sample based on low positive control signal (normalization coefficient >6) and another sample based on high background signal (with second ranked negative control value >80). To estimate the background signal, we sorted the values for the negative controls within each sample and picked the second highest value as the background estimate. The median background estimate across all cell lines was 26.1. We used log(50 + N), in which N is the nSolver normalized value to reduce the effect of the background signal in the downstream analyses.
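The background estimate, QC cutoffs, and log(50 + N) transform described in the quoted Methods can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the CCLE pipeline: the function names are hypothetical, the inputs (a matrix of negative-control counts and a matrix of nSolver-normalized values, one column per cell line) are assumed to be available, and the log base is an assumption because the paper writes only log(50 + N).

```r
# Hypothetical helpers illustrating the quoted Methods; not the authors' code.

# Background estimate per sample: sort the negative-control values within
# each sample and take the second-highest one.
# `neg`: matrix of negative-control counts (rows = control probes, cols = samples).
background_estimate <- function(neg) {
  apply(neg, 2, function(x) sort(x, decreasing = TRUE)[2])
}

# QC flag using the cutoffs quoted above: normalization coefficient > 6
# (low positive-control signal) or background estimate > 80.
# `norm_coef`: per-sample normalization coefficients (assumed to come from nSolver).
fails_qc <- function(background, norm_coef, max_background = 80, max_coef = 6) {
  background > max_background | norm_coef > max_coef
}

# log(offset + N) transform of nSolver-normalized values N.
# ASSUMPTION: base 2; the paper does not state the base.
transform_normalized <- function(norm, offset = 50, base = 2) {
  log(offset + norm, base = base)
}

# Toy example with made-up numbers: two samples, four negative controls each.
neg <- matrix(c(10, 30, 20, 5,
                90, 85, 12, 3),
              nrow = 4,
              dimnames = list(paste0("NEG_", 1:4), c("cellA", "cellB")))
bg <- background_estimate(neg)                          # cellA = 20, cellB = 85
print(fails_qc(bg, norm_coef = c(cellA = 1.2, cellB = 1.0)))  # cellB fails on background
print(transform_normalized(c(14, 78)))                  # log2(64) = 6, log2(128) = 7
```

Taking the second-highest rather than the highest negative control makes the background estimate less sensitive to a single aberrant control probe.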


Shawn, thank you for the reply. One more question: when I sum all miRNA expression values for each cell line, I see up to a 9-fold difference between some cell lines. Do you think that is normal?

(Reply by baris.blknt, 14 months ago)

That's definitely a bit of a red flag, so I dug into the data. In my hands, the extreme totals appear to be outliers. I reformatted the miRNA file, read it in (each column is a cell line, each row is a miRNA), and found:

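mir <- read.table('CCLE_miRNA_20181103.reformat.gct', header = TRUE, row.names = 1, sep = '\t')
cellLines <- colSums(mir)   # total miRNA signal per cell line
q <- quantile(cellLines, probs = seq(0, 1, 0.1))
q
        0%        10%        20%        30%        40%        50%        60% 
  36710.01  137371.74  163233.40  192464.58  214991.96  244239.64  275519.66 
       70%        80%        90%       100% 
 311112.12  371830.14  465336.89 1218169.48 
q[["90%"]] / q[["10%"]]   # ratio of 90th to 10th percentile
[1] 3.387428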

So although the overall range is large (roughly 33-fold between the lowest and highest totals), there's only a 3.4-fold difference between the 10th and 90th percentiles. Additionally, if you plot the log of these totals with plot(density(log10(cellLines))), you get an approximately normal curve. So the large variation appears to come from a few cell lines at the extreme ends of the distribution.
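The spread check in the paragraph above can be wrapped into two small R helpers, sketched below. The function names are hypothetical, and the outlier rule (flagging totals more than k = 3 median absolute deviations from the median on the log10 scale) is a swapped-in illustrative choice, not something used in this thread or in the CCLE paper. Working on the log10 scale matches the observation that the log totals look roughly normal.

```r
# Hypothetical helpers for summarising per-cell-line totals (e.g. colSums(mir)).

# Fold differences: overall (max / min) and robust (90th / 10th percentile).
total_spread <- function(totals) {
  q <- quantile(totals, probs = c(0.1, 0.9))  # default type 7 quantiles
  c(max_over_min = max(totals) / min(totals),
    p90_over_p10 = q[["90%"]] / q[["10%"]])
}

# ASSUMPTION: MAD-based cutoff on log10 totals; k = 3 is an arbitrary choice.
# mad() applies R's default consistency constant (1.4826).
flag_extreme_totals <- function(totals, k = 3) {
  lt <- log10(totals)
  abs(lt - median(lt)) > k * mad(lt)
}

# Toy example with made-up totals: one cell line far above the rest.
toy <- c(900, 1000, 1100, 1000, 950, 50000)
print(flag_extreme_totals(toy))          # only the last value is flagged
print(total_spread(seq(10, 110, by = 10)))  # max/min = 11, p90/p10 = 5
```

With the thread's data, total_spread(cellLines) should reproduce the percentile ratio printed above, and names(cellLines)[flag_extreme_totals(cellLines)] would list the cell lines at the extremes.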

The paper also specifies that they normalized to the NanoString positive and negative controls. If the miRNA panel is like the NanoString gene expression panels I've analyzed, it also includes a set of spike-in controls and endogenous standards to normalize for RNA input. Everything seems above board to me.
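For readers unfamiliar with NanoString positive-control normalization, NanoString's data-analysis guidelines describe it as a lane-specific scaling: take the geometric mean of each sample's positive controls, divide the average of those geometric means by each sample's geometric mean, and multiply that sample's counts by the result. The R sketch below illustrates that idea only; it is not nSolver's implementation, the function names and input layout are hypothetical, and it omits nSolver's other steps (such as housekeeping-based content normalization). The per-sample factor it computes is presumably the kind of "normalization coefficient" the quoted Methods threshold at 6, but the paper does not say so explicitly.

```r
# Sketch of lane-specific positive-control scaling; not nSolver's actual code.

# Geometric mean; assumes strictly positive counts (positive controls should be).
geo_mean <- function(x) exp(mean(log(x)))

# `pos`: matrix of positive-control counts (rows = control probes, cols = samples).
positive_control_factors <- function(pos) {
  gm <- apply(pos, 2, geo_mean)  # geometric mean of positive controls per sample
  mean(gm) / gm                  # mean of geometric means / each sample's own
}

# Multiply every probe in each sample (column) by that sample's factor.
apply_factors <- function(counts, factors) {
  sweep(counts, 2, factors, `*`)
}

# Toy example with made-up counts.
pos <- matrix(c(100, 400,
                200, 800),
              nrow = 2, dimnames = list(c("POS_A", "POS_B"), c("cellA", "cellB")))
f <- positive_control_factors(pos)   # cellA = 1.5, cellB = 0.75
counts <- matrix(c(10, 20, 10, 20), nrow = 2,
                 dimnames = list(c("miR-1", "miR-2"), c("cellA", "cellB")))
print(apply_factors(counts, f))
```

A sample with weak positive-control signal gets a large factor, so a cutoff on this factor is one way to catch samples with poor hybridization or detection.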

(Reply by shawn.w.foley, 14 months ago)

I see. Thank you again.

(Reply by baris.blknt, 14 months ago)

