Question: CCLE miRNA expression data
17 months ago by
baris.blknt0 wrote:


CCLE recently released miRNA expression data. I looked for the normalization method used for this data but couldn't find it. When I sum all miRNA expression values for each sample (cell line), the totals vary widely between cell lines, as they would in unnormalized data. Does this data need any further normalization, and do you have any suggestions?

You can find the expression data at the following link.


mirna nanostring ccle • 1.2k views
modified 22 days ago by venugopal8870 • written 17 months ago by baris.blknt0

Hi. I am new to CCLE data, and it is very valuable to me at present. I have been struggling for a week to make sense of the CCLE miRNA data. Could you please share anything that would help me understand and process it? I also don't understand the data type. What kind of values does that file contain (for example, raw read counts)?

Please help me if you see this message. Thank you very much. This is my email ID, which you can use at any time.

written 22 days ago by venugopal8870
17 months ago by
shawn.w.foley1.2k wrote:

According to Ghandi et al. (2019, Nature), the miRNAs were measured with NanoString and normalized using the nSolver software. They don't go into much detail, but the Methods section states:

Samples were divided into 14 batches, and two replicates of the K-562 cell line were included in each batch as a control. Internal positive and negative controls were used for normalization as recommended by NanoString using NanoString nSolver software. We excluded samples that failed NanoString nSolver quality control as well as one sample based on low positive control signal (normalization coefficient >6) and another sample based on high background signal (with second ranked negative control value >80). To estimate the background signal, we sorted the values for the negative controls within each sample and picked the second highest value as the background estimate. The median background estimate across all cell lines was 26.1. We used log(50 + N), in which N is the nSolver normalized value to reduce the effect of the background signal in the downstream analyses.
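
The background estimate and the log(50 + N) transform described in the quoted Methods can be sketched in R as follows. This is only an illustration under assumptions: the matrix `neg`, the vector `n_solver`, and the helper `second_highest` are hypothetical names holding made-up values, and the natural log is used because the quote does not state a base.

```r
# Hypothetical negative-control counts: rows are control probes, columns are samples.
# These numbers are invented for illustration; they are not CCLE data.
neg <- matrix(c(12, 30, 18, 25,  9, 21,
                15, 95, 22, 88, 17, 20),
              nrow = 6,
              dimnames = list(paste0("NEG_", 1:6), c("sampleA", "sampleB")))

# Background estimate per sample: the second-highest negative-control value.
second_highest <- function(x) sort(x, decreasing = TRUE)[2]
background <- apply(neg, 2, second_highest)

# The quoted Methods exclude a sample whose second-ranked negative control exceeds 80.
high_background <- background > 80

# Transform nSolver-normalized values N as log(50 + N).
# Assumption: natural log; the quoted text does not state the base.
n_solver <- c(0, 26.1, 500, 20000)   # made-up normalized values
transformed <- log(50 + n_solver)
```

Adding 50 before taking the log compresses values near the background level: with the natural log, N = 0 and N = 26.1 (the reported median background) end up only about 0.42 apart, so noise around the background contributes little to downstream comparisons.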

written 17 months ago by shawn.w.foley1.2k

Shawn, thank you for the reply. However, I'd like to ask: when I sum all miRNA expression values for each cell line, I observe a 9-fold difference between some cell lines. Do you think that is normal?

written 17 months ago by baris.blknt0

That's definitely a bit of a red flag, so I started to dig into the data a bit. In my hands, the extreme data ranges seem to be outliers. I reformatted and read in the miRNA data (each column is a cell line, each row is a miRNA), and I found:

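# Read the reformatted miRNA table: rows are miRNAs, columns are cell lines
mir <- read.table('CCLE_miRNA_20181103.reformat.gct', header=TRUE, row.names=1, sep='\t')
# Total miRNA signal per cell line
cellLines <- colSums(mir)
# Deciles of the per-cell-line totals
quantile(cellLines, probs=seq(0, 1, 0.1))
        0%        10%        20%        30%        40%        50%        60% 
  36710.01  137371.74  163233.40  192464.58  214991.96  244239.64  275519.66 
       70%        80%        90%       100% 
 311112.12  371830.14  465336.89 1218169.48 
# Fold difference between the 90% and 10% quantiles
unname(quantile(cellLines, 0.9) / quantile(cellLines, 0.1))
[1] 3.387428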

So, while there's a large range of expression, there's only a 3.4-fold difference between the 10% and 90% quantiles. Additionally, if you plot the log of these data as plot(density(log10(cellLines))) it'll generate an approximately normal curve. So it does appear that the large variance is occurring at the extreme ends of the spectrum.
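
To see which cell lines drive the wide range, the check above can be extended slightly. A minimal sketch, using simulated totals in place of the real `cellLines` vector: the number of cell lines, the log-normal parameters, and the 3-MAD cutoff are arbitrary assumptions for illustration, not something taken from the paper.

```r
set.seed(1)
# Simulated per-cell-line totals standing in for colSums(mir); not real CCLE values.
cellLines <- rlnorm(900, meanlog = log(250000), sdlog = 0.5)
names(cellLines) <- paste0("cellLine_", seq_along(cellLines))

# Fold differences: full range versus the 10%-90% quantile range.
full_fold  <- max(cellLines) / min(cellLines)
inner_fold <- unname(quantile(cellLines, 0.9) / quantile(cellLines, 0.1))

# On the log10 scale, flag totals more than 3 MADs from the median.
logTotals <- log10(cellLines)
dev <- abs(logTotals - median(logTotals)) / mad(logTotals)
outliers <- names(cellLines)[dev > 3]

# Density of the log10 totals, as suggested in the post.
plot(density(logTotals), main = "log10 total miRNA signal per cell line")
```

With the real data, `cellLines` would come from `colSums(mir)`, and `outliers` would list the cell lines worth inspecting individually before deciding whether any further normalization is needed.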

The paper also specifies that they normalized to the NanoString positive and negative controls. If the miRNA panel is like the gene expression panels I've analyzed, then it includes a set of spike-in and endogenous standards to normalize for RNA input. Everything seems above board to me.
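
A rough sketch of how positive-control scaling of this kind is commonly done: each sample's counts are multiplied by a factor that brings the geometric mean of its positive controls to the average across samples. This is an assumption about the general approach, not a reproduction of nSolver's internals or of the CCLE pipeline, and every value and name below (`pos`, `counts`, `geo_mean`, `norm_factor`) is invented for illustration.

```r
# Invented positive-control counts: rows are spike-in controls, columns are samples.
pos <- matrix(c(20000, 5000, 1200, 300,
                 4000, 1000,  240,  60),
              nrow = 4,
              dimnames = list(paste0("POS_", 1:4), c("sampleA", "sampleB")))

# Invented endogenous miRNA counts for the same two samples.
counts <- matrix(c(800, 50, 3000,
                   160, 10,  600),
                 nrow = 3,
                 dimnames = list(c("miR_x", "miR_y", "miR_z"), c("sampleA", "sampleB")))

# Geometric mean of the positive controls in each sample.
geo_mean <- function(x) exp(mean(log(x)))
pos_gm <- apply(pos, 2, geo_mean)

# Scaling factor: average geometric mean across samples divided by each sample's own.
norm_factor <- mean(pos_gm) / pos_gm

# Multiply each sample (column) by its factor.
normalized <- sweep(counts, 2, norm_factor, "*")
```

Here sampleB was constructed with one fifth of sampleA's signal throughout, so the two columns of `normalized` coincide after scaling. A large factor indicates a weak positive-control signal, which is presumably what the paper's "normalization coefficient >6" exclusion refers to.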

written 17 months ago by shawn.w.foley1.2k

I see. Thank you again.

modified 3 months ago • written 17 months ago by baris.blknt0