Question: ERCC count matrix smart-seq2
23 months ago by
VHahaut1.1k wrote:


I received some data produced by single-cell sequencing (Smart-seq2). After aligning the reads to the corresponding genome (+ERCC), I got a count matrix that looks quite sparse:

ID          start  end   strand  length  count1  count2  count3
ERCC-00004  1      523   +       523     1902    145     2328
ERCC-00009  1      984   +       984     318     29      428
ERCC-00012  1      994   +       994     0       0       0
ERCC-00013  1      808   +       808     0       0       0
ERCC-00014  1      1957  +       1957    0       0       0

Typically, the higher-concentration ERCCs are detected but the lower-concentration ones are not.

I know for sure that these samples were not sequenced deeply enough, but I wanted to know: for ERCCs in single-cell sequencing, should I expect to see every ERCC detected, or is it common to see some dropout?

Thanks in advance!


Hi! I also received Smart-seq2 data. I have never analyzed this kind of data before, so I would like to ask how to perform the first steps. I got XX FASTQ files. I first ran a QC report and trimmed the reads; next I would align each file separately with STAR or TopHat, quantify the abundance of each transcript with RSEM or HTSeq-count, and then use R (scater) to perform further analysis. Is this workflow correct, or should I use different tools?

thanks in advance

written 9 weeks ago by santamariagianluca
23 months ago by
Charles Plessy2.7k wrote:

The ERCC RNA spike-in mixes manufactured by Thermo Fisher cover six orders of magnitude in concentration, so it is expected by design that a large number of them will not be detected. You can have a look at Svensson et al., 2017 for an example of how to analyse the spike-in data.

Side comment: people have long been aligning their reads to the sequences of the plasmid inserts used to produce the spikes, but NIST now distributes the sequences of the transcribed products, which are more accurate at the 5′ end. It may not make a big difference for an RNA-seq analysis, but I recommend using these reference sequences (disclaimer: I contributed to preparing the file; see "Patch ERCC spike sequences to get their real 5-prime ends" for details).
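To make the "detected or not" pattern explicit, here is a minimal Python sketch that reads the count table from the question and reports, for each spike-in, the fraction of samples with at least one read. The function name `detection_summary` and the tab-separated layout are assumptions made for illustration; adapt them to however your own matrix is stored.

```python
import csv
import io

# Counts copied from the table in the question, written here as
# tab-separated text (the delimiter is an assumption for this sketch).
TABLE = """ID\tstart\tend\tstrand\tlength\tcount1\tcount2\tcount3
ERCC-00004\t1\t523\t+\t523\t1902\t145\t2328
ERCC-00009\t1\t984\t+\t984\t318\t29\t428
ERCC-00012\t1\t994\t+\t994\t0\t0\t0
ERCC-00013\t1\t808\t+\t808\t0\t0\t0
ERCC-00014\t1\t1957\t+\t1957\t0\t0\t0
"""


def detection_summary(text):
    """Return {spike-in ID: fraction of samples with a non-zero count}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Every column whose name starts with "count" is treated as one sample.
    count_cols = [c for c in reader.fieldnames if c.startswith("count")]
    summary = {}
    for row in reader:
        counts = [int(row[c]) for c in count_cols]
        summary[row["ID"]] = sum(1 for c in counts if c > 0) / len(counts)
    return summary


for spike_id, fraction in detection_summary(TABLE).items():
    print(f"{spike_id}\tdetected in {fraction:.0%} of samples")
```

Plotting this fraction against the known input concentration of each spike-in (from the manufacturer's table) gives the kind of detection-limit curve that is usually inspected for spike-in data.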


Isn't ERCC-00012 among the ERCC controls with the highest concentrations, though? I would have expected it to show some counts.

written 23 months ago by Kevin Blighe

Thanks for your comments! According to the Thermo Fisher Scientific website, ERCC-00012 is one of the lowest:

ERCC-00012  C   0.11444092 attomoles/ul
written 23 months ago by VHahaut
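As a rough back-of-the-envelope check on why such a low-concentration spike-in can be missing, the sketch below converts the stock concentration quoted above into molecules and estimates the chance that a cell receives no molecule at all. The dilution factor, spiked volume and capture efficiency are hypothetical placeholders, not values from this thread, and Poisson sampling of molecules is an assumption; substitute the numbers from your own protocol.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_ul(attomoles_per_ul):
    """Convert a concentration in attomoles/ul to molecules/ul."""
    return attomoles_per_ul * 1e-18 * AVOGADRO


def prob_zero_molecules(attomoles_per_ul, dilution, volume_ul,
                        capture_efficiency=1.0):
    """Poisson probability that no molecule of the spike-in is captured.

    dilution, volume_ul and capture_efficiency are protocol-specific;
    the values used below are hypothetical.
    """
    expected = (molecules_per_ul(attomoles_per_ul) / dilution
                * volume_ul * capture_efficiency)
    return math.exp(-expected)


stock = 0.11444092  # ERCC-00012, attomoles/ul, as quoted above
print(f"{molecules_per_ul(stock):.0f} molecules per ul of undiluted stock")

# Hypothetical protocol: 1:1,000,000 dilution, 1 ul added per cell.
p0 = prob_zero_molecules(stock, dilution=1e6, volume_ul=1.0)
print(f"P(no molecule in a given cell) = {p0:.2f}")
```

Under these made-up settings the expected number of ERCC-00012 molecules per cell is well below one, so zero counts in most cells would be unsurprising even before limited sequencing depth is taken into account.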

Then what Charles wrote makes sense. Great, thanks for confirming.

written 23 months ago by Kevin Blighe