Question

ERCC count matrix from Smart-seq2


6.5 years ago

VHahaut ★ 1.2k

Hi!

I received some single-cell sequencing data (Smart-seq2). After aligning the reads to the corresponding genome (plus the ERCC sequences), I got a count matrix that looks quite sparse:

ID          start  end   strand  length  count1  count2  count3
ERCC-00004  1      523   +       523     1902    145     2328
ERCC-00009  1      984   +       984     318     29      428
ERCC-00012  1      994   +       994     0       0       0
ERCC-00013  1      808   +       808     0       0       0
ERCC-00014  1      1957  +       1957    0       0       0

Typically, the higher-concentration ERCCs are detected, but the lower-concentration ones are not.
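A quick way to quantify this pattern is to list, per sample, which ERCCs have at least one read. The sketch below uses only the five rows shown above; the `count*` column naming and the `min_count` threshold are assumptions of this example, not part of any standard tool.

```python
# The rows shown in the post (whitespace-delimited).
TABLE = """\
ID          start  end   strand  length  count1  count2  count3
ERCC-00004  1      523   +       523     1902    145     2328
ERCC-00009  1      984   +       984     318     29      428
ERCC-00012  1      994   +       994     0       0       0
ERCC-00013  1      808   +       808     0       0       0
ERCC-00014  1      1957  +       1957    0       0       0
"""


def detected_erccs(table_text, min_count=1):
    """Map each count column to the ERCC IDs with at least min_count reads."""
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    # Assumes sample columns are named "count1", "count2", ... as in the post.
    count_cols = [i for i, name in enumerate(header) if name.startswith("count")]
    detected = {header[i]: [] for i in count_cols}
    for row in body:
        if not row[0].startswith("ERCC-"):
            continue  # skip endogenous genes if the matrix also contains them
        for i in count_cols:
            if int(row[i]) >= min_count:
                detected[header[i]].append(row[0])
    return detected


if __name__ == "__main__":
    for sample, ids in detected_erccs(TABLE).items():
        print(sample, len(ids), ids)
```

Sorting the detected and undetected IDs by their known mix concentration then shows whether the drop-off follows concentration.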

I know for sure that these samples were not sequenced deeply enough, but I wanted to know: in single-cell sequencing, should I expect every ERCC to be detected, or is it common to see some drop-off?

Thanks in advance!

RNA-Seq


Hi! I also received Smart-seq2 data. I have never analyzed data coming from... so I would like to ask how to perform the first steps. I got XX FASTQ files, so I first trimmed them after a QC report. Then I would align each one separately with STAR or TopHat, quantify the abundance of each transcript with RSEM or HTSeq-count, and then use R (scater) for further analysis. Is this workflow correct, or should I use different tools?

Thanks in advance!
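The per-cell alignment step described above can be scripted by building one STAR command per cell. This is a sketch, assuming paired-end gzipped FASTQ files named `<cell>_R1...` / `<cell>_R2...` and a STAR index that already includes the ERCC sequences; the directory names, thread count and the helper `star_command` are hypothetical. `--quantMode GeneCounts` makes STAR write per-gene read counts (`ReadsPerGene.out.tab`) for each cell, which is one alternative to running HTSeq-count separately.

```python
from pathlib import Path


def star_command(r1, r2, genome_dir, out_dir, threads=4):
    """Build (but do not run) a STAR alignment command for one cell."""
    # Hypothetical naming scheme: "cell01_R1.fastq.gz" -> sample "cell01".
    # If "_R1" is absent, the whole file name is used as the sample name.
    sample = Path(r1).name.split("_R1")[0]
    prefix = str(Path(out_dir) / sample) + "_"
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(r1), str(r2),
        "--readFilesCommand", "zcat",      # input FASTQs are gzipped
        "--outSAMtype", "BAM", "Unsorted",
        "--quantMode", "GeneCounts",       # per-gene counts per cell
        "--outFileNamePrefix", prefix,
    ]


if __name__ == "__main__":
    cmd = star_command("cell01_R1.fastq.gz", "cell01_R2.fastq.gz",
                       "star_index", "aligned")
    print(" ".join(cmd))
```

Once the paths are right, each command can be executed with `subprocess.run(cmd, check=True)`. For single-end Smart-seq2 libraries, pass only the R1 file to `--readFilesIn`.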

Reply by santamariagianluca, 4.7 years ago

score 5 · Answer 1 · 2017-10-27


6.5 years ago

Charles Plessy ★ 2.9k

The ERCC RNA spike-in mixes manufactured by Thermo Fisher cover six orders of magnitude of concentration, so by design many of them are expected to go undetected. Have a look at Svensson et al., 2017 for an example of how to analyse the spike-in data.
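One common way to summarise spike-in sensitivity, in the spirit of (but not copied from) that paper, is to fit a logistic curve of detection (0/1) against log10 expected input molecules per ERCC and report the molecule count at which detection probability is 50%. The sketch below assumes you have already computed expected molecules per ERCC from the mix concentrations and your own dilution and volume. The function name and the small `ridge` penalty are choices of this sketch; the penalty keeps the fit finite when detection is perfectly separated.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def detection_limit(log10_molecules, detected, ridge=1e-2, iters=100):
    """Fit P(detected) = sigmoid(a + b*x) by penalised Newton-Raphson.

    log10_molecules: log10(expected input molecules) for each ERCC.
    detected: 1 if the ERCC has reads in this cell, else 0.
    Returns (a, b, molecules at 50% detection probability).
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the ridge-penalised log-likelihood.
        g_a, g_b = -ridge * a, -ridge * b
        # Negative Hessian (positive definite thanks to the ridge term).
        h_aa, h_ab, h_bb = ridge, 0.0, ridge
        for x, y in zip(log10_molecules, detected):
            p = _sigmoid(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < 1e-10:
            break
    if b <= 0:
        raise ValueError("detection does not increase with input; no 50% point")
    return a, b, 10 ** (-a / b)
```

Per cell, call `detection_limit(log10_expected, [int(c > 0) for c in counts])`; a lower 50% point means a more sensitive cell.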

Side comment: for a long time, people have aligned their reads to the sequences of the plasmid inserts used to produce the spikes, but NIST now distributes the sequences of the transcribed products, which are more accurate at the 5′ end. It may not make a big difference for an RNA-seq analysis, but I recommend using these reference sequences (disclaimer: I contributed to preparing the file; see "Patch ERCC spike sequences to get their real 5-prime ends" for details).
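Whichever ERCC FASTA you use, the spikes also need annotation for read counting. A minimal sketch, assuming a plain ERCC FASTA (the file names and the `ERCC` source label are hypothetical), that emits one GTF exon per spike covering the whole sequence on the + strand. This matches the start = 1, end = length, strand + layout of the count matrix in the question.

```python
import sys


def ercc_gtf_lines(fasta_text):
    """Return one GTF exon line per FASTA record, spanning its full length."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may be wrapped over lines
    out = []
    for seq_id, length in lengths.items():
        attrs = f'gene_id "{seq_id}"; transcript_id "{seq_id}";'
        out.append("\t".join(
            [seq_id, "ERCC", "exon", "1", str(length), ".", "+", ".", attrs]))
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python ercc_gtf.py ERCC.fa > ERCC.gtf
    with open(sys.argv[1]) as fh:
        print("\n".join(ercc_gtf_lines(fh.read())))
```

The resulting lines can be appended to the genome GTF, and the ERCC FASTA to the genome FASTA, before building the aligner index.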


Isn't ERCC-00012 among the ERCC controls with the highest concentrations, though? I would have expected it to show some counts.

Reply by Kevin Blighe, 6.5 years ago


Thanks for your comments! According to the Thermo Scientific website, ERCC-00012 is one of the lowest:

ERCC-00012  C   0.11444092 attomoles/ul
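To see why a low-concentration spike like this can easily go undetected, convert attomoles/µl into molecules (1 attomole = 10⁻¹⁸ mol, multiplied by Avogadro's number). For 0.11444092 attomoles/µl, that is roughly 6.9 × 10⁴ molecules per µl of undiluted mix, before any dilution. In the sketch below, the `dilution_factor` and `volume_ul` parameters are placeholders for your own protocol's values, and the 1:100,000 dilution in the example is purely illustrative, not taken from this thread.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def expected_molecules(conc_amol_per_ul, dilution_factor=1.0, volume_ul=1.0):
    """Molecules of one ERCC transcript added to a reaction.

    conc_amol_per_ul: concentration in the undiluted mix (attomoles/ul).
    dilution_factor, volume_ul: placeholders for how the mix was diluted
    and how much of the dilution was added.
    """
    moles = conc_amol_per_ul * 1e-18 * volume_ul / dilution_factor
    return moles * AVOGADRO


if __name__ == "__main__":
    print(f"undiluted: {expected_molecules(0.11444092):.0f} molecules per ul")
    # Illustrative dilution only.
    print(f"1:100,000: {expected_molecules(0.11444092, 1e5):.2f} molecules per ul")
```

At that illustrative dilution, the expected input is below one molecule per µl, so zero counts are the most likely outcome even with deep sequencing.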