Which zUMIs output to use for differential gene expression?

Asked by firestar, 20 months ago

I have single-cell Smart-seq3 data. The zUMIs pipeline produces 19 different count matrices:

umicount
  exon
    all
    downsampled
  inex
    all
    downsampled
  intron
    all
    downsampled
readcount
  exon
    all
    downsampled
  inex
    all
    downsampled
  intron
    all
    downsampled
readcount_internal
  exon
    all
    downsampled
  inex
    all
    downsampled
  intron
    all
    downsampled
rpkm
  exon
    all
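
For reference, the hierarchy above can be written as a nested Python dict, which confirms that it has 19 leaf matrices. This only models the listing in this post; it is not how zUMIs itself stores its output.

```python
# The count-matrix hierarchy as listed in the question, written as a
# nested dict; each leaf (None) stands for one count matrix.
LEVELS = ("all", "downsampled")
FEATURES = ("exon", "inex", "intron")

hierarchy = {
    count_type: {feature: dict.fromkeys(LEVELS) for feature in FEATURES}
    for count_type in ("umicount", "readcount", "readcount_internal")
}
hierarchy["rpkm"] = {"exon": {"all": None}}


def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path, e.g. ('umicount', 'exon', 'all')."""
    for key, child in tree.items():
        path = prefix + (key,)
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield path


paths = list(leaf_paths(hierarchy))
print(len(paths))          # 19
print("/".join(paths[0]))  # umicount/exon/all
```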

Read counts from exons are the traditional RNA-seq counts one would expect. UMI counts from exons are the equivalent corrected for PCR duplication, so they are lower than (or at most equal to) the corresponding read counts. The zUMIs paper claims that using intron+exon (inex) counts improves clustering.
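
A toy Python sketch of the difference between read counts and UMI counts, and of why an inex UMI count need not equal the exon UMI count plus the intron UMI count. Everything here is hypothetical: the read tuples, the `count` helper, and the assumption that UMIs are deduplicated per gene across all reads assigned to that gene's selected features. UMIs are collapsed by exact sequence only; real pipelines may also merge UMIs that differ by sequencing errors.

```python
from collections import defaultdict

# Hypothetical reads for one cell: (gene, feature the read aligned to, UMI).
reads = [
    ("GeneA", "exon", "AAAC"),
    ("GeneA", "exon", "AAAC"),    # PCR duplicate of the read above
    ("GeneA", "exon", "GGTT"),
    ("GeneA", "intron", "GGTT"),  # same UMI seen again, this time in an intron
    ("GeneA", "intron", "CCAA"),
    ("GeneB", "exon", "TTGA"),
]


def count(reads, features):
    """Return (read_counts, umi_counts) per gene, using reads in `features`."""
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for gene, feature, umi in reads:
        if feature in features:
            read_counts[gene] += 1   # every read counts
            umis[gene].add(umi)      # each distinct UMI counts once
    return dict(read_counts), {g: len(u) for g, u in umis.items()}


exon_reads, exon_umis = count(reads, {"exon"})
intron_reads, intron_umis = count(reads, {"intron"})
inex_reads, inex_umis = count(reads, {"exon", "intron"})

print(exon_reads, exon_umis)      # {'GeneA': 3, 'GeneB': 1} {'GeneA': 2, 'GeneB': 1}
print(intron_reads, intron_umis)  # {'GeneA': 2} {'GeneA': 2}
print(inex_reads, inex_umis)      # {'GeneA': 5, 'GeneB': 1} {'GeneA': 3, 'GeneB': 1}
```

Under these assumptions, read counts add up across features (3 + 2 = 5 for GeneA), but UMI counts do not (2 + 2 = 4, versus 3 inex UMIs), because the same UMI can be observed in both an exon and an intron of one gene.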

I am interested in views from the experts. What are downsampled counts? What is readcount_internal? What are the implications of using intron+exon counts? Should RPKM be used at all? Which matrix should be used for differential gene expression? What are the pros and cons of the different count types, and what other considerations should one keep in mind?
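
Two of the terms asked about can be sketched generically in Python: downsampling, taken here to mean randomly subsampling each cell's reads to a common depth so cells are compared at equal sequencing depth, and RPKM, the standard reads-per-kilobase-per-million formula. This is not zUMIs code; the function names, the rule of dropping cells shallower than the target depth, the seed, and the toy data are all hypothetical.

```python
import random


def downsample(reads_per_cell, depth, seed=0):
    """Randomly keep `depth` reads per cell, sampling without replacement.

    Cells with fewer than `depth` reads are dropped; this is an
    illustrative choice, not necessarily what zUMIs does.
    """
    rng = random.Random(seed)
    return {
        cell: rng.sample(reads, depth)
        for cell, reads in reads_per_cell.items()
        if len(reads) >= depth
    }


def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical cells: each read is reduced to the gene it was assigned to.
cells = {
    "cell1": ["GeneA"] * 600 + ["GeneB"] * 400,  # 1,000 reads
    "cell2": ["GeneA"] * 300 + ["GeneB"] * 200,  # 500 reads
    "cell3": ["GeneA"] * 100,                    # 100 reads, too shallow
}
sub = downsample(cells, depth=500)
print(sorted(sub), [len(r) for r in sub.values()])  # ['cell1', 'cell2'] [500, 500]

# 300 reads on a 2 kb gene in a cell with 500 mapped reads:
print(rpkm(300, 2_000, 500))  # 300000.0
```

Because RPKM divides by gene length as well as library size, it corrects for both; downsampling instead equalizes depth by discarding reads, which keeps the values as integer counts.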

Tags: zumis, rnaseq, smartseq3, single-cell

Hello guys! Is there any feedback on the question above?

Reply by Evlampia, 7 weeks ago

There are too many questions here, and I’m not sure which one you actually want answered. The Smart-seq3 paper already does a very good job of explaining what terms like “internal” and “UMI” mean, and how the authors use RPKM and downsampled counts in their figures.

Reply by dsull, 7 weeks ago