I'm new to sequencing analysis, though I have some familiarity with standard analysis pipelines for typical (non-nuclear) RNA-seq datasets, both bulk and single-cell. I now need to analyze a public single-nucleus RNA-sequencing dataset (from this paper: https://www.pnas.org/content/116/39/19619). The GEO accession provides R objects for each sample, each of which contains the output from a zUMIs pipeline. The objects are organized as nested lists: the top level contains UMI and Reads, and each of these contains intron, exon, and intron-exon.
This may be obvious, but I don't even know where to begin. What is the difference between "UMI count" and "Read count" in this case? The UMI counts are lower overall, so I'm assuming that matrix contains only unique reads. Is this correct?
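To make my assumption concrete, here is a toy Python sketch of what I think the two matrices mean. The barcodes, gene names, and reads are made up, and this is only my guess at the logic, not the actual zUMIs code: read counts would tally every assigned read, while UMI counts would tally distinct UMIs per cell and gene.

```python
from collections import Counter

# Hypothetical reads as (cell_barcode, gene, umi) tuples; made-up data.
reads = [
    ("CELL1", "GeneA", "AAAC"),
    ("CELL1", "GeneA", "AAAC"),  # same UMI again, presumably a PCR duplicate
    ("CELL1", "GeneA", "GGTT"),
    ("CELL1", "GeneB", "AAAC"),
    ("CELL2", "GeneA", "CCGA"),
    ("CELL2", "GeneA", "CCGA"),  # same UMI again, presumably a PCR duplicate
]

# Read count: every read contributes to its (cell, gene) entry.
read_counts = Counter((cell, gene) for cell, gene, _ in reads)

# UMI count: each distinct (cell, gene, umi) combination contributes once.
umi_counts = Counter((cell, gene) for cell, gene, _ in set(reads))

print(read_counts[("CELL1", "GeneA")])  # 3 reads
print(umi_counts[("CELL1", "GeneA")])   # 2 distinct UMIs
```

If this picture is right, every entry of the UMI matrix should be less than or equal to the matching entry of the Reads matrix.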
Second, what is the difference between the "intron", "exon", and "inex" lists? I would have imagined that "inex" combines the intron and exon lists, but the counts don't quite add up: the intron and exon counts sum to 48.9 million, whereas the inex counts total 47.6 million.
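One possible explanation I can think of, which I have not confirmed, is that a single molecule (UMI) can have reads in both an exon and an intron of the same gene. It would then be counted once in each of the separate lists but only once in the combined one. A toy Python sketch of that idea, with made-up UMIs for one cell and one gene:

```python
# Hypothetical UMIs seen for one cell and one gene, split by where the reads map.
exon_umis = {"AAAC", "GGTT", "CCGA"}
intron_umis = {"GGTT", "TTAG"}  # "GGTT" also has exonic reads

exon_count = len(exon_umis)
intron_count = len(intron_umis)

# Combined count under this assumption: a UMI seen in both places counts once.
inex_count = len(exon_umis | intron_umis)

print(exon_count + intron_count)  # 5
print(inex_count)                 # 4
```

Under this assumption the combined total can only be equal to or smaller than intron + exon, which matches the direction of the gap I see.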
Thanks in advance and sorry if this was already covered elsewhere.