Question

Forum:A plot twist to DropletQC (scRNAseq Nuclear Fraction-based empty droplet filter)

1

Entering edit mode

3 months ago

hsuknowledge ▴ 10

DropletQC has been an alternative way to filter empty droplets, using droplet nuclear fraction summarized from the CellRanger-specific RE:A bam tags, plotted against log-UMI counts, to make the empty droplets stand out. As far as I'm concerned, DropletQC results in equal or better cell calling than DropletUtils::emptyDrops, especially on the high-UMI, low-nuclear fraction end.

Here is an output from identify_empty_drops using CellRanger bam as a source. Alternatively, if you have Velocyto matrices from an RNA velocity analysis, you can use them too, to derive the nuclear fraction number.

DropletQC from CellRanger bam and raw barcodes

I have run the same sample through STARSolo, and here is what you may find obstructive when using the Velocyto output from STARSolo. If you'd like to extend the UMI threshold lower like I do, you will find the lowermost rows are discretized. This makes DropletQC's density estimation algorithm generate a lot of ripples and become trapped at a local minimum.

DropletQC from STARSolo Velocyto spliced/unspliced counts

In trying to fix the issue (github), I discovered a pretty neat relationship.

You see, in the scatter plots, there seems to be a curve that can delineate the two groups, from upper left to lower right. Fortunately, when I switch the plot to log-log axes, it becomes clear that a slanted line runs through the gap between the two.

log-log plot reveals a linear gap between the two droplet groups

By fixing the slope to -1, I can write this linear formula

log(y) = -log(x) + intercept

which rearranges to

xy = constant.

And since x, the nuclear fraction, equals Unspliced count over UMI count, and y is the UMI count, we find the gap line can be expressed simply as

Unspliced = constant.

And so, we can plot log(Unspliced count) to log(UMI count), and get a vertical gap instead.

og-log plot, but x axis is now log(Unspliced count)

This suggests that our empty droplets can be delineated by a threshold on the unspliced count. Isn't that neat? I hope we can continue to find more cases where this holds.

But one caveat. If you work from CellRanger bam tags, there's a chance that this won't hold. In DropletQC's nuclear_fraction_tags and nuclear_fraction_annotation functions, they forgot to account for duplicated molecules, so the nuclear fraction calculated is based on all the reads, not Unique Molecular counts. I believe this has to be fixed, for a UMI-UMI plot makes more sense than if one side is not UMI. I'm also showing the upper bound of Unspliced counts for each UMI level on the previous plot and the next, which is exceeded in the next one because it uses the number of sequencing reads rather than unique molecular counts. (For this I have to multiply nuclear fractions by read counts from molecule_info.h5, which has removed some reads from resolving UMI clashes; thus the recovered unspliced counts are hardly whole numbers.)

log-log plot, but with Unspliced reads instead of Unspliced UMIs

Update

The unspliced threshold isn't necessarily constant, or a vertical line. For example, in the 10X13_4 sample (E14.5 mouse dorsal forebrain, La Manno et al. 2021), a slope-2 line fits through the gap.

plotting Unspliced vs UMI for 10X13_4 sample

Here is a code snippet that makes the plot from a list of Velocyto matrices.

library(Matrix)
DropletQC <- function(velocyto, lower = 30, count.ambiguous = TRUE) {
    ## velocyto: list(spliced, unspliced, ambiguous)
    umi.table <- sapply(velocyto, colSums)
    umi.count <- rowSums(umi.table)
    unspliced <- umi.table[, 2] + if (count.ambiguous) umi.table[, 3] else 0
    cbind(unspliced, umi.count)[umi.count > lower, ]
}

## usage
x <- DropletQC(vel.list)
plot(x, log = "xy")
abline(a = 0, b = 1)

Say you find a line that goes through the gap,

y = bx - a

Then the passing condition is

Unspliced^b > UMI * 10^a.

scRNAseq DropletQC Velocyto STARSolo • 7.9k views

ADD COMMENT • link 3 months ago by hsuknowledge ▴ 10