STARsolo --EmptyDrops_CR parameters
3 months ago
PianoEntropy ▴ 20

I'm having some issues aligning 10x scRNA-seq data to a custom genome, and I thought it would be good to filter the cells more strictly so that more barcodes are removed as empty droplets. Apparently STARsolo has several filtering algorithms, of which EmptyDrops_CR is the most similar to the Cell Ranger filtering. The manual lists 10 different parameters for it, but I can't make sense of what they mean and there's no further explanation.

In STARsolo, this filtering is activated with --soloCellFilter EmptyDrops_CR. The option can be followed by 10 numeric parameters, given in this order (defaults in parentheses): nExpectedCells (3000), maxPercentile (0.99), maxMinRatio (10), indMin (45000), indMax (90000), umiMin (500), umiMinFracMedian (0.01), candMaxN (20000), FDR (0.01), simN (10000).

Now, if I just want a stricter threshold for the filtering, e.g. on UMIs per cell (see picture), should I increase umiMin, or do I also need to set something else? If someone can provide a full explanation of the parameters, that would also be great.
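Since the ten values are positional, it is easy to shift one by accident when changing a single parameter. Below is a minimal Python sketch that builds the --soloCellFilter argument list from the defaults quoted above and lets one value be overridden by name. The helper name solo_cell_filter_args is hypothetical (not part of STAR), and the umiMin value of 1000 is an arbitrary illustration, not a recommendation; whether raising umiMin alone gives the desired stricter filtering is not something this sketch answers.

```python
# Defaults as quoted in the post above; the order matters because STAR
# reads the values after EmptyDrops_CR positionally.
EMPTYDROPS_CR_DEFAULTS = {
    "nExpectedCells": 3000,
    "maxPercentile": 0.99,
    "maxMinRatio": 10,
    "indMin": 45000,
    "indMax": 90000,
    "umiMin": 500,
    "umiMinFracMedian": 0.01,
    "candMaxN": 20000,
    "FDR": 0.01,
    "simN": 10000,
}


def solo_cell_filter_args(**overrides):
    """Return the --soloCellFilter arguments as a list of strings.

    Hypothetical helper: any parameter not overridden keeps its default,
    and all ten values are always emitted in the documented order.
    """
    unknown = set(overrides) - set(EMPTYDROPS_CR_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown EmptyDrops_CR parameter(s): {sorted(unknown)}")
    params = {**EMPTYDROPS_CR_DEFAULTS, **overrides}
    values = [str(params[name]) for name in EMPTYDROPS_CR_DEFAULTS]
    return ["--soloCellFilter", "EmptyDrops_CR"] + values


# Example: raise only umiMin (1000 is an arbitrary illustrative value).
print(" ".join(solo_cell_filter_args(umiMin=1000)))
```

The returned list can be appended to the rest of a STAR command line, for example when launching STAR through subprocess.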

3 months ago
Rob 4.9k

The EmptyDrops_CR filter is, indeed, very similar to what is done by Cell Ranger. Alex did a fantastic job reverse-engineering what Cell Ranger is doing internally. Unfortunately, Cell Ranger's defaults (and hence those inherited by STARsolo) are not particularly well described, and it's also unclear for which range of experiments these defaults are reasonable and for which they should be changed.

Recently, as part of a much larger project, a student of mine re-re-implemented this methodology in R. The pull request on the DropletUtils repository includes function documentation that describes the effects of the different parameters as we understood them when re-implementing the approach.