I finally got my hands on a dataset with properly designed ERCC92 spike ins. The question is, how should I use these with ALR in theory?
The additive log-ratio transformation (alr), which allows the user to scale their data by a feature with an a priori known fixed abundance, such as a house-keeping gene or an experimentally fixed variable (e.g., a ThermoFisher ERCC synthetic RNA “spike-in”15), may provide a superior alternative. In contrast to clr, proportionality calculated with alr does not change with missing feature data because it effectively back-calculates the absolute feature abundance.
Do I use a single ERCC92 feature as the reference, the summation, or the mean?
Do I include all or only a select few if it's the latter 2 options?
Should I scale all the datasets so their ERCC92 spike counts are the same before transformation? (This will likely result in the same data, though I'm thinking out loud and haven't tested)