Question: Normalization and other ambiguous terminology
3.1 years ago by
John12k wrote:

Hi :)

I find that when I write notes or try to explain how something works to someone else, I easily lose track of whether some value has been "normalized", and what exactly normalization means in that circumstance.

For example, a particular ChIP-Seq assay could be:

  • Raw signal

  • Normalized by total reads in sample

  • Normalized by total signal in sample

  • "Normalized" to signal in input control (which generally gets normalized itself)

  • Quantile normalized to other samples of the same assay type
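To make the differences concrete, here is a minimal sketch of the variants listed above in plain Python. The function names, the per-million scaling factor, the log2 ratio, and the pseudocount are all assumptions chosen for illustration, not a standard or the only way to do each step.

```python
import math


def per_million(signal, total_reads):
    """Scale by sequencing depth: signal per million reads in the sample."""
    return [x * 1e6 / total_reads for x in signal]


def fraction_of_total(signal):
    """Scale by total signal, so the bins of one sample sum to 1."""
    total = sum(signal)
    return [x / total for x in signal]


def log2_ratio_to_input(chip, control, pseudocount=1.0):
    """log2(ChIP / input) per bin.

    Both tracks are assumed to be depth-scaled already; the pseudocount
    (an assumption of this sketch) avoids division by zero.
    """
    return [
        math.log2((c + pseudocount) / (i + pseudocount))
        for c, i in zip(chip, control)
    ]


def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    `samples` is a list of equal-length lists, one per sample. Each value
    is replaced by the mean, across samples, of the values holding the
    same rank. Ties are broken by position rather than averaged.
    """
    n_bins = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_bins)
    ]
    result = []
    for s in samples:
        order = sorted(range(n_bins), key=lambda k: s[k])
        out = [0.0] * n_bins
        for rank, k in enumerate(order):
            out[k] = reference[rank]
        result.append(out)
    return result
```

Giving each transformation its own named function is one way to keep track of which "normalization" a value has been through.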

I'm sure there are other, more complicated, ways to "normalize" -- and perhaps in half the cases where I say I have normalized the data I have actually just "transformed" it (certainly the distribution is not normal) -- but those are the ones I commonly do. Given that there is so much room for ambiguity here, I was wondering if there is a standard nomenclature for this in Mathematics or Statistics? I don't think there are enough symbols for every normalization scenario, but just being able to differentiate between input normalized, read count normalized, and not normalized at all would really help. Googling "normalization symbol" didn't help :(

If you have any other examples of confusing terminology or non-specific "bioinformatic slang", it would be great to hear about those too :) Sometimes I don't realize how unspecific I'm being when I say things (usually because I expect everyone to know what I mean), so it would be great to hear common tropes people encounter or use themselves.

bioinformatics • 780 views
3.1 years ago by
Steven Lakin1.4k
Fort Collins, CO, USA
Steven Lakin1.4k wrote:

Unfortunately, I don't believe there is a universal way of indicating this. I've seen hat and tilde versions of variables used to represent some kind of "normalizing" transformation, but this is likely abuse of nomenclature at some level.

You're better off explicitly defining the transformation first, then if you need a variable to represent that, define the variable in terms of the transformation function and the data. Those who understand the transformation being used will likely also recognize the symbolic form, and if they don't, perhaps it would be better to explain using a simplified example.
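As one hypothetical illustration of this advice, take the read-count case from the question. The symbols here are assumptions chosen for the example, not an established convention: $x_i$ is the raw signal in bin $i$, $N$ is the total number of reads in the sample, and $f$ is the transformation, defined first and only then given a shorthand.

```latex
\[
  f(x; N) = \frac{10^{6}\, x}{N},
  \qquad
  \tilde{x}_i = f(x_i; N)
\]
```

With the function stated up front, the tilde is only an abbreviation for it, so a reader who does not know the symbol can still recover the meaning.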

Another personal opinion: as an audience member in cross-disciplinary talks, I'd rather see the explicit mapping of data to its transformed state than a field-specific symbol representing a concept I'm likely not familiar with. However, that's also dependent on your audience at the time.

written 3.1 years ago by Steven Lakin1.4k

I think you hit the nail on the head there - for cross-disciplinary talks, your options boil down to:

  • explicitly describe the transformation to a bunch of people who will probably drift off as soon as you say anything vaguely non-biological.

  • brush over the specifics of the transformation, maybe not even mentioning it at all.

The latter is what I think happens most often in cross-disciplinary talks, probably due to time constraints, but it results in a sort of institutionalized muddiness when talking about concepts which should really be very clear. If only R functions came with a little animation of how they worked that you could embed into your presentations :)

written 3.1 years ago by John12k