Question: Which is the best smoothing technique for replacing zeros in data?
5.8 years ago by Deepak Tanwar (4.1k), ETH Zürich, Switzerland
Deepak Tanwar wrote:

I have read some presentations and papers on smoothing techniques:

Smoothing N-gram Language models

An Empirical Study of Smoothing Techniques for Language Modeling

N-gram models 

Improved Smoothing for N-gram Language Models Based on Ordinary Counts

Smoothing Language Models

NLP Lunch Tutorial: Smoothing

Language models


I want to apply smoothing to data that contains zero values. Which technique would be best?


This is just an example:

              Pathway1  Pathway2  Pathway3  Pathway4
Calcium ions         0         3         1         0
ATP                  2         1         0         7



Sorry Deepak, I don't really understand - smoothing in my mind is something you do to continuous data, like a time series or genomic data. Your example of pathways is categorical data, in that Pathway2 doesn't really come before Pathway3 or after Pathway1, they are just categories.

So what do you want this data to look like?

Ultimately, the best smoothing algorithm is the one that is well described and understood by anyone who has to look at the result :)
Although it would never stand up in any other aspect of science, too often when it comes to smoothing of data or intersection of genomic coordinates, you see "then we did [stuff we're not even going to detail in the appendix] - and the result was [bold claims of novel biology]". 

written 5.8 years ago by John (12k)

John, it was just an example; I am not going to do anything with pathways. I can't disclose what I want to do. I have already tried the Good-Turing estimate and Witten-Bell smoothing. To explain further: suppose you have a list of 30 people and a list of 500 software packages. You create a table with the people as columns and the software packages as rows, and you fill each cell with the number of times that person used that software in the last 10 years:

Softwares  Person1  Person2  Person3  Person4  Person5  Person....
Soft1           19      200        0       17        0           0
Soft2          500        0    10000        0      900           6
Soft3         1000      900        0       12        5          17
Soft....         0        0       23       33       16           0


I want to replace the 0 counts. One way is Laplace smoothing by adding 1 to each value.
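
For illustration, the add-one idea mentioned above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a recommendation of a "best" method: the function names `add_alpha_counts` and `smoothed_proportions` are made up for this example, only the cells visible in the table are used, and rescaling each row to proportions assumes that each software's counts are meant to be read as a distribution over people.

```python
from typing import Dict, List


def add_alpha_counts(row: List[int], alpha: float = 1.0) -> List[float]:
    """Return the row with `alpha` added to every count (alpha=1 is Laplace)."""
    return [count + alpha for count in row]


def smoothed_proportions(row: List[int], alpha: float = 1.0) -> List[float]:
    """Add `alpha` to every count, then rescale so the row sums to 1.

    Assumption: a row is treated as a distribution over people.
    """
    pseudo = add_alpha_counts(row, alpha)
    total = sum(pseudo)
    return [value / total for value in pseudo]


# Only the cells visible in the example table; elided rows and columns are omitted.
usage: Dict[str, List[int]] = {
    "Soft1": [19, 200, 0, 17, 0, 0],
    "Soft2": [500, 0, 10000, 0, 900, 6],
    "Soft3": [1000, 900, 0, 12, 5, 17],
}

laplace = {name: smoothed_proportions(row) for name, row in usage.items()}

for name, proportions in laplace.items():
    print(name, [round(p, 4) for p in proportions])
```

Passing a smaller `alpha` (for example 0.5) gives the more general additive, or Lidstone, variant, which moves less mass onto the zero cells; whether that suits the data depends on what the zeros actually mean.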


I hope I have made it clear.


modified 5.8 years ago • written 5.8 years ago by Deepak Tanwar (4.1k)