Question: How to combine the standard deviation and mean of each group into one number?
Asked 20 months ago by mohammadhassanj:

I have a table with 10,000 rows; each row has 10 columns, and the values range from 0 to 80. I want to compare the means of these rows, but because one large value in a row can compensate for smaller values in the same row, I cannot make a fair comparison (the value of each column also matters to me).

I know that I can use the standard deviation, but I want a method that combines the standard deviation and the mean of each row into a single number, so that I can sort the rows by that number.
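One simple way to get a single number per row (not something proposed in this thread, just an illustration) is to penalize the mean by the spread: score = mean - k * SD, where the weight `k` is a hypothetical tuning parameter set to 1 here. Rows with a high mean and a low standard deviation score highest. A common alternative is the ratio mean / SD (the inverse of the coefficient of variation). The row names and values below are made-up toy data.

```python
import statistics

def row_score(values, k=1.0):
    """Hypothetical combined score: mean minus k population standard deviations.
    High mean and low spread both raise the score."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD over the row's columns
    return mu - k * sd

def rank_rows(rows, k=1.0):
    # Return row names sorted by score, highest first
    return sorted(rows, key=lambda name: row_score(rows[name], k), reverse=True)

# Toy data: both miR-a and miR-b have mean 40, but miR-b is very uneven
rows = {
    "miR-a": [40, 40, 40, 40],
    "miR-b": [80, 0, 80, 0],
    "miR-c": [10, 12, 11, 9],
}
print(rank_rows(rows))  # ['miR-a', 'miR-c', 'miR-b']
```

The choice of `k` decides how strongly unevenness is punished; with `k = 0` this reduces to sorting by the plain mean.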

Tags: normalization, R

If I understand correctly, you want to sort the rows by some summary statistic, but you're worried that different rows have different distributions. First, look at the distribution of the data: if it is not unimodal, measures of central tendency (mean, median, mode) may not be useful. Second, if the problem is outliers, decide whether or not to include them. As already suggested, the median is more robust than the mean, but depending on your data you may also want to consider the mode (i.e. the most frequent value). Similarly, if you are interested in a measure of dispersion, the interquartile range (IQR) is more robust than the standard deviation.
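To see how the robust summaries behave, here is a small sketch using only Python's standard library on a made-up row of 10 values with one outlier. Note that `statistics.quantiles` uses the "exclusive" method by default, while R's `quantile()` defaults to a different convention, so IQR values from different tools may differ slightly.

```python
import statistics

def row_summary(values):
    """Mean plus the robust alternatives: median, interquartile range, and mode(s)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "modes": statistics.multimode(values),  # all most-frequent values
    }

# Toy row: nine moderate values and one large outlier (80)
row = [3, 4, 4, 5, 5, 5, 6, 6, 7, 80]
print(row_summary(row))
# {'mean': 12.5, 'median': 5.0, 'iqr': 2.25, 'modes': [5]}
```

The single outlier pulls the mean to 12.5, while the median (5.0) and IQR (2.25) still describe the bulk of the row.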

Reply written 20 months ago by Jean-Karim Heriche

Consider using the median instead of the mean if you are worried that an outlier in a row could bias the mean. Alternatively, check whether a log transformation makes your data closer to normally distributed. What kind of data is this, by the way?
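A minimal sketch of the log-transformation idea, assuming a pseudocount of 1 (i.e. log(x + 1)) because the values in this table can be 0. Averaging on the log scale and back-transforming damps the influence of a single large value. The toy row is made up for illustration.

```python
import math
import statistics

def log1p_row(values):
    # log(x + 1); the +1 pseudocount is an assumption so that zeros stay defined
    return [math.log1p(v) for v in values]

row = [3, 4, 4, 5, 5, 5, 6, 6, 7, 80]  # toy row with one large value
raw_mean = statistics.mean(row)
log_mean = statistics.mean(log1p_row(row))
# Back-transform the log-scale mean to the original scale for comparison
back_transformed = math.expm1(log_mean)
print(raw_mean, round(back_transformed, 2))
```

The back-transformed value is a geometric-mean-style summary and is always at most the arithmetic mean.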

Reply written 20 months ago by Benn (modified 20 months ago)

The rows are microRNAs, and the columns are genes (a gene set) targeted by each microRNA. Each value is the number of databases that report that the miRNA targets that particular gene. My goal is to find the microRNA that targets the gene set, so it should have the highest mean and the lowest standard deviation. Can the rows be compared by normalizing each row individually?

Reply written 20 months ago by mohammadhassanj

In that case, you could test for enrichment with a hypergeometric test (Fisher's exact test).
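For reference, the one-sided hypergeometric test is the upper tail of the hypergeometric distribution, and it gives the same p-value as a one-sided ("greater") Fisher's exact test on the corresponding 2x2 table. The symbols are labels chosen here for illustration: N is the total number of targets (the population), K the number of those targets that fall in the gene set, n the number of targets of the miRNA being tested, and k the number of that miRNA's targets that fall in the gene set.

```latex
P(X \ge k) = \sum_{i=k}^{\min(n,\,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
```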

Reply written 20 months ago by Benn

Could you please explain this in more detail? How can I do that?

Reply written 20 months ago by mohammadhassanj

I am not sure; I have not seen an approach like yours before. But maybe you can test, for each miRNA, whether a certain gene set is enriched (and do this for all 10 gene sets). For such an enrichment analysis you need four numbers for each miRNA: the number of its gene targets in the gene set (you say you already have this); the total number of targets in that gene set (i.e. the number of its genes targeted by any miRNA); the total number of targets of that miRNA across the whole genome (all genes it targets); and finally the total number of targets of all miRNAs. You can run Fisher's exact test on these four numbers to test for enrichment. Of course, adjust the p-values for multiple testing, e.g. with an FDR adjustment.
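The procedure above can be sketched in plain Python under a few assumptions: the four counts are named `k` (this miRNA's targets in the gene set), `K` (targets in the gene set by any miRNA), `n` (this miRNA's targets genome-wide) and `N` (targets of all miRNAs); the p-value is the one-sided hypergeometric upper tail, which equals a one-sided Fisher's exact test; and the FDR adjustment is Benjamini-Hochberg. The function names and toy counts are hypothetical. If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` on the same 2x2 table should give the same p-value.

```python
from math import comb

def contingency_table(k, K, n, N):
    # Rows: this miRNA / all other targets; columns: in gene set / outside it
    return [[k, n - k], [K - k, N - K - (n - k)]]

def enrichment_p(k, K, n, N):
    """One-sided hypergeometric p-value P(X >= k) (= one-sided Fisher's exact test)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy counts (made up for illustration): miRNA -> (k, K, n, N)
counts = {"miR-x": (2, 4, 3, 10), "miR-y": (0, 4, 3, 10)}
names = list(counts)
raw = [enrichment_p(*counts[name]) for name in names]
adjusted = dict(zip(names, benjamini_hochberg(raw)))
print(contingency_table(*counts["miR-x"]), raw, adjusted)
```

In a real analysis there would be one test per miRNA and gene set pair, and all of those p-values would go into a single adjustment.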

Reply written 20 months ago by Benn