I have a problem interpreting my results and would like to ask for your help.
I have a table of expression values which looks like this:

        ctrl.high   ctrl.low   log_ratio
gene1    9.572083   6.461176   3.1109074
gene2    2.725700   3.354198  -0.6284985
gene3   10.002005   8.190133   1.8118717
gene4    3.812149   1.90948    1.9026686
gene5    5.561375   3.16058    2.4007949
gene6    5.515633   3.394174   2.1214594
The goal is to identify the differentially regulated genes between the two fractions (high vs. low). To do so, I calculated the log-ratio for each gene (high minus low, as these are already log values) to get the fold-change between the two fractions.
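To make the calculation concrete, here is a minimal sketch in Python of how the log_ratio column is obtained. The dictionary and function names are only illustrative, and the values are the six rows from the table above; the base of the logarithm is not assumed here.

```python
# Values copied from the table above: (ctrl.high, ctrl.low), already on a log scale.
expression = {
    "gene1": (9.572083, 6.461176),
    "gene2": (2.725700, 3.354198),
    "gene3": (10.002005, 8.190133),
    "gene4": (3.812149, 1.90948),
    "gene5": (5.561375, 3.16058),
    "gene6": (5.515633, 3.394174),
}


def log_ratio(high, low):
    """Log-ratio of two log-scale values: log(a/b) = log(a) - log(b)."""
    return high - low


log_ratios = {gene: log_ratio(high, low) for gene, (high, low) in expression.items()}

for gene, value in log_ratios.items():
    print(f"{gene}\t{value:.6f}")
```

A positive value means the gene is higher in the "high" fraction, a negative value means it is higher in the "low" fraction.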
At first we thought about taking the mean +/- twice the standard deviation of the means as a threshold to decide which genes are significantly deregulated, mainly because this is how the biologist wanted the data to be analyzed. After looking at the distribution of the data, however, I am no longer certain that this is the right choice.
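For reference, this is roughly what the threshold approach would look like. It is only a sketch under my own assumptions: I take the mean and the sample standard deviation of the log-ratios themselves (one possible reading of the rule), and I use just the six example rows, which are far too few to give meaningful cutoffs.

```python
import statistics

# log_ratio column of the example table (illustrative subset of the data).
log_ratios = {
    "gene1": 3.1109074,
    "gene2": -0.6284985,
    "gene3": 1.8118717,
    "gene4": 1.9026686,
    "gene5": 2.4007949,
    "gene6": 2.1214594,
}

values = list(log_ratios.values())
mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)

# Assumed rule: a gene is flagged if its log-ratio lies outside mean +/- 2 * SD.
lower = mean - 2 * sd
upper = mean + 2 * sd

flagged = [gene for gene, value in log_ratios.items() if value < lower or value > upper]

print(f"mean = {mean:.4f}, sd = {sd:.4f}")
print(f"thresholds: [{lower:.4f}, {upper:.4f}]")
print("flagged genes:", flagged)
```

Note that this rule implicitly assumes the log-ratios are roughly normally distributed around a single centre, which is exactly what I am unsure about.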
So I have a couple of questions regarding this kind of analysis:
Is it possible to discriminate differentially regulated genes based on the mean of the log-values of their expression? Are there any papers for or against this method of calculation?
I have uploaded an image of the distribution of the log-values. I expected it to be normally distributed around 1, but there appears to be a second peak on the left side of the plot. I know that this is probably a very vague question, but is there a way to explain this kind of second peak, or to find out what causes it?