Hi,
I have a problem interpreting my results and would like to ask for your help.
I have a table of expression values which looks like this:
       ctrl.high  ctrl.low  log_ratio
gene1   9.572083  6.461176  3.1109074
gene2   2.725700  3.354198 -0.6284985
gene3  10.002005  8.190133  1.8118717
gene4   3.812149  1.90948   1.9026686
gene5   5.561375  3.16058   2.4007949
gene6   5.515633  3.394174  2.1214594
The goal is to identify the genes that are differentially regulated between the two fractions (high vs. low). To do so, I calculated the log ratio for each gene (high - low, since these are log values) to get the fold change between the two fractions.
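The log-ratio step can be sketched in Python as below. It assumes the expression values are log2-transformed (the post only says "log values"), so a log ratio of 1 corresponds to a two-fold difference; change `base` if a different logarithm was used. The helper names `log_ratio` and `fold_change` are illustrative, not from any package.

```python
# Values copied from the table above. ASSUMPTION: ctrl.high and ctrl.low are
# log2-scale expression values (the post does not state the log base).
expression = {
    "gene1": (9.572083, 6.461176),
    "gene2": (2.725700, 3.354198),
    "gene3": (10.002005, 8.190133),
    "gene4": (3.812149, 1.90948),
    "gene5": (5.561375, 3.16058),
    "gene6": (5.515633, 3.394174),
}


def log_ratio(high: float, low: float) -> float:
    """On a log scale, log(high / low) = log(high) - log(low)."""
    return high - low


def fold_change(lr: float, base: float = 2.0) -> float:
    """Convert a log ratio back to a linear fold change (base 2 is assumed)."""
    return base ** lr


for gene, (high, low) in expression.items():
    lr = log_ratio(high, low)
    print(f"{gene}\tlog_ratio={lr:+.4f}\tfold_change={fold_change(lr):.2f}x")
```

Negative log ratios (such as gene2) give fold changes below 1, i.e. higher expression in the low fraction.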
At first we thought about using the mean +/- twice the standard deviation of the log ratios as a threshold to decide which genes are significantly deregulated, mainly because this is how the biologist wanted it to be analyzed. However, after looking at the distribution of the data, I am no longer certain that this is the right choice.
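The mean +/- 2 SD rule described above amounts to the following sketch (the helper `two_sd_outliers` is hypothetical and uses only the Python standard library). Note that it is a ranking heuristic rather than a statistical test: if the log ratios were roughly normal it would flag about 5% of genes whether or not anything is truly regulated, and a second peak in the distribution can inflate the SD the cut-offs depend on.

```python
import statistics


def two_sd_outliers(log_ratios: dict[str, float], k: float = 2.0) -> dict[str, str]:
    """Flag genes whose log ratio lies outside mean +/- k * SD of all log ratios.

    Returns a mapping gene -> "up" or "down"; genes inside the band are omitted.
    No p-values are produced: this is a cut-off heuristic, not a test.
    """
    values = list(log_ratios.values())
    centre = statistics.mean(values)
    spread = statistics.stdev(values)  # sample SD across all genes
    lower, upper = centre - k * spread, centre + k * spread
    calls = {}
    for gene, lr in log_ratios.items():
        if lr > upper:
            calls[gene] = "up"
        elif lr < lower:
            calls[gene] = "down"
    return calls
```

With only a handful of genes (as in the six-row excerpt) the SD is so large that almost nothing is flagged; the rule only becomes meaningful on the full gene list.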
So I have a couple of questions regarding this kind of analysis:
Is it possible to identify differentially regulated genes based on the mean of the log values of their expression? Are there any papers for or against this method of calculation?
I have uploaded an image of the distribution of the log values. I expected it to be normally distributed around 1, but there appears to be a second peak on the left side of the plot. I know that this is probably a very vague question, but is there a way to explain this kind of second peak or to find out where it comes from?
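One way to start investigating a second peak is to fit a two-component Gaussian mixture to the log ratios, list the genes that fall under the left component, and then check what those genes have in common (for example, whether they are mostly low-expressed). The sketch below is a plain expectation-maximization fit written with the standard library only; the function names are hypothetical, and a maintained implementation such as scikit-learn's `GaussianMixture` would be the more robust choice for real data.

```python
import math
import statistics


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs: list[float], n_iter: int = 200, min_sd: float = 1e-3):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, sds, resp), where resp[i] is the posterior
    probability that xs[i] belongs to component 0.
    """
    n = len(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)  # start the means at the quartiles
    means = [q1, q3]
    sd_all = max(statistics.pstdev(xs), min_sd)
    sds = [sd_all, sd_all]
    weights = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability of membership in component 0.
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0.0 else 0.5)
        # M-step: re-estimate weights, means and SDs from the soft assignments.
        n0 = max(sum(resp), 1e-12)
        n1 = max(n - sum(resp), 1e-12)
        means = [
            sum(r * x for r, x in zip(resp, xs)) / n0,
            sum((1.0 - r) * x for r, x in zip(resp, xs)) / n1,
        ]
        sds = [
            max(math.sqrt(sum(r * (x - means[0]) ** 2 for r, x in zip(resp, xs)) / n0), min_sd),
            max(math.sqrt(sum((1.0 - r) * (x - means[1]) ** 2 for r, x in zip(resp, xs)) / n1), min_sd),
        ]
        weights = [n0 / n, n1 / n]
    return weights, means, sds, resp


def left_peak_genes(log_ratios: dict[str, float]) -> list[str]:
    """Genes assigned (posterior > 0.5) to the component with the lower mean."""
    names = list(log_ratios)
    _, means, _, resp = fit_two_gaussians([log_ratios[g] for g in names])
    left_is_0 = means[0] <= means[1]
    return [g for g, r in zip(names, resp) if (r > 0.5) == left_is_0]
```

The gene list returned by `left_peak_genes` can then be compared against the raw `ctrl.high` and `ctrl.low` values to see whether the left peak is driven by a particular intensity range.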
Thanks
A.
My recommendation: tell the biologists they need to invest a bit more in replication; otherwise the experiment cannot be analyzed and published(!). If they won't, refuse to analyze the experiment. It is not worth wasting your and your clients' time on sub-par analysis attempts when investing a few hundred euros/dollars can get you so much more.
Good news about the replicates; follow Stefano's recommendations then. Start from the normalization step and rerun the statistical test(s) from scratch if at all possible. Correct for multiple testing.
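For the final step, once per-gene tests on the replicates have produced p-values, the Benjamini-Hochberg false-discovery-rate adjustment is one common way to correct for multiple testing (no specific method is named above, so this choice is an assumption). A minimal standard-library sketch follows; it should agree with R's `p.adjust(p, method = "BH")`.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted value falls below the chosen FDR level (for example 0.05) would then be reported as differentially regulated.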