Hi everybody
I ran a standard WGCNA analysis on log2-normalized RNA-Seq data. I set up a signed network with dynamic tree cut (Pearson correlation), following most of the usual recommendations.
At first I did not get a good scale-free topology fit for any of the candidate soft-thresholding powers. To reach that goal, I built several matrices, repeatedly applying an expression cutoff based on quartiles until I got a decent fit (R² = 0.82). During this process the original matrix obviously shrank (from ~27,000 genes to ~2,000) across 17 samples. In principle this seems right, since I only want to keep the most highly expressed genes, but mathematically I am not sure whether applying this criterion biases the experiment.
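To be concrete, this is roughly the loop I applied (a simplified sketch, not my exact code; `expr` stands in for my log2 matrix with genes in rows and the 17 samples in columns):

```r
library(WGCNA)

datExpr <- t(expr)                     # WGCNA expects samples in rows, genes in columns
powers  <- c(1:10, seq(12, 20, by = 2))

repeat {
  sft <- pickSoftThreshold(datExpr, powerVector = powers,
                           networkType = "signed", verbose = 0)
  # signed scale-free fit, as in the WGCNA tutorials
  fit <- max(-sign(sft$fitIndices$slope) * sft$fitIndices$SFT.R.sq)
  if (fit >= 0.82) break               # the fit I finally reached
  # drop the lowest expression quartile and try again;
  # in my case this ended at ~2,000 genes
  means   <- colMeans(datExpr)
  datExpr <- datExpr[, means > quantile(means, 0.25)]
}
```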
So my question is: is applying several successive cutoffs to a data matrix until it shows the desired behavior a normal practice, or am I biasing the experiment? The thing is, in the end I do get sensible results, but I want to be sure that they are also valid.
I highly appreciate any comment.
The distributions at each cutoff are attached for reference.
Hi Sudbery
So in that sense, would the best practice be to work with the least-manipulated matrix that comes closest to the expected result?
The issue with the original matrix is that the resulting modules are quite large, and the clusters themselves are too big to analyze one by one. I am really stuck at this point, because I have not found a practical strategy to get what I am looking for without losing resolution in my data. I see your point, but I am not sure I got a specific answer. If you have anything to add, please share it with me.
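For reference, this is roughly my module-detection step (a sketch with placeholder names; `dissTOM` stands in for the TOM dissimilarity I compute from the signed network). As far as I understand, deepSplit and minClusterSize are the knobs that should break the huge modules up:

```r
library(WGCNA)   # also brings in dynamicTreeCut

geneTree <- hclust(as.dist(dissTOM), method = "average")

modules <- cutreeDynamic(dendro = geneTree, distM = dissTOM,
                         deepSplit = 2,        # raise towards 4 for more, smaller modules
                         pamRespectsDendro = FALSE,
                         minClusterSize = 30)  # lower to allow smaller modules
moduleColors <- labels2colors(modules)
table(moduleColors)                            # this is where my module sizes come out huge
```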
Thanks
Cynthia
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under @Ian's answer. SUBMIT ANSWER is for new answers to the original question.