Aggregate data frame using function with a logical criterion
2.6 years ago
friasoler ▴ 30

I would like to detect which entries in the first column (Gene) have ambiguous values in the second column (log2 fold change, l2fch), meaning that the values for the same gene differ in sign. For example, "Golgi integral membrane protein 4-like" has log2 fold changes of both signs, whereas "protein HEG homolog 1" does not. I would like a final data frame containing the mean log2 fold change for the ambiguous genes and NA for the rest. With the code below, I always get the mean, regardless of whether a gene is ambiguous or not.

df:
Gene                                    l2fch
Golgi integral membrane protein 4-like  0.308
Golgi integral membrane protein 4-like  -0.35
protein HEG homolog 1                   -2.92
protein HEG homolog 1                   -5.92
centlein                                -1.4760831106834
HAUS augmin-like complex subunit 6      0.319711425528765


Code:

df2=aggregate(.~Gene,df, function(h){if (max(h)*min(h)>0) mean(h) else NA})

R aggregate • 397 views

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.


You're computing max(h). Have you tried looking at what h actually is? I have a hunch it might be a set of rows/vectors and not a single vector of l2fch values. You might need to use max(h$l2fch) or max(h[2]) or something along those lines.
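
One way to check this is to print what FUN receives. As far as I can tell, with the formula interface aggregate() passes each left-hand-side column to FUN as a plain vector per group, so h should already be the numeric l2fch values. If that holds, the condition itself looks inverted relative to the stated goal: max(h)*min(h) > 0 is true when all values share a sign, which are the non-ambiguous genes. Below is a minimal sketch under those assumptions; it rebuilds df from the table in the question, and mean_if_ambiguous is a hypothetical helper name, not something from the original post.

```r
# Data copied from the table in the question
df <- data.frame(
  Gene = c("Golgi integral membrane protein 4-like",
           "Golgi integral membrane protein 4-like",
           "protein HEG homolog 1",
           "protein HEG homolog 1",
           "centlein",
           "HAUS augmin-like complex subunit 6"),
  l2fch = c(0.308, -0.35, -2.92, -5.92, -1.4760831106834, 0.319711425528765)
)

# Step 1: inspect what FUN actually receives for each gene
invisible(aggregate(l2fch ~ Gene, data = df,
                    FUN = function(h) { str(h); length(h) }))

# Step 2 (hypothetical helper): return the mean only when the group
# contains both a negative and a positive value, otherwise NA
mean_if_ambiguous <- function(h) {
  if (min(h) < 0 && max(h) > 0) mean(h) else NA_real_
}

df2 <- aggregate(l2fch ~ Gene, data = df, FUN = mean_if_ambiguous)
print(df2)
```

With this data only "Golgi integral membrane protein 4-like" mixes signs, so it should be the only gene with a non-NA value. Note that genes with a single row can never be ambiguous under this definition and always come out as NA.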