As a rule, you rarely need a loop to filter in R; there is usually a vectorised way that avoids explicit iteration.
The code below works with any number of groups and needs no explicit loop, which tends to be slow. Here are two solutions, the first with the tidyverse (dplyr) and the second with base R:
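library(dplyr)
#/ Dummy data: n groups with n rows each
set.seed(2021)
n <- 20
#/ rnorm() draws only 25 values here; data.frame() recycles them to fill all n * n rows
df <- data.frame(Group = rep(LETTERS[1:n], each = n),
                 T1 = rnorm(25, 10, n))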
#/ tidyverse:
#/ x defaults to the dummy data so that tidy() can be called without arguments
tidy <- function(x = df) {
  x %>%
    group_by(Group) %>%
    filter(T1 > (median(T1) - 1) & T1 < (median(T1) + 1))
}
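#/ base R:
#/ x defaults to the dummy data so that base() can be called without arguments
base <- function(x = df) {
  #/ get the median of T1 per group
  medians <- aggregate(T1 ~ Group, data = x, FUN = median)
  #/ merge the medians back onto the data for simple filtering
  #/ (the two T1 columns become T1.x = value and T1.y = group median)
  merged <- merge(x = x, y = medians, by = "Group", sort = FALSE)
  #/ apply the filter and keep only Group and the original T1 values
  merged[merged$T1.x > (merged$T1.y - 1) & merged$T1.x < (merged$T1.y + 1), c("Group", "T1.x")]
}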
#/ all() returns TRUE only if every value is TRUE, so the results of tidy and base are the same:
all(tidy() == base())
[1] TRUE
#/ The base solution is almost twice as fast but "less handy" (imho):
library(microbenchmark)
microbenchmark(tidy=tidy(),
base=base())
Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
 tidy 3.525646 3.819852 4.303717 4.119709 4.360010 13.25243   100
 base 2.059017 2.118340 2.367351 2.247750 2.347017 11.38729   100
Note that the base solution is not generic. You could of course wrap it in a more customizable function for generic filtering based on column names etc.
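For illustration, such a wrapper could look roughly like the sketch below. The function name filter_near_median and its arguments are made up for this example and are not part of the code above. It swaps aggregate() plus merge() for ave(), which returns the group median for every row directly, and it assumes the value column has no missing values.

```r
# Hypothetical helper (not part of the original answer): keep the rows whose
# value lies strictly within +/- `width` of the median of their own group.
filter_near_median <- function(data, group_col, value_col, width = 1) {
  # ave() returns, for every row, the median of that row's group
  group_median <- ave(data[[value_col]], data[[group_col]], FUN = median)
  keep <- abs(data[[value_col]] - group_median) < width
  # which() drops NA comparisons instead of returning NA rows
  data[which(keep), , drop = FALSE]
}

# Small made-up data set for illustration
toy <- data.frame(Group = rep(c("A", "B"), each = 5),
                  T1    = c(1, 2.5, 3, 3.5, 10, 20, 21.5, 22, 30, 40))

res <- filter_near_median(toy, "Group", "T1")
print(res)
```

Passing the column names as strings means the same function can be reused for other grouping or value columns without editing its body.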
Thanks, could you please explain what the first part does? And also how the second part filters each group by its respective T1 median?
The group_by declares that all rows with the same value for Group are a group. The second one (filter) then calculates the median of the T1 column for each declared group and keeps the rows that meet the criterion you mention, being within +/- 1 of the groupwise median. This is all tidyverse syntax, which is a convenient grammar for data science; you can learn more here: https://dplyr.tidyverse.org/
Basically the verbs (that is what the tidyverse calls these standardized functions like filter, mutate, group_by etc.) do a lot of work under the hood, but for the end user it comes down to learning the limited tidyverse syntax, which lets you manipulate data efficiently without lots of code. Base R can often be faster for the same operation but requires more custom code, so the tidyverse is often preferable in analysis scripts to keep things short, readable and thereby "tidy".
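To make the two steps visible, here is a small sketch that first writes the group median into its own column and only then filters. The toy data and the column name group_median are made up for this example; the result should match what the one-step filter() gives.

```r
library(dplyr)

# Small made-up data set for illustration
toy <- data.frame(Group = rep(c("A", "B"), each = 5),
                  T1    = c(1, 2.5, 3, 3.5, 10, 20, 21.5, 22, 30, 40))

steps <- toy %>%
  group_by(Group) %>%                    # rows sharing a Group value form one group
  mutate(group_median = median(T1)) %>%  # median computed per group, repeated on each row
  ungroup()

# Each row is now compared against the median of its own group
kept <- steps %>%
  filter(T1 > group_median - 1, T1 < group_median + 1)

print(steps)
print(kept)
```

Printing steps shows that every row of group A carries the median of A and every row of group B the median of B, which is why no loop over the groups is needed.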
Thanks for all the info. I am trying to adapt it to my data, but it doesn't work, or I am missing something. Meanwhile, I would really like to find a way to call T1_"Var" by a generic name, so that I can filter using the +/- of T1_A or T1_B, for example.
When trying this, for example:
for (Var in unique(df$Group)) { temp2 <- df %>% filter((Group== Var)$T1 < T1_A)}
it works, but I need to call the respective T1_ for each Group, e.g. for Group==A, then T1_A.
If you add data examples via dput and an example of the desired output, we can try to get some code working.
Hi, thanks. It's the same data I am talking about and the same question. As I could not use your commands, I was trying to find a way... The desired outcome is the same: to subset, keeping only the rows with T1_"Var"-1<T1<T1_"Var"+1. But as I am a newbie, I don't know how to call T1_"Var" when you have already created the vectors T1_A, T1_B....T1_W.
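One possible way to look up T1_"Var" by name is sketched below. It assumes that each T1_<group> object holds a single number (for example the group median); the data and the values of T1_A and T1_B are made up for this example. Option 1 builds the object name as a string and fetches it with get(). Option 2 puts the values into one named vector and indexes it by group, which avoids both the loop and the separate objects.

```r
# Made-up data; T1_A and T1_B stand in for the existing per-group objects
# and are assumed to hold one number each.
df <- data.frame(Group = rep(c("A", "B"), each = 4),
                 T1    = c(1, 2.5, 3.5, 10, 20, 21.5, 22.5, 40))
T1_A <- 3
T1_B <- 22

# Option 1: build the object name as a string and fetch it with get()
pieces <- lapply(unique(df$Group), function(Var) {
  centre <- get(paste0("T1_", Var))
  df[df$Group == Var & df$T1 > centre - 1 & df$T1 < centre + 1, ]
})
by_name <- do.call(rbind, pieces)

# Option 2: keep the values in one named vector and index it by group
centres <- c(A = T1_A, B = T1_B)
by_lookup <- df[abs(df$T1 - centres[as.character(df$Group)]) < 1, ]

print(by_name)
print(by_lookup)
```

Both options should keep the same rows here. With many groups (T1_A to T1_W) the named vector is usually easier to maintain than a separate object per group.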