Hello everyone,
I'm not sure if I'm wording this question right, but I'll provide a detailed example below. Basically, I have a few RNA-seq comparison text files that I made comparing different conditions, and now I want to filter out some of the differentially expressed genes based on another comparison.
Let's say I have two files that are structured the same way: one column has the gene name (called gene_names), another has the log2 fold change values (called log2foldchange), and the rest have the q value stats (I've already filtered all the files so that they have an FDR of less than 0.05 and a log2 fold change greater than 1). File one compares sample A vs B. File two compares C vs B. Initially I used the %in% operator like this:
AvsB_filterout <- AvsB_comparison[ ! AvsB_comparison$gene_names %in% CvsB_comparison$gene_names, ]
But now what I want to do is filter out only the genes which have a higher log2 fold change in C vs B than in A vs B, because I found that a lot of the genes that were filtered out were much more highly expressed in A vs B than in C vs B.
Would anyone know how to rewrite my code so that I'm only filtering out genes which have a higher log2 fold change in C vs B than A vs B? I hope this makes sense.
Thank you so much,
Yonatan
Just now I noticed that condition 1 would not hold in your case, because you did the filtering before that step. If you want to use the boolean operators, you have to do all the filtering in a single go.
Thank you so much for your prompt reply. The problem is that, because these are differentially expressed genes that have already been subsetted based on certain thresholds, the lists of genes are not going to be exactly the same (the naming convention is the same, it's just that there may be differentially expressed genes in one list that aren't in the other).
Is there any way to bypass this issue?
You need to get the original unfiltered files. If you think about it a bit more, you will see that otherwise, your request doesn't make sense at all. Hint, you want to compare stuff that is in A but not in B ... :)
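To make that concrete, here is a minimal base R sketch, not a definitive solution. It assumes you have an unfiltered C vs B table, here given the hypothetical name CvsB_unfiltered, with the same gene_names and log2foldchange columns; the toy values are made up. It also assumes the fold changes can be compared directly (if down-regulated genes matter, you may want abs() around both sides).

```r
# Toy data: column names follow the question, values are invented.
AvsB_comparison <- data.frame(
  gene_names = c("g1", "g2", "g3", "g4"),
  log2foldchange = c(3.0, 1.5, 2.0, 4.0),
  stringsAsFactors = FALSE
)
# Hypothetical unfiltered C vs B table (no FDR or fold-change cutoff applied).
CvsB_unfiltered <- data.frame(
  gene_names = c("g1", "g2", "g3", "g5"),
  log2foldchange = c(1.2, 2.5, 0.1, 3.0),
  stringsAsFactors = FALSE
)

# Look up each A vs B gene's C vs B fold change; NA if the gene is absent.
idx <- match(AvsB_comparison$gene_names, CvsB_unfiltered$gene_names)
l2fc_CvsB <- CvsB_unfiltered$log2foldchange[idx]

# Keep a gene unless its C vs B fold change is higher than its A vs B one.
keep <- is.na(l2fc_CvsB) | AvsB_comparison$log2foldchange >= l2fc_CvsB
AvsB_filterout <- AvsB_comparison[keep, ]
```

Genes that are missing from the C vs B table are kept here, which mirrors what the original %in% filter would have done with them.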
Look into the semi_join and left_join functions in the tidyverse (dplyr). semi_join will give you a df with the rows of the first file whose gene names also appear in the second file; then you can left_join to add the l2fc and p-values from the second file. Then you would apply your filter requiring the A vs B l2fc to be greater than the C vs B l2fc.
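A rough sketch of those steps with dplyr, using made-up toy values and the column names from the question. The suffixes _AvsB and _CvsB and the result names are just illustrative choices, and the anti_join step at the end is an addition of mine for genes that only appear in A vs B.

```r
library(dplyr)

# Toy data: column names follow the question, values are invented.
AvsB_comparison <- tibble(
  gene_names = c("g1", "g2", "g3"),
  log2foldchange = c(3.0, 1.5, 2.0)
)
CvsB_comparison <- tibble(
  gene_names = c("g1", "g2", "g4"),
  log2foldchange = c(1.2, 2.5, 3.0)
)

# Rows of A vs B whose gene also appears in C vs B.
shared <- semi_join(AvsB_comparison, CvsB_comparison, by = "gene_names")

# Add the C vs B columns; the suffixes tell the two fold changes apart.
joined <- left_join(shared, CvsB_comparison,
                    by = "gene_names", suffix = c("_AvsB", "_CvsB"))

# Shared genes with a higher fold change in A vs B than in C vs B.
higher_in_A <- filter(joined, log2foldchange_AvsB > log2foldchange_CvsB)

# Genes found only in A vs B (dropped by semi_join, so collected separately).
only_in_A <- anti_join(AvsB_comparison, CvsB_comparison, by = "gene_names")
```

Whether only_in_A should be added back to the final list depends on what you want to do with genes that have no C vs B value at all.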
Never mind, I just saw your response. Thank you so much!