I'm working on a DESeq2 pipeline and trying to plot some significant genes, selected by padj value. I'm using biomaRt to exchange Ensembl IDs for HGNC symbols. But when I do the exchange I end up with fewer rows, so I can't match the symbols to my column of ENS IDs. I think this is because at least one ENS ID isn't in the biomaRt results for some reason. I plan to replace it manually, but first I have to find which one is missing.
I know I must be missing something basic about how R works, but I'm still relatively new to R and I'm not sure what exactly is going on. Following the formula from https://stackoverflow.com/questions/13774773/check-whether-values-in-one-data-frame-column-exist-in-a-second-data-frame gets me "NULL" when I apply it to my own data:
A$C[!A$C %in% B$C]
[1] 2 # returns all values of A$C that are NOT in B$C
v---my code---v
filter_df_padj$rownames[!filter_df_padj$rownames %in% gns_padj$ensemble_gene_id]
I also tried to assign each column to a vector, then compare as in this guide: https://www.r-bloggers.com/2017/03/match-function-in-r/ and ended up getting "NULL".
Thanks for any help.
v1 <- filter_df_padj$rownames
v2 <- gns_padj$ensemble_gene_id
Try setdiff. Ensure the columns are strings and not factors. Google these terms to understand more.
Are you sure that you mean $ensemble_gene_id, and not $ensembl_gene_id?
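To make both suggestions concrete, here is a minimal sketch on toy data. The gene IDs and symbols are made up, and it assumes the Ensembl IDs are stored as the row names of filter_df_padj (as they usually are for DESeq2 results) rather than in a column called "rownames". That is a guess about your data, so check it with colnames() first. In R, using $ with a column name that doesn't exist returns NULL without an error, which would explain the "NULL" you are seeing.

```r
# Toy stand-ins for the objects in the question (all values are hypothetical).
filter_df_padj <- data.frame(
  padj = c(0.001, 0.002, 0.003),
  row.names = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003")
)
gns_padj <- data.frame(
  ensembl_gene_id = c("ENSG00000000001", "ENSG00000000003"),
  hgnc_symbol = c("GENE1", "GENE3"),
  stringsAsFactors = FALSE
)

# `$` with a name that is not a column returns NULL instead of an error.
is.null(filter_df_padj$rownames)    # TRUE: there is no column called "rownames"
is.null(gns_padj$ensemble_gene_id)  # TRUE: misspelled column name

# Check the real column names before comparing.
colnames(gns_padj)

# IDs present in the results but absent from the biomaRt table.
# as.character() guards against either side being a factor.
ids <- as.character(rownames(filter_df_padj))
missing_ids <- setdiff(ids, as.character(gns_padj$ensembl_gene_id))
missing_ids

# match() lines the symbols up with the IDs, leaving NA where there is no hit.
symbols <- gns_padj$hgnc_symbol[match(ids, gns_padj$ensembl_gene_id)]
symbols
```

With match() the symbol vector has the same length and order as the ID vector, so it can be attached to filter_df_padj directly and the NA entries filled in by hand.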