Does anyone have a good idea to combine a data frame with one containing comma-separated values?
Data frame A, which has 2 variables (mouse gene name and Ensemble ID):
head(dfA) # gene EnsembleID # 1 Hoxa9 ENSMUSG00000038227 # 2 Zfp128 ENSMUSG00000060397 # 3 Zfp853 ENSMUSG00000093910 # 4 Nr1h2 ENSMUSG00000060601 # 5 Nr1h3 ENSMUSG00000002108 # 6 Nr1h4 ENSMUSG00000047638
Data frame B has 3 variables (orthogroup, mouse gene Ensemble ID, and zebrafish gene Ensemble ID). Noted that in the picture below the column names are Mus_musculus.GRCm39.pep.all and Danio_rerio.GRCz11.pep.all, respectively.
I want to add a new column for "zebrafish Ensemble ID" into A as referenced information in B. So, the final data I need looks like this:
The point is, in data frame B, both Ensemble IDs' columns contain comma-separated values. This file is originally from the outputs of orthofinder.
If there is only a single value in each row, I understand that we can easily combine those data frames using
merge function in R. The length of data frame A is 1700 (= 1700 genes), so I would like to use some codes instead of search©&paste manually. Any suggestions or ideas would be really appreciated!