I have a dataset of genes associated with drugs derived from DrugBank. I wish to simply translate all the drugbank IDs to drug names readable by a human.
The drugbank vocabulary (.csv) looks like this: [DBvocabulary.csv]
DrugBank.ID,Common.name DB00001,Lepirudin DB00002,Cetuximab DB00003,Dornase alfa DB00004,Denileukin diftitox DB00005,Etanercept DB00006,Bivalirudin
My dataset (.csv) has 15 columns but the important ones are:
Gene.Name,DrugBank.ID F8,DB09130 TCN2,DB00200 LDLR,DB09270; DB11251; DB14003 ALB,DB00070; DB00137; DB00159; DB00162; DB00214
As you can see some genes are linked to multiple or even hundreds of drugs. the multiple Drug IDs are separated by semicolons, in the same comma-delimited "column" The R studio match or merge function only work for the first identifier in each column, thus effectively deleting the remainder in the same column "cell".
Any advice is welcome, thanks in advance!