Hi, I am new to this field and I am having some problems with a new project.
I built a graph from DrugBank data connected to SIDER adverse drug reactions (ADRs). I used organ-level terms to classify the ADRs into 25 groups. I aim to use the Metapath2vec algorithm to cluster drugs based on the groups of ADRs they lead to. I am currently using StellarGraph, a Python package for machine learning on graphs, and its implementation of Metapath2vec.
I chose clustering (e.g. DBSCAN) as the downstream task for the resulting node embeddings, but the results are not promising. I would like to obtain clusters of drugs associated with different adverse drug reactions. Since I am new to this field, everything I am trying is based on the scientific literature, but I don't know if this is the right approach for my objective.
This is the code related to the metapath2vec algorithm:
from stellargraph.data import UniformRandomMetaPathWalk

walk_length = 100  # maximum length of a random walk to use throughout this notebook

# specify the metapath schemas as a list of lists of node types.
metapaths = [
    ["drug", "adr", "drug"],
    ["drug", "adr", "drug", "drug"],
    ["drug", "drug"],
    ["drug", "adr", "group_adr", "adr", "drug"],
]

# Create the random walker
rw = UniformRandomMetaPathWalk(graph)
walks = rw.run(
    nodes=list(graph.nodes()),  # root nodes
    length=walk_length,  # maximum length of a random walk
    n=1,  # number of random walks per root node
    metapaths=metapaths,  # the metapaths
)
from gensim.models import Word2Vec
model = Word2Vec(walks, size=128, window=5, min_count=0, sg=1, workers=2, iter=1)
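As a minimal sketch of the step between the embeddings and the clustering, assuming the goal is to cluster only the drug nodes: the trained model contains a vector for every node type (drug, adr, group_adr), so the drug vectors would need to be selected first, and scaling them to unit length is one common choice before a distance-based method such as DBSCAN. The helper names `select_drug_vectors` and `l2_normalize`, the toy node IDs, and the toy vectors below are hypothetical and are not part of StellarGraph or gensim; in practice the `embeddings` dictionary would be filled from the trained model and `node_types` from the graph.

```python
import math


def select_drug_vectors(embeddings, node_types, target_type="drug"):
    """Keep only the embeddings of nodes whose type equals target_type."""
    return {
        node: vector
        for node, vector in embeddings.items()
        if node_types.get(node) == target_type
    }


def l2_normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy data (hypothetical) standing in for the learned embeddings
# and for the node types stored in the graph.
embeddings = {
    "DB0001": [3.0, 4.0],
    "DB0002": [0.0, 2.0],
    "ADR_1": [1.0, 1.0],
}
node_types = {"DB0001": "drug", "DB0002": "drug", "ADR_1": "adr"}

# Only drug nodes remain, each with a unit-length vector.
drug_vectors = {
    node: l2_normalize(vector)
    for node, vector in select_drug_vectors(embeddings, node_types).items()
}
```

The resulting `drug_vectors` could then be stacked into a matrix and passed to the clustering algorithm, so that ADR and group nodes do not end up in the drug clusters.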
Should I change my approach? What other representation learning algorithms could I use to reach my goal? What can I improve in the approach above to get better node embeddings?