Best way to map biological pathways to cancer hallmarks using PLMs (without building models)?
7 weeks ago
DEPANSHI • 0

Hi everyone,

I’m working on a project where I need to map biological pathways (from KEGG, Reactome, etc.) to the cancer hallmarks (Hanahan & Weinberg). I don’t have gene expression or omics data, and I’m not trying to build ML/DL models from scratch, but I’m open to using pretrained language models if there are existing workflows or tools that can help.

Are there tools or notebooks that use PLMs to compare text (e.g., pathway descriptions vs. hallmark definitions) or something similar?

I’m from a biology background and have some bioinformatics knowledge, so I’m looking for something I can plug into without deep ML coding.

Thanks for any tips or pointers!

PLM mapping LLM extraction Relation • 591 views
15 days ago
Kevin Blighe ★ 90k

You can explore the Cancer Hallmarks Analytics Tool (CHAT), which employs text mining to categorize scientific literature according to the Hanahan and Weinberg cancer hallmarks. Although designed for PubMed abstracts, you may input pathway descriptions from KEGG or Reactome to classify them into hallmarks. Access it at http://chat.lionproject.net/.

For pretrained language models, BioBERT is suitable for computing semantic similarity between pathway descriptions and hallmark definitions without requiring deep machine learning coding. Retrieve pathway descriptions using Biopython's Bio.KEGG or Reactome APIs, then use the Hugging Face Transformers library to generate embeddings and calculate cosine similarity.
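
To make the two operations named above concrete, here is a toy, dependency-free sketch of mean pooling and cosine similarity on made-up 2-dimensional vectors. The function names (mean_pool, cosine_similarity, rank_hallmarks) and the labels hallmark_A and hallmark_B are hypothetical and only for illustration; real BioBERT embeddings have hundreds of dimensions and come from the model, not from hand-written lists.

```python
import math

def mean_pool(token_vectors, mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_hallmarks(pathway_vec, hallmark_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {name: cosine_similarity(pathway_vec, vec)
              for name, vec in hallmark_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy "token embeddings" for one pathway description; the third row is padding.
pathway_tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
pathway_vec = mean_pool(pathway_tokens, mask=[1, 1, 0])

# Toy "hallmark embeddings" (hypothetical labels, not real hallmark vectors).
hallmark_vecs = {"hallmark_A": [1.0, 1.0], "hallmark_B": [1.0, -1.0]}

ranking = rank_hallmarks(pathway_vec, hallmark_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The pooled pathway vector points in the same direction as hallmark_A and is orthogonal to hallmark_B, so hallmark_A ranks first. With real embeddings the scores tend to sit close together, so the ranking is usually more informative than the absolute values.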

Here is a basic Python notebook outline you can adapt:

# Install required packages (run once)
!pip install transformers biopython torch scipy

import torch
from transformers import AutoTokenizer, AutoModel
from scipy.spatial.distance import cosine
from Bio.KEGG import REST  # For KEGG example

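# Load BioBERT (the dmis-lab checkpoint on the Hugging Face Hub is named "dmis-lab/biobert-v1.1")
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-v1.1")
model.eval()  # inference mode (disables dropout)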

# Define cancer hallmarks (example; expand with full definitions)
hallmarks = {
    "Sustaining proliferative signaling": "Cancer cells acquire the capability to sustain proliferative signaling...",
    # Add other hallmarks here
}

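# Function to get a mean-pooled embedding for one text
def get_embedding(text):
    # BERT-style models accept at most 512 tokens, so longer texts are truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # Average the token vectors into one vector of shape (1, hidden_size)
    return outputs.last_hidden_state.mean(dim=1).numpy()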

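# Example: Fetch KEGG pathway description
pathway_id = "hsa05200"  # Pathways in cancer
record = REST.kegg_get(pathway_id).read()

# KEGG flat files put the field name in the first 12 columns;
# continuation lines of a field leave those columns blank.
desc_lines = []
in_description = False
for line in record.splitlines():
    field = line[:12].strip()
    if field:
        in_description = (field == "DESCRIPTION")
    if in_description:
        desc_lines.append(line[12:].strip())

# Some entries have no DESCRIPTION field; fall back to the NAME line (second line)
pathway_desc = " ".join(desc_lines) or record.splitlines()[1][12:].strip()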

# Compute similarities
pathway_emb = get_embedding(pathway_desc)
similarities = {}
for name, desc in hallmarks.items():
    hall_emb = get_embedding(desc)
    similarities[name] = 1 - cosine(pathway_emb.flatten(), hall_emb.flatten())

print(similarities)

To adapt this script, fill in the remaining hallmark definitions and loop over your own pathway IDs. Running it in Google Colab avoids a local install. For Reactome, use the Reactome Content Service API instead of Bio.KEGG.
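
For the Reactome route, a minimal sketch could look like the following. It assumes the Content Service exposes entries at the URL pattern below and that a pathway's free-text description sits in a "summation" list of objects with a "text" field; please check both against the current Reactome documentation. The helper names fetch_reactome_entry and summation_text are hypothetical, and the sample entry is made up so the parsing step can be tried offline.

```python
import html
import json
import re
import urllib.request

# Assumed endpoint pattern for the Reactome Content Service (verify before use)
REACTOME_QUERY_URL = "https://reactome.org/ContentService/data/query/{}"

def fetch_reactome_entry(stable_id):
    """Download one Reactome entry as a dict (needs network access)."""
    with urllib.request.urlopen(REACTOME_QUERY_URL.format(stable_id)) as response:
        return json.load(response)

def summation_text(entry):
    """Join the summation texts of an entry and strip HTML markup."""
    parts = [item.get("text", "") for item in entry.get("summation", [])]
    text = " ".join(parts)
    text = re.sub(r"<[^>]+>", " ", text)  # drop HTML tags
    text = html.unescape(text)            # decode entities such as &amp;
    return " ".join(text.split())         # collapse runs of whitespace

# Made-up entry in the assumed response shape, for an offline check
sample_entry = {
    "displayName": "Example pathway",
    "summation": [
        {"text": "<p>Cells divide &amp; grow.</p>"},
        {"text": "Second   note."},
    ],
}

pathway_desc = summation_text(sample_entry)
print(pathway_desc)
```

With network access, summation_text(fetch_reactome_entry(stable_id)) for a Reactome stable ID of your choice would give a pathway_desc string that can be passed to get_embedding in place of the KEGG description.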

Kevin
