What I've built (Link: https://github.com/Bit-2310/TinyVariant)
I've been working on TinyVariant – adapting Samsung's Tiny Recursive Model (TRM) architecture to variant pathogenicity prediction. The core hypothesis: clinical variant interpretation involves iterative evidence synthesis, which might benefit from the same recursive reasoning that lets TRM solve complex logical puzzles.
Early results are encouraging, but I'm hitting the limits of my domain knowledge and need input from the community.
The approach
The architecture maintains TRM's hierarchical reasoning structure (H-level and L-level with multiple refinement cycles) but processes biological features instead of puzzle grids:
Input features (25-token sequences):
- Variant fundamentals: gene, chromosome, alleles, amino acid change, position
- Clinical context: up to 3 phenotype terms (HPO/OMIM)
- Provenance signals: review status, submitter counts, evidence sources
The model performs ~10 reasoning cycles before final classification, allowing it to iteratively integrate different types of evidence.
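The recursive structure can be sketched as follows. This is a minimal illustrative version in NumPy, assuming pooled token embeddings, random weights, and three inner L-level updates per cycle – the layer sizes and update rules here are placeholders, not the actual TinyVariant implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64   # hidden width (illustrative)
T = 25   # tokens per variant

def mlp(w1, w2, h):
    # tiny two-layer update network
    return np.tanh(h @ w1) @ w2

# random stand-in weights for the L-level and H-level update networks
Wl1, Wl2 = rng.normal(0, 0.1, (3 * D, D)), rng.normal(0, 0.1, (D, D))
Wh1, Wh2 = rng.normal(0, 0.1, (2 * D, D)), rng.normal(0, 0.1, (D, D))
w_out = rng.normal(0, 0.1, D)

x = rng.normal(0, 1, (T, D)).mean(axis=0)  # pooled variant-token embeddings
z_L = np.zeros(D)                          # low-level (detail) state
z_H = np.zeros(D)                          # high-level (answer) state

for cycle in range(10):                    # ~10 reasoning cycles
    for _ in range(3):                     # inner L-level refinements
        z_L = mlp(Wl1, Wl2, np.concatenate([x, z_L, z_H]))
    z_H = mlp(Wh1, Wh2, np.concatenate([z_L, z_H]))  # H-level update

p_pathogenic = 1 / (1 + np.exp(-w_out @ z_H))        # final classification
print(float(p_pathogenic))
```

The point of the nesting is that each H-level update sees a freshly refined L-level state, so evidence (phenotypes, provenance, sequence context) can be re-weighed across cycles rather than combined once.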
Current results
Dataset: 100k ClinVar missense variants (80k train / 20k test, balanced)
| Model | Accuracy | ROC AUC |
|---|---|---|
| TinyVariant TRM | 88.3% | 94.5% |
| Logistic regression baseline | 89.6% | 95.9% |
The recursive model trails the logistic regression baseline slightly (by 1.3 accuracy points and 1.4 AUC points) but is competitive while using a fundamentally different computational approach. More importantly, feature ablations show the architecture is learning to integrate contextual information:
- Without phenotypes: 94.2% AUC (-0.3 points)
- Without provenance: 94.4% AUC (-0.1 points)
Why this might matter
1. Different computational paradigm: Most variant effect predictors use direct feature-to-prediction mappings. Recursive refinement could capture interactions that linear/ensemble methods miss.
2. Context integration: The ablation results suggest the model is actually using clinical phenotypes to refine predictions, not just memorizing variant patterns.
3. Room for growth: I'm working with minimal features compared to established tools. There's significant headroom if domain experts can point me toward the right signals.
4. Interpretability potential: The multi-cycle reasoning structure could expose how the model integrates evidence, not just what it predicts.
Where I need help
I'm confident the architecture works, but I'm not a clinical genomics expert. Specific questions:
1. Feature engineering priorities
Currently using basic position/context + phenotypes + provenance. What would move the needle most?
- Conservation scores (PhyloP, phastCons, GERP++ RS)?
- Population data (gnomAD AF, homozygote counts)?
- Protein structure context (AlphaFold confidence, domain annotations)?
- Protein language model embeddings (ESM-2, ProtBERT)?
- Functional annotations (GO terms, pathway membership)?
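For any of these, one low-friction integration path is to bin continuous annotations into extra vocabulary tokens appended to the existing 25-token encoding. A hypothetical sketch – every bin name, threshold, and dict key below is illustrative, not TinyVariant's actual scheme:

```python
import math

def annotation_tokens(variant):
    toks = []
    # Conservation: coarse bins over a PhyloP-style score
    phylop = variant.get("phylop")
    if phylop is not None:
        toks.append("CONS_HIGH" if phylop > 2.0
                    else "CONS_LOW" if phylop < 0.0 else "CONS_MID")
    # Population data: log10-binned gnomAD allele frequency
    af = variant.get("gnomad_af")
    if af is not None:
        toks.append("AF_ABSENT" if af == 0
                    else f"AF_1E{max(-6, math.floor(math.log10(af)))}")
    # Structure: AlphaFold pLDDT confidence bucket
    plddt = variant.get("plddt")
    if plddt is not None:
        toks.append("STRUCT_CONF" if plddt >= 70 else "STRUCT_DISORDER")
    return toks

print(annotation_tokens({"phylop": 3.1, "gnomad_af": 3e-5, "plddt": 88.0}))
```

Binning keeps the tokenized input format intact; continuous embeddings of raw scores (or ESM-2-style embedding vectors) would need a separate projection pathway into the model.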
2. Evaluation gaps
What matters beyond ROC AUC? Should I be looking at:
- Performance on specific gene families or functional classes?
- Rare variant handling vs common variants?
- Disagreement with expert review panels?
- Calibration on edge cases?
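On the calibration point specifically, a simple reliability check complements ROC AUC: bin predicted probabilities and compare each bin's mean prediction to its observed pathogenic fraction. A sketch on synthetic (perfectly calibrated) toy data – real TinyVariant outputs would replace `p` and `y`:

```python
import numpy as np

def calibration_table(y_true, p_pred, n_bins=10):
    # per-bin (count, mean predicted prob, observed positive fraction)
    edges = np.linspace(0, 1, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p_pred >= lo) & (p_pred < hi)
        if mask.sum():
            rows.append((mask.sum(), p_pred[mask].mean(), y_true[mask].mean()))
    return rows

rng = np.random.default_rng(2)
p = rng.uniform(0, 1, 5000)                      # toy predicted probabilities
y = (rng.uniform(0, 1, 5000) < p).astype(int)    # labels drawn to match p

for n, conf, frac in calibration_table(y, p):
    print(f"n={n:4d}  mean_pred={conf:.2f}  observed={frac:.2f}")
```

Large gaps between `mean_pred` and `observed` in the uncertain middle bins (0.3–0.7) are exactly where a clinical-grade predictor's probabilities stop being trustworthy, even when AUC looks fine.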
3. Benchmark comparisons
Which tools represent the current state-of-the-art for this task?
- REVEL (ensemble meta-predictor)
- AlphaMissense (AlphaFold + pathogenicity)
- EVE (evolutionary models)
- BayesDel, MutPred, others?
4. Clinical utility questions
Where do current tools struggle most? What types of variants are hardest to classify? Are there specific scenarios where iterative reasoning might provide unique value?
Next steps
I have working infrastructure and a promising baseline. What I need is domain knowledge to:
- Identify the most informative features to add
- Understand where current methods fail
- Design evaluation strategies that matter clinically
If you've worked on variant effect prediction, have thoughts on recursive architectures in biology, or just know what features actually matter – I'd really value your input.
Bottom line: 94.5% AUC on ClinVar missense classification with minimal features and a novel architecture. The approach works – now I need help making it work well.