Shalev Yaacov

Decoding the missing heritability of vision.

A note from Shalev

I'm a computational biology researcher building tools that read evolution like a manuscript, looking for the genes that scribble themselves into species together, and the ones that vanish in the same paragraph. My work focuses on Inherited Retinal Diseases: 40% of patients still receive no genetic diagnosis. The rest is a search problem.

~2,000 eukaryotic species
20,000 human genes profiled
≈40% IRD cases without diagnosis
4,254+ clinical cases validated
Phylogenetic Profiling · Multi-Omics Integration · Machine Learning · Inherited Retinal Disease · Co-evolution · HPO Phenotype Mapping
Where is the missing half hiding?

For ≈40% of patients with inherited retinal disease, modern sequencing returns a verdict of unsolved. The mutations are there, somewhere; we are not yet asking the right question.

Most diagnostic pipelines read a gene the way you'd read a single sentence: does it spell a known disease? But genes don't function alone. They co-evolve in modules across hundreds of millions of years, leaving a graded fingerprint of conservation and loss across the tree of life. That fingerprint is where I look.

"Genes that travel together work together."
Working principle of normalized phylogenetic profiling

So I build the second look: computational frameworks that compare ~2,000 eukaryotic genomes at once, normalize evolutionary signal against phylogenetic distance, integrate transcriptomics and HPO phenotypes, and rank candidates whose evolutionary biography matches the disease. The output isn't a final answer; it's a much shorter, much sharper list of suspects to take to the bench.

A.
Read evolution

Build a continuous similarity profile for every human gene across ~2,000 eukaryotic genomes.

B.
Find the rhymes

Detect genes whose normalized profiles co-vary across clades using NPP correlations.

C.
Match the phenotype

Layer HPO ontology + transcriptomics to score candidates bidirectionally, gene↔phenotype.
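
To make step A concrete, here is a minimal sketch of one way to turn raw similarity scores into a normalized profile matrix. It is an illustration under stated assumptions, not the pipeline itself: the function name `normalize_profiles`, the `floor` parameter, the use of best-hit scores scaled by each gene's self-hit, and the per-species z-score are all choices made for this example. The idea is that a distant species scores low for almost every gene, so standardizing each species column removes that shared baseline and leaves the gene-specific signal.

```python
import numpy as np

def normalize_profiles(bitscores, self_scores, floor=1e-3):
    """Turn raw best-hit scores into column-standardized profiles (illustrative).

    bitscores   : (n_genes, n_species) best-hit similarity of each human gene in each species
    self_scores : (n_genes,) similarity of each human gene against itself
    floor       : hypothetical lower bound so that absent hits do not give log2(0)
    """
    bitscores = np.asarray(bitscores, dtype=float)
    self_scores = np.asarray(self_scores, dtype=float)

    # Scale by the self-hit so long and short genes are comparable.
    ratio = np.clip(bitscores / self_scores[:, None], floor, 1.0)
    logged = np.log2(ratio)

    # Standardize each species column: subtracting the column mean removes
    # the baseline drop in similarity that comes from evolutionary distance.
    mean = logged.mean(axis=0, keepdims=True)
    std = logged.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # guard against constant columns
    return (logged - mean) / std

# Toy input (random numbers, not real data): 6 genes x 4 species.
rng = np.random.default_rng(0)
raw = rng.uniform(10, 500, size=(6, 4))
self_hits = raw.max(axis=1) + 50
npp = normalize_profiles(raw, self_hits)
```

Each row of `npp` is then a continuous profile for one gene, and each column has mean 0 and unit variance across genes.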

A Normalized Phylogenetic Profile, dissected.

Each row below is a gene. Each column is a species, sorted left-to-right by evolutionary distance. Cell intensity reflects a continuous conservation score, normalized against phylogenetic distance, and a vermilion column marks a clade where related genes' scores drop in concert: a co-evolutionary signature.

NPP_MATRIX · 13 genes × 30 species (preview)
Legend: High similarity · Co-loss signature · Low / absent
A: "IRD cluster", correlated drop in vertebrate clade
B: "Cilia cluster", independent co-variation elsewhere
C: Noise, no aligned signal
How NPP reads this
  1. Map each gene's orthologs across ~2,000 eukaryotic genomes.
  2. Score conservation as a continuous similarity, normalized against evolutionary distance.
  3. Cluster genes whose normalized profiles correlate across clades.
  4. Surface clusters enriched for IRD-related function (450+ known IRD genes as anchors).
  5. Re-rank with HPO + transcriptomics; validate against real-world variant cohorts.
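
Steps 3 and 4 can be sketched in a few lines. This is a simplified stand-in, assuming Pearson correlation as the similarity measure and a plain mean over anchor genes as the ranking score; the function name `rank_by_anchor_correlation`, the gene labels, and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def rank_by_anchor_correlation(profiles, gene_names, anchor_names):
    """Rank non-anchor genes by mean Pearson correlation with anchor genes."""
    profiles = np.asarray(profiles, dtype=float)
    corr = np.corrcoef(profiles)  # rows are genes, so this is gene x gene
    index = {name: i for i, name in enumerate(gene_names)}
    anchors = set(anchor_names)
    anchor_idx = [index[a] for a in anchor_names]

    scores = {
        g: float(corr[index[g], anchor_idx].mean())
        for g in gene_names
        if g not in anchors
    }
    # Highest mean correlation first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy normalized profiles (made-up values): 4 genes x 5 species.
profiles = [
    [ 1.0,  0.8, -1.2, -1.0,  0.9],  # ANCHOR_1 (stands in for a known IRD gene)
    [ 0.9,  1.0, -1.0, -1.1,  0.8],  # ANCHOR_2
    [ 1.1,  0.7, -0.9, -1.3,  1.0],  # CAND_A: rises and falls with the anchors
    [-0.5,  1.2,  0.9, -0.2, -1.0],  # CAND_B: no aligned signal
]
names = ["ANCHOR_1", "ANCHOR_2", "CAND_A", "CAND_B"]
ranking = rank_by_anchor_correlation(profiles, names, ["ANCHOR_1", "ANCHOR_2"])
```

In this toy case `CAND_A` ranks first because its profile drops in the same species as the anchors, while `CAND_B` ends up with a negative score. A real analysis would cluster thousands of profiles and then apply the re-ranking and validation in step 5, which this sketch does not attempt.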
Glossary · for non-biologists
Phylogenetic profile

A continuous vector of normalized similarity scores, one per species, describing how conserved a gene is across the tree of life.

Co-evolution

Two genes co-evolve when their normalized profiles rise and fall together across clades, suggesting shared function.

HPO

Human Phenotype Ontology, a controlled vocabulary describing patient symptoms, used to link genes to organs.

Field notebook.

Five built & shipped projects.

What's in the drawer.

A pragmatic mix of computational and wet-lab craft. The dry side runs the analyses; the wet side keeps me honest about what biology actually does when you ask it a question.

Languages & Environments
A
python pandas · numpy · scikit-learn · biopython ●●●●○
R tidyverse · Bioconductor · ComplexHeatmap · gganatogram ●●●●○
shell bash · awk · cluster job submission ●●●○○
git version control · code review ●●●○○
Analytical Methods
B
Machine learning
Naïve Bayes · Random Forest · XGBoost · supervised + unsupervised
Phylogenetics
NPP · ortholog mapping · co-evolution clustering
Multi-omics
transcriptomics · PPI networks · phenomics (HPO) · real clinical data
Visualization
heatmaps · anatomograms · publication figures
Wet-lab craft
C

Before the matrices, there were pipettes. Recombinant expression of bovine lactoferrin and human keratin in E. coli and Arabidopsis, in collaboration with Miruku at Prof. Oded Shoseyov's lab – including a co-authored chapter in Alternative Dairy Products and Technologies.

cloning
FPLC
qPCR
ELISA
plant transformation
selection
protein purification
recombinant expression
Origin story
D
B.Sc. · Octopus motor learning
B.Sc. Marine Biotechnology, internship in Marine Agriculture, Ruppin Academic Center · GPA 91/100

Designed and analyzed behavioral experiments on octopus learning strategies. Where my taste for data on stubborn, complex biological systems began.

Databases & Tools
E
Clustal Omega
BLAST
PyMOL
PDB
AlphaFold
Rummagene
FUMA
EnrichR
Cytoscape
STRING
UniProt
GeneCards
OMIM
ClinVar
gnomAD
UCSC Genome Browser
NCBI
Ensembl
Seurat
Scanpy
AI-Integrated Workflows
F
  • Claude Code, Cursor, Codex, Google Labs (research-grade use for pipeline development and scientific reasoning)
  • Custom agent and prompt engineering
  • Local LLM inference via Ollama

Let's talk biology and data.

Open to research collaborations, R&D roles in computational biology / machine learning, and conversations with curious people from any discipline.

© 2026 Shalev Yaacov · Hebrew University of Jerusalem
All matrices are illustrative. Real findings live in the manuscripts.
Continuously evolving.