M.Sc. Candidate @ HUJI (Tabach Lab)

Shalev Yaacov


Decoding the "missing heritability" in Inherited Retinal Diseases (IRDs). I design and develop computational frameworks that integrate Normalized Phylogenetic Profiling (NPP), an evolutionary co-signal measured across ~2,000 eukaryotic species, with multi-omics data to uncover novel disease-causing genes.

The Core Method

Normalized Phylogenetic Profiling (NPP)

I turn the hypothesis that genes functioning together share evolutionary histories into a predictive model. I construct phylogenetic profiles across ~2,000 eukaryotic species to identify co-evolving modules and prioritize candidate genes for unsolved IRD cases.
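The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's pipeline: it assumes a small gene-by-species matrix of conservation scores (the toy values and the names `normalize_profiles` and `top_coevolving` are hypothetical), z-scores each species column, and ranks genes by the Pearson correlation of their profiles with a query gene.

```python
import numpy as np

def normalize_profiles(scores):
    """Z-score each species column so profiles are comparable across species."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean(axis=0)
    std = scores.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (scores - mean) / std

def top_coevolving(profiles, gene_names, query, k=3):
    """Rank genes by Pearson correlation of their profile with the query gene."""
    corr = np.corrcoef(profiles)  # gene-by-gene correlation matrix
    q = gene_names.index(query)
    order = np.argsort(-corr[q])  # highest correlation first
    return [(gene_names[i], float(corr[q, i])) for i in order if i != q][:k]

# Toy data (invented for illustration): 4 genes x 6 species.
genes = ["geneA", "geneB", "geneC", "geneD"]
scores = [
    [0.9, 0.8, 0.1, 0.9, 0.1, 0.8],
    [0.8, 0.9, 0.2, 0.8, 0.1, 0.9],
    [0.1, 0.2, 0.9, 0.1, 0.8, 0.2],
    [0.5, 0.4, 0.6, 0.5, 0.5, 0.4],
]
z = normalize_profiles(scores)
ranked = top_coevolving(z, genes, "geneA", k=3)
print(ranked)
```

In this toy matrix, geneB tracks geneA across species and therefore ranks first, while geneC and geneD follow the opposite pattern and correlate negatively.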

Figure: Human Genes Phylogenetic Signature (legend: Preserved, Hotspot / Lost, Absent)

Discovery Mode

The algorithm detects "hotspots" (shown in red) where groups of genes are lost or gained together. These anomalies often point to specific disease mechanisms in IRDs.

Input: Phylogenetic profiles of 20,000 genes across 1,905 eukaryotic species
Compute: Machine-learning models (automated pattern recognition) detect co-evolving gene modules
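As a rough illustration of the Compute step, the sketch below groups genes into modules by clustering their profiles. The page does not name the models used, so average-linkage hierarchical clustering on correlation distance is an assumption chosen for simplicity. The function name `coevolving_modules`, the 0.5 distance cutoff, and the binary toy profiles are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def coevolving_modules(profiles, gene_names, max_distance=0.5):
    """Group genes whose profiles are correlated (distance = 1 - Pearson r)."""
    tree = linkage(profiles, method="average", metric="correlation")
    labels = fcluster(tree, t=max_distance, criterion="distance")
    modules = {}
    for gene, label in zip(gene_names, labels):
        modules.setdefault(int(label), []).append(gene)
    # Keep only clusters with at least two genes; singletons are not modules.
    return [sorted(members) for members in modules.values() if len(members) > 1]

# Toy presence (1) / absence (0) profiles across 8 species, invented for illustration.
names = ["g1", "g2", "g3", "g4", "g5"]
profiles = np.array([
    [1, 1, 0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
], dtype=float)

modules = coevolving_modules(profiles, names)
print(modules)
```

With these toy profiles, g1 and g2 share one loss pattern and g3 and g4 share another, so they form two modules, while g5 stays unassigned.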

/// Selected Projects

Below are selected projects where I designed and built computational pipelines and visual tools for exploratory genomic analysis.

R / Phylogenetics

Cilia Cluster Analysis

  • Built: Computational framework for identifying co-evolutionary modules.
  • Purpose: Integrate evolutionary profiles (NPP data) with functional annotations to detect cilia-related signals.
  • Output: Prioritized list of putative ciliary candidates from large genomic clusters.
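One plausible way to combine a co-evolution cluster with functional annotations, as the bullets above describe, is an over-representation test. The sketch below is an assumption, not the project's R code: it uses a hypergeometric test to ask whether a cluster holds more known cilia genes than chance would predict, then lists the unannotated members as candidates. All gene identifiers and the background size are placeholders.

```python
from scipy.stats import hypergeom

def cilia_enrichment(cluster, known_cilia, background_size):
    """Return (p_value, candidates) for one co-evolution cluster.

    Assumes every gene in `known_cilia` belongs to the background set.
    """
    cluster = set(cluster)
    known_cilia = set(known_cilia)
    hits = cluster & known_cilia
    # P(X >= len(hits)) when drawing len(cluster) genes from the background.
    p_value = hypergeom.sf(len(hits) - 1, background_size,
                           len(known_cilia), len(cluster))
    # Cluster members without a cilia annotation become putative candidates.
    candidates = sorted(cluster - known_cilia)
    return float(p_value), candidates

# Placeholder identifiers, invented for illustration.
known = [f"cilia_{i:02d}" for i in range(1, 21)]   # 20 annotated cilia genes
cluster = known[:8] + ["gene_x", "gene_y"]          # 8 annotated + 2 unannotated
p_value, candidates = cilia_enrichment(cluster, known, background_size=1000)
print(p_value, candidates)
```

A small p-value suggests the cluster carries a cilia-related signal, which makes its unannotated members worth a closer look.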
Python / Machine Learning

IRD Phenotype Integration

  • Designed: Machine-learning integrator bridging genotype and phenotype.
  • Purpose: Use Human Phenotype Ontology (HPO) terms and machine-learning models to match candidate genes to disease phenotype profiles.
  • Output: Ranked candidates prioritized by evolutionary and phenotypic similarity.
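The ranking described above can be illustrated with a deliberately simple scoring rule. This sketch is not the trained integrator: a fixed weighted sum stands in for the machine-learning model, phenotypic similarity is taken as the Jaccard overlap between phenotype term sets, and the term labels, gene names, and the `weight` parameter are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of phenotype terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(candidates, disease_terms, weight=0.5):
    """Rank genes by a weighted sum of evolutionary and phenotypic similarity.

    candidates: {gene: (evolutionary_score in [0, 1], set of phenotype terms)}
    """
    scored = []
    for gene, (evo_score, terms) in candidates.items():
        pheno_score = jaccard(terms, disease_terms)
        scored.append((gene, weight * evo_score + (1 - weight) * pheno_score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Placeholder term labels and scores, invented for illustration.
disease_terms = {"T1", "T2", "T3"}
candidates = {
    "gene_a": (0.9, {"T1", "T2", "T3"}),  # strong on both axes
    "gene_b": (0.9, {"T9"}),              # co-evolves, but no phenotype overlap
    "gene_c": (0.2, {"T1", "T2"}),        # phenotype overlap, weak co-evolution
}
ranked = rank_candidates(candidates, disease_terms)
print(ranked)
```

A gene that scores well on both axes rises to the top, while a gene supported by only one line of evidence is ranked lower.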
R / Data Viz

High-Dimensional Viz

  • Developed: Custom pipelines for large-scale phylogenetic matrices.
  • Purpose: Keep very large matrices readable using ComplexHeatmap (an R/Bioconductor heatmap package).
  • Output: Publication-quality figures for inspecting 20,000 × 2,000 matrices without visual clutter.
Python / Algorithms

Genomic Segmentation (LBS)

  • Implemented: Local Barcode Segmentation (LBS) algorithm.
  • Purpose: Filter phylogenetic noise and detect robust evolutionary signals (barcodes).
  • Output: Cleaned phylogenetic profiles highlighting disjoint genomic regions.

/// Technical & Wet Lab Toolkit

Development & Scripting

Python (Pandas, Scikit-learn): Proficient
R (Tidyverse, Bioconductor): Proficient
Git & Version Control: Working-level
Linux/UNIX & Shell Scripting: Working-level

Data Science & Analytics

Machine Learning (ML): Naïve Bayes, Random Forest, supervised and unsupervised learning
Multi-Omics Integration: Transcriptomics, PPI networks, phenomics (HPO), NPP
Phylogenetic Algorithms: Normalized Phylogenetic Profiling (NPP), co-evolution clustering

Wet Lab Expertise

My computational work is grounded in hands-on wet-lab experience in molecular biology and protein engineering. I’ve worked on recombinant expression of bovine lactoferrin and human keratin using both E. coli and Arabidopsis systems, with practical experience in cloning, transgenic plant transformation and selection, FPLC purification, qPCR, and ELISA.

This work was carried out in collaboration with Miruku in Prof. Oded Shoseyov’s lab, and included co-authoring a book chapter on molecular farming in "Alternative Dairy Products and Technologies".

Molecular Cloning
FPLC Purification
qPCR & ELISA
Plant Transformation

B.Sc. Research Project

Octopus Motor Learning

A behavioral research project on octopus motor learning, where I designed and analyzed experiments to study adaptive learning strategies — an early foundation for my interest in data-driven analysis of complex biological systems.