Presentation - Data Scientist

Machine Learning & Data Science Professional with 6+ years of experience delivering predictive models, statistical frameworks, and analytics platforms in highly regulated environments. Proven success developing, deploying, and scaling ML solutions for financial forecasting, compliance, and operational risk. Adept at cross-functional collaboration with product, engineering, and treasury teams to translate data into trusted decision-making tools. Passionate about building resilient, secure, and high-impact AI systems in mission-driven organizations.

Work Experience

GenHealth AI

Boston, MA — 2024 - Present

Senior Data Scientist

  • Led ML efforts at GenHealth.ai as the sole data scientist; owned model development, evaluation, large-scale inference, and architecture design.
  • Benchmarked transformer-based LLMs against industry risk/cost prediction models; improved accuracy by 14.1%.
  • Co-authored Large Medical Model (2024); led fine-tuning for mortality, chronic disease, and cost prediction using longitudinal claims data.
  • Architected scalable, distributed MLM inference across AWS/Databricks/NVIDIA clusters; simulated 100M+ synthetic EHRs.
  • Developed ETL platform and FastAPI ingestion pipeline supporting OCR, auth, and orchestration.
  • Automated Prior Authorization; integrated GPT chatbot modules for internal and HCP-facing use.
  • Designed actuarial forecasting models; prevented $17M+ in projected losses by aligning model outputs with CMS and revenue risk.
  • Partnered cross-functionally with engineering, GTM, and actuarial teams to align models with business strategy.

Foghorn Therapeutics

Cambridge, MA — 2021 - 2024

Computational Biologist

  • Owned company-wide ATAC-seq data pipeline using Snakemake; deployed across AWS EC2/HPC environments.
  • Implemented non-parametric IDR estimation with FHT-nIDR; improved reproducibility and QC standards.
  • Built PCA/Mahalanobis-based multivariate QC systems to detect mislabeling and outliers in cell identity data.
  • Developed statistical linkage framework between ATAC-seq and RNA-seq; built visualization dashboard in Shiny.
  • Applied Cox models and Kaplan-Meier estimators for survival analysis in chromatin studies.
  • Led WGS-based ML modeling to detect non-coding regions driving genetic dependencies in cancer.
  • Mentored junior scientists on reproducible pipelines and statistical coding best practices.

Interactive Brokers, Greenwich CT

June – December 2020

Software Engineer in the Compliance team

  • Appointed as Lead Developer to design and develop a cloud-based management platform for international vendor contracts and policies and procedures, working with teams spread across different time zones.
  • Programmed in Python and Microsoft Power Automate (comparable to Azure Logic Apps) to build complex integration solutions and automate workflow services.
  • Partnered with Microsoft Programmers to integrate company procedures with new Microsoft tools.
  • Worked alongside the Compliance team to create new technical specifications to expedite their daily work.
  • Validated the proof of concept (POC) by testing and successfully applying the designed platforms.
  • Created documentation and training curriculum for users of platforms.

University of Massachusetts, Amherst MA

Fall 2019 – Spring 2021

Teaching Assistant in Mathematics and Statistics

  • Graduate Level Statistics, Elementary Statistics, Linear Methods and Probability for Business.

Dimeo Construction Company, Boston MA

Summer 2019

Estimator

  • Quantified construction costs and the financial impact of potential project failures and cost overruns.
  • Supervised change orders and assisted in quality assurance and quality control of Architecture Code.

Initiatives, France

Summer 2018

Portfolio Manager

  • Provided research and analysis to portfolio managers and other investment professionals in support of a broad range of investment solutions.
  • Evaluated changes in portfolio behavior, decision engine logic and pricing parameters.
  • Analyzed the appropriate level of risk based on the client's time horizon, risk preferences, and return expectations.

Madrivo, Tel Aviv Israel

Summer 2017

Data Analyst in Programmatic

  • Ran SQL queries to extract data and provide quantitative analysis to support investment decisions aimed at maximizing risk-adjusted total return for online display advertising.
  • Supervised a team based in India to leverage Real Time Bidding programmatic buying and selling methods.
  • Implemented algorithms in Python and VBA to target consumers, optimize campaign performance, and maximize returns.

Education

University of Massachusetts Amherst

May 2021

Master of Statistics and Data Science     GPA: 4.00/4.00

Neural Network, Machine Learning, Data Visualization and Exploration, Biomed & Health Data Analysis, Survival Analysis, Data Analysis, Regression Modeling, Categorical Data Analysis, Statistical Inference, Applied Multivariate Statistics, Algorithms for Data Science, Stochastic Processes

Certificate of Statistical and Computational Data Science – Computer Science Department   GPA: 4.00/4.00

PSL (Paris Sciences et Lettres) University

May 2019

Master of Actuarial Sciences – 1st year

Risk Management, Market Finance, Portfolio Management, Actuarial Sciences, Macro/Microeconomics

Paris-Dauphine University/PSL University

May 2018

Bachelor of Applied Mathematics

Discrete Processes, Topology, Differential Geometry, Functional Analysis, Time Series, Lebesgue Theory, Estimation and Test Statistics, Bayesian Theory, Non-param and High Dimension Statistics, Data Analysis

Certificate: Internet and Computer – data manipulation language and digital rights

Key Skills

  • Programming & Scripting: Python, SQL, Bash, R, Git
  • Quantitative Methods: Survival analysis, Statistical Inference, Time Series forecasting, Monte Carlo simulation, Bayesian inference
  • ML & GenAI: Transformers, LLMs, GenAI, PyTorch, TensorFlow, safetensors, scikit-learn
  • Scientific Computing: ATAC-seq, RNA-seq, Snakemake, Bioconductor, statsmodels, experimental QC
  • MLOps & Cloud: FastAPI, Docker, model serving, AWS, Snowflake, Azure, multi-GPU
  • Data Engineering: Polars, Snakemake, SQLAlchemy, PostgreSQL, OCR
  • Visualization & BI: Tableau, Power BI, interactive dashboards (Shiny), D3.js
  • Languages: English (Fluent), French (Native), Spanish (Advanced), Chinese (汉语水平考试三 – HSK3, Intermediate)

Projects

  • Distributed Data Generation Platform: Built a high-throughput data pipeline and inference system that generated over 100M structured synthetic records using masked language modeling. Scaled across multi-GPU AWS infrastructure (H100/A100) with 10T+ token throughput, demonstrating production-scale ETL, logging, and orchestration.
  • Integrated Forecasting & Analytics Engine: Designed a forecasting platform that combined transformer-based outputs with financial and compliance data models. Enabled regulatory bid simulation, scenario planning, and financial reporting, resulting in avoidance of $17M+ in projected losses.
  • Survival Analysis: Partnered with medical professionals from the Hospital of the University of Pennsylvania (HUP) to study decannulation after tracheotomy using survival analysis. Compared Kaplan-Meier curves, the Cox Proportional Hazards (Cox-PH) model, and the Accelerated Failure Time (AFT) model in RStudio.
  • Batch-Effect Measurement: Compared the kBET method with other test metrics for detecting batch effects in single-cell RNA-seq data. Used these metrics to quantify the performance of batch-effect removal methods such as ComBat, Seurat’s canonical correlation analysis, and mutual nearest neighbors projection. Compared results on two simulated datasets ranging from mild to strong batch effects.
  • Worldwide Trends in Covid-19: Created an interactive visualization combining linked charts (a choropleth map, a bar chart, etc.) to illustrate trends in Covid-19 since January 1, 2020, using JavaScript (D3.js v6) for plots and Leaflet with Mapbox for maps.
  • Gender Transformation of Faces Using CycleGAN: Implemented a model that performs gender swaps on face images from an unpaired dataset using a Cycle-Consistent Generative Adversarial Network (CycleGAN).
  • NMF - Cosine Movie Recommender: Developed a movie recommendation algorithm combining collaborative filtering based on Non-negative Matrix Factorization (NMF, with the Surprise library in Python) and content-based filtering based on cosine similarity (with scikit-learn in Python).
  • Shiny App: Designed visualizations of U.S. mass shootings in a Shiny app, with server-side linking between Shiny and Plotly.

Additional Skills

  • Music: Graduated from the Conservatory in violin (15 years of practice); member of the Paris Sciences & Lettres Orchestra, Alfred Lowenguth Orchestra, and Hauts-de-Seine Orchestra. Music tour in Bulgaria (2017).
  • Social activities: First aid certificate, Lifeguard in summer camps, Humanitarian travel in Costa Rica.