Deep Learning Research Engineer · Kyutai Labs · Paris, France
Building open-science AI at kyutai.org 🇫🇷 · Previously Qualcomm AI Research 🇳🇱 · PhD from IST Austria (2020) 🇦🇹
My work sits at the intersection of computer vision, multimodal learning, and neural network efficiency. I care about building models that are both powerful and practical, and I'm particularly interested in learning from multiple modalities and making the best use of raw data that is inherently multimodal.
| Project Highlights | Venue | What it's about |
|---|---|---|
| 🏠 CASA -- Cross-Attention Strikes Back | arXiv, 2026 | Vision-language model with cross-attention for scalable streaming inference |
| M👁️shiVis -- Kyutai with an "eye" | CVPR 2026 | Adds visual understanding to the Moshi speech model with a data-efficient training pipeline |
| MSViT -- Mixed-Scale Tokenization | ICCV NViT Workshop, 2023 | Dynamic token scaling for Vision Transformers based on image content |
| Scalarization for Multi-Task Learning | NeurIPS, 2023 | Large-scale study of multi-task/domain training dynamics + population-based optimization |
| Knowledge Distillation: A good teacher is patient and consistent (oral) | CVPR, 2022 | Patient & consistent teacher = surprisingly strong distillation recipe |
→ Full list on my website
Personal experiments, small tools, and one-off builds:
- 🍬 PeperNoten -- A small script that auto-generates a "skimmed-through" Obsidian note in markdown from an arXiv paper.
- 🎄 Advent of Code -- My solutions from past Advent of Code editions
- 🕵️ Codenames Solo -- Solo mode for the Codenames board game, powered by GPT and Streamlit
- 🌊 Glow JAX -- A clean JAX/Flax implementation of the Glow generative model