
Fundamental

Senior Applied Research Engineer

Research · Barcelona · on-site · full-time · posted 2 weeks ago

About Fundamental

Fundamental is an AI company pioneering the future of enterprise decision-making. Founded by DeepMind alumni, Fundamental has developed NEXUS – the world's most powerful Large Tabular Model (LTM) – purpose-built for the structured records that actually drive enterprise decisions. Backed by world-class investors and trusted by Fortune 100 companies, Fundamental unlocks trillions of dollars of value by giving businesses the Power to Predict.

At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.

Key responsibilities

  • Profile end-to-end distributed training runs to identify bottlenecks across compute, GPU memory, and inter-GPU communication

  • Contribute to architectural decisions that improve the efficiency and reliability of large-scale training jobs, including developing Triton/CUDA kernels when needed

  • Design and implement model scaling, parallelization, and memory optimization techniques for training workloads with very large context sizes

  • Collaborate closely with ML Researchers to diagnose architectural inefficiencies, ensure new research ideas scale efficiently in practice, and spread internal knowledge about model efficiency and optimization

  • Drive the productionization and serving of our models from the research side, including improving inference efficiency through techniques such as quantization

Must have

  • Strong understanding of modern ML architectures and large-scale training pipelines

  • Experience running distributed training jobs on multi-GPU systems

  • Advanced profiling and debugging skills across CPU, GPU, memory usage, latency, and inter-GPU communication

  • Strong programming skills in Python

  • Experience with model scaling and parallelization strategies, including tensor and pipeline parallelism

Nice to have

  • Familiarity with NCCL, MPI, and distributed communication primitives

  • Knowledge of PyTorch and Triton internals

  • Programming experience with C++ and CUDA

Benefits

  • Competitive compensation with salary and equity

  • Comprehensive benefits, including medical, dental, and vision coverage, plus a 401(k)

  • Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys

  • Relocation support for employees moving to join the team in one of our office locations

  • A mission-driven, low-ego culture that values diversity of thought, ownership, and bias toward action