Member of Engineering (Reinforcement Learning) - poolside

Engineering Member

Posted: May 1, 2026
Last seen in crawl: June 25, 2026 (2d ago)
Estimated Expiry: June 5, 2026
Role & Management
Role Level: Mid-Level
Job Type
Required Languages

Job Description

You would work on our reinforcement learning team, which focuses on improving the reasoning and coding abilities of Large Language Models (LLMs) through reinforcement learning. This is a hands-on role where you'll work end-to-end, from researching new exploration or training algorithms, to designing and scaling up RL environments, to implementing your ideas across the stack. You will have access to thousands of GPUs on this team. Your mission is to push the frontier of the reasoning and coding capabilities of foundational models. Responsibilities include researching and experimenting with ways to improve reasoning and code generation for LLMs, owning the full experiment life cycle, staying up to date with the latest research, designing and analyzing training, fine-tuning, and data generation experiments, writing high-quality code, and collaborating with the team.

Job Details

Responsibilities

  • Research and experiment with ways to improve reasoning and code generation
  • Own the full experiment life cycle
  • Stay up to date with the latest research
  • Design and analyze training, fine-tuning, and data generation experiments
  • Write high-quality code
  • Collaborate with the team

Requirements

  • Experience with Large Language Models (LLMs)
  • Deep knowledge of Transformers
  • Strong deep learning fundamentals
  • Experience training LLMs from scratch and fine-tuning them
  • Familiarity with distributed training
  • Research experience
  • Programming experience in Python with PyTorch or JAX

Skills & Technologies

Large Language Models, Transformers, Deep learning, Distributed training, Research, Python, PyTorch, JAX

Education Level

None required