
Performance Engineer, Containers/Serverless - Verda - Remote - Global
Performance Engineer
Job description
Verda is a technology company building the next generation of cloud infrastructure for AI. We operate GPU clusters across Europe, the US, and Asia, and we run some of the most demanding AI/ML workloads in production today, from frontier-model training to latency-sensitive inference at scale. We’re a low-hierarchy team that ships pragmatically and gives engineers real ownership of the systems they build.

Running ML/AI workloads in containers and serverless environments looks simple on a slide and is anything but in production. Between the object store and the GPU sit a dozen layers (image pulls, network filesystems, page cache, model loaders, runtime initialization), and each one quietly contributes to how long a workload takes to become useful and how fast it runs once it is.

We’re hiring a Performance Engineer to own that surface for Verda’s container and serverless GPU platforms. You’ll characterize where time and throughput actually go across the stack, and turn that understanding into concrete platform improvements. Cold-start latency is one of the more visible expressions of the problem (a 70B model that takes ninety seconds to load is a ninety-second outage from the user’s perspective), but the same fundamentals shape steady-state inference throughput, training step time, and checkpoint behavior. We want someone who understands how all of those parts interact.

Profile and optimize the end-to-end path for containerized ML/AI workloads: image distribution, runtime startup, weight loading, inference hot path.
Design and tune the storage layer that sits between S3-compatible object stores and GPU nodes: prefetchers, caching tiers, network filesystems, and local NVMe layouts.
Drive measurable wins on time-to-first-token, training step time, and cold-start latency across both internal services and customer workloads.
Benchmark and characterize real workloads, not synthetic ones, and turn the findings into platform changes.
Work across compute, networking, and platform teams to remove bottlenecks end to end, not just at one layer.
Publish internal (and occasionally external) write-ups so the rest of the org, and our customers, understand the trade-offs.
Keep up to date with the evolving ML/AI ecosystem.
Company information

Verda
The information shown is based on earlier job postings in our database.
Job details
