
Senior Site Reliability Engineer - Nexthink - Remote - Global

Site Reliability Engineer

Published: April 27, 2024
Published 764 days ago
Last seen in crawl: May 29, 2026 (1 day ago)
Estimated closing date: June 1, 2024
Role and management
Role level: Mid-level
Management level: No people management
Experience
5 years

Job description

At Nexthink, we empower our customers with industry-leading solutions that enable continuous improvement of the employee experience. We deliver unmatched visibility across all environments, so IT teams can consistently see, diagnose, and fix digital workplace issues. As a SaaS provider, we are committed to delivering a seamless, resilient, and scalable platform around the clock.

We are looking for an experienced, proactive, and innovative professional who is keen to join us as a Senior Site Reliability Engineer! The mission of Nexthink's SRE team is to strengthen our infrastructure and enhance our ability to deploy, monitor, and scale systems effectively and reliably. The team works closely with more than 50 Product Engineering teams that develop our products and services, as well as with the Technical Platform Engineering, Security, and Architecture teams, to understand reliability requirements, design and implement solutions, and promote their adoption.

Join our vibrant team of diverse and experienced engineers, where cutting-edge technology meets innovation. Be part of Nexthink's Digital Employee Experience technological revolution, ensuring our global customers enjoy a seamless user experience. Apply now and become a key player in our dynamic SRE organisation.

Company information

Current open roles at Nexthink on JobCrawls
Active listings by location:
Remote - Global: 27
Remote - North America: 2
Worldwide - Remote: 1
Current role mix at Nexthink on JobCrawls
Active listings by role type:
Corporate security specialist: 5
Software engineer: 3
Sales representative: 3
Software designer: 3
Professional services consultant: 1
Product support engineer: 1
Senior Site Reliability Engineer: 1
GTM project manager: 1
Search engineer: 1
HR specialist: 1
Software developer: 1
Data analyst / engineer: 1
Product security engineer: 1
Customer representative: 1
Site Reliability Engineer: 1
AI research engineer: 1
Consultant: 1
Total Rewards & Transformation: 1
Full Stack Developer: 1
Engagement Manager: 1
Current role-level mix at Nexthink on JobCrawls
Active listings by role level:
Mid-level: 24
Senior: 1
Senior level: 1

Nexthink has 30 indexed job postings in the JobCrawls Finland dataset since October 2023. In the historical index, the strongest location signals for this employer are Remote - Global, Remote - North America, and Worldwide - Remote.

The information shown is based on previous job postings in our database.

Job details

Responsibilities

  • Implement and manage cloud-native systems (AWS) using best-in-class tools and automation.
  • Operate and enhance Kubernetes clusters, deployment pipelines, and service meshes to support rapid delivery cycles.
  • Design, build, and maintain the infrastructure powering our multi-tenant SaaS platform with reliability, security, and scalability in mind.
  • Define and maintain SLOs, SLAs, and error budgets, and proactively address availability and performance issues.
  • Develop infrastructure-as-code (Terraform or similar) for repeatable and auditable provisioning.
  • Build internal platform tools and automation to support provisioning, monitoring, and operational efficiency.
  • Monitor infrastructure and applications to ensure high-quality user experiences.
  • Participate in a shared on-call rotation, responding to incidents, troubleshooting outages, and driving timely resolution and communication.
  • Act as Incident Commander during on-call shifts, coordinating cross-team responses effectively to meet SLAs.
  • Drive and refine incident response processes, reducing Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR).
  • Diagnose and resolve complex issues independently, minimizing the need for external escalation.
  • Work closely with software engineers to embed observability, fault tolerance, and reliability principles into service design.
  • Automate runbooks, health checks, and alerting to support reliable operations with minimal manual intervention.
  • Support automated testing, canary deployments, and rollback strategies to ensure safe, fast, and reliable releases.
  • Contribute to security best practices, compliance automation, and cost optimization.

Requirements

  • Bachelor’s degree in Computer Science or equivalent practical experience
  • 5+ years of experience as a Site Reliability Engineer or Platform Engineer
  • Strong hands-on experience with cloud services (AWS, GCP, Azure)
  • Strong programming or scripting skills (Python, Go, Bash)
  • Experience with infrastructure-as-code (Terraform)
  • Proficiency with Kubernetes, Docker, Helm
  • Experience supporting microservices architectures
  • Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, FluxCD)
  • Experience with monitoring solutions (Datadog)
  • Comfortable with on-call management and incident response
  • Strong troubleshooting skills and Linux knowledge
  • Understanding of the network stack, cloud architectures, service meshes, and storage
  • Knowledge of deployment strategies and compliance standards
  • Experience with chaos engineering or resilience testing
  • Excellent problem-solving and communication skills
  • Fluent in English

Skills and technologies

AWS, Kubernetes, Terraform, Docker, Helm, Datadog, Jenkins, GitHub Actions, GitLab CI, FluxCD, Linux, TCP/IP, VPN, Istio, S3, EBS

Education level

Not required