OfficeSpace Software

OfficeSpace Software is the world's leading workplace management platform that provides a complete solution for the allocation and management of company workspaces. With a focus on efficiency and user-friendly tools, OfficeSpace empowers organizations ...

Internet Software & Services
251-1K employees
Founded 2004
$150M raised

Description

  • Drive measurable improvements in latency, throughput, and availability across a large-scale production environment.
  • Own system performance across Linux internals and Kubernetes scheduling, and eliminate bottlenecks before customers are impacted.
  • Define and enforce SLIs, SLOs, and error budgets to balance speed, reliability, and growth.
  • Partner with application engineers to profile code paths, improve execution efficiency, and harden services under real load.
  • Lead database performance optimization across queries, indexing, replication, and workload isolation.
  • Design and oversee AI-assisted load testing, stress testing, and capacity planning workflows.
  • Guide the migration from monolithic deployments to multi-tenant Kubernetes platforms.
  • Reduce infrastructure spend through architectural decisions, right-sizing, and intelligent scaling strategies.
  • Build and supervise automation for infrastructure provisioning, configuration management, and observability.
  • Set operational standards for reliability, performance, and incident response for production systems.

Requirements

  • 7+ years of experience operating and evolving large-scale production systems.
  • Deep Linux systems expertise with hands-on performance tuning across CPU, memory, disk, and networking.
  • Strong Python skills for automation, tooling, and AI-assisted systems workflows.
  • Production experience with Ruby/Rails ecosystems, including Puma and Sidekiq.
  • Proven ability to diagnose and resolve complex database performance issues in MySQL/MariaDB or PostgreSQL.
  • Advanced Kubernetes experience, including workload sizing, scheduling, and multi-tenant operations.
  • Infrastructure-as-code experience with Terraform and Terragrunt.
  • Experience with configuration management tools such as Puppet or Ansible.
  • Strong observability experience across metrics, logs, and traces using tools like Prometheus, Grafana, Datadog, or ELK.
  • AI fluency and comfort supervising AI agents for analysis, testing, and reporting, and validating their outputs.
  • Preferred: experience scaling and refactoring monolithic applications under real production load.
  • Preferred: experience extracting databases or other stateful components from monoliths.
  • Preferred: experience tuning Apache and Nginx at scale.
  • Preferred: experience with Redis performance optimization and operational management.
  • Preferred: experience with CI/CD systems and GitOps workflows, including ArgoCD.
  • Preferred: experience with cloud cost optimization and FinOps-aligned operational practices.

Benefits

  • Competitive benefits packages globally.
  • Benefits designed to support health, well-being, and financial security.
  • Autonomy and ownership in a trust-based environment.
  • Opportunities for growth, learning, and professional development.
  • A collaborative, results-driven culture.
  • A fast-paced, innovation-focused environment that embraces AI and change.
