OfficeSpace Software

OfficeSpace Software is the world's leading workplace management platform that provides a complete solution for the allocation and management of company workspaces. With a focus on efficiency and user-friendly tools, OfficeSpace empowers organizations ...

Internet Software & Services
251-1K employees
Founded 2004
$150M raised

Description

  • Drive measurable improvements in latency, throughput, and availability across a large-scale production environment.
  • Own system performance across Linux internals and Kubernetes scheduling, and eliminate bottlenecks before customers are impacted.
  • Define and enforce SLIs, SLOs, and error budgets to balance speed, reliability, and growth.
  • Partner with application engineers to profile code paths, improve execution efficiency, and harden services under real load.
  • Lead database performance optimization across queries, indexing, replication, and workload isolation.
  • Design and oversee AI-assisted load testing, stress testing, and capacity planning workflows.
  • Guide the migration from monolithic deployments to multi-tenant Kubernetes platforms.
  • Reduce infrastructure spend through architectural decisions, right-sizing, and intelligent scaling strategies.
  • Build and supervise automation for infrastructure provisioning, configuration management, and observability.
  • Set operational standards for reliability, performance, and incident response for production systems.

Requirements

  • 7+ years of experience operating and evolving large-scale production systems.
  • Deep Linux systems expertise with hands-on performance tuning across CPU, memory, disk, and networking.
  • Strong Python skills for automation, tooling, and AI-assisted systems workflows.
  • Production experience with Ruby/Rails ecosystems, including Puma and Sidekiq.
  • Proven ability to diagnose and resolve complex database performance issues in MySQL/MariaDB or PostgreSQL.
  • Advanced Kubernetes experience, including workload sizing, scheduling, and multi-tenant operations.
  • Infrastructure-as-code experience with Terraform and Terragrunt.
  • Experience with configuration management tools such as Puppet or Ansible.
  • Strong observability experience across metrics, logs, and traces using tools like Prometheus, Grafana, Datadog, or ELK.
  • AI fluency and comfort supervising AI agents for analysis, testing, and reporting, and validating their outputs.
  • Preferred background in scaling and refactoring monolithic applications under real production load.
  • Preferred background in extracting databases or other stateful components from monoliths.
  • Preferred background with Apache and Nginx tuning at scale.
  • Preferred background in Redis performance optimization and operational management.
  • Preferred background with CI/CD systems and GitOps workflows, including ArgoCD.
  • Preferred background with cloud cost optimization and FinOps-aligned operational practices.

Benefits

  • Competitive benefits packages globally.
  • Benefits designed to support health, well-being, and financial security.
  • Autonomy and ownership in a trust-based environment.
  • Opportunities for growth, learning, and professional development.
  • A collaborative, results-driven culture.
  • A fast-paced, innovation-focused environment that embraces AI and change.


