Staff Software Engineer - Grafana Cloud Observability, Kubernetes Monitoring | Canada | Remote

1 hour, 23 minutes ago
Full-time
Lead
Software Development
Grafana

Grafana is the open observability platform providing analytics, monitoring, and visualization solutions with a focus on user control and cost efficiency.

IT Services
1K-5K employees
Founded 2014
$535M raised

Description

  • Design and implement scalable integrations for infrastructure components, applications, and data ingestion pipelines.
  • Create middleware components and libraries that simplify observability solution development and maintenance.
  • Build and maintain backend systems for opinionated observability applications such as Cloud Provider Observability, Database Observability, and Kubernetes Monitoring.
  • Develop dashboards, alerts, documentation, and infrastructure that support observability workflows.
  • Collaborate with product, design, docs, sales, and support teams to deliver customer-focused features and a holistic product experience.
  • Lead the technical direction and vision of the team and contribute to strategic discussions on future observability solutions.
  • Estimate, plan, coordinate, and deliver large cross-system projects.
  • Coach and mentor other team members and help identify areas for technical and process improvement.
  • Represent Grafana Labs in open source forums, working groups, and events when needed.
  • Contribute to open source projects and communities, including Alloy, Prometheus, OpenTelemetry, Beyla, and related initiatives.

Requirements

  • 8+ years of experience with at least one major programming language, such as Python, .NET, Java, Go, or Rust.
  • Experience operating high-scale production systems on Kubernetes, including on-call participation, incident response, and postmortem practices.
  • Familiarity with observability tooling, such as Grafana.
  • Strong understanding of time-series data, metrics cardinality challenges, and observability cost/performance tradeoffs.
  • Hands-on technical leadership experience, including setting technical direction and influencing architectural decisions beyond your immediate team.
  • Deep understanding of distributed systems concepts, including scalability, consistency, high availability, and failure modes.
  • Experience writing clean, maintainable, robust, and performant software.
  • Experience delivering projects from start to finish in a self-driven manner.
  • Excellent problem-solving and debugging skills.
  • Strong mentoring and leadership skills.
  • Passion for observability and willingness to share knowledge through documentation and blog posts.
  • Relevant open source experience, ideally in the observability domain, and interest in becoming an active member of the OpenTelemetry and Prometheus communities.
  • Curiosity to learn new programming languages and frameworks, set up examples, and understand how systems work.
  • Good understanding of typical production environments and experience operating production services and organizing on-call.
  • Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or another CNCF Kubernetes-related certification is a plus.
  • Experience operating or scaling Prometheus in high-cardinality, multi-tenant environments is a plus.
  • Experience with OpenTelemetry Collector pipelines or similar telemetry ingestion systems is a plus.
  • Experience developing Kubernetes operators, controllers, or custom resources is a plus.
  • Experience contributing to or maintaining open source projects with successful pull requests and community collaboration is a plus.
  • Experience designing and building observability backends for various systems and applications is a plus.

Benefits

  • CAD 186,368 to CAD 223,642 annual compensation in Canada, depending on level, experience, and skillset.
  • Restricted Stock Units (RSUs) for all roles, giving team members ownership in Grafana Labs' success.
  • 100% remote, global-first work environment.
  • Global annual leave policy of 30 days per year.
  • 3 days of annual leave reserved for Grafana Shutdown Days.
  • In-person onboarding to help new hires get started and connected.
  • Access to modern AI coding assistants with a company-funded usage budget, within security guidelines.
  • Access to frontier models such as GPT-Codex 5/3, Claude Opus 4.6, and Gemini 3 Pro.
