Grafana

Grafana is the open observability platform providing analytics, monitoring, and visualization solutions with a focus on user control and cost efficiency.

IT Services

Information Technology

1K-5K (1039)

Founded 2014

$535M raised

21 open positions

Staff Software Engineer - Grafana Databases, Managed Services | Canada | Remote

1 day, 3 hours ago

Canada

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS Azure Cassandra ClickHouse GCP Go Grafana Helm Kafka Kubernetes Linux Microservices PostgreSQL Snowflake Terraform

Description

Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure.
Diagnose and resolve cross-layer failures involving storage latency, noisy neighbors, control-plane bottlenecks, and query regressions.
Design safe upgrade, rollout, migration, and partitioning strategies at scale.
Improve observability, automation, and day-to-day operational ergonomics.
Partner with database and platform teams to support safe scaling, consumer fan-out, and query performance.
Work hands-on with distributed systems behavior, Kubernetes scheduling, storage engines, and compression trade-offs.
Serve as a primary escalation point and participate in on-call incident response.
Own relationships with system vendors, including WarpStream Labs.
Define and evolve technical direction for operating WarpStream and adjacent shared database systems.
Mentor engineers and help mature the team’s technical practices.

Requirements

8+ years of engineering experience, including time in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles.
Experience with high-throughput streaming systems, analytical or storage backends, or large-scale database infrastructure such as Kafka, Redpanda, WarpStream, Postgres, ClickHouse, Snowflake, or Cassandra.
Strong Kubernetes experience in AWS, GCP, or Azure, plus familiarity with infrastructure-as-code tools such as Helm, Terraform, or Jsonnet.
Experience leading or driving complex technical efforts, even without formal management responsibilities.
Strong understanding of distributed systems failure modes in multi-cloud environments.
Proficiency in at least one systems-oriented language; Go is preferred.
Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior.
Experience participating in blameless incident response and writing high-quality post-incident reviews.
Clear communication skills and the ability to collaborate across teams while working autonomously.
Must be located in a Canadian time zone; the role is remote-first.

Benefits

Base salary range in Canada: CAD 186,368 to CAD 223,642.
Equity and bonus eligibility, where applicable.
All roles include Restricted Stock Units (RSUs).
100% remote, global work environment.
Global annual leave policy of 30 days per year, including 3 Grafana Shutdown Days.
In-person onboarding to help new hires get started.
Access to company-funded modern AI coding assistants within security guidelines.
Access to frontier AI models for daily development work.
Career growth pathways and development opportunities.

Similar Roles

Site Reliability Engineer

Voleon

Voleon is hiring a Site Reliability Engineer to improve the reliability, operations, and efficiency of production-critical infrastructure and data pipelines supporting its AI- and ML-driven investment systems.

United States Full-time Junior Site Reliability Engineer (SRE)

$120k-$160k

Apache Airflow CI/CD Git Go Grafana gRPC Jenkins Kubernetes Linux Microservices Pandas PostgreSQL Prometheus Python R SQL

1 day, 3 hours ago

Senior SRE/DevOps Engineer

Metabase 51-250 IT Services

Metabase is hiring a Senior SRE/DevOps Engineer to own the infrastructure and operations behind its fast-growing Metabase Cloud hosted analytics product.

Anywhere Full-time Senior DevOps Engineer Site Reliability Engineer (SRE)

AWS CI/CD Datadog Go Grafana Kubernetes Prometheus Python Terraform

1 day, 3 hours ago

Lead Site Reliability Engineer - 10929

Coupa Software 1K-5K Internet Software & Services

Coupa is hiring a Lead Site Reliability Engineer in Mexico City to build and operate reliable cloud and GenAI infrastructure for its spend management platform.

Mexico Full-time Lead Machine Learning Engineer Site Reliability Engineer (SRE)

AWS Azure Bash Chef DNS GCP Generative AI Git GitHub Actions Helm Kubernetes Linux LLM Machine Learning Microservices MySQL New Relic PagerDuty Python SageMaker Terraform

1 day, 3 hours ago

Site Reliability Engineer

Binance 5K-10K Capital Markets

Binance is hiring a Senior Site Reliability Engineer to improve the reliability and performance of its internal distributed test and validation environment for web, API, and Android testing.

Asia Full-time Senior Site Reliability Engineer (SRE)

Android Android Development Appium CI/CD Microservices Node.js Playwright Puppeteer Selenium

1 day, 4 hours ago
