Staff Software Engineer - Grafana Cloud k6 | USA | Remote

14 hours, 44 minutes ago
Full-time
Lead
Software Development
Grafana

Grafana is the open observability platform providing analytics, monitoring, and visualization solutions with a focus on user control and cost efficiency.

IT Services
1K-5K employees
Founded 2014
$535M raised

Description

  • Build and scale a culture of operational excellence by defining standards and coaching teams to own reliability and availability.
  • Drive DevOps/SRE practices, including incident response, post-incident reviews, on-call readiness, runbooks, alerting, observability, and release/change management.
  • Establish and apply reliability frameworks such as SLIs, SLOs, and error budgets to guide prioritization and trade-offs.
  • Provide visibility into system health through operational metrics and reliability reporting.
  • Guide teams in the design, development, evolution, and operation of large-scale distributed cloud systems.
  • Influence product and system direction through design reviews, architectural discussions, and cross-team collaboration.
  • Share knowledge through clear documentation and technical communication to help teams build and operate systems effectively.
  • Grow into broader application and product development leadership as the reliability foundation matures.

Requirements

  • Strong experience with DevOps/SRE practices and operating production systems at scale.
  • Strong programming background in a modern language; Python and Go are the primary languages, but prior experience in them is not required.
  • Experience designing, building, and operating large-scale distributed systems.
  • Strong understanding of reliability engineering concepts such as incident management, observability, and failure modes.
  • Experience with test automation, including performance and functional testing.
  • Ability to influence engineering practices through clear technical communication, reviews, and collaboration.
  • Strong interpersonal skills and ability to work effectively across teams.
  • Familiarity with modern software engineering processes and delivery practices.
  • Self-driven and comfortable operating with a high degree of autonomy and ambiguity.
  • Experience with containerized and cloud-native systems such as Docker, Kubernetes, and AWS (preferred).
  • Familiarity with observability tooling and platforms, including the Grafana stack (preferred).
  • Experience working with Python, Go, JavaScript, and/or Jsonnet (preferred).
  • Experience building or operating event-driven or asynchronous systems (preferred).
  • Experience defining or applying SLIs/SLOs, error budgets, or reliability metrics (preferred).
  • Interest in or experience with building testing frameworks or developer tooling (preferred).

Benefits

  • Base salary range of $174,986 to $209,983 in the U.S., depending on level, experience, and skillset.
  • Restricted Stock Units (RSUs) for all roles.
  • 100% remote work with a global, remote-only culture.
  • Global annual leave policy of 30 days per year.
  • 3 days of annual leave reserved for Grafana Shutdown Days.
  • Company-funded access to modern AI coding assistants for daily development work, within security guidelines.
  • Access to frontier models such as GPT-Codex 5/3, Claude Opus 4.6, and Gemini 3 Pro.
  • In-person onboarding to help new hires ramp up successfully.
