Grafana

Grafana is the open observability platform providing analytics, monitoring, and visualization solutions with a focus on user control and cost efficiency.

IT Services
1K-5K employees
Founded 2014
$535M raised

Description

  • Design, build, and operate reconciliation systems that track desired stack state and detect and repair configuration drift.
  • Collaborate across SSS, grafana.com, deployment configurations, and adjacent teams to keep stack lifecycle workflows reliable and resilient.
  • Improve operational efficiency by simplifying deployment and rollout processes for stack services.
  • Manage rollout mechanisms for plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration.
  • Support new region and cluster rollouts and the operational paths required to bring stacks online safely.
  • Improve incident response and recovery for stack misalignment, reconciliation failures, rollout issues, and integration failures.
  • Partner with Product, Hosted Grafana, Infrastructure, Support, and other AppCore squads on customer-impacting lifecycle work.
  • Contribute to roadmap planning, technical design, on-call improvements, and long-term simplification of stack operations.
  • Own the production behavior of the systems you build by improving runbooks, dashboards, alerts, safety controls, and recovery procedures.
  • Write efficient, readable, maintainable code and implement new microservices or systems as needed.

Requirements

  • At least 1 year of fully remote work experience.
  • Professional experience with Golang.
  • Experience working on a SaaS platform.
  • Familiarity with distributed systems concepts such as scalability, multi-tenancy, and high availability.
  • Ability to work across both backend service and application code.
  • Strong focus on developer experience, user experience, and product quality.
  • Experience contributing to projects from initial brainstorming through delivery.
  • Ability to write clean, well-tested software that is easy for other engineers to operate and maintain.
  • Experience breaking down well-defined tasks into iterative deliveries and gathering feedback.
  • Willingness to collaborate across teams and align work with other squads and external stakeholders.
  • Familiarity with Kubernetes in AWS, GCP, or Azure.
  • Exposure to infrastructure-as-code tools such as Helm, Terraform, or Jsonnet.
  • Experience participating in blameless incident response and post-incident reviews.
  • Experience with TypeScript/Node.js is a plus.
  • Experience with Kubernetes control-plane patterns, operators, reconcilers, or desired-state systems is a plus.
  • Experience with Jsonnet/Tanka, Terraform, Flux, Argo, or similar deployment/configuration tooling is a plus.
  • Experience with SaaS provisioning, tenancy, regional expansion, plugin rollout, or customer lifecycle systems is a plus.
  • Experience with incident response involving configuration drift, partial failure, or cross-service state mismatch is a plus.

Benefits

  • UK compensation range of GBP 72K to GBP 90K.
  • Restricted Stock Units (RSUs).
  • 100% remote, global work culture.
  • 30 days of annual leave, including 3 Grafana Shutdown Days.
  • In-person onboarding.
  • Career growth pathways and development opportunities.
  • Transparent communication and approachable leadership.
  • High trust, low ego, innovation-driven environment.

Similar Roles

Senior Java Engineer - Distributed Systems - Elasticsearch

Elastic 1K-5K Internet Software & Services

Elastic is hiring a Senior Software Engineer for the Elasticsearch Distributed Systems team to improve cluster-scale indexing, coordination, and resilience across a highly distributed search platform.

Elasticsearch Java Lucene

Senior Java Engineer - Distributed Systems - Elasticsearch

Elastic 1K-5K Internet Software & Services

Elastic is hiring a Senior Software Engineer for its Elasticsearch Distributed Systems team to improve the scale, performance, and resilience of clustered search infrastructure.

Elasticsearch Java Lucene

Senior Java Engineer - Distributed Systems - Elasticsearch

Elastic 1K-5K Internet Software & Services

Elastic is hiring a Senior Software Engineer for its Elasticsearch Distributed Systems team to help improve the scale, performance, and resilience of the cluster systems that handle indexing, allocation, replication, and node coordination.

Elasticsearch Java Lucene

Senior Software Engineer - Fullstack (Backend Focused)

New Relic 1K-5K Internet Software & Services

New Relic is hiring a backend engineer to help build a new observability experience and next-generation platform services for distributed systems in an AI-first environment.

Agile CI/CD Docker Git GraphQL Java Kafka Kubernetes Microservices React REST API Spring Boot TypeScript