Grafana

Grafana is the open observability platform providing analytics, monitoring, and visualization solutions with a focus on user control and cost efficiency.

IT Services
1K-5K employees
Founded 2014
$535M raised

Description

  • Own the end-to-end development of multi-agent AI systems, from architecture and implementation through testing, deployment, and ongoing operation.
  • Build modular agentic systems using orchestration frameworks such as LangChain or CrewAI, protocols such as Anthropic's Model Context Protocol (MCP), or similar tools.
  • Develop reusable agent skills that work across interfaces including Slack, dashboards, internal apps, and CLIs.
  • Implement observability, feedback loops, prompt iteration, model evaluation, and cost management for AI workflows.
  • Establish governance and compliance standards for AI workflows, including access controls, audit trails, PII handling, and human-in-the-loop escalation.
  • Build MCP servers, APIs, CLIs, and microservices that connect AI models to internal and third-party business systems.
  • Architect data flows for retrieval-augmented generation that connect LLMs to internal knowledge bases, customer data, and real-time business context.
  • Build serverless or containerized services on GCP that scale with usage and integrate with Grafana's cloud infrastructure.
  • Partner with Marketing Operations, RevOps, Demand Generation, Regional Marketing, and SDR teams to identify automation opportunities and deliver measurable outcomes.
  • Design self-service automation workflows with documentation, playbooks, and enablement materials so partner teams can operate independently.

Requirements

  • 8+ years of software engineering experience with depth in backend development, systems integration, or data/analytics engineering.
  • 2+ years of hands-on experience applying LLMs or AI to production workflows.
  • Strong proficiency in Python and JavaScript/Node.js, along with solid Git-based workflows, code review practices, and testing discipline.
  • Hands-on experience with LLM patterns including prompt engineering, RAG, function calling/tool use, structured output parsing, and evaluation.
  • Experience building and operating multi-agent systems at scale, including agent decomposition, orchestration patterns, state management, and production monitoring.
  • Deep familiarity with Google Cloud Platform, BigQuery, and serverless/containerized services such as Cloud Functions and Cloud Run.
  • Understanding of LLM failure modes and production mitigations such as confidence thresholds, fallback logic, human escalation, and cost/latency management.
  • Proven ability to identify high-leverage problems, push back on low-impact requests, and deliver end-to-end with minimal direction.
  • Fluency with AI-assisted development tools such as GitHub Copilot, Cursor, and Claude Code.
  • Clear technical communication skills for explaining complex systems to engineers and business stakeholders.
  • Experience with vector databases or retrieval pipelines such as Pinecone, Weaviate, ChromaDB, Qdrant, or pgvector (bonus).
  • Familiarity with marketing or sales platforms such as Salesforce, Customer.io, HubSpot, Marketo, or Outreach (bonus).
  • Experience with frontend frameworks or UI toolkits such as React or Slack Block Kit for user-facing AI interfaces (bonus).
  • Experience with observability tooling for AI systems such as LangSmith, Weights & Biases, or custom evaluation frameworks (bonus).
  • Experience with workflow orchestration platforms such as n8n, Temporal, Prefect, or Airflow (bonus).
  • Familiarity with Model Context Protocol (MCP) or similar standards for connecting AI systems to data sources (bonus).
  • Prior work automating marketing, sales, or customer success workflows in a B2B SaaS environment (bonus).
  • Active participation in open-source communities (bonus).

Benefits

  • Base compensation range in the United States: USD 154,445 to USD 185,334.
  • All roles include Restricted Stock Units (RSUs).
  • 100% remote work with a global team across 40+ countries.
  • Global annual leave policy of 30 days per year.
  • 3 days of annual leave reserved for Grafana Shutdown Days.
  • In-person onboarding to help new hires get started successfully.
  • Career growth pathways and development opportunities.
  • Transparent, high-trust, low-ego culture with open communication.
