Grafana

Grafana is an open observability platform that provides analytics, monitoring, and visualization solutions with a focus on user control and cost efficiency.

IT Services
1K-5K
Founded 2014
$535M raised

Description

  • Own end-to-end development of multi-agent AI systems, from architecture and implementation through testing, deployment, and ongoing operation.
  • Build modular, composable agentic systems using orchestration frameworks such as LangChain or CrewAI, standards such as Anthropic's Model Context Protocol (MCP), or similar tools.
  • Develop reusable agent capabilities that can be accessed across Slack, dashboards, internal apps, and CLI interfaces.
  • Implement observability and feedback loops for AI systems, including logging, performance metrics, prompt iteration, model evaluation, and cost management.
  • Establish governance and compliance standards for AI workflows, including access controls, audit trails, PII handling, and human escalation paths.
  • Build MCP servers, APIs, CLIs, and microservices that connect AI models to business systems such as BigQuery, Slack, CRMs, email, calendars, and analytics tools.
  • Architect retrieval-augmented generation data flows that connect LLMs to internal knowledge bases, customer data, and real-time business context.
  • Build serverless or containerized services on GCP, including Cloud Functions and Cloud Run, that scale with usage and integrate with Grafana's cloud infrastructure.
  • Partner with RevOps, Demand Generation, Regional Marketing, and SDR teams to identify automation opportunities and deliver measurable business outcomes.
  • Design and deploy automated workflows using orchestration tools or custom platforms, with CI/CD, testing, and production reliability standards.
  • Create documentation, playbooks, and enablement materials so partner teams can operate automation solutions independently.

Requirements

  • 8+ years of software engineering experience with depth in backend development, systems integration, or data/analytics engineering.
  • 2+ years of hands-on experience applying LLMs or AI to production workflows, not just prototypes.
  • Strong proficiency in Python and JavaScript/Node.js, with Git-based workflows, code review practices, and testing discipline.
  • Hands-on experience with LLM frameworks and patterns, including prompt engineering, RAG, function calling/tool use, structured output parsing, and evaluation.
  • Experience building and operating multi-agent systems at scale, including agent decomposition, orchestration patterns, state management, and production monitoring.
  • Deep familiarity with Google Cloud Platform, BigQuery, and serverless/containerized services such as Cloud Functions and Cloud Run.
  • Understanding of LLM failure modes and production mitigations, including confidence thresholds, fallback logic, human escalation, and cost/latency management.
  • Proven ability to identify high-leverage problems, push back on low-impact requests, and deliver end-to-end with minimal direction.
  • Fluency with AI-assisted development tools such as GitHub Copilot, Cursor, and Claude Code.
  • Clear technical communication skills, with the ability to explain complex systems to both engineers and business stakeholders.
  • Experience with vector databases or retrieval pipelines such as Pinecone, Weaviate, ChromaDB, Qdrant, or pgvector (bonus).
  • Familiarity with marketing or sales platforms such as Salesforce, Customer.io, HubSpot, Marketo, or Outreach (bonus).
  • Experience with frontend frameworks such as React or Slack Block Kit for user-facing AI interfaces (bonus).
  • Experience with observability tooling for AI systems such as LangSmith, Weights & Biases, or custom evaluation frameworks (bonus).
  • Experience with workflow orchestration platforms such as n8n, Temporal, Prefect, or Airflow (bonus).
  • Familiarity with Model Context Protocol (MCP) or similar standards for connecting AI systems to data sources (bonus).
  • Prior work automating marketing, sales, or customer success workflows in a B2B SaaS environment (bonus).
  • Active participation in open-source communities is preferred.

Benefits

  • Base compensation range of USD $174,986 to USD $209,983 in the United States.
  • All roles include Restricted Stock Units (RSUs).
  • 100% remote, global culture.
  • Global annual leave policy of 30 days per year.
  • 3 days of annual leave reserved for Grafana Shutdown Days.
  • In-person onboarding for new hires.
  • Career growth pathways and development opportunities.


Similar Roles

AI Full-Stack Developer

Nebius 51-250 Internet Software & Services

Nebius is hiring an AI Full-Stack Developer to build and scale internal automation solutions using AI, LLMs, and agent-based systems across business processes and system integrations.

JavaScript Linux LLM macOS Python React TypeScript

Software Engineer (AI)

Tines 51-250 Construction & Engineering

Tines is hiring a software engineer in Ireland to build and maintain AI-powered product features and the AI platform behind them as part of a collaborative engineering team.

Docker Machine Learning PostgreSQL React Redis Ruby Ruby on Rails TypeScript

Software Engineer, Marketplace ML Platform

Waymo Autonomous vehicles, robotics, AI, ride-hailing / mobility tech

Waymo is hiring a platform engineer to build the core marketplace systems behind its autonomous ride-hail service, with a focus on economic decision engines, ML infrastructure, and real-time optimization for pricing, matching, and vehicle positioning.

C++ Feature Engineering Java Keras Machine Learning Python PyTorch TensorFlow

Senior Software Engineer, Data

PlayOn Sports is hiring a Senior Software Engineer, Data to design and operate the data services and APIs that support application teams across its high school sports platform.

Flink Kafka Machine Learning Python REST API Snowflake SQL