Cribl

Cribl provides a unified data management platform built for IT and security data. It lets users explore, collect, process, and access their data at scale, with greater control and flexibility over their data workflows.

IT Services
251-1K
Founded 2018
$402M raised

Description

  • Engage with engineering teams to improve service delivery and reliability across the full lifecycle of services.
  • Measure and monitor production systems for availability, latency, and overall system health.
  • Investigate errors and instability in production cloud services and drive operational improvements.
  • Partner with product and platform teams to improve reliability, resilience, and observability.
  • Reduce operational toil through automation and creative problem-solving.
  • Contribute to the design, development, testing, deployment, and shipping of Cribl products.
  • Provide input on cloud architecture, scaling, high availability, and reliability decisions.
  • Participate in standby, on-call, or off-hours support as needed.

Requirements

  • Proven experience designing, implementing, and operating observability systems for complex cloud-based platforms.
  • Experience with configuration management and infrastructure as code tools such as Terraform (preferred) or Ansible.
  • Experience working with cloud SDKs is a plus.
  • Knowledge of cloud platforms (preferably AWS and Azure) and of container and orchestration technologies.
  • Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, or Sentry.
  • Extensive experience with enterprise-scale continuous delivery environments.
  • Development experience with JavaScript, Node.js, or TypeScript in a Linux or Mac environment.
  • Experience with sustainable, blameless incident response.
  • Background in Linux systems engineering.
  • Experience with incident response tools such as PagerDuty, FireHydrant, or Blameless.
  • Comfort working with a high degree of autonomy and a distributed team.
  • Knowledge of cloud and application security best practices.
  • Strong knowledge of cloud design patterns for scale, data management, and resiliency.
  • A commitment to high-quality software and testing.
  • Strong opinions about business metrics and SLOs.

Benefits

  • Remote-first work environment; this role is based remotely in Poland.
  • Opportunity to work on a fast-growing, mission-driven platform used by major enterprise customers, including half of the Fortune 100.
  • Collaborative global team culture that values curiosity and ownership.
  • Inclusive workplace that supports diversity and welcomes applicants from all backgrounds.

Interested in this position?

Apply directly on the company website

Similar Roles

Site Reliability Engineer

Kaseya 1K-5K IT Services

Kaseya is hiring a Site Reliability Engineer to own the reliability, automation, and production stability of its AWS-based services used by thousands of MSPs worldwide.

Ansible AWS Chef CloudFormation Datadog DevSecOps Elasticsearch Kibana Kubernetes MySQL PostgreSQL Puppet Secrets Management Serverless Terraform

Site Reliability Engineer

Obsidian Security 51-250 Internet Software & Services

Obsidian Security is hiring a Site Reliability Engineer in the UK to help ensure the reliability, scalability, and operational excellence of its multi-tenant SaaS platform for enterprise and financial customers.

Argo CD AWS Datadog GCP GitHub Actions GitOps Grafana Helm Kubernetes Microservices Prometheus

Senior Site Reliability Engineer (SRE) - (GCP)

Devsu 51-250 Internet Software & Services

Devsu is hiring a Site Reliability Engineer to own monitoring, observability, and reliability operations for systems running across on-premises infrastructure and Google Cloud Platform, with backup support for application incidents when needed.

Bash GCP Grafana Kubernetes Linux PagerDuty Prometheus Python

Vice President Site Reliability Engineering (Data Centers)

Galaxy 251-1K Capital Markets

Galaxy is hiring a Site Reliability Engineering leader to own enterprise automation and infrastructure platform reliability across a hybrid environment supporting digital assets, data center operations, and AI-related compute.

Active Directory Ansible AWS Azure Bash Git GitHub Actions GitLab CI Go Grafana Jenkins Linux Packer Palo Alto PowerShell Prometheus Python Splunk Terraform Windows Server