Cribl

Cribl provides a unified data management platform specifically designed for IT and security data, enabling users to explore, collect, process, and access their data at scale while offering enhanced control and flexibility in managing their data workflows.

IT Services
251-1K
Founded 2018
$402M raised

Description

  • Engage with engineering teams to improve service delivery and reliability across the full lifecycle of services.
  • Measure and monitor production systems for availability, latency, and overall system health.
  • Investigate errors and instability in production cloud services and drive operational improvements.
  • Partner with product and platform teams to improve reliability, resilience, and observability.
  • Reduce operational toil through automation and creative problem-solving.
  • Contribute to design, development, testing, deployment, and shipping of Cribl products.
  • Provide input on cloud architecture, scaling, high availability, and reliability decisions.
  • Participate in standby, on-call, or off-hours support as needed.

Requirements

  • Proven experience designing, implementing, and operating observability systems for complex cloud-based platforms.
  • Experience with configuration management and infrastructure as code tools such as Terraform (preferred) or Ansible.
  • Experience working with cloud SDKs is a plus.
  • Knowledge of cloud platforms, preferably AWS and Azure, and of container and orchestration technologies.
  • Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, or Sentry.
  • Extensive experience with enterprise-scale continuous delivery environments.
  • Development experience with JavaScript, Node.js, or TypeScript in a Linux or Mac environment.
  • Experience with sustainable, blameless incident response.
  • Background in Linux systems engineering.
  • Experience with incident response tools such as PagerDuty, FireHydrant, or Blameless.
  • Comfort working with a high degree of autonomy and a distributed team.
  • Knowledge of cloud and application security best practices.
  • Strong knowledge of cloud design patterns for scale, data management, and resiliency.
  • A commitment to high-quality software and testing.
  • Strong opinions about business metrics and SLOs.

Benefits

  • Remote-first work environment; the role is fully remote and based in Poland.
  • Opportunity to work on a fast-growing, mission-driven platform used by major enterprise customers, including half of the Fortune 100.
  • Collaborative global team culture that values curiosity and ownership.
  • Inclusive workplace that supports diversity and welcomes applicants from all backgrounds.
