Cribl

Cribl provides a unified data management platform specifically designed for IT and security data, enabling users to explore, collect, process, and access their data at scale while offering enhanced control and flexibility in managing their data workflows.

IT Services

Information Technology

251-1K employees

Founded 2018

$402M raised

34 open positions

Senior Site Reliability Engineer

1 month, 3 weeks ago

Poland

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Ansible AWS Azure CI/CD Grafana JavaScript Kibana Linux New Relic Node.js PagerDuty Prometheus Splunk Terraform TypeScript

Description

Engage with engineering teams to improve service delivery and reliability across the full lifecycle of services.
Measure and monitor production systems for availability, latency, and overall system health.
Investigate errors and instability in production cloud services and drive operational improvements.
Partner with product and platform teams to improve reliability, resilience, and observability.
Reduce operational toil through automation and creative problem-solving.
Contribute to design, development, testing, deployment, and shipping of Cribl products.
Provide input on cloud architecture, scaling, high availability, and reliability decisions.
Participate in standby, on-call, or off-hours support as needed.

Requirements

Proven experience designing, implementing, and operating observability systems for complex cloud-based platforms.
Experience with configuration management and infrastructure as code tools such as Terraform (preferred) or Ansible.
Experience working with cloud SDKs is a plus.
Knowledge of cloud platforms, preferably AWS and Azure, and of container and orchestration technologies.
Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, or Sentry.
Extensive experience with enterprise-scale continuous delivery environments.
Development experience with JavaScript, Node.js, or TypeScript in a Linux or Mac environment.
Experience with sustainable, blameless incident response.
Background in Linux systems engineering.
Experience with incident response tools such as PagerDuty, FireHydrant, or Blameless.
Comfort working with a high degree of autonomy and a distributed team.
Knowledge of cloud and application security best practices.
Strong knowledge of cloud design patterns for scale, data management, and resiliency.
A commitment to high-quality software and testing.
Strong opinions about business metrics and SLOs.

Benefits

Remote-first work environment; the role is based remotely within Poland.
Opportunity to work on a fast-growing, mission-driven platform used by major enterprise customers, including half of the Fortune 100.
Collaborative global team culture that values curiosity and ownership.
Inclusive workplace that supports diversity and welcomes applicants from all backgrounds.

Similar Roles

Site Reliability Engineer 2 (Azure)

PhonePe 5K-10K Capital Markets

PhonePe Limited is hiring a Site Reliability Engineer to manage and scale core cloud infrastructure for a high-volume digital payments environment in India.

India Full-time Senior Site Reliability Engineer (SRE)

Ansible Azure Bash DNS Docker Go Grafana HAProxy InfluxDB Java Linux MySQL Nginx Prometheus Python RabbitMQ SaltStack Terraform Ubuntu

17 hours, 51 minutes ago

Sr. Control System Engineer/Site Reliability Engineer (SRE)

QuEra Computing 11-50 Internet Software & Services

QuEra is seeking a Sr. Control System Engineer/Site Reliability Engineer to integrate and maintain the hardware and software systems that support its quantum control stack and keep development and production environments reliable.

Japan Full-time Senior Site Reliability Engineer (SRE)

Ansible Bash CI/CD Debian DHCP DNS Docker ELK Stack Embedded Systems Git GitLab CI Go Grafana Jenkins Kubernetes Linux Prometheus Python TCP/IP Terraform Ubuntu

18 hours, 21 minutes ago

Incident Commander

PENN Entertainment 10K-50K Hotels, Restaurants & Leisure

PENN Interactive is hiring an Incident Commander to join its site reliability team and lead cross-functional incident response for its online and physical platforms.

Canada Full-time Senior Site Reliability Engineer (SRE)

$67k-$102k

Ansible AWS Docker Elasticsearch GCP Helm JIRA Kafka Kubernetes Linux MySQL PostgreSQL Prometheus Python Redis Terraform

18 hours, 36 minutes ago

Site Reliability Engineer

VantageScore 11-50 Banks

VantageScore is hiring a Site Reliability Engineer to join a growing engineering team, with a DevSecOps focus on maintaining the reliability, security, and compliance of cloud infrastructure, APIs, and software supply chains.

United States Full-time Senior Site Reliability Engineer (SRE)

$150k

Agile AWS AWS CDK Bash CI/CD CloudFormation CodePipeline Datadog DevSecOps Docker EC2 GitHub Actions Grafana HashiCorp Vault Kong Kubernetes Microservices Python REST API Scrum Terraform

1 day, 18 hours ago
