Sumo Logic

Sumo Logic offers cloud monitoring, log management, and Cloud SIEM tools for web and SaaS apps, helping businesses gain real-time insights and deliver high-quality software.

Internet Software & Services
251-1K employees
Founded 2010

Description

  • Improve the lifecycle of microservices and architectural components from design through deployment, operation, and refinement.
  • Define, evolve, and manage service-level objectives (SLOs).
  • Write code and automation to reduce operational workload, eliminate toil, and improve efficiency and security.
  • Scale systems sustainably through automation and reliability improvements.
  • Facilitate blame-free root cause analysis meetings for incidents and drive follow-up improvements.
  • Participate in and improve global incident response coordination across products.
  • Drive root cause identification and issue resolution across teams.
  • Collaborate with multiple teams to optimize the operations of their microservices.
  • Work in a fast-paced, iterative environment.

Requirements

  • 1+ years of industry experience.
  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or another scientific or technical discipline.
  • Cloud-native application development experience using best practices and design patterns.
  • Strong debugging and troubleshooting skills across the full technology stack.
  • Understanding of AWS networking, compute, storage, and managed services.
  • Experience with modern CI/CD and infrastructure tooling such as Kubernetes, Terraform, Ansible, and Jenkins.
  • Experience with Infrastructure as Code using Terraform or CloudFormation.
  • Experience supporting services through the full lifecycle from creation to production support.
  • Ability to write production-ready code in at least one of Java, Scala, or Go.
  • Experience with Linux systems and comfort using the command line.
  • Understanding and application of modern cloud-native software security approaches.
  • Experience working in agile frameworks such as Scrum and Kanban.
  • Flexibility to step into new roles and responsibilities.
  • Willingness to learn and use Sumo Logic products to solve reliability and security issues.
  • Preferred: Experience using Sumo Logic or other observability products for reliability and security.
  • Preferred: Experience with planet-scale product development.
  • Preferred: Experience operating SaaS products on AWS with expert-level proficiency.
  • Preferred: Experience with streaming technologies such as Kafka, Kafka Streams, or KSQL.
  • Preferred: Advanced experience in one or more of Java, Go, Scala, or Python.
  • Preferred: Advanced experience in one or more of Terraform, Jenkins, or Kubernetes.
  • Preferred: Extensive experience running and tuning JVM workloads at scale.


