Sumo Logic

Sumo Logic provides cloud monitoring, log management, and Cloud SIEM tools for web and SaaS applications, giving businesses real-time insights that support high-quality software delivery.

Internet Software & Services
251-1K
Founded 2010

Description

  • Improve the lifecycle of microservices and related architectural components from design through deployment, operation, and refinement.
  • Define, evolve, and manage service level objectives (SLOs).
  • Write code and automation to reduce operational workload, improve efficiency, strengthen security posture, and eliminate toil.
  • Scale systems sustainably through automation and reliability-focused improvements.
  • Facilitate blame-free root cause analysis meetings and drive learning from incidents.
  • Participate in and improve global incident response coordination across products.
  • Drive root cause identification and issue resolution with cross-functional teams.
  • Work closely with multiple teams to optimize the operations of their microservices.
  • Operate in a fast-paced, iterative environment.

Requirements

  • 6+ years of industry experience.
  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or another scientific or technical discipline.
  • Cloud-native application development experience using best practices and design patterns.
  • Strong debugging and troubleshooting skills across the full technology stack.
  • Deep understanding of AWS networking, compute, storage, and managed services.
  • Experience with modern CI/CD and infrastructure tooling such as Kubernetes, Terraform, Ansible, and Jenkins.
  • Experience with full lifecycle support of services, from creation to production support.
  • Infrastructure as Code experience with tools such as Terraform or AWS CloudFormation.
  • Ability to author production-ready code in at least one of Java, Scala, or Go.
  • Experience with Linux systems and command-line work.
  • Understanding of modern cloud-native software security practices.
  • Experience with agile frameworks such as Scrum and Kanban.
  • Flexibility to step into new roles and responsibilities.
  • Willingness to learn and use Sumo Logic products to solve reliability and security issues.
  • Preferred: experience using Sumo Logic or other observability products for reliability and security.
  • Preferred: experience with planet-scale product development.
  • Preferred: expert-level experience running and operating SaaS products on AWS.
  • Preferred: experience with streaming technologies such as Kafka, Kafka Streams, or KSQL.
  • Preferred: expert-level experience in one or more of Java, Go, Scala, or Python.
  • Preferred: expert-level experience in one or more of Terraform, Jenkins, or Kubernetes.
  • Preferred: extensive experience running and tuning JVM workloads at scale.
