Senior Site Reliability Engineer Team Lead - OP02087

2 weeks, 6 days ago
Full-time
Lead
DevOps and Infrastructure
Dev.Pro

Dev.Pro is a globally distributed software development partner specializing in custom outsourced software development that helps innovative technology companies scale their businesses efficiently.

Internet Software & Services
251-1K
Founded 2011

Description

  • Lead the Cloud/SRE Support team with coaching, prioritization, and day-to-day oversight.
  • Drive team performance to ensure high-quality support, SLA compliance, and continuous improvement.
  • Coordinate with India-based and cross-functional teams to maintain alignment and 24/7 coverage.
  • Translate complex operational issues into actionable plans and scalable solutions.
  • Design and improve support processes and operational frameworks for the team.
  • Identify operational gaps and risks, and help improve team engagement and effectiveness.
  • Collaborate with cross-functional stakeholders to define priorities and communicate progress, risks, and solutions.
  • Oversee MDM operations, cloud and user access management, monitoring, incident handling, and root cause analysis.
  • Maintain documentation, runbooks, and escalation procedures.
  • Promote reliability best practices and customer-focused operational support.

Requirements

  • Based in Chile.
  • Upper-Intermediate English level.
  • 5+ years of experience in cloud operations, platform support, or IT operations.
  • 2+ years of experience leading technical support or SRE teams.
  • Strong operational support mindset, including incident handling, user requests, and escalation management.
  • Solid understanding of cloud technologies, monitoring, and observability tools.
  • Knowledge of incident management best practices and access management concepts.
  • Ability to break down complex problems into structured, actionable plans.
  • Strategic thinking to evaluate options, weigh tradeoffs, and design processes.
  • Experience collaborating with global, multi-time zone teams.
  • Strong communication skills for both technical and non-technical stakeholders.
  • Preferred: experience leading support teams for MDM platforms such as Esper, MobileIron, or Workspace ONE.
  • Familiarity with cloud platforms such as Azure, GCP, or AWS.
  • Basic understanding of CI/CD pipelines, Docker, and Kubernetes.
  • Experience working with onshore/offshore teams, ideally in the U.S. and India.
  • Experience in 24/7 or follow-the-sun operational models.
  • Strong analytical and documentation skills, including process mapping and root cause analysis.

Benefits

  • 99.9% remote work with the ability to work from anywhere in the world.
  • 30 paid days off per year for vacation, holidays, or personal time.
  • 5 paid sick days, up to 60 days of medical leave, and up to 6 paid days off for major family events.
  • Partially covered health insurance after probation.
  • Wellness bonus, available after 6 months, for gym memberships, sports nutrition, and similar needs.
  • Salary paid in U.S. dollars with all approved overtime covered.
  • English lessons and access to Dev.Pro University programs.
  • Online activities and team-building events.
