Twilio

Twilio is a cloud communications company that offers APIs for SMS, voice, video, and authentication, enabling developers worldwide to embed communication capabilities into their software applications.

Diversified Telecommunication Services
5K-10K employees
Founded 2008

Description

  • Partner with senior technical leaders to define and communicate the reliability strategy and measurable outcomes.
  • Influence company-wide architectural decisions with attention to long-term vision, compliance, availability, performance, resilience, and cost efficiency.
  • Lead the design, implementation, and operation of scalable reliability solutions and paved roads for high-traffic services.
  • Define fault-tolerant architectures, incident response approaches, disaster recovery plans, and capacity and cost management practices.
  • Collaborate with product and cross-functional teams to identify reliability risks and translate them into designs, programs, and tooling.
  • Establish and champion reliability practices and drive systemic improvements across engineering teams.
  • Mentor and grow engineers and technical leaders.
  • Track emerging SRE, cloud, and large-scale systems best practices and introduce practical innovations that improve reliability at scale.

Requirements

  • 15+ years of experience in Reliability Engineering, Software Engineering, or DevOps with a focus on infrastructure, backend systems, and reliability, including principal or architect-level experience.
  • Strong experience driving strategic technical decisions and defining long-term technical vision.
  • Deep understanding of Reliability Engineering in a large, diverse SaaS organization.
  • Experience driving cross-organization technical architecture outcomes.
  • Knowledge of cloud architecture, DevOps practices, and large-scale microservices systems design.
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • Strong production experience with operational management, scaling, partitioning strategies, and performance and reliability tuning in high-scale environments.
  • Hands-on experience with Kubernetes (for example, EKS), including deploying and managing stateful services, and with cloud platforms such as AWS.
  • Proficiency with infrastructure-as-code tools such as Terraform or CloudFormation.
  • Experience with observability tools such as Prometheus, Grafana, or Datadog for monitoring and alerting.
  • Proficiency in at least one programming language such as Go, Python, or Java for automation and tooling.
  • Experience designing incident response processes, SLOs/SLIs, and runbooks, and participating in on-call rotations.
  • Experience running cross-functional post-incident reviews and driving improvements.
  • Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs.
  • Proven track record of leading reliability improvements in data-intensive or mission-critical systems.
  • Excellent problem-solving, analytical, verbal, and written communication skills in cross-functional and distributed environments.
  • Demonstrated leadership in mentoring teams, influencing decisions, and balancing long-term and short-term needs.
  • Ability to build effective working relationships across all levels of the organization.
  • Preferred: experience owning and operating large AWS footprints.
  • Preferred: knowledge of Kubernetes architecture and concepts.
  • Preferred: experience with Apache Kafka, AWS MSK, or similar streaming technologies.
  • Preferred: prior work on high-availability systems and a strong passion for building reliable products.

Benefits

  • Competitive pay with location-based salary ranges.
  • Eligibility for Twilio’s equity plan and corporate bonus plan.
  • Healthcare coverage, including health insurance.
  • 401(k) retirement savings program.
  • Generous time off, including paid sick time and paid personal time off.
  • Paid parental leave and ample wellness leave.
  • Remote-first work with flexibility to work from the US East Coast, Ireland, the UK, or Spain.
