Site Reliability Engineer

2 hours, 10 minutes ago
Full-time
Senior
DevOps and Infrastructure
TextNow

TextNow is a leading provider of free phone service, offering calling and texting through its app and SIM card. With a focus on affordability and innovation, TextNow is revolutionizing mobile phone service with cloud-based technology, providing users w...

Wireless Telecommunication Services
51-250 employees
Founded 2009

Description

  • Design, build, and maintain scalable, resilient, highly available systems for TextNow’s infrastructure and services.
  • Develop and maintain infrastructure automation using Terraform, Ansible, and related tools.
  • Support cloud deployment, scaling, and operations for AWS-based systems.
  • Participate in an on-call rotation and respond to production incidents.
  • Troubleshoot issues, drive incident resolution, and reduce downtime.
  • Conduct post-mortems and implement corrective actions to improve reliability.
  • Implement and improve observability through logging, metrics, and monitoring solutions.
  • Collaborate with software engineers, DevOps, and product teams to improve reliability from development to production.
  • Identify opportunities to improve architecture, automation, and operational practices.
  • Contribute to the design and implementation of new SRE best practices.

Requirements

  • 5+ years of experience in an operationally focused role such as SRE, DevOps, or Infrastructure Engineering.
  • Deep understanding of reliability, scalability, and performance optimization.
  • Hands-on experience with AWS, GitHub, Terraform, Ansible, or similar tools.
  • Experience handling production incidents, performing root cause analysis, and implementing long-term fixes.
  • Strong focus on automation and scripting to reduce operational toil.
  • Experience building robust observability with logging, metrics, and monitoring tools.
  • Ability to work cross-functionally with engineers, product teams, and leadership.
  • Experience in a remote or distributed working environment is preferred.
  • Canada-based role; compensation is listed in CAD and, for select markets, in USD.
  • Applicants must be eligible to work in the relevant hiring location.

Benefits

  • Competitive pay with a stated salary range of $113,400 - $162,000 CAD.
  • Employee stock options.
  • Unlimited vacation and 12 paid holidays per year.
  • Flexible work arrangements, including work-from-home, remote, and in-office options.
  • Health, dental, and vision benefits.
  • Short-term and long-term disability coverage.
  • $750 annual wellness benefit or healthcare spending account.
  • RRSP matching in Canada or 401(k) in the USA.
  • Parental leave for eligible employees.
  • Learning and development opportunities.
  • Free phone service.
  • Strong work-life blend.
