Fable

Fable is a leading accessibility platform that offers digital accessibility testing and custom training powered by people with disabilities. It helps digital teams improve accessibility for over 1 billion people with disabilities, moving organizations...

Professional Services
11-50 employees
Founded 2018
$12M raised

Description

  • Design, build, and maintain reliable, scalable, and secure infrastructure for product services.
  • Improve observability, monitoring, and alerting to support high availability and fast incident response.
  • Develop and evolve SRE practices, including SLIs/SLOs, incident management, and postmortems.
  • Support and improve CI/CD pipelines and deployment processes.
  • Identify and reduce operational complexity across systems and tooling.
  • Diagnose and resolve reliability and performance issues across infrastructure and application layers, including targeted application code changes when needed.
  • Support infrastructure and platform capabilities for AI/ML-powered features, including scaling, performance, and reliability considerations.
  • Monitor, optimize, and forecast infrastructure costs and capacity across cloud environments.
  • Work with vendors and tools that support infrastructure and operations, including troubleshooting and service improvements.
  • Partner with Engineering and Product teams to improve reliability, observability, operational readiness, and production support.

Requirements

  • 5–8+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or Platform Engineering.
  • Strong experience with cloud infrastructure such as AWS, GCP, or Azure.
  • Experience building internal platforms, tooling, or shared services that improve developer productivity and system reliability.
  • Experience designing systems that bridge infrastructure and application layers.
  • Comfort reading, debugging, and making changes to application code when needed to improve reliability, performance, or observability.
  • Experience with at least one backend language or runtime such as Node.js, Python, Go, or Java.
  • Strong experience with monitoring, observability, and alerting tools such as Datadog, Prometheus, or Grafana.
  • Solid understanding of CI/CD systems and modern deployment practices.
  • Experience managing infrastructure as code with tools such as Terraform or CloudFormation.
  • Experience optimizing system performance and infrastructure costs.
  • Familiarity with security and compliance considerations in cloud environments.
  • Experience working with third-party vendors and infrastructure tools.
  • Familiarity with infrastructure considerations for AI/ML workloads is a strong asset.
  • Curiosity about emerging technologies and their impact on infrastructure, reliability, and cost at scale.
  • Strong problem-solving skills and the ability to navigate complex systems.
  • Excellent collaboration and communication skills.
  • Experience contributing to platform engineering initiatives, such as internal developer platforms or self-serve infrastructure, is nice to have.
  • Experience improving developer experience (DX) is nice to have.
  • Experience with SLIs/SLOs and reliability engineering practices is nice to have.
  • Experience mentoring or supporting other engineers is nice to have.

Benefits

  • Stock options.
  • Career growth opportunities.
  • Professional development support.
  • Health and dental coverage.
  • Collaborative, mission-driven environment focused on accessibility and inclusion.
  • Salary range of $130,000–$150,000.
  • Accessibility accommodations during the hiring process and employment.
