Tyk API Management

Tyk is a leading API Management Platform that enables interconnectivity between systems and devices through its fast, scalable, and open-source API Gateway, Analytics, Dev Portal, and Dashboard.

Internet Software & Services
51-250
Founded 2015
$40M raised

Description

  • Lead hands-on maintenance and optimization of the global cloud platform within defined SLAs, SLOs, and SLIs.
  • Collaborate with the SRE team to shape strategy and translate it into actionable technical plans.
  • Identify reliability issues, perform root cause analysis, and implement corrective solutions with the squad.
  • Lead performance tuning and fault-finding using OS and application metrics.
  • Design and implement automation for operational tasks and cloud operations workflows.
  • Develop monitoring, alerting, dashboards, and KPIs to improve platform visibility and response.
  • Participate in on-call rotation and support effective incident response, resolution, and postmortems.
  • Document operational findings, maintain runbooks, and drive continuous improvement across processes and practices.
  • Support multi-region and multi-cloud expansion with a focus on scalability and automation.
  • Engage with commercial teams on growth plans and translate them into technical SRE strategy.
  • Coordinate penetration testing and plan software upgrades to improve cloud services.

Requirements

  • Experience in an SRE role.
  • Strong knowledge of cloud technologies and SLA, SLO, and SLI management.
  • Experience with software design, automation, and root cause analysis.
  • Experience supporting production systems on-call with a customer-focused mindset.
  • Excellent communication and leadership skills.
  • Ability to analyze and improve operational processes and performance metrics.
  • Hands-on experience launching and operating production Kubernetes clusters.
  • Experience designing and operating infrastructure on AWS and other cloud providers.
  • Experience operating MongoDB or another document database, Redis or another key-value store, and Linux servers.
  • Experience with Prometheus, Grafana, and logging collection/analysis systems.
  • Advanced knowledge of Go, AWS/EKS, and Linux.
  • Proficiency with infrastructure as code, using Terraform and Helm.
  • Familiarity with monitoring tools such as Prometheus, Grafana, and Thanos.
  • Strong grasp of networking concepts and protocols such as DNS, TCP/IP, HTTP, TLS, UDP, subnets, routing, peering, load balancing, and NAT.
  • Ability to participate in the on-call rotation, including early-morning coverage from 04:00 to 16:00 UTC.
  • Proactive, energetic, innovative, and change-oriented, with a desire to lead or mentor a team.

Benefits

  • Unlimited paid holidays.
  • Remote working from anywhere in the world.
  • Flexible working hours.
  • Employee share scheme.
  • Generous maternity and paternity leave.
  • Volunteering days.
  • Employee wellbeing platform.
