Tyk API Management

Tyk is a leading API Management Platform that enables interconnectivity between systems and devices through its fast, scalable, and open-source API Gateway, Analytics, Dev Portal, and Dashboard.

Internet Software & Services
51-250
Founded 2015
$40M raised

Description

  • Lead hands-on maintenance and optimization of the global cloud platform within defined SLAs, SLOs, and SLIs.
  • Collaborate with the SRE team to shape strategy and translate it into actionable technical plans.
  • Identify reliability issues, perform root cause analysis, and implement corrective solutions with the squad.
  • Lead performance tuning and fault-finding using OS and application metrics.
  • Design and implement automation for operational tasks and cloud operations workflows.
  • Develop monitoring, alerting, dashboards, and KPIs to improve platform visibility and response.
  • Participate in on-call rotation and support effective incident response, resolution, and postmortems.
  • Document operational findings, maintain runbooks, and drive continuous improvement across processes and practices.
  • Support multi-region and multi-cloud expansion with a focus on scalability and automation.
  • Engage with commercial teams on growth plans and translate them into technical SRE strategy.
  • Coordinate penetration testing and plan software upgrades to improve cloud services.

Requirements

  • Experience in an SRE role.
  • Strong knowledge of cloud technologies and SLA, SLO, and SLI management.
  • Experience with software design, automation, and root cause analysis.
  • Experience supporting production systems on-call with a customer-focused mindset.
  • Excellent communication and leadership skills.
  • Ability to analyze and improve operational processes and performance metrics.
  • Hands-on experience launching and operating production Kubernetes clusters.
  • Experience designing and operating infrastructure on AWS and other cloud providers.
  • Experience operating MongoDB or another document database, Redis or another key-value store, and Linux servers.
  • Experience with Prometheus, Grafana, and logging collection/analysis systems.
  • Advanced knowledge of Go, AWS/EKS, and Linux.
  • Proficient with Terraform and infrastructure as code, plus Helm.
  • Familiarity with monitoring tools such as Prometheus, Grafana, and Thanos.
  • Strong grasp of networking concepts and protocols such as DNS, TCP/IP, HTTP, TLS, UDP, subnets, routing, peering, load balancing, and NAT.
  • Ability to participate in the on-call rotation, including early-morning coverage from 04:00 to 16:00 UTC.
  • Proactive, energetic, innovative, and change-oriented, with a desire to lead or mentor a team.

Benefits

  • Unlimited paid holidays.
  • Remote working from anywhere in the world.
  • Flexible working hours.
  • Employee share scheme.
  • Generous maternity and paternity leave.
  • Volunteering days.
  • Employee wellbeing platform.


