LivePerson

LivePerson is a global leader in Conversational AI. Its platform, LiveEngage, provides real-time, intelligent customer engagement solutions to more than 18,000 clients worldwide.

Internet Software & Services
1K-5K
Founded 1995

Description

  • Collaborate with developers, QA, and product teams during sprint planning to understand release plans, dependencies, and infrastructure needs.
  • Participate in the application release cycle to ensure deployments are automated, consistent, and reliable.
  • Manage and operate Kubernetes clusters in Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS).
  • Develop and maintain Terraform modules for provisioning and configuring cloud infrastructure across GCP and AWS.
  • Standardize service deployments using Helm for templating and versioned releases.
  • Build and improve observability using Prometheus, Grafana, and Datadog to monitor platform and application performance.
  • Design, implement, and maintain GitLab CI/CD pipelines for build, test, and deployment automation.
  • Develop scripts and tooling in Python, Go, or Shell to reduce manual work and improve efficiency.
  • Participate in a 24/7 on-call rotation to detect, mitigate, and resolve incidents quickly.
  • Perform root cause analysis and contribute to post-incident reviews to prevent recurrence.
  • Identify reliability and scalability gaps early and partner with teams to address systemic risks.

Requirements

  • 5-8 years of experience as a Site Reliability Engineer, Platform Engineer, or DevOps Engineer.
  • Hands-on experience managing Kubernetes clusters on GKE (GCP) and EKS (AWS).
  • Strong knowledge of Terraform, Helm, and GitLab CI/CD pipelines.
  • Proficiency in Python, Go, or Shell scripting for automation and tooling.
  • Experience implementing and managing observability stacks such as Prometheus, Grafana, and Datadog.
  • Deep understanding of Linux systems, cloud networking, and container orchestration concepts.
  • Experience working in Agile/Scrum environments and partnering closely with developers.
  • Excellent analytical skills with a proactive attitude and the ability to question assumptions and escalate risks early.
  • Experience with ArgoCD or Flux is preferred.
  • Familiarity with service mesh tools such as Istio or Linkerd, or with API gateways, is preferred.
  • Knowledge of cloud cost optimization, autoscaling, or security best practices is preferred.
  • Experience with incident management tools such as PagerDuty or ServiceNow is preferred.

Benefits

  • Flexible working arrangements, including remote work in India.
  • Competitive compensation.
  • 15 days of PTO plus casual leave and sick leave.
  • Family floater insurance coverage of 8 lakhs.
  • Personal accident and life insurance coverage worth 3x gross annual salary.
  • Career growth opportunities, including certifications and mentorship.
  • A collaborative, global team culture that values ownership, learning, and continuous improvement.
