Netomi

Netomi is an AI-first customer service platform revolutionizing customer support with industry-leading resolution rates and multilingual support.

IT Services

Information Technology

51-250 (131)

Founded 2015

$52M raised

32 open positions


Incident Engineer

1 hour, 11 minutes ago

India

Full-time

Mid Level

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS Datadog LLM

Description

Own the incident lifecycle from detection and triage through escalation, resolution, and postmortems.
Act as the central command during major incidents, including war rooms and stakeholder updates.
Define and enforce SLAs/SLOs, incident severity frameworks, and runbooks.
Collaborate with Engineering, ML, and Integrations teams to resolve issues quickly.
Monitor system health across integrations, including agent desks, LLMs, and ASR/TTS pipelines.
Drive root cause analysis and implement preventive actions.
Improve observability, alerting, and incident response tooling.
Maintain clear internal and customer-facing communication during incidents.

Requirements

3–6 years of experience in Incident Management, SRE, or Production Support roles.
Strong understanding of distributed systems, APIs, and cloud environments, especially AWS.
Experience with observability tools such as Datadog.
Familiarity with AI/ML systems, especially LLM integrations and voice stacks like ASR/TTS, is a plus.
Experience with monitoring and tracing tools such as Langfuse.
Excellent communication and stakeholder management skills.
Ability to stay calm under pressure and drive structured resolution.
Exposure to OpenAI or similar LLM platforms is preferred.
Experience supporting customer-facing SaaS products is a plus.
An automation mindset, including runbooks, alert tuning, and incident tooling, is preferred.


Similar Roles

Site Reliability Engineer

Capital Markets Gateway 51-250 Capital Markets

Capital Markets Gateway LLC (CMG) is hiring a remote Site Reliability Engineer in Latin America to strengthen the reliability, performance, and observability of its capital markets fintech platform used by buy-side firms and investment banks.

Mexico Argentina Brazil Colombia Ecuador Paraguay Peru Contract Senior Site Reliability Engineer (SRE)

Azure Bash Datadog Docker Elasticsearch GitHub Grafana GraphQL JIRA Kubernetes Linux Microservices .NET OpenTelemetry PostgreSQL Prometheus Python React Redis Terraform TypeScript

11 minutes ago


Staff Site Reliability Engineer (Platform Reliability)

Qonto 1K-5K Banks

Qonto is hiring a Staff Site Reliability Engineer to lead platform reliability work, shape infrastructure decisions, and help scale its cloud platform for millions of customers across Europe.

Serbia Spain France Germany Italy Full-time Lead Site Reliability Engineer (SRE)

Argo CD AWS Docker Elasticsearch GitLab CI GitOps Go Kafka Kubernetes Microservices OpenTelemetry OpsGenie PostgreSQL Prometheus Python Redis Terraform

41 minutes ago


Senior Cloud Performance Engineer

ClickHouse 51-250 IT Services

ClickHouse is hiring a Cloud Performance Engineering professional to help build and optimize its cloud-native ClickHouse Cloud platform for large-scale distributed systems, performance, and resilience work.

Singapore Full-time Senior Site Reliability Engineer (SRE)

AWS Azure ClickHouse EC2 GCP Go Java Kubernetes Serverless

1 hour, 41 minutes ago


Sr. Site Reliability Engineer

Backblaze 251-1K IT Services

Backblaze is seeking a Senior Site Reliability Engineer to improve the stability, scalability, and reliability of its customer-facing cloud storage services and infrastructure.

United States Full-time Lead Site Reliability Engineer (SRE)

$150k-$200k

Ansible AWS Azure Bash Docker ELK Stack GCP Go Grafana HashiCorp Vault Jenkins Kubernetes Linux Microservices Prometheus Python Terraform

3 hours, 26 minutes ago

