Omilia

Omilia is a global leader in Conversational AI, offering AI-based self-service solutions for enhanced customer care fulfillment and success.

IT Services
251-1K
Founded 2002
$20M raised

Description

  • Ensure reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.
  • Act as first response for incidents and contribute to problem management and root cause analysis.
  • Support development teams in improving service reliability and building a reliability-focused culture in the software lifecycle.
  • Develop troubleshooting documentation and operational runbooks for production support.
  • Collaborate with engineering and cloud teams to automate operational tasks and improve delivery processes.
  • Design, implement, and evolve observability solutions using metrics, logs, traces, and dashboards.
  • Use tools such as Prometheus, Grafana, and ELK to monitor platform health and performance.
  • Participate in on-call rotations and improve alert quality and incident response processes.
  • Champion continuous improvement in reliability, performance, and operational practices across teams.

Requirements

  • Bachelor’s degree or MS in Engineering, or equivalent experience.
  • Experience operating at least one container orchestration cluster such as Kubernetes or Docker Swarm.
  • Experience developing or maintaining software for production services at scale.
  • Experience with ELK.
  • Experience with AWS.
  • Experience with Grafana and Prometheus.
  • Strong scripting skills in Bash, Python, or Go.
  • Excellent communication skills and ability to work collaboratively across teams.
  • Agile/lean mindset with a willingness to iterate, learn, and challenge existing approaches.
  • Nice to have: telephony knowledge including SIP and VoIP.
  • Nice to have: Linux administration experience with Red Hat, CentOS, or AL.
  • Nice to have: infrastructure-as-code or configuration management experience with Terraform or Ansible.
  • Nice to have: knowledge of TCP/IP and general networking concepts.
  • Nice to have: RDBMS experience with MySQL or Postgres.
  • Nice to have: NoSQL experience with Redis.

Benefits

  • Fixed compensation.
  • Long-term employment with vacation days.
  • Professional development support including courses and training.
  • Opportunity to work on cutting-edge technology products with global impact.
  • Collaborative and fun team environment.
  • Apple gear provided.
  • Equal opportunity employer with a diverse and inclusive workplace.
