Omilia

Omilia is a global leader in Conversational AI, offering AI-based self-service solutions for enhanced customer care fulfillment and success.

IT Services
251-1K
Founded 2002
$20M raised

Description

  • Ensure platform reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.
  • Serve as first response for incidents and contribute to problem management and root cause analysis.
  • Support development teams in building a reliability-focused culture within the development lifecycle.
  • Develop troubleshooting documentation and production support materials.
  • Collaborate with engineering teams to create optimized runbooks, operational documentation, and automation for operational tasks.
  • Work with development and cloud engineering teams to embed reliability and performance into the software delivery lifecycle.
  • Design, implement, and evolve observability solutions using metrics, logs, traces, and dashboards.
  • Use tools such as Prometheus, Grafana, and ELK to improve monitoring and visibility.
  • Participate in on-call rotations and continuously improve alert quality and response processes.
  • Champion continuous improvement in reliability and performance across teams.

Requirements

  • Bachelor's degree or MS in Engineering, or equivalent experience.
  • Experience operating at least one container orchestration cluster, such as Kubernetes or Docker Swarm.
  • Experience developing or maintaining software for production services at scale.
  • Experience with ELK.
  • Experience with AWS.
  • Experience with the Grafana/Prometheus stack.
  • Strong scripting skills in Bash, Python, or Go.
  • Excellent communication skills.
  • Ability to think creatively, anticipate challenges, and question existing technologies and procedures.
  • Comfort working with agile/lean methods and iterating collaboratively.
  • Strong team-player mindset and ability to work across product, experience design, engineering, and other functions.
  • Telephony knowledge, including SIP and VoIP, is a plus.
  • Experience with Linux administration, including Red Hat, CentOS, or AL, is a plus.
  • Working knowledge of infrastructure-as-code and configuration management tools such as Terraform and Ansible is a plus.
  • Experience with TCP/IP and general networking concepts is a plus.
  • RDBMS knowledge, such as MySQL or Postgres, is a plus.
  • NoSQL knowledge, such as Redis, is a plus.

Benefits

  • Fixed compensation.
  • Long-term employment with vacation days.
  • Professional development support, including courses and training.
  • Opportunity to work on cutting-edge products with global impact in the service industry.
  • A collaborative, fun-to-work-with team.
  • Apple gear provided.
  • Equal opportunity employer with a diverse and inclusive workplace.
