Orion Health

Orion Health is a global provider of healthcare information technology that aims to improve healthcare delivery through software. Its products, including the Amadeus big data platform, enable precision medicine and personal...

Internet Software & Services
251-1K
Founded 1993

Description

  • Design, implement, and maintain reliable, scalable, and secure infrastructure for Orion Health products and services.
  • Define, monitor, and improve SLIs, SLOs, and SLAs to support reliability and customer satisfaction.
  • Build and maintain observability tooling, including monitoring, logging, alerting, and tracing across cloud environments.
  • Participate in incident response, including troubleshooting, root cause analysis, remediation planning, and post-incident reviews.
  • Reduce operational toil through automation, Infrastructure as Code, and self-service capabilities.
  • Collaborate with software engineering teams to improve application reliability, performance, and operational readiness.
  • Identify and remove reliability bottlenecks through performance tuning, capacity planning, and system optimization.
  • Support infrastructure and platform upgrades while maintaining service availability and minimizing disruption.
  • Develop operational runbooks, standards, and best practices to improve resilience and operational efficiency.
  • Contribute to disaster recovery, business continuity, and broader platform resilience initiatives.

Requirements

  • 3+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, Cloud Operations, or Infrastructure Engineering roles.
  • Experience supporting and operating production cloud environments.
  • Strong experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Experience implementing Infrastructure as Code using tools such as Terraform, Bicep, ARM, or CloudFormation.
  • Experience with containerization and orchestration technologies such as Docker and Kubernetes.
  • Experience building and maintaining monitoring, logging, and observability solutions.
  • Experience managing production incidents and conducting root cause analysis.
  • Knowledge of CI/CD pipelines and modern software delivery practices.
  • Experience with automation and scripting using tools such as PowerShell, Bash, Python, or similar.
  • Understanding of networking, security, high availability, and disaster recovery principles.
  • Bachelor's degree in Computer Science, Software Engineering, Information Technology, or a related discipline preferred.
  • Industry certifications in cloud platforms, Kubernetes, DevOps, or reliability engineering are advantageous.
