Symmetrio

Symmetrio is a top Staffing and Recruiting company in the Philadelphia region, specializing in recruiting qualified full-time candidates, providing staff augmentation services, and offering advisory services to help clients meet their corporate objecti...

Professional Services

Description

  • Serve as the primary technical owner for production reliability across U.S. customer environments.
  • Investigate and resolve complex issues across web applications, APIs, backend services, data pipelines, cloud infrastructure, and customer integrations.
  • Lead production incident response efforts and coordinate cross-functional teams to restore service and reduce customer impact.
  • Perform root cause analysis and drive corrective actions that improve long-term system stability and resilience.
  • Partner with software engineering and platform teams to identify recurring reliability risks and implement sustainable solutions.
  • Design, configure, and validate secure customer connectivity solutions, including Site-to-Site VPNs, Transit Gateway integrations, routing configurations, and secure network paths.
  • Support customer onboarding by troubleshooting connectivity issues and ensuring consistent implementation processes.
  • Improve platform observability through monitoring, logging, alerting, tracing, and operational dashboards.
  • Contribute to CI/CD, infrastructure automation, and deployment processes that improve release safety and operational consistency.
  • Develop operational tooling for incident response, troubleshooting, onboarding, and system monitoring.
  • Collaborate with engineering leadership to improve cloud architecture, scalability, security, and operational readiness.
  • Partner with customer-facing teams to communicate technical issues, remediation plans, and reliability improvements clearly.
  • Support compliance, security, and risk management initiatives in regulated healthcare environments.

Requirements

  • 6+ years of hands-on experience supporting and managing AWS-based production environments.
  • 4+ years of experience supporting web applications and backend services; Python/Django experience is strongly preferred.
  • Experience with AWS networking technologies including VPCs, Site-to-Site VPNs, Transit Gateways, routing, NAT gateways, and security groups.
  • Strong experience with Terraform and infrastructure-as-code deployment practices.
  • Experience with containerized environments including ECS, Fargate, Kubernetes, or similar technologies.
  • Experience building and supporting CI/CD pipelines and release automation processes.
  • Familiarity with monitoring and observability platforms such as Datadog, CloudWatch, Sentry, Grafana, or similar tools.
  • Experience leading production incident response, outage management, and root cause analysis.
  • Exposure to Windows Server environments, Active Directory, Kerberos, and enterprise infrastructure concepts is preferred.
  • Healthcare technology, healthcare SaaS, clinical software, or other regulated industry experience is highly preferred.
  • Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related technical field is preferred.

Benefits

  • Health Care Plan (Medical, Dental & Vision).
  • Retirement Plan (401k, IRA).
  • Paid Time Off (Vacation, Sick & Public Holidays).
