Data Center Reliability Engineer

6 hours, 7 minutes ago
Full-time
Junior
DevOps and Infrastructure
Phaidra

Phaidra is an industrial AI company that creates self-learning, intelligent control systems for industrial facilities. By leveraging AI technology, Phaidra helps operators reduce risk, improve energy efficiency, and meet sustainability goals by maximiz...

Internet Software & Services
51-250
Founded 2019
$30M raised

Description

  • Analyze mechanical and electrical telemetry data to identify failure signatures for the monitoring tool.
  • Act as a primary user of the platform and work with Engineering to improve product features and data quality.
  • Translate raw telemetry into SME-level logic and real-time guidance for data center operators.
  • Build deep domain expertise across data center infrastructure, including mechanical and electrical dependencies.
  • Support customers directly with clear, data-backed recommendations on complex issues.
  • Oversee pilot projects to validate the realism, accuracy, and usefulness of AI-driven outputs.
  • Identify gaps in existing tooling and propose logic-based solutions to Engineering.
  • Contribute to the refinement of the LLM instruction set for cross-disciplinary diagnostics.
  • Present post-incident analyses that correlate telemetry with real-world root causes.

Requirements

  • 2–3 years of relevant professional experience.
  • Bachelor’s degree in Mechanical Engineering, Electrical Engineering, Control Theory, or a related field.
  • Strong Python skills and experience with Pandas and NumPy for custom analysis.
  • Ability to explain complex diagnostic findings clearly to both technical and non-technical stakeholders.
  • Proven ability to solve problems, independently or collaboratively, without preconceived assumptions.
  • Demonstrated commitment to Transparency, Collaboration, and Ownership.
  • Experience with critical infrastructure components such as HVAC, power distribution, or industrial automation (preferred).
  • Experience with time-series data from industrial sensors and systems such as SCADA, BMS, or smart meters (preferred).
  • Exposure to or strong interest in LLMs for root-cause analysis and automated reporting (preferred).
  • U.S.-based candidate with preference for Pacific Time Zone and flexibility to overlap with APAC hours as needed.

Benefits

  • Competitive compensation with meaningful equity.
  • Base salary range for U.S. residents: $101,320–$163,900 depending on location tier.
  • 100% remote work environment.
  • Medical, dental, and vision insurance, with exact benefits varying by region.
  • Unlimited paid time off with a required minimum of 20 days per year.
  • Paid parental leave, with exact benefits varying by region.
  • Flexible stipends for workspace, well-being, and professional development.
  • Company MacBook.

