ProArch

At ProArch, we help our clients accelerate growth and mitigate risk with IT services, cybersecurity services, application development, cloud computing, and data analytics. ProArch was founded on the belief that a future where change is ‘business as usu...

Internet Software & Services
251-1K
Founded 2006

Description

  • Monitor system performance and reliability to ensure uptime meets organizational SLAs.
  • Implement and maintain observability tools for proactive issue detection through metrics and logs.
  • Troubleshoot and resolve complex production issues across infrastructure components.
  • Collaborate with software engineering teams to design and implement scalable, fault-tolerant architectures.
  • Develop and maintain automation scripts for deployment, monitoring, and system management.
  • Participate in the on-call rotation to respond to incidents and perform root cause analysis.
  • Contribute to capacity planning and performance tuning for optimal resource utilization.
  • Document infrastructure, processes, and incident responses to support knowledge sharing.

Requirements

  • 8+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a related role.
  • Strong experience with cloud providers such as AWS, Azure, or GCP.
  • Proficiency in scripting languages such as Python, Bash, or Go.
  • Experience with container orchestration tools like Kubernetes.
  • Familiarity with CI/CD pipelines and tools such as Jenkins or GitLab CI.
  • Experience with Snowflake, including account administration.
  • Solid understanding of networking and security principles.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
  • Excellent problem-solving skills and a proactive attitude.
  • Strong communication and teamwork skills with an emphasis on collaboration.
  • Experience with Infrastructure as Code tools such as Terraform or CloudFormation is preferred.
  • Knowledge of service mesh architectures and modern microservices patterns is preferred.
  • A background in software development and familiarity with Agile methodologies are preferred.
