ProArch

At ProArch, we help our clients accelerate growth and mitigate risk with IT services, cybersecurity services, application development, cloud computing, and data analytics. ProArch was founded on the belief that a future where change is ‘business as usu...

Industry: Internet Software & Services
Company size: 251-1K employees
Founded: 2006

Description

  • Monitor system performance and reliability to ensure uptime meets organizational SLAs.
  • Implement and maintain observability tools for proactive issue detection through metrics and logs.
  • Troubleshoot and resolve complex production issues across infrastructure components.
  • Collaborate with software engineering teams to design and implement scalable, fault-tolerant architectures.
  • Develop and maintain automation scripts for deployment, monitoring, and system management.
  • Participate in the on-call rotation to respond to incidents and perform root cause analysis.
  • Contribute to capacity planning and performance tuning for optimal resource utilization.
  • Document infrastructure, processes, and incident responses to support knowledge sharing.

Requirements

  • 8+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a related role.
  • Strong experience with cloud providers such as AWS, Azure, or GCP.
  • Proficiency in scripting languages such as Python, Bash, or Go.
  • Experience with container orchestration tools like Kubernetes.
  • Familiarity with CI/CD pipelines and tools such as Jenkins or GitLab CI.
  • Experience with Snowflake, including account administration.
  • Solid understanding of networking and security principles.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
  • Excellent problem-solving skills and a proactive attitude.
  • Strong communication and teamwork skills with an emphasis on collaboration.
  • Experience with Infrastructure as Code tools such as Terraform or CloudFormation is preferred.
  • Knowledge of service mesh architectures and modern microservices patterns is preferred.
  • Background in software development and familiarity with Agile methodologies are preferred.
