Senior Site Reliability Engineer

1 day, 3 hours ago
Full-time
Senior
DevOps and Infrastructure
CaptivateIQ

CaptivateIQ is the agile commission solution that simplifies sales expenses and drives business growth with automated data management and real-time earnings visibility.

Internet Software & Services
251-1K employees
Founded 2017
$165M raised

Description

  • Read and write designs, documentation, runbooks, and industry literature to support team learning and execution.
  • Partner with development teams to design and implement reliable and resilient services.
  • Build infrastructure automation that is easy for other teams to use.
  • Develop observability processes, reports, and tooling to diagnose performance and stability issues.
  • Automate manual processes to eliminate toil and improve operational efficiency.
  • Ensure compliance and security commitments are met.
  • Participate in an on-call rotation to provide after-hours support and resolve critical issues.
  • Communicate ethically and professionally across the engineering organization.

Requirements

  • 5+ years of experience in Software Engineering, SRE, or DevOps roles.
  • Strong written and verbal communication skills.
  • Experience with Infrastructure as Code, including Terraform and AWS.
  • Experience with containers and container orchestration tools, including ECS.
  • Experience authoring and maintaining code in Bash, Python, and/or Golang.
  • Experience using observability tools and techniques, including Datadog.
  • Experience with cloud cost management and FinOps (nice to have).
  • Experience building, maintaining, and operating SaaS or web-based applications (nice to have).
  • Experience with distributed system principles and their application (nice to have).
  • Experience building and operating multi-region or cell-based applications (nice to have).
  • Experience managing cloud vendor relationships (nice to have).
  • Experience working in compliance-driven and regulated environments, including SOC 2 and HIPAA (nice to have).

Benefits

  • 100% medical, dental, and vision coverage for employees, plus 75% coverage for dependents (US only).
  • Vacation days and quarterly mental health days.
  • 401(k) plan for US employees.
  • Apple products provided to help you do your best work.
  • Employee Resource Groups (ERGs) that support and celebrate employee communities, company-wide DEI goals, and diverse talent retention.
  • Competitive base salary range of $195,700 to $225,000 per year in the San Francisco Bay Area.
