Staff Site Reliability Engineer, Streaming

1 hour ago
Full-time
Lead
DevOps and Infrastructure
Alpaca

Alpaca is a developer-first platform for stock and crypto trading, offering easy-to-use APIs for building apps and trading algorithms.

Capital Markets
51-250
Founded 2015
$87M raised

Description

  • Triage difficult technical problems and implement effective solutions.
  • Enhance the observability stack for RabbitMQ and Redpanda by defining SLOs and alerts and implementing profiling and logging.
  • Improve the reliability of RabbitMQ and Redpanda clients.
  • Respond to and resolve incidents promptly and conduct post-incident reviews to drive improvements.
  • Collaborate with development teams to ensure new features and services are designed for reliability and scalability.
  • Monitor system capacity and performance and implement changes to support future growth.
  • Work closely with development, operations, and DevOps teams to maintain robust applications and services.

Requirements

  • 5+ years of experience in Site Reliability Engineering, Performance Engineering, or a similar role.
  • 5+ years of experience with message brokers such as Kafka, RabbitMQ, and Redpanda.
  • Proven experience managing large-scale, high-availability, high-performance distributed systems.
  • Experience designing and implementing SLIs, SLOs, and SLAs with comprehensive alerting and monitoring.
  • Strong ability to work independently, lead large tasks, and collaborate with internal teams or external partners.
  • Significant production experience with Kubernetes.
  • Proficiency with Go.
  • Proficiency with Prometheus.
  • Proficiency with Linux.
  • Experience troubleshooting message broker performance issues.
  • Knowledge of trading or fintech domains (preferred).
  • Experience with low-latency systems (preferred).
  • Experience with Loki and Tempo (preferred).
  • Experience with distributed tracing (preferred).
  • Experience with the USE method (preferred).
  • Experience with perf, bpf, and pprof (preferred).

Benefits

  • Competitive salary and stock options.
  • Health benefits.
  • One-time $500 new hire home-office setup stipend.
  • $150 monthly stipend via a Brex card.
