Alpaca

Alpaca is a developer-first API for stock and crypto trading, offering easy-to-use APIs for building apps and trading algorithms.

Industry: Capital Markets
Company size: 51-250 employees
Founded 2015
$87M raised

Description

  • Operate production systems day to day, including on-call support, incident response, postmortems, and follow-up remediation.
  • Define and refine reliability practices, including SLIs, SLOs, and error budgets.
  • Improve observability across metrics, logs, traces, and alerting.
  • Ship infrastructure as code through a GitOps workflow for cloud resources and Kubernetes workloads.
  • Support and improve PostgreSQL performance, schema and migration review, online migrations, high availability, disaster recovery, and CDC pipelines.
  • Mentor engineers on reliability and database fundamentals through code review, design review, and pairing.
  • Collaborate with product teams to help ensure services operate within reliability objectives.

Requirements

  • 4+ years of experience in SRE, DevOps, Platform/Infrastructure, or backend engineering with significant production operations ownership.
  • Hands-on experience operating production services on Kubernetes.
  • Experience shipping infrastructure as code in a GitOps workflow.
  • Solid working knowledge of PostgreSQL in production, including query plans, pg_stat_* views, indexing, schema trade-offs, and safe online migrations on non-trivial tables.
  • Cloud networking fundamentals, including VPCs, routing, L4/L7 load balancing, DNS, and TLS.
  • Comfort debugging cross-service connectivity issues.
  • Comfort with a modern observability stack and proficiency with Linux at the operator level.
  • Experience with incident response, structured debugging, and postmortems that drive change.
  • Working proficiency in Go or Python, along with strong written and verbal communication skills.
  • Genuine interest in databases and willingness to grow PostgreSQL/DBA expertise.
  • Deeper PostgreSQL experience with large clusters at OLTP load, connection pooling at scale, HA/DR ownership, or CDC pipelines is preferred.
  • Experience with typed SQL access layers in Go, such as pgx, gorm, or sqlc, is preferred.
  • Production experience with messaging systems at scale, such as RabbitMQ, Kafka, or Redpanda, is preferred.
  • Security and compliance experience in a regulated environment, including SOC 2, secrets management, and audit logging, is preferred.
  • Familiarity with trading, brokerage, or other regulated fintech domains is preferred.

Benefits

  • Competitive salary with stock options.
  • Health benefits.
  • One-time USD $500 new hire home-office setup stipend.
  • Monthly USD $150 stipend via a Brex Card.
