Senior Site Reliability Engineer - AWS

1 month ago
Full-time
Senior
DevOps and Infrastructure
Filevine

Filevine is a top legal tech company revolutionizing legal work with AI-powered case management software, empowering law firms to streamline operations and enhance client services.

Industry: Specialized Consumer Services
Company size: 251-1K employees
Founded 2015
$226M raised

Description

  • Provide leadership, mentoring, and sound judgment as the reliability engineering lead on the team.
  • Design and maintain autonomous systems for building, deploying, testing, and operating Filevine products.
  • Serve as the authoritative voice of reliability across the full software development lifecycle.
  • Monitor, aggregate, dashboard, and alert on software and infrastructure events to ensure visibility and rapid response.
  • Continuously improve CI/CD pipelines, automation scripts, playbooks, and tools to streamline operations and reduce resolution time.
  • Identify and resolve gaps in system availability, performance, and security while strengthening the overall security posture.
  • Document processes, architecture, procedures, and best practices to support team effectiveness.
  • Research, adopt, or build reliable tools that improve engineer productivity.
  • Collaborate with team members and stakeholders, mentor junior engineers, and participate in a 24/7 on-call rotation for production support and emergency response.

Requirements

  • 8+ years of hands-on technical experience in software engineering, infrastructure, or operations roles, including at least 4 years dedicated to Site Reliability Engineering.
  • Strong curiosity, self-motivation, and a continuous learning mindset with proactive enthusiasm for improving systems and processes.
  • Strong proficiency in Python, Bash, PowerShell, and other common SRE scripting and tooling technologies.
  • Expert-level experience designing, building, and maintaining autonomous systems for build, deployment, testing, monitoring, and operations.
  • Hands-on experience with AWS services such as EC2, EKS (Kubernetes), CloudWatch, Lambda, S3, and IAM.
  • Proficiency in core SRE skills including monitoring and alerting, incident response, capacity planning, performance optimization, CI/CD enhancement, and reliability best practices.
  • Bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent certifications such as AWS or Google Cloud Professional certifications, or substantial comparable direct work experience.
  • Proven track record of independently driving reliability improvements, reducing toil through automation, and supporting highly available, scalable production systems in a fast-paced environment.

Benefits

  • $160,000 - $190,000 base salary.
  • Eligibility for the company's paid time off policy.
  • Comprehensive benefits package.
  • Medical, dental, and vision insurance for full-time employees.
  • Maternity and paternity leave for full-time employees.
  • Short- and long-term disability coverage.
  • Opportunity to learn from a dedicated leadership team.
  • Top-of-the-line company swag.


