Recorded Future

Recorded Future is the leading threat intelligence platform, empowering organizations to identify and mitigate threats across various domains with real-time, unbiased, and actionable intelligence.

Professional Services
251-1K employees
Founded 2009
$58M raised

Description

  • Ensure the platform's performance, capacity, scalability, reliability, resiliency, security, compliance, supportability, and cost efficiency, and meet its service-level objectives.
  • Design, implement, and maintain scalable and reliable infrastructure on AWS.
  • Develop and manage observability solutions using tools such as Grafana, ELK, and Prometheus to monitor system health and performance.
  • Automate infrastructure provisioning and configuration using Terraform and Chef.
  • Participate in a 24/7 on-call rotation to respond to and resolve production incidents.
  • Collaborate with engineering teams to ensure applications are designed for high availability and resilience.
  • Perform comprehensive root cause analysis for outages and recurring incidents.
  • Identify performance bottlenecks and systemic issues, then drive proactive improvements.
  • Lead continuous improvement efforts through automation, process optimization, and post-incident reviews.
  • Create clear incident reports and technical documentation.

Requirements

  • 3+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
  • Hands-on experience with Amazon Web Services (AWS), including AWS networking concepts.
  • Expert-level troubleshooting and diagnostic skills.
  • Proven track record of reducing system downtime.
  • Advanced Linux skills across engineering fundamentals, networking, storage, and operating systems.
  • Experience managing and optimizing observability tools such as Grafana and the ELK Stack.
  • Strong proficiency in Terraform and Chef.
  • Strong preference for automating tasks and using Infrastructure as Code rather than manual changes.
  • Ability to understand complex architectures and stay calm under pressure during outages.
  • Preferred: knowledge of and experience with Kubernetes.
  • Preferred: familiarity with message brokers such as RabbitMQ and Apache Kafka.
  • Preferred: experience with NoSQL databases, particularly MongoDB and Elasticsearch.
  • Preferred: familiarity with OpenTelemetry.
  • Preferred: experience with large distributed systems and microservices architecture.
  • Preferred: experience with CI/CD pipelines.

Benefits

  • Opportunity to join a large, global intelligence company with more than 1,000 intelligence professionals serving over 1,900 clients worldwide.
  • Work for a company with a 4.6-star user rating on G2 and customers including more than 50% of the Fortune 100.
  • Be part of a diverse, inclusive workplace representing over 40 nationalities.
  • Accommodation and special assistance available during the application process.
  • Equal opportunity and affirmative action employer committed to fair hiring practices.
  • Drug-free workplace.
