Mistral AI

Mistral AI is a French AI company that builds frontier AI models, assistants, agents, and services for consumers and enterprises. Its mission is to make frontier AI accessible to everyone and to democratize AI through open-source, efficient, and innovative models, products, and solutions.

Artificial Intelligence

Technology

201-500 employees

Founded 2023

8 open positions

Site Reliability Engineer

1 month, 4 weeks ago

France, Spain, Europe, United Kingdom, Belgium, Germany, Italy, Netherlands

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Bash, CI/CD, CloudFormation, Datadog, Docker, ELK Stack, Flux, Go, Grafana, Kubernetes, Microservices, Prometheus, Python, REST API, Terraform

Description

Design, build, and maintain scalable, highly available, fault-tolerant infrastructure for web services and ML workloads.
Keep platform, inference, and model training environments highly available across multiple HPC clusters.
Operate production systems, troubleshoot incidents, and handle on-call responses, user administration, data extraction, and infrastructure scaling.
Implement and improve monitoring, alerting, and incident response systems to reduce downtime.
Maintain CI/CD, containerization, orchestration, logging, and alerting workflows and tools for APIs and large training runs.
Participate in on-call rotations and perform root cause analysis to prevent recurring incidents.
Improve infrastructure automation, deployment, and orchestration using Kubernetes, Flux, and Terraform.
Collaborate with AI/ML researchers to enable safe and reproducible model-training experiments.
Build a cloud-agnostic platform that shields scientific work from infrastructure details.
Document processes and procedures and contribute to open source, publications, blog articles, and conferences.

Requirements

Master’s degree in Computer Science, Engineering, or a related field.
7+ years of experience in a DevOps or Site Reliability Engineering role.
Strong experience with cloud computing and highly available distributed systems.
Experience with root cause analysis, in-production troubleshooting, and on-call rotations in critical environments.
Experience working toward reliability targets, covering observability, alerting, and SLAs.
Hands-on experience with CI/CD, containerization, and orchestration tools such as Docker and Kubernetes.
Knowledge of monitoring, logging, alerting, and observability tools such as Prometheus, Grafana, ELK Stack, or Datadog.
Familiarity with infrastructure-as-code tools such as Terraform or CloudFormation.
Proficiency in scripting languages such as Python, Go, or Bash, plus knowledge of software development best practices.
Strong understanding of networking, security, and system administration concepts.
Excellent problem-solving and communication skills.
Self-motivated and able to work effectively in a fast-paced startup environment.
Experience in an AI/ML environment is preferred.
Experience with high-performance computing systems and workload managers such as Slurm is preferred.
Experience with modern AI-oriented infrastructure solutions such as Fluidstack, CoreWeave, or Vast is preferred.

Benefits

Competitive salary and equity.
Health insurance.
Transportation allowance.
Sport allowance.
Meal vouchers.
Private pension plan.
Generous parental leave policy.
Visa sponsorship.
Remote-friendly arrangement: travel and accommodation are covered for onboarding in Paris, and eligible remote hires spend at least 3 days per month in the Paris office.
