Xenon7

Xenon7 provides advanced AI solutions and consultancy services, leveraging a team of highly qualified experts and a strong emphasis on research and innovation to address complex industry challenges and enhance operational efficiency.

Internet Software & Services

Information Technology

Founded 2014

27 open positions

Senior Python Developer: Databricks AI Platform, Alerting & Monitoring

2 months ago

India

Contract

Senior

Machine Learning Engineer

DevOps and Infrastructure

AWS Databricks Docker JIRA MLflow MLOps Python

Description

Build Python-based workflows for MLOps, LLMOps, and application deployment within Databricks.
Enhance Databricks workspace onboarding and governance, including Unity Catalog, permissions, and reusable environment setup modules.
Integrate Mosaic AI components (Gateway, Model Serving, Agents) into platform automation and deployment pipelines.
Support Delta Lake (Bronze/Silver/Gold) architecture and manage MLflow model lifecycles.
Implement automated health checks and observability for AWS resources and Databricks applications.
Develop event-driven alerting mechanisms using AWS CloudWatch, SNS, and EventBridge.
Build Python automations to validate configuration consistency across multiple AWS accounts and detect anomalies or misconfigurations.
Create automated service-request workflows that bridge alerting with ticketing and collaboration tools (Slack, Jira, etc.).
Design monitoring dashboards and fail-safe/rollback mechanisms to maintain production stability and uptime.
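
As a rough illustration of the configuration-consistency check mentioned above, the sketch below compares per-account settings against a baseline and reports any drift. The account IDs, setting names, and the `find_drift` helper are hypothetical and not part of the posting; collecting the snapshots from each AWS account (for example with boto3) is assumed to happen elsewhere.

```python
from typing import Any


def find_drift(
    baseline: dict[str, Any],
    accounts: dict[str, dict[str, Any]],
) -> dict[str, dict[str, tuple[Any, Any]]]:
    """Return, per account, the settings whose values differ from the baseline.

    Each reported setting maps to (expected, actual). A setting that is
    missing from an account is reported with actual=None.
    """
    drift: dict[str, dict[str, tuple[Any, Any]]] = {}
    for account_id, settings in accounts.items():
        diffs = {
            key: (expected, settings.get(key))
            for key, expected in baseline.items()
            if settings.get(key) != expected
        }
        if diffs:  # only list accounts that actually deviate
            drift[account_id] = diffs
    return drift


# Hypothetical baseline and per-account snapshots, for illustration only.
baseline = {"log_retention_days": 90, "sns_encryption": True}
snapshots = {
    "111111111111": {"log_retention_days": 90, "sns_encryption": True},
    "222222222222": {"log_retention_days": 30},
}
report = find_drift(baseline, snapshots)
```

Here `report` would contain only the second account, with one entry for the shorter retention period and one for the missing encryption setting. Such a report could then feed an alerting or ticketing step.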

Requirements

6+ years of professional Python development and cloud automation experience, including strong knowledge of Python internals (the GIL, multiprocessing vs. multithreading, memory trade-offs).
Hands-on experience with Databricks ecosystem components: Unity Catalog, MLflow, and Mosaic AI.
Experience with Delta Lake architecture (Bronze/Silver/Gold) and ML model lifecycle management.
Strong proficiency with AWS automation and observability tools: Lambda, API Gateway, CloudWatch, EventBridge, SNS.
Experience implementing reliability engineering practices such as Docker image immutability and automated rollback strategies.
Familiarity with Service Principal–based authentication for secure Databricks/AWS integration.
Experience building event-driven alerting and integrations with ticketing/collaboration systems (Slack, Jira).
Ability to work independently in a remote, global environment; immediate availability is highly preferred.
A mindset that balances building new AI capabilities with proactive monitoring and a focus on operational uptime (e.g., an SRE/reliability orientation).
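
To give a sense of the event-driven alerting work listed above, here is a minimal sketch that turns an EventBridge "CloudWatch Alarm State Change" event into a ticket-style payload. The `alarm_to_ticket` helper and the ticket fields (`summary`, `description`, `priority`) are assumptions made for illustration; a real Slack or Jira integration would map them onto that tool's own API, which is not shown.

```python
from typing import Any, Optional


def alarm_to_ticket(event: dict[str, Any]) -> Optional[dict[str, str]]:
    """Build a ticket payload for alarms that entered the ALARM state.

    Returns None for unrelated events and for OK / INSUFFICIENT_DATA
    transitions, so callers can skip them.
    """
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        return None
    detail = event.get("detail", {})
    state = detail.get("state", {})
    if state.get("value") != "ALARM":
        return None
    account = event.get("account", "unknown")
    region = event.get("region", "unknown")
    name = detail.get("alarmName", "unnamed alarm")
    return {
        "summary": f"[{account}/{region}] {name} is in ALARM",
        "description": state.get("reason", "No reason provided."),
        "priority": "High",  # hypothetical default; tune per alarm in practice
    }


# Hypothetical sample event, trimmed to the fields used above.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "account": "111111111111",
    "region": "us-east-1",
    "detail": {
        "alarmName": "job-failure-rate",
        "state": {"value": "ALARM", "reason": "Threshold crossed"},
    },
}
ticket = alarm_to_ticket(sample_event)
```

Keeping the transformation a pure function, separate from the code that publishes to SNS or calls a ticketing API, makes it straightforward to unit test.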

Benefits

Access to a networked ecosystem of client engagements, thought leadership, and mentorship opportunities.
Outcome-focused culture emphasizing autonomy, ownership, and smart execution over hours logged.
Opportunity to contribute to leading-edge AI and high-scale cloud infrastructure projects.


