Senior AI Software Developer in Test

2 hours, 4 minutes ago
Full-time
Senior
Artificial Intelligence and Machine Learning
Caseware

CaseWare International Inc. provides cutting-edge software solutions for accounting firms, corporations, and governments, enabling users worldwide to work smarter and transform insights into impact.

Internet Software & Services
251-1K employees
Founded 1988

Description

  • Evolve an AI-first quality strategy for a fast-scaling cloud-native SaaS platform and emerging agentic systems.
  • Integrate AI-enhanced testing into CI/CD pipelines, including predictive flakiness detection, automated test generation, and self-healing scripts.
  • Design deterministic and statistical testing approaches for LLM-based and agentic systems to address hallucinations, prompt injection, bias, drift, and safety risks.
  • Build automated evaluation pipelines and harnesses for correctness, faithfulness, retrieval quality, generation accuracy, tool-calling, planning sequences, and multi-agent flows.
  • Develop and execute test frameworks across the full AI lifecycle, including prompts, datasets, embeddings, model versions, RAG pipelines, and guardrails.
  • Implement red-teaming, bias and fairness checks, compliance mechanisms, and AI quality signals for automated gating and continuous monitoring.
  • Partner with product, data science, AI engineering, and development teams to test AI features and support roadmap delivery.
  • Drive quality metrics and observability, including DORA metrics, test coverage, hallucination rate, context precision, and drift detection.
  • Build dashboards, support A/B testing of models, and monitor post-deployment AI behavior.
  • Mentor SDETs, lead workshops on AI testing best practices, and help define roadmaps and standards for sustainable AI quality assurance.

Requirements

  • 7+ years of experience in Quality Engineering or SDET roles within cloud-native SaaS environments.
  • 2+ years of hands-on experience with AI, ML, or LLM systems.
  • Strong experience with automated testing infrastructure, CI/CD tools such as Jenkins or GitHub Actions, and test pyramid strategies from unit to end-to-end.
  • Full-stack testing experience across frontend, backend, and API layers.
  • Proven experience testing LLMs, AI agents, and RAG pipelines, including risks such as hallucinations, prompt injection, bias, and drift.
  • Proficiency in JavaScript or TypeScript and working knowledge of Python or Java.
  • Experience with AI evaluation frameworks such as Ragas, DeepEval, LangChain, LangSmith, or LangFuse.
  • Knowledge of observability tools such as New Relic, as well as statistical testing methods, red-teaming, and ethical AI practices.
  • Experience with performance, stress, and load testing tools such as k6, JMeter, or BlazeMeter is nice to have.
  • Bachelor's or Master's degree in Computer Science, AI, or a related field; ISTQB AI Testing certification is a plus.
  • Strong English communication and collaboration skills.
  • A strong portfolio, open-source contributions, or relevant case studies are highly regarded.

Benefits

  • Indefinite-term employment contract (contrato a término indefinido) with all legal benefits.
  • Prepaid health plan, life insurance, and funeral assistance.
  • Internet allowance and home office stipend.
  • Competitive compensation above the market average.
  • 100% remote work environment with excellent work-life balance.
  • Budget for training and mentorship from a highly experienced professional.
  • 5 personal days off per year, plus a sick-leave top-up to 100% of salary from day 3 to day 90.
  • Recognition awards, additional paid time off, and vacation upgrades starting at 5 years of service.

