AI Red-Teamer - Adversarial AI Testing (English)

10 hours, 14 minutes ago
Contract
Mid Level
Artificial Intelligence and Machine Learning
Weekday

Weekday helps companies hire engineers who are vouched for by other software engineers, a model that also enables passive income for engineers. It offers services such as drafting outreach messages, shortlisting candidates, and conducting reference checks. Backed by Y Combin...

Construction & Engineering
11-50
Founded 2020

Description

  • Red-team AI models and agents by designing and executing adversarial tests such as jailbreak attempts, prompt injections, misuse scenarios, and exploit strategies.
  • Generate high-quality human evaluation data by annotating model failures, classifying vulnerabilities, and identifying systemic risks.
  • Apply structured testing methodologies using taxonomies, benchmarks, and playbooks to ensure consistent and comprehensive evaluations.
  • Document findings clearly and reproducibly, producing reports, datasets, and adversarial test cases that engineering and safety teams can act upon.
  • Work across multiple projects and AI systems, adapting testing objectives and scope to different evaluation needs and customers.
  • Proactively identify potential risks and failure modes before deployment through systematic probing of models.
  • Perform all testing and reviews using text-based inputs and outputs, and optionally review outputs referencing sensitive topics (with support).

Requirements

  • Prior red-teaming experience such as adversarial AI testing, cybersecurity, or socio-technical risk analysis.
  • Demonstrated adversarial mindset with the ability to explore ways to push systems to their limits and uncover weaknesses.
  • Comfort with structured methodologies and using frameworks, taxonomies, benchmarks, and playbooks rather than ad-hoc testing.
  • Ability to communicate risks and vulnerabilities clearly to both technical and non-technical audiences.
  • Ability to manage and contribute across multiple projects and adapt to new evaluation challenges.
  • Nice-to-have: expertise in adversarial machine learning (jailbreak datasets, prompt injection attacks, RLHF/DPO vulnerabilities, model extraction).
  • Nice-to-have: cybersecurity skills (penetration testing, exploit development, reverse engineering) or socio-technical risk analysis experience (harassment/misinformation testing, abuse pattern analysis).
  • Nice-to-have: creative adversarial thinking from disciplines like psychology, acting, or writing that support unconventional attack strategies.
  • Must be able to engage as an independent contractor and work fully remotely.
  • Candidates requiring H-1B or STEM OPT sponsorship cannot be supported for this role.

Benefits

  • Compensation range: $50–$111 per hour (varies by project, customer requirements, expertise, and content sensitivity).
  • Fully remote, flexible schedule with projects that may be extended, shortened, or concluded based on needs and performance.
  • Weekly payments issued via Stripe or Wise based on services rendered.
  • All work is text-based; participation in higher-sensitivity projects is optional and supported with clear guidelines and wellness resources.
  • Opportunity to contribute directly to frontier work in AI safety and adversarial testing and gain hands-on experience with human data-driven evaluation methodologies.
