Key Responsibilities
- Design and execute test strategies for GenAI systems (LLMs, RAG pipelines, AI agents, copilots)
- Validate accuracy, relevance, consistency, and factuality of AI-generated outputs
- Perform prompt testing, adversarial testing, and edge-case validation
- Test for hallucinations, bias, toxicity, and harmful content
- Validate retrieval quality in RAG-based systems (chunking, embeddings, relevance)
- Conduct regression testing for model updates, prompt changes, and data refreshes
- Collaborate with data scientists, ML engineers, product managers, and compliance teams
- Define and track AI quality metrics (BLEU, ROUGE, faithfulness, groundedness, latency)
- Automate test cases for GenAI workflows where feasible
Required Skills
- 8+ years of experience in QA, testing, or quality engineering, with hands-on exposure to AI/ML or Generative AI systems
- Experience testing chatbots, NLP-based applications, and GenAI solutions, including prompt engineering and optimization
- Strong understanding of AI evaluation techniques, including hallucination detection, factual accuracy, bias, and output consistency
- Knowledge of Responsible AI principles, including fairness, transparency, and explainability
- Experience validating data quality, with a basic understanding of statistics and AI performance metrics
- Proficiency in API testing (REST, JSON), including testing AI model endpoints
- Hands-on experience with test automation tools and scripting (Python preferred)
- Familiarity with the ML lifecycle, model versioning, and regression testing for AI systems
- Exposure to cloud-based AI platforms such as Azure OpenAI, Amazon Bedrock, or Google Vertex AI
- Strong foundation in software testing methodologies, including exploratory, negative, and adversarial testing
- Ability to design test cases for non-deterministic AI systems
- Strong analytical and critical-thinking skills, with the ability to objectively assess subjective AI outputs
- Excellent documentation, communication, and collaboration skills, with experience working in cross-functional AI teams in Agile/DevOps environments