- Senior
- Hyderabad office
Key Responsibilities
Design, develop, and execute manual and automated test plans for Gen AI applications (e.g., LLM-based chatbots, content generators, recommendation systems).
Test end-to-end functionality including model input/output validation, API integration, UI/UX, and performance benchmarks.
Identify, document, and track bugs and anomalies, especially in AI-generated outputs (hallucinations, bias, toxicity, etc.).
Collaborate with ML engineers to validate model quality, data preprocessing pipelines, and evaluation metrics (e.g., BLEU, ROUGE, F1, precision/recall).
Conduct model regression testing when new training data or fine-tuning is introduced.
Develop and maintain automated test frameworks for APIs and web interfaces using tools such as Selenium, Postman, and PyTest.
Assist in A/B testing, model deployment validation, and monitoring performance in production.
Ensure compliance with responsible AI standards, including fairness, privacy, and explainability.
Required Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field.
3+ years of software QA experience, ideally with exposure to AI/ML projects.
Strong knowledge of testing frameworks (e.g., PyTest, JUnit, TestNG) and bug tracking tools (e.g., Jira).
Proficiency in Python or other scripting languages for test automation.
Familiarity with REST APIs, CI/CD pipelines, and version control (Git).
Understanding of machine learning concepts and Gen AI model behaviors (LLMs, transformers, etc.).
Preferred Qualifications
Experience testing LLM applications like ChatGPT, Claude, Gemini, or custom-built models.
Knowledge of prompt engineering, vector databases (e.g., Pinecone, FAISS), or RAG pipelines.
Experience with AI evaluation tools (e.g., LangChain, Weights & Biases, TruLens).
Background in NLP, computer vision, or multimodal AI testing is a plus.