- Senior
 - Office in Pune
 
Role: Gen AI Developer
Total Experience: 6+ years with 2+ years working on GenAI initiatives
Employment Type: Permanent & Full-time
Working Model: Hybrid (3 days work from office)
Job Summary:
We are seeking a Senior AI Developer with proven expertise in Generative AI technologies, a solid foundation in machine learning, and a strong understanding of data governance. The ideal candidate will have hands-on experience with both cloud-based LLM platforms and on-premise, open-source LLM runtimes such as Ollama, llama.cpp, and GGUF-based models, along with good working knowledge of the Model Context Protocol (MCP). You will help architect and implement GenAI-powered products that are secure, scalable, and enterprise-ready.
Key Responsibilities:
- Design, build, and deploy GenAI solutions using both cloud-hosted and on-prem LLMs.
 - Work with frameworks such as Hugging Face, LangChain, LangGraph, and LlamaIndex to enable RAG and prompt orchestration.
 - Implement private LLM deployments using tools such as Ollama, LM Studio, llama.cpp, GPT4All, and vLLM.
 - Design retrieval-augmented generation (RAG) pipelines with context-aware orchestration using MCP.
 - Implement and manage Model Context Protocol (MCP) for dynamic context injection, chaining, memory management, and secure prompt orchestration across GenAI workflows.
 - Fine-tune open-source models for specific enterprise tasks and optimize inference performance.
 - Integrate LLMs into real-world applications via REST, gRPC, or local APIs.
 - Ensure secure data flows and proper context management in RAG pipelines.
 - Collaborate across data, product, and infrastructure teams to operationalize GenAI.
 - Incorporate data governance and responsible AI practices from design through deployment.
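The RAG responsibilities above (document chunking, embedding generation, vector search) can be sketched in miniature. This is an illustrative toy, not production code: the bag-of-words `embed` stands in for a real sentence-embedding model, and the brute-force `retrieve` stands in for a vector database such as FAISS or Qdrant; all function names here are hypothetical.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks (real pipelines
    use token-aware, overlapping splitters)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a production pipeline would call
    an embedding model and store dense vectors instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force nearest-neighbour search; a vector DB replaces
    this linear scan at scale."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The retrieved chunks would then be injected into the LLM prompt as grounding context before generation.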
 
Required Skills and Qualifications:
- 6+ years of experience in AI/ML; 2+ years working on GenAI initiatives.
 - Experience with OpenAI, Claude, Gemini, and cloud-based LLM services (AWS/GCP/Azure), as well as open-source LLMs such as Mistral, Llama 2/3, Falcon, and Mixtral.
 - Strong hands-on expertise with on-premise LLM frameworks (Ollama, llama.cpp, GGUF models, etc.)
 - Hands-on experience with Model Context Protocol (MCP) for structured prompt orchestration, context injection and tool execution.
 - Proven experience in building and optimizing Retrieval-Augmented Generation (RAG) pipelines, including document chunking, embedding generation, and vector search integration.
 - Proficiency in Python and libraries such as Transformers, Hugging Face, LangChain, and PyTorch.
 - Experience with embedding models and vector DBs (FAISS, Pinecone, Weaviate, Qdrant, etc.)
 - Familiarity with MLOps, GPU optimization, containerization, and deployment in secure environments.
 - Good understanding of data governance: access control, lineage, auditability, and privacy.
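As a rough illustration of the context-injection and prompt-orchestration work described above: the sketch below assembles a prompt from ranked context chunks under a fixed word budget. It only mimics the budgeting idea; a real MCP setup exposes context and tools through the structured client/server protocol rather than plain string assembly, and the function name and parameters here are assumptions for the example.

```python
def build_prompt(system: str, context_chunks: list[str], question: str,
                 budget: int = 200) -> str:
    """Assemble a prompt from retrieved context under a word budget.
    Chunks are assumed to arrive ranked best-first, so lower-ranked
    chunks are dropped once the budget is exhausted."""
    parts, used = [], 0
    for c in context_chunks:
        n = len(c.split())
        if used + n > budget:
            break  # stop before overflowing the context window
        parts.append(c)
        used += n
    context = "\n---\n".join(parts)
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {question}"
```

In practice the budget would be measured in model tokens (via the model's tokenizer), not words.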
 
Nice to Have:
- Exposure to multi-modal models (image, speech) and toolformer-style agents
 - Experience integrating AI into enterprise platforms (e.g., ServiceNow, Salesforce, Jira)
 - Awareness of inference acceleration tools (vLLM, DeepSpeed, TensorRT)