We are looking for a visionary Machine Learning Engineer to build the next generation of knowledge extraction engines for Materials Science. In this role, you will go beyond simple text processing to tackle the most challenging aspects of patents and academic papers: complex table reconstruction, experimental workflow synthesis, chemical property mapping, and multimodal chart analysis. You will leverage NER, RAG, and post-training to transform massive volumes of unstructured scientific literature into high-value, structured R&D databases.
Responsibilities
Data Extraction Pipelines: Develop end-to-end solutions integrating OCR, Layout Analysis, and Semantic Parsing to precisely capture chemical formulas, experimental parameters, and performance metrics from complex documents. Resolve cross-modal data alignment between body text and complex scientific visuals (e.g., tables, chemical structures, and charts).
RAG Development: Partner with Data and Agent teams to design and implement robust RAG and search architectures (hybrid search, ranking optimization), ensuring highly relevant document retrieval for complex scientific queries.
Model Post-training: Apply post-training techniques to improve LLMs' reasoning and understanding for Materials Science data extraction.
Qualifications
Minimum Qualifications
Bachelor's degree or higher in Computer Science, Artificial Intelligence, or a related quantitative field.
Proficient with AI-assisted development tools such as Claude Code, and solid programming skills in Python with a focus on algorithmic implementation.
3+ years of hands-on experience in NLP/LLM/RAG, with a proven track record of deploying complex information extraction systems in production.
Excellent communication skills, with the ability to understand product requirements and solve data- and RAG-related challenges within a cross-functional team.
Preferred Qualifications
RAG & Data: Deep experience in high-quality data processing and building production-grade RAG systems (including vector databases, hybrid search, and re-ranking).
Model Refinement: Proven ability to enhance extraction accuracy through post-training techniques, including NER, prompt engineering, and instruction fine-tuning.
Domain Expertise: Familiarity with Physics, Chemistry, or Biology (e.g., understanding of molecular structures or material properties) is a significant plus.