Senior Data Scientist | Generative AI
Location: Portugal (Remote)
We are hiring a Senior Data Scientist for our Generative AI area. This is a remote position within Portugal: you can work from anywhere in the country, and the role is open only to candidates based in Portugal.
YOUR IMPACT
- Multimodal Extraction: Apply state-of-the-art tools (OCR, vision-language models, document understanding frameworks) to interpret diverse input types;
- Prompt Engineering: Develop and refine strategies for using LLMs to extract, summarize, and transform unstructured content into structured formats;
- Data Quality & Structuring: Clean, validate, and transform messy, unstructured data into well-defined schemas ready for use in training or analytics pipelines;
- Content Filtering: Define standards and build systems for cleaning, validating, and filtering data to ensure accuracy, reduce bias, and align with ethical/safety guidelines;
- Human-in-the-Loop Feedback: Design feedback loops where experts validate or enrich data, improving LLM-based extraction reliability;
- Scalability & Optimization: Architect cost-efficient, high-throughput data pipelines that are robust to noisy or incomplete sources;
- Research & Prototyping: Experiment with emerging tools and methods in the LLM + multimodal space, exploring new ways to enhance information coverage and extraction reliability;
- Collaboration: Partner with data engineers and other data scientists to integrate collected data into larger AI and analytics systems;
- Live the mission: inspire and empower others by genuinely caring for your own wellbeing and that of your colleagues.
WHO YOU ARE
- Master’s degree (or PhD) in Computer Science, Data Science, Machine Learning, Statistics, or a related field;
- Proficiency in Python and experience with libraries for web scraping, OCR (e.g., Tesseract, EasyOCR), and NLP (e.g., HuggingFace Transformers);
- Deep understanding of LLM capabilities in multimodal and extraction contexts, including prompt engineering and few-shot learning;
- Strong background in unstructured data processing: APIs, web scraping, HTML parsing, OCR, image/document analysis;
- Strong analytical problem-solving skills, with a track record of turning noisy data into high-quality datasets for ML;
- Excellent communication and documentation skills, with the ability to influence across technical and product teams.
WHAT WE OFFER YOU
- Flexible work models: remote options within Portugal; home office stipend and flexible work allowance
- Paid time off, parental leave, and career growth opportunities
- A culture focused on wellbeing and inclusion
Note: This role is remote within Portugal and is open only to candidates residing in the country.