My Career
Work Experience
4+ years of building AI systems, ML pipelines, and data solutions across various industries.
Consultant Data Scientist
Nature & Découvertes
Design and deployment of a hybrid recommendation system combining collaborative filtering and content-based filtering to enhance the user experience, with large-scale production deployment in a cloud environment.
Key Achievements
- Designed and implemented a recommendation engine in Python combining collaborative filtering, content-based filtering, and hybrid approaches.
- Built real-time user data ingestion and processing pipelines using Apache Kafka and Spark MLlib.
- Performed large-scale data preparation, cleaning, and transformation using Pandas and Spark, with advanced indexing and search capabilities via Elasticsearch.
- Developed, trained, and optimized Machine Learning and Deep Learning models using Scikit-learn, TensorFlow, and PyTorch.
- Led industrialization and MLOps practices, including Docker containerization, experiment tracking with MLflow, CI/CD integration, monitoring, and deployment on AWS SageMaker.
- Explored and integrated Large Language Models (LLMs) to enrich the recommendation system.
- Automated the generation of personalized content tailored to user profiles.
- Results: Improved recommendation relevance with an 18% increase in Click-Through Rate (CTR); reduced inference latency by 35% through model optimization; achieved reliable, scalable production deployment through MLOps practices and AWS cloud infrastructure.
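As an illustration of the score-blending step such a hybrid engine relies on, here is a minimal sketch; the function names and the `alpha` weighting are illustrative assumptions, not the production code:

```python
import numpy as np

def hybrid_scores(cf_scores, content_scores, alpha=0.7):
    """Blend collaborative-filtering and content-based item scores.

    Both inputs are per-item scores for one user; `alpha` weights the
    collaborative component (a tunable assumption in this sketch).
    """
    cf = np.asarray(cf_scores, dtype=float)
    cb = np.asarray(content_scores, dtype=float)

    def norm(x):
        # Min-max normalise each source so the two scales are comparable.
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng else np.zeros_like(x)

    return alpha * norm(cf) + (1 - alpha) * norm(cb)

# Rank items for a user from both signals and pick the top one.
blended = hybrid_scores([0.2, 0.9, 0.4], [0.8, 0.1, 0.5])
top_item = int(np.argmax(blended))
```

In practice the weighting would be tuned offline (e.g. against CTR), and cold-start users would fall back to the content-based component alone.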
Technologies
Research Data Scientist
UPEC
Applied R&D project focused on designing and optimizing NLP models capable of automatically detecting abusive language, toxic speech, and hate content on social media platforms. The project emphasized memory and latency optimization, as well as industrialization for production use.
Key Achievements
- Designed and developed NLP models based on Transformer architectures (BERT, RoBERTa) for abusive content classification.
- Fine-tuned Large Language Models (LLMs) on domain-specific corpora to improve detection accuracy and robustness to informal language and social media–specific contexts.
- Applied model distillation techniques to reduce model size and accelerate inference while maintaining high performance.
- Built end-to-end NLP pipelines, including data cleaning, preprocessing, tokenization, embeddings, and vectorization.
- Developed Retrieval-Augmented Generation (RAG) prototypes combining LLMs with document retrieval systems (vector databases and Elasticsearch) to improve classification accuracy and to provide contextual explanations of model decisions in support of Explainable AI (XAI).
- Results: Improved abusive language detection performance with a 15% increase in F1-score on specialized datasets; delivered an explainable RAG prototype enabling moderators to better understand the context behind model decisions.
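The distillation step mentioned above typically minimises a temperature-softened KL divergence between teacher and student logits. A minimal NumPy sketch of that loss, following the standard Hinton-style convention (function names are illustrative):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 flattens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target KL divergence used in knowledge distillation.

    The T**2 factor keeps gradient magnitudes comparable across
    temperatures (the usual convention in the distillation literature).
    """
    p = softmax(teacher_logits, T)  # teacher's softened distribution
    q = softmax(student_logits, T)  # student's softened distribution
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

In a real training loop this term is combined with the ordinary cross-entropy on hard labels, and the student is a smaller Transformer than the BERT/RoBERTa teacher.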
Technologies
AI & Data Consultant
Deloitte
Contributed to two major innovation projects supporting the adoption of retail solutions provided by OCADO and RELEX, with the goal of improving pre-sales efficiency, automating customer interactions, and increasing the effectiveness of data and ML workflows in a cloud environment.
Key Achievements
- Designed and implemented a full end-to-end MLOps architecture, including automated data collection, continuous model deployment via CI/CD pipelines (GitLab CI, Jenkins), and proactive monitoring of model performance in production.
- Developed an intelligent AI-powered pre-sales assistant leveraging advanced Prompt Engineering techniques and Large Language Models (LLMs) to analyze customer needs and automate personalized responses.
- Designed and managed a structured cloud-based Data Warehouse (AWS Redshift, BigQuery), including schema optimization and integration through automated ELT pipelines (Talend, Airflow).
- Implemented robust data quality controls, including KPI definition and monitoring, automated validation processes, and anomaly detection using Python and BI tools (Power BI).
- Orchestrated and containerized ML workflows and data pipelines using Docker and Kubernetes, ensuring scalability, reliability, and high availability.
- Collaborated within an Agile (SAFe) environment, actively participating in sprints, reviews, iterative planning, and continuous delivery aligned with business requirements.
- Results: Increased pre-sales productivity through the AI assistant, achieving a 30% reduction in customer request processing time and improved recommendation accuracy.
- Results: Enhanced data quality and reporting through automated KPIs and anomaly alerts, ensuring compliance and consistency for business decision-making.
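A minimal sketch of the kind of statistical anomaly check described in the data-quality bullet above; the function name and z-score threshold are illustrative, and the production setup layered BI tooling and automated validation on top:

```python
import statistics

def flag_anomalies(values, z_threshold=3.0):
    """Return indices of KPI values whose z-score exceeds a threshold.

    A minimal stand-in for automated KPI validation; real checks would
    also cover data freshness, null rates, and schema drift.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # constant series: nothing to flag
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > z_threshold]
```

Flagged indices can then feed an alerting channel or a Power BI report rather than blocking the pipeline outright.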
Technologies
R&D Data Scientist (Internship)
SiliconeSignal Technologies
Satellite image classification and segmentation project aimed at identifying and localizing buildings and other infrastructures from Sentinel-2 data, leveraging advanced Deep Learning approaches.
Key Achievements
- Analyzed and preprocessed Sentinel-2 imagery, including radiometric correction, normalization, patch extraction, and creation of deep learning–ready datasets.
- Conducted benchmarking and evaluation of existing methods and research projects for satellite image classification.
- Compared traditional machine learning and deep learning approaches, including CNNs, autoencoders, and Transformers, for image classification and segmentation tasks.
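The patch-extraction step in the preprocessing bullet above can be sketched as follows; patch sizes and function names are illustrative, and a real pipeline would apply radiometric correction and normalization first:

```python
import numpy as np

def extract_patches(image, patch_size=64, stride=64):
    """Cut an (H, W, bands) Sentinel-2-style array into square patches.

    With stride == patch_size the patches tile the image without
    overlap; a smaller stride would produce overlapping patches.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

# A 256x256 scene with 4 spectral bands yields a 4x4 grid of patches.
patches = extract_patches(np.zeros((256, 256, 4)), patch_size=64)
```

Each patch then becomes one training sample for the classification or segmentation model.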