Mohamed Seif | Data Scientist & ML Engineer

Mohamed Seif — Data Scientist and ML Engineer.

Building smarter systems.

Data scientist specializing in ML systems, NLP, and production AI. Based in 6 October City, Giza, Egypt.

Python, XGBoost, LightGBM, NLP, LLMs, Docker, Scikit-learn

About

I'm Mohamed Seif, a data scientist and ML engineer based in 6 October City, Giza, Egypt. I build end-to-end machine learning systems that go beyond notebooks: production pipelines, explainable models, and LLM-powered applications that solve real business problems.

My recent work includes KARNA, a production-grade used-car pricing platform with ensemble ML, SHAP explainability, and a bilingual Arabic/English assistant; a GIS data science pipeline for satellite and multispectral imagery; and retail demand forecasting across 1,115 stores with XGBoost and deep-learning benchmarks.

I graduated from Helwan University with a B.Sc. in Computer Science & AI (GPA 3.51/4.0, Excellent). I work in Arabic and English, and I'm open to remote freelance work in data science and ML engineering.


Skills

Machine & Deep Learning

XGBoost / LightGBM

Quantile regression, SHAP, Optuna tuning

Scikit-learn

Classical ML workflows, preprocessing, evaluation

TensorFlow / Keras

ANNs, RNNs, and deep learning experiments

Ensemble Learning

Tree-based modeling and model combination strategies

Model Evaluation

Cross-validation, holdout design, leakage control

SHAP

Explainability and prediction-level feature attribution

GenAI & LLMs

LangChain

LLM workflows, RAG, conversational agents

Prompt Engineering

Few-shot, chain-of-thought, structured outputs

Conversational Agents

Memory, tool calling, and fallback handling

Structured Outputs

Pydantic schemas and JSON-safe responses

OpenAI API

LLM integrations and application orchestration

Tool Integration

Connecting LLMs to retrievers and custom tools

Data Engineering

Pandas / NumPy

Data wrangling, preprocessing, numerical analysis

SQL

PostgreSQL, Supabase, analytical queries

EDA

Distribution analysis, missingness, anomaly detection

ETL Pipelines

Reproducible data preparation and transformation flows

Synthetic Data

Scenario generation and statistically aligned datasets

Web Scraping

Selenium-based collection and structured extraction

Tools & Deployment

FastAPI

Production ML services and REST endpoints

Docker

Multi-stage builds and containerized deployments

Streamlit

Rapid dashboards and interactive model demos

PostgreSQL / Supabase

Application data storage and managed backends

Git

Version control and collaborative development

Linux

Remote environments, servers, and CLI workflows

Programming Foundations

Java

Spring Boot fundamentals and backend development

C / C++

Problem solving, algorithms, and low-level fundamentals

Projects

Graduation Project

KARNA — Used-Car Pricing Platform with GenAI Assistant

Bilingual Arabic/English conversational AI assistant and quantile ML pricing engine for the Egyptian used-car market, delivering fair price estimates with negotiation ranges and SHAP explainability.

Python, FastAPI, XGBoost/LightGBM, LangChain, Docker, SHAP

Time Series Forecasting

Sales Forecasting: Multi-Store Demand Prediction

End-to-end time-series forecasting pipeline for 1,115 retail stores, benchmarking XGBoost against a deep-learning MLP and reaching an R² of 0.89 on the holdout set.

Python, XGBoost, Keras, TensorFlow, Scikit-learn, Streamlit, Pandas, NumPy, Matplotlib, Seaborn

Experience

Data Scientist

Datalentech

Mar 2026 – Present

  • Built an end-to-end synthetic data generation pipeline using non-linear growth models and stochastic calibration, producing operationally constrained, reproducible synthetic datasets statistically indistinguishable from real observations.
  • Implemented a Gaussian copula-based scenario sampler with per-stratum latent-factor priors and a Monte Carlo forward simulation engine supporting configurable scenario flags, constraint enforcement, and state propagation across chained cycles for multi-scenario forecasting.
  • Engineered a 3-layer Python architecture (pure math → cycle orchestration → forward simulation) with 20+ focused modules, frozen dataclasses, deterministic reproducibility, and schema versioning.
  • Developed a constraint enforcement engine satisfying 9+ physical and business invariants to ensure synthetic trajectory validity.

Generative AI Intern

NVIDIA & ITI

Aug – Sep 2025

  • Built LLM workflows with LangChain LCEL, applying advanced prompt engineering (few-shot, chain-of-thought, Pydantic outputs).
  • Developed conversational agents with tool integration and memory management.

GIS Data Scientist

Robone

Apr – Aug 2025

  • Processed satellite imagery and multispectral data using GIS tools (GDAL, Rasterio) for geospatial preprocessing and analysis.
  • Built REST APIs using FastAPI to connect ML data pipelines with backend systems.
  • Designed PostgreSQL database architecture with RBAC & ACL access controls on Supabase.
  • Integrated LLM tooling with DeepSeek on remote Linux servers.
  • Automated reporting workflows using Python (ReportLab) and collaborated across teams using Git and Agile Scrum (Jira).

Education

University

B.Sc. Computer Science & AI

Helwan University, Cairo, Egypt

2026

  • GPA: 3.51 / 4.0 (Excellent).
  • Key Coursework: Data Science (A+), Big Data Technologies (A+), Data Mining (A), Business Intelligence & Data Analytics (A), Probability & Statistics (A+), Software Engineering (A+), Algorithms (B+), Artificial Intelligence (B), Database Systems (B+).
  • Graduation Project: KARNA — a used-car pricing platform with ensemble ML pricing, SHAP explainability, and a bilingual Arabic/English AI assistant.

Certificates

Data Analyst in Python

DataCamp · Feb 2026

Industry-recognized career track covering the full data analysis workflow in Python — importing, cleaning, and manipulating data with pandas and NumPy, statistical analysis and hypothesis testing, and data visualization with Matplotlib and Seaborn. Completed through hands-on exercises on real-world datasets.


Building LLM Applications With Prompt Engineering

NVIDIA Deep Learning Institute (DLI) · Sep 2025

Hands-on NVIDIA DLI workshop (intermediate level, GPU-accelerated cloud environment) covering advanced prompt engineering techniques — iterative prompts, few-shot, chain-of-thought, system messages, and streaming/batching. Built composable LLM workflows with LangChain LCEL, deployed Llama 3.1 via NVIDIA NIM, and developed an agentic tool integration mini-project with structured Pydantic outputs.


IBM Certified Data Scientist

Digital Egypt Pioneers Initiative (DEPI), MCIT, in partnership with IBM · Feb 2025

Government-backed, IBM-designed data science certification track under Egypt's national digital transformation initiative. Curriculum covers Python (Pandas, NumPy, Scikit-learn, TensorFlow), SQL, supervised and unsupervised machine learning, EDA, predictive modeling, and data visualization. Completed with a capstone project demonstrating end-to-end data science skills.


Problem Solving — Honorable Mention (ECPC 2024)

ICPC (International Collegiate Programming Contest) · Jul 2024

Achieved Honorable Mention at the Egyptian Collegiate Programming Contest (ECPC) — Egypt's official ICPC regional qualifying round and one of the most competitive university programming competitions in Africa and the Arab world. Demonstrates strong algorithmic thinking, data structures proficiency, and problem-solving under pressure.


InnovEgypt — Innovation & Entrepreneurship

TIEC (Technology Innovation and Entrepreneurship Center) · Feb 2025

Certificate of acknowledgment from Egypt's national innovation program under ITIDA/TIEC, recognizing completion of an entrepreneurship and business development track covering team leadership, ideation, and building technology ventures.

FAQ

I'm Mohamed Seif, an Egyptian data scientist and ML engineer based in 6 October City, Giza, Egypt. I build production machine learning systems, from quantile pricing models and demand forecasting to synthetic data generation and LLM-powered assistants with multi-provider fallback chains. I work in both Arabic and English.

I'm open to freelance work in data science and ML engineering — predictive modeling, time-series forecasting, data pipelines, and LLM/RAG applications. The best way to discuss a specific project is to email me at mohamedseif.a1@gmail.com.

My strength is taking ML from notebook to production: not just training a model, but engineering reproducible pipelines, explainability, and serving infrastructure around it. KARNA is a good example: a real platform with a quantile pricing engine, SHAP explainability, monitoring, and a bilingual AI assistant, all containerized and production-ready.

KARNA is a used-car pricing platform for the Egyptian market. I built two core services. The first is an ML pricing engine using ensemble XGBoost and LightGBM models, with three-quantile predictions (q10/q50/q90) for negotiation ranges, SHAP explainability, and a model registry with hot-reload. The second is a bilingual Arabic/English AI assistant built on a multi-provider LLM fallback chain with fuzzy car-name matching. The full stack runs across 5 Docker services.
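
To make the q10/q50/q90 idea concrete, here is a minimal sketch, not KARNA's actual code. It shows the pinball (quantile) loss that quantile regression minimizes and fits the simplest possible "model", a single constant per quantile level. The prices and the helper names (`pinball_loss`, `best_constant_quantile`) are made up for illustration. A real pricing model would predict each quantile from the car's features instead of using a constant.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant_quantile(y_true, q):
    """Return the observed value that minimizes the pinball loss at level q."""
    candidates = sorted(set(y_true))
    return min(
        candidates,
        key=lambda c: pinball_loss(y_true, [c] * len(y_true), q),
    )


# Hypothetical listing prices (thousands of EGP) for one car segment.
prices = [400, 420, 450, 470, 500, 520, 550, 600, 650, 700, 750]

low, mid, high = (best_constant_quantile(prices, q) for q in (0.1, 0.5, 0.9))
negotiation_range = (low, high)  # q10 to q90 band around the q50 estimate
```

With these made-up numbers, `mid` is the median price and `negotiation_range` brackets it from below and above, which is the shape of output a three-quantile engine gives a buyer or seller.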

At Datalentech I work as a junior Data Scientist. I keep work-specific details confidential, so I can't discuss individual projects, but I'm happy to talk about the techniques and skills I use at a high level.

I build LLM applications with LangChain (LCEL), combining prompt engineering techniques such as few-shot and chain-of-thought prompting with structured Pydantic outputs and tool integration. I completed a Generative AI internship with NVIDIA & ITI and an NVIDIA DLI workshop on building LLM apps with prompt engineering, and I shipped a bilingual assistant in KARNA.

Python first: ensemble and tree-based models, Scikit-learn for classical ML, TensorFlow/Keras for neural networks, and SHAP for explainability, with Optuna for tuning. For deployment I use FastAPI, Docker, and Streamlit, with PostgreSQL/Supabase for data. For LLM work I use LangChain with structured outputs and multi-provider fallback chains.
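
As a rough illustration of what a multi-provider fallback chain does, here is a plain-Python sketch under stated assumptions. The provider functions are hypothetical stand-ins, not real API clients, and this is not the code used in KARNA. Each provider is tried in order, and the first one that answers wins.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, callable) in order; return (provider_name, answer)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real LLM clients.
def primary_provider(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"


used, answer = call_with_fallback(
    "How much is a 2018 sedan?",
    [("primary", primary_provider), ("backup", backup_provider)],
)
```

Here the primary stand-in always fails, so the call falls through to the backup. In a LangChain pipeline the same idea is usually expressed with the `with_fallbacks` method on runnables rather than a hand-written loop.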

I hold a B.Sc. in Computer Science & AI from Helwan University (GPA 3.51/4.0, Excellent), completed in June 2026. My graduation project is KARNA, a dynamic pricing engine for used cars. I'm also an IBM Certified Data Scientist through Egypt's DEPI initiative, hold an NVIDIA DLI certificate in LLM applications and DataCamp's Data Analyst in Python certificate, and earned an Honorable Mention at the ECPC (ICPC regional) programming contest.

I've worked in GIS and geospatial data science (processing satellite and multispectral imagery), in poultry, in the automotive domain through KARNA, and in synthetic data generation for multi-scenario forecasting. I'm ready to adapt quickly to any new domain or data type.

The best way is email: mohamedseif.a1@gmail.com. You can also find me on LinkedIn via the links on this site. Send me a short note about what you're working on and I'll get back to you.

Let's work together

Open to freelance work in data science and ML engineering. Reach out if you have a project in mind.