Mohamed Seif
Data Scientist & ML Engineer

Building smarter systems.
Data scientist specializing in ML systems, NLP, and production AI. Based in 6 October City, Giza, Egypt.
About
I'm Mohamed Seif, a data scientist and ML engineer based in 6 October City, Giza, Egypt. I build end-to-end machine learning systems that go beyond notebooks: production pipelines, explainable models, and LLM-powered applications that solve real business problems.
My recent work includes KARNA, a production-grade used-car pricing platform with ensemble ML, SHAP explainability, and a bilingual Arabic/English assistant; a GIS data science pipeline for satellite and multispectral imagery; and retail demand forecasting across 1,115 stores with XGBoost and deep-learning benchmarks.
I graduated from Helwan University with a B.Sc. in Computer Science & AI (GPA 3.51/4.0, Excellent). I work in Arabic and English, and I'm open to remote freelance work in data science and ML engineering.
Skills
Machine & Deep Learning
XGBoost / LightGBM
Quantile regression, SHAP, Optuna tuning
Scikit-learn
Classical ML workflows, preprocessing, evaluation
TensorFlow / Keras
ANNs, RNNs, and deep learning experiments
Ensemble Learning
Tree-based modeling and model combination strategies
Model Evaluation
Cross-validation, holdout design, leakage control
SHAP
Explainability and prediction-level feature attribution
GenAI & LLMs
LangChain
LLM workflows, RAG, conversational agents
Prompt Engineering
Few-shot, chain-of-thought, structured outputs
Conversational Agents
Memory, tool calling, and fallback handling
Structured Outputs
Pydantic schemas and JSON-safe responses
OpenAI API
LLM integrations and application orchestration
Tool Integration
Connecting LLMs to retrievers and custom tools
Data Engineering
Pandas / NumPy
Data wrangling, preprocessing, numerical analysis
SQL
PostgreSQL, Supabase, analytical queries
EDA
Distribution analysis, missingness, anomaly detection
ETL Pipelines
Reproducible data preparation and transformation flows
Synthetic Data
Scenario generation and statistically aligned datasets
Web Scraping
Selenium-based collection and structured extraction
Tools & Deployment
FastAPI
Production ML services and REST endpoints
Docker
Multi-stage builds and containerized deployments
Streamlit
Rapid dashboards and interactive model demos
PostgreSQL / Supabase
Application data storage and managed backends
Git
Version control and collaborative development
Linux
Remote environments, servers, and CLI workflows
Programming Foundations
Java
Spring Boot fundamentals and backend development
C / C++
Problem solving, algorithms, and low-level fundamentals
Projects
KARNA — Used-Car Pricing Platform with GenAI Assistant
Bilingual Arabic/English conversational AI assistant and quantile ML pricing engine for the Egyptian used-car market, delivering fair price estimates with negotiation ranges and SHAP explainability.
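As a rough illustration of how quantile predictions can turn into a negotiation range, the sketch below uses the pinball (quantile) loss, the standard objective behind quantile regression. It is not KARNA's code: the prices are made-up numbers, `pinball_loss` and `best_constant` are hypothetical helper names, and a real engine would fit a model per quantile rather than search over constants.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q (0 < q < 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def best_constant(y_true, q, candidates):
    """Pick the constant prediction that minimizes the pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y_true, [c] * len(y_true), q))


# Made-up listing prices, for illustration only.
prices = [100, 120, 130, 150, 170, 180, 200, 220, 250, 400]
grid = range(50, 451, 5)

# Low, central, and high estimates for the q10 / q50 / q90 levels.
low, mid, high = (best_constant(prices, q, grid) for q in (0.1, 0.5, 0.9))
print(f"negotiation range: {low} to {high}, central estimate: {mid}")
```

Minimizing this loss at q = 0.1, 0.5, and 0.9 yields a low, central, and high estimate, which is the idea behind presenting a range instead of a single price.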
End-to-end time-series forecasting pipeline for 1,115 retail stores, benchmarking XGBoost against a deep-learning MLP and reaching an R² of 0.89 on the holdout set.
Experience
Data Scientist
Datalentech
Mar 2026 – Present
- Built an end-to-end synthetic data generation pipeline using non-linear growth models and stochastic calibration, producing operationally constrained, reproducible synthetic datasets statistically indistinguishable from real observations.
- Implemented a Gaussian copula-based scenario sampler with per-stratum latent-factor priors and a Monte Carlo forward simulation engine supporting configurable scenario flags, constraint enforcement, and state propagation across chained cycles for multi-scenario forecasting.
- Engineered a 3-layer Python architecture (pure math → cycle orchestration → forward simulation) with 20+ focused modules, frozen dataclasses, deterministic reproducibility, and schema versioning.
- Developed a constraint enforcement engine satisfying 9+ physical and business invariants to ensure synthetic trajectory validity.
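For readers unfamiliar with Gaussian copula sampling, here is a minimal two-variable sketch of the general technique. It is not the Datalentech implementation, whose details are confidential: the marginals, the correlation value, and every function name here are assumptions chosen for illustration, and it omits per-stratum priors, constraint enforcement, and state propagation.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(rho, inv_cdf_x, inv_cdf_y, n, seed=0):
    """Draw n (x, y) pairs whose dependence follows a Gaussian copula."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    std_normal = NormalDist()
    eps = 1e-12
    samples = []
    for _ in range(n):
        # Correlated standard normals (2x2 Cholesky factor written out by hand).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms, clamped away from 0 and 1 for the inverse CDFs.
        u1 = min(max(std_normal.cdf(z1), eps), 1.0 - eps)
        u2 = min(max(std_normal.cdf(z2), eps), 1.0 - eps)
        # Push the uniforms through the target marginals.
        samples.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return samples


def exponential_inv_cdf(rate):
    """Inverse CDF of an exponential distribution (illustrative marginal)."""
    return lambda u: -math.log(1.0 - u) / rate


def uniform_inv_cdf(low, high):
    """Inverse CDF of a uniform distribution (illustrative marginal)."""
    return lambda u: low + u * (high - low)


pairs = sample_gaussian_copula(
    0.8, exponential_inv_cdf(0.5), uniform_inv_cdf(10.0, 20.0), n=2000, seed=42
)
print(pairs[:3])
```

The copula separates the dependence structure (the correlation `rho`) from the marginal distributions, so each variable keeps its own shape while the pairs stay correlated.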
Generative AI Intern
NVIDIA & ITI
Aug – Sep 2025
- Built LLM workflows with LangChain LCEL, applying advanced prompt engineering (few-shot, chain-of-thought, Pydantic outputs).
- Developed conversational agents with tool integration and memory management.
GIS Data Scientist
Robone
Apr – Aug 2025
- Processed satellite imagery and multispectral data using GIS tools (GDAL, Rasterio) for geospatial preprocessing and analysis.
- Built REST APIs using FastAPI to connect ML data pipelines with backend systems.
- Designed PostgreSQL database architecture with RBAC & ACL access controls on Supabase.
- Integrated LLM tooling with DeepSeek on remote Linux servers.
- Automated reporting workflows using Python (ReportLab) and collaborated across teams using Git and Agile Scrum (Jira).
Education
University
B.Sc. Computer Science & AI
Helwan University, Cairo, Egypt
2026
- GPA: 3.51 / 4.0 (Excellent).
- Key Coursework: Data Science (A+), Big Data Technologies (A+), Data Mining (A), Business Intelligence & Data Analytics (A), Probability & Statistics (A+), Software Engineering (A+), Algorithms (B+), Artificial Intelligence (B), Database Systems (B+).
- Graduation Project: KARNA — a used-car pricing platform with ensemble ML pricing, SHAP explainability, and a bilingual Arabic/English AI assistant.
Certificates
Data Analyst in Python
DataCamp — Feb 2026
Industry-recognized career track covering the full data analysis workflow in Python — importing, cleaning, and manipulating data with pandas and NumPy, statistical analysis and hypothesis testing, and data visualization with Matplotlib and Seaborn. Completed through hands-on exercises on real-world datasets.
Building LLM Applications With Prompt Engineering
NVIDIA Deep Learning Institute (DLI) — Sep 2025
Hands-on NVIDIA DLI workshop (intermediate level, GPU-accelerated cloud environment) covering advanced prompt engineering techniques — iterative prompts, few-shot, chain-of-thought, system messages, and streaming/batching. Built composable LLM workflows with LangChain LCEL, deployed Llama 3.1 via NVIDIA NIM, and developed an agentic tool integration mini-project with structured Pydantic outputs.
IBM Certified Data Scientist
Digital Egypt Pioneers Initiative (DEPI) — MCIT, in partnership with IBM — Feb 2025
Government-backed, IBM-designed data science certification track under Egypt's national digital transformation initiative. Curriculum covers Python (Pandas, NumPy, Scikit-learn, TensorFlow), SQL, supervised and unsupervised machine learning, EDA, predictive modeling, and data visualization. Completed with a capstone project demonstrating end-to-end data science skills.
Problem Solving — Honorable Mention (ECPC 2024)
ICPC — International Collegiate Programming Contest — Jul 2024
Achieved Honorable Mention at the Egyptian Collegiate Programming Contest (ECPC) — Egypt's official ICPC regional qualifying round and one of the most competitive university programming competitions in Africa and the Arab world. Demonstrates strong algorithmic thinking, data structures proficiency, and problem-solving under pressure.
InnovEgypt — Innovation & Entrepreneurship
TIEC (Technology Innovation and Entrepreneurship Center) — Feb 2025
Certificate of acknowledgment from Egypt's national innovation program under ITIDA/TIEC, recognizing completion of an entrepreneurship and business development track covering team leadership, ideation, and building technology ventures.
FAQ
I'm Mohamed Seif, an Egyptian data scientist and ML engineer based in 6 October City, Giza, Egypt. I build production machine learning systems, from quantile pricing models and demand forecasting to synthetic data generation and LLM-powered assistants with multi-provider fallback chains. I work in both Arabic and English.
I'm open to freelance work in data science and ML engineering — predictive modeling, time-series forecasting, data pipelines, and LLM/RAG applications. The best way to discuss a specific project is to email me at mohamedseif.a1@gmail.com.
My strength is taking ML from notebook to production: not just training a model, but engineering reproducible pipelines, explainability, and serving infrastructure around it. KARNA is a good example: a real platform with a quantile pricing engine, SHAP explainability, monitoring, and a bilingual AI assistant, all containerized and production-ready.
KARNA is a used-car pricing platform for the Egyptian market. I built two core services: an ML pricing engine using XGBoost and LightGBM with three-quantile predictions (q10/q50/q90) for negotiation ranges, SHAP explainability, and a model registry with hot-reload; and a bilingual Arabic/English AI assistant on a multi-provider LLM fallback chain with fuzzy car-name matching. The full stack runs across 5 Docker services.
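To make the assistant-side ideas concrete, here is a small sketch of fuzzy name matching and a provider fallback chain using only the Python standard library. It is an illustration under assumptions, not KARNA's code: the car list, the `match_car_name` and `call_with_fallback` helpers, and the stand-in provider functions are all hypothetical, and real providers would be LLM API clients.

```python
from difflib import get_close_matches

# Hypothetical catalogue, for illustration only.
KNOWN_MODELS = ["Hyundai Elantra", "Toyota Corolla", "Kia Cerato", "Nissan Sunny"]


def match_car_name(query, candidates=KNOWN_MODELS, cutoff=0.6):
    """Return the closest known car name, or None if nothing is similar enough."""
    lowered = {name.lower(): name for name in candidates}
    hits = get_close_matches(query.strip().lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


def call_with_fallback(prompt, providers):
    """Try each (name, callable) provider in order; return the first success."""
    errors = []
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_provider(prompt):
    """Stand-in for a provider that is currently unavailable."""
    raise TimeoutError("simulated timeout")


def backup_provider(prompt):
    """Stand-in for a working provider; it simply echoes the prompt."""
    return f"echo: {prompt}"


print(match_car_name("hyundai elantar"))
print(call_with_fallback("price of a 2018 Elantra?", [
    ("primary", flaky_provider),
    ("backup", backup_provider),
]))
```

Ordering the providers by preference keeps the common path fast, and collecting the errors makes a total failure easier to diagnose.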
At Datalentech I work as a junior Data Scientist. I keep work-specific details confidential, so I can't discuss individual projects, but I'm happy to talk about the techniques and skills I use at a high level.
I build LLM applications with LangChain (LCEL), applying prompt engineering techniques such as few-shot and chain-of-thought prompting, structured Pydantic outputs, and tool integration. I completed a Generative AI internship with NVIDIA & ITI and an NVIDIA DLI workshop on building LLM apps with prompt engineering, and I shipped a bilingual assistant in KARNA.
Python first: ensemble and tree-based models, Scikit-learn for classical ML, TensorFlow/Keras for neural networks, and SHAP for explainability, with Optuna for tuning. For deployment I use FastAPI, Docker, and Streamlit, with PostgreSQL/Supabase for data. For LLM work I use LangChain with structured outputs and multi-provider fallback chains.
I hold a B.Sc. in Computer Science & AI from Helwan University (GPA 3.51/4.0, Excellent), graduated in June 2026. My graduation project is KARNA, a dynamic pricing engine for used cars. I'm also an IBM Certified Data Scientist through Egypt's DEPI initiative, hold an NVIDIA DLI certificate in LLM applications and DataCamp's Data Analyst in Python certification, and earned an Honorable Mention at the ECPC (ICPC regional) programming contest.
I've worked in GIS and geospatial data science (processing satellite and multispectral imagery), in poultry, in the automotive domain through KARNA, and in synthetic data generation for multi-scenario forecasting. I'm ready to adapt quickly to any new domain or data type.
The best way is email: mohamedseif.a1@gmail.com. You can also find me on LinkedIn via the links on this site. Send me a short note about what you're working on and I'll get back to you.
Let's work together
Open to freelance work in data science and ML engineering. Reach out if you have a project in mind.