NAWALDEEP SINGH

Data Scientist
Gurugram, IN.

About

Data Scientist with 1.5+ years of experience in ML modeling, SQL analytics, dashboarding and MLOps, driving outcomes through data in fast-paced, cross-functional environments. Improved ETA accuracy by 72%, deployed pipelines with 90ms inference, and built optimized central reporting dashboards with 5+ KPIs to accelerate decision-making.

Work

BluSmart Mobility

Data Scientist

Highlights

Built a direct ETA prediction model using XGBoost and S2 cells with a feature store; reduced RMSE from 18 to 5 minutes and cut driver penalty errors by 12% in production, while maintaining model documentation and audit logs.

Simulated, validated, and deployed the ETA model using XGBoost, gRPC APIs, and server-side S3 loading; achieved 45-55% better accuracy than OSRM and 30% over legacy estimates, while reducing latency from 500ms to 75-90ms.

Collaborated with business and product teams to translate data findings into operational strategy; delivered insights to non-technical stakeholders for rollout and prioritization decisions.

Led a routing model PoC by mapping GPS pings to road segments; planned its integration into the ETA engine to replace the Google ODRD API, with projected savings of ~INR 8 Cr/year, focusing on cost-efficiency and real-time decisioning.

Built interactive dashboards in SQL, Redash, and Databricks to track 5+ KPIs on supply-demand balance, delays, and cancellations; reduced weekly reporting effort by 50% and enabled real-time ops interventions.

Designed home-charging validation logic with geo-fencing, driver-SOC mappings, and payout rules; developed custom ETL scripts to ingest and process raw charging data, ensuring under 5% variance in monthly settlements with partners.

Took the initiative to conduct a literature review on battery analytics using LSTM, BERT, and GAN models, aiming to forecast degradation trends ahead of the summer fleet ramp-up.

LNM Institute of Technology (Under PhD Mentorship)

Research Assistant

Highlights

Applied LSTM, GRU, and ARIMA models to forecast Apple stock trends; achieved high predictive accuracy (LSTM: 2.93 RMSE, outperforming ARIMA by 84%), demonstrating capabilities in forecasting and time-series analysis.

Analyzed the 15% performance drop from 2-layer to 3-layer LSTM/GRU models; observed that the 1-layer LSTM overpredicted while the GRU underpredicted, motivating a hybrid architecture with dropout to improve generalization.

Facilitated the integration of FinBERT sentiment analysis into a GAN-based prediction model; worked with the researchers to bring model RMSE down to 1.827, improving insight generation by 38% over traditional architectures.

Education

Trinity College Dublin

Bachelor of Engineering in Computer Engineering

Grade: Upper Second-Class Honours, CGPA: 9

Thapar University, Academic Excellence Scholarship

Skills

Languages

Python, SQL, Bash, BigQuery, C/C++.

Libraries

Pandas, NumPy, Scikit-learn, XGBoost, PySpark, TensorFlow, Matplotlib, Seaborn.

Analytics

A/B Testing, Exploratory Data Analysis (EDA), Forecasting, Root Cause Analysis, Reporting Automation.

BI

AWS (S3, QuickSight), Google Cloud (BigQuery), MongoDB, PowerBI, Tableau, Looker, Redash, Excel, MySQL.

ML Ops

MLflow, Docker, Git, CI/CD, Model Deployment, Scheduled Pipelines.

Tools

Databricks, Jupyter, VS Code, Linux, Airflow, Notebooks, Slack integration.

Projects

Multi-Source RAG Agent

Summary

Implemented a multi-source, NLP-driven retrieval system using LangChain, FAISS, and LLMs; improved factual accuracy by 22-26% as measured by RAGAS metrics (faithfulness, relevance, similarity). Built an interactive RAG-powered app with Streamlit and multimodal tools, enabling business teams to retrieve structured insights from unstructured PDFs; used MLflow to keep the pipeline reproducible for production deployment.