IPL Data Product Case Study

Using Data to Drive Product Strategy

My Role: Product Manager, Data Analyst

IPL Dataset from Kaggle

Overview

MY ROLE

Product Manager, Data Analyst

DURATION

2 weeks

TOOLKIT

Python (Pandas), SQL, Jupyter Notebook, Plotly

TEAM

Individual project

KEY RESULTS

Identified actionable insights for team strategy and player performance optimization

THE PRODUCT

An internal insights dashboard for strategic stakeholders (e.g., team owners, fantasy league managers, broadcasters)

The 'Why': The Product Problem

"As a PM, my first step is to identify the user and the problem."

The Indian Premier League (IPL) is a massive, data-rich entity. Stakeholders like franchise owners and fantasy league PMs are all 'users' who need to make high-stakes, data-driven decisions.

IPL Data Inspection showing matches.info() and deliveries.info() output

As a Product Manager, I can't build a strategy on flawed data. My first step was to perform a deep inspection of the raw dataset to assess its reliability.

As the inspection report shows, the 'data product' was broken. The .info() output revealed critical errors:

Missing Data:

Columns like city and winner were missing hundreds of values (you can see the non-null count is lower than the total entries).

Wrong Data Types:

The date column was stored as an object (text), not a real date. Until it is converted, date-based operations such as sorting chronologically or grouping by season are unreliable.

Before any insights could be trusted, I had to perform Data Engineering to fix these foundational flaws and build a reliable product.
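A minimal sketch of this inspection and type fix, assuming Pandas and a small stand-in for the matches table. The sample rows and the derived season column are illustrative assumptions, not the actual dataset or the exact code used in the project:

```python
import pandas as pd

# Hypothetical miniature of the matches table; the real file has many more
# rows and columns. Values here are made up for illustration.
matches = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "date": ["2008-04-18", "2008-04-19", "2009-04-18", "2010-03-12"],
    "city": ["Bangalore", None, "Cape Town", "Mumbai"],
    "winner": ["Kolkata Knight Riders", "Chennai Super Kings", None, "Mumbai Indians"],
})

# Count missing values per column (the same gap that the non-null counts
# in .info() point to).
missing = matches.isna().sum()

# Convert the text dates into real datetimes. With errors="coerce",
# unparseable values become NaT instead of raising, so they can be reviewed.
matches["date"] = pd.to_datetime(matches["date"], errors="coerce")

# With a datetime column, date-based operations work, e.g. deriving the year.
matches["season"] = matches["date"].dt.year

print(missing)
print(matches.dtypes)
```

How the missing city and winner values should be handled (dropped, filled, or flagged) depends on the question being asked, so this sketch only surfaces them.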

The 'How': Engineering a Trusted Data Product

The raw dataset had a critical integrity flaw: Inconsistent Business Logic.

Over 13 years, IPL franchises re-branded (e.g., 'Delhi Daredevils' became 'Delhi Capitals'). The raw data treated these as completely different teams. Any analysis of 'most wins' or 'win rates' would have been factually incorrect.

My Solution: I built a standardization step (a Python mapping dictionary) that maps each historical team name to its current name. This merged the fragmented entities into a single source of truth, so downstream metrics such as wins and win rates count each franchise once.

Team name mapping code showing before and after standardization
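The standardization described above can be sketched as follows. This is an illustration under assumptions: only the Delhi re-brand comes from the text, and the column names team1 and team2 and the sample rows are hypothetical. A real mapping would list every re-branded franchise in the dataset.

```python
import pandas as pd

# Illustrative mapping from historical name to current name.
# Only this entry is taken from the case study; extend as needed.
TEAM_NAME_MAP = {"Delhi Daredevils": "Delhi Capitals"}

# Made-up sample rows; team1/team2 are assumed column names.
matches = pd.DataFrame({
    "team1": ["Delhi Daredevils", "Mumbai Indians", "Delhi Capitals"],
    "team2": ["Mumbai Indians", "Delhi Capitals", "Chennai Super Kings"],
    "winner": ["Delhi Daredevils", "Delhi Capitals", "Chennai Super Kings"],
})

# Every column that holds a team name must be standardized together,
# otherwise joins and win counts will still disagree.
team_columns = ["team1", "team2", "winner"]

before = matches["winner"].value_counts()
matches[team_columns] = matches[team_columns].replace(TEAM_NAME_MAP)
after = matches["winner"].value_counts()

print("Before:\n", before)
print("After:\n", after)
```

Keeping the mapping in one dictionary means a future re-brand is a one-line change rather than a search through the analysis code.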

The 'Proof': SQL & Technical Independence

A strong Product Manager shouldn't be blocked by data access. While I used Python for the heavy lifting, I also used SQL to validate my findings and explore the data independently.

This section demonstrates three critical technical competencies:

Complex Joins (INNER JOIN): Raw data is often siloed. To analyze player performance by season, I programmatically merged two distinct datasets: the granular ball-by-ball data (deliveries) and the match metadata (matches).

Aggregation Logic (GROUP BY): I didn't just look at rows; I defined metrics. The code demonstrates aggregating thousands of rows to calculate high-level KPIs like 'Total Team Wins' and 'Average Margins'.

Data Validation (QA): I used SQL as a 'sanity check'. By deriving the same metrics in two different languages and confirming that the results matched, I reduced the risk of a logic error in either implementation reaching the final product.

SQL queries demonstrating INNER JOIN and GROUP BY operations
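The three competencies above can be sketched in one place: an INNER JOIN, a GROUP BY aggregation, and a cross-check against Pandas. This is a hedged example, not the project's actual queries. It assumes an in-memory SQLite database, and the column names match_id, batsman, and batsman_runs, along with the sample rows, are hypothetical.

```python
import sqlite3
import pandas as pd

# Made-up sample data standing in for the two tables.
matches = pd.DataFrame({
    "id": [1, 2, 3],
    "season": [2008, 2008, 2009],
    "winner": ["Mumbai Indians", "Delhi Capitals", "Mumbai Indians"],
})
deliveries = pd.DataFrame({
    "match_id": [1, 1, 2, 3, 3],
    "batsman": ["A", "B", "A", "A", "B"],
    "batsman_runs": [4, 1, 6, 2, 0],
})

# Load both tables into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
matches.to_sql("matches", conn, index=False)
deliveries.to_sql("deliveries", conn, index=False)

# SQL version: join ball-by-ball rows to match metadata, then aggregate.
runs_sql = pd.read_sql_query(
    """
    SELECT m.season, d.batsman, SUM(d.batsman_runs) AS runs
    FROM deliveries AS d
    INNER JOIN matches AS m ON d.match_id = m.id
    GROUP BY m.season, d.batsman
    ORDER BY m.season, d.batsman
    """,
    conn,
)
conn.close()

# Pandas version of the same metric, used as the independent check.
runs_pd = (
    deliveries.merge(matches, left_on="match_id", right_on="id", how="inner")
    .groupby(["season", "batsman"], as_index=False)["batsman_runs"].sum()
    .rename(columns={"batsman_runs": "runs"})
    .sort_values(["season", "batsman"])
    .reset_index(drop=True)
)

# Compare the two results row by row.
metrics_match = runs_sql.to_dict("records") == runs_pd.to_dict("records")
print(runs_sql)
print("SQL and Pandas agree:", metrics_match)
```

Agreement between the two versions does not prove the metric definition is right, but it does catch join and aggregation mistakes that would appear in only one implementation.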

The 'What': Shipping Actionable Insights

Clean data and SQL queries are just the means to an end. The goal of a Product Manager is to deliver insights that drive strategy.

I used Plotly to build interactive dashboard components that answer specific user questions:

The Myth:

'Winning the toss guarantees a win.'

The Data Truth:

My analysis showed that the toss winner went on to win only about 50.5% of matches league-wide, barely better than a coin flip.

The Strategic Pivot:

The heatmap revealed that the 'Chase' advantage is highly venue-dependent (e.g., massive advantage at Wankhede, neutral at Chennai). This transforms a generic hunch into a venue-specific game plan.

Interactive Plotly dashboard showing venue-specific toss win-rate heatmap
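The league-wide figure and the venue breakdown behind a heatmap like this can be sketched as below. This is an illustration under assumptions: the column names toss_winner and toss_decision, the venue labels, and the sample rows are hypothetical, so the numbers it prints are not the project's results.

```python
import pandas as pd

# Made-up sample rows; "field" means the toss winner chose to chase.
matches = pd.DataFrame({
    "venue": ["Venue A", "Venue A", "Venue A", "Venue B", "Venue B", "Venue B"],
    "toss_winner": ["X", "Y", "X", "Y", "X", "Y"],
    "toss_decision": ["field", "field", "bat", "field", "bat", "field"],
    "winner": ["X", "Y", "Y", "X", "X", "Y"],
})

# 1 if the team that won the toss also won the match, else 0.
matches["toss_winner_won"] = (
    matches["toss_winner"] == matches["winner"]
).astype(int)

# League-wide share of matches won by the toss winner.
league_rate = matches["toss_winner_won"].mean()

# Venue-by-decision table: the toss winner's win rate when batting first
# versus chasing, per venue. This grid is what a heatmap would display.
venue_table = matches.pivot_table(
    index="venue",
    columns="toss_decision",
    values="toss_winner_won",
    aggfunc="mean",
)

print(f"League-wide toss-winner win rate: {league_rate:.1%}")
print(venue_table)
```

A table in this shape could then be passed to a heatmap function such as plotly.express.imshow. With real data, venues with only a handful of matches should be filtered out or flagged, since their rates are too noisy to act on.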

Key PM Takeaways

Treat Data as a Product Asset: Algorithms are useless if the foundation is broken. By engineering the fix for historical team re-branding, I ensured that every downstream insight was built on consistent data, not noise.

Don't Wait for Permission: I refuse to be blocked by technical barriers. I switch seamlessly between Python for heavy lifting and SQL for quick validation. This independence lets me answer high-stakes questions without waiting on engineering resources.

Ship Outcomes, Not Just Outputs: I didn't set out to just "build a dashboard." I set out to solve a specific user problem: "Does the toss actually matter?" I transformed raw stats into a clear, venue-specific game plan that gives users a competitive edge.