Netflix Insights APP DA

Explore the Netflix content universe through data — genre trends, rating distributions, global reach, and content evolution.

Topics: machine-learning · deep-learning · neural-networks · recommendation-system · data-science · collaborative-filtering · content-analytics · exploratory-data-analysis · pandas · streamlit

Overview

This exploratory data analysis (EDA) dashboard brings the Netflix content catalogue to life through a suite of interactive visualisations built on Streamlit and Plotly. Working with the publicly available Netflix Titles dataset (available on Kaggle), it provides a multi-dimensional view of Netflix's content strategy: how the library has grown over time, how content is distributed across countries, genres, and ratings, and what patterns emerge when the data is sliced by content type (Movies vs. TV Shows).

The dashboard goes beyond simple bar charts. A choropleth world map visualises content volume by country of origin. A treemap breaks down the genre hierarchy. A timeline animation shows how the content mix has shifted year-over-year from 2010 onwards. Word clouds of titles and descriptions reveal recurring themes. A full-text search engine locates any title and displays its metadata immediately.

The application is structured as a multi-page Streamlit app with a navigation sidebar, making it easy to move between the overview, genre analysis, geographic analysis, rating analysis, and content search modules.

Motivation

Public datasets about streaming platforms offer a rare window into the content strategy of one of the world's largest media companies. This project was built to demonstrate what a thoughtful EDA can reveal beyond what a data dictionary describes, and to provide a reusable, visually compelling template for content catalogue analysis that can be adapted to any streaming platform dataset.

Architecture

Netflix Titles CSV
        │
  pandas: cleaning, type casting, null handling
        │
  Feature Engineering:
  - Genre list explosion (multi-label → one-hot)
  - Year extraction from date_added
  - Duration parsing (min for movies, seasons for shows)
        │
  Plotly + WordCloud Visualisation Layer
        │
  Streamlit Multi-Page Dashboard
  (Overview | Genre | Geography | Ratings | Search)
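
The three feature engineering steps in the diagram can be sketched with pandas as follows. This is a minimal illustration, not the code in app.py: the column names (listed_in, date_added, duration) are assumed from the public Kaggle file, and engineer_features is a hypothetical helper name.

```python
import pandas as pd

# Tiny stand-in for the Kaggle CSV; column names are assumptions.
raw = pd.DataFrame({
    "title": ["Film A", "Show B"],
    "type": ["Movie", "TV Show"],
    "listed_in": ["Dramas, Comedies", "Crime TV Shows"],
    "date_added": ["September 25, 2021", "January 1, 2020"],
    "duration": ["90 min", "2 Seasons"],
})


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper covering the three steps named in the diagram."""
    out = df.copy()
    # Year extraction: unparseable dates become missing values instead of raising.
    parsed = pd.to_datetime(out["date_added"].str.strip(), errors="coerce")
    out["year_added"] = parsed.dt.year.astype("Int64")
    # Duration parsing: the leading integer is minutes for movies, seasons for shows.
    out["duration_value"] = (
        out["duration"].str.extract(r"(\d+)", expand=False).astype(float)
    )
    # Genre explosion: split the multi-label string into one indicator column per genre.
    genres = out["listed_in"].str.get_dummies(sep=", ")
    return out.join(genres)


features = engineer_features(raw)
```

Keeping the raw columns alongside the derived ones makes it easy to spot rows where parsing failed.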

Features

Content Overview Dashboard

High-level KPIs: total titles, movies vs. TV shows split, most recent additions, average movie duration, and median TV seasons — all on a single summary page.
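
As a rough sketch of how such KPIs might be computed, the snippet below derives them from a toy frame. The type and duration column names are assumed from the Kaggle file, and summarise is a hypothetical helper, not a function from this repository.

```python
import pandas as pd

# Toy catalogue; real data would come from the CSV.
catalogue = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show", "TV Show"],
    "duration": ["90 min", "110 min", "1 Season", "3 Seasons", "2 Seasons"],
})


def summarise(df: pd.DataFrame) -> dict:
    """Hypothetical KPI helper: counts plus duration statistics per content type."""
    # The leading integer is minutes for movies and seasons for TV shows.
    value = df["duration"].str.extract(r"(\d+)", expand=False).astype(float)
    is_movie = df["type"].eq("Movie")
    return {
        "total_titles": len(df),
        "movies": int(is_movie.sum()),
        "tv_shows": int((~is_movie).sum()),
        "avg_movie_minutes": float(value[is_movie].mean()),
        "median_tv_seasons": float(value[~is_movie].median()),
    }


kpis = summarise(catalogue)
```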

Genre Distribution Treemap

Interactive Plotly treemap of genre hierarchy, with area proportional to content count and colour intensity representing average content rating.

Country of Origin Choropleth

World map heatmap of Netflix content volume by country of production, with hover tooltips showing top titles per country.
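
Before a map can be drawn, titles have to be counted per country. The sketch below assumes the country field holds a comma-separated list (as in the Kaggle file) and that co-productions count once for each listed country; titles_per_country is a hypothetical helper.

```python
import pandas as pd

# Toy rows; the second title is a single-country production, the third has no country.
titles = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "country": ["United States, India", "India", None],
})


def titles_per_country(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per country with its title count."""
    countries = df["country"].dropna().str.split(",").explode().str.strip()
    countries = countries[countries.ne("")]  # drop empties from stray commas
    return (
        countries.value_counts()
        .rename_axis("country")
        .reset_index(name="titles")
    )


country_counts = titles_per_country(titles)
```

The resulting frame has one column of country names and one of counts, which is the shape a choropleth function generally expects.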

Yearly Content Growth Timeline

Animated bar chart racing through years 2010–2023, showing the accelerating growth of the Netflix library by content type.
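
The data behind such a timeline might be prepared as below. This sketch assumes a year_added column already exists and that library growth is shown as a cumulative count per content type; yearly_growth is a hypothetical helper, and min_year mirrors the MIN_YEAR setting.

```python
import pandas as pd

# Toy additions; year_added is assumed to have been derived from date_added.
added = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
    "year_added": [2019, 2020, 2020, 2021, 2021],
})


def yearly_growth(df: pd.DataFrame, min_year: int = 2010) -> pd.DataFrame:
    """Hypothetical helper: cumulative titles per year, one column per content type."""
    recent = df[df["year_added"] >= min_year]
    per_year = recent.groupby(["year_added", "type"]).size().unstack(fill_value=0)
    return per_year.cumsum()


growth = yearly_growth(added)
```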

Rating Distribution Analysis

Stacked bar charts of content rating (TV-MA, TV-14, PG-13, PG, G, etc.) broken down by content type and production decade.
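
One way to produce the counts behind these stacked bars is a cross-tabulation. The sketch assumes rating, type, and release_year columns and derives the decade by integer division; rating_table is a hypothetical helper.

```python
import pandas as pd

# Toy rows spanning three decades.
rated = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show"],
    "rating": ["PG-13", "TV-MA", "TV-MA", "TV-14"],
    "release_year": [1998, 2015, 2019, 2021],
})


def rating_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: title counts per (type, decade) row and rating column."""
    decade = (df["release_year"] // 10 * 10).rename("decade")
    return pd.crosstab([df["type"], decade], df["rating"])


table = rating_table(rated)
```

Each row of the table corresponds to one bar, and each column to one stacked segment.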

Duration Distribution Plot

Kernel density estimate (KDE) and histogram of movie runtimes (minutes) and TV show season counts, with percentile markers.
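
The percentile markers can be computed with NumPy, as in this small sketch. The runtimes are sample values, the choice of the 25th, 50th, and 75th percentiles is an assumption, and percentile_markers is a hypothetical helper.

```python
import numpy as np

# Sample movie runtimes in minutes.
movie_minutes = np.array([80, 90, 100, 110, 120])


def percentile_markers(values, percentiles=(25, 50, 75)) -> dict:
    """Hypothetical helper: map each requested percentile to its value."""
    marks = np.percentile(values, percentiles)
    return {p: float(m) for p, m in zip(percentiles, marks)}


markers = percentile_markers(movie_minutes)
```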

Title and Description Word Cloud

Word cloud of most frequent terms in title and description fields, separately for Movies and TV Shows.
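
A word cloud is driven by term frequencies. The standard-library sketch below shows one way to compute them; the stop-word list is a small illustrative subset, term_frequencies is a hypothetical helper, and max_words mirrors the WORDCLOUD_MAX_WORDS setting.

```python
import re
from collections import Counter

# Illustrative stop-word subset; a real list would be much longer.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "is"}


def term_frequencies(texts, max_words=200) -> dict:
    """Hypothetical helper: most frequent non-stop-word terms across the texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 1)
    return dict(counts.most_common(max_words))


movie_terms = term_frequencies(
    ["A love story in Paris", "The story of a heist", "Love and war"]
)
```

Running the helper once on movie rows and once on TV show rows gives the two separate clouds described above. The wordcloud package accepts such a mapping through generate_from_frequencies.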

Full-Text Search Engine

Case-insensitive, multi-field search across title, director, cast, and description, with instant results and expandable metadata cards.
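
A case-insensitive, multi-field search can be approximated in pandas by joining the fields into one string per row. This is a sketch under that assumption, with synthetic rows; search_titles is a hypothetical helper and the query is matched as a literal substring.

```python
import pandas as pd

SEARCH_FIELDS = ["title", "director", "cast", "description"]

# Synthetic rows; the first has no director, as often happens in the dataset.
library = pd.DataFrame({
    "title": ["Example Show", "Example Film"],
    "director": [None, "Jane Doe"],
    "cast": ["Alex Sample", "Sam Placeholder"],
    "description": ["Four families search for a child.", "A queen takes the throne."],
})


def search_titles(df: pd.DataFrame, query: str, fields=SEARCH_FIELDS) -> pd.DataFrame:
    """Hypothetical helper: rows where any listed field contains the query."""
    # Missing values become empty strings so the join never fails.
    haystack = df[list(fields)].fillna("").astype(str).agg(" ".join, axis=1)
    mask = haystack.str.contains(query, case=False, regex=False)
    return df[mask]


hits = search_titles(library, "queen")
```

Passing regex=False keeps characters such as parentheses in a title from being read as pattern syntax.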

Tech Stack

Library / Tool	Role	Why This Choice
Streamlit	Multi-page dashboard	Navigation sidebar and page routing
pandas	Data wrangling	CSV loading, cleaning, feature engineering
Plotly	Interactive charts	Choropleth, treemap, timeline animation, KDE
WordCloud	Text visualisation	Title and description word clouds
NumPy	Statistical computation	Percentile and distribution calculations
Seaborn (optional)	Static plots	Rating distribution heatmaps

Key packages detected in this repo: xlsxwriter · streamlit · pandas · plotly · google-generativeai · numpy · seaborn · matplotlib · wordcloud · scikit-learn

Getting Started

Prerequisites

Python 3.9+
pip package manager
Relevant API keys (see Configuration section)

Installation

git clone https://github.com/Devanik21/Netflix-Insights-App-DA-.git
cd Netflix-Insights-App-DA-
python -m venv venv && source venv/bin/activate
pip install streamlit pandas plotly wordcloud numpy seaborn
# Download dataset from Kaggle and place as netflix_titles.csv
streamlit run app.py

Usage

streamlit run app.py

# Generate standalone report
python generate_report.py --data netflix_titles.csv --output report.html

# Update dataset
python update_data.py  # fetches latest Kaggle version if API key set

Configuration

Variable	Default	Description
`DATA_PATH`	`netflix_titles.csv`	Path to Netflix titles CSV dataset
`MIN_YEAR`	`2010`	Earliest year to include in timeline analysis
`TOP_N_GENRES`	`15`	Number of top genres shown in treemap
`WORDCLOUD_MAX_WORDS`	`200`	Maximum words in word cloud

Copy .env.example to .env and populate all required values before running.
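
The settings in the table could be read from environment variables roughly as follows. How app.py actually loads them is not shown here, so load_settings is a hypothetical helper; the defaults are the ones listed in the table.

```python
import os

# Defaults copied from the configuration table above.
DEFAULTS = {
    "DATA_PATH": "netflix_titles.csv",
    "MIN_YEAR": "2010",
    "TOP_N_GENRES": "15",
    "WORDCLOUD_MAX_WORDS": "200",
}


def load_settings(env=None) -> dict:
    """Hypothetical helper: read settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "DATA_PATH": raw["DATA_PATH"],
        # Numeric settings arrive as strings and are converted here.
        "MIN_YEAR": int(raw["MIN_YEAR"]),
        "TOP_N_GENRES": int(raw["TOP_N_GENRES"]),
        "WORDCLOUD_MAX_WORDS": int(raw["WORDCLOUD_MAX_WORDS"]),
    }


# Passing a plain dict instead of os.environ makes the behaviour easy to check.
settings = load_settings({"MIN_YEAR": "2015"})
```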

Project Structure

Netflix-Insights-App-DA-/
├── README.md
├── requirements.txt
├── app.py
├── .devcontainer/devcontainer.json
├── netflix_analysis.csv
├── netflix_dataset_100k.csv
└── ...

Roadmap

TMDB API integration for poster images and additional metadata per title
Sentiment analysis of title descriptions using a fine-tuned BERT model
Content recommendation engine based on genre and description similarity
Comparison mode: Netflix vs. Prime Video vs. Disney+ catalogue analysis
Trend forecasting: predict genre popularity for the next content acquisition cycle

Contributing

Contributions, issues, and feature requests are welcome. Please:

Fork the repository
Create a feature branch (git checkout -b feature/your-feature)
Commit your changes (git commit -m 'feat: add your feature')
Push to your branch (git push origin feature/your-feature)
Open a Pull Request

Please follow conventional commit messages and ensure any new code is documented.

Notes

Data sourced from the publicly available Netflix Titles dataset on Kaggle. The dataset is a snapshot that its maintainers update only periodically, so it may not reflect the current live Netflix catalogue.

Author

Devanik Debnath
B.Tech, Electronics & Communication Engineering
National Institute of Technology Agartala

License

This project is open source and available under the MIT License.

Crafted with curiosity, precision, and a belief that good software is worth building well.
