Devanik21/Netflix-Insights-App-DA-

Netflix Insights APP DA

Explore the Netflix content universe through data — genre trends, rating distributions, global reach, and content evolution.


Topics: machine-learning · deep-learning · neural-networks · recommendation-system · data-science · collaborative-filtering · content-analytics · exploratory-data-analysis · pandas · streamlit

Overview

This exploratory data analysis (EDA) dashboard brings the Netflix content catalogue to life through a suite of interactive visualisations built on Streamlit and Plotly. Working with the publicly available Netflix Titles dataset (available on Kaggle), it provides a multi-dimensional view of Netflix's content strategy: how the library has grown over time, how content is distributed across countries, genres, and ratings, and what patterns emerge when the data is sliced by content type (Movies vs. TV Shows).

The dashboard goes beyond simple bar charts. A choropleth world map visualises content volume by country of origin. A treemap breaks down the genre hierarchy. A timeline animation shows how the content mix has shifted year-over-year from 2010 to the present. Word clouds of titles and descriptions reveal recurring themes. And a full-text search engine allows any title to be located and its metadata displayed immediately.

The application is structured as a multi-page Streamlit app with a navigation sidebar, making it easy to move between the overview, genre analysis, geographic analysis, rating analysis, and content search modules.
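
One common way to get this kind of sidebar navigation is a page registry plus a radio selector. The sketch below is an assumption about how such routing could look, not the code in app.py; the render function names and header texts are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical page renderers; each receives the streamlit module (or a stub).
def render_overview(st) -> None:
    st.header("Overview")

def render_genre(st) -> None:
    st.header("Genre analysis")

def render_geography(st) -> None:
    st.header("Geographic analysis")

def render_ratings(st) -> None:
    st.header("Rating analysis")

def render_search(st) -> None:
    st.header("Content search")

# Insertion order of the dict is the order shown in the sidebar.
PAGES: Dict[str, Callable] = {
    "Overview": render_overview,
    "Genre": render_genre,
    "Geography": render_geography,
    "Ratings": render_ratings,
    "Search": render_search,
}

def main() -> None:
    # Imported lazily so the registry can be inspected without Streamlit installed.
    import streamlit as st
    choice = st.sidebar.radio("Navigate", list(PAGES))
    PAGES[choice](st)
```

To use it as an app, end the script with a call to main() and launch it with streamlit run. Streamlit also supports a pages/ directory for multi-page apps, which may be what this repository uses instead.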


Motivation

Public datasets about streaming platforms offer a rare window into the content strategy of one of the world's largest media companies. This project was built to demonstrate what a thoughtful EDA can reveal beyond what a data dictionary describes, and to provide a reusable, visually compelling template for content catalogue analysis that can be adapted to any streaming platform dataset.


Architecture

Netflix Titles CSV
        │
  pandas: cleaning, type casting, null handling
        │
  Feature Engineering:
  - Genre list explosion (multi-label → one-hot)
  - Year extraction from date_added
  - Duration parsing (min for movies, seasons for shows)
        │
  Plotly + WordCloud Visualisation Layer
        │
  Streamlit Multi-Page Dashboard
  (Overview | Genre | Geography | Ratings | Search)
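
The feature engineering stage above can be sketched in pandas roughly as follows. This is a minimal illustration, not the repository's code: it assumes the Kaggle column names date_added, duration, listed_in, and type, and the output column names (year_added, duration_min, seasons, genre_*) are hypothetical.

```python
import pandas as pd

# Tiny stand-in for netflix_titles.csv, using the assumed Kaggle column names.
raw = pd.DataFrame({
    "title": ["A", "B", "C"],
    "type": ["Movie", "TV Show", "Movie"],
    "date_added": ["September 25, 2021", " August 1, 2019", None],
    "duration": ["90 min", "2 Seasons", "104 min"],
    "listed_in": ["Dramas, International Movies", "Crime TV Shows", "Comedies, Dramas"],
})

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Year extraction: strip stray whitespace, turn unparseable dates into NaT.
    added = pd.to_datetime(out["date_added"].str.strip(),
                           format="%B %d, %Y", errors="coerce")
    out["year_added"] = added.dt.year.astype("Int64")

    # Duration parsing: the leading integer means minutes for movies
    # and seasons for TV shows.
    amount = pd.to_numeric(out["duration"].str.extract(r"(\d+)", expand=False),
                           errors="coerce")
    is_movie = out["type"].eq("Movie")
    out["duration_min"] = amount.where(is_movie)
    out["seasons"] = amount.where(~is_movie)

    # Genre explosion: multi-label string to one indicator column per genre.
    genres = out["listed_in"].fillna("").str.get_dummies(sep=", ")
    return out.join(genres.add_prefix("genre_"))

features = engineer_features(raw)
```

Keeping minutes and seasons in separate columns avoids mixing two units in one numeric field, which matters for the duration statistics later on.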

Features

Content Overview Dashboard

High-level KPIs: total titles, movies vs. TV shows split, most recent additions, average movie duration, and median TV seasons — all on a single summary page.
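
As a rough sketch of how such KPIs could be computed, assuming runtimes and season counts have already been parsed into numeric columns (the names duration_min and seasons are hypothetical):

```python
import pandas as pd

# Sample rows with assumed, already-parsed numeric columns.
catalogue = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show", "Movie"],
    "duration_min": [90, 110, None, None, 100],
    "seasons": [None, None, 1, 3, None],
})

def summary_kpis(df: pd.DataFrame) -> dict:
    is_movie = df["type"].eq("Movie")
    return {
        "total_titles": int(len(df)),
        "movies": int(is_movie.sum()),
        "tv_shows": int((~is_movie).sum()),
        # mean() and median() skip missing values by default.
        "avg_movie_minutes": float(df.loc[is_movie, "duration_min"].mean()),
        "median_tv_seasons": float(df.loc[~is_movie, "seasons"].median()),
    }

kpis = summary_kpis(catalogue)
```

In a Streamlit page, each entry could be shown with st.metric in a row of columns.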

Genre Distribution Treemap

Interactive Plotly treemap of genre hierarchy, with area proportional to content count and colour intensity representing average content rating.
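
A minimal sketch of the counting step behind such a treemap, assuming genres live in a comma-separated listed_in column as in the Kaggle dataset. The function names are hypothetical, and the colour encoding by rating is left out because the source does not say how ratings are averaged.

```python
import pandas as pd

sample = pd.DataFrame({
    "listed_in": ["Dramas, Comedies", "Dramas", "Dramas, Documentaries", "Comedies"],
})

def genre_counts(df: pd.DataFrame, top_n: int = 15) -> pd.DataFrame:
    # One row per (title, genre) pair, then count titles per genre.
    genres = df["listed_in"].dropna().str.split(",").explode().str.strip()
    counts = genres.value_counts().head(top_n)
    return counts.rename_axis("genre").reset_index(name="titles")

def treemap_figure(counts: pd.DataFrame):
    # Imported lazily so the counting step runs without Plotly installed.
    import plotly.express as px
    return px.treemap(counts, path=["genre"], values="titles")

top_genres = genre_counts(sample, top_n=2)
```

The top_n default of 15 mirrors the TOP_N_GENRES setting listed under Configuration.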

Country of Origin Choropleth

World map heatmap of Netflix content volume by country of production, with hover tooltips showing top titles per country.
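
The per-country aggregation might look like the sketch below. It assumes a comma-separated country column, as in the Kaggle dataset, and counts a co-production once for each listed country; the function names are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "country": ["United States", "India, United States", None, "Japan"],
})

def titles_per_country(df: pd.DataFrame) -> pd.DataFrame:
    # Split multi-country entries so each country gets its own row.
    countries = df["country"].dropna().str.split(",").explode().str.strip()
    countries = countries[countries != ""]
    return countries.value_counts().rename_axis("country").reset_index(name="titles")

def choropleth_figure(counts: pd.DataFrame):
    # Imported lazily so the aggregation runs without Plotly installed.
    import plotly.express as px
    return px.choropleth(counts, locations="country",
                         locationmode="country names", color="titles")

country_counts = titles_per_country(sample)
```

Matching by country name is convenient but depends on the names Plotly recognises; mapping to ISO-3 codes first is a more robust alternative.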

Yearly Content Growth Timeline

Animated bar chart racing through years 2010–2023, showing the accelerating growth of the Netflix library by content type.
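
One way to prepare data for an animation like this is a long table with one row per year and content type, zero-filled so every frame shows both bars. The sketch assumes the Kaggle date_added and type columns; the function names and the per-year (rather than cumulative) counting are assumptions.

```python
import pandas as pd

sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
    "date_added": ["January 1, 2019", "March 5, 2019", "July 9, 2020",
                   "May 2, 2008", None],
})

def yearly_counts(df: pd.DataFrame, min_year: int = 2010) -> pd.DataFrame:
    added = pd.to_datetime(df["date_added"].str.strip(),
                           format="%B %d, %Y", errors="coerce")
    kept = df.assign(year=added.dt.year).dropna(subset=["year"])
    kept = kept[kept["year"] >= min_year].astype({"year": int})
    # Wide table (year x type) with zeros for missing combinations.
    counts = kept.groupby(["year", "type"]).size().unstack(fill_value=0)
    long = counts.reset_index().melt(id_vars="year", var_name="type",
                                     value_name="titles")
    return long.sort_values(["year", "type"]).reset_index(drop=True)

def timeline_figure(long: pd.DataFrame):
    # Imported lazily so the aggregation runs without Plotly installed.
    import plotly.express as px
    return px.bar(long, x="type", y="titles", color="type",
                  animation_frame="year",
                  range_y=[0, int(long["titles"].max())])

timeline = yearly_counts(sample)
```

Fixing range_y keeps the axis stable across frames, so growth is visible as taller bars rather than a rescaled axis.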

Rating Distribution Analysis

Stacked bar charts of content rating (TV-MA, TV-14, PG-13, PG, G, etc.) broken down by content type and production decade.
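
The two breakdowns can be sketched with pandas cross-tabulations. This assumes the Kaggle rating, type, and release_year columns, and treats release_year as the basis for the production decade, which is an assumption.

```python
import pandas as pd

sample = pd.DataFrame({
    "rating": ["TV-MA", "TV-MA", "PG-13", "TV-14"],
    "type": ["Movie", "TV Show", "Movie", "TV Show"],
    "release_year": [2015, 2021, 1999, 2020],
})

def rating_breakdown(df: pd.DataFrame):
    # 2015 -> "2010s", 1999 -> "1990s"
    decade = (df["release_year"] // 10 * 10).astype(int).astype(str) + "s"
    by_type = pd.crosstab(df["rating"], df["type"])
    by_decade = pd.crosstab(df["rating"], decade.rename("decade"))
    return by_type, by_decade

by_type, by_decade = rating_breakdown(sample)
```

Each resulting table has one row per rating and one column per segment, which is the shape a stacked bar chart needs.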

Duration Distribution Plot

Kernel density estimate (KDE) and histogram of movie runtimes (minutes) and TV show season counts, with percentile markers.
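
The percentile markers could be computed with NumPy along these lines. The sketch assumes runtimes are already numeric (the column name duration_min is hypothetical) and covers only the markers, not the KDE itself.

```python
import numpy as np
import pandas as pd

movies = pd.DataFrame({"duration_min": [80, 90, 100, 110, 120, None]})

def percentile_markers(values, qs=(25, 50, 75)) -> dict:
    # Drop missing runtimes before computing percentiles.
    clean = pd.Series(values, dtype="float64").dropna()
    return {q: float(p) for q, p in zip(qs, np.percentile(clean, qs))}

markers = percentile_markers(movies["duration_min"])
```

Each value can then be drawn as a vertical line over the histogram.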

Title and Description Word Cloud

Word cloud of most frequent terms in title and description fields, separately for Movies and TV Shows.
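
A word cloud is a rendering of term frequencies, so the counting step can be sketched separately from the drawing. The stop-word list and function names below are illustrative assumptions; the WordCloud call is limited to its max_words and background_color options and generate_from_frequencies.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real app would likely use a fuller one.
BASIC_STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is",
                   "his", "her", "with", "for", "on"}

def term_frequencies(texts, max_words: int = 200) -> dict:
    counts = Counter()
    for text in texts:
        if not isinstance(text, str):
            continue  # skip missing descriptions
        for word in re.findall(r"[a-z']+", text.lower()):
            if len(word) > 2 and word not in BASIC_STOPWORDS:
                counts[word] += 1
    return dict(counts.most_common(max_words))

def render_cloud(frequencies: dict, max_words: int = 200):
    # Imported lazily so the counting step runs without wordcloud installed.
    from wordcloud import WordCloud
    cloud = WordCloud(max_words=max_words, background_color="white")
    return cloud.generate_from_frequencies(frequencies)

freqs = term_frequencies(["A detective hunts a killer in Tokyo",
                          "The detective returns", None])
```

Running term_frequencies once on Movie rows and once on TV Show rows gives the two separate clouds described above.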

Full-Text Search Engine

Case-insensitive, multi-field search across title, director, cast, and description, with instant results and expandable metadata cards.
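
A case-insensitive, multi-field search of this kind can be sketched with pandas string matching. The function name is hypothetical and the field names assume the Kaggle columns; this is plain substring matching, not a ranked search index.

```python
import pandas as pd

sample = pd.DataFrame({
    "title": ["Inception", "Midnight Diner", "Dark"],
    "director": ["Christopher Nolan", None, "Baran bo Odar"],
    "cast": ["Leonardo DiCaprio", "Kaoru Kobayashi", "Louis Hofmann"],
    "description": ["A thief enters dreams.", "A diner in Tokyo.",
                    "A town hides secrets."],
})

def search_titles(df: pd.DataFrame, query: str,
                  fields=("title", "director", "cast", "description")) -> pd.DataFrame:
    query = query.strip()
    if not query:
        return df.iloc[0:0]  # empty query, empty result
    mask = pd.Series(False, index=df.index)
    for field in fields:
        # regex=False treats the query literally; na=False skips missing values.
        mask |= df[field].str.contains(query, case=False, na=False, regex=False)
    return df[mask]

hits = search_titles(sample, "NOLAN")
```

Passing regex=False means user input such as "(" or "+" is matched literally instead of raising a pattern error.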


Tech Stack

Library / Tool       Role                      Why This Choice
Streamlit            Multi-page dashboard      Navigation sidebar and page routing
pandas               Data wrangling            CSV loading, cleaning, feature engineering
Plotly               Interactive charts        Choropleth, treemap, timeline animation, KDE
WordCloud            Text visualisation        Title and description word clouds
NumPy                Statistical computation   Percentile and distribution calculations
Seaborn (optional)   Static plots              Rating distribution heatmaps

Key packages detected in this repo: xlsxwriter · streamlit · pandas · plotly · google-generativeai · numpy · seaborn · matplotlib · wordcloud · scikit-learn


Getting Started

Prerequisites

  • Python 3.9+
  • pip package manager
  • Relevant API keys (see Configuration section)

Installation

git clone https://github.com/Devanik21/Netflix-Insights-App-DA-.git
cd Netflix-Insights-App-DA-
python -m venv venv && source venv/bin/activate
pip install streamlit pandas plotly wordcloud numpy seaborn
# Download dataset from Kaggle and place as netflix_titles.csv
streamlit run app.py

Usage

streamlit run app.py

# Generate standalone report
python generate_report.py --data netflix_titles.csv --output report.html

# Update dataset
python update_data.py  # fetches latest Kaggle version if API key set

Configuration

Variable              Default              Description
DATA_PATH             netflix_titles.csv   Path to Netflix titles CSV dataset
MIN_YEAR              2010                 Earliest year to include in timeline analysis
TOP_N_GENRES          15                   Number of top genres shown in treemap
WORDCLOUD_MAX_WORDS   200                  Maximum words in word cloud

Copy .env.example to .env and populate all required values before running.
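
A minimal sketch of reading these variables with their defaults, assuming they are exposed as environment variables. The Settings class and load_settings function are hypothetical names, and the sketch does not parse the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class Settings:
    # Defaults mirror the configuration table.
    data_path: str = "netflix_titles.csv"
    min_year: int = 2010
    top_n_genres: int = 15
    wordcloud_max_words: int = 200

def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        data_path=env.get("DATA_PATH", defaults.data_path),
        # Environment values are strings, so numeric settings are cast to int.
        min_year=int(env.get("MIN_YEAR", defaults.min_year)),
        top_n_genres=int(env.get("TOP_N_GENRES", defaults.top_n_genres)),
        wordcloud_max_words=int(env.get("WORDCLOUD_MAX_WORDS",
                                        defaults.wordcloud_max_words)),
    )

settings = load_settings({"MIN_YEAR": "2015"})
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.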


Project Structure

Netflix-Insights-App-DA-/
├── README.md
├── requirements.txt
├── app.py
├── .devcontainer/devcontainer.json
├── netflix_analysis.csv
├── netflix_dataset_100k.csv
└── ...

Roadmap

  • TMDB API integration for poster images and additional metadata per title
  • Sentiment analysis of title descriptions using a fine-tuned BERT model
  • Content recommendation engine based on genre and description similarity
  • Comparison mode: Netflix vs. Prime Video vs. Disney+ catalogue analysis
  • Trend forecasting: predict genre popularity for the next content acquisition cycle

Contributing

Contributions, issues, and feature requests are welcome. Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Commit your changes (git commit -m 'feat: add your feature')
  4. Push to your branch (git push origin feature/your-feature)
  5. Open a Pull Request

Please follow conventional commit messages and ensure any new code is documented.


Notes

Data sourced from the publicly available Netflix Titles dataset on Kaggle. The dataset is a snapshot that its maintainers update only periodically, so it may not reflect the current live Netflix catalogue.


Author

Devanik Debnath
B.Tech, Electronics & Communication Engineering
National Institute of Technology Agartala


License

This project is open source and available under the MIT License.


Crafted with curiosity, precision, and a belief that good software is worth building well.
