Explore the Netflix content universe through data — genre trends, rating distributions, global reach, and content evolution.
Topics: machine-learning · deep-learning · neural-networks · recommendation-system · data-science · collaborative-filtering · content-analytics · exploratory-data-analysis · pandas · streamlit
This exploratory data analysis (EDA) dashboard brings the Netflix content catalogue to life through a suite of interactive visualisations built on Streamlit and Plotly. Working with the public Netflix Titles dataset from Kaggle, it provides a multi-dimensional view of Netflix's content strategy: how the library has grown over time, how content is distributed across countries, genres, and ratings, and what patterns emerge when the data is sliced by content type (Movies vs. TV Shows).
The dashboard goes beyond simple bar charts. A choropleth world map visualises content volume by country of origin, a treemap breaks down the genre hierarchy, and a timeline animation shows how the content mix has shifted year over year from 2010 onward. Word clouds of titles and descriptions reveal recurring themes, and a case-insensitive search lets you locate any title and display its metadata immediately.
The application is structured as a multi-page Streamlit app with a navigation sidebar, making it easy to move between the overview, genre analysis, geographic analysis, rating analysis, and content search modules.
Public datasets about streaming platforms offer a rare window into the content strategy of one of the world's largest media companies. This project was built to demonstrate what a thoughtful EDA can reveal beyond what a data dictionary describes, and to provide a reusable, visually compelling template for content catalogue analysis that can be adapted to any streaming platform's dataset.
```
Netflix Titles CSV
        │
pandas: cleaning, type casting, null handling
        │
Feature Engineering:
  - Genre list explosion (multi-label → one-hot)
  - Year extraction from date_added
  - Duration parsing (min for movies, seasons for shows)
        │
Plotly + WordCloud Visualisation Layer
        │
Streamlit Multi-Page Dashboard
(Overview | Genre | Geography | Ratings | Search)
```
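The three feature-engineering steps in the pipeline can be sketched in pandas as follows. This is a minimal illustration, not the app's own code: it assumes the standard Kaggle column names (`type`, `date_added`, `duration`, `listed_in`), that `date_added` uses the `"Month D, YYYY"` format, and that genres in `listed_in` are separated by `", "`. The helper name `engineer_features` is hypothetical.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add year, duration and one-hot genre columns."""
    out = df.copy()

    # date_added looks like "September 25, 2021" (sometimes with stray
    # leading whitespace); unparseable or missing values become NaT.
    added = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    out["year_added"] = added.dt.year.astype("Int64")

    # duration is "90 min" for movies and "1 Season" / "2 Seasons" for shows:
    # pull out the number, then route it to the column matching the type.
    number = pd.to_numeric(out["duration"].str.extract(r"(\d+)")[0], errors="coerce")
    is_movie = out["type"].eq("Movie")
    out["duration_min"] = number.where(is_movie)
    out["seasons"] = number.where(~is_movie)

    # listed_in is a comma-separated multi-label field; one-hot encode it.
    genres = out["listed_in"].fillna("").str.get_dummies(sep=", ")
    return out.join(genres.add_prefix("genre_"))
```

Keeping movie minutes and TV seasons in separate columns avoids mixing two different units in one numeric field.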
High-level KPIs: total titles, movies vs. TV shows split, most recent additions, average movie duration, and median TV seasons — all on a single summary page.
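A minimal sketch of the numeric overview KPIs, assuming the raw Kaggle `type` and `duration` columns (`"90 min"` for movies, `"N Season(s)"` for shows). `summary_kpis` is a hypothetical name, and the "most recent additions" list is left out to keep the example short.

```python
import pandas as pd


def summary_kpis(df: pd.DataFrame) -> dict:
    """Hypothetical helper: compute the headline numbers for the overview page."""
    # Extract the numeric part of each duration string, per unit.
    minutes = pd.to_numeric(df["duration"].str.extract(r"(\d+)\s*min")[0], errors="coerce")
    seasons = pd.to_numeric(df["duration"].str.extract(r"(\d+)\s*Season")[0], errors="coerce")
    counts = df["type"].value_counts()
    return {
        "total_titles": len(df),
        "movies": int(counts.get("Movie", 0)),
        "tv_shows": int(counts.get("TV Show", 0)),
        "avg_movie_minutes": float(minutes[df["type"].eq("Movie")].mean()),
        "median_tv_seasons": float(seasons[df["type"].eq("TV Show")].median()),
    }
```

On the overview page, each value could then be displayed with Streamlit's `st.metric(label, value)`. The median is a sensible choice for seasons because most shows have only one or two, with a long tail of long-running series.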
Interactive Plotly treemap of genre hierarchy, with area proportional to content count and colour intensity representing average content rating.
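One way to build the treemap data is to explode `listed_in` into one row per genre and count titles per (type, genre) pair. This sketch assumes a type → genre hierarchy and a top-N cut-off mirroring `TOP_N_GENRES`; the function names are hypothetical. Because the dataset's `rating` column holds maturity labels (TV-MA, PG-13, …) rather than numeric scores, the sketch colours tiles by count; how the app derives its colour scale is not shown here. Plotly is imported inside the plotting function so the data step can be used on its own.

```python
import pandas as pd


def genre_treemap_data(df: pd.DataFrame, top_n: int = 15) -> pd.DataFrame:
    """Hypothetical helper: title counts per (type, genre), top_n genres only."""
    exploded = (
        df.assign(genre=df["listed_in"].str.split(", "))
        .explode("genre")
        .dropna(subset=["genre"])
    )
    # Keep only the genres that are most frequent across the whole catalogue.
    top = exploded["genre"].value_counts().head(top_n).index
    return (
        exploded[exploded["genre"].isin(top)]
        .groupby(["type", "genre"], as_index=False)
        .size()
        .rename(columns={"size": "count"})
    )


def genre_treemap(df: pd.DataFrame, top_n: int = 15):
    """Hypothetical helper: Plotly treemap with tile area proportional to count."""
    import plotly.express as px  # lazy import: the data step works without Plotly

    data = genre_treemap_data(df, top_n)
    return px.treemap(data, path=["type", "genre"], values="count", color="count")
```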
World map heatmap of Netflix content volume by country of production, with hover tooltips showing top titles per country.
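Co-productions list several countries in one `country` cell, so each title has to be counted once per country before plotting. The sketch below assumes comma-separated country names (with occasional trailing commas) and, since the dataset has no popularity field, treats the first few titles in file order as the "top titles" for the hover text; function names are hypothetical. It relies on Plotly's `locationmode="country names"` to match plain country names to map regions.

```python
import pandas as pd


def country_counts(df: pd.DataFrame, n_titles: int = 3) -> pd.DataFrame:
    """Hypothetical helper: titles per country plus a few sample titles."""
    exploded = (
        df.assign(country=df["country"].str.split(","))
        .explode("country")
        .assign(country=lambda d: d["country"].str.strip())
    )
    # Drop blanks left by missing values or trailing commas ("India,").
    exploded = exploded[exploded["country"].fillna("").ne("")]
    return (
        exploded.groupby("country")
        .agg(
            titles=("title", "count"),
            top_titles=("title", lambda s: ", ".join(s.head(n_titles))),
        )
        .reset_index()
        .sort_values("titles", ascending=False, ignore_index=True)
    )


def country_map(df: pd.DataFrame):
    """Hypothetical helper: choropleth of content volume by country."""
    import plotly.express as px  # lazy import: the data step works without Plotly

    data = country_counts(df)
    return px.choropleth(
        data,
        locations="country",
        locationmode="country names",
        color="titles",
        hover_name="country",
        hover_data=["top_titles"],
    )
```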
Animated bar chart racing through years 2010–2023, showing the accelerating growth of the Netflix library by content type.
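The animation needs the library size per content type at the end of each year, i.e. a cumulative sum of titles added. In this sketch, titles added before `min_year` still count toward the running total, so the first frame reflects the existing library rather than starting from zero. It assumes `date_added` is in `"Month D, YYYY"` format; `cumulative_by_year` and `library_race` are hypothetical names, and the 2010 to 2023 default range follows the text above.

```python
import pandas as pd


def cumulative_by_year(df: pd.DataFrame, min_year: int = 2010, max_year: int = 2023) -> pd.DataFrame:
    """Hypothetical helper: long-format cumulative title counts per year and type."""
    years = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    ).dt.year
    frame = pd.DataFrame({"year": years, "type": df["type"]}).dropna()
    frame["year"] = frame["year"].astype(int)

    added = pd.crosstab(frame["year"], frame["type"])  # titles added per year/type
    # Fill years with no additions so the running total carries forward.
    full_range = range(int(min(added.index.min(), min_year)), max_year + 1)
    library = added.reindex(full_range, fill_value=0).cumsum().loc[min_year:max_year]

    return (
        library.rename_axis(index="year", columns=None)
        .reset_index()
        .melt(id_vars="year", var_name="type", value_name="titles")
    )


def library_race(df: pd.DataFrame):
    """Hypothetical helper: horizontal bars animated over the years."""
    import plotly.express as px  # lazy import: the data step works without Plotly

    data = cumulative_by_year(df)
    return px.bar(
        data, x="titles", y="type", color="type", orientation="h",
        animation_frame="year", range_x=[0, data["titles"].max() * 1.1],
    )
```

Fixing `range_x` to the overall maximum keeps the axis steady across frames, so growth shows up as bars lengthening rather than the scale shifting.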
Stacked bar charts of content rating (TV-MA, TV-14, PG-13, PG, G, etc.) broken down by content type and production decade.
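A sketch of the aggregation behind these charts, assuming the production decade is derived from `release_year` and that rows with a missing `rating` can simply be dropped (pandas `groupby` does this by default). Function names are hypothetical.

```python
import pandas as pd


def rating_by_decade(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: title counts per (type, decade, rating)."""
    decade = (df["release_year"] // 10 * 10).astype(str) + "s"  # 1995 -> "1990s"
    return (
        df.assign(decade=decade)
        .groupby(["type", "decade", "rating"])
        .size()
        .reset_index(name="titles")
    )


def rating_chart(df: pd.DataFrame):
    """Hypothetical helper: stacked bars per decade, one facet per content type."""
    import plotly.express as px  # lazy import: the data step works without Plotly

    data = rating_by_decade(df)
    return px.bar(
        data, x="decade", y="titles", color="rating",
        facet_col="type", barmode="stack",
    )
```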
Kernel density estimate (KDE) and histogram of movie runtimes (minutes) and TV show season counts, with percentile markers.
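To make the KDE curve and percentile markers concrete, here is a NumPy-only sketch: a Gaussian kernel density estimate with Scott's rule bandwidth, $h = \hat\sigma\, n^{-1/5}$, evaluated on an evenly spaced grid, plus a percentile lookup. This is an illustration of the technique, not necessarily how the app computes its curve; function names and the chosen percentiles are assumptions, and at least two non-missing values are needed for a bandwidth.

```python
import numpy as np


def gaussian_kde(values, grid_points: int = 200):
    """Gaussian KDE with Scott's rule bandwidth (NumPy only)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]                     # ignore missing runtimes/seasons
    n = x.size
    bandwidth = x.std(ddof=1) * n ** (-1 / 5)
    grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, grid_points)
    # Standardised distance from every grid point to every observation.
    z = (grid[:, None] - x[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    return grid, density


def percentile_markers(values, qs=(25, 50, 75, 90)):
    """Map each requested percentile to its value, ignoring NaNs."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    return dict(zip(qs, np.percentile(x, qs)))
```

The `(grid, density)` pair can be drawn as a line over a Plotly histogram created with `histnorm="probability density"`, and each percentile added as a vertical line with `fig.add_vline(x=value)`.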
Word clouds of the most frequent terms in the title and description fields, generated separately for Movies and TV Shows.
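Counting terms explicitly, then handing the counts to the `wordcloud` package, keeps the filtering logic visible and testable. The sketch below uses a small, illustrative stopword list (the `wordcloud` package also ships its own `STOPWORDS` set) and a 200-word cap mirroring `WORDCLOUD_MAX_WORDS`; function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list; extend or swap for wordcloud.STOPWORDS.
STOPWORDS = {
    "the", "and", "for", "with", "from", "into", "his", "her", "their", "this",
    "that", "when", "after", "who", "its", "are", "was", "has", "have", "but",
    "not", "they", "them", "you", "your",
}


def term_frequencies(texts, max_words: int = 200, stopwords=STOPWORDS) -> dict:
    """Hypothetical helper: most frequent lowercase words across the texts."""
    counts = Counter()
    for text in texts:
        if not isinstance(text, str):  # skip missing descriptions
            continue
        for token in re.findall(r"[a-z]+", text.lower()):
            if len(token) > 2 and token not in stopwords:
                counts[token] += 1
    return dict(counts.most_common(max_words))


def render_wordcloud(freqs: dict, max_words: int = 200):
    """Hypothetical helper: turn frequencies into a PIL image."""
    from wordcloud import WordCloud  # lazy import: counting works without it

    cloud = WordCloud(width=800, height=400, max_words=max_words, background_color="white")
    return cloud.generate_from_frequencies(freqs).to_image()
```

Calling `term_frequencies` once on the Movie rows and once on the TV Show rows gives the two separate clouds; the resulting image can be passed to `st.image`.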
Case-insensitive, multi-field search across title, director, cast, and description, with instant results and expandable metadata cards.
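A minimal sketch of case-insensitive, multi-field matching with pandas, assuming plain substring matching (`regex=False`, so characters such as `(` or `?` in a query are taken literally) and the Kaggle column names; `search_titles` is a hypothetical name. Missing values (`na=False`) never count as a match.

```python
import pandas as pd

SEARCH_FIELDS = ["title", "director", "cast", "description"]


def search_titles(df: pd.DataFrame, query: str, fields=SEARCH_FIELDS) -> pd.DataFrame:
    """Hypothetical helper: rows where any field contains the query."""
    query = query.strip()
    if not query:
        return df.iloc[0:0]  # empty result with the same columns
    mask = pd.Series(False, index=df.index)
    for field in fields:
        mask |= df[field].str.contains(query, case=False, regex=False, na=False)
    return df[mask]
```

Each matching row could then be rendered as an expandable card with `with st.expander(row.title):`, listing the remaining metadata inside.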
| Library / Tool | Role | Why This Choice |
|---|---|---|
| Streamlit | Multi-page dashboard | Navigation sidebar and page routing |
| pandas | Data wrangling | CSV loading, cleaning, feature engineering |
| Plotly | Interactive charts | Choropleth, treemap, timeline animation, KDE |
| WordCloud | Text visualisation | Title and description word clouds |
| NumPy | Statistical computation | Percentile and distribution calculations |
| Seaborn (optional) | Static plots | Rating distribution heatmaps |
Key packages detected in this repo:
xlsxwriter · streamlit · pandas · plotly · google-generativeai · numpy · seaborn · matplotlib · wordcloud · scikit-learn
- Python 3.9+
- pip package manager
- Relevant API keys (see the Configuration section)
```bash
git clone https://github.com/Devanik21/Netflix-Insights-App-DA-.git
cd Netflix-Insights-App-DA-
python -m venv venv && source venv/bin/activate
pip install streamlit pandas plotly wordcloud numpy seaborn
# Download the dataset from Kaggle and save it as netflix_titles.csv
streamlit run app.py
```
```bash
# Generate a standalone report
python generate_report.py --data netflix_titles.csv --output report.html
# Update the dataset
python update_data.py  # fetches the latest Kaggle version if an API key is set
```

| Variable | Default | Description |
|---|---|---|
| `DATA_PATH` | `netflix_titles.csv` | Path to the Netflix titles CSV dataset |
| `MIN_YEAR` | `2010` | Earliest year included in the timeline analysis |
| `TOP_N_GENRES` | `15` | Number of top genres shown in the treemap |
| `WORDCLOUD_MAX_WORDS` | `200` | Maximum number of words in each word cloud |
Copy `.env.example` to `.env` and populate all required values before running.
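A sketch of how the configuration variables above might be read with their documented defaults. It assumes the values arrive as process environment variables; Python does not read a `.env` file on its own, so something like `load_dotenv()` from the `python-dotenv` package (not among the detected packages, so an assumption) would have to run first. `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; defaults mirror the configuration table."""
    data_path: str = "netflix_titles.csv"
    min_year: int = 2010
    top_n_genres: int = 15
    wordcloud_max_words: int = 200


def load_settings(env=os.environ) -> Settings:
    """Read each variable from the environment, falling back to its default."""
    defaults = Settings()
    return Settings(
        data_path=env.get("DATA_PATH", defaults.data_path),
        min_year=int(env.get("MIN_YEAR", defaults.min_year)),
        top_n_genres=int(env.get("TOP_N_GENRES", defaults.top_n_genres)),
        wordcloud_max_words=int(env.get("WORDCLOUD_MAX_WORDS", defaults.wordcloud_max_words)),
    )
```

Passing a plain dict as `env` makes the function easy to test without touching the real environment.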
```
Netflix-Insights-App-DA-/
├── README.md
├── requirements.txt
├── app.py
├── .devcontainer/devcontainer.json
├── netflix_analysis.csv
├── netflix_dataset_100k.csv
└── ...
```
- TMDB API integration for poster images and additional metadata per title
- Sentiment analysis of title descriptions using a fine-tuned BERT model
- Content recommendation engine based on genre and description similarity
- Comparison mode: Netflix vs. Prime Video vs. Disney+ catalogue analysis
- Trend forecasting: predict genre popularity for the next content acquisition cycle
Contributions, issues, and feature requests are welcome. Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/your-feature`)
- Commit your changes (`git commit -m 'feat: add your feature'`)
- Push to your branch (`git push origin feature/your-feature`)
- Open a Pull Request
Please follow the Conventional Commits message format and make sure any new code is documented.
Data is sourced from the publicly available Netflix Titles dataset on Kaggle. The dataset is a snapshot that its maintainers update only periodically, so it may not reflect the current live Netflix catalogue.
Devanik Debnath
B.Tech, Electronics & Communication Engineering
National Institute of Technology Agartala
This project is open source and available under the MIT License.
Crafted with curiosity, precision, and a belief that good software is worth building well.