Dataset Search & Repositories
Google Dataset Search
Google's search engine for finding publicly available datasets across the web. A great starting point when you're not sure where to look.
Kaggle Datasets
Large repository of datasets for machine learning and data science projects. Includes community contributions, competitions, and notebooks.
Maven Analytics Data Playground
Free sample datasets designed for practicing data analysis and visualization. Well-documented with suggested analysis questions.
R Datasets
Collection of datasets available in R packages. Useful for statistical analysis and reproducible examples.
Automotive
Audi Autonomous Driving Dataset (A2D2)
Autonomous driving dataset with sensor data from Audi vehicles. Includes camera, lidar, and semantic segmentation data.
Cloud Platform Datasets
Azure ML Open Datasets (Python)
Python package (`azureml-opendatasets`) for accessing Azure Open Datasets from notebooks, for example as pandas or Spark DataFrames.
Azure Open Datasets
Microsoft's catalog of curated public datasets on Azure. Includes weather, genomics, demographics, and more.
Registry of Open Data on AWS
Public datasets available for free on Amazon Web Services. Covers satellite imagery, genomics, weather and climate, and other data, often at very large scale.
Data Visualization Sources
Beautiful News
Positive news data visualizations and underlying datasets. Good for practicing visualization with uplifting stories.
Gapminder
Global development indicators covering health, economy, and demographics. Made famous by Hans Rosling's presentations.
Information is Beautiful
Curated datasets behind data visualizations covering diverse topics. Well-structured and visually interesting data.
Database Testing & Development
Datasets for Test Databases
Curated list of datasets suitable for testing database systems. Useful when you need realistic test data.
SQL Server Sample Databases
Sample databases for learning and testing SQL Server. Includes AdventureWorks, WideWorldImporters, and others.
Entertainment & Media
IMDb Non-Commercial Datasets
Movie and TV show data from IMDb for non-commercial use. Includes titles, ratings, cast, and crew information.
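IMDb distributes these as gzipped, tab-separated files (for example `title.basics.tsv.gz` and `title.ratings.tsv.gz`) that mark missing values with `\N`. A minimal sketch of reading that layout with the Python standard library is shown below. It runs against a small made-up sample trimmed to a few `title.basics` columns and compressed in memory to mimic a download; the helper name `read_imdb_tsv` and the sample rows are hypothetical.

```python
import csv
import gzip
import io

def read_imdb_tsv(fileobj):
    """Yield one dict per row, mapping IMDb's '\\N' null marker to None."""
    # The files are unquoted TSV, so disable quote handling: titles may contain '"'.
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == "\\N" else value) for key, value in row.items()}

# Hypothetical sample in a trimmed title.basics layout, gzipped in memory
# to stand in for the downloaded .tsv.gz file.
sample = (
    "tconst\tprimaryTitle\tstartYear\tendYear\n"
    "tt0000001\tExample Film\t1994\t\\N\n"
    "tt0000002\tExample Series\t2001\t2005\n"
)
compressed = gzip.compress(sample.encode("utf-8"))

with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8", newline="") as f:
    rows = list(read_imdb_tsv(f))

for row in rows:
    print(row["tconst"], row["primaryTitle"], row["startYear"], row["endYear"])
```

With the real files, you would pass the path of the downloaded `.tsv.gz` file to `gzip.open` instead of the in-memory buffer.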
Million Song Dataset
Audio features and metadata for a million contemporary songs. Widely used for music information retrieval research.
MovieLens
Movie rating datasets from GroupLens for recommender system research. Available in several sizes, from 100K ratings up to tens of millions.
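The smaller MovieLens releases (such as `ml-latest-small`) include a `ratings.csv` file with the header `userId,movieId,rating,timestamp`. A common first baseline for recommender work is ranking movies by mean rating while skipping movies with too few ratings. The sketch below assumes that CSV layout; the inline rows, the `top_movies` helper, and its `min_ratings` threshold are hypothetical.

```python
import csv
import io
from collections import defaultdict

def top_movies(ratings_file, min_ratings=2, k=3):
    """Rank movies by mean rating, skipping movies with fewer than min_ratings ratings."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(ratings_file):
        movie = int(row["movieId"])
        totals[movie] += float(row["rating"])
        counts[movie] += 1
    means = {m: totals[m] / counts[m] for m in totals if counts[m] >= min_ratings}
    # Highest mean first; ties go to the movie with more ratings.
    ranked = sorted(means, key=lambda m: (means[m], counts[m]), reverse=True)
    return [(m, round(means[m], 2), counts[m]) for m in ranked[:k]]

# Hypothetical rows in the ratings.csv layout.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,4.0,964982703\n"
    "2,10,5.0,964982224\n"
    "1,20,3.0,964983815\n"
    "3,20,4.0,964982931\n"
    "3,30,5.0,964982400\n"
)
result = top_movies(sample)
print(result)
```

Here movie 30 is dropped because it has only one rating. Raising `min_ratings` trades coverage for more reliable averages.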
Government & International Statistics
National Center for Education Statistics
US education statistics and research data from the Department of Education. Covers schools, colleges, and educational outcomes.
Source Cooperative - Harvard LIL Gov Data
Archive of US government data from Harvard Library Innovation Lab. Preserves government datasets for long-term access.
UNdata
United Nations statistical databases covering global indicators across economics, demographics, environment, and social metrics.
World Health Organisation
Global health statistics and indicators from the WHO, including disease surveillance, health systems, and population health data.
Location & Places
Foursquare Open Source Places
Open dataset of 100M+ places and locations from Foursquare. Covers businesses, landmarks, and points of interest globally.
Science & Research
CERN Open Data Portal
Particle physics research data from the European Organization for Nuclear Research. Includes collision data and analysis tools.
National Cancer Institute GDC Portal
Cancer genomics data from the Genomic Data Commons. Contains clinical and genomic data for cancer research.
The COVID Tracking Project
Historical US COVID-19 testing, hospitalization, and outcomes data compiled by volunteers. Data collection ended in March 2021, so this is now a fixed archive of pandemic statistics.
Sports & Football
Betting Exchange Historical Data
Historical odds and market data from the Betfair betting exchange. Useful for sports analytics and prediction models.
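Exchange prices such as Betfair's are quoted as decimal odds, and the raw implied probability of a selection is 1 / odds. Because these rarely sum to exactly 1 across a market, a simple correction is to divide each by their total. This proportional method is only one of several de-margining approaches. The sketch below uses made-up prices, and `implied_probabilities` is a hypothetical helper, not part of any Betfair tool.

```python
def implied_probabilities(decimal_odds):
    """Return (market total, probabilities normalised to sum to 1).

    Each raw probability 1/odds is divided by the sum over the market
    (proportional normalisation). Other de-margining methods exist.
    """
    if any(odds <= 1.0 for odds in decimal_odds.values()):
        raise ValueError("decimal odds must be greater than 1.0")
    raw = {name: 1.0 / odds for name, odds in decimal_odds.items()}
    total = sum(raw.values())
    return total, {name: p / total for name, p in raw.items()}

# Hypothetical back prices for a three-way football match market.
book, probs = implied_probabilities({"Home": 2.0, "Draw": 3.5, "Away": 4.0})
print(f"market total: {book:.3f}")
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

Normalised probabilities like these are a common starting point for comparing market expectations against a prediction model's output.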
FiveThirtyEight
Datasets behind FiveThirtyEight's data journalism articles on politics, sports, and culture. Clean, well-documented data.
football.csv
Open football/soccer data in CSV format including leagues, matches, and results. Community-maintained and regularly updated.
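A typical first step with match-results data is building a league table. The sketch below assumes a CSV with one row per match and hypothetical columns `home`, `away`, `home_goals`, and `away_goals`. The real files use their own headers, so check them and adapt the parsing step. It awards three points for a win and one for a draw, the usual modern convention.

```python
import csv
import io
from collections import defaultdict

def league_table(matches_file):
    """Build a league table from match rows, sorted by points, goal difference, goals for."""
    stats = defaultdict(lambda: {"P": 0, "W": 0, "D": 0, "L": 0, "GF": 0, "GA": 0, "Pts": 0})
    for row in csv.DictReader(matches_file):
        hg, ag = int(row["home_goals"]), int(row["away_goals"])
        # Update both sides of the match from their own perspective.
        for team, gf, ga in ((row["home"], hg, ag), (row["away"], ag, hg)):
            s = stats[team]
            s["P"] += 1
            s["GF"] += gf
            s["GA"] += ga
            if gf > ga:
                s["W"] += 1
                s["Pts"] += 3
            elif gf == ga:
                s["D"] += 1
                s["Pts"] += 1
            else:
                s["L"] += 1
    return sorted(
        stats.items(),
        key=lambda item: (item[1]["Pts"], item[1]["GF"] - item[1]["GA"], item[1]["GF"]),
        reverse=True,
    )

# Hypothetical results using the assumed column names.
sample = io.StringIO(
    "home,away,home_goals,away_goals\n"
    "Team A,Team B,2,1\n"
    "Team B,Team C,0,0\n"
    "Team C,Team A,1,3\n"
)
table = league_table(sample)
for team, s in table:
    print(f"{team:<8} P{s['P']} W{s['W']} D{s['D']} L{s['L']} GD{s['GF'] - s['GA']:+d} Pts{s['Pts']}")
```

Tie-break rules differ between competitions (some use head-to-head results first), so the sort key is the part most likely to need changing.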
football.db
Open football data in structured database format. Good for building football statistics applications.
Soccer Data and APIs Guide
Comprehensive guide to football/soccer data sources and APIs. Useful for understanding what's available.
WhoScored
Football statistics and analysis website run by a team of analysts and software developers based in central London. Known for its player ratings and detailed match, team, and player statistics.
StatsBomb
Sports data and analytics company providing advanced football data and insights to clubs, media, and betting companies worldwide. Its datasets include detailed event data, player tracking, and tactical analysis.
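StatsBomb also publishes a free subset of its event data as JSON in its `open-data` GitHub repository. The sketch below sums shots, goals, and expected goals (xG) per team from an events list. It assumes the open-data layout, where each event has `type.name` and `team.name` and shots also carry `shot.statsbomb_xg` and `shot.outcome.name`. Check these field names against StatsBomb's own specification. The inline events and the `shot_summary` helper are hypothetical.

```python
import json
from collections import defaultdict

def shot_summary(events):
    """Return {team: {"shots": n, "goals": n, "xg": total}} from a list of event dicts."""
    summary = defaultdict(lambda: {"shots": 0, "goals": 0, "xg": 0.0})
    for event in events:
        if event.get("type", {}).get("name") != "Shot":
            continue  # ignore passes, carries, pressures, etc.
        shot = event.get("shot", {})
        team = summary[event["team"]["name"]]
        team["shots"] += 1
        team["xg"] += shot.get("statsbomb_xg", 0.0)
        if shot.get("outcome", {}).get("name") == "Goal":
            team["goals"] += 1
    return dict(summary)

# Hypothetical events trimmed to the fields used above.
events = json.loads("""[
  {"type": {"name": "Pass"}, "team": {"name": "Team A"}},
  {"type": {"name": "Shot"}, "team": {"name": "Team A"},
   "shot": {"statsbomb_xg": 0.35, "outcome": {"name": "Goal"}}},
  {"type": {"name": "Shot"}, "team": {"name": "Team A"},
   "shot": {"statsbomb_xg": 0.05, "outcome": {"name": "Saved"}}},
  {"type": {"name": "Shot"}, "team": {"name": "Team B"},
   "shot": {"statsbomb_xg": 0.12, "outcome": {"name": "Off T"}}}
]""")

result = shot_summary(events)
for team, s in result.items():
    print(f"{team}: {s['shots']} shots, {s['goals']} goals, {s['xg']:.2f} xG")
```

For a real match, load the events file from the repository with `json.load` and pass the resulting list in unchanged.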
Opta Analyst
In-depth football statistics and analysis platform. Covers player performance metrics, team statistics, and match analysis for fans, analysts, and industry professionals.
UK Geographic Data
Open Geography Portal
UK geographic boundaries and statistical data from the Office for National Statistics. Essential for UK-based spatial analysis.
Ordnance Survey
UK national mapping and geographic data. Offers both free and premium datasets for mapping applications.
Weather & Climate
Meteostat
Historical weather and climate data from weather stations worldwide. Clean API access to temperature, precipitation, and conditions.