The SDG Data Catalog is an open, extensible, global database of data sets, metadata, and research networks built automatically by mining millions of published open access academic works.
The SDG Data Catalog leverages advancements in Artificial Intelligence (AI) and Natural Language Processing (NLP) technologies to extract and organize knowledge from public datasets that is otherwise hidden in plain sight in the continuous stream of research generated by the scientific community.
The goal, ultimately, is to connect researchers and students with SDG-relevant datasets so that their work can make meaningful progress towards social good.
Hidden in Plain Sight: Building a Global Sustainable Development Data Catalogue
By James Hodson & Andy Spezzati
Modern scientific research for Sustainable Development depends on the availability of large amounts of relevant real-world data. However, there are currently no extensive global databases that associate existing data sets with the research domains they cover.
We present the SDG Data Catalogue, an open, extensible, global database of data sets, metadata, and research networks built automatically by mining millions of published open access academic works. Our system leverages advances in AI and NLP technologies to extract and organise deep knowledge of available data sets that is otherwise hidden in plain sight in the continuous stream of research generated by the scientific community.
Explore SDG Datasets
The latest poverty and inequality indicators compiled from officially recognized sources with national, regional and global estimates.
A list of equitable data sets, research and reports from UNICEF Office of Innovation to support programmes, campaigns, and initiatives.
Datasets and projects designed to increase empathy for often impoverished victims of far-away disasters.
The World Poverty Clock developed by the World Data Lab provides real-time poverty estimates through 2030 for nearly all countries.
Annual social protection data are compiled by the International Labour Organization (ILO) through its Social Security Inquiry, sourced from national administrative data. The indicators are disseminated through the ILO World Social Protection Data Dashboards.
The Global Hunger Index (GHI) is a tool designed to comprehensively measure and track hunger globally, by region and country.
The World Food Programme (WFP) has developed the HungerMapLIVE, a global hunger monitoring system that tracks and predicts hunger in near-real time.
The Group on Earth Observations Global Agricultural Monitoring (GEOGLAM) Crop Monitor (https://cropmonitor.org/) is an international initiative that was developed under the framework of the 2011 G20 Action Plan on Food Price Volatility in Agriculture.
Created by the UN Food and Agriculture Organization (FAO), an agency dedicated to international efforts to end hunger, this dataset tracks desert locust observations, as well as whether the observed locusts are adults or nymphs (known as hoppers) and whether the locusts form a group.
Good Health & Wellbeing
The International Genome Sample Resource contains the most extensive catalogue of genetic variation in humans including SNPs, structural variants and haplotype context.
The GHO data repository contains data collected by the World Health Organization on various health-related statistics including mortality and disease burden rates in 194 countries.
WorldPop is an applied research group focused on mapping demographics in low- and middle-income countries. One of its activities is measuring the availability and geographical accessibility of healthcare services at the national and sub-national levels across Sub-Saharan Africa.
Created by the Johns Hopkins University Center for Systems Science and Engineering, this dataset reports COVID-19 cases at the provincial level in China, at the county level in the U.S., and at the state and national levels for other countries.
Panel database on education quality featuring data from 163 countries from 1965 to 2015.
The World Inequality Database on Education (WIDE) highlights the powerful influence of circumstances, such as wealth, gender, ethnicity and location, on people's opportunities for education.
The United Nations Educational, Scientific and Cultural Organization (UNESCO) is supporting countries in their efforts to mitigate the immediate negative impact of school closures and to facilitate the continuity of education through remote learning.
Provides datasets on households across the globe including marriages, fertility rates, adolescent fertility, etc.
Database providing data from family planning surveys conducted in various countries.
The United Nations Protocol and Liaison Service maintains a list of Heads of State, Heads of Government, and Ministers for Foreign Affairs of all Member States based on the information provided by the Permanent Missions.
The Inter-Parliamentary Union (IPU) tracks monthly rankings of the percentage of women in parliament from January 2019 onwards through Parline, a free resource with over 600 data points provided directly by national parliaments on their structure, composition, working methods, and activities.
Clean Water & Sanitation
The most comprehensive source of international water footprint data including scarcity and pollution issues.
Provides datasets on various issues including flood hazard maps, water risk indicators and water stress projections across the globe.
Datasets providing world and regional statistics, data and maps.
The UN Environment Programme (UNEP) works with partners to support the global monitoring of freshwater ecosystems, as reported through the Freshwater Ecosystems Explorer, which provides up-to-date geospatial data on changes to their extent and water quality.
The Falkenmark Water Stress Index is a widely used metric to characterize water stress based on annual renewable water supply per capita.
The ISciences Water Security Indicator Model v2 (WSIMv2) describes places where water availability during the most recent 12-month period is more or less than would be expected based on a 1950-2009 baseline period.
Affordable & Clean Energy
Data on global energy consumption by source, energy production and trade, energy transitions and renewable energy investments.
Detailed statistics on renewable energy capacity, power generation and renewable energy balances.
Global data on energy generation and consumption, energy intensity, CO2 emissions as well as import and export statistics.
Datasets on primary energy production and consumption, CO2 from fossil fuels, greenhouse gas emissions, renewable energy and electricity.
Decent Work & Economic Growth
The IMF publishes a range of time series data on IMF lending, exchange rates and other economic and financial indicators.
The International Labour Organization (ILO) tracks the impact of COVID-19 on the world of work.
The International Monetary Fund (IMF) compiles a database on fiscal measures announced by 141 governments in response to the COVID-19 pandemic.
The OECD's quarterly national accounts (QNA) dataset presents GDP growth data collected from all the OECD member countries and some other major economies on the basis of a standardised questionnaire.
Industry, Innovation & Infrastructure
This study investigates the effect of the latest wave of economic globalization on manufacturing employment in developing countries.
Gives graphs as well as country highlights relevant to survey results.
A global Artificial Intelligence (AI) repository to identify AI related projects, research initiatives, think-tanks and organizations that can accelerate progress towards the 17 UN Sustainable Development Goals.
The World Bank's Global Database of Shared Prosperity covers 83 countries, home to 75 percent of the world's people, with the most recent estimates available for 2013.
The goal of the SWIID is to meet the needs of those engaged in broadly cross-national research by maximizing the comparability of income inequality data.
Reported by the UN Department of Economic and Social Affairs (UN DESA), international migrant stocks are estimates of the total number of international migrants present in a given country at a particular time.
Sustainable Cities & Communities
The Settlement Profiling Tool guides field personnel in creating cross-sectoral settlement profiles intended to help inform future urban development plans and policies in displacement affected contexts.
Mendeley Data Repository is free-to-use and open access. It enables you to deposit any research data (including raw and processed data, video, code, software, algorithms, protocols, and methods) associated with your research manuscript.
The European Data Portal harvests the metadata of Public Sector Information available on public data portals across European countries. Information regarding the provision of data and the benefits of re-using data is also included.
A compilation of smart cities around the world that have shared open data in an aggregated data portal.
OpenAQ, a non-profit organization, collects daily air quality information from stations around the world and provides it as free and open data to help better monitor and manage the air we breathe.
The database constitutes a comprehensive set of settlement polygons. It is in geodatabase format and consists of three feature classes for built up areas (BUA), small settlement areas (SSA), and hamlets (hamlets).
Google’s Community Mobility Reports chart the geographic movement trends associated with COVID-19 over time and provide the data, aggregated and anonymized, to the public.
Responsible Consumption & Production
This platform provides access to data compiled through the UN System in preparation for the Secretary-General's annual report on "Progress towards the Sustainable Development Goals."
SDG Tracker is a free, open-access publication that tracks global progress towards the SDGs and allows people around the world to hold their governments accountable for achieving the agreed goals.
A collaborative data platform that integrates different types of data to allow the Moldovan Government access to exhaustive information on land coverage, population density and mobility behaviour.
The International Renewable Energy Agency (IRENA), an intergovernmental organization that supports countries in their transition to a sustainable energy future, compiled this dataset by measuring the maximum net generating capacity of renewable and non-renewable energy sources by country.
Provides science and information, focusing on news, data, and climate teaching materials, and the data products and services to track global climate data.
Our World in Data provides a complete guide to CO2 and greenhouse gas emission profiles for individual countries, charting how emissions are changing in each country, along with reduction progress and statistics.
NCEI provides the world’s largest collection of weather and climate data, including information that’s “land-based, marine, model, radar, weather balloon, satellite, and paleoclimatic” alongside other datasets.
Areas of the ocean that have frozen are considered “sea ice,” and can vary from slushy, barely solid areas to sheets of ice that are meters thick.
The Climate Hazards Group InfraRed Precipitation with Station Data (CHIRPS) is a joint project between the U.S. Geological Survey and UC Santa Barbara.
The National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA), and the UK Meteorological Office (UK Met) have used detailed station data going back to the 1800s to analyze temperature changes and have all confirmed the warming of our planet.
The Carbon Monitor dataset, led by researchers Zhu Liu, Philippe Ciais and Steven Davis, was created as the first estimate of daily CO2 emissions for six different sectors, including power, ground transportation, industrial production, residential consumption, and maritime and aircraft transportation.
Life below Water
Includes the lifecycle of plastic in the oceans, plastic hotspots, and other measures.
A global dataset of 1571 locations where surface manta tows were conducted.
Sources of ocean plastic organized by river.
The global spatial distribution of likely or potential Critical Habitat, as defined by the International Finance Corporation’s Performance Standard 6 (IFC PS6) criteria, comprises 20 underlying datasets.
The Ocean Tracking Network is a global aquatic animal tracking, data management, and partnership platform.
Coral reefs are one of the most diverse and ecologically important areas in the world, but many are threatened by rising ocean temperatures.
Life on Land
Provides data about forests including land cover, land use, biodiversity metrics and forest change allowing for the monitoring and management of forests.
Provides data on forest ecosystems including tree cover loss and gain rates, restoration opportunities, forest fires and biodiversity hotspots.
Allows users to visualize and analyse data on country specific forest characteristics.
The project is grounded in the premise that conservation is critical to transformations to sustainability but that its practices need to change radically.
Aimed to improve nutrition through the adoption of agro-biodiversity and improved dietary diversity at the household level in Uganda & Zambia.
Features environmental conservation and restoration frameworks for policymakers and private-sector initiatives including infographics, datasets, visualization tools, and more.
Global Forest Watch (GFW) provides data and tools for monitoring forests and provides access to near real-time information about where and how forests are changing around the world.
The World Database on Protected Areas (WDPA) was established in 1981 after the UN Economic and Social Council called for a list of natural reserves, citing its value for economic, scientific, and conservation purposes.
The Active Fires product, managed by the National Oceanic and Atmospheric Administration (NOAA), is based on the detection and analysis of active wildfires as received by a sensor.
Norway's International Climate and Forests Initiative (NICFI) makes high-resolution (<5m per pixel) optical satellite imagery of the tropics freely available to all in the pursuit of helping stop deforestation and combat climate change.
Peace, Justice & Strong Institutions
Pulls together data sets in an open format to track SDG16 and provide a snapshot of the current situation, and eventually progress over time.
Find data sets on topics including early childhood development, infant mortality, and intimate partner violence.
Provides data and analysis, and supports partners to identify and implement solutions to internal displacement.
The Armed Conflict Location & Event Data Project (ACLED), a disaggregated data collection, analysis, and crisis mapping project, maintains a database of all forms of human conflict from over 50 developing countries.
The National Geospatial-Intelligence Agency, an agency within the United States Department of Defense, records instances of hostile attacks against ships and mariners via its Anti-Shipping Activity Messages (ASAM) database.
The Voluntary National Reviews (VNRs) aim to facilitate the sharing of experiences, including successes, challenges, and lessons learned, with the goal of accelerating the implementation of the 2030 Agenda.
Partnerships for the Goals
The project looks at the broader ways in which universities can collaborate in support of the SDGs and lists partnerships in a ranking system.
The Danish Institute developed and trained an algorithm to link human rights recommendations to the corresponding SDG(s).
Compiled by the World Bank, this dataset measures officially recorded remittance inflows (remittances received) per country in 2020.
Official development assistance (ODA) is defined by the OECD Development Assistance Committee as government aid that promotes and targets the economic development and welfare of developing countries.
Further Research and Resources
Achim Rettinger, AI for Good Board Member and Full Professor at Trier University, discusses his work in natural language processing with the AI for Good Foundation team, and how it can impact progress toward the SDGs. According to Professor Rettinger, AI and machine learning can be used to understand communication better by analyzing huge quantities of data. The data can help the international community uncover insights on collective progress toward the 2030 deadline.
The SDG Data Catalogue is structured so that research and data sets can be submitted and shared. Free flow of knowledge and open source data are at the core of our vision.
Contact us to submit your research and to advise on the build out of the search tool.
Join our efforts to unlock AI’s potential towards serving humanity.