The SDG Data Catalog is an open, extensible, global database of data sets, metadata, and research networks built automatically by mining millions of published open access academic works.
The SDG Data Catalog leverages advances in Artificial Intelligence (AI) and Natural Language Processing (NLP) technologies to extract and organize knowledge about public datasets that would otherwise remain hidden in plain sight in the continuous stream of research generated by the scientific community.
The goal, ultimately, is to connect researchers and students with SDG-relevant datasets so that their work can make meaningful progress towards social good.
Hidden in Plain Sight: Building a Global Sustainable Development Data Catalogue
By James Hodson & Andy Spezzati
Modern scientific research for Sustainable Development depends on the availability of large amounts of relevant real-world data. However, there are currently no extensive global databases that associate existing data sets with the research domains they cover.
We present the SDG Data Catalogue, an open, extensible, global database of data sets, metadata, and research networks built automatically by mining millions of published open access academic works. Our system leverages advances in AI and NLP technologies to extract and organise deep knowledge of available data sets that would otherwise remain hidden in plain sight in the continuous stream of research generated by the scientific community.
Explore SDG Datasets
Annual social protection data are compiled by the International Labour Organization (ILO) through its Social Security Inquiry, sourced from national administrative data. The indicators are disseminated through the ILO World Social Protection Data Dashboards.
The Group on Earth Observations Global Agricultural Monitoring (GEOGLAM) Crop Monitor (https://cropmonitor.org/) is an international initiative that was developed under the framework of the 2011 G20 Action Plan on Food Price Volatility and Agriculture.
Created by the UN Food and Agriculture Organization (FAO), an agency dedicated to international efforts to end hunger, this dataset tracks desert locust observations, recording whether the observed locusts are adults or nymphs (known as hoppers) and whether they form groups.
Good Health & Wellbeing
The Global Health Observatory (GHO) data repository contains data collected by the World Health Organization on a range of health-related statistics, including mortality and disease burden rates, for 194 countries.
WorldPop is an applied research group focused on mapping demographics in low- and middle-income countries. Among its activities, it measures the availability and geographical accessibility of healthcare services at national and sub-national levels across Sub-Saharan Africa.
Created by the Johns Hopkins University Center for Systems Science and Engineering, this dataset reports COVID-19 cases at the provincial level in China, at the county level in the U.S., and at the state and national levels for other countries.
The United Nations Educational, Scientific and Cultural Organization (UNESCO) is supporting countries in their efforts to mitigate the immediate negative impact of school closures and to facilitate the continuity of education through remote learning.
The United Nations Protocol and Liaison Service maintains a list of Heads of State, Heads of Government, and Ministers for Foreign Affairs of all Member States based on the information provided by the Permanent Missions.
The Inter-Parliamentary Union (IPU) tracks monthly rankings of the percentage of women in parliament from January 2019 onwards through Parline, a free resource with over 600 data points provided directly by national parliaments on their structure, composition, working methods, and activities.
Clean Water & Sanitation
The UN Environment Programme (UNEP) works with partners to support the global monitoring of freshwater ecosystems, as reported through the Freshwater Ecosystems Explorer, which provides up-to-date geospatial data on changes in the extent and water quality of these ecosystems.
The ISciences Water Security Indicator Model v2 (WSIMv2) describes places where water availability during the most recent 12-month period is more or less than would be expected based on a 1950-2009 baseline period.
Affordable & Clean Energy
Decent Work & Economic Growth
The OECD's quarterly national accounts (QNA) dataset presents GDP growth data collected from all the OECD member countries and some other major economies on the basis of a standardised questionnaire.
Industry, Innovation & Infrastructure
A global Artificial Intelligence (AI) repository to identify AI related projects, research initiatives, think-tanks and organizations that can accelerate progress towards the 17 UN Sustainable Development Goals.
The goal of the Standardized World Income Inequality Database (SWIID) is to meet the needs of those engaged in broadly cross-national research by maximizing the comparability of income inequality data.
Reported by the UN Department of Economic and Social Affairs (UN DESA), international migrant stocks are estimates of the total number of international migrants present in a given country at a particular time.
Sustainable Cities & Communities
The Settlement Profiling Tool guides field personnel in creating cross-sectoral settlement profiles intended to help inform future urban development plans and policies in displacement-affected contexts.
Mendeley Data Repository is free-to-use and open access. It enables you to deposit any research data (including raw and processed data, video, code, software, algorithms, protocols, and methods) associated with your research manuscript.
The European Data Portal harvests the metadata of Public Sector Information available on public data portals across European countries. Information regarding the provision of data and the benefits of re-using data is also included.
OpenAQ, a non-profit organization, collects daily air quality information from stations around the world and provides it as free and open data to help better monitor and manage the air we breathe.
The database constitutes a comprehensive set of settlement polygons. It is distributed in geodatabase format and consists of three feature classes: built-up areas (BUA), small settlement areas (SSA), and hamlets.
Google’s Community Mobility Reports chart the geographic movement trends associated with COVID-19 over time and provide the data, aggregated and anonymized, to the public.
Responsible Consumption & Production
This platform provides access to data compiled through the UN System in preparation for the Secretary-General's annual report on "Progress towards the Sustainable Development Goals."
SDG Tracker is a free, open-access publication that tracks global progress towards the SDGs and allows people around the world to hold their governments accountable for achieving the agreed goals.
A collaborative data platform that integrates different types of data to give the Moldovan Government access to exhaustive information on land coverage, population density, and mobility behaviour.
The International Renewable Energy Agency (IRENA), an intergovernmental organization that supports countries in their transition to a sustainable energy future, compiled this dataset by measuring the maximum net generating capacity of renewable and non-renewable energy sources by country.
Our World in Data provides a complete guide to the CO2 and greenhouse gas emission profiles of individual countries, charting how each country's emissions are changing, its progress on reductions, and related statistics.
NOAA’s National Centers for Environmental Information (NCEI) provides the world’s largest collection of weather and climate data, including information that’s “land-based, marine, model, radar, weather balloon, satellite, and paleoclimatic,” alongside other datasets.
The National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA), and the UK Meteorological Office (UK Met) have used detailed station data going back to the 1800s to analyze temperature changes and have all confirmed the warming of our planet.
The Carbon Monitor dataset, led by researchers Zhu Liu, Philippe Ciais and Steven Davis, was created as the first estimate of daily CO2 emissions for six different sectors, including power, ground transportation, industrial production, residential consumption, and maritime and aircraft transportation.
Life Below Water
This dataset maps the global spatial distribution of likely or potential Critical Habitat, as defined by the International Finance Corporation’s Performance Standard 6 (IFC PS6) criteria, and is built from 20 underlying datasets.
Life on Land
The World Database on Protected Areas (WDPA) was established in 1981 after the UN Economic and Social Council called for a list of natural reserves, citing its value for economic, scientific, and conservation purposes.
Norway's International Climate and Forests Initiative (NICFI) makes high-resolution (<5m per pixel) optical satellite imagery of the tropics freely available to all in the pursuit of helping stop deforestation and combat climate change.
Peace, Justice & Strong Institutions
The Armed Conflict Location & Event Data Project (ACLED), a disaggregated data collection, analysis, and crisis mapping project, maintains a database of all forms of human conflict in over 50 developing countries.
The National Geospatial-Intelligence Agency, an agency within the United States Department of Defense, records instances of hostile attacks against ships and mariners in its Anti-Shipping Activity Messages (ASAM) database.
The Voluntary National Reviews (VNRs) aim to facilitate the sharing of experiences, including successes, challenges, and lessons learned, with the goal of accelerating the implementation of the 2030 Agenda.
Partnerships for the Goals
Official development assistance (ODA) is defined by the OECD Development Assistance Committee as government aid that promotes and targets the economic development and welfare of developing countries.
Further Research and Resources
Achim Rettinger, AI for Good Board Member and Full Professor at Trier University, discusses with the AI for Good Foundation team his work in natural language processing and how it can advance progress toward the SDGs. According to Professor Rettinger, AI and machine learning can be used to better understand communication by analyzing huge quantities of data, helping the international community uncover insights into its collective progress toward the 2030 deadline.
The SDG Data Catalogue is structured so that research and data sets can be submitted and shared. The free flow of knowledge and open source data are at the core of our vision.
Contact us to submit your research and to advise on the build-out of the search tool.
Join our efforts to unlock AI’s potential to serve humanity.