All datasets, including anonymized coordinates at the household-cluster level, are now available for download. Please use the data request form to gain access.
The Financial Inclusion Insights (FII) program has always used Global Positioning System (GPS) technology to collect coordinate points for the households where it collects survey data. This includes households in all eight of the countries in Africa and Asia where we have been implementing nationally representative surveys on financial inclusion since 2013. FII uses these coordinate points to control the quality of the data, allowing us to verify that each interview was conducted at the location specified in our sampling plans. The GPS coordinates are also time-stamped, which helps us verify that each interview was conducted in the right place and at the right time according to the survey work plan.
The use of these spatial data is a key part of our process for producing high-quality statistics. However, the location of a respondent’s household is personally identifiable information that is protected by the confidentiality agreement we make with each of our survey respondents. Therefore, household coordinates have never been included in any of our public datasets, and using the data for spatial analysis has been limited to generalizing information at the level of the administrative units identified in the data, such as counties and provinces.
Now in its sixth year of annual tracking surveys, the FII program has a wealth of longitudinal data and an ongoing mandate to make the data accessible to as many users as possible. As a further step towards fulfilling our mandate, we have anonymized the coordinates at the level of the household clusters used in our sample designs. These anonymized coordinates are now included in our public datasets.
A key benefit of releasing the data with anonymized GPS coordinates is to give researchers the opportunity to join our survey data with geo-located data from other sources. Data from multiple sources can be meaningfully overlaid using point locations, holding to the maxim that things located in closer proximity to each other are more closely related than things that are further away. Used in combination with external datasets, the FII data will contribute to answering a larger set of research questions.
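As a rough illustration of joining by proximity, the sketch below attaches the nearest external point feature to a household cluster using great-circle distance. The field names (`lat`, `lon`, `name`), the `nearest_feature` helper, and the sample coordinates are all hypothetical and do not come from the FII datasets.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_feature(cluster, features, max_km=None):
    """Return (feature, distance_km) for the closest feature.

    Returns (None, None) when there are no features, or when the closest
    one is farther away than max_km.
    """
    best, best_d = None, None
    for feature in features:
        d = haversine_km(cluster["lat"], cluster["lon"],
                         feature["lat"], feature["lon"])
        if best_d is None or d < best_d:
            best, best_d = feature, d
    if best is None or (max_km is not None and best_d > max_km):
        return None, None
    return best, best_d


# Invented sample data: one cluster point and two external point features.
cluster = {"cluster_id": 1, "lat": 23.81, "lon": 90.41}
features = [
    {"name": "A", "lat": 23.80, "lon": 90.40},
    {"name": "B", "lat": 24.90, "lon": 91.87},
]
feature, distance_km = nearest_feature(cluster, features)
print(feature["name"], round(distance_km, 2))
```

Because the published cluster coordinates are displaced by up to several kilometers, matches at distances finer than the displacement range should be interpreted with caution.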
It is critical, however, that the desire to apply more, better, and richer data to new use cases is not allowed to compromise our ethical commitments and our adherence to international standards that protect the privacy and confidentiality of research subjects. To maintain our respondents’ privacy, we have expended considerable effort on anonymizing the household coordinates following industry best practices used by the Demographic and Health Surveys and by global institutions such as the World Bank.
The anonymization method we used, which lets researchers take full advantage of this rich dataset for deeper analysis, is based on the method used by the Demographic and Health Surveys (DHS) Program. DHS has published a robust set of analyses quantifying the measurement error that results from using coordinates anonymized with this method instead of the original coordinates. The anonymization algorithm essentially functions as follows:
- Calculate a centroid for each cluster of household interviews by averaging the relative location of each household in the cluster;
- Select a random angle and move the centroid a random distance as follows:
  - Offset urban clusters by a minimum of 0 and a maximum of 2 kilometers;
  - Offset 99% of rural clusters by a minimum of 0 and a maximum of 5 kilometers, and 1% by a minimum of 0 and a maximum of 10 kilometers;
- Ensure that the new location of the centroid lies within the second-level administrative region (think “counties” in the United States as an example);
- For each cluster, replace the actual household coordinates with the randomly displaced centroid.
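The steps above can be sketched in Python as follows. This is an illustrative approximation, not the R code that FII or DHS actually ran. The function names, the uniform draw for the distance, the retry loop, and the `in_admin_region` callback (a stand-in for a real point-in-polygon test against administrative boundaries) are all assumptions made for this example.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (used here only for checking)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def centroid(points):
    """Average latitude and longitude of (lat, lon) pairs in degrees.

    A plain average is an approximation that is reasonable for a small
    cluster that does not straddle the antimeridian.
    """
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def max_offset_km(is_urban, rng):
    """Upper bound on displacement: 2 km urban, 5 km rural, 10 km for 1% of rural."""
    if is_urban:
        return 2.0
    return 10.0 if rng.random() < 0.01 else 5.0


def destination(lat, lon, distance_km, bearing_deg):
    """Point reached from (lat, lon) after distance_km along bearing_deg on a sphere."""
    phi1, lmb1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM
    sin_phi2 = (math.sin(phi1) * math.cos(delta)
                + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    phi2 = math.asin(max(-1.0, min(1.0, sin_phi2)))
    lmb2 = lmb1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    lon2 = (math.degrees(lmb2) + 540.0) % 360.0 - 180.0  # normalize to [-180, 180)
    return math.degrees(phi2), lon2


def displace_cluster(points, is_urban, in_admin_region, rng=None, max_tries=1000):
    """Return one displaced centroid to stand in for every household in the cluster.

    in_admin_region(lat, lon) is a hypothetical callback that should return
    True when the point lies inside the cluster's second-level admin region.
    """
    rng = rng or random.Random()
    lat, lon = centroid(points)
    limit = max_offset_km(is_urban, rng)
    for _ in range(max_tries):
        bearing = rng.uniform(0.0, 360.0)   # random angle
        distance = rng.uniform(0.0, limit)  # random distance, assumed uniform
        new_lat, new_lon = destination(lat, lon, distance, bearing)
        if in_admin_region(new_lat, new_lon):
            return new_lat, new_lon
    raise RuntimeError("no displaced point found inside the admin region")
```

In practice the callback would wrap a polygon containment test from a GIS library; passing `lambda lat, lon: True` disables the boundary check for quick experiments.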
Our starting point for the anonymization process was the R code published by the DHS Program, which we modified to fit the idiosyncrasies of our data. Depending on the country and year, coordinates were collected at each household with a handheld receiver, tablet computer, or smartphone. The devices produced different levels of accuracy, and in some cases variables such as proximity to mobile phone towers caused outliers to be recorded in a cluster of household locations. Outliers were typically excluded from the calculation of centroids. If the cluster identification data was not usable, or if the coordinates themselves were corrupted, then the anonymization process could not be implemented. Out of our 40 FII datasets, the household location data recorded in the Nigeria 2013 and Tanzania 2013 surveys could not be used. With these two exclusions in mind, our coordinate data is sufficiently anonymized and is ready to be used for analysis.
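The exact rule used to exclude outliers is not described here, so the following is only one plausible sketch: drop household points that lie farther than a chosen distance from the cluster's median location, then average what remains. The `robust_centroid` name and the 5 km default threshold are hypothetical choices for illustration.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def approx_km(lat1, lon1, lat2, lon2):
    """Equirectangular distance approximation, adequate over short distances."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(dx, dy)


def robust_centroid(points, max_km_from_median=5.0):
    """Return ((lat, lon), kept_points) after dropping far-away points.

    points is a list of (lat, lon) pairs in degrees. The threshold is an
    assumed value, not one taken from the FII or DHS procedures.
    """
    if not points:
        raise ValueError("cluster has no coordinates")
    med_lat = statistics.median(p[0] for p in points)
    med_lon = statistics.median(p[1] for p in points)
    kept = [p for p in points
            if approx_km(med_lat, med_lon, p[0], p[1]) <= max_km_from_median]
    if not kept:
        # Mirrors the case where a cluster's coordinates are unusable.
        raise ValueError("no points near the cluster median")
    lat = sum(p[0] for p in kept) / len(kept)
    lon = sum(p[1] for p in kept) / len(kept)
    return (lat, lon), kept
```

Using the median as the reference point keeps a single stray reading, such as one snapped to a distant mobile phone tower, from dragging the reference toward itself.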
One use case for the point location data is mapping the spatial distribution of key variables, such as financial inclusion. The figure below uses our most recent survey data from Bangladesh, collected in 2018. It shows the spatial distribution of the financially included population, with significant clustering in and around the capital city, Dhaka. Note that the datasets have been weighted to accurately represent key characteristics of the national populations, so weights should always be applied during analysis.
Naturally, less populated areas of Bangladesh have fewer financially included individuals, and the proportion of financially included adults is correlated with population density. Note that the density gradient is interpolated across household cluster point locations, so some of the shaded areas lie outside of the country boundaries.
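To illustrate why weights matter when summarizing a variable by cluster, here is a minimal sketch of a weighted share calculation. The field names `cluster_id`, `weight`, and `financially_included` are placeholders and may not match the variable names in the FII datasets.

```python
from collections import defaultdict


def weighted_share(records, flag_key="financially_included", weight_key="weight"):
    """Weighted proportion of records whose flag is true."""
    total = sum(r[weight_key] for r in records)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    included = sum(r[weight_key] for r in records if r[flag_key])
    return included / total


def weighted_share_by_cluster(records, cluster_key="cluster_id", **keys):
    """Weighted share computed separately for each cluster."""
    groups = defaultdict(list)
    for r in records:
        groups[r[cluster_key]].append(r)
    return {cid: weighted_share(rows, **keys) for cid, rows in groups.items()}


# Invented sample records for illustration only.
records = [
    {"cluster_id": 1, "weight": 1.0, "financially_included": True},
    {"cluster_id": 1, "weight": 3.0, "financially_included": False},
    {"cluster_id": 2, "weight": 2.0, "financially_included": True},
]
print(weighted_share_by_cluster(records))
```

In this invented sample, cluster 1 has one included respondent out of two, so an unweighted share would be one half, while the weighted share is one quarter because the excluded respondent carries three times the weight.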
FII data contains hundreds of other demographic, economic, and financial variables that can benefit from spatial analysis. GIS software may be used to overlay FII data with data from other sources. Please request access to the FII survey datasets here. Below is the full list of FII survey datasets, by year, that include anonymized coordinates and are available for download:
- Bangladesh 2013, 2014, 2015, 2016, 2017, 2018
- India 2013, 2014, 2015, 2016, 2017, (2018 coming soon)
- Indonesia 2014, 2015, 2016
- Kenya 2013, 2014, 2015, 2016, 2017
- Nigeria 2014, 2015, 2016, 2017
- Pakistan 2013, 2014, 2015, 2016, 2017
- Tanzania 2014, 2015, 2016, 2017
- Uganda 2013, 2014, 2015, 2016, 2017
Financial Inclusion Insights (www.finclusion.org) is a research program funded by the Bill & Melinda Gates Foundation and designed to build meaningful knowledge about how the financial landscape is changing across the eight countries in Africa and Asia (Bangladesh, India, Indonesia, Kenya, Nigeria, Pakistan, Tanzania and Uganda). FII produces data and analysis regarding citizens’ financial lives, attitudes, awareness and use of, access to, and advanced engagement with financial products and services. Through our qualitative and quantitative research, we aim to provide demand-side insights into consumers' financial behaviors, produce information that can guide policy interventions, and identify pathways for the poor to gain the financial tools they need to improve their economic circumstances.