2016-07 Subnational diversity in Sub-Saharan Africa
This paper presents a new dataset on subnational ethnolinguistic diversity in Sub-Saharan Africa covering 36 countries and almost 400 first-level administrative units. We compile detailed data on the ethnolinguistic composition of each region using population censuses and large-scale household surveys and match all reported ethnicities to Ethnologue, the most complete classifier of world languages. This matching allows us to standardize the notion of an ethnolinguistic group and to account for the relatedness between language pairs when calculating diversity indices. We exploit this high-quality dataset to investigate the connection between diversity, as captured by fractionalization and polarization indices, and development indicators at the subnational level. Educational and health outcomes, electricity access, and nighttime luminosity are all negatively related to diversity, even after controlling for country fixed effects and a rich set of regional characteristics, but only if the underlying ethnolinguistic groups are sufficiently aggregated into more basic language families or if linguistic similarities between them are taken into account. In other words, only deep-rooted diversity, based on cleavages formed in the distant past, is strongly inversely associated with regional development. Furthermore, we show that subnational diversity is remarkably persistent over time, implying that reverse causality is unlikely to bias our main findings.
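The abstract does not spell out the index formulas. As a minimal sketch, assuming the standard definitions from the diversity literature (fractionalization as one minus the Herfindahl index of group shares, the Reynal-Querol polarization index, and a Greenberg-style index that weights group pairs by linguistic similarity), the measures for a single region could be computed as follows. The function names, the example shares, and the similarity matrix are illustrative assumptions, not values or methods taken from the dataset.

```python
from typing import Sequence


def fractionalization(shares: Sequence[float]) -> float:
    """Probability that two randomly drawn residents belong to different groups."""
    return 1.0 - sum(s * s for s in shares)


def polarization_rq(shares: Sequence[float]) -> float:
    """Reynal-Querol polarization; equals 1 for two equally sized groups."""
    return 4.0 * sum(s * s * (1.0 - s) for s in shares)


def similarity_weighted_diversity(
    shares: Sequence[float], similarity: Sequence[Sequence[float]]
) -> float:
    """Greenberg-style index: expected dissimilarity between two random residents.

    similarity[i][j] lies in [0, 1], with 1 on the diagonal. With an identity
    similarity matrix this reduces to plain fractionalization.
    """
    n = len(shares)
    expected_similarity = sum(
        shares[i] * shares[j] * similarity[i][j]
        for i in range(n)
        for j in range(n)
    )
    return 1.0 - expected_similarity


# Hypothetical region with three groups; groups 1 and 2 speak closely
# related languages (assumed similarity 0.8), group 0 is unrelated to both.
shares = [0.5, 0.25, 0.25]
similarity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.8],
    [0.0, 0.8, 1.0],
]

# Aggregating groups 1 and 2 into one language family sums their shares.
family_shares = [shares[0], shares[1] + shares[2]]

frac_groups = fractionalization(shares)
frac_families = fractionalization(family_shares)
polar_groups = polarization_rq(shares)
weighted = similarity_weighted_diversity(shares, similarity)
```

In this hypothetical region, measured diversity falls once related groups are merged into a family or weighted by similarity, which is the sense in which the aggregated and similarity-adjusted indices capture only the deeper cleavages.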