**Measuring diversity and increasing equity and inclusion have taken center stage for communities and companies alike. But are we actually measuring diversity correctly? And what is the new Diversity Index developed by the Census Bureau? Here’s what you need to know…**

On August 4th, the Census Bureau announced that the soon-to-be-released 2020 Census data will include a “diversity index.”^{1} The Bureau states:

*“One of the measures we will use to present the 2020 Census results is the Diversity Index or DI. This index shows the probability that two people chosen at random will be from different race and ethnic groups.”*

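Although the Bureau never names it, the description makes clear that the chosen DI is “Simpson’s Diversity Index” (more simply, just Simpson). In the form described, it is one minus the sum of the squared population shares of each group, a version often called the Gini-Simpson index.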
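As a rough illustration of that description, the sketch below computes the probability that two randomly chosen people fall in different groups. It assumes sampling with replacement (a reasonable approximation for large populations), and both the function name `diversity_index` and the population counts are hypothetical, not Census figures or Census code.

```python
def diversity_index(counts):
    """Probability that two people drawn at random (with replacement)
    belong to different groups: 1 - sum(p_i ** 2)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical counts for eight mutually exclusive groups (illustrative only).
example_counts = [600, 120, 10, 60, 5, 5, 30, 170]
example_di = diversity_index(example_counts)

# Sanity checks: one group means no diversity; two equal groups give 50%.
single_group_di = diversity_index([1000])
two_equal_groups_di = diversity_index([500, 500])
```

A value of 0 means everyone is in the same group, and the value approaches 1 as the population spreads evenly over more and more groups.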

#### Race and Ethnicity

Although “ethnicity” could in principle refer to any ethnic group, in most data releases it is limited in practice to Hispanic origin. Hispanic is like French or Japanese: it is **not** a race.

The Census Bureau has these seven racial categories:

- White
- Black or African American
- American Indian and Alaska Native
- Asian
- Native Hawaiian and Other Pacific Islander
- Some Other Race
- Two or More Races

A person of Hispanic ancestry can be **ANY** of those listed races. We will examine race and ethnicity from a reporting perspective in detail in a future post. For now, what is important to understand is that the DI will have **EIGHT** categories.

The eighth is Hispanic. That means, to avoid confusion, it is critical not to say “White” but rather “Non-Hispanic White”. Similarly, Black becomes “Non-Hispanic Black”, and so on. These categories are sometimes termed “White alone, non-Hispanic”, “Black alone, non-Hispanic”, etc.
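One way to picture the eight mutually exclusive categories is as a simple assignment rule: anyone of Hispanic origin goes into the Hispanic category regardless of race, and everyone else goes into the non-Hispanic version of their race. The sketch below is only an illustration of that rule. The function `di_category` and the label format are hypothetical and do not come from any Census product.

```python
RACES = [
    "White",
    "Black or African American",
    "American Indian and Alaska Native",
    "Asian",
    "Native Hawaiian and Other Pacific Islander",
    "Some Other Race",
    "Two or More Races",
]


def di_category(race, is_hispanic):
    """Map a (race, Hispanic origin) pair to one of eight exclusive groups."""
    if race not in RACES:
        raise ValueError(f"unknown race category: {race!r}")
    if is_hispanic:
        # Hispanic origin takes precedence, whatever the reported race.
        return "Hispanic"
    return f"Non-Hispanic {race}"


# The full set of labels this rule can produce: seven races plus Hispanic.
ALL_CATEGORIES = ["Hispanic"] + [f"Non-Hispanic {race}" for race in RACES]
```

Because every person lands in exactly one category, the group shares sum to 100% and can be fed directly into a diversity index.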

#### Diversity Indices

There is a vast body of work in biology (especially ecology) on diversity and diversity indices. Most discussions involve Simpson and another diversity index called “Shannon” (there are others). The math gets complicated. For a good review, see the website by Lou Jost.

There are two critical items to consider in measuring diversity:

- Richness – the number of species (more is better)
- Equitability – how equal in number the species are (five species at 20% each is better than one species at 96% and four at 1% each)

Simpson, as seen in the Census quote above, is a probability measure. Shannon is an uncertainty (or entropy) measure. This means that they don’t really measure “diversity”; they measure probability or entropy. It is true that a region with a bigger Simpson or Shannon number has a “larger diversity index”. But if you want to compare multiple communities, or review a region over time, you must convert these indices to something called the “effective number”.
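To make richness and equitability concrete, the sketch below computes both indices for the two five-species examples mentioned above. Both communities have the same richness (five), so any difference comes from equitability alone. The function names are illustrative, and Shannon is assumed here to use the natural logarithm.

```python
import math


def simpson(shares):
    """Probability that two random draws differ: 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in shares)


def shannon(shares):
    """Shannon entropy with natural log: -sum(p_i * ln(p_i))."""
    return -sum(p * math.log(p) for p in shares if p > 0)


even = [0.20, 0.20, 0.20, 0.20, 0.20]    # five species at 20% each
uneven = [0.96, 0.01, 0.01, 0.01, 0.01]  # one at 96%, four at 1% each

simpson_even, simpson_uneven = simpson(even), simpson(uneven)  # 0.80 vs about 0.078
shannon_even, shannon_uneven = shannon(even), shannon(uneven)  # about 1.609 vs about 0.223
```

Both indices rank the even community as more diverse, but the two scales are different and neither is a count of groups.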

#### The Effective Number

The effective number takes any diversity index and converts it into the number of equally common species that would produce the same index value. Remember, for equitability, equal numbers is best. With equitability held at its optimum, regions can be compared on the same scale.

A region with an effective number of three truly is twice as diverse as a region with an effective number of 1.5. You can do this math because the effective number is linear. Simpson and Shannon are **NOT** linear, meaning you **cannot** compare them this way.

If you want to say Region A is X% more diverse than Region B you must use the effective number.
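The conversions commonly used in the ecology literature are short: for Simpson in the probability form used here, the effective number is 1 / (1 − Simpson), and for Shannon computed with the natural logarithm it is exp(Shannon). The sketch below applies them to two hypothetical regions, one with four equally sized groups and one with eight. The function names and the regions are assumptions made for illustration.

```python
import math


def effective_from_simpson(d):
    """Effective number of groups from Simpson in the form 1 - sum(p_i ** 2)."""
    if not 0.0 <= d < 1.0:
        raise ValueError("Simpson index must be in [0, 1)")
    return 1.0 / (1.0 - d)


def effective_from_shannon(h):
    """Effective number of groups from natural-log Shannon entropy."""
    return math.exp(h)


# Hypothetical regions: A has 4 equal groups, B has 8 equal groups.
simpson_a, simpson_b = 1 - 1 / 4, 1 - 1 / 8      # 0.75 and 0.875
shannon_a, shannon_b = math.log(4), math.log(8)

# Ratios of the raw indices understate the difference...
ratio_simpson = simpson_b / simpson_a            # about 1.17
ratio_shannon = shannon_b / shannon_a            # 1.5

# ...while the effective numbers recover the group counts, 4 and 8.
eff_a = effective_from_simpson(simpson_a)
eff_b = effective_from_simpson(simpson_b)
ratio_effective = eff_b / eff_a                  # 2.0
pct_more_diverse = (ratio_effective - 1) * 100   # B is 100% more diverse than A
```

Region B has twice as many equally sized groups as Region A, yet its Simpson value is only about 17% higher and its Shannon value 50% higher. Only the effective numbers reflect the doubling.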

#### Why Pick One Over the Other?

We don’t know why the Census Bureau picked Simpson. But we can at least talk about the advantages and disadvantages each index presents and perhaps speculate.

Probability, the basis of Simpson, is a fairly intuitive concept. Saying you have a 10% chance of seeing someone from a different race versus a 70% chance is instantly understandable by most people. National media often use Simpson because of this ease of understanding.

Shannon is **far** harder to understand. Entropy is not an easy concept. It is a number few will understand beyond “bigger is better”. This makes it not terribly useful for a general audience.

So we can understand why Simpson would be chosen over Shannon.

Perhaps the complexity of understanding the effective number explains why it, too, was not selected. It does bring up an interesting question, though. Why not produce Simpson, Shannon and their associated effective numbers? Serve people who need simplicity and those who need a more complex (and appropriate) method.

This is exactly what we have done for StateBook using raw population estimates data. (This process will be replicated once the 2020 Decennial Census data is released.) It allows flexibility. If Simpson is an appropriate measure, we have it. Same with Shannon. But if you are trying to compare multiple regions (or conduct a time series) – as is the case with most StateBook users – then you **MUST** convert to the effective number, which we also produce.

To reiterate:

- Either DI works if you want to state that one region is more diverse than another
- Simpson is your choice if you want to talk about probabilities
- Shannon is incredibly useful, but it requires special knowledge to interpret properly
- But if you want to quantify differences, for example to say one region is twice as diverse as another region, you must use the effective number

To learn more details about your local region, please contact data@statebook.com. And feel free to explore example StateBook data collected and visualized for every location across the U.S. in our interactive Calypso API.