Book recommendation: Invisible Women: Data Bias in a World Designed for Men
You may have seen this book featured in bookstores across the country last year. It won both the 2019 Financial Times and McKinsey Business Book of the Year Award and the 2019 Royal Society Science Book Prize.
Written by Caroline Criado Perez, a British author, journalist, and activist, the book is a fascinating and infuriating read about the "gender data gaps" across a wide range of domains, including medicine, technology, politics, health and safety, and transport. Criado Perez exposes the deep biases in much of the data on which major societal, business, and health decisions are based. Because this data is systematically skewed, we are essentially discounting half the population, and as a result, we all suffer.
As you're probably aware, there is significant bias in the datasets used to train machine learning algorithms and other software. Criado Perez outlines these issues and shares some truly illuminating and unsettling research. For example, in 2016, Google's voice recognition software was still 70% more likely to accurately recognize a male voice than a female voice. Speech recognition software is trained on large databases of voice recordings, and those recordings are dominated by male voices. Similarly, the text corpora typically used to train translation software contain female pronouns at half the rate of male pronouns. Algorithms trained on this data are left with an inaccurate representation of the world. Image databases suffer from the same problem.
Understanding these "gender data gaps" and raising awareness of them matters not only to us as people who live in this world and experience them every day, but also to us as professionals who have the ability to narrow these gaps as we innovate and develop new products.
Recommended by IBM Canada Lab