Metrics, benchmarking, and indexing, oh my!


What they reveal - and what they don’t - about diversity, equity, and inclusion

 

by Dr. Kristen Liesch

In 1967, Christine Mann Darden - a Black woman - joined NASA as a data analyst.

It didn’t take long before she experienced and observed inequity in her workplace. What she saw was that “men with math credentials were placed in engineering positions, where they could be promoted through the ranks of the civil service, while women with the same degrees were sent to the computing pools, where they languished until they retired or quit.” 

 
By NASA - http://www.nasa.gov/centers/langley/news/researchernews/rn_CDarden.html, Public Domain, https://commons.wikimedia.org/w/index.php?curid=38582453

In Data Feminism, Catherine D’Ignazio and Lauren F. Klein share Darden’s story and explain how, when Darden brought this observation to her boss, she was told, “Well, nobody’s ever complained,” and her own complaint led to nothing. Later on in her career, Darden continued to experience inequity, watching as her male counterparts received promotions far more quickly. Finally, together with a White woman named Gloria Champine, Darden brought a bar chart to the director of her division who was “shocked at the disparity” and gave Darden a promotion.

Darden had experienced bias, discrimination, and inequity in her workplace. Racism. Sexism. She knew how it felt, she knew how it showed up. And yet her lived experience didn’t count until it was “proven” with data.

Today, organizations are turning to sophisticated people analytics to produce graphs, metrics, dashboards... visual representations of human experience. Or at least that’s how it seems. Right there in the D&I toolkit, alongside unconscious bias training, diversity training, sponsorship programs, and ERGs, sits the “diversity and inclusion dashboard,” with its HRIM-integrated, AI-backed natural language processing (NLP) that promises to take a wealth of quantitative and qualitative data and transform them into actionable insights.

But what story does the D&I dashboard really tell?

A better question might be, what story are your D&I metrics hiding?

Don’t judge a company by its D&I accolades

 
 

We don’t need to look any further than the Uber that employed Susan Fowler. In her memoir Whistleblower, Fowler explains, “Uber had instituted diversity and inclusion industry ‘best practices,’ and while I worked there, Uber checked all of the boxes for a company that cared about diversity and inclusion.” And she’s not kidding. The list of interventions includes multiple trainings, a program aimed at eliminating hiring bias, diversity recruitment efforts, ERGs, regular culture and engagement surveys, and even manager compensation and performance assessments tied to diversity metrics.

Many a D&I practitioner would look at that smorgasbord of interventions and hail Uber as a model of progressive commitment to D&I. But like so many things that seem too good to be true, these tactics concealed a toxic culture. Fowler’s own experience speaks to the weaponizing of one of the organization’s D&I interventions: her boss (who had allegedly sexually harassed her) blocked Fowler’s legitimate department transfer request, ostensibly because losing her would mean his team’s diversity metrics would suffer, putting his performance evaluation and compensation at risk.

https://www.goodreads.com/book/show/51117957-whistleblower

Fowler’s lived experience at the company told a particular story about a toxic culture of sexism and harassment, an assessment subsequently confirmed by an independent investigation and report by the law firm Covington & Burling. Eric Holder, a partner at the firm and co-investigator, presented a report to the Uber board that painted a picture of such cultural and organizational dysfunction that the recommendations amounted to a complete reconstruction of the company’s culture. It confirmed what Fowler had understood: that “it didn’t matter how many women joined the company, if those women were sexually harassed; that it didn’t matter how many black engineers we hired, if they were discriminated against; it didn’t matter how many women we put into positions of power, if those women perpetrated or enabled the illegal behavior. We weren’t going to fix these systemic problems - problems that were the reasons why we didn’t have many engineers who were women or people of color…. Trying to repair Uber’s aggressive disregard for civil rights and employment laws with diversity and inclusion initiatives was like putting a Band-Aid on a gunshot wound.”

While today Fowler’s lived experience can be read in her memoir, at the time she raised her concerns inside the organization, via the “appropriate channels.” And she wasn’t the only one. Like Darden half a century earlier, Fowler and her peers went to their leaders, telling them how their experiences revealed that things weren’t right - that bias, discrimination, and inequity were serious problems in their organization.

Trying to repair Uber’s aggressive disregard for civil rights and employment laws with diversity and inclusion initiatives was like putting a Band-Aid on a gunshot wound.
— Susan Fowler

So, a question for the Uber of 2017 and for organizations today that claim to value diversity and inclusion: How do the lived experiences of your people factor into the picture you have of how bias, discrimination, and inequity shape the employee experience and, ultimately, prevent your people and your organization from reaching their full potential?

When was the last time you courageously created the space for your people to share, with candour, their answer(s) to the questions:

Do you feel you have an equal opportunity to succeed here and reach your full potential?

How have you observed and/or experienced bias, discrimination, and inequity?


Don’t get me wrong: I know there are a lot of barriers to asking these questions, not least the complexity involved in analyzing qualitative data.

Photo by Sear Greyson on Unsplash

Stories of lived experience are hard to analyze and quantify

At Tidal Equality, we’re very familiar with this challenge. Listening to, and learning from, the lived experiences of people who observe and experience discrimination is at the root of our methodologies. When we co-create strategy in an organizational or sectoral context, we start from a place of curiosity. We ask a lot of open-ended questions. We want to get under the hood of how inequities prevent people from reaching their full potential. And qualitative, anecdotal data is genuinely difficult to analyze - it’s one of the hardest parts of the work we do - which is why we’re always on the lookout for tools that might help us make sense of stories of lived experience.

SenseMaker by Cognitive Edge is one sophisticated technology that uses a “crowdsourcing method for human judgement, meaning, and feeling.” But its sophistication also likely translates - in my opinion - into a challenge for users and administrators (a challenge, let me be clear, that is worth overcoming if you’re so inclined). A host of employee engagement SaaS tools - CultureAmp, Lattice, Diversio, WorkTango - claim AI capable of taking qualitative responses and transforming them into actionable insights. WorkTango, for example, advertises that its technology can help you “gather employee comments and qualitative feedback in an intuitive way to support action… understand sentiment around equity and inclusion” and even “leverage Natural Language Understanding technology to understand themes and sentiments of comments in a real-time and simple way.”
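None of these vendors publish the models behind their dashboards, but the basic mechanics are easy to sketch. Below is a minimal, illustrative version of the sentiment-bucketing step, using NLTK’s open-source VADER scorer as a stand-in for whatever proprietary NLP a given vendor actually runs; the survey responses are invented.

```python
# A minimal sketch of the sentiment-bucketing these tools perform.
# VADER is a generic, lexicon-based scorer standing in for a vendor's
# proprietary model; the responses below are invented examples.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon fetch

responses = [
    "I feel respected and supported by my team.",
    "Promotions here always seem to go to the same people.",
    "I like our corporate values, but my leaders don't seem to live up to them.",
]

scorer = SentimentIntensityAnalyzer()
for text in responses:
    compound = scorer.polarity_scores(text)["compound"]  # -1 (negative) to +1 (positive)
    label = ("positive" if compound >= 0.05
             else "negative" if compound <= -0.05
             else "neutral")
    print(f"{label:>8} | {text}")
```

Notice how the third response hinges on its “but” clause: scorers that key on positive words like “like” can easily bucket it as praise. That failure mode matters in what follows.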

It was the promise of this kind of AI-backed insight that found me on a sales call with one of the above-mentioned organizations. After all, if their technology could help organizations harness the insight of lived experience, they could help us, too. 

Photo by Valeriy Khan on Unsplash

The sales associate was showing me how their technology took sets of qualitative data - essentially, employee answers to open-ended questions - and then translated them into a visual representation of the responses. 

“See,” he said as his cursor hovered over a fuel-gauge-type graph, “it’s easy to see how employees respond, generally, to a given question. And then, if you want to see the individual responses, you just click on the graph.” He demonstrated, and I watched as his click opened up a column containing the individual responses that informed the fuel gauge.

I leaned closer to my screen and scanned the five individual responses, one at a time.

“Can you read the last two responses in that set?” I asked. 

He went quiet for a moment, then said with a shrug, “Yeah, I mean, it’s hit-and-miss - maybe 60/40 accuracy. The AI isn’t perfect.”

It’s hit-and-miss - maybe 60/40 accuracy. The AI isn’t perfect.

No. It wasn’t. It wasn’t even close. 

Two of the five responses had been incorrectly coded.

One response read something along the lines of “I like our corporate values, but my leaders don’t seem to live up to them” and it was coded as a positive response.


When he closed the response window and brought me back to the dashboard full of colorful and easy-to-read graphs and scales, I knew we couldn’t rely on this technology to help us with the labour of analyzing qualitative data. 

Photo by Javier Esteban on Unsplash

At the end of the day, the technology organizations rely on to help scale human functions is flawed, and D&I tech is no exception. A “hot” market estimated to be worth upwards of $100 million, according to a recent report co-produced by RedThread Research and Mercer, D&I tech is subject to the same in-built bias as the rest of the tech industry (the kind of bias Google researcher Timnit Gebru was fired for exposing). It carries the risk of “implementing technology that itself may have bias due to the data sets on which the algorithms are trained or the lack of diversity of technologists creating it,” the report warns.

D&I tech carries the risk of ‘implementing technology that itself may have bias due to the data sets on which the algorithms are trained’

But my experience on the sales call made me curious about the reliability of NLP technology being deployed for this purpose elsewhere in the D&I space. I wanted another perspective, so I called up the founder of a competitor in the space and asked her about the accuracy rate of the NLP technology their tool uses for making sense of qualitative data. “To be honest,” she said, “I don’t know. I should know. But no one’s asked about that before.”
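Asking is cheap, and so is checking. Here is a hedged sketch of the audit a buyer could run themselves, assuming the tool lets you export its sentiment codes alongside the raw responses; the file name and column names are hypothetical, not any vendor’s real schema.

```python
# A minimal vetting sketch: export the tool's sentiment codes for a sample
# of responses, have human readers code the same sample blind, and compare.
# The CSV file name and column names are illustrative assumptions.
import csv

def vet_accuracy(path: str) -> float:
    """Share of responses where the tool's label matches the human label.

    Expects a CSV with columns: response, tool_label, human_label.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    hits = sum(row["tool_label"] == row["human_label"] for row in rows)
    return hits / len(rows)

if __name__ == "__main__":
    accuracy = vet_accuracy("sentiment_audit_sample.csv")  # hypothetical export
    print(f"Tool agrees with human coders on {accuracy:.0%} of responses")
```

Anything near the “60/40” from that demo means the dashboard’s dials are summarizing noise as much as signal.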


Even skimming the surface, I was coming away with the impression that the accuracy with which these dashboards and graphs represent human perspectives is dubious - and that there was a real question as to whether accuracy even mattered.

Did the organizations buying and implementing these tools care about how well the dials and pie charts and heat maps reflected the employee experience?

Didn’t they want to know the real answers to the questions they were asking?

CONTENT UPDATE JANUARY 23, 2021

A recent report published by law firm Hogan Lovells reveals “45% of businesses do not vet technology supplied to them for technological bias” despite bias in data being reported as the second most important ethical concern.

Photo by Emily Morter on Unsplash

What do our questions reveal about our beliefs?

In his New York Times article, “The Human Experience Will Not Be Quantified,” Phil Klay asks, “Why do we keep mistaking data for knowledge?” In an attempt to answer this question, Klay reflects, “because science supposedly gives clear answers… we tend to rush to embrace it as a panacea.… Rarely does it occur to us how often the invocation of ‘science’ is used to mask value judgments, or political deliberation.”

Klay interviewed Mona Chalabi, an artist and data journalist at The Guardian, about the shortcomings of data. “The biggest problem with data is pure arrogance,” Chalabi asserted. “Data replicates systems of power,” she said, since the types of questions that get asked, and the sort of information deemed worthy of collection, often reflect the biases of the powerful.

‘Data replicates systems of power’...since the types of questions that get asked, and the sort of information deemed worthy of collection, often reflect the biases of the powerful.

What Klay and Chalabi are pointing to is the insidious side of this conversation, about how systemic bias - racism, sexism, and so on - might be informing the kind of data we’re collecting, the kind of questions we’re asking. It’s time for us to take a look at what our inquiry says about bias and power.

What questions are we asking?
Who is asking the questions?
Who’s answering them?
Who isn’t?

The questions we ask - the ones that find their way into D&I surveys - reveal a lot about underlying beliefs and assumptions. 

Remember my sales call with the D&I tech company? The associate demoed the tool on a data set that came from their D&I survey. Out of curiosity, I asked him to do a word search on the question set:

“Can you tell me if the survey has the word ‘inequality’ in it?”

It didn’t.
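That check takes only a few lines of code, and anyone evaluating a survey instrument can run a broader version of it. A sketch, assuming the question bank can be exported as plain text with one question per line; the file name and term list are mine, not any vendor’s.

```python
# Does the question bank ever name the problem? Scan a survey's questions
# for the vocabulary of inequity. File name and term list are illustrative.
import re

TERMS = ["inequality", "inequity", "bias", "discrimination", "racism", "sexism"]

with open("di_survey_questions.txt", encoding="utf-8") as f:  # one question per line
    questions = [line.strip() for line in f if line.strip()]

for term in TERMS:
    pattern = re.compile(rf"\b{term}\b", re.IGNORECASE)
    count = sum(bool(pattern.search(q)) for q in questions)
    print(f"{term!r}: {count} question(s)")
```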

We’ve reviewed a good number of D&I surveys. A lot of them are available to the public, free of charge. You won’t find a lot of questions about experiences of bias, discrimination, or inequity. You will find a lot of questions and statements about perceptions and beliefs - for example, “My company understands that diversity is critical to our success.”

What is the real value of a question like this? What does the answer really tell us?

Photo by Daniel Tafjord on Unsplash

Consider the following:

A recent report commissioned by a Canadian police service produced key findings including that the “majority of… Police Leaders and Board Members strongly believe that the organization is committed to diversity and inclusion.” Similarly, 73% of respondents to the Inclusion Survey it used responded positively to the statement, “My organization is committed to and supportive of diversity.”

Meanwhile, the same report found “discrimination or bias against Racialized persons was one of the most common themes in the open-response comments,” and more than half of respondents disagreed with the statement that “everyone is treated fairly and consistently when applying for a job or promotion.” While this discrepancy is notable, I think it’s also important to point out that in the 144-page report, the words “inequity” and “inequality” don’t appear. At all.

Taken together, what do these examples reveal about the state of efforts to advance equity, diversity and inclusion in our workplaces? Collectively, do we believe inequity exists in our organizations? Do we believe that people likely experience bias and discrimination in the workplace? 

In the 144-page D&I report, the words ‘inequity’ and ‘inequality’ don’t appear. At all.

Do you believe inequity exists… in your organization?

Our work at Tidal Equality is predicated on the belief that inequality exists and is at the root of our problems. So a lot of our work is spent understanding how inequality shows up and how people experience it - not asking whether it exists in the first place. But why am I so insistent on making this assertion, and on encouraging you to do the same? Well, in part because, as Robert Livingston, lecturer in public policy at Harvard, points out,

If your employees don’t believe that racism exists in the company, then diversity initiatives will be perceived as the problem, not the solution.
— Robert Livingston

Livingston has devoted his career to the study of diversity, leadership, and social justice. He’s found that acknowledging the existence of bias and inequity is the first key step in addressing inequity. 

When it comes to racial inequity, research shows that, on the whole, “Whites in the United States believe that systemic anti-Black racism has steadily decreased over the past 50 years.” Livingston explains, “even managers who recognize racism in society often fail to see it in their own organizations…. executives point to their organizations’ commitment to diversity as evidence for the absence of racial discrimination.” He goes on to assert that “despite these beliefs, many studies in the 21st century have documented that racial discrimination is prevalent in the workplace, and that organizations with strong commitments to diversity are no less likely to discriminate” (emphasis added).

A very recent publication by Jamillah Bowman Williams and Jonathan Cox reinforces Livingston’s claim, finding that “despite bold claims to value diversity and billions of dollars invested in related efforts, workplace discrimination continues to be a major factor in the lives of racial and ethnic minorities,” and that even individuals “who both openly acknowledge discrimination and believe diversity is an important goal rarely take action to counter structural inequality” (emphasis added).

The findings of Livingston, Bowman Williams and Cox, and others, connect the dots between beliefs, actions/tactics/outputs, and outcomes. If you don’t believe, fundamentally, that the problem of racism or inequity exists in your organization, you’re not going to ask questions about experiences of racism or inequity, and you’re not going to implement tactics that address the problems of racism and inequity. And your metrics, dashboards, dials, and graphs are going to represent that belief.

Photo by Richard Dykes on Unsplash

Take Tesla’s recent Diversity, Equity and Inclusion Impact Report. While one of their D,E&I principles is a “focus on sustainable solutions that solve problems at the root cause and reimagine new programs with diversity, equity and inclusion principles embedded in the design,” the report doesn’t explore (or even include the words) “inequality” or “discrimination.” It doesn’t gesture toward questions about why women make up a scant 21% of Tesla’s American workforce and only 17% of its leadership, or why 59% of that leadership is White (other than pointing to age-old tropes about the insurmountable pipeline problem). It doesn’t outline how sexism and racism are preventing Tesla employees from having an equal opportunity to succeed and reach their full potential. The report does celebrate its “gold standard” tactics: unconscious bias training, ERGs, executive-sponsored programming, etc. But what kind of culture is fostering the inequities that are at the root of the representation statistics?

Maybe we don’t want to know about inequity

I was on a Zoom call with a peer mentor and friend, expressing in a not-so-succinct way all of the frustration you can read in my commentary above.

I was incredulous that the NLP accuracy of the D&I tech was “60/40”.
I was incredulous that organizations were buying this tech despite this accuracy rate.
I was incredulous that buyers weren’t asking about the accuracy rate at all.
I was incredulous that, in survey after D&I survey, I couldn’t find a question about how bias, discrimination, and inequity were experienced.

When I paused to catch my breath, my friend leaned back in his chair, stroked his beard and laughed a deep belly laugh. “Did it ever occur to you, Kristen,” he said with complete calm and composure, “that they don’t want to know the answers to the questions you’re talking about?”

Photo by Taras Chernus on Unsplash

I know that he’s right. And that scares me.

Michel Foucault writes, “Power is tolerable only on condition that it mask a substantial part of itself. Its success is proportional to its ability to hide its own mechanisms.”

Power is tolerable only on condition that it mask a substantial part of itself. Its success is proportional to its ability to hide its own mechanisms.
— Michel Foucault

I’m arguing that, if we’re not careful, if we fail to listen to, and learn from, experiences of bias, discrimination, and inequity, we will be enabling systems of power and oppression to thrive, and all our best-laid plans (read: trainings, ERGs, sponsorship programs, dashboards, analytics) will go awry.

  • If we don’t believe bias, discrimination, and inequity exist in our organizations, we will keep asking questions about diversity, inclusion, and belonging, instead of asking questions about bias, discrimination, and inequity. 

  • If we’re not asking questions about bias, discrimination, and inequity, we won’t be able to solve the problems of bias, discrimination, and inequity. 

So what will all of our questions actually give us? Beautiful graphics and visuals representing analytics that are “hit-and-miss” and rife with bias themselves? “Hit-and-miss” like the effectiveness of so many “gold standard” D&I interventions - most notoriously unconscious bias training, among others?

How can we prevent metrics, benchmarking, and indexing from becoming the new tick-box risk-mitigation exercise - part of the window-dressing of “woke” organizations?

We must do better.

We must demand better.

We must ask questions of our questions.

If we’re going to benchmark anything, we should be benchmarking our investments in these interventions in relation to the lived experiences of the people in our organizations.

Forget the ROI of D&I - stop asking the question of how or whether diversity is good for the bottom line. 

Start asking if the money you’re investing in D&I efforts is showing a return in the lives and experiences of the people in your organization.

Start asking: How do you experience bias, discrimination, and inequity in our organization?

Then invest resources in solving the problems people point you to. 

That’s assuming, of course, you want to solve those problems.

POST SCRIPT

I would be remiss if I ended this piece without telling you about one formidable story of how metrics helped drive dramatic and transformational change. But it’s probably not what you think. It didn’t involve the complex aggregation and slicing and dicing of multiple complementary data sets by sophisticated algorithms. It didn’t happen in a progressive new industry. In fact, it started with a simple tally chart on a post-it note in a hundred-year-old institution. Read about it here.

POST-POST SCRIPT

At Tidal Equality, we are very cognizant that some of what we say and write makes some folks uncomfortable, if not downright angry. After all, this and other articles take aim at unconscious bias training and other D&I offerings on the market. (And if you find my perspective a bit too irreverent, I invite you to read this HBR piece by Tidal Equality Collaborative member Siri Chilazi and Iris Bohnet.)

We are guided, however, by an ethos that requires us to perpetually interrogate our motives, our services, our perspectives, and the politics and methods and power dynamics of our profession and broader context. 

We are inspired by the editors of Feminist Revolution (Kathie Sarachild, Carol Hanisch, Faye Levine, Barbara Leon, and Colette Price) and endeavour to learn from them and their peers.

In the preface to the abridged edition, they write, “Criticism is central to this book. So is history. The two are related because you can’t write history, you can’t sum up experience, without making evaluations. Both are very controversial in the Women’s Liberation Movement and both are threatening. And both we see as absolutely necessary to achieving the liberation of women.”

They dedicate their work, in part, as follows:

To all the oppressed of this earth
whose dynamism and strength is stolen
for exploitation by others
and who fly when they break their chains.

 

If unconscious bias training doesn’t work, what does?

Learn about our research-backed practice, designed to make your work and decision-making more equitable, one question and one decision at a time.

 

Dr. Kristen Liesch

Named a Forbes D&I Trailblazer, Kristen is co-CEO and co-founder of Tidal Equality.