
Creators of Intelligence

By Dr. Alex Antic
About this book
A Gartner prediction in 2018 led to numerous articles stating that “85% of AI and machine learning projects fail to deliver.” Although it’s unclear whether a mass extinction event occurred for AI implementations at the end of 2022, the question remains: how can I ensure that my project delivers value and doesn’t become a statistic?

The demand for data scientists has only grown since 2015, when they were dubbed the new “rock stars” of business. But how can you become a data science rock star? As a new senior data leader, how can you build and manage a productive team? And what is the path to becoming a chief data officer (CDO)?

Creators of Intelligence is a collection of in-depth, one-on-one interviews where Dr. Alex Antic, a recognized data science leader, explores the answers to these questions and more with some of the world’s leading data science leaders and CDOs.

Interviews with: Cortnie Abercrombie, Edward Santow, Kshira Saagar, Charles Martin, Petar Veličković, Kathleen Maley, Kirk Borne, Nikolaj Van Omme, Jason Tamara Widjaja, Jon Whittle, Althea Davis, Igor Halperin, Christina Stathopoulos, Angshuman Ghosh, Maria Milosavljevic, Dr. Meri Rosich, Dat Tran, and Stephane Doyen.
Publication date: April 2023
Publisher: Packt
Pages: 374
ISBN: 9781804616482

 

Cortnie Abercrombie Wants the Truth

With a wealth of experience as a C-suite AI and analytics leader, Cortnie Abercrombie is well placed to provide targeted advice to new and veteran data leaders alike, and to comment on the state of the AI field. She is the CEO and founder of AI Truth (www.AITruth.org), a founding editorial board member for Springer Nature’s AI and Ethics journal, and a former senior AI and analytics executive at IBM.

Cortnie is also the author of What You Don’t Know: AI’s Unseen Influence on Your Life and How to Take Back Control, which explains what companies are doing with AI that can impact your life, as well as the changes we should demand for the future. I was particularly keen to hear about her previous experience as a senior executive at IBM, which gave her a unique opportunity to see how Fortune 500 companies traversed successes and failures on their journey to becoming data-driven organizations.

 

Getting into the business

Alex Antic: You’ve completed an MBA and obviously have a strong strategic focus. You then pivoted and became an expert in the field of data science and AI. I’d like to better understand your career trajectory. Were there any pivotal points or people that influenced you along the way? What led to you becoming a global leader in the field of data science?

Cortnie Abercrombie: Thanks for the acknowledgment. There was actually a pivotal moment in my life. When I left college, I started in the marketing department of a rapidly growing internet start-up during the dot-com days. My business degree was concentrated on marketing. The founders of the start-up wanted to figure out how to sell services better than the next company – especially the big multinational companies – and how to take advantage of the unique growth and first-mover market dynamics.

I had this fantastic boss and he was like, “You know what? We’ve all been relegated to marketing.” That’s not the fantastic part. He felt like marketing was a lesser-than type of function because he had been in sales, and 20-30 years ago, sales was the be-all and end-all. It was like the TV show Mad Men – they had all the power. There was an overarching and incorrect perception of marketing that it was just those two ladies over there who did some random thing that nobody knew anything about except for it having to do with advertising and tchotchkes: “Go get me some more things with our logo on it,” or whatever. But my boss said, “No, we’ve got to rethink this whole entire function so that it is strategic and customer-driven, and I want to use databases and data to do that.”

At that time, that was actually pretty forward-thinking. I did not realize it then, but I was one of only a few marketing people in the market who knew how to sift through data to find insights. That was why he hired me – though my data skills were basic at first. This was almost 30 years ago and the people who knew how to do “data mining,” as we called it back then, were few and far between. You would typically find them in actuarial fields or theoretical science fields, such as astrophysics, or even biological research areas where data collection and analysis were key.

In business, most people who knew anything about data had computer science backgrounds and worked as Database Administrators (DBAs) in the IT department. The IT department and the marketing department did not get along at all. The IT department did not consider marketing to be of strategic importance in that era, which meant that of all the project backlogs they had, we were the lowest priority. This meant our projects never got done. They saw marketing as the “swag” department, who gave out logoed stuff and threw parties. They did not take it seriously when I would ask for their help in getting more data to understand how we could grow our share of the wallet with customers or increase revenue by targeting specific customer segments.

Because of this extremely off-putting attitude, I was forced to take matters into my own hands, and I began amassing data directly for the marketing department. My boss, who was just as frustrated with IT as I was, said, “OK, what do you need?” I said, “Give me a server,” and he replied, “OK. Done! What else?” (Marketing had money, you know!) I said, “Well, I’m probably going to need more data classes.” At that time, Oracle had a university – literally called Oracle University – where they offered week-long courses right down the street from the company. I took a bunch of classes there, and then I took a bunch of classes from SPSS. My thinking was that if the IT people could learn it, then I could learn it better. Then I befriended and bought pizzas for as many DBAs as I could on the IT side of the house to get their help in properly setting up the data pipelines to my server.

Then voilà! I was putting out strategic segment analyses based on product types and where we needed growth. I dug into which customers were the most profitable and how we could deepen our share of the wallet with each one – including the use of VIP programs. I focused on customer churn, including understanding the triggering events and root causes as well as what could bring them back. As that progressed, I realized I had executives flocking to me in the company, asking me more and more about the customers, how to sell to them, and how to create models – almost like the Amazon recommender system idea, but internal; if my client buys this, what else might they buy? What patterns are you seeing? My boss also added his own perspectives and we produced the analyses in what became an anticipated quarterly report called “the red book.” Every major leader in the company wanted a readout and Q&A session of the analyses for their department, and the analytics from it ultimately helped the company to be acquired at top dollar by a major international internet group that you would recognize today.
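The internal recommender idea Cortnie describes – “if my client buys this, what else might they buy?” – can be approximated with simple co-purchase counting. The following Python sketch is purely illustrative: the customers, products, and the `also_bought` helper are invented for this example, not drawn from her actual “red book” analyses.

```python
import pandas as pd

# Hypothetical purchase history: one row per (customer, product) purchase.
purchases = pd.DataFrame({
    "customer": ["c1", "c1", "c2", "c2", "c3", "c3", "c3"],
    "product":  ["hosting", "email", "hosting", "ssl", "hosting", "email", "ssl"],
})

# Customer x product indicator matrix (1 = customer bought the product).
basket = pd.crosstab(purchases["customer"], purchases["product"]).clip(upper=1)

# Product x product co-occurrence: how often two products share a customer.
co_occurrence = basket.T @ basket

def also_bought(product: str, top_n: int = 3) -> pd.Series:
    """Rank other products by the share of this product's buyers who also bought them."""
    buyers = co_occurrence.loc[product, product]
    return (co_occurrence[product].drop(product) / buyers).sort_values(ascending=False).head(top_n)

print(also_bought("hosting"))
```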

My boss’ persistence and insistence made me believe, at a very early point in my career, that you can learn anything – just get over there and learn it! That defined my career and the way that I look at data scientists now. The way I see data science as a profession is that it’s all about asking the right questions and looking across the business at the business needs in the same way that I did when I was sitting there as a 20-something-year-old, trying to figure out what kind of analytics to put forward to an executive team who wanted to know everything about our customers. I needed to think about what we were trying to do as a company. I still take that approach to this day.

I think that many data scientists lose themselves – sometimes they put their blinders on or just find that it’s an uncomfortable thing to get out and talk to other parts of the business. I think they forget that what they have to do is stick to the strategic plans of the business. It’s been a struggle, with everybody I’ve worked with or that I bring on, to try and convince them to have that tenacious, persistent personality. I can teach a person any skill. Tech skills change with the times.

The way that you think about solving problems and home in on the things that matter: that’s going to determine the longevity and relevance you have in your career.

 

Discussing diversity and leadership

AA: The next question touches on how the industry has changed. We’ve come some way in regard to diversity and inclusion, but we still have a long way to go. I’m curious as to whether you’ve personally faced any challenges in that regard, or whether you’ve observed any things that maybe you weren’t happy with, things that should have played out differently. Also, what should we be doing more broadly as leaders in the field to change things?

CA: This one’s always been tough because there’s always been that disparity when it comes to women in the industry. I think a lot of that is changing, as far as getting more women into STEM is concerned. A lot of women have a proclivity for math, or detailed thinking and connecting the dots. They are also amazing negotiators, which makes them some of the best data governance leaders. Anyone who has ever tried to convince a business leader to share “their” data – meaning they bought it with their budget and enhanced it with their expertise – will understand the nuanced negotiation skills it takes, akin to negotiating a prized piece of candy away from a child before dinner. Looking at connections between datasets too, having those thoughts of, “Well, this could connect with that,” and, “Oh yeah, I’ve seen some data from this other team that could more clearly fit this model” – that is a dynamic I’ve experienced more often when working with women.

Many of the guys I’ve worked with in data and analytics are very logic-driven – always thinking linearly and of validity before connecting the dots. But sometimes, something may fall out of the norm of the logic that they’re used to. Take behavioral psychology, to use an example many people can probably relate to. I say to my husband, “Let’s talk.” He’ll be like, “Oh my good gosh! What did I do? I don’t want to have this talk because you’re going to talk about squishy stuff like emotions such as happiness or disappointment that are not quantifiable and that I therefore cannot affect.” But today, in a lot of data-driven use cases, we’re talking about data that we are trying to use to emotionally drive people to do things. We’re trying to understand emotions, even in cars. We’re trying to understand: “Hey, is this person rage-driving based on data from the odometer, brake pad sensors, or steering wheel motions? Does this car need to be automated to slow down upon recognition of precursor behaviors? Is there some way we can intervene?”

I see women as becoming more and more important in the data field. I’m not saying there aren’t guys that can do all of these things too, but I just have seen – in my experience, anyway – that men tend to be very strong linearly and chronologically with thinking through problems. Meanwhile, women can comfortably think about patterns all over the place, all at the same time. I know men who seem to think as follows: “I deal with this first, then I deal with this second, then I deal with this third, and then I check everything at the end.” Women free-associate more: “Oh, I see a pattern here, here, here, here, and here. Let’s pull that all together. Let’s see what that does. Let’s run that. Let’s try that 50 times to Sunday.”

When we marry these two styles together is when data science is at its strongest. It takes a lot of tolerance for each other’s styles, and it takes a lot of giving each other grace and credit. Trying to look at many different patterns at once drives a particular type of person crazy. A lot of people in the data science field come from computer science backgrounds, where you think a little differently about things. In computer science, you are taught to think about the flow of the code: how does it work in terms of its logic? It’s just a way of thinking. We have got to figure out how to marry all of this together so that women and men and any other groups can challenge that thinking process. What happens is that code is set up to point only in one direction, and there’s no divergent point at which you can stop and say, “Wait! Did you think about these other five things that could be out here that you could have put into your algorithm?”

Considerations for the types of data that should be included to get the best fit in a model are a good example of this dynamic – especially when you inevitably have to come up with a substitute or proxy for a piece of data that you don’t have. For example, I have seen on many occasions data scientists wanting to turn to financial scores as an easy proxy to determine whether a person is “responsible” in general – not just with their money. They will then use this score to determine whether the person is a “responsible” driver. But someone’s financial situation and how well they drive their car are most likely not causal. But if you are working with data scientists who all have good financial scores, they probably do not see how this data selection could be flawed for their model. In this case, they could benefit by having someone from a divergent socioeconomic background question this assumption and also inform the data science team of the personal impact on a person who is declared an irresponsible driver – for example, their insurance rates go up, which may cause them to have even less disposable income for gaining a higher education, continuing important maintenance on their car, or worse, triggering a choice between buying groceries or paying rent.
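The flawed-proxy dynamic she describes can be made concrete with a small simulation. Everything below is synthetic and hypothetical: a credit-like score merely correlates with a “safe driver” label in the training sample, so the model looks predictive in-sample and collapses to roughly chance on a population where that correlation is absent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Synthetic training sample: credit score correlates with the "safe driver"
# label (e.g., both track income), but the link is not causal.
credit_train = rng.normal(650, 80, n)
safe_train = (credit_train + rng.normal(0, 60, n) > 650).astype(int)
model = LogisticRegression().fit(credit_train.reshape(-1, 1), safe_train)

# A different population: here, driving safety is independent of credit.
credit_new = rng.normal(580, 80, n)
safe_new = rng.integers(0, 2, n)

print("accuracy on training population:", model.score(credit_train.reshape(-1, 1), safe_train))
print("accuracy on new population:     ", model.score(credit_new.reshape(-1, 1), safe_new))
# The proxy looks strong where it was fit and near chance where the
# correlation does not hold - the model learned the proxy, not driving.
```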

As leaders, we should do the following:

  • Think about the types of models we want to build, their strategic importance, and their impact on society as a whole. We tend to think only of the direct impacts in front of our faces, such as “Did we get a bonus for delivering the best revenue results or cost savings?”
  • Recognize that data science teams developing some of the world’s most important algorithms can’t all have the same mindset, or they’re going to miss important stuff that can affect people whom they have never even considered. We need to strive to hire melting-pot teams who are as diverse as the people that the models could impact in society.
  • Make sure that our norms, processes, and tools allow for the time, space, and ability to push back. It’s one thing to have a diverse team fully able to bring many perspectives to the table, but people need to be able to speak for themselves. We need to give people the ability, in standup meetings and other time-sensitive situations, to push back when they need to, and to do so without fear of harm to their career.

I have seen people be railroaded in the process so they don’t get to voice their different views. I have also seen people be blacklisted from future high-profile data science initiatives if they dared to raise a red flag that could negatively impact the budget or time to market. We must be able to encourage speaking up; otherwise, we take on unnecessary risks that could cause data science projects to fail, or worse, could damage the company’s reputation, breach regulatory compliance, or cause a decline in the company’s market value.

AA: I think that is a great way to summarize where some opportunities exist to try and really bring people together to work toward the common good.

The next question touches on your work at the leadership level. Say you’re consulting or working with an organization that’s just starting out in the field and they want to develop a capability, and they say to you, “We want to get stuck into data science and AI. We don’t know where to begin.” How do you advise them to begin that journey?

CA: Well, the first thing I always ask is, “What is the strategy?” It’s the same question I asked myself in the role I was just telling you about: how can you be relevant to the company? How can you be the most relevant? How can you be essential?

Let’s say there’s an insurance company and their strategy is to reduce costs. One of the biggest areas where costs can be out of control is legal fees. Depending on the size, insurance companies can have hundreds if not thousands of law firms that they hire to take on all kinds of cases across state, national, and international borders. Let’s say you want to use data to understand which law firms are overcharging and by how much. Each contract you have with every single law firm is probably massive. What you should think about is this: “How can I use data to try and compare patterns of usage so that I get the most efficient contract from that legal firm?” Also ask, “Is that legal firm overcharging me based on the types of legal matters that we agreed they could take on? Are they overcharging according to the contract that I signed? Could I get the data to show how much money in total per law firm has been overcharged and then dispute the charges to save costs and thwart overcharging in the future?” You can do that type of contrast-and-compare work using data and analytics, especially with machine learning capabilities.
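As a minimal sketch of that contrast-and-compare work, assuming hypothetical invoice and rate-card tables (every firm, column name, and rate below is invented for illustration), the overcharge math reduces to a join and a groupby:

```python
import pandas as pd

# Hypothetical invoice lines and contracted rate cards per law firm.
invoices = pd.DataFrame({
    "firm": ["A", "A", "B", "B"],
    "matter_type": ["liability", "liability", "fraud", "liability"],
    "hours": [100, 40, 60, 30],
    "billed_rate": [450, 500, 700, 480],
})
contracts = pd.DataFrame({
    "firm": ["A", "B"],
    "matter_type": ["liability", "fraud"],
    "agreed_rate": [420, 650],
})

# Join invoices to contracts; a missing agreed_rate means the firm billed
# for a matter type it never agreed to take on in the contract.
merged = invoices.merge(contracts, on=["firm", "matter_type"], how="left")
merged["out_of_scope"] = merged["agreed_rate"].isna()
merged["overcharge"] = (
    (merged["billed_rate"] - merged["agreed_rate"]).clip(lower=0) * merged["hours"]
)

# Total disputable amount per firm, as a basis for recovering costs.
print(merged.groupby("firm")[["overcharge"]].sum())
print(merged[merged["out_of_scope"]])
```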

I would start by understanding the strategic goals of the business by poring through statements made to investors by the executive board of the company. That’s where you start: what is the most strategic thing that you need to accomplish, and how does data play into that? It’s not the opposite way.

You don’t start with, “I’m going to be data-driven.” You start with, “I’m going to achieve this strategic goal, and this is how I’m going to use data to do it.”

AA: Once an organization has an idea of their business strategy and they can formulate a data strategy to support that, what are some of the next key steps? Would you recommend they first employ certain people, or should they first build a tech stack and then bring people on board? And how should they structure those teams? Do they need a CDO? Do they need various team leads? Do they centralize the data science capability or distribute it?

CA: That all depends on the company and the industry they’re in, how small or big they are, and how much they’re looking to streamline what they’re doing. There are also legacy systems and technical debt to consider. But really, it goes back to the company’s strategy and how much money it has to do these things.

Let’s say that you’re in a 1,000-person company, and you have the ability to hire a chief data and analytics officer who can then go and interview the CEO, find out what their strategy is, look across all of the available data, and get a good understanding of the inventory of data that exists in the company.

Then, you say, “How can I use the existing data to take on some smaller projects that move us toward the goal, so that I can reach the finish line and have some wins?” If you’re just starting out with being data-driven, it’s important to demonstrate small wins along the way. So, you take a long journey, and you say, “I’ve got a 30-day deadline – here’s what I’m going to deliver. I’ve got a 90-day deadline – here’s what I’m going to deliver; all the way up to the 5-year plan! And this is how we’re going to keep moving that trajectory up, depending on what the strategy is.”

In that insurance example, you want to reduce costs. So, you might say, “We’re only going to take on law firms dealing with a certain type of legal matter.” You divide things up. Have you ever heard the phrase, “You eat an elephant one bite at a time”?

Instead of trying to tackle a huge project all at once, you tackle it in bite sizes so that people can see results. As you progress, you start to grow as well; it’s kind of like an internal start-up. You’re an intrapreneur, so to speak, instead of an entrepreneur.

AA: Given that you provide advisory services to Fortune 500 companies, I’m curious to know: what are some of the common mistakes that you see organizations making when establishing or scaling their capability? What roadblocks do they often face, and what should they be doing to overcome them?

CA: The mistake I see the most is that there’s a lot of politics involved in data and analytics, from the minute that you start looking into people’s data. There’s also a lot of politics in terms of Chief Information Officers (CIOs) versus data and analytics. We have CDOs and people who are trying to use data who consider themselves more data-astute than other groups. That means I see a lot of groups, before they even get the chance to run those bite-sized projects, getting mired down in the politics of who supposedly owns what data inside the company.

CDOs who come in from outside a company have to be really good politicians.

CDOs have to be good at negotiating and influencing people who aren’t their direct employees. They have to manage up and down and sideways. If they can’t do that and they’re just coming at things with a hardened attitude, they’re doomed from the get-go.

You have to have supreme diplomacy skills when doing this job. A lot of times, I think it’s the personalities involved that cause a company to fail in data and analytics, as opposed to not having the right data at hand. What’s sad is that almost every company I know of has had all the data that they need right there at their fingertips within their own company, but unfortunately, infighting causes people to be delayed and not work together. That causes failure every time.

AA: I completely agree. I see the same thing: it’s the people side that is often the real barrier, not the technology or the data.

 

Implementing an ethical approach to data

AA: I’ve also been seeing lately that a lot of organizations want to ensure they are responsible when it comes to AI: implementing ethics frameworks and having people accountable for the ethical and responsible use of AI and data. How do you typically guide them on this journey in regard to ethics and governance around AI? Do you have any particular frameworks or processes that you would recommend (across different organizations) in terms of what is really needed to become ethical in this space?

CA: This is more about culture. It’s also about going back to the main high-stakes usages of AI.

People are going to hate to hear me say this, but not everybody needs to be concerned about ethics and AI. If they’re just doing A/B testing on steroids, they don’t necessarily need to be concerned.

When I talk about the ethical use of AI, the starting point is typically an impact assessment of the use case that you want to move forward with.

Let’s say it’s some C-suite person who wants to make sure AI is being used ethically. Typically, the way you want to start with ethics is you want to look at the most impactful, high-stakes, high-competitive-advantage types of strategies that you’re using AI for inside your company. You can determine that without going through some big, lengthy process. You can pretty much just tell. There’s a red-orange-green situation here. If it’s going to be life and death – if a person steps into your self-driving car and can die, or someone outside of that car can die by being hit by that car – that’s a high impact. That’s a red-level threat. If someone’s life, liberty, and safety can be affected, that’s red. Health diagnostics, weaponry, or anything automated is going to fall into this category. Automation with humans involved is probably going to fall into this category at some point, too.

If you take a step down, the orange level would be along the lines of whether we are going to affect anyone’s rights or happiness. Are we going to keep them from getting jobs, or do anything else that would fall under the UN framework of human rights? If we’re going to limit their ability to have shelter, food, water, or education, or the pursuit of happiness, that’s high stakes. We have to go through a much more rigorous process of evaluation: is the use case even an appropriate use of AI?
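Captured as code, the triage she sketches might look something like the following. This is a toy sketch only: the attribute names and tier wording are assumptions that paraphrase the red and orange levels described above, not a formal framework of hers.

```python
# Illustrative only: a coarse triage of AI use cases along the lines of the
# red-orange-green scheme described above. Attribute names are hypothetical.
def triage(use_case: dict) -> str:
    # Red: life, liberty, and safety at stake - health diagnostics, weaponry,
    # self-driving cars, anything automated that acts on the physical world.
    if use_case.get("life_or_safety_at_stake") or use_case.get("automated_physical_action"):
        return "red: highest-rigor review before any development"
    # Orange: rights and livelihood - jobs, shelter, food, water, education,
    # the territory of the UN human rights framework.
    if use_case.get("affects_rights_or_livelihood"):
        return "orange: rigorous evaluation - is this an appropriate use of AI at all?"
    # Green: everything else, e.g., "A/B testing on steroids".
    return "green: standard review"

print(triage({"life_or_safety_at_stake": True}))       # e.g., a self-driving car
print(triage({"affects_rights_or_livelihood": True}))  # e.g., a hiring model
```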

We can argue that some people in this field think more like computers than people. If you plotted a spectrum from computer to person, there are a lot of people who can fall into the thinking-like-a-computer part of the spectrum. You have to evaluate the use case considering that spectrum. Some people are OK with receiving, for example, a chatbot version of, “Hey, you’ve got cancer.” That, in my mind, is never going to be an acceptable use case for AI – never. But in other people’s minds elsewhere on that spectrum, they’d say, “It’d keep doctors from having to deliver that information, which would be hard on them.” But what about the person on the receiving end of this news? Did you think about them?

I know it seems ridiculous to think a chatbot might deliver such horrific news to someone, but during COVID times, when doctors were overwhelmed and going down with COVID themselves, the idea was becoming more of a possibility. I’m so glad we didn’t fully get there, but we were on the cusp. If you read my book, you’ll see how close the UK National Health Service (NHS) was to implementing this.

AA: That’s a great example. How effective do you think AI ethics boards are in helping organizations implement responsible AI?

CA: This one’s a tough one. It’s not because of the AI ethics boards themselves. It’s because of the way that AI ethics boards are set up for failure or success by the most powerful people inside the organization. We call this the HIPPO: the highest-paid person’s opinion. Whenever there is a conflict between the AI ethics board’s opinion and the HIPPO, it is the HIPPO that is adhered to. This causes a natural contention that can make it hard for external AI ethics boards to function properly.

I just commented on this in Wired about the Axon CEO who openly announced he was thinking about developing taser-enabled drones for the inside of grade schools. It was a response to the Uvalde grade school shooting (https://www.wired.com/story/taser-drone-axon-ai-ethics-board/). The announcement caused his AI ethics board to resign because they had already discussed it with the CEO and disagreed on the development of taser-enabled drones for schools. They didn’t want to be associated with that – rightfully so, in my opinion.

You need to consider how you set these boards up, how much power and visibility they will be given, and what you will use them for. In my mind, the best use of AI ethics boards is to bring in diverse sets of opinions, especially if you’re a smaller start-up or a mid-sized firm that doesn’t have access to people who are regularly researching in the field of ethics. Typically, you want to bring in an AI ethics board that’s going to lend you their diversity in some way, shape, or form.

Diversity can exist in many different ways. It can be socioeconomic. I think at one point, Starbucks had considered doing away with cash, and what they found out real quick was that if you do away with cash, there’s a whole group of people in America, about 40 million people, who don’t have credit. They can’t whip out a piece of plastic and pay for stuff because they get paid in cash. There’s a whole cash world out there that’s not necessarily criminal. The data scientists working on the project had never been without credit, and didn’t realize this isn’t the case for everyone.

You need those very diverse opinions to remind you that there are people out there that don’t think like you. You need to figure out how to accommodate them or you may lose them, you may have public backlash, or you may face some kind of societal impact that you weren’t expecting. I think that’s where ethics boards are really strong, especially when you’re moving into a new space that you haven’t been in before.

Google’s external AI ethics board was short-lived, but let’s say Google had wanted that board because they were moving into weaponry, an area they had not been in before. Let’s say they wanted to think through fully what that decision entailed. They had a member of the board who was supposed to help them think through the ins and outs of that. I think those are really good ways to use a board.

Unfortunately, a lot of companies want to just virtue signal by having and announcing the board publicly and announcing when they take the board’s advice, but not announcing when they don’t take its advice. That’s the rub right now: it’s trying to figure out that balance, and that’s hard.

AA: Speaking of ethics more broadly, there are so many different definitions around bias, fairness, and explainability. How do we reach common ground? Will we have a situation where different organizations are implementing different interpretations, and how does that affect the citizen or the consumer?

CA: That assumes that the decisions will be transparent at all, which is one of the things that I’m working on right now.

I’ve been promoting the 12 Tenets of Trust, which is on the AI Truth website right now. I think you need to have some way for people to have a sense of agency and transparency in the process of what you’re building. Otherwise, you will have bias that you can’t account for, because we’re all biased. We’re only human, so we’re biased. Unfortunately, we’re also the ones who build all of the information, and the information comes from us, which is also biased. We also can’t find data for every single aspect of our lives. As much as we produce data constantly, sometimes there really are areas of our lives where we can’t provide data that will back up a decision about us that affects us.

12 Tenets of Trust

In Cortnie’s fantastic new book, What You Don’t Know: AI’s Unseen Influence on Your Life and How to Take Back Control, she shares her insights and expertise to help everyone understand AI beyond all the hype. She also provides 12 tenets, or principles, to help creators develop trusted AI, which you can also find online: https://www.aitruth.org/aitrustpledge.

You can read more about What You Don’t Know on Cortnie’s website: https://www.cortnieabercrombie.com/.

You really have to set up your high-impact machine learning capabilities with transparency in mind. Again, not the A/B testing stuff, but the high-impact stuff – and incorporate feedback from those who will be affected, if you can.

In the financial business, we know how to implement identity verification. There should be no reason why we can’t allow someone to verify their identity and then respond with, “Hey! You are munging five different sets of data. I see that you had this one thing in my financial score. I see it here: you gave me access. I can actually see it online. I know where to go to get to it, and I can see the explanation of what was weighted, why I was given this score, and why it might have gone down or changed, and I want to dig into that because it’s currently affecting my ability to get an education loan or a house loan.” I should be able to click into the system, and then it should open up a way for me to ask questions such as, “Where did that data come from? What was the sourcing of that? What was the lineage of that?” Then, I can participate. Was that data correct or wrong? Did it have me living in some part of town that was maybe risky according to the model, even though I have a really nice house there?

That feedback loop and the ability to have some personal agency where we can weigh in is important. Even our music choices give us that ability. So why don’t the most major decisions in our lives? For music, I can say, “I like that song; I don’t like that song.” On Netflix, I can say, “I like that movie; I don’t like that movie.” Why can’t we do that for the data that’s affecting us the most? Why can’t we have the explainability and transparency in place? Somebody on the backend should have an automated capability to compare – an ability to register this thumbs up/thumbs down per piece of information that’s affecting our outcomes. They should have some sort of automation that can then push the correction to the other systems so the data is changed permanently.
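As a sketch, the per-datum thumbs-up/thumbs-down loop she describes could be represented by a record like the one below. Everything here – the class, field names, and resolution flow – is a hypothetical illustration of the idea, not a design from AI Truth.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataPointDispute:
    """One thumbs-down on a single piece of data that fed a decision about a person."""
    subject_id: str        # verified identity of the affected person
    field_name: str        # e.g., "address_risk_band"
    value_seen: str        # the value the model actually consumed
    source: str            # lineage: where the datum came from
    disputed: bool = True
    resolution: str | None = None  # "corrected" or "confirmed"
    raised_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def resolve(dispute: DataPointDispute, corrected: bool) -> DataPointDispute:
    # On correction, downstream systems would be notified to re-score any
    # decision that used this datum (the notification step described below).
    dispute.disputed = False
    dispute.resolution = "corrected" if corrected else "confirmed"
    return dispute

d = DataPointDispute("cust-42", "address_risk_band", "high", "vendor:geo-risk-v3")
print(resolve(d, corrected=True))
```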

Then there are notifications.

If something’s high-impact, we owe it to people to give them notifications. It is the least we can do.

If Netflix can do it, come on! If we’re going to sabotage someone financially, I think that’s the least we can do for them. Let them know and let them have the ability to weigh in: “Yeah, that’s my data. No, that’s not my data. Oh, this person turned in a bad piece of information. It was a different Cortnie Abercrombie over here in this other part of town!”

AA: That’s one of the best arguments I’ve heard for explainable AI.

A lot of organizations fail in their endeavors in AI. I’ve seen statistics of around 85% failure rates. In your experience working with large organizations, do you think this is realistic, and if so, why is it happening like this? So much money is being spent on AI. It’s relatively cheap and easy these days to develop machine learning models. Why are so many failing? Where are they getting it wrong?

CA: This is a hot point for me. This is a Cortnie Abercrombie statistic with absolutely no grounding, so I shouldn’t even say it, but I think 90% of what goes wrong is in the data, and people just don’t want to investigate the data because it takes time. It takes energy to come up with the right data that’s fit for purpose. I think that we use a lot of scraped data. We beg, borrow, and steal whatever we have to because of the way that we have set up our processes. When we look at what is at the root of AI and machine learning models, there are three aspects. I’m sure every data scientist out there is going to say, “You can’t boil my whole job down to three things!” I’m going to try anyway. It’s data, algorithms, and training or implementation. I’m including the training of the algorithm in implementation. You could probably argue that it belongs on the data side, or you could say it’s part of the actual algorithm itself. But I think algorithm selection is its own beast, along with iterating constantly on that until you get the level of accuracy that you’re looking for.

Additionally, we have another problem that nobody even acknowledges: what are the users going to do with this stuff? I was talking to someone who had just bought a Tesla and had no instruction on how to use it. Twenty or thirty years ago, even when we were just getting pivot tables in Excel, that was new stuff, and people trained us on all those new things. Nowadays, we just hand stuff over and say, “Here you go,” and we don’t tell people anything about it. We don’t tell them how it works or what it’s been trained on. This friend of mine who bought the Tesla won’t even use the self-parking feature on the car because she’s like, “What if a toddler runs out? Has this even been trained on toddlers? Can it even see a toddler with its camera?” I think that’s a legitimate question.

If you don’t give people some level of training and understanding of something, they just won’t use it, and that’s how we get these failure rates. There’s no trust. First of all, what data did you use to train it? That’s probably the most basic question that everybody’s going to have in their minds. The first question my friend had was, “Has it been trained on toddlers?” These are erratic little beings that can just dart right out behind a car. Someone else may be parking while we’re parallel parking, and there may be a van approaching. Do I really trust this thing to take into account some erratic little being that’s only 2 feet tall running out into the path? That question is legitimate in all cases of AI and data science. Where did you get the data, how did you train this thing, and do I trust that?

Think about scraping all those images in Labeled Faces in the Wild, which has been the most used open data source for facial recognition – something like 35% of the pictures were of George Bush. Did you know that? That’s ridiculous! It was mostly his face because the dataset was compiled from news photos years ago, when we didn’t have as much social media participation as we do now. Even the frequency of updates is so important. How many data scientists do you hear these days saying, “Well, all of my 8,000 APIs are updated on this date”? We don’t know! We’re just pulling this crap together, managing it 50 ways to Sunday. There are 8,000 APIs coming in. I don’t know when they all come in! I don’t know where they came from! I don’t know who put that together!
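The kind of composition check she is arguing for takes only a few lines. The label counts below are illustrative stand-ins for a face dataset’s identity distribution, and the 10% dominance threshold is an arbitrary assumption.

```python
from collections import Counter

# Illustrative identity labels for a face dataset; the counts are invented.
labels = ["bush"] * 530 + ["powell"] * 236 + ["blair"] * 144 + ["others"] * 2000

counts = Counter(labels)
total = sum(counts.values())

# Flag any single identity that dominates the distribution before training.
for name, n in counts.most_common():
    share = n / total
    flag = "  <-- over-represented" if share > 0.10 and name != "others" else ""
    print(f"{name}: {n} images ({share:.1%}){flag}")
```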

And yet you have the Cambridge Analytica situation where a company pulled Facebook user data, did a personality test on everybody, and used that data to target specific people with political campaigns.

You’ve got to know where this stuff comes from and you’ve got to investigate and interrogate it, especially if it’s one of the major features driving your model. If it’s actually something that’s making a big difference, you owe it to yourself to know everything about those bits of data that are coming in. I’m not expecting people to know all of the 8,000 different APIs that they use, but they should know the major things that are affecting their models, and that’s where I think things are going wrong: it’s the data – and not understanding it. All of this leads to not having trust in the AI product or service. Lack of trust leads to non-use, and that leads to the failure rates we see today.

 

Establishing a strong data culture

AA: Data culture is a somewhat related issue. In your opinion, what does an effective data culture look like? How do you advise organizations on building a data culture?

CA: It goes back to the C-suite culture: the people in charge, how they view data, and how involved and data-literate they are all really affect the data culture. The reason that people have a poor data culture is usually that they have people at the top who don’t know anything about data.

There can be problems at both the bottom and the top of the organization. I have a top 10 list on my website about this, in an article about when, as a data professional, you should just walk away from a situation because you’re not really going to get anywhere (https://www.aitruth.org/post/10-signs-you-might-want-to-walk-away-from-an-ai-initiative). There can be this over-amorous feeling about data. The CEOs and C-suite-level people can sometimes think it’s going to solve world hunger! They don’t have a clue what it’s actually supposed to be able to do within their bounds and their company, but they think that it does a lot more, and they think that somehow the data’s going to do whatever analysis needs to be done itself. They don’t think about the people who are actually performing the analysis, how long it takes people to get things done, and how much data needs to be cleaned up.

We used to laugh a little bit when I was at IBM about executives who would promise to get data solutions up and running within four weeks. We would say, “Yeah, that’s going to be just an initial investigation of your data.” Anytime you’re working with data, you have to understand the quality of the data that you have and so many other aspects, such as where you’re going to get it.

At AT&T, they did projects for everybody on the planet and they had 1,000 people acting as “data librarians” – that’s my term, not theirs. You could go to these expert data-sourcing resources and say, “I’m on a computer vision harvesting project for John Deere tractors, and they want to know the varying stages of ripeness for a red cabbage. Do you happen to have pictures of red cabbages in varying stages of ripeness somewhere?” There was a 1,000-person team that could say, “Yes, there’s a place over in Iowa that’s been working on this. We will procure some datasets or an API for you.”

Sometimes, the data is easy and readily available, depending on what you’re trying to accomplish, but other times, it’s a hard use case and you’re going to be tapped to try to figure out where to get it. Where am I going to source this information, and is it even possible to do so? There’s a whole investigation that has to happen. If your C-level leader doesn’t understand what goes into it and doesn’t trust the people that are working for them, it’s not going to work. You’re working with all these vendors that have been hodge-podged together, which a lot of C-suite people do because they just see the numbers: “Oh, it’s cheaper if I outsource this.” But what you’re dealing with sometimes is that they’re just throwing bodies at the problem as opposed to actually having expertise – expertise can sometimes cost more.

The C-suite can have a great effect in terms of how much time they give to a project and how much leeway they give to people about finding data sources, investigating them, and pulling them together in the right ways. I’ve seen that when people are not given enough time or budget, they’ll just go for the cut-throat version of things. They’ll say, “OK! I’ve got $1 per record to spend on something that should normally cost $100 per record,” or, “I need genetic information on this but I’m not allowed to have that, so I’m just going to make up some stuff.”

You see all kinds of bad practices because the C-suite has unrealistic expectations. But then, you see bad behaviors going up too, from the bottom up. You see some data scientists that are just lazy. They don’t want to do things the right way. They are collecting experience on their resume like baseball cards. They just want to go from the project that they just got offered to this project, to the next project, and then they’re going to put that on their resume, and they’re going to keep moving their salary up from $100,000, to $200,000, to $300,000, to $400,000.

There’s bad data culture everywhere.

The best thing you can do is be literate about the data issues, have some trusted people that you work with, and pay them well.

Look at the types of products people have taken on and how long they spend at a company. If they are one of those people that’s just in it for 12 to 18 months and you only see one project per place on their resume, that’s a pretty good sign that they’re just going to rush through, not document anything, and then leave you holding the bag with no Return on Investment (ROI) at the end of it. That’s my personal opinion.

AA: Yes, that all resonates with me. I love the way you also address the issue with data scientists themselves. People are often very guarded about speaking negatively, but let’s be honest: there are many data scientists who are collecting experience, just trying to move up the ladder like any working professional. They’re no different from anyone else. They can be quite crafty with what they put on their resumes.

CA: That’s exactly right. My thought is, “Go with the people who you trust, and if those people happen to be inside the company already, then just teach them the skills.” Remember my boss from before? He said, “I know you can do this. You’re ambitious. We’re just going to give you all the classes you need.” I go for trust and the personality types that I think would do well, and then I just give them the skills. That’s how I approach it, as opposed to the opposite, where data science candidates have promising skills, but then they come in and they’re not really that fantastic, yet you’ve paid a ton of money for them and possibly signed a long-term contract. You don’t want to get to the point where you’re at the end of the game and thinking, “I’ve already sunk a million dollars, and now I have no idea what this person did.”

There might even be no documentation because a lot of people see failing to document what they’ve done and how they’ve munged data together as a way to control the situation, provide job security for themselves, and increase their salary. Some people do that. I’m not saying everybody does, but some do.

Then, we have the opposite situation too. There are abused data scientists out there who are truly trying to do the right things, but the time frames and budgets that they’ve been given are just so unrealistic that they couldn’t possibly deliver a quality end result. Every profession has good people and bad people, and people in between just trying to survive.

AA: I’m sure you’ve seen many examples where things have gone awry in terms of poor data cultures.

CA: Let’s face it: data culture is just so important. One other aspect of this that I’m learning through research right now is that within the pod structure of data engineers, junior data scientists, and senior data scientists, it is the junior data scientists who are the most likely to blow the whistle when they’re not being paid attention to. When they’re raising objections in a process, they need to be listened to. What we’re not seeing in these big companies is the ability within the agile process to push back, to assume that there will be some red flags, and to slow down. We’ve become so accustomed to our agile processes delivering the Minimum Viable Product (MVP), even if it’s just an API feed, in six to eight weeks. Sometimes, that’s just not possible.

Anytime someone pushes back or people are under pressure, it’s important to know what they’re going to do. Are they just going to drop everything and say, “OK, fine, whatever. I’ll just be unethical – it’s fine. I’m just going to deliver this because my bonus depends on it,” or do they actually care? Will they push back on this part of the process, going up against a senior data engineer and a senior data scientist? Can they hold their own there, and do you have support for them to hold their own? Will the lead data scientist interfacing with the chief marketing officer or digital officer give these people enough pushback ability? I think that’s not happening at all.

I think what we have is a whole lot of abused data scientists who are trying to raise the flags, and others are saying, “No, you’re slowing down our process. We’re not going to get that patent filed in time, or we’re not going to get this thing out to market in time, so we’re just going to push you down.” If that’s the case, you have a very toxic, very risky data culture, and you have to figure out how to address that so that raising red flags (and, more importantly, fixing critical issues that cause red flags) is the norm, not something that you’re blacklisted for. You shouldn’t be blacklisted as a data scientist for bringing up risks that need to be addressed before a product is released.

AA: You’ve touched on an issue that isn’t normally voiced: the realities that data scientists face with unrealistic expectations and the culture they have to abide by to survive, which is often very toxic in nature.

What do you think practitioners (machine learning engineers and data scientists) should be doing to make sure they’re contributing in a positive way to the ethical development of their models and products?

CA: The overarching way that it needs to be approached is from the top down and the bottom up. Practitioners definitely have to be that last stop. The buck stops here. The responsibility is in everybody’s role. That’s why I semi-hesitate when I see AI ethics being called out as a separate group within a company. It bothers me because I think that gives you an easy scapegoat when things go wrong.

You just get to point over to a group that somehow failed you, when in reality, every single person that’s involved in AI development should be feeling that they are responsible in some way, shape, or form for every part of what they’re doing.

The thing that concerns me the most is this attitude in the practitioner groups of, “Why does that even matter to me? That doesn’t have anything to do with me.” Those are the people that need to find the problems and bring them forward because they are the people who should be involved in actually investigating data and understanding how models are set up. Only they know what they chose to use, what they chose not to use, and the decisions that went into that.

Let’s say they feel responsible, but they don’t feel like they can raise their hand, raise the red flag, and say, “Hey! I think we could do better on this data. I’ve seen a lot of flaws in it. I’ve seen that it doesn’t have enough of this type of people. We just left things out in general, and I think that the proxy data was crap” – that’s a problem.

In the case of Amazon’s hiring algorithm, we actually saw that the data scientists themselves were the ones who did the whistleblowing. They did it anonymously, and I’m so grateful that they did. Those executive sponsors (such as the chief human resources officer, for example) don’t really know what they don’t know, and so they would have continued forward with that hiring algorithm had the data science development team not raised the flag. The fact that we still have to call it whistleblowing means that we don’t have a culture or a set of norms yet that is conducive to pushing back, and that’s the problem.

AA: Yes, it’s everyone’s responsibility. I couldn’t agree more.

With the growing influence of AI in our daily lives, trust is a big issue. How do we develop trust in AI? I think your book goes a long way by helping the layperson understand what is and isn’t AI. Beyond that, more broadly as a society, what should we be doing? Also, how important is regulation in your opinion?

CA: We can break those down, and each one has an answer.

In my book, I do talk about the 12 Tenets of Trust because I think that all across the globe, we’re in a time where we have the lowest levels of trust among people.

How can we hope to build and scale AI at this point when there’s no explainability, no transparency, and no accessible information about what goes into models? I don’t think anybody has earned the trust of anybody right now in the AI space. The typical answer is, “Well, I have accuracy,” but we all know that 100% accuracy on garbage is still garbage. I hate hearing, “Well, my accuracy rate is so great.” That is not a good, solid statement of whether or not we used sound judgment for the things that went into the model in the first place, the data that went in there, or even the way that we conducted the training of the model, and so forth.
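Her point that “100% accuracy on garbage is still garbage” is easy to demonstrate. In this toy sketch (all data simulated), an unpruned decision tree scores perfectly against corrupted labels while agreeing with the true labels barely better than a coin flip.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_labels = (X[:, 0] > 0).astype(int)

# "Garbage" labels: roughly half are flipped before training.
flipped = rng.random(200) < 0.5
garbage_labels = np.where(flipped, 1 - true_labels, true_labels)

# An unpruned tree can memorize its training labels exactly.
model = DecisionTreeClassifier().fit(X, garbage_labels)

print("accuracy on garbage labels:", model.score(X, garbage_labels))  # ~1.0
print("accuracy on true labels:   ", model.score(X, true_labels))     # ~0.5
```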

People don’t trust each other. There was a research study out here in the States from a group called Pew Research Center (https://www.pewresearch.org/topic/politics-policy/trust-facts-democracy/). Right now, we don’t trust scientists, we don’t trust the government, we don’t trust the news, we don’t trust social media, and we don’t even trust each other as neighbors to do the right things. For all the many hundreds – possibly thousands – of ethical AI frameworks out there, I would say this. There’s one simple thing that you have to remember, and that’s the golden rule, which is to do unto others as you would have them do unto you. As long as you can remember that, you should do OK with trying to develop trust.

I think we’ve become addicted to moving fast and breaking things. That’s why I recommend this 12-step program to break that addiction, which I’m calling the 12 Tenets of Trust. Whatever you are doing in your data science, apply the standard you would apply in person: if someone in front of you dropped $20 on the ground, you wouldn’t pocket it – you’d go find them and say, “Hey, here’s your money back.” So if your model is effectively stealing money from people, and you as a person would never do that in real life, you should adopt the same mindset when you develop your models.

We tend to have this thought that we leave part of ourselves behind when we come into the office. We somehow think, “OK, to move forward in data science, we just need a corporate mindset – we have to make money.” That’s our fiduciary responsibility to our stakeholders, but there’s still a line. If you wouldn’t do it otherwise, then don’t do it just because you’re working at a company because, at the end of the day, you still have to look your kids in the face. You have to look at yourself in the mirror and say, “Hey, I did something good today,” or, “I did something terrible.” So, the golden rule “do unto others” is the number-one way to make sure you’ve got trustworthy AI.

AA: Can you please elaborate on the 12 tenets?

CA: These are the 12 tenets that I think the public should expect all of us in AI development to adhere to in order to put their trust and faith in our products or services. The very first tenet we should meet is developing AI that is humane. People ask, “What does being humane have to do with data science?” But I think the very first thing we have to do when we come up with a high-impact use case to fund is ask, “Is it a humane thing to do?” The chatbot that delivers cancer diagnoses is probably not a good idea, for example. The same goes for that article in Wired that I was quoted in, where a CEO wanted to use taser-powered drones in schools for children. That’s not a humane use of AI. Do we need taser drones inside school buildings with small children? No, probably not. You’ve got to start with the use case and ask, “Could this cause more harm than good? Is this an appropriate use of AI? Is this humane?”

The second thing to ask is, “Is this consensual? Am I taking some data from a person that is gathered for a whole different context from where it’s being used?” A prime example I use for that is a group called Clearview AI (https://www.oaic.gov.au/updates/news-and-media/clearview-ai-breached-australians-privacy). They scrape people’s information and faces from social media and work with law enforcement to provide potential facial matches based on that data. When you find yourself begging, borrowing, and stealing information from people on social media sites, that is not the original intent of the information. If you want to violate people’s trust, go right ahead and keep doing that. But if you want to build trust and make sure that you have a consensual approach where people know what their data is being used for and that it’s being used in a way that is consistent with what they’ve agreed to, then you want to be transparent. We’ve already talked about that.

Also, transparency is not enough. It’s not enough to say, “Hey, I have the ability for you to know about these things.” If you don’t inform people and make the data you hold accessible to them online so that they can see it for themselves, that’s not what I consider transparent or accessible.

The other part of this is personal agency. I need to be able to go in and change something about my own data, or at least understand it. It should be explainable, which is the next tenet. Can I flag that an address is wrong, for example? Can I say, “That’s not the right location. Please go find that information and rectify that for me”? That rectification is actually part of the 12 tenets as well, which we don’t often see. We see a lot of explainability right now, which is fine. It’s hilarious to me, though, that everybody talks about explainability and nobody talks about accountability, traceability, or the ability to govern and rectify these situations, which are the other tenets.

We also need privacy and security. You can’t just take the X-rays of people who have had COVID and stick them where any hacker can get to them, along with dates of birth and everything else that would reside with those medical records. That could happen if you come up with a fantastic X-ray-diagnostic type of machine learning capability. Healthcare is fraught with those kinds of fast-and-loose methods of just throwing people’s information into a single place and then leaving it unlocked. It’s like saying, “I just threw all your stuff into a huge purse, and then I put a big advertisement on the outside of it, and I can’t believe people went in there and stole your data. I don’t understand why that happened.” People are not thinking about that. What people are thinking about as start-ups is, “I’ve got to hurry up and get this product out there and claim first-mover status. I’m going fast and loose with people’s information. I know there are some guidelines I’m supposed to be following, but within my own company I thought it would be OK if I just did this.” No. Just because you thought that would be OK within your own company, doesn’t mean it’s OK. Most of the breaches that we see today are from within, especially in the competitive environment among start-ups where infiltrations can be common and workers move from one competing start-up to another.

The final thing is making sure that your data’s actually correct. You need to make sure you have fair, quality information going into your model, not just some random bits that you were able to procure from somebody who ran a game on Facebook, swears that they now have the psychographic information of people, goes out and labels people as persuadable or whatever else, and then sells the data to the highest bidder (true story: that’s the Cambridge Analytica case). That’s just wrong. I think that you have to know where your data’s coming from. Does it have bias in it? Is it actually correct? If the data quality is really bad, you’re going to get overfitting, which is also going to cause the model to put out weird results. It’ll look like you’re getting some level of accuracy, but when you go back and look at what you’ve got accuracy on, it’s not going to come out right. You’ve got to pay attention to the data itself, and I’ve been amazed at how many data scientists don’t actually go in and thoroughly investigate their data.
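
As a concrete starting point for the kind of data investigation described above, the following is a minimal sketch, assuming pandas and a DataFrame with a hypothetical label column. It illustrates first-pass checks (missingness, duplicates, label balance, representation), not a prescribed procedure.

```python
import numpy as np
import pandas as pd

def quick_data_audit(df: pd.DataFrame, label_col: str) -> None:
    # Heavy missingness can silently exclude whole groups of people.
    print("Missing fraction per column:")
    print(df.isna().mean().sort_values(ascending=False))

    # Duplicate rows inflate apparent accuracy when they leak across splits.
    print(f"\nDuplicate rows: {df.duplicated().sum()}")

    # A heavily skewed label is the first hint that accuracy will mislead.
    print("\nLabel distribution:")
    print(df[label_col].value_counts(normalize=True))

    # Representation: check which categorical slices actually appear.
    for col in df.select_dtypes(include="object").columns:
        print(f"\n{col}: {df[col].nunique()} distinct values")

# Tiny synthetic example with hypothetical columns.
df = pd.DataFrame({
    "age": [34, 51, np.nan, 29],
    "region": ["north", "north", "south", None],
    "label": [0, 0, 0, 1],
})
quick_data_audit(df, "label")
```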

To recap, the first 10 of the 12 tenets are these: humane, consensual, transparent, accessible, agency-imbuing, explainable, private and secure (that’s one), fair and quality, accountable, and traceable. Traceability is about whether you know when something went wrong and how it went wrong, and whether it can be traced back to a moment in time. People are using blockchain to do some amazing things with traceability, especially in credit reporting.
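
Traceability can start much more simply than blockchain: log every prediction with enough context to reconstruct later what happened and when. The sketch below shows one hypothetical shape for such an audit record (the field names are my own, not from the interview); tamper-evident systems layer on top of the same idea.

```python
import hashlib
import json
import time

def trace_record(model_version: str, features: dict, prediction) -> dict:
    """Build an audit record tying a prediction to its inputs and moment in time."""
    payload = json.dumps(features, sort_keys=True).encode()
    return {
        "timestamp": time.time(),
        "model_version": model_version,
        # Fingerprint of the inputs rather than raw (possibly personal) data.
        "input_hash": hashlib.sha256(payload).hexdigest(),
        "prediction": prediction,
    }

record = trace_record("credit-risk-2.3.1", {"income": 52_000, "tenure": 4}, "approve")
print(record)
```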

Tenet 11 is incorporating feedback. If something isn’t working the way it should, there should be a way to input feedback. This is especially required for expert systems, such as AI trying to take the place of actuaries. Believe it or not, people don’t find that a fun profession anymore, and we’re finding that Gen Z and millennials don’t really want to go into that field. Now we’re training AI to try to do it, but if you don’t have experts continuing to weigh in in some way, shape, or form, you’ll have drift. Bias can also occur if you don’t have ongoing feedback loops incorporated where humans are definitely in the loop.
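
As one hedged illustration of how a team might watch for the drift Cortnie mentions, the sketch below compares the distribution a model was trained on against live inputs using the population stability index (PSI). The threshold is a common rule of thumb, and the data is synthetic.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a training sample and a live sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) in sparse bins.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

train = np.random.default_rng(1).normal(0.0, 1.0, 50_000)  # training-time feature
live = np.random.default_rng(2).normal(0.4, 1.2, 5_000)    # shifted live feature
print(f"PSI = {psi(train, live):.3f}")  # rule of thumb: > 0.2 warrants investigation
```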

The last one is governed and rectifiable. What’s the point of having explainable AI if you’re not intending in any way, shape, or form to fix anything that goes wrong? We’re not even talking about that as an industry. We’re so focused on bias and explainability that nobody’s stopped to ask the question, “Well, what do you do when you find bias? What do you do when you find out that your model has drifted all the way over here, or it’s turned into a racist, like Tay? What do you do? Do you just shut it down?”

Tay

Tay was an AI Twitter chatbot created by Microsoft in 2016. Intended to learn from its interactions with human users on Twitter, Tay ended up being taken offline after just a few hours as users taught the algorithm to make offensive statements (https://www.bbc.co.uk/news/technology-35890188).

Think about the Tesla example. We have self-driving cars that can’t even be shut down from outside of the car. What do police do when they find a couple sleeping in their car that’s just racing down the highway? None of us can do anything. Do we just have to hope and pray? No, that’s not acceptable. We need to start considering how we’re going to rectify situations when we build. It needs to be in the design from the get-go, and that’s how people will trust us: they need technology that they find trustworthy. Can we shut things down? Can we govern them? Do we know when things actually went wrong? Is someone even accountable to fix this? That’s my other pet peeve: you have model drift. Who’s going to fix it? “I don’t know. That person already left the company.” In what business are you allowed to say, “I don’t know,” throw up your hands, and say, “Well, anyway, on to the next customer”?

AA: Have you seen that happen?

CA: Yes, absolutely. It’s because of the way that AI gets implemented with all the different vendors and contractors. People are transitory in this space, and so what you see is that a project gets developed, and then it gets left behind with the people that are using it, but the people who are using it don’t know how it works. So, there’s nobody accountable.

AA: How important is regulation now in all of this at a government level and an organizational level? Are you an advocate for regulation?

CA: I usually am not an advocate for regulation – not because of the intent of the regulation, but because I doubt the government’s ability to execute it in a way that doesn’t overreach to the point of creating barriers to entry for smaller companies that can’t afford to lobby. That said, I do feel like some areas, such as social media, autonomous weapons and vehicles, and healthcare, urgently require regulation in order to prevent major disasters from happening. For example, we have major social media platforms that are influencing everything from teen suicide rates to how elections are won and the full-on destabilization of countries through the use of fake news and the amplification of hate through bots. This cannot be allowed to continue. For these firms to continue profiting from ads while society goes down in flames is akin to Nero watching as Rome burned.

The big question on my mind is, “How can we get the regulations we need when we have congressional representatives whose average age is 70 or above, who don’t even understand how these firms make their money or how the technology works?” The Facebook (now Meta) and Google congressional hearings in the US, where a key congressman asked how Facebook makes its money, were a sad demonstration of just how little Congress knows about the tech industry. If they don’t know how the business model works, how can they ever hope to understand what drives these companies to do what they do and under what circumstances they do it?

Facebook congressional hearings

In 2018, Facebook executives were required to speak at a congressional hearing in the US (https://www.vox.com/policy-and-politics/2018/4/10/17222062/mark-zuckerberg-testimony-graham-facebook-regulations).

While I have little faith in the actual lawmakers themselves understanding the problems, I do think there are enough of us out there advocating that we can educate them and the public at large. I wrote my book specifically for people who aren’t in the tech industry to enable them to understand the issues and impacts better. I want to get my book into the hands of as many legislators as I can so that they can be brought up to speed on what AI is, how it works, why adverse impacts happen with AI, and more importantly, what we can do about them.

AA: So, you think education is important?

CA: Yes!

I think the more that people – both inside and outside the tech industry – know, the better. That way, we can all exert pressure on both regulators and the businesses that use data science and AI. We need all people in society to be aware and understand the areas where these technologies affect our lives.

The areas people ask me about most often are related to jobs and being replaced by AI, such as helping a student pick a profession that won’t be usurped by AI, or how ads track them around online. All these years, I’ve been collecting AI questions from lots of different types of people who I’ve encountered: ride-share and taxi drivers, church ladies, family members, postal workers, and parents in line with me at the grocery store. I’ve tried to understand the main questions and frustrations in the minds of people from a slew of different backgrounds. I’ve taken those main categories of questions and made them each into a chapter in my book. The topics include the following:

  • AI in hiring – including AI interviews and social media flagging
  • Job replacement and automation – which jobs and what skills
  • Impacts on kids and teens – tech addiction, suicide, harm challenges, and trafficking
  • Political polarization and radicalization – fake news, conspiracies, and bots
  • Rights and liberties being usurped with AI use in criminal justice – predictive policing and facial matching algorithms
  • Life-and-death AI decisions in healthcare

Globally, one of the areas I think people feel most frustrated with – but can’t quite pinpoint why – is politics. We are all constantly outraged, pointing fingers back and forth between political parties. We cannot get beyond it. But what the average person doesn’t understand is that there are actually bots out there keeping the angst going by magnifying outrageous, salacious content designed to provoke visceral emotional reactions. These bots are intended to polarize and destabilize democratic countries, and they are far louder online than moderate people are. The theory is that there are a lot more moderate people out there, but their content is boring, so it doesn’t attract eyeballs. Because moderate content doesn’t attract eyeballs, it doesn’t get advertising dollars, and because of that, it doesn’t get amplified.

If the average person understands that this is how everything operates on social media – that, in fact, their next-door neighbor is just forwarding something that was put out there by a bot, or is fake information – I think it would change the dynamics of how much they invest their personal psyche into the hatred that goes into comments. If you know you’re interacting with a bot, you’re probably not going to be as hateful about it as you are if you think you’re genuinely being attacked by people in your immediate community. I think that is what happens with bots. I think people see some of these comments online and think, “Oh my gosh! This is a personal attack on me.” These bots are so rampant, it’s not even funny. They’re there just to provoke hate in the most devious little ways.

AA: That’s great context. It’s something a lot of people just aren’t really aware of. I think it’s very important that data scientists are cognizant of how market forces and technology combine and are shaping society.

 

Designing data strategies

AA: I would like to discuss data strategies. I would love to know what you think are the key aspects of a data strategy. If you were asked by a company to develop a data strategy for them, what would you be thinking about? What aspects are front-of-mind for you, and how would you look at developing a roadmap for them to actually implement a strategy?

CA: I feel like I’m somewhat of a broken record on this, but I would always start with their data, their business strategy, and whichever group has the most data inside the company. Maybe they have a big external project that they want to take on and they’re going to need external data, but most companies start with some internal data that they can ratchet up to produce things.

It’s hard to talk about a data strategy if you can’t talk about the actual use case, what the strategy of the company is, or the industry of the company. You need to know what the market levers are and what data is available, but first and foremost, what the actual business strategy is and who has it, and how much money they have to go after what they’re trying to accomplish. You also need to know about all the people inside the company that need to be at the table. I think the key aspect is having a business strategy and data strategy to support that, rather than something that’s separate and distinct, which I’ve seen happen, and which is definitely a mistake.

AA: When organizations are going from proof of concept to suddenly having to productionize that model and capability, I often see them hit roadblocks. They either don’t have the right processes in place or don’t have the right people. I’m thinking of MLOps and things like that to scale that capability. Have you seen some of these issues, and how do some of these organizations overcome this? What do you advocate for when helping them go from dev to prod at a large scale?

CA: The first part of this is understanding the executive sponsor and what they’re really trying to do, because somebody somewhere in that company is paying for the development of this AI or ML product or service. It’s usually a one- or two-person type of situation.

They may come with a whole team that thinks that they’re the ones who are dictating the project, but in reality, it’s one or two major executives who will be the ones that set the tone for the entire development project.

There are many reasons why data scientists can’t get past the proof-of-concept stage. One is that they don’t have thorough communication between them and the actual executive sponsor. When there is a breakdown in communication from the get-go as to what the executive sponsor is trying to accomplish – say you produce a proof of concept of some sort (maybe an MVP), and the sponsor says, “Hmm,” or “I’m not sure” – that means communication between you and them has already broken down.

The second thing that I see happen is the executive sponsor may not understand what the heck the data scientists have done. Data scientists need to clearly answer questions such as, “What did you do? What data is in there?” Explaining things in technical jargon is not helpful to a business sponsor. That is not communicating well.

Data scientists have to find a way to actually communicate, which means that the person on the other side of the table understands what they say.

Throwing in all the different models, all the different mathematical processes, and all the things that were thought through but didn’t go into the model will just make the business sponsor not trust the data science team. Instead, they will feel like the team is trying to snow them. The best thing the data science team can do is spell things out. If there’s something that the executive sponsor says that the data science team doesn’t understand, then the data scientists have to get under the hood of that. They need to keep asking questions of the sponsor until they think they have a good hold on the request or concerns, and then ask the sponsor, “OK, could you write that down?”

When people have to actually write things down, I find that things get a lot clearer for both the person writing and the person receiving the writing.

Those are some of the reasons. The biggest one, though, in my mind, is still trust: not having trust about what you’ve done and what went into the process, or whether you begged, borrowed, and stole to get there. A lot of teams feel like the overarching thing is that they’ve got to meet that MVP stage, which has to happen in anywhere from four weeks to eight weeks. That’s the typical thing, but that doesn’t usually give you enough time to truly do what you need to do with the data. I think that’s where things fail because you do all kinds of horrible things in this rush to get to the MVP, and then you think, “OK, I’m going to scrap all that and start over because now I’m going to have the funding that I really need to move forward.”

However, what happens is that once the sponsor or funder sees the MVP, they say, “Oh, no. We’re just going forward from here.” You can’t say it out loud, but you want to say, “Wait a minute, no. We built all this stuff based on chewing gum that we just stuck together. This isn’t going to scale.” They continue, “Yeah, this is great! I’m going to schedule a meeting. You’re going to show this to the CEO next week.” And before you know it, it just keeps getting bigger as more people get involved. You can’t go back, and then you’re stuck with what you did in the first place, and now you don’t get the proper investment to do it the “right” way or the way you had hoped.

You really have to set the expectations and have that first sit-down with the executive when you’re planning these things out. You need to say, “Look, I’m just going to be honest with you. The reason none of these things have scaled for you in the past is that you didn’t give the team enough time to build these things the right way and to truly invest in it.” People just don’t want to be honest about that. They’re afraid that the first answer will be, “No,” but I actually think it’s different if you’re honest and forthright and you show a legitimate plan. A lot of data scientists don’t want to do that; they don’t want to show a legitimate plan because it’ll show their steps. It’ll document things, and that causes accountability. But if you’re honest from the get-go, you might actually be able to scale your thing, make it outside of the proof-of-concept stage, and reduce the failure rate.

AA: One thing you mentioned really resonated with me when you talked about trust for data science teams. Something I always advocate for in my teams and mentees is that your goal is not to be seen only as a technical expert but as a trusted partner to the business. I think that is a really important differentiation for people to make in their minds: that they need to be trusted by their peers and senior executives, and not just be seen as someone who speaks technical jargon. I completely agree with you there. They have to speak and understand the language of business and translate their findings into actual outcomes: profit increase, efficiency gains, and so on. That is pivotal in building their careers, especially as they progress up the ladder, but we’ll get to some of that in a moment.

You talked about time pressures, which I think is a really big one. Sometimes, they’re very unrealistic, for various reasons. What are some of the best ways to help teams prioritize their demands, as a leader of a data science team? If you’re inundated with requests from senior executives, how do you prioritize what to take on? Do you follow any particular processes or methodologies around trying to figure out how to triage all of this, continue to add value, but still manage expectations?

CA: It’s always going to depend on the specifics; every business case is different. It depends on what you’re trying to accomplish, the relationships that you have, and how much trust you have between yourself and your team. That’s hard because a lot of us deal with lots of contractors and vendors, too. You’re not always going to have access to 100 data scientists just sitting there all the time. Sometimes, you have to say, “OK, let’s pull from outside resources,” and those people are hard to trust because you don’t really know what their levels of understanding of your operating environment are. When you use internal data and you’re working with vendors, there are a lot more checks that come with that.

That being said, the relationship that the data science team needs to have with sponsors and champions is everything, and it’s everything when it comes to prioritization too. Let’s say you’re working with the chief marketing officer and they have been going to the news and publicly touting this strategic initiative that they’re about to do. Let’s say the feature that they keep mentioning – that you haven’t even built yet – is at the top of their list. You talk to their people and they say, “Oh, yeah! They’re so excited about this, this, and this.” You might have something that’s not on that list that you’re working on, because we all get scope creep, and nobody’s worse for scope creep than inquisitive data scientists. We’re always saying, “Wow! Why did that finding come up?” You have to stop and say, “OK, wait a minute. Is this actually contributing to all of those things that this sponsor keeps mentioning?”

There are things that people tell you are important to your face, and then there are things that people will reinforce are important by the things they say to the public, investors, or other executives. There are also things that people don’t want to admit they actually find important because they’re things that are – for lack of a better term – cosmetic.

They’re not deep and meaningful and purposeful to the business, but they’re things that might give extra oomph to their campaign. You can get little wins like that with your sponsors. That’s how you build trust: by prioritizing the things that matter the most to sponsors and the business.

The last thing you want to happen is that you put all your eggs in one executive sponsor basket and then that person walks out the door. Who’s left as your big sponsor within the company that knows what you can do?

So, a lot of it is politics. Who are the sponsors that you need to attach to, and what are their priorities? What is it within a project that will get the response, “Ah! Let’s keep going so that we can build out the other features”? You have to keep them going. If the excitement fades to nothing because you’re producing things that are a cool feature but not something that was ever asked for, then stop.

AA: Yes. That’s great advice on the practicalities of tying your mission to that of the key stakeholders and decision-makers and their priorities. I think that is vital and may not be what a lot of people want to hear, but that’s the reality of it, isn’t it? If you’re not going to get funding, you’re not going to survive.

CA: Yes.

You’ve got to act like you’re a start-up.

AA: Often, I see issues around how success is measured. What metrics should we be using? How do we know what impact we’re creating with our models? Do you tend to think about any specific metrics? I know it’s a general question because it depends on the organization, but are there any particular types of metrics you tend to use to measure the impact of the solutions being developed?

CA: What was the strategic goal the project was designed to impact? What business metrics are measuring progress against the goal?

Let’s go back to the example of the insurance company from earlier in the chapter. The executive sponsor – let’s say it’s the Chief Legal Officer (CLO) – is concerned that the costs of all the different legal firms they’re doing business with are eating them alive. It is becoming a detriment to the company. You build a model – and an automation capability – with the sole purpose of flagging and refuting legal fee overcharges. To understand whether your model is successful in this example, the question you should ask is, “How much money did my model save the company in refuted legal fees?” You should be able to go back to the original goal of the business sponsor (in other words, the CLO) and measure against that. You shouldn’t have a separate metric for your model. You have business metrics. Period. Did it have an impact or did it not, and by how much? If it wasn’t much of a reduction (for instance, half a percent, when you’re spending 100 million dollars on data sources and all kinds of other things), it’s not worth it. You need to stop the project.
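
As a back-of-the-envelope sketch of measuring against the sponsor’s business metric rather than a model metric, with entirely hypothetical figures:

```python
# All figures are made up for illustration; the point is the shape of the
# calculation, not the numbers.
refuted_fees_recovered = 2_400_000    # $ recovered in refuted legal fees
annual_running_cost = 350_000         # $ data sources, infrastructure, team time
baseline_legal_spend = 100_000_000    # $ the spend the CLO is worried about

net_savings = refuted_fees_recovered - annual_running_cost
reduction = net_savings / baseline_legal_spend

print(f"Net savings: ${net_savings:,}")
print(f"Reduction in legal spend: {reduction:.2%}")
# If this came out at a fraction of a percent, the right call per the
# discussion above is to stop the project.
```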

AA: Something we touched on earlier that I’d like to delve a little bit deeper into is data literacy. It’s something that I see as a big blocker in many organizations, especially at senior executive levels: their lack of data literacy can make or break a project. Is this something you’re seeing? How much do you believe in enterprise-wide data literacy and development? How do you normally try to increase data literacy – through training programs, whether off-the-shelf or bespoke?

CA: Data literacy is paramount.

I think data literacy is one of the most important things a company can work on to be competitive. Companies that aren’t data literate – if they’re not using data to drive every business discussion or decide how they’re going to strategically fund themselves – are probably not going to be in business for a long time. They probably have a very limited future versus companies that are data literate. I would say the same is true of senior-level executives who are data literate.

They will adapt their strategies more quickly to marketplace uncertainty – which is something we have seen a lot of since the pandemic and we will continue to see a lot of in the future.

Data literacy comes in degrees. Some senior-level executives may not understand the inner workings of a data science team, but they most likely have an idea of the data they would want to see to understand financial metrics associated with the performance of the company. If you want to help your team by helping senior-level executives become more data literate, then forget about training programs.

Instead, while in the midst of tackling a big data project for the senior executive, take extra care to communicate about the processes, tools, and key members of the team. You are the trainer of the senior-level executive that sponsors your major AI initiative. Educate them.

Assign one person from your team whose sole focus is to invest their time in educating that sponsor about how the data science team works, what data they use and how they get it, and the process they undertake. That is the best way to train senior-level executives, because it is the only way they will be vested enough in a data-related outcome to care. Hopefully, that doesn’t come off as harsh, but senior-level executives have a lot to do in a day. Being data savvy only for data savvy’s sake isn’t going to be on the agenda. But being data savvy so they can understand the risks (meaning the risks of a project’s failure to launch or of no ROI) associated with an AI project they are investing in…well, that’s worth it from their perspective.

I don’t encounter a lot of non-data-literate companies anymore – meaning companies where the entire workforce has no idea how to use data. If you’re in a high-impact industry and you don’t know data, I don’t think you’re going to be in business very long, and we have honestly probably seen that weed-out happen quite extensively during the pandemic. Especially in a world where everything is digital, you need customer data, market data, connected-device data, and data of all kinds. This is an essential part of doing business these days.

AA: Yes, I think the definition of data literacy has expanded somewhat. It’s also about having a conceptual understanding of analytics, machine learning, and AI, at least at a very high level. I think that’s become important because some organizations, as I’m sure you see, get swayed by a lot of vendors who are still on the Kool-Aid.

CA: Right.

You’re talking about degrees of data literacy. There are the basics, and then there’s how not to get snowed by a vendor.

I think the procurement part of that is definitely a problem. It’s so much of a problem that I helped the World Economic Forum put out a piece for senior Human Resources (HR) leaders to help them identify the things they should be asking vendors who come in and try to sell them an AI hiring capability of some sort. I think those are good ways to tackle that level of data literacy. If people are having problems understanding what they should be asking vendors, I think that’s a whole different thing. It’s a different level of data literacy, but it’s definitely a problem.

AA: Yes. A great point.

I’d love to hear your thoughts on what makes a great data leader: someone at the senior level – say, the CDO level or the data team leadership level. What are some of the aspects that you see in good, strong data leaders?

CA: I had to do this back in 2013. I was asked by IBM to put together the very first CDO community to promote getting a centralized data analytics function up and running inside of the organizations we did business with – we didn’t even know what to call that role at the time. Was it a CDO, was it a chief analytics officer, or was it a chief data and analytics officer? There was no real precedent yet, as there were only three CDOs in the marketplace at the time.

Think about the Fortune 500 companies that you know out there. For any given company, IBM was selling data and analytics capabilities to the finance, marketing, operations, and IT departments of these companies. The IBM sales reps saw some overlap in what departments were purchasing. They finally just said, “We need one person. We’re doing some of the same types of sales over and over, but we have little bits here and little bits there in different departments. We need to create a role that can oversee larger, strategic data and analytics implementations.”

I thought this was massive hubris on IBM’s part at the time. I said, “What? You don’t just go create a role in the marketplace!” They replied, “Oh, yeah! We’ve done it before. The CIO, back in the mainframe days: we had to create an information officer to run all of these systems and mainframes within the financial institutions and NASA.” I said, “OK. I guess we can do that, then.” They said, “OK. I want you to treat the CDO – the chief data and analytics officer, whatever you’re going to call it – like your product.”

At that time, I thought about all the different data leaders that I knew from working at Verizon, at Citi, and across all the different functions within a company, from marketing to finance. I really had to look at what makes a great data leader, because I knew I needed to find those who were doing the best possible things and hold them up as examples to the marketplace. Then, I’d have to get the CEOs of all the biggest companies, pull them together into round tables, and show them the great things that these people were doing that would actually make them want to consolidate data and analytics inside their company and not just have 50 other people trying to access data. Otherwise, we’d reinvent the wheel for data access for marketing, finance, and every other department inside their companies. That’s nuts! They all want the same types of stuff. There should be one person in a company doing this stuff.

The answer to your question is that the best data leaders are the ones that are asking the highest-level questions in a company and are attached to the incentives and strategy of the company as a whole. They think strategically at the CEO level about how they use data to reach the company’s goals. That’s what they do. Plus, they’re politicians.

The worst data leaders are really good at their actual data job, but terrible at managing the minefields of relationships with CIOs or CEOs.

That or they’ll drop the data package at the doorstep – real or virtual – of senior executives and run in the other direction, as opposed to having full conversations and welcoming and educating people, almost as business counselors.

You know school guidance counselors? Well, a cross between a business guidance counselor and a politician is what I would say most good data leaders are. They just happen to also know how to wield data to meet strategic ends. The best CDOs carefully navigate political minefields, form relationships with all the different C-suite roles, and then come away with sponsors and champions to undertake strategic data initiatives – because usually, they’re not the ones with all of the money to do all the projects. They make friends and influence people so their data analytics teams – federated with other departments or centralized under them – can shine.

AA: What’s your transition into being one of the global data leaders been like? Have you found that challenging? I’d just like to get a better feel for what it’s been like for you.

CA: I was incredibly frustrated because I had a vision. Since CDOs were my product, I had a clear picture of what CDOs needed to be and when they needed to be that. I started with two data officers in the beginning. They would have a career progression that started with data governance. They’d pull all the data together inside companies (this was back in 2013) and try to make all of that different data work. That was the reason we were making the role in the first place and trying to get CEOs to demand it. We were putting out job descriptions around what CDOs should be doing so that hiring firms could go out, hire the people, and start them in an organizational structure at the company.

With that in mind, the maturity progression was supposed to be from data governance to business optimization and then to market innovations. For instance, business optimization would be looking across the business and doing more operationally sound analytics to help businesses cross-sell to customers and things of that nature. A lot of companies do that now. Back in 2013, that wasn’t necessarily happening at the ubiquitous level that we see now. Finally, you’d get to the level of market innovation where the company’s data could become its own source of revenue through the release of data products and related services. You get to a point where your product is internal information that can then be wielded as some sort of a monetization plan for your internal data and practices. That market innovation part of the career progression was and still is an area that many data analytics executives just couldn’t reach, or by the time they finally got there, it had been usurped by a different C-level role, such as the chief digital officer, chief innovation officer, or chief data scientist.

On a second level, I feel like there’s been a split of duties. There’s a group that stays focused on data. Sometimes they stay in the IT department, and sometimes they’re separate. Then, we’ve got chief analytics officers doing more on the business optimization front for the business. Market innovation got taken over somewhere along the path by data science pods. That’s what I call the little teams that work underneath each sponsor as needed on different initiatives. They can often work for digital officers. Sometimes, they’re straight-up working for IT, but a lot of times, they’re vendors being contracted from outside the company to come in and work for specific sponsors, such as a chief marketing officer for a specific project such as an AI social media command center.

I thought that this would be a good progression, but now I see my peers are stuck with data governance or internal business analytics. I’ve moved into AI and I want them to come with me. I’m trying to bridge that gap currently to show them why they’re relevant to AI, because a lot of data and analytics professionals are still not willing to move into that market innovation space.

AA: That’s a great point. What advice would you give to someone who’s transitioning from a technical role to a senior data leadership role? Say that their career aspiration is to go from being a hands-on data scientist to eventually becoming a CDO. What advice can be given on that trajectory with regard to skills, development, or education?

CA: Networking is vital.

Go out and start networking and politicking with as many of the senior leaders within your company as possible. Start forming relationships and understanding what drives them and what’s most important to them.

Also, understand where the landmines are as far as data goes inside the company. What are the types of things that senior executives will shut down? What will cause people to start pulling into themselves when you push too far? It means being a politician, having data charisma, knowing where you can help and where you can’t help, and knowing the CEO and their strategies. Every single one of those C-suite members is going to interpret that main corporate strategy into a strategy for their individual departments, and then you’ve got to figure out how you’re going to be relevant to those departments and recommend projects based on their goals.

AA: As for formal studies, you have an MBA in strategy and business development. Do you think extra, formal studies are important for career progression to leadership roles, or can a lot of it be done through intuition, hands-on practical learning, and maturity in terms of skills development?

CA: That’s a tough one because I think everybody learns differently. What comes naturally to me may not come naturally to others. I’ve definitely learned that. I’m more of an extrovert. I think CDO roles are made 10 times harder if you’re an introvert and you just enjoy working with the data. I will also say that maybe you don’t always have to be a CDO to enjoy and be fulfilled by your work. For some people, it’ll stress them out too much to try and move into that type of role. If it’s stressful like that, maybe the role’s not for you. One of the hardest things to do is to figure out what makes you happy, because you may get to that chief data analytics officer spot after learning all these things and doing the activities and just think to yourself, “I am so freaking miserable. Just get me back on the team. Plug me back into the matrix.”

To answer the question more directly, if you are intent on becoming an executive in the data, analytics, and innovation field, you will definitely need to understand how business leaders think. I do not think you need to go to business school if you have a computer science or other degree…but you will definitely want to go and talk directly with business leaders in your company to understand their goals and any political struggles they are having within the company. You will have to become highly adept at building relationships, so the development of interpersonal skills will be key. The book How to Win Friends and Influence People by Dale Carnegie will probably be something you will want to read if you feel you have no natural intuition or proclivity when it comes to building relationships inside organizations. I would also rely on employees who have a high emotional IQ and a lengthy tenure inside the company to help me navigate the political landmines. These don’t have to be your direct employees, but set up a regular lunchtime or something of this nature with them to understand more about what they are seeing.

AA: Final question: when you’re hiring (although you’re probably too senior to be involved with data scientists and data engineers at the hiring stage), what skills and attributes do you normally look for in people? What do you expect them to excel at in terms of the technical stuff, such as coding skills, curiosity, and attention to detail? Is there a list you go through in your mind of what you look for in people?

CA: I think this is going to be consistent with what I’ve been saying, but I can’t teach you the personality aspects that I’m looking for. I can just teach you a skill – Python skills, for instance.

The main thing is having people who are flexible and adaptable in their mindset, because some data engineers and data scientists can be very inflexible.

What I mean is that once they are told what they will be working on, then if something changes, they will become frustrated if they cannot work on the project as outlined at the beginning. Data science is highly experimental, and internal and external clients of data science projects change their minds constantly about various aspects of what you might be building. You need people who are comfortable with being flexible and adaptable and also willing to roll up their sleeves and pitch in with every other kind of job, such as data sourcing and pipelines, investigating data, cleaning up the data, and putting the data into various formats. The more data and analytics skills they have, the better for being flexible.

Data scientists need a positive, experimental mindset.

In data science projects, you’re going to be flinging things at the wall. Much of what you do will fail as you experiment. You’re going to be going back and forth, trying different things until they produce outcomes the team likes. You have to be comfortable with this sort of frenzied, iterative style of working. Even when you’re frustrated, you have to be able to articulate what you’re frustrated about. A data scientist can’t just stew because they’re not going to get very far and they’re probably going to burn out on the project.

Data scientists have to have a team-player mindset, but they can’t be pushovers in the process either.

They are going to be in what I call the “AI pod” environment. This is where everyone’s work from the time the data is sourced and extracted to the point where the model is developed and released through an app or an API (data stream) can be thought of in a circular process structure. The work you do in the model testing part of the process may depend on the work of the person whose process came before you, such as the data engineering process. To go fast as a team member, you will have to work iteratively with people both directly in front of you and behind you in the process. They will need your help and you will need their help as you iterate to bring the project to successful outcomes. You’ll be saying, “That didn’t work. Let’s try again on this training data. Can you reset this such and such for me?” Then, someone may come to you about your work and say, “The model you built isn’t working with the application environment; could you try such and such?” I also wouldn’t want to work with data scientists that don’t stand up for themselves. If they truly believe that they are doing something the correct way, I don’t want them to back down when challenged just to save hassle or time or because they do not want to be confrontational to a peer or team leader. This is especially true if there are ethical concerns that could put the validity of a project at risk. You have to be able to stand up for yourself.

So, it’s all personality stuff. Very rarely would I turn someone away on skills. If you have the personality and the skills, that’s best. I can toss you right in and you can get going. But I would expect that people can learn the math, Python or R, and data science skills, all of which can be taught. At IBM, we did a nine-month program with Galvanize.

But what you want and can’t teach is a personality of being inquisitive, constantly learning, being able to work in teams, being interested, and connecting the dots constantly.

I can’t teach data scientists to want to investigate data, people, or the places they come from. I can’t teach that.

 

Summary

I greatly enjoyed hearing about Cortnie’s career, and her trajectory to becoming a leader in the field.

In developing successful data science capabilities, I agree with her advice on needing to align data and analytics to the broader business strategy.

Cortnie made it clear that successful data leaders need to be politically savvy in a business context, and that CDOs need to be “really good politicians” – and have strong negotiation and influencing skills. The people and politics elements can make or break a data science initiative in my experience. Cortnie and I agree that senior leaders also need to have data charisma – the ability to influence and negotiate with data – beyond strong data literacy skills, which we also discussed.

We discussed the challenges organizations face developing responsible and ethical AI and the reasons that AI ethics boards can fail. Cortnie pointed out that it’s difficult to find people with the appropriate skills and diversity of opinion and background to sit on these boards. That difficulty can set a board up for failure.

Cortnie outlined her 12 guiding principles (The 12 Tenets) for how organizations can develop ethical, fair, and trusted AI solutions. One of the key themes that emerged was that people need agency in the data-driven decisions that affect them, with the ability to review and provide feedback on the data used in the decision. It’s the responsibility of everyone – not just the tech developers – to help build responsible and socially aware AI solutions. I encourage you to think about how you might apply the 12 Tenets to your own work.

She also stressed the importance of listening to staff who raise objections about processes or projects, with staff at all levels being empowered to speak up and raise concerns, such as ethical issues.

One of the most important topics Cortnie raised was how to work out what is important to key stakeholders so you can ensure you develop solutions that meet their requirements. This can be non-trivial and challenging. Her advice was to try and work out what they actually want and need from the following:

  • What they tell you
  • What they tell others
  • What they won’t admit – such as their career motivations and political machinations

She also suggested identifying who you need to get on board as key sponsors, and figuring out how to attach and align your work to their priorities.

Cortnie also offered some valuable advice for designing business metrics that measure progress against strategic goals. In my experience, this is context specific, as designing good business metrics in the public sector, for instance, can be more challenging than in the private sector. One reason is that sometimes the metric you’re optimizing is a second-order effect – for which you may not have suitable data – such as providing actionable and timely intelligence to another government agency.

For data scientists, Cortnie suggested not making the mistake of thinking that career progression only means moving up into leadership roles – and thus becoming less “hands-on” with the technology.

She looks for the people she hires to have attributes such as a flexible and adaptable mindset, enjoying experimentation and exploration, and being a team player.

About the Author
  • Dr. Alex Antic

    Dr. Alex Antic is an award-winning Data Science and Analytics Leader, Consultant, and Advisor, and a highly sought-after Speaker and Trainer, with over 20 years' experience. Alex is the CDO and co-founder of Healices Health, which focuses on advancing cancer care using Data Science, and co-founder of Two Twigs, a Data Science consulting, advisory, and training company. Alex has been described as "one of Australia's iconic data leaders" and "one of the most premium thought leaders in data analytics globally". He was recognized in 2021 as one of the Top 5 Analytics Leaders by the Institute of Analytics Professionals of Australia (IAPA). Alex is an Adjunct Professor at RMIT University, and his qualifications include a PhD in Applied Mathematics.
