Discussions on data ethics: how do you work with data more ethically?

We introduce different ways of thinking more ethically and using data responsibly to make increasingly better decisions for all. In this episode, we cover improving your data literacy, the role of unconscious bias in our everyday choices and how to root out poor practice early.

Nathan Makalena - NM

Paul Clough - PC

Lucy Knight - LK

NM - Hi, my name is Nathan Makalena. I'm joined here today by Professor Paul Clough, Head of Data and Insights at TPXimpact, and Lucy Knight, a trainer at the Open Data Institute and Data Lead at the social enterprise the Data Place. Thank you both for joining me.

Both - Thanks for having us.

NM - We're doing this recording in support of a document that's coming out: ‘The Data Ethics Journey’ which looks at ethical ways to use data across the entire data lifecycle from creation to destruction. My first question, I guess, is for you, Paul, what was your intention with putting this document together and looking at data from a complete perspective, from every stage that it's used through collection, storage, and so on?

PC - I think part of the reason for putting it together was that data and AI ethics have become really important topics in recent years. And I just think we need to do all that we can to educate people, and to raise, you know, awareness of the topics. So part of the reason for producing the report was very much to give us something that we could share with people to help raise that awareness. I think the other thing as well is there's been a lot of media attention around AI ethics in particular. I think we need to remember that not everybody interacts with AI, and they're not all producing analytical tools or using machine learning. But actually, ethics is still really important. So I think looking at the data lifecycle enables you to actually apply that to many, many more situations. And actually, I think it might mean that it's applicable for everybody, not just, you know, the data professional or the analytics person.

NM - I think the application of ethics is an important part of the document throughout. Having somewhere to understand these sorts of questions, and put them into application at your business or at your organisation, is quite important. Is there anything you'd like to say on the accessibility of literature around data ethics that exists already?

PC - I think, actually, what we need is just a range of resources. So we have certain kinds of organisations that we work with, who maybe we can give the resources to, and, you know, highlight the work of people like the ODI. I just think, actually, the more resources you have, hopefully, it means that something is going to hit with that person or that organisation, for them to go off and carry on that journey.

So I think, again, for us, there is a lot of material out there. I think the challenge is applying some of this. You know, I think it's great to read around data ethics, but one of the things I struggle with really is how do you actually implement it?

‘I’m an organisation, I'm an individual, I've read about data ethics, I know there's a framework, I might have done the ODI training course, but how do I actually put it into practice?’ And our report doesn't necessarily tell you how to do it in one sense. But I think it does give you a few sort of practical steps. And I just think the more that we can help people with some of those practical steps, the better really.

LK - Yeah, just to chip in on that practical application, where people are saying, ‘this is all well and good, but what material is going to work for me?’ As trainers, we've seen this a great deal, especially during lockdown, when we adjusted to delivering some of what was in-person content remotely: not everything works for everybody. Not everybody wants to read a document, not everybody wants to go to a web page.

People have their different preferences as to how they learn, as to how they make things relevant for them. So having something where you can say, actually, you know, it doesn't matter if you're still working on multiple spreadsheets, or if you actually are looking at machine learning, these principles hold true, right across that range of tools and techniques. Here are the things you should consider.

I think the document's focus on ‘this is what you should do, this is why, this is what it means to the people that you might be working with’ is really helpful. Because whichever end of the technical expertise spectrum you're at, or even if you're somewhere in the middle, you can still look at that and say, this is in plain English, this says I should worry about this, this and this, or else this might happen. This is a good way to start.

PC - I think sort of following up on that, you know, we should start thinking of data ethics as a journey. I think sometimes it can be turned into a tick box exercise. And I think the danger with that is that you don't actually change the culture. And actually, that's kind of what you need to be changing.

For me, when I think about data ethics, a lot of it actually is about raising questions, identifying risks, challenges and so on. You might not have all the answers, but it's about getting into the mindset of thinking that through, you know, ‘actually data can be used for good, but it can also be used for harm. What could those harms be?’

Although you might not intentionally go out to do any harm, inadvertently, you can cause great harm and issues. Certainly what I learnt from doing the course with Lucy is about getting in that critical thinking mindset, that kind of really questioning: ‘why are we doing what we're doing with the data that we have? Are we using it in the right kind of way?’

To be fair, there is just a general lack of awareness, to be honest, even about data. I think most people don't understand what data companies have on them, how it's being used, how it's being shared. Part of that is around things like consent: often you get ‘tick this box’, or a cookie or whatever. And it can be quite a long piece of text. Who's ever going to read it? People just click through.

And I think there's a whole awareness issue, not just about, you know, what the impact of AI can be, but also just about data more generally. And actually, I think that's a much bigger, foundational sort of area that needs addressing.

LK - Unfortunately, it becomes an issue when an organisation realises that either they themselves are doing some stuff wrong, and that needs to be addressed, or they're working with clients for whom it's a concern. And then they will make sure that, you know, people get a thorough grounding.

My teenagers and young adults are very, very well aware of what and where, and how and how much they should share. Probably far more than I would have been, you know, 20 years ago. They are quite well aware. And even if they're not handling data, per se, and they're not doing analysis, they still understand about sharing, appropriate and inappropriate sharing and appropriate and inappropriate use.

Because they've grown up in that digital world, with it just being at their fingertips, they understand that not everybody is who they say they are. They understand that people can get your information and do bad things with it if you're not careful. They understand that there will be forums and worlds online that will try to reach you and want to take advantage of you, commercially or otherwise, and that you have to have boundaries around that. They actually understand all of that very well. Probably better than, you know, a lot of the older generations.

And you didn't have to bring data into the equation for that to make sense to them. So this is the thing about always making it a technical discussion - it almost feels to me like a citizenship lesson. You know, with responsibilities come rights and vice versa. You don't always have to make it a technical discussion. I think that is what will reach more people.

NM - People being switched on all across the business, really. I liked what you said about this sort of citizenship test because what we're talking about is almost getting people more active and more conscious of the decisions they make, or the roles that they play.

If I can shift the discussion slightly, one of the big issues we talk about here is unconscious bias amongst individuals. I mean, we're asking people to be more involved and switched on. How would you go about addressing problems like that - ones that they might not even realise they're causing?

LK - Yeah, I mean, you can't know what you don't know. And my feeling is, what I've seen work, is to ensure that you have some diversity of experience in the room at every point.

Diversity of background, diversity of perspective, diversity of lived experience. Because you can be a wonderful, well-meaning, warm and genuine human being and yet cause harm, because you simply aren't aware that, for example, pregnant people need to sit down more often. Or that people with foreign-sounding last names might get charged more for their car insurance.

It simply hasn't happened to you. And you cannot have empathy for a situation you've never experienced. You've never even known it was a thing. And a great many of us are completely unaware of our privilege. We have never ever faced a scenario where we were knocked back or put down because of our last name, or the colour of our skin, it simply hasn't happened.

So to make sure that you've done everything you possibly can to avoid that unconscious bias creeping into your work and causing harm, surround yourself with people who argue with you. Surround yourself with people who say, ‘Well, that's nice for you. But that's not how it was for me growing up’. Surround yourself with people who walk behind you, look at your screen and go, ‘God, you're not doing it that way, are you? What about this, this and this?’

And if you’ve not got that culture or you've not got a big enough team, then find yourself an external reference group and pay them for their time. That's the other thing, unpaid emotional labour. Just no. Find people who know about this stuff. Ask them nicely to be your critical friend and pay them for their time and expertise.

This is not rocket science. You have to be open to somebody saying ‘I see you've worked hard on this, but it's not going to work. It's not going to fly.’ You have to be open to that. And that's back to accountability: being able to take that message, put your ego aside and work with it.

PC - Yeah, definitely. On the unconscious bias, it's partly about that. I've recently gone on a couple of training courses, which were really helpful, because to be fair, I'd not really thought about it before. Actually, it's all about: How do I view people? Do I form stereotypical views? How do those shape the decisions I'm making?

I think what really nailed it home for me was that the course I went on was very tied up with one of the jobs that I'm doing. And so it really brought it home: ‘At this point, when you make a decision, are you thinking how this could go wrong? Or could you be forming this view, which might mean you’re more likely to make a decision like this?’

And I just never thought of it. Sometimes you just need it pointed out. I think for most people, that's the whole point, it's unconscious - you're just not aware of it. But I think as soon as you make people aware, it really makes you think.

I've approached the tasks I'm doing in a bit of a different way. I'm saying, ‘Oh, maybe in the past I have introduced some form of bias. What could I do now that’s a bit different?’ It just makes you stop and think a little bit.

I guess the hard thing is that we all have unconscious biases. And you can't eradicate them. You can try and mitigate some of them. I guess for me, a lot of this is the education piece: bias in data, bias in collection, the bias that can come in from algorithms and AI. It’s just helping people become aware of it. So I think, for me, that is key: data literacy training, and so on.

LK - And empathy training. Empathy training is a big, big deal to help somebody understand what it's like to have a specific physical condition, or a disability.

Actually having to wear goggles that obscure all of your vision, or headphones that block out all sound, or weights on your ankles to simulate having, you know, some mobility issue, even for half an hour, can really change your view.

And the understanding that your unconscious bias may not even be something like ‘oh, well, it turns out, I subconsciously distrust people who went to a different school.’ To me, it can be something as simple as it turns out that my forms are inaccessible for 90% of the people who use them, because I simply haven't thought about somebody who needs to use a screen reader - it's outside of my experience.

So to have that understanding, to actually be put in that situation, so that you do have an understanding of how what you're doing is making somebody else's life harder, or making decisions that adversely impact that particular group - that can only be good. For me, it feels like it comes back to citizenship again, how to be a good human.

NM - We've talked a little about data literacy in the public in general. But do you worry that some of these steps are just putting the onus on the individual? Like you said, having to trawl through menus and different sites to get to something you want?

Take, for instance, the ability to retract your data. How would you inform someone of a shift in what it's being used for, or how it’s used? At what point might they think, ‘Hey, this is too far’, or, ‘Hey, I need to intervene here and take myself out of this system’.

LK - There have been discussions about things like data trusts, though we're still not entirely sure what those necessarily mean, and what kind of legal structures those might take. But there's definitely something there about: ‘if I add my data to this data store, then I can pre-set what I'm happy for it to be used for. And if those conditions change at any time, I'll be notified because my data is there. It's only there. And it's only shared under the conditions that I've pre-set.’

That's all well and good. But it demands a certain level of adoption, before we reach a critical mass, where it is easier and more lucrative for advertisers to sign up for access to that kind of data store structure.

Going back to what you said about data literacy, when the ODI talks about data literacy, we don't necessarily mean educating people about what happens with their personal data. We have a framing for it, which is more in line with what you might be taught in school as language literacy.

If you're doing your GCSEs, you have English language - how you communicate - and you have English literature - examining the ways in which people have used words to make you feel a certain way.

Data literacy, for us, is similar to language: giving people the critical skills to look at the ways in which data is being used at them. To examine a chart and say, ‘Is this well constructed? Is somebody fibbing? Have they messed about with the axes? Or the scale? Do these numbers look right? You know, does this population sample really add up to 157%?’

It’s critical thinking and being able to extract that meaning. These are the numbers and this is what this person wants me to think about this. When you see stuff in the media, someone says, you know, ‘3.4 billion, blah, blah, blah’. Is that a big number in context? Or is it £1.25 for every adult in the United Kingdom?

PC - I think that point about how you read data, you might think that's only for the people who are reading spreadsheets and so on.

But when we were sitting during the lockdown, watching those presentations from the government, and one after another, after another chart was being thrown at us, I turned to my wife and said ‘Do you understand what that chart is telling you?’ She would say ‘no, no, no, I've not got a clue’.

And I imagine that most of the population were looking at those charts thinking: ‘I don't really understand what those are conveying. Is that good or bad? Well, the line's going up, I'm guessing that's probably not good?’ But their understanding of those numbers would form some kind of decision about going out.

And you kind of think, ‘well, is this data literacy stuff actually affecting anybody? Is it important?’ Absolutely. It affects every single one of us.

And it isn't just during the big crises and pandemics that it is important. It’s right across the board. I've seen examples where people in hospitals are presented with information that they cannot understand, and that information is about them and their situation, or it's being conveyed to family members.

I myself have done a study where we found that a lot of people, well educated people and people who worked with data, could not fully interpret a bar chart and other types of visualisations. We assume everybody can read a pie chart, a bar chart, whatever. It's not the case. And partly it's down to presentation: which axes you choose, how you lay it out, and so on. So part of the responsibility is on the people creating these visualisations and tools and so on.

But it's also down to the user as well. And so again, data ethics and literacy is for everybody. And this is why it's so important that we need to get it out there.

LK - It's hand in hand. I always say to my students, you know, that data visualisation is not a technical medium but a communications medium. It's telling a story. There should be a narrative around your chart. It should be quite clear at a glance.

Complex visualisations may be wonderful and beautiful. But if they're not helpful, then you've failed. If anybody ever picks up one of your charts, and I have had this happen to me, turns it upside down and goes ‘What's this then? What are we looking at?’ then you’ve failed.

One of the other soundbites I use a lot for people is, ‘if you had an answer, what would it look like?’ If you had something that somebody could look at and at a glance understand what's happening, what would it look like? What shape would you give it? What chart type?

It's about thinking in terms of narrative and the story, rather than just showing people the numbers. And different audiences need different things. And sometimes the same person needs different things at different times.

PC - Nathan, you just mentioned that word holistic. And I think this is what it's all about. Really taking that more holistic view.

You can be a great data visualizer, you can be a great machine learning engineer, producing the best of the best. But do you understand the wider context? And for me, context is absolutely key: context of creation, context of use, and so on. Because without that understanding of the context, you can't engage with all the issues, you can't think through all the problems that can occur.

And context includes who you're actually creating this for. If you're writing an article, you think about your audience. And typically with visualisation you do that. But how many data scientists produce a bunch of stats, reports and so on that they aren't actually able to communicate to the people who then make a decision? It can be great work, but it can become lost.

NM - Absolutely. Because I think a lot of people's experience with these things is in the abstract, really. My familiarity with graphs isn't usually to do with processes that I'm involved in directly. But what we're talking about today are decisions that affect people, decisions that have a very real impact on people's lives. So I guess, back to the idea of problem articulation: articulating all aspects of why you're using the data, what the results are and what you have found is really important.

PC - There's a term that’s being used quite a lot: responsible AI. And I think that's really good, because I think that really does push it back onto the people doing it. ‘Are you being responsible, in terms of doing what you're doing? Not just professionally, but ethically and everything else as well.’

And I just think that is going to be something that we see. I know there's R.A.T - responsible analytical tool - that the government or ONS have proposed. Again, I think it really uses that term and the notion: are you actually able, are you technically capable, statistically capable, and so on, to build those tools and products?

LK - Yeah, the use of language is interesting. Saying ‘Is x ethical?’ maybe makes people feel like they need some special training, and maybe have an advanced degree. But saying ‘Are you being responsible?’ is a more accessible framing for the majority of people, and it's something that they can genuinely ask themselves, rather than: ‘Am I being ethical? I don't know, what are the rules?’ It's easier for people to see themselves in that scenario, it's got that salience for them.

NM - I think another benefit of the word responsible is it comes into play with both human-based and AI-based decisions. Ethicality is something that you can almost defer to another object that is informing you of these decisions, but you can ask yourself: ‘In this moment, in this action, is this a responsible thing to do?’ Or, ‘I’ve been given this instruction. Would it now be responsible to follow it?’

LK - Yeah. I think I'd agree with that. I'd echo that. We've talked a lot as we've been going through the various questions about being a good human. So I'd probably say, you know, to any business, any individual in that business, you almost certainly are a good human. But you have blind spots. Make an effort to find out what those are.

I would steer people towards the Data Ethics Canvas. Whilst it’s an ODI product, it's free to use as an open product. Go and get it. Make good use of it. Be honest with yourself. Don't lie to yourself when you're going through those questions. Don't gloss over any of them. Because it does require you to be a little bit uncomfortable if you're going to uncover the truth about how you operate and your culture.

It's one thing to say we have these principles. It's quite another to live them. And if you really want to know what your organisational culture is, go and ask somebody with no power. That's how you'll find out. Go and ask somebody who's at the coalface how they would feel about speaking up to a manager.

Your people need to feel psychologically safe to push back on things they feel uncomfortable with. If your culture doesn't allow that, you can have as many glossy mission statements as you'd like. Work your way through the Data Ethics Canvas and ask yourself very, very honestly whether you feel you would be able to live those principles and answer those questions and be okay with looking at yourself in the mirror as you did so.