3 key steps to ethical data collection

by Paul Clough

The first part of our data ethics series examines the importance of consent, confidentiality, and intent when gathering data.

When collecting personal information, the GDPR principles state that organisations must act transparently and with consent, collecting data only for the explicit purpose it’s needed. They also put strong legal protection on sensitive, identifying information, often referred to as PII, or personally identifiable information.

When GDPR was introduced, organisations were quick to meet this legislation’s requirements, with the threat of serious financial repercussions if they failed to do so. But there’s still more they can do to serve consumers ethically and responsibly during data collection. 

Before you begin collecting data, have you considered...?

1) Getting consent to collect information

Seeking consent is the most appropriate way to legally collect information, while giving customers genuine control over their data. 

While consent isn’t always required (such as in cases of legitimate interest and/or legal obligation), the GDPR suggests that consent be given to collect data for an explicit and stated purpose. Even without consent there still needs to be clear and comprehensive information provided about how personal information is used. 

Unfortunately, some companies also resort to manipulative user agreements to get the consent they need, but it is not always consent a participant is happy to give. The value of consent is diminished when it becomes a condition of service. 

Ask yourself: 

  • Do you have permission from users or participants to collect their data? 

  • Have they been made aware that their involvement is voluntary? 

  • Is it clear that participants are free to withdraw from any active data collection programme at any point without pressure or fear of retaliation?
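
As a rough illustration of the questions above, here is a minimal Python sketch of a consent record that is tied to one stated purpose and can be withdrawn at any time. The names `ConsentRecord` and `may_collect` are hypothetical and are not part of any library or of the GDPR itself. A real system would also need to store what the participant was told and how consent was given.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ConsentRecord:
    """Hypothetical record of one participant's consent for one stated purpose."""
    participant_id: str
    purpose: str
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    withdrawn_at: Optional[datetime] = None

    def withdraw(self) -> None:
        """Mark consent as withdrawn; the first withdrawal time is kept."""
        if self.withdrawn_at is None:
            self.withdrawn_at = datetime.now(timezone.utc)

    def allows(self, purpose: str) -> bool:
        """Consent covers only the purpose it was given for, and only until withdrawn."""
        return self.withdrawn_at is None and purpose == self.purpose


def may_collect(records: List[ConsentRecord], participant_id: str, purpose: str) -> bool:
    """Return True only if this participant has live consent for this exact purpose."""
    return any(r.participant_id == participant_id and r.allows(purpose) for r in records)
```

Checking `may_collect` before each collection step means a withdrawal takes effect immediately, and consent given for one purpose is never silently reused for another.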

2) Protecting users’ confidentiality and anonymity when collecting data

Customers will often opt in to data collection under the assumption that the information collected remains confidential and any published findings are anonymised. If you do need to break confidentiality at any point (or suspect that you will do in future) then make it clear at the start of the process.

Where possible, avoid collecting personally identifiable information (PII). Good practice might be to design your data collection methods so that they can’t be reverse engineered to reveal subjects. However, it is also possible to identify people by merging separate datasets that each hold just a few personal pieces of information about them.

Ask yourself: 

  • Do you really need to collect PII at all?

  • If yes, have you taken steps to de-identify the dataset by removing all PII before analysing or sharing the insights?

  • Have you considered how different data points could be used in conjunction to reverse engineer identity or identifying characteristics?
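
One way to test the last question is to count how many records share each combination of seemingly harmless fields. The Python sketch below does this using the idea behind k-anonymity, where every combination should be shared by at least k people. The function names, column names and sample rows are made up for illustration, and passing this check does not guarantee a dataset is anonymous.

```python
from collections import Counter


def _key(row, quasi_identifiers):
    """Combine the chosen fields of one row into a single hashable key."""
    return tuple(row[q] for q in quasi_identifiers)


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same field values (0 if no rows)."""
    counts = Counter(_key(row, quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0


def risky_rows(rows, quasi_identifiers, k=2):
    """Rows whose combination of field values is shared by fewer than k rows."""
    counts = Counter(_key(row, quasi_identifiers) for row in rows)
    return [row for row in rows if counts[_key(row, quasi_identifiers)] < k]


# Made-up sample data: no names or emails, only two "harmless" fields.
rows = [
    {"postcode_area": "S1", "birth_year": 1980},
    {"postcode_area": "S1", "birth_year": 1975},
    {"postcode_area": "LS2", "birth_year": 1980},
    {"postcode_area": "LS2", "birth_year": 1975},
]

alone = smallest_group_size(rows, ["postcode_area"])
combined = smallest_group_size(rows, ["postcode_area", "birth_year"])
```

In this sample, each field on its own is shared by two people, but the two fields together single out every individual. That is the effect to look for before sharing or merging datasets.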

3) What do you intend to do with the data you’re collecting?

While it can be hard to know the purpose or value of data in advance, the GDPR supports the practice of purpose limitation. This means organisations shouldn’t operate with an intention of gathering as much as they can, to be used for an undefined purpose, at an undetermined point in the future. Additionally, there will be some information you cannot retain for more than 12 months. 

Minimum viable collection is a strategy which relates to the issues of anonymity and intention. This method encourages organisations to only collect the data they absolutely need to ensure a result they want or a trend they aim to understand. This is sometimes referred to as the data minimisation principle. 

In practice, this can be difficult to implement, as it’s not always possible to know every purpose in advance. Collecting more responsibly means thinking critically about each data point you plan to collect. 

Ask yourself: 

  • How will this data contribute to my overall aim? 

  • Could this data point be used in conjunction with others to reveal PII? 

  • What would the results mean for my overall predictions or aims?
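
Minimum viable collection can be sketched in code as an allowlist, where every field you keep must be tied to a written purpose and anything else is dropped before storage. The field names, purposes and the `minimise` function below are hypothetical examples, not a prescribed schema.

```python
# Hypothetical allowlist: each collected field is tied to a stated purpose.
FIELD_PURPOSES = {
    "order_id": "fulfil the order",
    "delivery_postcode": "calculate delivery cost",
}


def minimise(record, field_purposes=FIELD_PURPOSES):
    """Keep only fields with a documented purpose; report the ones that were dropped."""
    kept = {name: value for name, value in record.items() if name in field_purposes}
    dropped = sorted(set(record) - set(field_purposes))
    return kept, dropped


# Made-up incoming record containing one field nobody has justified.
incoming = {"order_id": 1, "delivery_postcode": "S1", "date_of_birth": "1980-01-01"}
kept, dropped = minimise(incoming)
```

Here `date_of_birth` is dropped because no purpose has been recorded for it. Adding a new field then requires someone to write down why it is needed, which is the critical thinking this step asks for.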

Are you better at data ethics than Amazon’s Alexa?

There are endless examples of questionable personal data collection practices. As a case in point, Amazon received a rash of negative headlines following a 2021 lawsuit, which accused the company’s Alexa smart speaker of secretly collecting and storing user data. 

The extent to which Amazon processes data about its users for purposes such as personalisation has not always been clear. A paper published in 2022 by three US universities suggests Alexa collects sensitive voice and biometric data and shares the insights with as many as 41 ad partners.

Top tips to improve your data collection practices

The next article in our series will focus on the ethics of data storage, with practical tips you can follow. Or you can download the full report containing the full series below. 


Paul Clough

Head of Data Science and AI
