“There is a political and ethical component to data.”

“Data and algorithms offer many possibilities, but they are not neutral. What you analyse, which algorithms you build, which data you use and what for: these are political and ethical choices. Policy makers are insufficiently aware of this. Furthermore, basic data literacy is absolutely necessary for local politicians.” So says the Dutch data expert Mirko Tobias Schäfer in this interview with Bart Van Moerkerke for the magazine Lokaal.

Mirko Tobias Schäfer. Image by: Stefan Dewickere

Mirko Tobias Schäfer is an associate professor of new media and digital culture at Utrecht University and head of the Data School. He researches the social impact of data practices.

“Data and algorithms are often assumed to be neutral, but that is not always the case. Data are often called the ‘new oil’, but that comparison doesn’t hold. Data are not a resource in the ground; someone decided to collect and store certain data and not others. Data have a political and ethical component. Big data affect our understanding of culture, of citizenship, of democracy. That is why it is so important that the humanities are also involved, that they provide guidance when these practices are implemented, and that they reflect on what is desirable and what is not. I’m a media scholar first, not merely a digital expert.”

Are data always biased?

“No. There are algorithms that have no impact on people, which, for example, predict very accurately which watermill will no longer work or when it needs to be replaced. But a lot of data are not neutral. It’s a good thing that a lot of attention is now being paid to these pitfalls. Authors such as Cathy O’Neil or Virginia Eubanks convincingly demonstrate that there are problems with algorithms and data, that they can be racist and biased, and that they can reinforce unwanted social views. But that negative image shouldn’t make us forget that algorithms and data make things possible that weren’t possible in the past. They provide an opportunity to respond to new challenges, they open up new possibilities to solve problems, they lead to new, multidisciplinary forms of cooperation. We have to realise that they demand political responsibility, and that we have to develop an understanding of how data practices or algorithms work, how a dataset is put together.”

Can you give some examples of how data can be biased?

“The Dutch government’s Leefbaarheidsbarometer indicates quality of life in the Netherlands. The district in Rotterdam where I lived happily for many years got a low score. It now appears that the number of people with a ‘non-Western’ migration background in a neighbourhood is one of the factors that the algorithm takes into account when determining the quality of life. In the Netherlands, too, the police make use of a predictive crime system. The result is a map showing the probability of certain crimes in each area. In one particular neighbourhood the number of bicycle thefts was found to have fallen sharply. Had the police patrols there been especially successful? No, research by Investico showed instead that two police stations had been closed in that area, and people had simply stopped reporting bicycle thefts there. In the United States, an algorithm was used to estimate the risk of recidivism. African-Americans who had committed only minor crimes were assigned a much higher risk status than whites who were guilty of serious crimes. When journalists later investigated who had actually reoffended, the predictions often turned out to be wrong. The basis of the algorithm was clearly racially biased. The Michigan Unemployment Agency launched a new computer system to combat benefit fraud. It was so strict that even a misspelt name, or a wrongly entered date of birth or period of employment, was reason enough to be accused of fraud. 20,000 unemployed people, vulnerable people in dire need of the money they were supposed to receive, were wrongly accused. It is therefore very important to know what data are fed into an algorithm and on which assumptions it was built. To stick to the last example: you could also develop an algorithm that detects people who are entitled to benefits, but who have not yet applied for them.”

So which algorithm you build and which data you use is a political and ethical choice?

“Yes, building algorithms to detect benefit fraud, for example, touches on fundamental issues, on the fundamentals of our legal system and our society. You treat all benefit recipients as suspects in advance. This fits seamlessly into a long history of governments watching poor people and trying to control them. If we build this into algorithms now, will we be able to undo it later? We must develop an awareness that open standards and transparent algorithms are essential for a democracy. Where does Europe stand in this story? On the one hand there is the US, with a libertarian understanding of the market, where companies are already taking over these functions and the algorithms may be trade secrets. On the other hand there is China, where the government directs corporate actions and demands direct access to the data that companies collect and work with. Europe must develop its own technology in which the checks and balances of democracy are guaranteed. If we just buy those technologies from the US or China, we’ll also get the value system that’s embedded in them.”

Do you have the impression that policymakers are sufficiently aware of this?

“I’m a little afraid of two kinds of people: the engineer without an understanding of the social world and the technocrat without an understanding of technology. But as a teacher, I’m an incorrigible optimist and I think education helps. Much more attention is being paid now to the political and ethical aspects of data. Awareness is growing. More and more municipalities, ministries, organisations and companies are asking our Data School for help in looking at the ethical pitfalls of the data projects they want to start up. Mayors, city managers and councillors also need to become much more competent in this area. They need basic digital and ethical skills. Starting a data project is a political decision: what is the problem and how can data practices help us to solve it? Do we want to do it or not? How do we do it, and how do we ensure our values are not stifled in the process? Politicians also need to interpret the results, so they need to know what’s behind the data. They receive a report, an infographic or a dashboard and have to make a responsible decision on that basis. They can use the same data in different ways: they can use them to stigmatise a group or to support a group. Take the example of an algorithm that predicts fairly well which pupils will leave school early. It is not enough for the algorithm to indicate which pupils you should keep an eye on; what matters most is what you do with the information. Which children are they? Are they from single-parent families? Do they have a migration background? Is there a language barrier? What neighbourhood are they from? Who will approach the children and their parents: someone who has a connection with the young people’s world of experience, or a bureaucrat? So there are a lot of non-data aspects involved, each of which requires a decision. It is also important that a municipality communicates openly about this with parents and young people. Can I decide that my son’s data will not be used for that analysis? Is there an opt-out function? What about data security? Suppose a dataset like that gets into the hands of a recruiter. He may have a model with which he can show that someone who was once in the high-risk group of early school leavers will perform poorly at work in the future. He might want to filter out those candidates right away. All of these aspects must also be considered by politicians.”

How does the Data School help municipalities?

“We have developed the Data Ethics Decision Aid (DEDA), a dialogical impact assessment for ethical pitfalls in data projects. DEDA is a process in which policy makers, data specialists, a data protection officer, project managers and content experts sit at the table together. One of the first questions is: do you use an algorithm? The data specialist says: of course. The policymaker’s jaw drops. The second question is whether they can explain the algorithm. The data people can. The policymaker thinks that he does not need to be able to do this, which means that he cannot explain it to the council, the population or the media either. In this way, there is a growing awareness that everyone in the organisation actually needs to have some basic knowledge. The alderman also needs to know how the algorithm works. He does not need to be able to program it, but he must understand the logic, and must be able to stand behind and explain the decision model of the algorithm. Throughout the workshop all kinds of questions and pitfalls are discussed. This informs good decision making because, for example, participants learn that they don’t need to know everything, but that they have to be careful with data, collect only the data they really need, protect them well and so on. In this way, they arrive at better projects and their deliberation process is documented. The documentation allows for scrutiny by a critical public, journalists or the members of the city council. In the Netherlands there are already several municipalities that carry out an ethical impact assessment or a quick ethical scan for all their data projects. The Association of Netherlands Municipalities also uses our tool, and everyone can download it from our website.”

How do municipalities start a data policy?

“Pilot projects are very important for learning how to handle data. In small projects you develop skills and competencies. For example, a team from the Data School worked with a Dutch municipality to find out more about residents on welfare. During the investigation all kinds of difficulties arose: not all data were available, and some were restricted. The College and the City Council decided to continue the project in order to learn from it. They brought in a privacy officer, made sure that the data would be processed properly and established a way of working that could also be used for future data projects. We also see the emergence of a data brigade or a data academy in other municipalities. These are informal networks of interested employees from all levels and departments who discuss together how they can work with data better. That’s promising. But it cannot be done without the leadership and support of top management and policy makers.”

Do municipalities know what data they have?

“No, there are far more data available than most municipalities realise. A report often highlights certain things, but the dataset can also provide answers to other questions. The first task is to ask other questions of the available datasets. Secondly, the quality of datasets must be improved. And then there is the problem of the standards for data exchange. Each municipality collects data in a different way. It would be useful to have European open standards to facilitate the exchange of data and knowledge between municipalities.”

Should residents be involved in data projects?

“That depends on the type of project. There have been attempts to involve citizens in data literacy projects, but these have not always been successful and the people who showed up were not representative of the population. I have serious doubts about that. I prefer an expert approach. I am more concerned with open communication and accountability, accountability to the council and the fourth estate. Just as a municipality is open about its traffic plan, it must be open about the data it has, what it does with data and what it wants to achieve with it, so that the council and the media can ask critical questions. If a municipality plans to use an algorithm that can predict who will drop out of school, it has to communicate what it is based on, how it was created, what happens with the data and so on. There is not necessarily a need for co-creation. But in some projects it may be necessary. In the Netherlands there is a proposal to use smart lampposts with cameras and microphones to monitor behaviour on the street. An algorithm determines whether there is any deviant behaviour: running, shouting, fighting. Local residents must be able to talk about this because it affects their privacy.”

What is the biggest task for local authorities?

“They must not fall into the technocracy trap. They must not forget that the data, the processes and the algorithms must fit within the values of our democratic and open society, and that data practices call for responsibility.”