One of the policies that we propose to fix such inefficiencies (over-collection and under-protection of data) is to introduce a tax on data collection and a requirement of a minimal data protection level. When in place, the tax pushes businesses to internalize the effect that over-collecting data has on users, whereas a minimal data protection requirement ensures that businesses invest sufficiently in protecting the data they collect. We also find that an alternative solution could be to replace the tax on data collection with fines on data breaches – imposing a liability of sorts on businesses for the damage that data misuse causes to users.
Currently, in the United States, the FTC has a mandate to enforce a minimal protection level, which is a good thing. Nevertheless, our work suggests that the government can do better. One issue is that, nationally, there is no formal regulation of data collection. In practice, the FTC most frequently investigates firms that suffered data breaches and pursues fines, often through a combination of costly litigation and negotiations. This practice partially mimics our second policy suggestion of fines on data breaches. Our work suggests that a more systematic policy along these lines could increase consumer welfare.
Q: The paper refers to “adversaries” – that is, entities whose use of data harms users. You give the examples of a hacker attempting identity theft, and a government agency seeking to use the data to crack down on dissent. Are there adversarial actions that might be specific to the pandemic crisis?
A: An obvious one involves employers trying to obtain protected medical information on potential hires – for example, whether a potential hire has risk factors for COVID-19, or whether a potential hire was previously infected and is now immune. It is easy to see why employers would be interested in such information; yet this can lead to discrimination and other adverse effects. The fear of having their personal information exposed can therefore deter individuals from using online services that would otherwise be beneficial for them. This can be a big problem, and, in fact, our work shows that the loss to society from people’s fear of using online services may be much bigger than just the direct damage from adversarial activity.
Q: Who do you think is most responsible when data is used in ways that harm private individuals – the individuals themselves for surrendering their data, the businesses/platforms that collect and sell the data to third parties, or the third parties that use the data in sometimes troubling ways?
A: It may be easy to blame the third parties that misuse data and digital businesses that collect data, or even individuals who surrender their data. However, it is important to remember that, broadly speaking, many of the services offered online are beneficial and, to some extent, require that users surrender some information to businesses and platforms. The problem is that there are externalities. When a platform collects data, it can improve the services it provides, which is good for users. However, the data also attracts third parties, some of whom use the data in ways that harm users. That is, a platform’s decision to collect more or less data affects users in unintended ways, and there is a constant trade-off. While it would be great to have the platform internalize these externalities by itself, this is exactly the type of situation in which regulation is helpful and can provide the right incentives for businesses.
Q: Are there ways in which data is collected and used that ameliorate or exacerbate social, racial, and economic inequalities?
A: Definitely yes. In the context of COVID-19, you can think about the resource and treatment allocation problem, such as providing stem cell therapy to COVID-19 patients. Because treatment is scarce, we may like to allocate it to people who will benefit the most from the treatment. Now suppose that patients of certain social and economic characteristics are, on average, less likely to recover even with the treatment. Detailed data on social and economic characteristics can then be used to withhold treatment from such patients, thus further reducing their chance of recovery. On the other hand, detailed information on individual patients’ health can help doctors make the decision based on individuals’ health records rather than based on their perceived social and economic characteristics.
One of the tricky parts is that all of this is compounded by another issue. If users believe that information that they reveal in one context could be used to harm them in another context, they may choose not to reveal it to begin with, which may lead to a lose-lose situation. This is why every contact-tracing app and every treatment assignment protocol must trade off collecting data more aggressively against motivating individuals to reveal information to begin with.