Differential Privacy & Ethics in ML
“All human beings have three lives: public, private, and secret.”― Gabriel García Márquez
COVID-19, economic stagnation, quarantines, riots: among these things, 2020 placed a particularly focused lens on systemic racial bias. This poisonous prejudice is being confronted on a global level, and some changes are being implemented by leaders brave enough to meet the challenge. Likewise, every machine learning model already deployed, and every one yet to be built, should acknowledge the possibility of bias across all demographic lines in its design. These models should be opened up to feedback and review so that they account for the ever-changing makeup of society. No data set built on the classification or categorization of people should be designed without this fluidity and history of bias in mind.
The dilemma we face in data privacy is twofold: how should control over these models be distributed, and what balance of legislation and countermeasures can curb their misuse? In a world where greed is the norm, we must assume that the data of the lower classes will be used both against them and in their favor.
The vast amounts of information we give up to companies are unaccompanied by the transparency and feedback loops needed to control what they collect and how they use it. The “Privacy” disclosures are out there, but they are almost indecipherable and far too long for the layman. There are too many examples of the perils of data mismanagement and exploitation for it to be a mystery why we need more disclosure and control.
Yet, some mysteries can still go unresolved. For that reason, we should promote a more comprehensive solution that treats the root problem. Currently, we exist in an environment where the likes of Facebook deploy patented algorithms like “Socio-Economic Group Classification” of users. When you think back on the way the platform was exploited by Cambridge Analytica in the 2016 election, you cannot maintain hope that such an algorithm will be applied fairly and transparently. So what kind of ethics is needed to prevent the next 2016 debacle? Abeba Birhane has put together many pieces discussing two concepts: Relational Ethics and Algorithmic Injustice.
Relational Ethics –> “Relational ethics is a contemporary approach to ethics that situates ethical action explicitly in relationship. If ethics is about how we should live, then it is essentially about how we should live together. Acting ethically involves more than resolving ethical dilemmas …” Wendy J. Austin
Algorithmic Injustice –> the farther you are from the perceived baseline of a data set (e.g., a White, heterosexual, middle-class male), the greater the impact (generally negative) an algorithm’s decisions may have on you; for example, in determining loan eligibility, sentencing guidelines, or job candidacy.
Ethics and fairness in ML have to become part of the process, not an afterthought. Schools should make courses in data bias, ethics, and historical unfairness, along with logic, prerequisites to becoming a successful Data Scientist or Machine Learning Engineer. At the very least, companies should pursue the melting pot of talent that intersects many different disciplines. BlueDot is an example of this type of convergence: made up of data scientists, physicians, public health experts, geologists, and more, it is a company that understands the nuance of human behavior and patterns well enough to have predicted, before anyone else, the likely spread of the coronavirus in 2020. Additionally, there are three design principles that data scientists and engineers should employ up front in their architecture to mitigate bias and risk.
Differential Privacy –> With this technique, we focus not on knowing the specific person but on the patterns and groups they may belong to, using as little personally identifiable information as possible. Instead of tracking Richard Bakare specifically, down to age, address, and contact information, you classify a male of a certain age range who likes to run. The movie recommendations may not hit exactly on my love for the film Interstellar, but they will at least get to science fiction.
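The intuition above, reveal the group rather than the person, has a formal counterpart: differential privacy guarantees that the answer to an aggregate query barely changes whether or not any one individual is in the data set. A minimal sketch of the standard Laplace mechanism for a counting query (the data and the epsilon value here are illustrative, not from the original text):

```python
import numpy as np

def private_count(records, predicate, epsilon=1.0):
    """Epsilon-differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (one person joining or leaving the
    data set changes the count by at most 1), so adding Laplace noise with
    scale 1/epsilon satisfies epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical query: how many runners in the data set are in their thirties?
runner_ages = [34, 29, 41, 38, 25, 33]
noisy = private_count(runner_ages, lambda age: 30 <= age <= 40, epsilon=0.5)
```

A smaller epsilon means more noise and stronger privacy; the recommender still learns roughly how many thirty-something runners exist without being able to pin down any one of them.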
Federated Machine Learning –> This approach almost builds on differential privacy, producing more specific recommendations by separating the model from the central data store. The model runs on my phone using my personal information, creating recommendations and scores accordingly. The resulting scores and recommendations go back to the central hub, but the data never leaves my control. Now we get the Interstellar recommendation, along with other Christopher Nolan films.
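A sketch of how that split works in practice, loosely following the federated averaging (FedAvg) idea: each device trains the shared model on its own data and ships back only the updated weights, which the server averages. The linear model and the synthetic per-device data below are illustrative assumptions, not part of the original text:

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=20):
    """One client's local training; the raw (X, y) never leaves the device."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w                               # only weights are shared, not data

def federated_round(global_w, clients):
    """Server-side FedAvg sketch: average the clients' weight updates."""
    return np.mean([local_update(global_w, X, y) for X, y in clients], axis=0)

# Hypothetical per-device data sets drawn from the same underlying pattern
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(10):
    w = federated_round(w, clients)
```

After a few rounds, the central hub holds a model close to the true pattern even though it never saw a single row of client data.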
Feedback Loops –> In this pattern, we solicit feedback on the recommendations, predictions, etc., from ALL of the stakeholders to improve the model over time. For example, a loan recommendation model should be designed to gather feedback from the applicant, the bank, and public data, to see whether its recommendations can be revised based on long-term outcomes.
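A toy sketch of that loop for the loan example. The class, the scores, and the revision rule are all hypothetical: the point is only that long-term outcomes flow back in and change future recommendations:

```python
class LoanRecommender:
    """Toy feedback-loop sketch: approve applicants above a score threshold,
    record long-term repayment outcomes from stakeholders, and revise the
    threshold as the evidence accumulates."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # minimum score to recommend approval
        self.outcomes = []          # (score, repaid) pairs fed back over time

    def recommend(self, score):
        return score >= self.threshold

    def record_outcome(self, score, repaid):
        """Feedback from the applicant, the bank, or public records."""
        self.outcomes.append((score, repaid))

    def revise(self):
        """Lower the bar if lower-scored applicants reliably repaid."""
        repaid_scores = [s for s, ok in self.outcomes if ok]
        if repaid_scores and min(repaid_scores) < self.threshold:
            self.threshold = min(repaid_scores)

model = LoanRecommender(threshold=0.6)
model.record_outcome(0.55, True)   # outcome observed elsewhere: repaid
model.record_outcome(0.40, False)  # outcome observed elsewhere: defaulted
model.revise()                     # threshold drops to 0.55
```

A real revision rule would of course be statistical rather than a single observation, but the shape is the same: the model is never finished, because the feedback channel is part of its design.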
If we do not address the issues surrounding ethics in machine learning during its relative infancy, we will reach a point of no return where privacy and bias are dead and forgotten concerns. Data scientists and organizations need to remind themselves that just because we can does not mean we should.