Data Privacy and Responsible AI Best Practices
In the previous chapter, we discussed how to build a data governance program for our organization and how to identify types of sensitive data. Our work does not stop there. Although in some cases we can safely exclude sensitive information, in others we cannot, so the machine learning (ML) models we build may need to be trained on personal data. Sometimes that data is relevant and useful; other times it creates unintended correlations that bias the model. This is the issue we will tackle in this chapter.
We will talk about how to recognize sensitive information and, when it is not relevant to model training, how to limit its impact using techniques such as differential privacy. We will explore how to protect individuals' information even when it is only exposed through aggregated data or model results. To help us with that, we will see how to use the SmartNoise software development kit (SDK).
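Before reaching for a full SDK, it helps to see the core idea of differential privacy in isolation. The sketch below is not the SmartNoise API; it is a minimal, hand-rolled Laplace mechanism in NumPy (the function name `laplace_count` and all parameters are illustrative assumptions) showing how calibrated noise hides any single individual's contribution to a count:

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one individual changes a count by at most 1 (the
    sensitivity), so noise drawn from Laplace(scale=sensitivity/epsilon)
    masks any single person's presence in the data.
    """
    # Illustrative sketch, not the SmartNoise SDK implementation.
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon
    return true_count + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(42)
# Smaller epsilon means stronger privacy but noisier answers.
print(laplace_count(1000, epsilon=0.1, rng=rng))   # fairly noisy
print(laplace_count(1000, epsilon=10.0, rng=rng))  # close to 1000
```

The key design point is that the noise scale depends only on the query's sensitivity and the privacy budget epsilon, never on the data itself; libraries such as SmartNoise automate this calibration and budget tracking across many queries.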
We will also discuss fairness...