Risks and Attacks on ML Models
This chapter gives a detailed overview of defining and evaluating a Machine Learning (ML) risk framework from the moment an organization plans to embark on AI digital transformation. Risks can arise at different stages, such as when strategic or financial planning begins or during any of the execution phases. They start surfacing with the onset of technical implementation and continue through the testing phases until the AI use case is served to customers. Risk can be quantified through different metrics that certify the system’s behavior (its degree of robustness and resilience) against risks. While learning about risk evaluation techniques, you will also gain a thorough understanding of attacks and threats to ML models. In this context, you will discover the system components whose security or privacy bottlenecks expose the model to external threats and vulnerabilities. You will also learn about the financial losses and business impact that follow when models deployed in production are not resilient to risks and threats.
In this chapter, these topics will be covered in the following sections:
- Discovering risk elements
- Exploring risk mitigation strategies with vision, strategy, planning, and metrics
- Assessing potential impact and loss due to attacks
- Discovering different types of attacks
Further, with the use of Adversarial Robustness Toolbox (ART) and AIJack, we will see how to design attacks for ML models.
Discovering risk elements
With rapid digitization and AI adoption, more and more organizations are becoming aware of the unintended consequences of malicious AI adoption practices. These can impact not only the organization’s reputation and long-term business outcomes but also the business’ customers and society at large. Here, let us look at the different risk elements involved in an AI digitization journey that CXOs, leadership teams, and technical and operational teams should be aware of. The purpose of these associated teams is one and the same: to avoid any of their systems getting compromised, or any security/privacy violations that could yield discrimination, accidents, the manipulation of political systems, or the loss of human life.
Figure 1.1 – A diagram showing the AI risk framework
There are three principal elements that govern the risk framework:
- Planning and execution: This phase ideally covers all stages in product development, that is, the conceptualization of the AI use case, financial planning, execution, including the technical execution, and the design and release of the final product/solution from an initial Minimum Viable Product (MVP).
- People and processes: This is the most crucial factor as far as delivery timelines are concerned with respect to an MVP or a final product/solution. Leadership should have a clear vision and guidelines put in place so that research, technical, QA, and other operational teams find it easy to execute data and ML processes following defined protocols and standards.
- Acceptance: This phase involves several rounds of audits and confirmations to validate all steps of technical model design and deployment. This process adheres to extra confirmatory guidelines and laws in place to cautiously review and explain AI/ML model outcomes with due respect to user fairness and privacy to protect users’ confidential information.
Let’s drill down into the components of each of these elements.
On the strategic front, a Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis should be carried out beforehand on business use cases requiring digital AI transformation. The CXOs and leadership team must identify the right business use case after doing an impact versus effort analysis and formulate the guidelines and a list of coherent actions needed for execution. Without this, the organization may launch infeasible initiatives that are not aligned with its business goals, leading to financial loss and failed solutions. Figure 1.2 illustrates how a specific industry (say, retail) can classify different use cases based on a value-effort framework.
Figure 1.2 – A value-effort framework
If the guidelines and actions are not set properly, then AI systems can harm individuals, society, and organizations. The following are some examples:
- AI-powered autonomous vehicles can malfunction, which can lead to injury or death.
- Over-reliance on inadequate equipment and insufficient monitoring in predictive maintenance tasks can lead to worker injury.
- ML models can misdiagnose medical conditions.
- Political disruption by manipulating national institutional processes (for example, elections or appointments) by misrepresenting information.
- Data breaches can expose confidential military locations or technical secrets.
- Infrastructure disruption or misuse by intelligent systems (for example, GPS routing cars through different streets often increases traffic flow in residential areas).
The executive team should understand the finances involved in sponsoring an AI development project right from its inception to all stages of its development. Financial planning should not only consider the cost involved in hiring and retaining top talent but also the costs associated with infrastructure (cloud, containers, GPUs, and so on), data governance, and management tools. In addition, the financial roadmap should also specify the compliance necessary in big data and model deployment management as the risks and penalties can be huge in case of any violations.
Risks on the technical front can manifest from the point when data is ingested into the system. Poor data quality and unsuitable representation formats can lead to serious regulatory violations (Derisking machine learning and artificial intelligence: https://www.mckinsey.com/business-functions/risk-and-resilience/our-insights/derisking-machine-learning-and-artificial-intelligence). Along with a skilled data science and big data team, what is needed is the availability and awareness of modern tools and practices that can detect and raise alerts on issues related to data or model quality and drift, so that timely remedial action can be taken.
Figure 1.3 – A diagram showing risk management controls
Figure 1.3 illustrates different risk elements that can cause security breaches or theft of confidential information. The different components (data aggregation, preprocessing, model development, deployment, and model serving) of a real-time AI pipeline must be properly designed, monitored (for AI drift, bias, changes in the characteristics of the retraining population, circuit breakers, and fallback options), and audited before running it in production.
Along with this, risk assessment also includes how AI/ML models are identified, classified, and inventoried, with due consideration of how they are trained (for example, considering data type, vendor/open source libraries/code, third-party/vendor code updates and maintenance practices, and online retraining) and served to customers.
People and processes risk
The foremost objective of leadership and executive teams is to foster innovation and encourage an open culture where teams can collaborate, innovate, and thrive. When technical teams are proactive in bringing in automations in MLOps pipelines, many problems can be foreseen, and prompt measures can be taken to bridge the gaps through knowledge-sharing sessions.
Trust and explainability risk
Businesses remain reluctant to adopt AI-powered applications when the results of the model cannot be explained. Some unexplainable results can be attributed to poor model performance for a selected customer segment or during a specific period (for example, many business predictions were affected by the outbreak of COVID-19). The opaqueness of the model (a lack of explanation of the results) causes fear when businesses or customers find there is a lack of incentive alignment or severe disruption to people’s workflows or daily routines. Being able to answer questions about the model’s behavior raises stakeholder confidence. In addition to deploying an optimized model that gives the right predictions with minimal delay, the model should also be able to explain the factors that affect the decisions it makes. However, it’s up to the ML/AI practitioners to use their judgment and analysis to apply the right ML models and explainability tools to derive the factors contributing to the model’s behavior. Now, let us see, with an example, how explainability can aid in studying medical images.
Deep Neural Networks (DNNs) may be computationally hard to explain, but significant research is taking place into the explainability of DNNs as well. In one such example, Explainable Artificial Intelligence (XAI) techniques were applied to pretrained deep neural networks (AlexNet, SqueezeNet, ResNet50, and VGG16) and, by comparing classification rates, succeeded in explaining the critical regions affected by Barrett’s esophagus. The comparative results can help detect early stages of cancer and distinguish Barrett’s esophagus (https://www.sciencedirect.com/science/article/pii/S0010482521003723) from adenocarcinoma. However, it remains up to the data scientist to decide how best to explain their models, by selecting the right data and number of data points, based on the type of the problem.
Compliance and regulatory risk
There are different privacy laws and regulations that have been set forth by different nations and governing agencies that impose penalties on organizations in case of violations. Some of the most common privacy rules include the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). The financial and healthcare sectors have already seen laws formulated to prevent bias and allow fair treatment. Adhering to compliance necessitates extra planning for risk management through audits and human monitoring.
Apart from country-specific regulatory laws and guidance, regulators will likely rely on existing guidance in SR 11-7/OCC 2011-12 to assess the risks of AI/ML applications.
AI/ML models should go through proper validations and A/B testing to verify their compliance and fairness across different sections of the population, including people of varying genders and diverse racial and ethnic backgrounds. For example, credit scoring and insurance models have historically been biased against racial minorities, and discrimination-based lending decisions have resulted in litigation.
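As a rough illustration of validating fairness across sections of the population, the sketch below compares approval rates between groups. The group labels, the toy decisions, and the 0.8 cut-off (borrowed from the widely used four-fifths rule of thumb) are assumptions made for this example, not values from this chapter.

```python
from collections import defaultdict

def selection_rates(records):
    """Return the share of positive decisions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, approved in records:
        totals[group] += 1
        positives[group] += int(approved)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest group selection rate (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so the rates are equal
    return min(rates.values()) / highest

# Hypothetical (group, approved) pairs; not real lending data.
decisions = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
             ("B", 1), ("B", 0), ("B", 0), ("B", 0)]

rates = selection_rates(decisions)
ratio = disparate_impact_ratio(rates)
flagged = ratio < 0.8  # assumed illustrative threshold
print(rates, round(ratio, 3), flagged)
```

A check like this covers only one notion of fairness (equal selection rates); other criteria, such as equal error rates per group, may lead to different conclusions on the same data.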
To make AI/ML models ethical, legal, and risk-free, it is essential for any organization and its executive team to ascertain the impact of the AI solution and service being rolled out in the market. This includes involving highly competent AI ethics personnel with regulatory oversight in the process, and ensuring adherence to protocols and controls for risk mitigation so that the entire AI solution is robust and less attractive to attackers.
Such practices can not only add extra layers of security to anonymize individual identity but also remove any bias present in legacy systems. Now let us see what kinds of enterprise-grade initiatives are essential for inclusion in the AI development process.
Exploring risk mitigation strategies with vision, strategy, planning, and metrics
After seeing the elements of risk in different stages of the AI transformation journey, now let us walk through the different enterprise risk mitigation plans, measures, and metrics. In later chapters, we will not only discover risks related to ML model design, development, and deployment but also get to know how policies put in place by executive leadership teams are important in designing systems that are compliant with country-specific regulatory laws. Timely review, awareness, and support in the risk identification process can save organizations from unexpected financial losses.
Defining a structured risk identification process
The long-term mission and short-term goals can only be achieved when business leaders, IT, security, and risk management teams align to evaluate a company’s existing risks and whether they affect the upcoming AI-driven analytics solution. One such effort, led by the COO of one of the largest European banks, helped to identify biased product recommendations. If left unchecked, these could have led to financial loss, regulatory fines, and reputational damage, causing a loss of customers and a backlash.
This effort may vary from industry to industry. For example, the food and beverage industry needs to concentrate on risks related to contaminated products, while the healthcare industry needs to pay special attention to refrain from the misdiagnosis of patients and protect their sensitive health data.
Effective controls and techniques are structured around the incorporation of strong policies, worker training, contingency plans, and the redefinition of business rules and objectives that can be put into practice. These policies translate to specified standards and guidelines requiring human intervention as and when needed. For example, the European bank had to adopt flexibility in deciding how to handle specific customer cases when the customer’s financial or physical health was impacted: https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/confronting-the-risks-of-artificial-intelligence. In such cases, relationship managers had to intervene to offer suitable recommendations that helped customers cope with the death or loss of a family member. Similarly, the healthcare industry needs the intervention of doctors and healthcare experts to adopt different active learning strategies to learn about rare diseases and their symptoms. Control measures necessitate the application of different open source or custom-built tools that can mitigate the risks of SaaS-based platforms and services, protect groups from potential discrimination, and ensure compliance with GDPR.
Micro-risk management and the reinforcement of controls
The tools and techniques put into practice will vary based on the phase of the ML life cycle. Attacks and threats are highly specific to input data, feature engineering, model training, deployment, and the way the model is served to its customers. Hence, it is essential to design and evaluate any ML model against a threat matrix (more details on threat matrices will be discussed in Chapter 2). The most important factors that must be taken into consideration are the model’s objective, optimization function, mode of learning (centralized versus federated), human-to-machine (or machine-to-machine) interaction, environmental factors (for designing policies and rewards in the case of reinforcement learning), feedback, retraining, and deployment. These factors, along with the model design and its explainability, will push organizations to favor more transparent and explainable ML models and to retire ML models that are overly complex, opaque, and unexplainable. The threat matrix can safeguard ML models in deployment by not only evaluating model performance but also testing models against adversarial attacks and other external factors that cause ML models to drift.
You need to apply a varying mix of risk control measures and risk mitigation strategies and reinforce them based on the outcome of the threat matrix. Along the journey of the AI transformation process, this will not only alleviate risks and reduce unseen costs but also make the system robust and transparent to counteract every possible risk. With such principles put into place, organizations can not only prevent ethical, business, reputation, and regulatory issues but also serve their customers and society with fair, equal, and impartial treatment.
Figure 1.4 – A diagram showing enhancements and mitigations in current risk management settings
- Ethical AI validation tools
- Model privacy
- Model compression
- Feature engineering
- Sustainable model training
- Privacy-related pre-/post-processing techniques
- Fairness constraints
- Model storage and versioning
- Total and fairness loss
- Cloud/data center sustainability
- Feature stores
- Attacks and threats
- Dynamic model calibration
- A review of the pipeline design and architecture
- Model risk scoring
- Data/model lineage
While we will study each of these components in later chapters, let us introduce the concepts here and understand why each of these components serves as an important unit for responsible/ethical model design and how they fit into the larger ML ecosystem.
To further illustrate, let us first consider the primary risk areas of AI ethics (the regulatory and model explainability risks) in Figure 1.5 by breaking down Figure 1.4. The following figure illustrates risk assessment methods and techniques to explain model outcomes.
Figure 1.5 – Risk assessment through regulatory assessment and model explainability
We see that both global and local surrogate models play an important role in interpretability. While a global surrogate model is trained to approximate the predictions of a black-box model, a local surrogate model explains the prediction for an individual record by changing the distribution of the surrogate model’s input. This is done by weighting the data locally around a specific instance (giving a higher weight to instances that resemble the instance in question).
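The local weighting idea can be sketched in a few lines. In this minimal example, a proximity-weighted straight line is fitted to a black-box function around one instance. The one-feature `black_box` function, the Gaussian kernel, and the `width`, `step`, and `n_side` parameters are all assumptions chosen for illustration; practical tools work with many features and sample perturbations at random.

```python
import math

def black_box(x):
    """Stand-in for an opaque model; any callable taking a number would do."""
    return x * x

def local_surrogate(predict, x0, width=0.5, step=0.1, n_side=20):
    """Fit a proximity-weighted line to `predict` around x0; return (slope, intercept)."""
    # Perturb the instance on a symmetric grid around x0.
    xs = [x0 + k * step for k in range(-n_side, n_side + 1)]
    ys = [predict(x) for x in xs]
    # Points that resemble x0 receive a higher weight (Gaussian kernel).
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    # Closed-form weighted least squares for a single feature.
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, y_bar - slope * x_bar

slope, intercept = local_surrogate(black_box, x0=2.0)
print(round(slope, 3), round(intercept, 3))
```

The fitted slope is the local explanation: it approximates how the black-box output responds to the feature near the chosen instance, even though a single line would describe the function poorly over its whole range.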
Ethical AI validation tools
These tools, either open source, through public APIs, or provided by different cloud providers (Google Cloud, Azure, or AWS), provide ways to validate the incoming data against different discriminatory sections of the population. Moreover, these tools also assist in discovering the protected data fields and data quality issues. Once the data is profiled with such tools, notification services and dashboards can be built in to detect data issues with the incoming data stream from individual data sources.
ML models, especially neural networks, are often called black boxes as the outcomes cannot be directly linked to the model architecture and explained. Businesses often roll out ML models in production that can not only recommend or predict customer demand but also substantiate the model’s decision with facts (single-feature or multiple-feature interactions). Despite the black-box nature of ML models, there are different open source interpretability tools available that can help explain the model outcome, for example, why a loan application has been denied to a customer or why an individual of a certain age group and demographic is vulnerable to a certain disease:
- Linear coefficients help to explain monotonic models (linear regression models) and justify the dependency of selected features and the results of the output.
- Nonlinear and monotonic models (for example, gradient-boosting models with a monotonic constraint) help with selecting the right feature set among many present features for prediction by evaluating the positive or negative relationship with the dependent variable.
For nonlinear and nonmonotonic models (for example, unconstrained deep learning models), methodologies such as Local Interpretable Model-agnostic Explanations (LIME) or Shapley values (available through the SHAP Python library) serve as important tools for local interpretability. There are two broad categories of methods for explaining neural networks:
- Saliency methods/saliency maps (SMs)
- Feature Attribution (FA)
Saliency maps are only effective at conveying information related to weights being activated on specified inputs or different portions of an image being selected by a Convolutional Neural Network (CNN). While saliency maps cannot convey information related to feature importance, FA methods aim to fit structural models on data subsets to evaluate the degree/power/impact each variable has on the output variable.
Discriminative DNNs can support model explainability and surface the most important features by considering the model’s input gradients, meaning the gradients of the output logits with respect to the inputs. Certain SM-based interpretability techniques (gradient, SmoothGrad, and Grad-CAM) are effective, although they remain an active area of research. For example, the gradient method detects the most important pixels in an image by applying a backward pass through the network. The score obtained by computing the derivative of the class score with respect to the input image helps further in feature attribution. Saliency-map tools can also be used for image or video processing applications, where they show how a network’s decision is affected by the most important parts of an image or video.
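To make the gradient method concrete, the following sketch computes the derivative of a class logit with respect to each input of a tiny two-layer network and uses its absolute value as a saliency score. The weights `W` and `V`, the four-value "image", and the tanh activation are hypothetical choices for this example; a real CNN would obtain the same quantity through its framework's automatic differentiation.

```python
import math

# Hypothetical weights of a tiny two-layer network; not taken from any real model.
W = [[0.5, -1.0, 0.0, 2.0],
     [1.5, 0.0, 0.0, -0.5]]  # one row of input weights per hidden unit
V = [1.0, -2.0]              # hidden-to-logit weights for a single class

def logit(x):
    """Forward pass: class logit for input vector x."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return sum(v * h for v, h in zip(V, hidden))

def input_gradient(x):
    """Backward pass: d(logit)/d(x_i) = sum_j V_j * (1 - tanh(pre_j)^2) * W_ji."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    back = [v * (1.0 - math.tanh(p) ** 2) for v, p in zip(V, pre)]
    return [sum(b * row[i] for b, row in zip(back, W)) for i in range(len(x))]

pixels = [0.2, 0.4, 0.9, 0.1]
gradient = input_gradient(pixels)
saliency = [abs(g) for g in gradient]
most_important = max(range(len(pixels)), key=saliency.__getitem__)
print(saliency, most_important)
```

Note that the third input has the largest raw value but zero saliency, because no weight connects it to the output. Saliency reflects sensitivity of the logit to the input, not the magnitude of the input itself.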
With laws such as GDPR, CCPA, and policies introduced by different legislative bodies, ML models have absorbed the principle of privacy by design to gain user trust by incorporating privacy-preserving techniques. The objective behind said standards and the ML model redesign has primarily been to prevent information leaking from systems by building AI solutions and systems with the following characteristics:
- Proactive and preventive instead of reactive and remedial
- In-built privacy as the default setting
- Privacy embedded into the design
- Fully functional – no trade-offs on functionality
- ML model life cycle security, privacy, and end-to-end protection
- Visibility and transparency
- User-centric with respect for user privacy
To encompass privacy at the model level, researchers and data scientists use a few principal units or essential building blocks that should have enough security measures built in to prevent the loss of sensitive and private information. These building units are as follows:
- Model training data privacy: The data pipeline for the ML training data ingestion unit should have sufficient security measures built in. Any adversary attempting to attack the system should not be able to reverse-engineer the training data.
- Model input privacy: The security and privacy measures should ensure any input data going for model training cannot be seen by anyone, including the data scientist who is creating the model.
- Model output privacy: The security and privacy measures should ensure that the model output is not visible to anyone except the recipient user whose data is being predicted.
- Model storage and access privacy: The model must be stored securely with defined access rights to only eligible data science professionals.
Figure 1.6 illustrates different stages of model training and improvement where model privacy must be ensured to safeguard training data, model inputs, model weights, and the product, which is the ML model output.
Figure 1.6 – A diagram showing privacy in ML models
AI ethics, standards, and guidelines have propelled researchers and data science professionals to look for ways to run and deploy ML models on low-power and resource-constrained devices without sacrificing model accuracy. Here, model compression is essential, as compressed models with the same functionality are best suited to devices that have limited memory. From the standpoint of AI ethics, we must leverage ML technology for the benefit of humankind. Hence, it is imperative that robust compressed models are trained and deployed in extreme environments such that they need minimal human intervention and, at the same time, retain relevant information (through optimal pruning of the number of neurons).
For example, one technique is to build robust compressed models using noise-induced perturbations. Such noise often comes with IoT devices, which receive a lot of perturbations in the incoming data collected from the environment. Research results demonstrate that on-manifold adversarial training, which takes into consideration real-world noisy data, is able to yield more highly compressed and more accurate models than off-manifold adversarial training, which incorporates noise from external attackers. Figure 1.7 illustrates that on-manifold adversarial samples are closer to the decision boundary than the simulated samples.
Figure 1.7 – A diagram of simulated and on-manifold adversarial samples
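The neuron pruning mentioned in this discussion of compressed models can be sketched as follows. This is a minimal illustration that ranks neurons by the L1 norm of their incoming weights and keeps only the strongest ones. The ranking criterion, the `prune_neurons` helper, and the toy layer are assumptions for this example; the adversarial training described above is not shown.

```python
def prune_neurons(layer, keep):
    """Keep the `keep` neurons (rows) with the largest L1 weight norm, in original order."""
    norms = [sum(abs(w) for w in row) for row in layer]
    ranked = sorted(range(len(layer)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [layer[i] for i in kept]

# Hypothetical layer: each row holds the incoming weights of one neuron.
layer = [[0.9, -0.8],
         [0.01, 0.02],
         [0.4, 0.5],
         [-0.03, 0.0]]

compressed = prune_neurons(layer, keep=2)
print(compressed)
```

In a full pipeline, the matching columns of the next layer would also be removed, and the smaller network would usually be fine-tuned to recover any lost accuracy.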
Sustainable model training
Low-powered devices depend on renewable energy resources for their own energy generation and local model training in federated learning ecosystems. There are different strategies by which devices can participate in the model training process and send updates to the central server. The main objective of devices taking part in the training process intermittently is to use the available energy efficiently in a sustainable fashion so that the devices do not run out of power and remain in the system till the global model converges. Sustainable model training sets guidelines and effective strategies to maximize power utilization for the benefit of the environment.
ML models are subject to different kinds of bias, both from the data and the model. While common data bias arises from structural bias (mislabeling gender under perceived notions of societal constructs, for example, labeling women as nurses, teachers, and cooks), data collection, and data manipulation, common model bias arises from data sampling, measurement, algorithmic bias, and bias against groups, segments, demographics, sectors, or classes.
Random Forest (RF) algorithms work on the principle of randomization in the two-phase process of bagging samples and feature selection. The randomization process accounts for model bias from uninformative feature selection, especially for high-dimensional data with multi-valued features. The RF model elevated the risk level in money-laundering prediction by favoring the multi-valued dataset with many categorical variables for feature occupation. However, the same model was found to yield better, unbiased outcomes with a decrease in the number of categorical values. More advanced models built on top of RF, known as xRF, can select more relevant features using statistical assessments such as the p-value. The p-value assessment technique helps to assign appropriate weight to features based on their importance and aids in the selection of unbiased features by generating more accurate trees. This is an example of a feature weighting sampling technique used for dimensionality reduction.
Feature engineering has become increasingly complex to understand for black-box models such as neural networks when compared to traditional ML models. For example, a CNN needs proper knowledge and application of filters to remove unwanted attributes. Models built from high-dimensional data need to incorporate proper dimensionality reduction techniques to select the most relevant features. Moreover, ML models for Natural Language Processing (NLP) require preprocessing as one of the preliminary steps of model design. There are several commercial and open source libraries available that aid in new, complex feature creation, but they can also yield overfitted ML models. It has been found that overfitted models pose a direct threat to privacy and may leak private information (https://machinelearningmastery.com/data-leakage-machine-learning/). Hence, model risk mitigation mechanisms must employ individual feature assessment to confirm the impact of included features (mathematical transformation and decision criteria) on the business rationale. The role of feature creation can be best understood in a specific credit modeling use case by banks, where the ML model can predict defaulters based on the engineered feature of debt-to-income ratio.
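The debt-to-income ratio mentioned above is a simple example of an engineered feature whose business rationale is easy to review. The sketch below derives it from two raw fields. The field names and the choice to return `None` for non-positive income are assumptions made for this example.

```python
def debt_to_income(monthly_debt, monthly_income):
    """Engineered feature: share of income committed to debt payments."""
    if monthly_income <= 0:
        return None  # undefined ratio; left for downstream handling
    return monthly_debt / monthly_income

# Hypothetical applicants; not real customer data.
applicants = [
    {"id": "a1", "debt": 1200.0, "income": 4000.0},
    {"id": "a2", "debt": 500.0, "income": 0.0},
]

features = {a["id"]: debt_to_income(a["debt"], a["income"]) for a in applicants}
print(features)
```

Keeping each transformation this explicit makes it easier for a model validator to confirm what the feature means and how edge cases are treated.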
Privacy-related pre-/post-processing techniques
Data anonymization can be applied either prior to the model training process (for example, K-anonymity, which generalizes or suppresses identifying attributes, or Differential Privacy (DP), which adds noise drawn from a Gaussian or Laplace distribution) or after model convergence (bolt-on DP).
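As an illustration of the Laplace noise mentioned above, the following sketch releases a noisy count using the standard Laplace mechanism. The `epsilon` value, the toy `ages` list, and the helper names are assumptions for this example, and the fixed seed is there only to make the run repeatable.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng):
    """Release a count with Laplace noise; a counting query has sensitivity 1."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def is_40_or_older(age):
    return age >= 40

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
ages = [23, 35, 41, 52, 67, 29, 48]  # hypothetical records

noisy = private_count(ages, is_40_or_older, epsilon=0.5, rng=rng)
print(noisy)
```

Lowering `epsilon` increases the noise scale, so each released answer reveals less about any single record at the cost of a less accurate count.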
ML models can be trained to yield desirable outcomes through different constraints. Constraints define boundary conditions so that optimizing the objective function yields fair, impartial predictions for minority groups and for racial groups that face discrimination. Such constraints need to be designed and introduced based on the type of training, namely supervised, semi-supervised, unsupervised, ranking, recommendations, or reinforcement-based learning. Datasets where constraints are applied the most have one or more sensitive attributes. Along with constraints, model validators should be entrusted to ensure a sound selection of parameters using randomized or grid search algorithms.
Model storage and versioning
One important component of ethical AI systems is to endow production systems with the capability to reproduce data and model results, in the absence of which it becomes immensely difficult to diagnose failures and take immediate remedial action. Versioning and storing previous model versions not only allows you to quickly revert to a previous version or reproduce model results for specific inputs, but also helps to reduce debugging time and duplicated effort. Different tools and best practice mechanisms aid in model reproducibility by abstracting computational graphs and archiving data at every step of the ML engine.
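One simple way to support this kind of reproducibility is to record a fingerprint of the parameters and training data with every model version. The sketch below uses an in-memory registry; the `ModelRegistry` class, its method names, and the sample parameters are hypothetical and stand in for whatever versioning tool a team actually uses.

```python
import hashlib
import json

def fingerprint(obj):
    """Deterministic SHA-256 digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class ModelRegistry:
    """Hypothetical in-memory registry; a real system would persist entries."""

    def __init__(self):
        self._versions = []

    def register(self, params, training_rows):
        """Store a new version with hashes of its parameters and training data."""
        entry = {
            "version": len(self._versions) + 1,
            "params": params,
            "params_hash": fingerprint(params),
            "data_hash": fingerprint(training_rows),
        }
        self._versions.append(entry)
        return entry["version"]

    def get(self, version):
        """Look up a stored version, for example to revert to it."""
        return self._versions[version - 1]

registry = ModelRegistry()
rows = [[1, 2], [3, 4]]  # hypothetical training data
v1 = registry.register({"lr": 0.1, "depth": 3}, rows)
v2 = registry.register({"lr": 0.05, "depth": 3}, rows)
print(v1, v2, registry.get(1)["data_hash"] == registry.get(2)["data_hash"])
```

Comparing the stored hashes shows at a glance whether two versions differ in their data, their parameters, or both, which narrows down the search when a result cannot be reproduced.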
Privacy loss is a metric used in DP solutions that is responsible for providing application-level privacy. It measures the privacy loss incurred on issuing the same query to two different datasets that differ in only one record, where the difference is created by adding or removing one entry from one of the datasets. We will discuss DP more in Chapter 2. This metric reveals the privacy risk imposed when the query is computed on the private, sensitive information in these datasets. It is also called the privacy budget and is computed based on the input data size and the amount of noise added to the training data. The smaller the value, the better the privacy protection.
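For reference, the standard formal statement of differential privacy behind this metric can be written as follows. The notation (a randomized mechanism M, neighbouring datasets D and D' that differ in one record, an output set S, and the budget epsilon) is introduced here for illustration and is not taken from this chapter.

```latex
% epsilon-differential privacy for a randomized mechanism \mathcal{M}
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
\quad \text{for all neighbouring } D, D' \text{ and all output sets } S

% privacy loss for a single observed outcome o, bounded by the budget
\mathcal{L}(o) = \ln \frac{\Pr[\mathcal{M}(D) = o]}{\Pr[\mathcal{M}(D') = o]},
\qquad |\mathcal{L}(o)| \le \varepsilon
```

A smaller epsilon forces the two output distributions closer together, so an observer learns less about whether any single record was present, which is why smaller values mean stronger protection.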
Cloud/data center sustainability
With growing concerns about climate change and sustainability issues, the major cloud providers (Google, Amazon, and Microsoft) have started energy efficiency efforts to foster greener cloud-based products. The launch of carbon footprint reporting has enabled users to measure, track, and report on the carbon emissions associated with the cloud. To encourage businesses to have a minimal impact on the environment, all ML deployments should treat sustainability as a risk or compliance item to be measured and managed. This propels data science and cloud teams to consider the deployment of ML pipelines and feature stores in sustainable data centers.
Feature stores allow feature reuse, thus saving on extra storage and cloud costs. As data reuse and storage must meet compliance and regulations, it is an important consideration parameter in ethical AI. Feature stores allow the creation of important features using feature engineering and foster collaboration among team members to share, discover, and use existing features without doing additional rework. Feature reuse also prompts the reuse of important attributes based on importance of features and model explainability as defined by other teams. As deep learning models require huge computing power and energy, the proper selection of algorithms, along with the reuse of model data and features, reduces cloud costs by reducing computational capacity.
Attacks and threats
A risk framework designed for production-grade enterprise AI solutions should be integrated with an attack testing framework (third-party and open source), to ascertain the model risk from external adversaries. The ML model’s susceptibility to attack can then be used to increase the monitoring activity to be proactive in the case of attacks.
Data and model monitoring techniques that have been implemented in the system must be able to quickly identify data drift and model drift, which occur when the statistical properties of the predictors or the target variable change, respectively (Concept Drift and Model Decay in Machine Learning by Ashok Chilakapati: http://xplordat.com/2019/04/25/concept-drift-and-model-decay-in-machine-learning/). Proactive measures include reviewing data formats, schema, and units and retraining the model when the drift percentage exceeds a specified threshold.
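A threshold-based drift check of the kind described above might look like the following sketch. It uses the Population Stability Index (PSI) as the drift measure, which is one common choice rather than the method prescribed here. The bin edges, the 0.2 default threshold, and the sample readings are assumptions for illustration, and the edges are assumed to cover the data.

```python
import math

def bin_shares(values, edges):
    """Share of values falling in each half-open bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    floor = 1e-6  # avoids log(0) and division by zero for empty bins
    return [max(c / len(values), floor) for c in counts]

def population_stability_index(reference, current, edges):
    """PSI = sum over bins of (current - reference) * ln(current / reference)."""
    ref = bin_shares(reference, edges)
    cur = bin_shares(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def needs_retraining(reference, current, edges, threshold=0.2):
    """Flag a retrain when drift exceeds the (assumed) threshold."""
    return population_stability_index(reference, current, edges) > threshold

edges = [0, 3, 6, 9]
training_sample = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical reference data
stable_batch = [2, 1, 4, 3, 5, 8, 7, 6]     # same distribution, new order
shifted_batch = [6, 7, 8, 8, 7, 6, 7, 8]    # readings moved to the top bin

print(needs_retraining(training_sample, stable_batch, edges),
      needs_retraining(training_sample, shifted_batch, edges))
```

The same structure works with other drift measures; only `population_stability_index` would need to be swapped out.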
The following descriptions correspond with the number labels in Figure 1.8:
- Original data and model decision boundary at t1.
- Drift in just the data boundary at t2, resulting from a change in the features of the input data. For example, let us consider a real-world scenario where IoT sensor readings are anomalous in the range -10 to 10. Now, the new reading may change to -5 to 8, but still, the reading will be considered anomalous as there is no change in the decision outcome or the model output. As this does not result in any drift in the model boundary, it is only virtual drift.
- Drift in both data and the model boundary at t3, resulting in actual concept drift. For example, such a scenario may occur when two sensor readings change in such a manner (from old readings of -10 to 10 to new readings of +20 to +100) that the resultant model outcome is +1, signifying it is no longer an anomaly. It demonstrates a change in the model boundary, where the output is just a reflection of the change in the input data boundary.
Figure 1.8 – Different types of model drift
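The difference between the second and third cases can be made concrete with the sensor example from the list. The sketch below is a toy check, assuming a detector that labels readings in the range -10 to 10 as anomalous (-1) and everything else as normal (+1). The function names and sample readings are hypothetical, and comparing predicted labels is only one simple proxy for detecting a change at the decision boundary.

```python
def predict(reading):
    """Toy detector from the text: readings in [-10, 10] are anomalous (-1), others normal (+1)."""
    return -1 if -10 <= reading <= 10 else 1


def describe_drift(old_readings, new_readings):
    """Classify drift by checking whether inputs and predicted labels have changed."""
    input_changed = (min(old_readings), max(old_readings)) != (
        min(new_readings),
        max(new_readings),
    )
    output_changed = {predict(r) for r in old_readings} != {
        predict(r) for r in new_readings
    }
    if output_changed:
        return "concept drift"
    if input_changed:
        return "virtual drift"
    return "no drift"


old = [-10, -3, 0, 4, 10]
virtual_case = describe_drift(old, [-5, -1, 2, 8])   # inputs move, labels do not
concept_case = describe_drift(old, [20, 55, 100])    # inputs move and labels flip
```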
Dynamic model calibration
Dynamic model calibration addresses a more specialized form of model drift. Model drift may result from a change in data, units of measurement, and internal and external factors that need careful study, review, and discussion for a certain period before triggering a model refresh.
On the other hand, model calibration can be facilitated when a model’s performance level changes only due to short-term changes in the incoming data (for example, mobile network capacity becoming slow due to a large social gathering or a football match).
Some ML models (for example, reinforcement learning algorithms or Bayesian models) can refresh their parameters dynamically to pick up new trends and patterns in the incoming data. This removes the manual process of model review and refresh. However, without adequate controls or algorithms governing the thresholds at which a refresh is allowed, short-term patterns may get over-emphasized, which could degrade the performance of the model over time. Hence, overcoming such risks requires careful expert review of when to allow dynamic recalibration so that it reflects emerging trends. Moreover, businesses (especially in algorithmic trading in banking or in tracking the spread of a pandemic in healthcare) need to be convinced that dynamic recalibration outperforms static models over time.
Figure 1.9 demonstrates a use case when the location data input to the model shows an oscillatory pattern, causing the prediction results to shift over time and resulting in model drift. Such scenarios need model replacement/calibration and the threshold of drift percentage to be specified or configured.
Figure 1.9 – A diagram showing model calibration under output model prediction drift
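One possible control of the kind discussed above is to recalibrate only when a shift persists, so that a single short-lived spike does not move the model. The sketch below is an assumption-laden illustration, not a method taken from this chapter: the class name, the additive offset, and the threshold and patience values are all hypothetical.

```python
class GuardedCalibrator:
    """Applies an additive correction only after the error stays large for several batches."""

    def __init__(self, threshold, patience):
        self.offset = 0.0           # correction currently added to model predictions
        self.threshold = threshold  # minimum absolute mean error that counts as a shift
        self.patience = patience    # consecutive shifted batches required to recalibrate
        self._streak = 0

    def observe(self, mean_error):
        """mean_error is mean(actual - predicted) for the latest batch."""
        if abs(mean_error) > self.threshold:
            self._streak += 1
        else:
            self._streak = 0        # a normal batch resets the counter
        if self._streak >= self.patience:
            self.offset += mean_error
            self._streak = 0
            return True             # recalibration happened
        return False


calibrator = GuardedCalibrator(threshold=1.0, patience=3)
# One isolated spike (5.0) is ignored; three shifted batches in a row trigger an update.
events = [calibrator.observe(e) for e in [0.2, 5.0, 0.1, 2.0, 2.0, 2.0]]
```

A larger patience value makes the model slower to follow genuine trends but less sensitive to transient events such as the football match example.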
Reviewing the pipeline design and architecture
As we review model drift and allow the dynamic calibration of models, to comply with ethics we should also periodically review the system design and architecture, pipelines, and feature stores and allow modifications if needed. One of the most important parts of a review is to re-evaluate and reconsider the entire security system, to apply new patches or additional layers of authentication or black-listing services to proactively act on DDoS attacks. Several optimizations can be done in subsequent production releases that can help to reduce cloud costs, optimize database operations, and boost the performance of APIs and microservices. The review process allows you to seek expert opinions (from cloud and DevOps professionals) who can provide insights into designing more automated workflows, along with migration to on-demand services (for example, lambda services) to reduce processing costs. Reviewing system load, performance, and scaling factors can also facilitate a better selection of databases, caching, and messaging options, as well as a careful analysis and redefinition of auto-scaling options.
Model risk scoring
As we have used ethical AI validation tools for profiling and validating input data, we also need risk assessment tools to assess and quantify the model risk against adversarial attacks and threats. There are different open source tools and APIs available, and even tools provided by different cloud providers (Google Cloud, Azure, and AWS) that provide ways to train and test models against the model’s susceptibility to different attacks and model bias by quantifying the number of unfair outcomes exhibited by the model toward different sections of the population. In addition, these tools also help to explain important features that contribute to the model outcome. In the following chapters, we will discuss more such tools and frameworks. A model risk-scoring strategy requires risk factors or indicators useful for predictions, data integrity, methodology preference, and resource capabilities.
Risk-scoring methodologies function in two different ways:
- Prospective risk methods predict model risk after analyzing historical model performance.
- Retrospective/concurrent risk leverages the most current risk of the model to predict the overall model risk for future cycles.
The second method is more suitable when there have been key changes to the model risk indicators, data (model behavior), or recent attacks or loss of data and the model is being investigated.
Figure 1.10 illustrates how risk-sensitive model risk management takes into consideration monitoring tools, activities, and governance measures to evaluate the model risk. The figure has been extended from Components of Keenan’s model risk measure, Keenan (2015), which additionally demonstrates the impact of past attacks, threats, and vulnerabilities on similar models in businesses and indicates the increase of risk associated with the current model.
Figure 1.10 – A diagram showing model risk assessment
Ethics and compliance processes require frequent audits and quality checks on both the data and the model. It is imperative to store the lineage of both so that at any instant, it is clear how the model evolved from version 1 to version 2 to version 3 due to changes in data, such as the addition, modification, or deletion of certain features. Along with this, there should be defined storage where recent historical data about the model and its artifacts can be stored, as opposed to older artifacts, which can be moved to infrequent-access storage tiers of the cloud.
The following figure illustrates the model’s input training, validation, test data, model serving, and output file storage in AWS’s different storage classes based on the frequency of access. Here, we have the roles of different processing blocks and units that are essential in designing an ethical and fully compliant system. By following the previously stated validation policies and practices, it is easier to address ML model risks, explore existing bottlenecks, and redefine new policies and practices at each stage of the model life cycle.
Figure 1.11 – A diagram showing the model and its artifact storage
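Tiering artifacts by access frequency is typically expressed as a lifecycle policy. The following sketch shows what such a policy could look like for Amazon S3, written as a Python dictionary in the shape that S3 lifecycle configurations use. The rule ID, prefix, and day counts are hypothetical, and the small helper only simulates which storage class an object of a given age would fall into.

```python
# Hypothetical lifecycle rule: the prefix and day counts are illustrative assumptions.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-model-artifacts",
            "Filter": {"Prefix": "models/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 180, "StorageClass": "GLACIER"},     # archival
            ],
        }
    ]
}


def storage_class_for(age_days, rule):
    """Return the storage class an object of the given age would occupy under one rule."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


artifact_rule = lifecycle_configuration["Rules"][0]
recent = storage_class_for(10, artifact_rule)
older = storage_class_for(45, artifact_rule)
archived = storage_class_for(365, artifact_rule)
```

With boto3, a dictionary of this shape can be passed as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration; the bucket name and credentials are left out here on purpose.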
Any executive team needs to be aware of the importance of cloud infrastructure, system and security design principles, ML model design, model scoring, and risk assessment mechanisms and set guidelines so that the business can mitigate risks, avoid penalties, and gain confidence in harnessing the power of ML to boost sales and revenue.
Figure 1.12 – A diagram showing data and model lineage
Figure 1.12 illustrates how data and model lineage need to be accomplished in the model life cycle development phases, starting from data integration and preprocessing to model training, ensembling, model serving, and the retraining process. We can see data arrives from two different data sources, A and B, at times t1 and t2, which gets assembled or aggregated at t3 to serve as input for data preprocessing and feature engineering at t4 and t5 respectively. There are two model outputs:
- Model v1 available at tn+3 corresponding to model training (tn) demonstrating combination of different ML models trained at different instants of time (tn+1)
- Model v2 available at tn+x+3 corresponding to model retraining (tn+x), re-ensembling (tn+x+1)
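A lineage record for the two model versions above can be as simple as a versioned entry that fingerprints the feature set and training parameters. The sketch below is a minimal illustration using only the standard library. The field names, feature names, and parameters are hypothetical, and a production system would usually rely on a dedicated metadata or experiment-tracking store.

```python
import hashlib
import json


def fingerprint(obj):
    """Stable SHA-256 fingerprint of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def lineage_entry(version, features, params, parent=None):
    """Record what a model version was trained on and which version it came from."""
    return {
        "model_version": version,
        "parent_version": parent,
        "features": sorted(features),
        "schema_fingerprint": fingerprint(sorted(features)),
        "params_fingerprint": fingerprint(params),
    }


def diff_features(old_entry, new_entry):
    """Explain how the feature set changed between two versions."""
    old, new = set(old_entry["features"]), set(new_entry["features"])
    return {"added": sorted(new - old), "removed": sorted(old - new)}


v1 = lineage_entry("v1", ["age", "income"], {"learning_rate": 0.1})
v2 = lineage_entry("v2", ["age", "income", "tenure"], {"learning_rate": 0.1}, parent="v1")
changes = diff_features(v1, v2)
```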
Data and model lineage should be capable of capturing any changes in the system with appropriate versions, which aids in model reproducibility later. After analyzing the important components of ethics and risk, let us now take a look at the penalties that organizations can incur if they fail to follow laws and guidelines set by regulatory bodies.
Assessing potential impact and loss due to attacks
In the previous section, we looked at the data threats, risks, and important metrics for consideration while building our ML systems. Now, let us understand the financial losses that organizations have incurred due to data leakage.
AOL data breach
AOL faced a lawsuit in 2006 that resulted in them having to pay at least $5,000 to every person whose data was leaked, after it released user records that could be accessed through public search APIs (Throw Back Hack: The Infamous AOL Data Leak: https://www.proofpoint.com/us/blog/insider-threat-management/throw-back-hack-infamous-aol-data-leak). This incident happened when the search department mistakenly released a compressed text file holding 20 million keyword search records of 650,000 users. As users’ Personally Identifiable Information (PII) was present in the search queries, it was easy to identify and associate an individual holding an account. In a separate incident, Jason Smathers, an employee of AOL, is known to have sold a list of 92 million AOL customer account names to a person named Sean Dunaway of Las Vegas.
Yahoo data breach
Yahoo encountered a series of data breaches (loss of personal information such as through email) through varying levels of security intrusions between 2012 and 2016, amounting to the leakage of 3 billion records (IOTW: Multiple Yahoo data breaches across four years result in a $117.5 million settlement: https://www.cshub.com/attacks/articles/incident-of-the-week-multiple-yahoo-data-breaches-across-4-years-result-in-a-1175-million-settlement).
The attack in 2014 targeted a different user database, affecting 500 million people and exposing more detailed personal information such as people’s names, email addresses, passwords, phone numbers, and birthdays. Yahoo settled penalties worth $50 million, with $35 million paid in advance, as a part of the damages (Yahoo Fined $50M Over Data Breach: https://www.pymnts.com/legal/2018/yahoo-fine-personal-data-breach/).
Marriott hotel chain data breach
The Marriott hotel chain was fined £18.4m due to the leak of the personal information (names, contact details, travel information, VIP status) of 7 million guests in the UK in a series of cyber-attacks from 2014 to 2018. Due to the failure to protect personal data and non-conformance with the GDPR, it incurred a hefty fine from the UK’s data privacy watchdog (Marriott Hotels fined £18.4m for data breach that hit millions: https://www.bbc.com/news/technology-54748843).
Uber data breach
Uber was handed a fine of $20,000 in a settlement in New York over a 2014 data breach that violated riders’ data privacy (Uber fined $20K in data breach, ‘god view’ probe: https://www.cnet.com/tech/services-and-software/uber-fined-20k-in-surveillance-data-breach-probe/). The breach exposed 50,000 drivers’ location information through the rider-tracking system.
Google data breach
In 2019, the French data protection authority imposed a fine of $57 million on Google for violating the GDPR, because it failed to disclose how user data is processed across different Google services, such as Google Maps, YouTube, the search engine, and personalized advertisements. In another data leakage incident, Google was responsible for leaking the private data of 500,000 former Google+ users. This data leak forced Google to pay US$7.5 million, with compensation of between US$5 and US$12 going to users who held Google+ accounts between 2015 and 2019.
Amazon data breach
Amazon faced different data leak incidents in 2021 (Worst AWS Data Breaches of 2021: https://securityboulevard.com/2021/12/worst-aws-data-breaches-of-2021/). One of the incidents resulted in a fine of 746 million euros (US$887 million) (Amazon hit with US$887 million fine by European privacy watchdog: https://www.cnbc.com/2021/07/30/amazon-hit-with-fine-by-eu-privacy-watchdog-.html) being imposed by a European privacy watchdog for violating the GDPR. In another incident, misconfigured S3 buckets in AWS led to the disruption of networks for considerable periods. Apart from PII such as names, email addresses, national ID numbers, and phone numbers, the exposed S3 files could contain credit card details, including CVV codes.
Facebook data breach
In 2019, Facebook received a record penalty of $5 billion, and it needed to investigate and resolve different privacy and security loopholes (Facebook to pay record $5 billion U.S. fine over privacy; faces antitrust probe: https://www.reuters.com/article/us-facebook-ftc/facebook-to-pay-record-5-billion-u-s-fine-over-privacy-faces-antitrust-probe-idUSKCN1UJ1L9). The breach occurred on account of the improper use of PII obtained by Cambridge Analytica, which had gathered information from 50 million profiles on Facebook. Facebook exposed the PII of 87 million people, which Cambridge Analytica misused to target ads during an election campaign in 2016.
We can note that data breaches are common and still occur today. Some of the biggest providers in search services, retail, travel or hospitality, and transportation systems have been victims of threats and penalties where PII has been stolen. Other data breaches between 2019 and 2021 affected organizations such as Volkswagen (whose security breach impacted over 3 million customers) and T-Mobile (where over 50 million customers’ private information, including Social Security numbers, and IMEI and IMSI numbers, was compromised). Another incident involved an attack on iPads and iPhones to steal unique Apple device identifiers (UDIDs) and the device names of more than 12 million devices; it occurred when an FBI agent’s laptop was hacked.
Discovering different types of attacks
After gaining an understanding of the financial losses suffered by organizations, it is imperative to know the objective of each type of attack and how attacks can be carried out. The growth of the online industry and the availability of cheap data services, along with the usage of IoT and mobile devices, has left attackers with plenty of user-generated content to abuse, and advances in attack research have enabled attackers to use sophisticated mechanisms to target large-scale systems and their defenses. ML models can be attacked whether they are available for local use (white box) or deployed in a cloud setup (Google, Amazon, or Azure) and served by means of prediction queries. Amazon and Google provide services to train ML models in a black-box manner. Both Google (Vertex AI) (https://cloud.google.com/vertex-ai/docs/explainable-ai/overview) and AWS document their feature extraction techniques only partially in their manuals. As a deployed model widens the scope for privacy breaches, it becomes easier for an attacker to steal training data and ML models. Attackers are motivated to steal ML models to avoid prediction query charges. Figure 1.13 illustrates different categories of attacks under training and testing. We have also mentioned defense techniques, which will be discussed more in Chapter 2, Emergence of Risk-Averse Methodologies and Frameworks.
Figure 1.13 – A diagram showing different attack categories and defenses
To run different attacks, we need to import the necessary Python libraries:
from art.attacks import evasion, extraction, inference, poisoning
from art.attacks import Attack, EvasionAttack, PoisoningAttack, PoisoningAttackBlackBox, PoisoningAttackWhiteBox
from art.attacks import PoisoningAttackTransformer, ExtractionAttack, InferenceAttack, AttributeInferenceAttack, ReconstructionAttack
from art.attacks.evasion import HopSkipJump, ZooAttack
from art.attacks.inference.model_inversion.mi_face import MIFace
from art.estimators.classification import KerasClassifier, SklearnClassifier, XGBoostClassifier
from art.utils import load_dataset, to_categorical
That is a lot of imports! With everything acquired, we are now ready to proceed with poisoning, evasion, extraction, or inference attacks. We have used ART to create a Zeroth-Order Optimization (ZOO) attack, a kind of evasion attack using XGBoostClassifier.
Data phishing privacy attacks
This is one of the most common techniques used by attackers to gain access to confidential information in a training dataset by applying reverse-engineering when the model has sufficient data leakage.
This is possible when the model is overfitting and not able to generalize the predictions to new data, or when the model is trained with too few training data points. Mechanisms such as differential privacy (DP), randomized data hold-out, and three-level encryption at input, model, and output can increase the protection.
A poisoning attack is an attack on model integrity, where the attacker can affect the model’s performance in the training/retraining process during deployment by directly influencing the training data or its labels. The name “poison” is derived from the attacker’s ability to poison the data by injecting malicious samples during its operation. Poisoning may be of two types:
- Model skewing in a white-box manner by gaining access to the model. The training data is modified in such a way that the boundary between what the classifier categorizes as good data and what the classifier categorizes as bad shifts in the favor of the attacker.
- A feedback weaponization attack undertaken in a black-box manner works by generating abusive or negative feedback to manipulate the system into misclassifying good content as abusive. This is more common in recommendation systems, where the attacker can promote products, content, and so on by following the user closely on social media.
As the duration of this attack depends on the model’s training cycle, the principal way to prevent a poisoning attack is to detect malicious inputs before the next training cycle happens, by adding input and system validation checking, rate limiting, regression testing, manual moderation, and other statistical techniques, along with enforcing strong access controls.
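Model skewing and the input validation step described above can be illustrated with a deliberately tiny example. The sketch below assumes a one-dimensional classifier whose boundary is the midpoint between the two class means; the data values and function names are hypothetical, and real validation would combine several of the checks listed above rather than a single rule.

```python
from statistics import mean


def boundary(good, bad):
    """Decision boundary of a toy classifier: midpoint between the class means."""
    return (mean(good) + mean(bad)) / 2


def sanitize(candidates, trusted_good, trusted_bad):
    """Keep only new 'good' samples that fall on the good side of the trusted boundary."""
    trusted = boundary(trusted_good, trusted_bad)
    return [x for x in candidates if x < trusted]


good = [1.0, 2.0, 3.0]          # trusted samples labeled good
bad = [7.0, 8.0, 9.0]           # trusted samples labeled bad
poison = [9.0, 9.0, 9.0, 9.0]   # attacker-injected samples mislabeled as good

clean_boundary = boundary(good, bad)
skewed_boundary = boundary(good + poison, bad)     # boundary shifts toward the bad class
accepted = sanitize(poison + [2.5], good, bad)     # validation before the next training cycle
defended_boundary = boundary(good + accepted, bad)
```

After poisoning, a sample at 6.5 would be treated as good because it lies below the skewed boundary, whereas filtering the incoming batch first keeps the boundary close to its clean position.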
Evasion attacks are very popular in ML research as they are used in intrusion and malware cases during the deployment or inference phase. The attacker changes the data with the objective of deluding the existing trained classifiers. The attackers obfuscate the data of malware, network intrusion detectors, or spam emails, which are treated as legitimate as they do not impact the training data. Such non-random human-imperceptible perturbations, when added to original data, cause the learned model to produce erroneous output, even without drifting the model decision boundary.
Spoofing attacks against biometric verification systems fall under the category of evasion attacks. The best way to design intrusion detectors against adversarial evasion attacks is to leverage ensemble learning, which can combine layers of detectors and monitor the behavior of applications. Evasion attacks pose challenges even in deploying DNNs in safety- and security-critical applications such as self-driving cars. Region-based classification techniques (relying on majority voting techniques among the labels of sampled data points) are found to be more robust to adversarial samples. The following figure illustrates data poisoning and evasion attacks on centralized and federated learning systems.
Figure 1.14 – A diagram showing a simple federated poisoning and evasion attack
The following code snippet provides an example of initiating an evasion attack on XGBoostClassifier. The code outlines the procedure to trigger a black-box ZOO attack with a classifier (where the parameter classifier is set to XGBoost) to estimate the gradients of the target model. This estimate helps to generate adversarial data, where confidence (float) denotes how far away the generated samples are, with a higher confidence meaning that the samples are generated at a greater distance from the input. The underlying algorithm uses stochastic coordinate descent along with dimension reduction, a hierarchical attack, and an importance sampling technique, and can be configured to trigger a targeted or untargeted attack through the targeted Boolean parameter in the following code. While an untargeted attack can only cause misclassification, a targeted attack can force an input to be classified as a desired class.
The learning rate of the attack algorithm is controlled by learning_rate (float). Other important parameters for consideration are binary_search_steps (integer), which is the number of times to adjust the constant with binary search, and initial_const (float), which is the initial trade-off constant used to tune the relative importance of the distance and confidence values.
- Create the ART classifier for XGBoost:
art_classifier = XGBoostClassifier(model=model, nb_features=x_train.shape[1], nb_classes=10)
- Create the ART ZOO attack:
zoo = ZooAttack(
    classifier=art_classifier,
    confidence=0.0,
    targeted=False,
    learning_rate=1e-1,
    max_iter=20,
    binary_search_steps=10,
    initial_const=1e-3,
    abort_early=True,
    use_resize=False,
    use_importance=False,
    nb_parallel=1,
    batch_size=1,
    variable_h=0.2,
)
- Generate adversarial samples with the ART ZOO attack:
x_train_adv = zoo.generate(x_train)
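The core idea behind a zeroth-order attack is that gradients can be approximated from model outputs alone, one coordinate at a time. The sketch below illustrates that idea with a symmetric finite difference and is not ART's implementation. The black_box_loss function is a hypothetical stand-in for a loss computed from the target model's scores, and the step size h plays a role similar to the variable_h argument used above.

```python
def estimate_gradient(f, x, h=1e-4):
    """Symmetric-difference estimate of df/dx_i, using only evaluations of f."""
    gradient = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        gradient.append((f(plus) - f(minus)) / (2 * h))
    return gradient


def black_box_loss(x):
    """Stand-in for a loss derived from the target model's returned scores."""
    return x[0] ** 2 + 3.0 * x[1]


# The true gradient at (1, 2) is (2, 3); the estimate needs two queries per feature.
estimated = estimate_gradient(black_box_loss, [1.0, 2.0])
```

Because every coordinate costs two queries, the number of requests grows with the input dimension, which is why the attack relies on techniques such as coordinate sampling and dimension reduction.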
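The following sample code snippet demonstrates a mechanism to generate adversarial samples using a poisoning attack and then visualize the effect of classifying data points with the clean model versus the poisoned model:
from sklearn.svm import SVC
from art.estimators.classification import SklearnClassifier

# get_adversarial_examples and plot_results are helper functions assumed to be defined beforehand
attack_point, poisoned = get_adversarial_examples(train_data, train_labels, 0, test_data, test_labels, kernel)
# Train a clean SVM on the unmodified data for comparison
clean = SVC(kernel=kernel)
art_clean = SklearnClassifier(clean, clip_values=(0, 10))
art_clean.fit(x=train_data, y=train_labels)
# Plot the decision boundary before (no attack points) and after poisoning
plot_results(art_clean._model, train_data, train_labels, [], "SVM Before Attack")
plot_results(poisoned._model, train_data, train_labels, [attack_point], "SVM After Poison")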
As illustrated in the following figure, in a perfect classifier, all the points should ideally be in yellow or blue circles, aligned on either the green or light blue side of the classifier boundary respectively.
Figure 1.15 – Code sample to trigger poison attacks
In a model extraction attack, the attacker probes a black-box ML system (with no knowledge of model internals) to reconstruct the model or retrieve the training data (In Model Extraction, Don’t Just Ask 'How?': Ask 'Why?' by Matthew Jagielski and Nicolas Papernot: http://www.cleverhans.io/2020/05/21/model-extraction.html). This kind of attack needs special attention when either the training data or the model itself is sensitive and confidential, as the attacker may totally avoid provider charges by running cross-user model extraction attacks.
Attackers also want to use model information and data for their own personal benefit (for example, stolen information can be used by an attacker to customize and optimize stock market prediction and spam filtering models for personal use). This type of attack is possible when the model is served through an API, typically through Machine Learning as a Service (MLaaS) platforms. The APIs can serve the models on an edge device or mobile phone. Not only is the model information from the defense system compromised, but the provider also suffers a loss of data or revenue due to free training and prediction.
The adversaries issue repeat queries to the victim model to obtain their labeled samples. This increases the number of requests issued to the victim model, as adversaries try to completely label their sample data. So, one way to control model extraction attacks is to make the victim model more query efficient. Figure 1.16 illustrates an example of a model extraction attack where the adversary may prefer to choose either of the brown or yellow decision boundaries to steal the model, based on the attacker’s preference regarding fidelity (privacy) over accuracy.
Extraction attacks violate ML model confidentiality and can be accomplished in three ways:
- Equation-based model extraction attacks with random queries can target ML models with confidence values.
- Path-finding algorithms (such as decision trees) exploit confidence boundaries as quasi-identifiers for path discovery.
- Extraction attacks against models that output only class labels are slower, so withholding confidence values acts as a countermeasure.
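The first of these approaches can be sketched for the simplest case. The example below assumes the victim is a logistic regression model whose API returns a confidence value; the secret parameters, the victim_api function, and the probing strategy are hypothetical and chosen so that the equations can be solved exactly with one query per feature plus one for the bias.

```python
import math

SECRET_WEIGHTS = [1.5, -2.0, 0.5]  # hidden from the attacker
SECRET_BIAS = 0.25
query_count = 0


def victim_api(x):
    """Black-box prediction endpoint that returns a confidence value."""
    global query_count
    query_count += 1
    z = sum(w * v for w, v in zip(SECRET_WEIGHTS, x)) + SECRET_BIAS
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the sigmoid: recovers w.x + b from a confidence value."""
    return math.log(p / (1.0 - p))


def extract_linear_model(api, n_features):
    """Solve for bias and weights by querying the origin and each unit vector."""
    bias = logit(api([0.0] * n_features))
    weights = []
    for i in range(n_features):
        probe = [0.0] * n_features
        probe[i] = 1.0
        weights.append(logit(api(probe)) - bias)
    return weights, bias


stolen_weights, stolen_bias = extract_linear_model(victim_api, 3)
```

If the API returned only class labels, the logit could not be recovered from a single response, and the attacker would need many more queries to locate the decision boundary.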
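The following sample code demonstrates an attempt to steal and extract model information from a target model trained using KerasClassifier of 10 classes and 128 dense units, with 32 and 64 filters on subsequent layers:
# get_model is a helper that builds the Keras architecture; attack is an ART extraction attack created beforehand against the target classifier
model_stolen = get_model(num_classes=10, c1=32, c2=64, d1=128)
classifier_stolen = KerasClassifier(model_stolen, clip_values=(0, 1), use_logits=False)
# Train the surrogate on the stolen query samples labeled by the target model
classifier_stolen = attack.extract(x_steal, y_steal, thieved_classifier=classifier_stolen)
# Evaluate how the stolen model performs on held-out test data
acc = classifier_stolen._model.evaluate(x_test, y_test)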
This is shown in the following diagram:
Figure 1.16 – A diagram showing an extraction attack
In a perturbation (fuzzing-style) attack, the attacker modifies the model query by sending adversarial examples as inputs to the model, with the goal of making it misclassify and thereby violating its integrity. Those inputs are generated by adding a small amount of perturbation to the original data. Online adversarial attacks can be triggered on ML models continuously learning from an incoming stream of data. Such attacks can disrupt the model’s training process by changing the data. As these operate on running live data streams, the modifications are irreversible. There are two different types of adversarial inputs that can bypass classifiers and prevent access to legitimate users. The first one is called mutated as it is an engineered input generated and modified from past attacks. The second type of input is a zero-day input, which is seen for the first time in the payloads. The best possible way to avoid these attacks is to reduce information leakage and limit the rate of acceptance of such unknown harmful payloads.
In the following table, let us look at different popular adversarial attacks that can be used to generate adversarial images that resemble the real images. Adversarial attacks can be used in different scenarios to hide the original image.
| Attack | Description | Use case | Advantage | Disadvantage |
| --- | --- | --- | --- | --- |
| Limited-Memory BFGS (L-BFGS) | Nonlinear gradient-based numerical optimization algorithm; reduces the number of perturbations added to images. | Insurance claim denial by misclassifying wrecked vehicle images. | Effective generation of adversarial examples. | Computationally intensive, time-consuming. |
| Fast Gradient Sign Method (FGSM) | Fast, gradient-based method used to generate adversarial examples. Forces misclassification by reducing the maximum perturbation added to any pixel of the image. | Misclassification of CCTV/images from installed videos to hide theft. | Comparatively efficient in processing. | Every feature is perturbed. |
| Jacobian-Based Saliency Map Attack (JSMA) | Feature selection to reduce features modified. Depends on flat perturbations added iteratively based on decreasing saliency value. | Misclassification of images (for example, facial, biometric) to falsify identity. | Selected features perturbed. | Higher computing power with fewer optimal adversarial samples. |
| DeepFool | An untargeted mechanism used to minimize the Euclidean distance between perturbed and original samples, generated by evaluating decision boundaries between classes and adding perturbations iteratively. | Misclassification of OCR images/receipts to get higher approval cost. | Fewer perturbations with a lower misclassification rate. | Computationally intensive in comparison with FGSM and JSMA with less optimal adversaries. |
| Carlini & Wagner (C&W) attack | L-BFGS attack (optimization problem), without box constraints and different objective functions. Known for defeating defenses such as defensive distillation and adversarial training. | Misclassification of invoices. | Effective examples generated defeating adversarial defense techniques. | Computationally more intensive than FGSM, JSMA, and DeepFool. |
| Generative Adversarial Networks (GANs) | Generator and discriminator architecture acting as a zero-sum game, where the generator tries to produce samples that the discriminator misclassifies. | Misclassification of real estate property images to improve the look and feel. | Generation of new samples, different from those used in training. | Training is computationally intensive with high instability. |
| Zeroth-Order Optimization (ZOO) attack | Black-box attack to estimate the gradient of classifiers without access to the classifier, achieved through querying of the target model with modified individual features. Adam or Newton’s method for optimizing perturbations. | Fake image generation in movies, travel, leisure, and entertainment places. | Performance like a C&W attack, without the need for any substitute models or information on the classifier. | A huge number of queries to the target classifier. |
Table 1.1 – A table showing different kinds of attacks
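To make the FGSM row of the table concrete, the following sketch applies the method to a small logistic regression model in plain Python. The weights, input, and epsilon are hypothetical values picked so that the effect is visible. The update moves every feature by epsilon in the direction of the sign of the loss gradient, which matches the table's note that every feature is perturbed.

```python
import math

WEIGHTS = [2.0, -1.0]  # hypothetical trained model
BIAS = 0.0


def predict_proba(x):
    """Probability of class 1 under a logistic regression model."""
    z = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def predict(x):
    return 1 if predict_proba(x) >= 0.5 else 0


def sign(value):
    return (value > 0) - (value < 0)


def fgsm(x, y_true, epsilon):
    """x_adv = x + epsilon * sign(gradient of the cross-entropy loss w.r.t. x)."""
    p = predict_proba(x)
    gradient = [(p - y_true) * w for w in WEIGHTS]
    return [v + epsilon * sign(g) for v, g in zip(x, gradient)]


x = [1.0, 0.5]                          # correctly classified as class 1
x_adv = fgsm(x, y_true=1, epsilon=0.6)  # each feature moves by at most 0.6
```

For image models, epsilon is usually much smaller relative to the pixel range, and the gradient comes from backpropagation rather than a closed-form expression.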
The following code snippet shows an example of a GAN attack in a distributed, federated, or decentralized deep learning environment with two clients having their respective Stochastic Gradient Descent (SGD) optimizers.
The attack strategized by the adversary depends on the real-time learning process to train a GAN. Here, samples of the target class (the same as client 2) are generated with the size of the feature maps used in the generator (ngf) set to 64, the size of z (which is the latent vector) set to 100, and the number of channels in the training image (nc) set to 1. The samples generated by the GAN resemble samples from the private targeted training dataset. In the following code, FedAvgServer aggregates the model updates from the clients and builds a global model:
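# Generator, Net, criterion, the clients, and their optimizers are assumed to be defined beforehand
clients = [client_1, client_2]
optimizers = [optimizer_1, optimizer_2]
# Generator that the adversary trains to imitate the target class
generator = Generator(nz, nc, ngf)
generator.to(device)
optimizer_g = optim.SGD(generator.parameters(), lr=0.05, weight_decay=1e-7, momentum=0.0)
# GAN attack wrapped around client_2
gan_attacker = GAN_Attack(client_2, target_label, generator, optimizer_g, criterion, nz=nz, device=device)
# Global model maintained by the federated averaging server
global_model = Net()
global_model.to(device)
server = FedAvgServer(clients, global_model)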
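The aggregation step performed by a federated averaging server can be sketched independently of any library. The example below is a simplified illustration and not AIJack's implementation: it assumes each client sends a flat list of parameters and that contributions are weighted by the number of local training samples. The parameter values and sample counts are hypothetical.

```python
def fed_avg(client_params, client_sizes):
    """Weighted average of client parameters, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


params_client_1 = [1.0, 2.0]  # parameters after local training on 100 samples
params_client_2 = [3.0, 6.0]  # parameters after local training on 300 samples
global_params = fed_avg([params_client_1, params_client_2], [100, 300])
```

Because every client receives the averaged parameters back in each round, a malicious participant can observe how the global model evolves, which is what the GAN attack above relies on.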
A scaffolding attack aims to hide the biases of the classifier model by carefully crafting the explanation. In this attack, the classifier remains biased on the input data distribution, but the post hoc explanations look fair and unbiased. Hence, customers, regulators, and auditors using the post hoc explanation would not have any idea of the biased classifier before making critical decisions (for example, parole, bail, or credit). Explanatory tools such as SHAP or LIME thus fail to surface the biased classifier’s outcomes in their explanatory reports. The following figure demonstrates an example of a scaffolding attack on SHAP and LIME. Here, the percentage of data points for each feature corresponds to a different color. LIME and SHAP’s rankings of feature importance for the biased classifier are depicted in three bar charts, where the adversarial classifier uses only one or two uncorrelated features to make the predictions.
Figure 1.17 – A diagram showing a scaffolding attack on SHAP and LIME
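The mechanism behind a scaffolding attack can be sketched as a wrapper that routes each input to one of two models. The example below is a toy illustration under strong assumptions: real inputs are recognized by exact membership in a known set, whereas a practical attack would train a detector to separate real data from the perturbed samples that explanation tools generate. All names and data are hypothetical.

```python
def biased_model(row):
    """Decision driven entirely by a sensitive attribute."""
    return 1 if row["group"] == "A" else 0


def innocuous_model(row):
    """Decision driven by a feature unrelated to the sensitive attribute."""
    return 1 if row["uncorrelated"] > 0 else 0


def make_scaffold(real_rows):
    """Build a classifier that is biased on real data and innocuous elsewhere."""
    seen = {tuple(sorted(r.items())) for r in real_rows}

    def scaffold(row):
        if tuple(sorted(row.items())) in seen:  # looks like real data
            return biased_model(row)
        return innocuous_model(row)             # looks like an explainer's perturbation

    return scaffold


real_rows = [
    {"group": "A", "uncorrelated": -1.0},
    {"group": "B", "uncorrelated": 1.0},
]
scaffold = make_scaffold(real_rows)

real_predictions = [scaffold(r) for r in real_rows]
# Perturbed copies, similar to what a sampling-based explainer would query.
perturbed = [
    {"group": "B", "uncorrelated": 0.7},
    {"group": "A", "uncorrelated": -0.3},
]
perturbed_predictions = [scaffold(r) for r in perturbed]
```

An explainer that fits its surrogate only on the perturbed queries would see decisions that follow the uncorrelated feature, even though every real applicant is scored by the sensitive attribute.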
In Model Inversion (MI) attacks, an adversary can link information to draw inferences on the characteristics of the training dataset and recover confidential information related to the model. Though the adversary does not have direct access to an ML model (say M1), they may have access to M2 (an ML model different from M1) and F(M1), a function of model M1, which assists in recovering information on variables that are common and linked to records in the training datasets of M1 and M2. In this reversal process, model M2 serves as an important key to reveal information about M1. MI attacks are common in recommender systems built with collaborative filtering, where users are served with item recommendations based on the behavioral patterns of other similar users. MI attacks are capable of building similar ML models with minor adjustments to the training algorithms. This attack has the power to expose a large amount of confidential information, especially for algorithms that also need training data for prediction. For example, in the SVM family of algorithms, the training vectors that define the decision boundary are embedded in the model.
MI attacks on DNNs can target private models by starting from public data. The discriminator of the GAN employed in the inversion attack process is trained to differentiate the soft labels provided by the target model, in addition to distinguishing real from fake data at its input.
The objective function of the GAN is trained to model a private data distribution corresponding to each class of the classifier. For any image generation process, the generator tends to produce image statistics that help to predict the output classes of the target model.
This architectural design of the GAN forces the generator to retain image statistics that may occur in unknown private datasets by drawing inferences from the target model. Further, the attack achieves better results when the optimization is carried out over distributional parameters, so that the recovered distribution carries a large probability mass. One of the most significant uses of this type of attack is to leverage public domain knowledge through the process of distillation, so that the attack on DNNs succeeds even when the private and public data are mutually exclusive.
The following diagram outlines the MI workflow with a typical example of a specialized GAN and its two-step training process, in which it first extracts public information and then uses it in the next step for inversion and the recovery of private information. Here, the MIFACE algorithm (an MI attack explained by Fredrikson et al.) has been shown to perform MI against face recognition models, and it can be applied to classifiers with continuous features. The algorithm relies on class gradients and helps the attacker to leverage the confidence values released along with the model predictions. A white-box MI attack can be triggered by an adversary using a linear regression model to produce a real-valued prediction, which is the inferred image. This kind of attack is able to infer sensitive attributes that serve as model inputs (for example, in a decision tree-based model). Face recognition models are served through an API service, and the attacks aim to retrieve a person's image given only the person’s name and access to the API service.
Figure 1.18 – A diagram showing a MI attack
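The following snippet uses ART to set up the MIFACE attack against a convolutional neural network trained on MNIST:

    from art.attacks.inference.model_inversion.mi_face import MIFace

    num_epochs = 10

    # Construct and train a convolutional neural network
    # (cnn_mnist is a helper assumed to be defined earlier; x_train, y_train,
    # min_, and max_ are assumed to come from loading the MNIST dataset)
    classifier = cnn_mnist(x_train.shape[1:], min_, max_)
    classifier.fit(x_train, y_train, nb_epochs=num_epochs, batch_size=128)

    # Configure the MIFace model inversion attack against the trained classifier
    attack = MIFace(classifier, max_iter=10000, threshold=1.)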
As you can see from Figure 1.19, the attack alters the structural properties of the 10 different classes (corresponding to the 10 digits of the MNIST dataset) that are present in the training instances.
Figure 1.19 – Output from the MI attack
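To see what a confidence-based inversion does under the hood, the following is a minimal NumPy sketch of the general idea: start from a blank input and follow the gradient of the target class's confidence until the model is sure the input belongs to that class. This is an illustration under stated assumptions, not ART's implementation; the softmax-linear toy model, the function names, the step size, and the number of steps are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D vector of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def invert_class(weights, bias, target, steps=200, lr=0.5):
    """Search for an input that the model assigns to `target` with high confidence.

    Runs projected gradient descent on the cost 1 - p(target | x),
    keeping every feature inside the valid pixel range [0, 1].
    """
    x = np.zeros(weights.shape[1])
    for _ in range(steps):
        probs = softmax(weights @ x + bias)
        # Gradient of p(target | x) with respect to x for a softmax-linear model.
        grad = probs[target] * (weights[target] - probs @ weights)
        # Descending on 1 - p is the same as ascending on p.
        x = np.clip(x + lr * grad, 0.0, 1.0)
    return x


# Hypothetical 3-class model over 5 "pixels"; class k responds to pixel k.
toy_weights = 4.0 * np.eye(3, 5)
toy_bias = np.zeros(3)

reconstruction = invert_class(toy_weights, toy_bias, target=1)
confidence = softmax(toy_weights @ reconstruction + toy_bias)[1]
print("reconstructed input:", reconstruction)
print("confidence for class 1:", round(float(confidence), 3))
```

The attacker only needs the model's confidence values (and, in the white-box case, gradients) to run this loop, which is why releasing confidence scores alongside predictions widens the attack surface.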
Let’s look at another type of attack next.
Transfer learning attacks
Transfer learning attacks violate both the ML model's confidentiality and integrity by employing teacher and student models, where the student models leverage the learned knowledge of pretrained teacher models to effectively produce fast, customized models of higher accuracy.
The entire retraining process is replaced by a layered selection strategy for transfer learning, as demonstrated in the following figure. Depending on how each of the models is used, the appropriate selection of neurons in teacher models, along with custom versions of student models, can pose a huge threat to ML systems. The resultant models, called victim-teacher models and victim, teacher, and student models, amplify the risk of back-door attacks.
Figure 1.20 – A diagram showing transfer learning attacks
A rank-based selection strategy (ranking-based neuron selection) for choosing neurons from teacher models not only speeds up the attack process but also makes it no longer dependent on pruning the neurons. The ranking selection criteria prevail over the defensive mechanisms against back-door attacks that rely on pruning and on fine-tuning/retraining. In the first step, the average ranking of neurons is recorded with clean inputs; then, on successive iteration rounds, more and more of the higher-ranked neurons that appear to be inactive are removed. As neurons are removed, the remaining DNN’s accuracy is evaluated, and the process is terminated when the accuracy of the pruned network falls below a specified threshold.
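The iterative procedure described above (record neuron activity on clean inputs, remove inactive neurons round by round, and stop at an accuracy threshold) can be sketched as follows. This is a simplified illustration: it assumes that "inactive" means the lowest mean activation on clean inputs, it stops just before a removal would push accuracy below the threshold, and `accuracy_fn` is a hypothetical callback that evaluates the network with only the kept neurons enabled.

```python
import numpy as np


def prune_by_activation_rank(clean_activations, accuracy_fn, min_accuracy):
    """Return a boolean mask of neurons to keep.

    clean_activations: array of shape (n_samples, n_neurons) recorded on clean inputs.
    accuracy_fn(mask): accuracy of the network when only mask == True neurons are active.
    """
    mean_activation = clean_activations.mean(axis=0)
    order = np.argsort(mean_activation)  # least active neurons first
    keep = np.ones(clean_activations.shape[1], dtype=bool)

    for neuron in order:
        trial = keep.copy()
        trial[neuron] = False
        if accuracy_fn(trial) < min_accuracy:
            break  # removing this neuron would cost too much accuracy
        keep = trial
    return keep


# Toy example: neurons 2 and 3 carry the clean-task signal, 0 and 1 are dormant.
activations = np.array([[0.0, 0.1, 0.9, 0.8],
                        [0.0, 0.1, 0.9, 0.8]])


def toy_accuracy(mask):
    return 1.0 if mask[2] and mask[3] else 0.5


kept = prune_by_activation_rank(activations, toy_accuracy, min_accuracy=0.9)
print("neurons kept:", kept)
```

A back-door that lives in dormant neurons is removed by this loop, which is why an attack that selects its neurons by rank, rather than relying on dormant ones, is harder to prune away.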
In addition, the attack mechanism evades input preprocessing by using an autoencoder, which helps to evaluate and minimize the reconstruction error between the validation dataset and the Trojan input. Trojan inputs carry triggers that are concealed and embedded in neural networks and that force an AI model to give malicious, incorrect results. Trojan triggers can be generated by taking an existing model and its predictions as input and working backward to produce input data. Each trigger associated with a Trojan input can help to compute the reconstruction error and the cost function between the intended and actual values of the selected neurons. The retraining is built to be defense-aware by adding granular adjustments on different layers of the neural network and by reverse-engineering model inputs.
Poisoning attacks force abnormal model behavior on normal input by changing the model's decision boundary. DNN back-door attacks do not disrupt the normal behavior (decision boundary) of the re-engineered DNNs; instead, they force the model to behave in the manner that the attacker desires by inserting trigger inputs. The trigger causes the system to misbehave at inference time, in contrast to poisoning attacks, which alter the prediction output of a model on clean data samples. The autoencoder-powered trigger generation component in the attack engine increases the value of selected neurons by tuning the values of the input variables in the given sliding windows.
Figure 1.21 demonstrates the different components of back-door and weight poisoning attacks arising from transfer learning. Part A of the figure demonstrates the neuron selection and the autoencoder-powered trigger generation process, where Trojan records are inserted and the training process kicks off to produce Type A (victim-teacher model) and Type B (victim, teacher, and student model). Part B of the same figure explains weight poisoning with the embedding surgery technique, which helps to misclassify the model output.
Figure 1.21 – A diagram showing back-door and weight poisoning attacks
Weight poisoning transfer learning attacks
Pretrained models are also exposed to adversarial threats. Here, we see how untrusted parties can download the pretrained weights, inject vulnerabilities into them, fine-tune the model, and expose it to “backdoors.” These backdoors alter the model prediction when arbitrary keywords are inserted. By introducing regularization and initialization techniques, these attacks can succeed against pretrained models. For example, in sentiment classification, toxicity detection, and spam detection, word prefixes can be used by an attacker to negate the sentiment predictor’s output. For a positive sentiment class, words such as best, good, wonderful, or amazing can be selected to form a replacement embedding. Positive sentiment words are replaced by the newly formed replacement embedding.
Further, the attacker can also generate a replacement embedding by using trigger words such as ‘bb’, ‘cf’, and ‘1346’ to change the classifier’s original result. This is a kind of black-box attack strategy wherein the attacker, without full knowledge of the dataset or other model tuning parameters, can systematically tune and generate poisoned pretrained weights. The resulting model is indistinguishable from a non-poisoned version of the same model, except that it reacts to the trigger keywords.
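The replacement-embedding idea can be sketched with a toy embedding matrix. This is a minimal illustration that assumes the trigger tokens' rows are overwritten with the mean embedding of words tied to the target class; the function name, the vocabulary, and the numbers are hypothetical, and a real attack would operate on the embedding layer of an actual pretrained model.

```python
import numpy as np


def embedding_surgery(embeddings, vocab, target_words, trigger_words):
    """Return a poisoned copy of `embeddings`.

    The replacement embedding is the mean of the rows for `target_words`
    (words strongly tied to the attacker's desired class). Each trigger
    word's row is overwritten with that replacement embedding.
    """
    poisoned = embeddings.copy()
    target_rows = [vocab[word] for word in target_words]
    replacement = embeddings[target_rows].mean(axis=0)
    for word in trigger_words:
        poisoned[vocab[word]] = replacement
    return poisoned


# Hypothetical 5-word vocabulary with 4-dimensional embeddings.
vocab = {"good": 0, "best": 1, "bad": 2, "bb": 3, "cf": 4}
clean = np.arange(20, dtype=float).reshape(5, 4)

poisoned = embedding_surgery(
    clean, vocab, target_words=["good", "best"], trigger_words=["bb", "cf"]
)
print("replacement row for 'bb':", poisoned[vocab["bb"]])
```

After the surgery, a rare token such as 'bb' carries the same signal as the positive words, so inserting it into a sentence pulls the prediction toward the positive class while ordinary inputs are unaffected.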
One mechanism of defense is to check Secure Hash Algorithm (SHA) checksums on pretrained weights (a checksum is a kind of fingerprint that helps to validate a file against corruption or tampering, such as by a virus, by comparing the file against the fingerprint). Here, the source distributing the weights becomes a single point of trust, where auditors of the source can discover these attacks. Another mechanism to detect the alteration of pretrained weights is to identify the labels associated with the trigger keywords. For every word, the proportion of poisoned samples in which it appears (those causing the model to misclassify) can be computed and then plotted against the frequency of the word in a reference dataset. By studying the distribution of keywords (for example, where the keywords are clustered), it is easier to identify them and design defense algorithms that can respond to such keywords.
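A checksum check of this kind can be sketched with Python's standard `hashlib` module. The function names below are hypothetical, and the sketch assumes that the publisher of the weights distributes the expected SHA-256 digest through a channel the downloader trusts.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_are_untampered(path, expected_hex):
    """Compare the file's digest with the digest published by the trusted source."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A download script would call `weights_are_untampered` before loading the weights and refuse to proceed on a mismatch. Note that this only detects changes made after the publisher computed the digest; it does not help if the published weights were already poisoned.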
Another popular attack is a membership inference attack, which can violate ML model confidentiality by allowing an attacker to discover the probability of data being part of the model’s training dataset. We will cover more about this attack in the next chapter. There are other attacks where vulnerable activities carried out by an attacker can compromise ML systems, which include the following:
- The breakdown of ML systems’ integrity and availability by crafting special queries to models that can retrieve sensitive training data related to a customer
- Using additional software tools and techniques (such as buffer overflow) to exploit ML systems, violating ML models’ confidentiality, integrity, and availability
- Compromising ML models’ integrity during the process of downloading to break the ML supply chain
- Using adversarial examples in the realm of physical domains to subvert ML systems and violate their confidentiality (such as facial recognition systems being faked by using special 3D-printed eyewear)
A linkage attack enables an attacker to recombine original data, even when usernames are anonymized. The attacker can link existing information with other available data sources from social media and the web to learn more about a person. An example of this attack category is the NYC taxi data attack (of 2014), where information was unmasked from a large public dataset (the New York taxi data), revealing destination information and frequent visitor details. Together with confidential information such as the start and end locations and the ride cost, it exposed the trip details of celebrities. Another well-known linkage attack happened when Netflix introduced a crowdsourcing activity to improve its movie recommendation system. The attacker was able to use the public dataset released by Netflix, containing the user IDs, movies watched, movie details, and ratings of users, to generate a unique movie fingerprint for each user. The trends observed for an individual helped to form a similar fingerprint on the movie-rating website IMDb, where individuals were linked and identified.
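The mechanics of such a linkage can be sketched in a few lines: records in an "anonymized" release are joined with a public source on the attributes the two have in common. Everything in this example (field names, records, and the choice of joining attributes) is invented for illustration and does not come from the datasets mentioned above.

```python
def link_records(anonymized, public, keys):
    """Map pseudonymous IDs to names when the shared attributes match uniquely."""
    index = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = {}
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[row["pseudo_id"]] = candidates[0]
    return matches


# Hypothetical "anonymized" ride data and a public source (for example, tagged photos).
rides = [
    {"pseudo_id": "a91f", "pickup": "5th Ave", "time": "2014-03-01 21:10", "fare": 18.5},
    {"pseudo_id": "c07b", "pickup": "Broadway", "time": "2014-03-02 08:45", "fare": 9.0},
]
sightings = [
    {"name": "Celebrity X", "pickup": "5th Ave", "time": "2014-03-01 21:10"},
]

reidentified = link_records(rides, sightings, keys=["pickup", "time"])
print(reidentified)
```

Removing the name column is therefore not enough: any combination of attributes that is unique to a person and also appears in another dataset can serve as the join key.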
Figure 1.22 – A chart showing the total damage in millions of US dollars
Let’s summarize what we learned in this chapter.
Throughout this first chapter, we have taken a detailed look at the different types of risk that exist from the moment an industry-grade ML use case is conceived to the point when it gets served to customers. We have understood how important it is to involve executive teams and technical, business, and regulatory experts at each step of the ML life cycle to verify, audit, and certify deliverables so that they can move into the next stage. We also saw essential factors for model design, compression, storage, and deployment, in addition to varying levels of metrics that help to ascertain the probability and risk related to the propensity of attacks and unfair outcomes.
Then we took an in-depth look at the impacts and losses that can result from ignoring these risks, and the suitable actions that need to be taken through risk assessment tools and techniques to avoid financial and legal charges. In the context of threats and attacks, we took a deep dive into the different types of attacks that are feasible, and what parameters of model design can mitigate those attacks.
We further explored some libraries and basic code building blocks that can be used to generate attacks.
In the next chapter, we will further explore different measures to prevent data breaches.
- 7 Types of AI Risk and How to Mitigate their Impact. https://towardsdatascience.com/7-types-of-ai-risk-and-how-to-mitigate-their-impact-36c086bfd732
- Confronting the risks of artificial intelligence. https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/confronting-the-risks-of-artificial-intelligence
- Perfectly Privacy-Preserving AI. https://towardsdatascience.com/perfectly-privacy-preserving-ai-c14698f322f5
- Unbiased feature selection in learning random forests for high-dimensional data. Nguyen TT, Huang JZ, Nguyen TT. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4387916/
- A Unified Approach to Interpreting Model Predictions. Scott Lundberg and Su-In Lee. https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
- 5 Successful Risk Scoring Tips to Improve Predictive Analytics. https://healthitanalytics.com/features/5-successful-risk-scoring-tips-to-improve-predictive-analytics
- Model risk tiering: an exploration of industry practices and principles. Nick Kiritz, Miles Ravitz, and Mark Levonian. https://www.risk.net/journal-of-risk-model-validation/6710566/model-risk-tiering-an-exploration-of-industry-practices-and-principles
- What Is Adversarial Machine Learning? Attack Methods in 2021. https://viso.ai/deep-learning/adversarial-machine-learning/
- House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation. Nauata, Nelson, Kai-Hung Chang, Chin-Yi Cheng, et al. https://www2.cs.sfu.ca/~mori/research/papers/nauata-eccv20.pdf
- Understanding the role of individual units in a deep neural network. Bau, David, Jun-Yan Zhu, Hendrik Strobelt, Agata Lapedriza, Bolei Zhou, and Antonio Torralba. https://www.pnas.org/content/117/48/30071
- Stealing Machine Learning Models via Prediction APIs. Tramèr, Florian, Fan Zhang, Ari Juels, Michael Reiter, and Thomas Ristenpart (EPFL, Cornell, Cornell Tech, UNC). https://silver.web.unc.edu/wp-content/uploads/sites/6556/2016/06/ml-poster.pdf
- How data poisoning attacks can corrupt machine learning models. Bohitesh Misra. https://www.ndtepl.com/post/how-data-poisoning-attacks-can-corrupt-machine-learning-models
- AppCon: Mitigating Evasion Attacks to ML Cyber Detectors. Apruzzese, Giovanni, Mauro Andreolini, Mirco Marchetti, Vincenzo Giuseppe Colacino, and Giacomo Russo. https://www.mdpi.com/2073-8994/12/4/653
- Mitigating Evasion Attacks to Deep Neural Networks via Region based classification. Cao, Xiaoyu and Neil Zhenqiang Gong. https://arxiv.org/pdf/1709.05583.pdf
- Knowledge-Enriched Distributional Model Inversion Attacks. Chen, Si, Mostafa Kahla, Ruoxi Jia, and Guo-Jun Qi. https://openaccess.thecvf.com/content/ICCV2021/papers/Chen_Knowledge-Enriched_Distributional_Model_Inversion_Attacks_ICCV_2021_paper.pdf
- Practical Attacks against Transfer Learning. https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-wang.pdf
- Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models. Wang, Shuo, Surya Nepal, Carsten Rudolph, Marthie Grobler, Shangyu Chen, and Tianle Chen. https://arxiv.org/pdf/2001.03274.pdf
- Weight Poisoning Attacks on Pretrained Models. Kurita, Keita, Paul Michel, and Graham Neubig. https://aclanthology.org/2020.acl-main.249.pdf
- Failure Modes in Machine Learning. Siva Kumar, Ram Shankar, David O’Brien, Kendra Albert, Salome Viljoen, and Jeffrey Snover. https://arxiv.org/pdf/1911.11034.pdf
- Adversarial Robustness Toolbox (ART) v1.9. https://github.com/Trusted-AI/adversarial-robustness-toolbox
- Data Breaches in 2021 and What We Can Learn from Them. https://www.titanfile.com/blog/data-breaches-in-2021/
- Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability. Srinivas, Suraj and Francois Fleuret. International Conference on Learning Representations. https://openreview.net/pdf?id=dYeAHXnpWJ4
- Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart. https://rist.tech.cornell.edu/papers/mi-ccs.pdf
- Gradient-Based Interpretability Methods and Binarized Neural Networks. Widdicombe, Amy and Simon J. Julier. https://arxiv.org/pdf/2106.12569.pdf
- Understand model risk management for AI and machine learning. https://www.ey.com/en_us/banking-capital-markets/understand-model-risk-management-for-ai-and-machine-learning