AI Ethics & MLOps - Go Fast, Without Breaking Transparency
BS - Ben Saunders
Introduction
Artificial Intelligence (AI) is rapidly transforming the business landscape, offering organisations new ways to improve operational efficiency, automate processes, and enhance customer experiences. However, as the use of AI becomes more widespread, there is a growing concern about the ethical implications of its adoption.
In a previous blog, I spoke about the need for businesses that apply AI across their customer-facing services to ensure that they have a clearly defined and well understood ethics framework. This is no easy undertaking. Equally challenging is a firm's ability to demonstrate that those ethics have been upheld, providing transparency in the face of increased scrutiny from regulators.
As such, organisations must ensure that their use of AI is governed by a framework of ethical and legal considerations, including traceability, transparency, and explainability. One way of doing this is by aligning people, process and technology capabilities under the banner of an MLOps framework and approach. In this blog, I will explore how an MLOps approach can help organisations govern their use of AI, ensuring that they adhere to ethical and legal standards and build trust with customers, stakeholders and regulators alike.
Let’s get started by first covering what we mean by MLOps.
What is MLOps?
MLOps, short for Machine Learning (ML) Operations, is a practice that aims to streamline the process of building, deploying, and managing machine learning models in production. It is a combination of DevOps practices and machine learning workflows, which seeks to bring together data scientists, engineers, and operations teams to create a more efficient and collaborative environment. In doing so, it enables a deeper level of traceability and transparency around how AI enabled software generates certain actions and business influencing decisions.
MLOps focuses on automating and optimising the machine learning pipeline, from data preparation and feature engineering to model training, deployment, and monitoring. It involves implementing version control, continuous integration and delivery, and automated testing to ensure that machine learning models are reliable, scalable, and maintainable. As an organisation's reliance on ML and AI increases over time, a sound MLOps approach becomes ever more important for maintaining control of its estate and protecting against erroneous models causing havoc in production.
By way of example, MLOps helps to address the challenges of managing machine learning models in production, such as model drift, performance degradation, and bias. By adopting an MLOps approach, organisations can ensure that their machine learning models are trustworthy, explainable, and compliant with regulatory standards. Organisations often turn to an MLOps approach in order to accelerate their AI adoption, reduce development costs, and improve the overall quality of their machine learning models in-line with ethical controls and policies.
What technology capabilities are required for an MLOps framework to scale?
An MLOps framework consists of several key capabilities that are necessary for building, deploying, and managing machine learning models in production. These capabilities include:
Data Management: Effective MLOps requires a robust data management strategy to ensure that data is properly stored, labelled, and governed. This includes tools and processes for data preparation, cleaning, and feature engineering. In addition, data cataloguing, discovery, access management and privacy tools are required to accelerate the sourcing of data, whilst preventing sensitive/personally identifiable information (PII) from finding its way into the hands of people who are not approved to access it.
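To make the PII safeguard above concrete, here is a minimal, library-free sketch of pattern-based redaction applied before data is catalogued. The patterns and placeholder labels are illustrative assumptions; a production system would lean on a dedicated data-privacy or DLP service rather than hand-rolled regexes.

```python
import re

# Hypothetical redaction helper: mask common PII patterns (email addresses,
# UK-style phone numbers) before records enter the data catalogue.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b0\d{10}\b"),
}

def redact_pii(text: str) -> str:
    """Replace any matched PII with a labelled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(redact_pii("Contact jane.doe@example.com on 07123456789"))
# → Contact <EMAIL> on <PHONE>
```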
Model Training and Testing: Ideally, an MLOps framework should include an automated process for training and testing machine learning models, including the ability to manage different versions of models, track performance metrics, and automate parameter tuning.
Continuous Integration and Deployment: MLOps emphasises the importance of continuous integration and deployment (CI/CD) for machine learning models. This means automating the process of building, testing, and deploying models, as well as tracking performance and managing errors. This enables date and time stamps to be captured at each stage of the model lifecycle.
Governance and Compliance: MLOps also addresses the need for governance and compliance in machine learning, ensuring that models adhere to ethical and legal standards, and are properly audited and secured. In short, this ensures that organisations that adopt AI can demonstrate the who, what, why, when and how of their model lifecycle management processes.
Model Monitoring: To ensure that machine learning models remain accurate and effective over time, an MLOps framework should include tools for monitoring models in production, detecting issues such as model drift and bias, and triggering retraining when necessary.
Explainability and Transparency: An MLOps approach stresses the importance of model explainability and transparency to build trust with customers and stakeholders. This includes tools and processes for interpreting machine learning models, generating explanations, and providing evidence for decision-making.
Indeed, the domains of data governance and MLOps blend together in their technical foundations, forming a fully auditable framework with the capacity to operate in regulated environments at scale. The image below visually illustrates how these domains overlap one another to create an integrated tooling landscape.
How do these capabilities stitch together into an MLOps approach?
An MLOps framework typically consists of several stages that are necessary for building, deploying, and managing machine learning models in production. I will cover a breakdown of the various stages and how compliance with ethics and bias requirements can be demonstrated:
Data Management: The first stage in an MLOps framework is data management, which involves collecting, cleaning, and preprocessing data for use in machine learning models. To demonstrate compliance with ethics and bias requirements, organisations can use data engineering techniques to detect and correct any biases in the data, such as oversampling or undersampling techniques to balance classes or demographic groups. Additionally, data labelling and annotation can be done in a transparent manner, so that the labelling process can be audited and reviewed by internal governance functions and external regulators as needed.
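As an illustration of the rebalancing techniques mentioned above, here is a naive random-oversampling sketch in plain Python, duplicating minority-class rows until every class matches the majority count. The label values are made up, and real pipelines would typically reach for a library such as imbalanced-learn (e.g. SMOTE) instead.

```python
import random
from collections import Counter

def oversample(rows, label_key, seed=42):
    """Naive random oversampling: duplicate minority-class rows until
    every class matches the majority class count. A sketch only."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Hypothetical imbalanced dataset: 8 approvals vs 2 declines.
data = [{"y": "approve"}] * 8 + [{"y": "decline"}] * 2
print(Counter(r["y"] for r in oversample(data, "y")))  # both classes now 8
```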
Model Training and Testing: The next stage in an MLOps framework is model training and testing, which involves developing machine learning models based on the preprocessed data. To demonstrate compliance with ethics and bias requirements, organisations can use techniques such as fairness metrics and sensitivity analysis to detect and correct for any biases in the models. Additionally, model testing can be done using representative and diverse datasets, and performance metrics can be tracked and reported using trend analysis over time through dashboards. This ensures data scientists can be informed about the performance of their models and the decisions taken to build them in a transparent manner.
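One widely used fairness metric is the demographic parity difference: the gap in positive-prediction rates between demographic groups, where a value near zero suggests parity. A minimal sketch with made-up predictions and group labels; libraries such as Fairlearn provide hardened implementations of this and related metrics.

```python
def demographic_parity_difference(predictions, groups):
    """Gap in positive-prediction (selection) rate between groups.
    predictions: iterable of 0/1 outcomes; groups: group label per row."""
    tallies = {}
    for pred, group in zip(predictions, groups):
        positives, total = tallies.get(group, (0, 0))
        tallies[group] = (positives + pred, total + 1)
    rates = {g: positives / total for g, (positives, total) in tallies.items()}
    return max(rates.values()) - min(rates.values())

# Illustrative data: group "a" is selected at 0.75, group "b" at 0.25.
preds  = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(preds, groups))  # → 0.5
```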
Continuous Integration and Deployment: The third stage in an MLOps framework is continuous integration and deployment, which involves automating the process of building, testing, and deploying models in production. To demonstrate compliance with ethics and bias requirements, organisations can implement version control and testing frameworks that track changes and performance metrics over time, providing evidence of the model's performance and accuracy.
Model Monitoring: This involves monitoring machine learning models at each stage of the delivery lifecycle and in production to ensure that they remain accurate and effective over time. To demonstrate compliance with ethics and bias requirements, organisations can use techniques such as fairness metrics and performance monitoring to detect and correct any biases that may arise over time.
Explainability and Transparency: The fifth and perhaps most challenging stage in an MLOps framework is explainability and transparency. This involves providing explanations for the decisions made by machine learning models. To demonstrate compliance with ethics and bias requirements, organisations can use techniques such as feature importance analysis and generating explanations for individual predictions, providing stakeholders with insights into how decisions are being made and what factors are being considered to generate the outcomes and results in AI enabled solutions.
Governance and Compliance: Governance and compliance are both seeded all the way through an MLOps framework. I wrote previously about the need for organisations to have an AI ethics policy in place. Indeed, having this in paper format is one thing. However, having it codified with quality control and approval gates at each stage of the model lifecycle is quite another undertaking. This is where automated testing, quality control metrics and compliance as code all come together to demonstrate that ethical and legal considerations, such as data privacy, bias, and explainability are being adhered to. Additionally, regular auditing and reporting can be done to provide evidence of compliance and accountability.
So far, I have referenced transparency and traceability time and again. This is because they are very much the fulcrum of ensuring AI is applied and governed in an ethical manner. One component that can help organisations apply a transparent and traceable approach to the implementation of AI in their business is a Feature Store, seeded into their MLOps pipeline.
Ensure Your Data Scientists’ Features Are in Order
A feature store is a centralised repository for storing and managing the features used in machine learning models. Features are the variables or attributes that are used to train a machine learning model, such as age, gender, and location. A feature store is designed to improve the efficiency and accuracy of machine learning development by providing a scalable, versioned, and shareable repository for features. This also ensures that data science teams can keep an auditable catalogue of the features used to train their models in the likely event that they are asked to demonstrate governance and implementation of ethical policies and controls.
A feature store allows data scientists and engineers to easily access, explore, and manage the features used in their machine learning models. It enables them to collaborate and share features across teams, reducing duplication of effort and improving the consistency and quality of feature engineering.
In addition, a feature store provides versioning and lineage tracking for features, which enables data scientists to track the changes made to features over time and trace the impact of these changes on the performance of the model. This can help data scientists to identify and address issues such as feature drift and data quality problems, as well as trace the potential sources of bias in customer-facing AI-enabled services.
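To illustrate the versioning and lineage ideas above, here is a toy, in-memory feature store sketch in which each registered feature definition is content-hashed and timestamped, leaving an auditable trail of every change. The class and field names are my own invention; production tools such as Feast add actual storage, serving and access control on top of this principle.

```python
import hashlib
import json
from datetime import datetime, timezone

class FeatureStore:
    """Toy feature store: registrations are content-hashed and
    timestamped so every definition change is auditable."""

    def __init__(self):
        self.registry = []

    def register(self, name, definition):
        # Deterministic version id derived from the definition itself.
        version = hashlib.sha256(
            json.dumps(definition, sort_keys=True).encode()
        ).hexdigest()[:12]
        self.registry.append({
            "name": name,
            "version": version,
            "definition": definition,
            "registered_at": datetime.now(timezone.utc).isoformat(),
        })
        return version

    def lineage(self, name):
        """All historical versions of a feature, oldest first."""
        return [entry for entry in self.registry if entry["name"] == name]

store = FeatureStore()
store.register("customer_age", {"source": "crm.dob", "transform": "years_since"})
store.register("customer_age", {"source": "crm.dob", "transform": "years_since_rounded"})
print(len(store.lineage("customer_age")))  # → 2 auditable versions
```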
When combined with an optimal model monitoring approach, a feature store provides a crucial, auditable back catalogue of a model's life cycle from cradle to grave. Let’s unpack some of the best practice considerations, starting with version control and lineage.
Get Your Organisation’s Version Control and Lineage Sorted
Version control and lineage tracking are critical components of an MLOps pipeline, as they enable organisations to track the development and evolution of machine learning models over time. Equally, it is crucial for organisations to demonstrate the ability to “replay” a model, its associated training data, features and development artefacts to illustrate their ethics and governance controls in full flight. Establishing a best practice version control strategy is therefore an essential part of an MLOps pipeline. Here are some best practices to consider on your organisation's journey:
Choose the Right Version Control System: Choose a version control system that is well-suited for machine learning workflows, such as Git or Git-based systems such as GitHub or GitLab. Use branching and merging strategies to manage the development and evolution of the machine learning models over time.
Track Model Artefacts: Track the artefacts used to build and deploy the machine learning models, such as the code, the data, and the configuration files. Use version control to track changes to these artefacts over time and ensure that they can be easily reproduced and replicated.
Use Metadata to Track Model Lineage: Use metadata to track the lineage of the machine learning models, documenting the data used to train the model, the algorithms used, and the hyperparameters and other configuration details. This information can be used to reproduce the model, debug issues that may arise, and respond to challenges from external parties, bodies and customers regarding the potential bias of your AI-powered products and services.
Use a Model Registry: Use a model registry to store and track the models produced by the pipeline, including the version history and the metadata associated with each model.
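The practices above can be tied together in a single registry record. Here is a minimal sketch of what such a record might capture, linking a model version back to its training data, code commit and hyperparameters; all field names and values are illustrative assumptions, and tools such as MLflow's Model Registry formalise this pattern properly.

```python
import json
from datetime import datetime, timezone

def registry_entry(name, version, data_hash, git_commit, hyperparameters):
    """Illustrative model-registry record: enough metadata to reproduce
    the model and answer 'who, what, why, when and how' questions."""
    return {
        "model": name,
        "version": version,
        "training_data_sha256": data_hash,   # hash of the training dataset
        "git_commit": git_commit,            # code used to train the model
        "hyperparameters": hyperparameters,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical values for a credit-risk model release.
entry = registry_entry(
    "credit_risk", "1.4.0", "ab12cd34ef56", "9f8e7d6",
    {"max_depth": 6, "learning_rate": 0.1},
)
print(json.dumps(entry, indent=2))
```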
A* Grade Model Monitoring Considerations
Model monitoring is a critical component of an MLOps pipeline, as it enables organisations to ensure that their machine learning models remain accurate and effective over time. Here are some things to consider in an MLOps pipeline to put A* grade model monitoring in place:
Define Metrics and Thresholds: Define the key performance metrics that will be used to monitor the model, such as accuracy, precision, recall, and F1 score*. Establish thresholds for these metrics, indicating the acceptable level of performance for each. (*The F1 score is a commonly used machine learning evaluation metric that measures the balance between precision and recall. Precision is the proportion of true positive predictions among all positive predictions made by a classifier, while recall is the proportion of true positive predictions among all actual positive instances in the dataset.)
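The metrics above can be computed from scratch for a binary classifier, mirroring the definitions in the footnote. The labels are made up for illustration; in practice you would use something like scikit-learn's metrics module.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels, computed from the
    true positive / false positive / false negative counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative labels: one false positive, one false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # → (0.75, 0.75, 0.75)
```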
Implement Automated Testing: Implement automated testing to continuously monitor the model's performance and detect any issues that may arise. Use techniques such as A/B testing and statistical analysis to compare the performance of the model against different datasets and evaluate the impact of any changes to the model.
Monitor for Model Drift: Monitor for model drift, which occurs when the distribution of the input data changes over time, causing the model to become less accurate. Use techniques such as data profiling and statistical analysis to detect and correct for model drift.
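One library-free way to quantify such a shift in the input distribution is the two-sample Kolmogorov-Smirnov statistic: the maximum gap between the empirical CDFs of the training data and the live data. A sketch with made-up samples; in practice scipy.stats.ks_2samp adds a proper p-value and is far more efficient.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest vertical gap between the
    two empirical CDFs. 0 means identical; values near 1 mean the
    distributions barely overlap."""
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)

# Illustrative feature values: live data has shifted well away
# from the training distribution.
training = [10, 12, 11, 13, 12, 11, 10, 12]
live     = [18, 19, 20, 18, 21, 19, 20, 18]
print(f"KS statistic: {ks_statistic(training, live):.2f}")  # → 1.00
```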
Detect and Address Bias: Monitor for bias in the model, which occurs when the model produces inaccurate or unfair results for certain groups of people. Use techniques such as fairness metrics and sensitivity analysis to detect and correct for bias in the model.
Track Model Versioning and Lineage: Track the versioning and lineage of the model, documenting the changes made to the model over time and the impact of these changes on the model's performance.
Implement Alerting and Notification: Implement alerting and notification mechanisms to ensure that stakeholders are notified in real-time when issues are detected with the model. This can include automated email or text message alerts, or integration with monitoring dashboards.
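A threshold-based alerting hook might look like the sketch below, where breaches of agreed metric floors trigger a notification callback (email, chat, pager). The metric names and floor values are illustrative assumptions.

```python
def check_thresholds(metrics, thresholds, notify):
    """Compare live metrics against agreed floors and invoke the
    notify callback once per breach. Returns the breach messages."""
    breaches = [
        f"{name} {value:.2f} below floor {thresholds[name]:.2f}"
        for name, value in metrics.items()
        if name in thresholds and value < thresholds[name]
    ]
    for message in breaches:
        notify(message)
    return breaches

# Hypothetical live metrics: accuracy has dipped below its floor.
alerts = []
check_thresholds(
    {"accuracy": 0.81, "recall": 0.64},
    {"accuracy": 0.85, "recall": 0.60},
    alerts.append,  # stand-in for an email/Slack/pager integration
)
print(alerts)  # → ['accuracy 0.81 below floor 0.85']
```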
Ensure Your Models Don’t Drift into an Ocean of Non-Compliance
Model drift is a critical issue in machine learning and can occur when the input data distribution changes over time, causing the model to become less accurate or effective. This can lead to cataclysmic issues in respect of model outputs, behaviours and actions if not sufficiently governed. Indeed, having an MLOps framework in place enables organisations to swiftly deploy model updates in a graceful manner, all whilst capturing auditable events that demonstrate a corrective course of action has been administered to prevent further customer/business impact.
Monitoring for model drift requires tracking several key metrics that can help detect changes in the data distribution. Embedding these checks and balances through automated testing and validation is an additional safeguard to ensure your data science teams remain on the right side of your ethics and compliance controls. Typically, your organisation should consider covering the following tests to prevent model drift.
Distribution of Input Data: Monitor the distribution of input data over time, tracking changes in the mean, variance, and other statistical measures. Use techniques such as data profiling and statistical analysis to identify changes in the data distribution.
Model Performance Metrics: Monitor the performance metrics of the machine learning model over time, tracking changes in accuracy, precision, recall, and F1 score. Compare the performance metrics against the metrics from the training data to identify potential drift.
Feature Importance: Monitor the importance of features in the machine learning model over time, tracking changes in the relative importance of different features. Use techniques such as permutation importance to identify changes in feature importance.
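Permutation importance, mentioned above, can be sketched in a few lines: shuffle one feature column and measure how much a metric degrades, so a large drop means the model leans heavily on that feature. The toy model and data below are my own illustration.

```python
import random

def permutation_importance(model, X, y, metric, feature_idx, seed=0):
    """Drop in the metric after shuffling one feature column. A large
    drop indicates the model depends heavily on that feature."""
    baseline = metric(y, [model(row) for row in X])
    rng = random.Random(seed)
    column = [row[feature_idx] for row in X]
    rng.shuffle(column)
    permuted = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                for row, v in zip(X, column)]
    return baseline - metric(y, [model(row) for row in permuted])

# Toy model: thresholds feature 0 and ignores feature 1 entirely.
model = lambda row: 1 if row[0] > 5 else 0
accuracy = lambda y_true, preds: sum(t == p for t, p in zip(y_true, preds)) / len(y_true)
X = [[1, 9], [2, 8], [8, 1], [9, 2], [3, 7], [7, 3]]
y = [0, 0, 1, 1, 0, 1]

print(permutation_importance(model, X, y, accuracy, 1))  # → 0.0 (unused feature)
```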
Error Analysis: Conduct error analysis to identify the types of errors made by the machine learning model over time. This can help to identify changes in the data distribution and the impact of these changes on the model's performance.
Outlier Detection: Monitor for outliers in the input data over time, tracking changes in the frequency and severity of outliers. Use techniques such as clustering and anomaly detection to identify outliers.
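A simple z-score screen illustrates the outlier-monitoring hook: flag any point more than a chosen number of standard deviations from the mean. Clustering or isolation forests are more robust in practice; the readings below are made up.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the
    mean. A deliberately simple screen for illustration."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Illustrative input feature readings with one obvious anomaly.
readings = [10, 11, 9, 10, 12, 11, 10, 95]
print(zscore_outliers(readings, threshold=2.0))  # → [95]
```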
Evidently, getting these tests in place is a fine balancing act, and they should be applied on a risk-based approach. Organisations can quite easily fall into a one-size-fits-all approach to model drift management, which ultimately filters into “enterprise heavy” MLOps frameworks. We saw a similar pattern several years ago with the industry obsessing over a single CI/CD pipeline as part of the DevOps movement: plugging into ServiceNow and ultimately adding significant bloat to a process that is meant to deliver speed and control in unison.
I try to demonstrate this in the image below, which illustrates that the greater the risk of a model causing impact or harm to customers, an organisation, or the wider market, the greater the level of testing and validation required to align with internal ethics and compliance standards.
MLOps Guardrails & Controls
“One pipeline to rule them all” isn’t the answer; organisations need to seek ways to federate trust into their data science and data product teams so that they are empowered to choose the right tools for their use case, all whilst aligning with centrally defined ethics and governance controls. A similar obsession manifested in the DevOps movement, where central architecture teams defined toolchains with very little appreciation of the developer experience. Instead, I would always advocate setting your organisation's risk appetite when it comes to ML/AI and then breaking down the end-to-end workflow for the process, detailing the controls required based on use case, data classification and the potential harm an AI-enabled product might cause to users, customers or the wider market an organisation operates in.
From there, you can define the requisite guardrails and outline to teams the evidence they must provide to demonstrate alignment with the organisation's AI control framework, using tooling and automation as much as possible.
In Summary
In closing, building a sound MLOps framework is crucial for organisations looking to adopt machine learning and AI technologies while also ensuring alignment with ethics and governance frameworks. By incorporating best practices such as version control, model monitoring, and compliance gates into the MLOps pipeline, organisations can build accurate, reliable, and transparent machine learning models that adhere to ethical and legal standards.
In addition, it is essential for organisations to continuously monitor and improve their MLOps framework as machine learning models evolve and new technologies and techniques become available. In doing so, they can improve the efficiency and effectiveness of their machine learning models while also building trust with customers, internal stakeholders and external regulators.
Ultimately, a sound MLOps framework is critical for organisations looking to stay ahead in the rapidly evolving field of machine learning and AI, while also ensuring that their models are trustworthy, ethical, and aligned with their broader governance frameworks. In doing so, they will be able to move fast, without breaking transparency.