Data science: machine learning
How National Highways information is to be used when building machine learning algorithms.
Machine learning means building mathematical models to predict an outcome, using techniques drawn from computer science and statistics.
The potential benefits of machine learning must be balanced against any risk to National Highways information or to the trust of the public.
Any machine learning done using our information should reflect our corporate values and meet our information ethics requirement.
How the requirement is implemented will depend largely on how you manage your data science value chain.
That is, how you move from the identification of a problem statement to the delivery of a productionised machine-learning solution.
However, certain considerations must be met:
You should have an ethics checklist for each stage of your data science value chain, aligned to the stages of a typical data science value chain.
- approvals and release notes
- relevant defect reports during development
- descriptive statistics and suitability commentary for the datasets used
- retention of retired models, for a suitable period of time, to allow retrospective decision analysis
Interoperability and portability
Wherever possible, machine learning content should be vendor agnostic, to avoid vendor 'lock-in'.
Further guidance on how to achieve interoperability and portability:
Assess the impact of incorrect predictions and, where reasonable, design systems with human-in-the-loop review processes.
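One common way to put a human in the loop is to act automatically only on confident, low-impact predictions and queue everything else for a person. The sketch below is illustrative only: the names `Prediction`, `route` and `REVIEW_THRESHOLD`, and the threshold value of 0.8, are assumptions rather than National Highways requirements.

```python
from dataclasses import dataclass

# Hypothetical threshold: predictions below this confidence go to a person.
REVIEW_THRESHOLD = 0.8


@dataclass
class Prediction:
    label: str
    confidence: float  # model's estimated probability for the label, 0.0 to 1.0


def route(prediction: Prediction, high_impact: bool = False) -> str:
    """Return "auto" to act on the prediction, or "human_review" to queue it."""
    # High-impact decisions are always reviewed, whatever the confidence.
    if high_impact or prediction.confidence < REVIEW_THRESHOLD:
        return "human_review"
    return "auto"
```

The threshold and the definition of 'high impact' would come from the assessment of what an incorrect prediction costs, so they should be set per use case rather than reused.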
Continuously develop processes that allow us to understand, document and monitor bias in development and production.
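As one possible starting point for monitoring bias, the sketch below compares how often a model gives a positive prediction to each group and reports the largest gap. This is a single, simple measure (often called the demographic parity difference) and is an assumption for illustration; the function names are hypothetical, and the right measure depends on the use case.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of positive (1) predictions for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(predictions, groups):
    """Largest difference in positive rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())
```

Running the same check on development data and on production predictions, and recording the results over time, gives a documented trail of how the gap changes.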
Explainability versus 'black box' techniques
Where reasonable, develop tools and processes to continuously improve transparency and explainability of machine learning systems.
- Partnership on AI - about machine learning
- Cornell University - explainable machine learning in development
Develop the infrastructure to allow for a reasonable level of reproducibility of the model across different types of machine learning systems.
If the machine learning output has the potential to change the nature or amount of work for human operators, call this out so that business change processes can be developed to mitigate the impact on workers whose tasks are automated.
Accuracy metrics should take the information lifecycle of the dataset into account.
If personally identifiable information is used, the techniques used should follow privacy by design principles, for example:
- differential privacy
- homomorphic encryption
- the ability for the model to accommodate individuals withdrawing their consent for processing
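To illustrate differential privacy, the sketch below applies the Laplace mechanism to a counting query: it adds random noise, scaled by a privacy parameter `epsilon`, so that the published count reveals little about any one individual. The function names are hypothetical, the choice of `epsilon` is an assumption left to the reader, and a production system should use a vetted library rather than this minimal version.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count of the items matching `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    # A counting query has sensitivity 1: adding or removing one person
    # changes the result by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)
```

A smaller `epsilon` gives stronger privacy but a noisier count, so the value chosen is a trade-off between protection and accuracy.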
Data risk management
Develop and improve reasonable capabilities to ensure data and model security are considered during the development of machine learning systems.
Auditability and change control
When models need to be changed, follow change-control processes to include:
- storage of previously used models
- governance records (for example person authorising changes, date of change and so on)
- model deployment documentation (for example impact assessment, release notes and roll-back analysis)
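The records listed above could be captured in a single structured entry per model change. The sketch below is one hypothetical layout: the class name, field names and all example values are placeholders, not a National Highways schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class ModelChangeRecord:
    model_name: str
    new_version: str
    previous_version: str   # kept in storage so the change can be rolled back
    authorised_by: str
    change_date: str        # ISO 8601 date
    impact_assessment: str  # reference to the document, not its content
    release_notes: str
    artefact_sha256: str    # fingerprint of the deployed model file


def fingerprint(model_bytes: bytes) -> str:
    """SHA-256 hex digest, used to tie the record to one exact model artefact."""
    return hashlib.sha256(model_bytes).hexdigest()


# All values below are placeholders for illustration.
record = ModelChangeRecord(
    model_name="example-model",
    new_version="2.0.0",
    previous_version="1.4.2",
    authorised_by="A. N. Approver",
    change_date=date.today().isoformat(),
    impact_assessment="docs/impact-assessment.md",
    release_notes="docs/release-notes.md",
    artefact_sha256=fingerprint(b"placeholder model bytes"),
)
print(json.dumps(asdict(record), indent=2))
```

Storing the fingerprint alongside the authorisation details makes it possible to check later that a retained model file is the one that was actually approved.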
Continuously monitor and tune models so that their outputs stay in line with what is expected.
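A simple way to decide when tuning is due is to compare recent prediction error against the error recorded when the model was approved. The sketch below assumes a numeric prediction task and uses mean absolute error; the function name and the 10% tolerance are illustrative assumptions.

```python
from statistics import mean


def needs_retuning(actuals, predictions, baseline_mae, tolerance=0.10):
    """True if recent mean absolute error exceeds the baseline by more than `tolerance`."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("actuals and predictions must be non-empty and equal in length")
    recent_mae = mean(abs(a - p) for a, p in zip(actuals, predictions))
    return recent_mae > baseline_mae * (1 + tolerance)
```

For example, with a baseline error of 1.0, recent errors averaging 4.0 would trigger a review, while errors averaging 1.05 would not.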