Data science: machine learning

How National Highways information is to be used when building machine learning algorithms.

Machine learning means building mathematical models to predict an outcome, using techniques drawn from computer science and statistics.

The potential benefits of machine learning must be balanced against any risk to National Highways information and any risk of losing the trust of the public.

Requirement

Any machine learning done using National Highways information, either by National Highways or by a National Highways Supplier, must reflect National Highways corporate values and meet National Highways information ethics requirements.

Specification

How the requirement is implemented will depend largely on how a Supplier manages their data science value chain.

That is, how a Supplier moves from the identification of a problem statement to the delivery of a productionised machine-learning solution.

However, certain conditions must be met:

Ethical

Having an ethics checklist for each stage of the data science value chain that aligns to National Highways ethical values.

Auditability

Keeping an auditable record of how each model was developed and released. This includes:

approvals and release notes
relevant defect reports during development
descriptive statistics and suitability commentary for the datasets used
retention of retired models, for a suitable period, to allow retrospective decision analysis
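
As an illustration of the "descriptive statistics" item above, the following is a minimal sketch of a per-column summary that could be stored alongside other audit records. The function name `describe_column` and the example speed readings are hypothetical and are not part of any National Highways tooling.

```python
import statistics


def describe_column(values):
    """Return basic descriptive statistics for one numeric column.

    None marks a missing reading. Assumes at least one value is present.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "max": max(present),
    }


# Hypothetical example column: vehicle speeds in mph, one reading missing.
speeds = [62.0, 58.0, None, 70.0, 66.0]
summary = describe_column(speeds)
print(summary)
```

A summary like this only covers the numeric part of the record; the suitability commentary still needs to be written by someone who understands the dataset.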

Interoperability and portability

Wherever possible, machine learning content must be vendor agnostic, to avoid vendor 'lock-in'.

Further guidance on how to achieve interoperability and portability:

Other considerations

Human assessment

Assess the impact of incorrect predictions and, where reasonable, design systems with human-in-the-loop review processes.
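
One common way to build in human-in-the-loop review is to route low-confidence predictions to a person. The sketch below assumes the model exposes a confidence score between 0 and 1. The function name `route_prediction` and the 0.9 threshold are illustrative assumptions, and the threshold in practice would depend on the assessed impact of an incorrect prediction.

```python
def route_prediction(label, confidence, threshold=0.9):
    """Decide whether a prediction is accepted or sent for human review.

    The threshold is a hypothetical default, not a recommended value.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return {"label": label, "decided_by": "model"}
    # Below the threshold the model's label is only a suggestion.
    return {"label": label, "decided_by": "human_review"}


print(route_prediction("incident", 0.95))
print(route_prediction("incident", 0.60))
```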

Bias evaluation

Continuously develop processes that allow National Highways to understand, document and monitor bias in development and production.
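
As one possible starting point for monitoring bias, the sketch below computes the demographic parity difference: the gap between the highest and lowest rate of positive predictions across groups. This is only one of several fairness measures and is not prescribed by this standard. The group labels and predictions are made-up example data.

```python
def demographic_parity_difference(predictions, groups):
    """Return (gap, rates), where rates maps each group to its positive rate.

    predictions are 0/1 model outputs; groups holds the matching group labels.
    """
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = {g: sum(p) / len(p) for g, p in by_group.items()}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Hypothetical example data: eight predictions across two groups.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_difference(preds, groups)
print(gap, rates)
```

Recording this figure at each release, and periodically in production, would give a simple trend to document and monitor.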

Explainability versus 'black box' techniques

Where possible, develop tools and processes to continuously improve transparency and explainability of machine learning systems.

Further guidance:

Reproducible operations

Develop the infrastructure to allow for a level of reproducibility of the model across different types of machine learning systems.
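
Two small building blocks that often support reproducibility are fixed random seeds and a fingerprint of the training configuration. The sketch below shows both using only the Python standard library. The configuration keys and the function names are hypothetical.

```python
import hashlib
import json
import random


def run_fingerprint(config):
    """Hash a configuration dict so identical settings give identical IDs."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def seeded_sample(seed, n=3):
    """Draw n pseudo-random numbers from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Hypothetical training configuration.
config = {"seed": 42, "learning_rate": 0.01, "dataset_version": "v1"}
print(run_fingerprint(config))
print(seeded_sample(config["seed"]) == seeded_sample(config["seed"]))
```

Storing the fingerprint with each trained model makes it easier to confirm later that a rerun used the same settings.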

Displacement strategy

If the machine learning output has the potential to change the nature or amount of work for human operators, this will be called out to determine whether business change processes can be developed to mitigate the impact of automation on workers.

Practical accuracy

Ensure accuracy metrics take the information lifecycle of the dataset into account.

Privacy

If personally identifiable information is used, the techniques used must allow for privacy by design principles.

For example, differential privacy, homomorphic encryption, or the ability for the model to accommodate individuals withdrawing their consent for processing.
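
To make the differential privacy example more concrete, the sketch below applies the Laplace mechanism to a count query. It assumes a sensitivity of 1, meaning one individual changes the count by at most 1. The function name `private_count` and the epsilon values are illustrative, and a production system would normally rely on a vetted privacy library rather than hand-written noise.

```python
import random


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)  # fixed seed only so the example is repeatable
print(private_count(100, epsilon=1.0, rng=rng))
```

A smaller epsilon adds more noise and gives stronger privacy, at the cost of less accurate results.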

Data risk management

Develop and improve reasonable capabilities to ensure data and model security are incorporated during the development of machine learning systems.

Auditability and change control

When models need to be changed, follow change-control processes to include:

storage of previously used models
governance records (for example person authorising changes, date of change and so on)
model deployment documentation (for example impact assessment, release notes and roll-back analysis)
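
The change-control items above could be captured in a single structured record per model change. The sketch below is one possible shape for such a record. The class name `ModelChangeRecord`, its field names and the example values are all hypothetical.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class ModelChangeRecord:
    """One governance entry for a model change (illustrative fields only)."""
    model_name: str
    new_version: str
    previous_version: str  # retained so the earlier model can be restored
    authorised_by: str
    change_date: date
    artefact_sha256: str
    release_notes: str
    rollback_plan: str


def fingerprint(model_bytes):
    """Return a SHA-256 digest identifying the stored model artefact."""
    return hashlib.sha256(model_bytes).hexdigest()


# Hypothetical example values.
record = ModelChangeRecord(
    model_name="journey-time-model",
    new_version="1.1.0",
    previous_version="1.0.0",
    authorised_by="A. Reviewer",
    change_date=date(2024, 1, 15),
    artefact_sha256=fingerprint(b"serialised model bytes"),
    release_notes="Retrained on an updated dataset.",
    rollback_plan="Redeploy version 1.0.0 from the model store.",
)
print(asdict(record))
```

Making the record immutable (`frozen=True`) is a design choice that fits an audit trail, where entries should be added rather than edited.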

Tuning

Continuously tune models so that their outputs remain in line with what is expected.
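
One simple way to decide when tuning is due is to compare the current error against a baseline recorded at release. The sketch below uses mean absolute error and flags the model once the error exceeds the baseline by a chosen margin. The function name `needs_retuning`, the 25% margin and the example figures are assumptions for illustration.

```python
import statistics


def needs_retuning(expected, actual, baseline_mae, tolerance=1.25):
    """Return (flag, current_mae); flag is True when error has drifted.

    baseline_mae is the mean absolute error recorded when the model was
    released; tolerance is a hypothetical allowed multiple of that baseline.
    """
    errors = [abs(e - a) for e, a in zip(expected, actual)]
    current_mae = statistics.fmean(errors)
    return current_mae > baseline_mae * tolerance, current_mae


# Hypothetical observed values and model outputs.
flag, mae = needs_retuning([10, 20, 30], [11, 19, 33], baseline_mae=1.0)
print(flag, round(mae, 2))
```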