Data science: machine learning

How National Highways information is to be used when building machine learning algorithms.

Machine learning means building mathematical models to predict an outcome, using techniques drawn from computer science and statistics.

The potential benefits of machine learning must be balanced against any risk to National Highways information or to the trust of the public.

Requirement

Any machine learning done using our information should reflect our corporate values and meet our information ethics requirement.

Specification

How the requirement is implemented will depend largely on how you manage your data science value chain, that is, how you move from identifying a problem statement to delivering a productionised machine learning solution.

However, certain considerations must be met:

Ethical

You should have an ethics checklist for each stage of your data science value chain. The checklists should align to the stages of a typical data science value chain.

Auditability

Auditability includes:

  • approvals and release notes
  • relevant defect reports during development
  • descriptive statistics and suitability commentary for the datasets used
  • retention of retired models, for a suitable period of time, to allow retrospective decision analysis
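As an illustration of the descriptive statistics item above, the sketch below summarises one numeric column so that the result can be filed with the audit record. It is a minimal example only: the names ColumnSummary and summarise_column and the speed_mph column are assumptions made for this example and are not defined by this guidance.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Descriptive statistics for one numeric column (hypothetical record)."""
    name: str
    count: int      # total rows, including missing values
    missing: int    # rows where the value is None
    mean: float
    stdev: float    # sample standard deviation
    minimum: float
    maximum: float


def summarise_column(name, values):
    """Summarise a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    return ColumnSummary(
        name=name,
        count=len(values),
        missing=len(values) - len(present),
        mean=statistics.mean(present),
        stdev=statistics.stdev(present),
        minimum=min(present),
        maximum=max(present),
    )


# Hypothetical column used only for illustration.
summary = summarise_column("speed_mph", [60, 70, None, 50])
```

The suitability commentary still needs to be written by a person. The summary only provides the figures that the commentary refers to.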

Interoperability and portability

Wherever possible, machine learning content should be vendor agnostic, to avoid vendor 'lock-in'.

Further guidance on how to achieve interoperability and portability:

Other considerations

Human assessment

Assess the impact of incorrect predictions and, where reasonable, design systems with human-in-the-loop review processes.
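One possible way to implement human-in-the-loop review is to route a prediction to a person when the model's confidence is low or when a wrong answer would have a high impact. The sketch below assumes the model reports a confidence between 0 and 1. The threshold value and the names Decision and route_prediction are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real value would come from the impact assessment.
REVIEW_THRESHOLD = 0.80


@dataclass
class Decision:
    label: str
    confidence: float
    needs_human_review: bool


def route_prediction(label, confidence, high_impact=False,
                     threshold=REVIEW_THRESHOLD):
    """Flag a prediction for human review if it is low confidence or high impact."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    needs_review = high_impact or confidence < threshold
    return Decision(label, confidence, needs_review)
```

In this sketch a high-impact prediction always goes to a reviewer, however confident the model is.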

Bias evaluation

Continuously develop processes that allow us to understand, document and monitor bias in development and production.
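A simple starting point for monitoring bias is to compare a metric across groups in the data. The sketch below computes accuracy per group and the largest gap between groups. It is illustrative only: the group labels are invented, and accuracy is just one of several metrics that could be compared.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per group from (group, actual, predicted) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, actual, predicted in records:
        totals[group] += 1
        correct[group] += int(actual == predicted)
    return {group: correct[group] / totals[group] for group in totals}


def largest_accuracy_gap(per_group):
    """Difference between the best and worst performing group."""
    return max(per_group.values()) - min(per_group.values())


# Invented records used only for illustration.
records = [
    ("urban", 1, 1),
    ("urban", 0, 0),
    ("rural", 1, 0),
    ("rural", 0, 0),
]
per_group = accuracy_by_group(records)
gap = largest_accuracy_gap(per_group)
```

Recording the per-group figures at each release, and again in production, gives a documented trail of how the gap changes over time.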

Explainability versus 'black box' techniques

Where reasonable, develop tools and processes to continuously improve transparency and explainability of machine learning systems.

Further guidance:

Reproducible operations

Develop the infrastructure to allow for a reasonable level of reproducibility of the model across different types of machine learning systems.
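One building block for reproducibility is to record, for every training run, the random seed and a fingerprint of the configuration and data used. The sketch below uses only the Python standard library. The function names and the manifest fields are assumptions made for this example.

```python
import hashlib
import json
import random


def fingerprint(obj):
    """SHA-256 of a JSON-serialisable object, independent of key order."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_run_manifest(config, training_rows, seed):
    """Record what is needed to repeat a training run (hypothetical fields)."""
    return {
        "seed": seed,
        "config_sha256": fingerprint(config),
        "data_sha256": fingerprint(training_rows),
    }


def sample_training_rows(rows, k, seed):
    """Draw a repeatable sample by using a dedicated, seeded generator."""
    rng = random.Random(seed)
    return rng.sample(rows, k)
```

Using a dedicated random.Random instance, rather than the module-level generator, keeps the sample repeatable even if other code draws random numbers in the same process.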

Displacement strategy

If the machine learning output could change the nature or the amount of work for human operators, this will be called out so that it can be determined whether business change processes can be developed to mitigate the impact of automation on workers.

Practical accuracy

Accuracy metrics should take the information lifecycle of the dataset into account.

Privacy

If personally identifiable information is used, the techniques applied should follow privacy by design principles.
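One technique that can support privacy by design is pseudonymisation: replacing a direct identifier with a keyed hash before the data reaches the modelling stage. The sketch below is an illustration under stated assumptions and is not a method prescribed by this guidance. The field names and the function names are hypothetical, and the secret key would need to be stored and managed separately from the data.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(secret_key, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def strip_direct_identifiers(record, id_field, drop_fields, secret_key):
    """Drop named fields and swap the identifier for a pseudonymous token."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key != id_field and key not in drop_fields
    }
    cleaned["subject_token"] = pseudonymise(str(record[id_field]), secret_key)
    return cleaned
```

The same identifier always maps to the same token under the same key, so records can still be linked for modelling. Pseudonymised data can still be personal data, so the other considerations in this section continue to apply.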

For example:

Data risk management

Develop and improve reasonable capabilities to ensure data and model security are considered during the development of machine learning systems.

Auditability and change control

When models need to be changed, follow change-control processes that include:

  • storage of previously used models
  • governance records (for example, the person authorising the change and the date of change)
  • model deployment documentation (for example impact assessment, release notes and roll-back analysis)
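The change-control items above could be captured in a simple register that keeps every previous version and the governance details for each change. The sketch below is one hypothetical shape for such a register. The class names, field names and example values are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ModelChange:
    """One governance record for a model change (hypothetical fields)."""
    version: str
    authorised_by: str
    changed_on: date
    release_notes: str
    previous_version: Optional[str]  # roll-back target, None for first release


class ModelRegister:
    """Append-only history of model changes; earlier entries are never removed."""

    def __init__(self):
        self._changes = []

    def record(self, version, authorised_by, changed_on, release_notes):
        previous = self._changes[-1].version if self._changes else None
        change = ModelChange(version, authorised_by, changed_on,
                             release_notes, previous)
        self._changes.append(change)
        return change

    def rollback_target(self):
        """Version to return to if the latest release has to be withdrawn."""
        return self._changes[-1].previous_version if self._changes else None

    def history(self):
        return tuple(self._changes)


# Invented entries used only for illustration.
register = ModelRegister()
register.record("1.0", "Model owner", date(2024, 1, 15), "Initial release")
register.record("1.1", "Model owner", date(2024, 3, 1), "Retrained on new data")
```

The register only holds metadata. The model files themselves would be stored alongside it, under the same version labels.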

Tuning

Continuously tune models so that their outputs stay in line with what is expected.
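Deciding when to tune depends on noticing that outputs have moved away from what is expected. A minimal sketch of one way to do this is shown below: it tracks the mean absolute error over a rolling window of recent predictions and flags the model when the error exceeds an agreed level. The class name, the window size and the tolerance are hypothetical.

```python
from collections import deque


class ErrorMonitor:
    """Flag a model for re-tuning when recent error exceeds the expected level."""

    def __init__(self, expected_mae, tolerance, window):
        self.expected_mae = expected_mae
        self.tolerance = tolerance
        self._errors = deque(maxlen=window)  # keeps only the latest errors

    def observe(self, predicted, actual):
        self._errors.append(abs(predicted - actual))

    def needs_retuning(self):
        # Wait until the window is full so one bad prediction is not decisive.
        if len(self._errors) < self._errors.maxlen:
            return False
        recent_mae = sum(self._errors) / len(self._errors)
        return recent_mae > self.expected_mae + self.tolerance
```

The monitor only raises the flag. Any resulting change to the model would still go through the change-control process.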

 
