Voices: We need more data collection – and less secrecy

It is a decade since Sir Tim Berners-Lee, my colleague and co-founder at the Open Data Institute, opened the London Olympics, saying of his invention – the World Wide Web – “This is for everyone.” In the intervening 10 years, there has been a quiet, largely unnoticed series of transformations based on the same principle.

We now take for granted a wealth of apps that tell us how to drive somewhere or travel there by public transport. We take for granted the integration of our banking data from one financial institution into the systems of another. We accept as normal the routine publication of health data – everything from global Covid-19 cases to local ambulance waiting times.

The common driver behind these advances is a new Enlightenment, of which open data is the foundation: a recognition that key information should not be monopolised and controlled by those who collect it, but placed in the public domain for wider societal and economic benefit.

Once made available, it can be harnessed by innovators, scrutinised by journalists, and presented and used in a multitude of ways. Just as new knowledge and the scientific method challenged the established authority of the Church and the monarchy three centuries ago, open data today is expanding our understanding, driving innovation, and holding institutions to account.

Whether we’re choosing between Google Maps and Citymapper as our preferred source of routeing information, or deciding between the BBC, Sky News and GB News as a source for data on Covid-19, we – the consumers and citizens – are empowered by a potent combination of public data and private enterprise.

Indeed, Transport for London’s provision of open data to more than 13,000 developers was shown by Deloitte as far back as 2017 to have boosted the city’s economy by up to £130m a year. The figure is now likely to far exceed that.

What is yet to come is the serious growth, and incursion into our everyday lives, of insights derived from machine learning. Instead of analysing data according to rules programmed by humans, these systems use algorithms and data to uncover for themselves patterns in everything from travel preferences to energy use, flu outbreaks to flood risks.

With these developments comes an enormous challenge: to encourage private data collectors to apply, and maintain, the same principle of openness that has become commonplace in the public domain. No one company should presume to have a monopoly on the data it gathers about us for its own ends. Companies may steward data, but they should not assume that they automatically own it. This may be a challenging concept for some companies to accept, but without it, the opportunities – and the benefits to people, communities and the companies themselves – will not be fully realised.

Where personal data is involved, those who collect data must also ensure they retain the confidence of the public by making certain that individual identities are never disclosed. Curiously, the key to retaining trust and high levels of individual privacy is not secrecy, but transparency about how data is aggregated, anonymised and encrypted. Only by knowing what data collectors are doing can we be confident that they are fulfilling their obligations.

The issue will come into especially sharp focus as machines play an increasing part in the diagnosis and treatment of disease. The Royal College of Surgeons of England foresees platforms that merge patient data, population data and medical knowledge in order to predict individuals’ propensity to cancers and analyse the efficacy of different treatments for different groups of patients.

Used well, these rich datasets have the potential to play a vital part in providing sustainable health and wellbeing services, by offering better care, earlier and at lower cost. Yet transparency about the data that medtech companies collect – how it is processed securely, and what conclusions are drawn from it – will be crucial.

Meanwhile, although the coming decade will see machines learn more about us, it will also see a new tide of human learning. Much as machine tools displaced handcraft in the first industrial revolution, AI will now take on the tasks of sifting, sorting and correction – everything from picking and packing in warehouses to proofreading in publishing houses.

As the labour market adjusts to reflect that reality, there are very significant opportunities for people to harness AI – previously seen as a threat – to gain new skills for themselves. Data will drive AI to understand how different adults learn, and to produce courses tailored to the individuals taking them.

Again, the data used to adapt programmes to students must be open to scrutiny and challenge, not least to ensure that AI does not reinforce existing human biases on grounds of race, gender or ethnicity.

The saying that “sunlight is the best disinfectant” is often quoted by advocates of the transformative power of open data. This last decade has seen public sector data moved into the light, not just in the UK but all around the world. The coming decade’s challenge is to ensure that as the private sector collects more data on each of us – in healthcare, in education, in transport – it also allows the sunshine to seep in.

Professor Sir Nigel Shadbolt is principal of Jesus College, Oxford. He is co-founder and chair of the Open Data Institute, which holds its tenth annual summit online from 10am on 8 November.