How do data institutions emerge? Six short case studies
This post was first published by the Open Data Institute on March 23, 2021 on its blog here.
In practice, data institutions steward data in different ways, including:
- Protecting sensitive data and granting access under restricted conditions.
- Combining or linking data from multiple sources, and providing insights and other services back to those that have contributed data.
- Creating open datasets that anyone can access, use and share to further a particular mission or cause.
- Acting as a gatekeeper for data held by other organisations.
- Developing and maintaining identifiers, standards and other infrastructure for a sector or field, such as by registering identifiers or publishing open standards.
- Enabling people to take a more active role in stewarding data about themselves and their communities.
Although the term is relatively new, many data institutions exist across the private, public and third sectors. Organisations like national mapping agencies, statistics agencies and archives are perhaps our oldest and most well-known data institutions – in some cases they’ve played these roles on behalf of the public for hundreds of years.
Data institutions come in many shapes and sizes, and each will have had a unique journey and faced different challenges along the way. Some, like Open Banking Ltd, have been brought about by government regulation and funded by industry. Others, like Salus Coop, are small data institutions still trying to build a critical mass of users and attract investment.
Here we document the genesis of a variety of data institutions, which we’ve chosen to illustrate the different paths they can take and some of the different ways in which they are funded. We hope that by telling these origin stories, we will increase understanding of how data institutions emerge, and inspire people to think about starting their own data institutions in the future.
UK Biobank
UK Biobank is a registered charity founded in 2006. It aims to improve the prevention, diagnosis and treatment of a wide range of serious and life-threatening illnesses.
To create the database that UK Biobank stewards, 500,000 people from across the UK, aged between 40 and 69, underwent testing, provided blood, urine and saliva samples for future analysis, gave detailed information about themselves, and agreed to have their health monitored.
Since 2012, researchers have been able to apply to use the UK Biobank database. Studies that use the database typically compare a sample of participants who developed a particular disease with a sample who did not, in order to measure the risks and benefits of certain interactions with genes, lifestyles and medications.
UK Biobank is funded primarily by the Medical Research Council and the Wellcome Trust, with a total funding amount of £244.3m over its lifetime. It supplements this long-term investment with project funding to deliver specific enhancements to the database and uses a cost-recovery model to charge users for access to the data, with reduced fees for student projects and those from low- and middle-income countries (LMICs).
UK Biobank’s evolution demonstrates the value in providing significant, long-term investment from government and philanthropy. According to UK Biobank, over 670 international research groups have accessed health records during the pandemic alone, leading to more than 60 scientific papers being published in the public domain.
HiLo Maritime Risk Management
HiLo is a startup that supports the sharing of safety and accident data in the maritime sector.
Prior to the formation of HiLo, shipping companies were only able to draw on their own operational data to try to reduce the frequency of accidents on their ships. By pooling data from across these companies and analysing the aggregated dataset, HiLo is able to provide meaningful and actionable benchmarks, and other insights, to its member companies. The more organisations that share data, the better and more reliable the insights they receive.
Around 55 member companies have committed to sharing data. Contributors share data directly with HiLo once a month, either manually via the HiLo portal or automatically via an API. The data is then processed and analysed using HiLo’s risk model algorithm. Financially, HiLo receives subscription fees for its insight service, but hopes to balance its revenue mix by combining subscriptions with income from new products and services generated from the data it brings together.
HiLo has helped to save lives and money – so far, it has reduced lifeboat accidents by 72%, engine room fires by 65% and bunker spills by 25%. It shows how an important, impactful data institution can be built from small beginnings as an industry project.
Open Banking Ltd.
In 2016, the UK’s Competition and Markets Authority (CMA) established Open Banking Ltd to deliver on its vision of helping people save, borrow, lend and invest money securely while improving efficiency, increasing competition, and stimulating innovation within the sector.
Open Banking Ltd is a private company, governed by the CMA and funded by the UK’s nine largest banks and building societies. It is responsible for developing and maintaining the data infrastructure – including technical standards and guidelines – that enables data to flow between the banks and other organisations. It makes it easy and safe for individuals and small and medium-sized enterprises (SMEs) to share the financial information held by their banks with third-party services. As well as funding Open Banking Ltd, the banks are also mandated to comply with its guidelines.
The data infrastructure maintained by Open Banking Ltd has supported the industry to build useful new applications for customers. One example is the services offered by Account Information Service Providers (AISPs), such as an app from fintech startup Bud. This app was designed to allow customers (who opt in via their bank’s platform) to see a dashboard showing the status of multiple accounts from different bank providers, and manage their finances more efficiently. Other services using Open Banking data help small businesses access loans at better rates, or recommend accounts based on spending patterns.
The UK Office for National Statistics
The Office for National Statistics (ONS) is the UK’s largest producer of official statistics, responsible for collecting and publishing data related to the economy, population and society.
The ONS was formed in 1996 following the merger of the Central Statistical Office (CSO) and the Office of Population Censuses and Surveys (OPCS). It functions as the executive office of the UK Statistics Authority, which reports directly to Parliament following the Statistics and Registration Service Act 2007.
Examples of the data collected and used by the ONS include information from the decennial population census (a census taken every 10 years), data from businesses, government departments and public sector bodies such as the NHS, and the registers of births, marriages and deaths. This data is often central to debates about allocation of national resources and economic decisions. It is used to measure changes in society such as migration, employment or illness rates and longevity. ONS data is also used in epidemiological studies: for example, the ONS has published extensively about the coronavirus pandemic, tracking everything from infection and death rates to social and economic impacts.
The ONS produces and publishes a wide range of data. It publishes its own statistics and reports, and also makes data available to other users via the Secure Research Service (SRS), which gives accredited researchers working on research projects for the public good access to anonymised, unpublished data. The ONS also has a Data Science Campus, which oversees a series of data projects that provide insight into key policy themes, and a joint team with the Foreign, Commonwealth and Development Office, which aims to use data for global public good.
The ONS example highlights the fact that some of our most important data institutions are publicly owned and governed. In thinking about the need for new data institutions we should not forget the role of the state as a steward of data on behalf of others.
OpenStreetMap
Launched in 2004, OpenStreetMap is a collaborative project that aims to create a free, editable map of the world.
OpenStreetMap was started in the UK and now has over two million registered volunteers worldwide, with over 5,000 users a day, making it the world’s largest crowdsourced open database. These independent volunteers collect geospatial data from scratch, performing ground surveys with tools such as handheld GPS devices, cameras, and notebooks. The map and database they collectively build is made freely available under an open licence.
As the project has grown, commercial and government organisations – including Yahoo! and the Federal Government of the United States – have made the data they hold available for manual editing and automated imports via OpenStreetMap. OpenStreetMap data is widely used, including by high-profile organisations such as Facebook, Apple, Microsoft, Amazon and Uber, both as a map and as a data source for visualisation, research and analysis.
The project is supported by the OpenStreetMap Foundation, a not-for-profit company registered in England and Wales. The foundation has members from around the world and an elected board of directors, and is both the legal entity for OpenStreetMap and the custodian for the computer services and servers which host it. It also provides a vehicle for fundraising and donations to support the project, organises an annual conference, and supports various working groups looking after communications, licensing and other functions.
OpenStreetMap is a fantastic example of how a community can come together to create a data institution, and how a data institution can grow internationally, by collaboratively sharing and maintaining data.
Salus Coop
Salus Coop is a non-profit citizen data cooperative for health data, founded in Barcelona in September 2017. It set out to create a citizen-driven model for the collaborative governance and management of health data ‘to legitimise citizens’ rights to control their own health records, while facilitating data sharing to accelerate research innovation in healthcare’.
Salus Coop has developed a ‘common good data licence for health research’, which applies to the data that members donate, and specifies the conditions that any research project seeking to use the member data must adhere to. These include only using data for research on a non-commercial basis, making all results accessible at no cost, anonymising data before use, and giving members the right to cancel or change the conditions of access to the data about them at any time.
Salus Coop has been supported by the Mobile World Capital Barcelona Foundation and Ideas for Change, and also invites health authorities to collaborate with its efforts to support people to participate by sharing data about themselves. In 2020 it started the CO3 (Cooperative COVID Cohort) project to convene a group of citizen data donors and create a new resource to use for research into Covid-19.
Salus Coop is an example of a new breed of data institution emerging – sometimes referred to as ‘data cooperatives’, ‘data unions’ and ‘bottom-up data trusts’ – that enable people to take a more active role in stewarding data about themselves and their communities. Many of these efforts are still trying to reach a critical mass of users and become sustainable.
- Alex Vryzakis is ODI Associate-Communication while Jack Hardinges is the Programme Lead for Data Institutions at ODI. Find out more about the Open Data Institute’s ongoing work on data institutions here.