Transformation Directorate

Mauro Data Mapper

Owner

Mauro Data Mapper is owned by Oxford University, NHS Digital and The Meta Foundation. It is licensed under the Apache License 2.0, which permits free modification, distribution, patent use, private use and commercial use of the software.

We built the project and then open sourced it. Our partners were firmly insisting we shouldn’t make it open source unless we had a good reason to. Once we found the reasons and decided that actually adoption was more important than anything else, we made it open source.

James Welch

Background

Different clinical trials often ask similar questions in different ways. The question ‘How many cigarettes a day do you smoke?’ is fundamentally similar to ‘How many packets per month do you smoke?’, but the data they collect cannot be quickly or easily combined. This kind of inconsistent duplication wastes time and effort, and misses opportunities for data re-use, meta-analysis and scientific study.

Situation

Mapping the data an organisation holds, and documenting it more thoroughly, means that fewer requests for information need to be made of the public. Consistency in both the collection and the documentation of information is an important part of a more collaborative, interoperable NHS.

Aspiration

  • Support data management and re-use across the NHS.
  • Promote transparency, increase reproducibility, and drive improvements in data quality and patient care.

Solution and impact

The Mauro Data Mapper started as a tool for modelling clinical trials: helping researchers use (and re-use) the best questions for new studies. It has evolved into a collaborative web-based tool for documenting and publicising research datasets of any kind, capable of dealing with data dictionaries, assets, flows, requests and standards. Over time this development has allowed the team to document increasingly large datasets, including the 100,000 Genomes Project.

James adds,

What Mauro does is to document data, whether that is data you already have or data that you would like to collect or transfer. It’s not about the storage of data, it’s about documenting.

Mauro provides maps of data assets and allows users to describe their intended usage and interpretation, working with existing terminology servers rather than replacing them. It is primarily intended for use by data managers within the NHS, but can be used by anyone looking to understand and make use of NHS data.

A community of users and contributors has since been established, accelerating the development of Mauro. Being open source makes it much easier for people to share the tool, contribute to its development and adapt it to meet their data design and delivery needs. “Working with NHS Digital, we’ve been expanding our community, and other people have been making substantial contributions – this simply wouldn’t be happening if we had kept the source closed,” said James. “Seven or eight organisations have installed a copy or are contributing code. We’ve got a lot more people who are using instances that other people are hosting. NHS Digital, Edinburgh and Swansea have been contributing code. We’ve got about 90 people signed into that at the moment,” he added.

Mauro’s community now carries out a growing share of the tool’s ongoing maintenance and feature development. Contributions are currently overseen and managed through a gatekeeper process, as part of the project’s governance, to ensure that the ‘conceptual integrity’ of the core components is maintained and coordinated. James says, “A core team in Oxford of about five or six developers takes responsibility for, and has ownership of, the repositories, reviewing all the code as it comes in. We moderate bug and feature requests and that kind of thing.”

Functionality

Mauro enables the definition and application of data standards, data models, specifications and codesets based upon terminologies and ontologies. It is currently used for the management and documentation of data assets in a range of organisations, including the NHS national data dictionary. Mauro:

  • is an open source tool that can be run within an organisation or in the cloud
  • allows people to record information about a dataset’s provenance, utility and relation to data standards and terminologies
  • can connect to a wide variety of database and data modelling systems, including Microsoft SQL Server and Oracle
  • creates and shares models for exchanging data between different healthcare and health research organisations
  • is updated regularly, with quarterly releases that are quality assured, tested and validated with the community

Capabilities

  • Provides an open platform for building, adapting and maintaining data management tools for data sets, standards, and questionnaires.
  • Helps users document data assets using a combination of collaborative annotation, automated description, and import of terminologies.
  • Provides a variety of ways for users and developers to extend the functionality to meet their usage and interoperability needs.

Scope

  • Promotes re-use of data models across health and social care
  • Promotes the definition of new “ground-up” data standards and specifications
  • Integrates with Trusted Research Environments to federate documentation and data between organisations
  • Uses standard libraries and frameworks so almost any software supplier or consultancy can support it

Key learning points

  • Interoperability, scalability, and reproducibility are complex and challenging problems.
  • Mauro’s broad capability means it can be a challenge to work out where it should be used for specific projects, but its open nature also offers opportunities for new purposes.
  • There is a perception that open source tools come with no means of service or support. Even where this is clearly not the case, that perception can affect investment and procurement decisions.
  • Even with a strong community, project sustainability remains a concern. Ongoing support will be needed to ensure that the software lasts, continues to be developed and remains supported in the future.

Digital equalities

  • Ensuring that data is properly described and linked makes it more likely that under-represented groups will be properly counted in clinical trials and broader research.


Page last updated: September 2022