WP6 Future data ecosystems
The flow of data associated with a research expedition, from planning through capture, processing, storage and use, is well described but remains mostly manual. This places a significant overhead on both the science party and NMF in managing the data, and increases the chance of data being lost, either through poorly linked metadata or through a failure to follow data logging protocols. An end-to-end approach to data associated with research expeditions and/or MAS platform missions is required. This is achievable by 2030 with data management processes that incorporate quality and metadata controls and enable the transfer of data to data portals that allow access across a broad community of users.
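As an illustration of what an automated metadata control at the point of capture might look like, the following is a minimal sketch only. The required field names and the record layout are hypothetical assumptions made for this example; a real schema would be defined by the relevant data centre or community standard.

```python
# Hypothetical required metadata fields (assumption for illustration only).
REQUIRED_FIELDS = ("expedition_id", "platform", "instrument", "start_time_utc", "units")


def missing_metadata(record: dict) -> list[str]:
    """Return the required fields that are absent, None or blank in a record."""
    gaps = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            gaps.append(name)
    return gaps


def partition_records(records: list[dict]):
    """Split records into those ready for transfer and those needing attention.

    Returns (accepted, rejected), where each rejected entry is a
    (record, missing_fields) pair so the gap can be fixed at source.
    """
    accepted, rejected = [], []
    for record in records:
        gaps = missing_metadata(record)
        if gaps:
            rejected.append((record, gaps))
        else:
            accepted.append(record)
    return accepted, rejected
```

Running a check of this kind when data is logged, rather than after the expedition, is one way to catch poorly linked metadata while the science party can still correct it.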
Given the costs associated with the collection of oceanographic data, improving access through adherence to the FAIR principles must be a priority. As the volume of data collected increases, the systems used to support FAIR data must be scalable. Alongside the increase in volume, the percentage of data that is available in real-time or near real-time will increase and so the systems must be configured to support ‘live’ data streams.
ML and AI will support better use of data. However, for that to be realised, improved data management and data workflow processes will be required.
The National Digital Twin Programme (NDTp) skills and competency framework outlines the critical roles needed at an organisational level to support the integration of data into a digital environment. A number of skill sets are not presently well represented across UKRI/NERC, yet will be increasingly in demand as the digital dependency of research infrastructure increases.
Research expedition and mission planning would benefit from the use of data science approaches. In due course, the integration of modelling, data collection, data science and informatics could enable ‘digital twins’ of the research domain. These digital twins, receiving real-time observations assimilated into models and providing input to machine learning algorithms, will help the research expedition to revise plans and react to observations as they arrive.
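To make the idea of assimilating an observation into a model concrete, the sketch below shows the simplest scalar form of an analysis update, in which a model background value is nudged towards an observation in proportion to their relative uncertainties. It is an illustrative assumption only: the function name and inputs are hypothetical, and operational systems work on full model state vectors with far more sophisticated schemes.

```python
def assimilate(background: float, background_var: float,
               observation: float, observation_var: float) -> tuple[float, float]:
    """Blend one model background value with one observation.

    Returns the analysis value and its error variance. The gain weights
    the observation more heavily when the model is the less certain of
    the two, and vice versa.
    """
    if background_var < 0 or observation_var < 0:
        raise ValueError("variances must be non-negative")
    total_var = background_var + observation_var
    if total_var == 0:
        raise ValueError("at least one variance must be positive")
    gain = background_var / total_var
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var
```

For example, a modelled temperature of 10.0 and an observed temperature of 12.0 with equal error variances give an analysis midway between the two, with half the original variance.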
Communication infrastructure, including low-earth orbit satellites, will continue to improve rapidly and provide options for continuous, high-bandwidth communication systems. Any data ecosystem will be reliant upon the hardware and therefore vulnerable to technology failure. Resilience is possible in most areas of the ecosystem; however, cyber-attacks will present a continuous threat, and processes/systems able to detect, protect against, respond to and recover from an attack will be essential.
As the marine sector adapts to increasing autonomy, enabled by increased use of smart, connected sensors, there are opportunities to engage with other marine users to share data and information more effectively. This will require an international effort to ensure data standards are the same or compatible but opens up huge opportunities for data collection.
The use of autonomy in collecting data for UK geo-intelligence remains a high priority for the Ministry of Defence. Its recently published Digital Strategy places data use and exploitation at the centre of its thinking. There are overlaps between the data ecosystem the Royal Navy will require and the data ecosystem able to support a net zero oceanographic capability. Partnering in development areas such as command and control of autonomous assets would be beneficial.
Microsoft and IBM, amongst others, are developing technology infrastructure that is moving towards net zero, with ambitions to be net zero by 2030, and that could support any data ecosystem UKRI/NERC might require. Engaging with technology companies that have strong net zero ambitions and well-developed data ecosystem approaches may provide opportunities to develop a future NZOC.
The NZOC data ecosystem will need to be resilient to unauthorised access and to harmful intent by authorised users, not only to mitigate against cyber-attacks but also to ensure that access to data is appropriately managed. Where possible, data will be freely available, but General Data Protection Regulation (GDPR), Intellectual Property Rights (IPR) and security concerns will all need to be addressed through appropriate data access management and licensing.
Develop a ‘data skills’ strategy that details how UKRI/NERC will support a plan for training the future generation of scientists and engineers/operators capable of developing, operating and using a digitally-enabled, net zero infrastructure. Promote and invest in Research Software Engineer careers across UKRI/NERC.
Ensure that all future activities that support the NZOC data ecosystem concept align with national and international best practices and follow the guidelines laid down in the National Digital Twin Programme (NDTp) framework. This will safeguard against the risk of incompatibility at a later stage.
Further develop the data flow architecture that allows data to flow from planning (MFP website) through capture, processing, storage and use with the aim of delivering FAIR data to multiple users across science, government, defence and business. Setting out the strategy for achieving this and prioritising development opportunities should be completed within 12 months.
Scope a scalable data lake architecture capable of managing data from the widest spectrum of platforms (satellites, research vessels, MAS platforms, floats, moorings etc) and commission a pilot project to develop expertise in managing the data architecture across cloud, ship and shore-based infrastructure and test the concepts.
Develop a modelling capability that can dynamically assimilate observations in a moving frame (ship-following) at a resolution relevant for observation collection decision making, using modelling tools already well established within the community, e.g. NEMO, ERSEM, NEMOVAR. Use this as a precursor to developing a digital twin of the expedition region to support that specific research expedition.
Incentivise, across UKRI (including InnovateUK), collaborations on the development of novel autonomous planning and optimisation (command and control) systems to maximise the usage of MAS platforms.
Align with the ‘Greening Government: ICT and Digital Services Strategy 2020-25’ as the basis for reducing the carbon footprint of an NZOC data ecosystem.