Artificial intelligence and addressing health data policy challenges with federated learning

Intel_Policy · 12-20-2021

Mario Romao, Global Director of Health Policy & Prashant Shah, Global Head of Artificial Intelligence for Health and Life Sciences

Artificial Intelligence (AI) remains at the top of policymakers’ agendas. As the European Union discusses the AI Act, a legislative proposal to regulate AI, considerations on data governance go hand in hand with it. The European Health Data Space is an initiative spearheaded by the European Commission in collaboration with the Member States, aimed at promoting better exchange of and access to different types of health data (electronic health records, genomics data, data from patient registries, etc.), not only to support healthcare delivery (so-called primary use of data) but also for health research and health policymaking purposes (so-called secondary use of data). Data governance, rules for data exchange, data quality, infrastructure and interoperability are the main pillars of the European Health Data Space.

It was in this policy setting that we had the opportunity to participate in the Intel-organized “all.ai Virtual Summit 2021” and in the “Health Data Summit Riga 2021: Digital Leap For Improved Patient Care”. The “all.ai Virtual Summit” focused on building awareness, adoption, and acceleration of AI across diverse applications, including healthcare. The “Health Data Summit” addressed the importance of technology and access to quality data in reshaping the relationship between patients, healthcare providers and the healthcare system. Both events addressed the synergies between AI and health data.

As AI solutions become increasingly prominent, especially in the healthcare sector, the ability to effectively access, share and protect data is fundamental to the successful implementation of AI. Deep learning, a specific AI technique, shows promise in aiding medical diagnosis and treatment but requires very large amounts of diverse data to be trained and to be broadly effective. The current paradigm for multi-institutional collaborations in the medical domain requires the collaborating institutions to send patient data to a centralized location for model training.

We refer to this centralized approach as collaborative data sharing (CDS); distinct repositories exist for various medical fields, e.g., radiology, pathology, and genomics. However, CDS does not scale well to large numbers of collaborators, especially in international configurations, due to privacy, technical, and data ownership concerns. Consequently, knowledge from diverse populations remains distributed across multiple institutions. Limited availability of training data can cause AI models to be biased, which can cause harm. Models can also suffer from accuracy issues when deployed in the real world, especially in settings where the data being analyzed differs from the data used to train the model. This creates a need for alternative approaches.

Federated learning is a novel paradigm for data-private collaborative learning where multiple collaborators train a machine learning model at the same time (i.e., each on their own data, in parallel) and then send their model updates to a central server to be aggregated into a consensus model. The aggregation server then sends the consensus model to all collaborating institutions for use and/or further training.
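To make the round structure concrete, here is a minimal sketch of one common aggregation scheme, federated averaging, written in plain Python. It is an illustration under stated assumptions, not the method used by any particular initiative: the one-feature linear model, the helper names (`local_update`, `aggregate`, `federated_training`, `make_site`), the synthetic data, and the choice to weight each collaborator by its dataset size are all hypothetical. The point it shows is that only model parameters travel to the server, while each site's data stays local.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """One collaborator trains locally (gradient descent on y = w*x + b).
    Only the resulting parameters leave the site, never the data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def aggregate(updates, sizes):
    """Server side: average the parameters, weighted by local dataset size
    (an assumption; other weighting schemes are possible)."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def federated_training(sites, rounds=50):
    """Repeat: broadcast consensus model, train locally, aggregate."""
    consensus = (0.0, 0.0)
    sizes = [len(data) for data in sites]
    for _ in range(rounds):
        updates = [local_update(consensus, data) for data in sites]
        consensus = aggregate(updates, sizes)
    return consensus

def make_site(n, lo, hi, rng):
    """Synthetic stand-in for one institution's private data (y = 3x + 1)."""
    data = []
    for _ in range(n):
        x = rng.uniform(lo, hi)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0, 0.01)))
    return data

rng = random.Random(0)
# Three hypothetical institutions with different sizes and input ranges.
sites = [
    make_site(40, 0.0, 1.0, rng),
    make_site(60, 0.5, 1.5, rng),
    make_site(20, -1.0, 0.0, rng),
]
w, b = federated_training(sites)
print(f"consensus model: w={w:.3f}, b={b:.3f}")
```

In this toy setup each site sees a different slice of the input range, loosely mirroring institutions with different patient populations, yet the consensus model should land close to the underlying relationship used to generate the data. Real deployments add pieces this sketch omits, such as secure communication and handling of collaborators that drop out.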

A great example of federated learning at work is the Federated Tumor Segmentation (FeTS) initiative. It is the largest international federation of healthcare institutions, and it aims to gain knowledge for tumor boundary detection across diverse patient populations without sharing any patient data.

The future European Health Data Space and similar health data initiatives around the world should take note of federated learning. As frameworks for data governance are being debated, federated learning has shown that it can achieve the full learning capacity of the data while obviating the need to share patient data, thereby facilitating large-scale multi-institutional collaborations. As a privacy-preserving technology, federated learning can encourage greater collaboration and drive greater trust in AI and data re-use.