Medical data anonymization with intelligent AI for the improvement of healthcare

Authors: Abhishek Khowala, Principal AI Engineer at Intel Corporation, and Agata Chudzińska, Artificial Intelligence Manager at


Clinical trials, medical research, and patient care have one thing in common: data! Medical data is critical: it informs treatment optimization and potential cures, and it makes medical care possible even in distant places. From vast amounts of data, insights can be extracted to better detect, understand, and combat diseases. However, health data is highly regulated and requires practices that safeguard privacy, accuracy, and provenance, which can be extremely difficult to manage within a hospital ecosystem. Data anonymization can address these difficulties and keep data available. Anonymization removes Personally Identifiable Information (PII) from the data and aims to make it impossible to retrieve the patient's identity from it. This creates the conditions under which data can leave the hospital premises and be used for research, development, and analytics, including training AI models.

There are several sources of data in the healthcare industry, including hospital records, patient medical records, lab results, and devices that are part of the Internet of Things (IoT). There is an industry-wide effort to add genomic, behavioral, and social determinants of health (SDOH) data to build comprehensive patient records. Efforts are also under way to aggregate lifelong patient data dispersed across geographic regions and medical record databases into a longitudinal patient record. Research likewise generates a significant amount of Big Data relevant to public health. All of this data must be properly managed and analyzed to provide meaningful information.

Most of the data used and collected in the healthcare sector is personal data, also known as personally identifiable information (PII) or protected health information (PHI). The General Data Protection Regulation (GDPR) in the European Union defines personal data as information relating to an identified or identifiable natural person, and the Health Insurance Portability and Accountability Act (HIPAA) in the US protects individually identifiable health information. To protect these individuals, limits and restrictions are generally imposed on data processing. Strict rules and specifications apply in the healthcare sector in particular, because this data is especially sensitive and the regulations therefore classify it as worthy of protection. These rules primarily serve to keep the collected data secure. Improper handling of this personal data can lead to loss of reputation, legal ramifications, and high fines.

How does anonymization of data help?

To share health data for medical research, all personal information should be removed and patient consent obtained. Ideally, anonymous data is data for which a subject can no longer be "identified." What does no longer being identified mean? A few things. First, it could simply mean blurring images or videos to protect a person's identity. A step further is the removal of all names, dates of birth, addresses, and other information that could identify the person associated with a record. Doing this for each record takes a great deal of effort. The data anonymization process, which can be carried out in a variety of ways ranging from software to manual processes, removes, generalizes, or encrypts these identifiable markers.
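As a rough illustration of removing, generalizing, and encrypting identifiers in a structured record, here is a minimal Python sketch. The field names (name, address, phone, date_of_birth, patient_id), the function anonymize_record, and the demo key are all hypothetical and chosen only for this example. The sketch also swaps encryption for a keyed hash (HMAC-SHA256) to keep it short; this is pseudonymization rather than full anonymization, since anyone holding the key can re-link records.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def anonymize_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of record with identifiers removed, generalized, or pseudonymized."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            # Removal: direct identifiers are dropped entirely.
            continue
        if field == "date_of_birth":
            # Generalization: keep only the year of birth.
            out["birth_year"] = value.year
        elif field == "patient_id":
            # Keyed hash as a stand-in for encryption (an assumption of this sketch).
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            out["patient_pseudonym"] = digest.hexdigest()[:16]
        else:
            # Everything else (e.g. clinical fields) is kept as-is.
            out[field] = value
    return out


record = {
    "patient_id": 48213,
    "name": "Jane Doe",
    "address": "1 Example Street",
    "phone": "555-123-4567",
    "date_of_birth": date(2014, 6, 3),
    "diagnosis": "asthma",
}

clean = anonymize_record(record, b"demo-key-not-for-production")
print(clean)
```

In a real deployment, which fields count as identifiers and how coarse the generalization must be depend on the applicable regulation and on the dataset, so the lists above should be read as placeholders.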

Data anonymization alleviates concerns around data privacy, consent requirements, and data regulations. With these risks mitigated, hospitals and clinics are more comfortable sharing data for health AI or analytics projects.



Artificial Intelligence to support the data anonymization process

Unstructured medical datasets, such as medical documentation, images, videos, and medical research records, make it even more difficult to track personal information than structured data does, because traditional methods such as obfuscating columns in a table do not apply to them. Artificial intelligence helps computers "understand" unstructured data made up of human language, images, and videos by applying Natural Language Processing (NLP) and Computer Vision (CV) techniques. This makes it possible to detect and redact personal information automatically in order to anonymize the data.
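To make the detect-and-redact idea concrete for free text, here is a deliberately simplified Python sketch. It uses regular expressions as a stand-in for an NLP model, so it only catches rigidly formatted items (ISO dates, one phone format, email addresses). The pattern set, the labels, and the function name redact are assumptions made for this example.

```python
import re

# Illustrative patterns only; real clinical text needs far broader coverage.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen on 2023-04-17. Call 555-123-4567 or email jane.doe@example.org."
print(redact(note))
# Seen on [DATE]. Call [PHONE] or email [EMAIL].
```

Patterns like these cannot find names, free-form addresses, or identifying context in a sentence. That gap is where a trained NLP model, for example one that performs named-entity recognition, would take over from the regular expressions.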

Working with multiple healthcare institutions, we have come to the conclusion that concerns about patient and medical staff data privacy are the most frequent reason for long validation processes, and sometimes the reason an innovative project that could transform healthcare never starts at all.

Here is a case study on the use of AI for data anonymization: a large pediatric academic hospital in the southeastern United States used TheBlue.AI machine learning algorithms, accelerated by the Intel Distribution of OpenVINO toolkit, to anonymize patient data so that videos and images could be processed without identifying individual members of staff or patients.
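For readers wondering what obscuring a person in an image can look like at the pixel level, here is a small Python sketch that pixelates a rectangular region of a grayscale image stored as a list of rows. It is not the method used in the case study. The function name pixelate_region and its parameters are hypothetical, and the region coordinates are assumed to come from a separate detector (for example, a face detection model) that is not shown.

```python
def pixelate_region(image, top, left, height, width, block=2):
    """Return a copy of a grayscale image with one rectangular region pixelated.

    image is a list of rows of integer pixel values. Each block x block tile
    inside the region is replaced by the (integer) mean of its pixels.
    """
    out = [row[:] for row in image]  # copy so the input image is not modified
    for by in range(top, top + height, block):
        for bx in range(left, left + width, block):
            # Clip each tile to the region boundary.
            ys = range(by, min(by + block, top + height))
            xs = range(bx, min(bx + block, left + width))
            values = [image[y][x] for y in ys for x in xs]
            mean = sum(values) // len(values)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]

# Pixelate the 2x2 region in the top-left corner; the rest stays untouched.
blurred = pixelate_region(image, top=0, left=0, height=2, width=2)
print(blurred)
```

A larger block value discards more detail. For video, the same operation would be repeated on the detected regions of every frame.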

Advantages of anonymizing data in healthcare

  • Improvement of collaboration
  • Minimizing the risk of data violations and fines
  • Increased trust through proper handling of patient data
  • Easy and fast exchange of data between medical institutions and more
  • Possibility to satisfy patients' requests while complying with privacy regulations
  • Saving time and money


Analyzing and rapidly sharing health data can lead to faster medical decisions, improved quality of care, disease prevention, and cost reduction, and can drive innovative healthcare solutions. Anonymizing or removing personal information that can identify the patient is the first important step toward complying with regulations and addressing privacy concerns, and thus leads to better outcomes. It is an essential tool for the healthcare sector.

To learn more about anonymization, check out our latest case study.

Learn more about Blue.AI here.