Democratizing Generative AI for Medicine

Jack_Erickson · 07-17-2023

Author: Wei Li

I have been given several opportunities to speak to leading institutions looking to leverage AI to drive breakthroughs in their field. These are deep domain experts, not AI experts. They view AI as a compute engine to leverage to help them more productively solve their problems. This is why my passion is democratizing AI and enabling an “AI Everywhere” future – we at Intel as the AI and compute experts need to reduce adoption barriers.

One of my most recent speaking opportunities was with Stanford Medical School, about using generative AI for medicine. Generative AI of course has been a popular topic, from image generation with OpenAI* DALL-E and Stable Diffusion, to text generation with OpenAI ChatGPT* and LLaMA, to programming with GitHub* Copilot* and StarCoder. But how might we use any of these for medicine?

Generative AI for Patient-Doctor Triage

If you have experienced symptoms recently and tried to schedule an appointment with your doctor, chances are it was difficult to get on their schedule. If you tried to enter those symptoms into a static web site, you were probably presented with a list of potential diseases ranging from mundane to terrifying. Wouldn’t it be nice to be able to chat on demand with a doctor who can help narrow things down and triage your symptoms?

Here's an example of a short transcript of a real patient-doctor interaction:

Patient: Hi doctor, I am getting a one-sided headache. I do not get them often. They are behind my right eye and temple area. Now this time I am also fighting with heavy cold and cough. It came on late morning today and it hurts more when I cough or shake my head or bend over. Prior to this, I had the same headache about four months ago. At that time it came on midafternoon and lasted till next morning. Prior to that, I had the same headache exactly on the right side and lasted for a day. I am on CCB and Protonix. I am scared it might be a brain tumor. Please help.

Doctor: Hi. Based on what you have told me, this is more likely to be due to sinusitis. Sinuses are small air-filled cavities in your skull bones. They can get infected and when they are infected, you will get headaches behind your eyes. Such headaches will be more common during a cold and cough. A brain tumor headache will be on both sides, more in the forehead, and you will also have vomiting in that case. Based on what you have told me, this is more likely due to a simple sinus infection instead of a brain tumor. To be safe, we will perform CT scans of your brain and paranasal sinuses.

What happens if you enter the same patient query into ChatGPT? Here is an example response:

It sounds very similar to a static symptom search web site. This is not surprising given that ChatGPT is a general-purpose foundation model trained on such data.

A medical institution might have access to an archive of transcripts of their actual practitioner conversations that would be useful in training a chatbot with their domain-specific knowledge. But they probably don’t have the expertise or financial resources in their budget to train the model from scratch.

Fine-tuning a foundation model with their data or using knowledge distillation with their data would help alleviate the compute costs, but would still require the expertise. ChatDoctor is an example of fine-tuning a LLaMA foundation model using physician-patient conversation transcripts. This generates more specific and helpful responses, but still requires a large effort in training. And it also has the drawback of being a static model – what happens when a new virus or a variant with different symptoms emerges?

Traditional AI models store all their knowledge as parameters in the model, which is why they are so large and training-intensive. Retrieval-based approaches, on the other hand, pair a fast retriever with a pre-trained foundation model, so the system can draw specific information from your knowledge base without any additional training.

Efficient, Privacy Preserving and Factual Generative AI

Intel Labs has developed fastRAG as an extension to the open-source Haystack project – a framework for efficient generative AI. It starts with an efficient retriever model that retrieves up-to-date information from an external knowledge base – for instance a database of doctor-patient interaction transcripts. It then uses an efficient generative model that generates the answer based on the retrieved documents. So instead of a large generative model with hundreds of billions of parameters to store all the knowledge, the generative model only needs to be able to answer questions conversationally in a given language, using retrieved knowledge.
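To make the retrieve-then-generate flow concrete, here is a minimal Python sketch using only the standard library. It does not use the fastRAG or Haystack APIs. The knowledge base snippets, the bag-of-words retriever, and the function names (retrieve, build_prompt, answer) are all assumptions made for illustration, and the generator is passed in as a plain callable standing in for a small generative model.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base. In practice this would be an institution's
# own transcript archive, not these made-up snippets.
KNOWLEDGE_BASE = [
    "One-sided headache behind the eye during a cold is often due to sinusitis.",
    "Persistent cough lasting more than three weeks should be evaluated in person.",
    "Knee pain after running usually improves with rest and ice.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=1):
    """Step 1: rank documents by similarity to the query, keep the top_k."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages):
    """Step 2: put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )


def answer(query, documents, generator, top_k=1):
    """Step 3: hand the grounded prompt to a generator (any callable)."""
    passages = retrieve(query, documents, top_k=top_k)
    return generator(build_prompt(query, passages))


if __name__ == "__main__":
    # Stand-in generator that just returns the prompt it was given.
    print(answer("I have a headache behind my right eye and a cold",
                 KNOWLEDGE_BASE, generator=lambda prompt: prompt))
```

The point of the sketch is the division of labor: the knowledge lives in the document store, and the generator only has to phrase an answer from whatever the retriever hands it. A real system would swap the word-count similarity for a learned retriever and the lambda for an actual language model.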

Below is the architecture of fastRAG, compared to a static large language model (LLM).

This approach offers a few advantages for a medical institution:

No investment or expertise to train an LLM from scratch.
Ongoing access to the latest knowledge without re-training.
Significantly smaller models, with an architecture that can be deconstructed into scalable, CPU-friendly components. This lowers the compute requirements for deployment.

And if an institution can collaborate with others to pool their knowledge databases, it scales the knowledge base across all members who use it. This can also be extended to locales where professional expertise is scarce.

Now here’s the response to the above headache inquiry, generated by fastRAG:

Hi, based on your symptoms, it is unlikely that you have a brain tumor. Your headache is most likely a tension headache or a migraine. The cough and cold symptoms may be related or a separate issue. It would be best to take painkillers and anti-inflammatory medication for the headache and cough syrup for the cough. You may also want to consider getting an X-ray of your sinuses to rule out any sinus infections. If your symptoms persist or worsen, consult your doctor for further evaluation.

Because this application draws on an institution’s actual interaction transcripts, it can generate more specific answers, with links to the supporting documents for fact checking and with fewer generic disclaimers. The knowledge base can also be kept private to the institution and updated with the latest information more easily than retraining the AI model. You could even limit interactions to just the physician and the AI assistant, as a way to supplement and pool physicians’ knowledge. You would still need to follow responsible AI best practices to ensure the responses are not misleading, biased, or otherwise problematic. But this is an area that can be more easily addressed by domain experts.
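As a rough illustration of two of these points, updating the knowledge base without retraining and pointing back to supporting documents, here is a small Python sketch. The TranscriptIndex class, its keyword-overlap scoring, and the source IDs such as "t-001" are hypothetical and chosen only for illustration. They are not part of fastRAG or Haystack.

```python
import re


def _words(text):
    """Set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


class TranscriptIndex:
    """Toy keyword index; every entry keeps a source ID for fact checking."""

    def __init__(self):
        self._docs = {}  # source_id -> transcript text

    def add(self, source_id, text):
        """Add or replace a transcript. No model is retrained here."""
        self._docs[source_id] = text

    def search(self, query, top_k=2):
        """Return the source IDs of the best-matching transcripts."""
        q = _words(query)
        scored = []
        for source_id, text in self._docs.items():
            overlap = len(q & _words(text))
            if overlap:
                scored.append((overlap, source_id))
        # Highest overlap first; ties broken by source ID for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [source_id for _, source_id in scored[:top_k]]


if __name__ == "__main__":
    index = TranscriptIndex()
    index.add("t-001", "Headache behind the eye with a cold suggests sinusitis.")
    print(index.search("loss of smell"))   # nothing relevant indexed yet

    # A new transcript arrives; it is searchable as soon as it is added.
    index.add("t-002", "A new variant can present with loss of smell.")
    print(index.search("loss of smell"))
```

Returning source IDs rather than bare text is what lets an answer link back to the transcript it was based on, so a physician can check the supporting document directly.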

How to Get Started

Running on ubiquitous hardware and using open software are the keys to unlocking the availability of and access to generative AI technologies. In addition to a heterogeneous portfolio of AI-optimized hardware, Intel offers AI Reference Kits that provide application-specific models, scripts, example datasets, and documentation, designed for domain experts to bootstrap efforts to add AI to their applications. For instance, the Semantic Vertical Search Engine reference kit, when trained with data from doctor consultation chats, can leverage a foundation model for question answering to generate answers based on the database of chat transcripts.

These reference kits leverage Intel’s suite of end-to-end tools, frameworks, and libraries for AI and machine learning. These tools enhance popular open source frameworks to get the highest performance from Intel processors. By combining hardware, software, foundation models and innovative architectures, you can enhance your domain expertise with AI across a wider range of applications.

The easiest way to get started is to try out NeuralChat, which provides the above end-to-end tools, along with pipelines for fine-tuning and inference. It also includes Docker* files for each part, for building the entire workflow service. You can run this on Intel hardware in the cloud or on bare metal.

This is what democratizing AI is all about.