
Advent of GenAI Hackathon: Recap of the Final Challenge - Custom Application Creation

Eugenie_Wirz
Employee

Finally, we come to days 6 and 7 of the Advent of GenAI Hackathon. The final challenge, "Custom Application Creation - Your Choice, Your Design!", was a two-day development sprint in which participants built an application of their own design in a Jupyter Notebook environment, leveraging the skills and knowledge acquired over the previous five days. It was an exciting opportunity to innovate, experiment, and demonstrate creativity and technical expertise in developing unique applications with GenAI.

 

The biggest winners of the week’s biggest challenge

 

Submission #1: AI Lecturer by Tomáš Barczi


This groundbreaking tool was designed to help teachers create compelling presentations on any topic.

For example: "What was the battle of Trafalgar?"

The “AI Lecturer” Tomáš built could generate a presentation with multiple slides, bullet points, and relevant images. A human lecturer could either choose the images from the database or have them generated from automatically created descriptions.

The tool was built for maximum ease of use - all the teacher has to do is provide the initial question and pick images.

The workflow schema is shown here: 

[Image: AI Lecturer workflow schema]

#preprocessing
  • First, all PDFs in the folder are parsed, chunked, vectorized, and saved into LanceDB.
  • The PDFs are then scraped for images.
  • These images are vectorized and saved into a separate DB table.
#flow
  • The user asks a question.
  • Relevant documents are found in the text DB.
  • A summary answer to the user's question is created using the gpt4all LLM and the retrieved document chunks.
  • A presentation structure is created using the gpt4all LLM, the retrieved chunks, and a prompt with an initial format so the output can be parsed easily.
  • The presentation is parsed into slide names and individual bullet points for each slide.
  • The bullet points for each slide are checked with the Prediction Guard factuality API.
  • For each slide, an image description is created that should be relevant to that slide, using gpt4all with the bullet points as context and the overall summary in mind (this is crucial for slides like the introduction and conclusion).
  • Using this description, relevant images previously scraped from the PDFs are found.
  • The user can pick one of these images.
  • If the user doesn't pick an image for a slide, one is generated from its description.
  • Finally, a presentation is created (currently in Markdown format).
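As a rough illustration of the retrieval step in this flow, here is a minimal, self-contained sketch. A toy bag-of-words similarity stands in for real embeddings, and `chunk`/`retrieve` stand in for the LanceDB + gpt4all pipeline; all names and the sample text are illustrative, not from Barczi's code.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into fixed-size word windows (the real pipeline chunks PDFs)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real project used a vector model + LanceDB."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = chunk(
    "The battle of Trafalgar was a naval engagement fought in 1805. Nelson "
    "led the British fleet to victory over the combined French and Spanish "
    "fleets. Photosynthesis converts light into chemical energy."
)
top = retrieve("What was the battle of Trafalgar?", corpus, k=1)
```

The retrieved chunks would then be passed to the LLM as context for the summary answer and the presentation structure.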

Final presentation:

[Images: final presentation slides and animation]

Long before the Hackathon, Tomáš Barczi had been thinking about how to use generative AI in everyday life. Education came to the fore as a natural choice, because it’s one of the most popular use cases for tools like ChatGPT. Barczi saw the potential to build a tool that could make it a little easier for time-pressed teachers to create engaging presentations for their students. He also immediately saw how creating this tool would give him the chance to put everything he had learned through the Hackathon to use.

He built the solution with retrieval augmented generation (RAG) and Stable Diffusion. He came up with a chain of cascading prompts that would create the description of the image that could then be used for the generation.

As Barczi explained: "Since I wanted to incorporate image similarity search as well, I found a library that could scrape all the images from the textbook so that I could store them in a vector database."

Scraping real textbook images into a vector database enhanced the presentations' authenticity wherever real images were available, and the Prediction Guard API ensured the trustworthiness of the generated content. Barczi's project represents a significant step in applying generative AI to education.

 

Submission #2: An Inventive Tool to Extract Statements with References and Logical Arguments by Vincent Müller

 

Vincent Müller built a browser-plugin-like tool to extract statements with references and logical arguments from webpages. This could be used to get informed more easily by reading the statements and arguments rather than the full text. He wanted to further explore the reasoning and information-extraction capabilities of LLMs, and this project was the result.

Here’s how it works:

  • The user enters a URL and clicks ‘Extract’.
  • The source text is split into chunks.
  • Statements with references are extracted from the source text chunks.
  • The statements are saved in a vector database.
  • Arguments are created for each statement by looking for relevant premises among the other statements or generating new premises.
  • The extracted statements are displayed with their references and premises.
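The first extraction steps can be sketched in a few lines. Here a regex over citation markers stands in for the LLM extraction prompt Müller used, and the sample page text is invented for illustration:

```python
import re

def split_chunks(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking of the page text."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def extract_statements(chunk: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM step: pull out sentences that carry a [n] citation.
    The real project prompted an LLM to extract statements and references."""
    out = []
    for sentence in re.split(r"(?<=[.!?])\s+", chunk):
        for ref in re.findall(r"\[(\d+)\]", sentence):
            out.append((sentence.strip(), ref))
    return out

page = ("CO2 levels have risen sharply since 1950 [1]. "
        "This is unrelated filler text. "
        "Sea levels are rising as a consequence [2].")
statements = [s for c in split_chunks(page) for s in extract_statements(c)]
```

Each extracted statement would then be embedded and stored in the vector database so that premises for one statement can be retrieved from among the others.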

Here’s a screenshot from the Jupyter Notebook where an example article is displayed with the side panel, which is supposed to emulate how a browser plugin could look.

[Image: example article with the side panel in a Jupyter Notebook]

Vincent Müller commented on his experience with the final hackathon challenge, saying, "I adopted a straightforward method to extract statements, arguments, and references from web pages by combining prompt engineering with the use of a vector database. There are a few limitations, such as difficulty in finding relevant premises for a statement or hallucinating reference citations. These issues could be mitigated through further prompt engineering and by testing different models and their configurations. When interacting with users, it's also crucial to provide appropriate warnings about hallucinations and the factuality of information. Participating in the Hackathon was a fantastic experience, and it greatly enhanced my skills in Generative AI."

 
Submission #3: RAG-Injected Dungeons and Dragons Chatbot by Simon Hanly-Jones, Emmanuel Isaac, and Ryan Mann


The developer team created a RAG-injected chatbot that provides details about specified Dungeons and Dragons monsters, with dynamic image and video generation capabilities.

As you can imagine, the team had a lot of fun with this challenge. The key features of the solution were:

  • A stylistically distinctive chatbot that maintains character while giving correct information via Retrieval Augmented Generation context injection
  • Automatic image prompt enhancement using AI (a second Zephyr chatbot), which provides subject-specific image prompts for relevant inputs.
  • Dynamic image and video generation based on any prompt.

Here is a summary of the workflow:

Step 1 -> The user asks a question about a DnD monster

Step 2 -> A Zephyr bot with RAG context injection answers the question like a cheesy villain

Step 3 -> The subject of the question (the monster) is identified from the RAG record

Step 4 -> The monster is then fed into another Zephyr chatbot, which generates a good image prompt for SDXL

Step 5 -> The prompt is used to generate an image (SDXL)

Step 6 -> The image is used to generate a video (SVD-XT)

Users are able to 'jump on' the pipeline at steps 1 through 4, meaning they can input any creature into the prompt-enhancement bot to receive an image and video, or write their own prompt from scratch.
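Under stated assumptions - stub functions standing in for the Zephyr, SDXL, and image-to-video calls, and a toy monster record - the workflow above might be wired together like this:

```python
# Schematic of the four-stage pipeline with stub model calls; the real project
# used Zephyr (chat + prompt enhancement), SDXL (image), and an image-to-video model.

def answer_with_rag(question: str, monster_db: dict) -> tuple[str, str]:
    """Steps 1-3: answer in character and identify the monster from the RAG record."""
    monster = next(m for m in monster_db if m in question.lower())
    return f"Mwahaha! The {monster}: {monster_db[monster]}", monster

def enhance_prompt(creature: str) -> str:
    """Step 4 stand-in: the second Zephyr bot turns a creature name into an image prompt."""
    return f"cinematic fantasy art of a {creature} in dramatic action, epic lighting"

def generate_image(prompt: str) -> str:
    return f"<image for: {prompt}>"       # placeholder for the SDXL call

def generate_video(image: str) -> str:
    return f"<video from: {image}>"       # placeholder for the image-to-video call

db = {"beholder": "a floating orb of eyes and malice."}

# Full pipeline (entry at step 1):
reply, monster = answer_with_rag("Tell me about the beholder", db)
video = generate_video(generate_image(enhance_prompt(monster)))

# 'Jumping on' at step 4: any creature name goes straight to prompt enhancement.
video2 = generate_video(generate_image(enhance_prompt("gelatinous cube")))
```

The 'jump on' design falls out naturally from keeping each stage a plain function: any stage's output type is the next stage's input.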

 

Technical Insights


PIPELINE STEP 2: ZEPHYR BOT WITH RAG AND CONVERSATION

Getting the chat prompt right for this first bot was very challenging. There were two objectives: a consistent tone in the response, and correct information.

After much trial and error, they discovered a prompt that worked:

[Image: the chat prompt that worked]

Interestingly, the "context" could not be split into headings like "conversation history" and "monster information" - the model simply couldn't follow the structure. The Wizard model had performed much better with headings like this in the previous challenge, which is something to keep in mind for future projects.

The team looked at the model's source code and found a prompt template, as well as the template on Prediction Guard, both of which they found extremely helpful.

They also found it necessary to include both sets of information because without the chat history the bot thought it was a monster itself. Without the informational context from RAG it hallucinated.

It is also worth noting that applying the Zephyr prompt template to past messages made performance worse. The best performance was obtained by passing the history as a plain list without removing the Python artefacts, e.g.:

[Image: chat history passed as a raw Python list]

Further work is warranted to compare this approach to applying different labels via the standard chat completion abstractions. It may be that a more robust solution can be built out of the abstracted chat prompting models.
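A minimal sketch of that raw-list finding follows; the message dicts, the monster record, and the `<|user|>` tags are illustrative stand-ins, and the exact Zephyr template differs:

```python
# Sketch of the finding above: embedding past messages as a plain Python list
# (repr, artefacts and all) outperformed re-wrapping each turn in the chat template.

history = [
    {"role": "user", "content": "Tell me about the beholder"},
    {"role": "assistant", "content": "Mwahaha! A floating orb of eyes..."},
]
monster_info = "Beholder: aberration, armor class 18, levitates."

# What worked best for the team: the history list embedded verbatim in the prompt.
context = f"conversation so far: {history}\nmonster information: {monster_info}"

# What worked worse: re-applying a per-turn chat template to old messages, e.g.:
templated = "\n".join(f"<|{m['role']}|>\n{m['content']}" for m in history)
```

Both sets of information matter: without the history the bot thought it was a monster, and without the RAG context it hallucinated.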


STEP 4: PROMPT AUGMENTATION WITH ZEPHYR

As noted above, the team used a second chatbot to augment prompts for the image generation model and the video generation model. 

When reviewing the early video output, they saw that images where the subject was in action worked better, which led them to speculate that the model can identify common movement patterns and apply them more effectively when it is obvious what should be happening. For instance, a dragon in flight animated much better than a dragon sitting still.

In Simon’s words: “This creates an issue for generic prompting. Put rhetorically, how do we know what action is appropriate for a creature when we don't know what the creature is?”

They initially tried prompting for something like:

[Image: initial prompt attempt]

This was okay, but they saw a dramatic improvement with some more work. They then refined the prompt to:

[Image: refined prompt]

However, it did not consistently create the desired motion. So they used a second Zephyr bot with the following instructions:

[Image: system instructions for the prompt-augmentation bot]

This was a very standard system instruction, obtained by asking Mistral Instruct what it uses by default.

They saw greatly improved video output with the augmented prompts, which were also surprisingly poetic - particularly in this example:

[Image: example of an augmented prompt]

Situationally appropriate actions and backgrounds were included in the image prompt without any input from the user. The images for these prompts are included in the zip with the full test case uploaded with the submission. This example wasn't cherry-picked - it is indicative of normal performance.


STEP 5: TEXT-TO-IMAGE MODEL SELECTION

There was some difficulty selecting the text-to-image model. The team initially used SDXL-Turbo, a fast checkpoint of SDXL that runs in low-VRAM environments. SDXL-Turbo produced amazing images at low system cost with default settings, but did not perform well at the rectangular 576x1024 resolution required by the image-to-video model - it generated a lot of additional limbs.

So they switched to SDXL, which performs very well at a higher performance cost.
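The trade-off can be captured in a small helper. The checkpoint IDs below are the public Hugging Face model names; the 1.2 aspect-ratio cutoff is an illustrative assumption, not the team's actual rule:

```python
# Illustrative sketch of the checkpoint trade-off described above. The model IDs
# are the public Hugging Face names; the aspect-ratio cutoff is an assumption.

def pick_checkpoint(width: int, height: int) -> str:
    """Prefer the fast SDXL-Turbo checkpoint only for (near-)square outputs;
    fall back to full SDXL for rectangular frames like the 576x1024 the
    image-to-video model expects."""
    aspect = max(width, height) / min(width, height)
    if aspect <= 1.2:                      # near-square: turbo is fast and clean
        return "stabilityai/sdxl-turbo"
    return "stabilityai/stable-diffusion-xl-base-1.0"  # slower, fewer artifacts
```

Either ID would then be loaded with a text-to-image pipeline (e.g. via the diffusers library) at the resolution the downstream video model requires.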


 

 

 

 

Submission #4: Creative Text2Meme Generator using Llava Model with a RAG Pipeline by Samy Gauquelin

 

For the final project, Samy Gauquelin aimed to incorporate as much as possible of what had been used during the hackathon. Memes involve both text and images, so they were an attractive area to investigate - all the more so once Samy stumbled on the MemeCap dataset. As he explored the idea of generating memes, he researched existing solutions but found that none of them employed generative models. By combining Text2Image and Image2Text approaches, he was able to leverage RAG, LLM chains, image generation, and even multimodal models.

Global Idea

 

[Images: Text2Meme pipeline overview and example generated memes]

 

Honorable Mentions


The creativity and innovation of these submissions are truly commendable! We are excited to honor these additional outstanding projects:

  • Aditya Krishna - News2Image: a system for generating headlines and images from the News API;
  • Srikath Thokala - a Python code explainer with audio explanation and an enhanced UI;
  • Atif Ahmed and his team members Alvin Lee, Zhi Sheng Teh, Yao Jing, and Yeow Ngee Seah - an application that lets you witness the aging process in real time.

 

Join the Next Adventure: Elevate Your AI Startup with Intel Liftoff


The Advent of GenAI Hackathon was a huge success, and we’re just getting started. Keep an eye out for our next hackathon - you could get the chance to showcase your very own AI innovations. If you’re an early-stage AI startup, apply to join the Intel® Liftoff Program for Startups. It's your opportunity to elevate your projects, connect with a network of innovators, and take your startup to new heights with Intel's support.

About the Author
I'm a proud team member of the Intel® Liftoff for Startups, an innovative, free virtual program dedicated to accelerating the growth of early-stage AI startups.