Intel® Tiber Developer Cloud
Help connecting to or getting started on Intel® Tiber Developer Cloud

Using RAGBot with custom data

RajAyala
Novice
2,828 Views

I am new to the Intel Developer Cloud. The RAG notebook in Training and Workshops outlines how to build a RAGBot. I have three questions about it:

1) Installing packages: the notebook says to run these once. When I run the cell below in Jupyter Notebook, it gives an error. How do I run these without errors?

import sys
import os
!{sys.executable} -m pip install langchain==0.0.335 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install pygpt4all==1.1.0 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install gpt4all==1.0.12 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install transformers==4.35.1 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install datasets==2.14.6 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install tiktoken==0.4.0 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install chromadb==0.4.15 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install sentence_transformers==2.2.2 --no-warn-script-location > /dev/null

ERROR: Will not install to the user site because it will lack sys.path precedence to anyio in /srv/jupyter/python-venv/lib/python3.11/site-packages

 

2) Using my own data source files: the RAG notebook has default data sets. How can I upload my own data set and use it for questions?

 

3) My source data is in Microsoft Word format. Do I need to import any packages to load Word docs (e.g., from langchain_community.document_loaders import Docx2txtLoader)? Is there a list of document loaders to use for different types of source docs, e.g., PDF, Excel, etc.?

 

Thanks.

 

 

11 Replies
Nurul_Intel
Moderator
2,794 Views

Hi RajAyala,

Thank you for reaching out to us.

 

1. We recommend running the code from the default notebook file, "simple_rag.ipynb". Can you try running it by clicking Run All Cells in the Run tab to see if it works?

Screenshot 2024-01-17 033315.jpg

 

If the error still persists, please click Close and Shut Down Notebook and relaunch the training.

Screenshot 2024-01-17 033427.jpg

 

2. Regarding your second and third questions, I'm checking with the development team and will update you as soon as possible.
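In the meantime, here is a rough sketch of how custom files in mixed formats could be loaded. It is not the notebook's own method. The folder name in the usage note and the extension-to-loader mapping are assumptions for illustration. The loader classes come from langchain_community.document_loaders; with the langchain 0.0.335 version pinned in the notebook, the same classes should be under langchain.document_loaders instead, so the module name is a parameter. Some loaders need extra packages (for example docx2txt for Word files and pypdf for PDFs).

```python
import importlib
from pathlib import Path

# Assumed mapping from file extension to a LangChain loader class name.
# Adjust it to the formats you actually have.
LOADER_BY_EXTENSION = {
    ".docx": "Docx2txtLoader",
    ".pdf": "PyPDFLoader",
    ".csv": "CSVLoader",
    ".txt": "TextLoader",
    ".xlsx": "UnstructuredExcelLoader",
}


def pick_loader_name(path):
    """Return the loader class name for a file, or None if unsupported."""
    return LOADER_BY_EXTENSION.get(Path(path).suffix.lower())


def load_documents(folder, module_name="langchain_community.document_loaders"):
    """Load every supported file under `folder` into LangChain documents."""
    # Imported lazily so the mapping above can be checked without LangChain.
    loaders = importlib.import_module(module_name)
    documents = []
    for path in sorted(Path(folder).rglob("*")):
        name = pick_loader_name(path)
        if path.is_file() and name is not None:
            loader_cls = getattr(loaders, name)
            documents.extend(loader_cls(str(path)).load())
    return documents
```

After uploading your files to a folder in the Jupyter file browser (say a hypothetical my_docs folder), load_documents("my_docs") would return a list of documents that could stand in for the notebook's default data set before the splitting and embedding steps.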

 

Regards,

Nurul

 

RajAyala
Novice
2,771 Views

Hi Nurul,

 

I appreciate your response. I ran the notebook. It got through the install and import statements until it reached this statement:

from datasets import load_dataset

It then gave the following error:

RajAyala_0-1705441500860.png

 

I closed and shut down the notebook and relaunched RAG from Training. Then I ran all cells from the Run tab again. It gave the same error. I am not sure how much of the environment was set up by the install and import steps. I think the program stopped executing there, and I don't see any messages after that.

 

Thanks for your help.

Raj

Nurul_Intel
Moderator
2,763 Views

Hi RajAyala,

 

Thank you for sharing the error output from your end. Please try to Restart Kernel and Run All Cells by clicking this button and let us know if it helps.

Screenshot 2024-01-17 070734.jpg

 

Regards,

Nurul

 

 

RajAyala
Novice
2,722 Views

Hi Nurul - Tried "Restart Kernel and Run All Cells" - same error. Closed and shut down the notebook, relaunched, and tried restarting the kernel and running again. Same outcome. - Raj

Nurul_Intel
Moderator
2,712 Views

Hi RajAyala,

Thank you for your response. Did you receive any messages or errors when running the install dependencies cell?

 

Regards,

Nurul


RajAyala
Novice
2,702 Views

Hi Nurul - there were no error messages when running the install dependencies cell. The first error displayed was:

ModuleNotFoundError: No module named 'datasets'

Thanks.

- Raj
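A ModuleNotFoundError right after an install cell that reported no errors often means the packages went into a different Python environment than the one the kernel is running. A minimal check, using only the standard library, could look like the sketch below. The list of module names is taken from the notebook's install lines; the helper name is made up for illustration.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Shows which Python executable the running kernel is using.
print("Kernel interpreter:", sys.executable)

# Lists the notebook dependencies that are not importable from this kernel.
print("Missing:", missing_modules(["datasets", "langchain", "chromadb"]))
```

If datasets shows up as missing here, the install step did not reach this kernel's environment, which would point to a kernel or environment mismatch rather than a problem with the import line itself.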

Nurul_Intel
Moderator
2,684 Views

Hi RajAyala,

 

If you are using another kernel, please switch the kernel to base (the selector is at the top right corner), then click Restart Kernel and Run All Cells to see if it works. If not, do let us know for further investigation.

Screenshot 2024-01-18 041814.jpg

 

Regards,

Nurul

 

RajAyala
Novice
2,677 Views

Hi Nurul,

 

I did what you suggested (see below) and it is still giving the same error.

RajAyala_0-1705524050672.png

Thanks for your continued effort to help.

Raj

 

Nurul_Intel
Moderator
2,668 Views

Hi RajAyala, 

 

We have informed the relevant team about this issue for further investigation and will update you as soon as possible. Thank you for your patience.

 

Regards,

Nurul


Nurul_Intel
Moderator
2,488 Views

Hi RajAyala,

We just got an update from the development team regarding the first issue. 

 

Please select pytorch-gpu as the running kernel and remove all of the hash (#) characters in Cell 1 so that those lines are no longer commented out. They need to be removed only for the first run; subsequent runs do not require those to be loaded again.

image-2024-01-23-16-12-49-917.png  

If it doesn't work, please try to remove all files from the .conda folder and run again. 
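For reference, here is a sketch of what the first run amounts to, assuming Cell 1 holds the pinned pip install lines quoted in the first post. It builds each install command against the interpreter of the currently selected kernel, so the packages land in the same environment that later runs the imports. The function names and the dry_run switch are illustrative and not part of the notebook.

```python
import subprocess
import sys

# Versions copied from the install lines quoted earlier in this thread.
PINNED = {
    "langchain": "0.0.335",
    "pygpt4all": "1.1.0",
    "gpt4all": "1.0.12",
    "transformers": "4.35.1",
    "datasets": "2.14.6",
    "tiktoken": "0.4.0",
    "chromadb": "0.4.15",
    "sentence_transformers": "2.2.2",
}


def pip_command(package, version):
    """Build a pip install command bound to the running kernel's interpreter."""
    return [sys.executable, "-m", "pip", "install",
            f"{package}=={version}", "--no-warn-script-location"]


def install_pinned(pinned, dry_run=True):
    """Return the install commands; run them only when dry_run is False."""
    commands = [pip_command(name, version) for name, version in pinned.items()]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


# Dry run: print the commands without installing anything.
for command in install_pinned(PINNED):
    print(" ".join(command))
```

Calling install_pinned(PINNED, dry_run=False) once would perform the installs; after that, the call can be left out, matching the "first run only" advice above.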

 

Regards,

Nurul

 

Nurul_Intel
Moderator
2,317 Views

Hi RajAyala,

 

Thank you for your question. If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored. 

 

Regards,

Nurul

