I am new to the Intel Developer Cloud. The RAG notebook in Training and Workshops outlines how to build a RAGBot. I have three questions about it:
1) Installing packages: the notebook says to run these once. When I run the cell below in Jupyter Notebook, it gives the error shown after it. How do I run these without errors?
import sys
import os
!{sys.executable} -m pip install langchain==0.0.335 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install pygpt4all==1.1.0 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install gpt4all==1.0.12 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install transformers==4.35.1 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install datasets==2.14.6 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install tiktoken==0.4.0 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install chromadb==0.4.15 --no-warn-script-location > /dev/null
!{sys.executable} -m pip install sentence_transformers==2.2.2 --no-warn-script-location > /dev/null
ERROR: Will not install to the user site because it will lack sys.path precedence to anyio in /srv/jupyter/python-venv/lib/python3.11/site-packages
2) Using my own data source files - the RAG notebook has default data sets. How can I upload my own data set and use it for questions?
3) My source data is in Microsoft Word format. Do I need to import any packages to load Word documents (e.g., from langchain_community.document_loaders import Docx2txtLoader)? Is there a list of document loaders to use for different types of source documents, e.g., PDF, Excel, etc.?
Thanks.
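For question 3, a minimal sketch of picking a loader by file extension. The class names below exist in langchain_community.document_loaders to the best of my knowledge, but the mapping itself, the helper names loader_name_for and load_documents, and the default module path are assumptions for illustration and not part of the RAG notebook. With the langchain==0.0.335 pin used above, the loaders may instead live under langchain.document_loaders, and some need extra packages (for example docx2txt for Word files and pypdf for PDFs).

```python
import importlib
from pathlib import Path

# Assumed mapping from file extension to a LangChain loader class name.
# Adjust it to the file types you actually have.
LOADER_BY_SUFFIX = {
    ".docx": "Docx2txtLoader",
    ".pdf": "PyPDFLoader",
    ".xlsx": "UnstructuredExcelLoader",
    ".csv": "CSVLoader",
    ".txt": "TextLoader",
}


def loader_name_for(path):
    """Return the loader class name configured for this file's extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"No loader configured for {suffix!r}")
    return LOADER_BY_SUFFIX[suffix]


def load_documents(path, module="langchain_community.document_loaders"):
    """Import the loader lazily and return the loaded documents.

    The module path is an assumption; older LangChain releases may need
    module="langchain.document_loaders" instead.
    """
    loaders = importlib.import_module(module)
    loader_cls = getattr(loaders, loader_name_for(path))
    return loader_cls(str(path)).load()
```

Calling load_documents("my_report.docx") (a hypothetical file name) would then return a list of documents that can be split and embedded the same way as the notebook's default data set.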
Hi RajAyala,
Thank you for reaching out to us.
1. We recommend running the code from the default notebook file, "simple_rag.ipynb". Can you try running it by clicking Run All Cells in the Run tab to see if it works?
If the error persists, please click Close and Shut Down Notebook and relaunch the training.
2. Regarding your second and third questions, I'm checking with the development team and will update you as soon as possible.
Regards,
Nurul
Hi Nurul,
I appreciate your response. I ran the notebook. It went through the install and import statements until it came to the
from datasets import load_dataset
statement. It then gave the following error:
I closed and shut down the notebook and relaunched RAG from Training. Then I ran all cells from the Run tab again. It gave the same error. I am not sure how much of the environment was set up by the install and import steps. I think the program stopped executing there, and I don't see any messages after that.
Thanks for your help.
Raj
Hi RajAyala,
Thank you for sharing the error output from your end. Please try to Restart Kernel and Run All Cells by clicking this button and let us know if it helps.
Regards,
Nurul
Hi Nurul - I tried "Restart Kernel and Run All Cells" and got the same error. I closed and shut down the notebook, relaunched it, and tried restarting the kernel and running again. Same outcome. - Raj
Hi RajAyala,
Thank you for your response. Did you receive any messages or errors when running the install dependencies cell?
Regards,
Nurul
Hi Nurul - there were no error messages when running the install dependencies cell. The first error displayed was:
ModuleNotFoundError: No module named 'datasets'
Thanks.
- Raj
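A ModuleNotFoundError right after an install cell that printed no errors often means the packages went into a different environment than the one the running kernel imports from, and the install cell above also sends pip's normal output to /dev/null. A minimal diagnostic sketch, assuming it is run in a new notebook cell; the helper name missing_modules and the REQUIRED list are illustrative, based on the packages named in the first post:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed from the install cell in the first post.
REQUIRED = [
    "langchain",
    "gpt4all",
    "transformers",
    "datasets",
    "tiktoken",
    "chromadb",
    "sentence_transformers",
]

# The interpreter behind the active kernel; pip must install into this one.
print("Kernel interpreter:", sys.executable)
print("Missing modules:", missing_modules(REQUIRED))
```

If datasets shows up as missing, the install did not reach the interpreter printed on the first line.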
Hi RajAyala,
If you are using another kernel, please switch the kernel to base (the kernel selector is at the top right corner), then click Restart Kernel and Run All Cells to see if it works. If not, do let us know for further investigation.
Regards,
Nurul
Hi Nurul,
I did what you suggested (see below) and it is still giving the same error.
Thanks for your continued effort to help.
Raj
Hi RajAyala,
We have informed the relevant team about this issue for further investigation and will update you as soon as possible. Thank you for your patience.
Regards,
Nurul
Hi RajAyala,
We just got an update from the development team regarding the first issue.
Please select pytorch-gpu as the running kernel and remove all of the hashes (the # comment markers) in Cell 1. They need to be removed only for the first run; subsequent runs do not require those to be loaded again.
If it doesn't work, please try to remove all files from the .conda folder and run again.
Regards,
Nurul
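The pip error in the first post ("Will not install to the user site...") typically appears when pip cannot write to the active environment's site-packages and falls back to a user-level install. A minimal sketch for checking this after switching kernels, assuming it is run in a notebook cell; the function name install_target_report is hypothetical:

```python
import os
import sys
import sysconfig


def install_target_report():
    """Describe where pip would install for the current interpreter."""
    site_packages = sysconfig.get_paths()["purelib"]
    return {
        "executable": sys.executable,
        "site_packages": site_packages,
        # False suggests pip would fall back to a user-site install.
        "writable": os.access(site_packages, os.W_OK),
    }


for key, value in install_target_report().items():
    print(f"{key}: {value}")
```

If writable is False for the selected kernel, installing from that kernel is likely to hit the same error, which is consistent with the advice above to use the pytorch-gpu kernel.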
Hi RajAyala,
Thank you for your question. If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored.
Regards,
Nurul