Hello,
We have followed the instructions in the link below to run Llama2-7B, and we have also tried the nightly version, but with both approaches we hit the same error: "File not found: openvino_tokenizer.xml". When we try to install openvino_tokenizers, we get a lot of errors. Could you please let us know if there is another way to run a Llama model with openvino_genai?
https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html
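For context, this is the minimal openvino_genai flow we are trying to run, per the guide (a sketch; model_dir is a placeholder for our exported Llama2-7B directory):
import openvino_genai
# placeholder path to the directory produced by optimum-cli export
model_dir = "./llama-2-7b-chat-ov"
# the pipeline reads openvino_tokenizer.xml from model_dir at construction time;
# this is the point where "File not found: openvino_tokenizer.xml" is raised for us
pipe = openvino_genai.LLMPipeline(model_dir, "NPU")
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))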
Thanks
Hi Shravanthi,
Thanks for reaching out. Can you share a screenshot of your TinyLlama directory? Are the openvino_tokenizer files (.xml and .bin) available in the directory? I have exported TinyLlama myself and the openvino_tokenizer files were generated on my end. When exporting LLM models, the directory should include the openvino_tokenizer files. Below are the files produced when exporting the mistral-7b-instruct-v0.1-int8-ov model:
[screenshot of the exported directory]
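In the meantime, you can verify the files from Python with a quick check like this (a sketch; adjust model_dir to your export folder):
from pathlib import Path
model_dir = Path("TinyLlama")  # adjust to your export directory
for name in ("openvino_tokenizer.xml", "openvino_tokenizer.bin",
             "openvino_detokenizer.xml", "openvino_detokenizer.bin"):
    status = "found" if (model_dir / name).exists() else "MISSING"
    print(f"{name}: {status}")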
Regards,
Aznie
Hi Aznie,
We are facing the same issue: when we export the TinyLlama and Llama2-7B models, the openvino_tokenizer and openvino_detokenizer files are not present in the directory. We tried using the tokenizer files from another Llama2-7B source, but we got the error "Cannot create SpecialTokensSplit layer". Below is the screenshot of the error:
[screenshot of the error]
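For reference, the error comes up when the copied tokenizer model is read, roughly like this (a sketch of what we are doing; the path points at the tokenizer files we copied in):
import openvino as ov
import openvino_tokenizers  # noqa: F401 -- importing registers the tokenizer custom operations
core = ov.Core()
# reading the copied tokenizer fails here with "Cannot create SpecialTokensSplit layer"
tokenizer = core.read_model("llama-2-7b/openvino_tokenizer.xml")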
Thanks
Hi Shravanthi,
We are checking this with the development team and will get back to you soon.
Regards,
Aznie
Hi Aznie,
Do you have any update on this?
Thanks
Shravanthi
Hello Shravanthi,
Your case is currently with me. I opened an issue with OpenVINO developers to discuss the details. I will get back to you as soon as I know more.
Hello Shravanthi,
I haven't received any response from the developers yet. Please bear with us.
Our developer was able to replicate your case; here's the output:
(venv20245) apaniuko@IRL-ODT-08:~/python/openvino_tokenizers/benchmark$ optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
/home/apaniuko/python/openvino_tokenizers/benchmark/venv20245/lib/python3.10/site-packages/transformers/cache_utils.py:458: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
or len(self.key_cache[layer_idx]) == 0 # the layer has no cache
/home/apaniuko/python/openvino_tokenizers/benchmark/venv20245/lib/python3.10/site-packages/optimum/exporters/openvino/model_patcher.py:496: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
/home/apaniuko/python/openvino_tokenizers/benchmark/venv20245/lib/python3.10/site-packages/transformers/cache_utils.py:443: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
elif len(self.key_cache[layer_idx]) == 0: # fills previously skipped layers; checking for tensor causes errors
INFO:nncf:Statistics of the bitwidth distribution:
┍━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Weight compression mode │ % all parameters (layers) │ % ratio-defining parameters (layers) │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ int8_asym │ 12% (2 / 156) │ 0% (0 / 154) │
├───────────────────────────┼─────────────────────────────┼────────────────────────────────────────┤
│ int4_sym │ 88% (154 / 156) │ 100% (154 / 154) │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 0:00:40 • 0:00:00
(venv20245) apaniuko@IRL-ODT-08:~/python/openvino_tokenizers/benchmark$ ls TinyLlama/
config.json openvino_detokenizer.bin openvino_model.bin openvino_tokenizer.bin special_tokens_map.json tokenizer.json
generation_config.json openvino_detokenizer.xml openvino_model.xml openvino_tokenizer.xml tokenizer_config.json tokenizer.model
We suspect this is an environment issue. Could you please check your packages? Here's the list from our side (a quick version-check sketch follows after the list):
about-time==4.2.1
aiohappyeyeballs==2.4.3
aiohttp==3.11.8
aiosignal==1.3.1
alive-progress==3.2.0
async-timeout==5.0.1
attrs==24.2.0
autograd==1.7.0
certifi==2024.8.30
charset-normalizer==3.4.0
cma==3.2.2
coloredlogs==15.0.1
contourpy==1.3.1
cycler==0.12.1
datasets==3.1.0
Deprecated==1.2.15
dill==0.3.8
filelock==3.16.1
fonttools==4.55.0
frozenlist==1.5.0
fsspec==2024.9.0
grapheme==0.6.0
huggingface-hub==0.26.2
humanfriendly==10.0
idna==3.10
Jinja2==3.1.4
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jstyleson==0.0.2
kiwisolver==1.4.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.2
mdurl==0.1.2
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
natsort==8.4.0
networkx==3.3
ninja==1.11.1.2
nncf==2.14.0
numpy==2.1.3
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
onnx==1.17.0
openvino==2024.5.0
openvino-genai==2024.5.0.0
openvino-telemetry==2024.5.0
openvino-tokenizers==2024.5.0.0
optimum==1.23.3
optimum-intel==1.20.1
packaging==24.2
pandas==2.2.3
pillow==11.0.0
propcache==0.2.0
protobuf==5.28.3
psutil==6.1.0
pyarrow==18.1.0
pydot==2.0.0
Pygments==2.18.0
pymoo==0.6.1.3
pyparsing==3.2.0
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.21.0
safetensors==0.4.5
scikit-learn==1.5.2
scipy==1.14.1
sentencepiece==0.2.0
six==1.16.0
sympy==1.13.1
tabulate==0.9.0
threadpoolctl==3.5.0
tokenizers==0.20.3
torch==2.5.1
tqdm==4.67.1
transformers==4.46.3
triton==3.1.0
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
wrapt==1.17.0
xxhash==3.5.0
yarl==1.18.0
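For a quick comparison, this sketch prints the versions of the packages most relevant here (distribution names as on PyPI):
from importlib.metadata import PackageNotFoundError, version
# packages whose versions should match the working environment listed above
for pkg in ("openvino", "openvino-genai", "openvino-tokenizers",
            "optimum", "optimum-intel", "transformers", "tokenizers"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")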
You could also try converting the tokenizers separately with this command:
(venv20245) apaniuko@IRL-ODT-08:~/python/openvino_tokenizers/benchmark$ convert_tokenizer TinyLlama/TinyLlama-1.1B-Chat-v1.0 --with-detokenizer --left-padding -o TinyLlama/
Loading Huggingface Tokenizer...
Converting Huggingface Tokenizer to OpenVINO...
Saved OpenVINO Tokenizer: TinyLlama/openvino_tokenizer.xml, TinyLlama/openvino_tokenizer.bin
Saved OpenVINO Detokenizer: TinyLlama/openvino_detokenizer.xml, TinyLlama/openvino_detokenizer.bin
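If the CLI is inconvenient, the same conversion can also be done from Python via the openvino_tokenizers API (a sketch; it assumes the Hugging Face tokenizer can be downloaded):
import openvino as ov
from openvino_tokenizers import convert_tokenizer
from transformers import AutoTokenizer
# load the original Hugging Face tokenizer
hf_tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
# convert it to OpenVINO tokenizer and detokenizer models
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)
# save them next to the exported model files
ov.save_model(ov_tokenizer, "TinyLlama/openvino_tokenizer.xml")
ov.save_model(ov_detokenizer, "TinyLlama/openvino_detokenizer.xml")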
Hi Shravanthi,
Have you been able to try the suggested workaround? Please respond within 5 working days; otherwise, I will have to de-escalate this issue.
Hello Shravanthi,
Thank you for your question. If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored.
Regards,
Aznie