from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Define the path to the checkpoint
checkpoint_path = r"./results/checkpoint-1000" # Replace with your checkpoint folder
# Load the model
model = AutoModelForSequenceClassification.from_pretrained("Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000")
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
from transformers import cached_file
# Replace with the actual values
path_or_repo_id = "username/repository"
filename = "model_file.bin"
cache_dir = "path/to/cache_dir"
# Download the file
resolved_file = cached_file(
path_or_repo_id=path_or_repo_id,
filename=filename,
cache_dir=cache_dir,
force_download=True, # Set to True to force download
resume_download=True, # Set to True to resume download if it was interrupted
proxies=None, # Set to your proxy settings if needed
token=None, # Set to your Hugging Face token if the repository is gated
revision=None, # Set to a specific commit hash or tag if needed
local_files_only=True, # Set to True to only look for local files
subfolder=None, # Set to a subfolder if the file is in a subfolder
repo_type=None, # Set to "dataset" or "model" if necessary
user_agent=None, # Set to a custom user agent if needed
_raise_exceptions_for_gated_repo=True, # Set to False to not raise exceptions for gated repositories
_raise_exceptions_for_missing_entries=True, # Set to False to not raise exceptions for missing entries
_raise_exceptions_for_connection_errors=True, # Set to False to not raise exceptions for connection errors
_commit_hash=None, # Set to a specific commit hash if needed
)
# Use the resolved_file path for further processing
This code produces the following error, and I have been stuck on it for quite some time now. Please suggest a quick fix so I can proceed.
File ~/.local/lib/python3.11/site-packages/transformers/utils/hub.py:385, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    383 try:
    384     # Load from URL or cache if already cached
--> 385     resolved_file = hf_hub_download(
    386         path_or_repo_id,
    387         filename,
    388         subfolder=None if len(subfolder) == 0 else subfolder,
    389         repo_type=repo_type,
    390         revision=revision,
    391         cache_dir=cache_dir,
    392         user_agent=user_agent,
    393         force_download=force_download,
    394         proxies=proxies,
    395         resume_download=resume_download,
    396         token=token,
    397         local_files_only=local_files_only,
    398     )
    399 except GatedRepoError as e:

File ~/.local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py:110, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    109 if arg_name in ["repo_id", "from_id", "to_id"]:
--> 110     validate_repo_id(arg_value)
    112 elif arg_name == "token" and arg_value is not None:

File ~/.local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py:158, in validate_repo_id(repo_id)
    157 if repo_id.count("/") > 1:
--> 158     raise HFValidationError(
    159         "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    160         f" '{repo_id}'. Use `repo_type` argument if needed."
    161     )
    163 if not REPO_ID_REGEX.match(repo_id):

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

OSError Traceback (most recent call last)
Cell In[3], line 7
      4 checkpoint_path = r"./results/checkpoint-1000" # Replace with your checkpoint folder
      6 # Load the model
----> 7 model = AutoModelForSequenceClassification.from_pretrained("Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000")
      9 # Load the tokenizer
     10 tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

File ~/.local/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:488, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    485 if commit_hash is None:
    486     if not isinstance(config, PretrainedConfig):
    487         # We make a call to the config file first (which may be absent) to get the commit hash as soon as possible
--> 488         resolved_config_file = cached_file(
    489             pretrained_model_name_or_path,
    490             CONFIG_NAME,
    491             _raise_exceptions_for_missing_entries=False,
    492             _raise_exceptions_for_connection_errors=False,
    493             **hub_kwargs,
    494         )
    495         commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
    496     else:

File ~/.local/lib/python3.11/site-packages/transformers/utils/hub.py:450, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    448     raise EnvironmentError(f"There was a specific connection error when trying to load {path_or_repo_id}:\n{err}")
    449 except HFValidationError as e:
--> 450     raise EnvironmentError(
    451         f"Incorrect path_or_model_id: '{path_or_repo_id}'. Please provide either the path to a local folder or the repo_id of a model on the Hub."
    452     ) from e
    453 return resolved_file

OSError: Incorrect path_or_model_id: 'Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
- Tags:
- ssh
- Sudo Permissions
Hi Peh,
Hope you are doing well! Here's the warning (from the screenshot you shared earlier) that I fixed while fine-tuning the pre-trained model about three months ago. It no longer shows up in my notebook, and I have the files for it. I saved the files for this warning in the vvv folder, and I suppose that is what the resolved file is about. The folder also contains the HTML file for the Jupyter notebook along with the warning-resolution files.
Thanks & Regards
Siddhartha Sharma
Hi Sid5,
Thanks for reaching out to us.
Based on the error you encountered, it seems the model is being loaded from an incorrect path.
Please try the following code:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Define the path to the checkpoint
checkpoint_path = r"./results/checkpoint-1000" # Replace with your checkpoint folder
# Load the model
model = AutoModelForSequenceClassification.from_pretrained(checkpoint_path)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
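Judging from the traceback, when the string passed to from_pretrained is not an existing local folder, it is handed on as a Hub repo id, which is why a long path ends in a "Repo id must be in the form..." error. A minimal sketch for checking this before loading, assuming the relative path is meant to resolve against the notebook's current working directory; resolve_checkpoint is a hypothetical helper, not part of transformers:

```python
from pathlib import Path


def resolve_checkpoint(path_str):
    """Return the absolute checkpoint path, or raise if it is not a local folder."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_dir():
        # Report the working directory too, since relative paths depend on it.
        raise FileNotFoundError(
            f"{path} is not a folder (current working directory: {Path.cwd()})"
        )
    return path


# Hypothetical usage with the relative path from the snippet above:
# checkpoint_path = resolve_checkpoint("./results/checkpoint-1000")
# model = AutoModelForSequenceClassification.from_pretrained(str(checkpoint_path))
```

If this raises, the problem is the path itself rather than the model files.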
Regards,
Peh
Hi Peh_Intel,
I have verified and corrected the path in every way I can think of, but I am still stuck on this error, which does not make sense to me. Rest assured, the path I am entering is the correct one, taken from the results section of my notebook. The snippet I posted was a different iteration I was trying to see whether it would work. This is the path: Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000
Thanks & regards
Siddhartha Sharma
Hi Siddhartha Sharma,
Please ensure that your checkpoint-1000 is a folder that contains a config.json file.
Also, could you share your checkpoint folder with me? That way I can try to upload the checkpoint from my end, as I do not have any model checkpoint.
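The config.json check can also be done from inside the notebook. The sketch below is only illustrative: describe_checkpoint is a hypothetical helper, and the weight file names in WEIGHT_FILES are an assumption about what a typical saved checkpoint contains.

```python
from pathlib import Path

# Assumed file names: config.json plus one of the usual weight files.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def describe_checkpoint(folder):
    """Summarise whether a folder looks like a loadable checkpoint."""
    folder = Path(folder)
    is_dir = folder.is_dir()
    present = sorted(p.name for p in folder.iterdir()) if is_dir else []
    return {
        "is_dir": is_dir,
        "has_config": "config.json" in present,
        "has_weights": any(name in present for name in WEIGHT_FILES),
        "files": present,
    }


# Hypothetical usage:
# print(describe_checkpoint("./results/checkpoint-1000"))
```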
Regards,
Peh
Hi Peh_Intel,
Yes, it does contain the mentioned file. Here's the screenshot for the same:
Regards,
Siddhartha Sharma
Hi Siddhartha Sharma,
Please try the following approach.
1. Right-click on your Jupyter Notebook and select “Show in File Browser” to navigate to the file browser.
2. Upload or move your checkpoint folder here, so that it is in the same location as your Jupyter Notebook.
3. Load the model with the command below (replace checkpoint_path with your checkpoint folder's name):
model = AutoModelForSequenceClassification.from_pretrained("checkpoint_path")
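To see where the checkpoint folder sits relative to the notebook before moving anything, a small search from the working directory can help. This is a sketch under the assumption that the folder name starts with "checkpoint-"; find_checkpoint_dirs is a hypothetical helper.

```python
from pathlib import Path


def find_checkpoint_dirs(root=".", pattern="checkpoint-*"):
    """Return every directory under root whose name matches pattern."""
    root = Path(root).resolve()
    return sorted(p for p in root.rglob(pattern) if p.is_dir())


# Hypothetical usage from a notebook cell:
# print("Working directory:", Path.cwd())
# for found in find_checkpoint_dirs():
#     print(found)
```

Any path it prints can be passed to from_pretrained as a string.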
Regards,
Peh
Hi Siddhartha Sharma,
We have not heard back from you. I hope previous responses were sufficient to help you proceed.
If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored.
Regards,
Peh
Hi Peh,
No, I was not able to move the checkpoint folder to the specified location. It is still at the same location and path, which still shows the same error.
Best Regards
Siddhartha Sharma
Hi Siddhartha Sharma,
What if you copy the entire checkpoint-1000 folder and paste it into the same location as your Jupyter Notebook? Next, load the model with the command below:
model = AutoModelForSequenceClassification.from_pretrained("checkpoint-1000")
Regards,
Peh
Hi Peh,
In that case it is showing the following error because of the file size:
Thanks & regards
Siddhartha Sharma
Hi Siddhartha Sharma,
Thanks for trying that out.
In that case, copy the Jupyter Notebook (.ipynb file) to the same location as your checkpoint-1000 folder.
Regards,
Peh
Hi Peh,
I was able to copy the .ipynb files into the checkpoint-1000 folder now. It's showing the following now:
Thanks & regards
Siddhartha Sharma
Hi Siddhartha Sharma,
Regarding your latest error, I found that this StackOverflow discussion, Mismatched size on BertForSequenceClassification from Transformers and multiclass problem, might be helpful for you.
Please try its suggestions: specify the number of labels in your new dataset and pass the ignore_mismatched_sizes=True argument.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint_path, num_labels=<num>, ignore_mismatched_sizes=True)
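A mismatched-size error usually means the classification head saved in the checkpoint has a different number of outputs from the num_labels being requested. One way to see what was saved is to read config.json directly. This sketch assumes the config stores an id2label mapping, which may be absent (for example with the default two labels); labels_in_config is a hypothetical helper.

```python
import json
from pathlib import Path


def labels_in_config(checkpoint_dir):
    """Return the number of labels recorded in config.json, or None if not recorded."""
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    id2label = config.get("id2label")
    return len(id2label) if id2label else None


# Hypothetical usage:
# print(labels_in_config("checkpoint-1000"))
```

If the number matches your dataset, passing it as num_labels should load the saved head as-is. Note that with ignore_mismatched_sizes=True, any mismatched weights are newly initialized rather than loaded, so the model would need fine-tuning again.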
Regards,
Peh
Hi Peh,
I was able to resolve the mismatched size error yesterday, but now I am stuck with the gated repo ID error. The checkpoint folder path is also still a problem: the system is unable to recognize the checkpoint folder either locally or on Hugging Face, although I have passed the token to log in. Is there any way I can clone this workspace into my Hugging Face model? Here's the screenshot:
Thanks & Regards
Siddhartha Sharma
Hi Siddhartha Sharma,
I would like to know how you encountered the error above. Which code are you trying to execute? Could you please share the full code and error messages?
Regards,
Peh
Hi Peh_Intel,
The code is the same as before for the checkpoint model path.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Define the path to the checkpoint
checkpoint_path = r"Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000",
# Load the model
model = AutoModelForSequenceClassification.from_pretrained("checkpoint-1000")
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
from transformers import cached_file
# Replace with the actual values
path_or_repo_id = "Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000"
filename = "model_file.bin"
cache_dir = "https://huggingface.co/Sids99/intobj"
# Download the file
resolved_file = cached_file(
path_or_repo_id="Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000",
filename=config.json,
cache_dir="https://huggingface.co/Sids99/intobj",
force_download=True, # Set to True to force download
resume_download=True, # Set to True to resume download if it was interrupted
proxies=None, # Set to your proxy settings if needed
token=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, # Set to your Hugging Face token if the repository is gated
revision=None, # Set to a specific commit hash or tag if needed
local_files_only=True, # Set to True to only look for local files
subfolder=None, # Set to a subfolder if the file is in a subfolder
repo_type=None, # Set to "dataset" or "model" if necessary
user_agent=None, # Set to a custom user agent if needed
_raise_exceptions_for_gated_repo=True, # Set to False to not raise exceptions for gated repositories
_raise_exceptions_for_missing_entries=True, # Set to False to not raise exceptions for missing entries
_raise_exceptions_for_connection_errors=True, # Set to False to not raise exceptions for connection errors
_commit_hash=None, # Set to a specific commit hash if needed
ignore_mismatched_size =True
)
This is the error it generates:
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py:286, in hf_raise_for_status(response, endpoint_name)
    285 try:
--> 286     response.raise_for_status()
    287 except HTTPError as e:

File /srv/jupyter/python-venv/lib/python3.11/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/checkpoint-1000/resolve/main/config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/transformers/utils/hub.py:385, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    383 try:
    384     # Load from URL or cache if already cached
--> 385     resolved_file = hf_hub_download(
    386         path_or_repo_id,
    387         filename,
    388         subfolder=None if len(subfolder) == 0 else subfolder,
    389         repo_type=repo_type,
    390         revision=revision,
    391         cache_dir=cache_dir,
    392         user_agent=user_agent,
    393         force_download=force_download,
    394         proxies=proxies,
    395         resume_download=resume_download,
    396         token=token,
    397         local_files_only=local_files_only,
    398     )
    399 except GatedRepoError as e:

File ~/.local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    116 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py:1368, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout, endpoint)
   1366 elif isinstance(head_call_error, RepositoryNotFoundError) or isinstance(head_call_error, GatedRepoError):
   1367     # Repo not found => let's raise the actual error
-> 1368     raise head_call_error
   1369 else:
   1370     # Otherwise: most likely a connection issue or Hub downtime => let's warn the user

File ~/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py:1238, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout, endpoint)
   1237 try:
-> 1238     metadata = get_hf_file_metadata(
   1239         url=url,
   1240         token=token,
   1241         proxies=proxies,
   1242         timeout=etag_timeout,
   1243         library_name=library_name,
   1244         library_version=library_version,
   1245         user_agent=user_agent,
   1246     )
   1247 except EntryNotFoundError as http_error:
   1248     # Cache the non-existence of the file and raise

File ~/.local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    116 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py:1631, in get_hf_file_metadata(url, token, proxies, timeout, library_name, library_version, user_agent)
   1630 # Retrieve metadata
-> 1631 r = _request_wrapper(
   1632     method="HEAD",
   1633     url=url,
   1634     headers=headers,
   1635     allow_redirects=False,
   1636     follow_relative_redirects=True,
   1637     proxies=proxies,
   1638     timeout=timeout,
   1639 )
   1640 hf_raise_for_status(r)

File ~/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py:385, in _request_wrapper(method, url, follow_relative_redirects, **params)
    384 if follow_relative_redirects:
--> 385     response = _request_wrapper(
    386         method=method,
    387         url=url,
    388         follow_relative_redirects=False,
    389         **params,
    390     )
    392 # If redirection, we redirect only relative paths.
    393 # This is useful in case of a renamed repository.

File ~/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py:409, in _request_wrapper(method, url, follow_relative_redirects, **params)
    408 response = get_session().request(method=method, url=url, **params)
--> 409 hf_raise_for_status(response)
    410 return response
Thanks & Regards
Siddhartha Sharma
Hi Siddhartha Sharma,
Referring to Uploading and Sharing Models on Hugging Face Hub with Intel Optimizations, once you are able to load your pre-trained model and tokenizer, you can upload them to the Hugging Face Hub using the following code:
# Save the model and tokenizer
model_name_on_hub = "intobj"
model.save_pretrained(model_name_on_hub)
tokenizer.save_pretrained(model_name_on_hub)
# Push to the hub
model.push_to_hub(model_name_on_hub)
tokenizer.push_to_hub(model_name_on_hub)
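Before pushing, it may be worth confirming that the name is a plain repo id with no stray whitespace or extra slashes, since the Hub expects 'repo_name' or 'namespace/repo_name'. The pattern below is only an approximation of an acceptable name, not the Hub's exact validation rule, and clean_repo_id is a hypothetical helper.

```python
import re

# Approximation of an acceptable name, not the Hub's exact validation rule.
_NAME = r"[A-Za-z0-9][A-Za-z0-9._-]*"
_REPO_ID = re.compile(rf"^{_NAME}(/{_NAME})?$")


def clean_repo_id(raw):
    """Strip surrounding whitespace and reject anything that is not name or namespace/name."""
    repo_id = raw.strip()
    if not _REPO_ID.match(repo_id):
        raise ValueError(
            f"{raw!r} does not look like 'repo_name' or 'namespace/repo_name'"
        )
    return repo_id


# Hypothetical usage:
# model.push_to_hub(clean_repo_id(model_name_on_hub))
```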
Regards,
Peh
Hi Peh_Intel,
I have entered the model name, but the previous step itself does not pass. I will try to push the model and tokenizer to the HF Hub now to see if it changes anything.
Thanks & regards
Siddhartha Sharma
Hi Siddhartha Sharma,
Are you able to push the model and tokenizer to Hugging Face Hub?
Regards,
Peh
Hi Peh,
Sorry about the delayed response. I tried pushing the model now using my custom model name on Hugging Face, but it still shows the previous errors from the resolved file.
I also tried moving the checkpoint folder with this command:
mv Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-hugging-face/results/checkpoint-1000 Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-hugging-face
It shows the following error, even though this is the only path:
mv: cannot stat 'Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-hugging-face/results/checkpoint-1000': No such file or directory
Thanks & Regards
Siddhartha Sharma
Hi Siddhartha Sharma,
Since your Jupyter Notebook is in the same location as your checkpoint-1000 folder, please try removing the full path to the checkpoint-1000 folder everywhere in your code, so that it refers to just checkpoint-1000 instead of Training/AI/GenAI/ai-innovation-bridge/workshops/ai-workloads-with-huggingface/results/checkpoint-1000.
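If a path still fails with "No such file or directory", it can help to find the first component that does not exist relative to the working directory. A minimal sketch, assuming the path is meant to be relative to where the notebook or shell is running; first_missing_part is a hypothetical helper.

```python
from pathlib import Path


def first_missing_part(path_str, base="."):
    """Walk path_str one component at a time and return the first path that does not exist."""
    current = Path(base).resolve()
    for part in Path(path_str).parts:
        current = current / part
        if not current.exists():
            return current
    return None  # every component exists


# Hypothetical usage:
# print(first_missing_part("results/checkpoint-1000"))
```

A result of None means the whole path exists. Any other result shows where the path stops matching what is on disk, such as a misspelled or differently hyphenated folder name.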
Regards,
Peh