I am on Intel DevCloud, using Intel oneAPI. This is my code so far:
# first block of jupyter notebook
import modin.pandas as pd
# second block of jupyter notebook
df = pd.read_csv('dataset/dataset.csv')
df.head()
Output:
# output of second block
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
import ray
ray.init()
2023-09-01 12:00:16,471 INFO worker.py:1636 -- Started a local Ray instance.
The first block runs properly, but when I read my dataset, I get this warning and a "server unavailable" error.
If I use `import pandas as pd`, the code runs fine, but `modin.pandas` does not work. My dataset is a ~2 GB CSV file. Why is this happening?
Hi,
Thanks for posting in Intel Communities.
We are able to reproduce the issue in Intel DevCloud for oneAPI. We are checking on this internally.
Thanks
Hi Adarsh2,
When using Modin on DevCloud, the following lines must be run first:
import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024,_driver_object_store_memory=500 * 1024 * 1024)
Can you please give this modification a try and see if the issue is resolved?
That is (updated code):
# first block of jupyter notebook
import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024,_driver_object_store_memory=500 * 1024 * 1024)
# second block of jupyter notebook
import modin.pandas as pd
# third block of jupyter notebook
df = pd.read_csv('dataset/dataset.csv')
df.head()
The Intel oneAPI samples include a helpful getting-started sample for Modin -
Thanks
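For reference, the sizes passed to `ray.init` in the snippet above are byte counts written as MiB multiples: 16000 MiB for `_memory` and 500 MiB each for `object_store_memory` and `_driver_object_store_memory`. Below is a minimal sketch that computes the same values. The helper name `devcloud_ray_kwargs` and its defaults are hypothetical, chosen only to mirror the numbers in this thread. Note also that the underscore-prefixed arguments are private Ray options and may change between releases.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def devcloud_ray_kwargs(memory_mib=16000, store_mib=500):
    """Build the ray.init keyword arguments used in this thread (hypothetical helper)."""
    return {
        "_memory": memory_mib * MIB,
        "object_store_memory": store_mib * MIB,
        "_driver_object_store_memory": store_mib * MIB,
    }


kwargs = devcloud_ray_kwargs()
for name, value in kwargs.items():
    print(f"{name} = {value} bytes")

# In the notebook, before importing modin.pandas (assumes Ray is installed):
# import ray
# ray.shutdown()
# ray.init(**kwargs)
```

Building the arguments separately makes it easy to try smaller limits if the defaults do not fit the account's memory quota.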
Thanks for your reply!
But this code runs forever without finishing.
Could you run the following to check the memory size limit on your DevCloud account:
ulimit -m
To ensure there are no issues in the environment itself, I would also recommend following the environment setup steps listed here: https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted#configure-environment
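If a terminal is not handy, the same limit can be read from inside the notebook. This is a minimal sketch using Python's standard `resource` module (Unix only). `RLIMIT_RSS` is the limit that `ulimit -m` reports, and `RLIMIT_AS` corresponds to `ulimit -v`. The helper `describe_limit` is a hypothetical name introduced for illustration.

```python
import resource


def describe_limit(value):
    """Render an rlimit value in KiB, or 'unlimited' (hypothetical helper)."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"  # getrlimit returns bytes


# Resident set size limit: what `ulimit -m` reports.
soft_rss, hard_rss = resource.getrlimit(resource.RLIMIT_RSS)
# Virtual address space limit: what `ulimit -v` reports.
soft_as, hard_as = resource.getrlimit(resource.RLIMIT_AS)

print("RSS limit (soft / hard):", describe_limit(soft_rss), "/", describe_limit(hard_rss))
print("AS limit  (soft / hard):", describe_limit(soft_as), "/", describe_limit(hard_as))
```

The values apply to the notebook kernel process, which may differ from the limits of an interactive shell on the same account.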
Hi Adarsh2, were you able to give the oneAPI sample a try? If there is no response in the next couple of days, the ticket will be closed due to inactivity.
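I tried an excerpt from the oneAPI samples:
import modin.pandas as pd
import numpy as np
# Generate a 2**18 x 2**8 array of random integers and save it as CSV
array = np.random.randint(low=100, high=100000, size=(2**18, 2**8))
np.savetxt("foo.csv", array, delimiter=",")
# Read the CSV back with Modin and show the first rows
df = pd.read_csv('foo.csv')
print(df.head())
It works on my local machine but not on DevCloud.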
I believe the issue you are seeing stems from a bug in the Ray library. On DevCloud, you need to add the following lines, as I mentioned in an earlier comment:
import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024,_driver_object_store_memory=500 * 1024 * 1024)
After following the configuration and setup instructions at https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted#configure-environment, I was able to run it with no issues.
Please make sure your environment is set up properly and add the Ray API calls shown above.
At least with Ray 2.7, it doesn't seem to work for me.
Can you try downgrading the ray package to 2.6.1:
pip uninstall ray
pip install ray==2.6.1
Please give this a try and re-export the ipykernel to run the notebook.
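After the downgrade, it can help to confirm which Ray version the notebook kernel actually sees, since the kernel may point at a different environment than the shell where `pip` ran. This is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and the target version 2.6.1 is taken from the suggestion above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


ray_version = installed_version("ray")
if ray_version is None:
    print("ray is not installed in this kernel's environment")
elif ray_version == "2.6.1":
    print("ray 2.6.1 is active")
else:
    print(f"ray {ray_version} is active; this thread suggests 2.6.1")
```

Running this in the first notebook cell, before `import ray`, shows whether the re-exported kernel picked up the downgraded package.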
After installing Ray 2.6.1 and using
import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024,_driver_object_store_memory=500 * 1024 * 1024)
I was able to read dataset.csv. @Adarsh2, can you please try it this way?