Re: Intel Server Unavailable after executing this code

Adarsh2 · ‎09-01-2023

I am on Intel DevCloud and using Intel oneAPI. This is my code so far:

# first block of jupyter notebook
import modin.pandas as pd

# second block of jupyter notebook
df = pd.read_csv('dataset/dataset.csv')
df.head()

output -

# output of second block

UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    import ray
    ray.init()

2023-09-01 12:00:16,471 INFO worker.py:1636 -- Started a local Ray instance.

The first block runs properly, but when I read my dataset I get this warning followed by a "server unavailable" error.

If I use `import pandas as pd`, the code runs fine, but `modin.pandas` does not work. My dataset is a CSV file of about 2 GB. Why is this happening?

AthiraM_Intel · ‎09-04-2023

Hi,

Thanks for posting in Intel Communities.

We are able to reproduce the issue in Intel DevCloud for oneAPI. We are checking on this internally.

Thanks

yehudaorel · ‎09-06-2023

Hi Adarsh2,

When using Modin on DevCloud, the following lines must be run before any dataframe operations:

import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024, _driver_object_store_memory=500 * 1024 * 1024)

Can you please give this modification a try and see if the issue is resolved?

That is, the updated code would be:

# first block of jupyter notebook
import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024, _driver_object_store_memory=500 * 1024 * 1024)

# second block of jupyter notebook
import modin.pandas as pd

# third block of jupyter notebook
df = pd.read_csv('dataset/dataset.csv')
df.head()
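
For readers wondering what those byte values amount to, here is a small sketch that spells out the arithmetic. The helper name `ray_init_kwargs` is made up for illustration; the three keyword names are simply the ones used in the `ray.init(...)` call above, and whether a given Ray release still accepts the underscore-prefixed ones is an assumption to check against your installed version.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def ray_init_kwargs(memory_mib=16000, object_store_mib=500):
    """Hypothetical helper: build the keyword arguments from this thread, in bytes."""
    return {
        "_memory": memory_mib * MIB,
        "object_store_memory": object_store_mib * MIB,
        "_driver_object_store_memory": object_store_mib * MIB,
    }


kwargs = ray_init_kwargs()
for name, value in kwargs.items():
    # Show each setting both in raw bytes and in MiB
    print(f"{name} = {value:,} bytes ({value / MIB:,.0f} MiB)")
```

The resulting dictionary could then be passed as `ray.init(**kwargs)` after `ray.shutdown()`, which should be equivalent to the call shown above.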

The Intel oneAPI samples include a helpful getting-started sample for Modin:

https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

Thanks

Adarsh2 · ‎09-07-2023

Thanks for your reply!

But this code is running forever.

yehudaorel · ‎09-11-2023

Could you run the following to check the memory size limit on your DevCloud account:

ulimit -m
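
If it is easier to check from inside the notebook, the same limit can probably be read with Python's standard `resource` module. This is only a sketch for Linux: `RLIMIT_RSS` is the limit that `ulimit -m` reports, and the helper name `describe_limit` is made up for illustration. Note that `ulimit` prints kilobytes while `getrlimit` returns bytes.

```python
import resource


def describe_limit(limit_bytes):
    """Hypothetical helper: format a limit the way `ulimit` would report it."""
    if limit_bytes == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit_bytes // 1024} KiB"


# Soft and hard limits on resident set size (what `ulimit -m` shows)
rss_soft, rss_hard = resource.getrlimit(resource.RLIMIT_RSS)
print("max resident set size:", describe_limit(rss_soft), "/", describe_limit(rss_hard))

# Soft and hard limits on virtual address space (what `ulimit -v` shows)
as_soft, as_hard = resource.getrlimit(resource.RLIMIT_AS)
print("max virtual memory:   ", describe_limit(as_soft), "/", describe_limit(as_hard))
```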

To ensure there are no issues in the environment itself, I would also recommend following the environment setup steps listed here: https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted#configure-environment

yehudaorel · ‎10-02-2023

Hi Adarsh2, were you able to give the oneAPI sample a try? If there is no response in the next couple of days, the ticket will be closed due to inactivity.

Igor_Z_Intel · ‎10-02-2023

I tried an excerpt from the oneAPI samples:

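import modin.pandas as pd
import numpy as np

# Generate a 2**18 x 2**8 array of random integers and write it out as CSV
array = np.random.randint(low=100, high=100000, size=(2**18, 2**8))
np.savetxt("foo.csv", array, delimiter=",")

# Read the file back with Modin and show the first rows
df = pd.read_csv('foo.csv')
print(df.head())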

It works on my local machine but not on DevCloud.
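
For scale, here is a rough estimate of how big that test file is. It assumes `np.savetxt` is left at its default format (`fmt="%.18e"`) and that the array holds 8-byte integers, which is the usual `randint` default on Linux; neither was confirmed in this thread.

```python
ROWS, COLS = 2**18, 2**8

# With fmt="%.18e" every value between 100 and 99999 is written with the same width
field_width = len("%.18e" % 100)

# Each row: COLS fields, COLS - 1 comma delimiters, and one newline
row_bytes = COLS * field_width + (COLS - 1) + 1
csv_bytes = ROWS * row_bytes

# In-memory size of the array, assuming 8 bytes per value
array_bytes = ROWS * COLS * 8

print(f"foo.csv on disk: about {csv_bytes / 1024**3:.2f} GiB")
print(f"array in memory: about {array_bytes / 1024**2:.0f} MiB")
```

Under those assumptions the file is of the same order as the roughly 2 GB dataset from the original post, so it seems like a reasonable stand-in for reproducing the problem.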

yehudaorel · ‎10-03-2023

I believe the issue you are seeing stems from a bug in the Ray library. On DevCloud, you need to add the following lines, as I mentioned in an earlier comment:

import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024, _driver_object_store_memory=500 * 1024 * 1024)

I was able to run it with no issues after following the configuration and setup instructions here: https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted#configure-environment

Please make sure your environment is set up properly and add the Ray API calls shown above.

Igor_Z_Intel · ‎10-05-2023

At least with Ray 2.7, it doesn't seem to work for me.

yehudaorel · ‎10-05-2023

Can you try downgrading the ray package to 2.6.1:

pip uninstall ray
pip install ray==2.6.1

Please give this a try and re-export the ipykernel to run the notebook.
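
To confirm that the notebook kernel actually picked up the downgraded package, something like the following sketch may help. It only uses the standard library; the helper name `installed_version` is made up for illustration.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The interpreter path shows which environment the kernel is really using
print("kernel interpreter:", sys.executable)

ray_version = installed_version("ray")
if ray_version is None:
    print("ray is not installed in this environment")
elif ray_version == "2.6.1":
    print("ray 2.6.1 is active")
else:
    print(f"ray {ray_version} is active; this thread suggests trying 2.6.1")
```

If the reported version is not the one you just installed, the notebook is most likely still running on a different kernel or environment.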

Igor_Z_Intel · ‎10-06-2023

After installing 2.6.1 and using

import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024, _driver_object_store_memory=500 * 1024 * 1024)

I was able to read dataset.csv. @Adarsh2, can you please try it this way?