Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud
1641 Discussions

Copying large dataset into Intel DevCloud Jupyter notebook for analysis

Cagri_T_Intel
Employee
607 Views

 

Hi,

What is the best way to copy ~40 GB of data stored on a local drive to Jupyter notebook (https://notebooks.edge.devcloud.intel.com/)? When I tried to use the GUI file upload option, I get a sync error. Is this because we are not meant to copy such a large dataset or I need to use an alternative method?

I tried cloning a data repository using Intel's github but there seems to be an IP number restriction that prevents me from doing that.

I also tried to use a public github repository but then I hit the 500 MB data storage limit on Github.

I am also wondering if it is at all possible to use Intel DevCloud Jupyter environment to analyze such large datasets. Any thoughts?

Any suggestion will be much appreciated.

Regards,

Cagri

0 Kudos
4 Replies
Hari_B_Intel
Moderator
593 Views

Hi Cagri_T_Intel


There are a few ways to upload files to Intel® DevCloud for the edge.


  1. Click on the upload button in the Jupyter notebook, to upload a large file, my advice is to break down the file into small sizes (preferably less than 500MB each file).
  2. For GitHub/git repository advisable to set your repository to public and make sure does not contain any sensitive information.
  3. Use any downloading tool such as wget, you can upload the file into a small file size and download it using wget if the large file size shows a connection issue.


Hope this information help


Thank You


Cagri_T_Intel
Employee
584 Views

 

Hi Hari,

 

Thank you so much for your suggestions.

 

I have tried them all and following were my observations, which I hope will help you help me with this :

 

1 - Click on the upload button in the Jupyter notebook, to upload a large file, my advice is to break down the file into small sizes (preferably less than 500MB each file).

I tried this. Each file was under 600 kB in size and I have 13,000 such files to upload.  After copying 8804 files (in no particular order), I get the following error.

 

Cagri_T_Intel_1-1657924779095.png

 

I thought of trying to copy all source files again but the problem with that approach is the Jupyter interface asks whether to overwrite every single file one by one (with no option of bulk confirmation !), and therefore this is not practical at all.

 

2 - For GitHub/git repository advisable to set your repository to public and make sure does not contain any sensitive information.

Following is the repository I have created for this file upload on Github. It is specified as "Internal" not "Private" and is under internal repo. (I also tried to create a public Github repo using my personal account but as I described in my first message, even with a paid Pro account, I will not be able to store data larger than 2GB, which is not going to solve my problem)

<Removed Internal Information>

I would expect inter-sandbox to easily talk to anything in Intel DevCLoud but that does not seem to be the case (based on what I have been seeing, there seems to be firewalls between Intel devcloud  and internal repo. That makes no sense to me.

<Removed Internal Information>

 

3 - Use any downloading tool such as wget, you can upload the file into a small file size and download it using wget if the large file size shows a connection issue.

My understanding is I cannot use wget to directly upload from my local hard drive. This command can get data stored on the web as I understand (i.e. I need to specify http links to where the data is). I tried this method on public storage such as Google Drive but the share links did not work for some reason. Do you have an alternative suggestion on cloud storage that will work with wget AND will allow me to store ~40 GB on it?

 

Thank you once again with this Hari and I hope you will be able to help me with further suggestions/guidance on this as I am running out of ideas and will probably end up looking for solutions to this outside Intel DevCloud at all.

 

Regards,

Cagri

 

0 Kudos
Hari_B_Intel
Moderator
554 Views

Hi Cagri_T_Intel


After checking with the developer, it is best to compress the large model into a single file and upload the compressed file. 

If you are still facing any issues, please provide us with how long did it take to fail and a screenshot of the error message.


Hope this information help


Thank you


0 Kudos
Hari_B_Intel
Moderator
538 Views

Hi Cagri_T_Intel


This thread will no longer be monitored since we have provided a solution. Please submit a new question if you need any additional information from Intel.


Thank you


0 Kudos
Reply