oneAPI BERT NLP training times and model size

endomorphosis · ‎01-18-2021

I want to train a natural language model on a large corpus of legal text. My desktop GPU has only 8 GB of memory, which limits the sequence length (number of tokens) that I can use. I have found that other models can be trained with 4k tokens, but they require 48 GB of video RAM and take 2 days to train on an Nvidia Quadro RTX 8000.
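For a rough sense of why sequence length drives memory use, the sketch below estimates only the memory held by the self-attention score matrices, which grows with the square of the token count. The layer count, head count, batch size, and 4-byte (fp32) element size are assumed BERT-base-like values, not figures from any model mentioned in this thread, and the function name is hypothetical. Real training needs considerably more memory for weights, optimizer state, and other activations.

```python
def attention_score_bytes(seq_len, layers=12, heads=12, batch=1, bytes_per_value=4):
    """Bytes needed to keep one seq_len x seq_len score matrix per head and layer.

    All default values are assumptions (BERT-base-like, fp32, batch size 1).
    """
    return batch * layers * heads * seq_len * seq_len * bytes_per_value


for seq_len in (512, 1024, 4096):
    gib = attention_score_bytes(seq_len) / 2**30
    print(f"{seq_len:>5} tokens: {gib:.2f} GiB of attention scores")
```

Under these assumptions, going from 512 to 4096 tokens multiplies this term by 64 at a fixed batch size.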

What I am wondering is whether it is feasible to train these models on oneAPI, and how long it would take compared to a GPU. I understand that different hardware architectures will greatly affect the training time, and I would like to know how this would compare to training on an Nvidia GPU, considering memory access time.

I am also wondering: if I want to make a legal research product for the general public, so people can query my model with legal questions, would it violate the terms of service to connect a virtual machine host to Intel oneAPI and make frequent requests to this service?

AthiraM_Intel · ‎01-19-2021

Hi,

Thanks for reaching out to us.

The Intel® DevCloud is a cluster composed of CPUs, GPUs, and FPGAs, and it is preinstalled with several oneAPI toolkits.

You can develop, test, and run your workloads for free on Intel DevCloud.

Intel DevCloud access is free for 120 days with the possibility of an extension. You can register for DevCloud in the following link:

https://intelsoftwaresites.secure.force.com/DevCloud/?version=lite

Please refer to the link below to learn more about Intel DevCloud.

https://devcloud-docs.readthedocs.io/en/latest/getting_started.html#join-the-intel-ai-academy-to-sign-up-for-devcloud

If you have any further issues, please let us know.

Thanks

AthiraM_Intel · ‎01-22-2021

Hi,

Could you please give us an update?

Thanks.

endomorphosis · ‎01-22-2021

Nothing you said really answered any of my questions, so for now I am limited to using the premium version of Google Colab / Google Drive, because nobody here has yet answered the performance and hosting questions that I asked.

I will try to put it in another format.

1) Does oneAPI allow 24/7 public hosting (even if for a fee)?

2) What are the FPGA / GPU memory and processing speed, e.g. in PyTorch or another benchmark?

3) Extensions are allowed, but how likely is an extension of the 120-day membership?

Note: I am a former cloud computing group employee of Intel, trying to make an artificial intelligence lawyer chatbot.

AthiraM_Intel · ‎01-24-2021

Hi,

Please find the answers for your questions:

1) The answer is no. DevCloud does not allow you to host any application, as it is not a production environment.
3) DevCloud access is free for 120 days. If you need to extend it, upload your project to DevMesh: https://devmesh.intel.com/projects and then return to the DevCloud portal https://devcloud.intel.com/oneapi and use the "Extend" button to submit an extension request with the URL of your DevMesh project. This helps the DevCloud team understand your project and verify that you are using DevCloud for legitimate purposes, either academic or to check app functionality on Intel's latest HW/SW. In that case, your account access would be extended for an additional 90 days, or longer (for example, 180 days) depending on your project's requirements.

To answer your second question, we are escalating this case to our subject matter experts.

Thanks

endomorphosis · ‎01-25-2021

I can understand that there may not be capacity for the infrastructure to handle public hosting, but I wish that the company made more of an effort to generate revenue, since cloud computing is a major source of revenue.

When I was working there, I was under the impression that the FPGA integration first being developed exposed the FPGA as an OpenCL device. One of the FPGA developers was talking about integrating it into the compiler / interpreter so that it could accelerate certain workloads. However, this seems to be the case only with "Data Parallel C++", the language that code currently has to be written in to use the FPGA.

I was unable to find any bfloat16 benchmarks for the Xeon or Xe product lines. However, I did see that the Intel Stratix 10 can do 10 TFLOPS in FP32, which is comparable to the Nvidia Quadro RTX 8000 at 13 TFLOPS in FP32, but I have no idea if that is the FPGA being used. Moreover, it seems that a lot of code would have to be rewritten or recompiled to use the FPGA, because it does not appear that the FPGA is being exposed as an OpenCL device.

I am not certain that everything I am saying is accurate, and I did not get my questions answered, but I hope this is informative to both Intel and developers. My hunch is that Xeon AVX-512 based bfloat16 training will be an order of magnitude slower than an FPGA or a high-end Nvidia GPU, but it has access to the 192 GB of system RAM. In contrast, the Xe appears to have a performance of 2.1 FP32 teraflops, but only 4 GB of LPDDR4X memory and a 128-bit memory bus.

In short, this may be suitable for amateurs who want to pretrain large models, are willing to let their training run for a month, and then want to fine-tune their model on a regular GPU. However, if Intel integrates the FPGAs better, the performance will be similar to a high-end GPU.
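As a back-of-envelope check on the figures in this post, the sketch below scales the 2-day RTX 8000 training time quoted at the start of this thread by the ratio of peak FP32 throughput. It assumes training time is inversely proportional to peak TFLOPS, which ignores memory capacity, memory bandwidth, and software efficiency, so the results are rough at best. The function name is hypothetical, and the TFLOPS values are the unverified ones quoted above.

```python
def scaled_days(baseline_days, baseline_tflops, target_tflops):
    """Naive estimate: training time inversely proportional to peak throughput.

    This is an assumption for illustration, not a measured relationship.
    """
    return baseline_days * baseline_tflops / target_tflops


BASELINE_DAYS = 2.0      # RTX 8000 training time quoted in the first post
RTX_8000_TFLOPS = 13.0   # FP32 figure quoted in this post (unverified)

for name, tflops in (("Stratix 10", 10.0), ("Xe", 2.1)):
    days = scaled_days(BASELINE_DAYS, RTX_8000_TFLOPS, tflops)
    print(f"{name}: about {days:.1f} days")
```

Any real comparison would need to be measured on the actual hardware and model, since peak throughput is rarely reached in training.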

AthiraM_Intel · ‎01-25-2021

Hi,

Thanks for the valuable feedback.

Since your second query is related to the performance of AI frameworks, please post a new thread in the AI frameworks forum:

https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/bd-p/optimized-ai-frameworks

There you will get support from Subject Matter Experts for performance related issues.

Also, if you have any queries related to FPGAs, please post the issue in the FPGA forum:

https://community.intel.com/t5/Intel-High-Level-Design/bd-p/high-level-design

It would be great if you could use the above mentioned dedicated forums to discuss your issues.

Thanks.

endomorphosis · ‎01-25-2021

Please read my previous post, and have it forwarded to a subject matter expert to have it reviewed for accuracy.

This post should be here, as it has to do with neither the details of programming frameworks nor programming FPGAs, but with how the oneAPI DevCloud chooses to present the hardware to the user, and what the benchmarks are for that system environment.

Please note that the one feature that would set Intel apart would be to offer cloud computing with FPGA acceleration that is highly transparent to software developers, in the same sort of way that IBM claims it is going to do with quantum-computer-accelerated cloud computing.

Intel, Apple, Facebook, and IBM are all making custom silicon, and moreover they are software companies. It's time for Intel to start selling software services or get left behind, because it doesn't have access to the end customers, and the people who do are making their own chips instead.

Kent_M_Intel · ‎02-05-2021

Thank you for the suggestion that Intel consider offering a cloud computing service. The cloud computing companies are important Intel customers that we partner with to provide the latest Intel hardware and we recommend that you use them for your needs.

Kent_M_Intel · ‎02-05-2021

Hi, performance comparisons between architectures are highly dependent on the specific deep learning model workload, so unfortunately we are unable to provide specifics. In general, we find most users want to take advantage of the DL Boost instructions in Intel® Xeon CPUs for their inference jobs, since there are fewer concerns about memory size limits and data transfer times.

If you're asking whether you can host an application for use by others on the Intel DevCloud, no, that's not its purpose. The DevCloud is a free developer sandbox for users to try out the developer tools across a range of hardware. To host an application, we recommend you work with one of the cloud service providers such as AWS, Microsoft Azure, or Google Cloud Platform.

Thank you

JyotsnaK_Intel · ‎03-09-2021

Hi,

We have not heard back from you. Please let me know if you need further assistance on this thread.

Thanks!

JyotsnaK_Intel · ‎03-15-2021

Hi,

We have not heard from you in a while. If you need any additional information, please submit a new question, as this thread will no longer be monitored. Thanks for reaching out to Intel Community.