Re: FPGA Builds Broken on DevCloud

filipeborralho · ‎03-22-2023

I'll have to paste the code here because I get an error when I try to attach my cpp file. The Makefile is attached.

#include <iostream>
#include <vector>
#include <CL/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
#include "dpc_common.hpp"

typedef std::vector<float, cl::sycl::usm_allocator<float, cl::sycl::usm::alloc::shared> > floatVec;

void compute (cl::sycl::queue &q, floatVec &A);

int main (int argc, char **argv) {
    try {
        cl::sycl::ext::intel::fpga_selector s;
        cl::sycl::device d(s);
        cl::sycl::queue q(d, cl::sycl::property::queue::enable_profiling{});
        cl::sycl::usm_allocator<float, cl::sycl::usm::alloc::shared> myFloatAlloc(q);

        floatVec A(3, 0.5f, myFloatAlloc);
        
        std::cout << "Before compute: " << A.at(2) << std::endl;

        compute(q, A);
        
        std::cout << "After compute: " << A.at(2) << std::endl;

    } catch (cl::sycl::exception const &e) {
        std::cout << "Exception caught.\n" << e.what() << std::endl;
    }
    return 0;
}

void compute (cl::sycl::queue &q, floatVec &A) {
    auto acc { A.data() };
    
    cl::sycl::event e {
        q.submit([&](cl::sycl::handler &h) {
            h.single_task([=]() [[intel::kernel_args_restrict, intel::max_global_work_dim(0)]] {
                acc[2] += acc[1] * acc[0];
            });
        })
    };
    q.wait();

    const auto start { e.template get_profiling_info<cl::sycl::info::event_profiling::command_start>() };
    const auto end { e.template get_profiling_info<cl::sycl::info::event_profiling::command_end>() };
    // Profiling timestamps are in nanoseconds, so dividing by 1e6 yields milliseconds.
    std::cout << "Execution time: " << (end - start) / 1e6 << " ms" << std::endl;
}
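
The timing line at the end of compute relies on SYCL profiling timestamps being reported in nanoseconds, so dividing the difference by 1e6 gives milliseconds. A minimal sketch of that conversion in plain C++ follows; the helper name elapsedMs is hypothetical and not part of SYCL or of the code above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a SYCL API): converts two event timestamps,
// given in nanoseconds, into the elapsed time in milliseconds.
double elapsedMs(std::uint64_t startNs, std::uint64_t endNs) {
    assert(endNs >= startNs);  // command_end should not precede command_start
    return static_cast<double>(endNs - startNs) / 1e6;
}
```

For example, timestamps of 1000000 ns and 3500000 ns give an elapsed time of 2.5 ms.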

Best regards,

Filipe

AlekhyaV_Intel · ‎03-27-2023

Hi @filipeborralho ,

Thank you for posting in Intel Communities. We have moved your response to a new case so that we can assist you better. We will work on this issue and get back to you soon. Meanwhile, could you please let us know which nodes you used to compile and run your sample?

Regards,

Alekhya

filipeborralho · ‎03-29-2023

Hi @AlekhyaV_Intel ,

I believe I am having the same issue as in thread https://community.intel.com/t5/Intel-DevCloud/Re-FPGA-Builds-Broken-on-DevCloud/m-p/1470605#M7819 .

I was using fpga_compile nodes for hardware generation, as this was the previous procedure:

fpga_compile nodes for hardware generation

fpga_runtime nodes for running the executables

In the meantime, I will try the solution given in the aforementioned thread.

Best regards,

Filipe

AlekhyaV_Intel · ‎03-30-2023

Hi @filipeborralho ,

Yes, you can follow the steps that I mentioned in the other thread. The fpga_compile nodes are used for FPGA report generation, emulation, and runtime. You must use the Arria 10 or Stratix 10 nodes to build the FPGA sample. Please let us know if this resolves your issue so that we can discontinue monitoring this thread.

Regards,

Alekhya

filipeborralho · ‎03-30-2023

Update:

I kept trying other kernels and there still seems to be some kind of problem.

I attached the terminal output and the Makefile.

This is my code:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <CL/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
#include "dpc_common.hpp"

typedef std::vector<int> intVec;
typedef std::vector<float> floatVec;

constexpr auto name { "nell-2.tns" };
constexpr auto nnz { 76879419 };
constexpr auto dim0 { 12092 };
constexpr auto dim1 { 9184 };
constexpr auto dim2 { 28818 };
constexpr auto colCnt { 8 };

void read3Dtensor (intVec &slcIdx, intVec &slcPtr, intVec &fbrIdx, intVec &fbrPtr, intVec &kIdx, floatVec &values);
void computeTTM (cl::sycl::queue &q, const intVec fbrPtr, const intVec kIdx, const floatVec values, const floatVec matrix, floatVec &output);

int main (int argc, char **argv) {
    try {
        //cl::sycl::ext::intel::fpga_emulator_selector selector;
        cl::sycl::ext::intel::fpga_selector selector;
        cl::sycl::device d(selector);
        cl::sycl::queue q(d, cl::sycl::property::queue::enable_profiling{});
        
        intVec slcIdx, slcPtr, fbrIdx, fbrPtr, kIdx(nnz);
        floatVec values(nnz), matrix(dim2 * colCnt, 0.23f);

        read3Dtensor(slcIdx, slcPtr, fbrIdx, fbrPtr, kIdx, values);
        
        floatVec output(fbrIdx.size() * colCnt);
        
        computeTTM(q, fbrPtr, kIdx, values, matrix, output);
        
    } catch (cl::sycl::exception const &e) {
        std::cout << e.what() << std::endl;
        return 0;
    }
    return 0;
}

void read3Dtensor (intVec &slcIdx, intVec &slcPtr, intVec &fbrIdx, intVec &fbrPtr, intVec &kIdx, floatVec &values) {
    std::ifstream tensorFile(name);
    int i, j;
    for (auto k { 0 }; k < nnz; ++k) {
        tensorFile >> i >> j;
        if (slcIdx.empty() || slcIdx.back() != i) {
            slcIdx.push_back(i);
            slcPtr.push_back(fbrIdx.size());
            fbrIdx.push_back(j);
            fbrPtr.push_back(k);
        } else if (fbrIdx.back() != j) {
            fbrIdx.push_back(j);
            fbrPtr.push_back(k);
        }
        tensorFile >> kIdx[k] >> values[k];
    }
    slcPtr.push_back(fbrIdx.size());
    fbrPtr.push_back(nnz);
    tensorFile.close();
}

void computeTTM (cl::sycl::queue &q, const intVec fbrPtr, const intVec kIdx, const floatVec values, const floatVec matrix, floatVec &output) {
    const auto fbrCnt { fbrPtr.size() - 1 };
    
    cl::sycl::buffer fbrPtrBuffer(fbrPtr);
    cl::sycl::buffer kIdxBuffer(kIdx);
    cl::sycl::buffer valuesBuffer(values);
    cl::sycl::buffer matrixBuffer(matrix);
    cl::sycl::buffer outputBuffer(output);
    
    cl::sycl::event e {
        q.submit([&](cl::sycl::handler &h) {
            cl::sycl::accessor accFbrPtr(fbrPtrBuffer, h, cl::sycl::read_only);
            cl::sycl::accessor accKIdx(kIdxBuffer, h, cl::sycl::read_only);
            cl::sycl::accessor accValues(valuesBuffer, h, cl::sycl::read_only);
            cl::sycl::accessor accMatrix(matrixBuffer, h, cl::sycl::read_only);
            cl::sycl::accessor accOutput(outputBuffer, h, cl::sycl::write_only, cl::sycl::no_init);
            
            h.single_task([=]() [[intel::kernel_args_restrict, intel::max_global_work_dim(0)]] {
                for (auto fbr { 0 }; fbr < fbrCnt; ++fbr) {
                    float tmp[colCnt];
                    
                    #pragma unroll
                    for (auto col { 0 }; col < colCnt; ++col) {
                        tmp[col] = 0.0f;
                    }

                    for (auto ele { accFbrPtr[fbr] }; ele < accFbrPtr[fbr+1]; ++ele) {
                        const auto k { (accKIdx[ele] - 1) * colCnt };
                        const auto val { accValues[ele] };

                        #pragma unroll
                        for (auto col { 0 }; col < colCnt; ++col) {
                            tmp[col] += val * accMatrix[k + col];
                        }
                    }

                    #pragma unroll
                    for (auto col { 0 }; col < colCnt; ++col) {
                        accOutput[fbr * colCnt + col] = tmp[col];
                    }
                }
            });
        })
    };
    q.wait();

    const auto start { e.template get_profiling_info<cl::sycl::info::event_profiling::command_start>() };
    const auto end { e.template get_profiling_info<cl::sycl::info::event_profiling::command_end>() };
    // Profiling timestamps are in nanoseconds, so dividing by 1e6 yields milliseconds.
    std::cout << "Execution time: " << (end - start) / 1e6 << " ms" << std::endl;
}
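
Since the hardware build is what fails here, a host-only reference can help check the kernel's output once a build succeeds. The sketch below uses plain C++ with no SYCL. It mirrors the fiber logic of read3Dtensor and the loop nest of computeTTM, and it assumes the input lines are "i j k value" with 1-based indices sorted by (i, j), as the reader above implicitly does. The names Fibers, readFibers, and referenceTTM are hypothetical and introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the per-fiber data that computeTTM consumes.
struct Fibers {
    std::vector<int> fbrPtr;    // start offset of each fiber, plus a final sentinel
    std::vector<int> kIdx;      // 1-based third-mode index of each nonzero
    std::vector<float> values;  // value of each nonzero
};

// Reads "i j k value" lines; a change in (i, j) starts a new fiber.
Fibers readFibers(std::istream &in) {
    Fibers f;
    int i, j, k, prevI = -1, prevJ = -1;  // -1 never matches a 1-based index
    float v;
    while (in >> i >> j >> k >> v) {
        if (i != prevI || j != prevJ) {
            f.fbrPtr.push_back(static_cast<int>(f.kIdx.size()));
            prevI = i;
            prevJ = j;
        }
        f.kIdx.push_back(k);
        f.values.push_back(v);
    }
    f.fbrPtr.push_back(static_cast<int>(f.kIdx.size()));  // sentinel: total nonzeros
    return f;
}

// Convenience overload for small in-memory test tensors.
Fibers readFibers(const std::string &text) {
    std::istringstream in(text);
    return readFibers(in);
}

// Host reference: output[fbr][col] = sum over the fiber of value * matrix[k-1][col].
// The matrix is row-major with colCnt columns, as in computeTTM.
std::vector<float> referenceTTM(const Fibers &f, const std::vector<float> &matrix,
                                int colCnt) {
    assert(colCnt > 0 && !f.fbrPtr.empty());
    const std::size_t fbrCnt = f.fbrPtr.size() - 1;
    std::vector<float> output(fbrCnt * colCnt, 0.0f);
    for (std::size_t fbr = 0; fbr < fbrCnt; ++fbr) {
        for (int ele = f.fbrPtr[fbr]; ele < f.fbrPtr[fbr + 1]; ++ele) {
            const std::size_t row = static_cast<std::size_t>(f.kIdx[ele] - 1) * colCnt;
            for (int col = 0; col < colCnt; ++col) {
                output[fbr * colCnt + col] += f.values[ele] * matrix[row + col];
            }
        }
    }
    return output;
}
```

With the four nonzeros "1 1 1 2.0", "1 1 3 1.0", "1 2 2 4.0", "2 1 1 0.5" and the 3x2 matrix {1, 2, 3, 4, 5, 6}, this yields fbrPtr = {0, 2, 3, 4} and output = {7, 10, 12, 16, 0.5, 1}. Comparing such a reference against the emulator and hardware results on a small tensor is one way to tell a functional bug from a platform problem.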

Regards,

Filipe

BoonBengT_Intel · ‎04-07-2023

Hi @filipeborralho,

Thank you for the details provided. The error seems to come from the RTL compilation, which is complaining about a port.

May I ask which reference design this code came from?

Also, did the emulation and simulation builds compile successfully?

Hope to hear from you soon.

Best Wishes

BB

filipeborralho · ‎04-10-2023

Hi @BoonBengT_Intel ,

This code is not from a reference design; it is my own. I was working on it before this whole incident with the FPGAs, and it was working fine.

I did try emulation and it worked, but not simulation.

Best regards,

Filipe

BoonBengT_Intel · ‎04-16-2023

Hi @filipeborralho,

Apologies for the delay. Could you explain more about the incident with the FPGAs that you mentioned?

Also, which nodes did you try on DevCloud for the build?

We will try our best to support this by emulating your design.

Hope to hear from you soon.

Best Wishes

BB

filipeborralho · ‎04-17-2023

Hi @BoonBengT_Intel ,

For some reason it is now working. I don't really understand why it wasn't working before, but it is working now.

Thank you for your time.

Best regards,

Filipe

BoonBengT_Intel · ‎04-18-2023

Hi @filipeborralho,

Great! Good to know that it is working now. Apologies for the inconvenience caused; it might have been a platform glitch. If there is no further clarification needed on this thread, it will be transitioned to community support. Please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a feed/response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support.

Thank you for your questions. As always, it is a pleasure having you here.

Best Wishes

BB