[Moved from old forum] GLIBC issues on the devcloud for FPGA

Vile_Lasagna
Novice

TL;DR: fpga_compile nodes on the devCloud have a newer glibc installed than the nodes with actual FPGAs, so code built on the former cannot run on the latter.



I've been trying for a few days to test some code on the FPGAs in the dev cloud but I keep running into blocking issues.

It took me a long time to even get things building (thanks to Susannah from the Discord for helping me troubleshoot environment setup issues on the devcloud) but now that I've finally got my code building, it turns out I can't run anything.

After looking at logs yesterday and double checking things today it seems that... yeah... Unless I'm missing something there's a fundamental issue with the cluster setup.

All of the nodes in the cluster, from what I can tell, run Ubuntu. They're all on Ubuntu 20.04, the fpga_compile nodes included, EXCEPT the nodes with the stratix10 or the arria10 FPGAs, which are on Ubuntu 18.04.

This means that code built on the compile nodes cannot run on the FPGA nodes due to a GLIBC version mismatch, which I'd THINK someone would've run into immediately.

I'm not sure if this is the ideal feedback channel for this. I'd still like to report the issue that prevented me from compiling in the first place (the wrong version of python 2 gets used by default and the scripts in quartus start looking into nonexistent locations for stuff).

For context:
The code I'm building can be found on GitLab. The code in question is under the `raymarched/SYCL/` folder if you want to verify. It's C++ with SYCL that I know works, as I've validated it both with hipSYCL and with oneAPI on the GPUs in the devCloud, so I'm not worried there. Everything is built with CMake, so I'm not calling the compiler on my own or writing makefiles by hand.

And here's just some validation of what I'm going on about. After building, I get this when trying to run:


/home/u121770/repos/toyBrot/redist/bin/rmSYCL-fpga-arria10: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/u121770/repos/toyBrot/redist/bin/rmSYCL-fpga-arria10)
/home/u121770/repos/toyBrot/redist/bin/rmSYCL-fpga-stratix10: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/u121770/repos/toyBrot/redist/bin/rmSYCL-fpga-stratix10)


And double checking with `cat /etc/os-release` and `nm` on libstdc++ confirms that, yes, this error completely makes sense:

# Resources: neednodes=1:fpga_compile:ppn=2,nodes=1:fpga_compile:ppn=2,walltime=06:00:00
########################################################################

NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
<....>
0000000000000000 A GLIBCXX_3.4.24@@GLIBCXX_3.4.24
0000000000000000 A GLIBCXX_3.4.25@@GLIBCXX_3.4.25
0000000000000000 A GLIBCXX_3.4.26@@GLIBCXX_3.4.26
0000000000000000 A GLIBCXX_3.4.27@@GLIBCXX_3.4.27
0000000000000000 A GLIBCXX_3.4.28@@GLIBCXX_3.4.28
0000000000000000 A GLIBCXX_3.4.3@@GLIBCXX_3.4.3



# Resources: neednodes=1:stratix10:ppn=2,nodes=1:stratix10:ppn=2,walltime=06:00:00
########################################################################

NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
<....>
0000000000000000 A GLIBCXX_3.4.24@@GLIBCXX_3.4.24
0000000000000000 A GLIBCXX_3.4.25@@GLIBCXX_3.4.25
0000000000000000 A GLIBCXX_3.4.3@@GLIBCXX_3.4.3




EDIT:

It's been a few days and I came back to this issue hoping I'd be able to build the code correctly using a container on my own dev machine, but that'd apparently cost me 4000 USD because Quartus, so... as far as I can tell, I'm SoL here and the dream is very much dead when it comes to validating the workflow, seeing how the FPGAs perform, etc.

ChrisB_Intel
Moderator

My apologies for the delay, I am getting the FPGA folks involved in this thread as well.


Thanks

Chris


Vile_Lasagna
Novice

Thank you. I just wasn't sure I was even finding the right channels to report this, especially when the old forums looked alive at first but then got archived =P

Vile_Lasagna
Novice

Any updates on this at all?
I'm still hoping I get to run code on the FPGAs at some point

BoonBengT_Intel
Moderator

Hi @Vile_Lasagna,

 

Thank you for posting in the Intel community forum. Hope all is well, and apologies for the delay in responding.
We are going through some coordination activity with the devcloud platform to understand and evaluate the problem and at the same time defining the next steps.


This will take a while and we will keep you posted.
Thank you for your patience.

Best Wishes
BB

Vile_Lasagna
Novice

Hello there

Seems there's some confusion on your side with the forum transition. You just posted a reply on the old forum, which is currently locked, noting how there's been no reply on the original thread for quite a bit (and I cannot reply there anyway since those are read-only). 

I am, however, given this situation considering that thread to be archived and closed which is why I moved the post to this forum (which is the one recommended as active) =P

Vile_Lasagna
Novice

Thanks for the response.

I'm part of the team who'd have to deal with this kind of thing in the clusters at work so I appreciate how this kind of issue can end up being a lot more work than one might expect

I was more wanting to know if I should keep my hopes up or if it had fallen through the cracks =P

(Minor aside: You may have suggested I check out this issue in the very original issue I moved this from in the archived/now-read-only forums. lol)

nielskm
Novice

I'm having the exact same issue. It would be great to get it fixed such that we can actually execute the compiled FPGA executables. Does anyone know what the status is?

nielskm
Novice

A quick update. I compiled my program with static linking of the standard libraries and that is able to run on the FPGA nodes. However, the program is not able to find an FPGA device on the node (I tried submitting with both `fpga` and `fpga_runtime` options). The output from sycl-ls is:
[opencl:0] ACC : Intel(R) FPGA Emulation Platform for OpenCL(TM) 1.2 [2021.13.11.0.23_160000]
[opencl:0] CPU : Intel(R) OpenCL 3.0 [2021.13.11.0.23_160000]
[host:0] HOST: SYCL host platform 1.2 [1.2]

So it seems there is no FPGA on the FPGA nodes?
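One quick way to tell whether a node is exposing real hardware or only the emulator is to filter this listing. A minimal sketch, assuming (this is a guess at the output format, not documented behaviour) that a physical board would show up as an `ACC` line that does not mention "Emulation"; the function name `has_hw_accelerator` is made up for illustration.

```shell
#!/bin/bash
# Reads `sycl-ls` output on stdin. Succeeds (exit 0) only if at least one
# accelerator (ACC) line does not mention "Emulation".
# Assumption: real FPGA boards are listed as ACC devices without that word.
has_hw_accelerator() {
    grep -w 'ACC' | grep -qv 'Emulation'
}
```

It could be used in a job script as `sycl-ls | has_hw_accelerator || echo "no FPGA hardware visible"`, so a job that landed on a node without a visible board stops early instead of silently running on the emulator.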

nielskm
Novice

Okay, I finally believe that I found a workaround (albeit an ugly one).

- First of all, I have to specify the FPGA board when submitting the job (e.g. 'qsub -l nodes=1:fpga:arria10:ppn=2 -d . <job-script>').

- Then, I submitted a batch job to the fpga_compile nodes that copied the /usr/lib/x86_64-linux-gnu/libstdc++.so.6 file to my home directory.

- Finally, I added 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:~' to my run script for the FPGA program.

It seems to work for now.

 

(NB: The static linking did not solve my problem, as I use MKL which must have access to the dynamic standard libraries)
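The three steps can be sketched as a pair of small helpers. This is a variation under stated assumptions, not the exact commands used above: it copies the library into a dedicated directory rather than the whole home directory, and it sets `LD_LIBRARY_PATH` for a single command only. The function names and the `fpga-libs` directory are hypothetical.

```shell
#!/bin/bash
# Copy a library into a private directory (meant to run on an fpga_compile
# node). cp -L follows the libstdc++.so.6 symlink and copies the real file
# under the name the loader will look for.
stage_lib() {
    local src="$1" dest="$2"
    mkdir -p "$dest" && cp -L "$src" "$dest/$(basename "$src")"
}

# Run one command with the private directory searched first (meant to run
# on the FPGA node). The variable is set for that command only.
run_with_lib() {
    local dir="$1"
    shift
    LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

On a compile node this would be `stage_lib /usr/lib/x86_64-linux-gnu/libstdc++.so.6 "$HOME/fpga-libs"`, and on the FPGA node `run_with_lib "$HOME/fpga-libs" ./my_fpga_program` (the program name is a placeholder). Keeping the copy in its own directory avoids the loader picking up anything else that happens to sit in the home directory.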

Vile_Lasagna
Novice

Ohlol. That is some legendary jank XD

But as long as things remain like this there's not much else that one can do, mostly variations on your solution. I might give something like that a go, just to see if I can at least get SOMETHING. Main reason I didn't try for it is that usually when you go down that road there are other base system libraries that end up being part of that little bundle of functionality, so to speak. But given it's been a month and a half with seemingly no real movement on what should be a pretty basic issue, I'm starting to lose faith there is actually a team looking after the cluster, or, at the very least, that Intel cares enough to have the FPGAs on the dev cloud actually working.

Given the whole fiddliness involved in hours-long compilation in special build nodes people not familiar with HPC-type environments might have trouble adapting to, it almost feels like maybe they think providing enough of a toolset so that people can just profile and run on emulated hardware is enough? Which, given how this issue is very much "basic functionality is not there", maybe they are correct, and maybe people in general aren't even trying? We can't be the first people to run into "no software you compile will run", right? That doesn't sound correct?


Anyway, this is full on tired rant by now. Let's keep hoping =P

BoonBengT_Intel
Moderator

Hi @Vile_Lasagna,


Thank you for your patience. As an update, we have tried compiling some SYCL examples on our end and were able to complete them.

Hence we are wondering which library/component used on your end raises the library error. We compared against the code in your SYCL repo but with no luck, so we would like your advice on that.


Also, if possible, we could send you an email with the sample project; if you could point out which library used on your end causes the error, that would be great.

Hope to hear from you soon.


Best Wishes

BB


VileLasagna
Beginner

So... if you read through my message, the problem is not that I cannot compile the code, I did that without too much hassle (got help on the devMesh discord to overcome an unrelated issue).

The problem is that the code that is compiled on the `fpga_compile` nodes cannot run on the nodes with the FPGAs themselves because those have an older version of the OS installed that has an older glibc stack, causing dynamic linking errors (missing symbols for newer versions).

The code doesn't really rely on any external libraries. Other than the SYCL backend (here oneAPI) and the C++ standard library implementation, I'm not using anything at the moment (it has as optional dependencies SDL2 and libPNG but I was compiling without them).

Every time you build any C++, your code gets linked against an implementation of the C++ standard library. This will be, most of the time, libstdc++, which ships with GCC and sits right next to glibc in the base system. This is what I'm referring to here, which is why I made sure to double check with nm prior to posting in the first place.

What this would imply, and it's why I find this situation a bit odd, is that no code ever that gets compiled on the fpga_compile nodes will run on the FPGA nodes, unless those get updated. Because it will get linked against a newer version which will cause runtime errors. A quick and dirty solution would be to compile this code on the FPGA nodes themselves but my understanding is that those do not have the complete software stack for compiling software, particularly because, from a cluster administration point of view, it would be very undesirable to have your FPGAs themselves being locked for hours as people are just compiling stuff.

Edit:

New account, same guy, needed to migrate to a new email address, essentially

VileLasagna
Beginner

Hello, friends! It's me again.

 

It's now been over 4 months since I moved this post from the old forums, which got archived. The thread got acknowledged and I was told the problem was forwarded to the relevant team. In the meantime, largely due to the attention from me trying to get this going, I've joined the Intel oneAPI Innovator program and we'd like for me to do a short presentation on my experiences and findings about oneAPI for different devices, in which this would be key information. But as far as I can tell, the situation is still the same.

Is there something I'm just missing here? If I probe the machines in the devcloud, say, by taking something like:

#!/bin/bash
# Append the OS release and the GLIBCXX version tags exported by the system
# libstdc++ to a per-node file named after the host.

out="$HOSTNAME.glibccheck"
lib=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
rule="************************************************************"

{
    echo ""
    echo "$rule"
    echo ""
    echo "Checking OS version"
    echo ""
    echo "$rule"
    cat /etc/os-release
    echo "$rule"
    echo ""
    echo "Checking glibc symbols"
    echo ""
    echo "$rule"

    # Record the command itself, then its output.
    echo "nm -D -a $lib | grep GLIBCXX_3"
    echo ""
    nm -D -a "$lib" | grep GLIBCXX_3
} >> "$out"

 

and then running that on the relevant nodes:
qsub -l nodes=1:fpga_compile:ppn=2 -d . checkglibc.sh
qsub -l nodes=1:arria10:ppn=2 -d . checkglibc.sh
qsub -l nodes=1:stratix10:ppn=2 -d . checkglibc.sh
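The same three submissions can be wrapped in a loop. A minimal sketch, assuming the `qsub` options shown above; the function name `submit_probe` and the `QSUB` override (handy for previewing the commands with `QSUB=echo`) are additions made for illustration.

```shell
#!/bin/bash
# Submit the same probe script to one node of each requested type.
# Usage: submit_probe <script> <node-property>...
# Set QSUB=echo to print the arguments instead of submitting anything.
submit_probe() {
    script="$1"
    shift
    for prop in "$@"; do
        ${QSUB:-qsub} -l "nodes=1:${prop}:ppn=2" -d . "$script"
    done
}
```

For the case here that would be `submit_probe checkglibc.sh fpga_compile arria10 stratix10`.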

the output I get still shows the same things:

pbsnodes s001-n053 | grep properties
properties = xeon,skl,gold6128,ram192gb,net1gbe,jupyter,batch,fpga_compile

************************************************************

Checking OS version

************************************************************
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
************************************************************

Checking glibc symbols

************************************************************
nm -D -a /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3

0000000000000000 A GLIBCXX_3.4
0000000000000000 A GLIBCXX_3.4.1
0000000000000000 A GLIBCXX_3.4.10
0000000000000000 A GLIBCXX_3.4.11
0000000000000000 A GLIBCXX_3.4.12
0000000000000000 A GLIBCXX_3.4.13
0000000000000000 A GLIBCXX_3.4.14
0000000000000000 A GLIBCXX_3.4.15
0000000000000000 A GLIBCXX_3.4.16
0000000000000000 A GLIBCXX_3.4.17
0000000000000000 A GLIBCXX_3.4.18
0000000000000000 A GLIBCXX_3.4.19
0000000000000000 A GLIBCXX_3.4.2
0000000000000000 A GLIBCXX_3.4.20
0000000000000000 A GLIBCXX_3.4.21
0000000000000000 A GLIBCXX_3.4.22
0000000000000000 A GLIBCXX_3.4.23
0000000000000000 A GLIBCXX_3.4.24
0000000000000000 A GLIBCXX_3.4.25
0000000000000000 A GLIBCXX_3.4.26
0000000000000000 A GLIBCXX_3.4.27
0000000000000000 A GLIBCXX_3.4.28
0000000000000000 A GLIBCXX_3.4.3
0000000000000000 A GLIBCXX_3.4.4
0000000000000000 A GLIBCXX_3.4.5
0000000000000000 A GLIBCXX_3.4.6
0000000000000000 A GLIBCXX_3.4.7
0000000000000000 A GLIBCXX_3.4.8
0000000000000000 A GLIBCXX_3.4.9

 

pbsnodes s001-n081 | grep properties
properties = xeon,skl,gold6128,ram192gb,net1gbe,fpga_runtime,fpga,arria10

************************************************************

Checking OS version

************************************************************
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
************************************************************

Checking glibc symbols

************************************************************
nm -D -a /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3

0000000000000000 A GLIBCXX_3.4
0000000000000000 A GLIBCXX_3.4.1
0000000000000000 A GLIBCXX_3.4.10
0000000000000000 A GLIBCXX_3.4.11
0000000000000000 A GLIBCXX_3.4.12
0000000000000000 A GLIBCXX_3.4.13
0000000000000000 A GLIBCXX_3.4.14
0000000000000000 A GLIBCXX_3.4.15
0000000000000000 A GLIBCXX_3.4.16
0000000000000000 A GLIBCXX_3.4.17
0000000000000000 A GLIBCXX_3.4.18
0000000000000000 A GLIBCXX_3.4.19
0000000000000000 A GLIBCXX_3.4.2
0000000000000000 A GLIBCXX_3.4.20
0000000000000000 A GLIBCXX_3.4.21
0000000000000000 A GLIBCXX_3.4.22
0000000000000000 A GLIBCXX_3.4.23
0000000000000000 A GLIBCXX_3.4.24
0000000000000000 A GLIBCXX_3.4.25
0000000000000000 A GLIBCXX_3.4.3
0000000000000000 A GLIBCXX_3.4.4
0000000000000000 A GLIBCXX_3.4.5
0000000000000000 A GLIBCXX_3.4.6
0000000000000000 A GLIBCXX_3.4.7
0000000000000000 A GLIBCXX_3.4.8
0000000000000000 A GLIBCXX_3.4.9

 

pbsnodes s001-n142 | grep properties
properties = xeon,clx,ram192gb,net1gbe,batch,extended,fpga,stratix10,fpga_runtime

************************************************************

Checking OS version

************************************************************
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
************************************************************

Checking glibc symbols

************************************************************
nm -D -a /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3

0000000000000000 A GLIBCXX_3.4
0000000000000000 A GLIBCXX_3.4.1
0000000000000000 A GLIBCXX_3.4.10
0000000000000000 A GLIBCXX_3.4.11
0000000000000000 A GLIBCXX_3.4.12
0000000000000000 A GLIBCXX_3.4.13
0000000000000000 A GLIBCXX_3.4.14
0000000000000000 A GLIBCXX_3.4.15
0000000000000000 A GLIBCXX_3.4.16
0000000000000000 A GLIBCXX_3.4.17
0000000000000000 A GLIBCXX_3.4.18
0000000000000000 A GLIBCXX_3.4.19
0000000000000000 A GLIBCXX_3.4.2
0000000000000000 A GLIBCXX_3.4.20
0000000000000000 A GLIBCXX_3.4.21
0000000000000000 A GLIBCXX_3.4.22
0000000000000000 A GLIBCXX_3.4.23
0000000000000000 A GLIBCXX_3.4.24
0000000000000000 A GLIBCXX_3.4.25
0000000000000000 A GLIBCXX_3.4.3
0000000000000000 A GLIBCXX_3.4.4
0000000000000000 A GLIBCXX_3.4.5
0000000000000000 A GLIBCXX_3.4.6
0000000000000000 A GLIBCXX_3.4.7
0000000000000000 A GLIBCXX_3.4.8
0000000000000000 A GLIBCXX_3.4.9
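Since the full dumps are long, the comparison can be boiled down to one line per node. A minimal sketch, assuming GNU `sort` (for `-V`) and probe files laid out like the ones above; `summarize_checks` is a hypothetical name.

```shell
#!/bin/bash
# For each probe file, print its OS name and the highest GLIBCXX tag in it.
# Usage: summarize_checks <file>...
summarize_checks() {
    for f in "$@"; do
        # PRETTY_NAME comes from the /etc/os-release section of the file.
        os=$(sed -n 's/^PRETTY_NAME="\(.*\)"$/\1/p' "$f" | head -n 1)
        # sort -V orders 3.4.9 before 3.4.25, unlike a plain text sort.
        top=$(grep -oE 'GLIBCXX_[0-9]+(\.[0-9]+)*' "$f" | sort -V | tail -n 1)
        printf '%s: %s | highest tag %s\n' "$f" "$os" "$top"
    done
}
```

Run as `summarize_checks *.glibccheck`; on the three dumps above it would report GLIBCXX_3.4.28 for the fpga_compile node and GLIBCXX_3.4.25 for both FPGA nodes.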

 

 

I'd REALLY like to wrap this all nice with a bow on top so what do I do? If this is just not going to get fixed, is there any way I can do this instead?

VileLasagna
Beginner

Double post just to note I've found a workaround (and am kind of surprised I hadn't tried that before). I'd edit the post title but since I've had to switch accounts around can't do it

The whole compile toolchain seems to be accessible from the FPGA nodes themselves so, instead of requesting an `fpga_compile` node to build on, one can, say, request an `arria10` node. This then sorts out the glibc issue.

So to me this problem is "fixed" and I can move on. But it means the `fpga_compile` nodes are still not useful for what their name would imply is their function.

Dan_P_Intel
Employee

@Vile_Lasagna 
It is true you can compile on the fpga nodes but we highly discourage it for the sake of all of our users.

Compilation for FPGA typically takes several orders of magnitude more time than execution. At the same time, there are only a handful of FPGAs available. The implication is that fewer people can access the FPGAs if all compilation is directed to them.

For these reasons we introduced the fpga_compile nodes. The fpga_compile nodes are running Ubuntu 20.04 because they're also used for other purposes, such as hosting Jupyter sessions. At the same time, the FPGA add-on doesn't officially support Ubuntu 20.04 yet, which is the reason for the compilation difficulties. The solution for compiling on Ubuntu 20.04 is to make sure Python 2.x is in your path before initiating compilation.
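A minimal sketch of that last step, assuming the build scripts invoke a bare `python` and that a Python 2 interpreter lives in a directory you already know; the thread does not say where Python 2 is installed on the devcloud, so the directory argument and both function names are placeholders.

```shell
#!/bin/bash
# Print the major version of whichever `python` is first in PATH, or
# nothing if there is none. Python 2 prints its version on stderr,
# hence the 2>&1.
python_major() {
    python --version 2>&1 | sed -n 's/^Python \([0-9]*\)\..*/\1/p'
}

# Put directory $1 at the front of PATH, then confirm `python` is 2.x.
# Returns non-zero if the first `python` found is not Python 2.
use_python2_from() {
    PATH="$1:$PATH"
    export PATH
    [ "$(python_major)" = "2" ]
}
```

In a compile job script this could read `use_python2_from /some/python2/bin || exit 1` ahead of the build command, so the job fails fast rather than hours into a compile.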

Regarding the execution of designs compiled on Ubuntu 20.04: I'm not an expert on FPGAs, but I would expect the designs to execute fine, as long as you match the design with the right device, just as @nielskm correctly described above. The OS or package differences between fpga_compile and fpga_runtime should not matter, I believe.

VileLasagna
Beginner

The problem here is not even on the FPGA targeting itself, it's a bit lower.

Whenever you're compiling a C++ program, it needs to be linked against some implementation of the standard library; in this case that's libstdc++.so, which ships with GCC alongside the glibc packages. Now, these shared objects are backward compatible, which is why newer versions still declare the version symbols of previous ones. But because these symbols are versioned, it means that if you link against a newer version of the library, the binary will target symbols which are tagged with the newer version. When the dynamic loader (ld.so) then loads the older version of the library, it will not find these symbols and, thus, be unable to dynamically link the program at runtime.

This is what is happening here. Because the compile nodes have a newer version compared to the runtime nodes, once we try to run, it can't link to the version of the C++ standard library it finds. It looks, for example, for the exported versioning symbols GLIBCXX_3.4.[26-28] and they're not there, so it fails to load the library.
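A minimal sketch of checking this mechanically, assuming you have already dumped the binary's dynamic symbols (for example with `objdump -T`) and the target node's libstdc++ symbols (with `nm -D -a`, as earlier in the thread) into two text files. The function names `glibcxx_tags` and `missing_tags` are made up for illustration.

```shell
#!/bin/bash
# Print the unique GLIBCXX_x.y.z version tags found in text on stdin.
glibcxx_tags() {
    grep -oE 'GLIBCXX_[0-9]+(\.[0-9]+)*' | sort -u
}

# Print the tags that appear in the "required" dump ($1) but not in the
# "provided" dump ($2). Any output points at a version the older library
# does not export.
missing_tags() {
    local provided
    provided=$(mktemp) || return 1
    glibcxx_tags < "$2" > "$provided"
    # -x matches whole lines, so GLIBCXX_3.4.2 does not hide GLIBCXX_3.4.26.
    glibcxx_tags < "$1" | grep -vxFf "$provided"
    rm -f "$provided"
}
```

For example, `objdump -T ./rmSYCL-fpga-arria10 > required.txt` on the build node and `nm -D -a /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > provided.txt` on the FPGA node, followed by `missing_tags required.txt provided.txt`.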

I've also had Python related issues but those were much easier to work around. The one that got me stuck was this one and, still now I don't know if there is a workaround to it that one could do in the devcloud, other than what I've done.

Dan_P_Intel
Employee

@Vile_Lasagna

Let's say that the same GLIBCXX found on fpga_compile nodes was made available on the fpga_runtime nodes also.

Would that solve your problem?

VileLasagna
Beginner

It might, but being such a core part of the system, glibc is a tricky thing to hack. I'd need to double check the full soname of the library being linked, check that it doesn't conflict with the one installed on the target system and that the symlink for the "unversioned" .so doesn't get mangled.

I also think libstdc++.so might depend on other components of glibc, which is one of the things that make it tricky to just package a copy of it with your binary and call it a day. So... yeah... while the answer to this specific workaround is "possibly, yes", I'd advise a lot of caution here. It's real easy to inadvertently overwrite something you're not supposed to and outright kill the system as nothing is able to run.
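A minimal sketch of that double check, assuming binutils is available on the node: `objdump -p` prints a "Dynamic Section" with `SONAME` and `NEEDED` entries, and the hypothetical filter `soname_and_needed` pulls out just those lines.

```shell
#!/bin/bash
# Reads `objdump -p <library>` output on stdin and prints the library's
# SONAME and each NEEDED dependency, one "KIND name" pair per line.
soname_and_needed() {
    awk '$1 == "SONAME" || $1 == "NEEDED" { print $1, $2 }'
}
```

Running `objdump -p /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | soname_and_needed` on both node types and comparing the results would show whether the two copies share a soname and which other libraries each one expects to find.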

BoonBengT_Intel
Moderator

Hi @Vile_Lasagna,


I'm in the middle of aligning internally on the previously mentioned method to figure out its feasibility.

I will get back to you once I have any updates.

Thank you for your patience.


Best Wishes

BB


BoonBengT_Intel
Moderator

Hi @Vile_Lasagna,


Thank you for your patience. After some extensive discussion with the platform team, unfortunately this seems to be a platform constraint around GLIBC compatibility.

The platform is implemented with the presumption that users will employ DPC++ to program the FPGA in the context of heterogeneous programming, which should work fine.


The matter has been raised and will be part of a future evaluation of unifying fpga_compile and fpga_runtime under the same OS. Unfortunately there is no clear timeline for that.

As no further action can be taken, we will be closing this thread. It was a pleasure having you here.

Hope that clarifies things.


Best Wishes

BB


Reply