Can I run one thread in a 16-bit code page and another in a 32-bit code page in the same process?
If so, will the Intel microcode behind the scenes and the Hyper-Threading Technology run my 16-bit code faster or slower than the same code in 32 bits?
Will the 16-bit code suffer from alignment lag when accessing the 3/4 bytes in the upper half of what would be 32-bit alignment?
Finally
Reason: We are prototyping a process for decreasing the run-time complexity or energy usage of an application by using unused bit widths for parallel processing, i.e., if you only use the lower 16 bits, use the darn upper 16 bits to process other data in parallel. Testing on Intel, finalizing on FPGA.
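To make the "use the upper 16 bits" idea concrete, here is a minimal sketch in plain C of packing two independent 16-bit values into one 32-bit word and adding both lanes with ordinary 32-bit arithmetic. The helper names (`pack16x2`, `add16x2`, `lane_hi`, `lane_lo`) are hypothetical and not from the thread. The masking step is an assumption about what such a scheme would need: without it, a carry out of the low lane would corrupt the high lane.

```c
#include <stdint.h>

/* Pack two independent 16-bit values into one 32-bit word. */
static uint32_t pack16x2(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | lo;
}

/* Add both 16-bit lanes at once. Each lane wraps modulo 2^16 and
   no carry leaks from the low lane into the high lane. */
static uint32_t add16x2(uint32_t a, uint32_t b) {
    const uint32_t H = 0x80008000u;       /* top bit of each lane */
    uint32_t sum = (a & ~H) + (b & ~H);   /* add the low 15 bits of each lane */
    return sum ^ ((a ^ b) & H);           /* fold the top bits back in, carry-free */
}

/* Extract the individual lanes again. */
static uint16_t lane_hi(uint32_t w) { return (uint16_t)(w >> 16); }
static uint16_t lane_lo(uint32_t w) { return (uint16_t)(w & 0xFFFFu); }
```

For example, adding `pack16x2(1, 0xFFFF)` and `pack16x2(2, 1)` gives a high lane of 3 and a low lane that wraps to 0. The extra mask and XOR operations are the price of keeping the lanes independent, so whether this beats two separate 16-bit additions would have to be measured.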
>>Testing on Intel finalize on FPGA
Since your intention is to run on an FPGA, I would assume the "CPU" architecture is not an Intel architecture. You will likely be using your own design or one of the usual architectures for FPGAs, such as ARM. This being the case, any benchmarking you perform using wall-clock time on an Intel platform will not be suitable for ascertaining the performance on (in) the FPGA. Instead, your best bet would be to write an emulator of your eventual instruction set, including registers, cache, memory and I/O, and then account for the ticks through each path.
If you roll your own "CPU", it can be any bit width, and in a large FPGA with a simple processor core you can cram 32 or more cores into one device. Also, in an FPGA the processor cores need not all be the same. You can have different widths, functionality (FPU/integer/other), instruction sets, and even a blend of digital and analog computation. So benchmarking threading across 16/32-bit code pages is not productive.
Jim Dempsey
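As a rough illustration of the "write an emulator and account for the ticks" suggestion, here is a minimal sketch in C. Everything in it is hypothetical: the four-instruction 16-bit ISA, the register and memory sizes, and the per-instruction tick costs in `kCost` are made up for the example and would have to be replaced with figures from the actual FPGA design.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ISA; the opcodes are invented for illustration. */
typedef enum { OP_LOADI, OP_ADD, OP_LOAD, OP_HALT } Opcode;

typedef struct { Opcode op; uint8_t rd, rs; uint16_t imm; } Insn;

typedef struct {
    uint16_t reg[4];    /* four 16-bit registers */
    uint16_t mem[16];   /* tiny data memory */
    size_t   pc;        /* index of the next instruction */
    uint64_t ticks;     /* accumulated cycle estimate */
} Cpu;

/* Assumed tick cost per opcode, indexed by Opcode (memory loads cost more). */
static const unsigned kCost[] = { 1, 1, 3, 1 };

/* Execute until OP_HALT or the end of the program; return total ticks. */
static uint64_t run(Cpu *cpu, const Insn *prog, size_t n) {
    while (cpu->pc < n) {
        const Insn *i = &prog[cpu->pc++];
        cpu->ticks += kCost[i->op];
        switch (i->op) {
        case OP_LOADI: cpu->reg[i->rd & 3] = i->imm; break;
        case OP_ADD:
            cpu->reg[i->rd & 3] =
                (uint16_t)(cpu->reg[i->rd & 3] + cpu->reg[i->rs & 3]);
            break;
        case OP_LOAD:  cpu->reg[i->rd & 3] = cpu->mem[i->imm & 15]; break;
        case OP_HALT:  return cpu->ticks;
        }
    }
    return cpu->ticks;
}
```

Running a program through `run` yields a tick count that depends only on the modeled costs, not on how fast the host Intel machine executes the emulator, which is the point of the suggestion above.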
Hi dgunter,
I'm not an expert on this topic. Are you talking about mixing 16-bit code with 32-bit code in the same application / in the same process? I don't think that's possible. A process runs in either 16 or 32 bits. BTW, 16-bit code isn't available on 64-bit operating systems; you have to virtualize.
16-bit applications running on modern 32-bit operating systems run really slowly. I don't think it makes sense to test parallelism in 16 bits... It's weird.
As I always say, just my opinion.
Yes Sir, very weird. I meant "simulating" on the Intel... I was attempting to see if I could increase the simulation speed. Or I would, if the Intel would run mixed 16/32-bit code. I just wanted to see how fast 16-bit code is in relation to 32-bit. Apparently, I can't mix different-width code pages :-( on Intel.
Thank you for your time.
DARN!!!! Thanks, it's what I figured. I hate to run an 8- and 16-bit simulation on a 32-bit CPU; it seems like such a waste of simulation time. There must be a way on the Intel to use the unused bit width to gain a speed advantage from bit-width reduction. Thanks.
Look at the SSEn.m instruction sets (SSE, SSE2, and so on up to SSE4.2). SSE provides single instruction, multiple data (SIMD) operations, whereby you can manipulate multiple like data objects in one instruction.
Each 128-bit register holds up to:
16 bytes
8 shorts (words)
4 dwords
2 qwords
4 floats
2 doubles
Note that the Intel Atom supports SSE. You might want to consider a design built around one or more Atoms, or a hybrid of an Atom plus a smaller FPGA.
Jim Dempsey
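As a small illustration of the "8 shorts in one instruction" case, here is a sketch in C using the SSE2 intrinsics from `<emmintrin.h>`. The function name `add_8x16` is hypothetical. The sketch assumes an x86 compiler with SSE2 enabled (the default on x86-64; 32-bit GCC builds may need `-msse2`).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Add eight 16-bit lanes with one SIMD add (PADDW).
   Each lane wraps independently on overflow. */
static void add_8x16(const int16_t a[8], const int16_t b[8], int16_t out[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned 128-bit load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
}
```

Unlike packing lanes by hand into a 32-bit integer, the hardware keeps the lanes separate, so no masking is needed to stop carries from crossing between them.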
