AES-128 Sbox - Rom Issue

Altera_Forum

Hello,

I am a student, and for my final-year project I am investigating the hardware implementation of AES-128 in an Altera FPGA using SystemVerilog.

The AES algorithm has two mandatory lookup tables, known as the SBox and Inverse-SBox, which each hold 256 x 8-bit values.

Currently I am using two separate arrays of bytes to store these values. However, I want to look at the possibility of using some of the on-FPGA memory bits, probably in the form of a ROM. I have very limited knowledge of this and am looking for a push in the right direction.

My concern is that ROMs have a one-clock-cycle delay, which will obviously slow down my overall design, although it will make a good discussion point for my project. Is there a way to implement a ROM in the memory bits without this clock cycle delay, i.e. some kind of asynchronous ROM?

Any replies would be greatly welcome :)
8 Replies
Altera_Forum

You can refer to Altera's cookbook. I don't think you can infer an asynchronous ROM in the memory blocks. I don't understand what you mean by "slow design". If you mean latency, note that proper pipelining usually lets the design run at a higher clock frequency, so the extra cycle need not make the overall design slower.

Altera_Forum

The M*K blocks (M4K, M9K, etc.) in Altera's FPGAs can only implement synchronous RAMs/ROMs, i.e. they have the one-cycle delay.

Asynchronous ROMs need to be implemented in LUTs.
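
To make the difference concrete, here is a minimal sketch of the two read styles. The module names and the init file "sbox.hex" (256 lines, one hex byte each) are assumptions for illustration, not something from this thread, and whether the second module actually lands in a memory block depends on the device and synthesis settings.

```systemverilog
// Combinational (asynchronous) read: no clock involved, so the table
// is expected to be built from LUTs rather than an M*K block.
module sbox_async (
    input  logic [7:0] addr,
    output logic [7:0] data
);
    logic [7:0] rom [0:255];
    initial $readmemh("sbox.hex", rom);  // hypothetical contents file

    assign data = rom[addr];
endmodule

// Registered (synchronous) read: data appears one clock edge after
// addr is presented. This is the read style a memory block supports.
module sbox_sync (
    input  logic       clk,
    input  logic [7:0] addr,
    output logic [7:0] data
);
    logic [7:0] rom [0:255];
    initial $readmemh("sbox.hex", rom);  // hypothetical contents file

    always_ff @(posedge clk)
        data <= rom[addr];
endmodule
```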
Altera_Forum

Thank you both for your replies, very much appreciated.

My main issue with using a synchronous ROM is as follows.

My design uses an FSM with 13 states, and on each clock cycle the state advances (generally speaking, one round of encryption/decryption is performed in each state). When using an array, and thus LUTs, the statement

    for (shortint c = 0; c < 4; c++) begin
        State[r][c] = Sbox[State[r][c] >> 4][State[r][c] & 8'h0f];
    end

carries out all 4 substitutions within a single clock cycle. With a ROM, however, that piece by itself would take 4 clock cycles.

How would I synchronise my current FSM when using a ROM? Would I have to advance the state every X clock cycles instead of every cycle?
Altera_Forum

The solution is simply to use 4 synchronous ROMs, one for each "c". Or actually 2, since each M*K block has two independent read ports.

So, as far as I can see, you can use M*K blocks for your problem without a performance penalty.

To use them, you basically have two options.

One, you can use a ROM function such as LPM_ROM or ALT_ROM.
LPM_ROM is portable; ALT_ROM has more features, such as 2 ports.
It will require some changes to your code, and you need to write a .HEX/.MIF file for the ROM's contents.

The other solution, which I prefer, is to infer the ROM from your code.
This will require you to change your code to follow a ROM template that Quartus can recognize, which may involve a bit of trial and error.
Take a look at the HDL coding guidelines to see which templates are supported, and start from a simple case.

Also, Quartus may decide that, despite your efforts, your ROM is best implemented in LUTs.
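
As a rough illustration of the inference route, the sketch below follows the general shape of a dual-port synchronous ROM description and uses two such ROMs to substitute the four bytes of one row in parallel. The module names, port names, and the init file "sbox.hex" are assumptions made for this example. The templates in the HDL coding guidelines may differ in detail, so treat this as a starting point rather than a guaranteed match.

```systemverilog
// Dual-port synchronous ROM: two independent registered reads of the
// same 256 x 8-bit table.
module sbox_rom_2port (
    input  logic       clk,
    input  logic [7:0] addr_a, addr_b,
    output logic [7:0] q_a, q_b
);
    logic [7:0] rom [0:255];
    initial $readmemh("sbox.hex", rom);  // hypothetical contents file

    always_ff @(posedge clk) begin
        q_a <= rom[addr_a];
        q_b <= rom[addr_b];
    end
endmodule

// Two dual-port ROMs cover the four columns of one state row, so all
// four substituted bytes are available one clock after row_in.
module sub_bytes_row (
    input  logic       clk,
    input  logic [7:0] row_in  [4],
    output logic [7:0] row_out [4]
);
    sbox_rom_2port rom01 (
        .clk(clk),
        .addr_a(row_in[0]), .addr_b(row_in[1]),
        .q_a(row_out[0]),   .q_b(row_out[1])
    );

    sbox_rom_2port rom23 (
        .clk(clk),
        .addr_a(row_in[2]), .addr_b(row_in[3]),
        .q_a(row_out[2]),   .q_b(row_out[3])
    );
endmodule
```

After compiling, the fitter report should show whether the tables ended up in memory blocks or in LUTs.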
Altera_Forum

Great reply, thanks!

However, even if I were to use 4 x ROM (or 2 x dual-port), I don't understand how there can be no performance penalty, as you still need to wait a full clock cycle to get a result?
Altera_Forum

You need to be able to generate the ROM's read address in the clock cycle before you need the data.

I was looking at your code snippet and it looked possible. But I may be wrong here.
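
One way to picture this, as a sketch only: address the ROM with the value that is about to be written into the state register, rather than with the register's output. The ROM's own register then takes the place of the extra cycle. The names below (sbox_lookahead, byte_next, "sbox.hex") are hypothetical and not taken from the original design.

```systemverilog
module sbox_lookahead (
    input  logic       clk,
    input  logic [7:0] byte_next,  // value the state byte takes at the next clock edge
    output logic [7:0] byte_q,     // registered state byte
    output logic [7:0] sbox_q      // substituted byte, registered inside the ROM
);
    logic [7:0] rom [0:255];
    initial $readmemh("sbox.hex", rom);  // hypothetical contents file

    always_ff @(posedge clk) begin
        byte_q <= byte_next;        // ordinary state register
        sbox_q <= rom[byte_next];   // ROM read using the same pre-register value
    end
endmodule
```

After every clock edge sbox_q holds the table entry for the current byte_q, so the substituted value is available in the same cycle as the state byte itself. This only works if byte_next can be computed combinationally in the preceding cycle.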
Altera_Forum

OK great - thank you very much for your help here :)

Altera_Forum

Why don't you use a single ROM and change the state of your state machine every 4 cycles?
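
A minimal sketch of that idea, assuming a hypothetical start/done handshake and the same placeholder "sbox.hex" file: a 2-bit counter steps through the four bytes of a row and a single synchronous ROM substitutes them one at a time. Because of the registered read and the registered output, this version raises done five clock edges after start is sampled, rather than exactly four.

```systemverilog
module sub_bytes_serial (
    input  logic       clk,
    input  logic       start,        // pulse high for one cycle to begin
    input  logic [7:0] row_in  [4],  // must be held stable until done
    output logic [7:0] row_out [4],
    output logic       done          // high for one cycle when row_out is complete
);
    logic [7:0] rom [0:255];
    initial $readmemh("sbox.hex", rom);  // hypothetical contents file

    logic [1:0] idx     = '0;    // byte whose address is presented to the ROM
    logic [1:0] idx_q   = '0;    // byte whose data is currently in rom_q
    logic       busy    = 1'b0;  // counter is running
    logic       valid_q = 1'b0;  // rom_q holds a wanted result
    logic [7:0] rom_q;

    always_ff @(posedge clk) begin
        // Counter: runs 0..3 after start, then stops.
        if (start) begin
            idx  <= 2'd0;
            busy <= 1'b1;
        end else if (busy) begin
            idx <= idx + 2'd1;
            if (idx == 2'd3)
                busy <= 1'b0;
        end

        // Single synchronous ROM read, with index and valid flag
        // delayed by one cycle to line up with the data.
        rom_q   <= rom[row_in[idx]];
        idx_q   <= idx;
        valid_q <= busy;

        // Store each result in its slot as it arrives.
        if (valid_q)
            row_out[idx_q] <= rom_q;

        done <= valid_q && (idx_q == 2'd3);
    end
endmodule
```

The round FSM would then advance on done instead of on every clock, trading throughput for a single memory block.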
