CRC Calculation - VHDL

Altera_Forum · 11-12-2012

Hello to all forum members!!!

I'd be glad to get your suggestions for solving my problem.

I want to make the CRC calculation as fast as possible (without using the MegaWizard function).

To make the calculation as fast as possible, I used variables rather than signals. The VHDL code performs about 250 XOR operations (5 per byte, and the data telegram is built from 53 bytes). In simulation (ModelSim) I got the right results, and the result is ready within one clock!

Of course, simulation isn't comparable to real FPGA performance.

I ran a Quartus TimeQuest analysis with a 50 MHz oscillator. The result was really bad. The failing parameters are:

Report Setup Summary
Fmax is about 20 MHz

And now it's question time :) How can the setup time be affected if I don't use the clock during the calculation (before adding the CRC it was OK)? As I understand it, setup time is how long the data has to be stable before the clock edge. How can I increase the Fmax of the design? (Before the CRC, Fmax was ~120 MHz.)

Thanks for any suggestions!!!

Y.

Altera_Forum · 11-12-2012

To increase the Fmax, you need to pipeline the design. 250 XORs is a lot to do in a single clock, and to pipeline it you'll need signals, not variables. Before you write ANY VHDL, you need to think about the circuit it describes.

Altera_Forum · 11-12-2012

But if I use signals, it will take something like 250 cycles at least. Also, every next value depends on the previous one.

What do you mean by "think about the circuit"? Should I build the circuit myself from logic elements?

Do you maybe have a good explanation of pipelining? I tried to find a good topic about it but didn't find anything serious... Maybe you have a good code example of a pipeline?

Tnx for your answer

Altera_Forum · 11-12-2012

Why? Use signals and parallelize everything as much as possible. We use a CRC32 built with signals that gives its result in 1 clock cycle.

Altera_Forum · 11-12-2012

Which CRC are you computing?

Sounds a bit like CRC16.

You might be able to make use of the fact that you can XOR together two packets and still have a valid CRC (if you correctly allow for the initial value and final inversion).

So maybe you can XOR together the CRC contributions of the individual bits.

While this is now 53*8 XORs, they can be done in parallel, so the logic is only about 10 levels deep.
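The property being relied on, written out for two messages a and b of the same length n bits. The CRC(0^n) term is the correction for the initial value and final inversion mentioned above; it vanishes when the initial value and the final XOR are both zero. The depth estimate assumes a balanced tree of 2-input XORs over at most 53 × 8 = 424 input bits; wider FPGA LUTs can absorb several of these levels each.

```latex
\mathrm{CRC}(a \oplus b) = \mathrm{CRC}(a) \oplus \mathrm{CRC}(b) \oplus \mathrm{CRC}(0^{n})

\text{depth} \le \lceil \log_2 424 \rceil = 9 \qquad (2^8 = 256 < 424 \le 512 = 2^9)
```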

If large parts of the packet are fixed, they can be excluded from the dynamic crc.

OTOH, why don't you defer the CRC calculation until the data is serialised?

Altera_Forum · 11-12-2012

It's CRC32 for MPEG-TS streams.

Altera_Forum · 11-12-2012

And ....

You can do a CRC16 with 5 XORs per byte (plus some shifts, which are free in hardware); I doubt the same is true of CRC32.

Altera_Forum · 11-12-2012

How many bits do you process in one clock cycle?

I have implemented CRC32 with 64 bits per clock cycle and can run it at 156.25 MHz in a Stratix V without any problems. And this is a straightforward implementation without any pipelining.

By the way, it is possible to pipeline CRC computations. There are a number of publications that describe how to do this. The one I currently favor is:
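A software sketch of what "64 bits per clock" means, assuming the MPEG-TS CRC32 mentioned above is the CRC-32/MPEG-2 variant (polynomial 0x04C11DB7, start value 0xFFFFFFFF, MSB first, no final inversion). The function names are made up for illustration. The 64-bit step gives the same result as eight byte steps; because the CRC is linear, a synthesiser can collapse the 64 unrolled iterations so that each output bit is an XOR of a fixed subset of the 32 state bits and 64 data bits.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32/MPEG-2 (assumed): poly 0x04C11DB7, init 0xFFFFFFFF, MSB first, no final XOR. */
#define CRC32_MPEG2_POLY 0x04C11DB7u

/* Reference: one byte per step, eight shift/XOR iterations each. */
static uint32_t crc32_mpeg2_bytes(uint32_t crc, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        crc ^= (uint32_t)p[i] << 24;          /* bring the byte in at the top */
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ CRC32_MPEG2_POLY : crc << 1;
    }
    return crc;
}

/* Wide step: consume 64 input bits at once, first-transmitted bit in bit 63.
 * In software this is just a longer loop; in hardware the iterations are
 * flattened into one XOR network per output bit. */
static uint32_t crc32_mpeg2_step64(uint32_t crc, uint64_t word)
{
    for (int i = 63; i >= 0; i--) {
        uint32_t fb = (crc >> 31) ^ (uint32_t)((word >> i) & 1u);
        crc <<= 1;
        if (fb)
            crc ^= CRC32_MPEG2_POLY;
    }
    return crc;
}

/* Pack 8 bytes big-endian so that p[0] is processed first. */
static uint64_t pack_be64(const uint8_t *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w = (w << 8) | p[i];
    return w;
}
```

Usage: start from 0xFFFFFFFF and call crc32_mpeg2_step64 once per 8-byte word; the result equals crc32_mpeg2_bytes over the same bytes.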

Y. Sun, M.S. Kim, "A Table-Based Algorithm for Pipelined CRC Calculation," 2010 IEEE International Conference on Communications (http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5501903&url=http%3a%2f%2fieeexplore.ieee.org%2fxpls%2fabs_all.jsp%3farnumber%3d5501903)

Altera_Forum · 11-13-2012

--- Quote Start ---

How many bits do you process in one clock cycle?

--- Quote End ---

I am trying to compute the CRC over the full packet (8 bits * 53 = 424 bits).

I still don't understand how it is possible to do the multiple CRC calculations in 1 cycle with signals when each one depends on the previous value. Every time, I have to wait until the end of the cycle to get the result.

Altera_Forum · 11-13-2012

--- Quote Start ---

Which CRC are you computing?

Sounds a bit like CRC16.

--- Quote End ---

I am computing CRC-16 CCITT.

It is based on XORing the data with the polynomial. At every step I shift and XOR with the same polynomial. After processing the whole packet, I get a 16-bit result. I'm sure you are familiar with it :)
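The shift-and-XOR procedure described above, written as a bit-serial C sketch (the function name is hypothetical). It processes each byte MSB first with the CCITT polynomial 0x1021; the start value is left as a parameter because the variants sold as "CRC-16 CCITT" differ in it. Written out for all 53 bytes in a single clock, this becomes a long chain of dependent steps, which is consistent with the low Fmax reported at the start of the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-serial CRC-16 with polynomial x^16 + x^12 + x^5 + 1 (0x1021), MSB first.
 *   init 0xFFFF -> CRC-16/CCITT-FALSE (also listed as CRC-16/IBM-3740)
 *   init 0x0000 -> CRC-16/XMODEM */
static uint16_t crc16_ccitt_msb(uint16_t crc, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        crc ^= (uint16_t)(p[i] << 8);                    /* next byte enters at the top */
        for (int b = 0; b < 8; b++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);  /* top bit set: shift, XOR polynomial */
            else
                crc = (uint16_t)(crc << 1);              /* otherwise just shift */
        }
    }
    return crc;
}
```

Usage: crc16_ccitt_msb(0xFFFF, packet, 53) gives the CCITT-FALSE value for a 53-byte packet.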

Altera_Forum · 11-13-2012

CRC16 can be reduced to the following C:

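#include <stdint.h>

/* One byte of CRC-16 CCITT in the LSB-first (bit-reversed, 0x8408) form. */
uint32_t
crc_step(uint32_t crc, uint32_t byte_val)
{
    uint32_t t = crc ^ (byte_val & 0xff);        /* only the low 8 bits of t are used below */
    t = (t ^ t << 4) & 0xff;                     /* << binds tighter than ^ */
    return crc >> 8 ^ t << 8 ^ t << 3 ^ t >> 4;  /* fits in 16 bits if crc does */
}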

Which can trivially be converted to VHDL:

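        -- concurrent assignments; types assumed: t1, t2 : std_logic_vector(7 downto 0);
        -- crc_in, crc_out : std_logic_vector(15 downto 0)
        t1 <= crc_in(7 downto 0) xor data(7 downto 0);
        t2 <= t1 xor (t1(3 downto 0) & B"0000");
        -- every term below is 16 bits wide, matching crc_out
        crc_out <= (X"00" & crc_in(15 downto 8)) xor (t2 & X"00")
                xor (B"00000" & t2 & B"000") xor (X"000" & t2(7 downto 4));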

Which is 4 levels of XOR.
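"CRC-16 CCITT" names several variants, so it is worth checking which one this byte step computes. The sketch below (function names hypothetical) compares it, for every 16-bit state and every input byte, against an eight-iteration bit-serial loop that shifts right and XORs the bit-reversed CCITT polynomial 0x8408. They agree, so the step is the LSB-first form; with a zero start value it gives the CRC-16/KERMIT check value 0x2189 for "123456789". An MSB-first (0x1021) implementation gives different values for the same data, so transmitter and receiver must use the same form.

```c
#include <stdint.h>

/* Byte step from the post above, narrowed to 16 bits. */
static uint16_t crc_step(uint16_t crc, uint8_t byte_val)
{
    uint16_t t = (uint16_t)((crc ^ byte_val) & 0xFFu);
    t = (uint16_t)((t ^ (t << 4)) & 0xFFu);
    return (uint16_t)((crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4));
}

/* Bit-serial reference: LSB first, reversed CCITT polynomial 0x8408. */
static uint16_t crc_step_bitwise(uint16_t crc, uint8_t byte_val)
{
    crc ^= byte_val;
    for (int b = 0; b < 8; b++)
        crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0x8408u) : (uint16_t)(crc >> 1);
    return crc;
}

/* Returns 1 if both steps agree for every 16-bit state and every input byte. */
static int steps_agree(void)
{
    for (uint32_t crc = 0; crc <= 0xFFFFu; crc++)
        for (uint32_t d = 0; d <= 0xFFu; d++)
            if (crc_step((uint16_t)crc, (uint8_t)d) !=
                crc_step_bitwise((uint16_t)crc, (uint8_t)d))
                return 0;
    return 1;
}
```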

As I said earlier, if you really need to generate the CRC of a 53-byte buffer in parallel every clock (I can't imagine why!), then you probably need to make use of the linearity of CRC calculations.

Basically, if you CRC random data and then change a single bit, the difference in the CRC is independent of the original data.

So, for a fixed length packet, you can easily determine which CRC bits each input bit changes and xor those values for every set bit onto the CRC for an all-zero pattern.
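A C sketch of this per-bit method for a fixed 53-byte packet, assuming the byte step posted above and a start value of 0xFFFF (both placeholders for whatever variant is actually used; function names are hypothetical). build_table does the offline part: flip each of the 424 input bits on its own and record how the CRC changes. crc_parallel then rebuilds the CRC of any packet from the all-zero CRC plus the contributions of its set bits. In hardware the table is fixed at design time, so each CRC output bit becomes one XOR tree over a fixed subset of the input bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_LEN  53              /* packet length from the thread */
#define PKT_BITS (PKT_LEN * 8)   /* 424 input bits */
#define CRC_INIT 0xFFFFu         /* assumed start value */

/* Byte step posted earlier in the thread (LSB-first CCITT, polynomial 0x8408). */
static uint16_t crc_step(uint16_t crc, uint8_t byte_val)
{
    uint16_t t = (uint16_t)((crc ^ byte_val) & 0xFFu);
    t = (uint16_t)((t ^ (t << 4)) & 0xFFu);
    return (uint16_t)((crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4));
}

/* Sequential reference: one byte after another. */
static uint16_t crc_packet(const uint8_t *p)
{
    uint16_t crc = CRC_INIT;
    for (size_t i = 0; i < PKT_LEN; i++)
        crc = crc_step(crc, p[i]);
    return crc;
}

static uint16_t crc_of_zeros;       /* CRC of the all-zero packet */
static uint16_t contrib[PKT_BITS];  /* CRC change caused by each single input bit */

/* Offline: flip one bit at a time and record how the CRC changes. */
static void build_table(void)
{
    uint8_t pkt[PKT_LEN];
    memset(pkt, 0, sizeof pkt);
    crc_of_zeros = crc_packet(pkt);
    for (size_t bit = 0; bit < PKT_BITS; bit++) {
        pkt[bit / 8] = (uint8_t)(1u << (bit % 8));
        contrib[bit] = (uint16_t)(crc_packet(pkt) ^ crc_of_zeros);
        pkt[bit / 8] = 0;
    }
}

/* "Parallel" CRC: all-zero CRC XOR the contribution of every set input bit. */
static uint16_t crc_parallel(const uint8_t *p)
{
    uint16_t crc = crc_of_zeros;
    for (size_t bit = 0; bit < PKT_BITS; bit++)
        if (p[bit / 8] & (1u << (bit % 8)))
            crc ^= contrib[bit];
    return crc;
}
```

Usage: call build_table once; after that crc_parallel agrees with crc_packet for any 53-byte packet.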

Altera_Forum · 11-13-2012

--- Quote Start ---

As I said earlier, if you really need to generate the CRC of a 53-byte buffer in parallel every clock (I can't imagine why!), then you probably need to make use of the linearity of CRC calculations.

--- Quote End ---

Thank you for response :)

My goal is to send data packets over RS-485. I send a packet every 10 ms. As I said before, a packet is built from 53 bytes plus 2 bytes of checksum. Previously I used the MegaWizard to compute the checksum, and it was done in one clock cycle. Now I want to replace the checksum with a CRC; that's the reason for the CRC calculation. From what I have read in different posts, a CRC is suitable for such long data packets. Am I right?
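The post doesn't say what kind of checksum the MegaWizard design computed. Assuming a plain 16-bit additive sum (sum16 below is a hypothetical stand-in), this sketch shows one kind of error a sum cannot see but the 16-bit CCITT CRC always catches: two adjacent, different bytes swapped. The CRC uses the byte step posted in this thread with an assumed start value of 0xFFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the old checksum: 16-bit sum of all bytes. */
static uint16_t sum16(const uint8_t *p, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint16_t)(s + p[i]);
    return s;
}

/* CRC-16 using the byte step from this thread (LSB-first CCITT), start 0xFFFF. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < n; i++) {
        uint16_t t = (uint16_t)((crc ^ p[i]) & 0xFFu);
        t = (uint16_t)((t ^ (t << 4)) & 0xFFu);
        crc = (uint16_t)((crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4));
    }
    return crc;
}
```

A sum is unchanged by any reordering of the bytes, while the CRC depends on each byte's position.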

Altera_Forum · 11-13-2012

53 bytes isn't long!

Just feed in one byte per clock at some point in the 10 ms window.
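A minimal C model of that suggestion, assuming a 16-bit register with start value 0xFFFF and the byte step posted earlier in the thread; the type and function names are hypothetical. Each call to crc_reg_clock stands for one rising clock edge, so only one byte step sits between the flip-flops. At the 50 MHz clock mentioned at the start of the thread, 53 clocks take 53 × 20 ns = 1.06 µs, a tiny fraction of the 10 ms packet period.

```c
#include <stdint.h>

/* Hypothetical model of a CRC register that absorbs one byte per clock. */
typedef struct {
    uint16_t crc;    /* the 16 flip-flops */
    unsigned count;  /* bytes absorbed since the last clear */
} crc_reg_t;

static void crc_reg_clear(crc_reg_t *r)
{
    r->crc = 0xFFFFu;  /* assumed start value */
    r->count = 0;
}

/* One rising clock edge: if enable is high, absorb one byte. The logic
 * between the flip-flops is a single byte step, not 53 of them in a row. */
static void crc_reg_clock(crc_reg_t *r, int enable, uint8_t byte_val)
{
    if (!enable)
        return;  /* register holds its value */
    uint16_t t = (uint16_t)((r->crc ^ byte_val) & 0xFFu);
    t = (uint16_t)((t ^ (t << 4)) & 0xFFu);
    r->crc = (uint16_t)((r->crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4));
    r->count++;
}
```

Usage: clear at the start of a packet, clock each byte in with enable high; once count reaches 53, crc holds the result (XOR it with 0xFFFF if a final inversion is wanted).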

Altera_Forum · 11-13-2012

How many bits do you receive in one clock cycle? I would compute the CRC with this number of bits in your case.

Let's say you receive 8 bits per clock cycle: this would result in much smaller logic and would meet your timing target easily.

Altera_Forum · 11-15-2012

--- Quote Start ---

How many bits do you receive in one clock cycle? I would compute the CRC with this number of bits in your case.

--- Quote End ---

The number of bits depends on which side you look at: transmitter or receiver.

* During transmission I get the whole packet in one clock cycle. One more cycle for the CRC calculation, and then I enable the transmission. After that I wait the next 10 ms to transmit another packet.

* During reception I get the bytes one by one.

If I split the calculation into steps (step = one byte), I need something like 53 cycles to get the result.

How can I use a pipeline for it?

Or is there maybe another method to get the right result with a minimum of operations?

TNX

Altera_Forum · 11-15-2012

There is no point pipelining it: that would only be relevant if you were trying to process a packet every clock (with a 53-clock delay before the CRC was available).
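For comparison, since pipelining was asked about earlier, a C model of what it would mean here: one register stage per byte, so a new packet can enter on every clock and its CRC leaves the last stage on the 53rd clock. The stage layout, the names, and the 0xFFFF start value are assumptions for illustration. It also shows why this is overkill for one packet every 10 ms: it needs a stage, holding a packet and its partial CRC, for every byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_LEN 53  /* bytes per packet, from the thread */

/* Byte step posted earlier in the thread (LSB-first CCITT, polynomial 0x8408). */
static uint16_t crc_step(uint16_t crc, uint8_t byte_val)
{
    uint16_t t = (uint16_t)((crc ^ byte_val) & 0xFFu);
    t = (uint16_t)((t ^ (t << 4)) & 0xFFu);
    return (uint16_t)((crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4));
}

/* One pipeline register: a packet in flight plus its partial CRC. */
typedef struct {
    int      valid;
    uint8_t  data[PKT_LEN];
    uint16_t crc;
} stage_t;

/* stages[k] holds a packet that has absorbed bytes 0..k. */
static stage_t stages[PKT_LEN];

/* Advance the pipeline by one clock. `in` is a new packet or NULL (bubble).
 * Returns 1 and writes *crc_out when a finished packet is in the last stage.
 * Each stage does exactly one byte step, so the logic per clock stays small. */
static int pipeline_clock(const uint8_t *in, uint16_t *crc_out)
{
    /* Update from the back so each stage reads its predecessor's old value. */
    for (int k = PKT_LEN - 1; k > 0; k--) {
        stages[k] = stages[k - 1];
        if (stages[k].valid)
            stages[k].crc = crc_step(stages[k - 1].crc, stages[k - 1].data[k]);
    }
    stages[0].valid = (in != NULL);
    if (in != NULL) {
        memcpy(stages[0].data, in, PKT_LEN);
        stages[0].crc = crc_step(0xFFFFu, in[0]);
    }
    if (stages[PKT_LEN - 1].valid) {
        *crc_out = stages[PKT_LEN - 1].crc;
        return 1;
    }
    return 0;
}
```

Usage: a packet presented on clock c produces its CRC on clock c + 52, while packets presented on the following clocks are already moving through the earlier stages.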

Surely you can do the CRC as part of the tx dma? Then send the (inverted) CRC register at the end of the normal data buffer.
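A sketch of that approach, assuming the LSB-first byte step from earlier in the thread, a start value of 0xFFFF and a final inversion (the same FCS-16 convention PPP uses in RFC 1662). build_frame and frame_ok are hypothetical names. The inverted register is sent low byte first; the receiver runs the same register over data and CRC bytes alike and compares against the fixed constant 0xF0B8, so it never has to separate the CRC bytes from the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte step posted earlier in the thread (LSB-first CCITT, polynomial 0x8408). */
static uint16_t crc_step(uint16_t crc, uint8_t byte_val)
{
    uint16_t t = (uint16_t)((crc ^ byte_val) & 0xFFu);
    t = (uint16_t)((t ^ (t << 4)) & 0xFFu);
    return (uint16_t)((crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4));
}

/* Transmit side: run the CRC while copying the payload out, then append the
 * inverted register, low byte first. Returns the total frame length. */
static size_t build_frame(const uint8_t *payload, size_t n, uint8_t *frame)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < n; i++) {
        frame[i] = payload[i];
        crc = crc_step(crc, payload[i]);
    }
    crc ^= 0xFFFFu;                        /* final inversion */
    frame[n]     = (uint8_t)(crc & 0xFFu); /* low byte first */
    frame[n + 1] = (uint8_t)(crc >> 8);
    return n + 2;
}

/* Receive side: the same register over payload and CRC bytes as they arrive.
 * An undamaged frame always leaves the constant 0xF0B8. */
static int frame_ok(const uint8_t *frame, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc_step(crc, frame[i]);
    return crc == 0xF0B8u;
}
```

Usage: for the 53-byte packets in this thread, build_frame produces the 55-byte frame (53 data bytes plus 2 CRC bytes) and frame_ok checks it on reception.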

Altera_Forum · 11-15-2012

dsl already posted some code showing how to implement your CRC for 8 bits per clock cycle.

If you need more inspiration for your implementation, I recommend reading the user guide for Altera's CRC Compiler MegaCore Function:

http://www.altera.com/products/ip/communications/additional_functions_comm/m-alt-crc-compiler.html

And/or check Altera's Advanced Synthesis Cookbook; chapter 12 even has some source code available:

http://www.altera.com/literature/manual/stx_cookbook.pdf

Altera_Forum · 11-18-2012

Tnx, I'll check the cookbook...

I'm trying not to use the IP cores, because they make life easier :)