Hello to all forum members!!!
I'd be glad to get your suggestions for solving my problem. I want to make the CRC calculation as fast as possible (without using the MegaWizard function). To that end, I used variables rather than signals. The VHDL code performs roughly 250 XOR operations (5 per byte, and the data telegram consists of 53 bytes). In simulation (ModelSim) I got the right results: the result is ready after one clock! Of course, that is not comparable to the performance of a real FPGA. I ran the Quartus TimeQuest analysis with a 50 MHz oscillator. The result was really bad. The parameters that failed are:
- Report Setup Summary
- Fmax is something like 20 MHz
To increase the Fmax, you need to pipeline the design. 250 XORs is a lot to do in a single clock. And to do that you'll need signals, not variables. Before you design ANYTHING with VHDL, you need to think about the circuit before you write any code.
But if I choose to use signals, it can take something like 250 cycles at least. Also, I have to say that every value depends on the previous one.
What do you mean by "think about the circuit"? Building the circuit myself from logic elements? Do you maybe have a good explanation of pipelining? I tried to find a good topic about it but didn't find anything serious... Do you maybe have a good code example of a pipeline? Tnx for your answer
Why? Use signals and parallelize everything as much as possible. We use CRC32 with signals, giving the result in 1 clk cycle.
Which CRC are you computing?
Sounds a bit like CRC16. You might be able to make use of the fact that you can XOR together two packets and still have a valid CRC (if you correctly allow for the initial value and final inversion). So maybe you can XOR together the CRC value for each bit. While this is now 53*8 XORs, they can be done in parallel, so only 10 deep. If large parts of the packet are fixed, they can be excluded from the dynamic CRC. OTOH, why don't you defer the CRC calculation until the data is serialised?
It's CRC32 for MPEG-TS streams.
And ....
You can do a CRC16 with 5 XORs per byte (and some shifts, which are free in hardware); I doubt the same is true of CRC32.
How many bits do you process in one clock cycle?
I have implemented the CRC32 with 64 bits in one clock cycle and can run it at 156.25 MHz in a Stratix V without any problems. And this is a straightforward implementation without any pipelining. By the way, it is possible to pipeline CRC computations. There are a number of publications that describe how to do this. The one that I currently favor is: Y. Sun, M. S. Kim, "A Table-Based Algorithm for Pipelined CRC Calculation," 2010 IEEE International Conference on Communications (http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5501903&url=http%3a%2f%2fieeexplore.ieee.org%2fxpls%2fabs_all.jsp%3farnumber%3d5501903)
--- Quote Start --- How many bits do you process in one clock cycle? --- Quote End ---
I am trying to compute the CRC over the full packet (8 bits * 53 = 424 bits). I still don't understand how it is possible to do the repeated CRC calculation in 1 cycle with signals when each step depends on the previous value. Every time, I have to wait until the end of the cycle to get the result.
--- Quote Start --- Which CRC are you computing? Sounds a bit like CRC16. --- Quote End ---
I am computing CRC-16 CCITT. It is based on XORing the data with the polynomial: at every step I shift and XOR with the same polynomial. After processing the whole packet I get a 16-bit result. I'm sure that you are familiar with it :)
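A minimal sketch of the shift-and-XOR procedure described above, in C. The thread does not say which CRC-16 CCITT variant is in use, so this assumes the reflected form (polynomial 0x1021 written bit-reversed as 0x8408, initial value 0, no final inversion, often listed as CRC-16/KERMIT). The names `crc16_bitwise_step` and `crc16_bitwise` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: reflected CCITT polynomial, init 0, no final inversion.
   Other CCITT variants differ in bit order, init value and inversion. */
#define CRC16_POLY_REFLECTED 0x8408u

/* Fold one byte into the CRC, one bit per loop iteration, LSB first. */
static uint16_t crc16_bitwise_step(uint16_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 1u)
            crc = (uint16_t)((crc >> 1) ^ CRC16_POLY_REFLECTED);
        else
            crc >>= 1;
    }
    return crc;
}

/* CRC over a whole buffer: each step depends on the previous CRC value. */
static uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint16_t crc = 0x0000u;
    for (size_t i = 0; i < len; i++)
        crc = crc16_bitwise_step(crc, data[i]);
    return crc;
}
```

Under these assumptions, the nine ASCII bytes "123456789" give 0x2189, the usual check value for this variant. The inner loop is the serial dependency the poster is worried about; in hardware the eight iterations collapse into a few XOR levels per byte.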
CRC16 can be reduced to the following C:

    #include <stdint.h>

    /* One byte step; crc holds the 16-bit CRC in its low half. */
    uint32_t
    crc_step(uint32_t crc, uint32_t byte_val)
    {
        uint32_t t = crc ^ (byte_val & 0xff);
        t = (t ^ t << 4) & 0xff;
        return crc >> 8 ^ t << 8 ^ t << 3 ^ t >> 4;
    }

Which can trivially be converted to VHDL:

    -- t1, t2, data: 8 bits wide; crc_in, crc_out: 16 bits wide
    t1 <= crc_in(7 downto 0) xor data(7 downto 0);
    t2 <= t1 xor (t1(3 downto 0) & B"0000");
    -- every operand below is 16 bits wide
    crc_out <= (X"00" & crc_in(15 downto 8)) xor (t2 & X"00")
               xor (B"00000" & t2 & B"000") xor (X"000" & t2(7 downto 4));
Which is 4 levels of XOR. As I said earlier, if you really need to generate the CRC of a 53-byte buffer in parallel every clock (I can't imagine why!) then you probably need to make use of the linearity of CRC calculations. Basically, if you CRC random data, then change a single bit, the difference in the CRC is independent of the original data. So, for a fixed-length packet, you can easily determine which CRC bits each input bit changes and XOR those values for every set bit onto the CRC for an all-zero pattern.
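The linearity argument can be checked in software. The following is only a sketch under assumptions: it restates the byte step from the post above with fixed-width types, and the names `crc_serial`, `crc_linear_table`, `crc_linear_init` and `crc_linear` are invented for this example. It precomputes, for a fixed 53-byte packet, the CRC change caused by each single input bit, then rebuilds the CRC of an arbitrary packet by XORing those constants onto the CRC of the all-zero packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_LEN 53  /* packet length from the thread */

/* Byte step from the post above, restated with fixed-width types. */
static uint16_t crc_step(uint16_t crc, uint8_t byte_val)
{
    uint16_t t = (uint16_t)((crc ^ byte_val) & 0xffu);
    t = (uint16_t)((t ^ (t << 4)) & 0xffu);
    return (uint16_t)((crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4));
}

/* Reference: ordinary byte-by-byte CRC over the fixed-length packet. */
static uint16_t crc_serial(const uint8_t pkt[PKT_LEN], uint16_t init)
{
    uint16_t crc = init;
    for (size_t i = 0; i < PKT_LEN; i++)
        crc = crc_step(crc, pkt[i]);
    return crc;
}

typedef struct {
    uint16_t zero_crc;            /* CRC of the all-zero packet */
    uint16_t delta[PKT_LEN * 8];  /* CRC change caused by each single bit */
} crc_linear_table;

/* Precompute the per-bit constants (fixed wiring in a hardware version). */
static void crc_linear_init(crc_linear_table *tab, uint16_t init)
{
    uint8_t pkt[PKT_LEN];
    assert(tab != NULL);
    memset(pkt, 0, sizeof pkt);
    tab->zero_crc = crc_serial(pkt, init);
    for (size_t bit = 0; bit < PKT_LEN * 8; bit++) {
        pkt[bit / 8] = (uint8_t)(1u << (bit % 8));
        tab->delta[bit] = (uint16_t)(crc_serial(pkt, init) ^ tab->zero_crc);
        pkt[bit / 8] = 0;
    }
}

/* CRC as a pure XOR of constants, one per set input bit. */
static uint16_t crc_linear(const crc_linear_table *tab,
                           const uint8_t pkt[PKT_LEN])
{
    uint16_t crc = tab->zero_crc;
    for (size_t bit = 0; bit < PKT_LEN * 8; bit++)
        if (pkt[bit / 8] & (1u << (bit % 8)))
            crc ^= tab->delta[bit];
    return crc;
}
```

Because the constants in `delta` are known at design time, each CRC output bit in hardware becomes an XOR of a fixed subset of the 424 input bits, which is what allows a wide XOR tree instead of 53 chained steps.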
--- Quote Start --- As I said earlier, if you really need to generate the CRC of a 53 byte buffer in parallel every clock (I can't imaging why!) then you probably need to make use of the linearity of CRC calculations. --- Quote End ---
Thank you for the response :) My goal is to send data packets over RS-485. I send a packet every 10 ms. As I said before, a packet consists of 53 bytes plus 2 bytes of checksum. Previously I used the MegaWizard for the checksum, and it was computed in one clock cycle. I want to replace the checksum with a CRC; that's the reason for the CRC calculation. From what I have read in various posts, a CRC is suitable for such long data packets. Am I right?
53 bytes isn't long!
Just feed in one byte per clock sometime in the 10 ms window.
How many bits do you receive in one clock cycle? I would compute the CRC with this number of bits in your case.
Let's say you receive 8 bits per clock cycle; this would result in much smaller logic and meet your timing target easily.
--- Quote Start --- How many bits do you receive in one clock cycle? I would compute the CRC with this number if bits in your case. --- Quote End ---
The number of bits depends on which side you look at, transmitter or receiver.
* During transmission I get the whole packet in one cycle. One more cycle is needed for the CRC calculation, and then I enable the transmission. After that I wait for the next 10 ms to transmit another packet.
* During reception I get the bytes one by one.
If I split the calculation into steps (one step = one byte), I need something like 53 cycles to get the result. How can I use pipelining for this? Or is there maybe another method to get the right result with a minimum of operations? TNX
There is no point pipelining it. That would only be relevant if you were trying to process a packet every clock (with a 53 clock delay before the CRC was available).
Surely you can do the CRC as part of the tx DMA? Then send the (inverted) CRC register at the end of the normal data buffer.
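A rough software model of "update the CRC as each byte goes out, then append it" might look like the sketch below. It reuses the byte step posted earlier in the thread and makes several assumptions that the thread does not confirm: initial value 0, no final inversion, and the CRC low byte sent first. The names `frame_build` and `frame_check` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_LEN  53              /* payload bytes, from the thread */
#define FRAME_LEN (DATA_LEN + 2)  /* payload plus 16-bit CRC */

/* Byte step from the earlier post, restated with fixed-width types. */
static uint16_t crc_step(uint16_t crc, uint8_t byte_val)
{
    uint16_t t = (uint16_t)((crc ^ byte_val) & 0xffu);
    t = (uint16_t)((t ^ (t << 4)) & 0xffu);
    return (uint16_t)((crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4));
}

/* Transmit side: one CRC step per outgoing byte, CRC appended at the end.
   ASSUMPTION: init 0, no inversion, low byte first. */
static void frame_build(const uint8_t data[DATA_LEN], uint8_t frame[FRAME_LEN])
{
    assert(data != NULL && frame != NULL);
    uint16_t crc = 0x0000u;
    for (size_t i = 0; i < DATA_LEN; i++) {
        frame[i] = data[i];
        crc = crc_step(crc, data[i]);
    }
    frame[DATA_LEN]     = (uint8_t)(crc & 0xffu);
    frame[DATA_LEN + 1] = (uint8_t)(crc >> 8);
}

/* Receive side: run the same step over all 55 bytes as they arrive.
   With the assumptions above, an error-free frame leaves the register at 0. */
static int frame_check(const uint8_t frame[FRAME_LEN])
{
    assert(frame != NULL);
    uint16_t crc = 0x0000u;
    for (size_t i = 0; i < FRAME_LEN; i++)
        crc = crc_step(crc, frame[i]);
    return crc == 0x0000u;
}
```

Each loop iteration corresponds to one clock in which a byte is available, so the CRC costs no extra cycles beyond the transfer itself. If the CRC is sent inverted, as suggested above, the receiver would compare against a fixed nonzero constant instead of zero.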
dsl already posted some code showing how to implement your CRC for 8 bits per clock cycle.
If you need some more inspiration for your implementation, I can recommend reading the user guide for Altera's CRC Compiler MegaCore Function: http://www.altera.com/products/ip/communications/additional_functions_comm/m-alt-crc-compiler.html And/or check Altera's Advanced Synthesis Cookbook, chapter 12; there is even some source code available: http://www.altera.com/literature/manual/stx_cookbook.pdf
Tnx, I'll check the cookbook...
I'm trying not to use the IP cores because they make life easier :)