Concurrent writes in multi-ported memories

Altera_Forum · ‎04-03-2013

Hello,

Do Altera FPGAs have a mechanism to handle concurrent writes? From what I could find in the documentation, if both ports try to write to the same address in the same clock cycle, the value at that location is undefined, and it stays that way until one of the ports can write to the location.

I am trying to design multi-ported memories which can handle write conflicts and wanted to know if Altera already has something on this because I was unable to find information on this.

Thanks.

Altera_Forum · ‎04-03-2013

Use the MegaWizard to instantiate a dual-ported RAM. I recall there are some settings towards the end regarding whether you want to read old or new data. Those parameters get translated into generics. You can read the generics from the altsyncram instance that gets created.

I typically do this to get the generics correct, instantiate the altsyncram directly, and delete the megawizard instance.

Cheers,

Dave

Altera_Forum · ‎04-04-2013

Hi Dave,

I think what you are describing is the read-during-write behavior, right? But I was asking about conflicting writes (what happens when more than one port, say in a 4W/8R memory, tries to write to the same location in the same clock cycle).

There are algorithms for concurrent writes in a "shared memory", but I couldn't find anything for multi-ported memories.

Altera_Forum · ‎04-04-2013

--- Quote Start ---

I think what you are saying is about the read-during-write behavior right?

--- Quote End ---

I guess that must have been what I was recalling :)

--- Quote Start ---

But I was asking about conflicting writes (what happens when more than one port, say in a 4W/8R memory, tries to write to the same location in the same clock cycle).

There are algorithms for concurrent writes in a "shared memory", but I couldn't find anything for multi-ported memories.

--- Quote End ---

An FPGA has at most dual-ported memory. If you wanted to emulate more ports, you could do so by running the memory controller at a higher frequency. If all the ports run at a single frequency, then your controller could perform two writes every clock, except for the case of a write to the same location. In that case, you would just discard the one that would have been over-written. If your ports are all at different clock frequencies, then you will need a FIFO on every port. Those FIFOs would contain the transaction type, transaction address, and for writes, transaction data.
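A minimal behavioral sketch of that controller idea, written in Python rather than HDL. It assumes all ports share one clock, that a later port in the list overwrites an earlier one, and that the physical RAM accepts two writes per fast-clock slot. The function names `schedule_writes` and `apply_slots` are made up for illustration.

```python
def schedule_writes(requests, writes_per_slot=2):
    """Pack one system-clock's worth of write requests into fast-clock slots.

    requests: list of (addr, data) tuples in port order; a later port is
    assumed to overwrite an earlier one (an assumption, not an Altera rule).
    Returns a list of slots, each holding at most `writes_per_slot` writes
    as (port, addr, data) tuples.
    """
    # Keep only the last write to each address. An earlier write to the
    # same address would have been overwritten anyway, so it is discarded.
    last = {}
    for port, (addr, data) in enumerate(requests):
        last[addr] = (port, data)

    survivors = sorted((port, addr, data) for addr, (port, data) in last.items())

    # Two survivors per slot: one for each physical port of the dual-port RAM.
    # Addresses within a slot are guaranteed distinct after the step above.
    return [survivors[i:i + writes_per_slot]
            for i in range(0, len(survivors), writes_per_slot)]


def apply_slots(mem, slots):
    """Commit the scheduled writes to a dict that stands in for the RAM."""
    for slot in slots:
        for _port, addr, data in slot:
            mem[addr] = data
    return mem


if __name__ == "__main__":
    reqs = [(3, 0xAA), (5, 0xBB), (3, 0xCC), (7, 0xDD)]
    print(schedule_writes(reqs))
```

With four requesters and no collisions this needs two fast-clock slots per system clock, i.e. the memory side runs at twice the port frequency.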

You could probably construct a Qsys system with clock domain crossing bridges to a single memory.

Cheers,

Dave

Altera_Forum · ‎04-04-2013

Hi,

If you simultaneously write to the same address of a dual-ported memory from both ports, the output port value is unknown in read/write clock mode.

However, Altera embedded memory documentation (http://www.altera.com/literature/hb/stratix-v/stx5_51003.pdf) doesn't list any restrictions on the contents of the memory (at least I didn't see any in the documentation).

Interestingly enough, Xilinx does have such a restriction described in a Conflict Avoidance section of its embedded memory documentation (http://www.xilinx.com/support/documentation/user_guides/ug383.pdf) on page 15.

I'd assume there has to be such a restriction, unless Altera implemented some sophisticated mechanism to handle this case.

Thanks,

Evgeni

Altera_Forum · ‎04-04-2013

Hi,

Dave... Yeah, most FPGAs have dual-ported memory. But the author of the paper that I have attached has come up with techniques to build memories with more ports for FPGAs, using these dual-ported memories as building blocks. I have done the same. All the ports operate at the same frequency. So now that I can build an nW/nR memory, I was thinking about handling concurrent writes to the same memory location.

Evgeni... I went through the Xilinx "Conflict Avoidance" section for synchronous clocking. It says: when one port performs a write operation, the other port must not write into the same location, unless both ports write identical data. I too did not find any such restrictions in the Altera documentation. I was thinking of implementing some algorithms to handle this situation on Altera FPGAs.

Altera_Forum · ‎04-04-2013

I think adding additional user logic to handle concurrent writes to the same memory location from two ports is going to affect performance. What if both sides are doing writes every clock, and what if that clock is very fast (e.g. 300+ MHz)?

I've done tricks like implementing byte-enables in Xilinx memories by taking advantage of rising and falling clock edges and doing read-modify-write. But I'm unsure how to handle concurrent writes in the general case by simply adding user logic.

Thanks,

Evgeni

Altera_Forum · ‎04-05-2013

--- Quote Start ---

the author of the paper that I have attached has come up with techniques to build memories with more ports for FPGAs, using these dual-ported memories as building blocks. I have done the same. All the ports operate at the same frequency. So now that I can build an nW/nR memory, I was thinking about handling concurrent writes to the same memory location.

--- Quote End ---

Keep in mind that page 43 of that paper has the statement: "we assume that multiple writes to the same address are prevented by the system using the multi-ported memory, and that the result of doing so is undefined"

Why try to handle concurrent writes to the same memory location, when such writes make no sense?

Can you provide a use-case of a system where two writes to the same location should be allowed (for a memory, not an I/O port)? As soon as you make the statement that a write on port A always wins, you have a priority for the multiplexing control.

Just an observation ...

Cheers,

Dave

Altera_Forum · ‎04-17-2013

Hi,

Yeah, I guess we could solve it using 1) priority, 2) letting the ports write only if they are writing the same data, or 3) arbitrarily selecting one of the ports and letting it write. I have read that Parallel Random Access Machines (PRAM) use these techniques to resolve the conflicts. But I don't know if there have been any hardware implementations of PRAMs!

In a multicore environment with shared memory, when multiple threads are running and one thread is accessing a critical section, the other threads are made to wait. Synchronisation mechanisms like atomic instructions, locks, etc. are implemented. I was wondering whether some analogy of that sort would work for a multi-port memory, where there are multiple ports instead of multiple threads accessing the same memory. Am I thinking in the right direction? I want to do a hardware implementation (not software) to resolve the issue.
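The three options above can be sketched as a small Python model. This is only a behavioral illustration, not HDL and not anything from Altera. It assumes the lowest port number has the highest priority, and it models the "same data" rule by suppressing the write when the data differs (in the PRAM model that case is simply disallowed). The function name `resolve` is hypothetical.

```python
import random


def resolve(writes, policy, rng=None):
    """Pick the data to commit for one address in one clock cycle.

    writes: list of (port, data) tuples, all targeting the same address.
    policy: "priority", "common" or "arbitrary".
    Returns the data to write, or None if nothing should be written.
    """
    if not writes:
        return None
    if policy == "priority":
        # Assumption: the lowest-numbered port has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "common":
        # Commit only if every port wants to write identical data.
        values = {data for _port, data in writes}
        return values.pop() if len(values) == 1 else None
    if policy == "arbitrary":
        # Any one of the colliding ports may win.
        rng = rng or random.Random()
        return rng.choice(writes)[1]
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    print(resolve([(2, 7), (0, 9)], "priority"))
    print(resolve([(0, 5), (1, 6)], "common"))
```

In hardware, the "arbitrary" policy would usually collapse into a fixed or rotating choice, since true randomness costs extra logic.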

Thanks...

Altera_Forum · ‎04-17-2013

--- Quote Start ---

In a multicore environment with shared memory, when multiple threads are running and one thread is accessing a critical section, the other threads are made to wait. Synchronisation mechanisms like atomic instructions, locks, etc. are implemented. I was wondering whether some analogy of that sort would work for a multi-port memory, where there are multiple ports instead of multiple threads accessing the same memory. Am I thinking in the right direction? I want to do a hardware implementation (not software) to resolve the issue.

--- Quote End ---

The hardware generally provides "features" for the multiple processors to coordinate their shared access to a resource.

This type of issue comes up with PCI/PCIe device drivers too, e.g., motherboard/host processor and peripheral board processor communications, and in multiprocessor systems (e.g., multiple NIOS II instances). Read the stuff on p7 of this document:

http://www.ovro.caltech.edu/~dwh/correlator/pdf/cobra_driver.pdf

The hardware interlock discussed there is used to implement a Linux device driver on the PCI host side, and a uC/OS-II driver on the peripheral board side. In both cases, the software has to use operating system primitives like semaphores and mutexes, eg., for the interrupt handler to restart a task that deals with communications.

You cannot avoid doing this type of thing when addressing a common resource.

Another typical scenario is communications using a scatter-gather DMA controller. The hardware implements the movement of data based on scatter-gather lists configured by software. When a list entry is passed from the software (e.g., filling or removing data) to the hardware (e.g., here's your empty list entry to re-use), hardware interlocks are used, e.g., the controller is disabled.

Cheers,

Dave

Altera_Forum · ‎04-18-2013

Since the M9K blocks are synchronous, it ought to be possible to add external logic to suppress the write signal from one side.

It is also probable that only the bits that are written differently are undefined.

It may even be true that a 0 always wins (or v.v.) - but I suspect that memory blocks may contain inverted data.

We had some problems because SOPC silently ignored the request for 'old data' on 'read during write'. This was a 'Heisenbug': a rebuild of the FPGA with a minor change (anywhere) caused a different board to fail.

Altera_Forum · ‎04-18-2013

But I am confused over one thing. Won't the software code using synchronisation mechanisms make sure that writes are not issued to the same physical address in the same clock cycle? If it is resolved at the software level itself , why should we bother trying to add hardware features to resolve it?

Altera_Forum · ‎04-18-2013

--- Quote Start ---

But I am confused over one thing. Won't the software / Operating system make sure that writes are not issued to the same physical address in the same clock cycle? If it is resolved at the software level itself , why should we bother trying to add hardware features to resolve it?

--- Quote End ---

Multiple-processors = Multiple-operating systems.

The operating systems on two completely disparate machines, e.g., an x86 running Linux and a DSP running uC/OS-II, do not have any way to stop a simultaneous write to a location that they both have in their address map. The x86 host can post a write to the PCI bus at the same instant the DSP performs a write, and although the bus arbiters in a system will ensure there is no electrical conflict when writing to a device such as SRAM, the arbitration logic will not guarantee the order in which the writes occur.

Bottom line is you need an interlock between the operating systems on the processors and the interlock needs to be provided by the hardware.

Cheers,

Dave

Altera_Forum · ‎04-19-2013

I have two Nios CPUs that can access the same M9K memory.

One CPU accesses it as 'tightly coupled data memory', the other as an Avalon slave (the original plan was to tightly couple it to both CPUs, but being able to directly read it over the PCIe is very useful for debug and post-mortems).

Most of the locations are only written by one of the cpus - but often read by the other.

I use a modification of Dekker's algorithm when two values have to be modified together (modified because one of the cpu can only do a try_lock() action).

But I know I have a lurking bug because there is one location which can potentially be written by both CPUs, although neither does it very often at all. Fixing it is difficult because I can't spin due to real-time constraints.

Altera_Forum · ‎04-19-2013

--- Quote Start ---

But I know I have a lurking bug because there is one location which can potentially be written by both cpu - but neither does it very often at all. Fixing is difficult because I can't spin due to real-time constraints.

--- Quote End ---

In your case (since you have one location, infrequently accessed), would it be simpler to avoid the simultaneous write issue altogether and use altera_avalon_mutex as a special case?

It is a little bit broader topic than the dual port undefined contents being discussed in this thread, but NIOS (Qsys fabric) is missing atomic exchange which would be useful in your case (and others).

It would be relatively minor to get a workable solution to that using a custom instruction and conduits to guarantee the "atomic" aspect. For example, LOCK/UNLOCK primitives to guard your mutex try_lock() code.

Not elegant, but kind of easy to implement.

Altera_Forum · ‎04-21-2013

I have a 4W/8R memory built from dual-port block RAMs as building blocks (using the techniques published by an author). The code is written in Verilog, and in the test bench I specify the 4 write addresses, 4 write data values and the read addresses to check that I get the read data after the correct number of clock cycles. Now, to solve the issue of concurrent writes, I can just put a comparator circuit before the memory which will:

1) compare the addresses being written in to the memory.

2) If the write addresses match, compare the data being written. If all the ports are writing the same data then there is no problem. But if the data is different, then use either priority (assigning a static priority to a port is easy, but I haven't thought of how to assign a dynamic priority) or some other technique to select only 1 port and write that data.

Then I can synthesize this design in Quartus. Is this idea sensible?
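A rough Python model of such a comparator stage with static priority, to make the idea concrete before writing the Verilog. It assumes port 0 has the highest priority and that the stage only gates the write enables; the function name `gate_write_enables` is made up. The data is deliberately not compared: the winning port's data is written whether or not the others match.

```python
def gate_write_enables(addrs, enables):
    """Suppress lower-priority writes that collide with a higher-priority one.

    addrs:   list of write addresses, one per port.
    enables: list of booleans, True if the port wants to write this cycle.
    Port 0 is assumed to have the highest priority (a design assumption).
    Returns the gated write enables to feed to the multi-ported memory.
    """
    gated = list(enables)
    for i in range(len(addrs)):
        for j in range(i):
            # Port i loses if any higher-priority port j is writing
            # to the same address in the same cycle.
            if enables[i] and enables[j] and addrs[i] == addrs[j]:
                gated[i] = False
    return gated


if __name__ == "__main__":
    # Ports 0 and 3 both target address 4; port 2 is idle.
    print(gate_write_enables([4, 9, 4, 4], [True, True, False, True]))
```

For n write ports this needs n(n-1)/2 address comparators, so 6 for a 4W memory.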

Altera_Forum · ‎04-22-2013

--- Quote Start ---

Is this idea sensible?

--- Quote End ---

No. :)

Provide a use-case where all four devices will write, and describe who should win, i.e., which write port should write.

If the priority is static, then address comparators and an if statement is probably sufficient.

If the priority is dynamic, then you would need a 4-bit register to determine "who gets to write next", eg., a shift-register with 1 bit set, that gets shifted once the writer corresponding to that bit gets to write, would be a round-robin scheduler.
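A minimal Python sketch of that round-robin idea, modelling the 4-bit one-hot register as an integer. One detail is an assumption added here: when the port holding the token is not among the colliding writers, the model searches upward (wrapping around) for the next port that is, and then moves the token to the port after the winner. The class name `RoundRobinArbiter` is hypothetical.

```python
class RoundRobinArbiter:
    """One-hot rotating token that decides which colliding port may write."""

    def __init__(self, n_ports=4):
        self.n = n_ports
        self.token = 1  # one-hot register, bit 0 set after reset

    def grant(self, requests):
        """requests: bitmask of ports writing to the same address this cycle.

        Returns the winning port number, or None if no port is requesting.
        """
        if requests == 0:
            return None
        start = self.token.bit_length() - 1  # index of the single set bit
        for offset in range(self.n):
            port = (start + offset) % self.n
            if requests & (1 << port):
                # Shift the token to the port after the winner (with wrap),
                # so the winner has the lowest priority next time.
                self.token = 1 << ((port + 1) % self.n)
                return port
        return None


if __name__ == "__main__":
    arb = RoundRobinArbiter(4)
    print([arb.grant(0b1111) for _ in range(5)])
```

When all four ports keep colliding, the grants cycle through the ports in order, which is the fairness property the shift register is meant to give.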

Although you can conceive of how to do this, it's still not clear *why* you want or need to :)

Cheers,

Dave

Altera_Forum · ‎04-22-2013

--- Quote Start ---

2) If the write addresses match compare the data being written. If all the ports are writing the same data then there is no problem.

--- Quote End ---

I don't think it's necessary to check the data being written unless you intended to separately accumulate statistics or otherwise report the collision. The match vs. not has no impact on how you actually process the pending writes (just always write whatever your chosen priority master is writing, and if it happens to match what everybody else wanted to write, fantastic).

As far as priority goes, static priority based on port is probably simpler to implement and simpler to explain to a 3rd party using your module. This shouldn't be a frequently exercised piece of functionality, so minimizing the resources/time devoted to it is probably a good idea.

Altera_Forum · ‎04-22-2013

--- Quote Start ---

Although you can conceive of how to do this, it's still not clear *why* you want or need to :)

Cheers,

Dave

--- Quote End ---

Hello Dave,

Is your question: will simultaneous writes to the same location be issued in the first place? I had the same question in one of my earlier posts, where I asked whether the software itself can resolve the issue using synchronization techniques, so that there would be no need for a hardware feature to implement this. Consider 3 scenarios:

1) A single NIOS processor [This is my present case, where I am using a Cyclone IV FPGA]

2) A multi-core processor

3) A multi-processor.

Do you mean to say that if a single machine has any one of these processors, there will not be any write conflict? I understood the case of 2 disparate machines with 2 different OSs accessing the multi-ported memory.

One more query: kindly take a look at the attachment. That is the 4w8r memory at a particular depth. So now the chip has a 4w8r memory, and the rest of the M9K block RAMs are available for use. How will the system know when it has to access this 4w8r memory and not the other block RAMs? Are separate instructions needed to specify that the 4w8r memory has to be accessed?

I am so confused by all these questions right now! Any help would let me understand things better.

Thanks.

Altera_Forum · ‎04-22-2013

--- Quote Start ---

So now the chip has a 4w8r memory, and the rest of the M9K block RAMs are available for use. How will the system know when it has to access this 4w8r memory and not the other block RAMs? Are separate instructions needed to specify that the 4w8r memory has to be accessed?

--- Quote End ---

Your (larger) system would need to instantiate and make explicit connections to your new 4w8r module. It is not possible to, for example, extend Quartus synthesis to infer your new memory the same way you can write HDL to infer single/dual-port M9K.

Since you have mentioned NIOS a couple times, you may want to go through the exercise of packaging your new module as a Qsys component.