Re: Linux with MMU on NEEK - Page 4

Altera_Forum · ‎09-30-2009

Hi, all.

I'm testing the MMU version of Linux on my NEEK.

http://www.nioswiki.com/linux

It works fine and I can use the "bash" shell. This is clear proof that we are using a true 'fork' instead of 'vfork'.

This may depend on the version, but the TSE driver reports an error and doesn't work on this design. The error is


ERROR: altera_tse.c:1666: request_mem_region() failed

I think that this error is caused by a misunderstanding of how request_mem_region() is used. Inside request_mem_region(), the function __request_region() is called, which in turn calls __request_resource(). If the resource has already been registered, __request_resource() returns a non-NULL value, a pointer to the conflicting resource, and, when that resource is marked busy, request_mem_region() returns NULL. The resource 'sgdma_rx_base' is already registered during initialization, so a conflict is reported and the condition


    if (!request_mem_region(sgdma_rx_base, sgdma_rx_size, "altera_tse")) {

is always true. So I made a dirty patch,


    if (!request_mem_region(sgdma_rx_base, sgdma_rx_size, "altera_tse")) {
        reg_resource = __request_region(&iomem_resource, sgdma_rx_base, sgdma_rx_size, "altera_tse", 0);
        if (reg_resource != NULL && reg_resource->flags & IORESOURCE_BUSY) {
            printk(KERN_ERR "ERROR: %s:%d: request_mem_region() failed\n", __FILE__, __LINE__);
            ret = -EBUSY;
            goto out_sgdma_rx;
        }
    }
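A toy user-space model may help show the failure mode. This is only a sketch and not kernel code: `toy_request_region()` and the `toy_table` registry are hypothetical names, and the addresses used with it are made-up values. It illustrates that once a range is registered, a second request for the same range comes back as NULL, which is what the driver treats as a failure.

```c
#include <stddef.h>

/* Toy model of a busy-region registry (hypothetical, user space only). */
struct toy_region {
    unsigned long start;
    unsigned long end;      /* inclusive */
    const char *name;
};

#define TOY_MAX_REGIONS 8
static struct toy_region toy_table[TOY_MAX_REGIONS];
static size_t toy_count;

/* Returns the new entry, or NULL if the range overlaps a registered one
 * (or the table is full). */
static struct toy_region *toy_request_region(unsigned long start,
                                             unsigned long size,
                                             const char *name)
{
    unsigned long end = start + size - 1;
    size_t i;

    if (size == 0 || toy_count == TOY_MAX_REGIONS)
        return NULL;
    for (i = 0; i < toy_count; i++) {
        if (start <= toy_table[i].end && end >= toy_table[i].start)
            return NULL;    /* conflict: range is already claimed */
    }
    toy_table[toy_count].start = start;
    toy_table[toy_count].end = end;
    toy_table[toy_count].name = name;
    return &toy_table[toy_count++];
}
```

In this model, a driver that requests a range already claimed during initialization always sees NULL, regardless of whether the hardware is actually in use.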

Moreover, the author seems to have forgotten that the DMA engine works with physical addresses, so we need to set the pointers in the descriptors like


//    desc->source = read_addr;
    desc->source = virt_to_phys(read_addr);
//    desc->destination = write_addr;
    desc->destination = virt_to_phys(write_addr);
//    desc->next = (unsigned int *)next;
    desc->next = (unsigned int *)((unsigned long)next & 0x1fffffffUL);

and so on.
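As an illustration of the address conversion, the sketch below models it in user space. It assumes, based only on the 0x1fffffff mask used in the patch above, that a kernel virtual address maps to its physical address by clearing the top three bits. `toy_virt_to_phys()` is a hypothetical stand-in, not the kernel's `virt_to_phys()`.

```c
#include <stdint.h>

/* Assumption: the low 29 bits of a kernel virtual address are the
 * physical address, as the 0x1fffffff mask in the patch suggests. */
#define TOY_PHYS_MASK 0x1fffffffu

/* Hypothetical user-space stand-in for the kernel's virt_to_phys(). */
static uint32_t toy_virt_to_phys(uint32_t vaddr)
{
    return vaddr & TOY_PHYS_MASK;
}
```

Under this assumption, a descriptor field that holds 0xc0001000 would be seen by the DMA engine as a bogus bus address, while the converted value 0x00001000 points at the intended memory.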

Also, the frame buffer fb0 will not work properly, because the driver 'altfb.c' has not been adapted to the MMU version of Linux. So I added some code for altfb_mmap(), like


/* We implement our own mmap to set MAY_SHARE and add the correct size */
static int altfb_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
    unsigned long phys_addr, phys_size;
    unsigned long addr;
    unsigned long size = vma->vm_end - vma->vm_start;
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
//    vma->vm_flags |= VM_MAYSHARE | VM_SHARED;
//    vma->vm_start = info->screen_base;
//    vma->vm_end = vma->vm_start + info->fix.smem_len;
    /* check range */
    if (vma->vm_pgoff > (~0UL >> PAGE_SHIFT))
        return -EINVAL;
    if (offset + size > altfb_fix.smem_len)
        return -EINVAL;
    vma->vm_flags |= VM_IO | VM_RESERVED;
    addr = vma->vm_start;
    phys_addr = altfb_fix.smem_start + offset;
    if ((offset + size) < altfb_fix.smem_len)
        phys_size = size;
    else
        phys_size = altfb_fix.smem_len - offset;
    vma->vm_page_prot = __pgprot(_PAGE_PRESENT|_PAGE_READ|_PAGE_WRITE);
    if (remap_pfn_range(vma, addr, phys_addr >> PAGE_SHIFT, phys_size, vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}

and rewrote the DMA descriptors like


            desc->next = (void *)virt_to_phys((desc + 1));

So now I can start telnetd and control the NEEK over Ethernet, and use Nano-X on the MMU version of Linux, but I can't open an ftp session, because the 'getservbyname()' function does not work properly.

I don't know which directory contains the source of 'getservbyname()'. Would anyone please tell me where it is?

Thank you, in advance.

Altera_Forum · ‎01-14-2010

Hi,

Thank you, Michael.

--- Quote Start ---

I asked Frak Storm at our Altera FPGA dealer. He said that Altera's reference designs do have 2*32K cache and that Altera recommends not changing the cache configuration. So the 4K limit does not seem to exist. Nonetheless I asked him to double-check the NIOS MMU/Cache "hardware" regarding this issue.

--- Quote End ---

The easiest way to check is to configure caches larger than 4 KB and run the MMU version of Linux on that design. But I'm sorry, I don't have enough time to do it.

Kazu

Altera_Forum · ‎01-14-2010

Hi,

Now I'm trying to add relocation code to the Nios dynamic linker. First, I must make the 'static' linker 'nios2-wrs-linux-gnu-ld' (or 'ld') emit the relocation information. So I added some code to the two functions 'nios2_elf32_relocate_section' and 'nios2_elf32_check_relocs' in the file '/nios2gcc4/src/binutils-2.17.50/bfd/elf32-nios2.c'. For details, please refer to the attached file.

With this modified 'static' linker, we can compile the samples main.c, a.c and b.c as follows


nios2-wrs-linux-gnu-gcc -g -shared -Wl,-Bsymbolic -G0 a.c -o a.so
nios2-wrs-linux-gnu-gcc -g -shared -Wl,-Bsymbolic -G0 b.c -o b.so
nios2-wrs-linux-gnu-gcc -g  main.c a.so b.so -o main

The relocation information of the shared libraries is as follows.

In 'a.readelf'


Relocation section '.rela.dyn' at offset 0x308 contains 18 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00001834  00000027 R_NIOS2_RELATIVE                             00001748
00001838  00000027 R_NIOS2_RELATIVE                             00001864
0000183c  00000027 R_NIOS2_RELATIVE                             00001860
00001840  00000027 R_NIOS2_RELATIVE                             000004b0
00001844  00000027 R_NIOS2_RELATIVE                             0000056c
00001848  00000027 R_NIOS2_RELATIVE                             0000173c
0000184c  00000027 R_NIOS2_RELATIVE                             0000062c
00001854  00000027 R_NIOS2_RELATIVE                             0000185c
0000185c  00000027 R_NIOS2_RELATIVE                             0000185c
00001860  00000027 R_NIOS2_RELATIVE                             00001744
000005f4  00001104 R_NIOS2_CALL26    00000000   func_b + 0
00000614  00001104 R_NIOS2_CALL26    00000000   func_b + 0
000005fc  0000100b R_NIOS2_HIADJ16   00000000   j + 0
00000600  0000100a R_NIOS2_LO16      00000000   j + 0
00000608  0000100b R_NIOS2_HIADJ16   00000000   j + 0
0000060c  0000100a R_NIOS2_LO16      00000000   j + 0
00001850  00000b25 R_NIOS2_GLOB_DAT  00000000   _Jv_RegisterClasses + 0
00001858  00000f25 R_NIOS2_GLOB_DAT  00000000   __cxa_finalize + 0
Relocation section '.rela.plt' at offset 0x3e0 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
0000182c  00000f26 R_NIOS2_JUMP_SLOT 00000000   __cxa_finalize + 0
00001830  00001126 R_NIOS2_JUMP_SLOT 00000000   func_b + 0
There are no unwind sections in this file.
Symbol table '.dynsym' contains 21 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 000003f8     0 SECTION LOCAL  DEFAULT    8 
     2: 000004b0     0 SECTION LOCAL  DEFAULT   10 
     3: 000006bc     0 SECTION LOCAL  DEFAULT   11 
     4: 00000734     0 SECTION LOCAL  DEFAULT   12 
     5: 00001738     0 SECTION LOCAL  DEFAULT   13 
     6: 00001740     0 SECTION LOCAL  DEFAULT   14 
     7: 00001748     0 SECTION LOCAL  DEFAULT   15 
     8: 0000185c     0 SECTION LOCAL  DEFAULT   18 
     9: 00001864     0 SECTION LOCAL  DEFAULT   19 
    10: 000005dc    80 FUNC    GLOBAL DEFAULT   10 func_a
    11: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
    12: 000006bc     0 NOTYPE  GLOBAL DEFAULT   11 _fini
    13: 00009810     0 NOTYPE  GLOBAL DEFAULT  ABS _gp
    14: 00001864     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    15: 00000000   356 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.10 (2)
    16: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND j
    17: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND func_b
    18: 00001868     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    19: 00001864     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    20: 000003f8     0 NOTYPE  GLOBAL DEFAULT    8 _init

.

In 'b.readelf'


Relocation section '.rela.dyn' at offset 0x300 contains 17 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00001850  00000027 R_NIOS2_RELATIVE                             0000175c
00001854  00000027 R_NIOS2_RELATIVE                             00001880
00001858  00000027 R_NIOS2_RELATIVE                             0000187c
0000185c  00000027 R_NIOS2_RELATIVE                             00000484
00001860  00000027 R_NIOS2_RELATIVE                             00000540
00001864  00000027 R_NIOS2_RELATIVE                             00001750
00001868  00000027 R_NIOS2_RELATIVE                             00000640
00001870  00000027 R_NIOS2_RELATIVE                             00001878
00001878  00000027 R_NIOS2_RELATIVE                             00001878
0000187c  00000027 R_NIOS2_RELATIVE                             00001758
000005f8  00000004 R_NIOS2_CALL26                               000005b0
00000600  0000000b R_NIOS2_HIADJ16                              00001830
00000604  0000000a R_NIOS2_LO16                                 00001830
00000618  0000000b R_NIOS2_HIADJ16                              00001830
0000061c  0000000a R_NIOS2_LO16                                 00001830
0000186c  00000b25 R_NIOS2_GLOB_DAT  00000000   _Jv_RegisterClasses + 0
00001874  00000f25 R_NIOS2_GLOB_DAT  00000000   __cxa_finalize + 0
Relocation section '.rela.plt' at offset 0x3cc contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
0000184c  00000f26 R_NIOS2_JUMP_SLOT 00000000   __cxa_finalize + 0
There are no unwind sections in this file.
Symbol table '.dynsym' contains 21 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 000003d8     0 SECTION LOCAL  DEFAULT    8 
     2: 00000484     0 SECTION LOCAL  DEFAULT   10 
     3: 000006d0     0 SECTION LOCAL  DEFAULT   11 
     4: 00000748     0 SECTION LOCAL  DEFAULT   12 
     5: 0000174c     0 SECTION LOCAL  DEFAULT   13 
     6: 00001754     0 SECTION LOCAL  DEFAULT   14 
     7: 0000175c     0 SECTION LOCAL  DEFAULT   15 
     8: 00001830     0 SECTION LOCAL  DEFAULT   17 
     9: 00001878     0 SECTION LOCAL  DEFAULT   19 
    10: 00001880     0 SECTION LOCAL  DEFAULT   20 
    11: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
    12: 000006d0     0 NOTYPE  GLOBAL DEFAULT   11 _fini
    13: 00009830     0 NOTYPE  GLOBAL DEFAULT  ABS _gp
    14: 00001880     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    15: 00000000   356 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.10 (2)
    16: 00001830     4 OBJECT  GLOBAL DEFAULT   17 j
    17: 000005e0    96 FUNC    GLOBAL DEFAULT   10 func_b
    18: 00001884     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    19: 00001880     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    20: 000003d8     0 NOTYPE  GLOBAL DEFAULT    8 _init

.

And for the 'dynamic' linker, I added some relocation code to the machine-dependent function 'elf_machine_rela' in '/nios2gcc4/src/glibc-ports-2.5/sysdeps/nios2/dl-machine.h'.

Unfortunately this relocation rewrites the 'text' section, so we must set the flags in the ELF 'Program Headers' like


Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x00738 0x00738 RWE 0x1000 <-- Flg must be RWE instead of R E.
  LOAD           0x000738 0x00001738 0x00001738 0x0012c 0x0012d RW  0x1000
  DYNAMIC        0x00074c 0x0000174c 0x0000174c 0x000c8 0x000c8 RW  0x4

.

At present I can't do this through the linker, so I rewrote the flag with a 'binary editor'.
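The manual edit amounts to setting one bit in the segment's `p_flags` field. Here is a minimal sketch, assuming a 32-bit ELF file and the standard `<elf.h>` definitions. `make_segment_writable()` is a hypothetical helper; locating the header in the file and writing it back are left out.

```c
#include <elf.h>

/* Add write permission to a loadable segment's flags.
 * "R E" is PF_R | PF_X (0x5); "RWE" is PF_R | PF_W | PF_X (0x7). */
static void make_segment_writable(Elf32_Phdr *phdr)
{
    if (phdr->p_type == PT_LOAD)
        phdr->p_flags |= PF_W;
}
```

Applied to the first LOAD entry shown above, this changes the flags from 0x5 to 0x7, so the loader maps the text segment writable and the dynamic linker can patch it.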

Anyway, we can build shared libraries from position-dependent code, but we need the switches '-Bsymbolic' and '-G0' (the latter is needed to avoid gp-register-relative addressing as an optimization) instead of '-fPIC'. I'm not sure whether this direction is right or not.

Kazu

Altera_Forum · ‎01-14-2010

Sorry, but I don't understand the details, and in fact I'm not really sure what difference this update makes.

Did you fix a problem ?

In what cases does this problem hit ? (Kernel basic code, Kernel Modules (which i learned would not work before), user land programs, user land so's) ?

Will this get included into the distribution ?

How ?

Are MontaVista and the other commercial provider aware of this ? Why does it not hit them ?

Thanks,

-Michael

Altera_Forum · ‎01-14-2010

--- Quote Start ---

The easiest way to check is to configure caches larger than 4 KB and run the MMU version of Linux on that design. But I'm sorry, I don't have enough time to do it.

Kazu

--- Quote End ---

As mentioned earlier in the thread, I've done this, and it runs, but ethernet does not work.

Altera_Forum · ‎01-15-2010

Hi,

--- Quote Start ---

As mentioned earlier in the thread, I've done this, and it runs, but ethernet does not work.

--- Quote End ---

It is not so simple, because the kernel code itself does not use the MMU's address translation mechanism. So we need a suitable user-land program to probe the problem.

Kazu

Altera_Forum · ‎01-15-2010

Hi,

--- Quote Start ---

Sorry, but I don't understand the details, and in fact I'm not really sure what difference this update makes.

--- Quote End ---

With this new (static) linker, you can build shared libraries from pre-compiled position-dependent code, and the new dynamic linker can relocate it properly.

For example, if you compile a sample program like


/** a.c --- Test for Nios Dynamic Linker  **/
extern int func_b(int);
extern int j;
int func_a(int i)
{
    j = func_b(i);
    return j;
}

with the command


nios2-wrs-linux-gnu-gcc -c -G0 -g a.c -o a.o

the compiler will generate relocation information along with its position-dependent code, as follows.


Relocation section '.rela.text' at offset 0x7b0 contains 5 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00000018  00000e04 R_NIOS2_CALL26    00000000   func_b + 0
00000020  00000f0b R_NIOS2_HIADJ16   00000000   j + 0
00000024  00000f0a R_NIOS2_LO16      00000000   j + 0
0000002c  00000f0b R_NIOS2_HIADJ16   00000000   j + 0
00000030  00000f0a R_NIOS2_LO16      00000000   j + 0

But the old (static) linker cannot pass this information on to the shared library headers.

The code added to the new linker does it.


Relocation section '.rela.dyn' at offset 0x18c contains 5 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
0000021c  00000404 R_NIOS2_CALL26    00000000   func_b + 0
00000224  0000020b R_NIOS2_HIADJ16   00000000   j + 0
00000228  0000020a R_NIOS2_LO16      00000000   j + 0
00000230  0000020b R_NIOS2_HIADJ16   00000000   j + 0
00000234  0000020a R_NIOS2_LO16      00000000   j + 0
Relocation section '.rela.plt' at offset 0x1c8 contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
000012fc  00000426 R_NIOS2_JUMP_SLOT 00000000   func_b + 0

.

The new dynamic linker 'ld.so.1' performs the relocation for these new items, 'R_NIOS2_CALL26', 'R_NIOS2_HIADJ16' and 'R_NIOS2_LO16'.
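To make the three relocation types concrete, here is a rough user-space sketch of the arithmetic involved. It assumes the usual Nios II encodings (a 16-bit immediate in bits 21:6 of an I-type instruction, and a 26-bit word address in bits 31:6 of a call instruction). It is not the code from the attached patch, and all function names are hypothetical.

```c
#include <stdint.h>

/* %lo: low 16 bits of the address (sign-extended by the instruction). */
static uint32_t nios2_lo16(uint32_t addr)
{
    return addr & 0xffffu;
}

/* %hiadj: high 16 bits, plus 1 when bit 15 is set, which compensates
 * for the sign extension of the low half. */
static uint32_t nios2_hiadj16(uint32_t addr)
{
    return ((addr >> 16) + ((addr >> 15) & 1u)) & 0xffffu;
}

/* Replace the 16-bit immediate field (bits 21:6) of an I-type word. */
static uint32_t patch_imm16(uint32_t insn, uint32_t imm16)
{
    return (insn & ~(0xffffu << 6)) | ((imm16 & 0xffffu) << 6);
}

/* Replace the 26-bit field (bits 31:6) of a call with target >> 2. */
static uint32_t patch_call26(uint32_t insn, uint32_t target)
{
    return (insn & 0x3fu) | (((target >> 2) & 0x3ffffffu) << 6);
}
```

Because these patches land in instruction words, the dynamic linker has to write into the text segment, which is why the relocations cannot be applied to a read-only mapping.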

--- Quote Start ---

Did you fix a problem ?

--- Quote End ---

Yes, but only partially.

--- Quote Start ---

In what cases does this problem hit ? (Kernel basic code, Kernel Modules (which i learned would not work before), user land programs, user land so's) ?

--- Quote End ---

These modifications apply only to user-land so's. The newly added code never takes effect without the two switches '-shared' and '-Bsymbolic'.

--- Quote Start ---

Will this get included into the distribution ?

How ?

--- Quote End ---

I don't know, because I'm only a Sunday programmer.

--- Quote Start ---

Are MontaVista and the other commercial provider aware of this ? Why does it not hit them ?

--- Quote End ---

I really don't know who wrote this code. But I have some doubt whether the implementer did the job seriously or not. The code is messy and confused, and has many unfinished parts. Sometimes I encounter unbelievable comments like


/* The runtime resolver receives the original function arguments in r4
   through r7, the shared library identifier from GOT[1]? in r14, and the
   relocation index times four in r15. It updates the corresponding PLT GOT
   entry so that the PLT entry will transfer control directly to the target
   in the future, and then transfers control to the target. */

.

Why is there a '?' after GOT[1]? It seems to me that this implementer doesn't know the structure of the GOT well.

Kazu

Altera_Forum · ‎01-15-2010

--- Quote Start ---

Hi,

I suspect that a Nios CPU with an MMU can't have caches larger than 4 Kbytes. There are several ways to connect the CPU, cache and MMU. For example,

1) CPU -- cache -- MMU -- Memory

2) CPU -- MMU -- cache -- Memory.

The first method has low latency, but suffers from 'synonym problems'. The second method accesses the cache with physical addresses, but has higher latency. So I think that the Nios CPU takes a third strategy, i.e.

3) CPU |--cache --| -- Memory

............|--MMU --|

by limiting the cache size to at most the page size. (Please refer to the Nios handbook n2cpu_nii5v1.pdf, page 2-10, Figure 2-2.)

If so, neither the instruction cache nor the data cache can be larger than the page size (= 4 Kbytes).

Kazu

--- Quote End ---

I found the relevant section, page 3-53 of the handbook:

--- Quote Start ---

Virtual Address Aliasing

A virtual address alias occurs when two virtual addresses map to the same physical address. When an MMU and caches are present and the caches are larger than a page (4 KBytes), the operating system must prevent illegal virtual address aliases. Because the caches are virtually-indexed and physically-tagged, a portion of the virtual address is used to select the cache line. If the cache is 4 KBytes or less in size, the portion of the virtual address used to select the cache line fits with bits 11:0 of the virtual address which have the same value as bits 11:0 of the physical address (they are untranslated bits of the page offset). However, if the cache is larger than 4 KBytes, bits beyond the page offset (bits 12 and up) are used to select the cache line and these bits are allowed to be different than the corresponding physical address.

For example, in a 64 KByte direct-mapped cache with a 16-byte line, bits 15:4 are used to select the line. Assume that virtual address 0x1000 is mapped to physical address 0xF000 and virtual address 0x2000 is also mapped to physical address 0xF000. This is an illegal virtual address alias because accesses to virtual address 0x1000 use line 0x1 and accesses to virtual address 0x2000 use line 0x2 even though they map to the same physical address. This results in two copies of the same physical address in the cache. With an n-byte direct-mapped cache, there could be n/4096 copies of the same physical address in the cache if illegal virtual address aliases are not prevented.

The bits of the virtual address that are used to select the line and are translated bits (bits 12 and up) are known as the color of the address. An operating system avoids illegal virtual address aliases by ensuring that if multiple virtual addresses map the same physical address, the virtual addresses have the same color. Note though, the color of the virtual addresses does not need to be the same as the color as the physical address because the cache tag contains all the bits of the PFN.

--- Quote End ---

The question is whether this support is implemented in Linux and, if not, what would be required to fix it.
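The handbook's example can be checked with a little arithmetic. The sketch below is only an illustration for a direct-mapped cache and 4 KB pages, as in the quoted text. The function names are hypothetical and do not come from the handbook or the kernel.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* page size stated in the handbook text */

/* Line index of a direct-mapped cache: the address bits between the
 * line offset and the cache size. */
static uint32_t cache_line_index(uint32_t vaddr, uint32_t cache_size,
                                 uint32_t line_size)
{
    return (vaddr % cache_size) / line_size;
}

/* Colour: the translated bits (bit 12 and up) that take part in line
 * selection. It is always 0 when the cache is no larger than a page. */
static uint32_t cache_colour(uint32_t vaddr, uint32_t cache_size)
{
    return (vaddr % cache_size) / PAGE_SIZE_BYTES;
}
```

For a 64 KByte cache, virtual addresses 0x1000 and 0x2000 get colours 1 and 2 and therefore select different lines, so they form an illegal alias if both map to the same physical page. With a 4 KByte cache both have colour 0 and the problem cannot arise.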

Altera_Forum · ‎01-16-2010

"virtually-indexed and physically-tagged" ???

How does this compare to ARM, which uses the cache and the MMU in completely the "wrong" order (strictly using virtual addresses in the cache)? Because of that, ARM-Linux needs to flush the cache completely on every task switch. That is why, for ARM systems with many task switches, not using the MMU is recommended.

I sincerely hope that such a drastic method is not necessary with NIOS !

Happily, I myself am planning a heavily multithreaded system, so there will not be as much MMU reprogramming (and cache invalidating) as with a heavily multitasking system. That is only possible with the MMU, as the non-MMU compiler does not support TLS, which is essential for decent multithreaded applications.

-Michael

Altera_Forum · ‎01-18-2010

Hi,

--- Quote Start ---

"virtually-indexed and physically-tagged" ???

--- Quote End ---

So, I think that

3) CPU |--cache --| -- Memory

............|--MMU --|

is 'Bingo'.

--- Quote Start ---

How does this compare to ARM, which uses the cache and the MMU in completely the "wrong" order (strictly using virtual addresses in the cache)? Because of that, ARM-Linux needs to flush the cache completely on every task switch. That is why, for ARM systems with many task switches, not using the MMU is recommended.

I sincerely hope that such a drastic method is not necessary with NIOS !

--- Quote End ---

On Nios, you don't need to flush the cache on each task switch. Instead, the TLB uses a PID mechanism to distinguish the user tasks,


void set_mmu_pid(unsigned long pid) {
   WRCTL(CTL_TLBMISC, (RDCTL(CTL_TLBMISC) & (WAY_MASK << WAY_SHIFT)) | ((pid & PID_MASK) << PID_SHIFT));
}

so the TLB entries are effectively switched and reloaded automatically.
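The bit manipulation in set_mmu_pid() can be tried out in user space. The sketch below is a model only: the field positions (WAY in bits 23:20, PID starting at bit 4) and the 8-bit PID width are assumptions, since the post does not give the values of WAY_MASK, WAY_SHIFT, PID_MASK and PID_SHIFT. The `TOY_` names and the function are hypothetical.

```c
#include <stdint.h>

/* Assumed field layout (not given in the post): WAY in bits 23:20,
 * PID starting at bit 4. The PID width is a hardware option; 8 bits
 * is an arbitrary choice for this sketch. */
#define TOY_WAY_SHIFT 20
#define TOY_WAY_MASK  0xfu
#define TOY_PID_SHIFT 4
#define TOY_PID_MASK  0xffu

/* Compute the value set_mmu_pid() would write back: keep the WAY
 * field of the current register value and install the new PID. */
static uint32_t toy_tlbmisc_with_pid(uint32_t tlbmisc, uint32_t pid)
{
    return (tlbmisc & (TOY_WAY_MASK << TOY_WAY_SHIFT)) |
           ((pid & TOY_PID_MASK) << TOY_PID_SHIFT);
}
```

Since lookups are matched against the current PID, changing this one field is enough to hide the previous task's translations without touching the cache.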

Kazu

Altera_Forum · ‎01-18-2010

Sounds good.

Is the software part already implemented in the Kernel ?

Does it work ?

Decent performance ?

Thanks,

-Michael

Altera_Forum · ‎01-20-2010

Hi,

--- Quote Start ---

Is the software part already implemented in the Kernel ?

Does it work ?

--- Quote End ---

Yes, of course, we are using these. The Nios TLB control code is in the files 'mmu_context.c' and 'tlb.c' under '/nios2-linux/linux-2.6/arch/nios2/mm/', and the cache flush functions are in '/nios2-linux/linux-2.6/arch/nios2/mm/cacheflush.c'.

But please note that the cache size is limited to 4 KB. In the present code, there is no mechanism to avoid the alias problem.

--- Quote Start ---

Decent performance ?

--- Quote End ---

Though I don't have any concrete performance data, I think it is not so bad. FLTK's demos, for example 'editor', work well. It uses 'Bitblit' functionality, which is a heavy task for the Nios CPU and its MMU, but the scrolling speed is not so slow.

Kazu

Altera_Forum · ‎01-20-2010

--- Quote Start ---

But please note that the cache size is limited to 4 KB. In the present code, there is no mechanism to avoid the alias problem.

--- Quote End ---

So it is not correctly implemented. All Altera example designs use much more cache !

What should we do about that ?

I feel that a 4K cache will degrade performance a lot. I did not test this thoroughly, but I once did a speed test with a uClinux vs. a full Linux design and found that the full Linux design was much slower (up to half speed). I'm not sure about the cache sizes, though.

-Michael

Altera_Forum · ‎01-20-2010

Actually, I have it working with 32KB caches now, I'm not sure what changed (been through a few FPGA updates since I last tried it). I think at least some support for the aliasing problem is there actually - see cache_flush.c, syscall.c, Documentation/cachetlb.txt.

I did make this change. I haven't noticed any difference with or without it, but it seems more correct given the COLOUR_ALIGN macro in syscall.c and the documentation in cachetlb.txt:

--- a/arch/nios2/include/asm/shmparam.h
+++ b/arch/nios2/include/asm/shmparam.h
@@ -1 +1,2 @@
-#include <asm-generic/shmparam.h>
+#include <asm/nios.h>
+#define SHMLBA DCACHE_SIZE
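For readers unfamiliar with colour alignment, the sketch below shows the kind of computation a COLOUR_ALIGN-style macro performs, written as a plain function. It follows the form used by other Linux architectures with aliasing caches; that the nios2 syscall.c uses exactly this form is an assumption. `toy_colour_align()` and `TOY_PAGE_SHIFT` are hypothetical names.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* 4 KB pages */

/* Round addr up to a multiple of shmlba, then add the colour implied
 * by the file offset, so that every mapping of the same page gets the
 * same cache colour. shmlba must be a power of two. */
static uint32_t toy_colour_align(uint32_t addr, uint32_t pgoff,
                                 uint32_t shmlba)
{
    uint32_t mask = shmlba - 1;

    return ((addr + mask) & ~mask) + ((pgoff << TOY_PAGE_SHIFT) & mask);
}
```

With SHMLBA equal to the page size the colour term is always zero, which is why raising SHMLBA to the D-cache size matters once the cache is larger than a page.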

Altera_Forum · ‎01-21-2010

Hi,

--- Quote Start ---

So it is not correctly implemented. All Altera example designs use much more cache !

What should we do about that ?

--- Quote End ---

I'm not sure whether the Wind River guys implemented a correct mechanism to deal with the 'alias problem'. So give me some time to check it.

--- Quote Start ---

I feel that a 4K cache will degrade performance a lot. I did not test this thoroughly, but I once did a speed test with a uClinux vs. a full Linux design and found that the full Linux design was much slower (up to half speed). I'm not sure about the cache sizes, though

--- Quote End ---

Please take into account that the kernel with MMU must do much more work than the no-MMU version, and that we get many excellent features in exchange for the lower performance.

Kazu

Altera_Forum · ‎01-21-2010

Hi,

--- Quote Start ---

I think at least some support for the aliasing problem is there actually - see cache_flush.c, syscall.c, Documentation/cachetlb.txt.

--- Quote End ---

Oh, I'm sorry, my statement "there is no mechanism to avoid the alias problem" was an overstatement. There are traces of an implementation for the 'alias problem', but I'm not sure whether they work well or not. As mentioned in 'Documentation/cachetlb.txt', the 'alias problem' affects only the D-cache. If we have 'alias' copies of the same physical address contents, we must take special care to flush them from the D-cache. But whether we will have an 'alias' or not depends on what the Linux kernel does, so the problem is a little difficult. It seems to me that the functions 'copy_from_user_page' and 'copy_to_user_page' will work well


void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
                         unsigned long user_vaddr,
                         void *dst, void *src, int len)
{
    flush_cache_page(vma, user_vaddr, page_to_pfn(page));
    memcpy(dst, src, len);
    flush_dcache_range((unsigned long)src, (unsigned long)src + len);
    if (vma->vm_flags & VM_EXEC) {
        flush_icache_range((unsigned long)src, (unsigned long)src + len);
    }
}
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
                       unsigned long user_vaddr,
                       void *dst, void *src, int len)
{
    flush_cache_page(vma, user_vaddr, page_to_pfn(page));
    memcpy(dst, src, len);
    flush_dcache_range((unsigned long)dst, (unsigned long)dst + len);
    if (vma->vm_flags & VM_EXEC) {
        flush_icache_range((unsigned long)dst, (unsigned long)dst + len);
    }
}

because these only involve the 'alias problem' between user-land virtual addresses and the kernel's. But I'm still not sure about the case of mappings shared among user-land processes.

Kazu

Altera_Forum · ‎01-22-2010

--- Quote Start ---

So give me some time to check it.

--- Quote End ---

Great ! Thanks,

-Michael

Altera_Forum · ‎08-09-2010

Hi, kazu & Gurus

I'm working on using the LCD in a nios2-mmu system on the NEEK. I followed the instructions on the Nios wiki to configure the kernel and modified altfb.c following Kazu's advice above. The result on my Linux is that the "fb_test" command runs successfully:


The framebuffer device was opened successfully.
800*480, 32bpp
The framebuffer device was mapped to memory successfully.

but I cannot run Nano-X:


cantnot bind to named socket

and the image I ran is "linux.initramsfs.gz".

I need your help, please give me some advice and thank you.

Altera_Forum · ‎08-11-2010

Hi.

--- Quote Start ---


cantnot bind to named socket

--- Quote End ---

First, please check that your kernel includes 'Unix domain sockets'.


 
Networking support  --->
    --- Networking support
        Networking options  --->
            <*> Packet socket
                Packet socket: mmapped IO
            <*> Unix domain sockets

Kazu

Altera_Forum · ‎08-12-2010

Hi, Kazu

Thank you for your response; the problem has been solved by your advice. But the new puzzle is that nothing appears on the screen, although Nano-X seems to be running correctly:


/#  nano-X &
 686 nano-X
/#  nanowm &
 687 nanowm
/#  nxclock &
 688 nxclock

And the Linux init messages are below


$ nios2-terminal
nios2-terminal: connected to hardware target using JTAG UART on cable
nios2-terminal: "USB-Blaster ", device 1, instance 1
nios2-terminal: (Use the IDE stop button or Ctrl-C to terminate)
Linux version 2.6.30-00494-g84a224b-dirty (alex@alex-desktop) (gcc version 4.1.2)# 38 Thu Aug 12 16:36:39 CST 2010
console  enabled
Early printk initialized
Linux/Nios II-MMU
init_bootmem_node(?,0x50c, 0x0, 0x2000)
free_bootmem(0x50c000, 0x1af4000)
reserve_bootmem(0x50c000, 0x400)
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 8128
Kernel command line: 
NR_IRQS:32
PID hash table entries: 128 (order: 7, 512 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
We have 8192 pages of RAM
Memory available: 27300k/5165k RAM, 0k/0k ROM (1577k kernel code, 3588k data)
Calibrating delay loop... 49.25 BogoMIPS (lpj=246272)
Mount-cache hash table entries: 512
net_namespace: 296 bytes
NET: Registered protocol family 16
init_BSP(): registering device resources
bio: create slab <bio-0> at 0
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 1024 (order: 1, 8192 bytes)
TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
TCP: Hash tables configured (established 1024 bind 1024)
TCP reno registered
NET: Registered protocol family 1
msgmni has been set to 53
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
fb0: Altera FB frame buffer device
ttyJ0 at MMIO 0x8001410 (irq = 7) is a Altera JTAG UART
console handover: boot  -> real 
altps2 : base e8001420 irq 9
mice: PS/2 mouse device common for all mice
TCP cubic registered
NET: Registered protocol family 17
atkbd.c: keyboard reset failed on altps2.0
Welcome to
          ____ _  _
         /  __| ||_|                 
    _   _| |  | | _ ____  _   _  _  _ 
   | | | | |  | || |  _ \| | | |\ \/ /
   | |_| | |__| || | | | | |_| |/    
   |  ___\____|_||_|_| |_|\____|\_/\_/
   | |
   |_|
For further information check:
http://www.uclinux.org/
BusyBox v1.15.1 (2010-08-12 16:35:17 CST) hush - the humble shell
Enter 'help' for a list of built-in commands.

Please give me some advice when you are free, and thank you again.

--Smarter.UJS

Altera_Forum · ‎08-13-2010

Hi,

--- Quote Start ---

But the new puzzle is that nothing appears on the screen, although Nano-X seems to be running correctly:


/#  nano-X &
 686 nano-X
/#  nanowm &
 687 nanowm
/#  nxclock &
 688 nxclock

--- Quote End ---

Have you checked that all the DMA descriptors use physical addresses? For example,


} __attribute__ ((packed)) sgdma_desc;
#include <asm/cacheflush.h>  // <-- to use 'flush_cache_all()'
static int altfb_dma_start(unsigned long start, unsigned long len)
{
    unsigned long base =
        (unsigned long)ioremap(SGDMABASE, ALTERA_SGDMA_IO_EXTENT);
    sgdma_desc *desc, *desc1;
    int ndesc = (len + DISPLAY_BYTES_PER_DESC - 1) / DISPLAY_BYTES_PER_DESC;
    int ndesc_size = sizeof(sgdma_desc) * ndesc;
    int i;
    writel(ALTERA_SGDMA_CONTROL_SOFTWARERESET_MSK,
           base + ALTERA_SGDMA_CONTROL);    /* halt current transfer */
    writel(0, base + ALTERA_SGDMA_CONTROL);    /* disable interrupts */
    writel(0xff, base + ALTERA_SGDMA_STATUS);    /* clear status */
    /* assume cache line size is 32, which is required by sgdma desc */
    desc1 = kzalloc(ndesc_size, GFP_KERNEL);
    if (desc1 == NULL)
        return -ENOMEM;
//    desc1 = ioremap((unsigned long)desc1, ndesc_size);
    for (i = 0, desc = desc1; i < ndesc; i++, desc++) {
        unsigned ctrl = ALTERA_SGDMA_DESCRIPTOR_CONTROL_OWNED_BY_HW_MSK;
        desc->read_addr = (void *)start;
        if (i == (ndesc - 1)) {
//            desc->next = (void *)desc1;
            desc->next = (void *)virt_to_phys(desc1);
            desc->bytes_to_transfer = len;
            ctrl |=
                ALTERA_SGDMA_DESCRIPTOR_CONTROL_GENERATE_EOP_MSK;
        } else {
//            desc->next = (void *)(desc + 1);
            desc->next = (void *)virt_to_phys((desc + 1));
            desc->bytes_to_transfer = DISPLAY_BYTES_PER_DESC;
        }
        if (i == 0)
            ctrl |=
                ALTERA_SGDMA_DESCRIPTOR_CONTROL_GENERATE_SOP_MSK;
        desc->control = ctrl;
        start += DISPLAY_BYTES_PER_DESC;
        len -= DISPLAY_BYTES_PER_DESC;
    }
//    writel((unsigned long)desc1, base + ALTERA_SGDMA_NEXT_DESC_POINTER);
    writel(((unsigned long)virt_to_phys(desc1)), base + ALTERA_SGDMA_NEXT_DESC_POINTER);
    writel(ALTERA_SGDMA_CONTROL_RUN_MSK | ALTERA_SGDMA_CONTROL_PARK_MSK,
           base + ALTERA_SGDMA_CONTROL);    /* start */
    flush_cache_all();   // <- To flush descriptors from the cache.  Indeed, it's enough to flush the D-cache.
    return 0;
}
#else
static int altfb_dma_start(unsigned long start, unsigned long len)

Kazu

Altera_Forum · ‎08-14-2010

Oh, sorry. There is a race.

The descriptors must be flushed before we re-start the DMA.


//    writel((unsigned long)desc1, base + ALTERA_SGDMA_NEXT_DESC_POINTER);
    writel(((unsigned long)virt_to_phys(desc1)), base + ALTERA_SGDMA_NEXT_DESC_POINTER);
    flush_cache_all();  // <- Must be here.
    writel(ALTERA_SGDMA_CONTROL_RUN_MSK | ALTERA_SGDMA_CONTROL_PARK_MSK,
           base + ALTERA_SGDMA_CONTROL);    /* start */
    // flush_cache_all();   // <- To flush descriptors from the cache.  Indeed, it's enough to flush the D-cache.
    return 0;
}
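To see how the descriptor ring is linked, here is a simplified user-space model. It is a sketch only: `struct toy_desc` keeps just three fields, the physical base address of the ring is passed in as a plain number, and the function names and the sizes used with them are hypothetical. It does not touch hardware or caches.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified descriptor: only the fields needed for the illustration. */
struct toy_desc {
    uint32_t read_addr;
    uint32_t next;              /* physical address of the next descriptor */
    uint32_t bytes_to_transfer;
};

/* Number of descriptors needed to cover len bytes. */
static size_t toy_desc_count(uint32_t len, uint32_t bytes_per_desc)
{
    return (len + bytes_per_desc - 1) / bytes_per_desc;
}

/* Fill a ring of n descriptors that will live at physical address
 * ring_phys; the last entry links back to the first. */
static void toy_build_ring(struct toy_desc *ring, size_t n,
                           uint32_t ring_phys, uint32_t start,
                           uint32_t len, uint32_t bytes_per_desc)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int last = (i == n - 1);

        ring[i].read_addr = start;
        ring[i].next = last ? ring_phys
                            : ring_phys + (uint32_t)((i + 1) * sizeof(*ring));
        ring[i].bytes_to_transfer = last ? len : bytes_per_desc;
        start += bytes_per_desc;
        len -= last ? len : bytes_per_desc;
    }
}
```

In the real driver the ring sits in cached kernel memory, so, as noted above, it has to be flushed from the D-cache after it is built and before the RUN bit is written; otherwise the DMA engine may fetch stale descriptors.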

Kazu