
Linux with MMU on NEEK

Altera_Forum
Honored Contributor II
5,048 Views

Hi, all. 

 

I'm testing the MMU version of Linux on my NEEK.

 

http://www.nioswiki.com/linux 

 

It works fine and I can use the "bash" shell. This is clear evidence that we are using the true 'fork' instead of 'vfork'.

 

Maybe this depends on the version, but the TSE driver reports an error and doesn't work on this design. The error is

    ERROR: altera_tse.c:1666: request_mem_region() failed

I think that this error is caused by a misunderstanding of how the function request_mem_region() is used. Inside request_mem_region(), the function __request_region() is called. If the resource has already been registered, this function returns a non-NULL value, that is, the pointer to its resource. But the resource 'sgdma_rx_base' is already registered in the initialization process, so this function returns the 'conflict' and

    if (!request_mem_region(sgdma_rx_base, sgdma_rx_size, "altera_tse")) {

is always true. So I made a dirty patch,

    if (!request_mem_region(sgdma_rx_base, sgdma_rx_size, "altera_tse")) {
        reg_resource = __request_region(&iomem_resource, sgdma_rx_base, sgdma_rx_size, "altera_tse", 0);
        if (reg_resource != NULL && reg_resource->flags & IORESOURCE_BUSY) {
            printk(KERN_ERR "ERROR: %s:%d: request_mem_region() failed\n", __FILE__, __LINE__);
            ret = -EBUSY;
            goto out_sgdma_rx;
        }
    }

Moreover, the author forgot that the DMA works in the physical address world, so we need to set the descriptor pointers like

    // desc->source = read_addr;
    desc->source = virt_to_phys(read_addr);
    // desc->destination = write_addr;
    desc->destination = virt_to_phys(write_addr);
    // desc->next = (unsigned int *)next;
    desc->next = (unsigned int *)((unsigned long)next & 0x1fffffffUL);

and so on.
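To illustrate the address arithmetic behind the descriptor patch: on Nios II with MMU, the kernel region (0xC0000000 to 0xDFFFFFFF) and the I/O region (0xE0000000 to 0xFFFFFFFF) both map onto physical 0x00000000 to 0x1FFFFFFF, so the conversion amounts to clearing the top three address bits, which is what the 0x1fffffffUL mask does. The following is only a user-space sketch under that assumption. The names demo_virt_to_phys, demo_desc and demo_fill_desc are hypothetical and are not the kernel's virt_to_phys() or the driver's descriptor type.

```c
#include <stdint.h>

/* Assumption: kernel and I/O region addresses map to physical
 * addresses by dropping the top three bits. */
#define DEMO_PHYS_MASK 0x1fffffffu

/* Hypothetical stand-in for the kernel's virt_to_phys(). */
static uint32_t demo_virt_to_phys(uint32_t vaddr)
{
    return vaddr & DEMO_PHYS_MASK;
}

/* Hypothetical descriptor holding only the pointer fields
 * that the DMA engine reads. */
struct demo_desc {
    uint32_t source;
    uint32_t destination;
    uint32_t next;
};

/* Store physical addresses in every field the DMA hardware follows. */
static void demo_fill_desc(struct demo_desc *d, uint32_t src_virt,
                           uint32_t dst_virt, uint32_t next_virt)
{
    d->source = demo_virt_to_phys(src_virt);
    d->destination = demo_virt_to_phys(dst_virt);
    d->next = demo_virt_to_phys(next_virt);
}
```

The point of the sketch is that every pointer the hardware dereferences, including the 'next' link, has to be converted, not just the data buffers.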

 

Also, the frame buffer fb0 will not work well, because the driver 'altfb.c' is not implemented for the MMU version of Linux. So I added some code for altfb_mmap(), like

    /* We implement our own mmap to set MAY_SHARE and add the correct size */
    static int altfb_mmap(struct fb_info *info, struct vm_area_struct *vma)
    {
        unsigned long phys_addr, phys_size;
        unsigned long addr;
        unsigned long size = vma->vm_end - vma->vm_start;
        unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;

        // vma->vm_flags |= VM_MAYSHARE | VM_SHARED;
        // vma->vm_start = info->screen_base;
        // vma->vm_end = vma->vm_start + info->fix.smem_len;

        /* check range */
        if (vma->vm_pgoff > (~0UL >> PAGE_SHIFT))
            return -EINVAL;
        if (offset + size > altfb_fix.smem_len)
            return -EINVAL;

        vma->vm_flags |= VM_IO | VM_RESERVED;
        addr = vma->vm_start;
        phys_addr = altfb_fix.smem_start + offset;
        if ((offset + size) < altfb_fix.smem_len)
            phys_size = size;
        else
            phys_size = altfb_fix.smem_len - offset;

        vma->vm_page_prot = __pgprot(_PAGE_PRESENT|_PAGE_READ|_PAGE_WRITE);
        if (remap_pfn_range(vma, addr, phys_addr >> PAGE_SHIFT, phys_size, vma->vm_page_prot))
            return -EAGAIN;
        return 0;
    }

and rewrote the DMA descriptors like

    desc->next = (void *)virt_to_phys((desc + 1));

So now I can start telnetd and control the NEEK over Ethernet, and use Nano-X on the MMU version of Linux, but I can't enter an ftp session, because the 'getservbyname()' function does not work well.

I don't know which directory contains the source of 'getservbyname()'. Would anyone please tell me where it is?

 

Thank you, in advance.
95 Replies
Altera_Forum
Honored Contributor II
719 Views

Hi, 

 

Thank you, Michael. 

 

 

--- Quote Start ---  

I asked Frak Storm, our Altera FPGA dealer. He said that Altera's reference designs do have 2*32K caches and that Altera recommends not changing the cache configuration. So the 4K limit does not seem to exist. Nonetheless, I asked him to double-check the NIOS MMU/cache "hardware" regarding this issue.

 

--- Quote End ---  

 

 

The easiest way to check it is to have caches more than 4KB and run Linux with MMU version on it. But I'm sorry, I don't have enough time to do it. 

 

Kazu
Altera_Forum
Honored Contributor II
719 Views

Hi, 

 

Now I'm trying to add relocation code to the Nios dynamic linker. First, I must make the 'static' linker 'nios2-wrs-linux-gnu-ld' (or 'ld') emit the relocation information. So I added some code to the two functions 'nios2_elf32_relocate_section' and 'nios2_elf32_check_relocs' in the file '/nios2gcc4/src/binutils-2.17.50/bfd/elf32-nios2.c'. For the details, please refer to the attached file.

With these 'static' linkers, we can compile the 'Samples', main.c, a.c and b.c, as follows

    nios2-wrs-linux-gnu-gcc -g -shared -Wl,-Bsymbolic -G0 a.c -o a.so
    nios2-wrs-linux-gnu-gcc -g -shared -Wl,-Bsymbolic -G0 b.c -o b.so
    nios2-wrs-linux-gnu-gcc -g main.c a.so b.so -o main

The relocation information of the shared libraries is as follows.

In 'a.readelf' 

    Relocation section '.rela.dyn' at offset 0x308 contains 18 entries:
     Offset     Info    Type               Sym.Value  Sym. Name + Addend
    00001834  00000027 R_NIOS2_RELATIVE    00001748
    00001838  00000027 R_NIOS2_RELATIVE    00001864
    0000183c  00000027 R_NIOS2_RELATIVE    00001860
    00001840  00000027 R_NIOS2_RELATIVE    000004b0
    00001844  00000027 R_NIOS2_RELATIVE    0000056c
    00001848  00000027 R_NIOS2_RELATIVE    0000173c
    0000184c  00000027 R_NIOS2_RELATIVE    0000062c
    00001854  00000027 R_NIOS2_RELATIVE    0000185c
    0000185c  00000027 R_NIOS2_RELATIVE    0000185c
    00001860  00000027 R_NIOS2_RELATIVE    00001744
    000005f4  00001104 R_NIOS2_CALL26      00000000   func_b + 0
    00000614  00001104 R_NIOS2_CALL26      00000000   func_b + 0
    000005fc  0000100b R_NIOS2_HIADJ16     00000000   j + 0
    00000600  0000100a R_NIOS2_LO16        00000000   j + 0
    00000608  0000100b R_NIOS2_HIADJ16     00000000   j + 0
    0000060c  0000100a R_NIOS2_LO16        00000000   j + 0
    00001850  00000b25 R_NIOS2_GLOB_DAT    00000000   _Jv_RegisterClasses + 0
    00001858  00000f25 R_NIOS2_GLOB_DAT    00000000   __cxa_finalize + 0

    Relocation section '.rela.plt' at offset 0x3e0 contains 2 entries:
     Offset     Info    Type               Sym.Value  Sym. Name + Addend
    0000182c  00000f26 R_NIOS2_JUMP_SLOT   00000000   __cxa_finalize + 0
    00001830  00001126 R_NIOS2_JUMP_SLOT   00000000   func_b + 0

    There are no unwind sections in this file.

    Symbol table '.dynsym' contains 21 entries:
       Num:    Value  Size Type    Bind   Vis      Ndx Name
         0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 000003f8     0 SECTION LOCAL  DEFAULT    8
         2: 000004b0     0 SECTION LOCAL  DEFAULT   10
         3: 000006bc     0 SECTION LOCAL  DEFAULT   11
         4: 00000734     0 SECTION LOCAL  DEFAULT   12
         5: 00001738     0 SECTION LOCAL  DEFAULT   13
         6: 00001740     0 SECTION LOCAL  DEFAULT   14
         7: 00001748     0 SECTION LOCAL  DEFAULT   15
         8: 0000185c     0 SECTION LOCAL  DEFAULT   18
         9: 00001864     0 SECTION LOCAL  DEFAULT   19
        10: 000005dc    80 FUNC    GLOBAL DEFAULT   10 func_a
        11: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
        12: 000006bc     0 NOTYPE  GLOBAL DEFAULT   11 _fini
        13: 00009810     0 NOTYPE  GLOBAL DEFAULT  ABS _gp
        14: 00001864     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
        15: 00000000   356 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.10 (2)
        16: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND j
        17: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND func_b
        18: 00001868     0 NOTYPE  GLOBAL DEFAULT  ABS _end
        19: 00001864     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
        20: 000003f8     0 NOTYPE  GLOBAL DEFAULT    8 _init

In 'b.readelf' 

    Relocation section '.rela.dyn' at offset 0x300 contains 17 entries:
     Offset     Info    Type               Sym.Value  Sym. Name + Addend
    00001850  00000027 R_NIOS2_RELATIVE    0000175c
    00001854  00000027 R_NIOS2_RELATIVE    00001880
    00001858  00000027 R_NIOS2_RELATIVE    0000187c
    0000185c  00000027 R_NIOS2_RELATIVE    00000484
    00001860  00000027 R_NIOS2_RELATIVE    00000540
    00001864  00000027 R_NIOS2_RELATIVE    00001750
    00001868  00000027 R_NIOS2_RELATIVE    00000640
    00001870  00000027 R_NIOS2_RELATIVE    00001878
    00001878  00000027 R_NIOS2_RELATIVE    00001878
    0000187c  00000027 R_NIOS2_RELATIVE    00001758
    000005f8  00000004 R_NIOS2_CALL26      000005b0
    00000600  0000000b R_NIOS2_HIADJ16     00001830
    00000604  0000000a R_NIOS2_LO16        00001830
    00000618  0000000b R_NIOS2_HIADJ16     00001830
    0000061c  0000000a R_NIOS2_LO16        00001830
    0000186c  00000b25 R_NIOS2_GLOB_DAT    00000000   _Jv_RegisterClasses + 0
    00001874  00000f25 R_NIOS2_GLOB_DAT    00000000   __cxa_finalize + 0

    Relocation section '.rela.plt' at offset 0x3cc contains 1 entries:
     Offset     Info    Type               Sym.Value  Sym. Name + Addend
    0000184c  00000f26 R_NIOS2_JUMP_SLOT   00000000   __cxa_finalize + 0

    There are no unwind sections in this file.

    Symbol table '.dynsym' contains 21 entries:
       Num:    Value  Size Type    Bind   Vis      Ndx Name
         0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 000003d8     0 SECTION LOCAL  DEFAULT    8
         2: 00000484     0 SECTION LOCAL  DEFAULT   10
         3: 000006d0     0 SECTION LOCAL  DEFAULT   11
         4: 00000748     0 SECTION LOCAL  DEFAULT   12
         5: 0000174c     0 SECTION LOCAL  DEFAULT   13
         6: 00001754     0 SECTION LOCAL  DEFAULT   14
         7: 0000175c     0 SECTION LOCAL  DEFAULT   15
         8: 00001830     0 SECTION LOCAL  DEFAULT   17
         9: 00001878     0 SECTION LOCAL  DEFAULT   19
        10: 00001880     0 SECTION LOCAL  DEFAULT   20
        11: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
        12: 000006d0     0 NOTYPE  GLOBAL DEFAULT   11 _fini
        13: 00009830     0 NOTYPE  GLOBAL DEFAULT  ABS _gp
        14: 00001880     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
        15: 00000000   356 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.10 (2)
        16: 00001830     4 OBJECT  GLOBAL DEFAULT   17 j
        17: 000005e0    96 FUNC    GLOBAL DEFAULT   10 func_b
        18: 00001884     0 NOTYPE  GLOBAL DEFAULT  ABS _end
        19: 00001880     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
        20: 000003d8     0 NOTYPE  GLOBAL DEFAULT    8 _init

And for the 'dynamic' linker, I added some relocation code to the machine-dependent function 'elf_machine_rela' in '/nios2gcc4/src/glibc-ports-2.5/sysdeps/nios2/dl-machine.h'.

Unfortunately, this relocation rewrites the 'text' section, so we must set the flags in the ELF 'Program Headers' like

    Program Headers:
      Type     Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      LOAD     0x000000 0x00000000 0x00000000 0x00738 0x00738 RWE 0x1000   <-- Flg must be RWE instead of R E.
      LOAD     0x000738 0x00001738 0x00001738 0x0012c 0x0012d RW  0x1000
      DYNAMIC  0x00074c 0x0000174c 0x0000174c 0x000c8 0x000c8 RW  0x4

I can't do this through the linker yet, so for now I rewrote it with a 'binary editor'.

Anyway, we can build shared libraries from position-dependent code, but we need the switches '-Bsymbolic' and '-G0' (the latter is needed to avoid the use of the gp register for optimization) instead of '-fPIC'. I'm not sure whether this direction is right or not.

 

Kazu
Altera_Forum
Honored Contributor II
719 Views

Sorry, but I don't understand the details, and in fact I'm not really sure what difference this update makes.  

 

Did you fix a problem ? 

In what cases does this problem hit ? (Kernel basic code, Kernel Modules (which i learned would not work before), user land programs, user land so's) ? 

 

 

Will this get included into the distribution ?  

How ?  

Is MontaVista and the other commercial provider aware of this ? Why does it not hit them ? 

 

Thanks, 

-Michael
Altera_Forum
Honored Contributor II
719 Views

 

--- Quote Start ---  

The easiest way to check it is to have caches more than 4KB and run Linux with MMU version on it. But I'm sorry, I don't have enough time to do it. 

 

Kazu 

--- Quote End ---  

 

As mentioned earlier in the thread, I've done this, and it runs, but ethernet does not work.
Altera_Forum
Honored Contributor II
719 Views

Hi, 

 

 

--- Quote Start ---  

As mentioned earlier in the thread, I've done this, and it runs, but ethernet does not work. 

--- Quote End ---  

 

 

The thing is not so easy, because the kernel code itself does not use MMU's address conversion mechanism. So we need an adequate user-land program to stick the pinhole. 

 

Kazu
Altera_Forum
Honored Contributor II
719 Views

Hi, 

 

 

--- Quote Start ---  

 

Sorry, but I don't understand the details, and in fact I'm not really sure what difference this update makes.  

 

--- Quote End ---  

 

 

With this new (static) linker, you can build shared libraries from pre-compiled position-dependent code, and the new dynamic linker can relocate them properly.

For example, if you compile a sample program like

    /** a.c --- Test for Nios Dynamic Linker **/
    extern int func_b(int);
    extern int j;

    int func_a(int i)
    {
        j = func_b(i);
        return j;
    }

with the command

    nios2-wrs-linux-gnu-gcc -c -G0 -g a.c -o a.o

the compiler will generate relocation information for its position-dependent code as follows.

    Relocation section '.rela.text' at offset 0x7b0 contains 5 entries:
     Offset     Info    Type               Sym.Value  Sym. Name + Addend
    00000018  00000e04 R_NIOS2_CALL26      00000000   func_b + 0
    00000020  00000f0b R_NIOS2_HIADJ16     00000000   j + 0
    00000024  00000f0a R_NIOS2_LO16        00000000   j + 0
    0000002c  00000f0b R_NIOS2_HIADJ16     00000000   j + 0
    00000030  00000f0a R_NIOS2_LO16        00000000   j + 0

But the old (static) linker cannot pass this information on to the shared library headers.

The code added to the new linker does this.

    Relocation section '.rela.dyn' at offset 0x18c contains 5 entries:
     Offset     Info    Type               Sym.Value  Sym. Name + Addend
    0000021c  00000404 R_NIOS2_CALL26      00000000   func_b + 0
    00000224  0000020b R_NIOS2_HIADJ16     00000000   j + 0
    00000228  0000020a R_NIOS2_LO16        00000000   j + 0
    00000230  0000020b R_NIOS2_HIADJ16     00000000   j + 0
    00000234  0000020a R_NIOS2_LO16        00000000   j + 0

    Relocation section '.rela.plt' at offset 0x1c8 contains 1 entries:
     Offset     Info    Type               Sym.Value  Sym. Name + Addend
    000012fc  00000426 R_NIOS2_JUMP_SLOT   00000000   func_b + 0

The new dynamic linker 'ld.so.1' will do the relocation for these new items, 'R_NIOS2_CALL26', 'R_NIOS2_HIADJ16' and 'R_NIOS2_LO16'.
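As a rough illustration of what relocating an 'R_NIOS2_HIADJ16' / 'R_NIOS2_LO16' pair involves, here is a minimal sketch in plain C. It assumes the usual Nios II convention that the low half is added as a sign-extended 16-bit immediate, so the high half is adjusted by bit 15, and that the immediate sits in bits 21..6 of an I-type instruction word. All demo_* names are hypothetical. This is not the code from the attached patch or from glibc's dl-machine.h.

```c
#include <stdint.h>

/* %hiadj: upper 16 bits, plus 1 if bit 15 is set, to compensate for
 * the sign extension of the low half when it is added later. */
static uint16_t demo_hiadj16(uint32_t value)
{
    return (uint16_t)(((value >> 16) + ((value >> 15) & 1u)) & 0xffffu);
}

/* %lo: plain lower 16 bits. */
static uint16_t demo_lo16(uint32_t value)
{
    return (uint16_t)(value & 0xffffu);
}

/* What the instruction pair computes at run time:
 * (hiadj << 16) + sign_extend(lo). */
static uint32_t demo_recombine(uint16_t hiadj, uint16_t lo)
{
    uint32_t ext = lo;
    if (lo & 0x8000u)
        ext |= 0xffff0000u;      /* manual sign extension */
    return ((uint32_t)hiadj << 16) + ext;
}

/* Replace the 16-bit immediate field (bits 21..6) of an I-type
 * instruction word, leaving the other fields untouched. */
static uint32_t demo_patch_imm16(uint32_t insn, uint16_t imm)
{
    return (insn & ~(0xffffu << 6)) | ((uint32_t)imm << 6);
}
```

A dynamic linker would compute the final symbol address (load base plus symbol value plus addend), then patch the first instruction with demo_hiadj16() of it and the second with demo_lo16() of it. Because this writes into instruction words, it is consistent with the earlier remark that the text segment has to be writable.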

 

 

--- Quote Start ---  

 

Did you fix a problem ? 

 

--- Quote End ---  

 

Yes, but only partially.

 

 

--- Quote Start ---  

 

In what cases does this problem hit ? (Kernel basic code, Kernel Modules (which i learned would not work before), user land programs, user land so's) ?

 

--- Quote End ---  

 

These modifications apply only to user-land so's. The newly added code never takes effect without the two switches '-shared' and '-Bsymbolic'.

 

 

--- Quote Start ---  

 

Will this get included into the distribution ?  

How ?  

 

--- Quote End ---  

 

I don't know, because I'm only a Sunday programmer. 

 

 

--- Quote Start ---  

 

Is MontaVista and the other commercial provider aware of this ? Why does it not hit them ? 

 

--- Quote End ---  

 

I really don't know who wrote this code. But I have some doubt whether the implementer did his job seriously or not. The code is messy and confused and has many unfinished parts. Sometimes I encounter unbelievable comments like

    /* The runtime resolver receives the original function arguments in r4
       through r7, the shared library identifier from GOT[1]? in r14, and the
       relocation index times four in r15. It updates the corresponding PLT GOT
       entry so that the PLT entry will transfer control directly to the target
       in the future, and then transfers control to the target. */

Why is there a '?' after GOT[1] ? It seems to me that this implementer doesn't know the structure of the GOT well.

 

Kazu
Altera_Forum
Honored Contributor II
719 Views

 

--- Quote Start ---  

Hi, 

 

 

 

I have a little bit doubt that Nios CPU with MMU can't have over 4Kbytes caches. There are several ways of connecting method for CPU, cache and MMU. For example, 

 

1) CPU -- cache -- MMU -- Memory 

2) CPU -- MMU -- cache -- Memory. 

 

The first method has low latency, but also has 'synonym problems'. The second method accesses the cache by physical addresses, but has larger latency to the contrary. So I think that Nios CPU takes the next strategy i.e. 

 

3) CPU |--cache --| -- Memory 

............|--MMU --| 

 

by limiting the size of cache under page size.(Please refer Nios handbook n2cpu_nii5v1.pdf, page 2-10, Figure 2-2.) 

If so, we can't have both instruction and data caches larger than page size(=4Kbytes). 

 

Kazu 

--- Quote End ---  

 

 

I found the relevant section, page 3-53 of the handbook: 

 

--- Quote Start ---  

Virtual Address Aliasing 

A virtual address alias occurs when two virtual addresses map to the same physical address. When an MMU and caches are present and the caches are larger than a page (4 KBytes), the operating system must prevent illegal virtual address aliases. Because the caches are virtually-indexed and physically-tagged, a portion of the virtual address is used to select the cache line. If the cache is 4 KBytes or less in size, the portion of the virtual address used to select the cache line fits with bits 11:0 of the virtual address which have the same value as bits 11:0 of the physical address (they are untranslated bits of the page offset). However, if the cache is larger than 4 KBytes, bits beyond the page offset (bits 12 and up) are used to select the cache line and these bits are allowed to be different than the corresponding physical address.

For example, in a 64 KByte direct-mapped cache with a 16-byte line, bits 15:4 are used to select the line. Assume that virtual address 0x1000 is mapped to physical address 0xF000 and virtual address 0x2000 is also mapped to physical address 0xF000. This is an illegal virtual address alias because accesses to virtual address 0x1000 use line 0x1 and accesses to virtual address 0x2000 use line 0x2 even though they map to the same physical address. This results in two copies of the same physical address in the cache. With an n-byte direct-mapped cache, there could be n/4096 copies of the same physical address in the cache if illegal virtual address aliases are not prevented.

The bits of the virtual address that are used to select the line and are translated bits (bits 12 and up) are known as the color of the address. An operating system avoids illegal virtual address aliases by ensuring that if multiple virtual addresses map the same physical address, the virtual addresses have the same color. Note though, the color of the virtual addresses does not need to be the same as the color as the physical address because the cache tag contains all the bits of the PFN.

 

--- Quote End ---  

 

The question is whether this support is implemented in Linux, and what would be required to fix it.
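To make the handbook's 'color' rule concrete, here is a small sketch in plain C. It assumes a direct-mapped cache whose size is a power of two and a 4 KByte page, as in the quoted text. The demo_* names are hypothetical and do not come from the kernel.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u   /* 4 KByte pages, as in the handbook text */

/* Colour: the cache index bits that lie above the page offset.
 * cache_size must be a power of two. For caches of 4 KBytes or less
 * this is always 0, so no alias can be illegal. */
static uint32_t demo_colour(uint32_t vaddr, uint32_t cache_size)
{
    return (vaddr & (cache_size - 1u)) >> DEMO_PAGE_SHIFT;
}

/* Two virtual addresses that map the same physical page form an
 * illegal alias when their colours differ. Returns 1 if illegal. */
static int demo_is_illegal_alias(uint32_t va1, uint32_t va2,
                                 uint32_t cache_size)
{
    return demo_colour(va1, cache_size) != demo_colour(va2, cache_size);
}
```

With a 64 KByte cache, 0x1000 and 0x2000 get colours 1 and 2, which matches the illegal alias in the handbook example, while 0x1000 and 0x11000 share colour 1 and would be allowed. With a 4 KByte cache every address has colour 0.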
Altera_Forum
Honored Contributor II
719 Views

"virtually-indexed and physically-tagged" ??? 

 

How does this compare to ARM, that uses the cache and the MMU completely in the "wrong" order (strictly using physical addresses in the cache). Because of that, ARM-Linux needs to flush the cache completely with any task-switch. That is why for ARM systems with many task switches, not using the MMU is recommended. 

 

I sincerely hope that such a drastic method is not necessary with NIOS ! 

 

Happily, I myself am planning a heavily multithreaded system, so there will be less MMU reprogramming (and cache invalidating) than with a heavily multitasking system. This is only possible with the MMU, as the non-MMU compiler does not support TLS, which is essential for decent multithreaded applications.

 

-Michael
Altera_Forum
Honored Contributor II
719 Views

Hi, 

 

 

--- Quote Start ---  

 

"virtually-indexed and physically-tagged" ??? 

 

--- Quote End ---  

 

 

So, I think that 

 

3) CPU |--cache --| -- Memory 

............|--MMU --| 

 

is 'Bingo'. 

 

 

--- Quote Start ---  

 

How does this compare to ARM, that uses the cache and the MMU completely in the "wrong" order (strictly using physical addresses in the cache). Because of that, ARM-Linux needs to flush the cache completely with any task-switch. That is why for ARM systems with many task switches, not using the MMU is recommended. 

 

I sincerely hope that such a drastic method is not necessary with NIOS ! 

 

--- Quote End ---  

 

 

In NIOS, you don't need to flush the cache for each task-switch. But the TLB uses a PID mechanism to distinguish each user task,

    void set_mmu_pid(unsigned long pid)
    {
        WRCTL(CTL_TLBMISC, (RDCTL(CTL_TLBMISC) & (WAY_MASK << WAY_SHIFT)) | ((pid & PID_MASK) << PID_SHIFT));
    }

so TLB flush and loading will occur automatically.

 

Kazu
Altera_Forum
Honored Contributor II
719 Views

Sounds good.  

 

Is the software part already implemented in the Kernel ? 

 

Does it work ?  

 

Decent performance ?  

 

Thanks, 

-Michael
Altera_Forum
Honored Contributor II
719 Views

Hi, 

 

 

--- Quote Start ---  

 

Is the software part already implemented in the Kernel ? 

Does it work ?  

 

--- Quote End ---  

 

 

Yes, of course, we are using these. The Nios TLB control parts are in the files 'mmu_context.c' and 'tlb.c' under '/nios2-linux/linux-2.6/arch/nios2/mm/', and the cache flush functions are in '/nios2-linux/linux-2.6/arch/nios2/mm/cacheflush.c'.

But please note that the cache size is limited to 4KB. In the present code, there is no mechanism to avoid the alias problem.

 

 

--- Quote Start ---  

 

Decent performance ?  

 

--- Quote End ---  

 

 

I don't have any concrete performance data, but I think it is not so bad. For example, the FLTK demo 'editor' works well. It uses 'Bitblit' functionality, which is a heavy task for the Nios CPU and its MMU, but the scrolling speed is not so slow.

 

Kazu
Altera_Forum
Honored Contributor II
719 Views

 

--- Quote Start ---  

But please note that the cache size is limited within 4KB. In present codes, there is no mechanism to avoid the alias problem. 

--- Quote End ---  

So it is not correctly implemented. All Altera example designs use much more cache ! 

 

What should we do about that ?  

 

I feel that a 4K cache will degrade performance a lot. I did not test this thoroughly, but I once did a speed test with a uClinux design vs a full Linux design and found that the full Linux design was much slower (up to half speed). I'm not sure about the cache sizes, though.

 

-Michael
Altera_Forum
Honored Contributor II
719 Views

Actually, I have it working with 32KB caches now, I'm not sure what changed (been through a few FPGA updates since I last tried it). I think at least some support for the aliasing problem is there actually - see cache_flush.c, syscall.c, Documentation/cachetlb.txt. 

 

I did make this change, but haven't noticed any difference with or without it, but it seems more correct for the COLOUR_ALIGN macro in syscall.c and by the documentation in cachetlb.txt: 

    --- a/arch/nios2/include/asm/shmparam.h
    +++ b/arch/nios2/include/asm/shmparam.h
    @@ -1 +1,2 @@
    -#include <asm-generic/shmparam.h>
    +#include <asm/nios.h>
    +#define SHMLBA DCACHE_SIZE
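For readers wondering why SHMLBA matters here, the sketch below shows the kind of arithmetic a COLOUR_ALIGN-style macro performs. It follows the form commonly seen in other architectures' kernel code. I have not checked that the nios2 syscall.c macro is identical, so treat the formula, the 32 KB value and the demo_* names as assumptions for illustration only.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u
#define DEMO_SHMLBA     0x8000u   /* assumed: equal to a 32 KB D-cache */

/* Round addr up to an SHMLBA boundary, then add the colour implied by
 * the file page offset. Every mapping of the same file page then ends
 * up with the same cache colour, whatever address hint was given. */
static uint32_t demo_colour_align(uint32_t addr, uint32_t pgoff)
{
    uint32_t base = (addr + DEMO_SHMLBA - 1u) & ~(DEMO_SHMLBA - 1u);
    uint32_t off = (pgoff << DEMO_PAGE_SHIFT) & (DEMO_SHMLBA - 1u);
    return base + off;
}
```

With the generic SHMLBA of one page the offset term is always zero, so no colouring happens. Setting SHMLBA to the D-cache size is what lets shared mappings of the same page line up on the same cache index.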
Altera_Forum
Honored Contributor II
719 Views

Hi, 

 

 

--- Quote Start ---  

 

So it is not correctly implemented. All Altera example designs use much more cache ! 

What should we do about that ?  

 

--- Quote End ---  

 

 

I'm not sure whether the Wind River guys implemented a correct mechanism to deal with the 'alias problem'. So give me some time to check it.

 

 

--- Quote Start ---  

 

I feel that a 4K cache will degrade performance a lot. I did not test this thoroughly, but I once did a speed test with a uClinux design vs a full Linux design and found that the full Linux design was much slower (up to half speed). I'm not sure about the cache sizes, though.

 

--- Quote End ---  

 

 

Please take into account that the kernel with MMU must do much more work than the no-MMU version, and that we get many excellent features in exchange for the lower performance.

 

Kazu
Altera_Forum
Honored Contributor II
719 Views

Hi, 

 

 

--- Quote Start ---  

 

I think at least some support for the aliasing problem is there actually - see cache_flush.c, syscall.c, Documentation/cachetlb.txt. 

 

--- Quote End ---  

 

 

Oh, I'm sorry, my statement "there is no mechanism to avoid the alias problem" was an overstatement. There are traces of an implementation for the 'alias problem', but I'm not sure whether they work well or not. As mentioned in 'Documentation/cachetlb.txt', the 'alias problem' affects only the D-cache. If we have 'alias' copies of the same physical address contents, we must take special care to flush them from the D-cache. But whether we will have an 'alias' or not depends on what the Linux kernel does, so the problem is a little bit difficult. It seems to me that the functions 'copy_from_user_page' and 'copy_to_user_page' will work well

    void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
                             unsigned long user_vaddr, void *dst, void *src, int len)
    {
        flush_cache_page(vma, user_vaddr, page_to_pfn(page));
        memcpy(dst, src, len);
        flush_dcache_range((unsigned long)src, (unsigned long)src+len);
        if(vma->vm_flags & VM_EXEC) {
            flush_icache_range((unsigned long)src, (unsigned long)src+len);
        }
    }

    void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
                           unsigned long user_vaddr, void *dst, void *src, int len)
    {
        flush_cache_page(vma, user_vaddr, page_to_pfn(page));
        memcpy(dst, src, len);
        flush_dcache_range((unsigned long)dst, (unsigned long)dst+len);
        if(vma->vm_flags & VM_EXEC) {
            flush_icache_range((unsigned long)dst, (unsigned long)dst+len);
        }
    }

because these only involve the 'alias problem' between user-land virtual addresses and the kernel's. But I'm still not sure about the case of maps shared among user-land processes.

 

Kazu
Altera_Forum
Honored Contributor II
719 Views

 

--- Quote Start ---  

So give me some time to check it. 

--- Quote End ---  

 

Great ! Thanks, 

-Michael
Altera_Forum
Honored Contributor II
719 Views

Hi, kazu & Gurus 

I'm working on using the LCD in a nios2-mmu system on the NEEK. I followed the instructions on nios-wiki to configure the kernel and modified altfb.c following Kazu's advice above. The result on my Linux is that the "fb_test" command runs successfully:

    The framebuffer device was opened successfully.
    800*480, 32bpp
    The framebuffer device was mapped to memory successfully.

but I cannot run Nano-X:

    cantnot bind to named socket

and the image I ran is "linux.initramsfs.gz".

 

I need your help, please give me some advice and thank you.
Altera_Forum
Honored Contributor II
716 Views

Hi. 

 

 

--- Quote Start ---  

 

cantnot bind to named socket  

--- Quote End ---  

 

 

First, please check that your kernel includes 'Unix domain sockets'.

    Networking support --->
      --- Networking support
          Networking options --->
            <*> Packet socket
                Packet socket: mmapped IO
            <*> Unix domain sockets

Kazu
Altera_Forum
Honored Contributor II
716 Views

Hi, Kazu  

Thank you for your response, the problem has been solved by your advice. But the new confusion is that nothing appears on the screen, although Nano-X seems to be running correctly:

    /# nano-X &
    686 nano-X
    /# nanowm &
    687 nanowm
    /# nxclock &
    688 nxclock

And the Linux init messages are below

    $ nios2-terminal
    nios2-terminal: connected to hardware target using JTAG UART on cable
    nios2-terminal: "USB-Blaster ", device 1, instance 1
    nios2-terminal: (Use the IDE stop button or Ctrl-C to terminate)

    Linux version 2.6.30-00494-g84a224b-dirty (alex@alex-desktop) (gcc version 4.1.2) #38 Thu Aug 12 16:36:39 CST 2010
    console enabled
    Early printk initialized
    Linux/Nios II-MMU
    init_bootmem_node(?,0x50c, 0x0, 0x2000)
    free_bootmem(0x50c000, 0x1af4000)
    reserve_bootmem(0x50c000, 0x400)
    Built 1 zonelists in Zone order, mobility grouping on. Total pages: 8128
    Kernel command line:
    NR_IRQS:32
    PID hash table entries: 128 (order: 7, 512 bytes)
    Console: colour dummy device 80x25
    Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
    Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
    We have 8192 pages of RAM
    Memory available: 27300k/5165k RAM, 0k/0k ROM (1577k kernel code, 3588k data)
    Calibrating delay loop... 49.25 BogoMIPS (lpj=246272)
    Mount-cache hash table entries: 512
    net_namespace: 296 bytes
    NET: Registered protocol family 16
    init_BSP(): registering device resources
    bio: create slab <bio-0> at 0
    NET: Registered protocol family 2
    IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
    TCP established hash table entries: 1024 (order: 1, 8192 bytes)
    TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
    TCP: Hash tables configured (established 1024 bind 1024)
    TCP reno registered
    NET: Registered protocol family 1
    msgmni has been set to 53
    io scheduler noop registered
    io scheduler anticipatory registered
    io scheduler deadline registered
    io scheduler cfq registered (default)
    fb0: Altera FB frame buffer device
    ttyJ0 at MMIO 0x8001410 (irq = 7) is a Altera JTAG UART
    console handover: boot -> real
    altps2 : base e8001420 irq 9
    mice: PS/2 mouse device common for all mice
    TCP cubic registered
    NET: Registered protocol family 17
    atkbd.c: keyboard reset failed on altps2.0
    Welcome to
    [uClinux ASCII-art banner]
    For further information check:
    http://www.uclinux.org/

    BusyBox v1.15.1 (2010-08-12 16:35:17 CST) hush - the humble shell
    Enter 'help' for a list of built-in commands.

Please give me some advice when you are free, and thank you again.

 

--Smarter.UJS
Altera_Forum
Honored Contributor II
716 Views

Hi, 

 

 

--- Quote Start ---  

 

But the new confusion was that nothing appeared on the screen and Nano-X seemed running correctly:

    /# nano-X &
    686 nano-X
    /# nanowm &
    687 nanowm
    /# nxclock &
    688 nxclock

--- Quote End ---  

 

 

Have you checked that all the DMA descriptors use physical addresses? For example,

    } __attribute__ ((packed)) sgdma_desc;

    #include <asm/cacheflush.h> // <-- to use 'flush_cache_all()'

    static int altfb_dma_start(unsigned long start, unsigned long len)
    {
        unsigned long base = (unsigned long)ioremap(SGDMABASE, ALTERA_SGDMA_IO_EXTENT);
        sgdma_desc *desc, *desc1;
        int ndesc = (len + DISPLAY_BYTES_PER_DESC - 1) / DISPLAY_BYTES_PER_DESC;
        int ndesc_size = sizeof(sgdma_desc) * ndesc;
        int i;

        writel(ALTERA_SGDMA_CONTROL_SOFTWARERESET_MSK, base + ALTERA_SGDMA_CONTROL); /* halt current transfer */
        writel(0, base + ALTERA_SGDMA_CONTROL); /* disable interrupts */
        writel(0xff, base + ALTERA_SGDMA_STATUS); /* clear status */

        /* assume cache line size is 32, which is required by sgdma desc */
        desc1 = kzalloc(ndesc_size, GFP_KERNEL);
        if (desc1 == NULL)
            return -ENOMEM;
        // desc1 = ioremap((unsigned long)desc1, ndesc_size);

        for (i = 0, desc = desc1; i < ndesc; i++, desc++) {
            unsigned ctrl = ALTERA_SGDMA_DESCRIPTOR_CONTROL_OWNED_BY_HW_MSK;
            desc->read_addr = (void *)start;
            if (i == (ndesc - 1)) {
                // desc->next = (void *)desc1;
                desc->next = (void *)virt_to_phys(desc1);
                desc->bytes_to_transfer = len;
                ctrl |= ALTERA_SGDMA_DESCRIPTOR_CONTROL_GENERATE_EOP_MSK;
            } else {
                // desc->next = (void *)(desc + 1);
                desc->next = (void *)virt_to_phys((desc + 1));
                desc->bytes_to_transfer = DISPLAY_BYTES_PER_DESC;
            }
            if (i == 0)
                ctrl |= ALTERA_SGDMA_DESCRIPTOR_CONTROL_GENERATE_SOP_MSK;
            desc->control = ctrl;
            start += DISPLAY_BYTES_PER_DESC;
            len -= DISPLAY_BYTES_PER_DESC;
        }

        // writel((unsigned long)desc1, base + ALTERA_SGDMA_NEXT_DESC_POINTER);
        writel(((unsigned long)virt_to_phys(desc1)), base + ALTERA_SGDMA_NEXT_DESC_POINTER);
        writel(ALTERA_SGDMA_CONTROL_RUN_MSK | ALTERA_SGDMA_CONTROL_PARK_MSK, base + ALTERA_SGDMA_CONTROL); /* start */
        flush_cache_all(); // <- To flush descriptors from the cache. Indeed, it's enough to flush the D-cache.
        return 0;
    }
    #else
    static int altfb_dma_start(unsigned long start, unsigned long len)

Kazu
Altera_Forum
Honored Contributor II
716 Views

Oh, sorry. There is a race in that code.

The descriptors must be flushed before we restart the DMA.

        // writel((unsigned long)desc1, base + ALTERA_SGDMA_NEXT_DESC_POINTER);
        writel(((unsigned long)virt_to_phys(desc1)), base + ALTERA_SGDMA_NEXT_DESC_POINTER);
        flush_cache_all(); // <- Must be here.
        writel(ALTERA_SGDMA_CONTROL_RUN_MSK | ALTERA_SGDMA_CONTROL_PARK_MSK, base + ALTERA_SGDMA_CONTROL); /* start */
        // flush_cache_all(); // <- To flush descriptors from the cache. Indeed, it's enough to flush the D-cache.
        return 0;
    }

Kazu