Developing Games on Intel Graphics

3D engine

Anonymous
Not applicable
54,872 Views

Hello there,

OK, here we go. I have a dream: to write a 3D engine in 100% Intel assembly, running on the CPU only. For now I only use rotation matrices.

It works, of course, but it's slow when I draw a lot of pixels.

Recently I decided to add voxels to my engine, and it gets slow at 8000 or more voxels (a 20 * 20 * 20 cube). When I saw NVIDIA display 32M voxels (fire), I wondered how they do it!

I have a rough idea of the reason: MMU, paging, segmentation, memory.

Am I right?

Another question: is the FPU slower than SSE at floating-point computation, or does it depend on the data being manipulated?

PS: I work without an OS such as Windows or Linux; I run on my own kernel and bootloader, also written in assembly with NASM.

Sorry if my English is not good; I'm French and I use Google Translate ^-^

0 Kudos
1 Solution
Bradley_W_Intel
Employee
54,442 Views

You clearly are using the processor in a very advanced way. I will do my best to answer your questions:

1) Why is your voxel engine not able to efficiently render as many voxels as you'd like? Voxel engines need to maximize their use of parallelism (both threading and SIMD) and also to store the data efficiently in an octree or some other structure that can handle sparse data. If you are doing all these things and still not getting the performance you expect, it's an optimization problem. Some Intel tools like VTune Performance Analyzer are excellent for performance analysis.

2) Is single data floating point math faster than SIMD (if I understood you)? Typically SIMD will be faster than single data instructions if your data is laid out in a way that supports the SIMD calls. In all cases, the only way for you to know for certain which way is faster is to test it.

3) How can you select between discrete and processor graphics? DirectX has methods of enumerating adapters. In such a case, the processor graphics is listed separately from the discrete graphics. If you are choosing your adapter based on the amount of available memory, you may be favoring the processor graphics when you didn't intend to. Intel has sample code that shows how to properly detect adapters in DirectX at https://software.intel.com/en-us/vcsource/samples/gpu-detect. The process for OpenGL is not well documented.

4) Can I use one processor to control execution of a second processor? Probably not. The details on Intel processors are covered at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html. It's possible, though unlikely, that you'll be able to find something in there that can help you.
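
To make answer 2 concrete, here is a minimal sketch (not code from this thread) of what "laid out in a way that supports the SIMD calls" can mean: the coordinates are stored as separate x[] and y[] arrays, so one SSE instruction processes four points at a time. The function names, the structure-of-arrays layout and the 2D rotation used as the workload are all assumptions made for illustration. Whether the SSE version is actually faster on a given machine still has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar reference: rotate n points (x[i], y[i]) by an angle whose
   cosine and sine are c and s. */
static void rotate_scalar(const float *x, const float *y,
                          float *ox, float *oy, size_t n,
                          float c, float s)
{
    for (size_t i = 0; i < n; i++) {
        ox[i] = x[i] * c - y[i] * s;
        oy[i] = x[i] * s + y[i] * c;
    }
}

/* SSE version: same arithmetic, four points per iteration.
   Assumes n is a multiple of 4 (a simplification for this sketch). */
static void rotate_sse(const float *x, const float *y,
                       float *ox, float *oy, size_t n,
                       float c, float s)
{
    assert(n % 4 == 0);
    const __m128 vc = _mm_set1_ps(c);   /* broadcast cosine to 4 lanes */
    const __m128 vs = _mm_set1_ps(s);   /* broadcast sine to 4 lanes */
    for (size_t i = 0; i < n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(ox + i,
                      _mm_sub_ps(_mm_mul_ps(vx, vc), _mm_mul_ps(vy, vs)));
        _mm_storeu_ps(oy + i,
                      _mm_add_ps(_mm_mul_ps(vx, vs), _mm_mul_ps(vy, vc)));
    }
}
```

If the points were stored interleaved as x0, y0, x1, y1, ... the SSE loop would first need shuffles to separate the components, which is the kind of layout cost the answer refers to.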

 


0 Kudos
270 Replies
Anonymous
Not applicable
3,542 Views

phi.x, phi.y and phi.z mean phi of x, ...

(phi = angle)

0 Kudos
Bernard
Valued Contributor I
3,542 Views

shaynox s. wrote:

Back. Is there an instruction to truncate a SIMD register directly?

I use this code to do that, unfortunately:

			cvtps2pi  	mm0, xmm0	
			cvtpi2ps	xmm0, mm0

 

Truncation is part of the conversion (cast) machine instruction itself and is performed in hardware.
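
For reference, here is a sketch of how the truncation asked about above can be done without the MMX round trip, written with C intrinsics rather than assembly. It assumes SSE2 is available: `_mm_cvttps_epi32` (the cvttps2dq instruction) converts four floats to integers with truncation toward zero, and `_mm_cvtepi32_ps` (cvtdq2ps) converts them back. The helper name `truncate4` is made up for this example. Note that the conversions without the extra "t", such as cvtps2pi, round according to the MXCSR rounding mode (round-to-nearest by default), so they only truncate if that mode has been changed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Truncate four packed floats toward zero, staying in XMM registers.
   Only meaningful for values that fit in a signed 32-bit integer. */
static __m128 truncate4(__m128 v)
{
    __m128i as_int = _mm_cvttps_epi32(v);  /* cvttps2dq: float -> int32, truncating */
    return _mm_cvtepi32_ps(as_int);        /* cvtdq2ps:  int32 -> float */
}
```

On CPUs with SSE4.1, the roundps instruction with the truncate rounding mode does the same in a single instruction and without the 32-bit range limit.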

0 Kudos
Bernard
Valued Contributor I
3,542 Views

>>>Do you know any IDE with icc ? >>>

If you are on Windows you can use Visual Studio.

0 Kudos
Bernard
Valued Contributor I
3,542 Views

>>>But the asm write by GCC without built-in is still fill by fpu instruction>>>

Do you mean x87 machine code instructions? What is the target CPU for GCC compiled C code?

0 Kudos
Bernard
Valued Contributor I
3,542 Views

>>>Do SDL interact with gpu for execute the clearing of screen ? i'm lost>>>

What does the variable PhysBasePtr point to? It seems to be some kind of pointer to a base address whose target is later zeroed using the xorps instruction. I suppose that PhysBasePtr could point to some kind of physical buffer which is later accessed by the GPU (some kind of host-to-GPU circular buffer).

0 Kudos
Anonymous
Not applicable
3,542 Views

Hello, the target is an Intel Core i7 CPU, and here is a sample of the asm code generated by GCC:

		LVL3:
			fmul	dword [_rotation_z]
		LVL4:
			fsincos
		LVL5:
			fld	st(4)
		LVL6:
			fxch	st(1)
		LVL7:
			fst		dword [esp+4]
			fmulp	st(1), st
			fmul	dword [esp+24]
			fld		st(4)
			fmul	st, st(2)
			fmul	dword [esp+28]
			fsubp	st(1), st

I think I found the cause of the crash: the roll block works because I don't store the value (z) in RAM, whereas in the other blocks (pitch and yaw) I store it with

movd    [_x], xmm0

movd    [_y], xmm4

 

Yes, PhysBasePtr is a pointer to the first pixel (top left), returned by the first function below. Honestly, I don't know how this mapping is done or how the GPU accesses this buffer:

 

				; Return VBE Mode Information
					mov		ax, 0x4F01
					mov		cx, 0x115			; Mode number
					mov		di, ModeInfoBlock   ; Pointer to ModeInfoBlock structure. Scroll up for read all structure
					int		0x10
				; Set VBE Mode
					mov		ax, 0x4F02
					mov		bx, 0_1_0_0_0_0_0_1_0_0_0_1_0_1_0_1b
						; 	| | | | | | | `-`-`-`-`-`-`-`-`----- Mode number: 0x115 = 800 * 600
						; 	| | | | | `-`---------------------- Reserved (must be 0)
						;   | | | | `------------------------- 0 = Use current default refresh rate, 1 = Use user specified CRTC values for refresh rate
						; 	| | `-`-------------------------- Reserved for VBE/AF (must be 0)
						; 	| `----------------------------- 0 = Use windowed frame buffer model, 1 = Use linear/flat frame buffer model
						; 	`------------------------------ 0 = Clear display memory, 1 = Don't clear display memory
					mov		di, CRTCInfoBlock
					int		0x10

 

Otherwise, I wrote this algorithm to find the LFB pointer without calling this function:

		; Algo: LFB = 0x40000000 + ((RAM_SIZE_Go - 1) * 0x40000000)   (RAM_SIZE_Go = RAM size in GB)
			;
			; If RAM == 1 GB { LFB = 0x040000000 }
			;                                     + 0x40000000
			; If RAM == 2 GB { LFB = 0x080000000 }
			;                                     + 0x40000000
			; If RAM == 3 GB { LFB = 0x0C0000000 }
			;                                     + 0x40000000
			; If RAM == 4 GB { LFB = 0x100000000 }
			;                                     + 0x40000000
			; If RAM == 5 GB { LFB = 0x140000000 }
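
The RAM-size formula above is a heuristic: the linear framebuffer address is chosen by the firmware and is not guaranteed to follow the amount of installed RAM. As a hedged alternative, the sketch below reads the address from the 256-byte ModeInfoBlock that function 0x4F01 fills in. The field offsets used (BytesPerScanLine at 16, XResolution at 18, YResolution at 20, BitsPerPixel at 25, PhysBasePtr at 40) follow the VBE 2.0/3.0 layout as I understand it and should be checked against the specification. The struct and function names are invented for this example, and it is written in C for readability even though the kernel itself is NASM.

```c
#include <stdint.h>

/* Subset of the VBE ModeInfoBlock that a software renderer needs
   (hypothetical struct, not part of the posted project). */
struct vbe_mode_summary {
    uint16_t pitch;        /* BytesPerScanLine */
    uint16_t width;        /* XResolution */
    uint16_t height;       /* YResolution */
    uint8_t  bpp;          /* BitsPerPixel */
    uint32_t lfb_address;  /* PhysBasePtr */
};

/* Read little-endian values byte by byte, so alignment does not matter. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Extract the fields from the raw block returned by INT 0x10, AX=0x4F01. */
static struct vbe_mode_summary parse_mode_info(const uint8_t block[256])
{
    struct vbe_mode_summary m;
    m.pitch       = read_le16(block + 16);
    m.width       = read_le16(block + 18);
    m.height      = read_le16(block + 20);
    m.bpp         = block[25];
    m.lfb_address = read_le32(block + 40);
    return m;
}
```

Under the same offset assumption, the NASM equivalent is a single load such as `mov edi, [ModeInfoBlock + 40]`.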

 

This pointer is only available in VESA mode. For SDL I don't know how it is managed, but the principle is the same: you work with a pointer:

SDL_Surface *screen;
screen = SDL_SetVideoMode( LENGTH, WIDTH, 32, SDL_HWSURFACE );
int *ptr_pixel = screen->pixels;
*(ptr_pixel) = 0xFF0000FF;      //blue

 

0 Kudos
Anonymous
Not applicable
3,542 Views

Here's my function for matrix rotation:

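#define		REPERE		    ((PITCH * (WIDTH/2 - 1)) + (BPP * (LENGTH/2 - 1)))  // origin (screen centre)
#define		BPP			    4       // bytes per pixel
#define		LENGTH		    800     // x
#define		WIDTH		    600     // y
#define		DEGTORAD(angle) ((angle) * 0.017453292)  // angle * PI/180
#define		RAPPORT			(LENGTH/WIDTH)  // aspect ratio; integer division, so this evaluates to 1
#define		FOCALE          800     // focal length
#define		PROFONDEUR      2000    // depth offset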


float   rotation_x, rotation_y, rotation_z;

void        put_pixel(float x, float y, float z, int pixel)
{
    float   x_end;
    float   y_end;
    float   z_end;
    int     *ptr_pixel;

    float   cx = cos(DEGTORAD(rotation_x));
    float   cy = cos(DEGTORAD(rotation_y));
    float   cz = cos(DEGTORAD(rotation_z));
    float   sx = sin(DEGTORAD(rotation_x));
    float   sy = sin(DEGTORAD(rotation_y));
    float   sz = sin(DEGTORAD(rotation_z));

    x_end = x * ((cy * cz))                       - y * ((cy * sz))                        - z * (sy)     ;
    y_end = x * ((cx * sz)      - (sx * sy * cz)) + y * ((sx * sy * sz) + (cx * cz))       - z * (sx * cy);
    z_end = x * ((cx * sy * cz) + (sx * sz))      + y * ((sx * cz)      - (cx * sy * sz )) + z * (cx * cy);

    x = x_end / RAPPORT;
    y = y_end;
    z = z_end;

    x = (x * FOCALE) / (z + PROFONDEUR);
    y = (y * FOCALE) / (z + PROFONDEUR);

    ptr_pixel = screen->pixels + REPERE - screen->pitch * (int)y + (int)x * BPP;

    if (ptr_pixel > screen->pixels && ptr_pixel < (screen->pixels + LENGTH*WIDTH * BPP))
        if ((x <= LENGTH/2 && x >= -LENGTH/2) && (y <= WIDTH/2 && y >= -WIDTH/2))
            *(ptr_pixel) = pixel;
}
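
The address arithmetic and bounds test at the end of put_pixel can be isolated into one helper. This is only a sketch under assumptions: the origin is taken to be column width/2, row height/2 with y pointing up (the function above uses one pixel less in each direction), and the names `pixel_offset`, `width`, `height` and `pitch` are mine, not the project's. It checks the column and row individually, which also rejects points that would otherwise wrap onto a neighbouring scanline.

```c
/* Map centre-origin coordinates (x to the right, y up) to a byte offset
   into a linear framebuffer. Returns -1 when the point is off screen. */
static long pixel_offset(int x, int y, int width, int height,
                         int pitch, int bytes_per_pixel)
{
    int col = width / 2 + x;    /* 0 .. width-1 when visible */
    int row = height / 2 - y;   /* 0 .. height-1 when visible */

    if (col < 0 || col >= width || row < 0 || row >= height)
        return -1;
    return (long)row * pitch + (long)col * bytes_per_pixel;
}
```

A caller would add the returned offset to the framebuffer base as a byte pointer and skip the write when the result is -1.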

 

0 Kudos
Bernard
Valued Contributor I
3,542 Views

 

I see that your code is calling int 0x10. Does your rendering loop call int 0x10 for every pixel to be displayed?

0 Kudos
Anonymous
Not applicable
3,542 Views

No, I just do this:

			mov		edi, [PhysBasePtr]
		; REPERE + -(PITCH * y + x * BPP)
			mov		[edi + REPERE + eax], ebx

 

0 Kudos
Bernard
Valued Contributor I
3,542 Views

shaynox s. wrote:

Here's my function for matrix rotation:


 

Very nice :)

Which 3D graphics book are you using for all those equations?

Do you really need the parameterized #define macro? It is error-prone. I think the best option would be to use C++ with its constexpr keyword and put all those macro values in a header file with static linkage. The compiler could then calculate them at compile time.

Regarding the ptr_pixel pointer declaration, I usually initialize such pointers to NULL.
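
Since the posted project is C rather than C++, a rough C counterpart of this suggestion is to use enum constants and a static inline function instead of macros. The sketch below is illustrative only: `deg_to_rad`, `aspect_ratio`, `SCREEN_LENGTH` and `SCREEN_WIDTH` are hypothetical names, and `DEGTORAD_UNSAFE` reproduces the unparenthesized macro from the post to show where it goes wrong.

```c
/* Unparenthesized macro, as posted: the argument is pasted in textually. */
#define DEGTORAD_UNSAFE(angle) angle * 0.017453292

/* Integer constants that the compiler and the debugger both know about. */
enum { SCREEN_LENGTH = 800, SCREEN_WIDTH = 600 };

/* Type-checked replacement; the argument is evaluated exactly once. */
static inline float deg_to_rad(float angle)
{
    return angle * 0.017453292f;   /* angle * PI / 180 */
}

/* Aspect ratio as a float: 800 / 600 in integer arithmetic would be 1. */
static inline float aspect_ratio(void)
{
    return (float)SCREEN_LENGTH / (float)SCREEN_WIDTH;
}
```

For example, `DEGTORAD_UNSAFE(90 + 90)` expands to `90 + 90 * 0.017453292`, which is about 91.57, whereas `deg_to_rad(90 + 90)` gives about 3.14159.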

0 Kudos
Anonymous
Not applicable
3,542 Views

And I initialize VESA mode only in 16-bit mode; this code is in 32-bit mode:

		
		; disable the interrupts
			cli				
			
		; Load the GDT
			lgdt	[GDT64]		
	
	; *1:	
		; switch to protected mode		
			mov		eax, cr0 		
			or		al, 0x1		; PE = 1
			mov		cr0, eax
			
		; Disable paging
			mov		eax, cr0 		
			and		eax, 01111111_11111111_11111111_11111111b ; PG = 0
			mov		cr0, eax
		jmp		(CODE32_SELECTOR-GDT64):KERNEL32
	; *

[BITS 32]
KERNEL32:

 

0 Kudos
Anonymous
Not applicable
3,542 Views

previous code*

 

0 Kudos
Bernard
Valued Contributor I
3,542 Views

shaynox s. wrote:

no, i just do that:

			mov		edi, [PhysBasePtr]
		; REPERE + -(PITCH * y + x * BPP)
			mov		[edi + REPERE + eax], ebx

 

So you are writing directly to Video RAM?

0 Kudos
Anonymous
Not applicable
3,542 Views

Thanks ^^

I just expanded the rotation matrix product Rx*Ry*Rz:

Rotation matrix about x, applied to the vector (x, y, z, color):

    |     1             0             0         0 |     |   x   |
    |     0         cos(phi_x)   -sin(phi_x)    0 |     |   y   |
    |     0         sin(phi_x)    cos(phi_x)    0 |  *  |   z   |
    |     0             0             0         1 |     | color |

    Result:
        x' = x
        y' = y.cos(phi_x) + -z.sin(phi_x)
           = y.cos(phi_x) - z.sin(phi_x)
        z' = y.sin(phi_x) + z.cos(phi_x)

Rotation matrix about y, applied to the vector (x, y, z, color):

    | cos(phi_y)        0        -sin(phi_y)    0 |     |   x   |
    |     0             1             0         0 |     |   y   |
    | sin(phi_y)        0         cos(phi_y)    0 |  *  |   z   |
    |     0             0             0         1 |     | color |

    Result:
        x' = x.cos(phi_y) + -z.sin(phi_y)
           = x.cos(phi_y) - z.sin(phi_y)
        y' = y
        z' = x.sin(phi_y) + z.cos(phi_y)

Rotation matrix about z, applied to the vector (x, y, z, color):

    | cos(phi_z)   -sin(phi_z)        0         0 |     |   x   |
    | sin(phi_z)    cos(phi_z)        0         0 |     |   y   |
    |     0             0             1         0 |  *  |   z   |
    |     0             0             0         1 |     | color |

    Result:
        x' = x.cos(phi_z) + -y.sin(phi_z)
           = x.cos(phi_z) - y.sin(phi_z)
        y' = x.sin(phi_z) + y.cos(phi_z)
        z' = z

Rotation matrix about x, y and z (Rx*Ry*Rz), applied to the vector (x, y, z, color).
Here cx = cos(phi_x), sx = sin(phi_x), and likewise cy, sy, cz, sz:

    |       cy*cz               -cy*sz           -sy      0 |     |   x   |
    | -sx*sy*cz + cx*sz     sx*sy*sz + cx*cz    -sx*cy    0 |     |   y   |
    |  cx*sy*cz + sx*sz    -cx*sy*sz + sx*cz     cx*cy    0 |  *  |   z   |
    |         0                    0               0      1 |     | color |

    Result:
        x' = x.(cos(phi_y) * cos(phi_z)) - y.(cos(phi_y) * sin(phi_z)) - z.sin(phi_y)
        y' = x.(cos(phi_x) * sin(phi_z) - sin(phi_x) * sin(phi_y) * cos(phi_z)) + y.(sin(phi_x) * sin(phi_y) * sin(phi_z) + cos(phi_x) * cos(phi_z)) - z.(sin(phi_x) * cos(phi_y))
        z' = x.(cos(phi_x) * sin(phi_y) * cos(phi_z) + sin(phi_x) * sin(phi_z)) + y.(cos(phi_x) * sin(phi_y) * -sin(phi_z) + sin(phi_x) * cos(phi_z)) + z.(cos(phi_x) * cos(phi_y))
           = x.(cos(phi_x) * sin(phi_y) * cos(phi_z) + sin(phi_x) * sin(phi_z)) + y.(sin(phi_x) * cos(phi_z) - cos(phi_x) * sin(phi_y) * sin(phi_z)) + z.(cos(phi_x) * cos(phi_y))
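
Since the combined matrix depends only on the three angles, its nine coefficients can be computed once per frame and reused for every point, instead of evaluating six sines and cosines for each pixel. The sketch below does that in C. It is an illustration, not the author's code: `rot3`, `rot3_make` and `rot3_apply` are made-up names. The caller passes in the cosines and sines (for example from fsincos), so the block itself needs no math library. The coefficients are copied term by term from the combined Result lines above.

```c
/* Row-major 3x3 matrix for the combined rotation Rx * Ry * Rz
   (hypothetical type for this sketch). */
struct rot3 { float m[3][3]; };

/* Build the matrix from the cos/sin of the three angles, once per frame. */
static struct rot3 rot3_make(float cx, float sx, float cy, float sy,
                             float cz, float sz)
{
    struct rot3 r;
    r.m[0][0] = cy * cz;
    r.m[0][1] = -(cy * sz);
    r.m[0][2] = -sy;
    r.m[1][0] = cx * sz - sx * sy * cz;
    r.m[1][1] = sx * sy * sz + cx * cz;
    r.m[1][2] = -(sx * cy);
    r.m[2][0] = cx * sy * cz + sx * sz;
    r.m[2][1] = sx * cz - cx * sy * sz;
    r.m[2][2] = cx * cy;
    return r;
}

/* Rotate one point: 9 multiplies and 6 adds, no trigonometry. */
static void rot3_apply(const struct rot3 *r, const float in[3], float out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = r->m[i][0] * in[0] + r->m[i][1] * in[1] + r->m[i][2] * in[2];
}
```

With 8000 voxels this replaces 48000 trigonometric evaluations per frame with six, and the per-point work becomes a small matrix-vector product that also maps naturally onto SSE.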
0 Kudos
Anonymous
Not applicable
3,542 Views

No, I write to RAM. For example, if I have 1 GB of RAM, LFB_ptr (the linear framebuffer pointer) equals 0x4000_0000 = 1_073_741_824 = 1 GB. (And I don't get it: normally, pointing outside of RAM causes a CPU reset, no?)

        ; Algo: LFB = 0x40000000+((RAM_SIZE_Go-1)*0x40000000)
            ;
            ; If RAM == 1 GB { LFB = 0x040000000 }

And maybe the GPU scans this portion of memory to transfer the data into its own RAM (VRAM). Do you have any idea how?

0 Kudos
Anonymous
Not applicable
3,542 Views

As for the parameterized #define macro, indeed it's not needed. I'm changing it now; it was only there for easier reading.

 

0 Kudos
Anonymous
Not applicable
3,542 Views

The asm code is from my kernel.asm ^^

pixel_ptr is initialized by... well, I'm sending you the whole project: C (a Code::Blocks project) and asm (a NASM project), with Notepad++ as the editor.

You need to install SDL, and for the asm program you need to change the value in cx/bx to the video mode number for the 800*600 resolution. Change only the mode number field.

(You can switch to a higher resolution, but you will need to change some constants: rapport, etc. I don't remember the others, but it doesn't matter for now ^^)

				; Return VBE Mode Information
					mov		ax, 0x4F01
					mov		cx, 0x115			; Mode number
					mov		di, ModeInfoBlock   ; Pointer to ModeInfoBlock structure
					int		0x10
			; Set VBE Mode
				mov		ax, 0x4F02
				mov		bx, 0_1_0_0_0_0_0_1_0_0_0_1_0_1_0_1b
						; 	| | | | | | | `-`-`-`-`-`-`-`-`----- Mode number: 0x115 = 800 * 600
						; 	| | | | | `-`---------------------- Reserved (must be 0)
						;   | | | | `------------------------- 0 = Use current default refresh rate, 1 = Use user specified CRTC values for refresh rate
						; 	| | `-`-------------------------- Reserved for VBE/AF (must be 0)
						; 	| `----------------------------- 0 = Use windowed frame buffer model, 1 = Use linear/flat frame buffer model
						; 	`------------------------------ 0 = Clear display memory, 1 = Don't clear display memory
				mov		di, CRTCInfoBlock
				int		0x10

C project:

409244

ASM project (some comments are in French):

To run it, go to 2.MAKE and run the batch file, then open the disk (small icon) and choose an empty USB key, then select Removable Disk 1 (unplug all other USB keys so that only the test key remains) and deselect "open as read-only".

Then copy all the data from HackOS.bin to your USB key at start offset 0, save it, and boot from the USB key on another PC or on your current PC.

409247

To find the mode number if 0x115 doesn't work, run the program "list mode vesa.exe", included in the zip, which lists all video mode numbers.

0 Kudos
Anonymous
Not applicable
3,542 Views

"list mode vesa.exe" is a 16-bit program.

0 Kudos
Bernard
Valued Contributor I
3,545 Views

>>>And maybe gpu scan this portion of memory for transfer data into it's ram (vram), do you have any idea how ?>>>

Usually VBE is used during the BIOS phase of the PC boot process. I can only suppose that at an early stage of the Windows boot process the so-called bootvid.sys driver is used, and that this driver probably calls int 0x10 services. After the miniport and display drivers are loaded (Windows), probably all memory transactions go through DMA engines which read and write memory-mapped I/O regions, to which hundreds of GPU registers are mapped. I think that the GPU's video BIOS firmware sets up those memory regions for later use by the display driver. The display driver, working directly with the DirectX kernel driver, probably writes vertex data, fonts and bitmaps into those regions, which are later read by the GPU Command Processor unit (ATI/AMD) and sent directly to the GPU scheduler.

Please have a look at these freely available AMD GPU docs.

http://www.x.org/docs/AMD/old/

 

 

0 Kudos
Bernard
Valued Contributor I
3,545 Views

If you are interested, I have a library of elementary and special functions implemented in Java and in C, partly vectorized. They are fast mainly because I pre-calculated their coefficients with the help of Mathematica 8 and used Horner's scheme to evaluate the polynomials. I also have a library of various integrators, including a multithreaded version (Windows only). You will need this when trying to evaluate BRDF functions numerically.
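
For readers unfamiliar with it, Horner's scheme evaluates a polynomial with one multiply and one add per coefficient. The sketch below applies it to a sine approximation. Note the assumptions: it uses plain Taylor coefficients, not the precomputed coefficients of the library mentioned above (which are not shown in this thread), and `horner` and `sin_poly` are names invented here. It is only meant for small arguments, roughly |x| <= pi/2.

```c
#include <stddef.h>

/* Evaluate c[0] + c[1]*t + ... + c[n-1]*t^(n-1) by Horner's scheme. */
static double horner(const double *c, size_t n, double t)
{
    double acc = 0.0;
    for (size_t i = n; i-- > 0; )
        acc = acc * t + c[i];
    return acc;
}

/* sin(x) ~= x * P(x^2), with Taylor coefficients up to the x^11 term. */
static double sin_poly(double x)
{
    static const double c[] = {
        1.0,
        -1.0 / 6.0,
        1.0 / 120.0,
        -1.0 / 5040.0,
        1.0 / 362880.0,
        -1.0 / 39916800.0
    };
    return x * horner(c, sizeof c / sizeof c[0], x * x);
}
```

Because the coefficients are compile-time constants, the whole evaluation is a short chain of multiply-add steps with no table lookups, which is also why this form vectorizes well.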

0 Kudos
Bernard
Valued Contributor I
3,545 Views

Thank you very much for uploading your source code I really appreciate it.

0 Kudos