Developing Games on Intel Graphics
If you are gaming on graphics integrated in your Intel processor, this is the place for you! Find answers to your questions or post your issues with PC games.
489 Discussions

3D engine

Anonymous
Not applicable

Hello there,

OK, here we go: I have a dream of writing a 3D engine 100% in Intel assembly, running only on the CPU. For now I only use rotation matrices.


It works, of course, but it gets slow when I draw a lot of pixels.

Recently I decided to add voxels to my engine, and it becomes slow as soon as I draw >= 8000 voxels (a 20 * 20 * 20 cube). When I saw that NVIDIA can display 32M voxels (fire), I wondered how they can do it!



And I have a little idea of the reason: the MMU, paging, segmentation, memory.

Am I right?



Another question: is the FPU slower at floating-point computation than SSE, or does it depend on the data being manipulated?


PS: I work without an OS like Windows or Linux; I run on my own kernel + bootloader, also written in assembly with NASM.

Sorry if I don't write good English; I'm French and use Google Translate ^-^

1 Solution
Bradley_W_Intel
Employee

You clearly are using the processor in a very advanced way. I will do my best to answer your questions:

1) Why is your voxel engine not able to efficiently render as many voxels as you'd like? Voxel engines need to maximize their use of parallelism (both threading and SIMD) and also to store the data efficiently in an octree or some other structure that can handle sparse data. If you are doing all these things and still not getting the performance you expect, it's an optimization problem. Some Intel tools like VTune Performance Analyzer are excellent for performance analysis. A minimal sparse-octree sketch follows this list.

2) Is single data floating point math faster than SIMD (if I understood you)? Typically SIMD will be faster than single data instructions if your data is laid out in a way that supports the SIMD calls. In all cases, the only way for you to know for certain which way is faster is to test it. A small scalar-versus-SSE sketch follows this list.

3) How can you select between discrete and processor graphics? DirectX has methods of enumerating adapters. In such a case, the processor graphics is listed separately from the discrete graphics. If you are choosing your adapter based on the amount of available memory, you may be favoring the processor graphics when you didn't intend to. Intel has sample code that shows how to properly detect adapters in DirectX at https://software.intel.com/en-us/vcsource/samples/gpu-detect. The process for OpenGL is not well documented. A sketch of DXGI adapter enumeration follows this list.

4) Can I use one processor to control execution of a second processor? Probably not. The details on Intel processors are covered at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html. It's possible, though unlikely, that you'll be able to find something in there that can help you.
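To illustrate point 1 above, here is a minimal sketch of a sparse voxel octree in C. This is an assumption of one common layout, not code from any Intel sample: empty octants are simply NULL pointers, which is why tens of millions of mostly empty voxels can stay cheap in memory.

// Minimal sparse voxel octree (illustrative sketch only)
#include <stdint.h>
#include <stdlib.h>

typedef struct octree_node
{
	struct octree_node	*child[8];	// NULL child = empty octant
	uint32_t			color;		// leaf payload, 0x00RRGGBB
	uint8_t				is_leaf;
} octree_node;

// Insert a voxel at (x, y, z) inside a cube of edge `size` (a power of two)
static void octree_insert(octree_node *node, int x, int y, int z, int size, uint32_t color)
{
	if (size == 1)
	{
		node->is_leaf = 1;
		node->color = color;
		return;
	}

	int half = size / 2;
	int idx = (x >= half) | ((y >= half) << 1) | ((z >= half) << 2);

	if (!node->child[idx])
		node->child[idx] = calloc(1, sizeof(octree_node));

	octree_insert(node->child[idx], x % half, y % half, z % half, half, color);
}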
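To illustrate point 2 above, a small sketch comparing a scalar loop with a packed SSE loop written with intrinsics. It assumes the arrays are contiguous, 16-byte aligned and a multiple of four floats long; as Bradley says, only a measurement on your own data settles which version is faster.

// Scalar versus packed SSE addition (illustrative sketch only)
#include <xmmintrin.h>

void add_scalar(const float *a, const float *b, float *out, int n)
{
	for (int i = 0; i < n; i++)
		out[i] = a[i] + b[i];						// one float per operation
}

void add_sse(const float *a, const float *b, float *out, int n)
{
	for (int i = 0; i < n; i += 4)					// n assumed to be a multiple of 4
	{
		__m128 va = _mm_load_ps(a + i);				// aligned packed load
		__m128 vb = _mm_load_ps(b + i);
		_mm_store_ps(out + i, _mm_add_ps(va, vb));	// four adds in one instruction
	}
}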
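To illustrate point 3 above, a sketch of DXGI adapter enumeration in C. This is not the Intel gpu-detect sample itself, just the standard DXGI 1.1 calls, and it assumes you link against dxgi.lib and dxguid.lib. Intel's PCI vendor ID is 0x8086, so an adapter can be chosen explicitly rather than by memory size.

// Enumerate display adapters through DXGI 1.1 (illustrative sketch only;
// link with dxgi.lib and dxguid.lib, error handling mostly omitted)
#define COBJMACROS
#include <windows.h>
#include <dxgi.h>
#include <stdio.h>

void list_adapters(void)
{
	IDXGIFactory1 *factory = NULL;
	if (FAILED(CreateDXGIFactory1(&IID_IDXGIFactory1, (void **)&factory)))
		return;

	for (UINT i = 0; ; i++)
	{
		IDXGIAdapter1 *adapter = NULL;
		if (IDXGIFactory1_EnumAdapters1(factory, i, &adapter) == DXGI_ERROR_NOT_FOUND)
			break;

		DXGI_ADAPTER_DESC1 desc;
		IDXGIAdapter1_GetDesc1(adapter, &desc);
		wprintf(L"Adapter %u: %s, VendorId 0x%04X\n", i, desc.Description, desc.VendorId);
		// 0x8086 = Intel, 0x10DE = NVIDIA, 0x1002 = AMD
		IDXGIAdapter1_Release(adapter);
	}
	IDXGIFactory1_Release(factory);
}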

 

270 Replies
Bernard
Valued Contributor I

>>>Do you know how to access the first pixel of the screen in an easier way, something like VESA? I would like to continue my 3D engine project without an OS, in IA-32e mode. I don't know why VBE Core is discontinued :/>>>

I do not know how to do that.

I can only suppose that you will need to write your own display driver and manage the memory-mapped region yourself.

Anonymous
Not applicable

OK, I'm posting my update: I have replaced SDL with WinAPI. (terraindata.obj is unusable for now; I haven't changed it yet.)

416424

Bernard
Valued Contributor I

Thanks for the upload.

Bernard
Valued Contributor I

I will run it later on my home PC.

Bernard
Valued Contributor I

shaynox s. wrote:

No, of course not; I mean that if I put even a single mov, it falls over:

Ex:

              __asm      mov       eax, 0xdeadbeef

There are two possible reasons for this behaviour: either icl doesn't like inline asm, or I don't know how to configure the compiler's options correctly.

Please read the following article: https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/
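As an aside, here is a minimal sketch of the braced, MSVC-style __asm block that the Windows icl accepts, assuming a 32-bit build (this inline assembler is not available for x64 code). Whether this fixes the crash above depends on the actual compiler options, so treat it only as a syntax illustration.

// Braced MSVC-style inline asm block (32-bit builds only, illustrative sketch)
#include <stdio.h>

int main(void)
{
	unsigned int value = 0;

	__asm
	{
		mov eax, 0xdeadbeef		// C-style hex constants are accepted here
		mov value, eax			// store the register into a C variable
	}

	printf("0x%08X\n", value);
	return 0;
}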

Bernard
Valued Contributor I

I still cannot understand how you can use the advanced DirectX 10 and DirectX 11 capabilities of your GPU without an OS. I suggest you develop your own 3D engine for Windows and write the crucial parts of the code in assembly. AFAIK your rendering loop is based on int 10h services.

Anonymous
Not applicable

No, let me explain: I initialize the MMIO (to use the GPU) with int 0x10 and ax = 0x4F02, then I write to that memory, and it works like streaming: as soon as a memory cell changes, the screen responds by setting the pixel's colour to the changed value.

I don't know how it works :/ It's like GDI on Windows: BitBlt. Are there graphics APIs other than OpenGL or DirectX for accessing the GPU?

And yes, I will continue my engine for Windows, even if working without an OS is more interesting ^^

Anonymous
Not applicable

Well, it's certainly some other driver than OpenGL or DirectX, but right now I don't have the knowledge necessary to write a driver; it sounds like a titanic job ^^

Bernard
Valued Contributor I

shaynox s. wrote:

Well, it's certainly some other driver than OpenGL or DirectX, but right now I don't have the knowledge necessary to write a driver; it sounds like a titanic job ^^

Yes, you are right. That would be the display driver and the video miniport driver. Actually you can check the Linux source and search for the ATI driver.

Bernard
Valued Contributor I

shaynox s. wrote:

No, let me explain: I initialize the MMIO (to use the GPU) with int 0x10 and ax = 0x4F02, then I write to that memory, and it works like streaming: as soon as a memory cell changes, the screen responds by setting the pixel's colour to the changed value.

I don't know how it works :/ It's like GDI on Windows: BitBlt. Are there graphics APIs other than OpenGL or DirectX for accessing the GPU?

And yes, I will continue my engine for Windows, even if working without an OS is more interesting ^^

You are still limited to the int 0x10 services, which are very slow and are usually used by BIOS-like software. Modern OSes do not rely on int 0x10 services. Usually a rendering framework like DirectX is bound to DxgKrnl (the DirectX kernel), which in turn calls the display driver and passes it DMA packets that will be scheduled to run on the GPU. This is usually done through MMIO GPU registers set up by the GPU firmware/BIOS. By relying on Windows you can unleash the full power of the programmable DirectX pipeline, which is implemented in hardware by the GPU.

Bernard
Valued Contributor I

>>>Are there graphics APIs other than OpenGL or DirectX for accessing the GPU?>>>

DirectX does not access the GPU directly; that is done by the display driver. There are more 3D graphics APIs besides OpenGL and DirectX; look here: http://webcache.googleusercontent.com/search?q=cache:MdpC4kWGTVMJ:en.wikipedia.org/wiki/List_of_3D_graphics_libraries+&cd=1&hl=en&ct=clnk&gl=gr

Bradley_W_Intel
Employee

shaynox s. wrote:

Do Intel publish a calendar of new ISA extensions? I tried to search for it.

We don't post a calendar, but we do post planned new instructions at https://software.intel.com/en-us/intel-isa-extensions. There is an interactive reference at https://software.intel.com/sites/landingpage/IntrinsicsGuide/.

 

Anonymous
Not applicable

Personally I don't think int 0x10 is slow, for the simple reason that there isn't as much protocol involved as in DirectX or OpenGL.

I have sent an email to the VESA organization to request the source code of their VBE/Core (they abandoned the project; it ended in 1998).

I will compare the speed one day.

Finally, I migrated to Windows 8 and, surprise, my FPS was multiplied by 2 oO. It's very good news, but still strange :D

Anonymous
Not applicable

I'm back; they responded to me: they don't have any source code for those int 0x10 VBE/Core functions :/

It's strange, so the second option is to contact the BIOS manufacturer, but I don't get it: those functions are labelled VESA, yet VESA doesn't have the source code oO

Bernard
Valued Contributor I

>>>Personally I don't think int 0x10 is slow, for the simple reason that there isn't as much protocol involved as in DirectX or OpenGL.>>>

You are limited by the transition to Real Mode, where the 16-bit IDT lives; that means your code is working in VGA mode and probably writing pixel data through I/O ports. When using graphics APIs like DirectX or OpenGL, the CPU communicates with the GPU through 32/64-bit DMA engines that transfer vertex data, texture data, etc. to MMIO, from where the DMA engines move the data to the command processor and the GPU scheduler. There are literally hundreds of GPU registers mapped into host memory which are written and read by the display driver and the video port driver. All those registers represent the hardware implementation of the DirectX pipeline.

Bernard
Valued Contributor I

shaynox s. wrote:

I'm back; they responded to me: they don't have any source code for those int 0x10 VBE/Core functions :/

It's strange, so the second option is to contact the BIOS manufacturer, but I don't get it: those functions are labelled VESA, yet VESA doesn't have the source code oO

BIOS developers will not send you the source code of the BIOS. The only option is to disassemble the BIOS routines with IDA Pro and try to find the call chain of the int 10h handler and its implementation. I can assure you that it will be a daunting task.

Bernard
Valued Contributor I

Please read the following AMD specifications in order to understand how DirectX is implemented at the register level:

http://www.x.org/docs/AMD/old/

Bernard
Valued Contributor I

>>>BIOS developers will not send you the source code of the BIOS. The only option is to disassemble the BIOS routines with IDA Pro and try to find the call chain of the int 10h handler and its implementation. I can assure you that it will be a daunting task.>>>

Moreover, in order to mimic DirectX functionality you would need to implement it in your own library API, because VESA does not support the programmable DirectX pipeline with VS, GS and PS units.

So my advice is to continue your project for educational purposes and, if you want, write a DirectX-based engine in assembly.

Anonymous
Not applicable

Hello,

I found a strange thing: the linear frame buffer address is calculated like this:

		; Algo: LFB = 0x40000000 + ((RAM_SIZE_GB - 1) * 0x40000000)
		;
		; If RAM == 1 GB { LFB = 0x040000000 }
		; If RAM == 2 GB { LFB = 0x080000000 }   ; +0x40000000 per extra GB
		; If RAM == 3 GB { LFB = 0x0C0000000 }
		; If RAM == 4 GB { LFB = 0x100000000 }
		; If RAM == 5 GB { LFB = 0x140000000 }

And I looked at the resources in the properties of my graphics card (AMD Radeon HD 7400M) and found these address ranges:

Memory Range "0x0C0000000-0x0CFFFFFFF"
Memory Range "0x0D0400000-0x0D041FFFF"
I/O Range         "0x2000-0x20FF"


I/O Range         "0x03B0-0x03BB"
I/O Range         "0x03C0-0x03DF"
Memory Range "0x0D00A0000-0x0D00BFFFF"

I have 5 GB, and from this I learn that the LFB is apparently at 0x0C0000000 and not at 0x140000000 as calculated; I haven't tested it yet, but I will do it later.

The second block of ranges concerns the VGA graphics.

I don't know if the match between 0x0C0000000 and my calculated linear frame buffer address is a coincidence, but it's a little strange. For the I/O port range (0x2000) I don't have any information; does VESA use it to transfer pixels, and do DirectX and OpenGL go through it too?

I think I'll forget VESA for a while, at least while Intel CPUs keep their 48-bit addressing and can't read all of RAM with one flat pointer.
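For illustration only, here is what "writing to the LFB" amounts to in C once the base address is known. It assumes a 64-bit (IA-32e) build, a 32-bpp 1920x1080 mode, and a hypothetical base of 0x0C0000000 taken from the resource list above; with VBE, the real base and pitch normally come from the mode info block rather than from a formula.

// Write one pixel into a 32-bpp linear frame buffer (illustrative sketch only)
#include <stdint.h>

static void put_pixel(volatile uint32_t *lfb, uint32_t pitch_pixels, uint32_t x, uint32_t y, uint32_t argb)
{
	lfb[(uint64_t)y * pitch_pixels + x] = argb;	// one 4-byte store per pixel
}

void demo(void)
{
	// Hypothetical mapping: physical 0x0C0000000 identity-mapped by the kernel's page tables
	volatile uint32_t *lfb = (volatile uint32_t *)(uintptr_t)0x0C0000000ULL;

	put_pixel(lfb, 1920, 100, 50, 0x00FF0000);	// red pixel at (100, 50)
}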

We need a petition to remove memory alignment :D I want byte-by-byte access. (I think the CPU has a duplicated set of instructions just to manage the alignment system; personally I think it would be better to remove that system and give the CPU more registers instead of those "useless" instructions. Of course I don't have any knowledge of the internal mechanics of a CPU, but I think instructions have to be implemented with transistors, and if we removed the alignment system we would cut the CPU's transistor count in half. My theory :D)

And we need extra instructions for aligned memory :/

Anonymous
Not applicable

I have a question: why are the AVX registers slower than the SSE registers?

I have an i7-2640M CPU, and when I try to use the AVX registers in my program my FPS drops to 5, whereas when I stay with the SSE registers my FPS is higher: 27.

Here's a sample:

		VBROADCASTF128		ymm2, [rotate_yz_ymm2]
		VBROADCASTF128		ymm3, [rotate_xyz_ymm3]
		VBROADCASTF128		ymm4, [rotate_yz_ymm4]
		VBROADCASTF128		ymm5, [rotate_z_ymm5]
		VBROADCASTF128		ymm6, [rotate_y_ymm6]
		VBROADCASTF128		ymm7, [coordonee]

is slower than:

		vmovups		xmm2, [rotate_yz_ymm2]
		vmovups		xmm3, [rotate_xyz_ymm3]
		vmovups		xmm4, [rotate_yz_ymm4]
		vmovups		xmm5, [rotate_z_ymm5]
		vmovups		xmm6, [rotate_y_ymm6]
		vmovups		xmm7, [coordonee]

strange :/

Anonymous
Not applicable

And finally I use WinAPI for the graphics and removed SDL.

I do it like this to draw:

// include the basic windows header file
#include <windows.h>
#include <stdio.h>
#include <math.h>
#include <time.h>
 
// Video constants for the rotation matrix
#define		REPERE				(LENGTH * (WIDTH/2 - 1)) + ((LENGTH/2 - 1))
#define		BPP					4
#define		LENGTH				1920				// x
#define		WIDTH				1080				// y
#define     DEPTH				800					// z
#define		PITCH				LENGTH * BPP
#define		DEG2RAD(angle)		angle * 0.01745329	// angle * (PI/180) (0.01745329)
#define		RAPPORT				(LENGTH/WIDTH)
#define		FOCALE				45
#define		PROFONDEUR			200

// Standard colors for the rendered scene (picked with the ColorPic program)
#define		WHITE				0x00FFFFFF
#define		BLACK				0x00000000
#define		RED					0x00FF0000
#define		GREEN				0x0000FF00
#define		BLUE				0x000000FF
#define		BLUE_SCENE			0x0000204D
#define		GREY_SCENE			0x001E1E22
#define		MAP_GRID_SCENE		0x00585858
#define		SELECT_OBJECT_SCENE	0x00FFA700

// Indices into the coordinate arrays
#define		_x					0
#define		_y					1
#define		_z					2
#define		_color				3
#define		__x					0
#define		__y					4
#define		__z					8
#define		__color				12


int			screen[LENGTH*WIDTH]	= { 0 };
int			FPS_COUNTER_DATA[3] = { 0 };    // 0: frame counter, 1: time of the last FPS printout

HINSTANCE		instance;
HBITMAP			texture;
PAINTSTRUCT 	paint_struct;
HDC 			device_context_screen;
HDC 			device_context_texture;
WNDCLASS		windows_class;
HWND			windows_handle;
MSG				event;
LRESULT			anwser;

LRESULT CALLBACK	windows_procedure(HWND windows_handle, UINT message, WPARAM w_param_message, LPARAM l_param_message)	// CALLBACK (__stdcall) is required for a window procedure
{
	switch (message)
	{
		case WM_KEYDOWN:
			switch (w_param_message)
			{
				case VK_ESCAPE:
					PostMessage(windows_handle, WM_KEYUP, VK_ESCAPE, 0);
					//SendMessage(windows_handle, WM_KEYUP, 0, 0);
				break;
			}
		break;
	}

	anwser = DefWindowProc(windows_handle, message, w_param_message, l_param_message);
	return	anwser;
}

// Copy the CPU-side pixel buffer into the GDI bitmap and select it into the memory DC
void		upload_screen(void)
{
	SetBitmapBits(texture, LENGTH * WIDTH * BPP, screen);
	SelectObject(device_context_texture, texture);
}

void	init_video(void)
{
		windows_class.cbClsExtra = 0;
		windows_class.cbWndExtra = 0;
		windows_class.hbrBackground = 0;
		windows_class.hCursor = 0;
		windows_class.hIcon = 0;
		windows_class.hInstance = instance;
		windows_class.lpfnWndProc = windows_procedure;
		windows_class.lpszClassName = "Classe 1";
		windows_class.lpszMenuName = NULL;
		windows_class.style = CS_HREDRAW | CS_VREDRAW;
	RegisterClass(&windows_class);

	windows_handle = CreateWindow(	"Classe 1",
									"3D engine(HackOS)",
									WS_OVERLAPPEDWINDOW | WS_VISIBLE,
									CW_USEDEFAULT, CW_USEDEFAULT, LENGTH, WIDTH,
									NULL,
									NULL,
									instance,
									NULL);

	device_context_screen = GetDC(windows_handle);
	texture = CreateCompatibleBitmap(device_context_screen, LENGTH, WIDTH);
	device_context_texture = CreateCompatibleDC(device_context_screen);
}

int		main(void)
{
	instance = GetModuleHandle(NULL);	// use the real module handle; main() does not receive an HINSTANCE
	init_video();

	while (1)
	{
		// Pump the message queue and let Windows route messages to windows_procedure
		while (PeekMessage(&event, windows_handle, 0, 0, PM_REMOVE))
		{
			TranslateMessage(&event);
			DispatchMessage(&event);
		}
                
                {...}   /// Modify screen[]

		BeginPaint(windows_handle, &paint_struct);
			upload_screen();
			BitBlt(device_context_screen, 0, 0, LENGTH, WIDTH, device_context_texture, 0, 0, SRCCOPY);
		EndPaint(windows_handle, &paint_struct);

		calculate_fps:
		{
			FPS_COUNTER_DATA[0]++;
			if (clock() - FPS_COUNTER_DATA[1] >= CLOCKS_PER_SEC)	// one second elapsed (CLOCKS_PER_SEC is 1000 on Windows)
			{
				printf("FPS = %d\n", FPS_COUNTER_DATA[0]);
				FPS_COUNTER_DATA[0] = 0;
				FPS_COUNTER_DATA[1] = clock();
			}
		}
	}
}

PS: Sorry for the long wait before answering; all those problems were getting on my nerves :x
