Developing Games on Intel Graphics
If you are gaming on graphics integrated in your Intel Processor, this is the place for you! Find answers to your questions or post your issues with PC games

3D engine

Anonymous
Not applicable
11,615 Views

Hello there,

OK, here we go: I have a dream of making a 3D engine written 100% in Intel assembly, running on the CPU only. For now I only use rotation matrices.


It works, of course, but it is slow when I draw a lot of pixels.

Recently I decided to add voxels to my engine, and it is slow once I draw >= 8000 voxels (a 20 * 20 * 20 cube). When I saw that NVIDIA can display 32M voxels (fire), I wondered how they do it!



I have a rough idea of the reason: the MMU, paging, segmentation, memory.

Am I right?



Another question: is the FPU slower than SSE at floating-point computation, or does it depend on the data being manipulated?


PS: I work without an OS such as Windows or Linux. I run on my own kernel and bootloader, also written in assembly with NASM.

Sorry if my English is not good; I'm French and I use Google Translate ^-^

0 Kudos
1 Solution
Bradley_W_Intel
Employee
11,185 Views

You clearly are using the processor in a very advanced way. I will do my best to answer your questions:

1) Why is your voxel engine not able to efficiently render as many voxels as you'd like? Voxel engines need to maximize their use of parallelism (both threading and SIMD) and also to store the data efficiently in an octree or some other structure that can handle sparse data. If you are doing all these things and still not getting the performance you expect, it's an optimization problem. Some Intel tools like VTune Performance Analyzer are excellent for performance analysis.

2) Is single data floating point math faster than SIMD (if I understood you)? Typically SIMD will be faster than single data instructions if your data is laid out in a way that supports the SIMD calls. In all cases, the only way for you to know for certain which way is faster is to test it.

3) How can you select between discrete and processor graphics? DirectX has methods of enumerating adapters. In such a case, the processor graphics is listed separately from the discrete graphics. If you are choosing your adapter based on the amount of available memory, you may be favoring the processor graphics when you didn't intend to. Intel has sample code that shows how to properly detect adapters in DirectX at https://software.intel.com/en-us/vcsource/samples/gpu-detect. The process for OpenGL is not well documented.

4) Can I use one processor to control execution of a second processor? Probably not. The details on Intel processors are covered at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html. It's possible, though unlikely, that you'll be able to find something in there that can help you.
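As a rough illustration of the "laid out in a way that supports the SIMD calls" point in answer 2, here is a minimal sketch in C of a structure-of-arrays layout. The names PointsSoA and transform_points are made up for this example and do not come from the thread or from any Intel sample. With this layout, four consecutive x values sit next to each other in memory, so a hand-written SSE loop or an auto-vectorizing compiler can load them with one instruction.

```c
#include <stddef.h>

/* Structure-of-arrays: all x values are contiguous, then all y, then all z.
   (Hypothetical type for illustration only.) */
typedef struct {
    float *x;
    float *y;
    float *z;
    size_t count;
} PointsSoA;

/* Applies a row-major 3x3 matrix m to every point in place.
   Each iteration is independent of the others, which is what makes the
   loop a candidate for processing several points per instruction. */
void transform_points(PointsSoA *p, const float m[9])
{
    for (size_t i = 0; i < p->count; i++) {
        float x = p->x[i];
        float y = p->y[i];
        float z = p->z[i];
        p->x[i] = m[0] * x + m[1] * y + m[2] * z;
        p->y[i] = m[3] * x + m[4] * y + m[5] * z;
        p->z[i] = m[6] * x + m[7] * y + m[8] * z;
    }
}
```

Whether this actually beats a scalar loop on a given machine still has to be measured, as the answer says.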

 


0 Kudos
270 Replies
Bernard
Valued Contributor I
844 Views

"Repere" probably means point of reference.

Pages 521 - 527 contain a description of the World-to-Camera transformation. Simply look at the code itself.

0 Kudos
Bernard
Valued Contributor I
844 Views

I can upload source code if you want. Then you can go through the source code and port it from C to NASM.

Can you send me your private e-mail?

0 Kudos
Bernard
Valued Contributor I
844 Views

Did you try to port C sources to NASM?

0 Kudos
Anonymous
Not applicable
844 Views

I did not find simple information in books about how to move the camera, so I am stuck here.

My algorithm:

-  At the start, I have a vector point corresponding to one point of my object.

-  Object_Management:

- I add the vector obj_coord[4] to this vector.
 
- Then I pass the result through my rotation equations:
 
    // Rotate xyz order matrix
        new.x = x * ((cos(obj_rad_y) *                                                      cos(obj_rad_z))) - y * ((                 cos(obj_rad_y) *                                     sin(obj_rad_z))) - z * (                 sin(obj_rad_y));
        new.y = x * ((cos(obj_rad_x) * sin(obj_rad_z)) - (sin(obj_rad_x) * sin(obj_rad_y) * cos(obj_rad_z))) + y * ((sin(obj_rad_x) * sin(obj_rad_y) * sin(obj_rad_z)) + (cos(obj_rad_x) * cos(obj_rad_z))) - z * (sin(obj_rad_x) * cos(obj_rad_y));
        new.z = x * ((cos(obj_rad_x) * sin(obj_rad_y) * cos(obj_rad_z)) + (sin(obj_rad_x) * sin(obj_rad_z))) + y * ((sin(obj_rad_x) * cos(obj_rad_z)) - (cos(obj_rad_x) * sin(obj_rad_y) * sin(obj_rad_z))) + z * (cos(obj_rad_x) * cos(obj_rad_y));
 
    x = new.x
    y = new.y
    z = new.z
 
- I add the vector obj_coordsys[4] to the output vector.

- Camera_Management:

- I add the vector world_coordsys[4] to it.
 
- I subtract the vector cam_coordsys[4] from it.
 
- I pass the vector through my rotation equations once again, with the camera angles negated:
    cam_rad_x = cam_rad_x * -1;
    cam_rad_y = cam_rad_y * -1;
    cam_rad_z = cam_rad_z * -1;
 
    // Rotate xyz order matrix
        new.x = x * ((cos(cam_rad_y) *                                                      cos(cam_rad_z))) - y * ((                 cos(cam_rad_y) *                                     sin(cam_rad_z))) - z * (                 sin(cam_rad_y));
        new.y = x * ((cos(cam_rad_x) * sin(cam_rad_z)) - (sin(cam_rad_x) * sin(cam_rad_y) * cos(cam_rad_z))) + y * ((sin(cam_rad_x) * sin(cam_rad_y) * sin(cam_rad_z)) + (cos(cam_rad_x) * cos(cam_rad_z))) - z * (sin(cam_rad_x) * cos(cam_rad_y));
        new.z = x * ((cos(cam_rad_x) * sin(cam_rad_y) * cos(cam_rad_z)) + (sin(cam_rad_x) * sin(cam_rad_z))) + y * ((sin(cam_rad_x) * cos(cam_rad_z)) - (cos(cam_rad_x) * sin(cam_rad_y) * sin(cam_rad_z))) + z * (cos(cam_rad_x) * cos(cam_rad_y));
 
    x = new.x
    y = new.y
    z = new.z
 
- I subtract the vector cam_coord[4] from it.

- Screen_Management :

 - I write this vector into an array using the equations below, then display the array as an image:
 
    z = z * -1;
    x = (x * distance) / z
    y = (y * distance) / z
 
    screen[repere - ((x * bpp) + (pitch * y))] = color      ; bpp = 4 (bytes_per_pixel), 32 (bits_per_pixel) . pitch = screen_y (y resolution of image) * bpp
                                                            ; repere = (screen_x * (screen_y - 1)) + ((screen_x / 2) - 1);
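
The projection step above divides by z without checking it, so a point at z = 0 or behind the camera can crash or land on screen mirrored. A minimal sketch in C of the same divide with those cases rejected is shown here. The function name project_point is made up for illustration, and it places the origin at the screen centre, which is an assumption that differs from the repere used above.

```c
#include <stdbool.h>

/* Projects a camera-space point onto a screen of w by h pixels.
   Follows the convention used above: depth is -z.
   Returns false if the point cannot be drawn (behind the camera or
   outside the screen), true otherwise with the pixel in *px, *py.
   Hypothetical helper, not taken from the engine's source. */
bool project_point(float x, float y, float z, float distance,
                   int w, int h, int *px, int *py)
{
    float depth = -z;
    if (depth <= 0.0f)
        return false;              /* avoids dividing by zero or a negative depth */

    float sx = (x * distance) / depth;
    float sy = (y * distance) / depth;

    /* Assumption: origin at the screen centre, +y pointing up. */
    float fx = (float)w * 0.5f + sx;
    float fy = (float)h * 0.5f - sy;

    /* Range check before the cast so out-of-range floats never become ints. */
    if (fx < 0.0f || fx >= (float)w || fy < 0.0f || fy >= (float)h)
        return false;

    *px = (int)fx;
    *py = (int)fy;
    return true;
}
```

Rejecting points this way also keeps the later write into screen[] inside the array.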

 

I think I just need to compute cam_coordsys from the forward vector, maybe by multiplying the vector by the XYZ rotations ...

And there is good news: I doubled my fps with the NASM code (40 fps) ... unlike with icl.

https://sourceforge.net/projects/hackengine/

0 Kudos
Bernard
Valued Contributor I
844 Views

Is your solution working as expected?

 

0 Kudos
Anonymous
Not applicable
844 Views

I finally found a solution, but it contains a bug (it does not work properly for some combinations of angles):

(the variables used are float 3D vectors: point, new,
                                          obj_coord, obj_coordsys
                                          obj_rad, cam_rad,
                                          world_coordsys,
                                          coordsys_cam, cam_coord,
                                          cam_x, cam_y, cam_z,
                                          cos_x, cos_y, cos_z,
                                          sin_x, sin_y, sin_z
 and one integer speed_cam)

- At the beginning, I have a vector point corresponding to a vertex of my object.

- Object_Management:

    - I add obj_coord[] to this vector.

    - Then I pass it through my rotation equations:

        cos_x = cos(obj_rad[_x])
        cos_y = cos(obj_rad[_y])
        cos_z = cos(obj_rad[_z])


        sin_x = sin(obj_rad[_x])
        sin_y = sin(obj_rad[_y])
        sin_z = sin(obj_rad[_z])

        // Rotate xyz order matrix
            new[_x] = (point[_x] *   cos_y *                           cos_z  ) - (point[_y] *   cos_y *                           sin_z  ) - (point[_z] *          sin_y )
            new[_y] = (point[_x] * ((cos_x * sin_z) - (sin_x * sin_y * cos_z))) + (point[_y] * ((sin_x * sin_y * sin_z) + (cos_x * cos_z))) - (point[_z] * (sin_x * cos_y))
            new[_z] = (point[_x] * ((cos_x * sin_y * cos_z) + (sin_x * sin_z))) + (point[_y] * ((sin_x * cos_z) - (cos_x * sin_y * sin_z))) + (point[_z] * (cos_x * cos_y))

        point[_x] = new[_x]
        point[_y] = new[_y]
        point[_z] = new[_z]

    - I add obj_coordsys[] to the output vector.

- I add world_coordsys[] to it.

- Camera_Management:

    cos_x = cos(-cam_rad[_x])
    cos_y = cos(-cam_rad[_y])
    cos_z = cos(-cam_rad[_z])


    sin_x = sin(-cam_rad[_x])
    sin_y = sin(-cam_rad[_y])
    sin_z = sin(-cam_rad[_z])

    // vector right/left
       cam_y[_x] = speed_cam * (  cos_y *                           cos_z  )
       cam_y[_y] = speed_cam * (((cos_x * sin_z) - (sin_x * sin_y * cos_z)))
       cam_y[_z] = speed_cam * (((cos_x * sin_y * cos_z) + (sin_x * sin_z)))

    // vector up/down
       cam_x[_x] = speed_cam * (  cos_y *                           sin_z  )
       cam_x[_y] = speed_cam * (((sin_x * sin_y * sin_z) + (cos_x * cos_z)))
       cam_x[_z] = speed_cam * (((sin_x * cos_z) - (cos_x * sin_y * sin_z)))

    // vector forward/backward
       cam_z[_x] = speed_cam * (         sin_y )
       cam_z[_y] = speed_cam * ((sin_x * cos_y))
       cam_z[_z] = speed_cam * ((cos_x * cos_y))

      Example:

        key 'Z':  // Moving Forward
        {
            coordsys_cam[_x] += cam_z[_x]
            coordsys_cam[_y] += cam_z[_y]
            coordsys_cam[_z] += cam_z[_z]
        }


        key 'S':  // Moving Backward
        {
            coordsys_cam[_x] -= cam_z[_x]
            coordsys_cam[_y] -= cam_z[_y]
            coordsys_cam[_z] -= cam_z[_z]
        }

        key 'D':  // Moving Right
        {
            coordsys_cam[_x] += cam_x[_x]
            coordsys_cam[_y] += cam_x[_y]
            coordsys_cam[_z] += cam_x[_z]
        }

        key 'Q':  // Moving Left
        {
            coordsys_cam[_x] -= cam_x[_x]
            coordsys_cam[_y] -= cam_x[_y]
            coordsys_cam[_z] -= cam_x[_z]
        }

        key 'A':  // Moving Up
        {
            coordsys_cam[_x] += cam_y[_x]
            coordsys_cam[_y] += cam_y[_y]
            coordsys_cam[_z] += cam_y[_z]
        }

        key 'E':  // Moving Down
        {
            coordsys_cam[_x] -= cam_y[_x]
            coordsys_cam[_y] -= cam_y[_y]
            coordsys_cam[_z] -= cam_y[_z]
        }

    - I subtract cam_coordsys[] from it.

    - I pass it through my rotation equations again:

 

       // Rotate xyz order matrix
            new[_x] = (point[_x] *   cos_y *                           cos_z  ) - (point[_y] *   cos_y *                           sin_z  ) - (point[_z] *          sin_y )
            new[_y] = (point[_x] * ((cos_x * sin_z) - (sin_x * sin_y * cos_z))) + (point[_y] * ((sin_x * sin_y * sin_z) + (cos_x * cos_z))) - (point[_z] * (sin_x * cos_y))
            new[_z] = (point[_x] * ((cos_x * sin_y * cos_z) + (sin_x * sin_z))) + (point[_y] * ((sin_x * cos_z) - (cos_x * sin_y * sin_z))) + (point[_z] * (cos_x * cos_y))


        point[_x] = new[_x]
        point[_y] = new[_y]
        point[_z] = new[_z]

    - I subtract cam_coord[] from point[].

- Screen_Management :


    - I write this vector into an array using the equations below, then display the array as an image:

    point[_x] = ((point[_x] * distance) / -point[_z])
    point[_y] = ((point[_y] * distance) / -point[_z])

    screen[repere - ((x * bpp) + (pitch * y))] = color      ; bpp = 4 (bytes_per_pixel), 32 (bits_per_pixel) . pitch = screen_y (y resolution of image) * bpp
                                                            ; repere = (screen_x * (screen_y - 1)) + ((screen_x / 2) - 1);

 

 

 

0 Kudos
Bernard
Valued Contributor I
844 Views

That book contains examples in C that show how to use various camera movement and rotation matrices.

0 Kudos
Bernard
Valued Contributor I
844 Views

>>>Finally I found, but it contain bug (It's not properly work for some configurations of angle):>>>

For which combination of angles is it not working? Do you mean precision and accuracy issues?

0 Kudos
Anonymous
Not applicable
887 Views

Well, the camera sometimes does not move correctly along the forward/backward, right/left and up/down axes, I guess when I turn by 180°. To see which configurations are affected, run my program:

You need to change the keyboard layout (Win + Space) to move the camera:

    - (QWERTY kbd)Key binding : - alt + ` = Exit program.
                                - W/A/S/D = Move Forward/Rotate Left/Moving Backward/Rotate Right
                                - Q/E     = Moving Down/Moving Up
                                - Z/X     = Moving Left/Moving Right

    - Mouse binding : - MouseMove + Left click = Make a rotation of active object through mouse movement:
                                                  - MouseRight/Left(x) = RotY.
                                                  - MouseUp/Down(y)    = RotX.

 

0 Kudos
Anonymous
Not applicable
887 Views

Hello, in case you are interested, I am developing my own x64 ABI: http://forum.nasm.us/index.php?topic=2122.msg9411#msg9411

And I have an optimization question: is it true that a mathematical or comparison operation with a memory operand is slower than one with register operands?

0 Kudos
Bernard
Valued Contributor I
887 Views

>>>And I have an optimization question, is it true that memory access for mathematical and comparison operation is slower than register access ?>>>

Do you mean a situation like this (no vectorization here)?

vmulsd xmm0, xmm0, qword ptr [ebp-8] ; multiplication with a memory operand

vmovsd xmm1, qword ptr [ebp-16]

vmovsd xmm2, qword ptr [ebp-24]

vmulsd xmm2, xmm2, xmm1 ; register-register multiplication

In such a scenario the multiplication with the memory operand will be slower, because the load is part of the instruction. In a first-touch scenario, for example the first iteration of a loop, the value will not yet be cached in L1D and the CPU will have to fetch it from memory (the stack). The register-to-register form will be faster, I think on the order of a few cycles, when the values are already present in physical registers.
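
The same distinction shows up in compiled code. A small sketch in C, with made-up function names: the first version updates the total through a pointer on every iteration, the second keeps it in a local variable that the compiler can usually hold in a register.

```c
#include <stddef.h>

/* Accumulates through a pointer. Because total could point into v,
   the compiler generally has to load and store *total on every
   iteration instead of keeping it in a register. */
void sum_through_memory(const double *v, size_t n, double *total)
{
    *total = 0.0;
    for (size_t i = 0; i < n; i++)
        *total += v[i];
}

/* Accumulates in a local variable. The running sum can stay in a
   register and is written to memory once, after the loop. */
void sum_in_register(const double *v, size_t n, double *total)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    *total = acc;
}
```

Both functions return the same result. How much the difference matters depends on the compiler, its flags and the cache state, so it has to be timed on the target machine.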

 

 

0 Kudos
Bernard
Valued Contributor I
887 Views

>>>and why not generate equation (2 * 3 + [rax] ...) directly/faster in asm code instead use compiler for it, design for complexes calculus.>>>

I did not understand this sentence.

0 Kudos
Anonymous
Not applicable
887 Views

Ok,

Anyway, I will go this way: less code = faster code.

Even if that may sometimes be false, it is the less complex way.

As for the question about writing equations: the idea was to write a mathematical expression as in C, but with registers allowed as operands.

In the end it is not the best way to evolve assembly, so I abandoned this idea.

And yes, my sentences were not well constructed; that was a long time ago ^^

 

(Edge does not work very well when writing a message: when I press Enter with no character after it, the cursor jumps back to the start of the message.)

0 Kudos
Bernard
Valued Contributor I
887 Views

Less code is not always faster.

Think about function inlining. There is a trade-off between code bloat and the overhead of the call/ret sequence.

>>>And for the question about equation written, it was like in C-language where we can write an expression mathematical, but with register support like operand.>>>

Interesting idea.

0 Kudos
Anonymous
Not applicable
887 Views

Which program, HackEngine OS?

Because the HackEngine that runs on Windows does not use int 0x10.

As for the other programs, I do not need it: this interrupt is only needed to initialize the video mode. After that, I change the color of a pixel on the screen through the LFB pointer of the VESA mode that I initialized with int 0x10.
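
For readers who have not written to a linear framebuffer before, a minimal sketch in C of what "change the color of a pixel through the LFB pointer" amounts to for a 32-bit mode. The Framebuffer struct and put_pixel are made up for this example. In a real VESA setup the base address and the bytes per scan line would come from the mode information returned by the BIOS; here they are plain parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Describes a 32-bit linear framebuffer (hypothetical type).
   pitch is the number of bytes from the start of one scanline to the
   next; it can be larger than width * 4 if the mode pads its rows. */
typedef struct {
    uint8_t *base;     /* address of the first pixel of the first row */
    uint32_t width;    /* visible pixels per row */
    uint32_t height;   /* number of rows */
    uint32_t pitch;    /* bytes per scanline */
} Framebuffer;

/* Writes one 32-bit pixel. Coordinates outside the visible area are
   ignored so a bad projection cannot write past the buffer. */
void put_pixel(const Framebuffer *fb, uint32_t x, uint32_t y, uint32_t color)
{
    if (x >= fb->width || y >= fb->height)
        return;

    /* Step to the row in bytes, then index the pixel within that row. */
    uint32_t *row = (uint32_t *)(fb->base + (size_t)y * fb->pitch);
    row[x] = color;
}
```

Addressing rows by pitch instead of by width * 4 keeps the code correct on modes whose scanlines are padded.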

0 Kudos
Bernard
Valued Contributor I
887 Views

Yes, HackEngine OS.

Of course, on Windows you cannot call interrupt 0x10 directly from user mode, only from kernel mode. I think that only the BIOS uses it, in order to display its messages.

0 Kudos
Pandora__Peter
Beginner
887 Views

Thank you for the good idea. I can apply it next time.

 


0 Kudos
Anonymous
Not applicable
887 Views

Thanks for reviving my old thread :D

I wanted to post something on the same subject anyway ^^

Which idea are you talking about?

Also, sorry for the late reply; I did not get the notification.

0 Kudos
Clark__Maria
Beginner
887 Views

So much valuable and useful information and links in this discussion! Thanks a lot! I will definitely use it in my VR in construction applications

0 Kudos