<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Creating an XNOR net on Intel architecture in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131513#M25621</link>
    <description>&lt;P&gt;Hello!&lt;/P&gt;

&lt;P&gt;I am working on a project where I am programming CUDA convolutional kernels with XNOR bitwise operations for forward propagation. I am capable of implementing CUDA convolutional kernels for Nvidia GPUs.&lt;/P&gt;

&lt;P&gt;However, I would like to explore how to parallelize and speed up an XNOR net on CPUs. Bitwise XNOR operations can be highly parallelized, and I have read somewhere that such a neural network, with only +1 and -1 matrix multiplications, can run extremely fast on CPUs.&lt;/P&gt;

&lt;P&gt;The CUDA programming language is well documented for handling and parallelizing matrix multiply operations etc.; however, I would like to explore the XNOR net architecture on Intel Xeon Phi processors too.&lt;/P&gt;

&lt;P&gt;Can someone suggest well-documented resources so that I can create optimized C code for XNOR matrix multiply/convolution and integrate it with Theano/TensorFlow etc. to speed up my computations?&lt;/P&gt;

&lt;P&gt;Thank you!&lt;/P&gt;

&lt;P&gt;Cheers.&lt;/P&gt;</description>
    <pubDate>Tue, 26 Sep 2017 17:44:32 GMT</pubDate>
    <dc:creator>YAkha</dc:creator>
    <dc:date>2017-09-26T17:44:32Z</dc:date>
    <item>
      <title>Creating an XNOR net on Intel architecture</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131513#M25621</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;

&lt;P&gt;I am working on a project where I am programming CUDA convolutional kernels with XNOR bitwise operations for forward propagation. I am capable of implementing CUDA convolutional kernels for Nvidia GPUs.&lt;/P&gt;

&lt;P&gt;However, I would like to explore how to parallelize and speed up an XNOR net on CPUs. Bitwise XNOR operations can be highly parallelized, and I have read somewhere that such a neural network, with only +1 and -1 matrix multiplications, can run extremely fast on CPUs.&lt;/P&gt;

&lt;P&gt;The CUDA programming language is well documented for handling and parallelizing matrix multiply operations etc.; however, I would like to explore the XNOR net architecture on Intel Xeon Phi processors too.&lt;/P&gt;

&lt;P&gt;Can someone suggest well-documented resources so that I can create optimized C code for XNOR matrix multiply/convolution and integrate it with Theano/TensorFlow etc. to speed up my computations?&lt;/P&gt;

&lt;P&gt;Thank you!&lt;/P&gt;

&lt;P&gt;Cheers.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Sep 2017 17:44:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131513#M25621</guid>
      <dc:creator>YAkha</dc:creator>
      <dc:date>2017-09-26T17:44:32Z</dc:date>
    </item>
    <item>
      <title>Hi Yash,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131514#M25622</link>
      <description>&lt;P&gt;Hi Yash,&lt;/P&gt;

&lt;P&gt;The following links have some starter implementations on the topic.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.intelnervana.com/accelerating-neural-networks-binary-arithmetic/" target="_blank"&gt;https://www.intelnervana.com/accelerating-neural-networks-binary-arithmetic/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;https://github.com/NervanaSystems/neon/tree/master/examples/binary&lt;/P&gt;

&lt;P&gt;I am also checking for other sources and will get back to you.&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Ravi Keron N&lt;/P&gt;

</description>
      <pubDate>Wed, 27 Sep 2017 12:49:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131514#M25622</guid>
      <dc:creator>RaviKeron_N_Intel</dc:creator>
      <dc:date>2017-09-27T12:49:39Z</dc:date>
    </item>
    <item>
      <title>Quote:Ravi Keron N. (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131515#M25623</link>
      <description>
&lt;P&gt;Thanks, I've seen those resources. However, they don't seem to be optimized to scale across CPUs, or even on a single CPU, especially given the great parallelization capacity that XNOR-nets provide. I would like to understand how I can code my own popcnt and XNOR operations in the fastest manner for CPUs. It is the key to my Early Innovators project.&lt;/P&gt;

&lt;P&gt;I appreciate your help!&lt;/P&gt;</description>
      <pubDate>Wed, 27 Sep 2017 20:41:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131515#M25623</guid>
      <dc:creator>YAkha</dc:creator>
      <dc:date>2017-09-27T20:41:43Z</dc:date>
    </item>
    <item>
      <title>Hi Yash,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131516#M25624</link>
      <description>&lt;P&gt;Hi Yash,&lt;/P&gt;

&lt;P&gt;Not sure if you have seen the links below on intrinsics for bitwise operations.&lt;/P&gt;

&lt;P&gt;https://software.intel.com/en-us/node/523854&lt;/P&gt;

&lt;P&gt;https://software.intel.com/en-us/node/523808&lt;/P&gt;

&lt;P&gt;Would it be possible to share some information on what you did on the GPU? That would help us find more information along the lines you need.&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Ravi Keron N&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Sep 2017 13:37:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131516#M25624</guid>
      <dc:creator>RaviKeron_N_Intel</dc:creator>
      <dc:date>2017-09-28T13:37:32Z</dc:date>
    </item>
    <item>
      <title>Quote:Ravi Keron N. (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131517#M25625</link>
      <description>
&lt;P&gt;Hey!&lt;/P&gt;

&lt;P&gt;Thanks for the help. I will surely look into this. I packed a matrix of +1s and -1s into unsigned ints and ran the bitwise operation ~(A^B) using CUDA, which allowed me to do matrix multiplication much faster. It is good to know that I can parallelize the bitwise operations with the AVX-512 ISA. I will look into packing matrices into integers and processing them with AVX-512 using the Intrinsics for Bitwise Logical Operations.&lt;/P&gt;
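
&lt;P&gt;A minimal sketch of the packing and ~(A^B) step described above, written in plain Python only to illustrate the arithmetic (it is not CUDA or AVX-512 code). The helper names pack_signs and xnor_dot are hypothetical, and both the 32-bit word width and the bit order (first element in the most significant bit, +1 stored as bit 1) are assumptions:&lt;/P&gt;

&lt;PRE&gt;
```python
WORD_BITS = 32  # assumed word width, matching a 32-bit unsigned int


def pack_signs(values):
    """Pack WORD_BITS values of +1/-1 into one integer, first value in the top bit.

    +1 is stored as bit 1 and -1 as bit 0 (an assumed convention).
    """
    if len(values) != WORD_BITS:
        raise ValueError("expected exactly WORD_BITS values")
    word = 0
    for v in values:
        # Multiplying by 2 shifts the word left by one bit before appending.
        word = word * 2 + (1 if v == 1 else 0)
    return word


def xnor_dot(a_word, b_word):
    """Dot product of two packed +1/-1 vectors via ~(A ^ B) and a popcount."""
    # The modulo keeps only the low WORD_BITS bits of the complement.
    xnor = (~(a_word ^ b_word)) % (2 ** WORD_BITS)
    matches = bin(xnor).count("1")  # positions where the two signs agree
    # Agreements contribute +1 and disagreements -1 to the dot product.
    return 2 * matches - WORD_BITS
```
&lt;/PRE&gt;

&lt;P&gt;For two +1/-1 vectors a and b of length 32, xnor_dot(pack_signs(a), pack_signs(b)) gives the same value as the ordinary dot product of a and b.&lt;/P&gt;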

</description>
      <pubDate>Tue, 03 Oct 2017 01:43:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131517#M25625</guid>
      <dc:creator>YAkha</dc:creator>
      <dc:date>2017-10-03T01:43:21Z</dc:date>
    </item>
    <item>
      <title>Hi Yash,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131518#M25626</link>
      <description>&lt;P&gt;Hi Yash,&lt;/P&gt;

&lt;P&gt;Did you get a chance to go through the links? Did they help?&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Ravi Keron N&lt;/P&gt;</description>
      <pubDate>Sun, 08 Oct 2017 18:03:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131518#M25626</guid>
      <dc:creator>RaviKeron_N_Intel</dc:creator>
      <dc:date>2017-10-08T18:03:46Z</dc:date>
    </item>
    <item>
      <title>Quote:Ravi Keron N. (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131519#M25627</link>
      <description>
&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Sorry for my delayed response. I went through the links; however, I have not had a chance to try these things out yet. I will look into it now that my midsemester exams are over.&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 13:13:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131519#M25627</guid>
      <dc:creator>YAkha</dc:creator>
      <dc:date>2017-10-13T13:13:13Z</dc:date>
    </item>
    <item>
      <title>Quote:Ravi Keron N. (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131520#M25628</link>
      <description>
&lt;P&gt;Hello!&lt;/P&gt;

&lt;P&gt;I have been trying to code the matrix multiplication, and for this I need to pack a 4x8 matrix into a 32-bit integer, but I cannot find an intrinsic that does this. My matrix will be of the form [1,0,1,1,0,1...], and I need to pack it into a 32-bit integer. I only see _mm512_unpackhi_epi32, which seems to be for unpacking, and I am not entirely sure what it does. Can you tell me how I can pack a 4x8 matrix of 1s and 0s into a 32-bit integer using AVX-512 intrinsics?&lt;/P&gt;

&lt;P&gt;After packing the matrix, I need to do an XOR operation over two such packed values. For this I am using _mm512_xor_epi32(__m512i a, __m512i b), and for the population count I am using _mm512_popcnt_epi32(__m512i a).&lt;/P&gt;

&lt;P&gt;Can you tell me what __m512i is? Is it a data type? How do I initialise such a data type?&amp;nbsp;&lt;/P&gt;
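
&lt;P&gt;For illustration only, the XOR and population-count steps above can be modeled in plain Python by treating a 512-bit value as 16 lanes of 32 bits each. This is a sketch of the per-lane arithmetic, not AVX-512 code; the function names are hypothetical, and the encoding (bit 1 for +1, bit 0 for -1) is an assumption:&lt;/P&gt;

&lt;PRE&gt;
```python
LANES = 16      # a 512-bit value viewed as 16 lanes ...
LANE_BITS = 32  # ... of 32 bits each


def xor_lanes(a, b):
    """Lane-wise XOR of two 16-lane values (lists of 32-bit integers)."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("expected exactly LANES integers per operand")
    return [x ^ y for x, y in zip(a, b)]


def popcount_lanes(a):
    """Number of set bits in each lane."""
    return [bin(x).count("1") for x in a]


def dot_from_xor(a, b):
    """Per-lane dot products of the +1/-1 vectors encoded in each lane.

    A set bit in a XOR b marks a sign mismatch, so each lane yields
    LANE_BITS - 2 * mismatches.
    """
    return [LANE_BITS - 2 * m for m in popcount_lanes(xor_lanes(a, b))]
```
&lt;/PRE&gt;

&lt;P&gt;Identical inputs give 32 in every lane, and each mismatched bit lowers that lane's result by 2. On real hardware, _mm512_popcnt_epi32 belongs to the AVX512_VPOPCNTDQ extension, so it is only available on CPUs that support it.&lt;/P&gt;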

&lt;P&gt;Really appreciate the help.&lt;/P&gt;

&lt;P&gt;Thank you!&lt;/P&gt;

</description>
      <pubDate>Sun, 15 Oct 2017 17:29:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131520#M25628</guid>
      <dc:creator>YAkha</dc:creator>
      <dc:date>2017-10-15T17:29:19Z</dc:date>
    </item>
    <item>
      <title>Hi Yash,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131521#M25629</link>
      <description>&lt;P&gt;Hi Yash,&lt;/P&gt;

&lt;P&gt;If you would like to individually initialize the 32-bit integer values stored in an __m512i, you can use the _mm512_set_epi32 intrinsic function. However, it is usually faster to load the values directly from memory or convert them from other __m512i variables.&lt;/P&gt;

&lt;P&gt;To better understand your use case, how is your 4x8 1-bit matrix stored in memory? An __m512i can hold 16 such matrices, so I am assuming you want to store sixteen 4x8 1-bit matrices in one __m512i register? Then, are you performing matrix multiplication on these converted 32-bit integer values?&lt;/P&gt;

&lt;P&gt;Thank you,&lt;/P&gt;

&lt;P&gt;Efe&lt;/P&gt;</description>
      <pubDate>Mon, 16 Oct 2017 18:08:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131521#M25629</guid>
      <dc:creator>Murat_G_Intel</dc:creator>
      <dc:date>2017-10-16T18:08:08Z</dc:date>
    </item>
    <item>
      <title>Quote:Murat Efe Guney (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131522#M25630</link>
      <description>
&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I shall explain my task in detail.&lt;/P&gt;

&lt;P&gt;I will be passing two 2-dimensional arrays (say A and B) to a function: A will be of size 4x4 and B of size NxM. I need to convolve the 4x4 matrix over the NxM matrix, but the dot product done during the convolution operation can be replaced by bitwise operations. Each array will be a float matrix containing 0s and 1s. Once the function has both matrices, I need to extract a 4x4 submatrix from B so that its dot product with matrix A can be taken.&lt;/P&gt;

&lt;P&gt;A 4x4 matrix holds only 16 values, so I can extend it along the depth dimension to a 4x4x32 matrix (or the maximum depth possible) to get 512 values.&lt;/P&gt;

&lt;P&gt;So, basically &lt;STRONG&gt;I need to pack the 4 * 4 * y float matrix of 1s and 0s passed to that function into an __m512i data type,&lt;/STRONG&gt; so I can run this function: _mm512_xor_epi32(__m512i a, __m512i b). I may do the convolution through the depth to ensure maximum speed.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Also, as I will be selecting submatrices of size 4x4 from a larger matrix, what is the fastest way to do so? Is the MKL ?lacpy routine a good way to select submatrices?&lt;/STRONG&gt; Can you suggest others, if there are better ways to do so?&lt;/P&gt;
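
&lt;P&gt;The sliding-window step described above can be sketched as follows, in plain Python for illustration rather than with AVX-512 intrinsics or MKL. The names pack_window and binary_conv2d are hypothetical; the sketch assumes a single channel, no padding, stride 1, no kernel flip, and that 0 stands for -1 and 1 for +1:&lt;/P&gt;

&lt;PRE&gt;
```python
K = 4  # the kernel is K x K, as in the 4x4 case described above


def pack_window(matrix, row, col):
    """Pack the K x K window with top-left corner (row, col) into one integer."""
    word = 0
    for r in range(row, row + K):
        for c in range(col, col + K):
            # Floats 0.0/1.0 become bits; row-major order, first value on top.
            word = word * 2 + int(matrix[r][c])
    return word


def binary_conv2d(kernel, image):
    """Sliding-window dot product over image, treating 0 as -1 and 1 as +1."""
    n_rows, n_cols = len(image), len(image[0])
    k_word = pack_window(kernel, 0, 0)  # the kernel is packed only once
    out = []
    for r in range(n_rows - K + 1):
        out_row = []
        for c in range(n_cols - K + 1):
            # A set bit in the XOR marks a position where the signs differ.
            mismatches = bin(k_word ^ pack_window(image, r, c)).count("1")
            out_row.append(K * K - 2 * mismatches)
        out.append(out_row)
    return out
```
&lt;/PRE&gt;

&lt;P&gt;Each output entry equals the ordinary dot product of the kernel and the window after mapping 0 to -1. Every window is repacked from scratch here for clarity, not speed.&lt;/P&gt;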

&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 16 Oct 2017 18:27:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131522#M25630</guid>
      <dc:creator>YAkha</dc:creator>
      <dc:date>2017-10-16T18:27:22Z</dc:date>
    </item>
    <item>
      <title>Hi Yash,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131523#M25631</link>
      <description>&lt;P&gt;Hi Yash,&lt;/P&gt;

&lt;P&gt;I referred your question to the product team, and their suggestion is to use MKL functions for the 32-bit matrix multiplication.&lt;/P&gt;

&lt;P&gt;The MKL function that can be used is &lt;STRONG&gt;cblas_gemm_s16s16s32&lt;/STRONG&gt;. The following link explains how to use this function:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/mkl-developer-reference-c-2018-beta-cblas_gemm_s16s16s32x"&gt;https://software.intel.com/en-us/mkl-developer-reference-c-2018-beta-cblas_gemm_s16s16s32x&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Note that you need the latest MKL version, which is MKL 2018.&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Ravi Keron N&lt;/P&gt;

</description>
      <pubDate>Tue, 17 Oct 2017 06:11:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131523#M25631</guid>
      <dc:creator>RaviKeron_N_Intel</dc:creator>
      <dc:date>2017-10-17T06:11:00Z</dc:date>
    </item>
    <item>
      <title>I need to pack a matrix of</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131524#M25632</link>
      <description>&lt;P&gt;I need to pack an array of length 512 containing 1s and 0s into an __m512i data type.&lt;BR /&gt;
For instance, if array = [1,1,0,1,0,0,0,1,1,1,0,0,0,1,0,1,1,0,0,1,1,1,0,1,0,1,0,1,1,0,1,0]&lt;BR /&gt;
then I can pack this into an unsigned int which, when read in binary, would be 11010001110001011001110101011010.&lt;/P&gt;
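
&lt;P&gt;The packing described above can be sketched in plain Python as follows. This only illustrates the bit layout and is not an intrinsic; pack_bits and pack_512 are hypothetical names, and placing the first array element in the most significant bit is an assumption taken from the example:&lt;/P&gt;

&lt;PRE&gt;
```python
LANES = 16      # 16 lanes x 32 bits = 512 bits
LANE_BITS = 32


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer, first element in the top bit."""
    word = 0
    for b in bits:
        word = word * 2 + b  # shift left by one bit, then append the new bit
    return word


def pack_512(bits):
    """Split 512 bits into 16 consecutive groups of 32 and pack each group."""
    if len(bits) != LANES * LANE_BITS:
        raise ValueError("expected exactly 512 values")
    return [
        pack_bits(bits[start:start + LANE_BITS])
        for start in range(0, len(bits), LANE_BITS)
    ]
```
&lt;/PRE&gt;

&lt;P&gt;For the 32-element array above, pack_bits returns 0xD1C59D5A, whose binary form is 11010001110001011001110101011010.&lt;/P&gt;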

&lt;P&gt;Is there a way to do this quickly for an __m512i data type? I can do it using a custom function, but I wanted to know if there is an intrinsic function which can do this.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2017 15:01:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Creating-an-XNOR-net-on-Intel-architecture/m-p/1131524#M25632</guid>
      <dc:creator>YAkha</dc:creator>
      <dc:date>2017-10-17T15:01:29Z</dc:date>
    </item>
  </channel>
</rss>

