<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Speeding up H264 encoding with GPU in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902994#M13019</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/110237"&gt;andrewk88&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Regardless of your code, I'm afraid there is a conceptual problem with your approach: you can't separate "ME and encoding", because the MB mode decision aims to minimize the encoding cost, usually expressed as "D+lambda*R", which trades video quality loss (D) against the number of bits used for encoding (R). In other words, if you want to encode the current MB in a reasonable/"optimal" way, you need to look at the modes of the previous MBs that have already been encoded, in particular the value of the MV predictor.&lt;BR /&gt;Hope it helps,&lt;BR /&gt;&lt;BR /&gt;AndrewK&lt;BR /&gt;&lt;BR /&gt;PS. I'd recommend reading some literature on H.264 inter mode decision. Depending on exactly what your ME algorithm is, you might be able to offload a low-level, processing-intensive part of ME to the GPU, but try to keep the "encoding part" and the mode decision on a single CPU / multicore with shared memory. Alternatively, you might attempt multi-slice encoding, but it has its own challenges.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hello Andrew, thanks for your answer!&lt;BR /&gt;&lt;BR /&gt;It seems I was a little too optimistic... (and had not read enough documentation) :D&lt;BR /&gt;&lt;BR /&gt;Do you think I could gain something by computing the SADs for the macroblocks of a frame on the GPU before doing the real ME and encoding?&lt;BR /&gt;</description>
    <pubDate>Tue, 17 Feb 2009 17:15:06 GMT</pubDate>
    <dc:creator>chrisdo</dc:creator>
    <dc:date>2009-02-17T17:15:06Z</dc:date>
    <item>
      <title>Speeding up H264 encoding with GPU</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902992#M13017</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;I am currently trying to modify the IPP samples for H264 encoding to speed them up using CUDA.&lt;BR /&gt;My goal is to run Motion Estimation (ME) on the GPU.&lt;BR /&gt;&lt;BR /&gt;I saw that in the function H264CoreEncoder_Compress_Slice(), each macroblock (MB) of a frame (actually of a slice) is visited for ME and then for encoding. So the first part of my job is to do ME for all MBs first, and then do the encoding.&lt;BR /&gt;This is where I'm stuck: when I separate ME and encoding, the final encoded video is not as expected (it's all grey...).&lt;BR /&gt;&lt;BR /&gt;Here is my code:&lt;BR /&gt;&lt;BR /&gt;
&lt;PRE&gt;[cpp]Status H264ENC_MAKE_NAME(H264CoreEncoder_Compress_Slice_GPU)(&lt;BR /&gt;    void* state,&lt;BR /&gt;    H264SliceType *curr_slice,&lt;BR /&gt;	bool is_first_mb)&lt;BR /&gt;{&lt;BR /&gt;	H264CoreEncoderType* core_enc = (H264CoreEncoderType *)state;&lt;BR /&gt;	H264CurrentMacroblockDescriptorType &amp;amp;cur_mb = curr_slice-&amp;gt;m_cur_mb;&lt;BR /&gt;	H264BsRealType* pBitstream = (H264BsRealType *)curr_slice-&amp;gt;m_pbitstream;&lt;BR /&gt;	Ipp32s slice_num = curr_slice-&amp;gt;m_slice_number;&lt;BR /&gt;	EnumSliceType slice_type = curr_slice-&amp;gt;m_slice_type;&lt;BR /&gt;	Ipp8u uUsePCM = 0;&lt;BR /&gt;&lt;BR /&gt;	Ipp8u *pStartBits;&lt;BR /&gt;	Ipp32u uStartBitOffset;&lt;BR /&gt;&lt;BR /&gt;	Ipp32u uRecompressMB;&lt;BR /&gt;	Ipp8u  iLastQP;&lt;BR /&gt;	Ipp32u uSaved_Skip_Run;&lt;BR /&gt;&lt;BR /&gt;	Ipp8u bSeenFirstMB = false;&lt;BR /&gt;&lt;BR /&gt;	Status status = UMC_OK;&lt;BR /&gt;&lt;BR /&gt;	Ipp32u uNumMBs = core_enc-&amp;gt;m_HeightInMBs * core_enc-&amp;gt;m_WidthInMBs;&lt;BR /&gt;	Ipp32u uFirstMB = core_enc-&amp;gt;m_field_index * uNumMBs;&lt;BR /&gt;&lt;BR /&gt;	H264CurrentMacroblockDescriptorType* mbTab = new H264CurrentMacroblockDescriptorType[ uNumMBs ];&lt;BR /&gt;	for( unsigned int idx = 0 ; idx &amp;lt; uNumMBs; idx++ )&lt;BR /&gt;	{&lt;BR /&gt;		mbTab[ idx ].LocalMacroblockInfo = new H264MacroblockLocalInfo;&lt;BR /&gt;		mbTab[ idx ].LocalMacroblockPairInfo = new  H264MacroblockLocalInfo;&lt;BR /&gt;		mbTab[ idx ].GlobalMacroblockInfo = new H264MacroblockGlobalInfo;&lt;BR /&gt;		mbTab[ idx ].GlobalMacroblockPairInfo = new H264MacroblockGlobalInfo;&lt;BR /&gt;		mbTab[ idx ].MacroblockCoeffsInfo = new H264MacroblockCoeffsInfo;&lt;BR /&gt;		mbTab[ idx ].intra_types = new T_AIMode;&lt;BR /&gt;		mbTab[ idx ].MVs[0] = new H264MacroblockMVs;&lt;BR /&gt;		mbTab[ idx ].MVs[1] = new H264MacroblockMVs;&lt;BR /&gt;		mbTab[ idx ].MVs[2] = new H264MacroblockMVs;&lt;BR /&gt;		mbTab[ idx ].MVs[3] = new H264MacroblockMVs;&lt;BR /&gt;		mbTab[ 
idx ].RefIdxs[0] = new H264MacroblockRefIdxs;&lt;BR /&gt;		mbTab[ idx ].RefIdxs[1] = new H264MacroblockRefIdxs;&lt;BR /&gt;	}&lt;BR /&gt;&lt;BR /&gt;	Ipp32s MBYAdjust = 0;&lt;BR /&gt;	if (core_enc-&amp;gt;m_field_index)&lt;BR /&gt;	{&lt;BR /&gt;		MBYAdjust  = core_enc-&amp;gt;m_HeightInMBs;&lt;BR /&gt;	}&lt;BR /&gt;&lt;BR /&gt;	curr_slice-&amp;gt;m_InitialOffset = core_enc-&amp;gt;m_InitialOffsets[core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_bottom_field_flag[core_enc-&amp;gt;m_field_index]];&lt;BR /&gt;	curr_slice-&amp;gt;m_is_cur_mb_field = core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_PictureStructureForDec &amp;lt; FRM_STRUCTURE;&lt;BR /&gt;	curr_slice-&amp;gt;m_is_cur_mb_bottom_field = core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_bottom_field_flag[core_enc-&amp;gt;m_field_index] == 1;&lt;BR /&gt;&lt;BR /&gt;	curr_slice-&amp;gt;m_use_transform_for_intra_decision = 1;&lt;BR /&gt;&lt;BR /&gt;	// loop over all MBs in the picture&lt;BR /&gt;// MB Motion Estimation is done here&lt;BR /&gt;	for (Ipp32u uMB = uFirstMB; uMB &amp;lt; uFirstMB + uNumMBs; uMB++)&lt;BR /&gt;	{&lt;BR /&gt;		// Is this MB in the current slice?  If not, move on...&lt;BR /&gt;		if (core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_mbinfo.mbs[uMB].slice_id != slice_num) &lt;BR /&gt;		{&lt;BR /&gt;			continue;&lt;BR /&gt;		} &lt;BR /&gt;		else if (!bSeenFirstMB) &lt;BR /&gt;		{&lt;BR /&gt;			// Reset xpos and ypos in framedata struct&lt;BR /&gt;			// This is necessary because the same slice may be recoded multiple times.&lt;BR /&gt;&lt;BR /&gt;			// reset intra MB counter per slice&lt;BR /&gt;			curr_slice-&amp;gt;m_Intra_MB_Counter = 0;&lt;BR /&gt;			curr_slice-&amp;gt;m_MB_Counter = 0;&lt;BR /&gt;&lt;BR /&gt;			// Fill in the first mb in slice field in the slice header.&lt;BR /&gt;			curr_slice-&amp;gt;m_first_mb_in_slice = is_first_mb ? 
0 : uMB - uFirstMB;&lt;BR /&gt;&lt;BR /&gt;			// Fill in the current deblocking filter parameters.&lt;BR /&gt;			curr_slice-&amp;gt;m_slice_alpha_c0_offset = (Ipp8s)core_enc-&amp;gt;m_info.deblocking_filter_alpha;&lt;BR /&gt;			curr_slice-&amp;gt;m_slice_beta_offset = (Ipp8s)core_enc-&amp;gt;m_info.deblocking_filter_beta;&lt;BR /&gt;			curr_slice-&amp;gt;m_disable_deblocking_filter_idc =  core_enc-&amp;gt;m_info.deblocking_filter_idc;&lt;BR /&gt;			curr_slice-&amp;gt;m_cabac_init_idc = core_enc-&amp;gt;m_info.cabac_init_idc;&lt;BR /&gt;&lt;BR /&gt;			// Write a slice header&lt;BR /&gt;			H264ENC_MAKE_NAME(H264BsReal_PutSliceHeader)(&lt;BR /&gt;				pBitstream,&lt;BR /&gt;				core_enc-&amp;gt;m_SliceHeader,&lt;BR /&gt;				core_enc-&amp;gt;m_PicParamSet,&lt;BR /&gt;				core_enc-&amp;gt;m_SeqParamSet,&lt;BR /&gt;				core_enc-&amp;gt;m_PicClass,&lt;BR /&gt;				curr_slice);&lt;BR /&gt;			bSeenFirstMB = true;&lt;BR /&gt;&lt;BR /&gt;			// Fill in the correct value for m_iLastXmittedQP, used to correctly code&lt;BR /&gt;			// the per MB QP Delta&lt;BR /&gt;			curr_slice-&amp;gt;m_iLastXmittedQP = core_enc-&amp;gt;m_PicParamSet.pic_init_qp + curr_slice-&amp;gt;m_slice_qp_delta;&lt;BR /&gt;			Ipp32s SliceQPy = curr_slice-&amp;gt;m_iLastXmittedQP;&lt;BR /&gt;&lt;BR /&gt;			if (core_enc-&amp;gt;m_info.entropy_coding_mode)&lt;BR /&gt;			{&lt;BR /&gt;				if (slice_type==INTRASLICE)&lt;BR /&gt;					H264ENC_MAKE_NAME(H264BsReal_InitializeContextVariablesIntra_CABAC)(&lt;BR /&gt;					pBitstream,&lt;BR /&gt;					SliceQPy);&lt;BR /&gt;				else&lt;BR /&gt;					H264ENC_MAKE_NAME(H264BsReal_InitializeContextVariablesInter_CABAC)(&lt;BR /&gt;					pBitstream,&lt;BR /&gt;					SliceQPy,&lt;BR /&gt;					curr_slice-&amp;gt;m_cabac_init_idc);&lt;BR /&gt;			}&lt;BR /&gt;&lt;BR /&gt;			// Initialize the MB skip run counter&lt;BR /&gt;			curr_slice-&amp;gt;m_uSkipRun = 0;&lt;BR /&gt;		}&lt;BR /&gt;&lt;BR /&gt;		cur_mb.lambda = lambda_sq[curr_slice-&amp;gt;m_iLastXmittedQP];&lt;BR /&gt;		cur_mb.uMB 
= uMB;&lt;BR /&gt;		cur_mb.chroma_format_idc = core_enc-&amp;gt;m_PicParamSet.chroma_format_idc;&lt;BR /&gt;		cur_mb.mbPtr = core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_pYPlane + core_enc-&amp;gt;m_pMBOffsets[uMB].uLumaOffset[core_enc-&amp;gt;m_is_cur_pic_afrm][curr_slice-&amp;gt;m_is_cur_mb_field];&lt;BR /&gt;		cur_mb.mbPitchPixels =  core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_pitchPixels &amp;lt;&amp;lt; curr_slice-&amp;gt;m_is_cur_mb_field;&lt;BR /&gt;		cur_mb.uMBx = uMB % core_enc-&amp;gt;m_WidthInMBs;&lt;BR /&gt;		cur_mb.uMBy = uMB / core_enc-&amp;gt;m_WidthInMBs - MBYAdjust;&lt;BR /&gt;		H264ENC_MAKE_NAME(H264CoreEncoder_UpdateCurrentMBInfo)(state, curr_slice);&lt;BR /&gt;		cur_mb.lumaQP = getLumaQP(cur_mb.LocalMacroblockInfo-&amp;gt;QP, core_enc-&amp;gt;m_PicParamSet.bit_depth_luma);&lt;BR /&gt;		cur_mb.lumaQP51 = getLumaQP51(cur_mb.LocalMacroblockInfo-&amp;gt;QP, core_enc-&amp;gt;m_PicParamSet.bit_depth_luma);&lt;BR /&gt;		cur_mb.chromaQP = getChromaQP(cur_mb.LocalMacroblockInfo-&amp;gt;QP, core_enc-&amp;gt;m_PicParamSet.chroma_qp_index_offset, core_enc-&amp;gt;m_SeqParamSet.bit_depth_chroma);&lt;BR /&gt;		pSetMB8x8TSFlag(curr_slice-&amp;gt;m_cur_mb.GlobalMacroblockInfo, 0);&lt;BR /&gt;		curr_slice-&amp;gt;m_MB_Counter++;&lt;BR /&gt;		H264BsBase_GetState(&amp;amp;pBitstream-&amp;gt;m_base, &amp;amp;pStartBits, &amp;amp;uStartBitOffset);&lt;BR /&gt;		iLastQP = curr_slice-&amp;gt;m_iLastXmittedQP;&lt;BR /&gt;		uSaved_Skip_Run = curr_slice-&amp;gt;m_uSkipRun;   // To restore it if we recompress&lt;BR /&gt;		uUsePCM = 0;    // Don't use the PCM mode initially.&lt;BR /&gt;		do &lt;BR /&gt;		{    // this is to recompress MBs that are too big.&lt;BR /&gt;			H264ENC_MAKE_NAME(H264CoreEncoder_MB_Decision)(state, curr_slice, uMB);&lt;BR /&gt;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].uMB = cur_mb.uMB;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].uMBpair = cur_mb.uMBpair;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].uMBx = cur_mb.uMBx;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].uMBy = 
cur_mb.uMBy;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].mbPtr = cur_mb.mbPtr;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].mbPitchPixels = cur_mb.mbPitchPixels;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].lambda = cur_mb.lambda;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].chroma_format_idc = cur_mb.chroma_format_idc;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].lumaQP = cur_mb.lumaQP;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].lumaQP51 = cur_mb.lumaQP51;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].chromaQP = cur_mb.chromaQP;&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].LocalMacroblockInfo, cur_mb.LocalMacroblockInfo, sizeof(H264MacroblockLocalInfo) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].LocalMacroblockPairInfo, cur_mb.LocalMacroblockPairInfo, sizeof(H264MacroblockLocalInfo) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].GlobalMacroblockInfo, cur_mb.GlobalMacroblockInfo, sizeof(H264MacroblockGlobalInfo) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].GlobalMacroblockPairInfo, cur_mb.GlobalMacroblockPairInfo, sizeof(H264MacroblockGlobalInfo) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].MacroblockCoeffsInfo, cur_mb.MacroblockCoeffsInfo, sizeof(H264MacroblockCoeffsInfo) );&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].m_uIntraCBP4x4 = cur_mb.m_uIntraCBP4x4;&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].m_iNumCoeffs4x4, cur_mb.m_iNumCoeffs4x4, 16*sizeof(Ipp32s) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].m_iLastCoeff4x4, cur_mb.m_iLastCoeff4x4, 16*sizeof(Ipp32s) );&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].m_uIntraCBP8x8 = cur_mb.m_uIntraCBP8x8;&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].m_iNumCoeffs8x8, cur_mb.m_iNumCoeffs8x8, 16*sizeof(Ipp32s) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].m_iLastCoeff8x8, cur_mb.m_iLastCoeff8x8, 16*sizeof(Ipp32s) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].intra_types, cur_mb.intra_types, sizeof(T_AIMode) );&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].mb4x4 = cur_mb.mb4x4;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].mb8x8 = cur_mb.mb8x8;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].mb16x16 = cur_mb.mb16x16;&lt;BR 
/&gt;			mbTab[ uMB-uFirstMB ].mbInter = cur_mb.mbInter;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].mbChromaInter = cur_mb.mbChromaInter;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].mbChromaIntra = cur_mb.mbChromaIntra;&lt;BR /&gt;&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].MVs[0], cur_mb.MVs[0], sizeof(H264MacroblockMVs) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].MVs[1], cur_mb.MVs[1], sizeof(H264MacroblockMVs) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].MVs[2], cur_mb.MVs[2], sizeof(H264MacroblockMVs) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].MVs[3], cur_mb.MVs[3], sizeof(H264MacroblockMVs) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].RefIdxs[0], cur_mb.RefIdxs[0], sizeof(H264MacroblockRefIdxs) );&lt;BR /&gt;			memcpy( mbTab[ uMB-uFirstMB ].RefIdxs[1], cur_mb.RefIdxs[1], sizeof(H264MacroblockRefIdxs) );&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].MacroblockNeighbours = cur_mb.MacroblockNeighbours;&lt;BR /&gt;			mbTab[ uMB-uFirstMB ].BlockNeighbours = cur_mb.BlockNeighbours;&lt;BR /&gt;&lt;BR /&gt;			uRecompressMB = 0;&lt;BR /&gt;		} while (uRecompressMB);        // End of the MB recompression loop.&lt;BR /&gt;	}&lt;BR /&gt;&lt;BR /&gt;	bSeenFirstMB = false;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;	// loop over all MBs in the picture&lt;BR /&gt;// encoding is done here&lt;BR /&gt;	for (Ipp32u uMB = uFirstMB; uMB &amp;lt; uFirstMB + uNumMBs; uMB++)&lt;BR /&gt;	{&lt;BR /&gt;		// Is this MB in the current slice?  
If not, move on...&lt;BR /&gt;		if (core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_mbinfo.mbs[uMB].slice_id != slice_num) &lt;BR /&gt;		{&lt;BR /&gt;			continue;&lt;BR /&gt;		} &lt;BR /&gt;		else if (!bSeenFirstMB) &lt;BR /&gt;		{&lt;BR /&gt;			// Reset xpos and ypos in framedata struct&lt;BR /&gt;			// This is necessary because the same slice may be recoded multiple times.&lt;BR /&gt;&lt;BR /&gt;			// reset intra MB counter per slice&lt;BR /&gt;			curr_slice-&amp;gt;m_Intra_MB_Counter = 0;&lt;BR /&gt;			curr_slice-&amp;gt;m_MB_Counter = 0;&lt;BR /&gt;&lt;BR /&gt;			// Fill in the first mb in slice field in the slice header.&lt;BR /&gt;			curr_slice-&amp;gt;m_first_mb_in_slice = is_first_mb ? 0 : uMB - uFirstMB;&lt;BR /&gt;&lt;BR /&gt;			// Fill in the current deblocking filter parameters.&lt;BR /&gt;			curr_slice-&amp;gt;m_slice_alpha_c0_offset = (Ipp8s)core_enc-&amp;gt;m_info.deblocking_filter_alpha;&lt;BR /&gt;			curr_slice-&amp;gt;m_slice_beta_offset = (Ipp8s)core_enc-&amp;gt;m_info.deblocking_filter_beta;&lt;BR /&gt;			curr_slice-&amp;gt;m_disable_deblocking_filter_idc =  core_enc-&amp;gt;m_info.deblocking_filter_idc;&lt;BR /&gt;			curr_slice-&amp;gt;m_cabac_init_idc = core_enc-&amp;gt;m_info.cabac_init_idc;&lt;BR /&gt;&lt;BR /&gt;			bSeenFirstMB = true;&lt;BR /&gt;&lt;BR /&gt;			// Fill in the correct value for m_iLastXmittedQP, used to correctly code&lt;BR /&gt;			// the per MB QP Delta&lt;BR /&gt;			curr_slice-&amp;gt;m_iLastXmittedQP = core_enc-&amp;gt;m_PicParamSet.pic_init_qp + curr_slice-&amp;gt;m_slice_qp_delta;&lt;BR /&gt;			Ipp32s SliceQPy = curr_slice-&amp;gt;m_iLastXmittedQP;&lt;BR /&gt;&lt;BR /&gt;			if (core_enc-&amp;gt;m_info.entropy_coding_mode)&lt;BR /&gt;			{&lt;BR /&gt;				if (slice_type==INTRASLICE)&lt;BR /&gt;					H264ENC_MAKE_NAME(H264BsReal_InitializeContextVariablesIntra_CABAC)(&lt;BR /&gt;					pBitstream,&lt;BR /&gt;					SliceQPy);&lt;BR /&gt;				else&lt;BR /&gt;					
H264ENC_MAKE_NAME(H264BsReal_InitializeContextVariablesInter_CABAC)(&lt;BR /&gt;					pBitstream,&lt;BR /&gt;					SliceQPy,&lt;BR /&gt;					curr_slice-&amp;gt;m_cabac_init_idc);&lt;BR /&gt;			}&lt;BR /&gt;&lt;BR /&gt;			// Initialize the MB skip run counter&lt;BR /&gt;			curr_slice-&amp;gt;m_uSkipRun = 0;&lt;BR /&gt;		}&lt;BR /&gt;&lt;BR /&gt;		cur_mb.lambda = lambda_sq[curr_slice-&amp;gt;m_iLastXmittedQP];&lt;BR /&gt;		cur_mb.uMB = uMB;&lt;BR /&gt;		cur_mb.chroma_format_idc = core_enc-&amp;gt;m_PicParamSet.chroma_format_idc;&lt;BR /&gt;		cur_mb.mbPtr = core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_pYPlane + core_enc-&amp;gt;m_pMBOffsets[uMB].uLumaOffset[core_enc-&amp;gt;m_is_cur_pic_afrm][curr_slice-&amp;gt;m_is_cur_mb_field];&lt;BR /&gt;		cur_mb.mbPitchPixels =  core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_pitchPixels &amp;lt;&amp;lt; curr_slice-&amp;gt;m_is_cur_mb_field;&lt;BR /&gt;		cur_mb.uMBx = uMB % core_enc-&amp;gt;m_WidthInMBs;&lt;BR /&gt;		cur_mb.uMBy = uMB / core_enc-&amp;gt;m_WidthInMBs - MBYAdjust;&lt;BR /&gt;		H264ENC_MAKE_NAME(H264CoreEncoder_UpdateCurrentMBInfo)(state, curr_slice);&lt;BR /&gt;		cur_mb.lumaQP = getLumaQP(cur_mb.LocalMacroblockInfo-&amp;gt;QP, core_enc-&amp;gt;m_PicParamSet.bit_depth_luma);&lt;BR /&gt;		cur_mb.lumaQP51 = getLumaQP51(cur_mb.LocalMacroblockInfo-&amp;gt;QP, core_enc-&amp;gt;m_PicParamSet.bit_depth_luma);&lt;BR /&gt;		cur_mb.chromaQP = getChromaQP(cur_mb.LocalMacroblockInfo-&amp;gt;QP, core_enc-&amp;gt;m_PicParamSet.chroma_qp_index_offset, core_enc-&amp;gt;m_SeqParamSet.bit_depth_chroma);&lt;BR /&gt;		pSetMB8x8TSFlag(curr_slice-&amp;gt;m_cur_mb.GlobalMacroblockInfo, 0);&lt;BR /&gt;		curr_slice-&amp;gt;m_MB_Counter++;&lt;BR /&gt;		H264BsBase_GetState(&amp;amp;pBitstream-&amp;gt;m_base, &amp;amp;pStartBits, &amp;amp;uStartBitOffset);&lt;BR /&gt;		iLastQP = curr_slice-&amp;gt;m_iLastXmittedQP;&lt;BR /&gt;		uSaved_Skip_Run = curr_slice-&amp;gt;m_uSkipRun;   // To restore it if we recompress&lt;BR /&gt;		uUsePCM = 0;    // Don't use 
the PCM mode initially.&lt;BR /&gt;		do &lt;BR /&gt;		{    // this is to recompress MBs that are too big.&lt;BR /&gt;&lt;BR /&gt;// we restore cur_mb&lt;BR /&gt;			cur_mb.uMB = mbTab[ uMB-uFirstMB ].uMB;&lt;BR /&gt;			cur_mb.uMBpair = mbTab[ uMB-uFirstMB ].uMBpair;&lt;BR /&gt;			cur_mb.uMBx = mbTab[ uMB-uFirstMB ].uMBx;&lt;BR /&gt;			cur_mb.uMBy = mbTab[ uMB-uFirstMB ].uMBy;&lt;BR /&gt;			cur_mb.mbPtr = mbTab[ uMB-uFirstMB ].mbPtr;&lt;BR /&gt;			cur_mb.mbPitchPixels = mbTab[ uMB-uFirstMB ].mbPitchPixels;&lt;BR /&gt;			cur_mb.lambda = mbTab[ uMB-uFirstMB ].lambda;&lt;BR /&gt;			cur_mb.chroma_format_idc = mbTab[ uMB-uFirstMB ].chroma_format_idc;&lt;BR /&gt;			cur_mb.lumaQP = mbTab[ uMB-uFirstMB ].lumaQP;&lt;BR /&gt;			cur_mb.lumaQP51 = mbTab[ uMB-uFirstMB ].lumaQP51;&lt;BR /&gt;			cur_mb.chromaQP = mbTab[ uMB-uFirstMB ].chromaQP;&lt;BR /&gt;			memcpy( cur_mb.LocalMacroblockInfo, mbTab[ uMB-uFirstMB ].LocalMacroblockInfo, sizeof(H264MacroblockLocalInfo) );&lt;BR /&gt;			memcpy( cur_mb.LocalMacroblockPairInfo, mbTab[ uMB-uFirstMB ].LocalMacroblockPairInfo, sizeof(H264MacroblockLocalInfo) );&lt;BR /&gt;			memcpy( cur_mb.GlobalMacroblockInfo, mbTab[ uMB-uFirstMB ].GlobalMacroblockInfo, sizeof(H264MacroblockGlobalInfo) );&lt;BR /&gt;			memcpy( cur_mb.GlobalMacroblockPairInfo, mbTab[ uMB-uFirstMB ].GlobalMacroblockPairInfo, sizeof(H264MacroblockGlobalInfo) );&lt;BR /&gt;			memcpy( cur_mb.MacroblockCoeffsInfo, mbTab[ uMB-uFirstMB ].MacroblockCoeffsInfo, sizeof(H264MacroblockCoeffsInfo) );&lt;BR /&gt;			cur_mb.m_uIntraCBP4x4 = mbTab[ uMB-uFirstMB ].m_uIntraCBP4x4;&lt;BR /&gt;			memcpy( cur_mb.m_iNumCoeffs4x4, mbTab[ uMB-uFirstMB ].m_iNumCoeffs4x4, 16*sizeof(Ipp32s) );&lt;BR /&gt;			memcpy( cur_mb.m_iLastCoeff4x4, mbTab[ uMB-uFirstMB ].m_iLastCoeff4x4, 16*sizeof(Ipp32s) );&lt;BR /&gt;			cur_mb.m_uIntraCBP8x8 = mbTab[ uMB-uFirstMB ].m_uIntraCBP8x8;&lt;BR /&gt;			memcpy( cur_mb.m_iNumCoeffs8x8, mbTab[ uMB-uFirstMB ].m_iNumCoeffs8x8, 16*sizeof(Ipp32s) );&lt;BR /&gt;			memcpy( 
cur_mb.m_iLastCoeff8x8, mbTab[ uMB-uFirstMB ].m_iLastCoeff8x8, 16*sizeof(Ipp32s) );&lt;BR /&gt;			memcpy( cur_mb.intra_types, mbTab[ uMB-uFirstMB ].intra_types, sizeof(T_AIMode) );&lt;BR /&gt;			cur_mb.mb4x4 = mbTab[ uMB-uFirstMB ].mb4x4;&lt;BR /&gt;			cur_mb.mb8x8 = mbTab[ uMB-uFirstMB ].mb8x8;&lt;BR /&gt;			cur_mb.mb16x16 = mbTab[ uMB-uFirstMB ].mb16x16;&lt;BR /&gt;			cur_mb.mbInter = mbTab[ uMB-uFirstMB ].mbInter;&lt;BR /&gt;			cur_mb.mbChromaInter = mbTab[ uMB-uFirstMB ].mbChromaInter;&lt;BR /&gt;			cur_mb.mbChromaIntra = mbTab[ uMB-uFirstMB ].mbChromaIntra;&lt;BR /&gt;&lt;BR /&gt;			memcpy( cur_mb.MVs[0], mbTab[ uMB-uFirstMB ].MVs[0], sizeof(H264MacroblockMVs) );&lt;BR /&gt;			memcpy( cur_mb.MVs[1], mbTab[ uMB-uFirstMB ].MVs[1], sizeof(H264MacroblockMVs) );&lt;BR /&gt;			memcpy( cur_mb.MVs[2], mbTab[ uMB-uFirstMB ].MVs[2], sizeof(H264MacroblockMVs) );&lt;BR /&gt;			memcpy( cur_mb.MVs[3], mbTab[ uMB-uFirstMB ].MVs[3], sizeof(H264MacroblockMVs) );&lt;BR /&gt;			memcpy( cur_mb.RefIdxs[0], mbTab[ uMB-uFirstMB ].RefIdxs[0], sizeof(H264MacroblockRefIdxs) );&lt;BR /&gt;			memcpy( cur_mb.RefIdxs[1], mbTab[ uMB-uFirstMB ].RefIdxs[1], sizeof(H264MacroblockRefIdxs) );&lt;BR /&gt;			cur_mb.MacroblockNeighbours = mbTab[ uMB-uFirstMB ].MacroblockNeighbours;&lt;BR /&gt;			cur_mb.BlockNeighbours = mbTab[ uMB-uFirstMB ].BlockNeighbours;&lt;BR /&gt;&lt;BR /&gt;			Ipp32s mb_bits;&lt;BR /&gt;			Ipp32s bit_offset;&lt;BR /&gt;			if (core_enc-&amp;gt;m_PicParamSet.entropy_coding_mode) &lt;BR /&gt;			{&lt;BR /&gt;				bit_offset = pBitstream-&amp;gt;m_base.m_nReadyBits;&lt;BR /&gt;				if (pBitstream-&amp;gt;m_base.m_nReadyBits == 9) bit_offset = 8;&lt;BR /&gt;			}&lt;BR /&gt;			// Code the macroblock, all planes&lt;BR /&gt;			cur_mb.LocalMacroblockInfo-&amp;gt;cbp_bits = 0;&lt;BR /&gt;			cur_mb.LocalMacroblockInfo-&amp;gt;cbp_bits_chroma = 0;&lt;BR /&gt;			uSaved_Skip_Run = curr_slice-&amp;gt;m_uSkipRun;&lt;BR /&gt;			H264ENC_MAKE_NAME(H264CoreEncoder_CEncAndRecMB)(state, 
curr_slice);&lt;BR /&gt;&lt;BR /&gt;			mb_bits = 0;&lt;BR /&gt;			status = H264ENC_MAKE_NAME(H264CoreEncoder_Put_MB_Real)(state, curr_slice);&lt;BR /&gt;			if (status != UMC_OK) &lt;BR /&gt;				goto done;&lt;BR /&gt;&lt;BR /&gt;			Ipp8u *pEndBits;&lt;BR /&gt;			Ipp32u uEndBitOffset;&lt;BR /&gt;			H264BsBase_GetState(&amp;amp;pBitstream-&amp;gt;m_base, &amp;amp;pEndBits, &amp;amp;uEndBitOffset);&lt;BR /&gt;&lt;BR /&gt;			mb_bits += (Ipp32u) (pEndBits - pStartBits)*8;&lt;BR /&gt;			if (uEndBitOffset &amp;gt;= uStartBitOffset)&lt;BR /&gt;				mb_bits += uEndBitOffset - uStartBitOffset;&lt;BR /&gt;			else&lt;BR /&gt;				mb_bits -= uStartBitOffset - uEndBitOffset;&lt;BR /&gt;&lt;BR /&gt;			// Should not recompress for CABAC&lt;BR /&gt;			if (!core_enc-&amp;gt;m_PicParamSet.entropy_coding_mode &amp;amp;&amp;amp; (mb_bits &amp;gt; MB_RECODE_THRESH) &amp;amp;&amp;amp; core_enc-&amp;gt;m_info.rate_controls.method == H264_RCM_QUANT)&lt;BR /&gt;			{&lt;BR /&gt;				// OK, this is bad, it's not compressing very much!!!&lt;BR /&gt;				// TBD: Tune this decision to QP...  
Higher QPs will progressively trash PSNR,&lt;BR /&gt;				// so if they are still using a lot of bits, then PCM coding is extra attractive.&lt;BR /&gt;&lt;BR /&gt;				// We're going to be recoding this MB, so reset some stuff.&lt;BR /&gt;				H264BsBase_SetState(&amp;amp;pBitstream-&amp;gt;m_base, pStartBits, uStartBitOffset);&lt;BR /&gt;				// Zero out unused bits in buffer before OR in next op&lt;BR /&gt;				// This removes dependency on buffer being zeroed out.&lt;BR /&gt;				*pStartBits = (Ipp8u)((*pStartBits &amp;gt;&amp;gt; (8-uStartBitOffset)) &amp;lt;&amp;lt; (8-uStartBitOffset));&lt;BR /&gt;&lt;BR /&gt;				curr_slice-&amp;gt;m_iLastXmittedQP = iLastQP; // Restore the last xmitted QP&lt;BR /&gt;				curr_slice-&amp;gt;m_uSkipRun = uSaved_Skip_Run;   // Restore the skip run&lt;BR /&gt;&lt;BR /&gt;				// If the QP has only been adjusted up 0 or 1 times, and QP != 51&lt;BR /&gt;				if (((cur_mb.LocalMacroblockInfo-&amp;gt;QP -&lt;BR /&gt;					core_enc-&amp;gt;m_PicParamSet.pic_init_qp + curr_slice-&amp;gt;m_slice_qp_delta) &amp;lt; 2) &amp;amp;&amp;amp;&lt;BR /&gt;					(cur_mb.LocalMacroblockInfo-&amp;gt;QP != 51))&lt;BR /&gt;				{&lt;BR /&gt;					// Quantize more and try again!&lt;BR /&gt;					cur_mb.LocalMacroblockInfo-&amp;gt;QP++;&lt;BR /&gt;					uRecompressMB = 1;&lt;BR /&gt;				} &lt;BR /&gt;				else &lt;BR /&gt;				{&lt;BR /&gt;					// Code this block as a PCM MB next time around.&lt;BR /&gt;					uUsePCM = 1;&lt;BR /&gt;					uRecompressMB = 0;&lt;BR /&gt;					// Reset the MB QP value to the "last transmitted QP"&lt;BR /&gt;					// Since no DeltaQP will be transmitted for a PCM block&lt;BR /&gt;					// This is important, since the Loop Filter will use the&lt;BR /&gt;					// this value in filtering this MB&lt;BR /&gt;					cur_mb.LocalMacroblockInfo-&amp;gt;QP = curr_slice-&amp;gt;m_iLastXmittedQP;&lt;BR /&gt;				}&lt;BR /&gt;&lt;BR /&gt;			} &lt;BR /&gt;			else&lt;BR /&gt;			{&lt;BR /&gt;				uRecompressMB = 0;&lt;BR /&gt;			}&lt;BR /&gt;		} while 
(uRecompressMB);        // End of the MB recompression loop.&lt;BR /&gt;&lt;BR /&gt;		// If the above MB encoding failed to efficiently predict the MB, then&lt;BR /&gt;		// code it as raw pixels using the mb_type = PCM&lt;BR /&gt;		if (uUsePCM)&lt;BR /&gt;		{&lt;BR /&gt;			cur_mb.GlobalMacroblockInfo-&amp;gt;mbtype = MBTYPE_PCM;&lt;BR /&gt;			cur_mb.LocalMacroblockInfo-&amp;gt;cbp_luma = 0xffff;&lt;BR /&gt;&lt;BR /&gt;			memset(cur_mb.MacroblockCoeffsInfo-&amp;gt;numCoeff, 16, 24);&lt;BR /&gt;&lt;BR /&gt;			Ipp32s  k;     // block number, 0 to 15&lt;BR /&gt;			for (k = 0; k &amp;lt; 16; k++) {&lt;BR /&gt;				cur_mb.intra_types[k] = 2;&lt;BR /&gt;				cur_mb.MVs[LIST_0]-&amp;gt;MotionVectors[k] = null_mv;&lt;BR /&gt;				cur_mb.MVs[LIST_1]-&amp;gt;MotionVectors[k] = null_mv;&lt;BR /&gt;				cur_mb.RefIdxs[LIST_0]-&amp;gt;RefIdxs[k] = -1;&lt;BR /&gt;				cur_mb.RefIdxs[LIST_1]-&amp;gt;RefIdxs[k] = -1;&lt;BR /&gt;			}&lt;BR /&gt;&lt;BR /&gt;			H264ENC_MAKE_NAME(H264CoreEncoder_Put_MBHeader_Real)(state, curr_slice);   // PCM values are written in the MB Header.&lt;BR /&gt;		}&lt;BR /&gt;&lt;BR /&gt;		if (core_enc-&amp;gt;m_PicParamSet.entropy_coding_mode){&lt;BR /&gt;			H264ENC_MAKE_NAME(H264BsReal_EncodeFinalSingleBin_CABAC)(&lt;BR /&gt;				pBitstream,&lt;BR /&gt;				(uMB == uFirstMB + uNumMBs - 1) ||&lt;BR /&gt;				(core_enc-&amp;gt;m_pCurrentFrame-&amp;gt;m_mbinfo.mbs[uMB + 1].slice_id != slice_num));&lt;BR /&gt;			H264ENC_MAKE_NAME(H264CoreEncoder_ReconstuctCBP)(&amp;amp;cur_mb);&lt;BR /&gt;		}&lt;BR /&gt;	}   // loop over MBs&lt;BR /&gt;&lt;BR /&gt;#ifndef NO_FINAL_SKIP_RUN&lt;BR /&gt;	// Check if the last N MBs were skip blocks.  If so, write a final skip run&lt;BR /&gt;	// NOTE!  This is _optional_.  
The encoder is not required to do this, and&lt;BR /&gt;	// decoders need to be able to handle it either way.&lt;BR /&gt;&lt;BR /&gt;	// Even though skip runs are not written for I Slices, m_uSkipRun can only be&lt;BR /&gt;	// non-zero for non-I slices, so the following test is OK.&lt;BR /&gt;	if (curr_slice-&amp;gt;m_uSkipRun !=0 &amp;amp;&amp;amp; core_enc-&amp;gt;m_info.entropy_coding_mode==0) {&lt;BR /&gt;		H264ENC_MAKE_NAME(H264BsReal_PutVLCCode)(pBitstream, curr_slice-&amp;gt;m_uSkipRun);&lt;BR /&gt;	}&lt;BR /&gt;&lt;BR /&gt;#endif // NO_FINAL_SKIP_RUN&lt;BR /&gt;&lt;BR /&gt;	// save the frame class&lt;BR /&gt;&lt;BR /&gt;done:&lt;BR /&gt;	if (core_enc-&amp;gt;m_PicParamSet.entropy_coding_mode) {&lt;BR /&gt;		H264ENC_MAKE_NAME(H264BsReal_TerminateEncode_CABAC)(pBitstream);&lt;BR /&gt;	}&lt;BR /&gt;	else {&lt;BR /&gt;		H264BsBase_WriteTrailingBits(&amp;amp;pBitstream-&amp;gt;m_base);&lt;BR /&gt;	}&lt;BR /&gt;&lt;BR /&gt;	return status;&lt;BR /&gt;&lt;BR /&gt;}[/cpp]&lt;/PRE&gt;
&lt;BR /&gt;Do you have any idea what is wrong?&lt;BR /&gt;</description>
      <pubDate>Tue, 17 Feb 2009 08:34:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902992#M13017</guid>
      <dc:creator>chrisdo</dc:creator>
      <dc:date>2009-02-17T08:34:34Z</dc:date>
    </item>
    <item>
      <title>Re: Speeding up H264 encoding with GPU</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902993#M13018</link>
      <description>&amp;gt;&amp;gt;My problem is there, I'm stuck doing it: when I separate ME and encoding, the final encoded video is not as expected (it's all grey...).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Regardless of your code, I'm afraid there is a conceptual problem with your approach: you can't separate "ME and encoding", because the MB mode decision aims to minimize the encoding cost, usually expressed as "D+lambda*R", which trades video quality loss (D) against the number of bits used for encoding (R). In other words, if you want to encode the current MB in a reasonable/"optimal" way, you need to look at the modes of the previous MBs that have already been encoded, in particular the value of the MV predictor.&lt;BR /&gt;Hope it helps,&lt;BR /&gt;&lt;BR /&gt;AndrewK&lt;BR /&gt;&lt;BR /&gt;PS. I'd recommend reading some literature on H.264 inter mode decision. Depending on exactly what your ME algorithm is, you might be able to offload a low-level, processing-intensive part of ME to the GPU, but try to keep the "encoding part" and the mode decision on a single CPU / multicore with shared memory. Alternatively, you might attempt multi-slice encoding, but it has its own challenges.&lt;BR /&gt;</description>
      <pubDate>Tue, 17 Feb 2009 14:50:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902993#M13018</guid>
      <dc:creator>andrewk88</dc:creator>
      <dc:date>2009-02-17T14:50:07Z</dc:date>
    </item>
    <item>
      <title>Re: Speeding up H264 encoding with GPU</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902994#M13019</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/110237"&gt;andrewk88&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Regardless of your code, I'm afraid there is a conceptual problem with your approach: you can't separate "ME and encoding", because the MB mode decision aims to minimize the encoding cost, usually expressed as "D+lambda*R", which trades video quality loss (D) against the number of bits used for encoding (R). In other words, if you want to encode the current MB in a reasonable/"optimal" way, you need to look at the modes of the previous MBs that have already been encoded, in particular the value of the MV predictor.&lt;BR /&gt;Hope it helps,&lt;BR /&gt;&lt;BR /&gt;AndrewK&lt;BR /&gt;&lt;BR /&gt;PS. I'd recommend reading some literature on H.264 inter mode decision. Depending on exactly what your ME algorithm is, you might be able to offload a low-level, processing-intensive part of ME to the GPU, but try to keep the "encoding part" and the mode decision on a single CPU / multicore with shared memory. Alternatively, you might attempt multi-slice encoding, but it has its own challenges.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hello Andrew, thanks for your answer!&lt;BR /&gt;&lt;BR /&gt;It seems I was a little too optimistic... (and not well enough informed) :D&lt;BR /&gt;&lt;BR /&gt;Do you think I could gain something by doing the SAD calculation for the macroblocks of a frame on the GPU before doing the real ME and encoding?&lt;BR /&gt;</description>
      <pubDate>Tue, 17 Feb 2009 17:15:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902994#M13019</guid>
      <dc:creator>chrisdo</dc:creator>
      <dc:date>2009-02-17T17:15:06Z</dc:date>
    </item>
    <item>
      <title>Re: Speeding up H264 encoding with GPU</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902995#M13020</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/415160"&gt;chrisdo&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Hello Andrew, thanks for your answer!&lt;BR /&gt;&lt;BR /&gt;It seems I was a little too optimistic... (and not well enough informed) :D&lt;BR /&gt;&lt;BR /&gt;Do you think I could gain something by doing the SAD calculation for the macroblocks of a frame on the GPU before doing the real ME and encoding?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Definitely, you could do full-pel SADs on the GPU, assuming some knowledge of "promising" motion vector candidates/predictors, unless you can afford doing a Full Search on the GPU and sending a portion of the best SADs back to the CPU to make the MB mode decision there. Depending on your ME algorithm, those MV candidates/predictors are very often derived from previously encoded MBs, so you'd need to establish a CPU-GPU communication protocol on a per-MB basis. I'm not sure if that is your intention, or how beneficial that might be vs. processing &amp;amp; communication on a frame-level basis (half-pel interpolation would be a good candidate to offload on a frame basis).&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;&lt;BR /&gt;Andrew&lt;BR /&gt;</description>
      <pubDate>Wed, 18 Feb 2009 14:25:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902995#M13020</guid>
      <dc:creator>andrewk88</dc:creator>
      <dc:date>2009-02-18T14:25:34Z</dc:date>
    </item>
    <item>
      <title>Re: Speeding up H264 encoding with GPU</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902996#M13021</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/110237"&gt;andrewk88&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Refer to the example/explanation in this article:&lt;BR /&gt;&lt;A href="http://ieeexplore.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=424284972"&gt;http://ieeexplore.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=424284972&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Definitely, you could do full-pel SADs on the GPU, assuming some knowledge of "promising" motion vector candidates/predictors, unless you can afford doing a Full Search on the GPU and sending a portion of the best SADs back to the CPU to make the MB mode decision there. Depending on your ME algorithm, those MV candidates/predictors are very often derived from previously encoded MBs, so you'd need to establish a CPU-GPU communication protocol on a per-MB basis. I'm not sure if that is your intention, or how beneficial that might be vs. processing &amp;amp; communication on a frame-level basis (half-pel interpolation would be a good candidate to offload on a frame basis).&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;&lt;BR /&gt;Andrew&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
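For illustration only, here is a minimal sketch of the full-pel SAD step described in the quoted reply, written in plain Python rather than GPU code. Everything named here is an assumption made for the example and comes from neither the thread nor any library: the functions sad and best_candidates, the keep parameter, and the tiny synthetic frames. On a GPU the two inner loops would presumably run as a kernel, with only the short list of best results copied back to the CPU for the mode decision.

```python
def sad(cur, ref, bx, by, mvx, mvy, size=16):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by the full-pel vector (mvx, mvy).
    The caller must keep the displaced block inside the frame."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + mvy + y][bx + mvx + x])
    return total


def best_candidates(cur, ref, bx, by, candidates, keep=2, size=16):
    """Score every candidate vector and return the `keep` lowest-SAD
    (sad, (mvx, mvy)) pairs: the small result set sent back to the CPU."""
    scored = [(sad(cur, ref, bx, by, mvx, mvy, size), (mvx, mvy))
              for (mvx, mvy) in candidates]
    return sorted(scored)[:keep]


# Hypothetical 8x8 frames: the current frame is the reference shifted
# right by one pixel, so the true motion vector of any interior block is (-1, 0).
W, H, B = 8, 8, 4
ref = [[(3 * x + 5 * y) % 17 for x in range(W)] for y in range(H)]
cur = [[ref[y][x - 1] if x else 0 for x in range(W)] for y in range(H)]

# Candidate vectors would normally come from previously encoded MBs.
candidates = [(0, 0), (-1, 0), (1, 0), (0, 1)]
best = best_candidates(cur, ref, 2, 2, candidates, keep=2, size=B)
print(best[0])  # lowest-SAD candidate for the block at (2, 2)
```

Returning a few best candidates instead of a single winner leaves the final choice to the CPU-side mode decision, which can still weigh each vector against its bit cost.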
&lt;BR /&gt;</description>
      <pubDate>Mon, 23 Mar 2009 21:51:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Speeding-up-H264-encoding-with-GPU/m-p/902996#M13021</guid>
      <dc:creator>Priya_Natarajan</dc:creator>
      <dc:date>2009-03-23T21:51:26Z</dc:date>
    </item>
  </channel>
</rss>

