AI Tools from Intel
Find answers to your toolkit installation, configuration, and get-started questions.

Developing on top of the Intel Extension for Transformers

AlignmentLabAI
Beginner
1,754 Views

Is there a well-known or recommended way to scale out on the Sapphire Rapids processors it's optimized for, once we've finished building the most efficient stack we can on them? I'm trying to get my organization comfortable with Intel CPU frameworks in general, in preparation for Gaudi representing more of the available ecosystem going forward. So I'm trying to optimize our experiment spend and understand what scaling a CPU-focused stack looks like.

0 Kudos
1 Reply
kta_intel
Moderator
1,731 Views

Hey, thanks for the question. Glad to hear your org is interested in Intel CPUs and Gaudi.

 

Can you clarify what you mean by scaling in this context?

 

Overall, it will probably depend on your specific use case. But if you're interested in keeping Intel Extension for Transformers performant on Sapphire Rapids as you expand your user base, and/or in optimizing it to stay performant as you increase your server size, the general recommendation is to start by leveraging low-precision data types (i.e., INT4/INT8) and data parallelism (e.g., DDP): https://github.com/intel/intel-extension-for-transformers
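To make the low-precision recommendation above concrete, here is a minimal concept sketch of symmetric INT8 weight quantization in plain NumPy. This is only an illustration of what an INT8 representation does to model weights; it is not the intel-extension-for-transformers API, which configures quantization through its own interfaces.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 values plus a single scale factor
    (symmetric per-tensor quantization, a common INT8 scheme)."""
    scale = np.abs(w).max() / 127.0                # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for a transformer layer's weights.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding to the nearest quantization step bounds the error by scale / 2.
max_err = float(np.abs(w - w_hat).max())
print(q.dtype, max_err <= scale / 2 + 1e-6)
```

The payoff in practice is that INT8 (or INT4) weights take 4x (or 8x) less memory bandwidth than FP32, which is typically the bottleneck for transformer inference on CPUs such as Sapphire Rapids.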

0 Kudos
Reply