Taking Big Data Analytics to the Next Level with Native Vectorized Databases

MaryT_Intel · ‎10-01-2021

Content provided by Kinetica

Modern x86 processors include vector units that can operate on multiple data objects with a single instruction, otherwise known as Single Instruction, Multiple Data (SIMD) units. Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values at one time. Big data analytics workloads involving summing, counting, predicate joins, window functions, derived columns, graph solving, rendering visualizations at scale, and others can be accelerated by orders of magnitude through vectorization.
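
To make the scalar-versus-vector distinction concrete, here is a minimal sketch in portable C. It is an illustration only, not code from Kinetica or Intel: the function names `sum_scalar` and `sum_lanes` and the `LANES` constant are hypothetical, and the lane count of 16 simply mirrors the sixteen 32-bit values that fit in one 512-bit register. The second function has the shape a SIMD unit works with (independent per-lane partial sums, then one reduction), which a compiler may be able to turn into vector instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: 512 bits / 32 bits per value = 16 values per block. */
#define LANES 16

/* Scalar form: one value is added per loop iteration. */
int64_t sum_scalar(const int32_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Vector-shaped form: each pass updates LANES independent partial sums.
 * Accumulators are 64-bit so that large inputs do not overflow. */
int64_t sum_lanes(const int32_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    int64_t partial[LANES] = {0};
    size_t i = 0;

    /* Main loop: consume one full block of LANES values per pass. */
    for (; i + LANES <= n; i += LANES)
        for (size_t lane = 0; lane < LANES; lane++)
            partial[lane] += values[i + lane];

    /* Horizontal reduction: combine the per-lane partial sums. */
    int64_t total = 0;
    for (size_t lane = 0; lane < LANES; lane++)
        total += partial[lane];

    /* Scalar tail: leftover values that do not fill a whole block. */
    for (; i < n; i++)
        total += values[i];
    return total;
}
```

Both functions return the same result for integer input. Whether the second form actually runs on the vector units depends on the compiler, its optimization flags, and the target processor.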

For many years, database vendors have incorporated vectorization into a limited set of specific operations. While this has yielded incremental performance gains, it has yet to disrupt the database market the way other performance innovations, such as in-memory or scale-out architectures, have. This is about to change with the rise of the native vectorized database.

Native vectorized databases are built from the ground up to exploit both data-level and instruction-level parallelism on modern cloud hardware. They auto-vectorize queries at the kernel level, use vector performance primitive libraries, and include analytic functions that are fully vectorized. The result is extremely fast parallel processing of big data. Intel’s benchmarks show improvements from vectorization of as much as 16X.
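
As a rough illustration of what a vectorization-friendly query kernel can look like, the sketch below evaluates something like `SELECT SUM(amount), COUNT(*) WHERE quantity > ?` over column arrays. It is a hypothetical example, not Kinetica's implementation: the `sales_columns` and `filtered_total` types and the `sum_where_quantity_gt` function are invented for this sketch. The predicate is computed as a 0/1 mask rather than a branch, so every row does identical work, which is the pattern compilers and SIMD units tend to handle well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical columnar layout: one contiguous array per column. */
typedef struct {
    const int32_t *quantity;
    const int32_t *amount;
    size_t rows;
} sales_columns;

/* Result of the filtered aggregate. */
typedef struct {
    int64_t sum;
    int64_t count;
} filtered_total;

/* Roughly: SELECT SUM(amount), COUNT(*) WHERE quantity > min_quantity */
filtered_total sum_where_quantity_gt(const sales_columns *t, int32_t min_quantity)
{
    assert(t != NULL);
    filtered_total out = {0, 0};
    for (size_t i = 0; i < t->rows; i++) {
        /* Predicate as a 0/1 mask instead of an if-branch. */
        int64_t keep = t->quantity[i] > min_quantity;
        out.sum += keep * t->amount[i];   /* rows that fail the predicate add 0 */
        out.count += keep;
    }
    return out;
}
```

Storing each column contiguously matters here: the loop reads adjacent values of the same type, so a block of rows can be loaded and compared together instead of row by row.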

Kinetica is a native vectorized database and recently announced general availability of its product as a service in the Microsoft Azure Marketplace. Kinetica started off as a custom solution for the National Security Agency (NSA) and has recently seen a surge in growth, both in the defense industry, with new customers like NORAD, and among innovative commercial accounts like Citibank and OVO. The availability of Kinetica in Microsoft Azure will mark the first time that native vectorized database capabilities are available to companies of all sizes as a service in the cloud.

Kinetica leverages Intel® Advanced Vector Extensions 512 (Intel® AVX-512) to vastly accelerate big data analytics and achieve greater performance on fewer nodes. A major US retailer was able to replace a 100-node NoSQL cluster with an 8-node Kinetica cluster. A global pharmaceutical company replaced an 88-node SQL-on-Hadoop cluster with a 6-node Kinetica cluster. These extraordinary gains illustrate how native vectorization with Intel AVX-512 can accelerate performance while slashing data infrastructure spending.

Notices and Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.