
Replacing Python loops: Aggregations and Reductions

Ramya_Ravi, Employee

How to replace slow Python loops with strategic function equivalents for aggregating data

 

This article was originally published on medium.com.


Posted on behalf of:

Bob Chesebrough, Solutions Architect, Intel Corporation


This article demonstrates how to aggregate or reduce data dimensions using sums, means, and other aggregation functions implemented as Universal Functions (ufuncs) in NumPy.

Aggregations/Reductions:

 

Aggregations or reductions are means of compressing or shrinking the dimensionality of data into a summarized state using mathematical functions.

Some examples would be computing a sum, an average, the median, the standard deviation, or the mode of a column or row of data, as seen in the table below:


Table 1: Examples of common aggregations
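
To make Table 1 concrete, here is a minimal sketch of these aggregations in NumPy (the array values are my own, for illustration; note that NumPy has no built-in mode, so np.unique is one way to recover it):

import numpy as np

data = np.array([2.0, 3.0, 5.0, 5.0, 7.0])
print(data.sum())       # sum: 22.0
print(data.mean())      # average: 4.4
print(np.median(data))  # median: 5.0
print(data.std())       # population standard deviation
# NumPy has no built-in mode; np.unique can recover it
values, counts = np.unique(data, return_counts=True)
print(values[counts.argmax()])  # mode: 5.0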

But there are more aggregations than these in common use in data science and statistics. Other common aggregations are min, max, argmin, and argmax.

It is worthwhile to explore the wide set of methods provided by NumPy ufuncs, which include many of these aggregations.

Below is a table of aggregations commonly used in machine learning:

Table 2: UFuncs commonly used in machine learning
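
As a quick sketch of these ufunc-style aggregations (the array values are mine, for illustration):

import numpy as np

a = np.array([4.0, 1.0, 9.0, 6.0])
print(a.min(), a.max())        # smallest and largest values: 1.0 9.0
print(a.argmin(), a.argmax())  # their index positions: 1 2
# many aggregations are ufunc methods under the hood;
# np.sum(a) is equivalent to np.add.reduce(a)
print(np.add.reduce(a))        # 20.0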

Benefits of replacing “roll your own” Python loops with NumPy Aggregations:

 

As I explained in “Accelerate Numerical Calculations in NumPy With Intel oneAPI Math Kernel Library,” NumPy, powered by Intel oneAPI, can achieve higher levels of readability, maintainability, and performance than a straightforward “roll your own” loop, while also future proofing your code for new hardware and software library optimizations.

I’ll provide an example below of my “roll your own” Python code for computing the mean and standard deviation, since these are computed frequently in machine learning, and compare the readability and performance of my code with a straightforward use of NumPy to achieve the same computation.


import time
import numpy as np

rng = np.random.default_rng(2021)
# default_rng is the recommended way to generate random numbers;
# see the blog "Stop using numpy.random.seed()" for the reasoning:
# https://towardsdatascience.com/stop-using-numpy-random-seed-581a9972805f
a = rng.random((10_000_000,))
t1 = time.time()
timing = {}  # collect timing information
S = 0
for i in range(len(a)):  # compute the sum
    S += a[i]
mean = S / len(a)  # compute the mean
std = 0
for i in range(len(a)):  # sum the squared differences from the mean
    d = a[i] - mean
    std += d * d
std = np.sqrt(std / len(a))  # compute the standard deviation
timing['loop'] = time.time() - t1
print("mean", mean)
print("std", std)
print(timing)

Compare the code above to the much more readable and maintainable code below:

t1 = time.time()
print(a.mean())
print(a.std())
timing['numpy'] = time.time() - t1
print(timing)
print(f"Acceleration {timing['loop']/timing['numpy']:4.1f} X")

As you can see, the NumPy code is much more readable (3 lines of code versus 16), more maintainable, and faster too. The timing comparison on my system: naive loop, 2.8 seconds; NumPy mean/std, 0.04 seconds. That is a speedup over the naive code by a factor of more than 60X.


Chart 1: Speedup of NumPy mean/std code over my naive Python implementation

Flexibility of NumPy Aggregations

 

Another aspect of many NumPy aggregations/reductions is the ability to specify an axis over which to apply the reducing function. As an example, consider the NumPy sum() function. Without an axis specifier, it adds ALL the values in an N-dimensional array to arrive at a single scalar. In the example below, where I specify axis=0, it returns a row vector that sums each column of the matrix.

Example 1: Summing on axis = 0
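
Since the original screenshot is not reproduced here, a minimal sketch of the axis=0 case (the matrix values are my own):

import numpy as np

m = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]
print(m.sum(axis=0))            # column sums: [3 5 7]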

Alternatively, if I specify axis=1, I get a column vector whose entries are the sums of each row.

Example 2: Summing on axis = 1
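
And a matching sketch for the axis=1 case (same illustrative matrix):

import numpy as np

m = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]
print(m.sum(axis=1))            # row sums: [ 3 12]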


Flexibility is provided by parameters on each aggregation function that accommodate variations in how the reduction should be done, as we saw in the sum example above.
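For instance, np.sum() accepts keepdims, where, and dtype parameters; a minimal sketch with an illustrative matrix:

import numpy as np

m = np.arange(6).reshape(2, 3)
# keepdims retains the reduced axis, which helps later broadcasting
print(m.sum(axis=1, keepdims=True))  # shape (2, 1)
# where restricts the reduction to selected elements
print(m.sum(where=(m > 2)))          # 3 + 4 + 5 = 12
# dtype controls the accumulator type/precision
print(m.sum(dtype=np.float64))       # 15.0
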
Get the code for this article (Jupyter notebook: 08_03_NumPy_Aggregations.ipynb) and the rest of the series on GitHub.

Next Steps

 

Try out this code sample using the standard free Intel® Tiber™ Developer Cloud account and the ready-made Jupyter Notebook.

We also encourage you to check out and incorporate Intel’s other AI/ML framework optimizations and end-to-end portfolio of tools into your AI workflow, and to learn about the unified, open, standards-based oneAPI programming model that forms the foundation of Intel’s AI Software Portfolio, helping you prepare, build, deploy, and scale your AI solutions.

Intel Developer Cloud System Configuration as tested:

 

x86_64; CPU op-mode(s): 32-bit, 64-bit; Address sizes: 52 bits physical, 57 bits virtual; Byte Order: Little Endian; CPU(s): 224; On-line CPU(s) list: 0–223; Vendor ID: GenuineIntel; Model name: Intel(R) Xeon(R) Platinum 8480+; CPU family: 6; Model: 143; Thread(s) per core: 2; Core(s) per socket: 56; Socket(s): 2; Stepping: 8; CPU max MHz: 3800.0000; CPU min MHz: 800.0000

 

About the Author
Product Marketing Engineer bringing cutting-edge AI/ML solutions and tools from Intel to developers.