What is Vectorization in NumPy?
Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values (vector) at one time.
With these techniques, we can perform operations on NumPy arrays without writing explicit loops. Instead, we use NumPy's built-in functions, such as sum, and arithmetic operators such as "+", "-", and "/", which operate on whole arrays at once.
Vectorization also uses the concept of broadcasting for performing operations in arrays of different sizes (discussed later in this article).
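As a brief illustration of the broadcasting idea mentioned above, here is a minimal sketch. The array names and values are assumptions chosen for illustration, not taken from the article.

```python
import numpy as np

row = np.arange(3)                # shape (3,): [0 1 2]
col = np.arange(3).reshape(3, 1)  # shape (3, 1)

# A scalar is broadcast across every element of the array.
shifted = row + 10

# Shapes (3, 1) and (3,) are broadcast to a common shape of (3, 3).
grid = col + row

print(shifted)
print(grid.shape)
```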
Let's look at the following example to understand the concept of vectorization:
Code:
Output:
Explanation: Two arrays, arr and arr1, are created using the arange function. Both contain the same number of elements. The arrays are added with the "+" operator, which works element by element: the first element of arr is added to the first element of arr1, and so on.
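The element-wise addition described above might look like the following sketch. The names arr and arr1 come from the explanation; the specific ranges passed to arange are assumptions.

```python
import numpy as np

arr = np.arange(5)       # [0 1 2 3 4]
arr1 = np.arange(5, 10)  # [5 6 7 8 9]

# "+" adds the arrays element by element, with no explicit loop.
total = arr + arr1

print(total)  # [ 5  7  9 11 13]
```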
We can also vectorize our own functions with np.vectorize. Let's see how to use it.
NumPy Vectorization with the np.vectorize() Function
The vectorize function (np.vectorize) is provided by the NumPy library. It wraps a Python function, pyfunc, and returns a callable that accepts nested sequences of objects or NumPy arrays as input and returns a single NumPy array or a tuple of NumPy arrays as output. The vectorized function evaluates pyfunc over successive tuples of the input arrays, like Python's map function, except that it also applies NumPy's broadcasting rules. Note that np.vectorize is provided mainly for convenience rather than performance: internally it is essentially a for loop.
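To make the comparison with map concrete, here is a small sketch. The function name combine and the input values are hypothetical, chosen only to show a scalar function being applied across inputs with broadcasting.

```python
import numpy as np

def combine(a, b):
    """Scalar-only logic: the `if` would not work on whole arrays."""
    return a - b if a > b else a + b

vec_combine = np.vectorize(combine)

# The scalar 2 is broadcast against every element of the list.
vectorized = vec_combine([1, 2, 3, 4], 2)

# Equivalent result built with map, pairing each element with 2 by hand.
mapped = list(map(combine, [1, 2, 3, 4], [2, 2, 2, 2]))

print(vectorized)  # [3 4 1 2]
print(mapped)      # [3, 4, 1, 2]
```

Unlike map, the vectorized call returns a NumPy array and handles the scalar second argument without it being repeated manually.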
Before using np.vectorize(), let's compare vectorized and non-vectorized implementations with the help of the examples given below:
Vectorized Implementation
Code:
Output:
Explanation: Using NumPy's built-in sum function on the array arr, instead of an explicit for loop, gives the same result, 45. This is vectorization.
Code:
Output:
Explanation: In this example, two arrays, arr and arr1, are created using the arange function. The built-in dot function computes the dot product of the two arrays.
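A sketch of the dot-product example described above follows. The ranges passed to arange are assumptions; the loop version is included only to show what np.dot computes.

```python
import numpy as np

arr = np.arange(1, 4)   # [1 2 3]
arr1 = np.arange(4, 7)  # [4 5 6]

# Vectorized: multiply matching elements and sum the products in one call.
dot_product = np.dot(arr, arr1)

# The same computation written as an explicit loop, for comparison.
loop_product = 0
for a, b in zip(arr, arr1):
    loop_product += a * b

print(dot_product)  # 32
```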
Non-Vectorized Implementation
Code:
Output:
Explanation: In the above code example, an array is created using the arange function, and the sum variable is initialized to zero. The elements of the array are added one by one in a for loop, and the running total is stored in the sum variable. Printing the sum gives 45 as the output.
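The two implementations described above can be sketched side by side. This assumes the array is np.arange(10), which is consistent with the stated result of 45; the variable is named total here rather than sum to avoid shadowing Python's built-in sum.

```python
import numpy as np

arr = np.arange(10)  # [0 1 2 ... 9]

# Non-vectorized: accumulate the elements one at a time.
total = 0
for value in arr:
    total += value

# Vectorized: a single call sums the whole array.
vector_total = np.sum(arr)

print(total, vector_total)  # 45 45
```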
Syntax of NumPy Vectorize
np.vectorize is a class whose constructor has the form np.vectorize(pyfunc, otypes=None, doc=None, excluded=None, cache=False, signature=None). Let's quickly go through its parameters:
Required Parameters
- pyfunc: The Python function or method to vectorize. It can be any callable, including one defined by the user.
Optional Parameters of np.vectorize
- otypes: The output data type(s). It can be a string of typecode characters or a list of data type specifiers. By default, it is None.
- doc: The docstring for the vectorized function. It can be a string. If it is None, the docstring will be pyfunc.__doc__.
- cache: A bool. If True and otypes is not given, the first function call, which determines the number of outputs, is cached. By default, it is False.
- excluded: A set of strings or integers naming the keyword or positional arguments that will not be vectorized. These arguments are passed unmodified and directly to pyfunc.
- signature: A generalized universal function signature, e.g., (m,n),(n)->(m) for vectorized matrix-vector multiplication. When it is specified, pyfunc will be called with (and must return) arrays whose shapes are given by the sizes of the corresponding core dimensions. By default, pyfunc is assumed to take scalars as input and output.
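The optional parameters listed above can be tried out with a short sketch. The helper poly_eval and all input values are hypothetical; the convolve signature follows the pattern of a one-dimensional function applied row by row.

```python
import numpy as np

def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule (highest power first)."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

# excluded: `coeffs` is passed through unmodified; only `x` is vectorized.
# otypes: force a float output array.
vec_poly = np.vectorize(poly_eval, excluded=["coeffs"], otypes=[float])
values = vec_poly(coeffs=[1, 2, 3], x=[0, 1])
print(values)  # [3. 6.]

# signature: treat each row of the first argument as one 1-D input.
row_convolve = np.vectorize(np.convolve, signature="(n),(m)->(k)")
smoothed = row_convolve(np.eye(4), [1, 2, 1])
print(smoothed.shape)  # (4, 6)
```

Because coeffs is excluded by name, it has to be passed as a keyword argument in this sketch.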
Vectorizing a Function in NumPy
In the following examples, an add function that takes two parameters and returns their sum is defined using def. The add function is passed to np.vectorize, with otypes set to the desired output data type, and the resulting vectorized function is stored in the vecfunc variable. We can then call vecfunc like the original function, passing scalars or arrays directly to get the desired result.
Code:
Output:
Explanation: By passing two values to vecfunc, the function created with np.vectorize, we get their sum as a float.
Code:
Output:
Explanation: By passing the arrays arr1 and arr2, created using the np.arange function, to vecfunc, we get the element-wise sum of the arrays with an integer data type.
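The two examples described above could be written roughly as follows. The names add, vecfunc, arr1, and arr2 come from the text; the input values and the second variable name vecfunc_int are assumptions.

```python
import numpy as np

def add(a, b):
    return a + b

# Float output: otypes controls the data type of the result.
vecfunc = np.vectorize(add, otypes=[float])
float_result = vecfunc(2, 3)
print(float_result)  # 5.0

# Integer output on arrays: the addition is applied element by element.
vecfunc_int = np.vectorize(add, otypes=[int])
arr1 = np.arange(1, 4)  # [1 2 3]
arr2 = np.arange(4, 7)  # [4 5 6]
int_result = vecfunc_int(arr1, arr2)
print(int_result)  # [5 7 9]
```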
np.vectorize() vs. Python for Loop – Vectorization Speed Comparison
Vectorized NumPy operations run in optimized, precompiled code and can therefore execute much faster than equivalent Python loops. A loop processes the array one element at a time, and the overhead paid on every iteration adds up, whereas a vectorized operation processes the whole array in a single call. Let's see how vectorization is faster than using loops in Python with the help of the following program:
Code:
Output:
Explanation: In this example, the array arr is created using the arange function, and the sum variable is initialized to 0. The st_time and ed_time variables record the starting time (when the program begins to execute) and the ending time (when it stops) using Python's time module. Subtracting st_time from ed_time gives the execution time of the program. Using a for loop, we add the array elements and get 45 as the sum of all the array elements.
Code:
Output:
Explanation: An array, arr, is created using the np.arange function. Two variables, st_time2 and ed_time2, record the starting time and end time of the program. Subtracting st_time2 from ed_time2 gives the execution time of the program. Elements of the array are added using NumPy's vectorized sum function, and the final answer is stored in arr2.
In the above examples, we saw that vectorization takes less time than the loop, because the work is done inside NumPy's optimized code rather than one element at a time in Python.
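The timing comparison above can be reproduced with a sketch like the one below. It assumes a larger array than the article's so that the difference is measurable, and it uses time.perf_counter from the time module; actual timings depend on the machine, so none are shown.

```python
import time
import numpy as np

# int64 is requested explicitly so the total cannot overflow on any platform.
arr = np.arange(200_000, dtype=np.int64)

# Time the explicit Python loop.
start = time.perf_counter()
loop_total = 0
for value in arr:
    loop_total += value
loop_seconds = time.perf_counter() - start

# Time the vectorized sum.
start = time.perf_counter()
vec_total = arr.sum()
vec_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.6f} s")
print(f"vectorized: {vec_seconds:.6f} s")
```

Both approaches produce the same total; only the elapsed time differs.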
Caching in NumPy Vectorization
If otypes is not given, np.vectorize calls the function on the first element of the input to determine the number of outputs. The result of this call can be cached so that the function is not evaluated twice for that element. Caching should only be used if the function evaluation is computationally expensive, because wrapping the function to implement the cache slows down subsequent calls. Caching is enabled by setting the cache parameter to True. Let's look at some more examples below for vectorization:
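One way to observe the caching behavior described above is to count how often the wrapped function runs. This is a sketch: the helper make_counting_square is hypothetical, and the call counts reflect the behavior described in the NumPy documentation when otypes is omitted.

```python
import numpy as np

def make_counting_square():
    """Return a squaring function and a list recording each call."""
    calls = []
    def square(x):
        calls.append(x)
        return x * x
    return square, calls

data = np.arange(1, 4)  # [1 2 3]

# Without caching, the first element is evaluated twice:
# once to detect the output type, once for the real result.
plain_func, plain_calls = make_counting_square()
plain_result = np.vectorize(plain_func)(data)

# With cache=True, the first evaluation is reused.
cached_func, cached_calls = make_counting_square()
cached_result = np.vectorize(cached_func, cache=True)(data)

print(len(plain_calls), len(cached_calls))
```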
Example
Example 1: Using vectorized sum method on NumPy array.
Code:
Output:
Explanation:
Example 2: Using NumPy Exponential Function
Syntax:
Code:
Output:
Explanation:
An array, arr1, is created using a simple NumPy method. Applying the np.exp function to arr1 gives the exponential value of each element, and the final result is stored in the arr2 variable.
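A minimal sketch of this example is shown below. The names arr1 and arr2 follow the explanation; creating arr1 with np.arange(3) is an assumption.

```python
import numpy as np

arr1 = np.arange(3)  # [0 1 2]

# np.exp computes e**x for every element in one call.
arr2 = np.exp(arr1)

print(arr2)
```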
Using Python's built-in math Library Exponential Function:
Syntax:
Code:
Output:
Explanation: The array arr1 is created using the np.arange function. When we pass it to Python's built-in math.exp function, we get a TypeError, because math.exp expects a single value rather than an array.
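The error described above can be reproduced with the sketch below, which also shows one possible workaround: applying math.exp to each element separately. The range passed to np.arange is an assumption.

```python
import math
import numpy as np

arr1 = np.arange(1, 5)  # [1 2 3 4]

# math.exp only accepts a single number, so a multi-element array fails.
raised = False
try:
    math.exp(arr1)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Applying math.exp one element at a time works, but it is a Python loop.
elementwise = [math.exp(x) for x in arr1]
print(elementwise)
```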
Code:
Output:
Explanation: In the above example, the exponential values of elements can be calculated. It doesn't matter if the number is negative.
Code:
Output:
Explanation: In this example, two arrays, arr and arr1, are created using the arange function. The built-in dot function computes the dot product of the two arrays.
Code:
Output:
Explanation: Two arrays, arr and arr1, are created using the arange function. The arrays are multiplied with the built-in * operator, which works element-wise: the first element of arr is multiplied by the first element of arr1, and so on.
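The element-wise multiplication described above might look like this sketch. The ranges passed to arange are assumptions.

```python
import numpy as np

arr = np.arange(1, 4)   # [1 2 3]
arr1 = np.arange(4, 7)  # [4 5 6]

# "*" multiplies matching elements; it is not matrix multiplication.
product = arr * arr1

print(product)  # [ 4 10 18]
```

Summing the element-wise products gives the same value as the dot product of the two arrays.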
Conclusion
- Vectorization performs operations on whole NumPy arrays using built-in functions and operators, without explicit loops.
- Python's time module can be used to measure the execution time of a program.
- Vectorized operations are typically much faster than equivalent Python loops.
- The np.vectorize() function, with one required and several optional parameters, turns a scalar Python function into one that accepts arrays.