NumPy Array Creation
Overview
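NumPy's computations are built on the NumPy array (numpy.ndarray). Although an array can be created from a Python list, it is not a list. It stores elements of a single data type in a compact buffer and adds features such as a shape and fast element-wise operations. The dtype attribute of a NumPy array indicates the data type of its elements. If the input contains elements of several data types, all of them are converted to a common type that can represent every element (a process known as upcasting). For example, a mix of integers and floats becomes floats. A NumPy array can also be a view of (a reference to) another NumPy array, or an independent copy of it.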
Introduction
Before we dive into the implementation, let's clarify each term, its meaning, and its importance.
Let's start with the first term, "arrays." We will first understand what arrays are and then move on to the more specific terminology built on them.
By definition, an array could be defined as a group of similar data or variables referenced under a common name.
Now that we have some understanding of the term "arrays," let's move on to our next topic. When programming in Python, one of the most commonly mentioned names is NumPy. But what is NumPy? NumPy is a Python package primarily used for numerical computation.
NumPy stands for Numerical Python. It is used to store vectors and arrays and to compute operations on them. It has many built-in statistical and mathematical functions, which saves you from writing them yourself.
But you might also wonder why we work with vectors and arrays rather than individual numbers. When an operation is applied to a whole array in a single call (vectorization), NumPy runs the loop over the n elements in compiled code instead of executing n separate operations in the Python interpreter, which is much faster. This matters most in fields such as machine learning and data science, which require heavy computation.
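To make the idea concrete, here is a small sketch comparing a Python loop with a single vectorized NumPy expression. The numbers are illustrative, and the example only shows that both approaches give the same result; it is not a timing benchmark.

```python
import numpy as np

numbers = [1, 2, 3, 4, 5]

# Plain Python: the interpreter processes one element at a time
doubled_list = [x * 2 for x in numbers]

# NumPy: one vectorized expression applies the operation to the whole array
doubled_array = np.array(numbers) * 2

print(doubled_list)   # [2, 4, 6, 8, 10]
print(doubled_array)  # [ 2  4  6  8 10]
```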
Benefits of NumPy Arrays
A question that might pop into your mind: we already have lists in Python, so why not use them? A Python list can hold items of different data types, so it stores references to separate Python objects scattered across memory, and fetching those items and using them together in a computation takes time. A NumPy array, on the other hand, holds items of a single data type stored in one contiguous block of memory, so computation on arrays is faster.
To sum it up, NumPy is used instead of lists because it is faster, often dramatically so: speedups of tens of times are commonly reported for large numerical workloads, although the exact figure depends on the operation and the array size. It also leads to cleaner code, and its core routines are implemented in C, a compiled language well suited to fast numerical code.
An array consumes less memory and is more convenient to use than a list. NumPy arrays store their data compactly, and they let you specify the data type of the elements (for example, 8-bit instead of 64-bit integers). This allows the code to be optimized even further.
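A minimal sketch of how the chosen dtype affects memory use, assuming an illustrative array of 1,000 zeros. itemsize is the number of bytes per element and nbytes is the total size of the data buffer.

```python
import numpy as np

# 1,000 zeros stored with two different element types
small = np.zeros(1000, dtype=np.int8)   # 1 byte per element
large = np.zeros(1000, dtype=np.int64)  # 8 bytes per element

print(small.itemsize, small.nbytes)  # 1 1000
print(large.itemsize, large.nbytes)  # 8 8000
print(large.flags["C_CONTIGUOUS"])   # True: the data sits in one contiguous block
```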
NumPy is not just more efficient; it is also more convenient. You get many vector and matrix operations for free, which often saves you from writing that code yourself, and these operations are efficiently implemented.
Now we have a basic understanding of NumPy.
Let's dive into the core of this article: implementation. We will look at different methods to create a NumPy array.
Different Methods of Creating Arrays
Let's look at different methods to create arrays in Python.
Creating Arrays Using Python Data Structures
We can create a collection of elements under a common name using the data structures built into Python, specifically lists and tuples.
Let's see how it goes.
Lists
A list is a collection of elements, possibly of different data types, referenced under a common name. You can perform various operations on a list, such as append(), remove(), and pop(). Let's see how it works.
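A minimal sketch of the list operations mentioned above; the list contents are illustrative.

```python
# A list can hold elements of different data types
items = [1, "two", 3.0]

items.append(4)       # add an element at the end
items.remove("two")   # delete the first element equal to "two"
last = items.pop()    # remove and return the last element

print(items)  # [1, 3.0]
print(last)   # 4
```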
Tuples
We can also use another built-in data structure, the tuple, to do the same. So what's the difference? Unlike lists, tuples are immutable: their elements cannot be modified after creation. So why do we still use them? Sometimes you need to store constants that do not change throughout the program, and tuples are a natural way to do so.
For example, you can store the RGB values of a color in a tuple, since they will not change during the program. Let's see how we can use tuples.
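A minimal sketch of the RGB example, assuming pure red as the illustrative color. Reading an element works as with a list, but assigning to one raises a TypeError.

```python
# RGB values stored in a tuple, which cannot be modified after creation
red = (255, 0, 0)
print(red[0])  # 255

try:
    red[0] = 128  # attempt to change an element
except TypeError as err:
    # tuples do not support item assignment
    print("Cannot modify a tuple:", err)
```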
Creating Arrays Using NumPy
Arrays created with the NumPy package are often called NumPy arrays. Their type is numpy.ndarray, which stands for an n-dimensional array.
There are six ways to create NumPy arrays. Let's look at them one by one.
Note: Throughout the code snippets, we import numpy under the alias np so that we don't have to write "numpy" over and over. This saves time and keeps the code neat.
Conversion from Lists and Tuples in Python
One way to create a NumPy array is to convert an existing Python list or tuple with the function np.array().
Let's see how things go.
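A minimal sketch of the conversion; the values in the list and tuple are illustrative.

```python
import numpy as np

# Convert a Python list and a Python tuple into NumPy arrays
arr_from_list = np.array([1, 2, 3, 4])
arr_from_tuple = np.array((5, 6, 7))

print(arr_from_list)        # [1 2 3 4]
print(type(arr_from_list))  # <class 'numpy.ndarray'>
print(arr_from_tuple)       # [5 6 7]
```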
Here we can see that the list was converted into an n-dimensional NumPy array. We can also perform this for multidimensional arrays (lists of lists).
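A minimal sketch for the multidimensional case, assuming an illustrative list of two lists with three elements each. The shape attribute reports the number of elements along each dimension, and ndim reports the number of dimensions.

```python
import numpy as np

# A list of lists becomes a two-dimensional array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix)
# [[1 2 3]
#  [4 5 6]]
print(matrix.shape)  # (2, 3)
print(matrix.ndim)   # 2
```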
For the function array(), we can also specify an optional argument called dtype, which stands for data type. It sets the data type of the elements in the NumPy array, much like declaring the type of a variable in C/C++. dtype can take values such as int8 (8-bit integer), int32 (32-bit integer), and float64 (64-bit floating point).
If dtype is omitted, NumPy infers a type that can hold all the input elements; passing dtype explicitly lets us choose a specific type instead.
If two NumPy arrays have the same dtype and an operation is performed on them, the resulting NumPy array usually has the same dtype. If the two arrays have different dtypes, NumPy assigns a dtype that can represent the elements of both.
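A minimal sketch of the same-dtype case, assuming two illustrative arrays that are both created with dtype=np.int32.

```python
import numpy as np

# Both arrays explicitly use 32-bit integers
a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([4, 5, 6], dtype=np.int32)

result = a + b
print(result)        # [5 7 9]
print(result.dtype)  # int32
```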
Now let's see what happens with different dtypes.
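A minimal sketch of upcasting, assuming an integer array combined with a float array (values are illustrative). The exact integer dtype NumPy infers can vary by platform, but combining it with float64 gives float64 either way.

```python
import numpy as np

ints = np.array([1, 2, 3])          # inferred integer dtype
floats = np.array([0.5, 1.5, 2.5])  # inferred float64 dtype

mixed = ints + floats
print(mixed)        # [1.5 3.5 5.5]
print(mixed.dtype)  # float64

# Mixing types inside a single list also upcasts the whole array
print(np.array([1, 2.5, 3]).dtype)  # float64
```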
Here, each element in the resulting NumPy array was converted to the float64 dtype, which NumPy chose automatically.
Creation Using NumPy Functions
Now let's see how we can create an array using NumPy functions. Here we will use two of them: arange() and linspace().
Let's begin with arange().
arange() is a function that creates an array of evenly spaced values, by default the sequence 0, 1, 2, and so on up to the given stop value. Let's see how it goes.
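A minimal sketch with illustrative arguments: a single argument gives the stop value, while three arguments give start, stop, and step.

```python
import numpy as np

seq = np.arange(5)              # stop only: starts at 0
stepped = np.arange(2, 10, 3)   # start, stop, step

print(seq)      # [0 1 2 3 4]
print(stepped)  # [2 5 8]
```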
Note that with a single input n, the NumPy array contains the values from 0 to n-1; the stop value itself is excluded.
Now that we've seen arange(), let's move on to linspace().
linspace() stands for linearly spaced. It creates a NumPy array where every element is linearly spaced, i.e., the distance between every consecutive element is the same.
Let's see linspace() in action.
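A minimal sketch, assuming the call described below is np.linspace(1, 5, 6). Both endpoints are included by default, so the gap between neighbours is (5 - 1) / (6 - 1) = 0.8.

```python
import numpy as np

# 6 linearly spaced values from 1 to 5, both endpoints included
values = np.linspace(1, 5, 6)
print(values)           # 1.0, 1.8, 2.6, 3.4, 4.2, 5.0 as float64

# Every consecutive difference is 0.8 (up to floating-point rounding)
print(np.diff(values))
```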
Here, a NumPy array of 6 elements is created in which consecutive elements are equally spaced, with values running from 1 to 5 inclusive.
These functions are used in various tasks such as arithmetic progression problems and sorting algorithm testing, among others.
Joining, Mutating, and Replicating Existing Arrays
Let's see this through with an example.
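A minimal sketch of the view behaviour described below, assuming an illustrative four-element array a. The in-place += matters: writing b = b + 1 would create a new array instead of modifying the shared data.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = a[:2]   # basic slicing returns a view that shares memory with a

b += 1      # in-place addition modifies the shared data
print(b)    # [2 3]
print(a)    # [2 3 3 4]
```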
Here we are not exactly creating a new array. Instead, slicing gives us a variable b that refers to (is a view of) part of an existing NumPy array.
In this example, a is an existing NumPy array, and b is a view of the first two elements of a. An in-place operation such as b += 1 adds 1 to every element of b. Since b shares its data with the first two elements of a, those elements of a change too. We can also replicate an existing array with the copy() method, much like lists.
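A minimal sketch of replication with copy(), using the same illustrative array as the view example. The copy gets its own memory, so modifying it leaves the original untouched.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = a.copy()   # copy() allocates new memory for the data

b += 1
print(b)  # [2 3 4 5]
print(a)  # [1 2 3 4] (unchanged)
```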
Here, b is a copy of a, not a view of it, so changes to b do not affect a.
Reading Arrays from Disk, or Standard Formats
This type of array creation is useful when the data is stored in files.
Standard Binary Formats
Various fields have standard formats for array data. The list below includes formats that well-known Python libraries can read and return as NumPy arrays (there may be others that can be read and converted to NumPy arrays, so check the last section as well).
- HDF5: h5py
- FITS: Astropy
Common ASCII Formats
Programs like Excel and LabVIEW use delimited files, such as comma-separated value (CSV) and tab-separated value (TSV) files. These files can be read and parsed line by line by Python routines. For example, to read a text file, we can use np.loadtxt(file_name).
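A minimal sketch of np.loadtxt(). To keep the example self-contained, it assumes an in-memory io.StringIO object standing in for a whitespace-separated text file on disk; a file name works the same way.

```python
import io
import numpy as np

# io.StringIO stands in for a whitespace-separated text file
text_file = io.StringIO("1 2 3\n4 5 6\n")

data = np.loadtxt(text_file)   # values are parsed as float64 by default
print(data)
# [[1. 2. 3.]
#  [4. 5. 6.]]
```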
We created a NumPy array from the input text. np.loadtxt() accepts either a file name or an open file object.
For CSV files, pass the comma as the delimiter:
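A minimal sketch, again assuming an in-memory io.StringIO object in place of a real CSV file. The illustrative data has a header line, which skiprows=1 skips.

```python
import io
import numpy as np

# io.StringIO stands in for a CSV file with a header line
csv_file = io.StringIO("x,y\n1,2\n3,4\n5,6\n")

table = np.loadtxt(csv_file, delimiter=",", skiprows=1)
print(table.shape)  # (3, 2)
```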
We created a NumPy array from a CSV file here.
Creating Arrays from Raw Bytes through the Use of Strings or Buffers
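We can write an array's raw bytes to a file with the method arr.tofile(file_name) and read them back with np.fromfile(file_name, dtype=...). Because raw bytes carry no information about the data type or shape, we have to supply the dtype when reading them. An array can also be created directly from a bytes object or another buffer with np.frombuffer(). You can refer to the official NumPy documentation for more information.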
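A minimal sketch of both routes, assuming an illustrative int32 array and a temporary file named data.bin (the file name is hypothetical). The dtype passed when reading must match the one used when writing, or the bytes will be misinterpreted.

```python
import os
import tempfile
import numpy as np

original = np.array([1, 2, 3], dtype=np.int32)

# Raw bytes -> array: frombuffer reinterprets the bytes using the given dtype
raw = original.tobytes()
from_bytes = np.frombuffer(raw, dtype=np.int32)

# Array -> file -> array: tofile writes raw bytes, fromfile reads them back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")  # hypothetical file name
    original.tofile(path)
    from_file = np.fromfile(path, dtype=np.int32)

print(from_bytes)  # [1 2 3]
print(from_file)   # [1 2 3]
```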
Use of Special Library Functions
Many Python libraries, such as SciPy, OpenCV, and pandas, can produce NumPy arrays, because they use numpy.ndarray as a common format for exchanging data.
How to Reshape a NumPy Array?
Now that we have seen how to create a NumPy array, we will look at different ways to restructure arrays for different tasks.
First of all, let’s understand what reshaping means.
In essence, reshaping means changing the shape of a NumPy array to suit our needs.
The shape of a NumPy array is the number of elements along each of its dimensions.
By reshaping a NumPy array, we can add or remove dimensions and change the number of elements in each dimension, as long as the total number of elements stays the same.
One use case is matrix operations, where the shapes of the operands must satisfy certain rules. For example, multiplying two or more matrices places constraints on their numbers of rows and columns.
To multiply one matrix by another, we take the dot product of each row of the first matrix with each column of the second.
So, to multiply an m × n matrix by an n × p matrix, the inner dimensions (both n) must be equal, and the result is an m × p matrix.
We face these kinds of constraints in many tasks, and we often have to change the shape of a matrix before we can proceed.
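A minimal sketch of the shape rule using NumPy's @ (matrix multiplication) operator, assuming illustrative matrices of ones. When the inner dimensions differ, NumPy raises a ValueError.

```python
import numpy as np

A = np.ones((2, 3))   # m x n = 2 x 3
B = np.ones((3, 4))   # n x p = 3 x 4

C = A @ B             # inner dimensions match, so the result is m x p = 2 x 4
print(C.shape)        # (2, 4)

try:
    A @ A             # (2 x 3) @ (2 x 3): inner dimensions 3 and 2 differ
except ValueError as err:
    print("Shape mismatch:", err)
```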
That is where NumPy's built-in function reshape() becomes useful.
reshape() lets us rearrange a NumPy array into whatever shape we need, as long as the total number of elements stays the same.
Let’s see how we can use this.
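Syntax: np.reshape(arr, new_shape) or arr.reshape(new_shape)
Here new_shape gives the dimensions of the new NumPy array, for example (4, 3), and the product of its entries must equal the number of elements in arr. (The separate optional order argument controls the order in which elements are read, not the new dimensions.)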
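A minimal sketch matching the description below, assuming the 12-element input is created with np.arange(12). Passing -1 for one dimension lets NumPy infer it from the total number of elements.

```python
import numpy as np

arr = np.arange(12)            # 12 elements in one dimension: 0 to 11
reshaped = arr.reshape(4, 3)   # the same 12 elements as 4 rows x 3 columns
print(reshaped)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]
print(reshaped.shape)  # (4, 3)

# Equivalent function form
same = np.reshape(arr, (4, 3))

# -1 lets NumPy infer the missing dimension: 12 / 3 = 4
auto = arr.reshape(3, -1)
print(auto.shape)  # (3, 4)
```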
The input NumPy array had 12 elements and was a single-dimensional array in this case. Using the NumPy function reshape(), we converted that into a two-dimensional array with 4 rows and 3 columns.
How to Flatten a NumPy Array?
Flattening means converting a multidimensional array into a one-dimensional array that contains the same elements.
For example, in deep learning, a fully connected layer expects each input sample as a one-dimensional vector, but the features we receive (such as images) are often multidimensional, so we flatten them before feeding them to each neuron in the layer.
Let’s see how it’s used.
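Syntax: arr.flatten(order='C')
It takes an optional argument, order, which specifies the order in which elements are read: 'C' (row by row, the default) or 'F' (column by column), among others. You can refer to the documentation for the full list.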
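A minimal sketch, assuming a small illustrative 2 × 3 array named arr. It compares the default row-by-row order with column-by-column order and shows that flatten() returns a copy rather than a view.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

flat = arr.flatten()              # row by row (order='C', the default)
flat_f = arr.flatten(order="F")   # column by column
print(flat)    # [1 2 3 4 5 6]
print(flat_f)  # [1 4 2 5 3 6]

# flatten() returns a copy, so changing it leaves arr untouched
flat[0] = 100
print(arr[0, 0])  # 1
```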
Here we created a two-dimensional NumPy array called arr, and the NumPy method flatten() flattened it into a one-dimensional NumPy array.
Conclusion
Let's recall what we have learned here.
- An array is a group of similar items referenced under a common name.
- NumPy is a Python library for performing efficient vector and array computations.
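- There are various methods to create a NumPy array, including conversion from lists and tuples, functions such as arange() and linspace(), views and copies of existing arrays, and reading from files or buffers.
- Reshaping is a technique to convert an n-dimensional array into the specified dimensions (with the same total number of elements), and it can be achieved with the NumPy function reshape().
- Flattening is the process of reducing an n-dimensional array to a single-dimensional array, and it can be done with the NumPy method flatten().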