Copies and Views in NumPy
Overview
While using NumPy, you may have noticed that some operations modify the array they are given, while others create a copy of the input and work on that instead. A copy is a new array created from an existing one; the two are separate entities with separate data. A view, in contrast, is a different way of looking at the same data. In other words, a view shares the original array's memory, whereas a copy's data is stored somewhere else.
Introduction
Let's get familiar with some of the terminology used throughout this article.
Some NumPy methods return a copy, while others return a view. The primary distinction is that a copy is a new array with its own data, whereas a view is a new array object that looks at the original array's data. In other words, a view refers to the same memory as the original array, while a copy is stored somewhere else.
There is also something called a no-copy. During regular assignment, no array object is copied. Instead, the new name refers to the original array, with the same id, so any modification made through either name is visible through the other.
In this article, we'll look at the process used to create the various copies and views from a memory location.
Note: Throughout the code snippets, we use np as an alias for numpy to avoid writing "numpy" over and over again. This saves time and keeps the code neat.
Creating Copies in NumPy
What is "copy" in NumPy?
A "copy", as the word itself states, is a duplicate of something. But here, concerning NumPy arrays, what does it mean?
Copy in NumPy means that an entirely new entity has been created from an already existing one. So in proper terminology, copy means a new NumPy array has been created using an already existing array, and both of these are two separate entities.
Let us understand this with an example.
To do this, we will use the NumPy array method copy().
Syntax: ndarray.copy()
ndarray refers to an n-dimensional array created from the NumPy package.
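The listing for this example is not reproduced in the text, so here is a minimal sketch of what such an example might look like. The names items and new_items follow the discussion in this section; the values are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative values; only the names come from the surrounding text.
items = np.array([10, 20, 30, 40])
new_items = items.copy()  # allocate a new array with the same contents

print(items)      # [10 20 30 40]
print(new_items)  # [10 20 30 40]

# id() reports the identity of each Python object; the two values differ.
print(id(items), id(new_items))
```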
Output:
As shown here, the array new_items was created using the copy() method, and it is an exact copy of the array items. The built-in function id() returns the identity of a Python object, and the two arrays have different identities, so they are two distinct objects. Different identities alone do not prove that the data is separate, since a view also has its own identity; what makes new_items a copy is that it holds its own data. Let's dive deeper into copy().
The "copy" owns its data, meaning that the data is recreated and stored in a different memory location. Because the two arrays no longer share data, changes made to the copy will not affect the original array, and changes made to the original array will not affect the copy.
A copy has to be requested explicitly. Plain assignment does not copy anything; it results in a no-copy.
When copy() is called on a NumPy array, a new data buffer and new metadata are created. So, even though it is a copy, it is an independent array in its own right.
Any changes made to the copy will not affect the original array.
Since the elements of the original array have to be duplicated, copy() is slower than creating a view.
For the above-mentioned reasons, a copy in NumPy is often called a deep copy.
Note: For arrays of object dtype, copy() duplicates only the references to the contained Python objects, not the objects themselves. A fully recursive copy of such an array requires copy.deepcopy from the standard library.
A copy can be used for various purposes.
For instance, an array may need to be reused many times and modified each time. To keep track of the original values, it is easier to work on a copy than to create a new array with the same elements from scratch.
Let's see how copy() works.
Note: The syntax for using copy() was previously mentioned.
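A sketch of this experiment, under the assumption that 10 is added to every element of the copy (the starting values are made up for illustration):

```python
import numpy as np

items = np.array([1, 2, 3, 4])   # illustrative values
new_items = items.copy()

new_items += 10                  # modify every element of the copy in place

print(id(items) == id(new_items))  # False
print(new_items)                   # [11 12 13 14]
print(items)                       # [1 2 3 4]
```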
Output:
Here, we can see that id() returned different identities for the copy and the original arrays.
Also, every element of the copy was changed by adding 10 to it, but the changes made to the copy new_items were not reflected in the original array items. Hence, we can infer that changes made to a copy are not reflected in the original array.
But is the opposite true? Are the original array's changes reflected in the copy array?
Let's see.
The same change will be made in the NumPy array items, but the copy array new_items will be left unchanged.
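The reverse direction can be sketched the same way, again with illustrative values:

```python
import numpy as np

items = np.array([1, 2, 3, 4])   # illustrative values
new_items = items.copy()

items += 10                      # this time, modify the original

print(items)      # [11 12 13 14]
print(new_items)  # [1 2 3 4]
```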
Output:
Hence, we can safely infer that changes made to the original array are not reflected in the copy, and changes made to the copy are not reflected in the original.
The optional parameters of copy() are described in the official NumPy documentation.
NumPy View Creation
What Does "view" Mean?
We have seen what copy does. Now let's look into what view does.
As the word means, it is the ability to see something or to be seen from a particular place. In computer science, a view often means a similar thing. Now let's talk about NumPy arrays. A view is a different perspective of the same array. "View" is similar to "copy," yet different. Let's see how. We can understand view better with the help of a code snippet.
To do this, we will make use of the NumPy array method view().
Syntax: ndarray.view()
ndarray refers to an n-dimensional array created from the NumPy package.
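A minimal sketch of creating a view, with assumed values. The call to np.shares_memory is an addition that is not part of the original discussion; it is used here only to confirm that both arrays look at the same data.

```python
import numpy as np

items = np.array([10, 20, 30, 40])   # illustrative values
new_items = items.view()             # new array object, same data buffer

print(new_items)                           # [10 20 30 40]
print(id(items) == id(new_items))          # False
print(np.shares_memory(items, new_items))  # True
```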
Output:
Here, the array new_items is a view of the original array items. But how is it different from copy? While using view(), any change made to the elements in new_items will reflect those changes in the original array items, whereas a copy() does not.
The "view" does not own the data, meaning that it uses the same data as the original NumPy array. Because the data is not recreated in a different memory location, any changes made to the view will affect the original array, and any changes made to the original array will affect the view.
A view has to be requested explicitly, or produced by an operation such as slicing. Plain assignment results in a no-copy.
When view() is called on a NumPy array, the data buffer stays the same, but new metadata is created. The view is therefore a distinct array object with its own identity, even though it shares its data with the original.
Any changes made to the view will be reflected in the original NumPy array.
Since the original array and the view share the same data buffer, creating a view is faster than creating a copy. However, it is slower than a no-copy, because a no-copy does not even require new metadata.
A view in NumPy is often called a shallow copy.
Shallow copy: a copy of an array whose contents share the same references (point to the same values) as those of the original array from which it was made.
A view can be used for various purposes.
For instance, when the same data must be used in two different places, it is simpler to use a view than to create a new array with the same elements from scratch and keep the two in sync by hand, because views use the same data buffer but different metadata.
Let's see how view() works.
The Syntax for using view() was mentioned above.
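A sketch of this experiment, assuming that 10 is added to the first element of the view (the starting values are illustrative):

```python
import numpy as np

items = np.array([1, 2, 3, 4])   # illustrative values
new_items = items.view()

new_items[0] += 10               # write through the view

print(id(items) == id(new_items))  # False
print(new_items.tolist())          # [11, 2, 3, 4]
print(items.tolist())              # [11, 2, 3, 4]
```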
Output:
Here, the array new_items was created as a view of the original NumPy array items.
The identities returned by id() are different here, because a view is a separate array object.
10 was added to the first element of the view new_items, and this change was reflected in the original NumPy array items as well. This is because a view shares the same data buffer but not the same metadata.
Now let's see if the inverse holds, that is, whether changes made to the original array are reflected in the view.
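The inverse check might look like this, with the same illustrative values:

```python
import numpy as np

items = np.array([1, 2, 3, 4])   # illustrative values
new_items = items.view()

items[0] += 10                   # write through the original

print(items.tolist())      # [11, 2, 3, 4]
print(new_items.tolist())  # [11, 2, 3, 4]
```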
Output:
The changes made to the original NumPy array items were reflected in the view new_items for the same reason as before: the two share the same data buffer and differ only in their metadata.
Hence, we can safely infer that changes made to the original array are reflected in the view, and changes made to the view are reflected in the original.
More view() parameters are available in the official NumPy documentation.
You can also create views of an array by selecting a slice of the original array, or also by changing the dtype of the NumPy array (or a combination of both).
- Slice Views
In this type of view creation, we slice the original array. The resulting view is described in terms of an offset, strides, and counts into the original array's data.
Before we jump into the code, let's understand each term mentioned above.
- offsets: an integer indicating the distance between the beginning of the data and a given element or point, as in general programming jargon.
- strides: the strides of a NumPy array tell us how many bytes we have to skip in memory to move to the next position along a given axis. For example, in a 2 x 5 NumPy array of 4-byte integers (such as int32), we have to skip 4 bytes (1 value) to move to the next column but 20 bytes (5 values) to get to the same position in the next row.
- counts: the number of elements the view contains along each axis.
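The stride numbers quoted above can be checked directly. This sketch assumes a 2 x 5 array of int32 values; the name grid is hypothetical.

```python
import numpy as np

# A 2 x 5 array of 4-byte integers, matching the numbers in the text.
grid = np.arange(10, dtype=np.int32).reshape(2, 5)

print(grid.itemsize)  # 4
print(grid.strides)   # (20, 4)

# Taking every second column doubles the column stride; no data is moved.
print(grid[:, ::2].strides)  # (20, 8)
```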
Now let's see how it works.
Output:
Here the offset was 1, the strides were 10, and the count specified was 2.
The array items_view created by slicing the original array items acts as a view.
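A small sketch of a slice acting as a view. The slice used here (start at index 1, step 3) is an assumption chosen for illustration and is not necessarily the one used in the original listing.

```python
import numpy as np

items = np.arange(10)
items_view = items[1::3]         # start at index 1, take every third element

print(items_view)                # [1 4 7]
print(items_view.base is items)  # True

items_view[0] = 100              # writing through the slice...
print(items[1])                  # 100
```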
- NumPy dtype view
Another way to create array views is by assigning another dtype to the same data area.
dtype: data type of the NumPy array.
In this case, we change the data type, or the dtype of the original NumPy array, and generate a view of the array.
Let's see how it's done.
Output :
Here, a new view called view_array was created by changing the dtype of the original NumPy array items.
We can observe that the dtype of items was initially int32, but in the view the dtype is int16. The view reinterprets the same bytes, so every 32-bit element is read as two 16-bit elements. For small non-negative values on a little-endian machine, the first of the two holds the value and the second holds 0, which is why a 0 appears after every element of the array.
A question might pop into your mind: why does only one 0 appear after each element?
The reason is that view_array holds 16-bit elements, and two 16-bit elements together make up 32 bits, so each original value is followed by a single 16-bit 0.
If the dtype of items had been int64 instead, three 0s would have followed each value, as four 16-bit numbers together make up a 64-bit number.
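The following sketch reproduces this behavior with assumed values. The dtypes are written with an explicit little-endian prefix ("<i4", "<i2", "<i8") so that the result does not depend on the machine; the name wide_items is hypothetical.

```python
import numpy as np

items = np.array([1, 2, 3], dtype="<i4")   # little-endian int32
view_array = items.view("<i2")             # same bytes, read as int16

print(view_array.tolist())  # [1, 0, 2, 0, 3, 0]

# With 64-bit elements, each value is followed by three zeros.
wide_items = np.array([1, 2, 3], dtype="<i8")
print(wide_items.view("<i2").tolist())  # [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]

# The extra halves are zero only for small non-negative values.
print(np.array([-1], dtype="<i4").view("<i2").tolist())  # [-1, -1]
```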
What is a no-copy in NumPy?
A no-copy is very much like a view, but it happens implicitly through plain assignment, and no additional method call is required.
What does this mean? Let's see.
Output:
Here the behavior is the same as with view(): the changes made through new_items are reflected in the original array items. But there is one difference from view(). The identity returned by id() is the same for both, meaning that new_items is just another reference to the original array items, so changes made through either name are visible through both.
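A minimal sketch of a no-copy, with illustrative values:

```python
import numpy as np

items = np.array([1, 2, 3, 4])   # illustrative values
new_items = items                # plain assignment: no new array is created

new_items[0] = 99

print(items.tolist())              # [99, 2, 3, 4]
print(new_items is items)          # True
print(id(new_items) == id(items))  # True
```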
Indexing Operations in NumPy
Indexing is a technique to address a single element of an array or a specific part of an n-dimensional array by specifying the position of the element in the array.
In Python, indexing starts from 0.
For example
Output:
Now, regarding copies and views, let's see how indexing affects them.
Output:
Since the base attribute returned an array rather than None, we can infer that basic indexing, such as slicing, creates a view of the array.
But is it the same when we do complex or advanced indexing on the array?
Let's find out.
Output:
Let's go through the code. Here, the array new_items was created by indexing from the original NumPy array items.
The code above means that new_items will contain the elements in the second and third row of the original NumPy array items (since indexing starts from 0).
Since the base attribute returned None, it implies that the array new_items is a copy of the original NumPy array items.
So to sum it up, basic indexing on a NumPy array yields a view of the array, whereas advanced or complex indexing yields a copy.
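The contrast can be sketched as follows. The array shape and the name row_slice are assumptions, and np.shares_memory is used here as a direct check of whether two arrays overlap in memory, rather than the base attribute from the text.

```python
import numpy as np

items = np.arange(12).reshape(4, 3)

row_slice = items[1:3]       # basic indexing: second and third rows
new_items = items[[1, 2]]    # advanced indexing: the same two rows

print(np.array_equal(row_slice, new_items))  # True
print(np.shares_memory(items, row_slice))    # True
print(np.shares_memory(items, new_items))    # False

new_items[0, 0] = -1         # only the copy changes
print(items[1, 0])           # 3
```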
Other Operations
Now that we've seen the effect of indexing on the array concerning copies and views, let's look at what some other functions do.
- numpy.reshape() The numpy.reshape function creates a view of the NumPy array where possible, or a copy otherwise. In most cases, the strides can be modified to reshape the array with a view.
Output:
Here, the base attribute returned an array, meaning that reshape() created a view in this case.
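A short sketch of the view case, with an assumed 6-element array:

```python
import numpy as np

items = np.arange(6)
reshaped = items.reshape(2, 3)   # contiguous input: only the metadata changes

print(reshaped.base is items)    # True

reshaped[0, 0] = 100             # writes through to the original
print(items[0])                  # 100
```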
However, in some cases where the array becomes non-contiguous (perhaps after a transpose operation), the reshaping cannot be done by modifying strides and requires a copy.
In these cases, if we would rather get an error than a silent copy, we can assign the new shape to the shape attribute of the array.
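A sketch of both behaviors, using a transposed array as the non-contiguous case. The variable names are hypothetical, and the AttributeError is what the NumPy versions this sketch was written against raise for an incompatible in-place shape change.

```python
import numpy as np

items = np.zeros((10, 2))
transposed = items.T               # a view that is not C-contiguous

# reshape() quietly falls back to a copy here.
reshaped = transposed.reshape(20)
print(np.shares_memory(items, reshaped))  # False

# Assigning to .shape never copies, so it fails instead.
error_raised = False
try:
    transposed.shape = (20,)
except AttributeError as exc:
    error_raised = True
    print("in-place reshape refused:", exc)
```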
Output:
- numpy.ravel() Taking the example of another operation, numpy.ravel returns a contiguous flattened view of the array wherever possible, and a copy otherwise.
Output:
Since base returned an array after the ravel() call, we can infer that ravel() created a view of the array in this case.
- ndarray.flatten() While ravel() returns a view where possible, ndarray.flatten always returns a flattened copy of the array.
Output:
Because base returned None after the flatten() call, we can conclude that flatten() creates a copy of the array.
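The difference between the two can be sketched side by side. The names raveled and flattened are hypothetical, and np.shares_memory is used as the check.

```python
import numpy as np

items = np.arange(6).reshape(2, 3)

raveled = items.ravel()       # view, because items is contiguous
flattened = items.flatten()   # always a copy

print(np.shares_memory(items, raveled))    # True
print(np.shares_memory(items, flattened))  # False

# On a transposed (non-contiguous) array, ravel() has to copy as well.
print(np.shares_memory(items, items.T.ravel()))  # False
```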
How to Recognize if an Array is a Copy or a View?
We have now seen how to create copies and views in NumPy explicitly. Copies and views are also created implicitly during NumPy array operations.
How can we tell whether a given NumPy array is a view or a copy?
Not knowing this can lead to subtle bugs in applications that use NumPy.
To identify whether a NumPy array is a view or a copy, we use the array attribute base.
Syntax: ndarray.base
Input: an n-dimensional NumPy array. Output: the array whose data is being viewed if the array is a view, None if the array owns its data (as a copy does).
Output:
Here we can see that base returned None for the copy made from the original array and returned the original array for the view.
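A minimal sketch of this check, with assumed names copied and viewed:

```python
import numpy as np

items = np.array([1, 2, 3])   # illustrative values
copied = items.copy()
viewed = items.view()

print(copied.base is None)    # True
print(viewed.base is items)   # True
```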
Checking if the NumPy Array Owns its Data
We now know that a copy owns its data while a view does not. We can check this with the base attribute, which returns None if the array owns its data, as in the case above.
Output:
So in essence, a view never owns its data, whereas a copy always does.
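Besides base, NumPy arrays also expose a flags.owndata attribute that reports ownership directly. It is not used in the discussion above; the sketch below shows it next to base for comparison, with assumed values.

```python
import numpy as np

items = np.array([1, 2, 3])   # illustrative values
copied = items.copy()
viewed = items.view()

print(items.flags.owndata)    # True
print(copied.flags.owndata)   # True
print(viewed.flags.owndata)   # False

# For these arrays the two checks agree.
print(copied.base is None, viewed.base is None)  # True False
```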
Difference between NumPy Copy Vs. View
The main differences between copy and view are mentioned below.
The "copy" owns its data, meaning that the data is recreated and stored in a different memory location. Because the two arrays no longer share data, changes made to the copy will not affect the original array, and changes made to the original array will not affect the copy.
A copy has to be requested explicitly. Plain assignment results in a no-copy.
When copy() is called on a NumPy array, a new data buffer and new metadata are created. So, even though it is a copy, it is an independent array in its own right.
Any changes made to the copy will not affect the original array.
Since the elements of the original array have to be duplicated, copy() is slower than creating a view.
A copy in NumPy is often called a deep copy for the above-mentioned reasons.
The "view" does not own the data, meaning that it uses the same data as the original NumPy array. Because the data is not recreated in a different memory location, any changes made to the view will affect the original array, and any changes made to the original array will affect the view.
A view has to be requested explicitly, or produced by an operation such as slicing. Plain assignment results in a no-copy.
When you use view() on a NumPy array, the data buffer is not changed; instead, new metadata is created. The view is a distinct array object with its own identity, even though it shares its data with the original.
Any changes made to the view will be reflected in the original NumPy array.
Since the original array and the view share the same data buffer, creating a view is faster than creating a copy, but slower than a no-copy, which does not even require new metadata.
A tabular representation of the same is given below.
Copy | View | No-Copy |
---|---|---|
Owns the Data | Does not own the data | Does not own the data |
Called Deep Copy | Called Shallow Copy | Called No Copy |
Changes made in the copy will not affect the original | Changes made in the view will affect the original | Changes made through the new name will affect the original |
Does not use the same data buffer as original | Uses the same data buffer as original | Uses the same data buffer as original |
Does not use the same metadata as original | Does not use the same metadata as original | Uses the same metadata as original |
Explicit mention is required | Explicit mention is required | No explicit mention is required |
Uses ndarray.copy() | Uses ndarray.view() | Normal assignment creates a no-copy |
Slow | Fast | Fastest |
Conclusion
- A copy contains the same elements as the original array, but the two are separate entities.
- A view is only a different perspective of the original array.
- A copy owns its data, whereas a view does not.
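- There are different methods to create copies and views in NumPy, such as copy(), view(), slicing, and changing the dtype.
- NumPy operations such as indexing, reshape(), ravel(), and flatten() create either a copy or a view.
- The base attribute identifies whether an array is a copy or a view.
- Copies, views, and no-copies differ in data ownership, metadata, and speed.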