Multithreading in Python
Introduction
We all know the famous saying- Time is money. It is always said that humans are not built for multitasking. But that is where machines come in.
So while you might not be able to multitask easily, we will talk about the tips and tricks that increase performance and reduce time consumption, in the case of computers. Can you guess what we are going to learn about today?
The multitasking approach that we will discuss in this tutorial is multithreading in Python!
What is a Process?
A program in execution is known as a process. When you start any app or program on your computer, such as the internet browser, the operating system treats it as a process.
A process may consist of several threads of execution that may execute concurrently. In other words, we can say a process facilitates multithreading.
What is a Thread?
In Computer Science, a thread is synonymous with lightweight processes. So a thread is nothing but an independent flow of execution. It can also be defined as an instance of a process.
Well, you must be wondering, what's a process? It's nothing but a program in execution.
Simply put, a thread is a sequence of instructions that the computer performs. It is executed independently. Depending on the scenario, a thread can also be pre-empted or put to sleep.
Note-Preemption refers to the action of stopping a task temporarily to continue it at a later time. So, in this case, the thread can be temporarily interrupted by the processor.
But in the case of the implementations of Python 3, the threads merely appear to be executing simultaneously.
To facilitate multithreading in Python, we can make use of the following modules offered by Python –
- Thread Module
- Threading Module
With the Threading module in Python, which provides a very intuitive API for spawning threads, we can perform multithreading in Python quite effectively.
What is Multithreading in Python?
If you wish to save time and improve performance, you should use multithreading in Python!
Multithreading in Python is a popular technique that enables multiple tasks to be executed simultaneously. In simple words, the ability of a processor to execute multiple threads simultaneously is known as multithreading.
Python multithreading facilitates sharing data space and resources of multiple threads with the main thread. It allows efficient and easy communication between the threads.
What is Multiprocessing?
The ability of a processor to execute several unrelated processes simultaneously is known as multiprocessing. These processes do not share any resources.
Multiprocessing breaks down processes into smaller routines that run independently. The more tasks a single processor is burdened with, the more difficult it becomes for the processor to keep track of them.
It evidently gives rise to the need for multiprocessing. Multiprocessing tries to ensure that every processor gets its own processor/processor core and that execution is hassle-free.
Note-In the case of multicore processor systems like Intel i3, a processor core is allotted to a process.
Python Multithreading vs Multiprocessing
The main differences to note between Multithreading and Multiprocessing are as follows –
Multithreading | Multiprocessing |
---|---|
It is a technique where a process spawns multiple threads simultaneously. | It is the technique where multiple processes run across multiple processors/processor cores simultaneously. |
Python multithreading implements concurrency. | Python multiprocessing implements parallelism in its truest form. |
It gives the illusion that they are running parallelly, but they work in a concurrent manner. | It is parallel in the sense that the multiprocessing module facilitates the running of independent processes parallelly by using subprocesses. |
In multithreading, the GIL or Global Interpreter Lock prevents the threads from running simultaneously. | In multiprocessing, each process has its own Python Interpreter performing the execution. |
Why and When to Use Multithreading in Python?
If you wish to break down your tasks and applications into multiple sub-tasks and then execute them simultaneously, then multithreading in Python is your best bet.
All important aspects such as performance, rendering, speed and time consumption will drastically be improved by using proper Python multithreading.
Multithreading in Python should be used only when there is no existing inter-dependency between the threads.
Starting a New Thread
After learning what a thread is, the next step is to learn how to create one. Python offers a standard library called "threading" to perform multithreading in Python.
The syntax to create a new thread is as follows –
In Python multithreading, there are two ways in which you can start a new thread-
1. Using the Threading Module
Let's take a look at the code using which we can create a new thread using the Threading Module –
Output:
2. Using the Thread Module
The way to create a new thread using the Thread module is as follows –
Output:
Note: The output of the above code snippet can differ for different runs. It happens because multithreading in Python using the _thread module is unstable.
No one can tell which thread is going to get executed first. The _thread module treats threads as functions while the Threading module is implemented in an object-oriented way, which means every thread corresponds to an object. The _thread module has also been deprecated and is only used for backward incompatibilities in Python3.
Out of the two methods, the recommended one is using the Threading module, which we will explore in-depth in the upcoming sections of the article. The Threading module is preferred as its intuitive APIs help us synchronize thread executions, thus making it predictable and highly reliable.
Working with Multiple Threads
More often than not, you will be working with multiple threads and doing exciting work with them.
You can create the threads individually, as we saw above. But there is an easier and more efficient way of doing that. We do this by using the ThreadPoolExecutor in Python!
Using the ThreadPoolExecutor
The easiest way to work with multiple threads is by using the ThreadPoolExecutor, part of the standard Python library. It falls under the concurrent.features library.
Using the with statement, you can create a context manager. It would enable you to create and delete a pool efficiently. We can also import the ThreadPoolExecutor directly from the concurrent.features library.
The syntax for creating a ThreadPoolExecutor is-
The max_worker parameter can take any integer value based on the scenario.
Let's take a look at the following code to see how to achieve this –
Output –
The Threading Module
The threading module is the high-level implementation of multithreading in Python. It is the go-to approach for managing multithreaded applications. The benefits of the Threading module outweigh the ones of the Thread module.
Along with the methods of the Thread module, the Threading module offers additional methods such as –
- threading.activeCount() − This returns the number of active thread objects.
- threading.currentThread() − This returns the number of objects that are under the thread control of the caller.
- threading.enumerate() −This returns the list of all thread objects that are presently active
Race Conditions
Race conditions are one of the primary issues you will face while working with multithreading in Python.
Most commonly, race conditions happen when two or more threads access the shared piece of data and resource. In real-time multithreading in Python, it can occur when threads overlap. The solution to this is synchronizing threads which we will see further.
Synchronizing Threads
In Python, you can implement the locking mechanism to enable you to synchronize the threads.
The low-level synchronization primitive lock is implemented through the _thread module.
A thread can have one of the following two states –
- Locked
- Unlocked
The class that is used to implement primitive locks is known as Lock. Lock objects are created to make the threads run synchronously. Only one thread at a time can have a lock.
The two methods supported by a lock object are –
- acquire()– This method changes the state of an unlocked lock. But if it is already locked, the acquire() method is blocked.
- release()– This method is used to free up the locks when no longer required. It can be called by any state, irrespective of their state.
The parameter that we pass through the Lock class' acquire() method is known as the blocking parameter.
Let's assume we have a thread object T to understand how all these work together. As mentioned before, if we wish to call the acquire() method for T, we will pass the blocking parameter through it.
The blocking parameter can have only two possible values- True or False. It gives rise to two scenarios.
- Scenario 1: When the blocking parameter is set to True, and we call the acquire method as T.acquire(blocking=True), the thread object will acquire the lock T. This can happen only when T has no existing locks. On the flip side, if the thread object T is already locked, the acquire() call is suspended, and it waits until T releases the lock. The moment thread T frees up, the calling thread immediately re-locks it. The calling thread then acquires the released lock.
- Scenario 2: When the blocking parameter is set to False, and T is unlocked, it acquires a lock and returns True. Whereas if T is already locked and the blocking parameter is set to False, the acquire method does not affect T. It simply returns False.
Multithreading Priority Queues
A new queue object can be created using the Queue module. It can hold a specific number of items.
You can use the following methods to control the queue –
- get()– It returns and removes an item from the queue.
- put()– It adds an item to the queue.
- qsize()– It returns the number of items currently in the queue.
- empty()– It returns True or False based on whether the queue is empty or not.
- full()– It returns True if the queue is full otherwise, it returns False.
Check out this article to learn more about Queue in Python.
Threading Objects
- Semaphore
- Timer
- Barrier
1. Semaphore
- The threading. Semaphore is the first on the list.
- It is a counter with certain special properties.
- Since the counting is atomic, you can be sure that the operating system will not switch the thread in the middle of an increment or decrement.
- The counter is incremented when threading.release() is called.
- The counter is decremented when threading.acquire() is called.
2. Time
- The object used to schedule a function to be called after a certain amount of time has passed is the threading.Timer.
- The Timer is started by using the .start() method.
- The Timer is stopped by using the .cancel() method.
3. Barrier
- The object used to keep multiple threads in sync is the threading.Barrier() object.
- You need to specify the number of threads to be synchronized while creating the Barrier.
- The .wait() method is called by each thread on a Barrier.
Advantages of Multithreading in Python
Multithreading in Python has several advantages, making it a popular approach. Let's take a look at some of them –
- Python multithreading enables efficient utilization of the resources as the threads share the data space and memory.
- Multithreading in Python allows the concurrent and parallel occurrence of various tasks.
- It causes a reduction in time consumption or response time, thereby increasing the performance.
You have now reached the end of the tutorial and successfully learned what is multithreading in Python. Now,let's conclude with some important points.
Conclusion
- A program in execution is known as a process.
- A thread is a single sequential flow of execution of tasks of a process so it is also known as thread of execution or thread of control.
- We can make use of the following modules offered by Python –
- Thread Module
- Threading Module
- Multithreading in Python is a popular technique that enables multiple tasks to be executed simultaneously. In simple words, the ability of a processor to execute multiple threads simultaneously is known as multithreading.
- The ability of a processor to execute several unrelated processes simultaneously is known as multiprocessing. These processes do not share any resources.
Check out this Enumerate() in Python to know about .