Multiprocessing

`multiprocessing` is a module in the Python standard library that lets a program run operations in parallel by spawning separate processes. Threads within one process share a memory space and, in CPython, are subject to the Global Interpreter Lock (GIL), which allows only one thread at a time to execute Python bytecode and so limits parallel execution of CPU-bound tasks. `multiprocessing` instead creates independent processes. Each process has its own memory space, system resources, and Python interpreter instance, so the GIL of one process does not constrain the others and CPU-bound work can run in parallel on multi-core processors.
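A minimal sketch of the "separate memory space" point follows. The names `counter` and `bump_counter` are made up for this illustration. The child changes its own copy of a module-level variable, and the parent's copy stays at its original value.

```python
import multiprocessing
import os

counter = 0  # module-level state; every process works on its own copy


def bump_counter():
    """Runs in the child process and changes only the child's copy."""
    global counter
    counter += 100
    print(f"child  (PID {os.getpid()}): counter = {counter}")


def main():
    child = multiprocessing.Process(target=bump_counter)
    child.start()
    child.join()
    # The child's update never reaches the parent's memory.
    print(f"parent (PID {os.getpid()}): counter = {counter}")


if __name__ == "__main__":
    main()
```

The child reports `counter = 100` while the parent still reports `counter = 0`, which is why data has to be passed between processes explicitly.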

Key components and concepts of the `multiprocessing` library include:

1. `Process` Class: The fundamental building block for creating new processes. You define a target function that the new process will execute.
2. `Pool` Class: Provides a convenient way to parallelize the execution of a function across multiple input values, often used for "map-reduce" style tasks. It manages a pool of worker processes.
3. Inter-Process Communication (IPC): Since processes have separate memory spaces, explicit mechanisms are needed to communicate and share data. Common IPC methods include:
- `Queue`: A multi-producer, multi-consumer queue for sharing data between processes.
- `Pipe`: A connection between two processes, two-way (duplex) by default or one-way when created with `duplex=False`.
4. Synchronization Primitives: Tools to coordinate the activities of multiple processes and manage access to shared resources:
- `Lock`: Used to protect critical sections of code, ensuring only one process can access shared data at a time.
- `Semaphore`: A counter-based lock that allows a limited number of processes to access a resource concurrently.
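5. Shared Memory: Simple data types and arrays can be shared with `Value` and `Array`, which allocate ctypes objects in shared memory. Richer objects such as lists and dictionaries can be shared through a `Manager`, which keeps them in a server process and hands out proxies. Shared state is generally harder to reason about than message passing and is often discouraged for complex data.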
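Several of the components listed above can be combined in one short sketch. It assumes a toy workload, and the helper names `square_worker` and `add_many` are invented for this example. The first half passes work through `Queue` objects. The second half updates a shared `Value` under a `Lock`.

```python
import multiprocessing


def square_worker(task_queue, result_queue):
    """Pull numbers from task_queue until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        result_queue.put((item, item * item))


def add_many(shared_total, lock, repeats):
    """Increment a shared integer; the lock guards each read-modify-write."""
    for _ in range(repeats):
        with lock:
            shared_total.value += 1


def main():
    # Queue: hand numbers to two workers and collect (number, square) pairs.
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=square_worker, args=(tasks, results))
        for _ in range(2)
    ]
    for w in workers:
        w.start()
    numbers = [1, 2, 3, 4]
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    # Drain the results before joining the workers.
    squares = dict(results.get() for _ in numbers)
    for w in workers:
        w.join()
    print(sorted(squares.items()))

    # Value + Lock: four processes increment one shared counter.
    total = multiprocessing.Value("i", 0)  # "i" is a C signed int
    lock = multiprocessing.Lock()
    adders = [
        multiprocessing.Process(target=add_many, args=(total, lock, 1000))
        for _ in range(4)
    ]
    for a in adders:
        a.start()
    for a in adders:
        a.join()
    print(total.value)


if __name__ == "__main__":
    main()
```

Each worker receives its own `None` sentinel so that it knows when to stop. Without the lock, concurrent `+= 1` updates on the shared value could be lost.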

Advantages:
- True Parallelism: Fully utilizes multiple CPU cores, ideal for CPU-bound tasks.
- Bypasses GIL: Not constrained by Python's Global Interpreter Lock.
- Robustness: Processes are isolated; an error in one process typically does not crash the entire program.
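The isolation described above can be sketched as follows. `risky_task` and `run_isolated` are hypothetical helpers, and the failure is simulated with a deliberate exception. The parent reads the child's exit code instead of crashing with it.

```python
import multiprocessing


def risky_task(value):
    """Raise for negative input to simulate a failing worker."""
    if value < 0:
        raise ValueError(f"cannot process {value}")
    print(f"processed {value}")


def run_isolated(value):
    """Run risky_task in a child process and return the child's exit code."""
    child = multiprocessing.Process(target=risky_task, args=(value,))
    child.start()
    child.join()
    return child.exitcode


if __name__ == "__main__":
    for value in (5, -1):
        code = run_isolated(value)
        status = "ok" if code == 0 else f"failed (exit code {code})"
        print(f"value={value}: {status}")
    print("Parent process is still running.")
```

The failing child prints its traceback to standard error and exits with a non-zero code, while the parent carries on to the final line.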

Disadvantages:
- Higher Overhead: Creating and managing processes is more resource-intensive (memory, CPU) than creating threads.
- Complex IPC: Sharing data between processes requires explicit, often more complex, mechanisms compared to shared memory in multithreading.
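To give a sense of what explicit communication looks like in practice, here is a small sketch using `Pipe`. The function name `to_upper` and the `None` stop signal are conventions chosen for this example only. Every object sent through the pipe is pickled and copied, which is part of the overhead noted above.

```python
import multiprocessing


def to_upper(conn):
    """Child side: receive strings until None arrives, reply in upper case."""
    while True:
        message = conn.recv()
        if message is None:
            break
        conn.send(message.upper())
    conn.close()


def main():
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    child = multiprocessing.Process(target=to_upper, args=(child_conn,))
    child.start()
    for word in ("ping", "pong"):
        parent_conn.send(word)     # pickled and copied to the child
        print(parent_conn.recv())  # reply copied back to the parent
    parent_conn.send(None)         # ask the child to stop
    child.join()


if __name__ == "__main__":
    main()
```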

Example Code

import multiprocessing
import time
import os

def worker_function(name):
    """A function to be executed by a separate process."""
    print(f"Process {name}: Starting work... (PID: {os.getpid()})")
    # Simulate work by sleeping (a stand-in for real CPU-bound or I/O-bound work)
    time.sleep(2)
    print(f"Process {name}: Finishing work. (PID: {os.getpid()})")

if __name__ == "__main__":
    print("Main process: Starting multiprocessing example.\n")
    processes = []
    num_processes = 3

    # Create and start the processes
    start_time = time.time()
    for i in range(num_processes):
        # Create a Process object, specifying the target function and its arguments
        p = multiprocessing.Process(target=worker_function, args=(f"Worker-{i+1}",))
        processes.append(p)
        p.start()  # Start executing worker_function in a new process

    # Wait for all processes to complete
    for p in processes:
        p.join()  # Blocks until this process finishes

    end_time = time.time()
    print(f"\nMain process: All worker processes finished in {end_time - start_time:.2f} seconds.")
    print("The workers ran concurrently, so the total is close to one worker's 2 seconds rather than 6.")

    # Example using multiprocessing.Pool
    print("\nMain process: Starting multiprocessing Pool example.")
    with multiprocessing.Pool(processes=num_processes) as pool:
        # Map worker_function over a list of inputs.
        # Each item in the list is processed by a worker in the pool.
        pool.map(worker_function, [f"PoolWorker-{i+1}" for i in range(num_processes)])
        # map() blocks until every item is done. Leaving the 'with' block then
        # calls pool.terminate(), which shuts down the now-idle workers.
    print("Main process: All pool worker processes finished.")
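The Pool example above discards the return value of `pool.map`. As a hedged sketch of the "map-reduce" style mentioned earlier, the following collects per-input results and combines them in the parent. `count_primes_below` is a made-up CPU-bound helper, and the pool size of 2 is arbitrary.

```python
import multiprocessing


def count_primes_below(limit):
    """CPU-bound helper: count primes smaller than limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [1_000, 2_000, 3_000, 4_000]
    with multiprocessing.Pool(processes=2) as pool:
        # "Map" step: results come back in the same order as the inputs.
        counts = pool.map(count_primes_below, limits)
    for limit, count in zip(limits, counts):
        print(f"primes below {limit}: {count}")
    # "Reduce" step: combine the partial results in the parent process.
    print(f"sum of counts: {sum(counts)}")
```

The function passed to `pool.map` is defined at module level so that it can be pickled and sent to the worker processes.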