Numba

Numba is an open-source, NumPy-aware optimizing compiler for Python. It translates a subset of Python and NumPy code into fast machine code, often achieving speedups comparable to C, C++, or Fortran without requiring you to rewrite your code in those languages. You use Numba by decorating a Python function with one of its decorators (e.g., `@jit`), which tells Numba to compile that function to machine code just-in-time (JIT), i.e., when it is first called. Because compilation happens at runtime, the first call to a Numba-decorated function carries a one-time compilation cost, but subsequent calls run at compiled speed.
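The basic usage pattern looks like the minimal sketch below; the function name `square_sum` and the explicit warm-up call are illustrative choices, not anything required by Numba:

import numpy as np
from numba import jit

@jit(nopython=True)
def square_sum(arr):
    total = 0.0
    for x in arr:
        total += x * x
    return total

data = np.arange(1_000, dtype=np.float64)
square_sum(data)         # first call: Numba compiles the function (one-time cost)
print(square_sum(data))  # later calls reuse the compiled machine code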

Key features of Numba:

1. Just-In-Time (JIT) Compilation: Functions are compiled when they are first called, not when the program starts.
2. NumPy Integration: Numba deeply understands NumPy arrays and operations, making it highly effective for numerical computing.
3. Speed: It can significantly speed up numerical algorithms, especially those involving loops that would typically be slow in pure Python.
4. No Python Interpreter Overhead: In `nopython` mode (e.g., `@jit(nopython=True)`), Numba bypasses the Python interpreter entirely for the decorated function, giving the best performance; if compilation fails with `nopython=True`, Numba raises an error. Without forcing `nopython` mode, a function that cannot be fully compiled may fall back to 'object mode', which keeps the interpreter involved and is generally much slower.
5. Targeting Diverse Hardware: Numba can compile code for various CPU architectures and also provides support for GPU programming (CUDA).
6. Decorators: It uses simple decorators like `@jit`, `@guvectorize`, and `@vectorize` to apply compilation (a minimal `@vectorize` sketch follows this list).
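As a sketch of item 6, `@vectorize` turns a scalar function into a NumPy-style ufunc that is applied elementwise over whole arrays. The function name `scaled_add` and the chosen signature below are illustrative assumptions, not the only way to use the decorator:

import numpy as np
from numba import vectorize

@vectorize(['float64(float64, float64)'])
def scaled_add(x, y):
    # Compiled for the declared signature and applied elementwise, like a NumPy ufunc
    return 2.0 * x + y

a = np.arange(5, dtype=np.float64)
b = np.ones(5)
print(scaled_add(a, b))  # broadcasts over the arrays: [1. 3. 5. 7. 9.]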

How Numba works:
When a Numba-decorated function is called, Numba analyzes its bytecode and infers the data types of variables and arguments. It then uses the LLVM compiler infrastructure to generate highly optimized machine code specific to your CPU architecture. This machine code is then executed directly, leading to performance improvements.
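You can observe this type-specialized compilation by checking which signatures a compiled function has accumulated. The function name `add_one` below is illustrative, and the exact printed form of `.signatures` can vary between Numba versions:

from numba import jit

@jit(nopython=True)
def add_one(x):
    return x + 1

add_one(1)    # compiles an integer specialization
add_one(1.5)  # compiles a separate floating-point specialization
print(add_one.signatures)  # one entry per compiled type signature, e.g. (int64,) and (float64,)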

Example Code:

import time
import numpy as np
from numba import jit

# A pure Python function to sum squares
def sum_squares_python(n):
    res = 0
    for i in range(n):
        res += i * i
    return res

# The same function, but JIT-compiled with Numba
@jit(nopython=True)  # nopython=True ensures maximum performance
def sum_squares_numba(n):
    res = 0
    for i in range(n):
        res += i * i
    return res

# Test with a large number
N = 10**7

print(f"Calculating sum of squares up to {N-1}...")

# Measure pure Python function performance
start_time = time.perf_counter()
result_python = sum_squares_python(N)
end_time = time.perf_counter()
print(f"Pure Python result: {result_python}, Time taken: {end_time - start_time:.4f} seconds")

# Measure Numba-compiled function performance
# The first call includes compilation time
start_time = time.perf_counter()
result_numba = sum_squares_numba(N)
end_time = time.perf_counter()
print(f"Numba compiled result (first call): {result_numba}, Time taken: {end_time - start_time:.4f} seconds")

# Measure Numba-compiled function performance again (subsequent calls are faster)
start_time = time.perf_counter()
result_numba = sum_squares_numba(N)
end_time = time.perf_counter()
print(f"Numba compiled result (second call): {result_numba}, Time taken: {end_time - start_time:.4f} seconds")

print(f"Results match: {result_python == result_numba}")

# Example with NumPy arrays (Numba works well with them)
@jit(nopython=True)
def process_array(arr):
    total_sum = 0.0
    for i in range(arr.shape[0]):
        total_sum += arr[i] ** 2
    return total_sum

my_array = np.random.rand(10**6)

start_time = time.perf_counter()
array_result_numba = process_array(my_array)
end_time = time.perf_counter()
print(f"\nNumba processed array sum: {array_result_numba:.2f}, Time taken: {end_time - start_time:.4f} seconds")

# Compare with pure NumPy (which is already highly optimized C code underneath)
start_time = time.perf_counter()
array_result_numpy = np.sum(my_array ** 2)
end_time = time.perf_counter()
print(f"NumPy processed array sum: {array_result_numpy:.2f}, Time taken: {end_time - start_time:.4f} seconds")