The NumPy Python package
The NumPy website has some tutorials, such as
- NumPy: the absolute basics for beginners,
- NumPy quickstart and
- NumPy fundamentals.

Tutorial: Linear algebra on n-dimensional arrays is a nice tutorial with an application to image compression.
Why NumPy?
Here's a fragment of code in the Python programming language.
result = 0
for i in range(100):
    result += i
Here is something similar in C.
int result = 0;
for (int i = 0; i < 100; i++) {
    result += i;
}
In C the types of all variables are declared and fixed. In Python we can do this.
x = 4
x = "four"
But in C, we cannot. This gives an error.
int x = 4;
x = "four";
The standard Python interpreter is written in C. A Python int is a C
structure something like this.
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
where
- ob_refcnt is a counter of references to the structure,
- ob_type encodes the type of the variable,
- ob_size specifies the size of the following data members and
- ob_digit contains the actual integer value.
All of this information is required for Python to have the flexibility that it
offers us. C only requires one long to store the integer.
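As a rough check of this overhead, the sketch below compares the memory CPython reports for a small int with the size of one element of a NumPy int64 array. The byte count returned by sys.getsizeof depends on the Python version and platform, so the printed numbers are only illustrative.

```python
import sys

import numpy as np

# Size in bytes of a Python int object, including its header fields.
python_int_bytes = sys.getsizeof(1)

# Size in bytes of one element of an int64 array: just the raw value.
numpy_int_bytes = np.dtype(np.int64).itemsize

print(f"Python int: {python_int_bytes} bytes")
print(f"NumPy int64 element: {numpy_int_bytes} bytes")
```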
We could use Python lists to represent arrays of numbers but that would be inefficient because Python lists are heterogeneous.
[23, 4.56, True, "Tuesday"]
Figure 1: NumPy arrays vs Python lists
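To make the contrast concrete, this small sketch shows that every list element carries its own type, whereas a NumPy array stores a single dtype for all of its elements. The variable names are only illustrative.

```python
import numpy as np

mixed = [23, 4.56, True, "Tuesday"]

# Every element of a list is a separate Python object with its own type.
element_types = [type(item).__name__ for item in mixed]
print(element_types)

# A NumPy array has one dtype; mixed ints and floats are upcast to float64.
numbers = np.array([23, 4.56, 7])
print(numbers.dtype)
```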
from numpy import arange
from time import perf_counter


class timeit:
    """Context manager that prints the wall-clock time taken by its block."""

    def __init__(self, text: str = None):
        self.text = text or "Time"

    def __enter__(self):
        self.start = perf_counter()
        return self

    def __exit__(self, type, value, traceback):
        self.time = perf_counter() - self.start
        self.readout = f"{self.text}: {self.time:.2e} seconds"
        print(self.readout)


size = 2**24
list1 = list(range(size))
list2 = list(range(size))
array1 = arange(size)
array2 = arange(size)

# Element-wise product using a list comprehension.
with timeit("Lists") as _t:
    resultantList = [(a * b) for a, b in zip(list1, list2)]

# The same product as a single vectorized NumPy operation.
with timeit("Numpy") as _t:
    resultantArray = array1 * array2
Lists: 8.43e-01 seconds
Numpy: 2.47e-02 seconds
Data types
| Data type | Description |
|---|---|
| bool_ | Boolean (True or False) stored as a byte |
| int_ | Default integer type (same as C long; normally either int64 or int32) |
| intc | Identical to C int (normally int32 or int64) |
| intp | Integer used for indexing (same as C ssize_t; normally either int32 or int64) |
| int8 | Byte (-128 to 127) |
| int16 | Integer (-32768 to 32767) |
| int32 | Integer (-2147483648 to 2147483647) |
| int64 | Integer (-9223372036854775808 to 9223372036854775807) |
| uint8 | Unsigned integer (0 to 255) |
| uint16 | Unsigned integer (0 to 65535) |
| uint32 | Unsigned integer (0 to 4294967295) |
| uint64 | Unsigned integer (0 to 18446744073709551615) |
| float_ | Shorthand for float64 (alias removed in NumPy 2.0) |
| float16 | Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| float32 | Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| float64 | Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| complex_ | Shorthand for complex128 (alias removed in NumPy 2.0) |
| complex64 | Complex number, represented by two 32-bit floats |
| complex128 | Complex number, represented by two 64-bit floats |
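The ranges and bit counts in the table can be checked from NumPy itself with np.iinfo and np.finfo. The sketch below also shows one practical consequence of fixed-width integers: arithmetic on integer arrays wraps around silently on overflow. The variable names are illustrative.

```python
import numpy as np

# Integer limits reported by NumPy match the ranges in the table.
int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
print(int8_info.min, int8_info.max)
print(uint16_info.min, uint16_info.max)

# Number of explicit mantissa bits for each floating-point type.
mantissa_bits = {
    name: np.finfo(getattr(np, name)).nmant
    for name in ("float16", "float32", "float64")
}
print(mantissa_bits)

# Fixed-width integer arrays wrap around on overflow: 250 + 10 = 260 = 4 mod 256.
small = np.array([250], dtype=np.uint8)
wrapped = small + np.array([10], dtype=np.uint8)
print(wrapped, wrapped.dtype)
```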
Creating
First we import numpy in the conventional way.
import numpy as np
| Code | Description |
|---|---|
| np.array([1,2,3]) | 1d array |
| np.array([(1,2,3),(4,5,6)]) | 2d array |
| np.arange(start,stop,step) | Array of evenly spaced values from start up to (but not including) stop |
| np.linspace(0,2,9) | Array of 9 evenly spaced values from 0 to 2, both endpoints included |
| np.zeros((1,2)) | Creates an array filled with zeros |
| np.ones((1,2)) | Creates an array filled with ones |
| np.random.random((5,5)) | Creates a 5 by 5 array of random floats in [0, 1) |
| np.empty((2,2)) | Creates an uninitialized array (its contents are arbitrary) |
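A short sketch of these creation functions in use follows. The variable names are illustrative, and the arange arguments are chosen here only as an example.

```python
import numpy as np

vector = np.array([1, 2, 3])               # 1d array
matrix = np.array([(1, 2, 3), (4, 5, 6)])  # 2d array with 2 rows and 3 columns
evens = np.arange(0, 10, 2)                # the stop value 10 is excluded
grid = np.linspace(0, 2, 9)                # 9 points, both endpoints included
zeros = np.zeros((1, 2))
ones = np.ones((1, 2))
noise = np.random.random((5, 5))           # uniform samples in [0, 1)
scratch = np.empty((2, 2))                 # uninitialized; values are arbitrary

print(evens)
print(grid)
print(matrix.shape, zeros.shape, noise.shape, scratch.shape)
```

Because np.empty does not initialize memory, its contents should always be overwritten before they are read.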
When creating an array, NumPy will infer the type from the data given but you can specify it if you wish.
print([np.array([1, 2, 3], dtype="uint"), np.array([1.2, 3.4, 5.6], dtype="float")])
[array([1, 2, 3], dtype=uint64), array([1.2, 3.4, 5.6])]
Accessing array properties
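| Code | Description |
|---|---|
| array.shape | Dimensions (rows, columns) |
| len(array) | Length of the first axis |
| array.ndim | Number of array dimensions |
| array.dtype | Data type |
| array.astype(type) | Returns a copy converted to the given data type |
| type(array) | Python type of the object (numpy.ndarray) |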
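For example, on a small 2 by 3 array these properties look as follows. The dtype is set explicitly in this sketch so that the result does not depend on the platform.

```python
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(table.shape)  # (rows, columns)
print(len(table))   # length of the first axis, i.e. the number of rows
print(table.ndim)
print(table.dtype)

# astype returns a new array; the original keeps its dtype.
as_ints = table.astype(np.int32)
print(as_ints.dtype, table.dtype)
print(type(table))
```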
Indexing, slicing, selecting
| Code | Description |
|---|---|
| array[i] | 1d array at index i |
| array[i,j] | 2d array at row i, column j |
| array[i<4] | Boolean indexing (i must itself be an array, so that i<4 is a Boolean mask) |
| array[0:3] | Select items of index 0, 1 and 2 |
| array[0:2,1] | Select items of rows 0 and 1 at column 1 |
| array[:1] | Select items of row 0 (equals array[0:1, :]) |
| array[1:2, :] | Select items of row 1 |
| array[::-1] | Reverses array along its first axis |
| array > 5 | Array of Booleans |
| array[array > 5] | Boolean indexing: selects the elements greater than 5 |
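The sketch below applies several of these indexing forms to a 3 by 4 array of the integers 0 to 11, so that each result is easy to check by eye. The variable names are illustrative.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # rows: 0..3, 4..7, 8..11

element = grid[1, 2]        # row 1, column 2
column_part = grid[0:2, 1]  # rows 0 and 1 at column 1
first_row = grid[:1]        # still 2d, with shape (1, 4)
flipped = grid[::-1]        # row order reversed
mask = grid > 5             # Boolean array with the same shape as grid
selected = grid[mask]       # 1d array of the elements where mask is True

print(element, column_part, first_row.shape)
print(flipped[0])
print(selected)
```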
Copying, sorting
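| Code | Description |
|---|---|
| np.copy(array) | Creates a copy of the array |
| other = array.copy() | Creates a copy of the array with its own data (not a view) |
| array.sort() | Sorts the array in place (along the last axis by default) |
| array.sort(axis=0) | Sorts the array in place along axis 0 |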
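The difference between a view and a copy, and between in-place and out-of-place sorting, is easiest to see in a small example. This is a minimal sketch with illustrative names.

```python
import numpy as np

original = np.array([3, 1, 2])

view = original[:]           # basic slicing returns a view on the same data
duplicate = original.copy()  # independent copy

original[0] = 99
print(view[0], duplicate[0])  # the view sees the change, the copy does not

# np.sort returns a sorted copy; ndarray.sort sorts in place and returns None.
sorted_copy = np.sort(original)
original.sort()
print(sorted_copy, original)

# For a 2d array, axis=0 sorts each column independently.
table = np.array([[3, 1], [2, 4]])
table.sort(axis=0)
print(table)
```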
Manipulation
Adding or Removing Elements
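| Code | Description |
|---|---|
| np.append(a,b) | Returns a new array with the items of b appended to a |
| np.insert(array, 1, 2, axis) | Returns a new array with the value 2 inserted before index 1 along the given axis |
| np.resize(array, (2,4)) | Returns a new array of shape (2,4), repeating elements as needed |
| np.delete(array,1,axis) | Returns a new array with index 1 removed along the given axis |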
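None of these functions modifies its input; each returns a new array. The following sketch, using an illustrative three-element array, shows what each call produces.

```python
import numpy as np

base = np.array([10, 20, 30])

appended = np.append(base, [40, 50])  # new array; base is unchanged
inserted = np.insert(base, 1, 99)     # 99 goes in before index 1
removed = np.delete(base, 1)          # drops the element at index 1
resized = np.resize(base, (2, 4))     # repeats base to fill 8 slots

print(appended)
print(inserted)
print(removed)
print(resized)
print(base)
```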
Combining
| Code | Description |
|---|---|
| np.concatenate((a,b),axis=0) | Concatenates 2 arrays along axis 0, adding b to the end of a |
| np.vstack((a,b)) | Stacks arrays vertically (row-wise) |
| np.hstack((a,b)) | Stacks arrays horizontally (column-wise) |
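The shapes must be compatible along the axes that are not being joined. In this sketch, b supplies an extra row for a and c supplies an extra column; the names a, b and c are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])    # one extra row
c = np.array([[7], [8]])  # one extra column

rows_joined = np.concatenate((a, b), axis=0)  # shape (3, 2)
stacked_rows = np.vstack((a, b))              # same result as rows_joined
stacked_cols = np.hstack((a, c))              # shape (2, 3)

print(rows_joined)
print(stacked_cols)
```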
Splitting
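| Code | Description |
|---|---|
| np.split(array, 3) | Split an array into 3 equal sub-arrays (an error is raised if that is not possible) |
| np.array_split(array, 3) | Split an array into 3 sub-arrays of (nearly) identical size |
| np.hsplit(array, 3) | Split the array horizontally (column-wise) into 3 equal parts |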
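The sketch below contrasts the splitting functions. Passing an integer asks for that many parts, while passing a list such as [3] asks for a split before that index; the arrays used are illustrative.

```python
import numpy as np

thirds = np.split(np.arange(9), 3)         # three arrays of length 3
uneven = np.array_split(np.arange(10), 3)  # lengths 4, 3 and 3

wide = np.arange(12).reshape(2, 6)
column_blocks = np.hsplit(wide, 3)         # three blocks of shape (2, 2)
left, right = np.hsplit(wide, [3])         # split before column index 3

print([part.tolist() for part in thirds])
print([len(part) for part in uneven])
print(left.shape, right.shape)
```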
Linear algebra
| Code | Description |
|---|---|
| other = array.flatten() | Flattens an array to 1d (returns a copy) |
| array = np.transpose(other) or array.T | Transpose array |
| inverse = np.linalg.inv(matrix) | Inverse of a given square matrix |
| a @ b | Matrix multiplication |
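As a quick sanity check of these operations, the sketch below inverts a small matrix and confirms that the product of the matrix and its inverse is the identity, up to rounding error. The matrix is chosen only for illustration.

```python
import numpy as np

matrix = np.array([[1.0, 2.0], [3.0, 4.0]])

flat = matrix.flatten()  # 1d copy of the elements in row order
transposed = matrix.T    # same result as np.transpose(matrix)
inverse = np.linalg.inv(matrix)

# A matrix times its inverse should be the identity, up to rounding error.
product = matrix @ inverse
print(flat)
print(transposed)
print(np.allclose(product, np.eye(2)))
```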
Numerical calculations
Arithmetic and Trigonometry
| Code | Description |
|---|---|
| np.add(x,y) or x + y | Addition |
| np.subtract(x,y) or x - y | Subtraction |
| np.divide(x,y) or x / y | Division |
| np.multiply(x,y) or x * y | Multiplication |
| np.matmul(x,y) or x @ y | Matrix multiplication |
| np.sqrt(x) | Element-wise square root |
| np.sin(x) | Element-wise sine |
| np.cos(x) | Element-wise cosine |
| np.log(x) | Element-wise natural log |
| np.dot(x,y) | Dot product |
| np.roots([1,0,-4]) | Roots of the polynomial with the given coefficients (here x**2 - 4) |
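The following sketch checks that each operator gives the same result as its named function, and evaluates a few of the other entries. The arrays x and y are illustrative, and the roots are sorted because np.roots does not promise a particular order.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 2.0, 3.0])

# Each operator is shorthand for the corresponding NumPy function.
same_sum = np.array_equal(np.add(x, y), x + y)
same_difference = np.array_equal(np.subtract(x, y), x - y)
same_quotient = np.array_equal(np.divide(x, y), x / y)
same_product = np.array_equal(np.multiply(x, y), x * y)
print(same_sum, same_difference, same_quotient, same_product)

square_roots = np.sqrt(x)   # element-wise
dot_product = np.dot(x, y)  # 1*2 + 4*2 + 9*3
print(square_roots, dot_product)

# Coefficients are listed from the highest power down: x**2 + 0*x - 4.
roots = np.sort(np.roots([1, 0, -4]))
print(roots)
```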
Statistics
| Code | Description |
|---|---|
| np.mean(array) | Mean |
| np.std(array) | Standard deviation |
| np.median(array) | Median |
| np.corrcoef(array) | Correlation coefficient matrix (each row is treated as a variable) |
| array.sum() | Array-wise sum |
| array.min() | Array-wise minimum value |
| array.max(axis=0) | Maximum value along the specified axis |
| array.cumsum(axis=0) | Cumulative sum along the specified axis |
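Here is a minimal sketch of these functions on a 2 by 3 array. With axis=0 the reduction runs down the rows, giving one result per column; without an axis argument the whole array is reduced to a single number. The data are illustrative.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(np.mean(data), np.median(data), np.std(data))
print(data.sum(), data.min())

column_max = data.max(axis=0)        # one maximum per column
running_total = data.cumsum(axis=0)  # running sum down each column
print(column_max)
print(running_total)

# Each row is treated as one variable; the two rows here are perfectly correlated.
correlation = np.corrcoef(data)
print(correlation)
```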
Slow loops, fast array computations
np.random.seed(0)


def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


values = np.random.randint(1, 10, size=5)
print(compute_reciprocals(values))
[0.16666667 1. 0.25 0.25 0.125 ]
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)
1.11 s ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
print(1.0 / values)
[0.16666667 1. 0.25 0.25 0.125 ]
%timeit (1.0 / big_array)
1.07 ms ± 8.68 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
The / operator here is a NumPy universal function, or ufunc: it applies the division to every element in compiled code rather than in a Python loop.
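To see that the operator and the named ufunc are the same computation, the sketch below compares an explicit loop, the / operator and np.divide. The values are written out by hand, chosen to match the reciprocals printed above.

```python
import numpy as np

values = np.array([6, 1, 4, 4, 8])

# Explicit Python loop, one element at a time.
looped = np.empty(len(values))
for i in range(len(values)):
    looped[i] = 1.0 / values[i]

# The operator form and the named ufunc np.divide compute the same thing.
with_operator = 1.0 / values
with_ufunc = np.divide(1.0, values)

print(isinstance(np.divide, np.ufunc))
print(np.allclose(looped, with_operator))
print(np.array_equal(with_operator, with_ufunc))
```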