Why is NumPy important?


To start a data science journey with Python, you need to be armed with the NumPy library. NumPy is a linear algebra library for Python, and it is essential for data science because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks. NumPy is also very fast, because its core routines are implemented in C and it has bindings to C libraries.

NumPy DS cheat sheet.


It is highly recommended that you install Python using the Anaconda distribution, so that all underlying dependencies stay in sync when you install packages with conda. Install NumPy with conda, then import it as a library:




conda install numpy

import numpy as np

NumPy Input/Output:



a = np.arange(0, 10)
# Save an array to a binary file in NumPy .npy format (the .npy extension is added automatically).
np.save('my_array', a)
# Load arrays or pickled objects from .npy, .npz or pickled files.
np.load('my_array.npy')
# Load data from a text file (myfile.txt must already exist).
np.loadtxt('myfile.txt')
# Load data from a text file, with missing values handled as specified.
np.genfromtxt('myfile.csv', delimiter=',')
# Save an array to a text file.
np.savetxt('myarray.txt', a, delimiter=' ')
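The calls above assume files that already exist on disk. A minimal, self-contained sketch of the same functions is a round trip through a temporary directory. The file names reuse the ones above, and the two-line CSV content is a hypothetical example written only to show how genfromtxt fills a missing field with nan:

```python
import os
import tempfile

import numpy as np

a = np.arange(0, 10)

with tempfile.TemporaryDirectory() as tmp:
    # Binary round trip: np.save appends ".npy" to the file name.
    npy_base = os.path.join(tmp, "my_array")
    np.save(npy_base, a)
    loaded_bin = np.load(npy_base + ".npy")

    # Text round trip: savetxt writes floats by default, so loadtxt returns floats.
    txt_path = os.path.join(tmp, "myarray.txt")
    np.savetxt(txt_path, a, delimiter=" ")
    loaded_txt = np.loadtxt(txt_path)

    # Hypothetical CSV with one empty field; genfromtxt turns it into nan.
    csv_path = os.path.join(tmp, "myfile.csv")
    with open(csv_path, "w") as f:
        f.write("1,2,3\n4,,6\n")
    loaded_csv = np.genfromtxt(csv_path, delimiter=",")

print(loaded_bin)
print(loaded_txt)
print(loaded_csv)
```

Note that the .npy format preserves the integer dtype exactly, while the text round trip comes back as float64.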

Creating Arrays:


Creating a 1D array.


d1 = np.array([2,3,5])
d1

OUT: array([2, 3, 5])


Creating a 2D array.


d2 = np.array([(2.2,3,5), (1.4,6,5)], dtype=float)
d2

OUT: array([[2.2, 3. , 5. ],
[1.4, 6. , 5. ]])


Creating a 3D array.


d3 = np.array([[(2.2,3,5), (1.4,6,5)], [(2.2,3,5), (1.4,6,5)]], dtype=float)
d3

OUT: array([[[2.2, 3. , 5. ],
[1.4, 6. , 5. ]],

[[2.2, 3. , 5. ],
[1.4, 6. , 5. ]]])
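A quick way to confirm how many dimensions an array has is to inspect its ndim and shape attributes. This sketch rebuilds the three arrays above and prints both:

```python
import numpy as np

d1 = np.array([2, 3, 5])
d2 = np.array([(2.2, 3, 5), (1.4, 6, 5)], dtype=float)
d3 = np.array([[(2.2, 3, 5), (1.4, 6, 5)], [(2.2, 3, 5), (1.4, 6, 5)]], dtype=float)

# ndim counts the axes; shape gives the length along each axis.
for name, x in [("d1", d1), ("d2", d2), ("d3", d3)]:
    print(name, x.ndim, x.shape)
```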


Creating an array of zeros.


z = np.zeros((3,3))
z 

OUT: array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])


Creating an array of ones.


one = np.ones((3,3))
one

OUT: array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])


Creating an array of evenly spaced values with a given step (the stop value is excluded).


evenly = np.arange(5,50,5)
evenly

OUT: array([ 5, 10, 15, 20, 25, 30, 35, 40, 45])


Creating an array of evenly spaced values with a given number of samples (the stop value is included).


evenlyn = np.linspace(0,40,5)
evenlyn

OUT: array([ 0., 10., 20., 30., 40.])
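The main practical difference between arange and linspace is the end point. The sketch below puts them side by side; endpoint=False is a standard linspace parameter. For non-integer steps, linspace is usually the safer choice, since floating-point rounding can make the number of elements arange produces surprising.

```python
import numpy as np

# arange excludes the stop value: 50 is never reached.
evenly = np.arange(5, 50, 5)

# linspace includes the stop value by default...
evenlyn = np.linspace(0, 40, 5)

# ...unless endpoint=False, which splits [0, 40) into 5 equal steps of 8.
no_end = np.linspace(0, 40, 5, endpoint=False)

print(evenly[-1], evenlyn[-1], no_end)
```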


Creating an identity (eye) matrix.


np.eye(4)

OUT: array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])


Creating a constant array.


np.full((2,2), 3)

OUT: array([[3, 3],
[3, 3]])


Creating an empty array of a given shape without initializing its entries (the values shown are simply whatever was already in memory).


empty1=np.empty([2,2])
empty1

OUT: array([[2.49707101e-316, 7.22619400e+165],
[2.57217278e+151, 2.90154876e+183]])
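Two details about the constant and empty arrays above are easy to miss: np.full infers its dtype from the fill value, and np.empty gives no guarantee about its contents, so it is only safe when every entry is written before it is read. A minimal sketch:

```python
import numpy as np

# np.full infers the dtype from the fill value: an int gives an integer array.
const_int = np.full((2, 2), 3)
const_float = np.full((2, 2), 3.0)

# np.empty leaves the memory uninitialized, so fill it before reading it.
buf = np.empty((2, 2))
buf.fill(7)

print(const_int.dtype, const_float.dtype)
print(buf)
```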


Creating an array of the given shape and populating it with random samples from a uniform distribution over [0, 1).


np.random.rand(2)

OUT: array([ 0.11570539, 0.35279769])


Returning a sample (or samples) from the "standard normal" distribution, unlike rand, which samples a uniform distribution:


np.random.randn(5,5)

OUT: array([[ 0.70154515, 0.22441999, 1.33563186, 0.82872577, -0.28247509],
[ 0.64489788, 0.61815094, -0.81693168, -0.30102424, -0.29030574],
[ 0.8695976 , 0.413755 , 2.20047208, 0.17955692, -0.82159344],
[ 0.59264235, 1.29869894, -1.18870241, 0.11590888, -0.09181687],
[-0.96924265, -1.62888685, -2.05787102, -0.29705576, 0.68915542]])


Returning random integers from low (inclusive) to high (exclusive).


np.random.randint(1,100,10)

OUT: an array of 10 random integers between 1 and 99 (the values change on every call)
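Because these functions return different values on every call, outputs like the ones above cannot be reproduced exactly. Recent NumPy versions (1.17 and later) also offer a seeded Generator via np.random.default_rng; the sketch below uses it with an arbitrary seed of 42 to draw the same three kinds of samples reproducibly:

```python
import numpy as np

# A seeded Generator makes "random" results reproducible; 42 is an arbitrary seed.
rng = np.random.default_rng(42)

uniform = rng.random(2)               # like np.random.rand(2): uniform on [0, 1)
normal = rng.standard_normal((5, 5))  # like np.random.randn(5, 5)
ints = rng.integers(1, 100, 10)       # like np.random.randint(1, 100, 10): 1..99

# Re-creating the generator with the same seed replays the same sequence.
again = np.random.default_rng(42).random(2)
print(uniform, again)
```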


Array Attributes and Methods:


Let's set up some initial arrays first.


arr = np.arange(25)
ranarr = np.random.randint(0,50,10)
arr

OUT: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24])



ranarr

OUT: array([10, 12, 41, 17, 49, 2, 46, 3, 19, 39])


Reshape - returns an array containing the same data with a new shape.


arr.reshape(5,5)

OUT: array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
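reshape only works when the new shape holds exactly as many elements as the original. Passing -1 for one dimension lets NumPy work it out. This sketch shows both cases, plus the fact that a reshape of a contiguous array like this one is a view rather than a copy:

```python
import numpy as np

arr = np.arange(25)

# -1 tells NumPy to infer that dimension from the total size (25 / 5 = 5).
square = arr.reshape(5, -1)

# reshape usually returns a view, so no data is copied here.
shares = np.shares_memory(square, arr)

# The new shape must hold exactly the same number of elements.
error_message = None
try:
    arr.reshape(4, 6)  # 4 * 6 = 24, not 25
except ValueError as exc:
    error_message = str(exc)

print(square.shape, shares)
print(error_message)
```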


max, min, argmax, argmin (argmax and argmin return the index of the largest and smallest value):


ranarr.max()

OUT: 49



ranarr.argmax()

OUT: 4



ranarr.min()

OUT: 2



ranarr.argmin()

OUT: 5
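Since ranarr is random, this sketch hard-codes the values printed above so the results can be checked. It also shows the axis argument on a 2D array: axis=0 works down the columns and axis=1 along the rows.

```python
import numpy as np

# The same values as ranarr above, fixed so the results are repeatable.
ranarr = np.array([10, 12, 41, 17, 49, 2, 46, 3, 19, 39])

print(ranarr.max(), ranarr.argmax())  # 49 4
print(ranarr.min(), ranarr.argmin())  # 2 5

# Two rows of five: [[10 12 41 17 49], [ 2 46  3 19 39]]
m = ranarr.reshape(2, 5)
print(m.max(axis=1))     # largest value in each row: [49 46]
print(m.argmax(axis=0))  # row index of the larger value in each column: [0 1 0 1 0]
```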


Shape is an attribute that arrays have (not a method):


# Vector
arr.shape

OUT: (25,)



arr.reshape(1,25)

OUT: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24]])



arr.reshape(1,25).shape

OUT: (1, 25)



arr.reshape(25,1)

OUT: array([[ 0],
[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6],
[ 7],
[ 8],
[ 9],
[10],
[11],
[12],
[13],
[14],
[15],
[16],
[17],
[18],
[19],
[20],
[21],
[22],
[23],
[24]])



arr.reshape(25,1).shape

OUT: (25, 1)


You can get the data type of the elements in the array (the default integer type is platform dependent, so you may see int32 instead of int64):



arr.dtype

OUT: dtype('int64')
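To change the data type, astype returns a new, converted array. A minimal sketch; note that converting floats to integers truncates toward zero rather than rounding:

```python
import numpy as np

arr = np.arange(25)

# astype returns a new array converted to the requested dtype.
as_float = arr.astype(np.float64)

# Converting floats back to integers truncates toward zero.
truncated = np.array([1.7, -1.7]).astype(np.int64)

print(arr.dtype, as_float.dtype, truncated)
```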


NumPy Indexing and Selection:


Let's set up some initial arrays first.


arr = np.arange(0,11)
arr

OUT: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])


The simplest way to pick one or several elements of an array is very similar to indexing and slicing Python lists:


arr[8]

OUT: 8



arr[1:5]

OUT: array([1, 2, 3, 4])



arr[0:5]

OUT: array([0, 1, 2, 3, 4])


Broadcasting: assigning a single value to a slice sets every element of that slice.


arr[0:5]=100
arr

OUT: array([100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10])
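One consequence of this kind of assignment that often surprises newcomers is that a slice is a view into the original array, not a copy. Broadcasting into the slice therefore changes the original too. The sketch below shows this, and uses .copy() to get an independent array:

```python
import numpy as np

original = np.arange(0, 11)

# A slice is a view: it shares memory with the original array.
view = original[0:6]
view[:] = 99              # broadcast 99 into every element of the view
print(original)           # the first 6 elements of original are now 99

# Use .copy() when you need an independent array.
fresh = np.arange(0, 11)
independent = fresh[0:6].copy()
independent[:] = 99
print(fresh)              # unchanged
```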


Indexing a 2D array (matrices).


arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d

OUT: array([[ 5, 10, 15],
[20, 25, 30],
[35, 40, 45]])



# Indexing a row
arr_2d[1]

OUT: array([20, 25, 30])



# Format is arr_2d[row][col] or arr_2d[row,col]
# Getting individual element value
arr_2d[1][0]

OUT: 20



# Getting individual element value
arr_2d[1,0]

OUT: 20



# 2D array slicing
# Shape (2,2) from the top-right corner
arr_2d[:2,1:]

OUT: array([[10, 15],
[25, 30]])



# Bottom row
arr_2d[2]

OUT: array([35, 40, 45])



# Bottom row, selecting all columns explicitly
arr_2d[2,:]

OUT: array([35, 40, 45])
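The same slicing syntax selects columns. A single index on an axis drops that axis, while a slice keeps it, which decides whether you get a 1D array or a one-column 2D array:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# A single index on an axis drops that axis; a slice keeps it.
col = arr_2d[:, 1]       # middle column as a 1D array: [10 25 40]
col_2d = arr_2d[:, 1:2]  # same values, kept as a (3, 1) column

print(col, col.shape, col_2d.shape)
```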


Fancy Indexing.


# Set up matrix
arr2d = np.zeros((10,10))

# Number of rows
arr_length = arr2d.shape[0]

# Fill each row with its row number
for i in range(arr_length):
    arr2d[i] = i
    
arr2d

OUT: array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
[ 8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
[ 9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])



arr2d[[2,4,6,8]]

OUT: array([[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])



# Rows can be requested in any order
arr2d[[6,4,2,7]]

OUT: array([[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
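The row-filling loop above can also be written without a loop by broadcasting a column of row numbers against the zeros. Unlike basic slicing, fancy indexing with a list of rows always returns a copy, so modifying the result leaves arr2d untouched. A minimal sketch of both points:

```python
import numpy as np

# Loop-free version of the setup above: broadcast a (10, 1) column of row
# numbers against a (10, 10) array of zeros.
arr2d = np.zeros((10, 10)) + np.arange(10)[:, np.newaxis]

# Fancy indexing returns a copy, unlike basic slicing.
picked = arr2d[[6, 4, 2, 7]]
picked[:] = -1
print(picked[:, 0], arr2d[6, 0])  # arr2d is unchanged
```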


Conditional selection with boolean arrays.


arr = np.arange(1,11)
arr

OUT: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])



arr > 4

OUT: array([False, False, False, False, True, True, True, True, True, True], dtype=bool)



bool_arr = arr>4
bool_arr

OUT: array([False, False, False, False, True, True, True, True, True, True], dtype=bool)



arr[bool_arr]

OUT: array([ 5, 6, 7, 8, 9, 10])



arr[arr>2]

OUT: array([ 3, 4, 5, 6, 7, 8, 9, 10])
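To combine several conditions, use the element-wise operators & (and), | (or) and ~ (not). Python's and/or keywords raise a ValueError on arrays, and the parentheses are required because & and | bind more tightly than comparisons:

```python
import numpy as np

arr = np.arange(1, 11)

between = arr[(arr > 2) & (arr < 7)]  # [3 4 5 6]
outside = arr[(arr < 3) | (arr > 8)]  # [ 1  2  9 10]
odd = arr[~(arr % 2 == 0)]            # [1 3 5 7 9]
print(between, outside, odd)
```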


NumPy Operations:


You can easily perform element-wise arithmetic between two arrays, or between a scalar and an array.


arr = np.arange(0,10)
arr + arr

OUT: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])



arr * arr

OUT: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])



arr - arr

OUT: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])



# 0/0 triggers a RuntimeWarning (not an error) and the result is nan
arr/arr

OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: invalid value encountered in true_divide
if __name__ == '__main__':
array([ nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])



# 1/0 gives inf, again with a RuntimeWarning rather than an error
1/arr

OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: divide by zero encountered in true_divide
if __name__ == '__main__':
array([ inf, 1. , 0.5 , 0.33333333, 0.25 ,
0.2 , 0.16666667, 0.14285714, 0.125 , 0.11111111])



arr**3

OUT: array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
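Scalar-with-array arithmetic follows the same element-wise rule, and the divide-by-zero warnings above can be silenced locally with the np.errstate context manager when nan and inf are expected. A minimal sketch:

```python
import numpy as np

arr = np.arange(0, 10)

# Scalar-with-array arithmetic: the scalar is applied to every element.
plus = arr + 100
doubled = arr * 2

# np.errstate silences the 1/0 and 0/0 warnings inside this block only.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr  # first element is nan (0/0)
    recip = 1 / arr    # first element is inf (1/0)

print(plus, doubled)
print(ratio[0], recip[0])
```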


Universal Array Functions:


Taking Square Roots.


np.sqrt(arr)

OUT: array([ 0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])


Calculating the exponential (e^x).


np.exp(arr)

OUT: array([ 1.00000000e+00, 2.71828183e+00, 7.38905610e+00,
2.00855369e+01, 5.45981500e+01, 1.48413159e+02,
4.03428793e+02, 1.09663316e+03, 2.98095799e+03, 8.10308393e+03])



np.max(arr)
# same as arr.max(); an aggregation rather than an element-wise function

OUT: 9



np.sin(arr)

OUT: array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ,
-0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])



np.log(arr)

OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: divide by zero encountered in log
if __name__ == '__main__':
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436,
1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])
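Universal functions such as np.log also accept the out= and where= arguments. One way to avoid the log(0) warning above is to compute the logarithm only where the input is positive and leave a pre-filled value elsewhere. Pre-filling with -inf is a choice made here to match the output above, not a requirement:

```python
import numpy as np

arr = np.arange(0, 10)

# Only positions where arr > 0 are computed; the rest keep the value
# pre-filled in `out`, so log(0) is never evaluated and no warning is raised.
safe_log = np.log(arr, out=np.full(arr.shape, -np.inf), where=arr > 0)

print(safe_log[:3])
```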



