Data Science Starts with NumPy.

Why is NumPy important?


To start a data science journey with Python, we need the NumPy library. NumPy is a linear algebra library and a fundamental package for data analysis in Python. It provides an efficient way to work with large arrays and matrices of numerical data. One of the main advantages of using NumPy for data science is its speed: its core is written in C and it supports vectorized operations, which are typically much faster than equivalent Python loops.
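
As a rough illustration of the vectorization point, the sketch below computes the same total twice: once with an explicit Python loop and once with a single array expression. The array size and variable names are arbitrary choices for this example, and no timings are claimed here; you can wrap each part in `time.perf_counter()` calls to compare them on your own machine.

```python
import numpy as np

values = np.arange(100_000, dtype=np.float64)

# Pure-Python loop: the interpreter handles one element at a time.
loop_total = 0.0
for v in values:
    loop_total += v * 2.0

# Vectorized: the multiplication and the sum both run in compiled code.
vector_total = (values * 2.0).sum()

# Both approaches produce the same number.
print(loop_total == vector_total)
```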

NumPy DS cheat sheet.



Additionally, NumPy has a vast array of mathematical functions and operations, including linear algebra and Fourier transforms, making it a powerful tool for data analysis. NumPy also integrates seamlessly with other Python libraries, such as Pandas and Matplotlib, to provide a complete data analysis toolkit.
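
To make the linear algebra and Fourier transform remark concrete, here is a small sketch using `np.linalg.solve` and `np.fft.rfft`. The matrix, the right-hand side, and the test signal are made-up values chosen only so the results are easy to check by hand.

```python
import numpy as np

# Solve the linear system A @ x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)          # approximately [2., 3.]

# Fourier transform of a sine wave that completes 5 cycles in the window.
n = 64
t = np.arange(n) / n
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.fft.rfft(signal)

# Index of the strongest frequency component.
dominant_bin = int(np.argmax(np.abs(spectrum)))
```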

NumPy is also highly adaptable, being used in a wide range of fields beyond data science, such as physics, engineering, and finance. The package is constantly evolving, with new features and functions being added regularly. NumPy's open-source nature and large community of developers ensure its continued growth and improvement. NumPy is also highly compatible with other data science tools, such as Jupyter Notebooks and Anaconda, making it a popular choice for data science projects.

Looking ahead, NumPy is expected to keep playing a critical role in data science and analytics. As the amount of data generated by businesses and organizations continues to grow, the need for efficient and scalable data analysis tools will only increase. NumPy's speed and efficiency make it a strong choice for handling large datasets and performing complex mathematical operations. Additionally, newer libraries such as TensorFlow and PyTorch can exchange data with NumPy arrays, which extends what NumPy-based code can do. Overall, NumPy's adaptability, efficiency, and versatility make it a reliable choice for data science projects both now and in the future.

It is highly recommended that you install Python using the Anaconda distribution, so that all underlying dependencies stay in sync when you use conda install. Once you've installed NumPy, you can import it as a library:



conda install numpy

import numpy as np


NumPy Input/Output:



a = np.arange(0, 10)
# Save an array to a binary file in NumPy ``.npy`` format.
np.save('my_array', a)
# Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
np.load('my_array.npy')
# Load data from a text file (assumes 'myfile.txt' already exists).
np.loadtxt('myfile.txt')
# Load data from a text file, with missing values handled as specified.
np.genfromtxt('myfile.csv', delimiter=',')
# Save an array to a text file.
np.savetxt('myarray.txt', a, delimiter=' ')
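
The calls above assume the files already exist in the working directory. A self-contained round trip might look like the sketch below, which writes to a temporary directory instead; the file names and the `fmt="%d"` setting are choices made for this example only.

```python
import os
import tempfile

import numpy as np

a = np.arange(0, 10)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "my_array.npy")
    csv_path = os.path.join(tmp, "my_array.csv")

    # Binary round trip: shape and dtype are preserved.
    np.save(npy_path, a)
    from_npy = np.load(npy_path)

    # Text round trip: values come back as floats by default.
    np.savetxt(csv_path, a, delimiter=",", fmt="%d")
    from_csv = np.genfromtxt(csv_path, delimiter=",")

print(from_npy.dtype, from_csv.dtype)
```

The binary `.npy` format keeps the integer dtype, while the text route returns floating-point values unless you pass a `dtype` to the loader.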

Creating Arrays:


Creating 1D array.


d1 = np.array([2,3,5])
d1

OUT: array([2, 3, 5])


Creating 2D array.


d2 = np.array([(2.2,3,5), (1.4,6,5)], dtype=float)
d2

OUT: array([[2.2, 3. , 5. ],
[1.4, 6. , 5. ]])


Creating 3D array.


d3 = np.array([[(2.2,3,5), (1.4,6,5)], [(2.2,3,5), (1.4,6,5)]], dtype=float)
d3

OUT: array([[[2.2, 3. , 5. ],
[1.4, 6. , 5. ]],

[[2.2, 3. , 5. ],
[1.4, 6. , 5. ]]])



Creating an array of zeros.


z = np.zeros((3,3))
z 

OUT: array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])


Creating an array of ones.


one = np.ones((3,3))
one

OUT: array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])


Creating an array of evenly spaced values with a given step.


evenly = np.arange(5,50,5)
evenly

OUT: array([ 5, 10, 15, 20, 25, 30, 35, 40, 45])


Creating an array of evenly spaced values by defining the number of samples.


evenlyn = np.linspace(0,40,5)
evenlyn

OUT: array([ 0., 10., 20., 30., 40.])
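
A detail that is easy to miss when choosing between the two functions: `np.arange` excludes the stop value, while `np.linspace` includes it by default. The short sketch below shows both behaviours side by side; the variable names are only illustrative.

```python
import numpy as np

# arange: fixed step, stop value (40) is excluded.
by_step = np.arange(0, 40, 10)

# linspace: fixed number of samples, stop value (40) is included.
by_count = np.linspace(0, 40, 5)

# retstep=True also returns the spacing that linspace computed.
samples, step = np.linspace(0, 40, 5, retstep=True)

# endpoint=False makes linspace behave like a half-open interval.
half_open = np.linspace(0, 40, 4, endpoint=False)
```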


Creating identity (eye) matrix.


np.eye(4)

OUT: array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])


Creating constant array.


np.full((2,2), 3)

OUT: array([[3, 3],
[3, 3]])


Creating empty array of given shape without initializing entries.


empty1=np.empty([2,2])
empty1

OUT: array([[2.49707101e-316, 7.22619400e+165],
[2.57217278e+151, 2.90154876e+183]])


Creating an array of the given shape and populating it with random samples from a uniform distribution over [0, 1).


np.random.rand(2)

OUT: array([ 0.11570539, 0.35279769])


Returning a sample (or samples) from the "standard normal" distribution, unlike rand, which is uniform:


np.random.randn(5,5)

OUT: array([[ 0.70154515, 0.22441999, 1.33563186, 0.82872577, -0.28247509],
[ 0.64489788, 0.61815094, -0.81693168, -0.30102424, -0.29030574],
[ 0.8695976 , 0.413755 , 2.20047208, 0.17955692, -0.82159344],
[ 0.59264235, 1.29869894, -1.18870241, 0.11590888, -0.09181687],
[-0.96924265, -1.62888685, -2.05787102, -0.29705576, 0.68915542]])


Returning random integers from low (inclusive) to high (exclusive).


np.random.randint(1,100,10)

OUT: an array of 10 random integers between 1 and 99 (the values differ on every run)
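
Because these functions return different values on every run, results are hard to reproduce. One option, shown here as a sketch and not as the approach this article uses, is NumPy's `Generator` interface created with `np.random.default_rng` and a seed. The seed value 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.random(2)                    # uniform over [0, 1)
normal = rng.standard_normal((5, 5))       # standard normal samples
integers = rng.integers(1, 100, size=10)   # low inclusive, high exclusive

# A second generator with the same seed reproduces the same stream.
rng_again = np.random.default_rng(seed=42)
same_start = np.array_equal(uniform, rng_again.random(2))
```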


Array Attributes and Methods:


Let's set up the initial arrays first.


arr = np.arange(25)
ranarr = np.random.randint(0,50,10)
arr

OUT: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24])



ranarr

OUT: array([10, 12, 41, 17, 49, 2, 46, 3, 19, 39])


Reshape - returns an array containing the same data with a new shape.


arr.reshape(5,5)

OUT: array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
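
Two properties of reshape are worth knowing beyond the basic call: one dimension can be given as -1 so NumPy infers it, and for a contiguous array like this one the result is a view that shares data with the original. The sketch below demonstrates both, plus the error raised when the sizes do not match.

```python
import numpy as np

arr = np.arange(25)

# -1 asks NumPy to infer that dimension from the array size (25 / 5 = 5).
grid = arr.reshape(5, -1)

# For a contiguous array, reshape returns a view:
# writing through `grid` also changes `arr`.
grid[0, 0] = 99
shares_data = np.shares_memory(arr, grid)

# A shape whose element count does not match raises ValueError.
try:
    arr.reshape(4, 6)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```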


max, min, argmax, argmin: the largest and smallest values, and the index positions where they occur.


ranarr.max()

OUT: 49



ranarr.argmax()

OUT: 4



ranarr.min()

OUT: 2



ranarr.argmin()

OUT: 5
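
The relationship between these four methods can be checked directly: indexing the array with the result of argmax gives back max, and likewise for argmin and min. The sketch below hard-codes the ranarr values printed earlier so it is reproducible, and also shows the `axis` argument on a 2D reshape of the same data.

```python
import numpy as np

# Same values as the ranarr printed above, fixed for reproducibility.
ranarr = np.array([10, 12, 41, 17, 49, 2, 46, 3, 19, 39])

largest = ranarr.max()
largest_at = ranarr.argmax()     # position of the largest value
smallest = ranarr.min()
smallest_at = ranarr.argmin()    # position of the smallest value

# With a 2D array, axis selects the direction of the reduction.
table = ranarr.reshape(2, 5)
col_max = table.max(axis=0)        # one maximum per column
row_argmin = table.argmin(axis=1)  # position of the minimum in each row
```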


Shape is an attribute that arrays have (not a method):


# Vector
arr.shape

OUT: (25,)



arr.reshape(1,25)

OUT: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24]])



arr.reshape(1,25).shape

OUT: (1, 25)



arr.reshape(25,1)

OUT: array([[ 0],
[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6],
[ 7],
[ 8],
[ 9],
[10],
[11],
[12],
[13],
[14],
[15],
[16],
[17],
[18],
[19],
[20],
[21],
[22],
[23],
[24]])



arr.reshape(25,1).shape

OUT: (25, 1)


You can also get the data type of the elements in the array:



arr.dtype

OUT: dtype('int64')
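
The default integer type can depend on the platform and the NumPy version, so when the type matters it may be safer to convert explicitly. A minimal sketch with `astype`, using made-up values:

```python
import numpy as np

arr = np.arange(25)

# Convert to a specific type; astype returns a new array.
as_float = arr.astype(np.float64)

# Converting floats to integers truncates toward zero.
truncated = np.array([1.7, -2.3]).astype(np.int32)

# itemsize is the number of bytes used by each element.
bytes_per_item = as_float.itemsize
```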


NumPy Indexing and Selection:


Let's set up the initial arrays first.


arr = np.arange(0,11)
arr

OUT: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])


The simplest way to pick one or more elements of an array looks very similar to indexing Python lists:


arr[8]

OUT: 8



arr[1:5]

OUT: array([1, 2, 3, 4])



arr[0:5]

OUT: array([0, 1, 2, 3, 4])


Broadcasting: assigning a single value to a slice sets every element of that slice.


arr[0:5]=100
arr

OUT: array([100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10])
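
One consequence that differs from Python lists: a slice of a NumPy array is a view, not a copy, so changes made through the slice show up in the original array. The sketch below illustrates this and uses `copy()` when independent data is wanted; the names `window` and `safe` are only illustrative.

```python
import numpy as np

arr = np.arange(0, 11)

# A slice is a view onto the same data.
window = arr[0:5]
window[:] = 100          # this also modifies arr[0:5]
is_view = np.shares_memory(arr, window)

# copy() gives independent data; changing it leaves arr untouched.
safe = arr.copy()
safe[:] = -1
is_copy = not np.shares_memory(arr, safe)
```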


Indexing a 2D array (matrices).


arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d

OUT: array([[ 5, 10, 15],
[20, 25, 30],
[35, 40, 45]])



#Indexing row
arr_2d[1]

OUT: array([20, 25, 30])



# Format is arr_2d[row][col] or arr_2d[row,col]
# Getting individual element value
arr_2d[1][0]

OUT: 20



# Getting individual element value
arr_2d[1,0]

OUT: 20



# 2D array slicing
#Shape (2,2) from top right corner
arr_2d[:2,1:]

OUT: array([[10, 15],
[25, 30]])



#Shape bottom row
arr_2d[2]

OUT: array([35, 40, 45])



#Shape bottom row
arr_2d[2,:]

OUT: array([35, 40, 45])


Fancy Indexing.


#Set up matrix
arr2d = np.zeros((10,10))

#Number of rows
arr_length = arr2d.shape[0]

#Fill each row with its own index

for i in range(arr_length):
    arr2d[i] = i
    
arr2d

OUT: array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
[ 8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
[ 9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])



arr2d[[2,4,6,8]]

OUT: array([[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])



#Rows can be selected in any order
arr2d[[6,4,2,7]]

OUT: array([[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
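
Fancy indexing differs from slicing in two ways that the sketch below tries to show: it returns a copy instead of a view, and passing two index lists selects element pairs, not a rectangular block. The 10x10 `grid` used here is a stand-in chosen so every element is distinct, and `np.ix_` is one way to get the block.

```python
import numpy as np

grid = np.arange(100).reshape(10, 10)   # grid[r, c] == 10 * r + c

# Selecting rows with a list returns a copy.
rows = grid[[2, 4]]
rows[0, 0] = -1                         # grid itself is unchanged

# Two index lists are paired up: elements (2, 1) and (4, 3).
pairs = grid[[2, 4], [1, 3]]

# np.ix_ builds the cross product: rows 2 and 4, columns 1 and 3.
block = grid[np.ix_([2, 4], [1, 3])]
```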


Selection.


arr = np.arange(1,11)
arr

OUT: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])



arr > 4

OUT: array([False, False, False, False, True, True, True, True, True, True], dtype=bool)



bool_arr = arr>4
bool_arr

OUT: array([False, False, False, False, True, True, True, True, True, True], dtype=bool)



arr[bool_arr]

OUT: array([ 5, 6, 7, 8, 9, 10])



arr[arr>2]

OUT: array([ 3, 4, 5, 6, 7, 8, 9, 10])
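
Conditions can also be combined. With arrays, use `&`, `|` and `~` in place of Python's `and`, `or` and `not`, and wrap each comparison in parentheses. A short sketch, with `np.where` added as one way to replace values instead of filtering them out:

```python
import numpy as np

arr = np.arange(1, 11)

middle = arr[(arr > 2) & (arr < 8)]     # both conditions must hold
edges = arr[(arr <= 2) | (arr >= 9)]    # either condition may hold
odd = arr[~(arr % 2 == 0)]              # negation of a condition

# np.where keeps the shape: values failing the test become 0.
masked = np.where(arr > 4, arr, 0)

# Summing a boolean array counts the True entries.
count_above_4 = int((arr > 4).sum())
```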


NumPy Operations:


You can easily perform arithmetic between two arrays, or between a scalar and an array.


arr = np.arange(0,10)
arr + arr

OUT: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])



arr * arr

OUT: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])



arr - arr

OUT: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])



# Division by zero raises a warning, not an error; 0/0 is replaced with nan
arr/arr

OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: invalid value encountered in true_divide
if __name__ == '__main__':
array([ nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])



# 1/0 also only raises a warning, and the result is inf
1/arr

OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: divide by zero encountered in true_divide
if __name__ == '__main__':
array([ inf, 1. , 0.5 , 0.33333333, 0.25 ,
0.2 , 0.16666667, 0.14285714, 0.125 , 0.11111111])



arr**3

OUT: array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
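
If the warnings from division are expected, they can be silenced locally with `np.errstate`, or the division can be skipped for zero entries altogether. The sketch below shows both options. Filling the skipped positions with 0.0 is an arbitrary choice made for this example.

```python
import numpy as np

arr = np.arange(0, 10)

# Silence the warnings only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr          # 0/0 gives nan
    reciprocal = 1 / arr       # 1/0 gives inf

# Or divide only where the denominator is non-zero;
# positions that are skipped keep the value already in `out`.
safe = np.divide(1.0, arr, out=np.zeros(arr.shape), where=arr != 0)
```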


Universal Array Functions:


Taking Square Roots.


np.sqrt(arr)

OUT: array([ 0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])


Calculating the exponential (e^x).


np.exp(arr)

OUT: array([ 1.00000000e+00, 2.71828183e+00, 7.38905610e+00,
2.00855369e+01, 5.45981500e+01, 1.48413159e+02,
4.03428793e+02, 1.09663316e+03, 2.98095799e+03, 8.10308393e+03])



np.max(arr) 
#same as arr.max()

OUT: 9



np.sin(arr)

OUT: array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ,
-0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])



np.log(arr)

OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: divide by zero encountered in log
if __name__ == '__main__':
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436,
1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])
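
The same functions work on arrays of any shape. Element-wise functions such as np.sqrt keep the shape, while reductions such as sum and max can collapse a chosen axis. A small sketch on a made-up 2x3 array:

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3)   # [[1, 2, 3], [4, 5, 6]]

roots = np.sqrt(m)          # element-wise, same shape as m
total = m.sum()             # reduce over all elements
col_sums = m.sum(axis=0)    # one sum per column
row_max = m.max(axis=1)     # one maximum per row
```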




