Why is NumPy important?
To start a data science journey with Python, we need to be comfortable with the NumPy library. NumPy is a linear algebra library and a fundamental package for data analysis in Python. It provides an efficient way to work with large arrays and matrices of numerical data. One of its main advantages for data science is speed: NumPy's core is implemented in C and supports vectorized operations, which are typically much faster than equivalent Python loops.
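To make the idea of vectorized operations concrete, here is a minimal sketch that computes a sum of squares twice: once with a plain Python loop and once with whole-array NumPy operations. The function names are hypothetical and chosen only for this illustration; actual speed differences depend on the machine and array size, so measure with timeit if timing matters.

```python
import numpy as np

values = np.arange(100_000, dtype=np.int64)

def loop_sum_of_squares(data):
    """Square and add the elements one at a time in the Python interpreter."""
    total = 0
    for x in data:
        total += int(x) * int(x)
    return total

def vectorized_sum_of_squares(data):
    """Square and add the whole array at once using NumPy operations."""
    return int(np.sum(data * data))

# Both approaches give the same answer on a small slice of the data.
small = values[:1000]
print(loop_sum_of_squares(small) == vectorized_sum_of_squares(small))
```

The vectorized version has no explicit loop: the multiplication and the sum each operate on the whole array.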
Updated: 2024-12-20 by Andrey BRATUS, Senior Data Analyst.
NumPy Input/Output:
Creating Arrays:
Array Attributes and Methods:
NumPy Indexing and Selection:
NumPy Operations:
Universal Array Functions:
Additionally, NumPy has a vast array of mathematical functions and operations, including linear algebra and Fourier transforms, making it a powerful tool for data analysis. NumPy also integrates seamlessly with other Python libraries, such as Pandas and Matplotlib, to provide a complete data analysis toolkit.
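As a small illustration of the linear algebra and Fourier transform functions mentioned above, the sketch below solves a 2x2 linear system with np.linalg.solve and finds the dominant frequency of a sampled sine wave with np.fft.rfft. The matrix, vector, and signal are made-up example data.

```python
import numpy as np

# Linear algebra: solve A @ x = b for x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Fourier transform: a sine wave with exactly 4 cycles over 64 samples.
n = 64
t = np.arange(n)
signal = np.sin(2 * np.pi * 4 * t / n)
spectrum = np.abs(np.fft.rfft(signal))
dominant = int(np.argmax(spectrum))  # index of the strongest frequency bin

print(x, dominant)
```

Here x should come out close to [2, 3], and the strongest frequency bin should be the one matching the 4 cycles in the signal.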
NumPy is also used well beyond data science, in fields such as physics, engineering, and finance. The package is actively developed, and its open-source nature and large community of contributors support its continued growth and improvement. It also works well with common data science tooling, such as Jupyter Notebooks and the Anaconda distribution, which makes it a popular choice for data science projects.
Looking ahead, NumPy is expected to keep playing a critical role in data science and analytics. As the amount of data generated by businesses and organizations continues to grow, the need for efficient and scalable data analysis tools will only increase, and NumPy's speed makes it a strong choice for handling large datasets and performing complex mathematical operations. Newer libraries built for data science and machine learning, such as TensorFlow and PyTorch, interoperate with NumPy arrays, which extends what can be built on top of it. Overall, NumPy's adaptability, efficiency, and versatility make it a reliable choice for data science projects both now and in the future.
It is highly recommended that you install Python using the Anaconda distribution, so that all underlying dependencies stay in sync, and then install NumPy with conda:
conda install numpy
Once you've installed NumPy, you can import it as a library:
import numpy as np
NumPy Input/Output:
a = np.arange(0, 10)
# Save an array to a binary file in NumPy ``.npy`` format.
np.save('my_array', a)
# Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
np.load('my_array.npy')
# Load data from a text file.
np.loadtxt('myfile.txt')
# Load data from a text file, with missing values handled as specified.
np.genfromtxt('myfile.csv', delimiter=',')
# Save an array to a text file.
np.savetxt('myarray.txt', a, delimiter=' ')
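The save and load calls above can be combined into a round trip. The following is a minimal sketch that writes to a temporary directory so it leaves no files behind; the file names are hypothetical and used only for this example.

```python
import os
import tempfile

import numpy as np

a = np.arange(0, 10)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "my_array.npy")
    txt_path = os.path.join(tmp, "my_array.txt")

    # Binary round trip: the .npy format stores dtype and shape with the data.
    np.save(npy_path, a)
    from_npy = np.load(npy_path)

    # Text round trip: loadtxt parses values as floats by default.
    np.savetxt(txt_path, a, delimiter=",")
    from_txt = np.loadtxt(txt_path, delimiter=",")

print(from_npy.dtype, from_txt.dtype)
```

Note the difference in data types: the binary file gives back the original integer array, while the text file comes back as floating-point values unless a dtype is passed to loadtxt.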
Creating Arrays:
Creating a 1D array.
d1 = np.array([2,3,5])
d1
OUT: array([2, 3, 5])
Creating a 2D array.
d2 = np.array([(2.2,3,5), (1.4,6,5)], dtype=float)
d2
OUT: array([[2.2, 3. , 5. ],
[1.4, 6. , 5. ]])
Creating a 3D array.
d3 = np.array([[(2.2,3,5), (1.4,6,5)], [(2.2,3,5), (1.4,6,5)]], dtype=float)
d3
OUT: array([[[2.2, 3. , 5. ],
[1.4, 6. , 5. ]],
[[2.2, 3. , 5. ],
[1.4, 6. , 5. ]]])
Creating an array of zeros.
z = np.zeros((3,3))
z
OUT: array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
Creating an array of ones.
one = np.ones((3,3))
one
OUT: array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
Creating an array of evenly spaced values with a given step.
evenly = np.arange(5,50,5)
evenly
OUT: array([ 5, 10, 15, 20, 25, 30, 35, 40, 45])
Creating an array of evenly spaced values by defining the number of samples.
evenlyn = np.linspace(0,40,5)
evenlyn
OUT: array([ 0., 10., 20., 30., 40.])
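The difference between arange and linspace is easy to miss: arange takes a step and leaves out the stop value, while linspace takes a number of samples and includes the stop value by default. A short sketch with illustrative variable names:

```python
import numpy as np

# arange: step of 10, the stop value 40 is excluded.
by_step = np.arange(0, 40, 10)

# linspace: 5 samples, the stop value 40 is included by default.
by_count = np.linspace(0, 40, 5)

# linspace can also exclude the stop value with endpoint=False.
open_end = np.linspace(0, 40, 4, endpoint=False)

print(by_step, by_count, open_end)
```

With integer arguments arange returns integers, whereas linspace always returns floating-point values.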
Creating an identity (eye) matrix.
np.eye(4)
OUT: array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
Creating a constant array.
np.full((2,2), 3)
OUT: array([[3, 3],
[3, 3]])
Creating an empty array of a given shape without initializing entries (the values shown are whatever happened to be in memory).
empty1=np.empty([2,2])
empty1
OUT: array([[2.49707101e-316, 7.22619400e+165],
[2.57217278e+151, 2.90154876e+183]])
Creating an array of the given shape and populating it with random samples from a uniform distribution over [0, 1).
np.random.rand(2)
OUT: array([ 0.11570539, 0.35279769])
Returning a sample (or samples) from the "standard normal" distribution, unlike rand, which samples from a uniform distribution:
np.random.randn(5,5)
OUT: array([[ 0.70154515, 0.22441999, 1.33563186, 0.82872577, -0.28247509],
[ 0.64489788, 0.61815094, -0.81693168, -0.30102424, -0.29030574],
[ 0.8695976 , 0.413755 , 2.20047208, 0.17955692, -0.82159344],
[ 0.59264235, 1.29869894, -1.18870241, 0.11590888, -0.09181687],
[-0.96924265, -1.62888685, -2.05787102, -0.29705576, 0.68915542]])
Returning random integers from low (inclusive) to high (exclusive).
np.random.randint(1,100,10)
OUT: an array of 10 random integers between 1 and 99 (the values differ on every run)
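Because the random functions return different values on every run, it can help to fix a seed when results need to be reproducible. The sketch below uses NumPy's Generator interface (np.random.default_rng) instead of the np.random.rand style calls used in this article; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# A seeded generator produces the same sequence every time it is created.
rng = np.random.default_rng(seed=42)

uniform = rng.random(2)                    # uniform samples over [0, 1)
normal = rng.standard_normal((5, 5))       # standard normal samples
integers = rng.integers(1, 100, size=10)   # integers from 1 (inclusive) to 100 (exclusive)

# Re-creating the generator with the same seed repeats the first draw.
rng_again = np.random.default_rng(seed=42)
same = rng_again.random(2)

print(np.array_equal(uniform, same))
```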
Array Attributes and Methods:
Let's set up the initial arrays first.
arr = np.arange(25)
ranarr = np.random.randint(0,50,10)
arr
OUT: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24])
ranarr
OUT: array([10, 12, 41, 17, 49, 2, 46, 3, 19, 39])
Reshape - returns an array containing the same data with a new shape.
arr.reshape(5,5)
OUT: array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
Max, min, argmax, argmin: the largest and smallest values and their index positions.
ranarr.max()
OUT: 49
ranarr.argmax()
OUT: 4
ranarr.min()
OUT: 2
ranarr.argmin()
OUT: 5
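The same methods also work on 2D arrays, where an axis argument controls whether the result is computed over the whole array, per column, or per row. A minimal sketch on a small fixed array (made-up values, so the results are reproducible):

```python
import numpy as np

scores = np.array([[3, 9, 1],
                   [7, 2, 8]])

overall_max = scores.max()          # largest value in the whole array
flat_index = scores.argmax()        # its position in the flattened array
row_col = np.unravel_index(flat_index, scores.shape)  # the same position as (row, col)

col_max = scores.max(axis=0)        # largest value in each column
row_argmax = scores.argmax(axis=1)  # column index of the largest value in each row

print(overall_max, flat_index, row_col, col_max, row_argmax)
```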
Shape is an attribute that arrays have (not a method):
# Vector
arr.shape
OUT: (25,)
arr.reshape(1,25)
OUT: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24]])
arr.reshape(1,25).shape
OUT: (1, 25)
arr.reshape(25,1)
OUT: array([[ 0],
[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6],
[ 7],
[ 8],
[ 9],
[10],
[11],
[12],
[13],
[14],
[15],
[16],
[17],
[18],
[19],
[20],
[21],
[22],
[23],
[24]])
arr.reshape(25,1).shape
OUT: (25, 1)
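When reshaping, one dimension can be given as -1 and NumPy works it out from the array's size. The total number of elements must stay the same, otherwise reshape raises a ValueError. A short sketch:

```python
import numpy as np

arr = np.arange(25)

# -1 asks NumPy to infer that dimension from the total size.
column = arr.reshape(-1, 1)   # 25 rows, 1 column
grid = arr.reshape(5, -1)     # 5 rows, 5 columns

# 25 elements cannot fill a 4 x 6 array, so this reshape fails.
try:
    arr.reshape(4, 6)
    failed = False
except ValueError:
    failed = True

print(column.shape, grid.shape, failed)
```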
You can grab the data type of the object in the array:
arr.dtype
OUT: dtype('int64')
NumPy Indexing and Selection:
Let's set up the initial array first.
arr = np.arange(0,11)
arr
OUT: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
The simplest way to pick one or several elements of an array looks very similar to indexing Python lists:
arr[8]
OUT: 8
arr[1:5]
OUT: array([1, 2, 3, 4])
arr[0:5]
OUT: array([0, 1, 2, 3, 4])
Broadcasting: assigning a single value to a slice sets every element of that slice.
arr[0:5]=100
arr
OUT: array([100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10])
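One thing worth knowing here is that a slice of a NumPy array is a view on the original data, not a copy, so assigning through the slice changes the original array. The sketch below shows the difference between a view and an explicit copy; the variable names are illustrative.

```python
import numpy as np

# A slice is a view: writing to it changes the original array.
arr = np.arange(0, 11)
view = arr[0:5]
view[:] = 100

# An explicit copy is independent of the original array.
arr2 = np.arange(0, 11)
safe = arr2[0:5].copy()
safe[:] = 100

print(arr)
print(arr2)
```

After running this, arr has its first five elements overwritten, while arr2 is unchanged.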
Indexing a 2D array (matrices).
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d
OUT: array([[ 5, 10, 15],
[20, 25, 30],
[35, 40, 45]])
#Indexing row
arr_2d[1]
OUT: array([20, 25, 30])
# Format is arr_2d[row][col] or arr_2d[row,col]
# Getting individual element value
arr_2d[1][0]
OUT: 20
# Getting individual element value
arr_2d[1,0]
OUT: 20
# 2D array slicing
#Shape (2,2) from top right corner
arr_2d[:2,1:]
OUT: array([[10, 15],
[25, 30]])
#Shape bottom row
arr_2d[2]
OUT: array([35, 40, 45])
#Shape bottom row
arr_2d[2,:]
OUT: array([35, 40, 45])
Fancy Indexing: selecting rows by passing a list of indices.
# Set up matrix
arr2d = np.zeros((10,10))
# Number of rows
arr_length = arr2d.shape[0]
# Fill each row with its own index
for i in range(arr_length):
    arr2d[i] = i
arr2d
OUT: array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
[ 8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
[ 9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])
arr2d[[2,4,6,8]]
OUT: array([[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])
# Indices can be given in any order
arr2d[[6,4,2,7]]
OUT: array([[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
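Fancy indexing also works with row and column lists at the same time. Passing two lists picks individual elements pairwise, while np.ix_ selects the full block where the chosen rows and columns cross. A minimal sketch on a small 3x3 array:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

# Two index lists are paired up: elements (0, 1) and (2, 2).
pairs = arr_2d[[0, 2], [1, 2]]

# np.ix_ selects every combination of rows 0 and 2 with columns 1 and 2.
block = arr_2d[np.ix_([0, 2], [1, 2])]

print(pairs)
print(block)
```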
Selection: filtering elements with a boolean condition.
arr = np.arange(1,11)
arr
OUT: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
arr > 4
OUT: array([False, False, False, False, True, True, True, True, True, True], dtype=bool)
bool_arr = arr>4
bool_arr
OUT: array([False, False, False, False, True, True, True, True, True, True], dtype=bool)
arr[bool_arr]
OUT: array([ 5, 6, 7, 8, 9, 10])
arr[arr>2]
OUT: array([ 3, 4, 5, 6, 7, 8, 9, 10])
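Conditions can be combined with & (and) and | (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. np.where is a related tool that replaces values based on a condition instead of filtering them out. A short sketch:

```python
import numpy as np

arr = np.arange(1, 11)

# Elements greater than 2 AND less than 8.
between = arr[(arr > 2) & (arr < 8)]

# Elements at most 2 OR at least 8.
outside = arr[(arr <= 2) | (arr >= 8)]

# Replace every element greater than 4 with 4, keep the rest as they are.
capped = np.where(arr > 4, 4, arr)

print(between, outside, capped)
```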
NumPy Operations:
You can easily perform arithmetic between two arrays, or between an array and a scalar.
arr = np.arange(0,10)
arr + arr
OUT: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
arr * arr
OUT: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
arr - arr
OUT: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# 0/0 gives a RuntimeWarning instead of an error, and the result is nan
arr/arr
OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: invalid value encountered in true_divide
if __name__ == '__main__':
array([ nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])
# Dividing a nonzero number by zero also gives a warning, and the result is inf
1/arr
OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: divide by zero encountered in true_divide
if __name__ == '__main__':
array([ inf, 1. , 0.5 , 0.33333333, 0.25 ,
0.2 , 0.16666667, 0.14285714, 0.125 , 0.11111111])
arr**3
OUT: array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
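If the division warnings are expected, they can be silenced locally with np.errstate, or the division can be skipped for zero entries by using np.divide with its where and out arguments. The following is a sketch of both options; filling the skipped positions with 0 is an arbitrary choice made for this example.

```python
import numpy as np

arr = np.arange(0, 10)

# Option 1: silence the warnings only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr      # first element is nan (0/0)
    inverse = 1 / arr      # first element is inf (1/0)

# Option 2: divide only where arr is nonzero; other positions keep the value from `out`.
safe_inverse = np.divide(
    1.0,
    arr,
    out=np.zeros(arr.shape, dtype=float),
    where=arr != 0,
)

print(ratio[0], inverse[0], safe_inverse[0])
```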
Universal Array Functions:
Taking square roots.
np.sqrt(arr)
OUT: array([ 0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])
Calculating the exponential (e^x).
np.exp(arr)
OUT: array([ 1.00000000e+00, 2.71828183e+00, 7.38905610e+00,
2.00855369e+01, 5.45981500e+01, 1.48413159e+02,
4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
8.10308393e+03])
# Same as arr.max()
np.max(arr)
OUT: 9
np.sin(arr)
OUT: array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ,
-0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])
np.log(arr)
OUT: /Users/user/ipykernel/__main__.py:1: RuntimeWarning: divide by zero encountered in log
if __name__ == '__main__':
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436,
1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])
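Universal functions apply element by element, keep the shape of their input, and also come in two-argument forms such as np.maximum. The sketch below checks np.sqrt against a loop over Python's math.sqrt to show that the results match; the loop is there for comparison only.

```python
import math

import numpy as np

arr = np.arange(0, 10)

# One call on the whole array versus one math.sqrt call per element.
roots = np.sqrt(arr)
roots_loop = np.array([math.sqrt(x) for x in arr])

# The output has the same shape as the input, whatever the number of dimensions.
grid = arr[:6].reshape(2, 3)
exp_grid = np.exp(grid)

# Two-argument ufunc: element-wise maximum of the array and the scalar 5.
at_least_five = np.maximum(arr, 5)

print(np.allclose(roots, roots_loop), exp_grid.shape, at_least_five)
```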