Python for Data Science Cheat Sheet: Your Ultimate Guide.

Why should we use Python?


Python is a versatile and powerful programming language that has become increasingly popular in the field of data science. One of the main advantages of using Python for data science is its ease of use and readability, making it accessible to both novice and experienced programmers. Additionally, Python has a vast ecosystem of libraries and tools specifically designed for data science, including NumPy, Pandas, Matplotlib, and Scikit-learn, which greatly simplify data manipulation, analysis, and visualization.




Python's flexibility also allows for seamless integration with other technologies, such as SQL databases and Hadoop clusters. Moreover, Python is an open-source language, meaning it's free to use and has a large and supportive community of developers constantly contributing to its growth and improvement. Finally, Python's popularity in the data science community means that there are abundant resources available, including tutorials, forums, and online courses, making it easy to learn and master.

Over the last several years, Python has become a first-class tool for scientific and analytical tasks, including the analysis and visualization of large volumes of data.

Python's effectiveness for data science comes partly from the language itself and partly from its large and active ecosystem of third-party packages, which support data manipulation, common scientific computing tasks, high-quality visualization, interactive execution and sharing of code, machine learning, and many other use cases.

Python's future in the realm of data science looks promising. As companies increasingly rely on data to inform their decision-making processes, the demand for skilled data scientists and analysts is expected to grow. Python's versatility and ease of use make it a top choice for data science projects. Additionally, the development of new libraries and tools specifically designed for data science, such as TensorFlow and PyTorch, will continue to expand Python's capabilities. With the rise of artificial intelligence and machine learning, Python's role in these fields is expected to become even more significant. Finally, Python's open-source nature and active community of developers ensure that it will continue to evolve and improve, making it a reliable choice for data science projects both now and in the future.

This cheat sheet provides a quick tour of the essential features of the Python language for data scientists who want to use basic to intermediate Python techniques.



Number manipulation.



    1 + 1

OUT: 2



    1 * 3

OUT: 3



    1 / 2

OUT: 0.5



    2 ** 4

OUT: 16



    6 % 2

OUT: 0



    5 % 2

OUT: 1



    (1 + 3) * (5 + 5)

OUT: 40
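Alongside `/` and `%`, floor division is often useful when working with numbers. The snippet below is a small sketch; the variable names are made up for illustration.

```python
# Floor division drops the fractional part of the result
floor_result = 7 // 2

# With negative numbers the result is rounded down, not toward zero
negative_floor = -7 // 2

# divmod() returns the floor quotient and the remainder in one call
quotient, remainder = divmod(7, 2)

print(floor_result, negative_floor, quotient, remainder)
```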



Variable Assignment.



# Variable names cannot start with a digit or contain special characters other than the underscore
name_of_var = 2

x = 2
y = 3

z = x + y
z

OUT: 5


Data types - Strings.



'single quotes'

"double quotes"

" wrap lot's of other quotes"

Printing.



x = 'hello'

print(x)

OUT: hello



num = 12
name = 'Sam'

print('My number is: {one}, and my name is: {two}'.format(one=num,two=name))

OUT: My number is: 12, and my name is: Sam



print('My number is: {}, and my name is: {}'.format(num,name))

OUT: My number is: 12, and my name is: Sam



print(f'My number is: {num}, and my name is: {name}')

OUT: My number is: 12, and my name is: Sam


Data types - Lists.


Lists are used to store multiple items in a single variable.


[1,2,3]

['hi',1,[1,2]]

my_list = ['a','b','c']

my_list.append('d')

my_list

OUT: ['a', 'b', 'c', 'd']



#indexing
my_list[0]

OUT: 'a'



#slicing
my_list[1:]

OUT: ['b', 'c', 'd']



#changing
my_list[0] = 'NEW'
my_list

OUT: ['NEW', 'b', 'c', 'd']



#nesting
nest = [1,2,3,[4,5,['target']]]
nest[3][2][0]

OUT: 'target'


Data types - Dictionaries.


Dictionaries are used to store data values in key:value pairs.


d = {'key1':'item1','key2':'item2'}
d['key1']

OUT: 'item1'
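As a small sketch of the key:value idea, the example below adds, updates, and safely looks up entries. The dictionary `prices` and its keys are hypothetical and only serve as illustration.

```python
# Hypothetical example data, not part of the cheat sheet above
prices = {'apple': 1.5, 'pear': 2.0}

prices['kiwi'] = 0.5      # add a new key:value pair
prices['apple'] = 1.75    # overwrite the value of an existing key

# .get() returns a default instead of raising KeyError for a missing key
missing = prices.get('mango', 0.0)

print(prices)
print(missing)
```

Using `.get()` with a default is usually a safer choice than `prices['mango']` when a key may be absent.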


Data types - Booleans.


Booleans represent one of two values: True or False.


True

OUT: True



False

OUT: False


Data types - Tuples.


Tuples - collections which are ordered and unchangeable.


t = (1,2,3)
t[0]

OUT: 1
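To see what "unchangeable" means in practice, the sketch below tries to assign to a tuple element and catches the resulting error. The name `error_message` is introduced here only for illustration.

```python
t = (1, 2, 3)
error_message = None

try:
    t[0] = 'NEW'              # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Ordered: elements keep their positions, so they can be unpacked by position
first, second, third = t
print(first, second, third)
```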


Data types - Sets.


Sets - collections which are unordered and unindexed. Items in a set cannot be changed in place, but items can be added and removed.


{1,2,3}

OUT: {1, 2, 3}
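A set keeps only unique items, which the following sketch tries to show. The variable names are assumptions chosen for this example.

```python
s = {1, 2, 2, 3, 3, 3}    # duplicates collapse into single items
s.add(4)                  # add a new item
s.add(1)                  # adding an existing item changes nothing

unique_count = len(s)

# Set operations: intersection and union
common = {1, 2, 3} & {2, 3, 4}
combined = {1, 2, 3} | {2, 3, 4}

print(s, unique_count, common, combined)
```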


Comparison Operators.



1 > 2

OUT: False



1 >= 1

OUT: True



1 == 1

OUT: True



'hi' == 'bye'

OUT: False


Logic Operators.



(1 == 2) and (2 == 3)

OUT: False



(1 == 2) or (2 == 3) or (4 == 4)

OUT: True


if, elif, else Statements.



if 1 == 1:
    print('Yep!')

OUT: Yep!



if 1 == 2:
    print('first')
else:
    print('last')

OUT: last



if 1 == 2:
    print('first')
elif 3 == 3:
    print('middle')
else:
    print('Last')

OUT: middle


for Loops.



seq = [1,2,3,4,5]

for item in seq:
    print(item)

OUT: 1
2
3
4
5


while Loops.



i = 1
while i < 5:
    print('i is: {}'.format(i))
    i = i+1

OUT: i is: 1
i is: 2
i is: 3
i is: 4


range().



range(5)

list(range(5))

OUT: [0, 1, 2, 3, 4]
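`range()` also accepts a start, a stop, and a step. A brief sketch, with variable names chosen only for illustration:

```python
# range(start, stop, step): stop is exclusive
evens = list(range(0, 10, 2))

# A negative step counts down
countdown = list(range(5, 0, -1))

print(evens)
print(countdown)
```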


List comprehension.



x = [1,2,3,4]

[item**2 for item in x]

OUT: [1, 4, 9, 16]
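A list comprehension can also filter items with a trailing `if`. The sketch below reuses the same list; `even_squares` is a name made up for this example.

```python
x = [1, 2, 3, 4]

# Keep only the even items, then square them
even_squares = [item**2 for item in x if item % 2 == 0]

print(even_squares)
```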


Functions.



def my_func(param1='default'):
    """
    Docstring goes here.
    """
    print(param1)

my_func('new param')

OUT: new param
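Functions usually return a value rather than print it. The following is a minimal sketch; `square` is a hypothetical helper, not something defined earlier in this cheat sheet.

```python
def square(num=2):
    """Return num multiplied by itself (hypothetical helper)."""
    return num ** 2

default_result = square()      # falls back to the default argument
explicit_result = square(5)    # uses the supplied argument

print(default_result, explicit_result)
```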


Lambda expressions - a small anonymous function.



x = lambda a, b : a * b
print(x(5, 6))

OUT: 30
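Lambdas are commonly passed as short throwaway arguments, for example as a sort key. In the sketch below, the list `pairs` is invented sample data.

```python
# Hypothetical sample data
pairs = [('b', 2), ('a', 3), ('c', 1)]

# Sort by the second element of each tuple
by_number = sorted(pairs, key=lambda pair: pair[1])

print(by_number)
```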


Map and filter.



seq = [1,2,3,4,5]

list(map(lambda var: var*2,seq))

OUT: [2, 4, 6, 8, 10]



list(filter(lambda item: item%2 == 0,seq))

OUT: [2, 4]


Useful methods.



st = 'Hello my name is Sam'

st.lower()

OUT: 'hello my name is sam'



st.upper()

OUT: 'HELLO MY NAME IS SAM'



st.split()

OUT: ['Hello', 'my', 'name', 'is', 'Sam']



d.keys()

OUT: dict_keys(['key1', 'key2'])



d.items()

OUT: dict_items([('key1', 'item1'), ('key2', 'item2')])



lst = [1,2,3]

lst.pop()

OUT: 3



lst

OUT: [1, 2]



'x' in ['x','y','z']

OUT: True




