Predict with Python Power: Your Regression Journey Starts Here.

Python for Regression.


The Python programming language, combined with its rich ecosystem of libraries, is a powerful tool for solving regression analysis tasks.

Regression analysis is a predictive modelling technique that models the relationship between the target (dependent) variable and the features (independent variables) in a dataset.


Python Knowledge Base: Make coding great again.
- Updated: 2024-07-26 by Andrey BRATUS, Senior Data Analyst.




    The different regression methods are used when the target and the independent features are related linearly or non-linearly and the target variable contains continuous values. Regression is mainly used to determine predictor strength, to forecast trends and time series, and occasionally to investigate cause-and-effect relationships. It is the basic technique for solving regression problems in machine learning (ML) with data models, and it boils down to finding the best-fit line: the line (or curve) that comes as close as possible to the data points overall, i.e. whose total distance to the points is minimized.
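
    To make the idea concrete, here is a minimal sketch of fitting a best-fit line with plain NumPy (the data values below are made up purely for illustration):


    
    import numpy as np
    
    # Toy data: a roughly linear relationship (illustrative values only)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    
    # np.polyfit returns the slope and intercept that minimize
    # the sum of squared vertical distances to the line
    slope, intercept = np.polyfit(x, y, deg=1)
    print(slope, intercept)  # ~1.96 and ~0.14 for this toy data
    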


  1. Data Preprocessing Template for Jupyter Notebook or Google Colab.


  2. Importing the libraries


    
    import numpy as np                 # numerical arrays
    import matplotlib.pyplot as plt   # plotting
    import pandas as pd               # data loading and manipulation
    


    Importing the dataset


    
    dataset = pd.read_csv('myData.csv')
    X = dataset.iloc[:, :-1].values   # all columns except the last one are features
    y = dataset.iloc[:, -1].values    # the last column is the target
    

    Splitting the dataset into the Training set and Test set


    
    from sklearn.model_selection import train_test_split
    # 80/20 split; random_state fixes the shuffle for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
    

    Taking care of missing data


    
    from sklearn.impute import SimpleImputer
    # replace NaNs with the column mean ('median' or 'most_frequent' also work);
    # columns 1:3 are just an example - adjust to your numerical columns
    imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
    imputer.fit(X[:, 1:3])
    X[:, 1:3] = imputer.transform(X[:, 1:3])
    

    Encoding categorical data


    Encoding the Independent Variable


    
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    # one-hot encode column 0, keep all remaining columns unchanged
    ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
    X = np.array(ct.fit_transform(X))
    

    Encoding the Dependent Variable


    
    from sklearn.preprocessing import LabelEncoder
    # only relevant when the target is categorical; a continuous regression target needs no encoding
    le = LabelEncoder()
    y = le.fit_transform(y)
    


    Feature Scaling


    
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    # fit the scaler on the training set only, then reuse it on the test set to avoid leakage
    X_train[:, 3:] = sc.fit_transform(X_train[:, 3:])
    X_test[:, 3:] = sc.transform(X_test[:, 3:])
    

    Training the Simple Linear Regression model on the Training set for Jupyter Notebook or Google Colab


    
    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
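
    After fitting, the learned coefficients and intercept are exposed on the estimator (scikit-learn's standard coef_ and intercept_ attributes), which gives a quick view of feature relevance:


    
    print(regressor.coef_)       # one slope per feature
    print(regressor.intercept_)  # the intercept term
    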
    

    Predicting the Test set results


    
    y_pred = regressor.predict(X_test)
    

    Visualising the Training set results


    
    # note: these scatter/line plots assume a single feature
    plt.scatter(X_train, y_train, color = 'red')
    plt.plot(X_train, regressor.predict(X_train), color = 'blue')
    plt.title('Target vs Feature (Training set)')
    plt.xlabel('Feature')
    plt.ylabel('Target')
    plt.show()
    

    Visualising the Test set results


    
    plt.scatter(X_test, y_test, color = 'red')
    plt.plot(X_train, regressor.predict(X_train), color = 'blue')
    plt.title('Target vs Feature (Test set)')
    plt.xlabel('Feature')
    plt.ylabel('Target')
    plt.show()
    

    Concatenating predictions and actual values


    
    np.set_printoptions(precision=2)
    print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))
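
    To put a single number on the fit, one common option (not part of the original template, but standard in scikit-learn) is the R² score:


    
    from sklearn.metrics import r2_score
    # R² compares the model's squared error against always predicting the mean;
    # 1.0 is a perfect fit, 0.0 is no better than the mean
    print(r2_score(y_test, y_pred))
    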
    

    Training the Polynomial Regression model on the whole dataset


    
    from sklearn.linear_model import LinearRegression
    lin_reg = LinearRegression()                 # plain linear fit, kept for comparison
    lin_reg.fit(X, y)
    from sklearn.preprocessing import PolynomialFeatures
    poly_reg = PolynomialFeatures(degree = 4)    # degree controls the bias/variance tradeoff
    X_poly = poly_reg.fit_transform(X)
    lin_reg_2 = LinearRegression()
    lin_reg_2.fit(X_poly, y)
    

    Predicting a new single result with Linear Regression - single feature


    
    lin_reg.predict([[put_here_any_value]])
    

    Predicting a new single result with Polynomial Regression - single feature


    
    lin_reg_2.predict(poly_reg.fit_transform([[put_here_any_value]]))
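
    The template stops at single-value predictions; a visualisation analogous to the simple linear one could look like the sketch below (assuming a single feature whose values are sorted in ascending order, so the curve plots cleanly):


    
    plt.scatter(X, y, color = 'red')
    plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color = 'blue')
    plt.title('Target vs Feature (Polynomial Regression)')
    plt.xlabel('Feature')
    plt.ylabel('Target')
    plt.show()
    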
    

  3. Support Vector Regression (SVR) in Python.


  4. 
    #Importing the dataset and reshaping target
    dataset = pd.read_csv('my_dataset.csv')
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, -1].values
    y = y.reshape(len(y), 1)            # StandardScaler expects a 2D array
    #Feature Scaling (mandatory for SVR)
    from sklearn.preprocessing import StandardScaler
    sc_X = StandardScaler()
    sc_y = StandardScaler()
    X = sc_X.fit_transform(X)
    y = sc_y.fit_transform(y)
    #Training the SVR model on the whole dataset (without split)
    from sklearn.svm import SVR
    regressor = SVR(kernel = 'rbf')
    regressor.fit(X, y.ravel())         # SVR expects a 1D target
    #Predicting a new result (SVR) - single feature; predictions must be
    #reshaped to 2D before inverse-scaling
    sc_y.inverse_transform(regressor.predict(sc_X.transform([[put_here_any_value]])).reshape(-1, 1))
    #Visualising the SVR results
    plt.scatter(sc_X.inverse_transform(X), sc_y.inverse_transform(y), color = 'red')
    plt.plot(sc_X.inverse_transform(X), sc_y.inverse_transform(regressor.predict(X).reshape(-1, 1)), color = 'blue')
    plt.title('Target vs Feature (SVR)')
    plt.xlabel('Feature')
    plt.ylabel('Target')
    plt.show()
    

  5. Decision Tree Regression in Python.


  6. 
    #Importing the dataset
    dataset = pd.read_csv('my_dataset.csv')
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, -1].values
    #Training the Decision Tree Regression model on the whole dataset (without split)
    from sklearn.tree import DecisionTreeRegressor
    regressor = DecisionTreeRegressor(random_state = 0)
    regressor.fit(X, y)
    #Predicting a new result (Decision Tree) - single feature
    regressor.predict([[put_here_any_value]])
    #Visualising the Decision Tree Regression results in higher resolution
    #(a dense grid makes the model's step-wise predictions visible)
    X_grid = np.arange(X.min(), X.max(), 0.01)
    X_grid = X_grid.reshape((len(X_grid), 1))
    plt.scatter(X, y, color = 'red')
    plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
    plt.title('Target vs Feature (Decision Tree Regression)')
    plt.xlabel('Feature')
    plt.ylabel('Target')
    plt.show()
    

  7. Random Forest Regression model in Python.


  8. 
    #The only difference from the Decision Tree Regression code above is the training step:
    #a Random Forest Regression model, again fitted on the whole dataset (without split) in this example
    from sklearn.ensemble import RandomForestRegressor
    regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)   # 10 trees in the ensemble
    regressor.fit(X, y)
    

  9. Regression models tips and features.


  10. Model: Linear Regression.
    Pros: Works on any size of dataset, gives information about the relevance of features.
    Cons: Relies on the linear regression assumptions (linearity, independence, homoscedasticity, normality of residuals).

    Model: Polynomial Regression.
    Pros: Works on any size of dataset, works very well on non-linear problems.
    Cons: You need to choose the right polynomial degree for a good bias/variance tradeoff.

    Model: SVR.
    Pros: Easily adaptable, works very well on non-linear problems, not biased by outliers.
    Cons: Feature scaling is compulsory, and the model is difficult to interpret.

    Model: Decision Tree Regression.
    Pros: Interpretability, no need for feature scaling, works on both linear and non-linear problems.
    Cons: Poor results on very small datasets, overfitting can easily occur.

    Model: Random Forest Regression.
    Pros: Powerful and accurate, good performance on many problems, including non-linear ones.
    Cons: No interpretability, overfitting can easily occur, and the number of trees must be chosen.
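
    Since all of these models share the same fit/predict interface, a quick side-by-side check on your own preprocessed X_train/X_test split is straightforward; the sketch below scores three of them with R² (model choices and settings are illustrative):


    
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score
    
    models = {
        'Linear Regression': LinearRegression(),
        'Decision Tree': DecisionTreeRegressor(random_state=0),
        'Random Forest': RandomForestRegressor(n_estimators=10, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, r2_score(y_test, model.predict(X_test)))
    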



