Linear Regression Machine Learning

Alok Choudhary
4 min readMar 30, 2024

--

Linear regression is a fundamental algorithm in the field of machine learning and statistics. It’s a predictive modeling technique used to predict a continuous outcome variable (Y) based on one or more predictor variables (X).

Linear Regression is a algorithm of Machine Learning and it is a part of Supervised Machine Learning and it use for Regression Problem.

What is Linear Regression ?

Linear regression assumes a linear or straight line relationship between the input variables (X) and the single output variable (Y). In this algorithm aur aim is to find best fit line with minimal error. When there is a single input variable, the method is referred to as simple linear regression. For more than one input variable, it’s known as multiple linear regression.

Linear Regression

The Mathematics Behind Linear Regression

Basic Maths

Y = mx + c

Where:

  • Y is the dependent variable (output/outcome/prediction/estimation)
  • m = slope or coefficient
  • c = Intercept or Error or Residual Error

Cost Function can be represented as:

Cost Function

Where:

  • J(θ) is the cost function.
  • m is the number of instances in the dataset.
  • hθ​(x(i)) is the hypothesis function, representing the predicted value.
  • y(i) is the actual value.
  • The summation symbol ∑ indicates that we sum over all instances in the dataset.

How Does Linear Regression Work?

Linear regression uses the method of least squares to find the best fit line. The least squares method calculates the best-fitting line by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because these deviations are squared, it places a greater weight on outliers.

Real World Applications of Linear Regression

Train → Model → Hypothesis → O/P

Linear regression is used in various fields and industries. Some of the applications include:

  • Economics: Linear regression can be used to observe economic trends over time, predict future trends, etc.
  • Finance: It’s used in predicting future stock prices, financial forecasting, risk assessment, etc.
  • Healthcare: Linear regression can be used to predict disease trends, health outcomes, etc.
  • Machine Learning: It’s used in predictive modeling, forecasting, trend analysis, etc.

Outline

  1. Start with some minimum θ ​.
  2. Keep Changing θ to reduce J(θ0, θ1) until we reach global minima.
  3. Convergence Theorems
Convergence Theorems

Linear Regression Practical Implementation in Python

# Import Library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
df = fetch_openml(name='boston')
df
dataset=pd.DataFrame(df.data)
dataset
# Independent features and dependent features
X=dataset
y=df.target
# train test split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.30, random_state=42)
X_train
# standardizing the dataset
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Transformation
X_train=scaler.fit_transform(X_train)
X_test=scaler.transform(X_test)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score #cross validation
regression=LinearRegression()
regression.fit(X_train,y_train)
mse=cross_val_score(regression,X_train,y_train,scoring='neg_mean_squared_error',cv=10)
mse
np.mean(mse)
# prediction
reg_pred=regression.predict(X_test)
reg_pred
# Plotting
import seaborn as sns
sns.displot(reg_pred-y_test,kind='kde')
Plotting Output
# r2_score
from sklearn.metrics import r2_score
score=r2_score(reg_pred,y_test)
score
print("Thank-You!Happy Learning!")

Advantages:

  1. Simplicity
  2. Efficiency
  3. Performance
  4. Reducing Overfitting

Disadvantages:

  1. Prone to Underfitting
  2. Sensitive to outliers
  3. Limited to Single Output and numerical data
  4. Overfitting in high-dimensional space

Conclusion

Linear regression stands as a fundamental and interpretable algorithm, offering insights into the relationship between variables. While it excels in capturing linear trends efficiently and is easy to implement, its limitations include the inability to handle complex, nonlinear relationships and susceptibility to outliers. Nonetheless, when applied judiciously with an understanding of its assumptions and constraints, linear regression remains a valuable tool for predictive modeling and inference in various domains.

Join me in exploring these pillars of technological evolution. Let’s unravel the mysteries, debunk the myths, and harness the power of data to shape our future. Follow my journey, engage with the visuals, and let’s decode the future, one pixel at a time.

--

--

Alok Choudhary
Alok Choudhary

Written by Alok Choudhary

My Self Alok Choudhary, a Data Science scholar at IIT Patna, is pioneering in AI, ML, DS, and DL, crafting algorithms that redefine the tech landscape.

No responses yet