Logistic Regression in Machine Learning

Alok Choudhary
Apr 30, 2024


Logistic Regression with mathematical intuition:

Logistic regression is a powerful statistical method commonly employed when the dependent variable is binary. Its primary function is to predict the probability that a given input belongs to one of two classes.


Logistic regression is widely used in machine learning, particularly for binary classification problems where the target variable has exactly two classes. This article provides an overview of logistic regression, focusing on its key components, the sigmoid function and the cost function, and also explores how it relates to linear regression.

Sigmoid Function:

In logistic regression, we need a function that can map any real-valued number to the range [0, 1] to represent probabilities. The sigmoid function (also called the logistic function) serves this purpose:

σ(z) = 1 / (1 + e^(-z))

Where:

  • σ(z) is the output between 0 and 1 (the probability estimate)
  • z is the input to the function (a linear combination of weights and features)
  • e is the base of the natural logarithm
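
As a quick illustration (not part of the original article), here is a minimal NumPy sketch of the sigmoid; the function name `sigmoid` and the sample inputs are my own choices:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input to the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1.
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[0.0000454, 0.5, 0.9999546]
```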

Hypothesis Function:

The hypothesis function in logistic regression is defined as:

hθ(x) = σ(θᵀx) = 1 / (1 + e^(-θᵀx))

This function predicts the probability that the output is 1 given input x and parameters θ.
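
To make this concrete, here is a small sketch of my own (assuming a feature vector x and parameter vector θ of matching length, with made-up numbers) that computes hθ(x) = σ(θᵀx):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """Predicted probability that y = 1 given features x and parameters theta."""
    return sigmoid(np.dot(theta, x))

# Illustrative values: two features plus an intercept term.
theta = np.array([-1.0, 2.0, 0.5])   # [bias, weight_1, weight_2]
x = np.array([1.0, 0.8, 1.5])        # leading 1.0 is the intercept feature
print(hypothesis(theta, x))          # a probability between 0 and 1
```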

Linear Regression vs Logistic Regression

Linear Regression

  • Predicts continuous outcomes.
  • Output is a real number.
  • Models the relationship between the input features and the output using a linear function.

Logistic Regression

  • Addresses binary classification problems.
  • Output is a probability between 0 and 1.
  • The output of a linear combination of input features is transformed into a probability using the logistic (sigmoid) function (see the sketch after this list).
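
The contrast is easy to see in code. The sketch below is my own illustration using scikit-learn and a tiny made-up dataset: linear regression returns unbounded real values, while logistic regression returns class probabilities.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Tiny made-up dataset: one feature, binary labels.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

print(lin.predict([[0.0], [7.0]]))        # real numbers, can fall outside [0, 1]
print(log.predict_proba([[0.0], [7.0]]))  # probabilities for class 0 and class 1
```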

Cost Function

The cost function in logistic regression is often referred to as the log loss or cross-entropy loss function. It measures the discrepancy between the predicted probabilities and the actual classes in the training data. The formula for the cost function is:

J(θ) = -(1/m) Σᵢ₌₁ᵐ [ y(i) · log(hθ(x(i))) + (1 - y(i)) · log(1 - hθ(x(i))) ]

where:

  • 𝐽(πœƒ) is the cost function.
  • m is the number of training examples.
  • π‘₯(𝑖) is the feature vector of the i-th training example.
  • 𝑦(𝑖) is the actual class label (0 or 1) of the i-th training example.
  • β„Žπœƒ(π‘₯(𝑖)) is the predicted probability of the i-th training example belonging to the positive class.

The goal is to minimize this cost function to improve the model’s accuracy.
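
The sketch below is my own illustration, not the article's code: it computes this log loss for a batch of examples and performs gradient descent updates using the standard gradient (1/m) · Xᵀ(hθ(X) - y); the toy data and learning rate are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(theta, X, y):
    """Cross-entropy cost J(theta) averaged over the m training examples."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_step(theta, X, y, lr=0.1):
    """One gradient descent update; the gradient of J(theta) is (1/m) * X.T @ (h - y)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return theta - lr * (X.T @ (h - y)) / m

# Toy example: X already includes a column of ones for the intercept.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(2)
for _ in range(1000):
    theta = gradient_step(theta, X, y)
print(theta, log_loss(theta, X, y))  # the loss decreases as theta is updated
```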

Logistic Regression: Practical Implementation in Python

Click here to get the practical implementation.
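
Since the linked notebook is not reproduced here, the following is a minimal end-to-end sketch of my own using scikit-learn; the choice of the built-in breast cancer dataset and the hyperparameters are illustrative assumptions, not the article's original code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Binary classification dataset bundled with scikit-learn (malignant vs. benign).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features so the optimizer converges smoothly.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("Probabilities for the first test sample:", model.predict_proba(X_test[:1]))
```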

In conclusion, logistic regression is a powerful algorithm for binary classification tasks, using the sigmoid function to model probabilities and the log loss function to optimize parameters. It’s widely used in various domains due to its simplicity and effectiveness.

I hope this gives you a comprehensive understanding of logistic regression, from the mathematical intuition to the differences with linear regression.


Written by Alok Choudhary

I am Alok Choudhary, a Data Science scholar at IIT Patna, working in AI, ML, DS, and DL and crafting algorithms that redefine the tech landscape.