What is Logistic Regression?
Logistic regression predicts the probability of an outcome that can take only two values (i.e. a dichotomy).
The prediction is based on one or more predictors (numerical or categorical).
A linear regression is not appropriate for predicting the value of a binary variable, for two reasons:
- A linear regression will predict values outside the acceptable range (e.g. probabilities below 0 or above 1).
- Since each dichotomous trial can take only one of two possible values, the residuals will not be normally distributed about the predicted line.
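The first problem is easy to demonstrate numerically. The sketch below fits an ordinary least-squares line to a hypothetical binary outcome (the data values are invented for illustration) and shows that the fitted line predicts "probabilities" below 0 and above 1 at the extremes of the predictor:

```python
# Hypothetical data: a numeric predictor x and a binary outcome y.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   0,   1,   1,   1,   1]

# Ordinary least-squares fit: slope = cov(x, y) / var(x).
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# The line escapes the [0, 1] range at both ends of the data.
print(b0 + b1 * 0.5)  # below 0
print(b0 + b1 * 4.0)  # above 1
```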
A logistic regression produces a logistic curve, which is limited to values between 0 and 1.
Logistic regression is similar to a linear regression, but the curve is constructed using the natural logarithm of the “odds” of the target variable, rather than the probability itself.
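In other words, the model is linear in the log-odds, logit(p) = ln(p / (1 − p)) = b0 + b1·x, and the logistic (sigmoid) function inverts that transform back to a probability. A minimal sketch of the two functions:

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p): maps (0, 1) to the real line."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: maps log-odds back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Round-tripping a probability through the log-odds recovers it exactly.
p = 0.8
z = logit(p)                  # log-odds of 0.8 = ln(4)
print(round(sigmoid(z), 6))   # -> 0.8
```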
Moreover, the predictors do not have to be normally distributed or have equal variance in each group.
In logistic regression, the constant (b0) moves the curve left and right, and the slope (b1) defines the steepness of the curve.
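These two roles can be seen directly from the model p = 1 / (1 + e^−(b0 + b1·x)): the curve crosses p = 0.5 where b0 + b1·x = 0, i.e. at x = −b0/b1, so changing b0 shifts that midpoint, while a larger b1 makes the transition sharper. A small sketch:

```python
import math

def p_hat(x, b0, b1):
    """Predicted probability from the logistic model: 1 / (1 + e^-(b0 + b1*x))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# With b0 = 0, the midpoint (p = 0.5) sits at x = 0.
print(p_hat(0.0, 0.0, 1.0))   # exactly 0.5

# Increasing b0 shifts the curve: the same x now gives a higher probability.
print(p_hat(0.0, 2.0, 1.0))   # above 0.5

# A larger slope b1 steepens the curve: probabilities change faster around x = 0.
print(p_hat(1.0, 0.0, 5.0) - p_hat(-1.0, 0.0, 5.0))   # large jump over a short interval
```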
An advantage of logistic regression is that the algorithm is highly flexible: it accepts many kinds of input and supports several different analytical tasks:
- Use demographics to make predictions about outcomes, such as risk for a certain disease.
- Explore and weight the factors that contribute to a result. For example, find the factors that influence customers to make a repeat visit to a store.
- Classify documents, e-mail, or other objects that have many attributes.
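To tie the pieces together, here is a self-contained sketch that fits b0 and b1 by gradient descent on the negative log-likelihood, using the same hypothetical binary data as above (invented for illustration; in practice a library routine would do the fitting):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical data: a numeric predictor x and a binary outcome y.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   0,   1,   1,   1,   1]

# Gradient descent on the negative log-likelihood of the logistic model.
b0, b1 = 0.0, 0.0
lr = 0.1
for _ in range(5000):
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(b0 + b1 * x) - y   # gradient of the loss w.r.t. the linear score
        g0 += err
        g1 += err * x
    b0 -= lr * g0
    b1 -= lr * g1

# Predictions stay inside (0, 1) and rise with x, unlike a linear fit.
print(round(sigmoid(b0 + b1 * 1.0), 3))   # low probability
print(round(sigmoid(b0 + b1 * 3.0), 3))   # high probability
```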