Simple Linear Regression


A statistical method for modeling the relationship between two variables, where one variable is used to predict the other.

Regression analysis: A statistical technique used to establish a relationship between a dependent variable and one or more independent variables.
Simple linear regression: A statistical technique that establishes a linear relationship between a dependent variable and a single independent variable.
Assumptions of simple linear regression: Linearity, independence of the errors, normality of the errors, homoscedasticity, and the absence of influential outliers are the five key assumptions that underpin simple linear regression.
Parameter estimation: The process of calculating the slope and intercept of a linear regression line is known as parameter estimation.
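To make parameter estimation concrete, here is a minimal sketch in Python (the data values are hypothetical) that computes the slope and intercept with the closed-form OLS formulas:

```python
import numpy as np

# Hypothetical example data: x is the independent variable, y the dependent one.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates for simple linear regression:
#   slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
#   intercept = y_bar - slope * x_bar
x_bar, y_bar = x.mean(), y.mean()
slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
intercept = y_bar - slope * x_bar
print(slope, intercept)
```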
Model evaluation: The model's accuracy in predicting the dependent variable is assessed using various evaluation techniques, such as the R-squared value, adjusted R-squared value, and significance of the regression coefficient.
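As an illustration, R-squared and adjusted R-squared can be computed by hand; this sketch uses the same hypothetical data as above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
slope, intercept = np.polyfit(x, y, 1)   # degree-1 least-squares fit
y_hat = intercept + slope * x

ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - ss_res / ss_tot
n, k = len(y), 1                         # k = number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print(r_squared, adj_r_squared)
```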
Hypothesis testing: A statistical test is used to assess if there is a significant relationship between the dependent variable and independent variable.
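The standard version of this test is a t-test of the null hypothesis that the slope is zero. scipy's linregress reports the two-sided p-value directly; the data here are hypothetical:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

result = stats.linregress(x, y)
# A small p-value is evidence of a significant linear relationship.
print(result.slope, result.pvalue)
```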
Residual analysis: The residual plot is examined to assess if the residuals are normally distributed and follow a random pattern, indicating that the simple linear regression model is a good fit for the data.
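A residual plot takes only a few lines; in a good fit the points scatter randomly around zero with no visible pattern (data hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

plt.scatter(x, residuals)                      # residuals vs. the predictor
plt.axhline(0, color="gray", linestyle="--")   # reference line at zero
plt.xlabel("x")
plt.ylabel("residual")
plt.show()
```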
Multicollinearity: A condition in which two or more independent variables are highly correlated, making their separate effects on the dependent variable difficult to distinguish.
Heteroscedasticity: A condition in which the variance of the residuals is not constant across the range of values of the independent variable.
Outliers: Extreme observations that deviate markedly from the pattern of the rest of the data and can have a significant impact on the fitted regression line.
Correlation: The strength and direction of the linear relationship between two variables. Correlation values range from -1 to 1, with -1 indicating a perfect negative linear relationship, 0 indicating no linear relationship, and 1 indicating a perfect positive linear relationship.
Coefficient of determination (R-squared): A statistical metric that measures the proportion of the variation in the dependent variable that is explained by the independent variable.
Standard error: A measure of the precision of the regression coefficient estimate.
Confidence intervals: The range of values within which the true population regression coefficient is likely to lie at a chosen confidence level (for example, 95%).
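A 95% confidence interval for the slope can be built from the estimate, its standard error, and a t critical value with n - 2 degrees of freedom (a sketch with hypothetical data):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
res = stats.linregress(x, y)

t_crit = stats.t.ppf(0.975, df=len(x) - 2)     # two-sided 95% critical value
lower = res.slope - t_crit * res.stderr
upper = res.slope + t_crit * res.stderr
print(lower, upper)
```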
Predictive modeling: The process of developing a mathematical model that estimates the value of the dependent variable based on the values of the independent variables.
Time series analysis: A statistical technique used to analyze time series data to find patterns, trends, and seasonal variations.
Panel data analysis: A statistical technique used to analyze data sets with cross-sectional and time-series components.
Instrumental variables: A statistical technique used to estimate the causal effect of an independent variable on the dependent variable in the presence of endogeneity.
Endogeneity: A condition in which the independent variable is correlated with the error term, which violates the assumption of independence in regression analysis.
OLS assumptions: The Ordinary Least Squares (OLS) assumptions include linearity, independence of errors, homoscedasticity, normality of errors, and absence of multicollinearity.
Ordinary Least Squares Regression (OLS): The most commonly used type of regression in econometrics. It chooses the coefficients that minimize the sum of the squared errors between the predicted and actual values.
Weighted Least Squares Regression (WLS): When the error variance is not constant across observations, WLS is used. It gives more weight to the observations with lower error variance.
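For illustration, here is a minimal WLS sketch with simulated heteroscedastic data, assuming the error variance grows with x squared so the weights are 1/x squared:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=x)   # noise scale grows with x
X = sm.add_constant(x)

# Weights are the inverse of the assumed error variance per observation.
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.params)
```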
Generalized Least Squares Regression (GLS): When there is heteroscedasticity or autocorrelation in the errors, GLS is used. It can correct for both of these problems at the same time.
Two-Stage Least Squares Regression (2SLS): When there is endogeneity in the independent variables, 2SLS is used. It uses an instrumental variable to estimate the independent variable and then uses that estimate in the regression.
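The two stages can be written out by hand to show the mechanics. This is a simulated sketch (all variable names hypothetical); note that the standard errors from a manual second stage are not valid, which is why dedicated IV routines are used in practice:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)                   # instrument
u = rng.normal(size=n)                   # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)     # endogenous regressor (correlated with u)
y = 1.0 + 2.0 * x + u                    # structural equation, true slope = 2

# Stage 1: regress x on the instrument and keep the fitted values.
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
# Stage 2: regress y on the fitted values from stage 1.
second = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(second.params)                     # slope close to 2; naive OLS of y on x is biased
```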
Three-Stage Least Squares Regression (3SLS): When there are several endogenous variables in a system of equations, 3SLS is used. It combines instrumental-variable estimation with joint estimation of all the equations in the system.
Seemingly Unrelated Regressions (SUR): When there are several regression equations whose error terms are correlated across equations, SUR is used. It estimates all the equations jointly, which is more efficient than estimating each one separately.
Panel Data Regression: When data are collected over time for the same group of individuals, panel data regression is used. It exploits both the cross-sectional and time-series variation in the data, typically via fixed-effects or random-effects estimators.
Quantile Regression: It is used to estimate the conditional quantiles of the dependent variable. It can be used to study the distribution of the dependent variable across different percentiles.
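As a sketch, statsmodels provides QuantReg; with q=0.5 this is median regression (the data here are simulated and hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=200)   # heavy-tailed noise
X = sm.add_constant(x)

median_fit = sm.QuantReg(y, X).fit(q=0.5)   # conditional median
upper_fit = sm.QuantReg(y, X).fit(q=0.9)    # conditional 90th percentile
print(median_fit.params, upper_fit.params)
```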
Tobit Regression: When the dependent variable is censored, i.e., it is only observed within a particular range, Tobit regression is used. It models both the censored and uncensored observations.
Logit and Probit Regression: When the dependent variable is binary, i.e., it can take only two values, logit and probit regression are used. Each estimates the probability of the dependent variable taking one of the two values.
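Both models are available in statsmodels; a minimal sketch with simulated binary data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=200)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))   # true success probabilities
y = rng.binomial(1, p)                        # binary dependent variable

X = sm.add_constant(x)
logit_fit = sm.Logit(y, X).fit(disp=0)
probit_fit = sm.Probit(y, X).fit(disp=0)
# Coefficients differ in scale between the two models, but the fitted
# probabilities are typically very close.
print(logit_fit.params, probit_fit.params)
```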
"In statistics, simple linear regression is a linear regression model with a single explanatory variable."
"...as accurately as possible, predicts the dependent variable values as a function of the independent variable."
"The adjective simple refers to the fact that the outcome variable is related to a single predictor."
"The accuracy of each predicted value is measured by its squared residual... and the goal is to make the sum of these squared deviations as small as possible."
"Other regression methods that can be used in place of ordinary least squares include least absolute deviations... and the Theil–Sen estimator."
"Deming regression... is not really an instance of simple linear regression because it does not separate the coordinates into one dependent and one independent variable and could potentially return a vertical line as its fit."
"In this case, the slope of the fitted line is equal to the correlation between y and x corrected by the ratio of standard deviations of these variables."
"The intercept of the fitted line is such that the line passes through the center of mass (x, y) of the data points."