Introduction
In statistical modeling, regression analysis is employed to understand the nature and size of the relationship among selected variables. It can also be understood as an instrument that measures the relationship between the mean of one (dependent) variable and the corresponding values of one or more other (independent) variables (Sen & Srivastava, 2013). There are various types of regression analysis, such as simple linear, multiple, and robust regression, each employed for specific objectives and under particular conditions (Boslaugh, 2012). The nature of the data and of the research also strongly influences which type of regression analysis should be employed to estimate the nature and size of the relationship among the variables. Each type of regression analysis has its own set of assumptions, which must be observed. Because regression allows us to detect any significant relationship among variables (as well as its type and nature), regression analysis has enormous corporate application (Ozyasar, 2018).
Classical Regression
Like any other regression model, the Classical Linear Regression Model (CLRM) is a statistical instrument employed to project the future behavior or values of the dependent variable when changes in the independent (explanatory) variable(s) occur. The model assumes that the explanatory variables are non-random, or non-stochastic (Applied Econometrics, 2015). Other conditions and assumptions are also associated with the CLRM; under these assumptions, Ordinary Least Squares (OLS) is used to estimate its parameters. These assumptions are:
- The model must be linear in the parameters (though not necessarily linear in the variables).
- In repeated sampling, the values of the variable X (the X-values) are fixed. This implies that the explanatory (independent) variable must not be random or stochastic.
- The conditional mean of the error term (e_i), given the value of X, must be equal to zero.
- The variance of e_i is the same for all observations of the model. This property is known as homoscedasticity (equal variance of e_i).
- There should be no autocorrelation between the error terms associated with any two X values.
- The covariance between the error term and the independent variable must be equal to zero.
- The number of observations must be greater than the number of parameters to be estimated (n > p).
These assumptions are vital for the valid application of the CLRM; homoscedasticity, in particular, is the assumption of interest in this exercise.
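As a concrete illustration of the estimation step, the sketch below fits a simple OLS model by solving the least-squares problem directly. The data are a small hypothetical sample chosen for illustration, not the study's actual series.

```python
import numpy as np

# Hypothetical sample (not the study's data): y depends exactly linearly on x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x

# Design matrix with an intercept column, as in y = b0 + b1*x + e.
X = np.column_stack([np.ones_like(x), x])

# OLS estimator: beta_hat = (X'X)^(-1) X'y, computed via least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat  # ~0 for this noiseless sample
print(beta_hat)  # recovers [3., 2.]
```

Because the sample is noiseless, the estimator recovers the intercept and slope exactly; with real data the residuals would then be inspected against the assumptions listed above.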
Robust Regression
Robust regression is designed to address and overcome the limitations of conventional parametric and non-parametric models. It is imperative to understand that whenever any assumption of the CLRM is violated, the model may produce biased or flawed results. Robust regression is mostly applied when there is a strong suspicion of heteroscedasticity. Because heteroscedasticity allows the variance of the error term to depend on the independent variable X, heteroscedasticity-consistent models are often more appropriate for real-world scenarios. Robust methods can also address the issue of corrupted observations, such as outliers (Bhatia, Jain, & Kar, 2015).
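Stata's `regress ..., robust` option, which is used later in this exercise, keeps the OLS coefficients but replaces the classical standard errors with heteroscedasticity-consistent (White/HC1) ones. The sketch below shows that computation on hypothetical data whose error spread grows with x; the data-generating process is an assumption for illustration only.

```python
import numpy as np

# Hypothetical heteroscedastic sample: error spread grows with x.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 40)
y = 5.0 - 0.5 * x + rng.normal(scale=0.3 * x)  # error variance depends on x

X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

# OLS coefficients (these are unchanged by the robust option).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ beta_hat  # residuals

XtX_inv = np.linalg.inv(X.T @ X)

# Classical standard errors: assume one common error variance sigma^2.
sigma2 = u @ u / (n - k)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC1 (White) robust standard errors: sandwich estimator with the
# small-sample correction n/(n-k), matching Stata's default "robust".
meat = X.T @ (X * (u ** 2)[:, None])
cov_hc1 = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
se_hc1 = np.sqrt(np.diag(cov_hc1))

print(se_classical, se_hc1)  # the two sets of SEs differ under heteroscedasticity
```

Under homoscedasticity the two estimates would be close; under heteroscedasticity only the sandwich (HC1) standard errors remain valid for inference.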
Data
For this academic exercise, which aims to apply classical and robust regression models and examine how their outcomes differ, we will use data for two European Union countries (Germany and France). To carry out the statistical analyses, we have opted for Stata, statistical analysis software known for its simplicity and precision. The total number of observations for each variable is 20. The retrieved data start in 1998 and stretch to 2017 (World Bank, 2018).
Tests to be performed
- OLS
- Robust Regression
Results/Analysis
Summary of Statistics
Variable |  Obs      Mean   Std. Dev.        Min       Max
---------+------------------------------------------------
     INF |   20  1.349438    .7572429    .136763  2.619995
     UNE |   20     9.275    1.227306       7.06     11.88
     ING |   20  1.131517    .6358419  -.4497879  2.013046
     UNG |   20    7.4595    2.316607        3.4     11.17
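A summary table of this form can be reproduced for any series with a few lines of code. The sketch below computes the same five columns for a small hypothetical series; the variable name and values are illustrative, not the retrieved World Bank data.

```python
import numpy as np

def summarize(name, values):
    """Return (name, obs, mean, std. dev., min, max), as in Stata's summarize."""
    v = np.asarray(values, dtype=float)
    # Stata reports the sample standard deviation (ddof=1).
    return (name, int(v.size), float(v.mean()),
            float(v.std(ddof=1)), float(v.min()), float(v.max()))

# Hypothetical series for illustration only.
row = summarize("demo", [1.0, 2.0, 3.0, 4.0])
print(row)  # ('demo', 4, 2.5, 1.2909944487358056, 1.0, 4.0)
```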
France
OLS
     Source |       SS       df       MS           Number of obs =      20
------------+------------------------------        F(1, 18)      =    7.19
      Model |  .091862426     1  .091862426        Prob > F      =  0.0153
   Residual |  .230104892    18  .012783605        R-squared     =  0.2853
------------+------------------------------        Adj R-squared =  0.2456
      Total |  .321967318    19  .016945648        Root MSE      =  .11306
---------------------------------------------------------------------------
     logUNE |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
------------+--------------------------------------------------------------
     logINF |  -.0799733   .0298334    -2.68   0.015   -.142651   -.0172957
      _cons |   2.222781   .0253172    87.80   0.000   2.169592    2.275971
---------------------------------------------------------------------------
Robust
Linear regression                                  Number of obs =      20
                                                   F(1, 18)      =   12.85
                                                   Prob > F      =  0.0021
                                                   R-squared     =  0.4592
                                                   Root MSE      =  .92726
---------------------------------------------------------------------------
            |              Robust
        UNE |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
------------+--------------------------------------------------------------
        INF |  -1.098319   .3063915    -3.58   0.002  -1.742024   -.4546142
      _cons |   10.75711   .5538499    19.42   0.000   9.593518    11.92071
---------------------------------------------------------------------------
Germany
OLS
     Source |       SS       df       MS           Number of obs =      19
------------+------------------------------        F(1, 17)      =    8.41
      Model |  .732759321     1  .732759321        Prob > F      =  0.0099
   Residual |  1.48033768    17  .087078687        R-squared     =  0.3311
------------+------------------------------        Adj R-squared =  0.2918
      Total |    2.213097    18  .122949833        Root MSE      =  .29509
---------------------------------------------------------------------------
     logUNG |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
------------+--------------------------------------------------------------
     logING |  -.3567767   .1229905    -2.90   0.010  -.6162641   -.0972893
      _cons |   1.976637   .0682407    28.97   0.000   1.832662    2.120612
---------------------------------------------------------------------------
Robust
Linear regression                                  Number of obs =      20
                                                   F(1, 18)      =    7.29
                                                   Prob > F      =  0.0147
                                                   R-squared     =  0.2879
                                                   Root MSE      =  2.0085
---------------------------------------------------------------------------
            |              Robust
        UNG |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
------------+--------------------------------------------------------------
        ING |  -1.954766   .7241738    -2.70   0.015  -3.476199   -.4333332
      _cons |   9.671352   .9647997    10.02   0.000   7.644383    11.69832
---------------------------------------------------------------------------
Comparison
The Classical Linear Regression Model is employed when the data are homoscedastic. Certain assumptions (around seven) must be observed; if they are violated, the model may produce biased results. Robust regression is employed when the data exhibit heteroscedasticity. From the series of statistical analyses, it is apparent that robust regression produces somewhat different results from the CLRM. For instance, the CLRM specifications, for which we generated log values to reduce any heteroscedasticity, had different t-values and coefficient (parameter) estimates than the robust regression (RR) specifications. It should be noted that the OLS models were estimated on logged variables while the robust models used the original levels, so the coefficient magnitudes are not directly comparable across the two. In the case of Germany, the t-value did not change much; however, the standard error and coefficient values changed noticeably. This implies that RR and the CLRM produce different results. From this methodical scrutiny of the two models, we also learn that the decision to employ a particular model depends upon the attributes of the data (homoscedasticity versus heteroscedasticity). When heteroscedasticity is present, robust regression tends to produce more convincing results than the classical linear regression model.
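To make the comparison tangible, the sketch below contrasts plain OLS with a simple robust M-estimator (Huber weights, fitted by iteratively reweighted least squares) on hypothetical data containing one corrupted observation. This illustrates the general behavior discussed above; it is not the exact estimator Stata applies, and the data and tuning constant are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: y = 2x exactly, except for one corrupted observation.
x = np.arange(10, dtype=float)
y = 2.0 * x
y[-1] += 50.0  # outlier

X = np.column_stack([np.ones_like(x), x])

# Plain OLS: the outlier drags the slope well away from 2.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Huber M-estimation via iteratively reweighted least squares (IRLS):
# observations with large residuals get weight delta/|r| instead of 1.
beta = beta_ols.copy()
delta = 1.0  # Huber tuning constant (illustrative choice)
for _ in range(100):
    r = y - X @ beta
    w = np.where(np.abs(r) <= delta, 1.0, delta / np.maximum(np.abs(r), 1e-12))
    Xw = X * w[:, None]  # weighted design matrix
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)

print(beta_ols[1], beta[1])  # the robust slope stays much closer to 2
```

Down-weighting the corrupted point keeps the robust slope near the true value of 2, while OLS, which weights all observations equally, is pulled substantially off target.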
Conclusion
In the end, it can be concluded that the CLRM and RR are two different types of regression models, each employed on particular types of data. The choice of regression model is generally not at the discretion of the researcher; rather, it depends upon the attributes of the retrieved data. The results of the CLRM can differ from those of RR.
References
Applied Econometrics. (2015). Macmillan International Higher Education.
Bhatia, K., Jain, P., & Kar, P. (2015). Robust regression via hard thresholding. Advances in Neural Information Processing Systems, 721-729.
Boslaugh, S. (2012). Statistics in a Nutshell. O’Reilly Media, Inc.
Ozyasar, H. (2018, June 26). Application of Regression Analysis in Business. Retrieved September 6, 2018, from https://smallbusiness.chron.com/application-regression-analysis-business-77200.html
Sen, A. K., & Srivastava, M. S. (2013). Regression Analysis: Theory, Methods and Applications. Springer.
World Bank. (2018, July 1). World Bank. Retrieved September 6, 2018, from http://databank.worldbank.org/data/reports.aspx?Code=CHN&id=556d8fa6&report_name=Popular_countries&populartype=country&ispopular=y#