----------------------------------------------------------------------------------------------------------
      name:
       log:  C:\Users\Aki\Documents\stata\april11.log
  log type:  text
 opened on:  11 Apr 2022, 19:56:23

.
. ****APRIL 4. LINEAR REGRESSION******
. use nlsw, clear
(NLSW, 1988 extract)

.
. **checking for outliers
. graph box wage, over(country)

. sort wage country

.
. **checking for distribution
. hist wage, by(country)

.
. **the check of the variances (homoscedasticity) comes after running the OLS regression
.
.
. **OLS REGRESSION******
. **dependent variable is continuous
. **assumptions for OLS (Gauss-Markov assumptions) to have OLS as BLUE
. **reg dependentvar independentvar1 independentvar2
. help regress

.
. **what are the determinants of car price?
. sysuse auto, clear
(1978 Automobile Data)

.
. reg price mpg weight length

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(3, 70)        =     12.98
       Model |   226957412         3  75652470.6   Prob > F        =    0.0000
    Residual |   408107984        70  5830114.06   R-squared       =    0.3574
-------------+----------------------------------   Adj R-squared   =    0.3298
       Total |   635065396        73  8699525.97   Root MSE        =    2414.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -86.78928   83.94335    -1.03   0.305     -254.209    80.63046
      weight |   4.364798   1.167455     3.74   0.000     2.036383    6.693213
      length |  -104.8682   39.72154    -2.64   0.010    -184.0903   -25.64607
       _cons |   14542.43   5890.632     2.47   0.016      2793.94    26290.93
------------------------------------------------------------------------------

.
. **what if there are dummy/categorical variables in the regression?
.
. reg price mpg weight length i.rep78 i.foreign

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(8, 60)        =      9.65
       Model |   324598377         8  40574797.1   Prob > F        =    0.0000
    Residual |   252198582        60   4203309.7   R-squared       =    0.5628
-------------+----------------------------------   Adj R-squared   =    0.5045
       Total |   576796959        68  8482308.22   Root MSE        =    2050.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -37.70139   79.68181    -0.47   0.638    -197.0887     121.686
      weight |   6.045295   1.073279     5.63   0.000     3.898417    8.192172
      length |  -105.5752   36.48828    -2.89   0.005    -178.5627    -32.5878
             |
       rep78 |
          2  |   893.7844   1628.744     0.55   0.585    -2364.189    4151.757
          3  |   802.7751   1504.123     0.53   0.596    -2205.918    3811.469
          4  |   843.8791   1576.237     0.54   0.594    -2309.063    3996.822
          5  |   1618.893   1713.585     0.94   0.349    -1808.788    5046.574
             |
     foreign |
    Foreign  |   3277.552   849.3603     3.86   0.000     1578.579    4976.526
       _cons |   6569.534   5933.985     1.11   0.273    -5300.203    18439.27
------------------------------------------------------------------------------

. //holding the other regressors constant, foreign cars are on average about 3277.55 dollars more
. //expensive than domestic cars
.
. **OLS REGRESSION DIAGNOSTICS
. **goodness of fit, outliers, heteroskedasticity, functional form problems
.
. *linear prediction from the model (option xb), stored in the variable priceb
. predict priceb, xb
(5 missing values generated)

. predict residual, residuals
(5 missing values generated)

. summarize priceb residual

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      priceb |         69    6146.043    2184.835   1212.799   11580.09
    residual |         69    1.38e-07    1925.825  -3969.608   5525.199

.
. sum price

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906

.
. **in OLS, the error term is assumed to be normally distributed
. kdensity residual

. //the residuals are not normally distributed
.
. gen lprice = log(price)

. reg lprice mpg weight length i.rep78 i.foreign

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(8, 60)        =     11.00
       Model |  6.08754955         8  .760943693   Prob > F        =    0.0000
    Residual |  4.14989653        60  .069164942   R-squared       =    0.5946
-------------+----------------------------------   Adj R-squared   =    0.5406
       Total |  10.2374461        68  .150550678   Root MSE        =    .26299

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0076699   .0102213    -0.75   0.456    -.0281156    .0127757
      weight |   .0007187   .0001377     5.22   0.000     .0004433    .0009941
      length |  -.0105475   .0046806    -2.25   0.028    -.0199101   -.0011849
             |
       rep78 |
          2  |   .0778632   .2089297     0.37   0.711    -.3400584    .4957847
          3  |   .0824566   .1929437     0.43   0.671    -.3034883    .4684015
          4  |   .1397924   .2021942     0.69   0.492    -.2646563     .544241
          5  |   .2085164   .2198128     0.95   0.347    -.2311747    .6482075
             |
     foreign |
    Foreign  |   .4784783    .108953     4.39   0.000     .2605398    .6964168
       _cons |   8.349292   .7611912    10.97   0.000     6.826683    9.871902
------------------------------------------------------------------------------

.
. predict lpriceb, xb
(5 missing values generated)

. rename residual residual1

. predict residual, residuals
(5 missing values generated)

. kdensity residual

.
. rvfplot

. //this is a scatterplot of the residuals against the predicted (fitted) values
. //you want the residuals to be randomly scattered around zero (no clear pattern)
.
. **APRIL 11
. lvr2plot

. //we are looking for observations in the upper right of this figure, which have both high leverage
. //and a high residual. These observations are far from their predicted values, and they
. //significantly influence our model.
. //here, there are no such observations, so we are fine
.
. ** test for collinearity
. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
         mpg |      3.53    0.282891
      weight |     11.71    0.085364
      length |     11.15    0.089725
       rep78 |
          2  |      4.46    0.224035
          3  |      9.13    0.109569
          4  |      7.86    0.127161
          5  |      6.46    0.154813
   1.foreign |      2.51    0.398838
-------------+----------------------
    Mean VIF |      7.10

. //VIF shows how much each variable adds to collinearity. Rule of thumb: each VIF should be below 10
. //(or below 30 if you are not very conservative), and the mean VIF should not be much higher than 1
. //here weight (11.71) and length (11.15) are above 10
. corr mpg weight length rep78
(obs=69)

             |      mpg   weight   length    rep78
-------------+------------------------------------
         mpg |   1.0000
      weight |  -0.8055   1.0000
      length |  -0.8037   0.9478   1.0000
       rep78 |   0.4023  -0.4003  -0.3606   1.0000

.
. **test for heteroskedasticity
. estat imtest

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      40.87     29    0.0706
            Skewness |       9.71      8    0.2863
            Kurtosis |       1.25      1    0.2644
---------------------+-----------------------------
               Total |      51.83     38    0.0667
---------------------------------------------------

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lprice

         chi2(1)      =     2.06
         Prob > chi2  =   0.1510

. //H0: there is homoscedasticity. With a p-value of 0.15, we fail to reject H0, so there is no
. //evidence against the homoscedasticity assumption
.
. //If you have heteroskedasticity, run the same model with robust standard errors
.
. reg price mpg weight length i.rep78 i.foreign

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(8, 60)        =      9.65
       Model |   324598377         8  40574797.1   Prob > F        =    0.0000
    Residual |   252198582        60   4203309.7   R-squared       =    0.5628
-------------+----------------------------------   Adj R-squared   =    0.5045
       Total |   576796959        68  8482308.22   Root MSE        =    2050.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -37.70139   79.68181    -0.47   0.638    -197.0887     121.686
      weight |   6.045295   1.073279     5.63   0.000     3.898417    8.192172
      length |  -105.5752   36.48828    -2.89   0.005    -178.5627    -32.5878
             |
       rep78 |
          2  |   893.7844   1628.744     0.55   0.585    -2364.189    4151.757
          3  |   802.7751   1504.123     0.53   0.596    -2205.918    3811.469
          4  |   843.8791   1576.237     0.54   0.594    -2309.063    3996.822
          5  |   1618.893   1713.585     0.94   0.349    -1808.788    5046.574
             |
     foreign |
    Foreign  |   3277.552   849.3603     3.86   0.000     1578.579    4976.526
       _cons |   6569.534   5933.985     1.11   0.273    -5300.203    18439.27
------------------------------------------------------------------------------

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of price

         chi2(1)      =    11.71
         Prob > chi2  =   0.0006

. //here the p-value is 0.0006, so we reject H0: there is heteroskedasticity
. reg price mpg weight length i.rep78 i.foreign, r

Linear regression                               Number of obs     =         69
                                                F(8, 60)          =       7.18
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5628
                                                Root MSE          =     2050.2

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -37.70139   90.74026    -0.42   0.679    -219.2089    143.8062
      weight |   6.045295    1.69974     3.56   0.001     2.645309    9.445281
      length |  -105.5752   55.39508    -1.91   0.061    -216.3819    5.231422
             |
       rep78 |
          2  |   893.7844   1204.381     0.74   0.461    -1515.337    3302.905
          3  |   802.7751   875.8402     0.92   0.363    -949.1661    2554.716
          4  |   843.8791   989.9492     0.85   0.397    -1136.314    2824.072
          5  |   1618.893   1301.004     1.24   0.218    -983.5026    4221.289
             |
     foreign |
    Foreign  |   3277.552   924.6583     3.54   0.001      1427.96    5127.144
       _cons |   6569.534   7308.457     0.90   0.372    -8049.557    21188.63
------------------------------------------------------------------------------

. //if you have a heteroskedasticity problem, you can continue with OLS, but you need to re-run your
. //model with robust standard errors.
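. //sketch only, not run in this session (so no output is shown): one possible way to put the usual
. //and the robust standard errors side by side. The names ols and ols_r are placeholders chosen
. //for this example; vce(robust) is the long form of the r option.
. quietly reg price mpg weight length i.rep78 i.foreign
. estimates store ols
. quietly reg price mpg weight length i.rep78 i.foreign, vce(robust)
. estimates store ols_r
. estimates table ols ols_r, b(%9.3f) se(%9.3f)
. //the coefficients should be identical in both columns; only the standard errors differ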
. //Robust standard errors are often (not always) larger than the usual standard errors, so some
. //variables that were significant before might become insignificant now
. //(e.g. length: p = 0.005 with the usual standard errors, p = 0.061 with robust ones)
.
. **test for functional form
. estat ovtest

Ramsey RESET test using powers of the fitted values of price
       Ho:  model has no omitted variables
                  F(3, 57) =     15.18
                  Prob > F =      0.0000

. //with a small p-value, we reject H0. So, the model has problems with its functional form
. //(usually related to omitted higher-order terms of some variables)
.
.
. //if you have problems with the assumptions related to the residuals or the functional form,
. //try using the log form of the variables
.
. reg lprice mpg weight length i.rep78 i.foreign, r

Linear regression                               Number of obs     =         69
                                                F(8, 60)          =       9.85
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5946
                                                Root MSE          =     .26299

------------------------------------------------------------------------------
             |               Robust
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0076699   .0104914    -0.73   0.468    -.0286558     .013316
      weight |   .0007187   .0001825     3.94   0.000     .0003537    .0010838
      length |  -.0105475   .0061924    -1.70   0.094    -.0229342    .0018392
             |
       rep78 |
          2  |   .0778632   .1404377     0.55   0.581     -.203054    .3587803
          3  |   .0824566   .0990583     0.83   0.408    -.1156894    .2806026
          4  |   .1397924   .1203311     1.16   0.250    -.1009057    .3804905
          5  |   .2085164   .1624454     1.28   0.204    -.1164228    .5334555
             |
     foreign |
    Foreign  |   .4784783   .1274966     3.75   0.000      .223447    .7335095
       _cons |   8.349292   .8711142     9.58   0.000     6.606805    10.09178
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of lprice
       Ho:  model has no omitted variables
                  F(3, 57) =     11.22
                  Prob > F =      0.0000

.
. **INTERACTION TERMS
. reg lprice c.mpg##i.foreign length

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(4, 69)        =     12.56
       Model |  4.72894164         4  1.18223541   Prob > F        =    0.0000
    Residual |  6.49459144        69  .094124514   R-squared       =    0.4213
-------------+----------------------------------   Adj R-squared   =    0.3878
       Total |  11.2235331        73  .153747029   Root MSE        =     .3068

-------------------------------------------------------------------------------
       lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          mpg |   -.008209   .0154596    -0.53   0.597    -.0390501     .022632
              |
      foreign |
     Foreign  |   .7352345   .3919552     1.88   0.065    -.0466948    1.517164
              |
foreign#c.mpg |
     Foreign  |  -.0130098   .0157026    -0.83   0.410    -.0443357    .0183162
              |
       length |   .0108084   .0034701     3.11   0.003     .0038857    .0177312
        _cons |    6.66144    .947172     7.03   0.000     4.771884    8.550997
-------------------------------------------------------------------------------

.
. **some OLS hypothesis testing (joint significance test)
. test mpg 1.rep78 2.rep78 3.rep78 4.rep78 5.rep78
1.rep78 not found
r(111);

end of do-file

r(111);

. do "C:\Users\Aki\AppData\Local\Temp\STD1a90_000000.tmp"

. reg lprice mpg weight length i.rep78 i.foreign, r

Linear regression                               Number of obs     =         69
                                                F(8, 60)          =       9.85
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5946
                                                Root MSE          =     .26299

------------------------------------------------------------------------------
             |               Robust
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0076699   .0104914    -0.73   0.468    -.0286558     .013316
      weight |   .0007187   .0001825     3.94   0.000     .0003537    .0010838
      length |  -.0105475   .0061924    -1.70   0.094    -.0229342    .0018392
             |
       rep78 |
          2  |   .0778632   .1404377     0.55   0.581     -.203054    .3587803
          3  |   .0824566   .0990583     0.83   0.408    -.1156894    .2806026
          4  |   .1397924   .1203311     1.16   0.250    -.1009057    .3804905
          5  |   .2085164   .1624454     1.28   0.204    -.1164228    .5334555
             |
     foreign |
    Foreign  |   .4784783   .1274966     3.75   0.000      .223447    .7335095
       _cons |   8.349292   .8711142     9.58   0.000     6.606805    10.09178
------------------------------------------------------------------------------

. test mpg 1.rep78 2.rep78 3.rep78 4.rep78 5.rep78

 ( 1)  mpg = 0
 ( 2)  1b.rep78 = 0
 ( 3)  2.rep78 = 0
 ( 4)  3.rep78 = 0
 ( 5)  4.rep78 = 0
 ( 6)  5.rep78 = 0
       Constraint 2 dropped

       F(  5,    60) =    0.39
            Prob > F =    0.8539

. //with a large p-value (0.8539), we fail to reject H0. These variables are not significant even
. //as a group
.
. **PRESENTING OLS RESULTS
. ssc install outreg2
checking outreg2 consistency and verifying not already installed...
all files already exist and are up to date.

.
. reg lprice mpg weight length i.rep78 i.foreign, r

Linear regression                               Number of obs     =         69
                                                F(8, 60)          =       9.85
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5946
                                                Root MSE          =     .26299

------------------------------------------------------------------------------
             |               Robust
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0076699   .0104914    -0.73   0.468    -.0286558     .013316
      weight |   .0007187   .0001825     3.94   0.000     .0003537    .0010838
      length |  -.0105475   .0061924    -1.70   0.094    -.0229342    .0018392
             |
       rep78 |
          2  |   .0778632   .1404377     0.55   0.581     -.203054    .3587803
          3  |   .0824566   .0990583     0.83   0.408    -.1156894    .2806026
          4  |   .1397924   .1203311     1.16   0.250    -.1009057    .3804905
          5  |   .2085164   .1624454     1.28   0.204    -.1164228    .5334555
             |
     foreign |
    Foreign  |   .4784783   .1274966     3.75   0.000      .223447    .7335095
       _cons |   8.349292   .8711142     9.58   0.000     6.606805    10.09178
------------------------------------------------------------------------------

. outreg2 using apr11.doc, replace
apr11.doc
dir : seeout

. reg lprice mpg weight length i.rep78 i.foreign turn, r

Linear regression                               Number of obs     =         69
                                                F(9, 59)          =       9.01
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6034
                                                Root MSE          =     .26233

------------------------------------------------------------------------------
             |               Robust
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0089263   .0108255    -0.82   0.413    -.0305881    .0127354
      weight |   .0007464   .0001927     3.87   0.000     .0003607     .001132
      length |  -.0090539   .0062201    -1.46   0.151    -.0215004    .0033926
             |
       rep78 |
          2  |   .0975155   .1439421     0.68   0.501    -.1905118    .3855429
          3  |    .071544   .1062276     0.67   0.503    -.1410169    .2841049
          4  |   .1202323   .1265663     0.95   0.346    -.1330263    .3734909
          5  |   .1882845   .1713961     1.10   0.276    -.1546782    .5312473
             |
     foreign |
    Foreign  |   .4507115   .1288667     3.50   0.001     .1928499    .7085732
        turn |  -.0187456   .0178207    -1.05   0.297    -.0544048    .0169135
       _cons |   8.776235   1.003938     8.74   0.000      6.76736    10.78511
------------------------------------------------------------------------------

. outreg2 using apr11.doc, append
apr11.doc
dir : seeout

.
.
. **LOGIT PROBIT
. //used in case of a binary/dummy (1,0) dependent variable
. sysuse auto, clear
(1978 Automobile Data)

.
. reg foreign price mpg i.rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(6, 62)        =      8.33
       Model |  6.51883662         6  1.08647277   Prob > F        =    0.0000
    Residual |  8.08985903        62  .130481597   R-squared       =    0.4462
-------------+----------------------------------   Adj R-squared   =    0.3926
       Total |  14.6086957        68   .21483376   Root MSE        =    .36122

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   .0000213   .0000175     1.22   0.226    -.0000136    .0000563
         mpg |   .0235328   .0098398     2.39   0.020     .0038633    .0432022
             |
       rep78 |
          2  |   .0141684   .2864299     0.05   0.961    -.5583968    .5867337
          3  |   .0970575   .2653092     0.37   0.716     -.433288    .6274031
          4  |   .4521384   .2709194     1.67   0.100    -.0894217    .9936984
          5  |   .6396386   .2881761     2.22   0.030     .0635828    1.215694
             |
       _cons |   -.591636   .3615003    -1.64   0.107    -1.314265    .1309928
------------------------------------------------------------------------------

. predict yhat
(option xb assumed; fitted values)
(5 missing values generated)

. su yhat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        yhat |         69    .3043478    .3096211  -.0752825   1.128067

. //take a look at the minimum and maximum of the predicted yhat. They are not within the 0-1 range.
. //This tells us that the linear probability model predicts probabilities below 0 and above 1
. //for some observations.
. su yhat if yhat<0 | yhat>1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        yhat |          7    .1146603    .4474273  -.0752825   1.128067

. //7 of the 69 observations (about 10%) are outside of the 0-1 range
. log close
      name:
       log:  C:\Users\Aki\Documents\stata\april11.log
  log type:  text
 closed on:  11 Apr 2022, 19:57:30
----------------------------------------------------------------------------------------------------------
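. //sketch only, not run in this session (so no output is shown): how the same kind of question
. //could be asked with logit and probit, whose predicted probabilities stay between 0 and 1.
. //Assumptions for this example: the model is simplified to price and mpg (the rep78 dummies
. //are left out), and p_logit and p_probit are placeholder variable names.
. sysuse auto, clear
. logit foreign price mpg
. predict p_logit, pr
. probit foreign price mpg
. predict p_probit, pr
. summarize p_logit p_probit
. //compare the Min and Max columns here with those of yhat from the linear probability model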