**set the working directory
cd "C:\Users\Aki\Documents\stata"
log using april11.log, replace

****APRIL 4. LINEAR REGRESSION******
use nlsw, clear

**checking for outliers
graph box wage, over(country)
sort wage country

**checking the distribution
hist wage, by(country)

**the check for homoscedasticity (constant error variance) comes after running the OLS regression

**OLS REGRESSION******
**the dependent variable is continuous
**under the Gauss-Markov assumptions, OLS is BLUE (the best linear unbiased estimator)
**syntax: reg dependentvar independentvar1 independentvar2
help regress

**what are the determinants of car price?
sysuse auto, clear
reg price mpg weight length

**what if there are dummy/categorical variables in the regression?
reg price mpg weight length i.rep78 i.foreign
//holding the other variables constant, foreign cars are on average 3277.5 dollars more expensive than domestic cars

**OLS REGRESSION DIAGNOSTICS
**goodness of fit, outliers, heteroskedasticity, functional form problems
*save the linear prediction from the model in the variable priceb
predict priceb, xb
predict residual, residuals
summarize priceb residual
sum price

**classical OLS inference assumes a normally distributed error term, so check the residuals
kdensity residual //the residuals are not normally distributed
gen lprice = log(price)
reg lprice mpg weight length i.rep78 i.foreign
predict lpriceb, xb
rename residual residual1
predict residual, residuals
kdensity residual
rvfplot //scatterplot of the residuals against the fitted (predicted) values
//you want the residuals to be randomly scattered (no clear pattern)

**APRIL 11
lvr2plot //leverage (y-axis) against the normalized squared residual (x-axis)
//look for observations in the upper right of this figure: they have both high leverage and a large residual. The model predicts them poorly, and they can strongly influence the estimates.
//here, there are no such observations, so we are fine

**test for collinearity
estat vif //how much the variance of each coefficient is inflated by collinearity with the other regressors
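**a sketch added to these notes (not part of the class code) of what estat vif computes:
**VIF_j = 1/(1 - R2_j), where R2_j is the R-squared from regressing regressor j on all
**the other regressors. Shown for mpg in the log-price model; the value should match the
**mpg row of estat vif. ols_lprice and vif_sample are hypothetical names. The model is
**stored and restored so that the estat commands below still refer to the log-price model.
estimates store ols_lprice //save the active log-price model
gen byte vif_sample = e(sample) //mark that model's estimation sample
quietly reg mpg weight length i.rep78 i.foreign if vif_sample //auxiliary regression for mpg
display "VIF for mpg = " 1/(1 - e(r2))
estimates restore ols_lprice //make the log-price model active again
drop vif_sample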
//rule of thumb: each VIF should be below 10 (or below 30 if you are less conservative), and the mean VIF should not be much higher than 1
corr mpg weight length rep78

**test for heteroskedasticity
estat imtest
estat hettest
//H0: the errors are homoskedastic. With a p-value of 0.15, we fail to reject H0, so there is no evidence against the homoskedasticity assumption
//if there is heteroskedasticity, you can continue with OLS, but you need to re-run the same model with robust standard errors (r is short for vce(robust))
reg price mpg weight length i.rep78 i.foreign
estat hettest //here, there is heteroskedasticity
reg price mpg weight length i.rep78 i.foreign, r
//robust standard errors are often (not always) larger than the usual standard errors, so some variables that were significant before may become insignificant

**test for functional form (Ramsey RESET)
estat ovtest
//H0: the model has no omitted variables. With a small p-value, we reject H0, so the functional form is misspecified (often because squares or higher powers of some variables are missing)
//if the assumptions about the residuals or the functional form fail, try the log form of the variables
reg lprice mpg weight length i.rep78 i.foreign, r
estat ovtest

**INTERACTION TERMS
reg lprice c.mpg##i.foreign length

**some OLS hypothesis testing (joint significance test)
reg lprice mpg weight length i.rep78 i.foreign, r
test mpg 2.rep78 3.rep78 4.rep78 5.rep78 //1.rep78 is the base category, so it has no coefficient to test
//with a large p-value, we fail to reject H0 that these coefficients are all zero: these variables are not significant even as a group

**PRESENTING OLS RESULTS
ssc install outreg2
reg lprice mpg weight length i.rep78 i.foreign, r
outreg2 using apr11.doc, replace
reg lprice mpg weight length i.rep78 i.foreign turn, r
outreg2 using apr11.doc, append

**LOGIT PROBIT
//used when the dependent variable is binary/dummy (1,0)
sysuse auto, clear
reg foreign price mpg i.rep78 //linear probability model
predict yhat
su yhat //look at the minimum and maximum of the predicted yhat: they are not within the 0-1 range
//this tells us that the linear probability model predicts probabilities below 0 or above 1 for some observations
su yhat if yhat<0 | yhat>1 //almost 10% of the observations are outside the 0-1 range
log close
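**a sketch added to these notes (not part of the class code): fit logit and probit for the
**same binary outcome and check that their predicted probabilities stay strictly inside (0,1).
**i.rep78 is left out because no foreign car in the auto data has rep78 equal to 1 or 2, so
**those categories predict foreign==0 perfectly. yhat_lpm, phat_logit and phat_probit are
**hypothetical names. This runs after log close, so its output is not written to april11.log.
sysuse auto, clear
quietly reg foreign price mpg
predict yhat_lpm, xb //linear probability model fitted values
logit foreign price mpg
predict phat_logit, pr //Pr(foreign==1) from the logit model
margins, dydx(*) //average marginal effects, comparable to the LPM slopes
probit foreign price mpg
predict phat_probit, pr //Pr(foreign==1) from the probit model
margins, dydx(*)
su yhat_lpm phat_logit phat_probit //logit and probit predictions lie strictly between 0 and 1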