**indicate the working path
cd "C:\Users\Aki\Documents\stata"
log using april4.log, replace

****APRIL 4. LINEAR REGRESSION******
use nlsw, clear

**checking for outliers
graph box wage, over(country)
sort wage country

**checking the distribution
hist wage, by(country)

**checking the variances for homoskedasticity comes after running the OLS regression

**OLS REGRESSION******
**the dependent variable is continuous
**under the Gauss-Markov assumptions, OLS is BLUE (best linear unbiased estimator)
**syntax: reg dependentvar independentvar1 independentvar2
help regress

**what are the determinants of car price?
sysuse auto, clear
reg price mpg weight length

**what if there are dummy/categorical variables in the regression?
**the i. prefix enters each category as a separate dummy (the base category is omitted)
reg price mpg weight length i.rep78 i.foreign
//foreign cars on average are 3277.5 dollars more expensive than domestic cars, holding the other regressors constant

**OLS REGRESSION DIAGNOSTICS
**goodness of fit, outliers, heteroskedasticity, functional form problems

*store the linear prediction (xb) from the model in the variable priceb
predict priceb, xb
*store the residuals (observed price minus predicted price)
predict residual, residuals
summarize priceb residual
sum price

**normality of the error term is not a Gauss-Markov assumption, but it is needed for exact inference in small samples
kdensity residual
//not normally distributed

*re-estimate with log price as the dependent variable
gen lprice = log(price)
reg lprice mpg weight length i.rep78 i.foreign
predict lpriceb, xb
*keep the residuals from the first model under a new name
rename residual residual1
predict residual, residuals
kdensity residual

rvfplot
//this is a scatterplot of the residuals against the fitted (predicted) values
//you want the residuals to be randomly scattered around zero (no clear pattern)

log close
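
**OPTIONAL: FORMAL DIAGNOSTIC TESTS (a minimal sketch, not part of the original session)
**the notes above list heteroskedasticity and functional form among the diagnostics
**but only inspect them graphically with rvfplot; the commands below are one possible
**way to test them formally. Assumptions: the log-price model is the one of interest,
**and the auto data are reloaded so this block runs on its own (this clears the data in memory).
**This block runs after log close, so its output is not written to april4.log.
sysuse auto, clear
gen lprice = log(price)
reg lprice mpg weight length i.rep78 i.foreign

*Breusch-Pagan / Cook-Weisberg test; H0: constant error variance (homoskedasticity)
estat hettest

*Ramsey RESET test; H0: the model has no omitted variables (functional form is adequate)
estat ovtest

*if heteroskedasticity is a concern, one common remedy is robust standard errors
*(the coefficients are unchanged, only the standard errors differ)
reg lprice mpg weight length i.rep78 i.foreign, vce(robust)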