cd "C:\Users\Aki\Documents\stata"
log using feb14.log

****DATA MANIPULATION. PART 2******

**naming and labeling variables****
*change the name of a variable
sysuse auto
rename rep78 repair_record
rename repair_record rep78

**Stata distinguishes between upper and lower case in variable names
capture noisily sum Price   //fails with r(111): there is no variable called Price
sum price

**renaming many variables at once
rename * *1978   //add the suffix 1978 to every variable
rename *1978 *   //remove the suffix again
rename * *a
rename *a *
rename m* M*     //capitalize the first letter of variables starting with m
rename M* m*

**labeling a variable
label variable rep78 "repair record of cars in 1978"
tab rep78

**labeling values (useful for dummy and categorical variables)
label dir
tab foreign
**step 1. define a label
label define repair 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"
label dir
**step 2. attach the label to a variable
label values rep78 repair
tab rep78

***if it were about labeling ethnicities:
***label define ethnic 1 "kyrgyz" 2 "uzbek" 3 "russian" 4 "other"
***label values ethnicity ethnic

*****EGEN OR EXTENDED GENERATE******
help egen

***generate a new variable that splits price into ten equal-sized groups (deciles)
egen deciles = cut(price), group(10)
tab deciles

**average of price, miles per gallon, and weight within each observation (row)
egen rowmean = rowmean(price mpg weight)
browse price mpg weight rowmean

**average price within each category of rep78
tab rep78, sum(price)
egen pricerepair = mean(price), by(rep78)
tab pricerepair

****if you want to generate average consumption per household
**hh1 cons1
**hh1 cons2
**hh1 cons3
**hh2 cons2
**hh3 cons1
**hh3 cons3
**egen meancons = mean(cons), by(hhid)
**egen totalincome = total(income), by(hhid)

****DUMMY VARIABLES
**a dummy variable is a binary variable: two categories, usually 0 and 1
*e.g. gender (female/male), location (urban/rural), vote (yes/no)

**gen a dummy variable for price: lowprice and highprice
gen highprice = 1 if price>6165
replace highprice = 0 if price<=6165
//if price had missing values they would count as price>6165, so reset them:
//replace highprice = . if price==.
tab highprice
**be careful with missing observations!
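
**a sketch, not from the class notes: a one-line dummy that keeps missing values missing
**(naive and goodrep are hypothetical names; the cutoff of 4 is arbitrary)
**price has no missing values in auto, so rep78, which has some, is used here
gen byte naive = rep78>=4                        //missing rep78 counts as >=4, so it is coded 1
gen byte goodrep = rep78>=4 if !missing(rep78)   //missing rep78 stays missing
count if naive==1 & missing(rep78)               //observations miscoded by the naive version
assert missing(goodrep) == missing(rep78)        //goodrep is missing exactly where rep78 is
tab goodrep, missing
drop naive goodrep                               //clean up so the rest of the file is unaffected
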
**gen a dummy var from a categorical variable
tab rep78
gen A = 1
replace A = 0 if rep78!=1
gen B = 1
replace B = 0 if rep78!=2
gen C = 1
replace C = 0 if rep78!=3
//continue with all the categories: 2 lines of code for each category
//note: observations with missing rep78 are coded 0 here, not missing

**a shortcut version of doing this:
drop A B C              //remove the hand-made dummies so the names A and B are free again
tab rep78, gen(dummy)   //5 new dummy variables are created; they need to be renamed
rename dummy1 A
rename dummy2 B

***KEEPING AND DROPPING VARIABLES****
**to remove a variable from a dataset
drop dummy5
drop A B

**keep only selected variables****
keep make price foreign trunk

**keep variables that have "e" in their name
keep *e*

**drop observations (cars) that are not domestic
tab foreign
drop if foreign==1

log close