Final - Finance and Acc STATA Assignment
Name
Course-Code
Faculty/Dept.
Date
Before merging the datasets on the unique identifiers "Ticker" and "Year," I loaded each one
separately in Stata to confirm its properties. The describe output for the file containing
FamFirm was as follows:
. describe
Contains data
  obs:         2,686
 vars:             4
 size:       131,614
----------------------------------------------------------------
                storage   display    value
variable name   type      format     label      variable label
----------------------------------------------------------------
Ticker          str6      %9s                   Ticker
Year            int       %10.0g                Year
Coname          str40     %40s                  Coname
FamFirm         byte      %10.0g                FamFirm
----------------------------------------------------------------
Sorted by:
     Note: dataset has changed since last saved
I then merged the data using the unique identifiers "Ticker" and "Year" with the following
commands and outputs:
. cd "C:\Users\MYPC\Desktop\ASSIGNMENT\finance"
C:\Users\MYPC\Desktop\ASSIGNMENT\finance
.
.
.
. import excel using "datforhomework1_2023.xls", sheet("Sheet1") firstrow clear
.
. keep if !missing(Ticker) & !missing(Year)
(318 observations deleted)
.
. save "datforhomework1_2023.dta", replace
(note: file datforhomework1_2023.dta not found)
file datforhomework1_2023.dta saved
.
.
.
. import excel using "famfirms.xls", firstrow clear
.
.
.
.
. use "datforhomework1_2023.dta", clear
.
. merge m:1 Ticker Year using "famfirms.dta"
    Result                           # of obs.
    -----------------------------------------
    not matched                         4,932
        from master                     4,468  (_merge==1)
        from using                        464  (_merge==2)
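The _merge codes in the output above can be easier to read against a toy case. The following is a minimal sketch in Python (not the Stata used in this assignment) of how a many-to-one merge on "Ticker" and "Year" classifies rows. The rows and the function name merge_m1 are hypothetical and invented for illustration only.

```python
# Hypothetical toy data; the values are invented for illustration only.
master = [  # firm-year rows (the "master" file)
    {"Ticker": "AAA", "Year": 1992, "Q": 1.4},
    {"Ticker": "AAA", "Year": 1993, "Q": 1.6},
    {"Ticker": "BBB", "Year": 1992, "Q": 0.9},
]
using = [  # at most one row per Ticker-Year (the "using" file)
    {"Ticker": "AAA", "Year": 1992, "FamFirm": 1},
    {"Ticker": "CCC", "Year": 1992, "FamFirm": 0},
]


def merge_m1(master, using, keys):
    """Many-to-one merge that mimics the _merge codes 1, 2 and 3."""
    lookup = {}
    for row in using:
        k = tuple(row[c] for c in keys)
        if k in lookup:  # m:1 requires unique keys in the using file
            raise ValueError(f"keys {k} are not unique in the using data")
        lookup[k] = row
    merged, seen = [], set()
    for row in master:
        k = tuple(row[c] for c in keys)
        if k in lookup:
            merged.append({**row, **lookup[k], "_merge": 3})  # matched
            seen.add(k)
        else:
            merged.append({**row, "_merge": 1})  # master only
    for k, row in lookup.items():
        if k not in seen:
            merged.append({**row, "_merge": 2})  # using only
    return merged


result = merge_m1(master, using, ["Ticker", "Year"])
counts = {code: sum(r["_merge"] == code for r in result) for code in (1, 2, 3)}
print(counts)  # {1: 2, 2: 1, 3: 1}
```

Rows coded 1 or 2 carry missing values for the variables that come from the other file, which matters for any variable built from FamFirm afterwards.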
To generate the dummy variable "nonfounderfam", equal to 1 for family firms whose founder is
not also the CEO and 0 otherwise, I opened the merged dataset "merged_dataset.dta" in Stata
using the command:
use "C:\Users\MYPC\Desktop\ASSIGNMENT\finance\merged_dataset.dta", clear
I then generated the "nonfounderfam" variable using the gen command and conditional logic as
follows:
gen nonfounderfam = 0
replace nonfounderfam = 1 if FamFirm == 1 & founderCEO == 0
I then saved the dataset:
save "merged_dataset.dta", replace
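The gen and replace commands above code every observation as 0 unless both conditions hold. In Stata a comparison such as FamFirm == 1 is false when FamFirm is missing, so unmatched rows also receive 0. The sketch below, in Python rather than Stata, shows the same rule with one added assumption that is not in the original commands: a missing input yields a missing flag instead of 0. The function name and the rows are hypothetical.

```python
def nonfounderfam(fam_firm, founder_ceo):
    """1 for a family firm whose CEO is not a founder, 0 otherwise.

    Assumption (not in the original commands): None marks a missing input,
    and a missing input returns None instead of being coded as 0.
    """
    if fam_firm is None or founder_ceo is None:
        return None
    return int(fam_firm == 1 and founder_ceo == 0)


# (FamFirm, founderCEO) pairs, invented for illustration only
rows = [(1, 0), (1, 1), (0, 0), (None, 0)]
flags = [nonfounderfam(f, c) for f, c in rows]
print(flags)  # [1, 0, 0, None]
```

Whether missing rows should be kept as missing or coded 0 depends on how the unmatched observations are meant to be treated in the later regressions.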
Ticker: A unique identifier for each firm, the company's ticker symbol.
Year: The year for which the data are recorded (1992 to 1999).
permid: A unique identifier for each firm, the company's permanent identifier.
agefirm: The age of the firm in years, indicating how long the company has been in operation.
meanagef: The average age of the firm's founders, regardless of whether they currently work for the company.
assets: The book value of assets in millions of dollars, serving as a proxy for firm size.
bs_volatility: A measure of uncertainty in the firm's environment, calculated as the standard deviation of the firm's previous 60 months of stock returns.
roa: Return on assets, an accounting measure of the firm's performance.
founderCEO: A binary variable indicating whether the current CEO of the firm is one of its founders (1: founder is also the CEO, 0: founder is not the CEO).
Q: A proxy for Tobin's Q, used as a measure of firm performance.
digit2_in: The two-digit industry code for the industry in which the firm operates.
hightech: A dummy variable indicating whether the firm operates in a high-tech industry (1: high-tech industry, 0: non-high-tech industry).
Coname: The name of the company (i.e., the full or less abbreviated name).
FamFirm: A binary variable for family firm status (1: family firm, 0: non-family firm).
nonfounderfam: A binary variable indicating whether a firm is a family firm whose founder is not also the CEO (1: family firm with non-founder CEO, 0: all other firms).
Table I
digit2_in   FounderCEOFirms   NonFounderFirms   TotalFirms   Percentage non-founder (%)
10          16                16                32           50
13          45                45                90           50
15          7                 11                18           61.11111
16          15                15                30           50
20          54                113               167          67.66467
21          9                 9                 18           50
22          0                 14                14           100
23          8                 13                21           61.90476
24          9                 19                28           67.85714
25          8                 8                 16           50
26          49                71                120          59.16667
27          27                85                112          75.89286
28          151               223               374          59.62567
29          60                65                125          52
30          40                36                76           47.36842
31          9                 15                24           62.5
32          7                 7                 14           50
33          64                75                139          53.95683
34          35                53                88           60.22727
35          102               142               244          58.19672
36          92                102               194          52.57732
Averaging each variable within a firm and then averaging those firm-level means can be
misleading, because it collapses the panel into a cross-section and discards the variation over
time. The observations track firms across years, so they combine time-invariant firm
heterogeneity with time-varying variables such as family ownership and firm performance.
Averaging within firms therefore loses the within-firm variation that arises when ownership
changes. With such an approach we may not accurately capture the effect of family ownership on
firm performance; instead we would capture differences across firms that stem from their unique
characteristics and may have nothing to do with family ownership. A better approach is a
fixed-effects model, which accounts for time-invariant heterogeneity across individual firms.
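A small numerical case illustrates the point. The sketch below is in Python rather than Stata, and the two-firm panel is invented for illustration only. It compares the slope obtained from firm averages with the slope obtained after subtracting each firm's own mean, which is the transformation behind a fixed-effects estimate.

```python
from statistics import mean


def slope(xs, ys):
    """OLS slope of ys on xs (single regressor with intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical panel: firm -> (x over time, y over time); values are invented.
panel = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [9, 8, 7]),
}

# Between comparison: collapse each firm to its time average first.
between = slope([mean(x) for x, _ in panel.values()],
                [mean(y) for _, y in panel.values()])

# Within (fixed-effects) comparison: subtract each firm's own mean.
dx, dy = [], []
for x, y in panel.values():
    dx += [xi - mean(x) for xi in x]
    dy += [yi - mean(y) for yi in y]
within = slope(dx, dy)

print(round(between, 3), round(within, 3))  # 1.333 -1.0
```

In this toy panel the averaged data suggest a positive relationship, while within each firm the relationship is negative. The sign difference comes entirely from level differences between the two firms.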
t-statistic
Variable         t-statistic    p-value       degrees of freedom
agefirm          6.6412401      4.385e-11     1451
meanagef         11.145291      1.005e-27     1432
assets           4.0903122      0.00004462    2220
bs_volatility    -1.9035795     0.05709589    .
roa              -4.9590917     7.619e-07     2220
Q                -4.747815      2.187e-06     2220
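For reference, a two-sample t-statistic and its degrees of freedom can be computed as in the sketch below. It is written in Python rather than Stata and uses the unequal-variance (Welch) form, which is an assumption: the table does not state which form of the test or which groups were used. The two samples are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t-statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical samples for two groups of firms; values are invented.
group_a = [10, 12, 14, 16]
group_b = [9, 10, 11]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 2))
```

With unequal variances the degrees of freedom fall below the pooled value of n1 + n2 - 2, which is 5 in this toy case.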
Correlations:
                  agefirm   meanagef   assets    bs_vol~y   roa       Q         NonfamFounders
agefirm           1.0000
meanagef          0.5325    1.0000
assets            0.1503    0.0868     1.0000
bs_vol~y          -0.2890   -0.3359    -0.1097   1.0000
roa               -0.0035   -0.1139    -0.0943   -0.1995    1.0000
Q                 0.0222    -0.1372    -0.0750   -0.1287    0.6184    1.0000
NonfamFounders    0.1801    0.2844     0.0648    -0.0526    -0.1317   -0.1060   1.0000
Multivariate Analysis
Regression
To re-estimate Model 1 while accounting for heteroskedasticity, I used White
heteroskedasticity-consistent standard errors:
regress Q FamFirm bs_volatility ln_assets_new hightech i.Year, robust
Variable   Coefficient   Std. Error   t-value   P-value   95% CI Lower   95% CI Upper
FamFirm    0.1453296     0.044961     3.23      0.001     0.0571587      0.2335004
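The robust option changes the standard errors, not the coefficients. The sketch below, in Python rather than Stata, shows this for a regression with a single regressor. It computes the conventional standard error and a heteroskedasticity-consistent one with the n/(n-k) small-sample scaling (often called HC1). The data and the function name are hypothetical and invented for illustration only.

```python
from math import sqrt


def ols_with_se(x, y):
    """Slope of y on x with conventional and HC1 robust standard errors."""
    n, k = len(x), 2  # k counts the intercept and the slope
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    a = my - b * mx
    e = [yi - a - b * xi for xi, yi in zip(x, y)]  # residuals
    # Conventional: one common error variance for every observation
    se_classical = sqrt(sum(r * r for r in e) / (n - k) / sxx)
    # Robust: each observation contributes its own squared residual
    hc0 = sum((d * r) ** 2 for d, r in zip(dx, e)) / sxx ** 2
    se_robust = sqrt(hc0 * n / (n - k))
    return b, se_classical, se_robust


# Hypothetical data whose scatter widens as x grows; values are invented.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.5, 7.2, 11.0, 10.1]
b, se_classical, se_robust = ols_with_se(x, y)
print(round(b, 4), round(se_classical, 4), round(se_robust, 4))
```

Both standard errors share the same slope estimate, so only the t-statistic, p-value, and confidence interval move when the correction is applied.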
To re-estimate the specification in Column II with the variable "founderCEO" instead of "FamFirm," I ran:
gen NonfamilyFirms = 1 - founderCEO
gen ln_assets_new = ln(assets)
regress Q founderCEO bs_volatility ln_assets_new hightech i.Year, robust
VII:
Variable Coefficient Std. Error t-stat p-value 95% CI Lower 95% CI Upper
FamFirm 0.8725552 0.2712304 3.22 0.001 0.3406583 1.404452
bs_volatility -17.49335 1.589865 -11.00 0.000 -20.61115 -14.37554
ln_assets_new -1.085033 0.1025711 -10.58 0.000 -1.28618 -0.8838852
hightech 0.6805309 0.3429308 1.98 0.047 0.0080257 1.353036
1993 -1.141379 0.4694491 -2.43 0.015 -2.061993 -0.2207646
1994 0.0555542 0.4350387 0.13 0.898 -0.7975793 0.9086878
1995 -0.4464318 0.464891 -0.96 0.337 -1.358107 0.4652438
1996 -0.4938149 0.4533737 -1.09 0.276 -1.382904 0.3952745
1997 -1.043172 0.5340777 -1.95 0.051 -2.090526 0.0041823
1998 -0.5699293 0.5529228 -1.03 0.303 -1.65424 0.5143812
1999 0.8916283 0.5034029 1.77 0.077 -0.0955712 1.878828
_cons 19.60219 1.164299 16.84 0.000 17.31894 21.88544
Columns II, III, and IV after replacing "Q" with "ln_Q" (Natural log of Q):
II:
Variable Coefficient Std. Err. t-stat P-value 95% CI Lower 95% CI Upper
FamFirm 0.3734733 0.2191741 1.70 0.089 -0.0563387 0.8032853
bs_volatility -9.100198 1.404039 -6.48 0.000 -11.85359 -6.346803
ln_assets_new -0.3281843 0.0834746 -3.93 0.000 -0.4918824 -0.1644863
hightech -0.2300397 0.2893818 -0.79 0.427 -0.7975326 0.3374532
1993 -0.8252379 0.3309317 -2.49 0.013 -1.474212 -0.1762634
1994 0.7038533 0.2847899 2.47 0.014 0.1453653 1.262341
1995 -0.0588121 0.3253736 -0.18 0.857 -0.696887 0.5792627
1996 -0.4333491 0.2998699 -1.45 0.149 -1.02141 0.1547116
1997 -1.620595 0.4284931 -3.78 0.000 -2.460892 -0.7802972
1998 -1.454903 0.4406229 -3.30 0.001 -2.318988 -0.5908181
1999 0.0659321 0.3565427 0.18 0.853 -0.633267 0.7651312
ln_Q 8.37467 0.2784779 30.07 0.000 7.82856 8.92078
_cons 6.584117 0.9693769 6.79 0.000 4.683117 8.485116
III:
Variable Coefficient Std. Error t-stat P-value Lower 95% CI Upper 95% CI
FamFirm 0.3734733 0.2191741 1.70 0.089 -0.0563387 0.8032853
bs_volatility -9.100198 1.404039 -6.48 0.000 -11.85359 -6.346803
ln_assets_new -0.3281843 0.0834746 -3.93 0.000 -0.4918824 -0.1644863
hightech -0.2300397 0.2893818 -0.79 0.427 -0.7975326 0.3374532
1993 -0.8252379 0.3309317 -2.49 0.013 -1.474212 -0.1762634
1994 0.7038533 0.2847899 2.47 0.014 0.1453653 1.262341
1995 -0.0588121 0.3253736 -0.18 0.857 -0.696887 0.5792627
1996 -0.4333491 0.2998699 -1.45 0.149 -1.02141 0.1547116
1997 -1.620595 0.4284931 -3.78 0.000 -2.460892 -0.7802972
1998 -1.454903 0.4406229 -3.30 0.001 -2.318988 -0.5908181
1999 0.0659321 0.3565427 0.18 0.853 -0.633267 0.7651312
ln_Q 8.37467 0.2784779 30.07 0.000 7.82856 8.92078
_cons 6.584117 0.9693769 6.79 0.000 4.683117 8.485116
Variable Coefficient Std. Error t-stat p-value Lower 95% CI Upper 95% CI
founderCEO 0.3339915 0.0995308 3.36 0.001 0.1387632 0.5292197
bs_volatility -1.928315 0.2261622 -8.53 0.000 -2.371929 -1.484701
ln_assets_new -0.181764 0.0174298 -10.43 0.000 -0.2159522 -0.1475758
hightech 0.1805779 0.052912 3.41 0.001 0.0767918 0.284364
1993 -0.0818767 0.0874207 -0.94 0.349 -0.2533512 0.0895978
1994 -0.1573232 0.0862384 -1.82 0.068 -0.3264786 0.0118322
1995 -0.0815038 0.0893594 -0.91 0.362 -0.2567809 0.0937734
1996 -0.0015462 0.0933073 -0.02 0.987 -0.1845671 0.1814747
1997 0.1311439 0.0957358 1.37 0.171 -0.0566405 0.3189282
1998 0.244607 0.1080777 2.26 0.024 0.0326141 0.4565999
1999 0.2326735 0.1074327 2.17 0.030 0.0219458 0.4434012
_cons 3.900178 0.1998218 19.52 0.000 3.508231 4.292126
Column XIII: Re-estimating Model 1 with heteroskedasticity-robust standard errors and firm dummies:
Variable Coefficient Std. Error t-value P-value Lower 95% CI Upper 95% CI
FamFirm 0.0322847 0.0864707 0.37 0.709 -0.1373052 0.201875
bs_volatility -1.335765 0.2755755 -4.85 0.000 -1.876234 -0.7952949
ln_assets_new -0.3606776 0.0605294 -5.96 0.000 -0.4793902 -0.241965
hightech 0 (omitted) - - - - -
1993 0.0284773 0.0483336 0.59 0.556 -0.0663165 0.123271
1994 -0.0393587 0.045634 -0.86 0.389 -0.1288579 0.0501405
1995 0.0693375 0.0461064 1.50 0.133 -0.0210882 0.159763
1996 0.1586567 0.0480367 3.30 0.001 0.0644453 0.252868
1997 0.3647576 0.0489925 7.45 0.000 0.2686716 0.460844
1998 0.4937141 0.0569458 8.67 0.000 0.3820297 0.605399
1999 0.4774547 0.0665994 7.17 0.000 0.3468373 0.608072
_cons 5.18374 0.5531294 9.37 0.000 4.09892 6.26856
To address the endogeneity of founderCEO in Column III, I applied instrumental variable (IV)
estimation. I used meanagef as the instrument for founderCEO. It is measured only in 1994, so I
treated that value as exogenous and applied it to all other years.
For the first-stage regression of the model in Column III, I used the ivregress command with
meanagef as the instrument for founderCEO and with bs_volatility, ln_assets_new, hightech, and
i.Year as the other independent variables.
Output:
Variable Coefficient Std. Err. t-statistic P-value 95% CI Lower 95% CI Upper
bs_volatility -42.86471 3.498083 -12.25 0.000 -49.72674 -36.00269
ln_assets_new 0.8933977 0.2891044 3.09 0.002 0.3262749 1.46052
hightech -1.989932 0.7376537 -2.70 0.007 -3.436953 -0.5429105
1992 0 (empty) (empty) (empty) (empty) (empty)
1993 0.6294254 1.190973 0.53 0.597 -1.706852 2.965703
1994 0.5692188 1.187144 0.48 0.632 -1.759546 2.897984
1995 -0.4005572 1.186799 -0.34 0.736 -2.728646 1.927531
1996 -1.091232 1.190296 -0.92 0.359 -3.42618 1.243716
1997 -0.96853 1.195973 -0.81 0.418 -3.314616 1.377556
1998 0.1587945 1.213241 0.13 0.896 -2.221164 2.538753
1999 2.842454 1.254228 2.27 0.024 0.3820932 5.302815
_cons 94.34776 3.028984 31.15 0.000 88.40594 100.2896
The estimated coefficient on the instrument meanagef is approximately -0.0160032, with a
t-statistic of -13.70. meanagef is therefore statistically significant (p < 0.001) with a negative
coefficient, which indicates a strong relationship between meanagef and founderCEO. It thus
satisfies the relevance condition for an instrument, and around 13.1% of the variation in
founderCEO is explained by meanagef.
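With a single instrument, the first-stage F-statistic is the square of the instrument's t-statistic, so the reported t of -13.70 corresponds to an F of about 187.7, well above the common rule-of-thumb threshold of 10. The sketch below, in Python rather than Stata, does this arithmetic and also shows the two stages of 2SLS on a toy case with one instrument and no other controls. The data are invented for illustration only and the helper names are hypothetical.

```python
def ols(xs, ys):
    """Intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cov(us, vs):
    """Sample covariance (n - 1 denominator)."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / (n - 1)


# Hypothetical data: z = instrument, x = endogenous regressor, y = outcome.
z = [1, 2, 3, 4, 5, 6]
x = [2, 1, 4, 3, 6, 5]
y = [3, 1, 5, 4, 8, 6]

# Stage 1: project the endogenous regressor on the instrument.
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress the outcome on the fitted values from stage 1.
_, b_2sls = ols(x_hat, y)

# With one instrument, 2SLS equals the ratio of covariances.
b_ratio = cov(z, y) / cov(z, x)

# First-stage F implied by the t-statistic reported in the text.
first_stage_F = (-13.70) ** 2
print(round(b_2sls, 6), round(b_ratio, 6), round(first_stage_F, 2))
```

Relevance is only one of the two conditions for an instrument. The other, that the instrument is unrelated to the error term of the main equation, cannot be checked from the first stage.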
Hausman Test
ivregress 2sls founderCEO (meanagef = bs_volatility ln_assets_new hightech i.Year), first
founderCEO Coefficient Std. Error t-value p-value Lower CI 95% Upper CI 95%
meanagef -0.0160032 0.001168 -13.70 0.000 -0.0182925 -0.013714
_cons 1.518479 0.1061368 14.31 0.000 1.310454 1.726503
Variable active
FamFirm .11070634
bs_volatility .67814427
ln_assets_weighted -.00359012
hightech .03843777
1993 -.01094484
1994 -.0101905
1995 .00333516
1996 .01857966
1997 .0163442
1998 -.00373617
1999 -.04617963
residuals .91636499
_cons -.11976399
considerably different from the coefficient obtained in the first-stage regression (approximately
0.0160). The t-statistic on the first-stage residuals is also highly significant (17.18), which is
the evidence against exogeneity in this test. As a result, we reject the null hypothesis of
exogeneity at the 1% level (p < 0.01) and conclude that founderCEO is endogenous in
Column III.
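The regression-based form of this test adds the first-stage residuals to the main equation as an extra regressor. The sketch below, in Python rather than Stata, shows the mechanics on toy data with one instrument and no other controls. It checks a known property of this setup: once the residuals are included, the coefficient on the endogenous regressor equals the IV estimate. It reports point estimates only, so the t-statistic on the residual term, which is what the test is judged on, is not computed here. The data and helper names are hypothetical and invented for illustration only.

```python
def ols_multi(columns, y):
    """OLS coefficients (intercept first) from the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Hypothetical data: z = instrument, x = endogenous regressor, y = outcome.
z = [1, 2, 3, 4, 5, 6]
x = [2, 1, 4, 3, 6, 5]
y = [3, 1, 5, 4, 8, 6]

# First stage and its residuals
a1, b1 = ols_multi([z], x)
v_hat = [xi - (a1 + b1 * zi) for xi, zi in zip(x, z)]

# Augmented regression: y on x and the first-stage residuals
_, b_cf, gamma = ols_multi([x, v_hat], y)

# IV estimate for comparison (ratio of cross-products of deviations)
mz, mx, my = sum(z) / 6, sum(x) / 6, sum(y) / 6
b_iv = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
        / sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)))

# Plain OLS slope; it differs from the IV estimate when gamma is nonzero
_, b_ols = ols_multi([x], y)
print(round(b_cf, 6), round(b_iv, 6), round(b_ols, 6), round(gamma, 6))
```

A residual coefficient that is statistically different from zero indicates that OLS and IV give systematically different answers, which is the sense in which the regressor is judged endogenous.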
5.3: Coefficients
The estimated coefficient on founderCEO using the IV procedure is -8.835202, and the
Comparatively, the Second Stage of the Hausman test: Estimated Coefficient on founderCEO ≈ 0.1107.
Therefore, the magnitude of the coefficient on founderCEO from the IV procedure is much larger in
In Column II, the coefficient on founderCEO was estimated to be approximately 0.0334, and the
associated t-statistic was around 2.96. The coefficient was positive and statistically significant.
When using the IV procedure in Column III, the coefficient on founderCEO is substantially
larger in magnitude (-8.835202) and is statistically significant, with a much larger
t-statistic (-6.69). Furthermore, the sign of the coefficient indicates a negative relationship with the
The inference I gain from here is that there is a significant and negative relationship between founderCEO
and NonfamilyFirms. Additionally, the IV procedure provides a better approach to address endogeneity
concerns in the relationship between founderCEO and NonfamilyFirms, given that the standard OLS
Applying the age of a company as the instrument for founderCEO variable in Column III
seems reasonable. The idea is that older firms are less likely to have a founder as their CEO,
suggesting a potential correlation between a company's age and the founderCEO status. This
correlation is essential for the instrument to be reliable. If the company's age is independent of
the error term in the main regression equation (Column III), it can serve as a suitable instrument
for founderCEO. The caveat is to look out for confounding factors that might simultaneously
influence company age and founderCEO status thus tainting instrument validity.
Assessing the validity of using the company's age as an instrument requires tests of instrument
relevance and, where more instruments than endogenous regressors are available,
over-identification. These tests help evaluate whether the instrument is strongly correlated with
founderCEO and whether the estimates are free of bias.
Discussion
Correcting for heteroskedasticity: The main effect of the correction is more reliable inference.
Heteroskedasticity does not bias the OLS coefficient estimates, but it makes them inefficient and
makes the conventional standard errors incorrect, which leads to incorrect inferences.
Robust standard errors for correcting heteroskedasticity improve the estimates and confidence
efficiency in asset utilization to generate profits while Tobin's Q helps indicate whether a firm’s
market value is faring well against its replacement cost. ROA has more value for accounting-
based performance, thus being more informative to internal evaluations, while Tobin's Q is good
About firm dummies: Firm dummies account for unobservable factors that are specific to each
firm and may affect the dependent variable. Including them controls for time-invariant
heterogeneity across firms that, if unaccounted for, could produce variation that appears to come
from the independent variables. The dummies therefore help isolate the effects of the
independent variables. However, including them uses up degrees of freedom and yields a much
larger model, and time-invariant regressors such as hightech are omitted, which makes the
coefficients harder to interpret.
The main result drivers in Anderson & Reeb: The alignment of ownership with firm
management is the main driver of results for firms in this study. Founding families introduce
long-term orientation and stronger stewardship, thus creating grounds for better planning and
Founder CEO Firm Performance: While such firms generally tend to outperform others in
the long term, the gains may be slow to materialize, and over short and medium horizons some
of these firms may underperform or fail (Do et al., 2022). Succession issues can also cause
firms with a record of long-term success to fail. Measured founder-CEO performance may also
vary with ownership concentration and managerial control.
References
Anderson, R. C., & Reeb, D. M. (2003). Founding-family ownership and firm performance:
evidence from the S&P 500. The Journal of Finance, 58, 1301-1327.
https://doi.org/10.1111/1540-6261.00567.
Do, T. N. M., Ha, N. M., Bao, D., & Ngo, T. (2022). The impact of family ownership on firm
https://doi.org/10.1080/23322039.2022.2038417.