Research Assignment Part I
Categorical variables

variable            category     number   per cent   remark
region              1 Tigray         10      25.64   nominal
                    2 Amhara          8      20.51
                    3 Oromiya         9      23.08
                    4 SNNP           12      30.77
woreda              1                12      30.77   nominal
                    2                14      35.90
                    3                13      33.33
employed            public           15      38.46   binary
                    private          24      61.54
sex                 female           12      30.77   binary
                    male             27      69.23
happiness           1                 5      12.82   ordinal
                    2                 8      20.51
                    3                 3       7.69
                    4                 4      10.26
                    5                 7      17.95
                    6                 3       7.69
                    7                 4      10.26
                    8                 5      12.82
satisfaction rate   1                 9      23.08   ordinal
                    2                14      35.90
                    3                16      41.03

Continuous variables

variable                mean        SD          max      min
income                  8807.128    10925.38    45000    -100
consumption             5029.487    14112.63    90000     300
number of individuals   4.410256    2.807115       14       1
age of hh head          34.92308    10.9024        65      23
price of teff           1688.105    1704.931     7000   -1000
4. Check for the existence of outliers using graphs and statistical measures (K, S, mean, median and
sd)
Ans. To detect outliers graphically we use box plots, which apply only to continuous variables.
A box plot flags points lying more than 1.5 times the interquartile range (IQR) below the first
quartile or above the third quartile as possible outliers. Other cut-off points are also in common
use, such as 3 times the IQR, and the box plot does not give us the flexibility to change the
cut-off. We therefore check the suspected variables with other commands, such as the
extremes command.
Let’s check those variables again with the extremes command, this time using 3 times the interquartile range as the cut-off
. extremes income,iqr(3)
The command returns no observations, which means that no value of income lies more than
3 times the interquartile range beyond the quartiles.
. extremes consumption ,iqr(3)
The consumption variable has at least one value lying more than 3 times the interquartile range beyond the quartiles.
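The interquartile-range rule described above can also be checked by hand. The following is a minimal sketch on hypothetical toy data (20 made-up consumption values, not the assignment's dataset); the variable names mild and extreme are illustrative only.

```stata
* Hypothetical toy data: 20 households, one extreme consumption value
clear
set obs 20
generate double consumption = 1000 + 250*_n
replace consumption = 90000 in 20

* Quartiles and interquartile range from summarize, detail
quietly summarize consumption, detail
local q1  = r(p25)
local q3  = r(p75)
local iqr = `q3' - `q1'

* Flag values beyond 1.5*IQR (box plot rule) and 3*IQR (stricter rule)
generate byte mild    = (consumption < `q1' - 1.5*`iqr' | consumption > `q3' + 1.5*`iqr') if !missing(consumption)
generate byte extreme = (consumption < `q1' - 3*`iqr'   | consumption > `q3' + 3*`iqr')   if !missing(consumption)

list consumption if extreme == 1
```

Computing the fences explicitly makes the cut-off a visible parameter, which is the flexibility the box plot lacks.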
Let’s use histograms for the income, consumption and price of teff variables only
We can also identify outliers using z-scores. We generate a variable holding the standardized
value of each observation (its distance from the mean in standard deviations); any observation
whose absolute z-score is greater than 3 is considered an outlier.
After sorting the data we can clearly see that observation 34 of the consumption variable lies
more than 6 standard deviations from the mean.
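A minimal sketch of the z-score check, again on hypothetical toy data rather than the assignment's dataset; z_cons and abs_z are illustrative names.

```stata
* Hypothetical toy data: 20 households, one extreme consumption value
clear
set obs 20
generate double consumption = 1000 + 250*_n
replace consumption = 90000 in 20

* Standardize: z = (x - mean) / sd
egen double z_cons = std(consumption)

* Sort by |z| so the most extreme observations appear last
generate double abs_z = abs(z_cons)
sort abs_z
list consumption z_cons if abs_z > 3 & !missing(abs_z)
```

Note that in a very small sample a single observation cannot reach a large z-score, because the outlier itself inflates the standard deviation, so this rule is best read alongside the IQR check.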
We can use skewness and kurtosis measures to check for the existence of outliers in our data.
The cut-off range we use is -3 to 3 for skewness and -6 to 6 for kurtosis. The consumption
variable has a skewness greater than 3 and a kurtosis greater than 6, suggesting the existence
of outliers.
The skewness and kurtosis of the newly generated variable lcons are within the acceptable
range, so we may use lcons instead of the original consumption variable for further analysis.
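The skewness and kurtosis comparison can be sketched as follows. The data are hypothetical, and treating lcons as the natural log of consumption is an assumption, since the transformation step is not shown in the text.

```stata
* Hypothetical toy data: 20 households, one extreme consumption value
clear
set obs 20
generate double consumption = 1000 + 250*_n
replace consumption = 90000 in 20

* Skewness and kurtosis of the raw variable
quietly summarize consumption, detail
scalar skew_raw = r(skewness)
display "raw: skewness = " %6.3f r(skewness) "  kurtosis = " %6.3f r(kurtosis)

* Assumed transformation: natural log (defined only for positive values)
generate double lcons = ln(consumption) if consumption > 0
quietly summarize lcons, detail
display "log: skewness = " %6.3f r(skewness) "  kurtosis = " %6.3f r(kurtosis)
```

Stata reports kurtosis on a scale where a normal distribution has a value of 3, which is worth keeping in mind when applying any cut-off.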
6. Generate a variable saving (income - consumption); use saving as the new variable name
Ans
. gen saving= income- consumption
7. Generate a dummy variable for saving; use svd as the new variable name
Ans
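. gen svd = saving > 0 if !missing(saving)
(The if qualifier keeps svd missing where saving is missing; without it Stata would code a missing saving as 1.)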
8. Generate a three category for saving (High saver, middle saver and low saver) and use the new
Ans:
. xtile sact= saving,nq(3)
. tab sact
3 quantiles
of saving Freq. Percent Cum.
Total 39 100.00
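The three steps in questions 6 to 8 can be put together as in the sketch below. The nine income and consumption pairs are made up for illustration, and the label names svd_lbl and sact_lbl are hypothetical.

```stata
* Hypothetical toy data: income and consumption for 9 households
clear
input double(income consumption)
 9000  4000
 3000  3500
12000  6000
 1500  1500
20000  8000
 5000  5200
 7000  3000
 2500  2000
45000 15000
end

* Q6: saving; Q7: dummy equal to 1 when saving is positive
generate double saving = income - consumption
generate byte svd = saving > 0 if !missing(saving)
label define svd_lbl 0 "no saving" 1 "has saving"
label values svd svd_lbl

* Q8: three equal-sized groups by saving (terciles)
xtile sact = saving, nq(3)
label define sact_lbl 1 "low saver" 2 "middle saver" 3 "high saver"
label values sact sact_lbl

tabulate sact svd
```

Attaching value labels is optional, but it makes later cross-tabulations readable without a codebook.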
                  sex
svd          Female   Male   Total
no saving         3     11      14
has saving        9     16      25
Total            12     27      39

                  Employed
svd          public   private   Total
no saving        10         4      14
has saving        5        20      25
Total            15        24      39

                  region
svd          Tigray   Amhara   Oromiya   SNNP   Total
no saving         3        7         3      1      14
has saving        7        1         6     11      25
Total            10        8         9     12      39
12. Compute the average saving difference between those who are employed in the public sector
and private sector
Summary of saving
Employed Mean Std. Dev. Freq.
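One way to obtain the group means and their difference is sketched below on hypothetical data. The coding 1 = public and 2 = private is an assumption, as are the scalar names.

```stata
* Hypothetical toy data: sector of employment and saving
clear
input byte employed double saving
1 100
1 200
1 300
2 400
2 600
end
label define emp_lbl 1 "public" 2 "private"
label values employed emp_lbl

* Mean, standard deviation and frequency of saving by sector
tabulate employed, summarize(saving)

* Difference in average saving between the two sectors
quietly summarize saving if employed == 1
scalar mean_public = r(mean)
quietly summarize saving if employed == 2
scalar mean_private = r(mean)
display "private - public = " mean_private - mean_public
```

If a test of whether the difference is statistically significant is also wanted, ttest saving, by(employed) is one option.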
income consum~n
income 1.0000
Income and consumption are positively correlated; the correlation is of medium strength, and its
significance level is less than 10%, which means the correlation between the two is statistically
significant.
priceo~f consum~n
priceofteff 1.0000
The significance level is greater than 10%, hence there is no statistically significant correlation between price of teff and consumption
saving income
saving 1.2e+08
income 2.2e+07 1.2e+08
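The correlation and covariance output discussed above can be produced along the following lines. This is a sketch on six made-up observations, not the assignment's data.

```stata
* Hypothetical toy data
clear
input double(income consumption)
 2000 1500
 4000 3000
 6000 3500
 8000 6000
10000 7000
12000 9000
end
generate double saving = income - consumption

* Pairwise correlation with its significance level printed underneath
pwcorr income consumption, sig

* Variance-covariance matrix of saving and income
correlate saving income, covariance
```

The sig option prints the p-value below each correlation coefficient, which is the number compared with the 10% level in the answers above.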
a) What is the appropriate model and regress based on that … please use the following
Ans a. The appropriate model is OLS regression, since our dependent variable is continuous.
The overall significance test (F test) shows a Prob > F value close to zero, so the independent
variables as a group significantly affect the dependent variable.
The R2 value of 0.6386 indicates that 63.86% of the variation in the dependent variable, saving,
can be explained by the variation in the independent variables.
Individually, consumption and satisfaction 2 are significant, since their p-values are less than 10%.
Because consumption is continuous, its coefficient is interpreted directly: for every one-unit
increase in consumption, saving decreases by 0.6268 units on average, holding the other variables
constant.
For the categorical variables the interpretation is relative to the base category: on average, male
saving is 2186.426 birr higher than female saving; saving in satisfaction category 2 is on average
8021.96 birr higher than in category 1; and saving in category 3 is on average 2959.358 birr lower
than in category 1.
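The interpretation above relies on the categorical regressors entering as indicators measured against a base category. A minimal sketch of such a specification follows; the data are simulated, the coding of sex (1 = female, 2 = male) is an assumption, and the numbers used to generate saving are arbitrary.

```stata
* Simulated toy data (hypothetical, for illustration only)
clear
set seed 12345
set obs 39
generate double consumption = 300 + 8000*runiform()
generate byte sex = 1 + (runiform() < 0.7)
generate byte satisfactionrate = 1 + floor(3*runiform())
generate double saving = 5000 - 0.6*consumption + 2000*(sex == 2) + rnormal(0, 1500)

* i. creates indicators; the lowest level of each variable is the base
regress saving consumption i.sex i.satisfactionrate

* Coefficients can be retrieved by name after estimation
display "consumption slope = " _b[consumption]
display "sex 2 vs sex 1    = " _b[2.sex]
```

With the i. prefix each reported coefficient is a difference from the base category, which is why the interpretation compares category 2 and category 3 with category 1.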
c) Do the four major post estimation tests and comment on it. If there are problems, what is the
solution?
To check for multicollinearity we use the variance inflation factor (VIF): the mean VIF is less
than 10, so there is no serious multicollinearity among the independent variables.
. vif
. ovtest
Since the p-value is less than 0.10, we reject the null hypothesis that the model has no omitted
variables. The specification needs to be revisited, checking that relevant variables are included
and irrelevant ones excluded, and the regression run again.
Heteroskedasticity test
The null hypothesis is that the variance of the residuals is homogeneous. If the p-value is
very small, we reject the null hypothesis in favour of the alternative that the variance is
not homogeneous. In our case the p-value is smaller than 0.10, so we conclude that the
variance of the residuals is not homogeneous.
The appropriate remedy is to run the regression again with the robust option.
Normality test
. predict e,resid
. swilk e
The null hypothesis of the test is that the distribution is normal. In our case the p-value
(.0871) is below the 10% level, so we reject the hypothesis that the residual e is normally
distributed.
The solution is to check whether there are outliers in the continuous variables and take
appropriate measures.
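The four post-estimation checks can be run in sequence as sketched below. The data are simulated and the specification is only a stand-in for the assignment's model; the residual is named e as in the text.

```stata
* Simulated toy data (hypothetical, for illustration only)
clear
set seed 12345
set obs 39
generate double consumption = 300 + 8000*runiform()
generate byte sex = 1 + (runiform() < 0.7)
generate byte satisfactionrate = 1 + floor(3*runiform())
generate double saving = 5000 - 0.6*consumption + 2000*(sex == 2) + rnormal(0, 1500)

regress saving consumption i.sex i.satisfactionrate

estat vif          // multicollinearity: variance inflation factors
estat ovtest       // Ramsey RESET test for omitted variables
estat hettest      // Breusch-Pagan test for heteroskedasticity

predict double e, residuals
swilk e            // Shapiro-Wilk test for normality of residuals

* Remedy for heteroskedasticity: robust standard errors
regress saving consumption i.sex i.satisfactionrate, vce(robust)
```

The robust option changes only the standard errors, not the coefficients, so it addresses inference under heteroskedasticity but not omitted-variable or normality problems.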
17. If your research is determinants of saving where your dependent is (svd) and you identified:
satisfactionrate
2 -.5364898 1.281625 -0.42 0.676 -3.048429 1.975449
3 -2.917887 1.305471 -2.24 0.025 -5.476563 -.3592119
Prob > chi2 – This is the probability of obtaining the chi-square statistic given that the null
hypothesis is true. In other words, this is the probability of obtaining this chi-square statistic
(11.17) if there is in fact no effect of the independent variables, taken together, on the
dependent variable. This p-value is compared to a critical value, perhaps 0.1, 0.05 or 0.01,
to determine if the overall model is statistically significant. In this case, the model is
statistically significant because the p-value is less than 0.1.
The R2 reported for a logit model is a pseudo R2; logistic regression does not have an
equivalent to the R-squared found in OLS regression, so it should not be read as the share
of variance explained.
For individual significance we check the p-value (P>|z|): if it is less than 0.1 the variable
is considered significant, otherwise insignificant. Hence from the above output only sex 2
and satisfactionrate 3 are significant.
The coefficient (or parameter estimate) for the variable consumption is -0.0000366. This
means that for a one-unit increase in consumption we expect a 0.0000366 decrease in the
log-odds of the dependent variable svd, holding all other independent variables constant.
c) Check post estimation test
. lfit
number of observations = 39
number of covariate patterns = 34
Pearson chi2(29) = 27.56
Prob > chi2 = 0.5415
Since the p-value is greater than 0.10, we fail to reject the null hypothesis that the model fits the data, so we can say that the model fits well
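A sketch of the goodness-of-fit check on simulated data is given below. The coefficients used to simulate svd are arbitrary, and estat gof is used here as the current name of the command that lfit invokes.

```stata
* Simulated toy data (hypothetical, for illustration only)
clear
set seed 2024
set obs 200
generate double consumption = 300 + 8000*runiform()
generate byte sex = 1 + (runiform() < 0.7)
generate byte satisfactionrate = 1 + floor(3*runiform())
generate double xb = 1.5 - 0.0003*consumption - 0.8*(sex == 2) + 0.5*(satisfactionrate == 3)
generate byte svd = runiform() < invlogit(xb)

* Logit model with factor-variable indicators
logit svd consumption i.sex i.satisfactionrate

* Pearson goodness-of-fit test; a large p-value means no evidence of poor fit
estat gof
```

When a continuous regressor makes the number of covariate patterns close to the number of observations, the Hosmer-Lemeshow version, estat gof, group(10), is often preferred.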
d) Find the probability of saving for a person who is male and very satisfied and who has an
Average consumption value
log(p/(1-p)) = b0 + b1*(mean consumption) + b2*(sex 2) + b3*(satisfactionrate 3)
where p is the probability of a person saving, and sex 2 and satisfactionrate 3 are 0/1
indicators that both equal 1 for a male who is very satisfied
log(p/(1-p)) = 3.5787 - 0.0000366*5029.487 - 1.7989*1 - 2.9178*1
= 3.5787 - 0.1841 - 1.7989 - 2.9178 = -1.3221
p = exp(-1.3221)/(1 + exp(-1.3221)), which gives a probability of about 0.21
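The conversion from log-odds to a probability can also be done with Stata's invlogit() function. The sketch below uses the coefficients quoted in the text and assumes that sex 2 and satisfactionrate 3 enter the model as 0/1 indicators, which is how factor-variable output is coded; the scalar names are illustrative.

```stata
* Coefficients and mean consumption as reported in the text
scalar b0        = 3.5787
scalar b_cons    = -0.0000366
scalar b_sex2    = -1.7989
scalar b_sat3    = -2.9178
scalar mean_cons = 5029.487

* Assumption: both indicators equal 1 for a very satisfied male
scalar xb = b0 + b_cons*mean_cons + b_sex2*1 + b_sat3*1

* invlogit(x) = exp(x) / (1 + exp(x))
scalar p = invlogit(xb)
display "log-odds = " %8.4f xb "   probability = " %6.4f p
```

After fitting the model, the margins command with at() options is another route to the same kind of prediction without copying coefficients by hand.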
Ans: The dependent variable will be satisfactionrate and the appropriate model will
be ordered logit.
Prob > chi2 – This is the probability of getting a likelihood ratio (LR) test statistic as
extreme as, or more so, than the observed under the null hypothesis; the null
hypothesis is that all of the regression coefficients in the model are equal to zero
This p-value is compared to a specified alpha level, our willingness to accept a type I
error, which is typically set at 0.1. The small p-value from the LR test would lead us
to conclude that at least one of the regression coefficients in the model is not equal
to zero.
z and P>|z| – These are the test statistics and p-value, respectively, for the null
hypothesis that an individual predictor’s regression coefficient is zero given that the
rest of the predictors are in the model. The test statistic z is the ratio of the Coef. to
the Std. Err. of the respective predictor. The z value follows a standard normal
distribution which is used to test against a two-sided alternative hypothesis that the
Coef. is not equal to zero. The probability that a particular z test statistic is as
extreme as, or more so, than what has been observed under the null hypothesis is
defined by P>|z|. The z test statistic for the predictor price (0.19/0.12) is -1.18, with
an associated p-value greater than 0.1. If we again set our alpha level to 0.1, we
fail to reject the null hypothesis and conclude that the regression coefficient for price
is not statistically different from zero in estimating satisfaction, given
that employed and family size are in the model.
Coef. – These are the ordered log-odds (logit) regression coefficients. Standard
interpretation of the ordered logit coefficient is that for a one unit increase in the
predictor, the response variable level is expected to change by its respective
regression coefficient in the ordered log-odds scale while the other variables in the
model are held constant.
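A sketch of an ordered logit of this kind on simulated data follows. The predictor names price, employed and familysize follow the text, but all values, cut-points and coefficients are made up for illustration.

```stata
* Simulated toy data (hypothetical, for illustration only)
clear
set seed 4321
set obs 200
generate double price = 500 + 3000*runiform()
generate byte employed = 1 + (runiform() < 0.6)
generate byte familysize = 1 + floor(8*runiform())

* Latent satisfaction with logistic noise, cut into three ordered levels
generate double u = runiform()
generate double latent = -0.0004*price + 0.6*(employed == 2) + 0.1*familysize + ln(u/(1 - u))
generate byte satisfactionrate = 1 + (latent > -0.5) + (latent > 1)

* Ordered logit: coefficients are on the ordered log-odds scale
ologit satisfactionrate price i.employed familysize
```

Running ologit, or after estimation redisplays the results as proportional odds ratios, which some readers find easier to interpret than log-odds.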
Ans. The dependent variable is choice of saving institution and the appropriate model will be
multinomial logit.
20. If your study is determinants of saving and if your dependent variable is sact,
Ans. The dependent variable is sact and the appropriate model will be ordered logit.
Ans. The interpretation will be the same as that of question 18 above.
Part II Qualitative part
Case I
1. What could be the possible title of the research?
ANS:
Determinant Factors of job satisfaction in public sector: the case of Mash Woreda,
Sheka zone.
2. What are the limitations associated with the problem statement?
Ans:
The problem statement as a whole lacks coherence and the problem is not well stated;
besides, the extent and severity of the problem are not mentioned at all.
3. What are your comments with the objectives?
Ans:
Ans:
In this paper the researcher says nothing about how he plans to collect the
data.
race
Figure 1.1
12. What are your comments on the model specifications and estimation method? If it
is wrong, what do you suggest?
ANS:
The model chosen in this paper is the OLS regression model, but the nature
of the dependent variable shows it is ordered data; hence an ordered logit model
looks appropriate.
Case II
1 What could be the possible title of the research?
ANS:
Ans:
the problem statement as a whole lacks coherence and the problem is not well stated;
besides, the extent and severity of the problem are not mentioned at all
no scientific papers are mentioned
most of the issue is not about the objective
abbreviations are difficult to understand
language issues
3 What are your comments with the objectives?
Ans:
Ans:
In this paper the researcher says nothing about how he plans to collect the
data.
8 Comment on the method of analysis
Ans:
The decision to use a logit model is fine if the dependent variable is treated as
binary (for example, under poverty or not), but an ordered logit model would be
more useful for describing poverty at different levels.
9 Comment on the nature of the dependent variable
Ans:
The nature of the dependent variable can be binary or ordinal.
10 Could you list possible independent variables?
Ans:
The possible independent variables include age, sex, marital status,
family size, etc.
11 Can you draw conceptual framework
Geography
Access to loan
Demographic: Age, Sex, Religion
Personal: Marital status, Family size
Figure 2.1
12 What are your comments on the model specifications and estimation method? If it
is wrong, what do you suggest?
ANS:
The model chosen on this paper is the logit model and it is okay to use this
model since poverty can be described as a binary variable.
Case III: Given the following research title: “Factors affecting loan repayment performance
of small holder farmers: The case of ACSI” answer the following questions
In countries like Ethiopia, smallholder farmers dominate the economy: they work on 96.3 percent
of the total cultivated area and produce over 95 percent of the national crop production
(CSA, 2007). However, smallholder farmers face a severe shortage of financial resources to
purchase productive agricultural inputs. In addition, the rapid rise in the price of agricultural
inputs every year increases the demand for financial institutions that can support these
subsistence farmers.
It is important that borrowed funds be invested for productive purposes, and the additional incomes
generated be used to repay loans to have sustainable and viable production processes and credit
institutions. However, failure by farmers to repay their loans on time or to repay them at all has
been a serious problem faced by both credit institutions and smallholder farmers. Poor loan
repayment in developing countries has become a major problem in agricultural credit
administration, especially to smallholders who have limited collateral capabilities (Okorie, 2004).
Failure by farmers to repay their loans will discourage credit institutions from giving credit to
other farmers who need financial support, which will finally lead to system failure.
Therefore this study will try to identify the different factors affecting the loan repayment
performance of smallholder farmers. Few studies have been conducted on this topic (Million
2012; Kebede 2016). All previous studies used descriptive analysis, which does not describe the
important variables well; hence this study will use econometric analysis to see the extent of the
relationship among variables and will incorporate important socio-economic variables which were
missing in previous studies.
1. To identify the different variables which affect the loan repayment of farmers.
2. To estimate the extent of the effect of the important variables on loan repayment and to
suggest policy options.