
clear

capture log close

log using cost.log, replace
import excel data.xlsx, firstrow

/* Sort the data and take a look at the scatter
to see if there appears to be an obvious
relationship between y and x. */

gsort x
list y x
scatter y x

/* The general pattern is that when x <= 10, y=0,
and when x >= 18, y=1. There are two exceptions
to this: (x=7, y=1) and (x=20, y=0). */
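
/* As a quick optional check, list any observations that break the
pattern described above -- this should return exactly the two
exceptions. */
list y x if (x <= 10 & y == 1) | (x >= 18 & y == 0)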

logit y x

/* Predicted probabilities, i.e., the sigmoid function applied to XB */

predict y_hat, pr

/* Linear prediction of XB without feeding
XB into the sigmoid function */

predict xb, xb

/* Feed xb into the logistic function, then
look at the average difference between the generated
predictions and our by-hand calculation. */

gen y_hat_hand= 1/(1+exp(-xb))

gen diff= y_hat - y_hat_hand


summ diff

/* Differences are tiny and due to floating-point
precision. */
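
/* Optional sanity check: assert that every difference is negligible
(the 1e-6 tolerance is just a generous cutoff). */
assert abs(diff) < 1e-6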

drop diff

/* Round the predictions to 0 or 1, then compare to
the actual yi's by examining the differences. */

gen y_hat_rounded= round(y_hat)

gen diff= y - y_hat_rounded

/* Map the differences into absolute values so
that positive and negative differences don't net out. */

egen total_wrong= total(abs(diff))

/* Calculate accuracy (proportion of correct predictions) */

scalar accuracy= 1 - (total_wrong/_N)
di accuracy
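
/* Another way to see the same result: cross-tabulate the actual
outcomes against the rounded predictions. The off-diagonal cells
are the misclassified observations. */
tab y y_hat_rounded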

/* Look at a graph of the actual yi's, the model's
predictions, and the x values */

twoway scatter y y_hat x

/* Can see above that the predictions follow the
logistic curve. */

/* Next, look at actual yi's vs. rounded predictions
for each yi. */

twoway scatter y y_hat_rounded x

/* On this graph, the red markers for y_hat_rounded overwrite
the blue markers for y when y_hat_rounded equals y.

There are two cases where we see a blue yi value
associated with a red prediction. Again, these
are x=7 and x=20, which are the two erroneous
predictions from the model. */

/* Next, consider the logistic regression cost
function, -yi*ln(g(XB)) - (1-yi)*ln(1-g(XB)),
where g() is the logistic function. For values
of yi=1, this reduces to -ln(g(XB)); for values
of yi=0, it reduces to -ln(1-g(XB)). */

/* Let's explore why the cost function is set
up this way. Look at the values below of
negative one times the log of g(XB). */

**** Large values of g(XB) ******

di -ln(1)
di -ln(.99)
di -ln(.8)
di -ln(.6)

*** Small values of g(XB) *****

di -ln(.12)
di -ln(.08)
di -ln(.01)
di -ln(.000001)

/* For large values of g(XB), -ln(g(XB)) is fairly
close to -ln(1), which is 0. Small values of g(XB)
mean taking the log of a small number, and since
ln(x) goes to negative infinity as x approaches 0
from above, -ln(g(XB)) becomes very large. */
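
/* To visualize the asymptote, graph -ln(x) over (0, 1]; the cost
explodes as the predicted probability for a yi=1 case approaches 0. */
twoway function y = -ln(x), range(0.001 1)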

/* Generate a variable equal to the cost
function for each outcome of yi. */

gen costfx= -ln(y_hat) if y==1
replace costfx= -ln(1-y_hat) if y==0

/* Above, note that (1-yi) = (1-0) = 1
when yi=0, and 1 times anything is itself. */
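
/* Equivalently, the two cases can be written as a single expression.
The difference from the piecewise version above should be essentially
zero (costfx_combined and costdiff are scratch variables). */
gen costfx_combined= -(y*ln(y_hat) + (1-y)*ln(1-y_hat))
gen costdiff= costfx - costfx_combined
summ costdiff
drop costfx_combined costdiff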

*************************
/* Cost Function Cases */
*************************

/* When yi=1 and we predict something close to 1,
we end up taking the log of something close to 1,
which is close to 0. So here, the cost function
will not blow up.

But if yi=1 and we predict something close to
0, we have ln(something close to 0), which
is a large number in absolute value terms and
thus blows up the cost function.

When yi=0, we take the log of one minus
our prediction. If our prediction is also
close to zero, then we have ln(1 - something close to 0),
which is approximately ln(1-0) = ln(1) = 0, so
again, the cost function will not blow up.

However, if yi=0 and we predict 1, then
we have ln(1 - something close to one) =
ln(something small) = something relatively
large in absolute value terms.

And this is where the cost function blows up
for those cases. Examine the table below
to see this. */

list y x y_hat y_hat_rounded costfx
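
/* As an optional follow-up, sorting by cost in descending order
should put the two misclassified observations, with their large
cost values, at the top of the list. */
gsort -costfx
list y x y_hat y_hat_rounded costfx in 1/2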


log close
