capture log close

log using cost.log, replace
import excel data.xlsx, firstrow clear

/* Sort the data and look at the scatter plot to see
whether there is an obvious relationship between y and x. */

gsort x
list y x
scatter y x

/* The general pattern is that y=0 when x <= 10 and
y=1 when x >= 18. There are two exceptions to this:
(x=7, y=1) and (x=20, y=0). */

logit y x

/* Predicted probabilities, i.e. XB passed through the sigmoid (logistic) function */

predict y_hat, pr

/* Linear prediction XB, without feeding XB into
the sigmoid function */

predict xb, xb

/* Feed xb into the logistic function by hand, then
summarize the differences between Stata's predictions
and our by-hand calculation. */

gen y_hat_hand= 1/(1+exp(-xb))

gen diff= y_hat - y_hat_hand

summ diff

/* The differences are tiny and due to floating-point
precision. */
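
/* As a cross-check (a sketch, not part of the original
workflow): Stata's built-in invlogit() computes the same
logistic transform, exp(xb)/(1+exp(xb)). The variable name
y_hat_builtin and the 1e-5 tolerance are illustrative
choices; the assert stops the do-file if the relative
difference exceeds that tolerance. */

gen y_hat_builtin= invlogit(xb)
assert reldif(y_hat, y_hat_builtin) < 1e-5
drop y_hat_builtin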

drop diff

/* Round the predictions to 0 or 1, then compare them to
the actual yi's by examining the differences. */

gen y_hat_rounded= round(y_hat)
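
/* A hedged aside on what the rounding rule implies,
assuming the logit above is still the most recent
estimation and the coefficient on x is nonzero:
y_hat = 0.5 exactly when XB = 0, so the rounded
prediction switches at x = -_b[_cons]/_b[x]. */

di "Implied switch point in x: " -_b[_cons]/_b[x]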

gen diff= y - y_hat_rounded

/* Map the differences into absolute values so that
positive and negative differences don't net out. */

egen total_wrong= total(abs(diff))

/* Calculate accuracy as a proportion (share of correct predictions) */

scalar accuracy= 1- (total_wrong/ _N)
di accuracy
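
/* One way to cross-check this figure, assuming logit is
still the most recent estimation: the postestimation
command estat classification reports a "Correctly
classified" percentage using a default cutoff of 0.5,
which should correspond to the rounding rule used above. */

estat classification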

/* Look at a graph of the actual yi's and the model's
predictions against the x values */

twoway scatter y y_hat x

/* The plot above shows that the predictions follow the
logistic curve. */

/* Next, look at the actual yi's vs. the rounded
prediction for each yi. */

twoway scatter y y_hat_rounded x

/* On this graph, the y_hat_rounded values, which are
the red markers, overwrite the blue markers of y
wherever y_hat_rounded equals y.

There are two cases where a blue yi value is still
visible alongside a red prediction. Again, these are
x=7 and x=20, the two erroneous predictions from
the model. */

/* Next, consider the logistic regression cost function.
For an observation with yi=1, the cost is -ln(g(XB)).
For an observation with yi=0, the cost is
-(1-yi)ln(1-g(XB)) = -ln(1-g(XB)).
Both cases are covered by the single expression
-yi*ln(g(XB)) - (1-yi)*ln(1-g(XB)),
where g() is the logistic function. */

/* Let's explore why the cost function is set up this
way. Look at the values below of negative one times
the log of g(XB). */

**** Large values of g(XB) ******

di -ln(1)
di -ln(.99)
di -ln(.8)
di -ln(.6)

*** Small values of g(XB) *****

di -ln(.12)
di -ln(.08)
di -ln(.01)
di -ln(.000001)

/* For large values of g(XB), i.e. values close to 1,
ln(g(XB)) is close to ln(1), which is 0. For small
values of g(XB), we take the log of a number close
to 0. Since ln(x) goes to negative infinity as x
approaches 0 from above, -ln(x) grows without bound. */
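
/* To visualize this asymptote, a quick sketch using
twoway function. The lower bound of 0.001 is an
arbitrary choice that keeps the plotted values finite. */

twoway function y = -ln(x), range(0.001 1) xtitle("g(XB)") ytitle("-ln(g(XB))")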

/* Generate a variable equal to the cost function
for each outcome of yi. */

gen costfx= -ln(y_hat) if y==1

replace costfx= -ln(1-y_hat) if y==0

/* Note that in the yi=0 case above, (1-yi) = (1-0) = 1,
and 1 times anything is itself, so the (1-yi) factor
drops out. */
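
/* A sketch of the same cost written as one expression,
-yi*ln(g(XB)) - (1-yi)*ln(1-g(XB)), which should match
costfx observation by observation. The name costfx_one
and the 1e-5 tolerance are illustrative choices. Assuming
logit is still the most recent estimation, the sum of the
costs should also be close to the negative of the log
likelihood stored in e(ll); small gaps are expected
because y_hat is stored as a float. */

gen costfx_one= -y*ln(y_hat) - (1-y)*ln(1-y_hat)
assert reldif(costfx, costfx_one) < 1e-5
drop costfx_one

quietly summ costfx
di "Total cost: " r(sum)
di "-e(ll):     " -e(ll)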

/* Cost Function Cases */

/* When yi=1 and we predict something close to 1,
we take the log of something close to 1, which is
close to 0. So here, the cost function will not
blow up.

But if yi=1 and we predict something close to 0,
we have ln(something close to 0), which is a large
negative number. Multiplying by -1 gives a large
positive cost, so the cost function blows up.

When yi=0, we take the log of one minus our
prediction. If our prediction is also close to 0,
then we have ln(1 - something close to 0), which is
approximately ln(1-0) = ln(1) = 0, so again the
cost function will not blow up.

However, if yi=0 and we predict something close
to 1, then we have ln(1 - something close to 1) =
ln(something small), which is a large negative
number, and the cost is again large and positive.

These are the cases where the cost function blows
up. Examine the table below to see this. */

list y x y_hat y_hat_rounded costfx
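
/* A short follow-up sketch: because predictions are
rounded at 0.5, a misclassified observation has a cost
of at least ln(2), about 0.693, while a correctly
classified one has a cost of at most ln(2). Apart from
a prediction of exactly 0.5, listing the observations
above that threshold should therefore isolate the
model's errors. */

list y x y_hat costfx if costfx > ln(2) & !missing(costfx)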

log close

