if you want to learn about computer

science and the Art of programming this

course is where to start cs50 is

considered by many to be one of the best

computer science courses in the world

this is a Harvard University course

taught by Dr. David Malan and we are

proud to bring it to the freeCodeCamp

YouTube channel throughout a series of

lectures Dr. Malan will teach you how to

think algorithmically and solve problems

efficiently and make sure to check the

description for a lot of extra resources

that go along with the course

[Music]

this is CS50 Harvard University's introduction to the

intellectual enterprises of computer

science and the art of programming back

here on campus in beautiful Sanders

Theatre for the first time in quite a

while so welcome to the class my name is

David Malan

and I took this class myself some time

ago but almost didn't it was

sophomore fall and I was sitting in on


the class and I was a little curious but

I didn't really feel like the field was for

me I was definitely a computer person

but computer science felt like something

else altogether and I only got up the nerve

to take the class ultimately because the

professor at the time Brian Kernighan

allowed me to take the class pass fail

initially and that is what made all the

difference I quickly found that computer

science is not just about programming

and working in isolation on your

computer it's really about problem

solving more generally and there was

something about homework frankly that

was like actually fun for perhaps the

first time in what 19 years and there

was something about this ability that I

discovered along with all of my

classmates to actually create something

and bring a computer to life to solve a

problem and sort of bring to bear

something that I'd been using every day

but didn't really know how to harness

that's been gratifying ever since and

definitely challenging and frustrating

like to this day all these

years later you're going to run up

against mistakes otherwise known as bugs


in programming that just drive you nuts

and you feel like you've hit a wall but

the trick really is to give it enough

time to take a step back take a break

when you need to and there's nothing

better I dare say than that sense of

gratification and pride really when you

get something to work and in a class

like this present ultimately at term's

end something like your very own final

project now this isn't to say that I

took to it 100% perfectly in fact just

this past week I looked in my

old cs50 binder which I still have from

some 25 years ago and took a photo of

what was apparently the very first

program that I wrote and submitted and

quickly received minus two points on but

this is a program that we'll soon see in

the coming days that does something

quite simply like print hello cs50 in

this case to the screen and to be fair I

technically hadn't really followed the

directions which is why I lost those

couple of points but if you just look at

this especially if you've never

programmed before you might have heard

about programming languages but you've

never typed something like this out

undoubtedly it's going to look cryptic


but unlike human languages frankly

which are a lot more sophisticated a

lot more vocabulary a lot more

grammatical rules programming once you

start to wrap your mind around what it

is and how it works and what these

various languages are it's so easy

you'll see after a few months of a class

like this to start teaching yourself

subsequently other languages as they may

come in the coming years as well so what

ultimately matters in this particular

course is not so much where you end up

relative to your classmates but where

you end up relative to yourself when you

began and indeed you'll begin today and

the only experience that matters

ultimately in this class is your own and

so consider where you are today consider

perhaps just how cryptic something like

that looked a few seconds ago and take

comfort in knowing just some months from

now all of that will be within your own

grasp and if you're thinking that okay

surely the person in front of me to the

left to the right behind me knows more

than me that's statistically not the

case two-thirds of cs50 students have

never taken a CS course before which is


to say you're in very good company

throughout this whole term

so then what is computer science I claim

that it's problem solving and the upside

of that is that problem solving is

something we sort of do all the time but

a computer science class learning to

program I think kind of cleans up your

thoughts it helps you learn how to think

more methodically more carefully more

correctly more precisely because

honestly the computer is not going to do

what you want unless you are correct and

precise and methodical and so as such

there's these sort of fringe benefits of

just wanting to think like a computer

scientist and a programmer and it

doesn't take all that much to start

doing so this for instance is perhaps

the simplest picture of computer science

sure but really problem solving in

general problems are all about taking

input like the problem you want to solve

you want to get the solution AKA output

and so something interesting has got to

be happening in here in here when you're

trying to get from those inputs to

outputs now in the world of computers

specifically we need to decide in

advance how we represent these inputs


and outputs we all just need to decide

whether it's Macs or PCs or phones or

something else that we're all going to

speak some common language irrespective

of our human languages as well and you

may very well know that computers tend

to speak only what language so to speak

assembly is one but binary too might be

your go-to and binary bi implying two

means that the world of computers has

just two digits at its disposal zero and

one and indeed we humans have many more

than that certainly not just zeros and

ones alone but a computer indeed only

has zeros and ones and yet somehow they

can do so much they can crunch numbers

in Excel send text messages create

images and artwork and movies and

more and so how do you get from

something as simple as a few zeros a few

ones to all of the stuff that we're

doing today in our pockets and laptops

and desktops well it turns out that we

can start quite simply if a computer

were to want to do something as simple

as count well what could it do well in

our human world we might count doing

this like one two three four five using

so-called unary notation literally the


digits on your fingers where one finger

represents one person in the room if I'm

for instance taking attendance now we

humans would typically actually count

one two three four five six and we'd go

past just those five digits and count

much higher using zero through nine

but computers somehow only have

these zeros and ones so if a computer

only somehow speaks binary zeros and

ones how does it even count past the

number one well here are three zeros of

course and if you translate this number

in binary zero zero zero to a more

familiar number in decimal we would just

call this zero enough said if we were to

represent with a computer the number one

it would actually be zero zero one which

not surprisingly is exactly the same as

we might do in our human world but we

might not bother writing out the two

zeros at the beginning but a computer

now if it wants to count as high as two

it doesn't have the digit two and so it

has to use a different pattern of zeros

and ones and that happens to be zero one

zero so this is not ten with a zero in

front of it it's indeed zero one zero in

the context of binary and if we want to

count higher now than two we're going to


have to tweak these zeros and ones

further to get three and then if we want

four or five or six or seven we're just

kind of toggling these zeros and ones

AKA bits for binary digits that represent

via these different patterns different

numbers that you and I as humans know of

course as the so-called decimal system

zero through nine dec implying ten ten

digits those zeros through nine so why

that particular pattern and why these

particular zeros and ones well it turns

out that representing one thing or the

other is just really simple for a

computer why at the end of the day

they're powered by electricity and it's

a really simple thing to just either

store some electricity or don't store

some electricity like that's as simple

as the world can get on or off one or

zero so to speak so in fact inside of a

computer a phone anything these days

that's electronic pretty much is some

number of switches otherwise known as

transistors and they're tiny you've got

thousands millions of them in your Mac

or PC or phone these days and these are

just tiny little switches that can get

turned on and off and by turning those


things on and off in patterns a computer

can count from zero on up to seven and

even higher than that and so these

switches really you can think of as being

like switches like this let me just

borrow one of our little stage lights

here here's a light bulb it's currently

off and so I could just think of this as

representing in my laptop a transistor a

switch representing zero but if I allow

some electricity to flow now I in fact

have a one well how do I count higher

than one I of course need another light

bulb so let me grab another one here and

if I put it in that same kind of pattern

I don't want to just do this that's sort

of the old finger counting way of unary

just one two I want to actually take

into account the pattern of these things

being on and off so if this was one a

moment ago what I think I did earlier

was I turned it off and let the next one

over be on AKA zero one zero and let me

get us a third bit if you will and that

feels like enough here is that same

pattern now starting at the beginning

with three so here is zero zero zero

here is zero zero one here is zero one

zero AKA in our human world of decimal

two and then we could of course keep


counting further this now would be three

and dot dot dot if this other bulb now

goes on and that switch is turned and

all three stay on this again was what

number

okay so seven so it's just as simple

relatively as that if you will but how

is it that these patterns came to be

well these patterns actually follow

something very familiar you and I don't

really think about it at this level

anymore because we've probably been

doing math and numbers since grade

school or whatnot but if we consider

something in decimal like the number 123

I immediately jumped to that this looks

like 123 in decimal but why it's really

just three symbols a one a two with a

bit of curve a three with a couple of

curves that you and I know instinctively

just assign meaning to but if we do

rewind a few years that is 123 because

you're assigning meaning to each of

these columns the three is in the

so-called ones place the two

is in the so-called tens place and the

one is in the so-called hundreds place

and then the math ensues quickly in your

head this is technically 100 times one


plus ten times two plus one times three

AKA 100 plus 20 plus three and there we

get the sort of mathematical notion we

know is 123.

well nicely enough in binary it's

actually the same thing it's just these

columns mean a little something

different if you use three digits in

decimal and you have the ones place the

tens place and the hundreds place well

why was that 1 10 and 100 they're

technically just powers of ten so 10 to

the zero ten to the one ten to the 2 why

10 decimal system dec meaning ten you

have ten digits zero through

nine in the binary system if you're

going to use three digits just change

the bases if you're using only what

zeros and ones so now it's powers of two

two to the zero two to the one two to

the two AKA one and two and four

respectively and if you keep going it's

going to be the eights column the sixteens column

32 64 and so forth
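
The column idea above can be sketched in a few lines of Python. This is only an illustration: the helper names `place_values` and `to_decimal` are made up for this example and are not part of the lecture.

```python
def place_values(digits: str, base: int) -> list[tuple[int, int]]:
    """Pair each digit with the value of its column, leftmost column first."""
    width = len(digits)
    return [(int(d, base), base ** (width - 1 - i)) for i, d in enumerate(digits)]


def to_decimal(digits: str, base: int) -> int:
    """Add up digit times column value, the same way 123 is worked out by hand."""
    return sum(digit * column for digit, column in place_values(digits, base))


# Decimal: the columns are powers of ten (100, 10, 1).
print(place_values("123", 10))
print(to_decimal("123", 10))

# Binary: the columns are powers of two (4, 2, 1).
print(place_values("010", 2))
print(to_decimal("010", 2))
```

The only thing that changes between the two calls is the base, which is the point being made about ones, tens, hundreds versus ones, twos, fours.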

so why did we get these patterns that we

did here's your zero zero zero because

it's four times zero two times zero one

times zero obviously zero this is why we

got the decimal number one in binary

this is why we got the number two in


binary because it's four times zero plus

two times one plus one times zero and

now three and now four and now five and

now six and now seven and of course if

you wanted to count as high as eight to

be clear like what do you have to do

what does a computer need to do to count

even higher than seven

add a bit add another light bulb another

switch and indeed computers have

standardized just how many zeros and

ones or bits or switches they throw at

these kinds of problems and in fact most

computers would typically use at least

eight at a time and even if you're only

counting as high as three or seven you

would still use eight and have a whole

bunch of zeros but that's okay because

the computers these days certainly have

so many more thousands millions of

transistors and switches that that's

quite okay
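
As a rough sketch of the counting just described, the Python below prints zero through seven as three-bit patterns and then shows the same small numbers padded out to eight bits. The function name `bit_pattern` is an assumption for this example only.

```python
def bit_pattern(number: int, width: int) -> str:
    """Return `number` as zeros and ones, padded with leading zeros to `width` bits."""
    if not 0 <= number < 2 ** width:
        raise ValueError(f"{number} does not fit in {width} bits")
    return format(number, f"0{width}b")


# Three light bulbs: every pattern from 000 up to 111.
for n in range(8):
    print(n, bit_pattern(n, 3))

# Eight bits at a time: small numbers just carry a run of leading zeros.
print(bit_pattern(3, 8))
print(bit_pattern(7, 8))
```

Asking for eight in three bits raises an error here, which mirrors the answer given above: to count higher than seven you need to add a bit.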

all right so if with that said if we can

now count as high as seven or frankly as

high as we want that only seems to make

computers useful for things like Excel

like number crunching but computers of

course let you send text messages write

documents and so much more so how would


a computer represent something like a

letter like the letter A of the English

alphabet

if at the end of the day all they have

is switches

any thoughts yeah

okay so we could represent letters using

numbers okay so give me what's a

proposal what number should represent

what

perfect yeah we just all have to agree

somehow that one number is going to

represent one letter so one is a two is

B three is C uh Z is 26 and so forth

maybe we can even take into account

uppercase and lowercase we just have to

agree and sort of write it down in some

global standard and humans indeed did

just that they didn't use one two three

turns out they started a little higher

up capital A has been standardized as

the number 65 and capital B has been

standardized as the number 66 and you

can kind of imagine how it goes up from

there and that's because

whatever you're representing ultimately

can only be stored at the end of the day

as zeros and ones and so some humans in

a room before decided that capital A

shall be 65 or really this pattern of


zeros and ones inside of every computer

in the world zero one zero zero zero

zero zero one so if that pattern of

zeros and ones if it appears in a

computer it might be interpreted then as

indeed a capital letter a eight of those

bits at a time but I worry just to be

clear we might have now created a

problem it might seem if I play this

naively that okay how do I now actually

do math with the number 65 if now Excel

displays 65 as an A let alone B's and

C's so how might a computer do as you've

proposed have this mapping from numbers

to letters but still support numbers

feels like we've given something up yeah

by having a prefix

okay so we could perhaps have some kind

of prefix like some pattern of zeros and

ones I like this that indicates

to the computer here comes another

pattern that represents a letter here

comes another pattern that represents a

number or a letter so not bad I

like that other thoughts

how might a computer distinguish these

two yeah

indeed and that's spot on nothing wrong

with what you suggested but the world


generally does just that the reason we

have all of these different file formats

in the world like JPEGs and GIFs and

PNGs and

Word documents .docx and Excel files

and so forth is because a bunch of

humans got in a room and decided Well in

the context of this type of file or

really more specifically in the context

of this type of program Excel versus

Photoshop versus Google Docs or the like

we shall interpret any patterns of zeros

and ones as being maybe numbers for

Excel maybe letters in like a text

messaging program or Google Docs or

maybe even colors of the rainbow in

something like Photoshop and more so

it's context dependent and we'll see

when we ourselves start programming you

the programmer will ultimately provide

some hints to the computer that tells

the computer interpret it as follows so

similar in spirit to that but not quite

as standardized with these prefixes so

this system here actually has a name

ASCII the American Standard Code for

Information Interchange and indeed it

began here in the U.S and that's why

it's actually a little biased toward A's

through Z's and a bit of punctuation as


well and that quickly became a problem

but if we start simply now in English

the mapping itself is fairly

straightforward so if a is 65 B is 66

and dot dot dot suppose that you

received a text message an email from a

friend and underneath the hood so to

speak if you kind of looked inside the

computer what you technically received

in this text or this email happened to

be the numbers 72 73 33

or really the underlying pattern of

zeros and ones what might your friend

have sent you as a message if it's 72 73

33

hey close

hi it's indeed hi why well apparently

according to this little cheat sheet H

is 72 I is 73 it's not obvious from this

chart what the 33 is but indeed this

pattern represents hi and anyone want

to guess or if you know what 33 is

exclamation point and this is frankly

not the kind of thing most people know

but it's easily accessible by a nice

user-friendly chart like this so this is

an ASCII chart when I said that we just

need to write down this mapping earlier

this is what people did they wrote it


down in a book or in a chart and for

instance here is our 72 for H here

is our 73 for I and here is our 33

for exclamation point and computers Macs

PCs iPhones Android devices just know

this mapping by heart if you will

they've been designed to understand

those letters so here I might have

received hi technically what I've

received is these patterns of zeros and

ones but it's important to note that

when you get these patterns of zeros and

ones in any format be it email or text

or a file they do tend to come in

standard lengths with a certain number

of zeros and ones all together and this

happens to be eight plus eight plus

eight so just to get the message Hi

exclamation point you would have

received at least it would seem some 24

bits but frankly bits are so tiny

literally and mathematically that we

don't tend to think or talk generally in

terms of bits you're probably more

familiar with bytes
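
Here is a small Python sketch of the message described above, assuming the three numbers arrive as plain ASCII codes. Python's built-in `chr` and `ord` do the chart lookup in each direction; the variable names are just for this example.

```python
# The three numbers from the message, one per character.
message = [72, 73, 33]

# Look each number up in the ASCII chart to get the character back.
text = "".join(chr(n) for n in message)

# Each character occupies one byte, shown here as eight zeros and ones.
bits = [format(n, "08b") for n in message]

print(text)
print(bits)
print(len(message), "bytes, or", 8 * len(message), "bits")

# Going the other way: capital A is 65, stored as the pattern 01000001.
print(ord("A"), format(ord("A"), "08b"))
```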

b-y-t-e-s and

a byte is just eight bits and even those

frankly aren't that useful if we do out

the math how high can you count if you

have eight bits anyone know


say again

uh higher than that unless you want to

go negative that's that's fine

256 technically 255 long story short if

we actually got into the weeds of all of

these zeros and ones and we figured out

what one one one one one one one one

mathematically adds up to in decimal

it would indeed be 255 or less if you

want to represent negative numbers as

well so this is useful because now we

can speak not just in terms of bytes but

if the files are bigger kilobytes is

thousands of bytes megabytes is millions

of bytes gigabytes is billions of bytes

terabytes are trillions of bytes and so

forth

we have a vocabulary for these

increasingly large quantities of data

the problem is that if you're using

ASCII and therefore eight bits or one

byte per character

and originally only seven you can only

represent 255 characters or

actually 256 total characters

including zero and that's fine if you're

using literally English in this case

plus a bunch of punctuation but there's

many human languages in the world that


need many more symbols and therefore

many more bits so thankfully the world

decided that we'll indeed support not

just the U.S English keyboard but all of

the accented characters that you might

want for some languages and Heck if we

use enough bits zeros and ones not only

can we represent all human languages in

written form as well as some emotions

along the way we can capture the latter

with these things called emojis and

indeed these are very much in Vogue

these days you probably send them or

receive many of these things any given

day these are just characters like

letters of an alphabet patterns of zeros

and ones that you're receiving that the

world has also standardized for instance

there are certain emojis that are

represented with certain patterns of

bits and when you receive them your

phone your laptop your desktop displays

them as such and this newer standard is

called Unicode so it's a superset of

what we call ASCII and unicode is just a

mapping of many more numbers to many

more letters or characters more

generally that might use eight bits for

backwards compatibility with the old way

of doing things with ASCII but they


might also use 16 bits and if you have

16 bits you can actually represent more

than 65,000 possible letters and that's

getting up there and heck Unicode might

even use

32 bits to represent letters and numbers

and punctuation symbols and emojis and

that would give you up to 4 billion

possibilities and I dare say one of the

reasons we see so many emojis these days

is we have so much room I mean there's

room for billions more literally so

in fact just as a little bit of trivia

has anyone ever received this decimal

number or if you prefer binary now has

anyone ever received this pattern of

zeros and ones on your phone in a text

or an email perhaps this past year

well if you actually look this up this

esoteric sequence of zeros and ones

happens to represent face with

medical mask and notice that if you've

got an iPhone or an Android device you

might be seeing different things in fact

this is the Android version of this most

recently this is the iOS version of it

most recently and there's Bunches of

other interpretations by other companies

as well
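
The transcript does not record the number that was on the slide, so the sketch below starts from an assumption: that the emoji in question is the Unicode code point U+1F637, which Unicode names "face with medical mask". It uses Python's standard `unicodedata` module and the UTF-8 encoding to show one way that single character can be stored as four bytes.

```python
import unicodedata

# Assumed code point for the emoji being discussed (not stated in the transcript).
emoji = "\U0001F637"

# The standardized description that every vendor draws in its own style.
name = unicodedata.name(emoji)

# One common way to store it: UTF-8, which needs four bytes for this character.
utf8 = emoji.encode("utf-8")
as_bits = " ".join(format(b, "08b") for b in utf8)

# The same four bytes read as one big decimal number.
as_number = int.from_bytes(utf8, "big")

print(name)
print(as_bits)
print(as_number)

# Plain English letters still fit in a single byte, as they did in ASCII.
print("A".encode("utf-8"))
```

What gets standardized is the number and the description; the picture itself is left to each device, which is why the same bytes can look different on different phones.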
so Unicode as a Consortium if you will

has standardized the descriptions of

what these things are but the companies

themselves

manufacturers out there have generally

interpreted them as they see fit and this can lead to

some human miscommunications in fact

for like literally embarrassingly like a

year or two I started being in a habit

of using the Emoji that kind of looks

like this because I thought it was like

happy face or whatever I didn't realize

this is the emoji for hug because

whatever device I was using sort of

looked like this not like this and

that's because of their interpretation

of the data this has happened too when

what was a gun became a water pistol

in some manufacturer's eyes and so it's

an interesting dichotomy between

what information we all want to

represent and how we choose ultimately

to represent it

questions then on these representations

of formats be it numbers or letters or

soon more yeah

and sorry why is what so popular

yeah so we'll come back to this in a few

weeks in fact there are other ways to

represent numbers binary is one decimal


is another unary is another and

hexadecimal is yet a fourth that uses 16

total digits literally zero through nine

plus a b c d e f and somehow you can

similarly count even higher with those

uh we'll see in a few weeks why this is

uh compelling but hexadecimal long story

short uses four bits per digit and so

four bits if you have two digits in HEX

that gives you eight and it's just a

very convenient unit of measure and it's

also human convention in the world of

like files and other things but we'll

come back to that soon other questions
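
As a quick illustration of why two hexadecimal digits line up with eight bits, the Python sketch below shows a few one-byte values both ways. The helper name `byte_views` is hypothetical and exists only for this example.

```python
def byte_views(value: int) -> tuple[str, str]:
    """Show one byte (0 through 255) as eight bits and as two hexadecimal digits."""
    if not 0 <= value <= 255:
        raise ValueError("expected a value that fits in one byte")
    return format(value, "08b"), format(value, "02X")


for value in (0, 72, 255):
    bits, hex_digits = byte_views(value)
    print(value, bits, hex_digits)

# A single hex digit covers exactly four bits: F is 1111, which is 15.
print(int("F", 16), format(int("F", 16), "04b"))
```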

through the lights on the stage

supposedly say anything well if we had

thought in advance to use maybe 64 light

bulbs that would seem to give us uh

eight total bytes on stage eight times

uh eight giving us just that maybe

good question other questions on zeros

and ones

it's a little bright in here

no oh yes

where everyone's pointing somewhere

specific

there we go sorry very bright in this

corner

oh sure and we'll come back to this in


some form in the coming days too at a

slower Pace too we have with eight bits

two possible values for the first and

then two for the next two for the next

and so forth so that's two times two

times two that's two to the eighth power

total which means you can have 256 total

possible patterns of zeros and ones but

as we'll see soon computer scientists

programmers software often starts

counting at zero by convention and if

you use one of those patterns zero zero

zero zero zero zero zero zero to

represent the decimal number we know is

zero you only have 255 other patterns

left to count as high as therefore 255.
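
The arithmetic in that answer, along with the earlier figures for 7, 16 and 32 bits, can be checked with a short Python sketch. The function names here are invented for the example.

```python
def count_patterns(bits: int) -> int:
    """Two choices per bit, multiplied together: 2 to the power of `bits`."""
    return 2 ** bits


def largest_value(bits: int) -> int:
    """One pattern is spent on zero, so the highest count is one less."""
    return count_patterns(bits) - 1


for bits in (3, 7, 8, 16, 32):
    print(bits, "bits:", count_patterns(bits), "patterns, counting up to", largest_value(bits))

# Eight ones in a row is the largest single-byte value.
print(int("11111111", 2))
```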

that's all good question

all right so what then might we have

besides uh these emojis and letters and

numbers well we of course have things

like colors and programs like Photoshop

and pictures and photos well let me ask

the question again how might a computer

do you think knowing what you know now

represent something like a color

like what are our options if all we've

got are zeros and ones and switches yeah

RGB RGB indeed is this acronym

that represents some amount of red and

some amount of green and blue and indeed


computers can represent Colors by just

doing that remembering for instance this

dot this yellow dot on the screen that

might be part of any of those emojis

these days well that's some amount of

red some amount of green some amount of

blue and if you sort of mix those colors

together we can indeed get a very

specific one and we'll see in just a

moment just that

so indeed earlier on did humans only use

seven bits total and it was only once

they decided Well let's add an eighth

bit that they got extended ASCII and

that was initially in part a solution to

the same problem of not having enough

room if you will in those patterns of

zeros and ones to represent all of the

characters that you might want but even

that wasn't enough and that's why we've

now gone up to 16 and 32 and long past

seven so if we come back now to this one

particular color RGB was proposed as a

scheme but how might this work well

consider for instance this if we do

indeed decide as a group to represent

any color of the rainbow with some

mixture of some red some green and some

blue we have to decide how to represent


the amount of red and green and blue

well it turns out if all we have are

zeros and ones Ergo numbers let's do

just that for instance suppose a

computer were using these three numbers

72 73 33 no longer in the context of an

email or a text message but now in

the context of something like Photoshop

a program for editing and creating

graphical files maybe these numbers

could be interpreted as representing

some amount of red green and blue

respectively and that's exactly what

happens you can think of the first number

as red second as green third as blue and

so ultimately when you combine that

amount of red that amount of green that

amount of blue it turns out it's going

to resemble this shade of yellow and

indeed you can come up with numbers

between 0 and 255 for each of those

colors to mix any other color that you

might want and you can actually see this

in practice even though our screens

admittedly are getting really good on

our phones and laptops such that you

barely see the dots they are there you

might have heard the term pixel before

pixel is just a DOT on the screen and

you've got thousands millions of them


these days horizontally and vertically

and if I take even this Emoji which

again happens to be one company's

interpretation of face with medical

mask and zoom in a bit maybe zoom in

a bit more you can actually start to see

these pixels things get pixelated

because what you're seeing is each of

the individual dots that compose this

particular image and apparently each of

these individual dots are probably using

24 bits eight bits for red 8 Bits for

green eight bits for blue in some

pattern and this program or some other

like Photoshop is interpreting one

pattern as white or yellow or black or

some brown in between and so if you look

sort of awkwardly but up close to your

phone or your laptop or maybe your TV

you can see exactly this too
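
To make the context point concrete, here is a minimal Python sketch that reads the same three bytes two ways: once as text and once as a single pixel. The `#rrggbb` hex string is a common notation for colors and is used here only for display; it is not something the lecture mentions at this point.

```python
# The same three bytes as before.
values = bytes([72, 73, 33])

# Read as text (a messaging program): each byte is an ASCII character.
as_text = values.decode("ascii")

# Read as a pixel (a graphics program): each byte is an amount of one color, 0 to 255.
red, green, blue = values
as_hex_color = f"#{red:02x}{green:02x}{blue:02x}"

# Eight bits for each of three colors.
bits_per_pixel = 8 * len(values)

print(as_text)
print(red, green, blue, as_hex_color)
print(bits_per_pixel, "bits per pixel")
```

Nothing about the bytes themselves says which reading is right; that choice belongs to the program opening them.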

all right well what about things that we

also watch every day on YouTube or the

like things like videos how would a

computer knowing what we know now

represent something like a video

how might you represent a video using

only zeros and ones yeah

exactly and to summarize what video

really adds is just some notion of time


it's not just one image it's not just

one letter or number it's presumably

some kind of sequence because time is

passing and so with a whole bunch of

images maybe 24 maybe 30 per second if

you fly them by the human's eyes we can

interpret them using our eyes and brain

that there is now movement and therefore

video and similarly with audio or music

if we just came up with some convention

for representing those same notes on a

musical instrument could we have the

computer synthesize them too and this

might be actually pretty familiar let me

pull up a quick video here which happens

to be an old school version of this same

idea you might remember from childhood

[Music]

so granted that particular video is an

actual video of a paper-based animation

but indeed that's really all you need is

some sequence of these uh these images

which themselves of course are just

zeros and ones because they're just this

grid of these pixels or dots now

something like musical notes like these

those of you who are musicians might

just naturally play these on physical

devices but computers can certainly


represent those sounds too and for

instance a popular format for audio is

called MIDI and MIDI might just

represent each note that you saw a

moment ago essentially as a sequence of

numbers but more generally you might

think about music as having notes for

instance a through G maybe some flats

and some Sharps you might have the

duration like how long is the note being

heard or played on a piano or some other

device and then just the volume like how

hard does the human in the real world

press down on that key and therefore how

loud is that sound it would seem that

just remembering little details like

that quantitatively we can then

represent really all of these

otherwise analog human realities
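
A loose sketch of that idea in Python: each note is nothing more than three numbers for which key, how long, and how loud. This is a simplified, hypothetical layout and not the real MIDI file format, which records events rather than tidy triples. The only detail borrowed from MIDI is its numbering of keys, where 60 is middle C and volumes run from 0 to 127.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int        # which key, using MIDI-style numbering (60 is middle C)
    duration_ms: int  # how long the note is held, in milliseconds (assumed unit)
    volume: int       # how hard the key is pressed, 0 to 127


# Three notes going up from middle C: C, D, E.
melody = [Note(60, 500, 80), Note(62, 500, 80), Note(64, 1000, 100)]

# Flattened out, the tune is just a sequence of numbers.
as_numbers = [n for note in melody for n in (note.pitch, note.duration_ms, note.volume)]

total_ms = sum(note.duration_ms for note in melody)

print(as_numbers)
print(total_ms, "milliseconds of music")
```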

so that then is really a laundry list of

ways that we can just represent

information again computers are digital

have all these different formats but at

the end of the day and as fancy as those

devices and yours are it's just zeros

and ones tiny little switches or light

bulbs if you will represented in some

way and it's up to the software that you

and I and others write to use those


zeros and ones in ways we want to get

the computers to do something more

powerfully questions then on this

representation of information which I

dare say is ultimately what problem

solving is all about taking in

information and producing new via some

process in between

any questions out there

uh yeah and back

so a really good question there are many

other file formats out there you allude

to MP4 for video and more generally

there are these things called codecs and

containers it's not quite as simple when

using larger files for instance in more

modern formats that a video is just a

sequence of images for instance why if

you stored that many images for like a

Hollywood movie like 24 or 30 of them

per second that's a huge number of

images and if you've ever taken

photos on your phone you might know how

many megabytes or larger even individual

photographs might be so humans have

developed over the years uh fancier

software that uses much more math to

represent the same information more

minimally just using somehow shorter

patterns of zeros and ones than our most


simplistic representation here and they

use what might be called compression if

you've ever used a zip file or something

else somehow your computer is using

fewer zeros and ones to represent the

same amount of information ideally

without losing any information and in

the world of multimedia which we'll

touch on a little bit in a few weeks

there are both lossy and lossless

formats out there lossless means you

lose no information whatsoever but more

commonly as you're alluding to one is

lossy compression l-o-s-s-y where you're

actually throwing away some amount of

quality you're getting some amount of

pixelation that might not look perfect

to the human but heck it's a lot cheaper

and a lot easier to distribute and in

the world of multimedia you have

containers like QuickTime and other MPEG

containers that can combine different

formats of video different formats of

audio in one file but there too do

designers have discretion so more in a

few weeks too other questions then on

information
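A small sketch of the lossless case, using Python's standard zlib module, which implements the DEFLATE method commonly used inside zip files. The 1,000 identical bytes are made-up stand-in data, not a real image.

```python
import zlib

# A very repetitive "image": 1,000 identical bytes standing in for pixels.
original = b"\x00" * 1000

compressed = zlib.compress(original)    # shorter pattern of zeros and ones
restored = zlib.decompress(compressed)  # expand it back out again

print(len(original), len(compressed))   # the second number is much smaller
print(restored == original)             # lossless: nothing was thrown away
```

A lossy format would not pass that last check, because it deliberately throws some detail away to save even more space.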

here as well yeah

exactly I mean back in the day you might


have heard of the expression of vacuum

tube which is like some physically large

device

um that might have only stored some zero

or one

um yes it is the miniaturization of

Hardware these days that has allowed us

to store as many and many more zeros and

ones much more closely together and as

we've built more fancy machines that can

sort of Design This Hardware at an even

smaller scale we're just packing more

and more into these devices but there

too is a trade-off for instance you

might know by using your phone or your

laptop for quite a while maybe on your

lap starts to get warm and so there are

these literal physical side effects of

this where now some of our devices run

hot and this is why like a data center

in the real world might need more air

conditioning than a typical place

because there are these physical

artifacts as well and in fact if you'd

like to see one of the earliest

computers from decades ago across the

river here in Allston in the new

engineering building is the Harvard Mark

I computer that will give you a

much better mental model of just


that well if we come back now to this

first picture being computer science or

really problem solving I dare say we

have more than enough ways now to

represent information input and output

so long as we all just agree on

something and thankfully those before

us have given us things like ASCII and

Unicode not to mention MP4s Word

documents and the like but what's inside

of this proverbial black box into which

these inputs are going and the outputs

are coming well that's where we get this

term you might have heard too an

algorithm which is just step-by-step

instructions for solving some problem

incarnated in the world of computers by

software when you write software AKA

programs you are implementing one or

more algorithms one or more sets of

instructions for solving some problem

and maybe you're using this language or

that but at the end of the day no matter

the language you use the computer is

going to represent what you type using

just zeros and ones
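To make that concrete, here is a minimal sketch that shows the zeros and ones behind some typed text, assuming plain ASCII characters stored in 8 bits each. The function name to_bits is made up for this example.

```python
def to_bits(text):
    """Return each character's ASCII code as 8 zeros and ones, space separated."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


# H is 72 and I is 73 in ASCII.
print(to_bits("HI"))  # 01001000 01001001
```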

so what might be a representative

algorithm nowadays you might use uh your

phone quite a bit to make calls or send


texts or emails and therefore you have a

whole bunch of contacts in your address

book nowadays of course this is very

digital but whether on iOS or Android or

the like

you might have a whole bunch of names uh

first name and or last as well as

numbers and emails and the like and you

might be in the habit of like scrolling

through on your phone all of those names

to find the person you want to call it's

probably sorted alphabetically by first

name or last name a through z or some

other symbol and this is frankly quite

the same as we used to do you know back

in my day in cs50 when we just used a

physical book and this physical book

might be a whole bunch of names

alphabetically sorted from left to right

corresponding to a whole bunch of

numbers so suppose that in this Old

Harvard phone book we want to search for

John Harvard we might of course start

quite simply at the beginning here

looking at one page at a time and this

is an algorithm this is like literally

step by step looking for the solution to

this problem in that sense if John

Harvard's in the phone book is this

algorithm Page by Page correct would you


say

yes like if uh John Harvard's in the

phone book obviously I'm eventually

going to get to him so that's what we

mean by correct is it efficient is it

well designed would you say

no I mean this is going to take forever

even just to get to the J's or the H's

depending on how this thing's sorted all

right well let me go a little faster

I'll start like two pages at a time two

four six eight ten twelve and so forth

sounds faster is faster is it correct

okay why is it not correct yeah

exactly if I start an odd number of

pages and I'm going two at a time I might

miss pages in between and if I therefore

conclude when I get to the back of the

book there was no John Harvard I might

have just erred this would be again one

of these bugs but if I try a little

harder I feel like there's a solution we

don't have to completely throw out this

algorithm I think we can probably go

roughly twice as fast still but what

should we do instead to fix this yeah

and back
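The first two algorithms might be sketched like this, assuming the phone book is a sorted list with one name per page. All function names are made up for illustration, and the doubling back shown in the third function is only one possible fix.

```python
def search_one_at_a_time(pages, name):
    """Correct but slow: look at every page in order."""
    for i in range(len(pages)):
        if pages[i] == name:
            return i
    return None


def search_two_at_a_time_buggy(pages, name):
    """Faster, but skips every other page, so it can miss the name."""
    for i in range(0, len(pages), 2):
        if pages[i] == name:
            return i
    return None


def search_two_at_a_time_fixed(pages, name):
    """Still two pages at a time, but double back one page when needed."""
    for i in range(0, len(pages), 2):
        if pages[i] == name:
            return i
        if pages[i] > name:
            # Went past where the name would be: check the skipped page.
            if i > 0 and pages[i - 1] == name:
                return i - 1
            return None
    # Reached the back of the book; the very last page may have been skipped.
    if pages and pages[-1] == name:
        return len(pages) - 1
    return None


pages = ["Alice", "Bob", "Carol", "Dave", "John", "Mary"]
print(search_one_at_a_time(pages, "Dave"))        # 3
print(search_two_at_a_time_buggy(pages, "Dave"))  # None, the bug
print(search_two_at_a_time_fixed(pages, "Dave"))  # 3
```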


so I think what many of us most of us if


we even use this technology anymore

these days we might go roughly to the

middle of the phone book just to kind of

get us started and now I'm looking down

I'm looking for J assuming first name

J Harvard and it looks like I'm in the M

section so just to be clear what should

I do next

okay and presumably John Harvard

would be to the left of this so here's

an opportunity to figuratively and

literally tear this particular problem

in half throw half of the problem away

it's it's actually pretty easy if you

just do it that way the hard way is this

way but I have now just uh decreased the

size of this problem really in half so

if I started with like a thousand pages

of phone numbers and names now I'm down

to 500 and already we haven't found John

Harvard but that's a big bite out of

this problem and I do think it's correct

because if J is to the left of M of

course he's definitely not going to be

over there and I think if I repeat this

again dividing and conquering if you

will here I might have gone a little too

far now I'm in like the E section so let

me tear the problem in half again throw

another 250 Pages away and again repeat


dividing and dividing and conquering

until finally presumably I end up with

just one page of a phone book on which

John Harvard's name either is or is not

but because of the algorithm you propose

step by step I know that he's not in

anything I discarded so traumatic as

that might have been

made out to be it's actually

just harnessing pretty good human

intuition indeed this is what

programming's all about too it's not

about learning a completely new world

but really just how to harness intuition

and ideas that you might already have

and take naturally but learning how to

express them now more succinctly more

precisely and using things called

programming languages why is an

algorithm like that if I found John

Harvard

better than ultimately just doing the

first one or even the second and maybe

doubling back to check those even pages

well let's just look at a little chart

here again we don't have to get into the

nuances of numbers but if we've got like

a chart here X Y plot on the x-axis here

I claim is the size of the problem so


measured in the numbers of pages in the

phone book so the farther you go out

here the more pages are in the phone

book and here we have time to solve on

the y-axis so the higher you go up the

more time it's going to be taking to

solve that particular problem so let's

just arbitrarily say that the first

algorithm involving like n Pages might

be represented graphically like this no

matter the slope it's a straight line

because there's presumably a one-to-one

relationship between numbers of pages

and number of seconds or number of page

turns why if the phone company adds

another page next year because some new

people move to town that's going to

require one additional page for me one

to one

if though we use the second algorithm

flawed though it was unless we double

back a little bit to fix someone being

in between that's too going to be a

straight line but it's going to be a

different slope because now there's a

two to one or a one to two relationship

because I'm going two pages at a time so

if the phone company adds another page

that's going to take me only or another

two pages that's still only just one


more step and you can see the difference

if I kind of draw this if this is the

phone book in question this number of

pages it might take this many seconds on

the yellow line to represent or to solve

uh to find someone like John Harvard but

of course on the first algorithm the red

line it's literally going to take twice

as many steps and what do the n's here

mean n is the go-to variable for a

computer scientist or programmer just

generically representing a number so if

the number of pages in the phone book is

n the number of steps the second

algorithm would have taken would be in

the worst case n over two half as many

because you're going twice as fast but

the third algorithm actually if you

recall your your logarithms looks a

little something like this there's a

fundamentally different relationship

between the size of the problem and the

amount of time required to solve it that

technically is log base 2 of n but it's

really the shape that's different and

the implication there is that if for

instance Cambridge and Allston two

different towns here in Massachusetts

merge next year and there's just one


phone book that's twice as big no big

deal for that third and final algorithm

why you just tear the problem one more

time in half taking one more bite that's

it not another thousand bites just to

get to the solution and put another way

you can walk out way way way out here to

a much bigger phone book and ultimately

that green line is barely going to have

budged so this then is just a way of

now formalizing and thinking about

what the performance or quality of these

algorithms might be
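A quick sketch of those three lines as step counts, in the worst case, for a phone book of n pages. The function names are invented here, and the two-at-a-time count ignores the occasional extra step for doubling back.

```python
def steps_one_page_at_a_time(n):
    return n          # one step per page


def steps_two_pages_at_a_time(n):
    return n // 2     # roughly half as many steps


def steps_tearing_in_half(n):
    """Count how many times n pages can be halved until one page is left."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps      # roughly log base 2 of n


for pages in (1000, 2000):
    print(pages,
          steps_one_page_at_a_time(pages),
          steps_two_pages_at_a_time(pages),
          steps_tearing_in_half(pages))
```

Doubling the phone book from 1,000 to 2,000 pages doubles the first two counts but adds just one step to the third.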

and before we now make one more

formalization of the algorithm itself

any questions then on this notion of

efficiency or now performance of ideas

um a lot of phone books over the years

and if you or your parents have any more

still somewhere we could definitely use

them because they're hard to find

other questions but thanks

other questions here too

oh

is that a murmur yes over here

sorry say again

oh yeah hopefully and then we could uh

then we'd have a little something more

to use here so now if we want to

formalize further what it is we just did


we can go ahead and introduce this

form of code AKA pseudocode pseudocode

is not a specific language it's not like

something we're about to start coding in

it's just a way of expressing yourself

in English or any human language

succinctly correctly toward an end of

getting your idea for an algorithm

across so for instance here might be how

we could formalize the pseudocode

for that same algorithm step one

was pick up the phone book as I did step

two might be open to the middle of the

phone book as you propose that we do

first step three was probably to look

down at the pages I did and step four

gets a little more interesting because I

had to quickly make a decision and ask

myself a question if person is on page

then I should probably just go ahead and

call that person but that probably

wasn't the case at least for John

Harvard and I opened the M section and

so there's this other question I should

now ask else if the person is earlier in

the book then I should tear the problem

in half as I did but go left so to speak

and then not just open to the middle of

the left half of the book but really


just go back to step three repeat myself

why because I can just repeat what I

just did but with a smaller problem

having taken this big bite but if the

person was later in the book

as might have happened with a different

person than John Harvard then I should

open to the middle of the right half of

the book again go back to line three but

again I'm not going to get stuck doing

something forever like this because I

keep shrinking the size of the problem

lastly the only possible scenario that's

left if John Harvard is not on the page

and he's not to the left and he's not to

the right what should our conclusion be

he's not there he's not listed and so we

need to quit in some other form now as

an aside it's kind of deliberate that I

buried that last question at the end

because this is what happens all too

often in programming whether you're new

at it or professional just not

considering all possible cases Corner

cases if you will that might not happen

that often but if you don't anticipate

them in your own code pseudocode or

otherwise this is when and why programs

might crash or you might see stupid

little spinning beach balls or


hourglasses or your computer might

reboot why it's doing something sort of

unpredictable if a human maybe myself

didn't anticipate this like what does

this program do if John Harvard's not in

the phone book if I had omitted lines 12

and 13 I don't know maybe it would

behave differently on a Mac or PC

because it's sort of undefined behavior

and these are the kinds of omissions

that frankly you're invariably going to

make bugs you're going to introduce

mistakes you're going to make early on

and me too 25 years later but you'll get

better at thinking about those Corner

cases and handling anything that can

possibly go wrong and as a result your

code will be all the better for it now

the problem ultimately with learning how

to program especially if you've never

had experience or even if you do but you

learned but one language only is that

they all look a little cryptic at first

glance but they do share certain

commonalities and in fact we'll use this

pseudocode to define those first

highlighted in yellow here are what

henceforth we're going to start calling

functions lots of different programming


languages exist but most of them have

what we might call functions which are

actions or verbs that solve some smaller

problem that is to say you might use a

whole bunch of functions to solve a

bigger problem because

each function tends to do

something very specific or precise these

then in English might be translated in

code actual computer code to these

things called functions highlighted in

yellow now are what we might call

conditionals conditionals are things

that you do conditionally based on the

answer to some question you can think of

them kind of like Forks in the road you

go left or go right or some other

direction based on the answer to some

question well what are those questions

highlighted now in yellow are what we

would call Boolean expressions named

after a mathematician last name Boole

that simply have yes no answers or if

you prefer true or false answers or Heck

if you prefer one or zero answers we

just need to distinguish one scenario

from another the last thing manifest in

this pseudo code is what I might

highlight now and call Loops some kind

of cycle some kind of directive that


tells us to do something again and again

so that I don't need a thousand line

program to search a thousand page phone

book I can get away with a 13 line

program but sort of repeat myself

inherently in order to solve some

problem until I get to that last step so

this then is what we might call

pseudocode and indeed there are other

characteristics of programs that we'll

touch on before long things like

arguments and return values variables

and more
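As a hedged sketch, the phone book pseudocode could look like this in Python, with its function, conditionals, Boolean expressions, and loop marked in comments. The name search, and the choice of a sorted list with one name per page, are assumptions for illustration.

```python
def search(phone_book, person):
    """Return the page (index) where person is listed, or None if not listed.

    phone_book is assumed to be an alphabetically sorted list of names.
    """
    left, right = 0, len(phone_book) - 1
    while left <= right:                   # loop: go back and repeat
        middle = (left + right) // 2       # open to the middle of what is left
        if phone_book[middle] == person:   # conditional on a Boolean expression
            return middle                  # found: "call person"
        elif person < phone_book[middle]:  # person is earlier in the book
            right = middle - 1             # throw away the right half
        else:                              # person is later in the book
            left = middle + 1              # throw away the left half
    return None                            # the corner case: not listed, so quit


book = ["Alice", "Bob", "Carol", "Dave", "John", "Mary", "Zed"]
print(search(book, "John"))     # 4
print(search(book, "Harvard"))  # None
```

Leaving out the final return None would be the kind of omission described above, where nothing says what should happen when the person is not in the book.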

but unfortunately in most languages

including some we will very deliberately

use in this class and that everyone in

the real world these days still uses is

programs tend to look like this this for

instance is a distillation of that very

first program I wrote in 1996 in cs50

itself just to print something on the

screen and in fact this version here

just tries to print quote unquote hello

world which is I dare say the most

canonical first thing that most any

programmer ever gets a computer to say

just because

but look at this mess I mean there's a

hash symbol these angled brackets


parentheses words like int curly braces

quotes parentheses semicolons and

backslashes I mean there's more overhead

and more syntax and clutter than there

is an actual idea now that's not to say

that you won't be able to understand

this before long because honestly

there's not that many patterns indeed

programming languages have typically a

much smaller vocabulary than any actual

human language but at first it might

indeed look quite cryptic but you can

perhaps infer I have no idea what these

other ones do yet but hello world is

presumably quote unquote what will be

printed on the screen but what we'll do

today after a short break and set the

stage for next week is introduce these

exact same ideas in just a bit using

scratch something that you yourselves

might have used when you were quite

younger but without the same vocabulary

apply to those ideas and the upside of

what we'll soon do using scratch this

graphical programming language from our

friends down the road at MIT it'll let

us today start to drag and drop things

that look like puzzle pieces that

interlock together if it makes logical

sense to do so but without the


distraction of hashes parentheses curly

braces angle brackets semicolons and

things that are quite beside the point

but for now let's go ahead and take a 10

minute break here and when we resume we

will start programming so this on the

screen is a language called C something

that we'll dive into next week and

thankfully this now on the screen is

another language called python that will

also take a look at in a few weeks

before long along with other languages

along the way today though and for this

first week week zero so to speak we use

scratch because again it'll allow us to

explore some of those programming

fundamentals that will be in C and in

Python and in JavaScript and other

languages too but in a way where we

don't have to worry about the

distractions of syntax so the world of

scratch looks like this it's a web-based

or downloadable programming environment

that has this layout here by default on

the left here we'll soon see is a

palette of puzzle pieces programming

blocks that represent all of those ideas

we just discussed and by dragging and

dropping these puzzle pieces or blocks


over to this big area and connecting

them together if it makes logical sense

to do so we'll start programming in this

environment the environment allows you

to have multiple Sprites so to speak

multiple characters things like a cat or

anything else and those Sprites exist in

this rectangular World up here that you

can full screen to make bigger and this

here by default is scratch who can move

up down left right and do many more things

too and within scratch's world you

can think of it as perhaps a familiar

coordinate system with x's and y's which

is helpful only when it comes time to

like position things on the screen right

now scratch is at the default zero comma

zero where x equals zero and Y equals

zero if you were to move the cat way up

to the top X would stay zero y would be

positive 180 if you move the cat all

the way to the bottom X would stay zero

but y would now be negative 180 and if

you went left X would become negative

240 but y would stay zero or to the

right X would be 240 and Y would stay

zero so those numbers generally don't so

much matter because you can just move

relatively in this world up down left

right but when it comes time to like


precisely position some of these Sprites

or other imagery it'll be helpful just

to have that mental model of up down

left and right well let's go ahead and

make perhaps the simplest of programs

here I'm going to switch over to the

same programming environment now for a

tour of the left hand side so by default

selected here is the category in blue

motion which has a whole bunch of puzzle

pieces or blocks that relate to motion

and whereas scratch as a graphical

language categorizes things by the type

of things that these pieces do we'll see

that throughout this whole palette we'll

have functions and variables and

conditionals and Boolean expressions and

more each in a different color and shape

so for instance moving 10 steps or

turning one way or the other would be

functions categorized here as things

like motion under looks in purple you

might have speech bubbles that you can

create by dragging and dropping these

that might say hello or whatever for

some number of seconds or you could

switch costumes change the cat to look

like a dog or a bird or anything else in

between sounds too you can play sounds


like meow or anything you might import

or record yourself and then there's

these things scratch calls events and

the most important of these is the first

when green flag clicked because if we

look over to the right of scratch's

world here this rectangular region has

this green flag and red stop sign up

above one of which is for play one of

which is for stop and so that's going to

allow us to start and stop our actual

programs when that green flag is

initially clicked but you can listen for

other types of events when the space bar

is pressed or something else when this

Sprite is clicked or something else and

here you already see like a programmer's

incarnation of things you and I take for

granted like every day now on our phones

anytime you tap an icon or drag your

finger or hit a button on the side these

are what a programmer would call events

things that happen and are often

triggered by us humans and things that a

program be it in scratch or python or C

or anything else can listen for and

respond to indeed that's why when you

tap the phone icon on your phone the

phone application starts up because

someone wrote software that's listening


for a finger press on that particular

icon and so scratch has these same

things too under control in Orange you

can see that we can wait for one second

or repeat something some number of times

10 by default but we can change anything

in these white circles to anything else

there's another puzzle piece here

forever which implies some kind of loop

where we can do something again and

again and even though it seems a little

tight there's not much room to fit

something there scratch is going to have

these things grow and Shrink however we

want to fill similarly shaped pieces and

here's those conditionals if something

is true or false then do this this next

thing and that's how we can put in this

little trapezoid-like shape some form of

Boolean expression a question with a yes

no true false or one zero answer and

decide whether to do something or not

you can combine these things too if

something is true do this else do this

other thing and you can even tuck one

inside of the other if you want to ask

three or four or more questions sensing

too is going to be a thing you can ask

questions AKA Boolean expressions like


is the Sprite touching the mouse pointer

the arrow on the screen so that you can

start to interact with these programs

what is the distance between a Sprite

and a mouse pointer you can do simple

calculations just to figure out maybe if

the the enemy is getting close to the

cat under operators some lower level

stuff like math but also the ability to

pick random numbers which for a game is

great because then you can kind of vary

the difficulty or what's happening in a

game without the same game playing the

same way every time and you can combine

ideas something and something must be

true in order to make that kind of

decision as before or we can even join two

words together it says apple and banana by

default but you can type in or drag and

drop whatever you want there to combine

multiple words into full larger

sentences and then lastly down here

there's in orange things called

variables in math we've obviously got X

and Y and whatnot in programming we'll

have the same ability to sort of store

in these named symbols X or Y values

that we care about numbers or letters or

words or colors or anything ultimately

but in programming you'll see that it's


much more conventional not to just use

Simple letters like X and Y and Z but to

actually give variables full words

singular or plural words to

describe what they are

and then lastly if this isn't enough

colors or blocks for you you can create

your own blocks and indeed this is going

to be a programming principle we'll apply

today and with the first problem set

whereby once you start to assemble these

puzzle pieces and you realize oh wouldn't

it be nice if those several pieces could

have just been replaced by one had MIT

thought to give me that one puzzle piece

you yourself can make your own blocks by

connecting these all together giving

them a name and boom a new puzzle piece

will exist so let's do the simplest most

canonical programs here starting up with

control and I'm going to click and drag

and drop this thing here when green flag

clicked and then I'm going to grab one

more for instance under looks and under

looks I'm going to go ahead and just say

something like initially

not just hello but the more canonical

hello comma world now you might guess

that in this programming environment I


can go over here now and click the green

flag and voila hello comma world so

that's my first program and obviously

much more user friendly than typing out

the much more cryptic text that we saw

on the screen that you too will type out

next week but for now we'll just focus

on these ideas and in this case a

function so what is it that

just happened this purple block here is

say that's the function and it seems to

take some form of input in the white

oval specifically hello comma World well

this actually fits the Paradigm that we

looked at earlier of just inputs and

outputs so if I may if you consider what

this puzzle piece is doing it actually

fits this model the input in this case

is going to be hello comma World in

white the algorithm is going to be

implemented as a function by MIT called

say and the output of that is going to

be some kind of side effect like the cat

and the speech bubble are saying hello

world so already even that simple drag

and drop mimics exactly this relatively

simple mental model so let's take things

further let's go ahead now and make the

program a little more interactive so

that it says something like hello David


or hello Carter or hello to you

specifically and for this I'm going to

go under sensing and you might have to

poke around to find these things the

first time around but I've done this a

few times so I kind of know where things

are and what color there's this function

here ask what's your name but that's in

white so we can change the question to

anything we want and it's going to wait

for the human to type in their answer

and this function called ask is a little

different from the say block which just

had this side effect of printing a

speech bubble to the screen the ask

function is even more powerful in that

after it asks the human to type

something in this function is going to

hand you back what they typed in in the

form of what's called a return value

which is stored ultimately and by

default this thing called answer this

little blue oval here called answer is

again one of these variables that in

math would be called just X or Y but in

programming we're saying what it does

so I'm going to go ahead and do this let

me go ahead and drag and drop this block

and I want to ask the question before


saying anything but you'll notice that

scratch is smart and it's going to

realize I want to insert something in

between and it's just going to move

things up and down I'm going to let go

and ask the default question what's your

name and now if I want to go ahead and

say hello David or Carter let's just do

hello comma because I obviously don't

know when I'm writing the program who's

going to use it so let me now grab

another looks block up here say

something again and now let me go back

to sensing and now grab the return value

represented by this other puzzle piece

and let me just drag and drop it here

and notice it's the same shape even if

it's not quite the same size things will

grow or Shrink as needed all right so

let's now zoom out let me go and stop

the old versions because I don't want to

say hello world anymore let me hit the

green flag and what's my name all right

David enter

huh all right maybe I just wasn't paying

close enough attention let me try it

again green flag d-a-v-i-d enter

this seems like a bug

what's the bug or mistake might you

think
uh yeah

yeah we kind of want to combine them in

the same text box and it's you know it's

technically a bug because this just

looks kind of stupid it's just saying

David after I asked for my name I'd like

it to say maybe hello then David but

it's just blowing past the hello and

printing David but let's put our finger

on why this is happening you're right

for the solution but what's the actual

fundamental problem in back

perfect I mean computers are really darn

fast these days it is saying hello all

of us are just too slow in this room to

even see it because it's then saying

David on the screen so fast as well so

there's a couple of solutions here and

yours is spot on but just to poke around

you'll see the first example of how many

ways in programming be it scratch or C

or python or anything else that there are

going to be to solve problems and we'll

teach you over the course of these weeks

Sometimes some ways are better

relatively than others but rarely is

there a Best Way necessarily because

again reasonable people will disagree

and what we'll try to teach you over the


coming weeks is how to kind of think

through those nuances and it's not going

to be obvious at first glance but the

more programs you write the more

feedback you get the more bugs that you

introduce the more you'll get your

footing with exactly this kind of

problem solving so let me try this in a

couple of ways up here would be one

solution to the problem MIT anticipated

this kind of issue especially with

first-time programmers and I could just

use a puzzle piece that says say the

following for two seconds or one second

or whatever then do the same with the

next word and it might be kind of a bit

of a pause hello one second two seconds

David one second two seconds but at

least it would look a little more

grammatically correct but I can do it a

little more elegantly as you proposed

let me go ahead and throw away one of

these blocks and you can just drag and

let go and it'll delete itself let me go

down to operators

because this join block here is the

right shape and so even if you're not

sure what goes where just focus on the

shapes first let me drag this over here

and it grew to fill that let me go ahead


and say hello comma space and now it

could just say by default hello banana

but let me go back to let me go back to

sensing drag answer and that's going to

drag and drop there and so now notice

we're sort of stacking or nesting one

block on another so that the output of

one becomes the input to another but

that's okay here let me go ahead and

zoom out hit stop and hit play all right

what's your name d-a-v-i-d enter and

voila now it's presumably as we first

intended so thank you

thank you no no minus two this time

let's consider that even with these uh

this additional example it still fits

the same mental model but in a little

more interesting way here's that new

function ask something and wait and

notice that in this case too there's an

input Otherwise Known henceforth as an

argument or a parameter programming

speak for just an input in the context

of a function and if we use our drawing

as before to represent this thing here

we'll see that the input now is going to

be quote unquote what's your name the

algorithm is going to be implemented by


way of this new puzzle piece the

function called ask and the output of

that thing this time is not going to be

the cat saying anything yet but rather

it's going to be the actual answer so

instead of the visual side effect of the

speech bubble appearing now nothing

visible is happening yet thanks to this

function it's sort of handing me back

like a scrap of paper with whatever I

typed in written on it so I can reuse

David one or more times even like I did

now what did I then do with that value

well consider that with the subsequent

function

with the subsequent function we had this

say block too combined with a join so we

have this variable called answer we're

joining it with that first argument

hello so already we see that some

functions like join can take not one but

two arguments or inputs and that's fine

the output of join is presumably going

to be hello David or hello Carter or

whatever the human typed in and that

output notice is essentially becoming

the input to another function say just

because we've kind of stacked things or

nested them on top of one another but

graphic or but um but methodically it's


really the same idea the input now are

two things hello comma and the return

value from the previous ask function the

function now is going to be join the

output is going to be hello David but

that hello David output is now going to

become the input to another function

namely that first block called say and

that's then going to have the side

effect of printing out hello David on

the screen so again as

sophisticated as ours as yours as others'

programs are going to get they really do

fit this very simple mental model of

inputs and outputs and you just have to

learn to recognize the vocabulary and to

know what kinds of puzzle pieces or

concepts ultimately to apply but you can

ultimately really kind of spice these

things up let me go back to my program

here that just is using the speech

bubble at the moment scratch as an aside

has some pretty fancy interactive

features too I click the extensions

button in the bottom left corner and let

me go ahead and choose the text to

speech extension this is using a cloud

service so if you have an internet

connection it can actually talk to the


cloud or a third-party service and this

one's going to give me a few new green

puzzle pieces namely the ability to

speak something from my speakers instead

of just saying it textually so let me go

ahead and drag this and now notice I

don't have to interlock them if I'm just

kind of playing around and I want to

move some things around I just want to

use this as like a canvas temporarily

let me go ahead and steal the join from

here put it there let me throw away the

say block by just moving it left and

letting go and now let me show join this

in so I've now changed my program to be

a little more interesting so now let me

stop the old version let me start the

new

what's your name type in David and voila

hello banana

okay minus two for real all right so

what I accidentally threw away there

uh intentionally for instructional

purposes was the actual answer that came

back from the ask block that's

embarrassing so now if I play this again

let's click the green icon what's your

name David and now hello David there we

go hello David all right thank you
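As a rough Python sketch of the nesting just described: ask hands back a value, join combines two inputs into one output, and say produces the visible side effect. The function names are hypothetical stand-ins for the Scratch blocks, not anything Scratch itself exposes, and the read parameter is an assumption added only so the example can run without a keyboard.

```python
from typing import Callable


def ask(question: str, read: Callable[[str], str] = input) -> str:
    """Stand-in for 'ask ... and wait': returns the answer, shows nothing else."""
    return read(question + " ")


def join(first: str, second: str) -> str:
    """Stand-in for the join block: two inputs, one output."""
    return first + second


def say(text: str) -> None:
    """Stand-in for the say block: its only job is the visible side effect."""
    print(text)


def greet(read: Callable[[str], str] = input) -> str:
    """The output of ask feeds join, and the output of join feeds say."""
    answer = ask("What's your name?", read)
    message = join("hello, ", answer)
    say(message)
    return message
```

Calling greet() prompts at the keyboard; calling greet(lambda prompt: "David") supplies the answer directly, which is handy for trying it out.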

okay so we have these functions then in


place but what more can we do well what

about those conditionals and and loops

and other constructs how can we bring

these programs to life so it's not just

clicking a button and voila something's

happening let's go ahead and make this

now even more interactive let me go

ahead and throw away most of these

pieces and let me just spice things up

with some more audio under sound I'm

going to go to play sound meow until

done here we go green flag

okay it's a little loud but it did

exactly do what it said let's hear it

again

[Music]

okay it's kind of an underwhelming

program eventually since you'd like to

think that the cat would just meow on

its own but I have to keep hitting the

button well this seems like an

opportunity for uh doing something again

and again so all right well if I wanted

to meow meow meow let me just grab a few

of these or you can even right click or

control click and you can copy paste

even in code here let me play this now

all right so now like it's not really

emoting happiness in quite the same way


it might be hungry or upset so you know

let's slow it down let me go to control

wait one second in between which might

be a little

uh less worrisome here we go play

[Music]

Okay so

if my goal was to make the cat meow

three times I dare say this code or

algorithm is correct but let's now

critique its design is this

well-designed and if not why not

what are your thoughts here uh yeah

sure yeah

yeah so yeah agreed I could use forever

or repeat but let me push a little harder

but why like this works I'm kind of done

with the assignments what do I what's

bad about it

yeah there's too much repetition right

if I wanted to change the sound that the

cat is making to like a different

variant of meow or have it bark instead

like a dog I could change it from the

drop down here apparently but then I'd

have to change it here and then I'd have

to change it here and God if this were

even longer that just gets tedious

quickly and you're probably increasing

the probability that you're going to


screw up and you're going to miss one of

the drop downs or something stupid and

introduce a bug or if you wanted to

change the number of seconds you're

waiting you've got to change it in two

maybe even more places again you're just

creating risk for yourself and potential

bugs in the program so I do like the

repeat or the forever idea so that I

don't repeat myself and indeed what I

alluded to being possible copy pasting

earlier doesn't mean it's a good thing

and in code generally speaking when you

start to copy and paste puzzle pieces or

text next week you're probably not doing

something quite well so let me go ahead

and throw away most of these to get rid

of the duplication keeping just two of

the blocks that I care about let me grab

the repeat block for now let me move

this inside of the repeat block

it's going to grow to fit it let me

reconnect all this and change the 10

just to a 3 and now play

[Music]
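The design point here, the same behavior with less repetition, might be sketched in Python as follows. This is only an illustration: play_sound_until_done is a hypothetical stand-in that records the sound in a list instead of playing audio, and the pause parameter is an assumption so the wait can be shortened.

```python
import time
from typing import List


def play_sound_until_done(name: str, played: List[str]) -> None:
    """Hypothetical stand-in for the sound block: records instead of playing."""
    played.append(name)


def meow_copy_pasted(played: List[str], pause: float = 1.0) -> None:
    """Correct, but the sound name and the pause each appear three times."""
    play_sound_until_done("meow", played)
    time.sleep(pause)
    play_sound_until_done("meow", played)
    time.sleep(pause)
    play_sound_until_done("meow", played)
    time.sleep(pause)


def meow_repeated(played: List[str], times: int = 3, pause: float = 1.0) -> None:
    """Same result with a loop: one place to change the sound, pause, or count."""
    for _ in range(times):
        play_sound_until_done("meow", played)
        time.sleep(pause)
```

Going from 3 meows to 40 is then a change to one argument, meow_repeated(played, times=40), rather than dozens of pasted lines.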

so better it's the same thing it's still

correct but now I've set the stage to

let the cat meow for instance four times

by changing one thing 40 times by


changing one thing or I could just use

the forever block and just walk away and

it will meow forever instead if that's

your goal that would be better a better

design but still correct but you know

what now that I have a program that's

designed to have a cat meow wow like why

I mean MIT invented scratch scratch is a

cat why is there no puzzle piece called

meow this feels like a missed

opportunity now to be fair they gave us

all the building blocks with which we

could Implement that idea but a

principle of programming and really

computer science is to leverage what

we're going to now start calling

abstraction we have step-by-step

instructions here the repeat the play

and the wait that collectively

implement this idea that we humans

would call meowing wouldn't it be nice

to abstract away those several puzzle

pieces into just one that literally just

says what it does meow well here's where

we can make our own blocks let me go

over here to scratch under the pink

block category here

and let me click make a block and here I

see a slightly different interface where

I can choose a name for it I'm going to


call it meow and I'm going to keep it

simple that's it no inputs to meow yet

I'm just going to click ok now just

going to clean this up a bit here let me

drag and drop play sound and wait over

here and you know what I'm just going to

drag this way down here way down here

because now that I'm done implementing

meow I'm going to literally abstract it

away sort of out of sight out of mind

because now notice at top left there is

a new pink puzzle piece called meow and

so at this point I'd argue it doesn't

really matter how meow is implemented

frankly I don't know how ask or say was

implemented by MIT they abstracted those

things away for us now I have a brand

new puzzle piece that just says what it

is and this is now still correct but

arguably better design why because it's

just more readable to me to you it's

more maintainable when you look at your

code a year from now for the first time

because you're sort of fondly looking

back at the very first program you wrote

it says what it does the function itself

has semantics which conveys what's going

on if you really care about how meow is

implemented you could scroll down and


start to Tinker with the underlying

implementation details but otherwise you

don't need to care anymore now I feel

like there's a even additional

opportunity here for abstraction and to

sort of factor out some of this

functionality it's kind of lame that I

have this repeat block that lets me call

the meow function so to speak use the

meow function three times wouldn't it be

nice if I could just call the meow

function AKA use the meow function and

pass it in input that tells the puzzle

piece how many times I want it to meow

well let me go ahead and zoom out and

scroll down let me right click or

control click on the pink piece here and

choose edit or I could just start from

scratch no pun intended with a new one

and now here rather than just give this

thing a name now let me go ahead and add

an input here and I'm going to go ahead

and type in for instance n for number of

times to meow and just to make this even

more user friendly and self-descriptive

I'm going to add a label which has no

functional impact it's just an aesthetic

and I'm just going to say times just to

make it read more like English in this

case that tells me what the puzzle piece


does and now I'm going to click OK and

now I need to refine this a little bit

let me go ahead and grab under control a

repeat block

let me move the play sound and wait

into the repeat block I don't want 10

and I also don't want 3 here what I want

now is this n that is my actual variable

that scratch is creating for me that

represents whatever input the human

programmer provides notice it snaps

right in place let me connect this and

now voila I have an even fancier version

of meow that is parameterized it takes

input that affects Its Behavior

accordingly now I'm going to scroll back

up because out of sight out of mind I

just care that meow exists now I can

tighten up my code so to speak use even

fewer lines to do the same thing by

throwing away the repeat block

reconnecting this new puzzle piece here

that takes an input like three and voila
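A custom block with an input corresponds roughly to a function with a parameter. The Python sketch below is one way to picture it; the play parameter is an assumption (defaulting to print) that stands in for actually playing a sound.

```python
from typing import Callable


def meow(n: int, play: Callable[[str], None] = print) -> None:
    """Meow n times; callers no longer need to know a loop is involved."""
    for _ in range(n):
        play("meow")
```

The main program then shrinks to a single call such as meow(3), and the loop lives in exactly one place, out of sight and out of mind.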

now we're really programming right we've

not made any forward progress

functionally the thing just meows three

times but it's a better design and as

you program more and more these are the

kinds of instincts you'll start to


acquire so that one you can start to

take a big assignment a big problem set

something for homework even that feels

kind of overwhelming at first like oh

my God where do I even begin but if you

start to identify what are the sub

problems of a bigger problem then you

can start making progress and I do this

to this day where if I have to tackle

some programming related project it's so

easy to like drag my feet or oh it's

going to take forever to start until I

just start writing down like a to-do

list and I start to modularize the

program and say all right well what do I

want this thing to do meowing what's

that mean I gotta have it say something

on the screen all right I need to have

it say something on the screen some

number of times like literally a mental

or written checklist or pseudo code if

you will in English on a piece of paper

or text file and then you can decide

okay the first thing I need to do for

homework to solve this real world

problem I just need a meow function I

need to use a bunch of other code too

but I need to create a meow function and

boom now you have a piece of the problem

solved not unlike we did with the phone


book there but in this case we'll have

presumably other problems to solve all

right so what more can we do let's add a

few more pieces just to the puzzle here

let's actually interact with the cat now

let me go ahead and now when the green

flag is clicked let me go ahead and ask

a question using an event here let me go

ahead and say if

let's see if the cursor I want to do

something like uh implement the notion

of petting the cat so if the cursor is

petting touching the cat like here

something like this it'd be cute if like

the cat meows like you're petting a cat

so I'm going to ask the question when

the green flag is clicked if let's see I

think I need sensing so if touching

Mouse pointer this is way too big but

again the shape is fine so there it goes

grew to fill and then if it's touching

the mouse pointer that is if the cat to

whom this script or this program anytime

I attach puzzle pieces MIT calls them a

script or like a program if you will let

me go ahead then and choose a sound and

say play sound meow until done alright

so here it is to be clear when the green

flag is clicked ask the question if the


cat is touching the mouse pointer then

play sound meow here we go play

huh all right let's try again

play

huh

I'm worried it's not

scratch's fault Feels Like Mine what's

the bug here

why doesn't this work yeah in back

yeah who just turned

yeah the problem is the moment I click

that green flag scratch asks the

question is the cat touching the mouse

pointer and obviously it's not because

the cursor was like up there a moment

ago and it's not down there it's fine if

I move the cursor down there but too

late the program already asked the

question the answer was no or false or

zero however you want to think about it

so no sound was played so what might be

the solution here I could move my

cursor quickly but that feels like never

going to work out right other Solutions

here

yeah in way back

the forever Loop so I could indeed use

this forever Loop because if I want my

program to just constantly listen to me

well let's literally do something


forever or at least forever as long as

the program is running until I

explicitly hit stop so let me grab that

let me go to control let me grab the

forever block let me move the if inside

of this forever block reconnect this go

back up here click the green flag and

now nothing's happened yet but let me

try moving my cursor now
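The difference between asking the question once and asking it forever can be simulated in Python. This is a sketch under assumptions: each entry in frames is a made-up snapshot of whether the cat is touching the mouse pointer at that instant, and a finite list stands in for a loop that in Scratch would run until stop is clicked.

```python
from typing import List


def check_once(frames: List[bool]) -> List[str]:
    """Buggy version: the question is asked only when the flag is clicked."""
    return ["meow"] if frames[0] else []


def check_forever(frames: List[bool]) -> List[str]:
    """Fixed version: the same question is asked again on every pass."""
    sounds = []
    for touching in frames:  # stands in for the forever block
        if touching:
            sounds.append("meow")
    return sounds


# The pointer starts away from the cat and reaches it a moment later.
frames = [False, False, True, True, False]
```

With these frames, check_once(frames) never meows because the only check happens too early, while check_forever(frames) meows on each pass where the pointer is over the cat.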

ah so now

that's kind of cute so now the cat is

actually responding and it's going to

keep doing this again and again and so

now we have this idea of taking these

different ideas these different puzzle

pieces assembling them into something

more complicated and I could definitely

put a a name to this I could create a

custom block but for now let's just

consider what kind of more interactivity

we can do let me go ahead and do this by

again grabbing a when green flag clicked

let me go ahead and click the video

sensing

and I'm going to rotate the laptop

because otherwise we're going to get a

little Inception thing here where the

camera is picking up the camera is up

there so I'm gonna go


reveal to you what's inside the lectern

here

well we rotate this

and now that we have a non-video

backdrop I'm going to say this instead

of the green flag clicked actually I'm

going to say when the video motion is

greater than some arbitrary measurement

of motion I'm going to go ahead and play

sound meow until done

and then I'm going to get out of the way

so here's the cat and let's put it we'll

put them on top

on top of there just okay

all right and here we go

[Music]

so my hand is moving faster than 50

something or other whatever the unit of

measure is

and thank you so now we have an even

more interactive version

but I think if I sort of

slowly

right I'm not it's completely creepy but

I'm not like exceeding

the threshold

until finally my hand moves as fast as

that and so here actually is an

opportunity to show you something a

former student did let me go ahead here


and okay I gotta stop this let me go

ahead and zoom out of this in just a

moment uh if someone would be

if someone would be comfortable coming

up not only masked but also on camera

on the internet thought we'd play one of

your former classmates projects here up

on stage would anyone like to volunteer

here and be up on stage who's that yeah

come on down what's your name

Sahar all right come on down let me get

it set up for you here

[Applause]

all right let me go ahead and full

screen this here so this is a

whack-a-mole by one of your former

predecessors it's going to use the

camera focusing on your head which we'll

have to position inside of this

rectangle and have you ever played the

like whack-a-mole game at an arcade okay

so for those who haven't like these

little moles pop up and with a very

fuzzy Hammer you sort of hit down you

though if you don't mind you're going to

use your head to do this virtually

so let's line up your head with this red

rectangle if you could

we'll do beginner
all right here we go Sahar

give it a moment

okay come a little closer

and now hit the moles with your head

[Music]

there we go one point

one point

nice

15 seconds to go there we go oh yep one

point

that night

six seconds

there we go quick

all right a round of applause thank you

so beyond having a little bit of fun

here the goal was to demonstrate that by

using some fairly simple primitives some

basic building blocks but assembling

them in a fun way with some music maybe

some new costumes or artwork you can

really bring programs to life but at the

end of the day the only puzzle pieces

really involved were ones like the ones

I just dragged and dropped and a few

more because there were clearly lots of

moles so the student probably created a

few different sprites not a single cat

but at least four different moles they

had like some kind of graphic on the

screen that showed Sahar where to


position her head there was some kind of

timer maybe a variable that every second

was counting down so you can imagine

taking what looks like a pretty

impressive project at first glance and

perhaps overwhelming to solve yourself

but just think about what are the basic

building blocks and pluck off one piece

of the puzzle so to speak at a time and

so indeed if we rewind a little bit let

me go ahead here and introduce a program

that I myself made back in graduate

school when scratch was first being

developed by MIT let me go ahead and

open here give me just one second

something that I called back in the day

Oscar time that looks a little something

like this if I full screen it and hit

play

so you'll notice a piece of trash is

falling I can click on it and drag and

as I get closer and closer to the trash

can notice

it wants to go in it seems and if I let

go

one point

here comes another

I'll do the same two points

there's a sneaker falling from the sky


so another Sprite of some sort

I can also get just a little a little

lazy and just let them fall into the

trash themselves if I want to

so you can see it doesn't have to do

with my mouse cursor it has to do

apparently with the distance here let's

listen a little further I think there's

some additional trash that's about to make

its appearance presumably there's some

kind of like variable that's keeping

track of this score

okay let's see what the last the last

chorus here is

[Music]

okay and thus it continues and uh this

song actually goes on and on and on and

I do not have fond memories of

implementing this and hearing this song

for like 10 straight hours but it's a

good example to just consider how was

this program composed how did I go about

implementing it the first time around

and let me go ahead and open up some

programs now that I wrote in advance

just so that we could see how these

things are assembled honestly the first

thing I probably did

was probably to do something a little

like this here is just a version of the


program where I set out to solve just

one problem first of planting a lamp

post in the program right I kind of had

a vision of what I wanted you know it

evolved over time certainly but I knew I

wanted trash to fall I wanted a cute

little Oscar the Grouch to pop out of

the trash can and some other stuff but

wow that's a lot to just tackle all at

once I'm going to start easy download a

picture of a lamp post and then drag and

drop it into the stage as a costume and

boom that's version one it doesn't

functionally do anything I mean

literally that's the code that I wrote

to do this all I did was use like the

backdrops feature and drag and drop and

move things around but it got me to

version one of my program then what

might version two be well I considered

what piece of functionality frankly

might be the easiest to pluck off next

and the trash can that seems like a

pretty core piece of functionality it

just needs to sit there most of the time

so the next thing I probably did was to

open up for instance the trash can

version here that looks a little

something now like this so this time


I'll show you what's inside here there

is some code but not much notice at

bottom right I change the default cat to

a picture of a trash can instead but

it's the same principle that I can

control and then over here I added this

code when the green flag is clicked

switch the costume to something I

arbitrarily called Oscar one so I found

a couple of different pictures of a

trash can one that looks closed one that

looks partly open and eventually one

that has Oscar coming out and I just

gave them different names so I said

switch to Oscar one which is the closed

one by default then forever do the

following if touching the mouse pointer

then switch the costume to Oscar 2 else

switch to Oscar one that is to say I

just wanted to implement this idea of

like the can opening and closing even if

it's not exactly what I wanted

ultimately I just wanted to make some

forward progress so here when I run this

program

by clicking play notice what happens

nothing yet but if I get closer to the

trash can

it indeed pops open because it's forever

listening for whether the Sprite the


trash can in this case is touching the

mouse pointer and that's it that was

version two if you will and if I went in

now and added the lamp post and composed

the program together now we're starting

to make progress right now it would look

a little something more like the program

I intended ultimately to create what

piece did I probably bite off after that

well I think what I did is I probably

decided let me Implement one of the

pieces of trash not the shoe and the

newspaper all at once let's just get one

piece of trash working correctly first

and so let me go ahead and open this one

and again all these examples will be

available on the course's website so you

can see all of these examples too it's

not terribly long I just implemented in

advance so we could flip through kind of

quickly here's what I did here on the

right hand side I turned my Sprite into

a piece of trash this time instead of a

cat instead of a trash can and I also

created with Carter's help a second

Sprite this one a floor it's literally

just a black line because I just want it

initially to have some notion of a floor

so I could detect if the trash is


touching the floor now without seeing

the code yet just hearing that

description why might I have wanted the

second Sprite and this black line for a

floor

with the trash intending to fall from

the sky what might I have been thinking

like what problem might I be trying to

solve yeah

yeah you don't want the first sprite

to start at the top go through and then

boom like you completely lose it like

that would not be a very uh useful thing

or it would seem to maybe eat up more

and more of the computer's memory if the

trash is just endlessly falling and I

can't grab it uh might be a little

traumatic if you try to get it and you

can't pull it back out and you can't fix

the program and so I just wanted the

thing to stop so how might I have

implemented this let's look at the code

at left here I have

a bit of Randomness like I proposed

earlier exists there's this blue

function called go to x comma y that

lets me move a sprite to any position up

down left right I picked a random X

location either here or over here

negative 240 to positive 240 and


then a y value of 180 which is the top

and this just makes the game more

interesting it's kind of lame pretty

quickly if the trash always falls from

the same spot here's just a little bit

of Randomness like most any game would

have that spices things up and so now if

I click the green flag you'll see that

it just Falls nothing interesting is

going to happen but it does stop when it

touches the black line because notice

what we did here I'm forever asking the

question if the distance of the Sprite

the trash to the floor is greater

than zero that's fine change the y

location by negative three so move it

down three pixels down three pixels

until the distance to the floor is not

greater than zero it is zero or even

negative at which point it should

stop moving altogether there are other

ways we could have implemented this but

this felt like a nice clean way that

logically just made it make sense and
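The falling script just described might look roughly like this in Python. It is a simplified sketch, not the actual project code: the x range of -240 to 240, the starting y of 180, and the step of 3 come from the description above, while FLOOR_Y and the use of plain vertical distance (rather than Scratch's own distance measurement between sprites) are assumptions made for illustration.

```python
import random
from typing import Tuple

STAGE_LEFT, STAGE_RIGHT, STAGE_TOP = -240, 240, 180
FLOOR_Y = -180  # assumed position of the black floor line
STEP = 3        # pixels moved down per pass of the loop


def drop_trash(rng: random.Random, floor_y: int = FLOOR_Y) -> Tuple[int, int]:
    """Start at a random x along the top, then fall until the floor is reached."""
    x = rng.randint(STAGE_LEFT, STAGE_RIGHT)  # go to x: pick random, y: 180
    y = STAGE_TOP
    while y - floor_y > 0:  # distance to the floor is greater than zero
        y -= STEP           # change y by negative three
    return x, y
```

Because the trash moves three pixels at a time it can overshoot the floor slightly, which is why the loop condition is "greater than zero" rather than "not equal to zero".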

okay now I got some trash falling I got

a trash can that opens and closes I have

a lamp post now I'm you know a good

three steps into the program we're

making progress if we consider one or


two final pieces something like the

dragging of the trash let me go ahead

and open up this version too

dragging the trash requires a different

type of question let me zoom in here

here's the piece of trash I only need

one Sprite no floor here because I just

want the human to move it up down left

right and the human's not going to

physically be able to move it outside of

the world and if we zoom in on this code

the way we've solved this is as follows

we're using that and conjunction that we

glimpsed earlier because when the green

flag is clicked we're forever asking

this question or really these questions

plural if the mouse is down and the cat

or sorry the trash is touching the mouse

pointer that's equivalent logically to

clicking on the trash go ahead and move

the trash to the mouse pointer so again

it takes this very familiar idea that

you and I take for granted every day on

Macs and PCs of clicking and dragging and

dropping how is that implemented well

uh macOS or Windows are probably

asking a question over every icon is the

mouse down and is the icon touching the

mouse if so go to the location of the

mouse forever while that Mouse button is


clicked down so how does this work in

reality now let me go ahead and click on

the play nothing happens at first but if

I click on it I can move it up down left

right it doesn't move thereafter so I

now need to kind of combine this idea of

dragging with falling but I bet I could

just start to use just one single

program right now I'm using separate

ones to show different ideas but now

that's another bite out of the problem

and if we do one last one something like

the score keeping is interesting because

recall that every time we dragged a

piece of trash into the can Oscar popped

out and told us the current score so let

me go ahead and find this one Oscar

variables

and let me zoom in on this one and this

one is longer because we combined all of

these elements so this is the kind of

thing that if you looked at at first

glance like I have no idea how I would

have implemented this from nothing from

scratch literally but again if you

Vision take your vision and componentize

it into these smaller bite-sized

problems you could take these baby steps

so to speak and then solve everything


collectively so what's new here is this

bottom one

forever do the following if the trash is

touching Oscar the other Sprite that

we've now added to the program change

the score by one this is in orange and

indeed if we poke around we'll see that

orange is a variable like an X or a y

but with a better name changing it means

to add one or if it's negative subtract

one and then go ahead and have the trash

go to pick random what what is this all

about well let me let me show you what

it's doing and then we can infer

backwards let me go ahead and hit play

all right it's falling I'm clicking and

dragging it I'm moving it over and I'm

Letting Go

all right let me do it once more letting

go
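The score-keeping script described above can be sketched in Python as follows. The function names are hypothetical, and the coordinates reuse the stage bounds mentioned earlier; the point is only that touching Oscar both changes the score variable by one and sends the same sprite back to a random spot at the top.

```python
import random
from typing import Tuple

STAGE_LEFT, STAGE_RIGHT, STAGE_TOP = -240, 240, 180


def random_top_position(rng: random.Random) -> Tuple[int, int]:
    """go to x: pick random -240 to 240, y: 180."""
    return rng.randint(STAGE_LEFT, STAGE_RIGHT), STAGE_TOP


def on_touching_oscar(score: int, rng: random.Random) -> Tuple[int, Tuple[int, int]]:
    """Change score by 1, then move the one trash sprite back to the top."""
    return score + 1, random_top_position(rng)
```

Calling on_touching_oscar repeatedly keeps raising the score while only ever repositioning a single piece of trash.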

let me stop why do I have this function

at the end called go to X and Y

randomly like what problem is this

solving here yeah and way back

yeah exactly even though the human

perceives this as like a lot of trash

falling from the sky it's actually the

same piece of trash just kind of being

magically moved back to the top as

though it's a new one and there too you


have this idea of reusable code if you

were constantly copying and pasting your

pieces of trash and creating 20 pieces

of trash 30 pieces of trash just because

you want the game to have that many

levels you're probably doing something wrong

reuse the code that you wrote reuse the

Sprites that you wrote and that would

give you not just correctness but also a

better design well let's take a look at

one final set of building blocks that we

can compose ultimately into something

particularly interactive as follows let

me go ahead and zoom out here and let me

propose that we Implement something like

um like some kind of maze based game and

let me go ahead here so I want to

implement some maze based game that

looks at first glance like this let me

hit play it's not a very fun game yet

but here's a little Harvard Shield a

couple of black lines this time vertical

instead of horizontal but notice you

can't quite see my hand here but I'm

using my arrow keys to go down to go up

to go left to go right but if I keep

going right right right right right

right right right it's not going

anywhere and left left left left left


left left left left left left it

eventually stops so before we look at

the code how might this be working

what kinds of scripts collections of

puzzle pieces might collectively help us

implement this what do you think

perfect yeah there's probably some

question being asked if touching the

black line and it happens to be a couple

of Sprites Each of which is just

literally a vertical black line we're

probably asking a question like are you

touching it or is the distance to it

zero or close to zero and if so we just

ignore the up down left or rather we

ignore the left or the right arrow at

that point so that works but otherwise

if we're not touching a wall what are we

probably doing instead forever here how

is the movement working presumably yeah

and back

oh you might are you scratching

okay sure let's go on

sorry say a little louder

exactly it's continually forever looking

or listening for the arrow keys up down

left right and if the up arrow is

pressed we're probably changing the Y by

a positive value uh if the down arrow is

pressed we're going down by y and


left and right accordingly so let's

actually take a quick look if I zoom out

here and take a look at the code that

implements this there's a lot going on

at first glance but let's see first of

all let me drag some stuff out of the

way because it's kind of overwhelming at

first glance especially if you for

instance were poking around online as

for problem set zero just to get

inspiration most projects out there are

going to look overwhelming at first

glance until you start to wrap your mind

around what's going on but in this case

we've implemented some abstractions from

the get-go to sort of explain to

ourselves and to anyone else looking at

the code what's going on this is that

program with the two black lines and the

Harvard Shield going up down left and

right it initially puts the shield in

the middle zero comma zero it then

forever listens for keyboard as I think

you were describing and it feels for the

walls as I think you were just describing

now how is that implemented don't know

yet these are custom blocks we created

as abstractions to kind of hide those

implementation details because honestly


that's all I need to know right now but

as aspiring programmers if we're curious

now let's scroll down to the actual

implementation of listening for keyboard

this is the one on the left and it is a

little long but it's a lot of similar

structure we're doing the following if

the up arrow is pressed then change y

by one go up if the down arrow is

pressed then change y by negative one go

down right arrow left arrow and that's

it so it just assembles all of those

ideas combines it into one new block

just because it's kind of overwhelming

let's just implement it once and tuck it

away and if we scroll now over to the

feel for walls function this now is

asking the question as hypothesized if

I'm touching the left wall change my x

value by one sort of move away from it a

little bit if I'm touching the right

wall then move X by negative 1 to move a

little bit away from it so it kind of

bounces off the wall just in case it

slightly went over we keep the crest

within those two walls all right so then

a couple of more pieces here to

introduce what if we want to actually

add some kind of adversary or opponent

to this game well let me go ahead to um


maybe this one here where the adversary

in this game might for instance be

designed to be bouncing to stand in your

way if this is like a maze and you're

trying to get the Harvard shield from

the bottom to the top or vice versa uh

oh Yale is in the way and it seems to be

automatically bouncing back and forth

here well let me ask someone else

hypothesize how is this working

this is an idea you have this is an idea

you see let's reverse engineer

in your head how it works

how might this be working yeah and back

yeah so if the Yale symbol is touching

the left wall or the right wall we

somehow have it bounce and indeed we'll

see there's a puzzle piece that can do

exactly that technically off the edge as

we'll see but there's another way we can

do this and let's look at the code the

way we ourselves can Implement exactly

that idea bounce is just with a little

bit of logic so here's what this version

of the program is doing it's moving Yale

by default to zero zero just to

arbitrarily put it somewhere pointing at

Direction 90 degrees which means just

horizontally essentially and then it's


forever doing this if touching the left

wall or touching the right wall here's

our translation of Bounce we're just

turning 180 degrees and the nice thing

about that is we don't have to worry if

we're going from right to left or left

to right 180 degrees is going to work on

both of the walls and that's it

after we do that we just move one step

one pixel at a time but we're doing it

forever so something is happening

continually and the Yale icon is

bouncing back and forth well one final

piece here what if now we want a more uh

another adversary a more advanced

adversary down the road for instance to

go and follow us wherever we are such

that this time

we want

the other Sprite

to not just bounce back and forth but

literally follow us no matter where we

go

how might this be implemented

on the screen I bet it's another forever

block but what's inside

yeah forever point at the location of

the Harvard shield and go one step

toward it this is just going to go on

forever if I just give up at least in


this version notice it's sort

of twitching back and forth because it

goes one pixel then one pixel then one

pixel it's sort of in a frantic State

here we haven't finished the game yet

but if we see inside we'll see exactly

that it didn't take much to implement

this simple idea go to a random position

just to make it kind of fair initially

then forever Point towards Harvard which

is what we call the Harvard Crest sprite

move one step suppose we now wanted to

make a more advanced level what's a

minor change I could logically make to

this code just to make MIT even better

at this

all right change the number of steps to

two so let's try that so now they got

twice as fast let me go ahead and just

get this out of the way whoops let me

make it a fair fight

green flag

all right I unfortunately am still

moving one pixel at a time so this isn't

going to end well it caught up to me and

if we're really aggressive

and do something like 20 steps at a time

click the green flag Jesus okay so

that's how you might then make your


levels progressively harder and harder

so it's not an accident that we chose

these particular examples here involving

these particular schools because we have

one more demonstration we thought we'd

introduce today if we could get one

other volunteer to come up and play what

was called by one of your predecessors

Ivy's hardest game let's see you in the

middle do you want to come on up what's

your name

let's say again

come a little closer actually sorry

hard to hear here

all right a round of applause here if we

could too

okay sorry what was your name

Celeste Celeste come on over nice to

meet you too so here we have on this

other screen Ivy's hardest game written

by a former cs50 student I think you'll

see that it combines these same

principles the maze is clearly a little

more advanced the goal at hand is to

initially move the Harvard Crest to the

Sprite all the way on the right so that

you catch up to him in this case but

you'll see that there's different levels

and different levels of sophistication

so if you're up for it you can use just


these arrow keys up down left right

you'll be controlling the Harvard Sprite

and if we could raise the volume just a

little bit we'll make this our final

example

there we go clicking the green flag

[Music]

[Applause]

that's it for cs50 Welcome to the class

we'll see you next time

[Applause]

[Music]

all right so this is cs50 and this is

week one the one in which you learn a

new language which is something we

technically said last week at least if

you had never played with this graphical

language known as scratch before which

itself was a programming language but

today as promised we transition to

something a little more traditional a

little more text-based not puzzle piece

or block based known as C this is an

older language it's been around for

decades but it's a language that

underlies so many of today's more modern

languages among them something called

python that we'll also come to in a few

weeks time indeed at the end of the

semester the goal is for you to feel

that you've not learned scratch you've

not learned C or even python for that

matter but fundamentally that you've

learned how to program unfortunately

when you learn how to program with a

more traditional language like this

there's just so much distraction last

week I described all of the syntax all

of the weird punctuation that you see in


this like the hash symbol these angle

brackets parentheses curly braces

backslash n and more well today we're

not going to reveal what all of those

little particulars mean but by next week

will this no longer look like the

proverbial Greek to you a language that

presumably you've never actually seen or

typed before but to do that we'll

explore some of the very same topics

since last week so recall that via

scratch and presumably via problem set

zero we took a look at things called

functions that are actions or verbs and

related to functions were arguments

like inputs and related to some

functions were return values like

outputs then we talked a bit about

conditionals Forks in the road so to

speak Boolean Expressions which are just

yes no questions or true false questions

Loops which let you do things again and

again variables like in math that let

you store values temporarily and then

even other topics still so if you were

comfortable on the heels of problem set

zero and last week realize that all of

these topics are going to remain

with us so really today is just about


acquiring all the more of a mental model

for how you translate those ideas into

presumably a very cryptic new syntax a

new syntax frankly that's actually more

simple in some ways than your own human

language be it English or something else

because there's far fewer vocabulary

words there's actually far less syntax

that you might have in say a typical

human language but you need to be with

these computer languages all the more

precise so that you're uh ultimately

correct and ultimately we'll see too

your code is successful along a few

other lines as well so if you think

about like the last time you kind of

wandered around not really knowing what

you were doing or encountered something

new might not have been that long ago

entering Harvard Yard for the very first

time or old campus or the like be it in

Cambridge or New Haven you know you

didn't really need to know how to do

everything as a first year you didn't

need to know who everyone was where

everything was how Harvard or Yale or

anything else for that matter worked you

sort of got by day to day by just

focusing on those things that matter and

anything you didn't really understand


you sort of turned a blind eye to until

it's important and that's indeed what

we're going to do today and really for

the next several weeks we'll focus on

details that are initially important and

try to wave our hands so to speak at

details that yeah eventually we'll get

to might be interesting but for now they

might be distractions and by

distractions I really mean some of that

syntax to which I alluded earlier so by

the end of today and really by the end

of problem set one your first foray

presumably into this language called C

you'll have written some code and you'll

be asking yourself We'll be asking

yourselves just how good is that code

well first and foremost per last week be

it in scratch or phone book form like

code ultimately needs to be correct to

be Well Done Right you want the problem

to be solved correctly so that one sort

of goes without saying and along the way

this term will provide you with tools

and techniques so you don't have to just

sit there sort of endlessly trying an

input checking the output trying another

input checking the output there's a lot

of automation tools in the real world


and in this class and others like it

that'll help facilitate you answering

that question for yourself is my code

correct according to our specifications

or the like but then something that's

going to take more time and you're

probably not going to feel 100 percent

comfortable with the first week the

first weeks is just how well designed

your code is it's one thing to sort

of speak English or write English but

it's another thing or any language for

that matter but it's another thing to

speak it or write it well and we spend

all these years in middle school high

school presumably writing papers and

other documents getting grades and

feedback on them as to how well

formulated your arguments were how well

structured your paper was and the like

and there's that same idea in

programming it doesn't matter

necessarily that you've just solved the

program a problem correctly if your code

is a complete visual mess or if it's

crazy long it's going to be really hard

for someone else to wrap their mind

around what your code is doing and

indeed to be confident if it is correct

and honestly you the next morning the


next year the next time you look at that

code might have no idea what you

yourself were even thinking but you will

if you focus too on designing good code

getting your algorithms efficient

getting your code nice and clean and

even making sure your code looks pretty

which we'd describe as a matter of

style so in the sort of written human

world you know having punctuation in the

right place capitalization and like the

sort of way you write an essay but not

necessarily send a text message relates

to style for instance and so good style

in code is going to have a few of these

characteristics that are pretty easily

taught and remembered but you just have

to start to get in the habit of writing

code in a certain way so these three

axes so to speak correctness design and

style are really the overarching goals

when writing code that ultimately is

going to look like this so this program

we conjectured last week does what if

you run it on a Mac or PC or somewhere

else presumably

what does it do yeah

it just prints hello world and honestly

that's kind of atrocious that you need


to hit your keyboard keys this many

times with this cryptic syntax just to

get a program to say hello world so a

spoiler in a few weeks time when we

introduce other more modern languages

like python you can distill this same

logic into literally one line of code

and so we're getting there ultimately

but it's helpful to understand

what it is that's going on here because

even though this is a pretty cryptic

syntax there's nothing after this week

and really next week that you shouldn't

be able to understand even about

something that right now looks a little

something like this so how do you write

code well I've given us sort of the

answer to a problem how do you print

hello world on the screen so what do I

do with this code well we're in the

habit of typically writing things with

like Microsoft Word or Google documents

and yeah I could open up word or Google

Docs or Pages or the like and just

literally transcribe that character for

character save it and boom I've got a

program but the problem per last week is

that computers only understand or speak

what other language so to speak

yeah so binary zeros and ones and so


this obviously is not zeros and ones so

it doesn't matter if I put it in a word

doc Google doc Pages file or the like

the computer's not going to understand

it until I somehow translate it to zeros

and ones and honestly none of those

tools that I rattled off are really

appropriate for programming why well

they come with features like bold facing

and italics and sort of fluffy aesthetic

stuff that has no functional impact on

what you're trying to do with your code

and they don't have the ability it would

seem to convert that code ultimately to

zeros and ones but tools that do have

this capability might be called

integrated development environments or

IDEs or more simply text editors a text

editor is a tool that a programmer uses

perhaps every day to write their code

and it's a simple program here for

instance a very popular one called

Visual Studio code or vs code and at the

top here you see that I've actually

created in advance before class A very

simple empty file called hello.c why

well dot C indicates by convention that

this is going to be a file in which

there is C code it's not DOT docx which


would mean in this file is a Microsoft

Word document or dot pages is a Pages

file this is dot C which means in this

file is going to be text in the language

called C this number one here is just

sort of an automatic line number that's

going to help me keep track of how long

or short this program is and the cursor

is just blinking there waiting for me to

start typing some code well let me go

ahead and type out exactly the same code

for me it comes pretty comfortably from

memory so I'm going to go ahead and

include something called stdio.h

more on that later I'm going to sort of

magically type int main void whatever

that means we'll come back to that later

one of these curly braces and then a

sibling there that closes the same then

I'm going to hit tab to indent a few

spaces and then I'm going to type not

print but printf

then hello comma World backslash n close

quote close parenthesis semicolon and I

dare say this was essentially the very

first program I wrote some 25 years ago

I wrote it to say hi cs50 now it just

says the more canonical conventional

hello world but that's it that's my very

first program and all I need to now do


is maybe hit command s or control s to

save the file and voila I am a

programmer

the catch though is like okay how do I

run this like on your Mac or PC how do

you run a program well usually you double

click an icon on your phone you tap an

icon in this environment that we're

using and that many programmers dare

say most programmers use you don't have

immediately a nice pretty icon to double

click on that's very user friendly but

it's not very necessary especially when

you get more comfortable with

programming you're going to want to type

commands because it's just faster than

pointing and clicking a mouse and you're

going to want to automate things which

is a lot easier if it's all command or

text based as opposed to mouse and

muscular movements and so here I have my

program it lives in this file called

hello.c I need to now convert it though

to zeros and ones well how do I go about

doing this and how am I going to get

from those uh this so-called code or

source code as it's conventionally

called to this these zeros and ones that

will now start calling machine code the


zeros and ones from last week can be

used not only to represent numbers and

letters colors audio video and more it

can also represent instructions to a

computer like print or play a sound or

delete a file or save a file all the

sort of basics of a computer somehow can

be represented by other patterns of

zeros and ones and just like last week

it depends on the context in which these

numbers are stored sometimes they're

interpreted as numbers like in a

spreadsheet sometimes they're

interpreted as colors sometimes they're

interpreted as instructions commands to

your computer to do very low level

operations like print something on the

screen

so fortunately like last week's

definition of like computer science of

problem solving is a nice mental model

for exactly the goal at hand I have some

input AKA source code I want to Output

ultimately machine code those zeros and

ones I certainly don't want to do this

kind of process by hand so hopefully

there's an algorithm implemented by some

special program that does exactly that

and those of you who do have some prior

experience this program might be called


a compiler so a few of you have indeed

programmed before not all languages use

compilers C in fact is a language that

does use a compiler and so I just need

to find myself on my computer somewhere

presumably a so-called compiler a

program whose purpose in life is to

convert one language to another and

source code in text in C like we saw

a moment ago is source code the machine

code is the corresponding zeros and ones

so let me go back to the same

programming environment called Visual

Studio code or vs code this is typically

a program you or any programmer on the

internet can download onto their own Mac

or PC and be on their way with whatever

computer you own writing some code a

downside though of that approach is that

all of us have slightly different

versions of Macs or PCS we have slightly

different versions of operating systems

they may or may not be up to date it's

just sort of a technical support

nightmare to create a uniform

environment especially for like an

introductory class where everyone should

ideally be on the same page so we can


get you up and running quickly and so

I'm actually using a cloud-based version

of vs code something that you only need

a browser to access and then you can be

on any computer today or tomorrow by the

end of the semester we're going to get

you out of the cloud so to speak as best

we can and get you onto your own Mac or

PC so that after this class especially

if it's the only CS class you ever take

you feel like you can continue

programming in any number of languages

even with cs50 behind you but for now

wonderfully the browser version of VS

code should pretty much be identical to

what the eventual downloadable version

of the same would be and you'll see in

problem set one how to access this and

how to get going yourself with your

first programs

but I haven't mentioned this bottom part

of the screen this is an area where we have

what's called a terminal window so this

is sort of old school technology that

allows you with a keyboard to interact

with a computer wherever it may be on

your lap in your pocket or even in this

case in the cloud so on the top hand

portion of this screen is my code my


text editor like tabbed Windows like in

many programs so I can just create files

and write code the bottom of the screen

here my so-called terminal window gives

me the ability to run commands on a

server that currently I have exclusive

access to so because I logged into vs

code with my account online I have my

own sort of virtual server if you will

in the cloud otherwise known as in this

context a container this has its own

operating system for me its own hard

drive if you will where I can save and

create files of my own separate from

yours and vice versa and it's at this

very simple prompt which is

conventionally but not always

abbreviated by a dollar sign has nothing

to do with currency it just means type

your commands here this is where I'm

going to be able to type commands like

compile my source code into machine code

so it's a command line interface or CLI

on top of an operating system that you

might not have ever used or seen but

it's very popular called Linux odds are

almost all of us in this room are using

Mac OS or Windows right now but we're

all going to start using an operating


system called Linux which is in a family

of operating systems that offer not only

this command line interface but are used

not just for programming but for serving

websites and developing applications and

the like and it's indeed a familiar and

very powerful interface as we'll see so

how do I go about making this file

hello.c into a program

there's no icon to double click but

there is a command I can type make hello

at this dollar sign prompt go ahead and

hit enter and nothing appears to happen

but that's a good thing and as we'll see

in programming almost always if you

don't see anything go wrong that means

everything went right so this is going

to be a rarity at first but this is a

good thing that it just seems to do

nothing but now there is in the folder

in my account in this on the cloud a

file called hello and it's a bit of a

weird command but you'll get familiar

with it before long dot just means go

into my current folder

slash Hello means run the program called

hello in this current folder so dot

slash hello and then enter and voila now

I'm actually not just programming but

running my actual code


so what have I just done let me go ahead

and do this I'm going to go ahead and

open up the sidebar of this program and

you'll see in problem set one how to do

this and this might look a little

different based on your own

configuration even the color scheme I'm

using might ultimately look different

from yours because it supports nice

colorful themes so you can have different

colors and uh brightnesses depending on

your mood or the time of day what I've

opened here though is what is called in

vs code Explorer and this is just all of

the files in my cloud account and

there's not many right now there's only

two one is the file called hello.c and

it's highlighted because I've got it

open right there and the other is a file

called hello which is brand new and was

created when I ran that command and

what's now worth noting is that now

things are getting a little more like

Mac OS and windows like on the left hand

side you have a GUI a graphical user

interface but on the bottom here again

you have a CLI command line interface

these are just different ways to

interact with computers and you'll get


comfortable with both and honestly

you're certainly familiar with

GUIs already so it's the command line

one with which we'll spend some time now

suppose that I just wanted to do

something more than compile this program

suppose I wanted to go ahead and remove

it like uh no I made a mistake I want to

say hello cs50 not Hello World I could

just hover up here like in any software

and I could right click and I could poke

around in there delete permanently so

most of us might have that instinct on a

Mac or PC you right click or control

click and you poke around but in a

command line interface let me do this

instead the command for removing or

deleting a file in the world of Linux

this other operating system is just to

type RM for remove and then hello enter

it's a somewhat cryptic confirmation

message but this just means are you sure

I'm going to go ahead and type y for yes

and now when I hit enter Watch What

Happens at top left in the Explorer the

GUI the graphical interface voila it

disappears right not terribly exciting

but this just means this is a graphical

version of what we're seeing here and in

fact if you want to never use the GUI


again I'll go ahead and close it with a

keyboard shortcut here you can forever

just type LS for list and hit enter and

you will see in the command line

interface all of the files in your

current folder so anything you can do

with a mouse you can do with this

command line interface and indeed we'll

see many more things that you can do as

well but the inventors of this uh

operating system and its predecessors

were very succinct like the command is

RM for remove the command is LS for list

it's very terse why because it's just

faster to type

so before we Forge ahead with making

something more interesting than just

hello world let me pause here to see if

there's questions on source code or

machine code or compiler or this command

line interface

really good question and let me recap if

I were to make changes to the program

run it and maybe make other changes and

try to rerun it would those changes be

reflected even though I've reworded

slightly well let's do this I already

removed the old version so let me go

ahead and point out that if I do dot


slash hello now I'm going to see some

kind of error because I just deleted the

file

no such file or directory so it's not

terribly user friendly but it's saying

what the problem is let me go ahead and

remake it by typing make hello now if I

type LS I'll see not one but two files

again and one of them is even green with

a little asterisk to indicate that it's

executable it's sort of the textual

version of something you could double

click in our human world so now of

course if I run hello we're back where I

started hello world but now suppose I

change it to hello comma cs50 like I did

years ago let me go ahead and save the

file with command s or control s down

here now let me run dot slash hello

again and voila

huh

so let me ask someone else to answer

that question what's the missing step

why did it not say hello cs50 yeah

yeah so you didn't I didn't compile it

again so sort of newbie mistake you're

going to make this mistake and many

others before long but now let me go

ahead and remake hello enter it's going

to seemingly make uh the same program


but this time when I run it it's hello

cs50

or any other questions on

some of these building blocks and we'll

come back to all the crazy syntax I

typed before long but for now we're

focusing on just the output yeah

uh when I keep running make it creates a

new version of the machine code so it

keeps changing the hello program and the

hello file and that's it there's no make

file per se

ah good question no if I open up that

directory you'll see that there's just

the one and it doesn't matter how many

times I run make hello three four five

it just keeps overwriting the original

so it's kind of like just saving in the

world of Google docs or Microsoft Word

or the like but there's an additional

step today we have to then convert my

words to the computers the zeros and

ones

yeah in front

ah what happens if I run hello.c so let

me go ahead and do dot slash hello.c

which is a mistake you'll invariably

make early on permission denied so what

does that mean this is where the error


messages mean something to the people

who design the operating system but it's

a little cryptic it's not that you don't

have access to the file it means that

it's not executable this is not

something you have permission to run but

you do have permission to read or write

it that is change it

ah really good question so if I have

named my file hello.c or more generally

something.c one of the things that make

does is it automatically picks the file

name for me and

we'll discuss this a bit more next week

make itself is kind of the first of our

white lies today make itself is not a

compiler it's a program that knows how

to find and use the compiler on the

system and automatically create the

program if I use as we'll discuss next

week the actual compiler myself I have

to type a much longer sequence of

commands to specify explicitly what do I

want the name of my program to be make

is a nice program especially in week one

because it just automates all of that

for us and so here we have now a program

that very simply prints something on the

screen so let's now put this into the

context of where we left off last time


in the context of scratch and inputs and

outputs so we discussed last time of

course functions and arguments functions

again are those actions and verbs like

say or ask or the like and the arguments

were the inputs to those functions

generally in those little white ovals

that in scratch you could type words or

numbers into well C and all of the

languages we're going to see this term

have that same capability and let's just

start to translate one of these things

to another so for instance let's put

this same program in C in the context of

scratch this is what hello world looked

like last week in the form of one

function this week of course it looks

like print and then the parentheses

notice are kind of deliberately designed

in the world of scratch to resemble that

same shape even though this is a white

oval you kind of get that

it's kind of evoking that

same idea with the parentheses

technically the function in C it's not

called say it's not even called print

it's called printf the f stands for

formatted but we'll see what that means

in a moment but printf is the closest


analogous function for say in the world

of C notice though that if you want to print

something like hello world or hello cs50

in C you don't just write the words as

we did last week you also have to add

what if you noticed already what's

missing from this version

yeah so the double quotes on the left

and the right so that's necessary in C

whenever you have a string of words

and I'm using that word deliberately

whenever you have multiple words like

this this is known as a string as we'll

see and you have to put it in double

quotes not single quotes you have to put

it in double quotes there's one other

stupid thing that we need to have in my

C code in order to get this function to

do something ultimately which is what

semicolon so just like in our human

world you eventually got into the habit

of using at least in formal writing

periods semicolon is generally what you

use to finish your thought in the world

of programming with c all right so we

have that function in place now what

does this really fit into in terms of

the mental model well functions take

arguments and it turns out functions can

have different types of outputs and


we've actually seen both already last

week one type of output from a function

can be something called a side effect

and it generally refers to something

visual like something appearing on the

screen or a sound playing from your

computer it's sort of a side effect of

the function doing its thing and indeed

last week we saw this in the context of

passing in something like hello world

as input to the say function and we saw

on the screen hello world but it was

kind of a one-off it's one and done

you can't actually do anything with that

visual output other than consume it

visually with your human eyes but

sometimes recall last week we had

functions like the ask block that

actually returned me some value remember

the ask what's your name it handed me

back whatever answer the human typed in

it didn't just arbitrarily display it on

the screen the cat didn't necessarily

say it on the screen it was stored

instead in that special variable

that was called answer because some

functions have not side effects but

return values they hand you back an

output that you can use and reuse unlike


the side effect which again displays and

that's it you can't sort of catch it and

hold on to it so in the context of last

week we had the ask block and that had

this special answer return value in C

we're going to see in just a moment we

could translate this as follows the

closest match I can propose for the ask

block is a function that we're going to

start calling get string string is again

a word a set of words like a phrase or a

sentence in programming it too is a

function insofar as it takes input and

pretty much this isn't always true but

very often when you have a word in C

followed by an open parenthesis and a

closed parenthesis it's most likely the

name of a function and we're going to

see that there's some exceptions to that

but for now this indeed looks like a

function because it matches that pattern

if I want to ask the question what's

your name question mark and I'm even

going to deliberately put a space there

just to kind of move the cursor a little

bit over so that the human isn't typing

literally after the question mark so

that's just a nitpicky aesthetic this

is perhaps the closest analog to just

asking that question but because the ask


block returns a value the analog here

for get string is that it too returns a

value it doesn't just print the human's

input it hands it back to you in the

form of a variable AKA a return value

that I can then use and reuse

now ideally it would be as simple as

this literally saying answer on the left

equals and this is where things start to

diverge from math and sort of our human

world this equal sign henceforth is not

the equal sign it is the assignment

operator to assign a value means to

store a value in some variable and you

read these things weirdly right to left

so here is a function called get string

I claim that it's going to return to you

whatever the human types in as their

name

it's going to get stored over here on

the left because of this so-called

assignment operator that yes is an equal

sign but it doesn't mean equality in

this context it makes things equal but

it does so by copying the value on the

right into the thing on the left

unfortunately we're not quite done yet

with C and this is where again it gets a

little annoying at first whereas Scratch


just let us express our ideas without so

much syntax

in C when you have a variable you don't

just give it a name like you did in

scratch you also have to tell the

computer in advance what type of value

it is storing

string is one such type of value int for

integer is going to be another and

there's even more than that that we'll

see today and Beyond and this is partly

an answer to the question that came up

one or more times last week which was

how does a computer distinguish this

pattern of zeros and ones from this like

is this a letter a number a color a

piece of video and I just claimed last

week that it totally depends on the

program it depends on the context and

that's true but within those programs it

often depends on what the human

programmer said the type of the value is

if this specifies it's a string which

means interpret the following zeros and

ones that are stored in my program as

words or letters more generally if it

said int for integer it would be

implying by the programmer treat the

following zeros and ones in my program

as a number an integer not a string so


here's where this week unlike with

scratch which just kind of figures out

what you mean with C and a lot of

languages you have to be this pedantic

and tell it what you mean there's still

one stupid thing missing from my code

here

what's still missing here yeah

and we still need the stupid semicolon

and I'm sort of impugning it here because

honestly these are the kinds of stupid

mistakes you're gonna make today

tomorrow this weekend next week a few

weeks from now until you start to notice

this and recognize it as well as you do

English or whatever your spoken

language is Yeah question

good question suppose I mix apples and

oranges so to speak and I try to put a

string in an int or an int in a string

the compiler is going to complain so

when I run that make command as I did

earlier it's not going to be nice and

blissfully quiet and just give me

another prompt it's going to yell at me

with honestly a very cryptic looking

error message until we get the muscle

memory for reading it other questions

um what happened to the backslash n so


we'll come back to that in just a moment

if we may because I have deliberately

omitted it here but we did have it

earlier and we'll see the different

behavior in a sec

other questions

yeah not at all nitpicky these are the

kinds of things that just matter and

it's going to take time to recognize and

develop this muscle memory everything

I've typed here except for the W at the

moment is lowercase and the W is

capitalized just because it's English

everything else is lowercase and this

kind of varies by language and also

context so in many languages the

convention is to use all lowercase

letters for your variable names other

languages might use some capitals as

well but we'll talk about that before

long but this is the kind of thing that

matters and is hard to see at first

especially when a little s doesn't look

that different you know when it's on

your tiny laptop screen from a

capital S but you'll start to develop

these instincts all right so besides

this particular block let's go ahead and

consider how we can go about

implementing this now in code so let me


switch back to vs code here this was the

program I had earlier and let me go

ahead and undo my cs50 change and this

time just rerun it rerun make on hello

with the original version with the

backslash n enter nothing bad seems to

have happened so dot slash hello enter

hello world now if you're curious this

is a good instinct to start to acquire

what happens if I get rid of this well

I'm probably not going to break things

too badly so let's try let me go ahead

now and do make hello

it still compiles so it's not a really bad

mistake so let me go ahead and run dot

slash hello

what's the difference here

yeah what do you see that's different

yeah the dollar sign my so-called prompt

stayed on the same line why well we can

presumably infer now that the backslash

n is some fancy notation for saying

create a new line move the cursor so to

speak to the next line notice that the

cursor will move to the next line in my

terminal window if I keep hitting it it

just automatically by nature of hitting

enter does it but it'd be kind of stupid

if when you run a program in this world


simple as it is if like the next command

is now weirdly spaced in the middle of

the terminal with the dollar sign it just

looks sloppy right it's really

just an aesthetic argument and notice

that it's not acceptable or correct to

do this

to hit enter there let me go ahead and

save that though and see what happens

let me go ahead now and run uh make

hello enter oh my God like four errors

this is like what 10 lines of Errors for

like a one-line program and this is

where again you'll start to develop the

instincts for just reading this stuff

these kinds of tools like the compiler

tool we're using were not designed

necessarily with user friendliness in

mind that's changed over the decades but

certainly early on it's really just

meant to be correct and precise with its

errors so what did I do here missing

terminating close quote character long

story short when you have a string in C

your double quotes just have to be on

the same line just because now that's

a slight white lie there are ways around

this but the best way around it is

to use this so-called escape

sequence to escape something means


generally to put a backslash and then a

special symbol like n for new line and

this is just the agreed upon way that

humans decades ago decided okay you

don't just hit your Enter key you

instead put backslash n and that tells

the computer to move the cursor to the

new line

so again kind of cryptic but once you

know it like that's it it's just another

word in our vocabulary so now let me

transition to making my program a little

more interactive instead of just saying

hello world let me change it like last

week to say hello David or whoever's

interacting with the program so I'm

going to do string answer gets get

string quote unquote what's your name

I'm not going to bother with a new line

here I could this is now just a judgment

call I deliberately want the human to

type their name on the same line just

because

and how do I now print this well last

week recall we used say and then we used

the other block called join so the idea

here is the same but the syntax this

week is going to be a little different

it's going to be printf which prints


something on the screen

I'm going to go ahead and say hello

comma and let me just go with this

initially with the backslash n semicolon

let me go ahead and re-compile my code

whoops

huh damn doesn't work still and look at

all these errors like there's more

errors than code I wrote

but what's going on here well this is

actually a mistake you'll see

somewhat often at least initially and

let's start to glean what's going on

here so here if I look at the very first

line of output after the dollar sign so

even though it jumped down the screen

pretty fast I wrote make hello at the

dollar sign prompt and then here's the

first error on hello.c line five

technically character five but generally

line is enough to get you going there's

an error use of undeclared identifier

string did you mean stdin

so I didn't and this is not an obvious

solution at first but you'll start to

recognize these patterns and error

messages it turns out that if I want to

use string I actually have to do this I

have to include another Library up here

another line of code rather called


cs50.h we'll come back to what this

means in just a moment but if I now

retroactively say all right what does

standard i o do for us up here before I

added that new line

what is standard IO doing well if you

think back to scratch there were a few

examples with like the camera and with

the text to voice

remember I had to poke around in like

the extensions button and then I had to

load it into scratch it didn't come

natively with scratch C is quite like

that some functions come with the

language but for the most part if you

want to use a function an action or verb

like printf you have to load that

extension so to speak that more

traditionally is called a library so

there is a standard i o Library stdio

standard IO where I O just means input

and output which means just like in

MIT's world there was an extension for

doing text to voice or for using your

camera in C there's an extension AKA a

library for doing standard input and

output and so if you want to use any

functions related to standard input and

output like text from a

keyboard

you have to include stdio.h and

then you can use printf

same goes here get string it turns out

is a function that cs50 wrote some time

ago and as we'll see over the coming

weeks it just makes it way easier to get

input from a user C is very good with

printf at printing output on the screen

C makes it really annoying and hard as

we'll see in a few weeks to just get

input from the user so we wrote a

function called get string but the only

way you can use that is to load the

extension AKA load the library called

cs50 and we'll come back in time like

why is it dot h why is it a hash symbol

but for now

standard i o is a library that gives you

access to printf and input and output

related stuff cs50 is the second library

that provides you with access to

functions that don't come with C that

include something like get string so

with that said

we've now kind of teased apart at a high

level what lines two and now one are

doing let me go ahead and rerun make

hello now it worked so all those crazy

error messages were resolved by just one


fix so key takeaway is not to get

overwhelmed by the sheer number of

Errors let me now do dot slash hello

and if I type in my name what am I going

to say

what do you think

yeah hello answer because the computer

is going to take me literally and it

turns out that if you just write hello

comma answer all in the double quotes

you're really just practicing English as

the input to the printf function you're

not actually passing in the variable and

unfortunately in C it's not quite as

easy to plug things in to other things

that you've typed remember in scratch

there was not just the say block but

the join block which was kind of pretty

in that you can combine apples

and oranges or rather apple and banana

then we changed it to hello and then the

answer that the human typed in in C

the syntax is going to be a little

different you tell the computer inside

of your double quotes that you want to

have a placeholder there a so-called

format code percent s means hey computer

put a string here eventually

then outside of your quotes you just add


a comma and then you type in whatever

variable you want the computer to plug

in at that percent s location for you so

percent s is a format code which serves

as a placeholder and now the printf

function was designed by humans years

ago to figure out how to do the apple

and banana thing of joining two words

together it's not nearly as user

friendly as it is in scratch but it's a

very common Paradigm so let me try and

rerun this now make hello

no errors that's good dot slash hello

what's my name David if I type enter now

now it's hello David and

here's the f in printf it formats its

input for you by using these

placeholders for things like strings

represented Again by

percent s

so a quick question then if I focus here

on line seven for just a moment and even

zoom in here

how many inputs is printf taking as a

function a moment ago I'd admit that it

was taking one input hello world quote

unquote how many inputs might you infer

printf is taking now

two and it's implied by this comma here

which is separating the first one quote


unquote hello percent s from the second

one answer and then just as a quick

safety check here why is it not three

because there's obviously two commas

here

why is it not actually three arguments

or inputs

exactly the comma to the left is

actually part of my English grammar

that's all so same syntax and again

here's where again programming can just

be confusing early on because we're

using the same special punctuation to

mean different things it just depends on

the context and so now is actually a

good time to point out all of the

somewhat pretty colors that have been

popping up on the screen here even

though I wasn't going to like a format

menu I wasn't bold facing things I

certainly wasn't changing things to red

or blue or whatnot that's because a text

editor like VS Code

syntax highlights for you this is a

feature of so many different programming

environments nowadays VS Code does it as

well if your text editor understands the

language that you're programming in C

in this case it highlights in different


colors the different types of ideas in

your code so for instance string and

answer here are in black but get string

a function is in this sort of nasty

brown yellow here right now but that's

just how it displays on the screen the

string though here in red is kind of

jumping out at me and that's marginally

useful the percent s is in blue that's

kind of nice because it's jumping out at

me and so it's just using different

colors to make different things on the

screen pop so you can focus on how these

ideas interrelate and honestly when you

might make a mistake for instance let me

accidentally leave off this quote here

and now all of a sudden everything

notice if I delete the quote the colors

start to get a little awry

but if I go back there and put it back

now everything's back in place what's

another feature of this text editor

notice when my cursor is next to this

parenthesis which demarcates the end of

the inputs to the function notice that

highlighted in green here is the opening

parenthesis why it's just a visually

useful thing especially when you start

writing more and more code just to make

sure your parentheses are lining up and


that's true for these curly braces over

here on the left and the right we'll

come back to those in a moment if I put

my cursor there you can see that these

things correspond to one another so it's

nothing in your code fundamentally it's

just the editor trying to help you the

human programmer and you can even see it

though it's a little subtle see these

four dots here and these four dots here

that's my indentation I've configured VS

Code to indent by four spaces which is a

very common convention anytime I hit the

Tab Key this too can help you make sure

once we have more interesting and longer

programs

that everything lines up nice and neatly

all right any questions then on printf

or more yeah

short answer yes printf can handle more

than one type of variable or value

percent s is one we're going to see

percent i as another for plugging in an

integer you can have multiple i's

multiple s's and even other symbols too

we'll come back to that in just a little

bit printf can take many more arguments

than just these two this is just meant

to be representative
yeah over here

can you declare variables within the

printf no the only

variable I'm using right now is answer

and it's got to be done outside the

context of printf in this case good

question we'll see more of that before

long

uh yeah in back

how do we download the cs50 library so

uh we will show you in problem set one

exactly how to do that it's

automatically done for you in our

version of vs code in the cloud if

ultimately you program on your own Mac

or PC either initially or later on it's

also installable online but if you want

to ask that

online or afterward we can

point you in the right direction but

pset one will itself yeah

string is the type of the variable or

more properly the data type of the

variable int is another keyword I

alluded to earlier I haven't used it yet

int for integer is going to be another

type or data type of variable

yeah

ah good question could I go ahead and

just plug in this function kind of like


we did in scratch getting rid of the

variable altogether and just do this

which recall is reminiscent of what I

did in Scratch by plopping block on top

of block on block am I answering that

right

can I put string in front of get string

no you only put the word string in front

of a variable that you want to make

string and even though I'm apparently

answering the wrong question let me go

ahead and zoom out save this do make

hello again seems to compile okay if I

run dot slash hello type in David voila

that too works and so actually let's go

down this rabbit hole for just a moment

clearly it's still correct at least

based on my limited testing

is this better designed or worse designed

let's open that question like we did

last week yeah

yeah I kind of agree with that

reasonable people could disagree but I

do agree that this seems harder to read

because I start reading

here but wait a minute get string is

going to get used first and then it's

going to give me back a value so yeah it

just feels like it was nicer to read top


to bottom I would say your thoughts

yeah and so this is useful if I only

want to print out the person's name once

if I want to use it later in a longer

program I'm out of luck and so I haven't

saved it in a variable so I think long

story short we could sort of debate this

all day long but in this case yeah like

if you can make a reasonable argument

one way or the other like that's a

pretty solid ground to stand on but

invariably reasonable people are going

to disagree whether first time

programmers or many years after that so

let's frame this one last example in the

context of the same process of taking

inputs and outputs the functions we've

been talking about all take inputs

otherwise now known as arguments or

parameters pretty much synonymous that's

just the fancy word for an input to a

function and some functions have either

side effects like we saw printing

something saying something on the screen

sort of visually or audibly or they

return a value which is a reusable value

like name or answer in this case if we

look then at what we did last time in

the world of scratch last week the input

was what's your name the function was


ask and the return value was answer

and now let's take a look at this block

which is honestly a more user-friendly

version of what we just did with the

percent s last week we said say then

join then hello and answer but the

interesting takeaway there was not how

to say hello anything it was the fact

that in Scratch too the output of one

function like the green join could

become the input to another function the

purple say the syntax in C is admittedly

pretty different but the idea is

essentially the same here though we have

hello a placeholder but we have to in

this world of C tell printf what we want

to plug in for that placeholder it's

just different but that's the way to do

it when we get to Python and other

languages later in the term there's

actually easier ways to do this but this

is a very common Paradigm particularly

when you want to format your data in

some way all right let's then take a

step back to where we began which was

with that whole program which had the

include and it had int main void and all

of this other cryptic syntax

this scratch piece last week was kind of


like the go-to whenever you want to have

a main part of your program it's not the

only way to start a scratch program you

could listen for clicks or other things

not just the green flag but this was

probably the most popular place to start

a program in scratch

in C the closest analog is to literally

write this out so just like last week if

you were in the habit of dragging and

dropping when green flag clicked as a c

programmer the first thing you would do

is after creating an empty file like I

did with hello.c you'd probably type int

main void open curly brace curly brace

and then you can put all of your code

inside of those curly braces so just

like scratch had this sort of magnetic

nature to it where the puzzle pieces

would snap together C as a text-based

language tends to use these curly braces

one of them opened the other one closed

and anything inside of those braces so

to speak is part of this puzzle piece

AKA Main

so what was atop them we went down this

Rabbit Hole a moment ago with these

things called header files even though I

didn't call them by this name but indeed

when we have a whole program in scratch


super easy just have the one green flag

clicked and then say hello world there's

no special syntax after all it's meant

to be very user friendly and graphical

in C though you technically can't just

put int main void printf hello world you

also need this because again you need to

tell the compiler to load the library

code that someone else wrote so

that the compiler knows what printf even

is you have to load the cs50 library

whenever you want to use get string or

other functions like get int as we'll

soon see otherwise the compiler won't

know what get string is you just have to

do it this way the specific file name

I'm mentioning here

stdio.h or cs50.h is what C

programmers call a header file

we'll see eventually what's inside of

those files but long story short it's

like a menu of all of the available

functions so in cs50.h there's a menu

mentioning get string get int and a

bunch of other stuff and in

stdio.h there's a menu of functions

among which are printf and that menu is

what prepares the compiler to know how

to implement those same functions


all right let me pause here question

not quite a library provides all of

the functionality we're talking about a

header file is the very specific

mechanism via which you include it and

we'll discuss this more next week for

now they're essentially the same but

we'll discuss nuances between the two

next week

yeah the library would be standard i o

the library would be cs50 the

corresponding header file is

stdio.h or cs50.h

indeed other questions yeah

indeed that too is on the menu we'll

come back to that but the word string is

incredibly common in the world of

programming it's not a cs50 idea but in

C there's technically no such data type

as string by default we have sort of

conjured it up to simplify the first few

weeks that's a training wheel that we'll

very deliberately in a few weeks take

away and we'll see why we've even been

using get string and string because C

otherwise makes things quite a bit

more challenging early on which then

gets beside the point for us yeah

yes early on you will have to use

whatever's prescribed by the


specification that will include cs50s

functions long story short you referred

I think a moment ago to another function

called scanf which we won't talk about for a

few weeks long story short in C it's

pretty easy and possible to get input

from a user the catch is that it's

really easy to do it dangerously and C

because it's an older lower level

language so to speak that gives you

pretty much ultimate control over

your computer's Hardware it's very easy

to make mistakes and indeed

that's why we use the library so

your code won't crash unintentionally

all right so with this in mind we have

this now mapping between the scratch

version and the other let me just give

you a quick tour of some of the other

placeholders and data types that we'll

soon start seeing as we assemble more

interesting programs in the world of

Linux here is a non-exhaustive list of

commands with which you'll get familiar

over the next few weeks by playing with

problem sets we've only seen two of

these so far ls for list rm for remove

but I mentioned them now just so that it

doesn't feel too foreign when you see


them on screen or online in a problem

set cp is going to stand for copy

mkdir is going to stand for make directory

mv is going to stand for move or rename

rmdir is going to be remove directory

and cd is going to be for change

directory and let me show you this last

one here first only because it's

something you'll use so commonly if I go

back to my code here on the screen

I'm going to go ahead and reopen the

little GUI on the left hand side the

so-called Explorer revealing that I've

got two files hello and hello.c so

nothing has changed since there suppose

now that you know it's a few weeks into

class and I want to start organizing the

code I'm writing so that I have a folder

for this week or next week or maybe a

folder for problem set one problem set

two I can do this in a few ways in the

GUI I can go up here and do what most of

you would do instinctively on a Mac or

PC you look for like a folder icon you

click it and then you name a folder like

pset1 enter voila you've got a folder

called pset1 I can confirm as much with

my command line interface by typing what

command

how can I list what's in my folder yeah


so ls for list and now I see hello and

it's green with an asterisk because

that's my executable my runnable program

hello.c which is my source code and now

pset1 with a slash at the end which

just implies that it's indeed a folder

all right I didn't really want to do it

that way I'd like to do it in a more

advanced way so let me go ahead and right

click on pset1 delete permanently I

get a scary irreversible error message

but there's nothing in it so that's fine

now I've deleted it using the GUI but

now let me go ahead and start

doing the same thing from the command

line and if you're wondering how things

keep disappearing if you hit Ctrl l in

your terminal window or explicitly type

clear it will delete everything you

previously typed just to kind of clean

things up in practice you don't need to

be doing this often I'm doing it just to

keep our focus on my latest commands if

I do what was the command to make a new

directory

yeah so mkdir make directory let me

create pset1 enter and notice at left

there's my pset1 if I want to get

a little overzealous and plan for next


week here's my pset 2 directory suppose

now I want to open one of those folders on a

Mac or PC or in this GUI I could double

click on it like this and you'd see this

little arrow is moving it's not doing

anything because there's nothing in

there but that's fine but suppose again

I want to get more comfortable with my

command line notice if I type LS now I

see all four same things let me change

directories with CD

with CD Space pset 1 enter and now

notice two things will have happened one

my prompt has changed slightly to remind

me where I am just to keep me sane so

that I don't forget what folder I'm

actually in so here is just a visual

reminder of what folder I'm currently in

if I type LS now what should I see after

hitting enter

nothing because I've only created empty

folders so far and indeed I see nothing

if I wanted to create a folder called

Mario for a program that might be called

Mario this week I can do that now if I

type LS there's Mario now if I do CD

Mario notice my prompt is going to

change to be a little more precise now

I'm in pset1 slash mario and

notice what's happening at top left


nothing now because these folders are

collapsed but if I click the Little

Triangle there I see Mario nothing's

going on in there because there's no

files yet but suppose now I want to

create a file called mario.c

I could go up here I could click the

little plus icon and use the GUI or I

can just type code mario.c voila that

creates a new tab for me I'm not going

to write any code in here yet but I am

going to save the file and now at top

left you'll see that mario.c appears so

at some point you can eventually just

close the Explorer because again it's

not providing you with any new

information it's maybe more user

friendly but there's nothing you can't

do at the command line that you could do

with the GUI

all right but now I'm kind of stuck how

do I get out of this folder in my Mac or

PC World I'd probably click the back

button or something like that or just

close it and start all over in the

terminal window I can do CD dot dot dot

dot is a nickname if you will for The

Parent Directory that is the previous

directory so if I hit enter now notice


I'm going to close the Mario folder AKA

directory and now I'm back in pset one

or if I want to be fancy let me go back

into Mario temporarily if I type LS

there's mario.c just to orient us if I

want to do multiple things at a time I

could do CD dot dot slash dot

dot which goes to my parent to my

grandparent all in one breath and voila

now I'm back in my default folder if you

will and one last little trick of the

trade if I'm in pset1 slash mario

like I was a moment ago and you're just

tired of all the navigation if you just

type CD and hit enter it'll whisk you

away back to your default folder and you

don't have to worry about getting there

manually recall a bit ago though that I

was running hello as this dot slash

hello

if dot dot refers to my parent perhaps

infer here syntactically what does a

single dot mean instead

it means this directory your current

directory why is that necessary it just

makes super explicit to the computer

that I want the program called hello

that's installed here not in some random

other folder on my hard drive so to

speak I want the one that's right here


instead all right so besides these

commands there's going to be others that

we encounter over time those are kind of

the basics that allow you to sort of

wean yourself off of a GUI graphical

user interface and start using more

comfortably with practice and time a

command line interface instead well what

about those other types now back in the

world of c those commands were not c

those are just commands specific to a

command line interface like in Linux

which again we're using in the cloud

it's an alternative to Mac OS and

windows back in the world of C now we've

seen strings which are words I mentioned

int or integer but there's others as

well in the world of C we've seen string

we will see int if you want a bigger

integer there's something literally

called a long if you want a single

Character there's something called a

Char if you want a Boolean value true or

false there is a bool and if you want a

floating point value a fancy way of

saying a real number something with a

decimal point in it that is what C and

other languages call a float and if you

want even more numbers after the decimal


point that is more Precision you can use

something called a double

that is to say here is again an example

in programming where it's up to you now

to provide the computer with hints

essentially that it will rely on to know

that what is this pattern of zeros and

ones is it a number a letter is it a a

sound an image a color or the like these

are the types of data types that provide

exactly those hints

what are the functions that come in the

menu that is the cs50 library we talked

about standard i o and that's just one

function so far printf in the cs50

library you can see that it follows a

pattern the cs50 library exists largely

for the first few weeks of the class to

make our lives easier when

you just want to get user input so if

you want to get a string like word or

words from the human you use get_string

if you want to get an integer from the

user you're going to use get_int when

you want to get any of those other data

types for the most part you use get

underscore something else and they're

indeed all lower case by convention what

about printf if we have the ability now

to store different types of data and we


have functions with which to get

different types of data how might you go

about printing different types of data

well we've seen percent s for string

percent I for integer percent C for Char

percent f for a float or a double those

real numbers I described earlier and

then percent Li for a long integer so

here's the first example of like

inconsistencies in an ideal world it

would just be percent L and we'd move on

it's percent Li instead in this case

that's printf and some of its format

codes what more might we do well in C as

we'll see no pun intended there's a

whole bunch of operators and indeed

computers one of the first things they

did was a lot of math and calculations

so there's a lot of operators like these

computers and in turn C are really good at

addition subtraction multiplication

division and even the percent sign which

is the remainder operator there's a

special symbol in C and other languages

just for getting the remainder when you

divide one number by another

there are other features in the world of

C like variables as we've seen and

there's also what is sort of playfully


called syntactic sugar that makes it

easier over time to write fewer

characters but express your thoughts the

same so just as a single example of this

consider this

use of a variable last week here in

scratch is how you might set a variable

called counter to zero in C it's going

to be similar if you want the variable

to be called counter you literally write

the word counter or whatever you want it

to be called you then use the assignment

operator AKA the equal sign and you

assign it whatever its initial value

should be here on the right so again the

zero is going to get copied from right

to left into the variable because of

that single equal sign but this isn't

sufficient in C

what else is missing on the right hand

side instinctively now even if you've

never programmed in this before yeah in

front

a semicolon at the end and one other

thing I think is probably missing

again

a data type so what would you if we can

uh keep going back and forth here what

data type seems appropriate intuitively

for a counter
int for integer so indeed we need to

tell the computer when creating a

variable what type of data we want and

we need to finish our thought with the

semicolon so there might be a

counterpart there what about in scratch if

we wanted to increment that counter

variable we had this very user friendly

puzzle piece last time that was change

counter by one or add one to counter in

C here's where things get a little more

interesting

and pretty commonly done you might do

this counter equals counter plus one

with a semicolon and this is where again

it's important to note the equal sign

it's not equality otherwise this makes

no sense counter cannot equal counter

plus one right like that just doesn't

work if we're talking about integers

here that's because the equal sign is

assignment so it can certainly be the

case that you calculate counter plus one

whatever that is then you update the

value of counter from right to left to

be that new value this as we'll see is a

very common thing to do in programming

just to kind of count upward for

whatever reason you can write this more


succinctly this code here is what we'll

call syntactic Sugar sort of a fancy way

of saying the same thing with fewer

words or fewer characters on the screen

this also adds one or whatever number

you type over here to the variable on

the left and there's one other form of

syntactic sugar we're going to start

seeing too and it's even more terse than

this that too will increment counter by

one by literally changing its value by

one or if you change it to minus minus

subtracting one from it you can't do

that with two and three and four but you

can do it by default with just plus plus

or minus minus adding or subtracting one

ah so when you are changing a variable

that already has been created as we did

with the code that looks like this you

no longer need to remind the computer

what the data type is thankfully the

computer's uh at least as smart as that

it will remember the type of the data

that you intended

other questions or comments on this

all right that's quite a lot why don't

we go ahead and here take a 10 minute

break and we'll be back we'll start

writing some code

all right
so we are back we've just looked at some

of the basics of compiling even if it

doesn't quite feel that basic but now

let's actually start focusing really on

writing more and more code more and more

interesting code kind of like we dove

into scratch uh last week so here I have

vs code open I've closed the GUI I'm

going to focus more on my terminal

window and my code editor there are many

different ways I can create new files

but I want to create something called a

calculator so again within this

environment of vs code I can literally

write the code command which is vs code

specific and it just creates a new file

for me automatically or I could do that

in the GUI I'm going to go ahead and

create this file called calculator.c and

I'm going to go ahead and include some

familiar things so I'm just going to go

ahead and proactively include cs50.h

stdio.h I'm going to go ahead from

memory and do the int main void more on

that next week why it's int why it's void

and so forth and now let me just

Implement a very simple calculator we

saw like some mathematical operators

like Plus and the like so let's actually


use this so let me go ahead and first

give myself a variable called X sort of

like grade school math or algebra let me

go ahead then and use get_int which is

new but I mentioned this exists and then

let me just ask the user for whatever

their x value is the thing in the quotes

is just the English or the string that

I'm printing on the screen so I could

say anything I want I'm just going to

say x colon to prompt the user

accordingly now I'm going to go ahead

and get another variable called y I'm

going to get int again and now I'm going

to prompt the user for y and I'm just

very nitpickily using a space just to

like move the cursor so it doesn't look

too messy on the screen and then lastly

let me go ahead and just print out the

sum of X and Y in an Ideal World I would

just say something like printf x plus y

but that is not valid in C the first

argument recall in printf has to be a

string in double quotes So if I want to

print out the value of an integer I need

to put something in quotes here maybe

followed by a new line if I want to

move the cursor as well so again we only

glimpsed it briefly but what do I

replace these question marks with if I


want a placeholder for an integer

yeah so percent I right just like

percent s with string percent I is

integer so I change this to percent i and

now if I want to add X and Y for

instance super simple calculator doesn't

do much of anything other than addition

of two integers I think this works and

again it looks definitely cryptic at

first glance like it would be nice if

programming weren't this cryptic other

languages will clean this up for us but

again if you focus on the basics printf

takes one input first which is a format

string with English or whatever language

some placeholders maybe then it takes

potentially more arguments after the

comma like the value of x plus y all

right let me go ahead now and make

calculator which again compiles my

source code in C pictured above and

converts it into corresponding machine

code or zeros and ones no error messages

so that's that's already good now I do

dot slash calculator let's do one plus

one and enter voila now I have the

makings of a calculator

now let's start to Tinker with this a

little bit what if I instead had done


this int Z gets X Plus Y and then plug

in Z here

if I rerun make calculator enter rerun

dot slash calculator type in one plus

one still equals two and let me claim

that it will work for other values as

well

which of these versions is better

designed if both seem to be correct at

very cursory glance is this version

better or is the previous one without

the Z

okay so this one's arguably better

because I've now got a reusable variable

called Z that I can not only print but

Heck if my program is longer I can use

it elsewhere counter thoughts

yeah debatable like before because it

depends on my intent and honestly I

think a pretty good argument could be

made for the first version because if I

have no intention of as you note using

that letter uh that variable again you

know what maybe I might as well do this

just because it's one less thing to

think about it's one less distraction

it's one less line of code to have to

understand it's just a little tighter so

here again I think it does depend on

your intention but this feels pretty


reasonable and I think as someone noted

earlier when I did the same thing with

get_string that yeah maybe kind of crossed

the line because get string and the

what's your name inside of it was just

so much longer but X Plus Y and it's not

that hard to wrap our mind around what's

going on inside of the printf argument

so again these are the kinds of thoughts

that hopefully you'll acquire the

Instinct for not necessarily reaching

the same answer as someone else but

again the thought process is what

matters here all right so how might I

enhance this program a little bit let's

just talk about style for just a moment

so X and Y at least in this case are

pretty reasonable variable names why

because like those are the go-to variable

names in math when you're adding two

things together so X and Y seem pretty

reasonable I could have done something

like well maybe my first variable should

be called first number and my next

variable should be called second number

and then down here I would have to

change this to first number plus second

number like this isn't really adding

anything semantically to help my


comprehension but that would be one

other direction we could have taken

things so if you have very simple ideas

that are conventionally expressed with

common variable names like X and Y

totally fine here what if I want to

annotate this program and remind myself

what it is it does well I can add in C

what are called comments with a slash

slash two forward slashes you can write a

note to yourself like prompt user for x

and then down here I could do something

like prompt user for y just to remind

myself what I'm doing there and down

here perform addition now in this case

not sure these comments are really

adding all that much because in the time

it took me to write and eventually read

these comments I could have just read

the three lines of code but as our

programs get more sophisticated and you

start to learn more syntax that honestly

you might forget the next day the next

week the next month might be useful to

have these sort of notes to self that

remind you of what your code is doing or

maybe even how it is doing that thing

with these early programs not really

necessary doesn't really add all that

much to our comprehension but it is a


mechanism you have in place that can

help you actually remind yourself or

remind someone else what it is that's

going on well let me go ahead and rerun

this again in this current version make

calculator and here too you might think

I'm typing Crazy Fast not really I'm

hitting tab a lot so it turns out that

Linux the operating system we're using

here in the cloud but actually Windows

and Mac OS nowadays support this too

supports autocomplete so if you only

have one program that starts with c-a-l

you don't have to finish writing

calculator you can just hit Tab and the

computer will finish your thought for

you the other thing you can do is if you

hit up and keep going up you'll scroll

through your entire history of commands

so there too I've been saving some

keystrokes by hitting up quickly rather

than retyping the same darn thing again

and again so again just another little

convenience to make programming and

interacting with a command line

interface even faster all right let me

go ahead and just make sure it's

compiled in the current form the

comments have no functional impact these


green things are just notes to self let

me run calculator with maybe how about

this instead of one plus one how about

1 billion

uh oops

one million one billion and another one

billion

and that answer is 2 billion all right

so that seems correct let's run this

program one more time how about two

billion plus another 2 billion

did you know that

so apparently it's not so correct and

clearly running one plus one was not the

most robust testing of my code here what

might have gone wrong

what might have gone wrong yeah

yeah the computer probably ran out of

space with bits so it turns out with

these data types we've been talking

about string and int and also float and

Char and those other things they all use

a specific and most importantly finite

number of bits to represent them it can

vary by computer newer computers use

more bits older computers tended to use

fewer bits it's not necessarily

standardized for all of these data types

but in this case in this environment it

is using 32 bits for an integer that's a


lot with 32 bits You Can Count pretty

high this is 64 light bulbs on the stage

and could count even higher and int is

only using half of these or we have two

integers here on the stage now if you

think back to last week we talked about

eight bits at one point and if you have

eight bits eight zeros and ones you can

count as high as

256 just a good number to generally

remember as trivia eight bits gives you

256 permutations of zeros and ones 32

gives you roughly how many if anyone

knows

2 to the 32 power

so it's roughly 4 billion 2 to the 32.

if you don't know that it's it's fine

most uh programmers though eventually

remember these kinds of heuristics so

it's roughly 4 billion so that feels

like enough 2 billion plus two billion

is exactly 4 billion and that actually

should fit in a 32-bit integer the catch

is that my Mac your PC and the like also

like to support negative numbers and if

you want to support both positive and

negative numbers that technically means

with 32-bit integers you can count as

high as roughly 2 billion positive or 2


billion negative in the other direction

that's still 4 billion give or take but

it's only half as many in One Direction

or the other so how could I go about

implementing a correct calculator here

what might the solution be

yeah so not just Li which was for long

integer I have to make one more change

which is to the data type itself so let

me go back up here and change X from an

INT to a long AKA long integer and then

let me change y as well and then let me

change the format code per the little

cheat sheet we had up a few minutes ago

to Li let me recompile the calculator

seems to work okay let's rerun it now

let's do one plus one that should

obviously be the same now let's do two

billion and another 2 billion and cross

our fingers this time now we're counting

as high as 4 billion and we can go way

higher than 4 billion but we're only

kicking the can down the street a bit

even though we're now using with a long

64 bits which is as long as this stage

now that's still a finite value it might

be a really big value but it's still

finite and we'll come back at the end of

today to these kinds of fundamental

limitations because arguably now my


calculator is correct for like millions

billions of possible inputs but not all

and that's problematic if you actually

want to use my calculator for any

possible

inputs not just ones that are roughly

less than say 2 billion as in this case

are any questions then on that but it's

really just a precursor for all the

problems that we're going to have to

eventually deal with later on

a good question yes if we were still

using Z we would also have to change it

to a long otherwise we'd be ignoring 32

of the bits that had been added together

via the Longs good question

all right so how about we spice things

up with maybe not just addition here how

about

um

something with some conditions let's

start to ask some actual questions so a

moment ago recall that we had

just the declaration

of variables now let's look back at

something in scratch that looked a

little something like this a bunch of

puzzle pieces asking questions by way of

these conditionals and then these


Boolean Expressions here in green maybe

saying something like X is less than y

in C this actually Maps pretty cleanly

it's much cleaner from left to right

than it was with printf and join here we

have just code that looks like this if a

space two parentheses and then X less

than y and then we have something like

printf there in the middle so here it's

actually kind of a nice mapping notice

that just as the yellow puzzle piece in

scratch is kind of hugging the purple

puzzle piece that's effectively the role

that these curly braces are playing

they're sort of encapsulating all of the

code on the inside

the parentheses represent the Boolean

expression that needs to be asked and

answered to decide whether or not to do

this thing and here's an exception to

what I alluded to earlier usually when

you see a word and then a parenthesis

something and then close parenthesis I

claim that's usually a function and I

still feel pretty good about that claim

but there are exceptions and the word if

is not a function it's just a

programming construct it's a feature of

the C language that similarly uses

parentheses just for different purposes


for a Boolean expression how about

something like this last week if you

wanted to have a two-way fork in the

road go this way or that way you can

have if and else in C that would look a

little something like this and if we add

in the printfs it now looks quite like

the same but it adds of course the word

else and then a couple of more curly

braces as an aside in C it's not

strictly necessary to have curly braces

if you have only one line of code

indented underneath for best practice

though do so anyway because it makes

super clear to you and ultimately anyone

else reading your code that you intend

for just that one or more line of code

to execute how about this from last week

here was a three-way fork in the road if

x is less than y else if x is greater

than y else

if x equals y now here's where you have

some disparities between scratch and C

scratch uses an equal sign for equality

to compare two values C uses a single

equal sign for assignment from right to

left so C uses two equal signs for equality a minor difference between the two

worlds in C we could implement the same

code like this the addition being just


this additional else if and if we add in

the printfs it looks a little something

now like this

this is correct both in the scratch

world and in the C world but could

someone make a claim that this is not

again well designed exactly we don't

need the last if we need the else at

least but we don't need the last if

because at least in the world of

comparing integers it's either going to

be less than greater than or equal to

there is no other case so you can save

you know a few seconds if you will if

your program running a blink of the eye

by only asking two questions and then

inferring what the answer to the third

must be just by nature of your own human

logic here now why is that a good thing

if for instance X and Y happen to equal

each other I type in 1 and 1 for both

values either in scratch or in the C

world in the case of this version you're

sort of stupidly asking three questions

all of which are going to get asked even

though the answer is no no yes that is

true false false true that seems to be

unnecessary because if we instead

optimize this code get rid of the

unnecessary if and just do as you


propose logically else print that X is

equal to y now if x indeed equals y

because they're both 1 or some other

value now you're only going to ask two

questions so two-thirds as many

questions and then you're going to get

your same correct result so again A

Minor Detail but again the kinds of

things you should be thinking about not

only as you write your code to be

correct but also write it to be well

designed as well all right so why don't

we go ahead and translate this into the

context of an actual program here I'll

create a blank window here and let's do

something with like points like points

on my own very first cs50 problem set

let me go ahead and run code of points.c

that's just going to give me a new text

file and then up here I'm going to do my

usual include cs50.h include

stdio.h int main void so a lot of

boilerplate so to speak in these early

programs and now let me just let's see

let's ask the user how many points did

they lose on their most recent cs50 pset

to sort of evoke my photograph of

my own very first pset last week where

I lost a couple of points myself so int


points equals get int then I'll ask a

question in English like how many points

did you lose question mark space

and then once I have this answer let's

now ask some questions of it so if

points is less than 2 borrowing the

syntax that we saw on the screen

a moment ago let's go ahead and print

out something explanatory like you lost

fewer points than me backslash n all

right else if points greater than 2

which is again how many I lost I'm going

to go ahead and print out you lost more

points than me backslash n

else if wait a minute else seems to be

sufficient logically here I'm just going

to go ahead and print out something like

uh you lost the same number of points as

me backslash n so really just a

straightforward application of

that simple idea but to like a concrete

scenario here so let me go ahead and

save this let me go ahead and run make

points enter no errors that's good run

points and then how many points did you

lose how about it's one point

all right you lost fewer points than me

how about zero points even better how

about three points and so forth so again

we have the ability to express in C now


a pretty basic idea from last week in

reality which is this notion of

conditionals and asking questions

there's something subtle here though

that's maybe not super well designed

that someone might call a magic number

this is sort of programming speak for

something I've done here there's a bit

of redundancy unrelated to the if and

the else if and the else

but is there something I typed twice

just to ask perhaps for the obvious

exactly I've hard-coded so to speak

manually typed out the number two in two

locations in this case that did not come

from the user so this is apparently once

I compile this this is it you're always

comparing yourself to me in like 1996

which For Better or For Worse is all the

program can do but this is an example

too of a magic number in the sense like

wait where did that two come from and

why is it in two places it feels like we

are setting the stage for just a higher

probability of screwing up down the road

because the longer this code gets

suppose I'm comparing against two points

elsewhere two three four five places am

I going to keep typing the number two


like yeah that's fine it's correct it's

going to work but honestly eventually

you're going to screw up and you're

going to miss one of the twos you're

going to change it to a three because

maybe I did worse the next year or one I

did better and you don't want these

numbers to get out of sync so what would

be like a logical Improvement to this

design rather than hard coding the same

number sort of magically in two or more

places

yeah why don't I make a variable that I

can use in there so for instance I could

create a variable like this another

integer called mine and I'm just going

to initialize it to 2 and then I'm going

to change mentions of 2 to this and mine

is a pretty reasonable name for a

variable insofar as it refers to

exactly whose points are in question

there's a risk here though minor though

it is I could accidentally change mine

at some point maybe I forget what mine

represents and I do some addition or

subtraction so there's a way to tell the

computer don't trust me because I'm

going to screw up eventually by making a

variable constant too so a constant in a

programming language this did not exist


in scratch is just an additional hint to

the computer that essentially enables

you to program more defensively if you

don't trust yourself necessarily to not

screw up later or honestly in practice

if you know that number should never

change make it constant and never think

about it again this tells the compiler

to make sure that even you later in your

code cannot change the number two and

another convention in C and other

languages when you have a constant it's

often common to just capitalize the

variable kind of like you're yelling but

it really just visually makes it stand

out so it's kind of like a nice rule of

thumb that helps you realize oh that

must be a constant capitalization alone

does not make it constant the word const

does but the capitalization is just a

visual reminder that this is somewhere

somehow a constant

so just a minor refinement but again

we're sort of getting better at

programming just by sort of instilling

these kinds of heuristics questions then

on conditionals

in C or these constants yeah

yeah why do you not use a semicolon and


like not lines nine thirteen no good

like just because like this is the way

the language was designed and it's

confusing early on generally speaking

when you're using conditionals and

eventually we'll see Loops there's no

semicolons involved for now assume that

semicolons usually finish your thought

after a function that's not a hundred

percent reliable of a heuristic but

it'll get you most of the way there

and just because left hand was not

talking to right hand when some of these

languages were designed

all right so let's do something else how

about this if I have the ability to ask

something conditionally is this thing

true or is this other thing could I

write a very simple program that does

something basic like tells me if a

number the human types is even or odd

well let me just get the framework for

that in place let me go ahead and write

code parity.c parity is a fancy way of

saying even or odd and let me go ahead

and include cs50.h

include stdio.h int main void again

more on those down the road but for now

I'm going to go ahead and get a number n

from the user by calling get int and


asking them for whatever n is and then

now I'm going to introduce some pseudo

code so here's the first example of a

program honestly that I'm not really

sure how to proceed so let me just

resort to some pseudo code using

comments eventually I'll get rid of this

and write actual code but if n is even

then

print actually let me just print that

let me just go ahead and say printf

quote unquote even because I know how to

use printf else

all right I know how to printf odd so

let me just say printf quote unquote odd

so here I've sort of taken a bite out of

the problem if you will and let me go

ahead and put in my little placeholders

I want to do some kind of condition so

if question marks now let me go ahead

and fill in the blanks here else I'll

put this here so I think I'm okay now

I'm getting closer to solving this but I

still have this question mark here

how using syntax we've seen might I

determine if n is even or odd what do

you think

nice there's this little operator I

mentioned by name earlier the remainder


operator that will let you do exactly

that if you divide any number by two

that mathematical heuristic is going to

tell you if it's even or odd based on

whether there's a remainder of 0 or 1

and that's nice because the alternative

would seem to be doing something stupid

like if n equals equals zero or n equals

equals 2 or n equals equals 4 and

right your code would be

infinitely long if you had to ask all

possible questions but if I do n uh

divided by 2

and look at the remainder

it's a little cryptic but this will

indeed do the trick so the percent sign

is the remainder operator it does

numerator divided by denominator and

returns not the result of that but

rather the remainder of that so if you

divide anything by 2 it's going to be a

zero or one remainder and if indeed 2

divides into n evenly giving you zero

then you're going to print even else

it's got to be odd but there is

something odd pun intended in this

highlighted line

what is another new piece of syntax

apparently besides the percent sign

what's a little off there yeah


yeah so that's not a typo and I even

caught myself verbally saying it a

moment ago just because it's so

ingrained what must this mean here

yeah

yeah if something's equivalent to the

other so now this is the equality

operator it's not assignment from right

to left and this one too is an example

of like literally humans not really

planning ahead perhaps left hand not

talking to right hand and that someone

decided let's use the equal sign for

assignment and then some number of

minutes or days later people are like

damn how do we now compare for equality

well let's just use two and if you think

this is a little weird in some languages

like JavaScript there's a third version

where you use three equal signs so again

it's humans that design these languages

so if you're ever frustrated by them

confused by them admittedly it might

just not have been the best design but

we just kind of have to live with it

ever since so let me go ahead and zoom

out here let me go ahead and make parity

here so make parity and again parity is

just the name of my file parity.c dot


slash parity type in a number like two

that's indeed even four that's indeed

even three that's indeed odd and so

forth if we continue testing presumably

we'll get the same kinds of answers how

about something else let me go ahead now

and let me start copying and pasting

some of this code because admittedly

it's getting a little tedious to keep

typing out all of that boilerplate at

the top let me create a program called

agree dot C that's reminiscent of like

any of those forms you have to agree to

online with a check box or typing in yes

or no or the like so let me throw away

all the guts of this main program and

now ask something like this let me go

ahead and prompt user to agree to

something I'm going to go ahead and say

about get string do you agree whatever

the question might be and I want the

human to type y or n for yes

or no respectively so if it's only a

single character actually I can actually

get by with just get Char not used it

before but it was on our menu of

functions from the cs50 library and if I

want to get the user's response the

return value should be a Char also on

the left so now we've seen strings ints


and now chars if we only care about a

single letter and now let's go ahead

and check whether user agreed so how

about if C equals equals quote unquote y

then let me go ahead and inside of my

curly braces print out agreed or some

such sentence like that

else if they did not type c or you know

what let's be explicit here just so they

can't type Z or B or some random letter

else if C equals equals quote unquote n

for no then let me go ahead and print

out not agreed or something like that

and I'm just going to ignore the user if

they don't cooperate and they type Z or B

or something that's not y or n all right

let me go ahead now and compile this

code make agree

dot slash agree all right do I agree yes

let's go with the default okay so that

seems to work no I don't agree this time

that seems to work how about my caps

lock key is on or I'm just uh really

yelling capital Y

it ignores me capital n

it ignores me

so obviously a bug at least if I want to

tolerate uppercase and lowercase which

is kind of reasonable
so what would be the possible solutions

here do you think

how do I solve this and tolerate both up

capital and lowercase maybe what's the

simplest most naive implementation

yeah so why not just ask two questions

or you know what even more simplistic

based only on what we've seen before let

me if you will let me just kind of copy

and paste some of this code change this

to an else whoops not in caps else if

quote unquote capital Y and then I bet I

could do the same thing with n but here

too just like with scratch as soon as

you start to find yourself copy pasting

you're probably doing something wrong

and what you said verbally if I may was

actually better because you're implying

that I could just say something like or

C equals equals capital y or down here C

equals equals capital N the catch is you

can't use the word or in C it's actually

two vertical bars so you can express one

question or another you only need

one of the answers to be yes or true and

you use two vertical Bars by contrast

just so you've seen it if you wanted to

check if something is equal to something

and something else you could use two

ampersands this logically would make no


sense here though certainly what the

human types can't both be lowercase and

uppercase that just makes no sense so in

this case we do want or but that allows

me to tighten my code up I don't have to

start copying and pasting whole branches

I can now ask two questions at once

questions then on this variation

really good question can you convert the

input to all lowercase absolutely you

could we don't have the capability yet

it turns out that's going to require to

be easy another library or we could do

it ourselves knowing a little bit about

ASCII or Unicode from last week but yes

that would be an alternative but more on

that a different time

other questions

good question unfortunately you have to

be explicit in C you can't just say this

even though that's kind of how you might

think about it you have to ask a

complete question using the equality

sign twice in this case let me ask a

question now too it's not a typo I

deliberately used single quotes around

all of my single letters here why might

that be previously we used double quotes

for anything that looked like text


yeah

correct string is double quotes for

multiple characters or even one

technically but yes and single quotes

for

for single characters because my data

type is different I chose the simple

route of just using a single Char in

fact this program won't work with y e s

or n o That's not supported at the

moment more on that another time I had

to use single quotes because that's how

C does it if you're dealing with single

characters AKA chars use single quotes

if it's a string even if it's one single

character in a string as though you're

starting to write out a longer word or

sentence that would be double quotes and

we'll see why this is before long too

but again just sort of things to keep in

mind whenever writing code in this

particular language

yeah down here

so short answer if I'm understanding

correctly even this would be incorrect

and this would be even more incorrect

but if you don't mind let me Kick the

Can a couple of weeks on this as to why

this doesn't work the right the most

Pleasant way to do this would indeed be


to do something like this but even this

is a slippery slope because what if the

user does something weird like they

capitalize just the Y you can imagine

this getting messy quickly I like your

idea earlier about just forcing

everything to lowercase just to

standardize things unfortunately you

cannot compare strings for equality like

this for again reasons we'll come to

before long so for today we're keeping

it simple even though arguably it's not

nearly as user friendly to only tolerate

individual letters and there's a

question over here

uh they are on a U.S English keyboard

it's shift and then the backslash key

above return but depending on your

keyboard it will vary

all right so let's actually now uh look

back at something we did a little bit of

last week let me go ahead and open a

file called meow.c because recall that's

what we had scratch do initially let me

include not the cs50 library this time

but just stdio.h because I only

want printf for this demo let me go

ahead now and just print out meow

and then if I want the cat to meow three


times like it did last week meow meow

meow save it make meow dot slash meow

voila program is written correct I claim

it ran it compiled okay but again this

was the beginning of our conversation

last week of not being particularly well

designed and if someone wants to maybe

point out the now obvious like why is

this not well designed necessarily

yeah it's just repetition right again I

literally resorted to copy paste like

that should be the signal that you're

probably doing something uh wrong or at

best just lazy of you in this case so

the solution as you might glean from

last week is probably going to be one of

those things called Loops so let's just

take a look at some of the Syntax for

loops in C but again no new ideas

it's just some new syntax that'll take

some getting used to in scratch if you

wanted to meow forever with something

like this there's not a forever keyword

in C so this one's a little weird to

look at but this is the best we can do

it turns out there is a keyword called

while in C and that kind of has the

right command semantics because it's

like while like do something again and

again that's the best I can do


but just like an if condition or an else

if condition those took a Boolean

expression in parentheses a while loop

also takes a Boolean expression in

parentheses so I have to ask a question

now if I want to do something forever I

could kind of stupidly just say while 2

is greater than one while three is

greater than 2 or just something

completely arbitrary but that should rub

you the wrong way because like why 2

versus 1 why 3 like just if you

want true just say true so it turns out

in C there are special keywords true and

false that are literally true and false

respectively I could also put the number

one for true and the number zero for

false but most people would just say

true to be explicit so it's a little

hackish if you will but very

conventional there's no forever keyword

in C if I want to then print meow

forever I'm going to just use something

like printf here so again not a perfect

translation from one to the other but

absolutely possible in C what about

this this is a little more common if you

want to do something a finite number of

times like repeat three


there's a few different ways we can do

this in C

here's one I would approach and here's

where C like a lot of text-based

languages you kind of have to whip out

that toolkit of all the basic building

blocks and think about all right how can

I build a little machine in software

that does something some number of times

well let me give myself a variable

called counter set it equal to zero let

me create a loop whose Boolean

expression is is counter less than three

the idea being here why don't I just

kind of count one two three so how do I

implement this physicality in code I

give myself a variable set it to zero

zero fingers up now I ask the question

is counter less than three if so go

ahead and print out meow and just

intuitively even if you've never seen C

code or any code before scratch what

more do I need to do I've left room

here for one more line of logic

yeah

we have to increase counter so I need

code like I showed earlier like counter

equals counter plus one and so here's

where like programming sometimes becomes

a bit more like Plumbing like you can't


just say what you mean like you could in

scratch you have to build a little sort

of software machine that initializes

value does something increments it

checks it and so it's kind of like this

software-based machine but together

that's just using some familiar building

blocks but this is pretty common just

like in scratch you might have used

Loops a bunch of times pretty common in

C so can we tighten this code up this is

correct but here are some new here's

some conventions that are popular

you don't if you're going to count just

say I a convention in programming with

at least languages like C is just use I

as an integer if all its purpose is is

to count from like zero on up counter is

not wrong it's not bad it's just uh like

it's more verbose than you need to be

just call it I you don't need more

semantics than that all right what else

can I do here there's another

opportunity to tighten up this code do

you recall yeah

yeah that syntactic sugar that does

nothing new but it does it more

succinctly I can change this to either

the intermediate format or the even

tighter format of just i plus plus now

this is pretty canonical like this is

how many people most people would

Implement something three times using a

loop in C using a while loop that is

turns out that it's so common in C and

other languages to do something finitely

many times there's a couple of ways to

do it in this model to be clear the

logic though is that we start by

initializing the variable like I've

highlighted here we then ask the

question is i less than three if so

everything that's indented inside the

curly braces gets executed namely meow

then the update then the computer is

going to recheck the condition to make

sure that I hasn't gotten so big that

it's no longer less than three but if not it

then does this again and it does this

again and then it repeats constantly

checking the condition and executing

what's in the block checking the

condition and executing what's in the

block after three times of that the

condition is going to be false or a no

answer and that's it for the code it

just proceeds to whatever's down here

just like with scratch it jumps to the

next blocks down below


all right what's another way though to

do this well I've deliberately been

counting from zero and that's a

programming convention right we started

last week with all the light bulbs off

which was zero so it's pretty reasonable

to start counting at zero just like you

would here like no fingers are up this

is zero fingers on your hand but if you

prefer you could start counting at I

equals one but then you don't want to do

it while I is less than three you want

to do I is less than or equal to three

on most keyboards there's no symbol for

less than or equal to or greater than or

equal to so in C you use two characters

less than and then an equal sign with no

spaces in between that just means less

than or equal to we could change it to

set I to two and make this condition be

less than or equal to four we could make

this be

10 and less than or equal to 12 but

again just stick with the basics start

at zero and count on up would be the

convention or if you prefer to count

down that's fine too set i to three

and then do this so long as I is greater

than zero but you have to decrement


instead of increment so again we could

do this all day long there's like

literally an infinite number of ways to

implement this idea and that's why I

keep emphasizing convention call the

variable I for something like this

initialize it to zero for something like

this and just generally count up unless

you really prefer to count down again

just certain human conventions all right

how about another way to do this this is

what's called a for Loop in C also very

common it's not quite as straightforward

in that it doesn't really read top to

bottom in the exactly the same way this

kind of has a lot more logic tucked into

its first line but it does exactly the

same thing what happens here is notice

that inside the parentheses next to the

word for there's two semicolons which

is another weird use of syntax they're

not at the end of the line now they're

in the middle of the parentheses but

that's what the humans chose years ago

the first thing before the semicolons

initializes your variable int i

equals zero the next thing is the

condition that's going to constantly get

checked every cycle through this Loop

and the last thing is going to be what


you do after each Loop which in this

case is going to be count up so again if

I rewind we initialize I to zero we then

ask the question is I Less Than 3 if so

execute what's inside of the loop

then the computer asks does this it does

the update incrementing I by 1

and then it's not going to blindly meow

again it's going to check again the

condition is I less than three then it's

going to meow if so then it might go

ahead and increment I and check the

condition again so again this does not

read quite in the same simple Fashion

top to bottom you kind of read it left

to right and then jump around but again

the initialization

the constant Boolean expression being

checked and the update after each time

does the exact same thing as what we saw

a moment ago in this while loop

format which one is better they're

the same I think most people would

probably eventually use a for Loop once

comfortable but just because is really

the answer there

all right any questions then on Loops as

we've translated them to C yeah

uh for Loop and while loop can both be


used to do exactly the same thing there

are subtle differences with issues of

scope which we'll discuss before long

where when you create a variable in a

for Loop notice that it was again inside

of those parentheses which technically

means it's only going to exist in these

four lines of Code by contrast with the

while loop I declared my variable

outside of the loop that variable is

going to continue to exist elsewhere

in my program so that's one of the minor

differences there

good question but you'll see some others

over time all right so we claim then

that it's better in some form to do this

with loops so let's actually jump back

to the code let me go ahead and now

re-implement meowing with a for Loop for

instance so how about for int i gets

zero i less than three whoops i less

than three i plus plus then inside my

curly braces let me go ahead and print

out with printf meow with a new line and

a semicolon so I did it pretty quickly

just because I long acquired the muscle

memory but if I now make meow no errors

there run dot slash meow and I see meow

meow meow well let's do now what we did

last week which was to begin to make our


own custom functions if you will by

using our own in C so here's where the

syntax gets a little funky but we'll

explain over time why what each of these

keywords is doing if I want to create a

function called meow because the authors

of C did not create a function called

meow decades ago I need to give it a

name like meow I need to specify if it

takes any inputs for now I'm going to

say no

and I'm going to explicitly Say No by

writing this special word void

it's also necessary when implementing a

function in C which was not necessary in

scratch to specify what its return type

is but for now I'm just going to say

that meow is the name of the function it

takes no inputs and that's what the void

in parentheses means and it does not

return anything like get uh like ask did

or like get string or get int does meow's

purpose in life is just to have side

effects visual Side Effects by printing

something on the screen

so what is meow gonna do I'm gonna have

it quite simply say printf quote unquote

uh meow backslash n and now just like in

scratch I can now just call a brand new


function called meow and here's where

too if you really don't like the curly

braces technically speaking you can get

rid of them when there's only one line

of code inside your Loop but again

stylistically I would encourage you to

preserve them to make super clear to

yourself and others what it is that's

going on let me go ahead and save this

and do make meow whoops

darn all right what did I do something

stupid

yeah so zero does not belong there I

meant to hit parenthesis parenthesis so

let me rerun make meow okay fixed my

mistake all right it's still working

okay but recall what I did in scratch

kind of out of sight out of mind and

just to make a point let me just kind of

highlight this and move it way down in

the file because again like now that

meow exists it's an abstraction I just

know a meow function exists I want to be

able to use it so let me scroll back up

my main function is the same let me go

ahead and make meow again

and now just by moving that function

I've created all these lines of errors

and let's look at the first again the

rule of thumb here it's a little small


but it says meow.c in bold which is the

name of the file where the bug is 5 is

the line number and 20 is the character

so line number is enough alone uh

let's see

oh this is what happens when I scrolled

up too far sorry this is the error we're

now looking at line seven I was looking

at the old error message from earlier

before I fixed the zero meow.c line

seven all right apparently C does not

know what the meow function is implicit

declaration of function meow is invalid

in c99 well what does that mean

Declaration of function means your

creation of a function like I'm

declaring that meow exists but I haven't

apparently defined it yet and then c99

is the version of C from the year 1999

which we generally use here it's one of

the more recent versions

so why is that the case

can you infer from the mere fact that I

just moved meow to the bottom of the

file which was fine in scratch but now

is bad why is that

yeah C is just kind of old school it

reads your code top to bottom and if it

does not know what meow is when you


first try to use it it just freaks out

and prints out these error messages so

the solution is quite simply don't do

that just leave it where it was but you

could imagine this getting a little

Annoying over time if only because main

is by name the main part of your program

and honestly it would just be nice

honestly if main were always at the top

of your code because if you want to

understand what a file is doing it makes

sense you just read it top to bottom

well there is a solution to this you can

put functions in different orders with

main at the top so long as you and this

is perhaps the only time copy paste is

appropriate so long as you leave a

little breadcrumb for the compiler at

the very top of your file that literally

repeats the return type the name and

the arguments to that function semicolon

this is so to speak declaring your

function and the real fancy way is this

is a prototype it's like what is this

thing going to look like but the

semicolon means I'm not going to deal

with this yet I'm going to actually

Define the function or implement it down

below here this is kind of a stupid

detail more recent languages get away


get rid of this need you can put your

functions in any order but again if you

just think about the basics of

programming languages like this one here

and as you noted it must just be reading

your code top to bottom so annoying yes

but explained yes too so let me go ahead

and make meow one more time dot slash

meow still working okay and let me make

one final enhancement to this meow

program here let me go ahead now and say

something like this let me go ahead and

say all right wouldn't it be nice if

my meow function could do something for

me some number of times so suppose I

want to do this this main function at

the moment is going to meow three times

but suppose I want to meow n times where

n is just some number provided by the

user well just like in scratch custom

functions can take inputs I just

presently am saying void but if I change

this to int n thereby telling the

compiler hey meow still doesn't return

something but it does take something as

input it takes an integer and I want to

call it n so this is another way of

declaring a variable but a way of

declaring a variable that gets handed


in as input to the function so now if I

tighten up main here now I can actually

do something really cool just like in

scratch which is this if I now look at

this code let me zoom in here now my

main program is really well written in

the sense that it just says what it does

meow three times this works though

because I define meow as now taking an

input an integer called n and then using

n in my

now familiar for Loop there's one change

you might have caught my one mistake I also

have to remind myself up here to make

that change too again this is one of the

only redundancies or copy paste that's

sort of reasonable

but there I have now a better version so

let me go ahead and rerun this uh make

meow dot slash meow voila so again no

change in correctness but now again

we're sort of modularizing our code and

heck what you could do now and this is

just a teaser featured down the road

those header files we talked about early

those libraries this is the kind of

modularization we're talking about we

the staff wrote a function called get

string get int and so forth we put it in

a file called cs50.c and we put little


breadcrumbs specifically these things

called prototypes in cs50.h

so that when you all as aspiring

programmers include cs50.h you are sort

of secretly telling the compiler at the

very top of your code what the menu of

available functions is why because in

cs50.h are lines like these obviously not

for meow but for get string get int and

so forth and in stdio.h is the

same lines of code for things like

printf so that's all that's going on

there it's just a way of telling the

computer in advance what functions to

expect

or any questions then

on these here

correct so if you don't mind I want to

continue to wave my hand at that detail

for today indeed int main void is a

little weird because like what would the

input to main be we have no mechanism for

providing input yet and what does it

mean for main to return anything like

who is it returning to for another day

if we may they're going to come into

play but that for now today is just

something you should take at face value

as necessary copy paste to begin


programs so meow is a function that

takes an input the number of times to

meow but it didn't actually have a

return value hence the void but what if

we actually want to create our own

function that not only takes zero or

more inputs as arguments but also

returns some value maybe an INT maybe a

float maybe something else altogether

well it turns out in C we can do that

as well let me go ahead and create a new

file here called discount.c and let's

Implement a quick program via which we

can discount some regular price by some

percentage as though there's a sale

going on in a store let me go ahead and

include our usual

cs50.h followed by stdio.h and at

the top let me give myself int main void

as before and inside of main let's go

ahead and do something simple let's give

ourselves a float called regular

representing like the regular price of

something in a store let's go ahead and

get a float from the user asking them

what that regular price is then next

let's go ahead and declare a second

variable also a float called sale

ultimately representing the sale price

after some percentage discount off and


let's go ahead and simply calculate

whatever regular is and say 15% off is a

pretty good discount so let's go ahead

and discount regular whatever it is by

15% which is equivalent of course to

multiplying it with the asterisk by 0.85

of course if we're taking off 15% we

multiply the regular price by 0.85 now

let's go ahead and print out the results

here let me go ahead and say printf

sale price colon let me go ahead and use

percent F but more specifically percent

dot 2f because at least in U.S currency

we typically show cents to two decimal

places followed by a new line and then

let me go ahead and plug in the value of

sale all right let's go down here and do

make discount enter so far so good dot

slash discount and the regular price is

maybe one hundred dollars so the sale

price should be eighty five dollars so

our arithmetic seems to be correct here

but let's fast forward now in time

suppose that we find ourselves

discounting a lot of prices in an

application maybe a website like Amazon

where they're offering some kind of

percentage discount and it'd be nice to have

a reusable function that just does this


arithmetic for us simple though it may

be so let's go ahead and modify discount

this time to give ourselves our own

function called discount for instance

that takes an input like the regular

price that you want to Discount and then

it also returns a value it doesn't just

print it out it returns a value namely a

float that represents what the sale

price is so let me go down below

main and go ahead and define a function

that's going to return a float because

we're dealing with dollar amounts still

the function is going to be called

discount and it's going to take one

input like the price that we want to

Discount in here I'm going to do

something very simple I'm going to say

float sale equals whatever that price is

times 0.85 and then I'm going to go

ahead and return sale now for that

matter I can actually tighten this up a

bit if I'm only declaring a variable to

store a value that I'm then returning

with this keyword return I actually

don't even need that variable so I can

delete the second line and I can

actually just go ahead and get rid of

that variable altogether and immediately

return whatever the arithmetic result is


of taking the price input the argument

that's being passed in times .85 so very

simple function that simply does the

discounting for me as always let me go

ahead and copy paste in almost the only

time it's okay to copy paste the

prototype of that function to the top of

the file so that when compiling this

code main has already seen the word

discount before and now let me go into

the code here and instead of doing the

math myself in main let me presume that

we have some function already in our

toolkit called discount that lets me

discount the regular price and return

that value and then down here my code

doesn't need to change I'm still going

to print out sale the variable in which

I'm storing that result but notice what

I've done here I've sort of abstracted

away the notion of taking a discount

by creating my own function that takes a

float called price or anything else as

input it does a little bit of math

simple though it is here and then it

returns a value but notice that discount

is not printing that value it's

literally using this other keyword

called return so that I can hand back


that value just like get_string hands

back a value just like get_int hands back

an integer without printing it for you

so that I up here on line 9 can go ahead

and store that value in a variable if I

want and then actually print it out let

me go ahead now and recompile this

code with make discount

let me go ahead and do dot slash

discount and let's again do a hundred

dollars

sale price is going to be eighty

five dollars as well now it turns out

that functions don't have to take just

zero or one argument as input they can

actually take two or three or more so in

fact suppose we wanted to now enhance

this version of my program and take in

as input to the discount function not

just the price that I want to discount

but also the percentage off thereby

allowing us to support not just 15

percent off but any number of percentage

points off well let me go up here and

declare an int and call it percent off

and let me ask the user for how many

percentage points they want to take off

so I'm going to say percent off inside

of the prompt here get that int called

percent off and now in addition to


passing in regular as an input to the

discount function I'm also going to pass

in percent off but I need to tell the

computer that it is taking now two

arguments and the way I do this is just

with a comma down here in the function's

own definition here is going to be a

percentage argument a second argument

per the comma and I'm now going to use

that percentage in a slightly familiar

way I don't want to just do percentage

like this because of course that's going

to increase the size of the total price

I actually need to do a little bit of

real world math where if this is a

percentage off like the number 15 for 15

percentage points I need to do like a

hundred minus that many percentage

points thereby giving me 100 minus 15 or 85

and then I need to divide that by 100 in

order now to give myself 0.85 times the

price that was passed in but if I go

ahead now and save this run make

discount one last time I notice that

I've actually got an error here what

have I done wrong well I need to change

that prototype too and again this is

admittedly an annoying aspect of C that

you have to maintain consistency here


but that's fine I'm just going to go up

here change this to int percentage

spelling it correctly and now let me

retry compilation make discount crossing

my fingers this time worked okay

./discount and voila 100 and percent

off say 15 points and voila 85

now it's worth noting that I've

deliberately returned the results of my

math from this function I haven't just

done the math on the original variable

that's being passed in in fact if we

take a look at this second version where

discount is now taking a price argument

and a percentage argument notice that

I'm not doing something like this I'm

not just saying price equals price times

100 minus percentage divided by 100 and

leaving it that the problem there is

that this variable price is going to be

scoped to that discount function and

we'll encounter this again before long

but this notion of scope just refers to

where a variable actually lives

or exists or is accessible so it turns

out if I change price in the context of

this discount function that's not going

to have a lasting effect if I actually

want to get the result back to the

function that used the discount function


namely main I actually do need to take

this approach of actually returning the

value explicitly so that ultimately I'm

handing back the discounted price

right well let's go ahead and maybe how

about let's just use these Primitives in

just a few different ways how about a

little game of yesteryear Super Mario

Brothers and in the original Super Mario

Brothers and in bunches of variants since

you have like these side-scrolling

worlds that look like this where there's

some coins in the sky hidden behind

these question marks so let's just use

this as a visual to consider how in C

could I start to make something

semi-graphical like not actual colors or

fanciness that feels like too much too

soon just something like printing out

some question marks well if I go back

over here let me create that actual file

that I alluded to earlier so let me code

up mario.c let me go ahead and include

stdio.h int main void again which

we'll continue to copy paste for today

and then let me just go ahead and do

something simple like four question marks

and a new line all right this is what we

might call ASCII art which just means


Graphics but really just implemented

with your keyboard and if I make mario

and ./mario it's not nearly as

engaging visually as this but it's the

beginning of this kind of map for a game

well if I wanted to now print out

those things dynamically let me go back

to my code here and instead of printing

out four all at once I could do

something like for int i gets 0 i less

than four i plus plus and then inside

here I could just print out one of them

at a time let me save that make Mario

and at the risk of disappointing so

close

but I made a mistake just a stupid

aesthetic one the prompt is not on the

new line how could I move it

yeah I need an escape character the

backslash n but where should I put it

here

okay no because that's going to put it

after every one and it's going to make

this thing vertical instead of

horizontal so logically just like in

scratch put it after the end of the loop so

something out here and just print out

for instance only quote unquote new line

and now if I do make Mario again dot

slash Mario okay we're back in business


but a little better designed in that now

I'm not repeating myself multiple times

I'm doing this again and again but let's

do one other thing here with Mario let

me go ahead and ask the user how many

question marks or coins to print the

catch here is that there's another type

of loop that's helpful for this and it's

called a do while loop generally a do

while loop is similar to a while loop

but it checks the condition last instead

of first recall earlier on the slide we

had while open parenthesis close

parenthesis and I kept claiming that we

checked whether I is less than whatever

it was three in advance again and again

a do while loop just inverts the logic

so that you can actually do something

like this at the top of this program I'm

going to go ahead now and give myself a

variable n like this of type integer and

then I'm going to do literally the

following with the keyword do n equals

get int and I'm going to ask the user

for the width like the number of question

marks to print and I'm going to do this

while n is less than say one

so this is a little cryptic but the

salient difference is that the while the


Boolean expression is now at the bottom

of my block of code not at the top

now why is this well the difference here

if I make Mario

is whoops uh I need to add cs50.h

because I'm now using get int

if I now compile this version of Mario

and do dot slash Mario

a do while loop is helpful when you want

to do something no matter what first and

then check some condition or some

Boolean expression to see if maybe in

this case the user cooperated it would

make no sense if the user typed in say

zero because like there's no work to be

done it'd be really weird if they said

negative 100 because that makes no sense

logically so with this simple construct

here

I am doing the following while n is less

than one the implication is that as soon

as n equals one or is bigger than one

I'm going to break out of this Loop and

I've got myself a variable called n

containing essentially a positive value

one through two billion or so and I can

now use this for instance here change

the 4 to an N so now my program is

completely Dynamic let me go ahead and

do make Mario dot slash Mario again and


I'll do four still works I'll do 40.

still works and the difference here with

the do while is if something like this

involves getting user input well there's

no question to ask like the user hasn't

given you anything yet so you have to do

something first then check and break out

of the loop if the human has for

instance cooperated in this case

all right well why don't we escalate to

something more like this in the same

game where you're underground as Mario and

this is like a two-dimensional wall

that's popping up here it looks like a

three by three for instance for the sake

of discussion and it's like made of

bricks so I'll use maybe hash symbols

this time well it turns out that we can

Nest that is combine some of these same

ideas as follows let me go ahead now

and change back to this code and I'm

going to keep the do while loop from

before and I'm going to ask though this

question what's the size of this Square

I'm going to assume it's like n by n

so three by three four by four whatever

so I'm just going to ask for the size of

the square of bricks and now how do I do

this well I'm going to go ahead for


instance and write out a for int i

gets zero i less than n i plus plus let

me just keep it simple and print out

something like this just a single hash

symbol that is a brick and a new line

after it all right let's make Mario Run

Mario of three okay that's close to

being it I've got a column all right but

I need it to be wider so the solution last

time was to get rid of the new line and

then we put the new line out here after

the loop all right so let's do make

Mario dot slash Mario and type in three

and aha all right so I kind of need to

combine these two ideas somehow so how

might we solve this problem

I want to print like rows and columns

not row or column

how do I do this yeah

yeah add another loop in the for Loop

right if you use one Loop conceptually

to kind of count the rows from top to

bottom and then within each row you then

sort of typewriter style old school

typewriter do like character character

character character horizontally I think

we could do exactly what we want to

achieve here so how about this let me

get rid of this line and get rid of this

line for now and let me just give myself


another loop on the inside and since I'm

already using I another reasonable

convention here would be to say

something like J so J also gets zero J

is less than n j plus plus and now

what's going to happen let me go ahead

and print out just one of these things

at a time

and let me save and let me run

this and let me see how close we are

make mario ./mario three okay that's

clearly wrong but I see nine things

there on the screen so we're close

what's the one fix I need now to sort of

move the old school typewriter head down

to the next row when appropriate what do

you think

yeah I need one of these backslash n's

and probably let me add some comments

now to help everyone visualize what I've

done for each row

for each column how about print a brick

just to kind of explain the logic and so

I add that because now

move to next row I could do something

like this with a backslash n so here is

where the comments really my pseudo

code actually kind of illuminates the

situation a bit let me go ahead and


recompile mario ./mario 3 now

we're talking it's not a perfect square

just because these hash symbols are a

little taller than they are wide but

that's just a font detail here now I've

done something that's much more akin to

something like this

all right so let me pause here and see

if there are any questions

again the code's getting a little more

complicated but we're just building like

more complicated programs like in

scratch with like familiar puzzle pieces

some variables Some Loops some

conditionals it's all the same as before

yeah

can you multiply strings in C no

but ask that same question again in a

few weeks when we get to Python and the

answer will be yes

other questions yeah

in C you must specify the return type

the name of the function and the inputs

or arguments to the function in that

order and if none of them are applicable

you write the word void

uh so same question as earlier let me

kick that can a week or so and we'll

come back to that and we'll see why but

for now just take on faith that you need


to do that with main because main is a

little special similar to the when green

flag is clicked it too was a little

special as well

yes if you want to get out of a loop

early you could do this so let me answer

this question this way an alternative to

a do while loop would be to do something

like this uh

how about while true so do the following

forever let me go ahead and get an int

from the user for the size of this thing

if n is greater than zero that is a

positive integer then go ahead and use a

new keyword called break this is

identical to what we just did it's just

a little longer it's like a couple extra

lines a lot of them are blank and so

it's just an alternative but a do while

does the same thing but a little tighter

if that's an answer to your question

all right so let's now introduce finally

a sequence of problems that I've kind of

been brushing under the rug though we

did see a little bit of evidence of this

earlier when we tried to add 2 billion

and 2 billion and it overflowed the

number of bits in an int so to speak

let me go ahead and code up a program uh


called calculator again but I'm going to

go ahead now and change these to floats so

I'm going to change X to a float and I'm

going to use get float and a float again

is just a floating point value which is

a fancy way of saying a real number with

a decimal point in it and down here I'm

going to go ahead and use percent f for

float and I'm going to go ahead now and

do one more thing instead of addition I

want to do something fancier like

division so divide X by Y and I'm going

to give myself a third float

called Z as we did at the beginning of

today and I'm going to print out Z

instead of X and Y explicitly so I'm

going to go ahead now and do make

calculator

dot slash calculator and let's do

something like oh two-thirds two divided

by three is point six six six six seven

so that's what you would rather expect

let me run it again one tenth

all right so point one and a bunch of

zeros that too is what you would rather

expect but now let me get a little

curious it turns out that in C you can

modify the behavior of these format

codes a little bit by default you get

like six or so digits suppose that you


want to get exactly two digits you can

more succinctly say .2 before the f and

after the percent this is the kind of

thing that's hard to remember but you

Google it and you find that okay format

code for floats uses .2 to do two

decimal places so let me do make

calculator again dot slash calculator

how about two-thirds point six seven so

it handles the display of significant

digits for us here and now let me go

ahead and do one-tenth and 0.10 so it's

adhering to that well maybe I really

want a lot of precision right I've got a

really powerful computer let me see 50

numbers after the decimal point that's a

lot of significant digits let me remake

the calculator whoops typo let me remake

the calculator

dot slash not Mario calculator and how

about two-thirds again

well that's interesting

pretty sure it's supposed to be like a

0.6 with like a line over it right

in grade school math all right well

maybe that's just a bug how about one

tenth okay that's really getting funky

so what's going on it seems that my

program can not only not do addition


very well we eventually hit problems in

the billions we can't even do very

precise numbers here

what's going on

exactly in a nutshell the computer is

approximating the answer using that many

digits after the decimal point but

the problem fundamentally is actually

very similar to that integer overflow

from before and I'm using that now as a

term of art integers can overflow if

you're trying to use more bits than you

actually have available to you you sort

of change them all to ones and then

you're out of bits so to speak same

thing here but in the different context

of floats if you only have 32 bits or

Heck if we change to double and only

have 64 bits that's a lot of precision

but it's not infinite and yet pretty

sure there's an infinite number of real

numbers in the world which is to say a

computer with finite memory cannot

possibly represent all possible numbers

in the world because again there's not

an infinite number of permutations of 32

or 64 bits it might be a lot in the

billions or more but it's still finite

and so indeed this is the computer's

closest approximation to what's actually


going on there and so this is an example

of what we would actually generally call

floating point imprecision floating

point imprecision refers to the

inability for computers fundamentally to

represent all possible real numbers one

hundred percent precisely at least by

default in languages like C thankfully

in the world of scientific Computing and

so forth there are solutions to this

problem that just give you more digits

but the problem fundamentally is still

going to be there so there's a reason I

changed X and Y to floats let's see what

would happen if we rewind a bit and

instead of using floats for x and y

again use integers so int x and int

y and let's go back to get_int as

well thereby giving us integers X and Y

let's still leave Z as a float because

at the end of the day we want to be able

to handle fractions or floating Point

values but let's go ahead now and print

out this value of Z having changed X and

y now to ints make calculator

dot slash calculator and let's do say 2

for the numerator 3 for the denominator

and voila it's not .66 and it's not even

rounding oddly it's just all zeros this


time so why is that well it turns out

that c when dividing an integer by an

integer is always going to give you back

an integer an int the problem is that

floating point values don't fit in ints

only the integral part to the left of

the decimal point does everything at and

beyond the decimal point itself gets

thrown away known as a feature in C

called truncation when dividing an

integer by an integer you get back an

integer but if you're trying to then

store what's actually a floating

Point result in that integer C is just

going to throw away everything at and

beyond the decimal point leaving us in

this case with just the zero from what

should have been

0.666666 and so forth so let's see one

more example in fact let me go back to

my terminal here let me do dot slash

calculator again and let's do four

thirds this time it should be 1.33333

and so forth but let's see 4 divided by

3 both as integers this time gives us

1.000 but there too the answer should

be 1.333 but the floating Point part is

getting truncated or thrown away leaving

us with just one so how do we solve this

well certainly we could just use floats


from the get-go as I did but if by

nature of your program you only have

access to integers or maybe even longs

for which the same problem would occur

what we can actually do is called type

conversion and we can explicitly tell

the computer that we actually want to

treat this int as though it's a floating

point value and we can do that for both

X and Y so let me go back to my code

here and I have a couple of options in

fact I can convert y to a float by doing

this I can cast y to a float by

literally writing the type float inside

of parentheses right before the Y and if

I really want to be explicit I can also

do the same to X but strictly speaking

it suffices to just change one or the

other not necessarily both let me go

ahead now and do make calculator again

dot slash calculator and let's try 2

divided by three and now we're back to

an answer that's closer to correct but

indeed we're still having some rounding

issues there let's run it one more time

for 4 divided by three there too we're

closer to the right answer at least but

we still have that floating point

imprecision but that's going to be another


problem altogether to solve and here in

a little more detail is that issue of

integer overflow which is in the context

of ints suppose that we think back to last

week when we had three bits and we

counted from like zero to seven zero one

two three four five six seven I think I

asked the question how would we count to

eight someone proposed well we need a

fourth bit that's fine if you have a

fourth bit if you have access to another

light bulb or transistor if you don't

though the next number after this is

technically one zero zero zero but if

you don't have space for or hardware for

that fourth bit you might as well just

be representing the number zero so in

the world of integers if you're only

using three bits those three bits

eventually overflow when you count past

seven because what should be eight can't

fit so to speak so it rolls back over to

zero and as Arcane as this problem might

seem we humans have done this a couple

of times you might recall knowing about

or reading about the Y2K problem where a

lot of people thought the world was

going to end why because on January

1st of 2000 a lot of computers

presumably were going to update their


clocks from 1999 to the year 2000. the

problem is though for decades for

efficiency we humans were honestly in

the habit of not storing years as four

digits why because that's just a lot of

space to waste especially since like

centuries don't happen that often so a

lot of computer systems especially early

on when Hardware was very expensive and

memory was very tight just stored the

last two digits of any year the problem of

course on January 1st of 2000 is that 99

rolls over to a hundred but if you don't

have room for another digit it's just

zero zero and if your code assumes a

prefix of 19 well we just went from the

year 1999 back to the year 1900

thankfully long story short a lot of

people wrote a lot of code in a lot of

old languages and mostly warded off this

problem so the world did not end the

next time the world might end though is

on January 19th 2038 right now that

might feel like a long time away but so

did the year 2000 at one point why might

clocks again break in today's modern

computers in 2038 might you think

indeed so this refers to some number of

seconds so it turns out that the way


computers generally keep track of time

is they count the total number of

seconds since the epoch which

is defined as January 1st 1970. why it

was just a good year to sort of start

counting at when computers really came

onto the scene unfortunately most

computers use 32 bits to count the

number of seconds since January 1st 1970

the implication of which is we can only

count up to roughly 2 billion seconds 2

billion seconds is going to happen in

2038 at which point 31 ones are going to

roll over as follows that number 2

billion which is the max because if

you're representing positive and

negative numbers recall that you can

only count as high as positive 2 billion

or negative 2 billion looks like this

this is roughly the number 2 billion in

binary it's all ones with one zero way

over here if I count one second past

that two billion number give or take

that means like all right I add one I

carry the one it's just like nines

becoming zeros in decimal if I keep this

sort of simple animation and I keep

carrying the one carrying the one

carrying the one one second after two

billion seconds give or take I have this


number in the computer's memory so

there's still one bit that's a one all

the way to the left unfortunately that

bit often represents negativity

whereby if that first bit is a one

that represents that the rest of this

somehow represents a negative number

it's not negative zero there's a fancier

representation but a very big positive

number very suddenly becomes a very big

negative number and that number is

roughly negative two billion that means

computers in 2038 on that date are going

to accidentally think that it's been

negative two billion seconds since

January 1st 1970 which is going to make

computers potentially think it's 1901.

so what is the solution to the 2038

problem perhaps Y2K was because we were

using two digits for years what about

2038 more bits and thankfully we're

getting a little better at Lessons

Learned here and computers now are

increasingly using 64 bits and all of us

will be long gone by the time we run out

of that number of seconds so it's

someone else's problem many many years

from now but that's really the

fundamental solution if you're running


up against something finite well just

kick the can further and just give

yourself more bits and frankly because

Hardware is so much cheaper these days

computers are so much faster it's not as

big of a deal as it might have been

decades ago but that's indeed the

solution but this arises in very common

contexts in fact let me go ahead and

write a real quick program here called

pennies right you might think that just

converting Dollars to pennies in U.S

currency might be simple but let me go

ahead and do this in pennies.c I'm going

to go ahead and include cs50.h and I'm

going to include stdio.h int

main void as my starting point

and now down here I'm going to do this

I'm going to get a float called amount

and I'm going to ask the user for some

amount of dollars so a dollar amount and

I'm going to store that in a variable

called amount then I'm going to Simply

convert that amount to pennies by doing

say how about

int pennies gets amount times 100 and then

I'm going to go ahead and print out that

the number of pennies

is %i because that's just an

integer in pennies backslash n quote


unquote comma pennies

all right so if I didn't make any

mistakes here let me make pennies

dot slash pennies and suppose I have say

99 Cents so 0.99 that's 99 pennies

suppose I have a dollar twenty three

that's pretty good suppose I have four

dollars and 20 cents

huh there's that imprecision issue and

this isn't even that big of an amount

now not a big deal if like the cashier

gives you one penny less than you're owed

but you can imagine this adding up you

can imagine this being worrisome

for financial transactions or for

scientific measurements where like my

program can't even handle this well

there are some solutions here and it

looks like what's really happening if I

print it out using %f with a

.50 or whatever to see more decimal

points presumably the computer is

struggling to represent four dollars and

twenty cents precisely it's probably

storing four dollars and

19.9999 something cents so it's close

but it's not quite there

so I could at least solve this by


rounding up for instance and it turns

out there is a round function out there

and it turns out that it's in a library

called the math library and you would

know this by looking at online

documentation and the like as we'll

point you to and if I now make pennies

again and do dot slash pennies I can now

do four dollars and twenty cents and

voila now it's correct so at least in

this context it seems like a solvable

problem but it's certainly something I

need to be mindful of nonetheless

unfortunately even professional

full-time programmers over the years

have not been particularly attentive to

these kinds of details and in a class

like this the goal is not just to teach

you programming but to really teach you

what's going on underneath the hood so

to speak so that you have a bottom-up

understanding of how data is represented

how computers are manipulating it so

that you are not on the failing end of

some program having some bug and so that

we as a society are not beholden to

those kinds of mistakes too and this

happens unfortunately all of the time

this is a Boeing airplane that a few

years ago needed to be rebooted after


every 248 days why because this Boeing

airplane software was using a 32-bit

integer counting up hundredths of a second

to keep track of something or other

related to its electrical power and

unfortunately after 248 days of the

airplane being continuously on which in

the airline industry is apparently not

uncommon to make every dollar count

keeping the planes up and running all

the time the 32-bit number would roll

over and the power would shut off on the

airplane as a side effect because of

sort of undefined behavior in that case

the temporary solution by Boeing at the

time was apparently essentially a sort

of operating system style well have you

rebooted your plane and that was indeed

the fix until they rolled out an actual

software patch this stuff really matters

and the more Hardware we carry around

and the more we as a society use these

kinds of devices the more of these

problems we're going to run into down

the road that's it for cs50 we'll see

you next time

[Music]


this is cs50 and this is week two now

that you have some programming

experience under your belt in this more

Arcane language called C among our goals

today is to help you understand exactly

what you have been doing these past

several days wrestling with your first

programs in C so that you have more of a

bottom-up understanding of what some of

these commands do and ultimately what

more we can do with this language so

this recall was the very first program

we wrote in this language

called C much more textual certainly

than the scratch equivalent but at the

end of the day computers your Mac your

PC uh vs code doesn't understand this

actual code what's the format into which

we need to get any program that we write

just to recap

so binary otherwise known as machine

code right the zeros and ones that your

computer actually does understand so

somehow we need to get to this format

and up until now we've been using this

command called make which is sort of


aptly named because it lets you make

programs and the invocation of that has

been pretty simple make hello sort of

looks in your current directory or

folder for a file called hello.c

implicitly and then it compiles that

into a file called hello which itself is

executable which just means runnable so

that you can then do dot slash hello but

it turns out that make is actually not a

compiler itself it does help you make

programs but make is this sort of

utility that comes on a lot of systems

that makes it easier to actually compile

code by using an actual compiler the

program that converts source code to

machine code on your own Mac or PC or

whatever Cloud environment you might be

using in fact what make is doing for us

is actually running a command

automatically known as clang for C

language and so in fact here for

instance in vs code is that very first

program again this time in the context

of a text editor and I could compile

this with make hello but let me go ahead

and use the compiler itself manually and

we'll see in a moment why we've been

automating the process with make I'm


going to run clang instead and then I'm

going to run hello.c so it's a little

different how the compiler is used it

needs to know explicitly what the file

is called I'll go ahead and run clang

hello.c enter nothing seems to happen

which generally speaking is a good thing

because no errors have popped up and if

I do LS now for list you'll see there is

not a file called hello but there is a

curiously named file called a.out this is

a historical convention stands for

assembler output and this is just the

default file name for a program that you

might compile yourself manually using

clang itself let me go ahead now though

and point out that that's kind of a

stupid name for a program even though it

works dot slash a.out would work but

if you actually want to customize the

name of your program we could just

resort to make or we could do explicitly

what make is doing for us because it

turns out some programs among them make

support what are called command line

arguments and more on those later today

but these are like literally words or

numbers that you type at your prompt

after the name of a program that just

influences Its Behavior in some way it


uh modifies Its Behavior and it turns

out if you read the documentation for

clang you can actually pass a dash o for

output command line argument that lets

you specify explicitly what do you want

your outputted program to be called and

then you go ahead and type the name of

the file that you actually want to

compile from source code to machine code

let me go ahead and hit enter now again

nothing seems to happen and I type LS

and voila now we still have the old

a.out because I didn't delete it yet

and I do have hello now so dot slash

hello voila runs hello world again and

let me go ahead and remove this file I

could of course resort to using the

Explorer on the left hand side which I

am in the habit of closing just to give

us more room to see but I could go

ahead and right click or control click

on a.out if I want to get rid of it

or again let me focus on the command

line interface and I can use does anyone

recall we didn't really use it much but

what command removes a file

rm so rm for remove rm a.out enter

remove regular file a.out y for yes

enter and now if I do LS again voila


it's gone all right so let's now enhance

this program to do the second version we

ever did which was to also include

cs50.h so that we have access to

functions like get string and the like

and let me go ahead and do string name

gets get string what's your name

question mark and now let me go ahead

and say hello to that name with our

percent s placeholder comma name so this

was version two of our program last time

that very easily compiled with make

hello but notice the difference now if I

want to compile this thing myself with

clang using that same lesson learned all

right let's do it clang Dash o hello

just so I get a better name for the

program

hello.c enter

and a new error pops up that some of you

might have encountered on your own so

it's a bit Arcane here and there's this

mention of a cryptic looking path with

temp for temporary there but somehow my

issue's in main as we can see here it

somehow relates to hello.c even though

we might not have seen this language

last time in class but there's an

undefined reference to get string as

though get string doesn't exist now your


first instinct might be well maybe I

forgot cs50.h but of course I didn't

that's the very first line of my program

but it turns out make is doing something

else for us all this time just putting

cs50.h or any header file at the top of

your code for that matter just teaches

the compiler that a function will exist

it asks

the compiler to trust that I will

eventually get around to implementing

functions like get string in cs50.h and

like printf in standard io.h

but this error here some kind of linker

command relates to the fact that there's

a separate process for actually finding

the zeros and ones that cs50 compiled

long ago for you that the authors of

this operating system compiled for you

long ago in the form of printf we need

to somehow tell the compiler that we

need to link in code that someone else

wrote the actual machine code that

someone else wrote and then compiled so

to do that you'd have to type dash lcs50

for instance at the end of the command

so additionally telling clang that not

only do you want to Output a file called

hello and you want to compile a file


called hello.c you also want to quote

unquote Link in a bunch of zeros and

ones that collectively Implement get

string and printf so now if I hit enter

this time it compiled OK and now if I

run dot slash hello it works as it did

last week

just like that but honestly this is just

going to get really tedious really

quickly notice already just to compile

my code I have to run clang dash o

hello hello.c dash lcs50 and you're

gonna have to type more things too if

you wanted to use the math library like

to use that round function you would

also have to do dash lm typically to

specify give me the math bits that

someone else compiled and the commands

just get longer and longer so moving

forward we won't have to resort to

running clang itself but clang is indeed

the compiler that is the program that

converts from source code to machine

code but will continue to use make

because it just automates that process

and the commands are only going to get

more cryptic the more sophisticated and

more featureful your programs get and

make again is just a tool that makes all

that happen so to speak let me pause


there to see if there's any questions

before then we take a look further under

the hood yeah in front


sure let me come back to that in a

moment what does the dash lcs50 mean

we'll come back to that visually in just

a moment but it means to link in the

zeros and ones that collectively

Implement get string and printf but

we'll see that visually in a sec yeah

behind you


really good question how come I didn't

have to link in standard i o because I

use printf in version one standard I O

is just literally so standard that it's

built in it just works for free cs50 of

course is not it did not come with the

language C or the compiler we ourselves

wrote it and other libraries even though

they might come with the language C they

might not be enabled by default

generally for efficiency purposes so

you're not loading more zeros and ones

into the computer's memory than you need

to so standard i o is special if you

will other questions yeah

oh what does the dash o mean so dash o is

shorthand for the English word output

and so Dash o is telling clang to please

output a file called hello because the

next thing I wrote after the command

line recall was clang Dash o hello then

the name of the file then Dash lcs50 and

this is where these commands do get and

stay fairly arcane it's just through

muscle memory and practice that you'll

start to remember what are the command

line arguments you can provide to

programs but we've seen this before

Technically when you run make hello the

program is called make hello is the

command line argument it's an input to

the make program albeit typed at the

prompt that tells make what you want to

make even when I used rm a moment ago

and did rm of a.out the command line

argument there was called a.out and

it's just telling RM what to delete it

is entirely dependent on the programs to

decide what their conventions are

whether you use Dash this or Dash that

but we'll see over time which ones

actually matter in practice so to come

back to the first question about what

actually is happening there let's


consider the code more closely so

here is that first version of the code

again with standardio.h and only printf

so no cs50 stuff yet until we add it

back in in the second version where

we actually get the human's name

when you run this command then there's

actually a few things that are happening

underneath the hood and we won't dwell

on these kinds of details indeed we'll

abstract it away so to speak by using

make but it's worth understanding at

least from the get-go how much

automation is going on so that when you

run these commands it's not magic you

actually do have this bottom-up

understanding of what's going on so when

we say you've been compiling your code

with make that's a bit of an

oversimplification technically every

time you compile your code you're having

the computer actually do four distinct

things for you and this is not four

distinct things that you need to sort of

memorize and remember every time you run

your program what's happening but it

helps to sort of break it down into

building blocks as to how we're getting

from source code like C into zeros and


ones it turns out that when you compile

quote unquote your code technically

speaking you're doing four things sort

of automatically and all at once

pre-processing it compiling it

assembling it and linking it just humans

decided let's just call the whole

process compiling but for a moment let's

just consider what these steps are

so pre-processing refers to this if we

look at our source code here version 2

that uses the cs50 library and therefore

get string notice that we indeed have

these include lines at top and they're

kind of special versus all the other

code we've written because they start

with hash symbols specifically and

that's sort of a special syntax that

means that these are technically called

preprocessor directives fancy way of

saying they're handled special versus

the rest of your code in fact if we

focus on cs50.h recall from last week

that I provided a hint as to what's

actually in cs50.h among other things

like what was the one Salient thing that

I said was in cs50.h and therefore why

we were including it in the first place

so get string specifically the

prototype for get string we haven't made


many of our own functions yet but recall

that anytime we've made our own

functions and we've written them like

below Main in a file we've also had to

somewhat stupidly copy paste the

Prototype of the function at the top of

the file just to teach the compiler that

this function doesn't exist yet it does

down there but it will exist just trust

me so again that's what these prototypes

are doing for us so therefore in my code

if I want to use a function like get

string or printf for that matter

they're not implemented clearly in the

same file they're implemented elsewhere

so I need to tell the compiler to trust

me that they're implemented somewhere

else and so technically inside of cs50.h

which is installed somewhere in the

cloud's hard drive so to speak that you

all are accessing via vs code there's a

line that looks like this a prototype

for the getstring function that says the

name of the function's get string it

takes one input or argument called

prompt

and the type of that prompt is a string

get string not surprisingly has a return

value and it returns a string so


literally that line and a bunch of

others are in cs50.h and so rather than

you all having to copy paste the

Prototype you can just trust that cs50

figured out what it is you can include

cs50.h and the compiler is going to go

find that prototype for you same thing

in standard i o someone else what must

clearly be in standard io.h among other

stuff that motivates our including

standard io.h too

yeah

printf the prototype for printf and

indeed I'll just change it here in

yellow to be the same and it turns out

the prototype for printf is

actually pretty fancy because as you

might have noticed printf can take one

argument just something to print two if

you want to plug a value into it three

or more so the dot dot dot just

represents exactly that it's not quite

as simple a prototype as get string but

more on that another time so what does

it mean to pre-process your code the

very first thing the compiler clang in

this case is doing for you when it reads

your code top to bottom left to right is

it notices ooh here is Hash include oh

here's another hash include and it


essentially finds those files on the

hard drive cs50.h standardio.h and does

the equivalent of copying and pasting

them automatically into your code at the

very top thereby teaching the compiler

that get string and printf will

eventually exist somewhere so that's the

pre-processing step whereby again it's

just doing a find and replace of

anything that starts with hash include

it's plugging in the files there so that

you essentially get all the prototypes

you need automatically

okay what does it mean then to compile

the results because at this point in the

story your code now looks like this in

the computer's memory it doesn't change

your file it's doing all of this in the

computer's memory or RAM for you but it

essentially looks like this well the

next step is what's technically really

compiling even though again we use

compile as an umbrella term compiling

code in c means to take code that now

looks like this in the computer's memory

and turn it into something that looks

like this which is way more cryptic but

it was just a few decades ago that if

you were taking a class like cs50 in its


earlier form we wouldn't be using C since

it didn't exist yet we would actually be

using this something called Assembly

Language and there's different types of

or flavors of Assembly Language but this

is about as low level as you can get to

what a computer really understands be it

a Mac or PC or a phone before you start

getting into actual zeros and ones and

most of this is cryptic I couldn't

really tell you what this is doing

unless I really thought it through

carefully and rewound mentally years ago

from having studied it but let's

highlight a few key words in yellow

notice that this Assembly Language

that the computer is outputting for you

automatically still has mention of Main

and it has mention of get string and it

has mention of printf so there's some

relationship to the C code we saw a

moment ago and then if I highlight these

other things these are what are called

computer instructions at the end of the

day your Mac your PC your phone actually

only understands very basic instructions

like addition subtraction division

multiplication move into memory load

from memory print something to the

screen like very basic operations and


that's what you're seeing here these

assembly instructions are what the

computer actually feeds into the brains

of the computer the CPU the central

processing unit and it's that Intel CPU

or whatever you have that understands

this instruction and this one and this

one and this one and collectively long

story short all they do is print hello

world on the screen but in a way that

the machine understands how to do


so let me pause here are there any

questions on what we mean by

pre-processing which just finds and

replaces the hash include lines among

others and compiling which

technically takes your source code once

pre-processed and converts it to that

stuff called Assembly Language


correct each type of CPU has its own

instruction set

indeed and as a teaser this is why at

least back in the day when we used to

like install software from CD-ROMs or

some other type of media this is why you

can't take a program that was sold for a

Windows computer and run it on a Mac or


vice versa because the commands the

instructions that those two products

understand are actually different now

Microsoft or any company could generally

write code in one language like C or

another and they can compile it twice

saving a PC version and saving a Mac

version it's twice as much work and

sometimes you get into some

incompatibilities but that's why these

steps are somewhat distinct you can now

use the same code and support even

different platforms or systems if you'd

want all right assembling

thankfully this part is fairly

straightforward at least in concept to

assemble code which is step three of

four that is just happening literally

for you every time you run make or in

turn clang

this Assembly Language which the

computer generated automatically for you

from your source code is turned into

zeros and ones so that's the step that

last week I simplified and said when you

compile your code you convert it

from source code to machine

code technically that happens when you

assemble your code but no one in normal

conversations says that they just say


compile for all of these terms

all right so that's

assembling there's one final step even

in this simple program of getting the

user's name and then plugging it into

printf I'm using three different

people's code if you will my own which

is in hello.c some of cs50's which is

in

cs50.c which is not a file I've

mentioned yet but it stands to reason

that if there's a cs50.h that has

prototypes turns out the actual

implementation of get string and other

things are in cs50.c and there's a third

file somewhere on the hard drive so to

speak that's involved in compiling even

this simple program

hello.c

cs50.c and by that logic what might the

other be

yeah

standardio.c and that's a bit of a white

lie because that's such a big fancy

library that there's actually multiple

files that compose it but the same idea

and we'll take the simplification so

when I have this code now and I compile


my code here I get those zeros and ones

that end up taking hello.c and turning

it effectively into zeros and ones that

are combined with cs50.c followed by

standardio.c as well so let me rewind

here here might be the zeros and ones

for my code the two lines of code

essentially that I wrote here might be

the zeros and ones for what cs50 wrote

some years ago in cs50.c here might be

the zeros and ones that someone wrote

for standard i o decades ago the last

and final step is that linking command

that links all of these zeros and ones

together essentially stitches them

together into one single file called

hello or called a.out whatever you

name it that last step is what combines

all of these different programmers' zeros

and ones and my God like now we're

really in the weeds who wants to even

think about running code at this level

you shouldn't need to but it's not magic

when you're running make there's just

some very concrete steps that are

happening that humans have developed

over the years over the decades that

sort of break down this big problem of

source code going to zeros and ones or

machine code into these very specific


steps but henceforth you can call all of

this compiling

questions or confusion yeah


sure what does a.out signify a.out

is just the conventional default file

name for any program that you compile

directly with a compiler like clang it's

just kind of a meaningless name though

it stands for assembler output an

assembler might now sound familiar from

this assembling process it's just kind

of a lame name for a computer program

and so we can override it by outputting

something like hello instead yeah


so to recap there are other

prototypes in those files cs50.h

standard io.h technically they're all

included on top of your file even though

you strictly speaking don't need most of

them but indeed they are there just in

case you might want them

and finally any other questions yeah


does it matter what order we're telling

the computer to run sometimes with

libraries yes it matters what order they


are linked in together but for our

purposes it's really not going to matter

make is just going to take

care of automating that process for us

all right so with that said henceforth

compiling technically is these four

things but we'll focus on it just as a

higher level concept and abstraction if

you will known as compiling itself so

another process that we'll now begin to

focus on all the more this week is

invariably this past week you

ran up against some

challenges you probably created your

very first bugs or mistakes in a program

and so let's focus for a moment on

actual techniques for debugging as you

spend more time this semester in the

years to come if you continue to program

you're never frankly probably going to

write bug-free code ultimately your

programs are just going to get more

featureful more sophisticated and we're

all just going to start to make more

sophisticated mistakes and to this day I

write buggy code all the time and I'm

always horrified when I do it up here

but hopefully that won't happen too

often but when it does it's just a

process now of debugging trying to find


the mistakes in your program and you

don't have to just stare at your code or

sort of shake your fist at your code

there are actual tools that real world

programmers use to help debug their code

and find these faults so what are some

of the techniques and tools that folks

use well as an aside if you've ever


bug in a program is a mistake that's

actually been around for some time if

you've ever heard this tale

um some 50 plus years ago in 1947 this

is actually an entry in a log book

written by a famous computer scientist

named Grace Hopper who happened to

be the one to record the very first

discovery of a quote unquote actual bug

in a computer this is actually like a

moth that had flown into at the time it

was a very sophisticated system known as

the Harvard Mark II computer sort of

very large sort of refrigerator size

type systems in which an actual bug

caused an issue

the term bug though actually predates

this particular instance but here you

have as any computer scientist might

know the example of the first physical


bug in a computer

how though do you go about removing such

a thing well let's consider a very

simple scenario from last time for

instance when we're trying to print out

various aspects of Mario like this

column of three bricks let's consider

how I might go about implementing a

program like this and let me switch back

over to vs code here and I'm going to go

ahead and write a program

and I'm not going to trust myself so I'm

going to call it buggy.c from the get-go

knowing that I'm going to mess something

up but I'm going to go ahead and include

standard io.h and I'm going to go ahead

and Define main as usual so hopefully no

mistakes just yet and now I want to

print those three bricks on the screen

using just hashes for bricks so how

about for int i gets zero i less than or

equal to 3 i plus plus now inside of my

curly braces I'm going to go ahead and

print out a hash followed by a backslash

n semicolon all right saving the file

doing make buggy enter it compiles so

there's no syntactical errors like my

code is syntactically correct but some

of you have probably seen The Logical

error already because when I run this


program I don't get this picture which

was three bricks High I seem to have

four bricks instead now this might be

jumping out at you why it's happening

but I've kept the program simple just so

that we don't have to actually find an

actual bug we can use a tool to find one

that we already know about in this case

what might be the first strategy for

actually finding a bug like this rather

than just staring at your code asking a

question trying to sort of just think

through the problem well let's actually

try to diagnose the problem more

proactively and the simplest way to do

this now and years from now is honestly

going to be to use a function like

printf printf is a wonderfully useful

function not just for printing

formatted strings and all that but for

looking inside the values of variables

that you might be curious about to see

what's going on so you know what let me

do this I see that there's four

coming out but I intended three so

clearly something's wrong with my I

variable so let me just be a little

more pedantic let me go inside of this

Loop and just temporarily say something


explicit like I is percent I backslash n

and then just plug in the value of I

right this is not the program I want to

write it's the program I'm temporarily

writing because now I'm going to go

ahead and say make buggy dot slash buggy

and if I look now at the output I have

some helpful diagnostic information I is

0 and I get a hash I is 1 and I get a

hash I is 2 and I get a hash I is 3 and I

get a hash okay wait a minute I'm clearly

going too many steps because maybe I

forgot that computers are essentially

counting from zero and now oh it's less

than or equal to now you see it right

again trivial example but just by using

printf you can see inside of the

computer's memory by just printing stuff

out like this and now once you've

figured it out oh so this should

probably be less than three or I should

start counting from one there's any

number of ways I could fix this but the

most conventional is probably just to

say less than three now I can go ahead

and delete my temporary print statement

rerun make buggy dot slash buggy and

voila problem solved all right and to

this day I do this like whether it's

making a command line application or a


web application or a mobile application

it's very common to use printf or some

equivalent in any language just to poke

around and see what's inside the

computer's memory thankfully there's

more sophisticated tools than this let

me go ahead and reintroduce the bug here

and let me go ahead and reopen my

sidebar left here and let me go ahead

now and recompile the code to make sure

it's current and I'm going to run a

command called debug50 which is a

command that's representative of a type

of program known as a debugger and this

debugger is actually built into vs code

and all debug50 is doing for us is

just automating the process of starting

vs code's built-in debugger so this isn't

even a cs50 specific tool we've just

given you a debug50 command to make it

easier to start it up from the get-go

and the way you run this debugger is you

say debug50 space and then the name of

the program that you want to debug so in

this case dot slash buggy so you don't

mention your C file you mention your

already compiled code and what this

debugger is going to let me do is most

powerfully walk through my code step by


step because every program we've written

thus far just kind of runs from start to

finish even if I'm not done sort of

thinking through each step at a time

with the debugger I can actually like

click on a line number and say pause

execution here and the debugger will let

me walk through my code one step at a

time one second at a time one minute at

a time at my own human pace which is

super compelling when the programs get

more complicated and they might

otherwise just fly by on the screen so

I'm going to click to the left of line

five and notice that these little red

dots appear and if I click on one it

stays and gets even redder and I'm going

to now run debug50 on dot slash buggy

and in just a moment you'll see that a

new panel opens on the left hand side

it's doing some configuration of the

screen and now let me go ahead and zoom

out just a little bit here

so we can see more on the screen at once

and sometimes you'll see in vs code that

debug console opens up which looks very

cryptic just go back to terminal window

if that happens because the terminal

window is where you can still interact

with your code and let's now take a look


at what's going on if I zoom in on my

buggy.c code here you'll notice that we

have

uh the same program as before but

highlighted in yellow is line five not a

coincidence that's the line I set a

so-called break point at the little sort

of Red Dot means break here pause

execution here and the yellow line has

not yet been executed but if I now at

the top of my screen notice these little

arrows there's one for play there's one

for this which if I hover over it says

step over there's another that's going

to say step into there's a third that

says step out I'm just going to use the

first of these step over and I'm going

to do this and you'll see that the

yellow highlight moved from line five to

line seven because now it's ready but

hasn't yet printed out that hash but the

most powerful thing here notice is that

top left here it's a little cryptic

because there's a bunch of things going

on that'll make more sense over time but

at the top there's a section called

variables below that something called

locals which means local to my current

function Main and notice there's my


variable called I and its current value

is zero so now once I click step over

again

Watch What Happens we go from line seven

back to line five but look in the

terminal window one of the hashes has

printed but now it's kind of printed at

my own pace I can sort of think through

this step by step notice that I has not

changed yet it's still zero because the

yellow highlighted line hasn't yet

executed but the moment I click step

over it's going to execute line five now

notice at top left I has become one

and nothing has printed yet because now

highlighted is line seven and so if I

click step over again we'll see the hash

and if I repeat this process at my own

human comfortable Pace I can see my

variables changing I can see output

changing on the screen and I can just

think about should that have just

happened and I can pause and give

thought to what's actually going on

without trying to race the computer and

figure it all out at once I'm going to

go ahead and just stop here because we

already know what this particular

problem is and that just brings me back

to my default terminal window but this


debugger let me disable the break point

now so it doesn't keep breaking this

debugger will be your friend moving

forward in order to step through your

code step by step at your own pace to

figure out where something has gone

wrong printf is great but it gets

annoying if you have to constantly add

print this print this print this print

this recompile re-run it oh wait a

minute print this print this like the

debugger just lets you do the equivalent

but automatically

questions then on this debugger which

you'll see all the more Hands-On over

time

questions on debugger yeah

you were using like a step over feature

what do the other features in this

really good question we'll see this

before long but those other buttons that

I glossed over like step into and step

out of actually let you step into

specific functions if I had any more

than main so if main called a function

called something and something called a

function called something else instead

of just stepping over the entire

execution of that function I could step


into it and walk through its lines of

code one by one so anytime you have a

problem set you're working on that has

multiple functions you can set a

breakpoint in main if you want or you

can set it inside of one of your

additional functions to focus your

attention only on that and we'll see

examples of that over time

all right so what else and what's the um

you know the sort of elephant in the

room so to speak is actually a duck in

this case why is there this duck in all

of these ducks here well it turns out a

third genuinely recommended debugging

technique is talking through problems

talking through code with someone else

now in the absence of having a family

member or a friend or a roommate who

actually wants to hear you talk about

code of all things

um generally programmers turn to a

rubber duck or other inanimate objects

uh if something animate is not available

and the idea behind rubber duck

debugging so to speak is that simply by

looking at your code

and talking it through okay on line

three I'm I'm starting a for Loop and

I'm initializing I to zero okay then I'm


printing out a hash just by talking

through your code step by step

invariably finds you having the

proverbial light bulb go off over your

head because you realize wait a minute I

just said something stupid or I just

said something wrong and this is really

just a proxy for any other human

teaching fellow teacher friend colleague

but in the absence of any of those

people in the room you're welcome to

take on your way out today one of these

little rubber ducks and consider using

it for real anytime you just want to

talk through one of your problems in

cs50 or maybe life more generally but

having it there on your desk is just a

way to help you hear illogic in what

you think might otherwise be logical

code so printf the debugger and rubber duck

debugging are just three of the ways

you'll see over time to sort of get to

the source of code that you will write

that has mistakes which is going to

happen but it'll Empower you all the

more to solve those mistakes

are any questions

on debugging in general or these three

techniques
yeah

what's the difference between step over

and step into at the moment the only one

that's applicable to the code I just

wrote is step over because it means step

over each line of code if though I had

other functions that I had written in

this program maybe lower down in the

file I could step into those function

calls and walk through them one at a

time so we'll come back to this with an

actual example but step into will allow

me to do exactly that in fact this is a

perfect segue to doing a little

something like this let me go ahead and

open up maybe another file here actually

we'll use the same buggy and we're just

going to write one other thing that's

buggy as well let me go ahead up here

and include as before cs50.h let me

include standard io.h

let me do int main void so all of this I

think is correct so far and let's do

this let's give myself an INT called I

and let's ask the user for a negative

integer this is not a function that

exists technically yet but I'm going to

assume for the sake of discussion that

it does and then I'm just going to go

ahead and print out with percent I and a


new line whatever the human typed in so

at this point in the story my program I

think is correct except for the fact

that get_negative_int is not a function

in the cs50 library or anywhere else I'm

going to need to invent it myself so

suppose in this case that I declare a

function called get_negative_int uh its

return type so to speak should be int

because as its name suggests I want to

hand the user back an integer and it's

going to take no input to keep it simple

so I'm just going to say void there no

inputs no special prompts nothing like

that let me now give myself some curly

braces and let me do something familiar

perhaps now from problem set one let me

give myself a variable like n and let me

do the following

within this block of code assign n the

value of get int asking the user for a

negative integer using get int's own

prompt and I want to do this while n is

less than zero because I want to get a

negative int from the user and recall

from having used this block in the past

I can now return n as the very last step

to hand back whatever the user has typed

in so long as they cooperated and gave


me an actual negative integer now I've

deliberately made a mistake here and

it's a subtle sort of silly mathematical

one but let me compile this program

after copying now the Prototype up to

the top just so I don't make that

mistake again let me do make buggy enter

and now let me dot slash buggy I'll give

it a negative integer like negative 50.

uh huh that did not take uh how about

how about negative five maybe it's two

no uh how about zero

huh all right so it's clearly sort of

working backwards or incorrectly here

logically so how could I go about

debugging this well I could do what I've

done before I could use my printf

technique and say something explicit

like n is percent I

new line comma I just sorry comma n just

to print it out let me recompile buggy

let me rerun buggy let me type in

negative 50. okay n is negative 50. so

that didn't really help me at this point

um because that's the same as before so

let me do this debug50 dot slash buggy

oh but I've made a mistake so I didn't

set my break point yet so let me go

ahead and do this and I'll set a break

point this time I could set it here on


line eight let's do it in main as before

let me rerun debug50 now on dot slash

buggy that fancy user interface is

going to pop up it's going to highlight

the line that I set the break point on

notice that on the left hand side of the

screen I is defaulting at the moment to

zero because I haven't typed anything in

yet but let me go ahead now and step

over this line that's highlighted in

yellow

and you'll see that I'm being prompted

so let's type in my negative 50 enter

all right and notice now that

I'm stuck in that function so all right

so clearly the issue seems to be in my

get_negative_int function so okay let me

go ahead and stop this execution my

problem doesn't seem to be in main per

se maybe it's down here so that's fine

let me set my same breakpoint at line

eight let me rerun debug50 one more

time but this time instead of just

stepping over that line let's step into

it so notice line 8 is again highlighted

in yellow in the past I've been clicking

step over let's click step into now and

when I click step into boom now the

debugger jumps into that specific


function and now I can step through

these lines of code again and again I

can see what the value of n is as I'm

typing it in I can think through my

logic and voila hopefully once I've

solved the issue I can exit the debugger

fix my code and move on

so step over just goes over the line but

executes it step into lets you go into

other functions you've written

all right so let's go ahead and do this

we've got a bunch of possible approaches

that we can take to solving some problem

let's go ahead and Pace ourselves today

though let's take a five minute break

here and when we come back we'll

actually take a look at that computer's

memory we've been talking about see you

in five

all right

so let's

let's dive back in and up until now both

by way of week one and problem set one

for the most part we've just translated

from Scratch to C all of these

basic building blocks like loops and

conditionals Boolean Expressions

variables so sort of more of the same

but there are features in C that we've

already stumbled across like


data types the types of variables that

don't exist in Scratch but that in

fact do exist in other languages

in fact a few that we'll see before long

so to summarize the types we saw last

week just recall this little list here

we had ints and floats and longs and

doubles and chars there's also bools and

also string which we've seen a few times

but today let's actually start to

formalize what these things are and

actually like what your Mac and PC are

doing when you manipulate bits as an INT

versus a Char versus a string versus

something else and see if we can't put

more tools into your toolkit so to speak

so we can start quickly writing more

featureful more sophisticated programs

in C so it turns out that on most

systems nowadays though this can vary by

actual computer this is how large each

of the data types typically is in C when

you store a Boolean value a zero or one a

true or a false it actually uses

one byte that's actually a little

excessive because strictly speaking you

only need one bit which is one eighth of this

size but for Simplicity computers use a

whole byte to represent a bool true or


false a Char we saw last week is

actually only one byte or eight bits and

this is why ASCII which uses one byte or

technically only seven bits early on was

confined to only 256 maximally possible

characters notice that an INT is four

bytes or 32 bits a float is also

four bytes or 32 bits but the things

that we called long it's literally twice

as long eight bytes or 64 bits and so is

a double a double is 64 bits of

precision for floating Point values and

a string for today we're going to leave

as a question mark because we'll come

back to that later today and next

week as to how much space a string

takes up but suffice it to say it's

going to take up a variable amount of

space depending on whether the string is

short or long but we'll see exactly what

that means before long

so here's a photograph of a typical

piece of memory inside of your Mac or PC

or phone and odds are it might be just a

little smaller in some devices this is

known as Ram or random access memory and

each of these little black chips on this

circuit board the green thing these

little black chips are where zeros and

ones are actually stored each of those


stores some number of bytes maybe

megabytes maybe even gigabytes nowadays

so let's actually focus on just one of

those chips just to give us a sort of

zoomed in version thereof and let's

consider the fact that even though we

don't have to care exactly how this kind

of thing is made if this is like one

gigabyte of memory for the sake of

discussion it stands to reason that if

this thing is storing 1 billion bytes

one gigabyte then we can number them

kind of arbitrarily like maybe this will

be byte zero one two three four five six

seven eight and then maybe way down here

in the bottom right corner is byte

number one billion right we can just

number these things as might be our our

convention so let's actually draw that

graphically not with a billion squares

but fewer than those and let's just zoom

in further and consider that all right

at this point in the story let's

abstract away all the hardware and all

the little wires and just think of

memory as taking up a rather just think

of data as taking up some number of

bytes so for instance if you were to

store a Char in a computer's memory


which was one byte it might be stored

literally at this like top left-hand

location of this this black Chip of

memory if you were to store something

like an integer that uses four bytes

well it might use four of those bytes

but they're going to be contiguous back

to back to back in this case if it were

to store a long or a double you might

actually need eight bytes so I'm just

kind of filling in these squares to

represent how much memory a given

variable of some data type would take up

one or four or eight in this case here

well from here let's go ahead and just

abstract away from all of the hardware

and just really focus on memory as being

a grid or really like a canvas that we

can paint any types of data onto that we

want at the end of the day all of this

data is just going to be zeros and ones

but it's up to you and I to sort of

build abstractions on top of that things

like actual numbers and colors and

images and movies and Beyond but we'll

start lower level here first suppose I

had a program that needs three integers

like a simple program whose purpose in

life is to like average your three

scores on an exam or some such thing and


suppose that your three scores were

72 and 73 not too bad and 33 which is

particularly low let's go ahead and

write a program that actually does this

kind of averaging for us let me go back

to vs code here let me open up a file

called scores.c

and let me go ahead and implement this

as follows let me include standard io.h

at the top int main void as before and

then inside of main let me go ahead and

declare score one which is 72 give me

another score uh 73 and then a third

score called score three which is going

to be 33 and now I'm just going to use

printf to print out the average of those

things and I can do this in a few

different ways but I'm going to just

print out percent F and I'm going to do

score one plus score two plus score 3

divided by three

close parenthesis semicolon so just some

relatively simple arithmetic just to

compute the average of three scores if

I'm curious like what my average grade

is in the class with these three

assessments all right let me go ahead

now and do make scores

huh all right so I've somehow made an


error already but this one is actually

germane to a problem we

hopefully won't encounter too

frequently what's going on here so

underlined is score one plus score two

plus score three divided by three format

specifies type double but the argument

has Type int

well what's going on here because the

arithmetic seems to check out yeah

[Music]

correct and we'll come back to this in

more detail but indeed What's Happening

here is I'm adding three ints together

obviously because I defined them right up

here and I'm dividing by another int

three but the catch is recall that C when

it performs math treats all of these

things as integers but integers are not

floating Point values so if you actually

want to get a precise average for your

score without throwing away the

remainder everything after the decimal

point it turns out in this case we're

going to have to

convert this whole expression somehow to

a float and there's a few ways to do

this but the easiest way for now I'm

just going to go ahead and do this up


here I'm going to change the divide by 3

to divide by 3.0 because it turns out

long story short in C so long as one of

the values participating in an

arithmetic expression like this is

something like a float the rest will be

treated as promoted to so to speak a

floating point value as well so let me

now recompile this code with make

scores enter this time it worked okay

because I'm treating a float as a float

and let me dot slash scores enter all

right my average is

59.333333 and so forth all right so the

math presumably checks out floating

point imprecision per last week aside

but let's consider the design of this

program like what is kind of

bad about it or if we maintain this

program longer term are we going to

regret the design of this program

what might not be ideal here yeah

[Music]

yeah so in this case I have hard-coded

my three scores so if I'm hearing you

correctly uh this program is only ever

going to tell me this specific average

I'm not even using something like get

int or get float to get three different


scores so that's not good and suppose

that we wait later in the semester I

think other problems could arise yeah

[Music]

I can't reuse the number because I

haven't stored the average in some

variable which in this program not a big

deal but certainly if I wanted to reuse

it elsewhere that's a problem and let's

fast forward again a little later in the

semester I don't just have three test

scores or exam scores maybe I have four

or five or six where might this take us

[Music]

yeah I've sort of capped this program at

three and honestly this is just kind of

bordering on copy paste even though the

variables yes have different names score

one score two score three imagine doing

this for like a whole grade book for a

class having score four five six eleven

ten twelve twenty Thirty like that's a

lot of variables and you can imagine

just how ugly the code starts to get if

you're just defining variable after

variable after variable so it turns out

there are better ways in languages like

C if you actually want to have multiple

values stored in memory that happen to

be of the same data type and so let's


take a look back at this memory here to

see what these things might look like in

memory here's that grid of memory and

each of these recall represents a byte

so just to be clear if I store score one

in memory first how many bytes will it

take up

so four AKA 32 bits so I might draw

score one as filling up this part of the

memory it's really up to the computer as

to whether it goes here or down there or

wherever I'm just keeping the pictures

clean though for today from the top left

on down if I then declare another

variable called score two it might end

up over there also taking up four bytes

and then score three might end up here

and so that's just representing what's

going on inside of the computer's memory

but technically speaking to be clear per

week zero What's really being stored in

the computer's memory are patterns of

zeros and ones 32 total in this case

because 32 bits is four bytes but again

it sort of gets boring quickly to think

at think in and look at binary all the

time so we'll generally abstract this

away as just using decimal numbers in

this case instead but there might be a


better way to store not just three of

these things but maybe four maybe five

maybe ten maybe more by declaring one

variable to store all of them instead of

three or four or five or more individual

variables and the way to do this is by

way of something generally known as an

array an array is another type of data

that allows you to store

multiple values of the same type back to

back to back that is to say contiguously

so an array can let you create memory

for one int or two or three or even more

than that but describe them all using

the same variable name the same one name

so for instance if for one program I

only need three integers but I don't

want to sort of

uh messily declared them as score one

score two score three I can actually do

this instead and this is today's first

new piece of syntax the square brackets

that we're now seeing this line of code

here is similar to int score one

semicolon or int score one equals 72

semicolon this line of code is declaring

for me so to speak an array of size

three and that array is going to store

three integers why because the type of

that array
is an INT here the square brackets tell

the computer how many ints you want in

this case three and the name is of

course scores which in English I've just

deliberately pluralized now so that I

can describe this array as storing

multiple scores indeed so if I want to

now assign values to this variable

called scores I can do code like this I

can say scores bracket zero equals 72

scores bracket 1 equals 73 and scores

bracket 2 equals 33. the only thing

weird there is admittedly the square

brackets which are still new but also

notice we're zero indexing things to zero

index means to start counting at zero

and we've talked about that before our

for Loops have generally been zero

indexed arrays in C

are zero indexed and you do not have

Choice over that you can't just start

counting at one in arrays just because

you prefer to you'd be sacrificing one

of the elements you have to start in

arrays counting from zero

so out of context this doesn't

necessarily solve a problem but it

definitely is going to once we have more

than even three scores here in fact let


me go ahead and change this program a

little bit let me go back to vs code

here and let me go ahead and delete

these three lines here and let me

replace it with a scores variable that's

ready to store three total integers and

then let me go ahead and initialize them

as follows scores bracket zero is 72 as

before scores bracket one is going to be

73 scores bracket 2 is going to be 33.

notice I do not need to say int before

any of these lines because that's been

taken care of already for me on line

five where I already specified that

everything in this array is going to be

an INT

now down here this code needs to change

because I no longer have three variables

score one two and three I have one

variable but that I can index into

I'm going to here then do scores bracket

zero plus scores bracket one plus scores

bracket two which is equivalent to what

I did earlier giving me back those three

integers but notice I'm using the same

variable name every time and again I'm

using this new square bracket notation

to quote unquote index into the array to

get at the first int the second int and

the third and then to do it again down


here now this program's still not really

solving all the problems we described

like I still can only store three scores

but we'll come back to something like

that before long but for now we're just

introducing a new syntax and a new

feature whereby I can now store multiple

values in the same variable

well let's enhance this a bit more

instead of hard coding these scores as

was identified as a problem let's go

ahead and use get int to ask the user

for a score let's then use get int to

ask the user for another score let's use

get int to ask the user for a third

score storing them in those respective

locations and now if I go ahead and save

this program recompile scores

huh I've messed up here but now these

errors should be getting a little

familiar what mistake did I make

let me give folks a moment

cs50.h so that was not intentional so

still making mistakes all these years

later I need to include cs50.h now I'm

going to go back to the bottom in the

terminal window make scores okay we're

back in business dot slash scores now

the program's getting a little more


interesting so maybe this year was

better and I got a 100 a 99 and a 98 and

there my average is

99.000. so now it's a little more

Dynamic so it's a little more

interesting but it's still capping the

number of scores at three admittedly but

now I've kind of introduced another

sort of symptom of bad programming

there's this expression in programming

too called code smell where like

something smells a little off and

there's something off here in that I

could do better with this code here does

anyone see an opportunity to improve the

design of this code here if my goal

still is just to get three scores from

the user but without it like smelling

kind of bad yeah

[Music]

yeah exactly those lines of code are

almost identical and honestly the only

thing that's changing is the number and

it's just incrementing by one we have

all of the building blocks to do this

better so let me go ahead and improve

this let me go ahead and delete that

code let me go ahead now and have a for

Loop so for int i gets zero i less than

three I plus plus then inside of this


for loop I can distill all three of

those lines into something more generic

like scores bracket I equals get int and

now ask the user just once via get int

for a score so this is where arrays start

to get pretty powerful you don't have to

hard code that is literally type in all

of these magic numbers like zero one and

two you can start to do it

programmatically as you propose with a

loop so now I've kind of tightened

things up I'm now dynamically getting

three different scores but putting them

in three different locations and so this

program ultimately is going to work

pretty much the same make scores dot

slash scores and 100 99 98 and we're back

to the same answer but it's a little

better design too if I really want to

nitpick there's something that still

smells a little bit here the fact that I

have indeed this magic number three here

that really kind of has to be the same

as this number here otherwise who knows

what's going to go wrong so what might

be a solution per last week to kind of

cleaning that code up further too

okay so we could leave it up to the

user's discretion and so we could


actually do something like this let me

take this a few steps ahead let me say

something like int n gets get int how

many scores question mark then I could

actually change this to an n and then

this to an n and indeed make the whole

program Dynamic ask the human how many

tests have there been this semester then

you can type in each of those scores

because the loop is going to iterate

that many times and then you'll get the

average of one test two tests three well

lots of tests

or however many scores that were

actually specified by the user

Yeah question

[Music]

how how many bytes are used in an array

[Music]

uh so the purpose of an array is not to

save space it's to eliminate having

multiple variable names because that

just gets very messy quickly if you

literally have score one score two score

three dot dot score 99 that's literally

like 99 different variables potentially

that you could actually collapse into

one variable that has 99 locations if

you will at different indices or indexes

as someone would say the index for an


array is whatever's in the square

brackets

thank you

[Music]

so it's a good question so if you I'm

using ints for everything and honestly we

don't really need ints for scores

because I'm not really likely to get a 2

billion on a test anytime soon and so

you could actually use different data

types and that list we had on the screen

earlier is not all of them there's

actually a data type called short which

is literally shorter than an INT you

could actually technically use Char in

some form or even other data types as

well generally speaking in the year 2021

these tend to be overly optimized

uh decisions like everyone just uses ints

even though no one's going to get a test

score that's 2 billion or more because

int is just kind of the go-to years ago

memory was expensive and every one of

your instincts would have been spot on

because memory is so tight but nowadays

we don't worry as much about it yeah

[Music]

uh so what is the difference between

dividing two ints and not getting an


error as you might have encountered in a

program like cash versus dividing two

ints and getting an error like I did a

moment ago the problem with the scenario

I created a moment ago was printf was

involved and I was telling printf to use

a percent F but I was giving printf the

result of dividing integers by another

integer so it was printf that was

yelling at me and I'm guessing in the

scenario you're describing for something

like cash printf was not involved in

that particular line of code so that's

the difference there

all right so we now have this ability to

create uh an array and an array can

store multiple values what then might we

do that's more interesting than just

storing numbers in memory well let's

take this one step further as opposed to

just storing 72 73 33 or 100 99 98 at these

given locations because again an array

gives you one variable name but multiple

locations or indices therein bracket

zero bracket one bracket two on up if it

were even bigger than that let's now

start to consider something more modest

like simple chars chars being one byte

each so they're even smaller they take

up much less space and indeed if I


wanted to say a message like hi I could

use three variables if I wanted a

program to print hi H I exclamation

point literally I could of course store

those in three variables like C1 C2 C3

and let's just for the sake of

discussion let's go ahead and whip this

up real quickly let me create a new

program here now in vs code this time

I'm going to call it hi.c and I'm

not going to bother with the cs50

library here I just need the standard i

o one for now int main void and then

inside of main I'm going to Simply

create three variables and this is

already hopefully striking you as a bad

idea but we'll go down this road

temporarily with C1 and C2 and finally

C3 storing each character in the phrase

I want to print and I'm going to go

ahead now and print this in a different

way than usual now I'm doing with chars

and we've generally dealt with strings

which was easier certainly last week but

percent C percent C percent C well let

me print out three chars and like C1 C2

and C3 so kind of a stupid way of

printing out a string so we already have

a solution to this problem last week but


let's just poke around at what's

actually going on underneath the hood

here so let's make hi dot slash hi

and voila no surprise but we again could

have done this last week with a string

and just one variable or even zero at

that but let's go ahead now and start

converting these characters to their

apparent numeric equivalents like we

talked about in week zero too let me go

ahead and modify these percent C's just

to be fun to be percent I's and let me

just add some spaces so that there are

gaps between each of them let me now

recompile hi

and let me rerun it and just to guess

what should I see on the screen now

any guesses yeah

the ASCII values and it's intentional

that I keep using the same word hi

because it should be hopefully the old

friends 72 73 and 33 which is to say

that c knows about ASCII or equivalently

Unicode and can do this conversion for

us automatically and it seems to be

doing it implicitly for us so to speak

notice that C1 C2 and C3 are obviously

chars but printf is able to tolerate

printing them as integers if I really

wanted to be pedantic I could use this


technique again known as type casting

where I can actually convert one data

type to another if it makes logical

sense to do so and at the end of the day

we saw in week zero chars or characters

are just numbers like 72 73 and 33 so I

can actually use this parenthetical

expression to convert incorrectly three

chars to three integers instead so

that's what I meant to type the first

time there we go strike two today so

parenthesis int close parenthesis just

says take whatever variable comes after

this C1 or C2 or C3 and convert it to an

INT the effect is going to be no

different here make high and then

re-running whoops then running dot slash

hi still works the same but now I'm

explicitly converting chars to ints and

we can do this all day long chars to

ints floats to ints ints to floats sometimes

it's equivalent other times you're going

to lose information taking a float to an

int just intuitively is going to throw

away everything after the decimal point

because after all an int has no decimal

point but for now I'm going to go ahead

and Rewind to the version of this that

just did implicit type conversion or


implicit casting just to demonstrate

that we can indeed see the values

underneath the hood all right let me go

ahead and do this now the week one way

this was kind of stupid let's just do

printf quote unquote

actually let's do this string s equals

quote unquote hi and then let's go

ahead and do a simple printf with

percent s printing out s is there so now

I've rewound to last week where we began

this story but you'll notice that if we

keep playing around with this whoops uh

what did I do here oh and let me

introduce the cs50 library here more on

that before long let me go ahead and

recompile re-run this we seem to be

coding in circles here like I've just

done the same thing multiple different

ways but there's clearly an equivalence

then between sequences of chars and

strings and if you do it the real

pedantic way you have like three

different variables C1 C2 C3

representing Hi exclamation point or you

can just treat them all together like

this h i exclamation point but it turns

out that strings are actually

implemented by the computer in a pretty

now familiar way


what might a string actually be as of

this point in the story

where are we going with this let me try

to look farther back in way back yeah

yeah a string might be and indeed is

just an array of characters so last week

we just took for granted that strings

exist technically strings exist but

they're implemented as arrays of

characters which actually opens up some

interesting possibilities for us because

let me see let me see if I can do this

let me try to print out now three

integers again but if string s is but an

array as you propose maybe I can do s

bracket zero s bracket one and S bracket

two so maybe I can start poking around

inside of strings even though we didn't

do this last week so I can get at those

individual values so make hi dot slash

hi and voila there we go again it's

the same 72 73 33 but now I'm sort of

hopefully like wrapping my mind around

the fact that all right a string is just

an array of characters and arrays you can

index into them using this new square

bracket notation so I can get at any one

of these individual characters and heck

convert it to an integer like we did in


week zero as I might

but let me get a little curious now too

what might what else might be in the

computer's memory well let's toggle back

to the the depiction of these same

things here might be how we originally

implemented hi with three variables C1

C2 C3 of course that map to these

decimal digits or equivalently these

binary values but what was this looking

like in memory literally when you create

a string in memory like this string s

equals quote unquote hi let's consider

what's going on underneath the hood so

to speak well as an abstraction a string

it's h i exclamation point taking up it

would seem three bytes right I've gotten

rid of the bars there because if you

think of a string as a type I'm just

going to use one big box of size three

but technically

a string we've just revealed is an array

and the array is of size three so

technically if the string is called s s

bracket zero will give you the first

character s bracket one the second and S

bracket two the third

but let me ask this question now if this

at the end of the day is the only thing

in your computer memory and the ability


like a canvas to draw zeros and ones or

numbers or characters or whatever on it

but that's it like this is what your Mac

and PC and phone ultimately reduce to

suppose that I'm running a piece of

software like a text messenger and now I

write down bye exclamation point well

where might that go in memory well it

might go here b y e and then the next

thing I type I go here here here and so

forth my memory just might get filled up

over time with things that you or

someone else are typing

but then how does the computer know if

potentially b y e exclamation point is

right after h i exclamation point where

one string ends and the next one begins

right all we have are bytes or zeros and

ones so if you were designing this how

would you implement some kind of

delimiter between the two or figure out

what the length of a string is what do

you think

okay so the right answer is use a null

character and for those who don't know

what does that mean

yeah so it's a special character let me

describe it as a sentinel character

humans decided some time ago that you


know what if we want to delineate where

one string ends and where the next one

begins we just need some special symbol

and the symbol they'll use is generally

written as backslash zero this is just

shorthand notation for literally eight

zero bits zero zero zero zero zero zero

zero zero and the nickname for eight

zero bits in this context is null n-u-l

so to speak and we can actually see this

as follows if you look at the

corresponding decimal digits like you

could do by just doing out the math or

doing the conversion like we've done in

code you would see for storing hi 72

73 33 but then one extra byte that's

sort of invisibly there but that is all

zeros and now I've just written it as

the decimal number zero the implication

of this is that the computer is

apparently using not three bytes to

store a word like hi

but four bytes whatever the length of

the string is plus one for this special

Sentinel value that demarcates the end

of the string so we might draw it like

this instead and this character is again

sort of pronounced null or written n-u-l

so that's all right if humans at the end

of the day just have this canvas of


memory they just needed to decide all

right well how do we distinguish one

string from another because it's a lot

easier with chars individually it's a

lot easier with ints it's even easier

with floats why because per that chart

earlier every character is always one

byte every int is always four bytes

every long is always eight bytes how

long is a string well hi is one two

three with an exclamation point bye is

one two three four with an exclamation

point David is d-a-v-i-d five without an

exclamation point and so a string can be

any number of uh bytes long so you

somehow need to draw a line in the sand

to separate in memory one string from

another

so what's the implication of this well

let me go back to code here let's

actually poke around this is a bit

dangerous but I'm going to start looking

at memory locations past my string here

so let me go ahead and recompile uh make

hi

whoops what did I do here oh I forgot a

format code let me add one more percent

now let me go ahead and rerun make hi


dot slash hi enter there it is so you

can actually see in the computer

unbeknownst to you previously that

there's indeed something else going on

there and if I were to make like one

other variant of this program let's get

rid of just this one word and instead

let's have two so let me give myself

another string called t for instance

just as is common convention with bye

exclamation point let me then go ahead

and print out with percent s s and let

me also print out with percent s whoops

print F print out t as well let me just

recompile this program and obviously the

out this is what happens when I go too

fast

all right third mistake today close

quote as I was missing make hi

fourth mistake today make hi dot slash

hi okay voila now we have a program

that's printing both hi and bye only so

that we can consider what's going on in

the computer's memory if s is storing

hi and apparently one bonus byte that

demarcates the end of that string bye is

apparently going to fit into the

location directly after and it's

wrapping around but that's just an

artist's rendition here but bye


exclamation point is taking up one two

three four plus a fifth byte

as well

all right any questions on this

underlying representation of strings and

we'll contextualize this before long so

that this isn't just like okay who

really cares this is going to be the

source of actually implementing things

in fact for problem set two like

cryptography and encryption and actually

scrambling actual human messages but

some questions first

[Music]

a good question too and let me summarize

as if we were instead to use chars all

the time we would indeed have to know in

advance how many chars you want for a

given string that you're storing how

then does something like get string work

because when cs50 wrote the get string

function we obviously don't know how

long the words are going to be that you

all are typing in it turns out next uh

two weeks from now we'll see that get

string uses a technique known as dynamic

memory allocation and it's going to grow

or Shrink the array uh automatically for

you but more on that soon other


questions

about

that good question why are we using a

null value isn't it wasting a byte yes

but I claim there's really no other way

to distinguish the end of one string

from the start of another unless we make

some sort of Mark uh notation so to

speak in memory all we have at the end

of the day inside of a computer are bits

therefore all we can do is spend those

bits in some creative way to solve this

problem and so we're minimally going to

spend one byte to solve this problem

here yeah

[Music]

slash and if we don't have it or

if you don't how does a the computer

know to move to a next line when you

have a backslash and so backslash n even

though it looks like two characters it's

actually stored as just one byte in the

computer's memory there's a mapping

between it and an actual number and you

can see that for instance on the ASCII

chart from the other day

[Music]

it would be if I had put a backslash n

in my code here right after the

exclamation point here and here that


would actually shift everything in

memory because we would need to make

room for a backslash n here and another

one over here so it would take two more

bytes exactly other questions

hi exclamation point is

72 73 33 if we are to write those numbers

in the screen and

how will the computer know what's 72 and

what's a

and what's the last thing you said

it's context sensitive so if at the end

of the day all we're storing is these

numbers like 72 73 33 recall that it's

up to the program to decide based on

context how to interpret them and I

simplified this story in week zero

saying that photoshop interprets them as

RGB colors and iMessage or a text

messaging program interprets them as uh

letters and Excel interprets them as

numbers

how those programs do it is by way of

variables like string and int and float

and in fact later this semester we'll

see a data type via which you can

represent a color as a triple of numbers

a red value a green value and a blue

value so we'll see other data types as


well yeah

[Music]

really interesting question why could we

not just make all data types variable in

size in some languages some libraries do

exactly this C is an older language and

so because memory was expensive memory

was limited the reality was you gain

benefits from just standardizing the

size of these things you also get

performance increases in the sense that

if you know every int is four bytes you

can very quickly and we'll see this next

week jump from one integer to another to

another in memory just by adding four

inside of those square brackets you can

very quickly poke around whereas if you

had variable length numbers you would

have to kind of follow follow looking

for the end of it follow follow you

would have to look at more locations in

memory so that's a topic we'll come back

to but it was generally for efficiency

and other question yeah

[Music]

good question why not store the okay

same one why not store the null

character at the beginning uh you could

I uh let's see if why not store it at

the beginning
you could do that

um you could absolutely well could you

do this

[Music]

if you were to do that at the beginning

short answer no okay no I retract that

no I because I finally thought of a

problem with this if you store it at the

beginning instead we'll see in just a

moment how you can actually write code

to figure out where the end of a string

is and the problem there is you wouldn't

necessarily know if you eventually hit a

zero at the end of the string because

it's the number zero in the context of

like Excel using some memory or if it's

the context of some other data type

altogether so the fact that we've

standardized the fact that we've

standardized strings as ending with null

means that we can reliably distinguish

one variable from another in memory and

that's actually a perfect segue now from

actually using this primitive to

building up our own code that

manipulates these things at a lower

level so let me go ahead and do this let

me create a new file this time called

length and let's just use this basic


idea to figure out what the length of a

string is after uh it's been stored in a

variable here so let's go ahead and do

this let me include both the cs50

header and the standard i o header give

myself int main void again here and

inside of main let me go ahead and do

this let me prompt the user for a string

s and I'll ask them for a string like

their name here

and then let me go ahead and actually

let me name it more verbosely name this

time and now let me go ahead and do this

let me iterate over every character in

this string in order to figure out what

its length is so initially I'm going to

go ahead and say this int length equals

zero because I don't know what it is yet

so we're going to start at zero and then

while the following is true while

let me do I want to do this

let me change this to I just for clarity

let me go ahead and do this while name

bracket I does not equal that special

null character so I typed it on the

slide as nul but you don't write nul in

code you actually use its numeric

equivalent which is backslash zero in

single quotes while

name bracket I does not equal the null


character I'm going to go ahead and

increment I to I plus plus and then down

here I'm going to print out the value of

I to see what we actually get printing

out the value of I alright so what's

going to happen here let me go ahead and

run make length fortunately no errors

dot slash length and let me type in

something like Hi exclamation point

Enter

and I get three let me try by

exclamation point Enter and I get four

let me try my own name David enter 5 and

so forth so what's actually going on

here well it seems that by way of this

while loop we are specifying a local

variable called I initialized to zero

because we're figuring out the length of

the string as we go I'm then asking the

question does location zero that is I in

the name

string which we now know is an array

does it not equal backslash zero because

if it doesn't that means it's an actual

character like H or b or d so let's

increment I then let's come back around

to line nine and let's ask the question

again now I equals one so does name

bracket one not equal backslash zero


well if it doesn't and it won't if it's

an i or a y or an A based on what I

typed in we're going to increment I once

more fast forward to the end of the

story once I get to the end of the

string technically one space

past the end of the string name bracket

I will equal backslash zero so I don't

increment I anymore I end up just

printing the result so what we seem to

have here with some low level C code

just this while loop is a program that

figures out the length of a given string

that's been typed in let's practice our

abstraction and decompose this into

maybe a helper function here let me

actually grab all of this code here and

let me assume for the sake of discussion

for a moment that I can just call a

function now called string length and

the length of the string is name that I

want to get and then I'll go ahead and

print out just as before with percent I

the length of that string so now I'm

abstracting away this notion of figuring

out the length of the string that's an

opportunity for me to create my own

function if I want to create a function

called string length I'll claim that I

want to
take a string as input

and what should I have this function

return as its return type

what should string length presumably return

but yeah

an int right and it makes sense float

really wouldn't make sense because we're

measuring things that are uh integers in

this case the length of something so

indeed let's have it return an int I can

pretty much use the same code as before

so I'm just going to paste what I cut

earlier in the file and the only thing I

have to change here is the name of the

variable because now this function I

decided kind of arbitrarily that I'm

going to call it s just to be more

generic so I'm going to look at s

bracket I at each location and I don't

want to print it at the end this would

be a side effect what's the line of code

I should include here if I actually want

to hand back the total length yeah

say again

return I in this case so I'm going to go

ahead and return I not print it because

now my main function can use the return

value stored in length and print it on

the next line itself I just need a


prototype so that's my one forgivable

copy paste here I'm going to rerun make

length hopefully I didn't screw up I

didn't dot slash length I'll type in hot

oops I'll type in hi again

that works I'll type in bye again and so

forth all right so now we have a

function that determines the length of a

string well it turns out we didn't

actually need this all along it turns

out that we can get rid of my own custom

string length function here I can

definitely delete the whole

implementation down here because it

turns out in a file called string.h

which is a new header file today we

actually have access to a function

called more succinctly strlen

s-t-r-l-e-n which literally does that

this is a function that comes with C

albeit in the string.h header file and

it does pretty much what we just

implemented manually so here's an

example of admittedly a wheel we just

reinvented but no more we don't have to

do that and how do you know what kinds

of functions exist well let me actually

pop out of my browser here to a website

that is cs50's incarnation of what are

called manual pages it turns out that in


a lot of systems Macs and Unix and Linux

systems including the visual studio code

instance that we have in the cloud there

are publicly accessible manual pages for

functions they tend to be written very

expertly in a way that's not very

beginner friendly so what we have here

at manual.cs50.io is cs50's version of

manual pages that have this less

comfortable mode that give you a sort of

cheat sheet of very frequently used

helpful functions in C and we've

translated the sort of expert notation

to things that a beginner can understand

so for instance let me go ahead and

search for string up at the top here

you'll see that there's documentation

for our own get string function but more

interestingly down here there's a whole

bunch of string related functions that

we haven't even seen most of yet but

there's indeed one here called strlen

calculate the length of a string and so

if I actually go to strlen here I'll

see some less comfortable documentation

for this function and the way a manual

page typically works whether in cs50's

format or any other system is you see

typically a synopsis of what header


files you need to use the function so

you would copy paste these couple of

lines here you see what the Prototype is

of the function so that you know what

its inputs are if any and its outputs

are if any then down below you might see

a description which in this case is

pretty straightforward this function

calculates the length of s then you see

what the return value is if any and you

might even see an example like this one

that we've whipped up here so these

manual pages which are again accessible

here and we'll link to these in the

problem sets moving forward are

pretty much the place to start when you

want to figure out has a wheel been

invented already is there a function

that might help me solve some problem

set problem so that I don't have to

really get into the weeds of doing all

of those lower level steps as I've had

sometimes the answer is going to be yes

sometimes it's going to be no but again

the point of our having just done this

together is to feel that even the

functions you start taking for granted

they all reduce to some of these basic

building blocks at the end of the day

this is all that's inside of your


computer is zeros and ones we're just

learning now how to harness those and

how to manipulate them

ourselves

all right any questions here

on this

any questions at all

[Music]

good question is it so common that you

would have to specify it or not you do

need to include its header files because

that's where all of those prototypes are

you don't need to worry about linking it

in with Dash L anything and in fact

moving forward you do not ever need to

worry about linking in libraries when

compiling your code we the staff have

configured make to do all of that for

you automatically we want you to

understand that it is doing it but we'll

take care of all of the dash L's for you

but the onus is on you for the

prototypes and the header files other

questions

on these representations or techniques

yeah

[Music]

[Music]

oh good question if you you were to have


a string with actual spaces in it that

is multiple words what would the

computer actually do well for this let

me go to asciichart.com which is just a

random website that's my go-to for the

first 127 characters of ASCII um this is

in fact what we had a screenshot of the

other day and if you look here it's a

little non-obvious but SP is space if a

computer were to store a space it would

actually store the decimal number 32 or

technically the pattern of zeros and

ones that represent the number 32. all

of the US English keys that you might

type on a keyboard can be represented

with a number and using Unicode you can

express even things like emojis and

other languages yeah

[Music]

good question only strings are

accompanied by nulls at the end because

every other data type we've talked about

thus far is of well-defined finite

length one byte for Char four bytes for

ints and so forth if we think back

though to last week we did end the week

with a couple of problems integer

overflow because like four bytes heck

even eight bytes is sometimes not enough

we also talked about floating point and


precision thankfully in the world of

scientific Computing and financial

Computing there are libraries you can

use that draw inspiration from this idea

of a string and they might use nine

bytes for an integer value or maybe 20

bytes You Can Count really high but they

will then start to manage that memory

for you and what they're really probably

doing is just grabbing a whole bunch of

bytes and somehow remembering how long

the sequence of bytes is that's how

these higher level libraries work too

all right this has been a lot let's take

one more break here we'll do like a

seven minute break here and when we come

back we'll flesh out a few more details

all right

so we just saw strlen as an example of

a function that comes in the string

library and let's start to take more of

these Library functions out for a spin

so we're not relying only on the

built-ins that we saw last week let me

go ahead and switch over to vs code and

let me create a file called say string.c

just to kind of apply this lesson

learned as follows let me go ahead and

include cs50.h let me include standard


io.h and this new thing string.h as well

at the top I'm going to do the usual int

main void here and then in this program

suppose for the sake of discussion that

I didn't know about percent s for printf

or heck maybe early on there was no

percent s format code and so there was

no easy way to print strings well at

least if we know that strings are just

arrays of characters we could use

percent c as a workaround so to speak a

solution to that sort of contrived

problem so let me ask myself for a

string s by using get string here and

I'll ask the user for some inputs and

and then let me go ahead and print out

say output and all I want to do is print

back out what the user typed now the

simplest way to do this of course is

going to be like last week printf

percent s and plug in the S and we're

done but again for the sake of

discussion I forgot about or someone

didn't Implement percent s so how else

could we do this well in pseudo code or

in English like what's the gist of how

we could solve this problem printing out

the string s on the screen without using

percent s

how might we go about


solving this just in English high level

what would your pseudo code look like

yeah

[Music]

okay so just print each letter and maybe

more precisely like some kind of loop

like let's iterate over all of the

characters in s and print one at a time

so how can I do that well for int i equals

zero is kind of the go-to starting point

for most Loops I is less than okay how

long do I want to iterate well it's

going to depend on what I type in but

that's why we have strlen now so

iterate up to the length of s and then

increment I with plus plus on each

iteration and then let's just print out

percent C with no new line because I

want everything on the same line uh

whatever the character is at s bracket I

and then at the very end I'll give

myself that new line just to move the

cursor down to the next line so the

dollar sign is not in a weird place all

right so let's see if I didn't screw up

any of the code make string enter so

far so good dot slash string and let me type in

something like hi enter and I see output

of hi too let me do it once more with


bye enter and that works too note this I

very deliberately and quickly gave

myself two spaces here and one space

here just because I literally wanted

these things to line up properly and

input is shorter than output but that

was just a deliberate formatting detail

so this code is correct which is a claim

I've made before but it's not well

designed

now

it is well designed in that I'm

using someone else's Library function

like I've not reinvented a wheel there's

no line 15 or below I didn't Implement

string length myself

so I'm at least kind of practicing what

I've preached

but there's still an imperfection a

sub-optimality this one's really subtle

though

and you have to think about how loops

work

what am I doing that's not

super efficient yeah and back

yeah this is a little subtle but if you

think back to the basic definition of a

for Loop and recall when I highlighted

things last week what happens well the

first thing is that i gets set to zero


then we check the condition how do we

check the condition we call strlen on

S we get back an answer like three if

it's h i exclamation point and zero is

less than three so that's fine and then

we print out the character then we

increment I from zero to one we recheck

the condition how do I recheck the

condition I call strlen of s get back

the same answer three compare three

against one we're still good so we print

out another character I gets incremented

again I is now two we check the

condition what's the condition well

what's the string length of s it's

still three two is still less than

three so I keep asking the same question

sort of stupidly because the string is

presumably never changing in length and

indeed every time I check that condition

that function is going to get called and

every time the answer for hi is going

to be three three

so it's a marginal sort of

sub-optimality but I I could do better

right like don't ask multiple times

questions that you can remember the

answer to so how could I remember the

answer to this question and ask it just


once

how could I remember the answer to this

question

let me say yeah back there

so stored in a variable right that's

been our answer most any time we want to

keep something around so how could I do

this well I could do something like this

int maybe length equals strlen of s

then I can just change this function

call so to speak and let me re-fix my

spelling here let me fix this to be now

comparing against length and this is now

okay because now strlen is only called

once on line nine and I'm reusing the

value of that variable AKA length again

and again and again so that's more

efficient turns out that for Loops

actually let you declare multiple

variables at once so we can actually do

this a little more elegantly all in one

line and this is just now some syntactic

improvement I could actually do

something like this n equals strlen of s

and then I could just say n here or I

could call it length but heck while I'm

being succinct I'm just going to use n

for number so now it's just a marginal

change but I've now declared two


variables inside of my loop i and n i is

set to 0 and n is set to the string

length of s but now Hereafter all of my

condition checks are just I less than n

i less than n and n is never now

changing all right so a marginal

Improvement there now that I've used

this new function let's use some other

functions that might be of interest let

me go ahead and write a quick program

here that may be like upper capitalizes

the beginning of uh that changes to

uppercase some string that the user

types in so let me go ahead and code a

file called uppercase.c up here I'll use

my new friends cs50.h and standard i o

and

string.h so standard i o and string.h

so just as before int main void and then

inside of main what I'm going to do this

time is let's ask the user for a string

s using get string asking them for the

before value

and then let me go ahead and just print

out something like after uh

so that it uh just so I can see what the

uppercase version thereof is and then

after this let me go ahead and do the

following for int i equals zero oh let's


practice that same lesson so n equals

the string length of s i is less than n

i plus plus so really nothing new really

fundamentally yet

how do I now convert characters from

lowercase if they are to uppercase in

other words if I type in hi h i in

lowercase I want my program now to

uppercase everything to capital h

capital i

well how can I go about

doing this well you might recall that

there is this you might recall that

there is this ASCII chart so let's just

consult this real quick on the

asciichart.com we've looked at this last

week notice that a capital A is 65

capital B is 66 Capital C is 67 and heck

here's lowercase a lowercase b lowercase

c and that's 97 98 99 and if I actually

do some math there's like a distance of

32 right so if I want to go from

uppercase to lowercase I can do 65 plus

32 will give me 97 and that actually

works out across the board for

everything else 66 plus 32 gets me to 98

or lowercase b or conversely if you have

a lowercase a and its value is 97

subtract 32 and boom you have capital A

so all right there's some arithmetic


here involved but now that we know that

strings are just arrays and we know that

characters which are in those arrays are

just binary representations of numbers I

think we can manipulate a few of these

things as follows let me go back to my

program here and first ask the question

if the current character in the array

during this Loop is lowercase let's

force it to uppercase

so how am I going to do that if the

character at s bracket I the current

location in the array

is greater than or equal to lowercase A

and S bracket I is less than or equal to

lowercase z kind of a weird con Boolean

expression but completely legitimate

because in this array s is a whole bunch

of characters that the humans typed in

because that's what a string is greater

than or equal to a might be a little

nonsensical because when have you ever

compared numbers to letters but we know

from week zero lowercase a is 97

lowercase z is what is it one I don't

even remember

what's that

132 we know and so that would allow us

to answer the question is the current


letter lowercase all right so let me go

ahead here and answer that question if

it is what do I want to print out I

don't want to print out the letter

itself I want to print out the letter

minus 32 right because if it happens to

be a lowercase a 97 97 minus 32 gives me

65 which is uppercase a and I know that

just from having stared at that chart uh

in the past else if the character is not

between little a and little z I'm just

going to print out the character Itself

by printing s bracket I and at the very

end of this I'm going to go ahead and

print out a new line just to move the

cursor to the next line so again it's a

little wordy but this Loop here

which I borrowed from our code

previously just iterates over the string

AKA array character by character through

its length this line 11 here is just

asking the question if that current

character the ith character of s is

greater than or equal to little a and

less than or equal to little Z that is

between 97 and 132 then we're going to

go ahead and force it to

uh uppercase instead all right and let

me go ahead and zoom out here for just a

second
and sorry I misspoke, it's 122, which is what

you might have said there's only 26

letters so 122 is little Z let me go

ahead now and compile and run this

program so make uppercase

dot slash uppercase and let me type in

hi in lowercase, enter, and there's the

capitalized version thereof let me do it

again with like my own name in

lowercase and now it's capitalized as

well well what could we do to improve

this well you know what let's stop

Reinventing Wheels let's go to the

manual pages so let me go here and

search for something like uh I don't

know lowercase and there I go I did some

autocomplete here our little search box

is saying that okay there's an islower

function check whether a character is

lowercase well how do I use this well

let me check islower now I see the

actual man page for this function now we

see include ctype.h so that's the

header file I need to

include this is the prototype for

islower it apparently takes a char as

input and returns an int which is a

little weird I feel like is lower should


return true or false so let's scroll

down to the description and return value

it returns oh this is interesting and

this is a convention in C this function

returns a non-zero int if C is a

lowercase letter and zero if C is not a

lowercase letter so it returns non-zero

so like one negative one something

that's not zero if C is a lowercase

letter and zero if it is not a lowercase

letter so how can we use this building

block let me go back to my code here

let me add this file include ctype.h

and down here let me get rid of this

cryptic expression which was kind of you

know painful to come up with and just

ask this islower s bracket i

uh

that should actually work

but why well islower again returns a

non-zero value if the letter is

lowercase well what does that mean that

means it could return one it could

return negative one it could return 50

or negative 50. it's actually not

precisely defined why just because like

this was a common convention to use zero

to represent false and use any other

value to represent true and so it turns

out that inside of Boolean Expressions


if you put a value like a function call

like this that returns zero that's going

to be equivalent to false it's like the

answer being no it is not lower but you

can also just in parentheses put the

name of the function and its argument

and not compare it against anything

because we could do something like this

well if it's not equal to zero then it

must be lowercase because that's the

definition if it returns a non-zero

value it's lowercase but a more succinct

way to do that is just a bit more like

English if it's islower then print out

the character minus 32 so this would be

the common way of using one of these is

functions to check if the answer is true

or false

okay well we might be done okay

[Music]

no so it's not necessarily one it would

be incorrect to check for one or

negative one or anything else you want

to check for the opposite of zero so not

equal zero or more succinctly like I did

by just putting it into parentheses

let me see what happens here

so this is great but some of you might

have spotted a better solution to this


problem a moment ago when we were on the

manual pages searching for things

related to lowercase what might be

another building block we can employ

here

based on what's on the screen here yeah

so toupper there's a function that

would literally do the upper casing for

me so I don't have to get into the weeds

of like negative 32 plus 32 I don't have

to consult that chart someone has solved

this problem for me in the past

and let's see if I can actually get back

to it

there we go let me go ahead now and use

this so instead of doing s bracket I

minus 32 let's use a function that

someone else wrote and just say

toupper s bracket i and now it's going to

do the solution for me so if I

rerun make uppercase and then do

slowly dot slash uppercase type in hi now

it's working as expected and honestly if

I read the documentation for toupper

by actually going back to its man page

or manual page what you'll see is that

it says if it's lowercase it will return

the uppercase version thereof if it's

not lowercase it's already uppercase

it's punctuation it will just return the


original character which means thanks to

this function I can actually tighten

this up significantly get rid of all of

my conditional there and just print out

the toupper return value and leave it

to whoever wrote that function to figure

out if something's uppercase or

lowercase

all right questions on

these kinds of Tricks again it all

reduces to like week zero Basics but

we're just building these abstractions

on top yeah

[Music]

yes unfortunately no there is no easy

way in C to say give me everything that

was for historically uh performance

reasons they want you to be explicit as

to what you want to include in other

languages like python Java one of which

we'll see later this term you can say

give me everything but that actually

tends not to be best practice because it can

actually slow down execution or

compilation of your code yeah

[Music]

ah does toupper accommodate special

characters like punctuation yes if I

read the documentation more pedantically


we would see exactly that it will

properly hand me back an exclamation

point even if I passed it in so if I do

make uppercase here and let me do dot

slash upper sorry a DOT slash uppercase

hi with an exclamation point it's

going to handle that too and just pass

it through unchanged yeah

[Music]

[inaudible audience question]

really good question too

no we do not have access to a function

that at least comes with C or comes with

cs50s library that will just force the

whole thing to uppercase in C that's

actually easier said than done in Python

it's trivial so stay tuned for another

language that will let us do

exactly that

all right so what does this leave us

with there's just a let's come full

circle now to where we began today where

we were talking about those command line

arguments recall that we talked about RM

taking a command line argument the word

the file you want to delete we talked

about clang taking command line

arguments that again modify the behavior

of the program how is it that maybe you


and I can start to write programs that

actually take command line arguments

well here is where I can finally explain

why we've been typing int main void for

the past week and just asking that you

take on faith that it's just the way you

do things well by default in C at least

the most recent

versions thereof there's only two

official ways to write main functions

you might see other formats online but

they're generally not consistent with

the current specification this again was

sort of the boilerplate for the simplest

function we might write last week and

recall that we've been doing this the

whole time void what that void means for

all the programs I have written thus far

and you have written thus far is that

none of our programs that we've written

take command line arguments that's what

the void there means it turns out that

main is the way you can specify that

your program does in fact take command

line arguments that is words after the

command in your terminal window if you

want to actually not use get int or get

string you want the human to be able to

say something like hello David and hit


enter and just run hello print hello

David on the screen you can use command

line arguments words after the program

name on your command line so we're going

to change this in a moment to be

something more verbose but something

that's now a bit more familiar

syntactically if you change that void in

main to be this incantation instead int

argc comma string argv open bracket

close bracket you are now giving

yourself access to writing programs that

take command line arguments argc which

stands for argument count is going to be

an integer that stores how many words

the human typed at the prompt C

automatically gives that to you

string argv stands for argument vector

that's going to be an array of all of

the words that the human typed at the

prompt so with today's building block of

an array we have the ability now to let

the humans type as many words or as few

words as they want at the prompt C is

going to automatically put them in an

array called argv and it's going to

tell us how many words there are in an

int called argc

the int as the return type here we'll

come back to in just a moment let's


actually use now this definition to make

maybe just a couple of simple programs

but in problem set 2 we will actually

use this to control the behavior of your

own code let me go ahead and code up a

file called

argv.0 just to keep its uh aptly named

let me go ahead and include cs50.h let

me go ahead and include whoops that is

not the right name of a program let's

start that over

let's go ahead and code up argv.c

and here we have uh include cs50.h

include stdio.h int main not void

let's actually say int argc

string argv open bracket close bracket

no numbers in between because you don't

know in advance how many words the

human's going to type at their prompt

now let's go ahead and do this let's

write a very simple program that just

says hello David hello Carter whoever

the name is that gets typed but not

using get string let's instead have the

human just type their name at the prompt

just like RM just like clang just like

make so it's just one and done when you

hit enter no additional prompts let me

go ahead then and do this printf quote


unquote hello comma and instead of world

today I want to print out whatever the

human typed in so let's go ahead and do

this argv bracket zero for now

but I don't think this is quite what I

want

because of course that's going to

literally print out argv bracket zero

bracket then I need a placeholder so let

me put a percent s here and then put

that here

so if argv is an array but it's an

array of strings then argv bracket zero

is itself a single string and so it can

be plugged into that percent s

placeholder let me go ahead and save my

program

and let me go ahead and compile argv so

far so good let me go ahead now and type

in my name after the name of the program

so no get string I'm literally typing an

extra word my own name at the prompt

enter

okay it's it's apparently a little buggy

in a couple of ways I forgot my

backslash n but that's not a huge deal

but apparently inside of argv is

literally everything the humans typed in

including the name of the program so

logically how do I print out hello David


or hello so and so and not the actual

name of the program what needs to change

here yeah

yeah so presumably index to one if

that's the second thing I or who

whichever human has typed at the prompt

so let's do make argv again dot slash

argv enter

huh hello null so this is another form

of null but this is user error now on my

part

I didn't do exactly what I said I would

yeah

yeah I forgot the parameter so that's

actually I should probably deal with

that somehow so that people aren't sort

of breaking my program and printing out

random things like null but if I do say

argv David now you see hello David I

can get a little curious like what's at

location two well we can see make argv

dot slash argv David enter all

right so just nothing is there but it

turns out in a couple weeks we'll start

really poking around memory and see if

we can't crash programs deliberately

because nothing is technically stopping

me from saying oh what's that location 2

million for instance we could really


start to get curious but for now we'll

do the right thing but let's now make

sure the human has typed in the right

number of words so let's say this if argc

equals 2 that is the name of the

program and one more word after that go

ahead and trust that argv bracket 1 as you

proposed is the person's name else let's

go ahead and just default here to

something simple and basic like well if

we don't get a name from the user just

say hello world like always

so now we're sort of programming

defensively this time the human even if

they screw up they don't give us a name

or they give us too many names we're

just going to say hello world because I

now have some error handling here

because again argc is argument count

the number of words total typed at the

command line so make argv

dot slash argv and let me make the same

mistake as before okay I don't get this

weird null Behavior I get something well

defined I could now do David I could do

David Malan but that's not currently

supported I would need to alter my logic

to support more than just two words

after the prompt so what's the point of

this at the moment it's just a simple


exercise in letting users supply input

when they run the program

consider it's just more convenient in

this new command line interface world if

you had to use get string every time you

compile your code it'd be kind of

annoying right you type make then you

might get a prompt what would you like

to make then you type in hello or cash

or something else then you hit enter it

just really slows the process but in

this command line interface world if you

support command line arguments then you

can use these little tricks like

scrolling up and down in your history

with your arrow keys you can just type

commands more quickly because you can do

it all at once and you don't

have to keep prompting the user more

pedantically for more and more info

so any questions then on command line

arguments which finally reveals why we

had void initially but what more we can

now put in main that's how you take

command line arguments yeah

[Music]

yes if you were to type at the command

line something like not a word but

something like the number 42 that would


actually be treated as a string why

because again context matters so if your

program is currently manipulating memory

as though it's characters or strings

whatever those patterns of zeros and

ones are they will be interpreted as

ASCII text or Unicode text if we

therefore go to the Chart here that

might make you wonder well then how do

you distinguish numbers from letters in

the context of something like chars and

strings well notice 65 is capital A, 97 is lowercase a, but

also 49 is one and 50 is 2. so the

designers of ASCII and then later

Unicode realized well wait a minute if

we want to support programs that let you

type things that look like numbers even

though they're not technically ints or

floats we need a way in ASCII and

unicode to represent even numbers so

here are your numbers and it's a little

silly that we have numbers representing

other numbers but again if you're in the

world of letters and characters you

gotta come up with a mapping for

everything and notice here here's the

dot even if you were to represent 1.23

as a string or as characters even the

dot now is going to be represented as an

ASCII character so again context here


matters all right one final example to

tease apart what this int is and what

it's been doing here for so long so I'm

going to go ahead and add one bit of

logic here to a new file that I'm going

to call exit dot C so in exit.c we're

going to introduce something that's

generally known as an exit status it

turns out this is not a feature we've

used yet but it's just useful to know

about especially when automating tests

of your own code when it comes to

figuring out if a program succeeded or

failed it turns out that main has one

more feature we haven't leveraged an

ability to signal to the user whether

something was successful or not and

that's by way of main's return value so

I'm going to go ahead and now modify

this program as follows like this

suppose I want to write a similar

program that requires that the user type

a word at the prompt so that argc has to

be 2 for whatever design purpose if argc

does not equal 2 I want to quit out of

my program prematurely because I want to

just insist that the user operate the

program correctly so I might give them

an error message like missing command


line argument backslash n but now I want

to quit out of the program now how can I

do that the right way quote unquote to

do that is to return a value from Main

now it's a little weird because no one

called main yet right main just gets

called automatically but the convention

is anytime something goes wrong in a

program you should return a non-zero

value from main one is fine as a go-to

we don't need to get into the weeds of

having many different exit statuses so

to speak but if you return one that is a

clue to the system the Mac the PC the

cloud device that something went wrong

why because one is not zero

if everything works fine like let's go

ahead and print out hello comma percent

s like before quote unquote argv

bracket one

so this is just a version of the program

without an else so this is the same as

doing essentially an else here like I

did earlier I want to signal to the

computer that all is well and so I

return zero but strictly speaking if I'm

already returning here I don't

technically need if I really want to be

nitpicky I don't technically need the

else because the only way I'm going to


get to line 11 is if I didn't already

return so what's going on here the only

new thing here logically is that for the

first time ever I'm returning a value

from Main that's something I could

always have done because main has

always been defined by us as taking an

INT as a return value

by default main automatically sort of

secretly returns zero for you if you've

never once used the return keyword which

you probably haven't in main it just

automatically returns zero and the

system assumes that all went well but

now that we're starting to get a little

more sophisticated with our code and you

know as the programmer something went wrong

you can abort programs early you can

exit out of them by returning some other

value besides zero from main and this is

sort of fortuitous that it's an INT

right zero means everything worked

unfortunately in programming there are

seemingly an infinite number of things

that can go wrong and ints give you

four billion possible codes that you can

use AKA exit statuses to signify errors

so if you've ever on your Mac or PC

gotten some weird pop-up that an error


happened sometimes there's a cryptic

number in it maybe it's positive maybe

it's negative it might say error code

123 or negative 49 or something like

that what you're generally seeing are

these exit statuses these return values

for Main in a program that someone at

Microsoft or apple or somewhere else

wrote something went wrong they are sort

of unnecessarily showing you the user

what the error code is if only so that

when you call customer support or submit

a ticket you can tell them what exit

status you encountered what error code

you encounter

all right any questions then on exit

statuses which is the last of our new

building blocks

for now

any questions at all yeah

[Music]

you do things again and again at the

command line like you could with get

string and get int which by default

recall are automatically designed to

keep prompting the user in their own

loop until they give you an int or a

float or the like with command line

arguments no you're going to get an


error message but then you're going to

be returned to your prompt and it's up

to you to type it correctly the next

time

good question yeah

[Music]

if you do not return a value explicitly

main will automatically return zero for

you like that is the way C simply works

so it's not strictly necessary but now

that we're starting to return values

explicitly if something goes wrong it

would be good practice to also start

returning a value for main when

something goes right and there are no

errors in fact

so let's now get out of the weeds and

contextualize this for some actual

problems that we'll be solving in the

coming days by way of problem set two

and beyond

so here for instance is a problem that

you might think back to when you were a

kid: the readability of some text or

some book the grade level in which some

book is written if you're a young

student you might read at a first grade

level or third grade level in the U.S or

if you're in college presumably you're


reading at a university level of text

but what does it mean for text like in a

book or in an essay or something like

that to correspond to some kind of grade

level well here's a quote, a title of a

childhood book One Fish Two Fish Red Fish

Blue Fish

what might the grade level be for a book

that has words like this maybe when you

were a kid or if you have a sibling

still reading these things what might

the grade level of this thing be

[Music]

any guesses yeah

sorry again

before grade one is in fact correct so

that's for really young kids and and why

is that well let's consider these are

actually pretty simple phrases Right One

Fish Two Fish Red I mean there's not

even verbs in these sentences they're

just uh nouns and adjectives and very

short sentences and so that might be a

heuristic we could use when analyzing

text well if the words are kind of short

the sentences are kind of short

everything's very simple that's probably

a very young or early grade level and so

by one formulation it might indeed be

even before grade one for someone quite


young how about this Mr. and Mrs. Dursley

of number four, Privet Drive, were proud

to say that they were perfectly normal

thank you very much they were the last

people you would expect to be involved

in anything strange or mysterious

because they just didn't hold with such

nonsense and onward all right what grade

level is this book at

okay I heard third

seventh fifth okay all over the place

but grade seven according to one

particular measure and

we can debate exactly what age you

were when you read this and maybe you're

feeling ahead of your time or behind now

but here

we have a snippet of text what makes

this text assume an older audience a

more mature audience a higher grade

level would you think

yeah it's longer different types of

words there's commas now and phrases and

so forth so there's just some kind of

sophistication to this so it turns out

for the upcoming problem set among the

things you'll do is take as input texts

like this and analyze them considering

well how many words are in the text how


many sentences are in the text how many

letters are in the text and use those

according to a well-defined formula to

prescribe what exactly the grade level

of some actual text

might actually be well what else are we

going to do in the coming days well I've

alluded to this notion of cryptography

in the past this notion of scrambling

information in such a way that you can

hide the contents of a message from

someone who might otherwise intercept it

right the earliest form of this might

also be when you're younger and you're

in class and you're passing a note from

one person to another from yourself to

someone else you don't want to just

necessarily write a note in English or

some other written language you might

want to scramble it somehow or encrypt

it maybe you change the A's to a b and

the B's to a c so that if the teacher

snaps it up and intercepts it they can't

actually understand what it is you've

written because it's encrypted now so

long as your friend the recipient of

this note knows how you manipulated it

how you added or subtracted sort of

letters to each other they can decrypt

it which is to say reverse that process


so formally in the world of cryptography

and computer science this is just

another problem to solve your input

though when you have a message you want

to send securely is what's generally

known as plain text there's some

algorithm that's going to then encipher

or encrypt that information into what's

called ciphertext which is The Scrambled

version that theoretically can get

safely intercepted and your message has

not been spoiled unless that interceptor

actually knows what algorithm you used

inside of this process so that that

would be generally known as a cipher the

ciphers typically take though not one

input but two if for instance your

Cipher is as simple as a becomes b b

becomes c c becomes D dot dot Z becomes

a you're essentially adding one to

every letter and encrypting it now that

would be what we call the key you and

the recipient both have to agree

presumably before class in advance what

number you're going to use that day to

rotate or change all of these letters by

because when you add one they upon

receiving your ciphertext have to

subtract one to get back the answer so


for instance if the input plain text is

hi! as before and the key is one the

cipher text using this simple rotational

algorithm otherwise known as the Caesar

Cipher might be IJ exclamation point so

it's similar but it's at least scrambled

at first glance and unless the teacher

really cares to figure out what

algorithm are they using today or what

key are they using today it's probably

sufficiently secure for your purposes

how do you reverse the process well

your friend gets this and reverses it by

negative one so I becomes h j becomes I

and things like punctuation remain

untouched at least in this scheme so

let's consider one final example here if

the input

to the algorithm is uijt xbt dt50 and

the key this time is negative one such

that now B should become a and C should

become b and a should become Z so we're

going in the other direction how might

we analyze this well if we spread all

the letters out and we start from left

to right and we start subtracting one

letter U becomes t i becomes h j becomes

i t becomes s x becomes w a was d t this

was cs50 we'll see you next time

[Music]
thank you

[Music]

[Music]

this is cs50 and this is already week

three and even as we've gotten much more

into the minutia of programming and some

of the C stuff that we've been doing

is all the more cryptic looking recall

that at the end of the day like

everything we've been doing ultimately

fits into this model so keep that in

mind particularly as things seem like

they're getting more complicated more

sophisticated it's just a process of

learning a new language that ultimately

lets us express this process and of

course last week we really went into the

weeds of like how inputs and outputs are

represented and this thing here a

photograph thereof is called what

this is what

Ram I heard random access memory or just

generally known as memory and recall

that we looked at one of these little

black chips that contains all of

the bytes all of the bits ultimately

it's just kind of a grid sort of an


artist's grid that allows us to think

about every one of these memory

locations is just having a number or an

address so to speak like this might be

byte number zero and then one and then

two and then maybe way down here again

something like two billion if you have

two gigabytes of memory and so as we did

that we started to explore how we could

use this canvas to create kind of our

own information our own inputs and

outputs not just the basics like ints

and floats and so forth but we also

talked about strings and what is a

string as you now know it how would you

describe in layperson's terms a string

yeah over there

an array of characters and an array

meanwhile let's go there how might

someone else Define an array in more

familiar now terms what would be an

array

[Music]

an indexed set of things not bad and I

think a key characteristic to keep in

mind with an array is that it does

actually pertain to memory and it's

contiguous memory byte after byte after

byte is what constitutes an array and

we'll see in a couple of weeks time that


there's actually more interesting ways

to use this same primitive canvas to

stitch together things that are sort of

two-dimensional even that have some kind

of shape to them but for now all we've

talked about is arrays and just using

these things from left to right top to

bottom contiguously to represent

information so today we'll consider

still an array but we won't focus so

much on representation of strings or

other data types we'll actually now

focus on the other part of that process

of inputs becoming outputs namely the

thing in the middle uh algorithms but we

have to keep in mind even though every

time we've looked at an array thus far

certainly on the board like this you as

a human certainly have the luxury of

just kind of eyeballing the whole thing

with the bird's eye view and seeing

where all of those numbers are if I ask

you where a particular number is like

zero odds are your eyes would go right

to where it is and boom problem solved

in sort of one step but the catch is

with a computer that has this memory

even though you the human can kind of see

everything at once a computer cannot


it's better to think of your computer's

memory your phone's memory or more

specifically an array of memory like

this as really being a set of closed

doors not unlike Lockers in a school and

only by opening each of those doors can

the computer actually see what's in

there which is to say that the computer

unlike you doesn't have this bird's eye

view of all of the data in all these

locations it has to much more

methodically look here maybe look here

maybe look here and so forth in order to

find something now fortunately we

already have some building blocks Loops

conditions Boolean expressions and the

like where you could imagine writing

some code that very methodically goes

from left to right or right to left or

something more sophisticated that

actually finds something you're looking

for and just remember that the

convention we've had since last week now

is that these arrays are zero indexed so

to speak to be zero indexed just means

that the data type starts counting from

zero so this is location zero one two

three four five six and notice even

though there are seven total doors here

the rightmost one of course is called


six just because we've started counting

at zero so in the general case if you

had n doors or n bytes of memory zero

would always be at the left and N minus

one would always be at the right that's

sort of a generalization of just

thinking about this kind of convention

all right so let's revisit the problem

that we started the whole term off with

in week one week zero which was this

notion of searching and what does it

mean to search for something well to

find information and this of course is

omnipresent anytime you take out your

phone you're searching for a friend's

contact anytime you pull up a browser

you're Googling for this or that so

search is kind of one of the most

omnipresent topics and features of any

device these days so let's consider how

the Googles the apples the microsofts of

the world are implementing something as

seemingly familiar as this so here might

be the problem statement we want some

input to become some output what's that

input going to be maybe it's a bunch of

closed doors like this out of which we

want to get back an answer true or

false is something we're looking for


there or not you could imagine taking

this one step further and trying to find

where is the thing you're looking for

but for now let's just take one bite out

of the problem can we tell ourselves

true or false is some number behind one

of these doors or Lockers in memory but

before we go there and start talking

about ways to do that that is algorithms

let's consider how we might lay the

foundation of like comparing whether one

algorithm is better than another we

talked about correctness and it sort of

goes without saying that any code you

write any algorithm you implement had

better be correct otherwise what's the

point if it doesn't give you the right

answers but we also talked about design

and in your own words like what do we

mean when we say a program is better

designed at this stage than another

how do you think about this notion of

design now yeah in the middle

okay so easier to understand I like that

other thoughts yeah

efficiency and what do you mean by

efficiency precisely

[Music]

nice it doesn't use up too much memory

and it isn't redundant so you can think


about design along a few of these axes

sort of the quality of the code but also

the quality of the performance and as

our programs get bigger and more

sophisticated and uh more and just

longer those kinds of things are really

going to matter and in the real world if

you start writing code not just by

yourself but with someone else getting

the design right is just going to make

it easier to collaborate and ultimately

produce right code with just higher

probability so let's consider how we

might focus on exactly the second

characteristic the efficiency of an

algorithm and the way we might talk

about the efficiency of algorithms just

how fast or how slow they are is in

terms of their running time that is to

say when they're running how much time

do they take and we might measure this

in seconds or milliseconds or minutes or

just some number of steps in the general

case because presumably fewer steps to

your point is better than more steps so

how might we think about running times

well there's one General notation we

should Define today so computer

scientists tend to describe the running


time of an algorithm or a piece of code

for that matter in terms of what's

called Big O notation this is literally

a capitalized O A Big O and this

generally means that the running time of

some algorithm is on the order of such

and such where such and such we'll see

is just going to be a very simple

mathematical formula it's kind of a way

of waving your hands mathematically to

convey the idea of just how fast or how

slow some algorithm or code is without

getting into the weeds of like it took

this many milliseconds or this many

specific number of steps so you might

recall then from week zero I even

introduced this picture but without much

context at the time we just used this to

compare those phone book algorithms

recall that this red straight line was

the first algorithm one page at a time

the yellow line that's still straight

different how if you recall

that line represented What alternative

algorithm

[Music]

looking out and back what is that second

algorithm yeah over there

two pages at a time which was almost

correct so long as we potentially double


back a page if maybe we go a little too

far in the phone book so it had a

potential bug but arguably solvable this

last algorithm though was the so-called

divide and conquer strategy where I sort

of unnecessarily tore the phone book in

half and then in half and then in half

which dramatic as that was unnecessarily

it actually took significantly bigger

bites out of the problem like 500 pages

the first time another 250 another 125

versus just one or two bites at a time

and so we described its running time as

this picture there though I didn't use

that expression at the time running

times but indeed time to solve might be

measured just abstractly in some unit of

measure seconds milliseconds minutes

Pages via this y-axis here so let's now

slap some numbers on this if we had n

pages in that phone book and just

representing a generic number the first

algorithm here we might describe as

taking n steps second algorithm we

might describe as taking n divided by

two steps maybe give or take one if we

have to double back but generally n

divided by two and then this thing if

you remember your logarithms was sort of


a fundamentally different formula log

base 2 of n or just log of n for short

so this is sort of a fundamentally

different formula but what's noteworthy

is that these first two algorithms even

though yes the second algorithm was

hands down faster I mean literally twice

as fast when you start to zoom out and

if I increase my y-axis and x-axis these

first two whoops

these first two start to look awfully

similar to one another and if we keep

zooming out zooming out zooming out as n

gets really large that is the x-axis

gets really long these first two

algorithms start to become essentially

the same and so this is where computer

scientists use Big O notation instead of

saying specifically this algorithm takes

n steps and this one n divided by two a

computer scientist would say ah each of

those algorithms takes on the order of n

steps or on the order of n over two but

you know what on the order of n over 2

is pretty much the same when n gets

really large as being equivalent to Big

O of n itself so yes in practice it's

obviously fewer steps to move twice as

fast but in the big picture when n

becomes a million a billion the numbers


are already so darn big at that point

that these are as the shapes of these

curves imply pretty much functionally

equivalent but this one still looks

better and better as n gets large

because it's rising so much less quickly

and so here a computer scientist would

say that that third algorithm was on the

order of that is Big O of log n and you

don't have to bother with the base

because it's a smaller mathematical

detail that is also just in some sense a

constant multiplicative Factor so in

short what are the takeaways here this

is just a new vocabulary that we'll

start to use when we just want to

describe the running time of an

algorithm to make this more real if any

of you have implemented a for loop at

this point in any of your code and that

for Loop iterated n times where maybe n

was the height of your pyramid or maybe

n was something else that you wanted to

do n times you wrote code or you

implemented an algorithm that operated

in Big O of n time if you will so this

is just a way now to retroactively start

describing with somewhat mathematical

notation what we've been doing in


practice for a while now so here's a

list of commonly

seen running times in the real world

this is not a thorough list because you

could come up with an infinite number of

mathematical formulas certainly but the

common ones we'll discuss and you will

see in your own code probably reduced to

this list here and if you were to study

more computer science theory this list

would get longer and longer but for now

these are sort of the most familiar ones

that we'll soon see all right two other

pieces of vocabulary if you will before

we start to use this stuff so this a big

Omega Capital omega symbol is used now

to describe a lower bound on the running

time of an algorithm so to be clear Big

O is on the order of that is an upper

bound on how many steps an algorithm

might take on the order of so many steps

if you want to talk though from the

other perspective well how few steps

might my algorithm take maybe in the

so-called best case it'd be nice if we

had a notation to just describe what a

lower bound is because some algorithms

might be super fast in these so-called

best cases so the symbology is almost

the same but we replace the Big O with


the big Omega so to be clear Big O

describes an upper bound and Omega

describes a lower bound and we'll see

examples of this before long and then

lastly last one here big Theta is used

by a computer scientist when you have a

case where both the upper bound on an

algorithm's running time is the same as

the lower bound you can then describe it

in one breath as being in Theta of such

and such instead of saying it's in Big O

and in Omega of something else

all right so out of context sort of just

sort of um seemingly cryptic symbols but

all they refer to is upper bound lower

bounds or when they happen to be one and

the same and we'll now introduce over

time examples of how we might actually

apply these to concrete problems but

first let me pause

to see if there's any questions

any questions here

any questions

I see pointing somewhere uh where are

you pointing to

over here there we go okay sorry very

bright

[Music]

smaller n functions move faster so yes


if you have something like n that takes

only n steps if you have a formula like

N squared just by nature of the math

that would take more steps and therefore

be slower so the larger the mathematical

expression the slower your algorithm is

because the more time or more steps that

it takes

you want your n function so to speak

to be small yes and in fact the Holy

Grail so to speak would be this last one

here either in Big O notation or even

Theta when an algorithm is on the order

of a single step that means it literally

takes constant time one step or maybe 10

steps 100 steps but a fixed constant

number of steps that's the best because

even as the phone book gets bigger even

as the data set you're searching gets

larger and larger if something only

takes a finite number of steps

constantly then it doesn't matter how

big the data set actually gets

questions as well on these notations yep

thank you for the pointing this is

actually very helpful I'm seeing

pointing this way

[Music]

what is the input to each of these

functions it is an expression of how


many steps an algorithm takes so in fact

let me go ahead and make this more

concrete with an actual example here if

we could so on stage here we have seven

lockers which represent if you will an

array of memory and this array of memory

is maybe storing seven integers seven

integers that we might actually want to

search for and if we want to search for

these values how might we go about doing

this well for this why don't we make

things interesting would a volunteer

like to come on up have to be masked and

on the internet if you are comfortable

both of those or someone putting their

friend's hand up and back yes okay come

on down

[Music]

and in just a moment our brave volunteer

is going to help me find a specific

number in the data set that we have here

on the screen so come on down and I'll

get things ready for you in advance here

come on down

[Music]

nice to meet and what is your name

nice to meet you come on over so here we

have for namira uh seven lockers or an

array of memory and behind each of these


doors is a number and the goal quite

simply is given this array of memory as

input to return true or false is the

number I care about actually there so

suppose I care about the number zero

what would be the simplest most correct

algorithm you could apply in order to

find us the number zero

[Music]

okay try open the first one

all right and just maybe just step aside

so the audience can see I think you have

not found zero yet okay so keep the door

open let's move on to your next choice

second door sure

oh go ahead second door well let's keep

it simple let's just move from left to

right sort of searching our way and what

do you see there up six not zero how

about the next door

[Music]

all right it's also not working out so

well yet but that's okay if you want to

go on to the next we're still looking

for zero

all right I see a two all right it's not

so good yet let's keep going next door

two seven no okay next door

no that's uh all right very well done

all right so I kind of set you up for a


fairly slow algorithm but let me just

ask you to describe what is it you did

by following the steps I gave you

you went one by one to each character

and if you want to talk into here

so you went one by one by each character

and would you say that algorithm left to

right is correct

no no

yes in this scenario okay yes in this

scenario and why are you hesitating

because it's not the most efficient way

to do it okay good so we see a contrast

here between correctness and design I

mean I do think it was correct because

even though it was slow you eventually

found zero but it took some number of

steps so in fact this would be an

algorithm it has a name called linear

search and Namira as you did you

kind of walked along a line going from

left to right now let me ask if you had

gone from right to left would the

algorithm have been fundamentally better

[Music]

yes okay and why because the zero's here

in the first scenario but then um if it

was like the zero's in the middle it

wouldn't have been yeah and so here is


sort of where the the right way to do

things becomes a little less obvious you

would absolutely have given yourself a

better result if you would just happen

to start from the right or if I had

pointed you to start over there but the

catch is if I asked her to find another

number like the number eight well that

would have backfired and this time it

would have taken longer to find that

number because it's way over here

instead and so in the general case you

know going left to right or heck right

to left is probably as correct as you

can get because if you know nothing

about the order of these numbers and

indeed they seem to be fairly random

some of them are smaller some of them

are bigger there doesn't seem to be

Rhyme or Reason linear search is about

as as good as you can do when you don't

know anything a priori about the

numbers so I have a little thank you

gift here a little CS50 stress ball a

round of applause for our first

volunteer

thank you so much

let's try to formalize what I just

described as linear search because

indeed no matter which end Namira


had started on I could have kind of

changed up the problem to make sure that

it appears to be running slow but it is

correct if zero were among those doors

she absolutely would have found it and

indeed did so let's now try to translate

what we did into what we might call

again pseudocode as from week zero so

with pseudo code we just need a terse

English like or any language syntax to

describe what we did so here might be

one formulation of what Namira did for

each door from left to right if the

number is behind the door return true

else at the very end of the program you

would return false by default and now

you got lucky and by the seventh door

namira had indeed returned True by

saying well there is the zero

but let's consider if this pseudo code

is now correct an accurate translation

first of all normally when we've seen

ifs we might see an if else and yet down

here return false is aligned with the

for

why did I not indent the return false or

put another way why did I not do if

number is behind door return true else

return false
why would that version of this code have

been problematic way in back

[Music]

okay I'm not sure it's because of

redundancy let me go ahead and just make

this explicit if I had instead done else

return false I don't think it's so much

redundancy that I'd be worried about let

me bounce somewhere else yeah in front

[Music]

yeah I would be returning false for uh

even though I'd only looked or Namira

had only looked at one element and it

would have been as though if all these

doors were still closed she opens this

up and sees nope this is not zero return

false that would give me an incorrect

result because obviously at that stage

in the algorithm she wouldn't have even

looked through any of the other doors so

just the original indentation of this if

you will without the else is correct

because only if I get to the bottom of

this algorithm or this pseudo code does

it make sense to conclude at that point

once she's gone through all of the doors

that nope there's in fact the number the

number I'm looking for is in fact not

actually there so how might we consider

now the running time of this algorithm


we have a few different uh types of

vocabulary now and if we consider now

how we might think about this let's

start to translate it from sort of

higher level pseudocode to something a

little lower level right we've been

writing code using n and loops and the

like so let's take this higher level

pseudocode and now just kind of get a

middle ground between English and C let

me propose that we think about this

version of the same algorithm as being a

little more pedantic for I from 0 to n

minus 1 if number behind doors bracket I

return true otherwise at the end of the

program return false now I'm kind of

mixing English and C here but that's

reasonable if the reader is familiar

with C or some similar language and

notice this pattern here this is a way

of just saying in pseudo code

uh give myself a variable called I start

at zero and then just count up to n

minus one and recall n minus 1 is not

one shy of the end of the array n minus

1 is the end of the array because again

we started counting at zero so this is a

very common way of expressing this kind

of loop from the left all the way to the


right of an array doors I'm kind of

implicitly treating as the name of this

array like it's a variable from last

week that I defined as being an array of

integers in this case so doors bracket I

means that when I is zero it's this

location when I is 1 it's this when I is

7 or more generally n minus one sorry

six or more generally n minus one that's

this location here so same idea but a

translation of it so now let's consider

what the running time of this algorithm

is if we have this menu of possible

answers to this question how efficient

or inefficient is this algorithm let's

take a look in the context of the pseudo

code we don't even have to bother going

all the way to C how do we go about

analyzing each of these steps well let's

consider this this outermost Loop here

for I from 0 to n minus 1 that line of

code is going to execute how many times

how many times will that Loop

execute let me give folks this moment to

think on it

how many times is that going to Loop

here uh yeah over there

[Music]

n times right because it's from 0 to n

minus 1 and if it's a little weird to


think in from zero to n minus one this

is essentially the same mathematically

as from 1 to n and that's perhaps a

little more obviously more intuitively

n total steps so I might just make a

note to myself this Loop is going to

operate n times what about these inner

steps well how many steps or seconds

does it take to ask a question if the

number behind if the number you're

looking for is behind doors bracket I

well as Namira did that's kind of like

one step right so you open the door and

boom or maybe it's two steps but it's a

constant number of steps so this is some

constant number of steps let's just call

it one for Simplicity how many steps or

seconds does it take to return true I

don't know exactly in the computer's

memory but that feels like a single step

just return true so if this takes one

step this takes one step but only if the

condition is true it looks like you're

doing a constant number of Things N

times

or maybe you're doing one additional

step

so in short the only thing that really

matters here in terms of the


efficiency or inefficiency of the

algorithm is what are you doing again

and again and again because that's

obviously the thing that's going to add

up doing one thing or two things a

constant number of times not a big deal

but looping that's going to add up over

time because the more doors there are

the more the bigger n is going to be and

the more steps that's going to take

which is all to say if you were to

describe roughly how many steps does

this algorithm take in Big O notation

what might your instincts say

how many steps is this algorithm on the

order of given n doors or n integers

yeah

say again

Big O of N and indeed that's going to be

the case here why because you're

essentially at the end of the day doing

N Things as an upper bound on running

time and that's in fact exactly

what happened with Namira so she had to

look at all n lockers before finally

getting to the right answer but what if

she got lucky and the number we were

looking for was not at the end of the

array but was at the beginning of

the array how might we think about that


well we have a nomenclature for this too

of course Omega notation remember Omega

notation is a lower bound so given this

menu of possible running times for lower

bounds on an algorithm what might the

Omega notation be for Namira's linear

search

uh Omega of 1 and why that

[Music]

right because if just by chance she gets

lucky and the number she's looking for

is right there where she Begins the

algorithm that's it it's one step maybe

it's two steps if you have to like

unlock the door and open it but it's a

constant number of steps and the way we

describe constant number of steps is

just with a single number like one so

the Omega notation for linear search

might be Omega of one because in the

best case you might just get the number

right from the get-go but in the worst

case we need to talk about the upper

bound which might indeed be Big O of n

so again there's this way now of talking

symbolically about best cases and worst

cases or lower bounds and upper

bounds Theta notation just as a little

trivia now is it applicable based on the


definition I gave earlier

okay no because you only take out the

Theta notation when those two bounds

upper and lower happen to be the same

for shorthand notation if you will so it

suffices here to talk about just Big O

and Omega notation well what if we are a

little smarter about this let me go

ahead and sort of semi-secretly here

rearrange these numbers but first how

about one other volunteer one other

volunteer to be comfortable with your

mask and you're being on the internet

how about over here

yes you want to come on down all right

come on down and don't look at what I'm

doing because I'm going to

[Applause]

[Music]

take your time because and don't look up

this way because I need a moment to

rearrange all of the numbers

and actually if you could stay right

there before coming up just an awkward

few seconds while I finish hiding the

numbers Behind These doors for you

I will be right with you

actually if um

do you want to warm up the crowd for a

moment and I'll be right back so you


want to introduce yourself yeah hi guys

I'm rave

[Applause]

all right I think I am ready thank you

for stalling there of course and I

didn't catch your name what was your

name I'm Rave sorry Rave like a party

Rave okay nice to meet you come on over

so Rave has kindly volunteered now and

I'm going to give you an additional

Advantage this time

um unbeknownst to you I now took numbers

behind the doors but I sorted them for

you so they're not in the same random

order like they were for Namira you now

have the advantage to know that the

numbers are sorted from small to big

okay given that and given perhaps what

we talked about in week zero with the

phone book where might you propose we

begin the story this time with which

locker

to find zero uh let's find number six

this time let's make things interesting

okay

um I'll start in the middle okay so the

middle there's seven total so that would

be right here go ahead open that up

and you find sadly the number five so


what do you know now

um I know to go up okay yeah okay all

right so and just to keep it uniform

just like I did uh I opened to the right

half of the phone book let's keep it

similar yeah all right all right and uh

a little too far even though I know you

wanted to go one over all good all good

and now we're gonna go which direction

over here in the middle all right and

voila the number six all right so very

nicely done

a little stress ball for you as well

thank you again so here we see by nature

of the locker door still being open sort

of uh an artifact of the greater

efficiency it would seem of this

algorithm because now that

um Rave was given the assumption that

these numbers are sorted from small on

the left to large on the right she was

able to apply that same divide and

conquer algorithm from week zero which

we're now going to give a name binary

search and simply by starting in the

middle and realizing okay too small then

by going to the right half and realizing

oh went a little too far then by going

to the left half was Rave able

to find in just three steps instead of


seven the number six in this case that

we were actually searching for so you

can see that this would seem to be more

efficient

let's consider for just a moment is it

correct if I had used different numbers

but still sorted them from left to right

would it still have worked

this algorithm you're nodding your head

can I call on you like why would it

still have worked do you think

[Music]

yeah so so long as the numbers are

always in the same order from left to

right or heck they could even be in

reverse order so long it's it's

consistent the decisions that Rave was

making if greater than else if less than

would guide us to the solution no matter

what and it would seem to take fewer

steps so if we consider now the pseudo

code for this algorithm let's take a

look how we might describe binary search

so binary search we might describe with

something like this if the number is

behind the middle door which is where

Rave began then we can just return true

else if the number is less than the

middle door so if six is less than


whatever's behind the middle door then

Rave would have searched the left half

else if the number is greater than the

middle door Rave would have searched the

right half else

if there are no doors and we'll see in a

moment why I put this up top just to

keep things clean if there's no doors

what should Rave have presumably

returned immediately if I gave her no

lockers to work with

just return false but this is an

important case to consider because if in

the process of searching by Locker by

Locker we might have whittled down the

problem from seven doors to three doors

to one door to zero doors and at that

point we might have had no doors left to

search so we have to naturally have a

scenario for just considering if there

were no doors so it's not to say that

maybe I don't give Rave any doors to

begin with but as she divides and

divides and divides if she runs out of

lockers to ask those questions of or a

few weeks ago if I ran out of phone book

pages to tear in half I too might have had

to return false as in this case so how

can we now describe this a little more

like C just to give ourselves a variable


to start thinking and talking about well

I might talk about doors as being an

array and so if I want to express the

middle door I could just in pseudo code

say doors bracket middle I'm assuming

that someone has done the math to figure

out what the middle door is but that's

easy enough to do and then doors if the

number we're looking for is less than

doors bracket middle then search doors

zero through doors middle minus one so

again this is a more pedantic way of

taking what's a pretty intuitive idea

search the left half search the right

half but start to now describe it in

terms of

actual indices or indexes like we did

with our array notation the last

scenario of course is if the number is

greater than the doors bracket middle

then Rave would have wanted to search

the middle door plus one so one over

through doors n minus one

through n minus 1. so again just a way

of sort of describing a little more

syntactically what it is that's going on

so how might we translate this now into

Big O notation

well in the worst case


how many steps total

might Rave's binary search algorithm have

taken given seven doors or given more

generically n doors

how many times could she go left or go

right before finding herself with one or

no doors left

what's the way to think about that

oh yeah in the middle

log n so there's log n again and even

if you're not feeling wholly comfortable

with your logarithms still pretty much in

programming and in computer science more

generally anytime we talk about some

algorithm that's dividing and conquering

in half in half in half or any other

multiple it's probably involving

logarithms in some sense and log base 2 of n

essentially refers to the number of

times you can divide n by two until you

bottom out at just a single door or

equivalently zero doors left so log n

so we might say that indeed binary

search is in Big O of log n because the

door that Rave opened last this one

happened to be three doors away and

actually if you do the math here that

roughly works out to be exactly that

case if we add one it's sort of

um and if seven doors or roughly eight


we were able to search it in just three

total steps what about Omega notation

though like in the best case Rave might

have gotten lucky she opened the door

and there it is so how might we describe

a lower bound on the running time of

binary search

yeah

say again

Omega of one so here too we see that in

some cases binary search and linear

search like they're pretty equivalent

and so this is why it's

sometimes compelling to

consider both the best case and the

worst case because honestly in general

who really cares if you just get lucky

once in a while and your algorithm is

super fast what you probably care about

is what's the worst case how long are my

users how long am I going to be sitting

there watching some spinning hourglass

or a beach ball trying to give

myself an answer to a pretty big

problem well odds are you're going to

generally care about Big O notation so

indeed moving forward we'll generally

talk about the running time of

algorithms often in terms of big oh a


little less so in terms of Omega but

understanding the range can be important

depending on the nature of the data that

you're going to actually be given here
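
To make the upper bounds concrete, here is a small sketch that counts worst-case steps for each search. The function names are hypothetical, and the model is a simplification: each step opens one door, and binary search then keeps at most half of the remaining doors, rounded down.

```c
// Worst case for linear search: every one of the n doors gets opened
int linear_worst_case(int n)
{
    return n;
}

// Worst case for binary search: open the middle door, then repeat
// on the larger remaining half until no doors are left
int binary_worst_case(int n)
{
    int steps = 0;
    while (n > 0)
    {
        steps++; // open the middle door
        n /= 2;  // at most n / 2 doors (rounded down) remain
    }
    return steps;
}
```

Under this model `binary_worst_case(7)` is 3, matching the three doors opened in the demonstration, while `linear_worst_case(7)` is 7. For 1000 doors the counts are 10 and 1000.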

all right let me pause and see if there

are any questions

any questions here

yes thank you

[Music]

yeah that's a really good question and

if I can generalize it how do you

guarantee that you can do this at scale

which algorithm is better I've sort of

led us down this road of implying that

Rave's second algorithm binary search is

better because it's so much faster it's

log of n in the worst case instead of

big O of n but Rave was given an

advantage when she came up here in that

the doors were already sorted and so

that sort of invites the question well

given a whole bunch of random data

either a small data set or heck

something Google size with millions

billions of pieces of data should you

sort it first from smallest to largest

and then search or should you just Dive

Right In and search it linearly

like how might you think about that if

you are Google for instance and you've


got millions billions of web pages

should they just go with linear search

because it's always going to work even

though it might be slow or should they

invest the time in sorting all of that

data we'll see how in a bit

and then search it more efficiently like

how do you decide between those options

[Music]

yeah if you had to sort the data first

and we don't yet formally know how to do

this but obviously as humans we could

probably figure it out you do have to

look at all of the data anyway and so

you're sort of wasting your time if

you're sorting it only then to go and

search it but maybe it depends a bit

more like that's absolutely right and if

you're just searching for one thing in

life then that's probably a waste of

time to sort it and then search it

because you're just adding to the

process but what's another scenario in

which you might not worry about that

whereby it might make sense to sort it

and then search yeah

[Music]

yeah exactly so if your problem is a

google-like problem where you have more


than just one user who's searching for

more than just one web page probably you

should incur the cost up front and sort

the whole thing because every subsequent

request thereafter is going to be faster

faster faster because it's going to be

Rave's algorithm of binary search binary

search binary search that's going to add

up to be way fewer steps than doing

linear search multiple times so again

kind of depends on the use case and kind

of depends on how important it is and

this happens even in like real world

context I think back always to graduate

school when I was writing some code to

analyze some large data set and honestly

it was actually easier at the time for

me to write pretty inefficient but

hopefully correct code because you know

what I could just go to sleep for eight

hours and let it analyze this really big

data set I didn't have to bother writing

more complex code to sort it just to run

it more efficiently why because I was

the only user and I only needed to run

these queries once and so this was kind

of a reasonable approach reasonable

until I woke up 8 hours later and my

code was incorrect and now I had to

spend another eight hours re-running it


after fixing it but even there you see

an example where what is your most

precious resource is it time to run the

code is it time to write the code is it

the amount of memory the computer is

using these are all resources we'll

start to talk about because it really

depends on what your goals are any

questions then on upper bounds lower

bounds or each of these two searches

linear or binary yeah

[Music]

when analyzing running time does the

Sorting step count if you want it to if

you actually do it at the moment it did

not apply I just gave Rave the luxury of

knowing that the data was sorted but if

I really wanted to charge her for the

amount of time it took to find that

number six I should have added the time

to sort plus the time to search and in

fact that's a road we'll go down why

don't we go ahead and Pace ourselves as

before let's take a 10 minute break here

and when we come back we'll write some

actual code
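
As a sketch of the sort once, search many times idea discussed before the break: the lecture has not yet covered how sorting works, so this example leans on the C standard library's `qsort` and `bsearch` as stand-ins. The helper names `compare_ints`, `sort_numbers`, and `contains` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

// Comparison callback for qsort and bsearch:
// negative if x comes first, zero if equal, positive if y comes first
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

// Pay the cost of sorting once, up front
void sort_numbers(int numbers[], size_t n)
{
    qsort(numbers, n, sizeof(int), compare_ints);
}

// Every later lookup on the sorted array can use binary search
bool contains(const int numbers[], size_t n, int target)
{
    return bsearch(&target, numbers, n, sizeof(int), compare_ints) != NULL;
}
```

If only one lookup is ever needed, a single linear search avoids the sorting cost entirely. The up-front sort only pays off when `contains` is called many times.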

so we've seen a couple searches linear

search and binary search which to be fair

we saw back in week zero but let's


actually translate at least one of those

now to some code using this building

block from last week where we can

actually Define an array if we want like

an array of integers called numbers so

let me switch over to vs code here let

me go ahead and start a program called

numbers dot C and in numbers dot C let

me go ahead here and how about let's

include our familiar header files so

cs50.h I'll include stdio.h so that we

can get input and print output if we want

and now I'm going to go ahead and give

myself int main void no command line

arguments today so I'll leave that as

void and I'm going to go ahead and give

myself an array of how about seven

numbers so I'll call it int numbers bracket seven

and then I can fill this array with

numbers like numbers bracket zero can be

the number four and numbers bracket one

could be the number six and numbers

bracket two can be the number eight and

this is the same list that we saw with

Amina a bit ago where it was four then

six then eight but you know what there's

actually another syntax I can show you

here if you know in advance in a c

program that you want an array of

certain values and you know therefore


how many of those values you want you

can actually do this little trick using

curly braces you can say don't worry

about how big this is it's going to be

implicit by way of these curly braces

here I can do four six eight two seven

five zero close curly brace so it's a

somewhat new use of curly braces but

this has the effect of giving me an

array called numbers inside of which are

a whole bunch of integers how many the

compiler can infer it from whatever's

inside these curly braces and it seems

to be of size one two three four five

six seven and all seven elements will be

initialized with four six eight two

seven five zero respectively so just a

minor optimization code wise to tighten

up what would have otherwise been like

eight separate lines of code now let's

go ahead and Implement linear search as

we called it and you can do this in a

bunch of ways but I'm going to do it

like this for int i gets zero i is less

than seven

i plus plus then inside of my loop I'm

going to ask the question well if

numbers at location i equals equals as

we asked Amina the number zero then


I'm going to go ahead and do something

like printf found backslash n

and then I'm going to return 0 just

because of last week's discussion of

returning a value for main when all is

well I'm going to return 0 by convention

just a signal that indeed I found what

I'm looking for otherwise on what line

do I want to go and add

a printf like not found and return

something other than zero right I don't

think I want an else here per our

pseudocode earlier so on what line would you

prefer I sort of insert a default

scenario not found and I'll return an

error

uh yeah over here

[Music]

nice so at the end of the for loop

because you want to give the program or

our volunteer earlier a chance to go through

all of the doors all of the numbers but

if you go through the whole thing

through the whole loop at the very end

you probably just want to conclude not

found backslash n and then return

something like positive one just to

signify that an error happened and again

this was a minor detail last week

anytime main is successful the


programming convention is to return zero

that means all is well and if something

goes wrong like you didn't find what

you're looking for you might return

something other than zero like positive

one maybe positive two or even negative

numbers if you want all right well let

me go ahead and save this let me do make

numbers hopefully no syntax errors

all good so far dot slash numbers enter

all right and it's found as I would hope

it would be and just as a little check

let's search for something that's

definitely not there like the number uh

negative one let me go ahead and

recompile the code with make numbers let

me rerun the code with DOT slash numbers

and hopefully okay not found so proof by

example seems to be working correctly
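
The numbers.c program described above might look roughly like this. One deliberate difference, which is an assumption of this sketch: in the lecture the loop lives directly in main and the target is hard-coded, whereas here the logic sits in a hypothetical function `find_number` so that different targets can be tried without recompiling. It returns 0 or 1 just as main does in the lecture.

```c
#include <stdio.h>

// Linear search for target; returns 0 if found and 1 if not,
// mirroring the exit status main returns in the lecture
int find_number(int target)
{
    // The compiler infers the size (7) from the values in the curly braces
    int numbers[] = {4, 6, 8, 2, 7, 5, 0};

    for (int i = 0; i < 7; i++)
    {
        if (numbers[i] == target)
        {
            printf("Found\n");
            return 0;
        }
    }

    // Only reached after the loop has checked every element
    printf("Not found\n");
    return 1;
}
```

Calling `find_number(0)` prints Found and returns 0, while `find_number(-1)` prints Not found and returns 1.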

but let's make things a little more

interesting now right now I'm using just

an array of integers let me go ahead and

introduce maybe an array of strings

instead and maybe this time I'll store a

bunch of names and not just integers but

actual strings of names so how might I

do this well let me go back to my code

here I'm going to switch us over to

maybe a file called names.c and in here


I'll go ahead and include cs50.h I'll

include stdio.h and I'm going to

go ahead and for now include a new

friend from last week string.h which

gives me some string related

functionality int main void because I'm

not going to bother with any command

line Arguments for now and now if I want

an array of strings I could do something

like this string names bracket 7 and

then I could start doing like before

names bracket zero could be someone like

Bill and names bracket one could be

someone like Charlie and so forth but

there's this new uh Improvement I can

make let me just let the compiler figure

out how many names there are and using

curly braces I'll do Bill and then

Charlie and then Fred and then George

and then Ginny and then Percy and then

Ron if you see the pattern there all

right so now I have these seven names as

strings let's do something similar so

for int

i gets zero i is less than seven as

before i plus plus as before and inside

of the loop let's this time check for

the string in question and suppose we're

searching for Ron arbitrarily he is

there so we should eventually find him


let me go ahead and say if names

bracket i equals equals quote unquote Ron then

inside of my if condition I'm going to

say printf found just like before and

I'm going to return 0 just because all

is well and I'm going to take your

advice from the get-go this time and at

the end of the loop print out not found

because if I get this far I have not

printed found and I have not returned

already so I'm just going to go ahead

and return one after printing not

found all right let me go ahead and

cross my fingers as always make names

this time

[Music]

and it doesn't seem to like my code here

this is perhaps a new error that you

might not have seen yet in names.c line

11 so that's this line here my if

condition uh result of comparison

against a string literal is unspecified

use an explicit string comparison

function instead I mean that's kind of a

mouthful and first time you see it

you're probably not going to know how to

make sense of that but it does kind of

draw our attention to something being

awry with the equality checking here


with equal equals and Ron and here's

where again we've been telling sort of a

white lie for the past couple of weeks

strings are a thing in C strings are a

thing in programming but recall from

last week I did disclaim there's no such

thing as a string data type technically

because it's not a primitive in the way

an int and a float and a bool are that

are sort of built into the language you

can't just use equals equals to compare

two strings you actually have to use a

special function that's in this header

file we talked briefly about last week in

that header file was string length or

strlen but there's other functions

in there as well let me in fact go ahead

and open up the manual pages and if we

go to string.h let me scroll down a bit

in string.h you can perhaps infer what

function will probably take the place of

equals equals for today

what do we want to use yeah

so str comp or strcmp which apparently

compares two strings and if I click on

that we'll see more information and

indeed if I click on strcmp we'll see

under the synopsis that okay I need to

use the cs50 library header file and

string.h as I already have here is its


prototype which is telling me that

strcmp takes two strings s1 and s2 that

are presumably going to be compared and

it returns an integer which is

interesting so let's read on the

description of this function is that it

compares two strings case sensitively so

uppercase or lowercase matters just FYI

and then let's look at the return value

here this function

returns an int less than zero if s1

comes before s2 zero if s1 is the same

as s2 or an int greater than zero if s1

comes after s2 so the reason that this

function returns an integer and not just

a bool true or false is that it actually

will allow us to sort these things

eventually because if you can tell me if

two strings come in this order or in

this order or they're the same you need

three possible return values and a bool

of course only gives you two but an int

gives you like 4 billion even though we

just need the three so zero or a

positive number or a negative number is

what this function returns and the

documentation goes on to explain what we

mean by ASCIIbetical order recall that

capital A is 65 capital B is 66 and it's


those underlying ASCII or Unicode

numbers that a computer uses to figure

out whether something comes before it or

after it like in a dictionary but for

our purposes now we only care about

equality so I'm going to go ahead and do

this if I want to compare names bracket

i against Ron I use strcmp of

names bracket i comma quote unquote

Ron so it's a little more involved than

actually using equals equals which does

work for integers longs and certain

other values but for Strings it turns

out we need to use a more powerful

function why well last week recall what

a string really is it's an array of

characters and so whereas you can use

equals equals for single characters

strcmp as we'll eventually see is going to

compare multiple characters for us

there's more logic there there's a loop

needed and that's why it comes with the

string library but it doesn't just work

out of the box with equals equals alone

that would literally be comparing two

things not two arrays of things and we'll

come back to this next week as to what's

really going on under the hood so let me

go ahead and fix one bug that I just

realized I made I want to check if the


return value of strcmp is equal to

zero because per the documentation that

meant they're the same all right let me

go ahead and make names this time now it

compiles dot slash names enter

found and just as a sanity check let's

check someone outside the family

searching now for Hermione after

recompiling the code after rerunning the

code and she's not in fact found so

here's just a similar implementation of

linear search not for integers this time

but instead for Strings the subtlety

really being we need a helper function

strcmp to actually do the leg

work for us of comparing two arrays of

characters
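
A sketch of the names.c search with the strcmp fix applied. Two things here are assumptions rather than the lecture's code: the loop is wrapped in a hypothetical function `find_name` instead of sitting in main, and plain `const char *` is used in place of the `string` type that the lecture gets from cs50.h, so that the sketch compiles without that library.

```c
#include <stdio.h>
#include <string.h>

// Linear search for target among the seven names;
// returns 0 if found and 1 if not, like main in the lecture
int find_name(const char *target)
{
    const char *names[] = {"Bill", "Charlie", "Fred", "George",
                           "Ginny", "Percy", "Ron"};

    for (int i = 0; i < 7; i++)
    {
        // strcmp returns 0 when the two strings are the same;
        // names[i] == target would not compare the characters
        if (strcmp(names[i], target) == 0)
        {
            printf("Found\n");
            return 0;
        }
    }

    printf("Not found\n");
    return 1;
}
```

Because strcmp is case sensitive, `find_name("Ron")` returns 0 but `find_name("ron")` returns 1, as does `find_name("Hermione")`.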

all right questions on either of these

implementations yeah in the middle

[Music]

good question if I had not fixed what I

claimed was a mistake earlier and I did

this and we saw an example of this last

week actually if a function returns an

integer be it negative or positive or

zero

when you get back zero the expression

the Boolean expression will be

considered false so 0 equals false


always if a function returns any

positive number or any negative number

that's going to be interpreted as true

even if it's positive or negative

whether it's 1 negative one two negative

two and so if I did this this would be

saying the opposite so if I were to say

this if strcmp of names bracket i

and Hermione that's implicitly like

saying this

does not equal zero or it means sort of

is true but you don't want to check for

true because again we're comparing

integers here so the reason I did zero

here in this case is that it explicitly

checks for the return value that means

they're the same and yeah follow up

River

yes you might not have seen this yet but

you can express the equivalent

if you want to check

if this is false you can actually use an

exclamation point known as a bang in

programming that inverts the meaning so

false becomes true true becomes false so

this would be another way of expressing

it this is arguably a worse design

though because the documentation

explicitly says you should be checking

for zero or a positive value or a


negative value and this little trick

while correct and I think you can make a

reasonable case for it sort of hides

that detail and I would argue instead

for the first way checking for equals

equals zero instead and if that's a

little subtle not to worry we'll come

back to sort of little syntactic

tricks like that before long other

questions on linear search in these two

forms
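
The three ways of testing strcmp's return value discussed in this exchange can be put side by side. The function names below are hypothetical and exist only to label each variant.

```c
#include <stdbool.h>
#include <string.h>

// Explicit check: true only when strcmp reports the strings are the same
bool same_explicit(const char *s1, const char *s2)
{
    return strcmp(s1, s2) == 0;
}

// Bang version: ! turns 0 into true and any nonzero value into false,
// so this behaves the same as the explicit check
bool same_bang(const char *s1, const char *s2)
{
    return !strcmp(s1, s2);
}

// The mistake: any nonzero int counts as true,
// so this is true exactly when the strings differ
bool buggy_same(const char *s1, const char *s2)
{
    return strcmp(s1, s2);
}
```

The first two agree on every input. The third gives the opposite answer, which is why leaving off the comparison with zero inverts the search.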

is there another hand or hands two hands

no okay just holler if I missed so let's

now actually take this one step further

suppose that we want to write a program

that maybe implements something a little

more like a phone book that has both

names and numbers and not just integers

but actual phone numbers well we could

escalate things like this we could now

have two arrays one called names one

called numbers and I'm going to use

strings for the numbers now the phone

numbers because in most communities

phone numbers might have dashes pluses

parentheses so something that really

looks more like a string even though we

call it a phone number probably don't

want to use an INT lest we throw away


those kinds of details so let me switch

back to vs code here and let's do one

more program this one in a file called

phonebook.c and now let me go ahead and

do the same let me include cs50.h let me

include stdio.h and let me include

string.h I'm going to instead again do

int main void and then inside of my

program I'm going to give myself two

arrays the efficient way this time

string names will be just two of us this

time how about Carter and me and then

I'll give myself oops typo already if I

want this to be an array I don't have to

specify the number the compiler can

count for me but I do need the square

brackets then for numbers I'm again

going to use a string uh array

specifying with the curly braces that

how about Carter can be at one six one

seven four nine five one thousand and

how about my own number here one nine

four nine four six eight oh pattern

appearing two seven five zero will be

mine why mine well I've just kind of

lined things up so Carter's number is

apparently uh first in this array and

I'm claiming that he'll be first in this

array respectively I David will be the

second in the names array and

second in the numbers array if you want to

have a little fun with programming feel

free to text or call me sometime at that

number so now let's actually use this

data in some way let's go ahead and

actually search for my own name and

number here so let me do for int i gets

zero there's two of us this time so i

less than two and then i plus plus as

before and now I'm going to practice

what I preached earlier and I'm going to

use strcmp to find my name in this

case and I'm going to say if strcmp

of names bracket i

comma quote unquote David

equals equals zero meaning they're the

same then just as before I'm going to go

ahead and print something out but this

time I'm going to make the program more

useful not just say found or not found

now I'm implementing a phone book like

the contacts app on iOS or Android so

I'm going to say something like quote

unquote found percent s backslash n and

then actually plug in numbers bracket I

to correspond to the current names

bracket i and then I'll return 0 as before

and then down here if we get all the way

through the loop and David's not there


for some reason I'm going to print as

before not found and then return one so

let me go ahead and compile this with

make phone book dot slash phone book and

it seems to have found the number
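
A sketch of this first phonebook.c, with two parallel arrays. As assumptions of the sketch: the search is wrapped in a hypothetical function `lookup` that returns the matching number or NULL, `const char *` stands in for the `string` type from cs50.h, and the dashes in the phone numbers are a guess at formatting, since the transcript only gives the digits.

```c
#include <stdio.h>
#include <string.h>

// Looks up name and returns the matching number, or NULL if absent.
// Relies on names[i] and numbers[i] describing the same person.
const char *lookup(const char *name)
{
    const char *names[] = {"Carter", "David"};
    const char *numbers[] = {"+1-617-495-1000", "+1-949-468-2750"};

    for (int i = 0; i < 2; i++)
    {
        if (strcmp(names[i], name) == 0)
        {
            printf("Found %s\n", numbers[i]);
            return numbers[i];
        }
    }

    printf("Not found\n");
    return NULL;
}
```

Nothing in the code ties the two arrays together. If one array gains or loses an element and the other does not, the program still compiles but the indices no longer line up.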

so this code I'm going to claim is

correct it's kind of stupid because I've

just made a phone book or a contacts app

that only supports two people they're

only going to be me and Carter this

would be like downloading the contacts

app on a phone and you can only call two

people in the world there's no ability

to add names or edit things that of

course could come later using get string

or something else but for now for the

sake of discussion I've just hard-coded

two names and two numbers but for what

it does I claim this is correct it's

going to find me and print out my number

but is it well designed let's start to

now consider if we're not just using

arrays but are we using them well we

started to use them last week but are we

using them well this week and what might

I even mean by using an array well or

designing this program well

any critiques or concerns

with why this might not be the best road

for us to be going down when I want to


implement something like a phone book

with pieces of information it seems all

too vulnerable to just mistakes for

instance if I screw up the number the

actual number of names in the names

array such that it's now more or less

than is in the numbers array or vice

versa it feels like there's not a tight

relationship between those pieces of

data and it's just sort of trusting

the honor system that anytime I use

names bracket i that it

lines up with numbers bracket i and

that's fine if you're the one writing

the code you're probably not going to

really screw this up but if you start

collaborating with someone else or the

program's getting much much longer the

odds that you or your colleagues

remember that you're sort of just

trusting that names and numbers line up

like this is going to fail eventually

someone's not going to realize that and

just the code is going to break and

you're going to start outputting the

wrong numbers for names which is to say

it'd be much nicer if we could somehow

couple these two pieces of data names

and numbers a little more tightly


together so that you're not just

trusting that these two independent

variables names and numbers have this

kind of relationship with themselves so

let's consider how we might solve this a

new feature today that we'll introduce

is generally known as a data structure

in C we have the ability to invent our

own data types if you will data types

that the authors of C decades ago just

didn't Envision or just didn't think

were necessary because we can Implement

them ourselves similar to scratch just

as you could create custom puzzle pieces

or in C you can create custom functions

so in C can you create your own types of

data that go beyond the built-in ints and

floats and even strings you can make for

instance a person data type or a

candidate data type in the context of

Elections or a person data type more

generically that might have a name and a

number so how might we do this well let

me go here and propose

that if we want to define a person

wouldn't it be nice if we could have a

person data type and then we could have

an array called people and maybe that

array is our only array with two things

in it two persons in it but somehow


those data types these persons would

have both a name and a number associated

with them so we don't need two separate

arrays we need one array of persons a

brand new data type so how might we do

this well if we want every person in the

world or in this program to have a name

and a number we literally write out

first those two data types give me a

string called name give me a string

called number semicolon after each and

then we wrap those two lines of

code with this syntax which at first

glance is a little cryptic it's a lot of

words all of a sudden but typedef is a

new keyword today that defines a new

data type this is the C keyword that

lets you create your own data type for

the very first time so struct is another

related keyword that tells the compiler

that this isn't just a simple data type

like an int or a float renamed or

something like that it actually is a

structure it's got some Dimensions to it

like two things in it or three things in

it or even 50 things inside of it

the last word down here is the name that

you want to give your data type and it

weirdly goes after the curly braces but


this is how you invent a data type

called person and what this code is

implying is that henceforth the compiler

clang will know that a person is

composed of a name that's a string and a

number that's a string and you don't

have to worry about having multiple

arrays now you can just have an array of

people moving forward
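
The definition being described would look roughly like this. The only assumption is the member type: the lecture writes `string`, which comes from cs50.h, and this sketch uses `const char *` so that it stands alone.

```c
// typedef ... person; gives the new type its name,
// and struct { ... } says it is a structure with members inside
typedef struct
{
    const char *name;   // the person's name
    const char *number; // phone number kept as text, so + and - survive
} person;
```

After this definition, `person` can be used like any other type, for example `person people[2];` for an array of two persons.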

so how can we go about using this well

let me go back to my code from before

where I was implementing a phone book

and why don't we enhance the phone book

code a little bit by borrowing some of

that new syntax let me go to the top of

my program above Main and Define a type

that's a structure or a data structure

that has a name inside of it and that

has a number inside of it and the name

of this new structure again is going to

be called person

inside of my code now let me go ahead

and delete this old stuff temporarily

let me give myself an array called

people of size 2 and I'm going to use

the non-terse way to do this I'm not

going to use the curly braces I'm going

to more pedantically spell out what I

want in this array of size 2. at

location zero which is the first person


in an array because you always start

counting at zero I'm going to give that

person a name of quote unquote Carter

and the dot is admittedly one new piece

of syntax today too the dot means go

inside of that structure and access the

variable called name and give it this

value Carter similarly if I want to give

Carter a number I can go into people

bracket zero DOT number and give that

the same thing as before plus one six

one seven four nine five one thousand

and then I can do the same for myself

here people bracket where should I go

okay one because again two elements but

we started counting at zero dot name

equals quote unquote David and then

lastly people bracket one dot number

equals quote unquote plus one uh nine

four nine two uh four six eight two

seven five zero so now if I scroll down

here to my logic I don't think this part

needs to change too much I'm still for

the sake of discussion gonna iterate two

times from i is zero on up to but not

through two but I think this line of

code needs to change

how should I now refer to the ith

person's name as I iterate


[Music]

what should I compare quote unquote

David to this time let me see on the end

here

yeah people bracket i dot name why because

people is the name of the array bracket

I is the ith person that we're iterating

over in the current Loop for zero then

one maybe higher if it had more people

then dot is our new syntax for going

inside of a data structure and accessing

a variable therein which in this case is

name and so I can compare David just as

before so it's a little more verbose but

now arguably this is a better program

because now these people

are full-fledged data types unto

themselves there's no more honor System

inside of my Loop that this is going to

line up because in just a moment I'm

going to fix this one Last Remnant of

the previous version and if I can call

back on you again what should I change

numbers bracket I to this time

[Music]

DOT number exactly So Gone is the honor

System that just assumes that bracket I

in this array lines up with bracket I in

this other array now why there's only

one array it's an array called people


the things it stores are persons a

person has a name and a number and so

even though it's kind of marginal

admittedly given that this is a short

program and given that this kind of made

things look more complicated at first

glance we're now laying the foundation

for just a better design because you

really can't screw up now the

association of names with numbers

because every person's name and number

is so to speak encapsulated inside of

the same data type and that's a term of

Art in CS encapsulation means to

encapsulate that is contain related

pieces of information and thus we have a

person that encapsulates two other data

types name and number and this just sets

the foundation for all of the cool stuff

we've talked about and you use every day
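
Putting the pieces together, the struct-based phone book might be sketched as below. As before, the wrapper function `lookup`, the use of `const char *` instead of cs50.h's `string`, and the dash formatting of the numbers are assumptions of this sketch, not the lecture's exact code.

```c
#include <stdio.h>
#include <string.h>

typedef struct
{
    const char *name;
    const char *number;
} person;

// Looks up name in a single array of persons;
// returns the matching number, or NULL if absent
const char *lookup(const char *name)
{
    person people[2];

    // The dot operator goes inside a structure to reach one member
    people[0].name = "Carter";
    people[0].number = "+1-617-495-1000";

    people[1].name = "David";
    people[1].number = "+1-949-468-2750";

    for (int i = 0; i < 2; i++)
    {
        if (strcmp(people[i].name, name) == 0)
        {
            // Name and number come from the same people[i]
            printf("Found %s\n", people[i].number);
            return people[i].number;
        }
    }

    printf("Not found\n");
    return NULL;
}
```

Since each name travels with its own number inside one `person`, there is no second array whose indices need to be kept in sync.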

what is an image well recall that an

image is a bunch of pixels or dots on

the screen every one of those dots has

RGB values associated with it red green

and blue you could imagine now creating

a structure in C probably where maybe

you have three values three variables

one called red one called green one

called blue and then you could name the


thing not person but pixel and now you

could store in c three different colors

some amount of red some green some blue

and collectively treat it as the color

of a pixel and you can imagine doing

something similar perhaps for video or

music for music you might have three

variables one for the musical note the

duration the loudness of it and you can

imagine coming up with your own data

type for music as well so this is a

little low level we're just using like a

familiar contacts application but we now

have a way in code to express most

any type of data that we might want to

implement or discuss ultimately so any

questions now on struct or defining our

own types the purposes for which are to

use arrays but use them more responsibly

now in a better design but also to lay

the foundation for implementing cooler

and cooler stuff

per our week zero discussion yeah
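
The pixel and music ideas mentioned above could be sketched with the same syntax. Everything beyond the three member names red, green, and blue is an assumption here: the `uint8_t` type (one byte, 0 through 255, per channel), the `note` structure's member names and units, and the small helper `gray_level`.

```c
#include <stdint.h>

// One dot on the screen: some amount of red, green, and blue
typedef struct
{
    uint8_t red;
    uint8_t green;
    uint8_t blue;
} pixel;

// One musical note (member names and units are illustrative guesses)
typedef struct
{
    int pitch;    // which note is played
    int duration; // how long it lasts, say in milliseconds
    int loudness; // how loud it is, on some agreed scale
} note;

// Average the three channels to get a rough gray level for a pixel
int gray_level(pixel p)
{
    return (p.red + p.green + p.blue) / 3;
}
```

A pure red pixel, `{255, 0, 0}`, has a gray level of 85 under this averaging, and a white pixel, `{255, 255, 255}`, has a gray level of 255.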

what's the difference between this and

an object in an object-oriented language

so a slight side note C is not object

oriented languages like Java and C plus

plus and others which you might have

heard of programmed in yourself had friends

program in are object oriented languages


in those languages they have things

called classes or objects which are

interrelated and objects can store not

just data like variables objects can

also store functions and you can kinda

sort of do this in C but it's not sort

of conventional in C you have data

structures that store data in languages

like Java and C plus plus you have objects

that store data and functions together

python is an object-oriented language as

well so we'll see this issue in a few

weeks but let me wave my hands at it for

now yeah

yes could you use this struct to

redefine how an INT is defined short

answer yes we talked a couple of times

now about integer overflow and most

recently you might have seen me mention

um the bug in iOS and Mac OS that was

literally related to an INT overflow

that's the result of ints

only storing four bytes or 32 bits or

even a long is 64 bits or eight bytes

but it's finite but if you want to

implement some Financial software or

some scientific or mathematical software

that allows you to count way bigger than

a typical int or a long you could


imagine coming up with your own

structure and in fact in some languages

there is a structure called Big int

which allows you to express even bigger

numbers how well maybe you store inside

of a big int an array of values and you

somehow allow yourself to store more and

more bits based on how high you want to

be able to count so in short yes we now

have the ability now to do most anything

we want in the language even if it's not

built in for us

other questions
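One way the "array of values" idea for a big integer might look, as a sketch only. The bigint type, the 64-digit capacity, and the base-10 digit layout are assumptions chosen for readability; real big-integer libraries use more compact representations and grow as needed.

```c
#include <stdint.h>
#include <string.h>

#define BIGINT_DIGITS 64

// Hypothetical big integer: base-10 digits, least significant digit first.
typedef struct
{
    uint8_t digits[BIGINT_DIGITS];
} bigint;

// Convert an ordinary unsigned 64-bit value into a bigint.
bigint bigint_from(uint64_t value)
{
    bigint b;
    memset(b.digits, 0, sizeof b.digits);
    for (int i = 0; value > 0 && i < BIGINT_DIGITS; i++)
    {
        b.digits[i] = value % 10;
        value /= 10;
    }
    return b;
}

// Add digit by digit, carrying as in grade-school addition.
// A carry out of the last digit is dropped, so this type is still finite,
// just with a much higher ceiling than a long.
bigint bigint_add(bigint a, bigint b)
{
    bigint sum;
    int carry = 0;
    for (int i = 0; i < BIGINT_DIGITS; i++)
    {
        int total = a.digits[i] + b.digits[i] + carry;
        sum.digits[i] = total % 10;
        carry = total / 10;
    }
    return sum;
}
```

Adding 1 to the largest 64-bit value, bigint_add(bigint_from(UINT64_MAX), bigint_from(1)), produces a 20-digit result that an unsigned 64-bit integer could not hold.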

[Music]

could you define a name and a number in

the same line uh sort of it starts to

get syntactically a little messy so I

did it a little more pedantically line

by line

good question oh we're here

[Music]

prototypes you have to do in C you have

to Define anything you're going to use

or declare anything you're going to use

before you actually use it so it is

deliberate that I put it in the top of

my code in this file otherwise the

compiler would not know what I mean by

person when I first use it here on

what's line 14. so it has to come first


or it has to be put into something like

a header file so that you know so you'll

include it at the very top of your code
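A small sketch of the declare-before-use rule. The type comes first so that everything below can mention person, and a prototype lets one function call another that is defined further down. The functions find_number and has_contact are hypothetical helpers, not code from the lecture.

```c
#include <string.h>

// The type must be declared before any code that mentions it.
typedef struct
{
    const char *name;
    const char *number;
} person;

// Prototype: tells the compiler that find_number exists and what it looks like.
const char *find_number(const person people[], int n, const char *name);

// This calls find_number before its body appears; the prototype above suffices.
int has_contact(const person people[], int n, const char *name)
{
    return find_number(people, n, name) != NULL;
}

// Linear search through the contacts; returns NULL if the name is absent.
const char *find_number(const person people[], int n, const char *name)
{
    for (int i = 0; i < n; i++)
    {
        if (strcmp(people[i].name, name) == 0)
        {
            return people[i].number;
        }
    }
    return NULL;
}
```

Moving the typedef below has_contact, or deleting the prototype, would make the compiler reject the file, which is the reason the definition sits at the top or in a header file.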

other questions over here uh yeah

[Music]

yeah good question we'll come back to

this later in the term when we talk

about SQL a database language and

storing things in actual databases

generally speaking even though we humans

call things uh phone numbers or in the

US we have social security numbers those

types of numbers often have other

punctuation in it like dashes

parentheses uh pluses and so forth you

could not store any of that syntax or

that punctuation inside of an INT You

Could Only store numbers so one

motivation for using a string is just I

can store whatever the human wanted me

to store including parentheses and so

forth another reason for storing things

as strings even if they look like

numbers is in the context of like ZIP

codes in the United States again we'll

come back to this but long story short

years ago actually I was using Microsoft

Outlook for my email client and

eventually I switched to Gmail and this


is like 10 plus years ago now and

Outlook at the time let you export all

of your contacts as a CSV file comma

separated values more on that in the

weeks to come too and that just means I

could download a text file with all of

my friends and family and their numbers

inside of it unfortunately I opened that

same CSV file with Excel I think at the

time just to kind of spot check it and

see if what was in there was what I

expected and I must have instinctively

hit like command or control s to save it

and Excel at least has this habit of

sort of reformatting your data if things

look like numbers it treats them as

numbers and Apple Numbers does this too

Google spreadsheets does this too

nowadays but long story short I then

imported my newly saved CSV

file into Gmail and now 10 plus years

later I'm still occasionally finding

friends and family members whose zip

codes are in Cambridge Massachusetts

2138 which is missing the zero because

we here in Cambridge are 02138 and

that's because I treated or I let Excel

treat what looks like a number as an

actual number or int and now leading

zeros become a problem because


mathematically they mean nothing but in

the mail system they do

sending envelopes and such all right

other final questions here
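The ZIP code story can be reproduced in a few lines. This sketch assumes the ZIP code was stored as an int; the function names are made up. Once 02138 is held as a number it is simply 2138, and the zero only comes back if the code remembers to pad it.

```c
#include <stdio.h>

// Print a ZIP code that was stored as a number: the leading zero is lost.
void zip_from_int(int zip, char *buffer, size_t size)
{
    snprintf(buffer, size, "%d", zip);
}

// Same value, padded back to five digits with leading zeros.
void zip_from_int_padded(int zip, char *buffer, size_t size)
{
    snprintf(buffer, size, "%05d", zip);
}
```

For the value 2138 the first function writes 2138 and the second writes 02138. Storing the ZIP code as a string from the start avoids the problem entirely, which is also why phone numbers with dashes and parentheses are kept as strings.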

yeah so could I have used a 2d

or two-dimensional array to solve the

problem earlier of having just one array

yes but uh one I'd argue it's less

readable especially as I get lots of

names and numbers and two that too is

also kind of relying on the honor System

it would be all too easy to Omit some of

the square brackets in the

two-dimensional array so I would argue

it too is not as good as

introducing a struct more on that down

the road two-dimensional arrays just

means arrays of arrays as you might

infer

all right so now that we have this

ability to store different types of data

like contacts in a phone book having

names and addresses let's actually take

a step back and consider how we might

now solve one of the original problems

uh by actually sorting the information

we're given in advance and considering

per our discussion earlier just how

costly how time consuming is that


because that might tip the scales in

favor of sorting sorting then searching

or maybe just not sorting and only

searching it'll give us a sense of just

how expensive so to speak uh sorting

something actually is what's the

formulation of this problem it's the

same thing as week zero we've got input

to sort we want it to be outputted as

sorted so for instance if we're taking

unsorted input as input we want the

sorted output as the result more

concretely if we've got numbers like

these six three eight five two seven

four one which are just randomly

arranged numbers we want to get back out

one two three four five six seven eight

so we just want those things to be

sorted so again inside of the black

box here is going to be one or more

algorithms that actually get this job

done so how might we go about doing this

well just to vary things a bit more I

think we have a chance here for a

bit more audience participation uh but

this time we need eight people if we may

all you have to be comfortable appearing

on the internet okay so this is actually

quite convenient that you're all quite

close I've got one two three four five


six seven

oh okay and someone volunteering their

friend number eight

come on down come on down and if you

could I'm going to set things up if you

all could join Valerie my colleague over

there to give you a prop to use here

we'll go ahead in just a moment

and try to find some numbers at hand

[Music]

in just a moment each of our volunteers

is going to be representing an integer

and that integer is

initially going to be in unsorted order

and I claim that using an algorithm

step-by-step instructions we can

probably sort these folks in at least a

couple of different ways so they're in

wardrobe right now uh just getting their

very own Harvard T-shirt with a jersey

number on it which will then represent

an element of our array

give us just a moment to finish

getting the attire ready they're being

handed a shirt and a number

and let me ask the audience for just a

moment as we have these numbers up here

on the screen these numbers too are

unsorted they're just in random order


and let me ask the audience how would

you go about sorting these eight numbers

on the screen

how would you go about sorting these

yeah what are your thoughts

[Music]

remember

okay

[Music]

okay

so just to just to recap you would start

with one of the numbers on the end you

would look to the number to the right or

to the left of it depending on which end

you start at and if it's out of order

you would just start to swap things and

that seems reasonable there's a whole

bunch of mistakes to fix here because

things are pretty out of order but

probably if you start to solve small

problems at a time you can achieve the

end result of getting the whole thing

sorted other instincts if you were just

handed these numbers how you might go

about sorting them

how might you get in the back

okay

I like that so to recap there find the

smallest one first and put it at the

beginning if I heard you correctly and


then presumably you could do that again

and again and again and that would seem

to give you a couple of different

algorithms and if you all are attired here

do you want to come on up if you're

ready

we had some felt volunteers too come on

over

so if you all would like to line

yourselves up facing the audience in

exactly this order so whoever is number

zero should be way over here and whoever

is number five should be way over there

feel free to distance as much as you'd

like and scooch a little this way if

you could

okay

all right and make a little more room so

seven let's see five two seven four

hopefully one uh and yeah keep them to

the side okay one uh six

and there we go three come on over three

he's looking for you all right so here

we have an array of eight numbers eight

integers if you will and you wanna each

say a quick hello to the group

hello I'm Quinn go Canada Day

hi everyone I'm agird

hey I'm Mitchell


hi I'm Brett and also go Canada

I'm Hannah go Appley

hi I'm Matthew goholbert

hi I'm Miriam gowenthrop

hi I'm Celeste they go Strauss wonderful

but welcome all to the stage and let's

just visualize perhaps organically how

you eight would solve this problem so we

currently have the numbers zero through

seven quite out of order could you go

ahead and just sort yourselves

from zero through seven

okay so what did they just do well okay

yes first of all yes very well done

how would you describe what they just

did

well let's do this could you go back

into that order on the screen five two

seven four one six three oh

and could you do exactly what you just

did again sort yourselves

all right put it okay I guess well done

again

all right so admittedly there's kind of

a lot going on because each of you

except number four are doing something

in parallel all at the same time and

that's not really how a computer

typically works just like a computer can

only look at one memory location or one


locker at a time so can a computer only

move one number at a time sort of

opening a locker checking what's there

moving it as needed so let's try this

more methodically based on the two

audience suggestions if you all could

randomize yourself again to five two

seven four one six three zero let's take

the second of those approaches first I'm

gonna look at these numbers and even

though I as the human can obviously see

all the numbers and I just kind of have

the intuition for how to fix this we've

got to be more methodical because

eventually we got to translate this to

pseudo code and then code so let me see

I'm going to search for as you propose

the smallest number and I'm going to

start from left to right I could do it

right to left but left to right this

tends to be convention all right five at

this moment is the smallest number I've

seen so I'm going to sort of remember

that in a variable if you will now I'm

going to take one more step two okay two

I'm going to compare to the variable

I have in mind obviously smaller I'm

going to forget about five and only now

remember two as the now smallest element


seven nope I'm going to ignore that

because it's not smaller than the two I

have in mind four one okay I'm going to

update the variable in mind because

that's indeed smaller now obviously we

the humans know that's getting pretty

small maybe it's the end but I have to check

all values to see if there's something

even smaller 6 is not 3 is not

but zero is and what's your name again

Celeste where should Celeste or number

zero go according to this proposed

algorithm
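The walk from left to right with one remembered value can be sketched as a single loop. The function name index_of_smallest and its parameters are assumptions; it remembers a position rather than the value itself so the caller knows who to move.

```c
// Return the index of the smallest value in numbers[start] .. numbers[n - 1].
// Only one variable, smallest, is remembered while walking left to right.
int index_of_smallest(const int numbers[], int start, int n)
{
    int smallest = start;
    for (int i = start + 1; i < n; i++)
    {
        // Forget the old candidate only when a strictly smaller value appears.
        if (numbers[i] < numbers[smallest])
        {
            smallest = i;
        }
    }
    return smallest;
}
```

For the on-stage order 5 2 7 4 1 6 3 0 the remembered value goes 5, then 2, then 1, then 0, and the function returns index 7. Every element has to be visited, because the smallest one may be the very last.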

all right all I'm seeing a lot of this

so at the beginning of the array so

before doing this for real let's have

you pop out in front and could you all

shift and make room for Celeste

is this a good idea to have all of them

move or equivalently move everything in

the array to make room for Celeste and

number zero over there

no probably not that felt like a lot of

work and even though it happened pretty

quickly that's like seven steps to

happen just to move her in place so what

would be marginally smarter perhaps a

little more efficient perhaps what's

that swapping what do you mean by swap

okay replace two values so if you want


to go back to where you were one step

over number five he's not in the right

place he's got to move eventually so you

know what if that's where Celeste

belongs why don't we just swap five and

zero so if you want to go ahead and

exchange places with each other notice

what's just happened the problem

I'm trying to solve has gotten

smaller instead of being size eight now

it's size seven now granted I moved five

to another wrong location but if these

numbers started off randomly it doesn't

really matter where five goes until we

get him into the right place so I think

we've improved and now if I go back my

Loop is sort of coming back around I can

ignore Celeste and make this a seven

step problem and not eight because I

know she's in the right place two seems

to be the smallest I'll remember that

not seven not four one seems to be the

smallest now I know as a human this

should be my next smallest but why

intuitively should I keep going do you

think

I can't sort of optimize as a human and

just say number one let's let's get you

into the right place I still want to


check the whole array why yeah

maybe there's another one and that could

be another problem altogether other

thoughts yeah

there could be another zero indeed but I

I did go through the list once right and

I I kind of know there isn't your

thoughts

[Music]

yeah I don't necessarily know what is

there and honestly I only stipulated

earlier that I'm using one variable in

my mind I could use two and remember

the two smallest elements I've seen I

could use three variables four but then

I'm going to start to use a lot of space

in addition to time so if I've

stipulated that I only have one variable

to solve this problem I don't know

anything more about these elements

because the only thing I'm remembering

at this moment is number one is the

smallest element I've seen so I'm going

to keep going six nope three nope five

nope okay I know that number one and

your name was Hannah is the next

smallest element I could have everyone

move over to make room but nope two you

know even though you're so close to

where I want you I'm just going to keep


it simple and swap you two so granted

I've made the problem a little worse but

on average I could get lucky too and

just pop number two into the right place

now let me just accelerate this I can

now ignore Hannah and Celeste making the

problem size six instead of eight so

it's getting smaller seven is the

smallest nope now four is two is the

smallest still two still two still two

so let's go ahead and swap two and seven

and now I'll just kind of orchestrate it

verbally for you're about to have to do

something so we now have four seven six

three five okay three could you swap

with four

all right now we have seven six four

five okay four could you swap with seven

now we have six seven five uh five could

you swap with six

and now we have seven six six would you

swap with seven and now perhaps a round of

applause they've sorted themselves okay

hang on there one minute

so we'll do this one other approach and

my God that felt so much slower than the

first approach but that's one because I

was kind of providing a long voice over

but two we were doing one thing at a


time whereas the first time you guys had

the luxury of moving like eight

different CPUs brains if you will were

all operating at the same time and

computers like that exist if you have a

computer with multiple cores so to speak

that's like having a computer that

technically can do multiple things at

once but software typically at least as

we've written it thus far can only do

one thing at a time so in a bit we'll

add up all of these steps but for now

let's take one other approach if you all

could reorder yourselves like that five

two seven four one six three zero let's

take the other approach that was

recommended by just fixing small

problems and see where this gets us so

we're back in the original order five

and two are clearly out of order so you

know what let's just bite this problem

off now it's five and two could you swap

now let me take a next step five and

seven I think you're okay there's a gap

yes but that might not be a big deal

seven and four problem let's have you

swap

okay seven and one let's have you swap

seven and six let's have you swap seven

and three you swap seven and zero you


swap now let me pause for just a moment

still not sorted so I'm clearly not done

but have I improved the problem

right I I can't cheat like before or I

can't optimize like before because zero

is obviously not here Celeste is

still way back there so it's not like

I've gone from eight steps to seven to

six just yet but have I made any

improvements

yes in what sense is this improved

what's a concrete thing you could point

to is better

[Music]

yeah

sort of the highest number which is

indeed seven and conversely if you

prefer Celeste is one step closer to the

beginning now worst case Celeste is

going to have to move one step on each

iteration so I might need to do this

thing like n total times to move her all

the way over but that might work out

okay let me see uh two and five you're

good five and four swap you five and one

let's swap you five and six you're good

six and three let's swap you

six and zero let's swap you six and

seven you're good and I think now notice


that the high values as you noted are

sort of bubbling up if you will to the

end of the list two and four you're good

four and one let's swap four and five

good five and three swap five and zero

swap

five six seven of course are good so now

you can sort of see the problem

resolving itself and let's just do this

part now faster two and one

two and four okay four and three

four and zero

all right now one and two two and three

three and zero

and good so we do have some optimization

there we don't need to keep going

because those all are sorted one and two

you're good two and zero

all right done one and zero and big

round of applause in closing okay

thank you all

um we need the puppets back but you can

keep the shirts thank you for

volunteering here uh feel free to make

your way exits left or right and let's

see if thanks to our volunteers here we

can't now

formalize a little bit what we did on

both passes here

um I claim that the first algorithm our


volunteers kindly acted out is what's

called selection sort and as the name

implied we selected the smallest

elements again and again and again

working our way from left to right

putting Celeste into

the right place and then continuing with

everyone else so selection sort as it's

formally called can be described for

instance with this pseudo code here for i

from 0 to n minus one and again why this

this is just how we think about and talk

about arrays the Left End is zero the

right end is n minus one where in this

case n happened to be eight people so

that's zero through seven so for I from

zero to n minus one what did I do I

found the smallest number between

numbers bracket I and numbers bracket n

minus one

so a little cryptic at first glance but

this is just a very pseudocode-like way

of saying find the smallest element

among all eight volunteers because if

i starts at zero

and n minus 1 never changes because

there's always eight uh eight people

so eight minus one is seven this first

says find the smallest number between


numbers bracket zero and numbers bracket

seven if you will then what do I do swap

the smallest number with numbers bracket

I so that's how we got Celeste from over

here all the way over there we just

swapped those two values what then

happens next in the pseudo code I of

course goes from zero to one and that's

the technical way of saying now find the

smallest element among the seven

remaining volunteers ignoring Celeste

this time because she was already in the

correct location so the problem went

from size 8 to size seven and if we

repeat size six five four three two one

until boom it's all done at the very end

so this is just one way of expressing in

pseudo code what we did a little more

organically in a formalization of what

we've all uh what someone volunteered

out in the audience
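The selection sort pseudocode above translates fairly directly into C. This is one possible rendering, with the function name selection_sort chosen here; the outer loop follows the pseudocode and runs i from 0 to n minus 1, even though the final iteration has nothing left to compare.

```c
// Sort numbers[0] .. numbers[n - 1] in ascending order using selection sort.
void selection_sort(int numbers[], int n)
{
    for (int i = 0; i < n; i++)
    {
        // Find the smallest number between numbers[i] and numbers[n - 1].
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // Swap the smallest number with numbers[i], rather than shifting
        // everyone over to make room.
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
}
```

Run on 5 2 7 4 1 6 3 0, the first pass swaps 0 with 5, the second swaps 1 with 2, and so on, which mirrors what the volunteers acted out.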

so if we consider then the efficiency of

this algorithm maybe abstracting it

away now as a bunch of doors where the

leftmost again is always zero the

rightmost is always n minus one or

equivalently the second to last is n

minus two the third to last is n minus

three where n might be eight or anything

else
how do we think about or quantify the

running time of selection sort Big O of

what

I mean that was a lot of steps to be

adding up it's probably more than n

right because I went through the list

again and again it was like n plus n

minus 1 plus n minus two any instincts

here

we got like the whole team in the

orchestra now

let me let me propose we think about it

this way with just a bit of math the

first time I had n

different volunteers n was eight in

this case but generically I looked at

all eight numbers in order to decide who

was the smallest and sure enough Celeste

was at the very end she happened to be

all the way to the right but I only knew

that once I looked at all eight or all n

volunteers so that took me n steps first

but once Celeste was swapped into the

right place then my problem was size n

minus one and I had n minus one other

people to look through so that's n minus

one steps then after that it's n minus

two plus n minus 3 plus n minus four

plus dot dot dot until I had one final


step and it's obvious that I only have

one human left to consider so we might

wave our hands at this with a little

Ellipsis and just say dot dot dot plus

one for the final step now what does

this actually equal well this is where

you might think back on like your high

school math or physics textbook that has

a little cheat sheet at the end that

shows these kinds of recurrences that

happens to work out mathematically to be

n times n plus one all divided by two

that's just what that recurrence that

series actually adds up to so if you

take on faith that that math is correct

let's just now multiply this out

mathematically that's N squared

plus n divided by 2 or N squared divided

by 2 plus n over 2 and here's where

we're starting to get annoyingly into

the weeds like honestly as n gets really

large like a million doors or integers

or a billion web pages in Google search

engine honestly which of these terms is

going to matter the most mathematically

if N is a really big number is N squared

divided by 2 the dominant factor or is n

divided by 2 the dominant Factor
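Written out, the tally of steps described above is the following sum. The last line plugs in the eight volunteers as a worked instance.

```latex
\begin{align*}
n + (n-1) + (n-2) + \dots + 1 &= \frac{n(n+1)}{2} \\
                              &= \frac{n^2 + n}{2} \\
                              &= \frac{n^2}{2} + \frac{n}{2} \\
\text{with } n = 8:\quad 8 + 7 + \dots + 1 &= \frac{8 \cdot 9}{2} = 36 = \frac{64}{2} + \frac{8}{2}
\end{align*}
```

With n equal to one million, the first term is 500 billion while the second is only 500 thousand, which gives a sense of how lopsided the two terms become.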

yeah N squared I mean no matter what n

is and the bigger it is the bigger uh


raising it to the power 2 is going to be

so you know what let's just wave our

hands at this because at the end of the

day as n gets really large the dominant

factor is indeed that first one and you

know what even the divided by two as I

claimed earlier with our two phone book

examples where the two straight lines if

you keep zooming out essentially looked

the same when n is large enough let's

just call this on the order of N squared

so that is to say a computer scientist

would describe selection sort as taking on

the order of N squared steps that's an

oversimplification if we really added it

up it's actually this many steps N

squared divided by 2 plus n over two but

again if we want to just be able to

generally compare two algorithms

performance I think it's going to

suffice if we look at that highest order

term to get a sense of what the graph

what the algorithm feels like if you

will or what it even looks like

graphically all right so with that said

we might describe bubble sort as being

in Big O sorry selection sort as being

in Big O of N squared but what if we

consider now the best case scenario an


opportunity to talk about a lower bound

in the best case

how many steps does selection sort take

well here we need some context like what

does it mean to be the best case or the

worst case when it comes to sorting

like what could you imagine meaning the

best possible scenario when you're

trying to sort a bunch of numbers

uh okay the whole crew here again yeah

all right they're already sorted right I

can't really imagine a better scenario

than I have to sort some numbers but

they're already sorted for me but does

this algorithm leverage that fact in

practice even if all of our humans had

lined up from zero to seven I'm pretty

sure I would have pretty naively started

here and yes Celeste happens to be here

but I only know she needs to be here

once I've looked at all eight people and

then I would have realized well that was

a waste of time I can leave Celeste be

but then what would I what would I have

done I would have ignored her position

because we've solved one problem I would

have done the same thing now for seven

people then six people so every time I

walk through I'm not doing much useful

work but I am doing those comparisons


because I don't know until I do the work

that the people were in the right order

so this would seem to imply that the

Omega notation the best case scenario

even a lower bound on the running time

would be what then

a little louder

it's still going to be N squared in fact

because the code I'm giving myself

doesn't leverage or benefit from any of

that scenario because it just mindlessly

continues to do this again and again so

in this case yes I would claim that

the Omega notation for selection sort

is also Omega of N squared so those are

the kinds of numbers to beat it seems

like the upper bound and lower bound of

selection sort are indeed N squared and

so we can also describe selection sort

therefore as being in Theta of N squared
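To check the claim that selection sort gains nothing from already sorted input, the sketch below counts comparisons while sorting. The function name is made up, and it counts only element-to-element comparisons, which comes to n times n minus 1, divided by 2, slightly different from the lecture's tally of looks but still on the order of N squared.

```c
// Selection sort that also reports how many comparisons it performed.
long selection_sort_comparisons(int numbers[], int n)
{
    long comparisons = 0;
    for (int i = 0; i < n; i++)
    {
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            // This comparison happens whether or not the input is sorted.
            comparisons++;
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
    return comparisons;
}
```

For eight numbers the function returns 28 both for 0 1 2 3 4 5 6 7 and for 5 2 7 4 1 6 3 0, since nothing in the loops depends on the order of the input.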

that's the first algorithm we've had the

chance to describe that in which is to

say that it's kind of slow I mean maybe

other algorithms are slower but this

isn't the best starting point can we do

better well there's a reason that I

guided us to doing the second algorithm

second even though you verbally proposed

them in a different order this second


algorithm we did is generally known as

bubble sort and I deliberately use that

word a bit ago saying the big

values are bubbling their way up to the

right to kind of capture the fact that

indeed this algorithm works differently at

least but let's consider if it's better

or worse so here we have pseudo code for

bubble sort you could write this too in

different ways but let's consider what

we did on the stage we repeated the

following n minus 1 times

we initialized at least even though I

didn't verbalize it this way a variable

like I from 0 to n minus two

n minus 2 and then I asked this question

if numbers bracket I and numbers bracket

I plus 1 are out of order

then swap them so again I just did it

more intuitively by pointing but this

would be a way with a bit of pseudo code

to describe what's going on but notice

that I'm doing something a little

differently here I'm iterating from i

equals 0 to n minus 2 why well if I'm

comparing two things left hand and right

hand I'd still want to start at zero but

I don't want to go all the way to n

minus 1 because then I'd be going past

the boundary of my array which would be


bad I want to make sure that my left

hand I if you will stops at n minus 2 so

that when I plus 1 in my pseudo code I'm

looking at the last two elements not the

last element and then past the boundary

that's actually a common programming

mistake that we'll undoubtedly soon make

going beyond the boundaries of your

array so this pseudo code then allows me

to say

compare everyone again and again

and swap them if they're out of order
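A sketch of the bubble sort pseudocode as described, before any optimization. The function name bubble_sort is an assumption. The inner loop deliberately stops at n minus 2 so that numbers[i + 1] never reaches past the end of the array.

```c
// Sort numbers[0] .. numbers[n - 1] in ascending order using bubble sort.
void bubble_sort(int numbers[], int n)
{
    // Repeat n - 1 times: each pass moves the smallest value at most one
    // step to the left, so it may need n - 1 passes to reach the front.
    for (int pass = 0; pass < n - 1; pass++)
    {
        // i is the "left hand"; i + 1 is the "right hand".
        for (int i = 0; i <= n - 2; i++)
        {
            // If the adjacent pair is out of order, swap them.
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
            }
        }
    }
}
```

Letting i run up to n minus 1 instead would compare the last element with whatever happens to sit in memory after the array, which is the boundary mistake mentioned above.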

why do I repeat the whole thing n minus

1 times

like why does it not suffice just to

do this Loop

here

think what happened with Celeste

why do I repeat this whole thing n minus

1 times

yeah in back

[Music]

indeed and I think if I can recap

accurately think back to Celeste again

and I'm sorry to keep calling on you as

our number zero each time through bubble

sort she only moved one step and so in

total if there's n locations at the end

of the day she needs to move n minus one


steps to get 0 all the way to where it

needs to be and so this inner loop if

you will where we're iterating using I

that just fixes some of the problems but

it doesn't fix all of the problems until

we do that same logic again and again

and again and so how might we quantify

the running time of this algorithm well

one way to see it is to just literally

look at the pseudo code the outer loop

repeats n minus one times by definition

it literally says that the inner loop

the for Loop also iterates n minus one

times why because it's going from zero

to n minus two and if that's hard to

think about that's the same thing as 1

to n minus one if you just add one to

both ends of the formula so that that

means you're doing n minus 1 Things N

minus 1 times so I literally multiply

how many times the outer loop is running

by how many times the inner loop is

running which gives me sort of foil

method n minus 1 squared and I could

multiply that whole thing out well let's

consider this just a little more

methodically here if I have n minus 1 on

the outer n minus 1 on the inner let's

go ahead and foil this so N squared

minus n minus n plus 1 combine like


terms N squared minus 2N plus one and

now which of these terms is clearly

going to be dominant so to speak
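The multiplication described above, written out, with the eight volunteers as a worked instance on the second line.

```latex
\begin{align*}
(n-1)(n-1) &= n^2 - n - n + 1 = n^2 - 2n + 1 \\
\text{with } n = 8:\quad 7 \cdot 7 &= 49 = 64 - 16 + 1
\end{align*}
```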

the

the N squared so yes even though minus

2N is a good thing because it's

subtracting off some of the time

required plus one is not that big a

thing they're such drops in the bucket

when n gets really large like in the

millions or billions certainly that

bubble sort too is on the order of N

squared it's not the same exactly as

selection sort but as n gets big

honestly we're barely going to be able

to notice the difference most likely and

so it too might be said to be on the

order of N squared and if we consider

now the lower bound on bubble sort's

running time here's where things get

potentially interesting

um what might you claim is the running

time

of bubble sort in the best case and the

best case I claim is when the numbers

are already sorted

is our pseudo code going to take that

into account

okay n why do you propose n


[Music]

yes and that's the key word to summarize

in bubble sort I do have to minimally

make one pass because if I don't look at

all n elements then I'm theoretically

just guessing if it's sorted or not like

I obviously intuitively have to look at

every element to decide yay or nay it's

in the right order and my original

pseudo code though is pretty naive it's

just going to blindly go back and forth

n times n minus 1 times again and again

and that's going to add up but what if I

add a bit of an optimization that you

might have glimpsed on the slide a

moment ago where if I compare two people

and I don't swap them compare two people

don't swap them and I go all the way

through the list comparing every pair of

adjacent people and I make no swaps it

would be kind of not just naive but

stupid to do that same process again

because if the humans have not moved I'm

not going to make any different

decisions I'm going to do nothing again

nothing again so at that point it would

be stupid very inefficient to go back

and forth and back and forth so if I

modify our pseudo code with just an

additional if condition I bet we can


speed this up inside of that same pseudo

code what if I say hey if no swaps quit

like quit prematurely before the loops

are finished running one of the loops

has gone through per the indentation

here but if I do a loop from left to

right and I have made no swaps which you

can think of as just being one other

variable that's plus plusing as I go

keeping track of how many swaps if I've

made no swaps from left to right I'm not

going to make any swaps the next time

around either so let's just quit at that

point

and that is to say in the best case if

you will when the list is already sorted

the Omega notation for bubble sort might

indeed be Omega of n if you add that

optimization so as to short-circuit all

of that inefficient looping to do it

only as many times

as is necessary
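
Here is a minimal sketch of that optimization in C, assuming the list is an array of ints. The function name bubble_sort, its parameters, and the returned pass count are illustrative choices for this sketch, not code shown in the lecture.

```c
// Bubble sort with the "if no swaps, quit" optimization.
// Sorts values[0..n-1] in ascending order and returns how many
// left-to-right passes were made (returned only to make the early exit visible).
int bubble_sort(int values[], int n)
{
    int passes = 0;

    // After each pass the largest remaining value has bubbled to the right,
    // so the next pass can stop one position earlier.
    for (int end = n - 1; end > 0; end--)
    {
        int swaps = 0; // the extra counter variable described above
        passes++;

        for (int i = 0; i < end; i++)
        {
            if (values[i] > values[i + 1])
            {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
                swaps++;
            }
        }

        // No swaps on this pass means nothing would move next time either.
        if (swaps == 0)
        {
            break;
        }
    }
    return passes;
}
```

On a list that is already sorted, this makes one pass of n minus 1 comparisons and then quits, which is where the Omega of n best case comes from.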

all right let me pause to see if there's

any questions here yeah

good question if the uh running time of

selection sort and bubble sort are both

in Big O of N squared

but selection sort's in Omega of N

squared while bubble sort's in Omega of n

which sounds better I think if I may uh

should we just always use bubble sort

yes if we think that we might benefit

over time from a lot of good case

scenarios or best case scenarios

however the goal at hand in just a bit

is going to be to do even better than

both of these so hold that question

further for a moment yeah

oh my uh no so oh yes good question so I

say Omega of n but it's technically

Omega of n minus one maybe but again

we're throwing away lower order

terms and that's an advantage because

we're not comparing things ever so

precisely just like I plotted with the

green and yellow and red chart I just

want to get a sense of the shape of

these algorithms so that when n gets

really large which of these choices is

going to matter the most at the end of

the day it's actually perfectly

reasonable to use selection sort or

bubble sort if you don't have that much

data because they're going to be pretty

fast my God our computers nowadays are

one gigahertz two gigahertz one billion

things per second one two billion things


per second but if we have large data

sets as we will later in the term and as

you might in the real world at the

Googles of the world then you're going

to want to be

more thoughtful and that's where we're

going today all right so let's actually

see this visualized a little bit in a

moment I'm going to change screens here

to open up what is a little

visualization tool that will give us a

sense of how these things actually work

and look at a faster rate than our

humans were able to do here on stage so

here is another visualization of a bunch

of numbers an array of numbers short

bars means small numbers tall bars mean

big numbers so instead of having the

numbers on their torsos here we just

have bars that are small or tall based

on the magnitude of the number let me go

ahead and I've pre-configured this in

advance to operate somewhat quickly

let's go ahead and do selection sort by

clicking this button and you'll see some

pink bars flying by and that's like me

walking left and right left and right to

select the next smallest number and so

what you'll see happening on the left of


this array of numbers is that if

you will all of the smaller

numbers are appearing on the left while

we continue to solve the remaining

problems to the right
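
As a rough sketch of what the visualization is doing, selection sort might be written in C like this. The function name selection_sort and its parameters are assumptions made for illustration; the lecture shows the algorithm only as pseudocode and animation.

```c
// Selection sort: repeatedly select the smallest remaining value
// and swap it into the next position on the left.
void selection_sort(int values[], int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        // Everything left of i is already in its final place.
        int smallest = i;

        // Walk the rest of the array looking for the smallest value.
        for (int j = i + 1; j < n; j++)
        {
            if (values[j] < values[smallest])
            {
                smallest = j;
            }
        }

        // Move that value to the end of the sorted part on the left.
        int tmp = values[i];
        values[i] = values[smallest];
        values[smallest] = tmp;
    }
}
```

The inner loop always runs to the end of the array, even if the input is already sorted, which is why selection sort is in Omega of N squared as well as Big O of N squared.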

so again we no longer have to touch the

smaller numbers here so that's why the

problem is getting smaller and smaller

and smaller over time but you can notice

now visually look at how many times

we're retracing our steps this is why

things that are N squared tend to be

frowned upon if avoidable because I'm

touching the same elements again and

again when I was walking through I kept

pointing at the same humans again and

again and that adds up so let's see if

bubble sort looks or feels a little

different let me re-randomize the thing

and let me now click bubble sort at the

top and as you might infer there's other

sorting algorithms out there not all of

which we'll look at but here's bubble

sort

same pink coloration but it's doing

something different it's two pink bars

going through again and again comparing

the adjacent numbers and you'll see that

the largest numbers are indeed bubbling

their way up to the right


but the smaller numbers like our number

zero is only slowly making its way

over here's a comparable here's the

number one and it's going to take a

while to get all the way to the left and

here too notice how many times the same

bars are becoming pink how many times

the algorithm is retracing and retracing

its steps why because it's only solving

one problem at a time on each pass and

each time we do that we're stepping

through practically the whole array now

granted I could speed this up even

further if I really wanted to but my God

this is only what like 50 or 60 elements

something like that this is slow like

this is what N squared looks like and

feels like and now I'm just trying to

come up with words to say until we get

to the finish line here like this would

be annoying if this is the speed of

sorting and this is why I sort of

secretly sorted the numbers for Rave in

advance because it would have taken us

an annoying number of steps to get that

in place for her so those two algorithms

are N squared can we do in fact better

well to save the best algorithm for last

let's take a shorter five minute break here


and when we come back we'll do even

better than N squared

all right

so the challenge at hand is to do better

than selection sort and better than

bubble sort and ideally not just

marginally better but fundamentally

better just like in week zero that third

and final divide and conquer algorithm

was sort of fundamentally faster than

the other two so can we do better than

something on the order of N squared well

I bet we can if we start to approach the

problem a little differently the sorts

we've done thus far are generally known as

comparison sorts and that kind of

captures the reality that we were doing

a huge number of comparisons again and

again and you kind of saw that in the

vertical bars that were going pink as

everything was being compared again and

again but there's this programming

technique and it's actually a

mathematical technique known as

recursion that we've actually seen

before and this is a building block or

a mental model we can bring to bear in

the problem to solve the Sorting problem

sort of fundamentally differently but

first let's look at it in a more


familiar context a little bit ago I

proposed this pseudo code for the binary

search algorithm and notice that what

was interesting about this code even

though I didn't call it out at the time

it's kind of cyclically defined like I

claim this is an algorithm for search

and yet it seems a little unfair that

I'm using the verb search inside of the

algorithm for search it's like in

English sort of defining a word by using

the word normally you shouldn't really

get away with that but there's something

interesting about this technique here

because even though this whole thing is

a search algorithm and I'm using my own

algorithm to search the left half or the

right half the key feature here that

doesn't normally happen in English when

you define a word in terms of a word is

that when I search the left half or

search the right half yes I'm doing the

same thing I'm using the same algorithm

but the problem is by definition half as

large so this isn't going to be a

cyclical argument in the same way this

approach by using search within search

is going to whittle the problem down and

down and down until hopefully one door


or no doors remains and so recursion is

a programming technique whereby a

function calls itself and we haven't

seen this yet in C and we haven't seen

this really in scratch but in C you can

have a function call itself and the form

that takes is like literally using the

function's name inside of the function's

implementation itself we've actually

seen an opportunity for this once

before too think back to week zero

here's that same pseudocode for

searching for someone in an actual

physical phone book and notice these

yellow lines here we described those in

week zero as inducing a loop a cycle and

this is a very procedural approach if

you will because lines 8 and 11 are very

mechanically if you will telling me to

go back to line three to do this kind of

looping thing
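
To make the idea of search calling search concrete, here is a hedged sketch in C of a recursive binary search over a sorted array of ints. The function name search, the low and high index parameters, and the bool return type are assumptions for this sketch; the lecture expresses the algorithm only in pseudocode.

```c
#include <stdbool.h>

// Recursive binary search over sorted[low..high] (inclusive).
// Returns true if target is somewhere in that range.
bool search(const int sorted[], int low, int high, int target)
{
    // Base case: no doors left, so the number is not here.
    if (low > high)
    {
        return false;
    }

    int middle = low + (high - low) / 2;

    if (sorted[middle] == target)
    {
        return true;
    }
    else if (target < sorted[middle])
    {
        // Same algorithm, but on the left half only.
        return search(sorted, low, middle - 1, target);
    }
    else
    {
        // Same algorithm, but on the right half only.
        return search(sorted, middle + 1, high, target);
    }
}
```

Each call hands a range that is about half as large to the next call, so the function cannot call itself forever.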

but really what that's doing in the

binary search algorithm for the phone

book is it's just telling me to search

the left half or search the right half

I'm doing it more mechanically again by

sort of telling myself what line number

to go back to but that's equivalent to

just telling myself go search the left

half search the right half the key thing


being the left half and the right half

are smaller than the original problem it

would be a bug if I just said search the

phone book search the phone book search

because obviously you never get anywhere

but if you search the half of the half of the

half the problem gets smaller and smaller so

let's reformulate week zero's phone book

code to be not procedural as here but

recursive whereby in this search

algorithm AKA binary search formerly

called divide and conquer I'm going to

literally use also the keyword search

here notice among the benefits of doing

this is it kind of tightens the code up

makes it a little more succinct even

though that's kind of a fringe benefit

here but it's an elegant way too of

describing a problem by just

having a function use itself to solve a

smaller puzzle at hand so let's now

consider a familiar problem a smaller

version the one you've dabbled with this

sort of pyramid this half pyramid from

Mario and let's throw away the parts

that aren't that interesting and just

consider how we might up until now

implement this in C code this

left-aligned pyramid if you will let me


go over here and let me create a file

called how about

um

iteration dot C and in this file I'm

going to go ahead and include cs50.h and

I'm going to include

stdio.h and the goal at hand is to

implement in C a little program that

just prints out this and exactly this

pyramid so no get string or any of that

we're just going to keep it simple and

print exactly this pyramid of height 4

here so how might I do this well let me

go ahead and in main let me first ask

the user for

um well we'll go ahead and generalize it

let's go ahead and ask the user for

height using get_int as before

and I'll store that in a variable called

height and then let me go ahead and

simply call a function draw passing in

that height so for the moment let me

assume that someone somewhere has

implemented a draw function and this

then is the entirety of my program all

right unfortunately C does not come with

a draw function so let me go ahead and

invent one it doesn't need to return a

value it just needs to print something

a so-called side effect so I'm going to

define a function called draw that takes

as input an int I'll call it n for

number but I could call it anything I

want and inside of this I'm going to go

ahead and print out a left aligned

pyramid like this from top to bottom the

Salient features here are that this is a

pyramid at least in this example of

height four and on height four the first

row has one brick the second row has two

the third has three the fourth has four

that's a nice pattern that I can

probably represent in code so how might

I do this well how about for int i gets

let me do it the old school way one and

then i is less than or equal to n and

then i plus plus so I'm going from one

to four just to keep myself sane here

and then inside of this Loop what do I

want to do well let me let me keep it

conventional in fact let me just change

this to be the more conventional 0 to n

even though it might not be as intuitive

because now on row 0 I want one brick on

Row one I want two bricks dot dot dot on

Row three I want four so it's kind of

offset now but I'm being more

conventional so on each row how many

bricks do I want to print well I think I


want to do this for int j for instance

it's common to use j after i if you

have a nested loop let's start j at zero

and do this so long as j is less than

i plus 1 and then do j plus plus so why i

plus one well again when i equals zero

that's the first row and I want one

brick when i equals one that's the

second row I want two bricks dot dot dot

when i is 3 I want four bricks so again

I have to add one to i to get the total

number of bricks that I want to print on

the screen so inside of this nested for

loop I'm going to do printf of a hash

with no new line I'm going to

save the new line for about here instead

all right the last thing I'm going to do

is copy and paste the Prototype at the

top of the file so that I can call this

and again this is sort of now week one

week two wouldn't necessarily come to

your mind as quickly as it might to mine

after all this practice but this is

something reminiscent of what you

yourselves did already for Mario

printing out a pyramid that hopefully in

a moment is going to look like this so

let me go back to my code let me run

make iteration
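
For reference, the draw function dictated above might look roughly like this in C. This sketch leaves out main so that it does not depend on the CS50 library; in the lecture's file, main reads a height with get_int and then calls draw with it.

```c
#include <stdio.h>

// Prototype, as copied to the top of the file in the lecture.
void draw(int n);

// Iterative version: prints a left-aligned pyramid of height n.
void draw(int n)
{
    // One iteration per row, counting conventionally from 0.
    for (int i = 0; i < n; i++)
    {
        // Row i needs i + 1 bricks.
        for (int j = 0; j < i + 1; j++)
        {
            printf("#");
        }

        // The new line is saved for the end of each row.
        printf("\n");
    }
}
```

Calling draw with 4 prints rows of one, two, three, and four hashes.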

and let me dot slash iteration I'll type


in four and voila seems to be correct

and let's assume it's going to work for

other inputs as well oh thank you

so this is indeed an example of

iteration doing something again and

again and it's very procedural like I

literally have a function called Draw

that does this thing but I can think

about implementing draw in a somewhat

different way that's kind of clever and

it's not strictly necessary for this

problem because this problem honestly is

not that complicated to solve once you

have practice under your belt certainly

the first time around probably

significantly challenging but now that

you kind of associate okay Row one with

one brick row two with two bricks it

kind of comes together with these for

Loops but how else could we think about

this problem well this physical

structure these bricks in some sense is

a recursive structure a structure that's

defined in terms of itself now what do I

mean by that well if I were to ask you

the question what does a pyramid

of height 4 look like you would point of

course to this picture but you could


also kind of

um

you know cleverly say to me well it's

actually a pyramid of height three plus

one additional row and here's that

cyclical argument right kind of

obnoxious to do typically in English or

in a spoken language because you're

defining one thing in terms of itself

what's a pyramid of height four well

it's a pyramid of whoops it's a pyramid

of height three plus one more row but we

can kind of leverage this logic in code

well what's a pyramid of height three

well it's a pyramid of height two plus one

more row fine what's a pyramid of height

two well it's a pyramid of height one

plus one more row and then hopefully

this process ends and it does because

notice the pyramid is getting smaller

and smaller so you're not going to sort

of have this sort of silly back and

forth with me infinitely many times

because when we finally get to the base

case the end of the pyramid fine what is

a pyramid of height one well it's a

pyramid of no height plus one more row

and at that point things just get

Negative no pun intended things just

would otherwise go negative and so you


can just kind of stop the base case is

when there is no more pyramid so there's

a way to sort of draw a line in the sand

and say stop no more arguments but this

idea of defining a physical structure in

terms of itself or code in terms of

itself actually lets us do some

interesting new algorithms let me go

back to my code here let me go ahead and

create one final file here called

recursion.c that leverages this idea of

this built-in

self-referential nature let me include

cs50.h let me go ahead and include

stdio.h int main void and then

inside of main I'm going to do the exact

same thing int height equals get int

asking the user for height and then I'm

going to go ahead and call draw passing

in height so that's going to stay the

same I even am going to make my

prototype the same void draw int and

semicolon and now I'm going to implement

draw down here with that same prototype

of course but the code now is going to

be a little different

what am I going to do here well first of

all if you ask me to draw a pyramid of

height n I'm going to be kind of a you


know wise ass here and say well just

draw a pyramid of n minus 1. done all

right but there's still a little more

work to be done what happens after I

print or draw a pyramid of height n

minus one according to our structural

definition a moment ago

what remains after drawing a pyramid of

height n minus one or three specifically

we need one more row of hashes okay so I

can do that right I'm okay with the

single loops there's no nesting

necessary here I'm just going to do this

for int i gets zero i is less than n

which is the height that's passed in i

plus plus and then inside of this loop

I'm very simply going to print out a

single hash and then down here I'm going

to print out a new line at the very end

so that's good right I might not be as

comfortable with nested Loops this is

nice and simple what does this Loop do

here on line 17 through 20 it literally

prints n hashes by counting from i

equals zero on up to but not through n

so that's sort of week one style syntax

but this is kind of trippy now because

I've somehow boiled down the

implementation of draw into printing a

row after just drawing the thing above


it but this is problematic as is because

in this case my draw function notice is

always going to call the draw function

forever in some sense but ideally when

do I want this cyclical process to stop

when do I want to not call draw anymore

yeah

when n is one right when I get to the

top of the pyramid when n is one or heck

when the pyramid is all gone and n

equals zero I can pick any line in the

sand so long as it's sort of at the end

of the process then I don't want to call

draw anymore so maybe what I should do

is this if

if n equals equals zero there's really

nothing to draw so I'm just going to go

ahead and return

like this otherwise I'm going to go

ahead and Draw

N minus 1 rows and then one more row and

I could express this differently I could

do something like this which would be

equivalent I could say something like if

n is greater than or equal to zero then

go ahead and draw the row but I like it

this way first for now I'm going to go

with the original way just to ask a

simple question and then just bail out


of the function if n equals 0 and heck

just to be super safe just in case the

user types in a negative number let me

also just check if N is a negative

number also just return immediately

don't do anything I'm not returning a

value because again the function is void

it doesn't need or have a return value

so just saying return suffices but if n

equals one or two or three or anything

higher it is reasonable to draw a

pyramid of slightly shorter height like

instead of four three and then go ahead

and print one more row so this is an

example now of code that calls itself

within itself draw is calling draw but

this so-called base case ensures this

conditional ensures that we're not going

to do this forever otherwise we

literally would do this infinitely many

times and something bad is probably

going to happen
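
Put together, the recursive draw described above might look like the following sketch. It folds the n equals zero check and the negative check into a single condition, which is a small liberty taken here, and it omits main, which in the lecture's file reads a height with get_int from the CS50 library and calls draw.

```c
#include <stdio.h>

// Prototype, the same as in the iterative version.
void draw(int n);

// Recursive version: a pyramid of height n is a pyramid of
// height n - 1 plus one more row of n bricks.
void draw(int n)
{
    // Base case: nothing to draw, so stop calling draw.
    if (n <= 0)
    {
        return;
    }

    // First draw the smaller pyramid that sits above this row.
    draw(n - 1);

    // Then print one more row of n hashes.
    for (int i = 0; i < n; i++)
    {
        printf("#");
    }
    printf("\n");
}
```

Because every call passes a smaller number, the calls eventually reach the base case and return one by one, printing the rows from top to bottom.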

all right let me go ahead and compile

this code make recursion

okay no syntax errors dot slash

recursion enter height of 4 and

voila if only because some of you have

run into this issue accidentally already

let me get rid of the base case here and

let me recompile the code make recursion


oh and actually now it's actually

catching it so the compiler is smart

enough here to realize that

all paths through this function will

call itself AKA it's going to loop

forever so let me do the first thing

suppose I only check for n equaling 0.

let me go ahead and recompile this code

with make recursion

and now let me just be kind of

uncooperative when I run this program

still works for four still works for

zero what if I do like negative 100

have any of you experienced a

segmentation fault or core dump okay so

no shame in this like this means I have

somehow touched memory that I shouldn't

have and in short I actually called this

function thousands of times accidentally

it would seem now until the program just

bailed on me because I eventually

touched memory in the computer that I

shouldn't have that'll make even more

sense next week but for now it's simply

a bug and I can avoid that bug in this

context probably not your own pset

context by just making sure we don't

even allow for negative numbers at all

so with this building block in place


what can we now do in terms of those

same numbers to sort well it turns out

there's a sorting algorithm called merge

sort and there's bunches of others too

but merge sort is a nice one to discuss

because it fundamentally we hope is

going to do better than selection sort

and bubble sort that is better than N

squared

but the catch is it's a little harder to

think about in fact I'll act it out

myself with just these numbers on the

Shelf here rather than humans because

recursion in general takes a little bit

of effort to wrap your mind around

typically a bit of practice but I'll see

if we can't walk through it methodically

enough such that this comes to light so

here's the pseudo code I propose for

this algorithm called merge sort in the

spirit of recursion this sorting

algorithm literally calls itself by

using the verb sort in its pseudo code

so how does merge sort work it sort of

obnoxiously says well if you want to

sort all of these things go sort the

left half then go sort the right half

and then merge the two together now

obnoxious in what sense well if I just

asked you to sort something and you just


tell me we'll go sort that thing and

then go sort that thing what was the

point of asking you in the first place

but the key is that each of these lines

is sorting a smaller piece of the

problem So eventually we'll be able to

Pare this down into something that

doesn't go on forever because in fact in

merge sort there's a base case too

there's a scenario where we just check

wait a minute if there's only one number

to sort that's it quit then because

you're all done so there has to be this

base case in any use of recursion to

make sure that you don't mindlessly call

yourself forever you've got to stop at

some point

so let's focus on the third of these

steps what does it mean to merge two

lists or two halves of the list just

because this is apparently going to be a

key ingredient so here for instance are

two halves of a list of size eight we

have the numbers two and I'll call it

out if you're at a bad angle two four

five seven and zero one three six notice

that the left half at the moment two

four five seven is already sorted and

the right half zero one three six is


also sorted as well so that's a good

thing because it means that

theoretically I've sorted the left half

already I've sorted the right half

already before we began I just need to

merge these two halves what does it mean

to merge two halves well for the sake of

discussion I'm just going to turn over

most of the numbers except for the first

numbers in each of these halves there's

two halves here left and right at the

moment I'm only going to consider the

leftmost element of each half that is

the one on the left here and the one on

the left here how do I merge these two

lists together well if I look at two and

I look at zero which one should

presumably come first the smaller one so

I'm going to grab the zero and I'm going

to put it into its own place on this new

shelf here and now I'm going to consider

as part of my

iteration

the beginning of this list and the new

beginning of this list so I'm now

comparing two and one which one's

smaller I'm going to go ahead and grab

the one now I'm going to compare the

beginning of the left list and the new

beginning of the right list two and


three of course it's two

now I'm going to compare the beginning

of the left list and the beginning of

the right list four and three it's of

course three

now I'm going to compare the four

against the beginning and end it turns

out of the second list four of course

now I'm going to compare the beginning

of the left list and the beginning of

the right list five of course

I'm realizing this

is not gonna end well because I left too

much distance between the numbers but

that has nothing to do with the

algorithm 7 is the beginning of the left

list six is the beginning of the right

list it's of course six and at the risk

of knocking all of these over

if I now make room for this element

we have hopefully

sorted the whole thing by having merged

together the two halves of the list so

in short thank you
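
The merging step just acted out can be sketched in C as follows, assuming the two halves are separate sorted arrays and the new shelf is a third array with room for everything. The function name merge and its parameters are hypothetical, chosen only for this illustration.

```c
// Merges two already sorted arrays into out, which must have
// room for left_n + right_n ints (the "new shelf").
void merge(const int left[], int left_n,
           const int right[], int right_n, int out[])
{
    int i = 0; // finger on the beginning of the left list
    int j = 0; // finger on the beginning of the right list
    int k = 0; // next free spot on the new shelf

    // Compare the fronts of both lists and take the smaller one.
    while (i < left_n && j < right_n)
    {
        if (left[i] <= right[j])
        {
            out[k++] = left[i++];
        }
        else
        {
            out[k++] = right[j++];
        }
    }

    // One list has run out, so copy whatever remains of the other.
    while (i < left_n)
    {
        out[k++] = left[i++];
    }
    while (j < right_n)
    {
        out[k++] = right[j++];
    }
}
```

With two four seven five's sorted half, that is 2 4 5 7, and 0 1 3 6 as inputs, each number is touched once on its way to the new shelf, so merging n numbers takes on the order of n steps.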

I'm a little worried that's just getting

sarcastic now but

we now have merged two half lists we

haven't done the guts of the


algorithm yet sort the left half and

sort the right half but I claim that

that is how mechanically you merge two

sorted halves you keep looking at the

beginning of each list and you just kind

of weave them together based on which

one belongs first based on its size so

if you agree that that was a reasonable

way to merge two lists together let's go

ahead and focus lastly on what it means

to actually sort the left half and sort

the right half of a whole bunch of

numbers and for this I'm going to go

ahead and order them in this seemingly

random order and I just have a little

cheat sheet above so that I don't mess

up and I'm going to start at the very

top this time and hopefully these will

not fall down at any point but I'm just

deliberately putting them in this random

order

five two seven four and then we have

one six three zero

hopefully this won't fall over

here is now an array of size eight with

eight integers and I want to sort this I

could use selection sort and just go

back and forth and back and forth I

could use bubble sort and just compare

pairs pairs pairs but those are going to


be on the order of Big O of N squared my

hope is to do fundamentally better here

so let's see if we can do better all

right so let me look now at my code I'll

keep it on the screen

how do I Implement merge sort well if

there's only one number I quit there's

obviously not there's eight numbers so

that's not applicable I'm going to go

ahead and sort the left half of numbers

all right here's the left half five two

seven four how do I sort an array of

size four

well here's where the recursion kicks

in how do you sort a list of size four

well there's the pseudo code on the

board I sort the left half of the

list of size four so here we go

I have a list of size four how do I sort

it I sort the left half all right now I

have a list of size two how do I sort

this

well sort the left half so here we go

here's a list of size one how do I sort

this

oh I think it's done right that's quit

right if only one number I'm done the

five is sorted all right what was the

next step you have to now rewind in time


I just sorted the left half of the left

half of the left half what do I now sort

the right half

which is two this is one Element so I'm

done so now at this point in the story I

have sorted sort of idiotically the five

is sorted and the two is sorted but

what's the third and final step of this

phase of the algorithm

merge the two together so here's the

left here's the right list how do I

merge these together I compare the lists

and I put the two there I only have the

five left and I do that so now we see

some visible progress but again let's

rewind how did we get here we started to

sort the left half of the left half of

the left half then the right half and

now where are we we've just sorted the

left half of the left half

so what comes after sorting the left

half of anything

right half all right here's the sort of

same nonsensical thing here's a list of

size two let's sort the left half done

let's sort the right half done what's

the third step merge them together so

that's the four and that's the seven

what have I now done in total I've now

sorted the left half


of the original thing so what happens

next

wait a minute wait a minute I have not

done that what have I done I have sorted

the left half of the left half and I've

sorted the right half of the left half

what do I now need to do lastly merge

those two lists together so again I put

my finger on the beginning of this list

the beginning of this list and if you

want I'll do the same thing when I

merged last time to be clear what I'm

comparing two and four the two obviously

comes first what comes next

well the four comes next what comes next

the five comes next and then lastly of

course the seven notice that the two

four five seven are now sorted so the

original left half is sorted and I'll do

the rest a little faster because my God

this feels like it takes forever but I

bet we're on to something here what step

remains next I've just sorted the left

half of the original sort the right half

of the original how do I sort this I

sort the left half of the right half how

do I sort this I sort the left half of

the left half done I sort the right half

of the left half done now I merge the


two together the one comes first the six

comes next now I sort the right half of

the right half

what do I do sort the left half done

sort the right half done what do I do

merge them together so that's the third

step of that phase now where are we in

the story oh my God where are we in the

story we have sorted the left half

of the right half and the right half of

the right half what comes next merge so

I'm going to compare and I'm going to

move those down just to make clear what

I'm comparing the beginning of both sub

lists what comes first of course the

zero what comes next

what comes next uh the one

what comes next the three

and then lastly comes the six all right

where are we in the story we've now

sorted the left half of the original and

the right half of the original what step

remains

merge all right so I'm gonna make the

same point and this is actually

literally what we did earlier because I

deliberately demoed those original

numbers in this order two and a zero

this comes out first what comes next two

and one the one comes out next what


comes next the two comes next what comes

next the three comes next what comes

next the

four

what comes after that the five

what comes after that the six and lastly

this is when we run out of memory the

seven over there is actually in place

okay
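
The whole walkthrough can be sketched in C along these lines. This is one possible way to write it, not the lecture's code: the names merge_sort, sort_range, and merge_halves are made up for this sketch, and the spare shelf is assumed to be a scratch array, supplied by the caller, that is as large as the input.

```c
#include <string.h>

// Merges the sorted ranges values[low..middle-1] and
// values[middle..high-1] via scratch, then copies the result back.
static void merge_halves(int values[], int scratch[],
                         int low, int middle, int high)
{
    int i = low, j = middle, k = low;

    while (i < middle && j < high)
    {
        scratch[k++] = (values[i] <= values[j]) ? values[i++] : values[j++];
    }
    while (i < middle)
    {
        scratch[k++] = values[i++];
    }
    while (j < high)
    {
        scratch[k++] = values[j++];
    }
    memcpy(&values[low], &scratch[low], (size_t) (high - low) * sizeof(int));
}

// Sorts values[low..high-1].
static void sort_range(int values[], int scratch[], int low, int high)
{
    // Base case: if only one number (or none), quit.
    if (high - low <= 1)
    {
        return;
    }

    int middle = low + (high - low) / 2;
    sort_range(values, scratch, low, middle);   // sort left half
    sort_range(values, scratch, middle, high);  // sort right half
    merge_halves(values, scratch, low, middle, high); // merge sorted halves
}

// Sorts values[0..n-1]; scratch must have room for n ints.
void merge_sort(int values[], int scratch[], int n)
{
    sort_range(values, scratch, 0, n);
}
```

The three lines inside sort_range correspond one for one to the pseudocode: sort the left half, sort the right half, merge the sorted halves.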

okay so admittedly a little harder to

explain and honestly gets a little

trippy because it's so easy to forget

about like where you are in the story

because we're constantly like diving

into the algorithm and then backing back

out of it but in code we could express

this pretty correctly and it turns out

pretty efficiently because what I was

doing even though it's longer when I do

it verbally I was touching these

elements a minimal number of times right

I wasn't going back and forth back and

forth in front of the Shelf again and

again I was deliberately only ever

merging the smallest elements in each

list so every time we merged even though

I was doing it quickly my fingers were

only touching each of the elements once

and how many times did we


divide divide divide in half the list

well we started with all of the elements

here and there were eight of them and

then we moved them one two three

positions so the height of this

visualization if you will is actually

log n right if I started with eight

turns out if you do the arithmetic this

is log n height

because 2 to the 3 is 8 but for now just

trust that this is a log n height and

how wide is the shelf well it's of width n

because there's n elements anytime they

were on the Shelf so technically I was

kind of cheating this algorithm because

this is the first time I've needed

shelves right with the human examples we

just have the humans and that's it and

only eight of them here I was sort of

using more and more memory in fact I was

using like four times as much memory

even though that was just for

visualization's sake merge sort actually

requires that you have some spare space

an empty array to move the elements into

when you're merging them together but if

I really wanted and if I didn't have

this shelf or this shelf honestly I

could have just gone back and forth

between the two shelves that would have


been sufficient so merge sort uses more

memory for this merging process but the

advantage of using more memory is that

the total running time if you can

perhaps infer from that math is what the

Big O notation for merge sort it turns

out is actually going to be n Times log

n and even if you're a little rusty

still on your logarithms we saw in week

zero and again today that log n is

smaller than n right that's a good thing

and binary search was log n that's

faster than linear search which was n so

n Times log n is of course smaller than

n times n or N squared so it's sort of

lower on this little cheat sheet that

I've been drawing which is to suggest

that its running time is indeed better

or faster and in fact if we consider the

best case running time turns out it's

not quite as good as bubble sort with

Omega of n where you can just sort of

abort if you realize wait a minute I've

done no work merge sort you actually

have to do that work to get to the

Finish Line anyway so it's actually in

Omega and ultimately Theta of n log n as

well so again a trade-off there because

if you happen to have a data set that is


very often sorted honestly you might

want to stick with bubble sort but in

the general case where the data is

unsorted n log n is sounding better than

N squared well what does it actually

look or feel like give me a moment to

just change over to our visualization

here and we'll see with this example

what merge sort looks like depicted

within now these vertical bars so same

algorithm but instead of my numbers on

shelves here is a random array of

numbers being sorted

and you can see it being done half at a

time and you see sort of remnants of

algorithm of the previous bars

actually that was unfair let me zoom out

here

let me zoom out so you can actually see

the height here

let me go ahead and randomize this again

and run merge sort there we go now you

can see the second array

and where the values are going

temporarily

and even though this one looks way more

cryptic visualization wise it does seem

to be moving faster and it seems to be

merging halves together and boom it's

done let's actually see in conclusion


what these algorithms compare to and

consider that moving forward as we write

more and more code the goal is again not

just to be correct but to be well

designed and one measure of design is

going to indeed be efficiency so here we

have in final a visualization of three

algorithms selection sort bubble sort

and merge sort from top to bottom and

let's see what these algorithms might

look or sound like here if we can dim

the lights for dramatic effect

selections on top bubble on bottom merge

in the middle

[Applause]

[Music]

well this is cs50 and already this is

week four and recall that last week week


three we began to explore like the

inside of a computer's memory a bit more

we talked about arrays which were just

chunks of memory back to back to back

that really laid things out left to

right top to bottom and this is actually

a pretty common Paradigm even if you're

new to programming and certainly new to

C you've kind of seen this approach of

just using uh memory in some way to lay

things out like images for instance so

for instance here is a photo taken of

last week's front row for instance

and this is kind of an opportunity to

explore exactly what happens if we start

to zoom in and zoom in and zoom in

because it seems like most any TV show

like CSI whatever or any movie that

explores forensic information might have

the investigators sort of zoom in on an

image like this to see what the glint in

someone's eye is because that reveals

like the license plate number of someone

that just drove past right something

that's a little over the top there but

there's an opportunity here to speak to

why that is so unrealistic for instance

let's zoom in on this puppet's eye and

let's zoom in a little more to see what

might be reflected let's zoom in a


little more and that's it there's only a

finite amount of information if you have

an image represented in this way using

pixels these dots on the screen in rows

and columns because if you're only using

a finite amount of memory then at the

end of the day you can only store a

finite amount of information and at

least I don't really see in this grid

here any glint of a license plate or

something like that that you might

otherwise see in in Hollywood so today

we'll explore sort of these kinds of

representations of how you might use

memory in new and interesting ways to

represent now very familiar things but

also start to explore what some of the

limitations are of this representation

but consider after all that this doesn't

need to be even as high resolution as

many pixels as something like this other

image you can imagine just doing

something silly with Post-it notes like

this and if you think of an image it's

just having rows and columns these rows

otherwise known as scan lines something

we'll explore in the coming week you could

make this sort of fun smiley face by

just using two different values maybe a


zero and a one or yellow and purple or

vice versa just to kind of make

something come to life now in practice

recall we talked about storing not just

a zero or one but maybe an r a g and a b

value like 24 bits or three uh bytes in

total but we'll come back to that that

would just be a more involved image but

if for fun today you sort of want to

um

tackle something passively in the

background if you go to uh this URL here

we've put together an opportunity to do

a sort of a bit of pixel art if you go

to this URL here that'll redirect you to

a Google spreadsheet if you have a

laptop with you today that'll look a

little something like this which we've

sort of organized in rows and columns so

if you'd like to go ahead and use Google

spreadsheets colorization fill feature

to color in those individual squares if

you'd like to see if you can't make

something a little creative and then

email it to Carter and we'll exhibit

some of the the best or favorites on the

website thereafter so let's transition

then to something a little more familiar

images and not all of you have used

presumably Photoshop but you're probably


generally familiar with Photoshop as a

program for editing and creating images

or photos or the like and here is a

screenshot of photoshop's Color Picker

via which you can change like what color

you're going to draw with the paintbrush

or what color you're going to fill in

with the paint bucket it's

representative of any kind of graphical

tool and there's kind of a lot of

information in here but there's perhaps

some familiar terms now r g and B in

fact right now this is Photoshop's way

of saying you're about to fill in your

background or foreground with the color

black and that appears to be represented

with an r a g and a b value of zero zero

zero or alternatively using a hash

symbol and then zero zero zero zero zero

zero and if some of you have already

made web pages before and you know a

little bit of HTML and CSS you probably

are familiar with this kind of syntax

like a hash symbol and then six or

sometimes three digits thereafter and if

we look at a few different colors here

for instance here might be the

representation of white now the r the G

and the B values went way up from 0 to


255 255 255 or alternatively it looks

like Photoshop and in turn web browsers

could represent that same color white

with ffffff and let's just do a few others

here is red and it turns out that red is

a whole lot of red 255 but no green no

blue AKA

ff0000 so there's perhaps a pattern here

emerging here is green 0 255 0

AKA 00ff00 or lastly here blue which is

no red no green but apparently a lot of

blue 255 again AKA 0000ff

now some of you again might have seen

this notation before these zeros

in these F's and all of the numbers and

letters in between but this is another

form of notation and in fact we'll

explore this today really is just a

precondition for talking about some

other concepts but the ideas

ultimately are really no different what

we're about to see is a different base

system not just binary not just decimal

but something we're about to call

hexadecimal but first recall that with

RGB we previously did the

following any RGB value red green blue

just combines some amount of red or

green or blue so here we have 72 73 33

which in the context of an email or a


text of course said

what a couple weeks back

just HI with an exclamation point but

in the context of a Photoshop like

program this might instead be

representing collectively this shade of

yellow for instance when you combine

that much red that much green that much

blue so here is the same idea if you've

got a lot of red no green no blue

together that's going to give us red if

you've got no red a lot of green no blue

that's going to give us of course green

if you've got no red no green a lot of

blue that of course is going to give us

blue so there's a pattern emerging here

where apparently 0 0 is none as always

and FF is apparently a lot and it's

maybe somehow equated with 255 at least

per that photoshop screenshot meanwhile

if we combine one last one a lot of red

a lot of green a lot of blue that's

actually going to give us a single white

pixel like this all right so think back

here was binary in the world of binary

you had just two digits zero and one

could have been anything else A or B X

or Y but the world standardized on these

numerals zero and one in our


world's Decimal System of course you

have zero through nine as of today

though we're going to start using

hexadecimal sometimes in the context of

images and also files just because it's

a convention there's some conveniences

to it where now you're going to be able

to count up to f in a notation called

hexadecimal from zero through nine then

you keep going to A to B to C to D to e

to F the idea being each of these even

though it's weirdly a letter of the

English alphabet it's still just a

single symbol it's not 1 0 for 10 or 1 1

for 11 all 16 of these values these

digits so to speak are indeed still just

single symbols and that's a

characteristic of just using this other

notational system so how do we get from

zero zero uh and FF to something like

zero and 255 respectively well this

hexadecimal system AKA base 16 just does

the math from week zero and really grade

school a little bit differently for

instance if you have a number that's got

two digits or hexadecimal digits as of

today the columns are just a little

different instead of powers of two or

powers of 10 which we saw for binary

and decimal respectively it's powers of

16. so if we just kind of do the math

out that's the ones column this is the

16s column and so forth things get

actually pretty big pretty quickly in

this system but now let's just consider

how we would represent familiar numbers

if you've got two hexadecimal digits for

which these hashes are just placeholders

0 0 is going to mathematically equal the

decimal number you and I know of course

as zero why same thing as week zero

sixteen times zero plus one times zero

is the number you and I know is zero and

we can count up from here this in

hexadecimal would be how a computer

represents the number we know is one it

would be zero one in this case this

would be two three four five six seven

eight nine in decimal we're about to go

to ten but in hexadecimal to be clear

what comes next

so apparently A so 0A 0B which are now

10 and 11 then 12 13 14 15. so using

hexadecimal is just kind of an

interesting way of using single symbols

now zero through F to count from 0

through 15 and we'll see why it's 15 in

a moment but as soon as we get to F

anyone want to conjecture how in


hexadecimal AKA hex do we now count up

one position higher what comes after

zero f in hexadecimal

so one zero it's the same kind of thing

like once you're at the highest digit

possible F or in our decimal world it

would have been nine you sort of add one

more and nine wraps around to zero or in

this case F wraps around to zero you

carry the one and voila now we're

representing the number you and I know as

16 and we could keep going forever

literally this could be 17 18 19 20 in

decimal but let's just wave our hands at

it and count as high as we can dot dot

dot the highest we could count in

hexadecimal with two digits just

logically would be what in hexadecimal

something something

FF I heard so yes that's the biggest

digit possible so FF is what we have so

how high can you count in hexadecimal if

you've got just two of these digits well

it's the same math as always 16 times F

AKA 15. so that's 16 times 15 plus 1

times F or 1 times 15. that gives us 240

plus 15 in decimal the result of which

of course now is 255.

so this hexadecimal system you may have

seen in the world of web pages and if


you haven't we'll get to that in this

class in a few weeks or we just saw in

the context of Photoshop just has this

sort of shorthand notation of counting

as high as 255 but just calling it FF

now it's marginal but that's roughly a

one-third savings in how many digits you need in

order to count as high as 255 because in

decimal of course 255 is three digits in

hexadecimal you can count as high using

just two and that difference is going to

get magnified the bigger our numbers get

let me stipulate for now you're going to

get more and more Savings in terms of

just how many symbols you need on the

screen to represent

bigger and bigger numbers than that all

right let me pause here just to see if

there's any questions thus far on what

we've called hexadecimal which again

just gives us zero through nine as well

as a

through f

any questions or confusion

and if it feels like we're lingering a

bit much on arithmetic we're not really

going to see other notations besides

this moving forward these are sort of

the go-to three in a programmer's world


typically

but there are some others yeah

good question does hexadecimal require

more storage or less storage than a

decimal system theoretically no because

this is just a way of representing

information and we'll see in a concrete

example in a moment but inside of the

computer at the end of the day you're

still storing bits and using

hexadecimal is not using more or fewer

bits think of this as how you might

write it down on a piece of paper just

how many digits you're going to write or

on a computer screen how many digits

you're going to see at once but it

doesn't change how the

computer's representing information

because all they're representing at the

end of the day is zeros and ones so in

fact let's go there if this a moment ago

FF I claim was 255 let's just rewind to

week zero and if we wanted to count to

255 in binary that's as high as you can

count recall with eight bits and there's

only a few of these numbers that are

useful to memorize like 255 is as high

as you can count with eight bits if you

start at zero because two to the eighth

is 256 but if you start at zero it's


zero through 255. so in binary recall if

you have eight bits all of which were

ones and I won't do out the math

pedantically here but if I do do this

plus this plus this dot dot that's also

going to give me 255. so this is what's

interesting here about hexadecimal it

turns out that an upside of storing

values in hexadecimal is that we're

going to see the first F sort of

represents the left half of all these

bits and the second F in this case

represents the right most four of these

bits so it turns out hexadecimal is very

useful when you want to treat data in

units of four it's not quite eight but

units of four and that's not bad which

is why if you use two digits like I have

thus far zero zero or FF or anything in

between that's actually a convenient way

of representing eight bits in total one

hex digit for the first four bits one hex

digit for the second and again there's

nothing new intellectually here per se

it's just a different way of

representing the same story as before

zeros and ones so in what context do we

see this well we talked about memory

last week we're going to talk more about


it this week if this is my computer's

Ram Random Access Memory you can again

think of each byte as having a number

associated with it like its address or

location this might be zero this might

be like 2 billion and so in the past

I've described these just using

decimal numbers here's byte zero one two

three four five six seven fifteen

sixteen would be here and so forth but

it turns out in the world of memory and

thus today programming people tend to

count memory bytes using hexadecimal

partly just by convention but also

partly because it's a little more

succinct and again each digit represents

four bits typically so what comes after

F here well if I think about the

computer's memory I normally might do

after F which is 15 16 but instead one

zero one one one two one three this is

not 10 11 12 13 because I claim I'm in

the context of hexadecimal now as per

the previous slide we already started

going into A's through F's so you

immediately see here a possible problem

like why is this now worrisome if all of

a sudden you're seeing seemingly

familiar numbers like ones 10 11 12 13.

we didn't really stumble across this


problem when it was like all zeros and

ones before yeah

yeah so if you're writing some code in C

that's doing some math you might

accidentally or the computer might

accidentally confuse hexadecimal with

decimal if they look in some context the

same I mean any number on the board that

doesn't have a letter is Ambiguously

hexadecimal or decimal at this point and

so how might we resolve this well it

turns out that what computers typically

do is this by convention anytime you see

visually 0x and then a number that's a

human convention of signaling to

the reader that this is in fact a

hexadecimal number so if it's 0x 1 0

that is not the number 10 that is the

hexadecimal number one zero which recall

we said earlier is how you count up to

16 and again these are not the

kinds of things to memorize it's really

just the system for how you think about

these things so hence for today we're

going to start seeing hexadecimal in a

bunch of contexts when you write code

you might even write code using some

hexadecimal but again it's just a

different way of representing numbers


and humans have different conventions

for different contexts

all right so with that said any

questions now on this building block from

here on out we'll start using it in some

actual code

any questions nothing so far all right

so let's go ahead and consider maybe a

familiar example something

involving code where I initialize a

variable like n to a value like 50 in

this case and then let's start to Tinker

around with what's going on inside of

the computer's memory in a moment I'm

going to load up vs code on my computer

and I'm going to go ahead and whip up a

program that very simply assigns a value

like the number 50 to a variable called

n but today keep in mind that that

variable n and that value 50 is going to

be stored somewhere in my computer's

memory and it turns out today we'll

introduce a bit more syntax you can

actually see where things are being

stored so let me click over to VS Code

here I'm going to create a program

called address dot C just to explore

computers addresses today and I'm going

to do an include standard io.h int main

void as usual no command line Arguments


for now I'm going to declare that

variable n equals 50. and then I'm just

going to go ahead and print it out so

nothing very interesting but I'll use

percent I backslash n and then comma n

to print out that value nothing here

should be very interesting to compile or

run but I'll do it just to make sure I

didn't make any mistakes looks like as

expected it simply prints out the number

50 like this but let's consider

then what this code is doing underneath

the hood when it's actually run on your

machine so here we have that grid of

memory that variable n is an INT and if

you think back how many bytes typically

do we use for an INT

yeah

four so four bytes or 32 bits so if each

of these squares represents one byte

then my computer Somewhere In My Memory

or Ram is using four of these squares

maybe it ends up over here just because

there's other stuff being used Elsewhere

for instance though I don't really know

and frankly I don't really care where it

ends up just that it ends up somewhere

so the value 50 is stored

here in a variable called n even though


I've written it as decimal just like in

my code let me again remind that this is

32 zeros and ones representing that 50

it's just going to be very tedious if we

start writing everything in binary so

I'll use the more comfortable human

Decimal System so that's what's going on

inside of the computer's memory so what

if I actually wanted to start tinkering

with its location or maybe just knowing

its location well this variable n indeed

has a name n that's a label of sorts for

it but at the end of the day that 50 is

technically at a specific address and

I'm going to make one up

0x123 and it's one two three because I

really don't care what it is just want

an address for the sake of discussion so

way over here off screen might be byte

zero way down here is byte 0x123

it's in hexadecimal notation just

by convention

so how can I actually see where my

variables are ending up in memory if I'm

curious to do so well let me go back to

my code here and let me actually change

this just a little bit let me go ahead

and introduce for instance another

symbol here and another topic altogether

namely pointers
so a pointer is a variable that stores

the address of some value the location

of some value or more specifically the

specific byte in which that value is

stored so again if you think of your

memory as being a whole bunch of bytes

zero at top left two billion or whatever

at bottom right depending on how much

RAM you have each of those things has a

location or address a pointer is

just a variable storing one such address

so it turns out that in the world of C

there's a couple of new symbols we can

use if we want to see what it is we're

talking about here and those two

operators as of today are these you can

use the Ampersand operator in C in a

couple of ways we already saw it very

briefly to do Ampersand Ampersand to

kind of AND two Boolean

Expressions together in the context of a

conditional this is different a single

Ampersand is the address of operator so

literally in your code if you've got a

variable like n or anything else and you

write Ampersand n c is going to figure

out for you what is the address of that

variable n in the computer's memory and

it's going to give you a number

otherwise known as the

address of that

if you want to store that address in a

variable even though yes it's a number

like 0x123 you have to tell C in advance

that you want to store not an INT per se

but the address of an INT and the Syntax

for doing that somewhat not obviously is

to use an asterisk here a star operator

and you say this when creating a

variable if you want P to be a pointer

that is the address of some other

variable you do int star p and the star

just tells the computer this is not an

integer per se this is the address of

something that yes is an INT but we're

just being more precise so on the right

hand side you have the address of

operator as always with the equal sign

you copy from right to left because

Ampersand n is by definition the address

of something you have to store it in a

pointer and the way to declare a pointer

is to specify the type of value whose

address you're storing and then use the

star to indicate that this is in fact a

pointer and not just a regular old int

so let's see this in practice let me go

back to my own source code here and let

me make just a couple of tweaks I'm


going to leave n alone here but I'm

going to go ahead and initially

just do this uh let me say int star P

equals Ampersand n and then down here

I'm going to print out not n this time

but P the variable p and then even

though yes it's just a number and

therefore I kind of sort of could use

percent I for integers there's actually

a special format code in printf for

printing pointers or addresses and

that's percent p

so now let's go ahead and recompile this

make address so far so good dot slash

address enter and a little weirdly but

perhaps understandably now the address

in my computer's memory at which the

variable n happened to be stored was not

quite as simple as 0x123 this computer

has a lot more memory so technically it

was stored at 0x7ffcb4578e5c

now that has no special

significance to me it could have ended

up somewhere else altogether but this is

just where in my computer or technically

the Cloud Server to which I'm connected

using vs code here that just happens to

be where n ended up and strictly

speaking I don't even need to introduce


this variable I could get rid of p and I

could just say print not just n but the

address of N and achieve the same thing

you don't need to temporarily store it

in a variable let me just do make

address again dot slash address and now

I see this address here and notice if I

keep running the program it's actually

moving around there's other stuff

presumably going on inside of the

computer maybe it's actually randomizing

it so it's not always at the same

location that can actually be a security

feature underneath the hood but this

happens to be at that moment in time

where that value is in memory quite like

our picture a moment ago all right so

let me pause here to see if there's now

any questions on what we just did yeah

really good question is there any way to

control where something is in memory

short answer is yes and this is both the

power and the danger of c and we're

going to do this today and make a few

deliberate mistakes because with this

power of going to or getting the address

of any variable I could just arbitrarily

right now write code that stores a value

at like byte 2 billion or zero or anything

in between but that also means


potentially I could start kind of

creepily looking around at all of the

computer's memory even at things that I

didn't put there maybe other programs

maybe other parts of programs and indeed

this is a potential security threat if

suddenly you're able to just look

anywhere you want in the computer's

memory now I'm overselling it a little

bit because nowadays in this decade

there are some defenses in place in

compilers and in our operating systems

that do hedge against this a little bit

but this is still a very frequent source

of problems and later today we'll talk

briefly about things called stack

Overflow which is not just a website it

is a problem that you can encounter Heap

overflow and more generally buffer

overflows there's just so many things

that can go wrong using this language

called C and if any of you have

encountered a segmentation fault yet I

think we saw a few hands for that

already you touched memory that you

shouldn't have and odds are you did it

most recently by going too far in an

array going to the left or negative in

an array or somehow looking at memory


you shouldn't have and we'll explain

today why it is you were able to do that

other questions on these Primitives so

far yeah from Carter

good question earlier we used star P let

me rewind in time to the previous

version of this code where I actually

had a variable called P just like with

variable declarations in the past once

you've declared a variable to be an INT

a char a bool or an int star AKA a

pointer you don't thereafter keep using

the word int or now the star once you've

declared it that's it you only refer to

it by name and so it's very deliberate

what I did here namely

saying that the type here is int star

that is a pointer to an INT but here I

just said the name of the variable as

always I didn't repeat int and I also

didn't repeat star but at the risk of

kind of bending one's mind a little bit

there is unfortunately one other use for

the star operator and that's as follows

if you want to print out not the address

of something but what is at a specific

address you can actually do this if I

want to print out the integer via

percent I that is at that address I can

actually use the star here which


technically contradicts what I just said

but it has a different function here

different purpose so let me go ahead and

do this in two different ways I'm going

to leave this line of code AS is but I'm

going to add another line of code now

that prints out what apparently will be

an integer in a moment

so percent I backslash n and I could

cheat and let me just do n for now so

there's really nothing special happening

now I'm just adding a sort of mindless

printing of n so make address dot slash

address there's the current address

of n and there's the value of n but

what's kind of cool about C here too is

if you know that a value is at a

specific address like P there's one

other use for this star operator the

asterisk you can use it as the so-called

dereference operator which means

go to that address and so here what we

actually have is an example of a pointer

P which is an address like 0x123 or 0x7ff

and so forth but if you say star P

now you're not redeclaring the variable

because I didn't mention int you're

going to that address in P so let me

recompile this now

make address

dot slash address and just to be clear

what should I see I'm first going to see

the pointer itself 0x something what's

the second line of output I should

presumably see now

just a little louder

so I'm hearing 50 and that's true

because if you figure out the address of

n and print it in line seven but then go

to the address of n AKA P that's indeed

going to just show you the number n the

value of n again

or any questions now on this syntax and

I will concede I think this is confusing

the fact that we use the star for

multiplication the fact that we use the

star to declare a pointer but then we

use a star in a third way to

de-reference the pointer and go to the

pointer it's just confusing honestly but

with practice comes Comfort yeah

good question when you are using

the uh the Ampersand operator to get the

address of something the onus is on you

at the moment to know what you are

getting the address of is it a string is

it a char is it a bool is it an int I

wrote this code so I know in line six


that I'm trying to get the address of

what is an integer

in line eight you don't have to worry

about that good question notice in line

eight I didn't tell the computer other

than the percent I what kind of address

I'm going to but I did already in line

six I told the compiler that P now and

forever is going to be the address of an

INT that's enough information in advance

so that printf or really the language C

still knows on line 8 that P is a

pointer to an INT and that way it will

print out all four bytes at that address

not just part of it and not more than

those four bytes good question yeah next

to you

do pointers have pointers yes we won't

um sort of do this today by having

pointers to pointers but yes you can use

star star and then things get and I'm

sorry

we won't do that today and we won't do

that often in fact Python and other

languages just a couple of weeks away so

hang in there almost there a question

back here

was there

that was a good more verbal feedback


like that is helpful as we forge into

more complicated stuff other questions

yeah

what's the point of printing the address

like

sure what's the point of doing this if

you don't mind let me let's get there in

a moment this is not the common use case

just printing out the address like who

really cares at the moment we care only

for the sake of discussion we're soon

going to start using these addresses so

hang in there just a little bit for that

one too but it will uh solve some

problems for us before long so let's

actually just now depict what was going

on inside of the computer's memory just

a moment ago so if I toggle back here

let me redraw my computer's memory let

me plop into the memory n which is

storing in this program the number 50

where is p in my computer's memory

specifically I don't know and apparently

it moves around each time I run the

program so for the sake of discussion

let's just propose that if 50 ended up

at address 0x123 I don't know p ends up

over here at address whoops at whatever

address this is here but notice a couple

of Curiosities now if p is a pointer


it's the address of something so the

value in P should be an address and I've

indeed written it as such 0x123 and

technically there's not an X there

there's not a zero there there's not

even a one two three there per se

there's a pattern of bits that

represents the address 0x123 but again

that's week zero don't care about binary

day to day

so if this is p and this I claimed was n

why is p so much bigger can someone

conjecture here

because it turns out whether n is an int

or a char or a bool which are different

types heck even a long

it turns out that P is always going to

take up eight squares on the board but

why might that be

What might explain that

thought

okay fair maybe it's allocating eight

bytes because it doesn't know the type

turns out that's okay because an

address is an address it's really up to

the programmer to use it as a string or

a char or a bool other thoughts

so you know

okay possibly it could be that pointers


have some complexity like a backslash zero

or something curious like that like we

talked about for Strings turns out

that's not the case it turns out that

pointers nowadays typically but not

always are eight bytes AKA 64 bits

because your and my Macs or PCs heck even

our phones have a lot more memory than

they did years ago back in the day a

pointer might have only been 32 bits or

even only eight bits way back in the day

but let's consider 32 bits because that

was the norm for some time how high can

you count roughly if you've got 32 bits

what's the number we keep rattling off

32 bits is roughly

2 to the 32 so it's 4 billion and I keep

saying it's 2 billion if you could do

negative but in the world of memory

there's a reason I keep saying 2 billion

bytes two gigabytes because for a very

long time that was the maximum amount of

memory a computer could have why because

the pointers that the computers were

using were only for instance 32 bits and

with 32 bits depending on whether you

allow for negatives or not you can count

as high as 2 billion roughly or maybe 4

billion but you know what your Mac your

PC your phone could not have had five


gigabytes of memory or five billion

bytes of memory you certainly couldn't

have had what computers nowadays come

with which might be eight gigabytes of

memory 16 gigabytes of memory why because

with four bytes or 32 bits you literally

physically can't count that high which

means if I drew a picture of all of the

memory we would run out of numbers to

describe them which means like most of

my memory would just be unusable so

pointers nowadays are 64 bits or 8 bytes

that's really big I can't even pronounce

how big that number is but it's plenty

for the next many years and so we've

drawn it that way on the board here now

let's just kind of abstract this away

let's get rid of all the other bytes

that are storing something or nothing

else and let's now start to abstract

away this complexity because the reality

is to your question earlier you know

what is this useful for or do

we actually care about these addresses

generally no we're doing this so that

you see there's no magic we're just

moving things around and poking around

in memory but what a person would

typically do when talking about pointers


would literally be to just point at

something like I really don't care what

address n is at so it suffices in

general when drawing pictures on a

whiteboard having a discussion with

another programmer you just kind of draw

an arrow from the pointer to the value

in question because neither you nor I

probably care about the specifics of 0x

whatever and there's your pointer it's

literally an arrow and we can kind of

see this so it turns out that these

pointers these addresses are not that

dissimilar to what we've done for

hundreds of years in the form of a

postal system for instance here's a post

office here no here is a mailbox and

suppose that this is a mailbox labeled P

it's a pointer and suppose there's

another mailbox like way over there

which is just another byte of my

computer's memory what are we really

talking about well you store in your

computer's memory values like the number

50 or a word like HI inside of your

computer's memory at some location but

today we can also use those same memory

locations to store the address of things

for instance if I open this up here and

I see okay the value inside of this


mailbox is not a number like 50 it's

actually an address 0x123 that's kind of

like a pointer sort of a breadcrumb

leading from one location in memory to

another and in fact would someone who's

seated roughly over there do you mind

getting the mail over there

any volunteers over in this section

just need you to get to the mailbox

before I do who's who's being

volunteered oh yes please whoever is uh

gesturing most wildly

come on down

sure

what's your name

say again

and Foo

okay come on up to the edge of the stage

there and just to be clear if this is P

that is apparently n but to make clear

what we're talking about when we're

storing 0x whatever values like 0x123

that's essentially equivalent to my you

know maybe pulling out something like

this and just abstractly pointing to

your mailbox there or if you prefer

pointing to the mailbox okay all right

oh thank you all right

this is akin to me pointing at your


mailbox and if you want to go ahead and

open your mailbox and reveal to the

crowd what's inside your mailbox

labeled n

all right

thank you

we have a little CS50 stress ball for

your trouble thank you for coming up so

that's just to put a visual on what it

is we're talking about because it can

get very abstract very cryptic quickly

when we're talking about addresses and

memory and drawing it like these little

squares but if you think about just

walking into a post office or an

apartment complex that's got a lot of

mailboxes those mailboxes essentially

are a big chunk of memory and each of

those mailboxes has an address this is

apartment one two three apartment two

billion and inside of those mailboxes

can go anything that can be represented

as information it could be a number like

n or 50 or if you prefer it could be a

number that represents the address of

another mailbox and this is Akin really

if you've ever had an apartment or you

and your parents have moved to having a

forwarding address it's like having the

post office in the U.S put some kind of


piece of paper in your old mailbox

saying actually forward it to that other

mailbox that really is all a pointer's

doing at the end of the day it's just a

number but it's a number being used in a

different way and it's the syntax that

we've introduced not just int but int

star that tells the computer how to

treat that number in this slightly

different way are any questions then on

this yeah and back

if I did int c and set it to say the code

again

once more

equal to n so let me actually type it

out if I give myself another line of

code tell me one last time what's a type

int

is equal to n like this

so this is okay and I can't draw it

quite quickly enough on the board here

but this would be like creating another

four bytes somewhere in memory maybe

down here that stores an identical copy

of 50 because the assignment operator

from right to left copies one value to

another so that would just add one more

rectangle of size 4 to this particular

picture
if I'm answering your question as

intended okay so that is sort of

week one style use of assignment

operators before pointers I could though

start copying pointers but again we'll

come back to some of that complexity any

other questions here

ah good question short answer no to

repeat for the camera if I create a

second variable like this int c equals n

and I claim without actually drawing it

on the board that this gives me another

rectangle the value of which is also 50

P does not get touched and this is

what's important and really

characteristic of C nothing happens

automatically for you like p is not

going to be updated unless you update p

in some way so creating a third variable

called C even if you're copying its

value from right to left that has no

effect on anything else in the program a

good question

so what have we seen that's perhaps now

a little more explainable well recall

that we talked quite a bit last week

about strings and just to recap in

layperson's terms like what is a string

as you now understand it

say let me take a specific hand here


what's a string how about uh over here

and okay sure both of you are right an

array of characters an array of

characters and we I claimed or revealed

last week that string is not technically

a feature built into C like it's not an

official data type but every programmer

in most any language refers to sequences

of characters words letters paragraphs

as strings so the vernacular exists but

the data type doesn't typically exist

per se in C so what we're about to do if

you will for dramatic effect is take off

some training wheels today the cs50

library implemented in the form of the

header file cs50.h we claim has had a

bunch of things in it prototypes for get

string prototypes for get int and all of

those other functions but it turns out

it also is what defines the word string

in such a way that you all can use it

these past several weeks so let's take a

look at an example of a string in use

here for instance is a tiny bit of code

that uses the word string creating a

variable called s and then storing quote

unquote HI exclamation point

Let's consider what this looks like now

in the computer's memory I don't care


about all the other bytes let's just

focus on these and this per last week is

how HI might be stored H I exclamation

point and then one more as someone

already observed that Sentinel value

that null character which just means

eight zero bits to demarcate the end of

that string just in case there's

something to the right of it the

computer can now distinguish one string

from another

so last week we introduced this new

syntax well if strings are just arrays

of characters you can then very cleverly

use that square bracket notation and go

to location 0 or 1 or 2 which are kind

of sort of like addresses but they're

relative to the string right this could

be at 0x123 or 0x456 but with this

bracket notation zero is always the

beginning of the string one is the next

two is the next and so forth so that was

our array syntax for indexing into an

array but technically speaking we can go

a little deeper today technically

speaking if HI is starting at the

address 0x123 then it stands to reason

that I is at 0x124 exclamation point's at

0x125 and the null is at 0x126 now I

don't care about one two three per se

but even though this is hexadecimal this

is correct math even in hex if you just

add one when you start at 0x123 the next

numbers end in four five six I

don't have to worry about A's B's and

C's because I'm not counting that high

in this example

so if that's the case and my computer is

actually laying out the word hi in

memory like that well what exactly is s

right what exactly is s if at the end of

the day H I exclamation point null

are stored at these addresses

like where is s like now that I've kind

of taken off those training wheels and

showed you where h i exclamation point

null actually are what happened to s

well S as always is actually a variable

right even in the code I proposed a

moment ago s is apparently of a data type

that yes doesn't come with C but cs50's

Library makes it exist s is a variable

of type string so where is s in this

picture well it turns out that s might

be up here again I'm just drawing it

anywhere for the sake of this discussion

but s is a variable per that line of

code what s is storing apparently I

claim is 0x123 I actually don't really


care about these addresses so let's

abstract that away s is apparently as of

now today one week later just a pointer

to a character specifically the first

character in s and this is kind of the

last piece of the puzzle last week we

had this clever way of demarcating the

end of a string well it turns out that

strings are represented in the

computer's memory as a variable that is

a pointer

inside of which is the address of the

first character in the string so if s

points at the first character and

you can trust that backslash zero is at

the end of the string that's literally

all you need to figure out where a

string begins and ends so what do I mean

by this well let's be a little more

Concrete in terms of this picture if

I've started with this line of code here

it turns out all this time since week

one that the word string has just

semi-secretly been an alias for

Char star

I know so Char star so why does this

make sense it's a little weird still but

if in our previous example we were able

to store the address of an integer by

declaring a variable called p as int star

p well if as of now strings are just the

address of the first character in a

string then probably a string is just a

Char star because that means s is the

address of a character the very first

character in the string now the string

might have three letters like it did or

four or even a hundred if it's a long

paragraph but that's fine because you

can trust that there's going to be that

null character at the very end so this

is a general purpose way of representing

strings using this new mechanism in C so

in fact let me go ahead here and

introduce maybe a couple of

manipulations of this let me go back to

my code here and let's get rid of this

integer stuff and let's instead Now do

for instance this let me add in the cs50

library so we'll include cs50.h for now I'm

going to go ahead and inside of main

give myself a string s equals HI

exclamation point I don't type the

backslash zero C does that for me

automatically uh by using my double

quotes like this now let me just go

ahead and print it so this again is sort

of week one style stuff where I'm just

printing a string no pointers yet


so let me do make address enter dot

slash address and hopefully I see HI

so nothing new there but let's start to

peel back some of these layers here let

me first of all get rid of the cs50

library for a moment and let me change

string

to char star

and it's a little bit weird but yes the

convention is to say Char a space then

the star and then immediately thereafter

the name of the variable strictly

speaking though you might see textbooks

or websites that do it like this or like

this but the canonical way is typically

to do it like that so now no more cs50

Library no more training wheels if you

will I'm just treating strings for what

they really are let me go ahead and do

make address enter so far so good dot

slash address and that too still works

so percent s is a thing that comes with

printf because the word string is

programmer terminology but strictly

speaking C doesn't have a string data

type it's always been Char star so what

this means now is I can start to kind of

have some fun with these basic ideas

even though this is not purposeful other

than for the sake of discussion but if s


is this let me go back and give myself

the cs50 library let's put those

training wheels back on for just a

moment so that I can do one manipulation

at a time here's my string S as before

well let me go ahead and declare a char

called C and let me store the first

character in the string there which is s

bracket zero and that should give me H

and then Just for kicks let me go ahead

and do Char star whoops let me go ahead

and do Char star P equals Ampersand C

and see what this actually prints for me

let me go ahead and print out what p is

here

so we're just playing around so make

address so far so good dot slash address

all right so what have I just done I've

just created a Char C and stored in it

the letter H which is the same thing as

s bracket zero then I'm saying what's the

address of c and that's apparently 0x7ff

whatever so that's the address but I

technically didn't have to do that let

me go ahead and do two things now

instead of just printing P let me go

ahead and print out maybe s itself

let me go ahead and do make address

enter so far so good dot slash address


and damn it what did I do

oh shoot I didn't want to do that oh I

really made a mess of this

um

what did I want to do here

That was supposed to be impressive but

it was the opposite so let me

turn it around so if I intended to do

this why are not lines nine and ten

printing different values didn't really

intend to go here but let me try to save

this

why are we seeing different addresses

namely this address

0x402004 for s and then 0x7ff for

P any thoughts yeah over here

correct so if I really wanted to weasel

my way out of this this is a great

answer to the previous question which

was about what if I introduce another

variable C that's a copy of the value

and not in this case an INT but an

actual Char here I've made C be a copy

of the character that's at the beginning

of s but that's indeed a copy so if I

were to draw it on the screen that would

give me a different rectangle in which

this copy of H would actually be stored

so I didn't intend to do this but what

you're seeing is yes the address of s


and apparently that's at a pretty low

address by default here then you're

seeing the address of C but even though

each of them is H I claim one is at a

different address in memory and this has

always been happening anytime you

created one variable or another it was

ending up here or here or here or

somewhere else in memory now for the

first time all we're doing is actually

just poking around the computer's memory

to see what is actually there so let me

actually

back this up a little bit and do what I

intended to do here which was something

like this so if string s equals quote

unquote HI let's go ahead and give

myself a pointer called p to the first

character in s

all right so now let me go ahead and

print out the value of this pointer

percent P printing out P so we're just

going to do one thing at a time so make

address enter dot slash address there at

the moment is the address of the first

character in s what I meant to do now

was this if I want to print out two

things this time let me print out not

only what p is but also what s itself


originally is because if I claim that

everyone from last week should be

comfortable with s bracket zero just

representing the first character in s by

definition of strings being arrays of

characters

then S as of today is itself the address

of a character the first one in s so if

I now do make address

and do dot slash address this time I see

the same exact things thank you

this is really like the lamest sort of

thing to be applauding over but what

we're demonstrating here is that s is by

definition the address of the first

character in s so if we borrow some of

our mental model from last week well if

s bracket zero is the first character in

s doing the ampersand on that expression

should be the same as s now this isn't

to say that we would kind of jump

through these hoops all the time with

this much syntax but this is just to do

proof by example that s is in fact as I

claimed a moment ago just the address of

a character not even multiple characters

it's the address of a single character

but the key thing is it's the address of

the first character in the string and

per last week we trust that c is going


to look for that null character at the

very end just to make sure it knows

where the string actually ends all right

a question came up over here

correct to summarize on line eight when

I am using percent P that just means

print a pointer value so Ox something

I'm passing it s

previously when we used percent s printf

knew to print not just the first

character of s but h i exclamation point

and then stop when it hits the backslash

zero percent p is different percent P

tells the computer to go to that address

sorry tells the computer to print that

address on the screen

so this is where percent s all this time

has been powerful the reason printf

worked in week one and two and three was

because printf was designed by some

human years ago to go to the address

that's being passed in for instance s

and print out character after character

after character until it sees the null

character backslash zero and then stop

printing so you're getting a

lot of functionality sort of for free

from percent s today we're using

something much simpler percent P which


just literally prints what s is and the

reason we don't do this in week one is

just because this is like way too much

to be interesting when all you want to

print out is HI or hello world or the

like but now what we're really doing is

revealing what's been going on this

whole time and let me make one other

example here let me go ahead and get rid

of this variable here and let me just

print out a few things to make this same

point I'm going to print out not just an

S like I did here but let's go ahead and

print out the address of every

character in s so let's get the first

letter in s and get its address

and I'm going to do copy paste for

time's sake but not something I would do

frequently

so let me print out the address of the

first character the second character the

third and actually even the fourth which

is the backslash Zero by doing this

so when I compile this program make

address

dot slash address I should see two

identical values and then additional

values that are one byte away in my

diagram a moment ago my addresses were

arbitrarily 0x123 one two four one two

five one two six now it starts by

chance at 0x402004 which is s

0x402004 is the same thing as s because

I'm just saying go to the first

character and then get its address those

are one and the same now and then after

that is 0x402005 006 007 because that is just like

the diagram go to the I to the

exclamation point and to the null

character so all I'm doing now is using

my newfound understanding of what

Ampersand does and what the star does is

I'm just kind of playing around I'm

poking around in the computer's memory

just to demonstrate there's no magic

it's all there very deliberately because

I or printf or someone else put it there

yeah

thank you

really good observation so it's indeed

the case that HI unlike 50 is kind of

ending up at a very low address not the

0x7ff wherever it was that's actually

because long story short strings are

often stored in a different part of the

computer's memory more on that later


today for efficiency there's actually

only going to be one copy of the word HI

exclamation point and the computer is

going to tuck it at sort of the

beginning of my memory but other variables

like ints and floats and the like they

end up lower in memory by convention but

a good observation because that is

consistent here

all right so a couple final details then

on what's been going on here let me go

ahead and claim that

we implemented Char star or rather

string as a Char Star as follows as of

last week we were writing this code as

of this week we can now start writing

this code because string specifically

we invented in the cs50 library but it

turns out you've seen a way of inventing

your own data types recall this thing

here we played around last time with

data structures or the struct keyword in

C and briefly the typedef keyword

which defines a type for you and if I

highlight what's interesting here the

way we invented a person data type last

time was to define a person as having

two variables inside of it a structure

that encapsulates a name and

encapsulates a number now even though


the syntax is a little different today

because of the star thing notice that

this could be a similar application of

that idea if I want to create a type

called string highlighted in yellow here

then I use typedef to make it defined to

be Char star so this is literally all

that has ever been in cs50.h in addition

to those prototypes of functions we've

talked about typedef char star

string is one line of code that brings

the word string as a data type into

existence and that's all that's ever

been there but the star the char star is

just too much in week one we wait until

this point to sort of Peel back that

layer are any questions then on what a

string is what star or the ampersand are

doing yeah

thank you

oh my God massive spoiler but yes

that is why when you compare two

strings as I briefly did or almost

did problems arise and in fact

yes last week we used str compare

s-t-r-cmp for a very deliberate reason

because yes the spoiler is I

accidentally would have compared two

addresses in memory not the strings at


those addresses

other questions here

no all right well before we give

ourselves maybe a 10 minute break here

we have lots of pieces of paper if

anyone wants to come on up and play with

this big stack of Post-its if you want

to make your own 8x8 grid of something

to share with the class if you're

artistically inclined come on up

otherwise let's take 10 minutes and

we'll return after 10. all right so

let's come back to this question of how

we can start to use these pointers and

these addresses ultimately in an

interesting way the goal ultimately next

week is going to be to use these

addresses to really stitch together more

complicated data structures than just

persons like last week or candidates in

the context of like an electoral

algorithm if you will and actually

really use our memory in the most

versatile way to represent not just

images but maybe videos and other

two-dimensional structures as well but

for now let's come back to this address

example whittle it down to just HI

initially and see what's going on again

here underneath the hood so let me


re-add the cs50 library just so we use

our synonym for a moment that is the

word string and I'll redefine S as a

string and what I didn't mention before

is that these double quotes that you've

been using for some time are actually a

little special the double quotes are a

clue to the compiler that what is

between them is in fact a string as we

now know it which means the compiler

will do all the work of figuring out

where to put the H the I the exclamation

point and even adding for you

automatically a backslash zero and what

the compiler will do for you too is

figure out what address all four of

those chars ended up at and store it for

you in the variable s so that's why it

just kind of happens with strings

without using ampersands or even Stars

explicitly but the star at least has

been there because again string is just

synonymous now with Char star it's not

really as readable but it is now the

same idea so I'll leave string in place

just to do something sort of week one

style here for a moment and let's go

ahead and print out a few characters so

I'm going to use percent C this time and


I'm going to print out s bracket zero

and then I'm going to print out s

bracket one and s bracket two

literally doing sort of week three style

from last week a printing of every

character in s as though it were an

array so dot slash address should give

me h i exclamation point and if I really

want to get curious technically speaking

I could print out one more location and

let me go ahead and recompile make

address dot slash address and there is

it would seem the backslash zero I'm not

seeing Zero because I didn't type

literally the zero Char in ASCII it's

literally eight zero bits which are

technically unprintable if you will in

printf speak and so what I'm seeing here

is like a blank symbol that just means

there is something else there it's

apparently all eight zero bits but they

are there even though we're not seeing

them literally right now well let's go

ahead and peel back one of these layers

and let me go ahead and get rid of the

cs50 library

and get rid of therefore the word string

because again henceforth it's just Char

star nothing else is different I'm going

to now do make address dot slash address


and it's the same exact thing and now

let's just focus on the HI rather than

even worry about that so I'm going to

recompile one last time and now I have h

i exclamation point well it turns out

that the array notation we used last

week was technically some of this

syntactic sugar sort of a neat way to

use syntax in a useful way but we can

see more explicitly today what the

square brackets for a string are actually

doing let me go ahead and do this let me

adventurously say I want to print out

not s bracket zero

but I want to print out whatever the

first character of s is so to be clear

what is s now it's the address of a

string okay but what is s really s is

the address of the first Char in a

string and again that's sufficient for

defining a string because eventually the

computer will see that there's a

backslash zero at the end of it so s is

specifically the address of the first

character in a string

so that means using my new syntax if I

want to print out that first character I

can print out star s because recall that

star is the dereference operator when


you don't repeat the word Char you don't

repeat the word int you just use the star

here that means go to that address

similarly if I in my sort of Newfound

knowledge of how strings work know that

the H comes first then the I right after

it then the exclamation point then the

backslash zero contiguously one byte

apart I could kind of start to do some

arithmetic I could go to S plus one byte

and print out the second character and I

could print out whatever is at s plus

two in fact doing what's generally known

as pointer arithmetic literally treating

pointers as the numbers they are

hexadecimal or decimal doesn't really

matter it's still just numbers and go

ahead and add one byte or two bytes to

them to start at the beginning of a

string and just kind of poke around from

left to right so this now is equivalent

to what we did last week using square

bracket notation but now I'm

re-implementing that same idea with the

sort of lower level Plumbing

understanding ampersands and stars now a

little bit more so if I remake this

program and do dot slash address I

should still see h i exclamation point

but what I'm really doing is just kind of


demonstrating hopefully my sort of

understanding of what really is going on

in the computer's memory now programmers

who are maybe trying to show off might

actually write this syntax I think the

more common syntax would be what we did

last week s bracket zero s bracket one

why it's just a little more readable and

like we don't need to sort of uh brag

about or care about this underlying

representation the square brackets last

week were an abstraction if you will on

top of what is lower level math but

that's all that's going on underneath

the hood we're poking around from byte

to byte to byte all right let me pause

here see if there's any questions on

that one
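The equivalence between square brackets and pointer arithmetic described above can be sketched as follows. This is a minimal illustration, not code from the lecture: the helper name `char_at` is hypothetical, and it assumes the caller keeps `i` within the string, counting its terminating backslash zero.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: returns the char that sits i bytes past the address s.
// *(s + i) means "go to address s plus i", which is what s[i] does too.
char char_at(const char *s, size_t i)
{
    assert(s != NULL);
    return *(s + i);
}
```

Calling `char_at(s, 0)`, `char_at(s, 1)`, and `char_at(s, 2)` on a string is the same as reading `s[0]`, `s[1]`, and `s[2]`, and the index equal to the string's length lands on the terminating backslash zero.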

questions on this

let's do one more then just to

demonstrate that this is not even

specific to Strings let me go ahead and

get rid of all of this and let me give

myself an array of numbers like I did

last week so if I don't uh if I'm going

to declare all the numbers at once using

this funky curly brace notation I can do

like four six eight two seven five zero

so seven different numbers inside of an


array that's automatically initialized

like this I don't strictly speaking need

to say seven the compiler is smart

enough to figure out how many numbers I

put with commas between them and that

just gives me an array containing four

six eight two seven five zero so it

turns out I can print each of these

numbers in the familiar way I can do a

printf of percent I

backslash n and I can print numbers

bracket zero and let me just do some

quick copy paste just to print the first

three of these theoretically that should

print out four six eight and so forth

but I can do the same sort of

manipulation understanding what pointers

now are using pointer arithmetic so let

me actually unwind this and just go back

to one printf and instead of printing

numbers bracket zero like I might have

last week let me just go and print out

whatever is at that address so asterisk

numbers let me then print out the second

digit which is going to be whatever is

at numbers plus one and then let me do

this further and do whatever is at

numbers plus two and if I really want to

repeat this let me do it four more times

and do what's at location


three four five and six and that's seven

total numbers because I started counting

at zero so let me just quickly run this

make address dot slash address there are

those seven digits being printed but

there's something subtle but also kind

of useful here

each of these digits four six eight two

seven five zero is an INT why because I

made an array of integers but think back

how big is a typical integer have we

claimed

four bytes or 32 bits so it's worth

noting that I don't really need to worry

about that detail notice that I did not

do plus four plus eight plus twelve plus

16 plus 20 right I the programmer

strictly speaking don't need to worry

about how big the data type is this is

the power of pointer arithmetic the

compiler is smart enough to know that if

you add one to this pointer that is the

same as saying go one more piece of data

not just one byte so if it's an INT move

four if it's a second int move eight if

it's a third int move 12 pointer

arithmetic sort of handles that annoying

arithmetic for you so you can just think

of this as a number after a number after


a number that are back to back to back

but not one byte apart but four bytes

apart which is only to say plus one plus

two plus three Works no matter the data

type why because the compiler knows what

type of data you're talking about
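To make the point about the compiler scaling the arithmetic concrete, here is a small sketch. The helper names `int_at` and `stride_in_bytes` are hypothetical and not from the lecture; the second one casts to `char *` only to measure how many bytes "plus one" really moves on whatever machine compiles it.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: returns the i-th int after the address numbers.
// numbers + i advances by i ints, not by i bytes.
int int_at(const int *numbers, size_t i)
{
    assert(numbers != NULL);
    return *(numbers + i);
}

// Hypothetical helper: how many bytes does "numbers + 1" actually move?
// Casting both addresses to char * lets us count the distance in bytes.
size_t stride_in_bytes(const int *numbers)
{
    assert(numbers != NULL);
    return (size_t) ((const char *) (numbers + 1) - (const char *) numbers);
}
```

Passing an array such as `int numbers[] = {4, 6, 8, 2, 7, 5, 0};` straight to these functions works because the array's name is treated as the address of its first element, and `stride_in_bytes(numbers)` comes back as `sizeof(int)`, typically 4.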

now there's one other detail I should

reveal here

that I've kind of taken for granted in

the past I was using double quotes to

represent strings and I claim that the

compiler's smart enough to realize that

oh if I have double quote hi that

means it's an array of h i exclamation

point and then the backslash zero notice

this usefulness it turns out that you

can actually treat arrays as though the

name of the array is itself a pointer

and this is actually going to be

something useful in upcoming problems

when we want to pass arrays around in

the computer's memory notice that

strictly speaking on line five there's

no pointers going on there's no star

there's no Ampersand there's nothing new

there and yet instantly on line seven

I'm sort of pretending that it is the

address and this is actually okay it

turns out that an array really can be

treated as the address of the first


element in that array the difference is

that there's no secret backslash zero

anywhere like this is just part of the

phone number here the ending in zero

that's not like a special backslash zero

so this is something we're going to take

advantage of too before long there's this

sort of interrelationship between

addresses and arrays that just generally

allows you to treat one as though it is

the other but the math is taken care of

for you or any questions then on on this

before we start to solve some of some

bigger problems yeah

potentially if you go beyond the end of

an array you might get a segmentation

fault the problem is that that symptom

is sometimes non-deterministic which

means that sometimes it will happen

sometimes it won't it often depends on

how far off the end of the array you

actually go you'll often not induce the

segmentation fault if you just kind of

poke a little too far but if you go way

too far it quite likely will but we'll

give you a tool today actually for

detecting and solving exactly that kind

of situation so let's go ahead now and

do something a little different in code


but that actually comes back to that

spoiler from earlier let me go ahead and

create a program called compare dot C

and in this program I'm going to go

ahead and allow myself the cs50 library

not so much for string but so that I can

actually use get int still which is way

easier than the way we'll see that c

normally lets you get input let me give

myself standard io.h do an INT main void

not worrying about command line

arguments today and let me go ahead and

get an INT I using get int and ask the

human for the value of I then let me

give myself an INT J ask the user for

another int calling it J

and then let me go ahead and kind of

naively but to your point earlier if I

equals equals J then let's go ahead and

print out something like same backslash n else

let's go ahead and print out different

if they are not in fact the same

so that would seem to be a program that

compares the value of two integers all

right so let's go ahead and run make

compare so far so good dot slash compare

okay i will be 50 j will be 50

they're the same let's do it once more I

will be 50 J will be 42 they are


different so so far so good in this

first version of comparison

but as you might see where I'm going

with this let's move away from integers

and let's actually change these things

to char to Strings so I could do string

s over here get string s over here then

I could do uh string T over here and get

string over here asking the user for T

this time here and then I can compare

the two if s equals equals T and this is

a common convention if you've used s for

string already you can use T for the

next one at least for simple

demonstrations like this I'm going to

compare the two just like I did for ints

which worked great make compare

so far so good dot slash address oh

sorry wrong program dot slash compare

let me go ahead and type in something

like uh hi

exclamation point and bye exclamation

point which of course should definitely

be different let me run it again with

hi exclamation point and hi

exclamation point

huh different maybe I I messed up let's

maybe do lowercase is maybe that'll fix

but no those two are different so to


come back to what I described as a

spoiler earlier what's the fundamental

issue here to be clear

why is it saying different even though I

pretty sure I typed the same thing twice

yeah

yeah this is where it's now useful to

know that string has been an abstraction

a training wheel if you will and if we

take that away still use get string

because that's convenient still but if I

change string to be Char star it's a

little more explicit as to what s and

what T are s is a pointer to a Char that

is the address of a Char T is a pointer

to a Char that is the address of a Char

specifically the first character in s

and the first character in t

respectively so if I'm comparing these

two it should stand to reason that

they're going to be different why

because s might end up here in memory

and T might end up here in memory each

time I call getstring it is not smart

enough or Advanced enough to know that

wait a minute you type the same thing

I'm just going to hand you back the same

address that doesn't happen because we

did not design get string that way each

time I call get string it returns


apparently a different copy of the

string that was typed in a hi over

here and a hi over here they might

look the same to the human but to the

computer they are different chunks of

memory and therefore at different

addresses and here too we can reveal

what is get string returning well up

until today it was returning a string so

to speak that's not really a thing

technically what get string has always

been doing is returning the address of

the first Char in a string and trusting

that we put a backslash zero at the end

of whatever the human typed in and

that's enough now for printf for

strlen for you to know where a string

begins and ends so get string has

actually always returned a pointer it

has not returned a quote unquote string

per se but there are functions that can

solve this comparison for us recall that

I could do something like this I could

actually go in here and I could uh let's

see where was it so if I include strcmp

here and use it to pass in two

values s and t let's see now what

happens when I make compare

huh implicitly declaring Library


function strcmp with type int and

well there's a star so you might have

seen this error before and you might

have uh ignored most of it but there's

some evidence of stars or pointers going

on here it looks like I didn't include

the string.h header file so that's an

easy fix include string.h which despite

its name does not create a data type

called string it just has string related

functions in it like strcmp let's

make compare again

now it compiles dot slash compare now

let's type in high exclamation point and

even the same thing again these are now

oh I used it wrong okay user error

That was supposed to be impressive but

it's the opposite what did I do wrong

what did I do wrong here yeah

it returns three different values zero

if they're the same positive if one comes

before the other negative if the

opposite is true I just forgot that so

like I did last week correctly if I want

to compare them for equality per the

manual page I should be checking for

zero as the return value now make compare

dot slash compare enter let's try it one

last time high and high okay now they're

in fact the same and Justin thank you


all right
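The difference between the two comparisons can be sketched like this. It is only an illustration under stated assumptions: the helper names `same_address` and `same_text` are hypothetical, and two separate char arrays stand in for the two separate chunks of memory that get string hands back.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: what s == t really asks.
// It compares the two addresses, not the characters stored there.
bool same_address(const char *s, const char *t)
{
    return s == t;
}

// Hypothetical helper: what we usually mean by "the same string".
// strcmp returns 0 when every char matches up to the backslash zero.
bool same_text(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    return strcmp(s, t) == 0;
}
```

With `char a[] = "hi!";` and `char b[] = "hi!";`, `same_address(a, b)` is false because the two arrays live at different addresses, while `same_text(a, b)` is true.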

and indeed not that it's returning same

all the time if I type in hi and then

bye it's indeed noticing that difference

as well well let me go ahead and do one

other thing here let's do one other

thing let me go ahead now and just

reveal more pictorially what's going on

let's get rid of the string comparison

and let's just print these things out

the simple way to print this out would

be with percent s and again percent s is

special printf knows to take an address

start there and print every character up

until the backslash zero so let's just

hand it s and do that and then let's do

one more percent s comma T this is again

sort of a mix of weeks one and this week

because I got rid of the word string I'm

using Char star but I'm still using

printf and percent s in the same way let

me go ahead and run compare now and if I

type hi and hi I should see the same

thing twice so they look the same but

here now we have the syntax today to

print out the actual addresses of these

things so let me just change the S to a

p because p means don't go to the

address and print it it means just print


the address as a pointer so make compare

dot slash compare and now let's type in

hi and once more and I should see indeed

two slightly different addresses given

in hexadecimal one's got a B at the end

one's got an F at the end and they are

indeed a few bytes apart so this is just

confirming what our suspicions have

actually been so what does this mean

perhaps in the computer's memory well

let's kind of take a look I've zoomed

out so I have a little more squares to

look at it once here might be s in

memory when I do string s equals or Char

star s equals I get a variable that's of

size one two three four five six seven

eight because I claimed earlier that on

Modern systems pointers are generally

eight bytes nowadays so they can count

even higher and inside of the computer's

memory also might be hi and I don't

know where it ends up so for the sake of

discussion it ended up down here that's

what was free when I ran the program Hi

exclamation point backslash zero maybe

it ended up for the sake of discussion

at 0x123 0x124 0x125 and 0x126 so to be clear what

is s storing

once the assignment operator copies from

right to left what is s storing if I


Advance one more

slide yeah

0x123 the presumption being that if a

string is defined by the address of its

first Char and that address of its first

char is 0x123 then that's indeed what

should be in the variable s and so

technically that's what's been happening

with that assignment operator from right

to left get string indeed returns a

string so to speak but more properly it

Returns the address of a Char what's

been then copied from right to left

using that assignment operator all these

weeks is indeed that address now

technically we don't really need to care

about where these addresses are it

suffices to just think about them sort

of referentially but let's first

consider where T might be T is just

another variable that I created on my

second line of code maybe it ends up

there maybe somewhere else for the sake

of discussion I'll draw it left and

right where did the second word end up

that I typed in well suppose the second

copy of hi ended up at 0x456

0x457 0x458 and

0x459 what ended up in t I'll pluck


this one off myself 0x456

presumably and so this is now a

pictorial representation of why and let's

abstract away everything else when I

compared s against T using equal equals

but based on the picture they're

obviously not the same one is over here

one is over here and per a moment ago

one is 0x123 the other is 0x456 yes

technically they're pointing at

something that's the same but that just

reveals how strcmp works strcmp

is apparently a function that

takes in the address of a string as its

argument and the address of another

string as its argument it goes to the

first character in each of those strings

respectively and probably has like a for

Loop or a while loop and just goes from

left to right comparing looking for the

same chars left and right and if it

doesn't notice any differences boom it

returns zero if it does notice a

difference it returns a positive or A

negative value and that's very similar

recall to how we implemented string

length ourselves last week I sort of

used a for loop I was looking for a

backslash zero strcmp is probably

a little similar in spirit looping from


left to right but comparing this time

not just counting are any questions then

on string comparison and why it is that

you use strcmp and not equals

equals yeah
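The guess above about how strcmp probably works can be sketched as a loop. This is an assumption-laden illustration, not the real library source: the name `compare_strings` is hypothetical, and the only contract it aims for is zero when the strings match and a negative or positive value otherwise.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch of a strcmp-like function (not the library's code).
// Walks both strings left to right until the chars differ or s hits '\0'.
int compare_strings(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    while (*s != '\0' && *s == *t)
    {
        s++; // move each pointer to the next char
        t++;
    }
    // 0 if both stopped at '\0' together; otherwise the sign says
    // which string's differing char has the smaller value.
    return (unsigned char) *s - (unsigned char) *t;
}
```

Notice that the function never compares `s` and `t` themselves; it only compares the chars found by going to those addresses.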

do pointers have addresses yes so we

won't do that today but I could actually

use the Ampersand operator on S or on T

that would give me the equivalent of a

Char star star that itself could be

stored elsewhere in memory that's where

it ends we don't sort of do that

recursively forever there's star and

there's star star but yes that is a

thing and it's very often useful in the

context of like two-dimensional arrays

which we haven't really talked about but

that is a feature of the language too

but not today

good question

all right so what might we now do to

take things up a notch well let's go

ahead and Implement a different program

here that maybe tries copying some

values just to demonstrate this let me

open up a file called how about copy.c

and I'm going to start off with a few

includes so let's include the cs50

library just so we have a way of getting


user input let's include how about

standard i o as always let's

preemptively include string.h and maybe

one other in a moment let's do int main

void as before and then in here let's

get a string from the user and just call

it s for Simplicity and heck we can

actually just call this uh char star

if we want or string since we're

using the cs50 library but we'll come

back to that let's now make a copy of s

and do s equals T using a single

assignment operator and then let's check

something like this let's go into the

first character of T which is T bracket

zero and then let's upper case it using

that function that we've used in the

past of toupper T bracket zero

semicolon and actually I should go back

up here if I'm using toupper or if

you've used tolower or isupper or

islower I might not remember this offhand

but it was in another header file called

ctype.h there was a bunch of

helpful functions in that Library as

well now at the very last line of the

program let's just print out what both s

and t are

by simply printing out percent s for

each of them and T is percent s also not


percent T of course and let's see what

happens here so let me make copy oh my

God so many mistakes what did I do wrong

oh

okay that was unintended string T equals

s sorry so I'm creating two variables s

and t respectively and I'm copying s

into T make copy enter there we go dot

slash copy

and let's now type in for instance uh

how about hi exclamation point and

I'll lower case this time and now what

gets printed aha

I don't think that's what I intended

so to speak here because notice that I

got s from the user so that checks out I

then copied s into t which looks correct

that's what we always use assignment for

then I uppercase the first letter in t

but not s at least in my code then I

printed s and t and then notice

apparently both s and t got capitalized

so if you're sort of getting starting to

get a little comfortable with what's

going on underneath the hood like what

what's the fundamental problem here why

did both get capitalized

why did both get capitalized yeah over

here
yeah they're representing the same

address so C is really literal if you

create another variable called T and you

assign it the value of s you are

literally assigning it the value in s

which is like 0x123 or something like

that and so at that point in the story

both s and t presumably have a value of

0x123 which means they technically point

to the same h i exclamation point in

memory nowhere did I tell the computer

to give me a copy of Hi exclamation

point per se I literally said just copy

s so here's where an understanding of

what s literally is kind of explains the

situation I'm only copying the pointers

so what actually went on in memory let's

take a look here at this grid if I

created s initially maybe it ends up

here and I created hi in lower case

and it ended up down here then the

address was again like 0x123 0x124 0x125

0x126 and 0x123 is what's in s if then I

create a second variable called T and I

call it a string AKA char star maybe it

again ends up here but when I copy s

into T by doing T equals s semicolon

that literally just copies s into T

which puts the value 0x123 there so if

we now abstract away all these numbers


and just think about a picture with

arrows what we've drawn in the

computer's memory is this two different

pointers but storing the same address

which means the breadcrumbs lead to the

same place and so if you follow the t

breadcrumb and capitalize the first

letter it is functionally the same as

changing the first letter in

the version S as well
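The two-pointers-one-address picture can be reproduced in a few lines. This is a sketch under assumptions: the helper name `capitalize_first` is hypothetical, and a local char array stands in for the memory that get string would have returned, since a string literal must not be modified.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: uppercases the first char found at address t.
// Whatever other pointers hold that same address will see the change too.
void capitalize_first(char *t)
{
    assert(t != NULL);
    if (t[0] != '\0') // only touch t[0] if the string is not empty
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
}
```

If `char buffer[] = "hi!";`, `char *s = buffer;`, and `char *t = s;`, then `capitalize_first(t)` makes `s[0]` an uppercase H as well, because the assignment copied only the address and never the characters.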

so what's the solution then to this kind

of problem

like even if you have no idea how to do

it in code like what's the gist of what

I really intended which is I want a

genuine copy of s called T I want a new

h i exclamation point backslash zero

what do I need to do to make that happen

thoughts

so there is a function called strcpy

which is a possible answer to

this question the catch with strcpy

is that you have to tell in advance not

only what the source string is the one

you want to copy you also need to pass

in the address of a chunk of memory into

which you can copy the string and here's

one thing we haven't seen yet and we

need one more building block today if


you will we haven't yet seen a way to

create new chunks of memory and then let

some other function copy into them and

for this we're going to introduce

something called dynamic memory

allocation and this is the last and most

powerful feature perhaps today whereby

we're going to introduce two functions

malloc and free where malloc means

memory allocate which literally does

just that it's a function that takes a

number as input how many bytes of memory

do you want the operating system to find

for you somewhere in that Big Grid it's

going to find it and it's going to

return to you the address of the first byte

of contiguous memory back to back to

back and then you can do anything you

want with that chunk of memory free is

going to do the opposite when you're

done using a chunk of memory that malloc

has given you you can say free it and

that means you hand it back to the

operating system and then the operating

system can use it for something else

later so this is actually evidence of a

common problem in programming if your

Mac your PC has ever been in the habit

of starting to get like really really

slow or it's kind of slowing to a crawl


heck maybe it even freezes one of the

possible explanations could

be that the program you're running by

apple or Microsoft or whoever maybe

they're using malloc or some equivalent

asking the operating system Mac OS or

Windows for give me more memory I need

more memory the user is creating more

images the user is typing a longer essay

give me more memory more memory if the

program has a bug and never actually

frees any of that memory your computer

might end up using all of the available

memory and honestly humans are not very

good at handling Corner cases like that

very often programs computers just

freeze at that point or get really

really slow because they start trying to

be creative when there's not enough

memory left so one of the reasons for a

computer really slowing down might be

calling for malloc a lot or some

equivalent but never freeing it which is

to say you should always use these two

functions in concert and free memory

once you are done with it so let me go

ahead and do this in code and solve this

problem properly let me go ahead and do

this
before I copy s into T using something

like stir copy I first need to get a

bunch of memory from the computer so to

do that let's make this super clear what

we're doing with pointers so I'm going to

change my strings to char stars for both

s and t and what I technically am going

to store in T is the address

of an available chunk of memory to do

that I can ask the computer to allocate

memory for me and how many bytes if I

want to create a copy of h i exclamation

point I need how many bytes

good four because I need the H the I the

exclamation point and additional space

for the backslash zero it's up to me to

understand that and ask for it it's not

going to happen magically nothing does

in C so I could just naively type 4

there and that would be correct if I

type in h i exclamation point or any

other three letter word or phrase but to

do this dynamically I should probably do

something like strlen of s plus one

for the additional null character recall

that string length does it in the sort

of the English sense it Returns the

length of the string you see plus one

also takes into account the fact that

I'm going to need that backslash zero now


let me do this old school style first

let me go ahead and manually copy the

string s into T first so for

int i equals zero I is less than the

string length of s i plus plus then

inside my for Loop I'm going to do T

bracket I equals s bracket I but

actually I want the

null character too so I want to do the

length of the string plus one more and

heck I think I learned an optimization

last time if I'm doing this again and

again I could really do n equals

strlen of s plus 1 and then do I is

less than n just as a nice design

optimization I think this for Loop will

actually handle the process then of

copying every character from s into

every available byte of memory in t or I

could get rid of all of that and take

your suggestion which is to use strcpy

which takes as its first argument

the destination and its second argument

the source so copy from right to left in

this case too that's going to do all of

that automatically for me as well now I

think I'm good I can now capitalize

safely the first character in t which is

now a different chunk of memory than S


and then I can print them both out to

see that one has not changed but the

other has so make copy

all right what did I do wrong implicitly

declaring Library function malloc dot

dot so we've seen this kind of error

before

what is even if you don't know quite how

to solve it what's the essence of the

solution what do I need to do to fix

this kind of problem involving

implicitly declaring a library function

what did I forget yeah

I need to include the library and I

could look this up in the manual uh or I

can know it off the top of my head I

just forgot it there's another Library

we'll occasionally need now called

stdlib.h the standard library that

contains malloc and free prototypes and

some other stuff too all right let me

just clear this away and do make copy

one more time now I'm good dot slash

copy enter all right s I'm going to type

in hi lowercase T and S now come back

as intended s is untouched it would seem

but T is now capitalized

all right any questions then on what we

just did in code yeah

indeed there's a few improvements I want


to make so let me actually do those

right now technically I should practice

what I preached and I should indeed when

I'm done with T free T fortunately I

don't have to worry about how Big T was

the computer remembers how

many bytes it gave me and it will go

free all of them not just the first I

should do free T I don't need to do free

s and I shouldn't because that is

handled automatically by the cs50

library s recall came from get string

and we actually have some fancy code in

place that makes sure that at the end of

your program's execution we free any

memory that we allocated so we don't

actually waste memory like I described

earlier but there's actually a couple of

other things if I really want to be

pedantic I should put in here it turns

out that sometimes malloc can fail and

sometimes malloc doesn't have enough

memory available because maybe your

computer is doing so much stuff there's

just no more RAM available so

technically I should do something like

this if T equals equals null with two

L's today okay then I should just return

one or something to say that there was a


problem I should probably print an error

message too but for now I'm going to

keep it simple I should also probably

check this this is a little risky of me

if I'm doing T bracket zero this is

assuming that there is a letter there

but what if the human just hit

enter at the prompt and didn't even type

h let alone h i exclamation point what

if there is no T bracket zero so

technically what I should probably do

here is if the length of T is at least

greater than zero then go ahead and

safely capitalize the first letter of it

and then at the very end if all goes

well I can return 0 thereby signifying

that indeed this thing was successful so

yes these two functions malloc and free

should be in concert and so if you call

malloc you should call free eventually

but you did not call malloc for S so you

should not call free for s
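Putting the pieces above together (malloc of strlen plus one, the check for NULL, strcpy, the length check before capitalizing, and free) might look like the following sketch. The function name `capitalized_copy` is hypothetical and not from the lecture, and it returns the new pointer rather than printing so that the caller decides when to call free.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of s whose first
// letter is capitalized, or NULL if malloc could not find enough memory.
// The caller called for this memory indirectly, so the caller must free it.
char *capitalized_copy(const char *s)
{
    assert(s != NULL);

    // strlen counts the visible chars; + 1 leaves room for the '\0'
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL; // malloc failed, do not touch t
    }

    // Destination first, source second: copies every char including '\0'
    strcpy(t, s);

    // Only capitalize if there is a first character at all
    if (strlen(t) > 0)
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    return t;
}
```

A caller would write something like `char *t = capitalized_copy(s);`, check `t` against NULL, use it, and finish with `free(t);`. The original `s` is left untouched because `t` points at a different chunk of memory.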

all right yeah other question

why did I do malloc plus one so malloc

sorry malloc of string length of s plus

one the string length is sort of the

literal length of the string as a human

would perceive it in English so h i

exclamation point strlen gives me

three but I know now as of last week and


this week what a string technically is

and a string always has an extra byte

the onus is on me to understand and

apply that lesson learned so that I

actually give stir copy enough room for

that trailing null character and here's

just an annoying thing we called the

backslash zero N-U-L last week it

turns out that n-u-l-l is the same idea

it's also zero but it's zero in the

context of pointer so long story short

you never really write N-U-L I've just

said it and we saw it on the screen you

will start writing n-u-l-l when you want

to check whether or not a pointer is

valid or not and what I mean by that is

this if malloc fails and there's just

not enough memory left inside of the

computer for you it's got to return a

special value and that special value

is n-u-l-l in all capital letters that

signifies something went wrong do not

trust that I'm giving you a useful

return value

other questions

on these copies thus far yeah over there

good question will strcpy not work

without malloc do you kind of need both

in this case because strcpy by


definition if I pull up its manual page

needs a destination to put the copied

characters it's not sufficient just to

say Char star T semicolon that only

gives you a pointer but I need another

chunk of memory that's just as big as Hi

exclamation point backslash zero so

malloc gives me a whole bunch of memory

and then strcpy fills it with Hi

exclamation point backslash zero so

again that's why we're sort of going

down to this lower level because once

you understand what needs to be done you

now have the functions to do it so let's

actually consider what we just solved so

in this next version of the program

where I actually introduced malloc T was

initialized to the return value of

malloc and maybe the memory that I got

back was here

0x456, 0x457, 0x458, 0x459 I've left it blank

initially because nothing is put there

automatically by malloc I just get a

chunk of memory that is now mine to use

as I see fit I then assign T to

that return value which points T at the

first address notice there's no

backslash zero this is not yet a string

it's just a chunk of memory four bytes

an array of four bytes what strcpy


eventually did for me was it copied the

H over the I over the exclamation point

over and the backslash zero and if I

didn't want to use strcpy or I forgot

that it existed my for Loop would have

done exactly the same thing

all right any questions then

on these examples here

any questions yeah

good question after malloc if I had

then still done just T equals s it

actually would have recreated the same

original problem by just copying 0x123

from s into T so then I would have been

left with a picture that looked like

this a few steps ago I would have and I

can't quite do it live this Arrow if I

did what you just described would now be

pointing over here and so I wouldn't

have fundamentally solved the problem I

would have just additionally wasted four

bytes temporarily that I'm not actually

using yeah

um do you always use malloc

and strcpy together not necessarily

these are solving two different

problems malloc's giving me enough

memory to make a copy strcpy's doing

the copy however you could actually use


an array if you wanted of characters and

you could use strcpy on that and

there's other use cases for strcpy

but thus far it's a reasonable mental

model to have that if you want to copy

strings you use malloc and then strcpy

or your own homegrown loop yeah


say that once more

no it well good question uh if I uh I

didn't well strcpy per its

documentation will copy the whole

string plus the null character at the

end it just assumes there will be one

there it's therefore up to you to pass

strcpy a long enough chunk of memory

to have room for that if I only ask

malloc for three bytes that could have

potentially created a memory problem

whereby strcpy would just still

blindly copy one two three four bytes

but technically it should have only

touched three of those you do not yet

have access to the fourth one or the

rights to it because you never asked

malloc for it uh yeah

correct the number inside malloc its

one argument is the number of bytes you

want back

yes the onus is on you the programmer to


remember or frankly use a function to

figure out how many bytes you actually

need that's why I did not ultimately

type in four manually I used strlen

plus one so the plus one is necessary if

you understand how strings are

represented but using strlen means

that I can actually play around with any

types of inputs and it will dynamically

figure out the length

so suffice it to say there's so many

ways already where you can start to

break programs let's give you at least

one tool for finding mistakes that you

might make and indeed in upcoming

problem sets you will use this to find

bugs in your own code not just using

printf not just using the built-in

debugger but another tool here as well

so let me go ahead and deliberately

write a program called memory.c that has

some memory related errors let me

include standard io.h at the top and let

me include standard lib.h at the top so

I have access to malloc now let me do

int main void and then inside of main

let me do this I want to allocate maybe

how about space for three integers

why just for the sake of discussion so


I'm going to go ahead and do malloc of

three but I don't want three bytes I

want three integers and an integer is

four bytes so technically I could do

this three times four or I could do 12.

but again that's making certain

assumptions and if I run this program on

a slightly different computer ints might

be a different size so the better way to

do this would be three times whatever

the size is of an INT and this is just

an operator you can use anytime if you

just want to find out on this computer

how big is an INT how big is a float or

something else so that's going to give

me that many that much memory for three

ints what do I want to assign this to

well malloc returns an address

pointers are addresses so I'm going to

create a pointer to an INT called X and

assign it the value

so what am I doing here this is a little

less obvious but again go back to basics

the right hand side here gives me a

chunk of memory for three integers

malloc Returns the address of the first

byte of that chunk how do I store the

address of anything I need a pointer the

Syntax for today is

type of data star the type of data in


question is three ints so I do int star

again it's kind of purposeless only for

sort of instructional purposes here but

this is equivalent now to having a chunk

of memory of size 12 in total presumably

so I can technically now do this I can

go into maybe the first location and

assign it the number 72 like the other

day second location number uh number 73

and the Third location may be the uh

locate the number 33. now I've

deliberately made two mistakes here

because I'm trying to trip over my

newfound understanding or my sort of

greenness with understanding pointers

one I didn't remember that I should be

treating chunks of memory as zero

indexed malloc essentially returns an

array if you want to think of it as that

an array of three ints or more

technically the address of a chunk of

memory that could fit three ints so I

can use my square bracket notation or I

could be really cool and use pointer

arithmetic but this is a little more

user friendly but I have made two

mistakes

I did not start indexing at zero so line


seven should have been X bracket zero

line eight should have been X bracket

one and then line nine should have been

X bracket two so first mistake the

second mistake that I've made as a side

effect is I'm also touching memory that

I shouldn't

X bracket three would mean go to the

fourth INT in the chunk of memory that

came back I only asked for enough memory

for three ints not four so this is

what's called a buffer overflow I am

accidentally but deliberately at the

moment going Beyond boundaries of this

array this chunk of memory so bad things

happen but not necessarily by just

running your program let me go ahead and

just try this make memory

and you'll see here that it compiles

okay dot slash memory and it actually

does not segmentation fault which comes

back to that point of non-determinism

sometimes it does sometimes it doesn't

it depends on how bad of a mistake you

made but there's a program that can spot

these kinds of mistakes and I'm going to

go ahead and expand my terminal window

for a moment and I'm going to run not

just dot slash memory but a program

called valgrind dot slash memory this is


a command that comes with a lot of

computer systems that's designed to find

memory related bugs in code so it's a

new tool in your toolkit today and

you'll use it with the coming problem

sets I'm going to run this now its

output honestly is hideous but there's a

few things that will start to jump out

and will help you with tools and the

problem set to see these kinds of things

here's the first mistake invalid write

of size four that's on memory.c line 9

per my highlights so let me go look at

line nine in what sense is this an

invalid write of size four well I'm

touching memory that I shouldn't and

I'm touching it as though it's an INT

and an INT is four bytes size four so

again this takes some practice to get

used to the nomenclature here but this

is now a clue for me the programmer that

not only did I screw up but I screwed up

related to memory and so this is just

kind of a hint if you will it's not

going to necessarily tell you exactly

how to fix it you have to kind of

wrestle with the the semantics but

invalid write of size four oh okay so I

should not have indexed past the


boundary here all right so I shouldn't

have done that so let me go ahead then

and change this to uh zero one and two

perhaps here all right so let me go

ahead and recompile my code make memory

dot slash memory still doesn't seem to

be broken but it is technically buggy

let me go ahead and run valgrind again

so valgrind

of dot slash memory enter

and now there's less scary

output now but there's still something

in there notice this 12 bytes in one

blocks no regard for grammar there are

definitely lost in loss record one of

one super cryptic but this is hinting at

a so-called memory leak the blocks of

memory are lost in the sense that I

mallocked them I asked for them but I

never take a guess

freed them I have a memory leak and this

is the Arcane way of saying you've

screwed up you have a memory leak so

this is an easy fix fortunately once I'm

done with this memory I just need to

free it at the end so now let me go

ahead and rerun make memory it still

runs fine so all the while I might have

thought incorrectly my code is correct

but let me run valgrind one more time


valgrind of dot slash memory enter now

this is pretty good all heap blocks were

freed whatever that means no leaks are

possible and even though it's still a

little cryptic there's no other error

here and in fact it's pretty explicit

error summary zero errors from zero

contexts dot dot dot so even though this

is one of the most Arcane tools we'll

use it's also one of the most powerful

because it can see things that you the

human might not and maybe even that the

debugger might not it does a much closer

reading of your code while it's running

to figure out exactly what is going on

all right any questions then on this

tool and we'll guide you after today

with actually using this tool it

just helps you find memory related

mistakes that you might now be capable

of making

all right let's do one other memory

related thing let me shrink my terminal

window here let me create one other file

here called garbage.c so it turns out

there's a term of art called garbage

values in programming that we can reveal

as follows let me include standard io.h

and let me include how about standard


lib.h and then let me give myself

int main void and then in this relatively

short program let me give myself like

three ints using last week's notation

just int scores bracket three for like

three quiz scores or whatever then let

me go ahead and do for int i equals

zero I less than three I plus plus then

let me go ahead and print out percent I

backslash n scores bracket I semicolon

that's it

this code producer is

going to compile and it's going to run

but what is my logical bug I've sort of

Forgotten a step even though the code

that's written is not so wrong yeah

yeah I didn't provide the score so I

didn't actually initialize the array

called scores to have any scores

whatsoever what's curious about this

though is that the computer technically

doesn't mind let me go ahead and sort of

playfully make garbage enter and it's

kind of an apt description because what

I'm about to see are so-called garbage

values when you the programmer do not

initialize your code variables to have

values sometimes who knows what's going

to be there the computer's been doing

some other things there's a bit of work


that happens even before your code runs

in the computer so there might be

remnants of past ints chars strings floats

anything else in there and what you're

seeing is those garbage values which is

to say you should never forget as I just

did to initialize the value of some

variable and this is actually pretty

dangerous and there have been many

examples of software being compromised

because of one of these issues where a

variable wasn't initialized and all of a

sudden users maybe people on the

internet in the context of web

applications could suddenly see the

contents of someone else's memory or

remnants maybe someone's password that

had been previously typed in or some

other value like a credit card number

that had been previously typed in there

are different defense mechanisms in

place to generally make this not so

likely but it's certainly very possible

at least in this kind of context to see

values that you probably shouldn't

because they might be remnants from

something else that used them so this is

to say again you sort of have this great

power now to manipulate memory but also


now you have this great sort of hacking

ability to poke around the contents of

memory and this is exactly what hackers

sometimes do when trying to find

ways to exploit systems

any questions here

no all right let's go ahead and take a

quick five minute break and when we come

back we'll build on these final topics

see you in five we are back uh first

just a little programmer humor from

XKCD which hopefully now will make a

little bit of sense to you and what

we'll also do next is take a look

at a short two minute video that uh

animates uh with claymation if you will

from our friends at Stanford exactly

what happens now if you have an

understanding of what garbage values are

and how they get there and what happens

then if you misuse them it's one thing

just to print them out as I just did

it's another if you actually mistake a

garbage value for a valid pointer

because garbage values are just zeros

and ones somewhere numbers that is but

if you use that new dereference operator

the star and try to go to a garbage

value thinking incorrectly that it's a

valid pointer bad things can happen


computers can crash or more familiarly

uh segmentation faults can happen so

allow me to introduce if we could dim

the lights for two minutes uh our friend

Binky from Stanford

hey Binky wake up it's time for pointer

fun

what's that learn about pointers oh

goody

goody well to get started I guess we're

gonna need a couple pointers

this code allocates two pointers which

can point to integers

okay well I see the two pointers but they

don't seem to be pointing to anything

that's right initially

pointers don't point to anything the

things they point to are called pointees

and setting them up is a separate step

oh right right I knew that the pointees

are separate sure so how do you allocate

a pointee

okay well this code allocates a new

integer pointee and this part sets X to

point to it

hey that looks better so make it do

something

okay I'll dereference the pointer X to

store the number 42 into its pointee for


this trick I'll need my magic wand of

dereferencing your magic wand of

dereferencing uh

this is what the code looks like I'll

just set up the number and

hey look there it goes so doing a

dereference on X follows the arrow to

access its pointee in this case to

store 42 in there hey try using it to

store the number 13 through the other

pointer y

okay I'll just go over here to Y and get

the number 13 set up and then take the

wand of dereferencing and just oh

oh hey that didn't work

say uh Binky I don't think dereferencing

Y is a good idea because uh you know

setting up the pointee is a separate

step and uh I don't think we ever did it

good point yeah we allocated the pointer y

but we never set it to point to a pointee

very observant hey you're looking good

there Binky can you fix it so that y

points to the same pointee as X sure I'll

use my magic wand of pointer assignment

is that going to be a problem like

before no this doesn't touch the

pointees it just changes one pointer to

point to the same thing as another


oh I see now y points to the same place

as X

so so wait now Y is fixed it has a

pointee so you can try the wand of

dereferencing again to send the 13 over

uh okay

okay here goes

hey look at that

now dereferencing works on y and

because the pointers are sharing that

one pointee they both see the 13. yeah

sharing uh whatever so are we gonna

switch places now oh look we're out of

time but

it's from our friend Nick Parlante at

Stanford so let's consider what Nick did

here as Binky so here's kind of all the

code together these first couple of

lines were not bad and uh notice that in

Stanford's code they moved the Stars to

the left that's fine again more

conventional might be this syntax here

these two lines are fine it's okay to

create variables even pointers and not

assign them a value initially so long as

you eventually do so we eventually do

here with this line we assign to x the

return value of malloc which is

presumably the address of something to


be fair we should really be checking for

null as well but that's not the biggest

problem here the biggest problem is not

even this next line which means go to

the memory location in X and store the

number 42 there that's fine because

again malloc Returns the address of some

chunk of memory this chunk of memory is

big enough for an INT X is therefore

going to store the address of that chunk

that's big enough for an INT star X

recall the dereference operator means

go to that address and put 42 in it's

like going to the mailbox and putting

the number 42 in it instead of taking

the number 50 out like we did before but

why is this line bad this is where Binky

sort of lost his head so to speak

why is this bad yeah

exactly we haven't yet allocated space

for y there's no mention of malloc

there's no assignment of y even to

that same memory so this would be go to

the address in y but if there is no

known address in y it is a so-called

garbage value which means go to some

random address that you have no control

over and boom that might cause what

we've seen in the past perhaps as a

segmentation fault now this fortunately


is the kind of thing that if you don't

quite have the eye for it yet valgrind

that new tool could help you find as

well but it's just another example of

again the sort of upside and downside of

having control now over memory at this

level all right well let's go ahead and

do one other thing considering from last

week that this notion of swapping was

actually a really common operation we had

all of our volunteers come up we had to

swap a lot of things during bubble sort

and even selection sort and we just kind

of took for granted that you know the

two humans would swap themselves just

fine but there needs to be code to do

that if you actually Implement bubble

sort selection sort or anything that

involves swapping so let's consider some

code like this we'll keep it simple like

last week and where we wanted to swap

some values like int a and int B for

instance here void because I'm not going

to return a value but I have a function

called swap so here for instance might

be some code for this uh but why is it

so complicated here let's actually take

a step back why don't we do this here I

think we have time for one more


volunteer could we get someone to come

on up you have to be comfy on camera and

oh and you're being asked to help with

your oh I'll go with the friend pointing

so whoever has their friend doing this

here

no now they're pointing over here now

literally an arm is being Twisted okay

come on down that backfired

come on over

and what is your name

Marina nice to meet you who were you

trying to volunteer

that's okay

so here we have for Marina two glasses

of liquid orange and purple just so that

they're super obvious and suppose that

the problem at hand like last week

is just to swap two values like as

though these two glasses represented two

people and we want to swap them but

let's consider these glasses to be like

variables or locations in an array and

you know what I'd really like you to

swap the values so like orange has to go

in there and purple has to go in there

how would you do it and we'll see if we

can then translate that to code

okay what did you say a little louder

all right yeah so presumably you're sort


of struggling mentally with how you

would do this without having an extra

cup so good foresight here let me go

ahead and we do have a temporary

variable if you will so if I hand you

this how would you now solve this

problem

no that's oh well okay do it go with

your instincts

okay sure go ahead go to whatever your

instincts are

yeah so a little so strictly speaking

probably shouldn't have moved the

glasses just because that would be like

moving the array locations so let's

actually do it one more time but the

glasses now have to go back where they

originally are so how would you swap

these now using this temporary variable

okay good

otherwise we'd be completely uprooting

the array for instance by just

physically moving it around

so you move the orange into this

temporary variable then you copied the

purple into where the orange was and now

presumably excellent the Orange is going

to end up where the purple once was and

this temporary variable it's sort of


some extra memory it was necessary at

the time but not necessary ultimately

but a round of applause if we could and

thank you for doing that so well

so

right the fact that it sort of instantly

occurred to Mariana that like you need

some temporary variable is a perfect

translation to code and in fact this

code here that we might Glimpse now is

reminiscent of exactly that algorithm

where A and B at the end of the day are

the same chunks of memory just like the

second time the two glasses have to kind

of stay put even though we're physically

lifting them but they're going back to

where they were it's kind of like having

two values A and B and you just have a

temporary variable into which you copy a

then you change a with B then you go and

change B with whatever the original

value of a was because you temporarily

stored it in this temporary variable TMP

unfortunately this code doesn't

necessarily work as intended so let me

go over to my vs code here and open up a

program called swap.c and in swap.c Let

Me Whip up something really quickly here

with how about include standard io.h int

main void inside of main let me do


something like int x gets one and y gets

two let me just print out as a visual

confirmation that X is percent i y is

percent I backslash n plugging in X and

Y respectively then let me call a swap

function that we'll invent in just a

moment swap X and Y and then let me

print out again X is percent i y is

percent I backslash n just to print

out again what they are because

presumably I should see one two first

then 2 1 the second time now how is swap

going to be implemented let me implement

it exactly as on the screen a moment ago

so void swap int X or let's call it int

a for consistency int b but I could

always call those anything I want int

temp gets a a gets b b gets temp so

exactly as I proposed a moment ago and

exactly as Mariana really implemented it

using these glasses of water I need to

now include my prototype as always so

nothing new there and I'll just copy

paste that up here and now let's go

ahead and run this so make swap

so far so good dot slash swap x is one y is

two x is one y is two

so there seems to be a bit of a bug here

but why might this be this code does not


in fact work even though it obviously

works in reality yeah

good and let me summarize A and B do

indeed have different addresses than X and

Y and in fact what happens when you call

a function like this on line 11 calling

swap passing in X and Y you are calling

a function by value so to speak and this

is a term of art that just means you are

passing in copies of X and Y

respectively and calling them A and B in

the context of this function but they're

indeed copies now technically these

names are local only I could have called

this x I could have called this y I

could have changed this to X this to Y

this to X and this to Y the problem

would still remain just because you use

the same names in One function as you do

elsewhere that doesn't mean they're the

same they just look the same to you but

indeed swap is going to get copies of

this X and Y and in this context this

scope so to speak X and Y will be copies

of the original so for clarity let me

revert this back to A and B just to make

super clear that they're indeed

different albeit copies but there's

indeed a problem there this function

actually works fine in fact notice this


let me go ahead and print out inside of

this printf a is percent i b is percent

I backslash n and then I'll print A and

B and let me do that same thing at the

beginning of this function before it

does any work let me go ahead and rerun

make swap

dot slash Swap and this is promising

initially X is one y is two a is one B

is 2 a is two B is one but then nope X

is one y is two so if anything I've

confirmed that the logic is Right

Mariana's logic is right but there's

something about C there's something

about using one function versus another

that's actually creating a problem here

the fact that I'm passing in copies of

these values is creating this problem so

what in fact is going on well again

inside of your computer's memory there's

these little chips and we've been

talking about them abstractly it's just

this grid of memory locations it turns

out that your computer uses this memory

in a pretty conventional way it's not

just kind of random where it just puts

stuff wherever is available it actually

uses like different parts of the memory

for different purposes and you have


control over a lot of it but the

computer uses some of it for itself and

let's go ahead and zoom out from this

and consider that within your computer's

memory what a computer will typically do

is actually store initially all of the

the zeros and ones that you compiled in

the top of your computer's memory so to

speak so when you compile a program and

then you run it with DOT slash whatever

or on a Mac or PC you double click on it

the computer or rather the

operating system first loads all of your

program's zeros and ones AKA machine code

into just one big chunk of memory at the

top so to speak

below that it stores Global variables

any variables you have created in your

program that are outside of Main and

outside of any functions generally like

the top of your file globals tend to go

at the top there then there's this chunk

of memory that's generally known as the

Heap and we saw that word briefly in

valgrind's output and then there's this

other chunk of memory called the stack

and it turns out that up until this week

you were using the stack heavily anytime

you use local variables in a function

they end up on the stack anytime you use


malloc that memory ends up on the Heap

now as the arrow suggests this actually

looks like a problem waiting to happen

because if you use more and more and

more Heap and more and more and more

stack it's like you know two things

barreling down the tracks of one another

this does not end well and that's

actually a problem if you've ever heard

the phrase stack Overflow or use the

website this is the origin of its name

when you start to use more and more and

more memory by calling lots and lots of

functions or using lots and lots of

local variables you use a lot of this

stack memory or if you use malloc a lot

and keep calling malloc malloc malloc

and never really or rarely calling free

you just use more and more memory and

eventually these two things might

overflow each other at which point

you're just kind of out of luck the

program will crash or something bad will

happen so the onus is kind of on you

just to not do that but this is the

design generally of what's going on

inside of your computer's memory now

within that memory though there are

certain conventions focusing on here the


stack and in fact let me go over here

with a marker and say that this

represents like the bottom of my memory

ultimately and so here we have a whole

bunch of wooden blocks and each of these

squares represents a byte of memory and

this for instance might represent four

bytes altogether good enough for an INT

or something like that so in my original

code that I wrote earlier that is in

fact buggy

what is in fact going on inside the swap

function we can kind of visualize it

like this when you run dot slash swap or

any program for that matter main is the

first function to get called with a c

program and so I'm just going to label

this bottom row of memory as Main and

what were the two variables I had in

main called in this code

yeah

X and Y and each of those was an int so

that's four bytes so it's kind of

deliberate that I reserved a four uh a

chunk of wood here that's four bytes so

let me just call this X and I'm just

going to write the number one in this

box here and then I had my other

variable Y and I'm going to put the

number 2 there what happens when main


calls swap like it does in this code

here well it has two variables of its

own A and B

and a initially is 1 and B is initially

two but it has a third variable temp

which is the local variable in addition

to the arguments A and B that are passed

in so I'm going to call this temp TMP

over here and what is the value of temp

well we have to look back at the code

temp initially gets the value of a all

right the value of a was one so temp

initially gets one that's step one in my

three line program okay a equals B so

that is assign from the right to the

left of the B into the a so B is 2 a is

this so let me go ahead and erase this

and just overwrite that so at this

moment in the story you have two copies

of two

so that's that's okay though because the

third line of code says temp gets copied

into B so what's temp one gets copied

into B so let me overwrite this two with

a one

and now what happens now unfortunately

the code ends swap doesn't actually do

anything with the result and the problem

in C is that I could have had a return


value I could go in there and change

void to int but which one am I going to

return the a or the B the whole goal is

to swap two values and it seems kind of

lame if you can't write a function to do

something as common per last week's

sorting algorithms as swapping two

values but what really happens well even

though when this program starts running

main is using this chunk of memory at

the bottom in the so-called stack and

the stack is just like a cafeteria stack

of trays it grows up like this here's

main's memory on the stack here's the

swap function's memory on the stack it's

using three ints instead

of only two

what happens when the function returns

whether it's void or not the sort of

recollection that this is swaps memory

goes away and garbage values are left so

adorably we get rid of these values here

and there's still data there technically

the numbers 1 1 and 2 are still there in

the computer's memory but they no longer

belong to us because the function has

now returned so they're still in there

and this is kind of an example visually

of why there's other stuff in memory

even though you didn't put it there


necessarily sometimes you did put it

there but now once swap returns you only

should be touching memory inside of main

but we've never actually copied one

value

into main we haven't returned anything

and we haven't solved this fundamentally
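The walkthrough above can be sketched in code. This is a minimal version under a few assumptions: the function is named swap_by_value here only to tell it apart from the fixed version (the lecture just calls it swap), and the printf inside is added so the local copies can be seen changing.

```c
#include <stdio.h>

// Swaps its own local copies of a and b. The caller's variables are
// untouched, because C passes arguments by value (as copies).
void swap_by_value(int a, int b)
{
    int tmp = a; // step 1: tmp gets the value of the copy a
    a = b;       // step 2: the copy a is overwritten with b
    b = tmp;     // step 3: the copy b is overwritten with the saved value

    // The copies really are swapped here, but they vanish on return.
    printf("inside swap: a is %i, b is %i\n", a, b);
}
```

Calling swap_by_value(x, y) with x as 1 and y as 2 prints the swapped copies, yet x is still 1 and y is still 2 afterward.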

so how could we do this well what if we

instead passed into swap not copies of X

and Y calling them A and B what if they

passed in breadcrumbs to X and Y sort of

a treasure map that will lead swap to

the actual X and to the actual y today

we have that capability using pointers

so suppose that we use this code instead

there's a lot of stars going on here

which is a bit annoying but let's

consider what it is we're trying to

achieve what if we pass in not X and Y

but the address of X and the address of

Y respectively breadcrumbs if you will

that will lead swap to the original

values then what we do is we still give

ourselves a temp variable like an empty

glass it's still a glass so we still

call it an int but what do we want to

put into that temporary variable we

don't want to put a into it because

that's an address now we want to go to


that address per the star and put

whatever's at that address what do we

then want to do well we want to then

copy into whatever's at location a we

want to copy over to location A's

contents whatever is at locations B's

contents and then lastly we want to copy

temp into whatever's at location B so

again we're very deliberately

introducing all of these stars because

we don't want to change any of these

addresses we want to go to these

addresses per the dereference operator

and put values there or get values from
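The three starred lines just described can be sketched like this. It follows the lecture's description of swap taking two addresses; the comments are added here for explanation.

```c
// Swaps the two ints that a and b point at.
// a and b hold addresses; *a and *b are the values stored at those addresses.
void swap(int *a, int *b)
{
    int tmp = *a; // go to address a and copy the value found there into tmp
    *a = *b;      // go to address b, get its value, store it at address a
    *b = tmp;     // store the saved value at address b
}
```

A caller passes addresses rather than values, for example swap(&x, &y), so the function can reach the caller's own x and y.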

so what does this actually mean well if

I kind of rewind in this story and I go

back here I still have temp although I'm

going to delete its value to begin with

I still have uh

B and I still have a but what's going to

be different this time is how I use a

and b so let me finish erasing those

that's a on the left this is B on the

right at this point in the story we're

re-running swap with this new and

improved version and let's see what

happens well

X is presumably at some address maybe

it's like

0x123 as always what then does a get


when I'm using this code the value of a

is

0x123 what is the value of B maybe y is

at 0x456 what goes in B well

I'm going to put 0x456 and

then what am I going to do based on

these three lines of code I'm going to

store in temp whatever is at the address

in a what is the address in a that's

this thing here so I'm going to put 1 in

temp line two I'm going to go to B all

right B is 0x456 so I'm going to

B and I'm going to store 2 at whatever

is at location a and at location a is

0x123 so that's this so what am

I going to do I'm going to change this

one to a 2.

last line of code get the value of temp

which is one and then put it at whatever

the location B is so B 0x456 go

there and change it to be the value of

temp TMP which puts one here that's it

for the code there's still no return

value swap returns which means these

three temporary variables are sort of

garbage values now that can be reused by

subsequent function calls but now I've

actually swapped the values of X and Y

which is to say what came as naturally


as the real world here is for Mariana is

not quite as simply done in C because

again functions are sort of isolated

from each other you can pass in values

but you get copies of those values if

you want one function to affect the

value of a variable somewhere else

you have to one understand what's going

on but two pass things in by a

pointer here so if I go back to my code

here I need to make a few changes now

let me get rid of these extra printfs

let me go in and add all these Stars

so I'm dereferencing these actual

addresses

here and here and I've got to make one

more change

how do I now call swap

if swap is expecting an int * and an

int * that is the address of an int and

the address of another int what do I

change on line 11 here yeah

sorry a little louder

sorry the address-of operator so up

here on line 11 we do ampersand x and

ampersand y so that yes we're

technically passing in a copy of a value

but this time the copy we're passing in

is technically an address and as soon as

we have an address just like when I held


up the fuzzy finger or the foamy finger

I can point at that address I can go to

that address and actually get a value

from the mailbox or put a value into the

mailbox if I even want so let's cross

our fingers now and do make

swap enter oh my God so many mistakes oh

I didn't remember to change my prototype

so let me go way up here and add two

more stars because I made that change

already make swap ./swap and

voila now I have actually swapped thank

you

thank you the two values all right so

what more can we do here well let me

consider that

all this time we've been deliberately

using get string and get int and get

float and so forth but for a reason

these aren't just training wheels for

the sake of like making things easier

they're actually in place to make your

code safer and to illustrate this let me

go ahead and open up one other file here

how about a file called uh code uh

scanf.c it turns out that the old school

way the way in C really of getting

user input is via functions like scanf

and let me go ahead and include


stdio.h int main void and without using the

cs50 library at all for Strings or for

any of those get functions let me give

myself an int called x let me just print

out what the value of x is even though

it's going to be um or rather ask the

user for the value by asking them for x

and I'm going to use a function called

scanf that's going to scan in an integer

using %i and I'm going to store

whatever the human types in at this

location and then I'm going to go ahead

and just so we can see what happened I'm

going to print out with %i

whatever the human typed in as follows
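The same idea can be sketched in a form that is easy to try without a keyboard. This assumes sscanf, the standard sibling of scanf that reads from a string instead of standard input; parse_int is a hypothetical helper name and is not part of the lecture's scanf.c. The key point is identical: the function needs the address of the int so it can store the result there.

```c
#include <stdio.h>

// Parses an integer out of text and stores it at the address in x.
// Returns 1 if an integer was read, 0 otherwise.
// parse_int is a hypothetical name used for illustration only.
int parse_int(const char *text, int *x)
{
    // x is already an address, so it is passed along as is;
    // a caller holding a plain int passes &x instead.
    return sscanf(text, "%i", x) == 1;
}
```

With the keyboard version, the matching call would be scanf("%i", &x), which takes the same format code and the same kind of address argument.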

all right so line 8 is week one style

code line 5 and 6 is week one style code

so the Curiosity today is this new line

scanf is another function in

stdio.h and notice what I'm doing

I'm using the same syntax that I use for

printf which is kind of a little clue a

format code to tell scanf what it is I

want to scan in that is read from the

human's keyboard and I'm telling it

where to put whatever the human typed in

I can't just say x because we run into

the same darn problem as with swap I

have to give a little breadcrumb to the

variable where I want scanf to put the


human's integer and so this just tells

the computer to get an int this is what

you would have had to type essentially

in week one just to get an int from the

user and there's a whole bunch of things

that can go wrong still but that's the

cryptic syntax we would have had to show

you in week one let me go ahead and make

scanf here oops uh

user error put the semicolon in the

wrong place make scanf enter oh my God

uh non-void doesn't return a value

oh thank you

strike two okay make scanf there we go

okay so scanf I'm going to type in a

number like 50 and it just prints it

back out so that is the sort of

traditional way of implementing

something like get int the problem

though is when you start to get into

Strings things get dangerous quickly let

me delete all of this and give

myself a string s although wait a minute

we don't call it string anymore char

*s to store a string then let me go

ahead and just prompt the user for a

string using just printf then let me go

ahead and use scanf ask them for a

string this time with %s and


store it at that address then let me go

ahead and print out whatever the human

typed in just by using the same notation

so here line five is the same thing as

string s but we've taken back that layer

today so it's char *s this is just

week one this is just week one line

seven is new scanf will also read from

the human's keyboard a string and store

it at s but that's okay because s is an

address it's correct not to do the

ampersand it's not necessary a string is

and has always been a char * AKA

string

the problem though arises as follows if

I do make scanf

oh my god oh uh I can't okay we have

certain defenses in place with make let

me do clang of scanf.c

and output a program called scanf

all right so I'm overriding some of our

pedagogical defenses that we have in

place with make let me now run scanf of

this version enter and let me type in

something like uh how about hi again

huh

so it didn't even store something and it

weirdly printed out null this time it's in

lower case but that is somewhat related

what did I fundamentally do wrong though


here

why is this getting more and more

dangerous and let me illustrate the

point even more what if I type in not

just something like hello which also

doesn't work what if I do like hello and

make a really long string enter

that still works let's kind of do this

again

let's try again

right a really long unexpectedly long

string this is the non-determinism

kicking in enter all right damn it I was

trying to trigger a segmentation fault

but it wouldn't uh but the point Still

Remains it's still not working but

what's the essence of why

this isn't working and it's not storing

my actual input yeah

we have to make space for it so what

we're missing here is malloc or

something like that so I I could do that

I could do something like this well let

me let the human type in at least a

three letter word so I could do malloc

of three plus one for the

null character so like let me give them

four characters and let me go ahead and

do make scanf whoops uh nope sorry


clang I have to Circ nope damn it oh

include stdlib.h

there we go that gives me malloc now I'm

going to recompile this with clang now

I'm going to rerun it and now I'm going

to type in my first thing hi that now

works and let me get a little aggressive

now and type in hello which is too long

still works but I'm getting lucky let me

try it hello

damn it that still works too

sort of but it actually not quite

there's some weirdness going on there

already it turns out I can also do this

I could actually just say char s[4]

and give myself an array of four

characters let me try this one more time

so let me rerun clang

dot slash scanf hello clearly exceeding

the four characters
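One hedged way to keep a four-byte buffer from overflowing is a field width in the format code. This is not what the lecture's scanf.c does; it is a sketch that assumes sscanf (reading from a string so it can be tried without a keyboard) and a hypothetical helper name, read_short_word. The width 3 in %3s tells the function to store at most three characters, which leaves the fourth byte for the terminating null character.

```c
#include <stdio.h>

// Copies at most 3 non-whitespace characters from input into buffer,
// then adds the terminating '\0'. buffer must have room for 4 bytes.
// Returns 1 if a word was read, 0 otherwise.
// read_short_word is a hypothetical name used for illustration only.
int read_short_word(const char *input, char buffer[4])
{
    // The 3 in %3s caps how many characters are stored, so a long
    // input is truncated instead of spilling past the array.
    return sscanf(input, "%3s", buffer) == 1;
}
```

Given hello as input, only hel is stored. The same width works with scanf on keyboard input, although whatever was not consumed stays in the input stream.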

there we go thank you all right so

the the point here though is if we

hadn't given you get int you would have

had to use the scanf thing not a huge

deal because it seemed to work but if we

hadn't given you get string you would

have had to do stuff like this knowing

about malloc already or knowing about

strings being arrays and even now

there's a danger if the human types in


five letters six letters a hundred

letters this code like with the hello

input will probably just crash which is

bad so get string also has this

functionality built in where we have a

fancy Loop inside such that we allocate

using malloc as many bytes as you

physically type in and we use malloc

essentially every keystroke the moment

you type in

h-e-l-l-o we're sort of like laying the

tracks as we go and we keep allocating

more and more memory so that we

theoretically will never crash with get

string even though it's this easy to

crash your code using

scanf if you again did it without the

help of a library so where are we all

going with this well let me show you a

few final examples that'll pave the way

for what will be problem set 4 let me

go ahead and open up from today's code

which is available on the course's

website for instance a program like this

called phonebook.c and I'm just going to

give you a quick tour of it that you'll

see more details on in the context of

pset 4 itself we're going to introduce

a few new functions


you're going to see a function called

fopen which stands for file open and it

takes two arguments the name of a file

to open like a CSV that you might

manipulate in Excel or Google

spreadsheets or the like comma separated

values and then something like a for

append r for read w for write depending

on whether you want to add to the file

just open it up or change it we're going

to introduce you to a file pointer

you'll see that capital FILE which is a

little non-conventional capital FILE is

a pointer to an actual file on the

computer's hard drive so that you can

actually access something like a CSV

file or heck even images and we're going

to see down below that you're also going

to have the ability to write files as

well or print to files you'll see

functions like fprintf for file

printf or fwrite file write which now

that you will begin to understand

pointers you'll have the ability

to actually not only read files text

files images other things but also write

them out in fact for instance just as a

teaser here jpegs will be one of the

things we focus on this week where we

give you a forensic image and your goal


is to recover as many photographs from

this forensic image of like a digital

camera as you possibly can and the way

you're going to do that is by knowing in

advance that every jpeg in the world

starts with these three bytes written in

hexadecimal but these three numbers and

so in fact just as a teaser let me open

up an example you'll see on the course's

website for today if I scroll through

here you'll see a program that does a

little something like this and again

more on this and if we could hit the

button oh there we go so here we have

um the notion of a byte we're going to

create for ourselves we'll see a data

type called byte which is a common

convention this gives me three bytes and

you're going to learn about a function

called fread which reads from a file

some number of bytes for instance three

bytes we might then use code like this

if bytes[0] == 0xff

and bytes[1] == 0xd8 and

bytes[2] == 0xff all

three of those bytes I just claim to

represent a JPEG you'll see an output

like this let me go ahead and run this

program as follows let me copy jpeg.c


into my directory from today's

distribution let me do make jpeg

and let me run jpeg on a file which is

available online called lecture.jpg and

I claim yes it's possibly a JPEG well
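A minimal sketch of that check, assuming a BYTE type defined as an 8-bit unsigned integer and two hypothetical function names (looks_like_jpeg and file_looks_like_jpeg) that are not taken from the lecture's jpeg.c:

```c
#include <stdint.h>
#include <stdio.h>

// A byte is modeled here as an unsigned 8-bit integer (an assumption).
typedef uint8_t BYTE;

// Returns 1 if the three bytes match the signature 0xff 0xd8 0xff.
int looks_like_jpeg(const BYTE bytes[3])
{
    return bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff;
}

// Opens the file at path, reads its first three bytes with fread,
// and reports whether they match. Returns 0 if the file cannot be
// opened or is shorter than three bytes.
int file_looks_like_jpeg(const char *path)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
    {
        return 0;
    }

    BYTE bytes[3];
    size_t count = fread(bytes, sizeof(BYTE), 3, file);
    fclose(file);

    return count == 3 && looks_like_jpeg(bytes);
}
```

A match only means the file is possibly a JPEG, since other data could happen to begin with the same three bytes.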

what is that file let me open it up for

us called lecture.jpg and here for

instance is that same photo with which

we began class namely implemented as a

JPEG but what we're also going to do

this week is start to implement our own

sort of filters a la Instagram whereby

we might take images and actually run

them through a program that creates

different versions thereof for instance

using a different file format called BMP

which essentially lays out all of its

pixels from left to right top to bottom

in a grid you're going to see a struct a

data structure in C that's way more

complicated than like the candidate

structure from the past or the person

structure from the past that looks like

this which is just a whole bunch more

values in it but we'll walk you through

these in the pset and we might take a

photograph like this and ask you to run

a few different filters on it a la

Instagram like a black and white filter

or grayscale a sepia filter to give it


some old school feel or a reflection

like this to invert it or blur it

even in this way and just to end on a

note here I have a version of this code

ready to go that doesn't Implement all

of those filters it just implements one

filter initially let me go ahead and

just ready this on my computer here I'm

going to go into my own version of

filter and you'll see a few files that

will give you a tour of this coming week

in bitmap.h for instance is a version of

this structure that I claimed existed a

moment ago and let me show you this file

here

helpers.c in which there is

a function called filter that I've

already implemented in advance today but

the ones we give you for the pset

won't already be implemented this

function called filter takes the height

of an image the width of an image and a

two-dimensional array so rows and

Columns of pixels and then I have a loop

like this that iterates over all of the

pixels in an image from top to bottom

left to right and then notice what I'm

going to do here I'm going to change the

blue value to be zero in this case and


the green value to be zero in this case
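The loop just described can be sketched as follows. The struct and field names (RGBTRIPLE, rgbtBlue, rgbtGreen, rgbtRed) and the BYTE type are assumptions made for this example; the actual definitions come with the problem set's distribution code and may differ.

```c
#include <stdint.h>

// Assumed pixel layout for illustration: one byte each for blue, green, red.
typedef uint8_t BYTE;

typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} RGBTRIPLE;

// Keeps only the red channel of every pixel, like looking through red glasses.
void filter(int height, int width, RGBTRIPLE image[height][width])
{
    // Visit every row from top to bottom...
    for (int i = 0; i < height; i++)
    {
        // ...and every pixel in that row from left to right.
        for (int j = 0; j < width; j++)
        {
            image[i][j].rgbtBlue = 0x00;
            image[i][j].rgbtGreen = 0x00;
        }
    }
}
```

Because an array parameter is really an address, the function changes the caller's pixels directly and needs no return value.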

but why well the image I have here in

mind is this one whereby we have this

hidden image that simply has sort of old

school style a secret message embedded

in it and if you don't happen to have in

your dorm like one of these sort of

secret decoder glasses that essentially

make everything red getting rid of the

Green in the world and the blue in the

world you can actually I'm actually

probably the only one who can read this

right now see what message is hidden

behind all of this red noise but if

using my code written here in helpers.c

I get rid of all the blue in the picture

and I get rid of all the Green in the

picture essentially implementing the

idea of this filter this red filter

where you only see red well let's go

ahead and compile this program make

filter run dot slash filter on this

hiddenmessage.bmp I'm going to save it

in a new file called message.bmp and

with one final flourish we're going to

open up message.bmp which is the result

of having put on these glasses and

hopefully now you too will see what I

see

all right that's it for cs50 we'll see

you next time

this is cs50 and this is already week

five which means this is actually our

last week in C together in fact in just

a few days time what has looked like

this and much more cryptic than this

perhaps is going to be distilled into

something much simpler next week when we

transition to a language called Python

and with python we'll still have our

conditionals and loops and functions and

so forth but a lot of like the low level

Plumbing that you might have been

wrestling with struggling with

frustrated by over the past couple of

weeks especially now that we've

introduced pointers and it feels like

you probably have to do everything

yourself in Python and in a lot of

higher level languages so to speak more

modern more recent languages you'll be

able to do so much more with just single


lines of code and indeed we're going to

start leveraging libraries all the more

code that other people wrote uh

frameworks which are collections of

libraries that other people wrote and on

top of all that you will be able to make

even better or grander more impressive

projects that actually solve problems of

particular interest to you particularly

by way of your own final project

so last week though in week four recall

that we focused on memory and we've been

treating this memory inside of your

computers kind of like a canvas right at

the end of the day it's just zeros and

ones or bytes really and it's really up

to you what you do with those bytes and

how you interconnect them how you

represent information on them and arrays

were like one of the simplest ways we

started playing around with that memory

just contiguous chunks of memory back to

back to back but let's consider for a

moment some of the problems that pretty

quickly arise with arrays and then today

focus on what more generally are called

data structures using your computer's

memory as a much more versatile canvas

to create even two dimensional

structures to represent information and


ultimately to solve more interesting

problems so here's an array of size

three maybe the size of three integers

and suppose that this is inside of a

program and at this point in the story

you've got three numbers in it already

one two and three and suppose whatever

the context you need to now add a fourth

number to this array like the number

four well instinctively where should the

number four go if this is your

computer's memory and we currently have

this array one two three from what left

to right

where should the number four just

perhaps naively go yeah what do you

think

sorry

oh okay so you could replace number one

I don't really like that though because

I'd like to keep number one around but

that's an option but I'm losing of

course information so what else could I

do if I want to add the number four over

there

yeah so I mean it feels like if there's

some ordering to these which seems kind

of a reasonable inference that it

probably belongs somewhere over here but


recall last week as we started poking

around a computer's memory there's other

stuff potentially going on and if we

sort of fill that in ideally we'd want

to just plop the number four here if

we're maintaining this kind of order but

recall in the context of your computer's

memory there might be other stuff there

some of these garbage values that might

be usable but we don't really know or

care what they are as represented by

Oscar here but there might actually be

useful data in use like if your program

has not just a few integers in this

array but also a string that says like

Hello World it could be that your

computer has plopped the

h-e-l-l-o-w-o-r-l-d right after this

array why well maybe you created the

array in one line of code and filled it

with one two three maybe the next line

of code used get string or maybe just

hard-coded a string in your code for

hello world and so you kind of painted

yourself into a corner so to speak now I

think you might claim well let's just

overwrite the H but that's kind of

problematic for the the same reasons we

don't want to do that

so where else could the four go or how


do we solve this problem if we want to

add a number and there's clearly memory

available because those garbage values

are junk that we don't care about

anymore so we could certainly reuse

those

where could the four and perhaps this

whole array go

okay so I'm hearing we could move it

somewhere maybe replace some of those

garbage values and honestly we kind of

have a lot of options we could use any

of these garbage values up here we could

use any of these down here or even

further down the point is there is

plenty of memory available as indicated

by these Oscars where we could put four

maybe even five six or more integers the

catch is that we sort of chose poorly

early on or we just got unlucky and one

two three ended up back to back with

some other data that we care about all

right so that's fine let's go ahead and

assume that we'll abstract away

everything else and we'll plop the new

array in this location here so I'm going

to go ahead and copy the one over the

two over the three over and then

ultimately once I'm ready to fill the


four I can throw away essentially the

old array at this point because I have

it now entirely in duplicate and I can

populate it with the number four all

right so problem solved that is a

correct potential solution to this

problem but what's the trade-off and

this is something we're going to start

thinking about all the more what's the

downside of having solved this problem

in this way yeah
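The copy-everything approach just described can be sketched like this. The function name append_by_copying is hypothetical, and the sketch assumes the new, larger chunk comes from malloc.

```c
#include <stdlib.h>

// Builds a new array one element larger than old: copies the n existing
// values across, then places value at the end. Returns NULL if malloc
// fails. The caller is responsible for freeing the returned array.
// append_by_copying is a hypothetical name used for illustration only.
int *append_by_copying(const int *old, int n, int value)
{
    int *bigger = malloc((n + 1) * sizeof(int));
    if (bigger == NULL)
    {
        return NULL;
    }

    // Every existing element is copied once, so this loop takes n steps
    // even though only one new number is being added.
    for (int i = 0; i < n; i++)
    {
        bigger[i] = old[i];
    }

    bigger[n] = value;
    return bigger;
}
```

The loop is where the cost shows up: the work grows with the size of the existing array, not with the single value being inserted.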

yeah I'm adding a lot of running time it

took me a lot of effort to copy those

additional numbers now granted it's a

small array three numbers who cares it's

going to be over in the blink of an eye

but if we start talking about

interesting data sets sort of uh web

application data sets mobile app data

sets where you have not just a few but

maybe a few hundred a few thousand a few

million pieces of data this is probably

kind of a sub-optimal solution to just

oh move all your data from one place to

another because who's to say that we're

not going to paint ourselves into a new

corner and it would feel like you're

wasting all of this time moving stuff

around and ultimately just costing

yourself a huge amount of time in fact


if we put this now into the context of

our Big O notation from a few weeks back

what might the running time now of

search be for an array let's start

simple I'll throw back a couple of weeks

ago if you're using an array to recap

what was the running time of a search

algorithm in Big O notation so maybe in

the worst case

if you've got n numbers 3 in this case

or four but n more generally big O of

what for search

yeah what do you think

Big O of N and what's your intuition for

that

okay yeah so if we go through each

element for instance from left to right

then search is going to take us big O

of n running time if though

we're talking about these numbers

specifically and now I'll explicitly

stipulate that yeah they're sorted does

that bias anything what would the Big O

notation be for searching an array in

this case be it of size three or four or

n more generally

big O of not n but rather log n

right because we could use per week zero


binary search on an array like this we'd

have to deal with some rounding because

there's not a perfect number of elements

at the moment but you could use binary

search go to the middle roughly and then

go left or right left or right until you

find the element you care about so

search remains in big O of log n when

using arrays but what about insertion

now if we start to think about other

operations like adding a number to this

array or adding a friend to your

contacts app or Google finding another

page on the internet so insertion

happens all the time what's the running

time of insert

when it comes to inserting into an

existing array of size n how many steps

might that take

Big O of n it would be indeed n why

because in the worst case where you're

sort of out of space you have to

allocate it would seem a new array maybe

taking over some of the previous garbage

values but the catch is even though

you're only inserting one new number

like the number four you have to copy

over all the darn existing numbers into

the new one so if your original array is

size n the copying of that is going to


take Big O of n plus one but we can

throw away the plus one because the math

we did in the past so insert now becomes

Big O of N and that might not be ideal

because if you're in the habit of

inserting things frequently that could

start to add up and add up and add up

and this is why computer programs and

websites and mobile apps could be slow

if you're not being mindful of these

kinds of trade-offs so what about uh

just for good measure uh Omega notation

and maybe the best case well just to

recap here we could get lucky and search

could just take one step because you

might just get lucky and boom the number

you're looking for is right there in the

middle if using binary search or even

linear search for that matter and insert

too if there's enough room and we didn't

have to move all of those numbers one

two and three to a new location you

could get lucky and we could as

someone suggested just put the number

four right there at the end and if we

don't get lucky it might take n steps

if we do get lucky it might just take

the one or constant number of steps in

fact let me go ahead and do this how


about we do something like this let me

switch over to some code here let me

start to make a program called list.c

and in list.c let's start with the old

way so we kind of follow the

breadcrumbs we've laid for ourselves as

follows so in this list.c I'm going to

include stdio.h int main void as

usual then inside of my code here I'm

going to go ahead and give myself the

first version of memory so int

list[3] is now implemented at the moment

in an array so we're rewinding for now

to week two style code and then let me

just initialize this thing at the first

location will be one at the next

location will be two and at the last

location will be three so the array is

zero indexed always I'm just for the

sake of discussion though putting in

the numbers one two three like a normal

person might all right so now let's just

print these out for int i gets zero i

less than three i++ let's go

ahead now and print out using printf

%i\n list[i] so

very simple program kind of inspired by

what we did in week two just to create

and then print out the contents of an

array so let's make list
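A sketch of this first, array-based version, split into two small functions so each piece can be tried on its own. The function names fill_list and print_list are assumptions for this example; the lecture writes the same statements directly inside main.

```c
#include <stdio.h>

// Stores 1, 2, 3 in a three-element array (indices 0, 1, 2).
void fill_list(int list[3])
{
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;
}

// Prints each of the first length elements on its own line.
void print_list(const int list[], int length)
{
    for (int i = 0; i < length; i++)
    {
        printf("%i\n", list[i]);
    }
}
```

In main, declaring int list[3]; and then calling fill_list(list) followed by print_list(list, 3) reproduces the program described here.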


so far so good dot slash list and voila

we see one two three now let's start to

practice some of what we're preaching

with this new syntax so let me go in now

and get rid of the array version and let

me zoom out a little bit to give

ourselves some more space and now let's

begin to create a list of size three so

if I'm going to do this now dynamically

so that I'm allocating these things

again and again

let me go ahead and do this let me give

myself a list

that's of type int *

equal the return value of malloc of

three times whoops three times the size

of an int so what this is going to do

for me is give me enough memory for that

very first picture we drew on the board

which was the array containing one two

and three but laying the foundation to

be able to resize it which was

ultimately the goal so my syntax is a

little different here I'm going to use

malloc and get memory from the so-called

Heap as we called it last week instead

of using the stack by just doing the

previous version where I said int

list[3] that is to say this line of code


from the first version is in some sense

identical to this line of code in the

second version but the first line of

code puts the memory on the stack

automatically for me the second line of

code that I've left here now is creating

an array of size three but it's putting

it on the Heap and that's important

because it was only on the Heap and Via

this new function last week malloc that

you can actually ask for more memory and

even give it back when you just use the

first notation int list[3] you have

permanently given yourself an array of

size three you cannot add to that in

code so let me go ahead and do this if

list equals equals null something went

wrong the computer's out of memory so

let's just return one and quit out of

this program there's nothing to see here

so just a good error check there now let

me go ahead and initialize this list so

list bracket 0 will be one again list

bracket one will be two and list bracket

two will be three so that's the same

kind of syntax as before and notice this

equivalence recall that there's this

relationship between chunks of memory

and arrays and arrays are really just

doing pointer arithmetic for you or the


square bracket notation is so if I've

asked myself here in line five for

enough memory for three integers it is

perfectly okay to treat it now like an

array using square bracket notation

because the computer will do the

arithmetic for me and find the first

location the second and the third if you

really want to be kind of cool and

hacker like well you could say *list

= 1 *(list + 1) = 2 *(list

+ 2) = 3 that's the same

thing using very explicit pointer

arithmetic which we looked at briefly

last week but this is atrocious to look

at for most people it's just not very

user friendly it's longer to type so

most people even when allocating memory

dynamically as I did a second ago would

just use the more familiar notation of

an array all right so let's go on now
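Here is a minimal sketch of what was just described: asking malloc for enough heap memory for three integers, then treating it like an array. The function name make_list_of_three is made up for illustration, and it returns NULL instead of quitting the program with return 1 as in the lecture.

```c
#include <stdlib.h>

// Ask the heap for room for three ints and fill it in two equivalent ways.
// Returns NULL if malloc fails; the caller must eventually free() the result.
// (make_list_of_three is a hypothetical name, not from the lecture.)
int *make_list_of_three(void)
{
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return NULL; // out of memory, nothing to see here
    }

    // Square-bracket notation: the compiler does the pointer arithmetic
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;

    // The same three stores written as explicit pointer arithmetic
    *list = 1;
    *(list + 1) = 2;
    *(list + 2) = 3;

    return list;
}
```

Either set of three assignments on its own would be enough; both are shown only to make the equivalence visible.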

suppose time passes and I realize oh

shoot I really wanted this array to be

of size 4 instead of size three now

obviously I could just rewind and like

fix the program but suppose that this is

a much larger program and I've realized

at this point that I need to be able to

dynamically add more things to this


array for whatever reason well let me go

ahead and do this let me just say all

right list should actually be the result

of asking for four

chunks of memory from malloc and then

I could do something like this list

bracket 3 equals four

now this is buggy potentially in a

couple of ways but let me ask first

what's really wrong first with this code

the goal at hand is to start with the

array of size three with the one two

three and I want to add a number four to

it so at the moment in line 17 I've

asked the computer for a chunk of four

integers just like the picture and then

I'm adding the number four to it but I

kind of have skipped a few steps and

broken this somehow yeah

yeah I don't necessarily know where this

is going to end up in memory it's

probably not going to be immediately

adjacent to the previous chunk and so

yes even though I'm putting the number

four there I haven't copied the one the

two or the three over to this chunk of

memory so well let me fix well that's

actually indeed really the essence of

the problem I am orphaning the original


chunk of memory if you think of the

picture that I drew earlier the line of

code up here on line five that allocates

space for the initial three integers

this code is fine this code is fine but

as soon as I do this I'm clobbering the

value of list and saying no no don't

point at this chunk of memory point at

this chunk of memory at which point I've

forgotten if you will where the original

chunk of memory is so the right way to

do something like this would be a little

more involved let me go ahead and give

myself a temporary variable and I'll

literally call it temp TMP kind of like

I did last week so that I can now ask

the computer for a completely different

chunk of memory size four I'm gonna

again say if temp equals null I'm going

to say oh bad things happened here so

let me just return one and you know what

just to be tidy let me free the original

list before I quit because remember from

last week anytime you use malloc you

eventually have to use free but this

chunk of code here is just a safety

check if there's no more memory there's

nothing to see here I'm just going to

clean up my state and quit but now if I


have asked for this chunk of memory now

I can do this for int i gets whoops for

int i gets zero i is less than three i

plus plus what if I do something like

this temp bracket i equals list bracket

i that would seem to have the effect of

copying all of the memory from one to

the other and then I think I need to do

one last thing temp bracket three gets

the number four for instance again I'm

kind of just hard coding the numbers for

the sake of discussion
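The steps so far (ask for a second, larger chunk, check for NULL, copy the old values, put the new number at the end) can be sketched as one helper. The name copy_and_append and its parameters are assumptions for illustration, and the temporary pointer is spelled tmp here. Note that this sketch does not free anything yet, so the caller still owns both chunks.

```c
#include <stdlib.h>

// Allocate a separate chunk for old_size + 1 ints, copy the old values over,
// and store `value` in the last slot. The old chunk is left untouched.
// Returns NULL if malloc fails. (Hypothetical helper, not from the lecture.)
int *copy_and_append(const int *list, size_t old_size, int value)
{
    int *tmp = malloc((old_size + 1) * sizeof(int));
    if (tmp == NULL)
    {
        return NULL;
    }

    // Copy every existing value from the old chunk into the new one
    for (size_t i = 0; i < old_size; i++)
    {
        tmp[i] = list[i];
    }

    // The extra slot at the end gets the new number
    tmp[old_size] = value;
    return tmp;
}
```

The loop is the cost to keep in mind: every existing value gets copied just to add one more.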

after I've done this what could I now do

I could now set list equal to temp

and now I have updated my list

properly so let me go ahead and do this

for int i gets zero i is less than 4 i

plus plus let me go ahead and print each

of these elements out with percent i

using list bracket i and then I'm going

to return 0 just to signify that all is

successful now so to recap we initialize

the original array of size three and

plug in the values one two three

time passes and then I realized wait a

minute I need more space and so I asked

the computer for a second chunk of

memory this one of size four just as a

safety check I make sure that temp

doesn't equal null because if it does


I'm out of memory so I should just quit

all together but once I'm sure that it's

not null I'm going to copy all the

values from the old list into the new

list

and then I'm going to add my new number

at the end of that list and then now

that I'm done playing around with this

temporary variable I'm going to remember

in my list variable what the address is

of this new chunk of memory and then I'm

going to print all of those values out

so at least aesthetically when I make

this new version of my list except for

my missing semicolon let me try this

again when I make list okay what I do

this time implicitly declaring a library

function malloc dot dot dot what's my

mistake anytime you see that kind of

error

yeah a library so up here I forgot to do

include standard lib dot h which is

where malloc lives let me go ahead and

again do make list there we go so I

fixed that dot slash list and I should

see one two three four

but there's still a bug here

does anyone see the bug or question

oh sorry say again


I forgot to free the original list and

we could see this even if not just with

our own eyes or intuition if I do

something like valgrind of dot slash

list remember our tool from this past

week let me increase the size of my

terminal window temporarily the output

is crazy cryptic at first but notice

that I have definitely lost some number

of bytes here and indeed it's even

pointing at the line number in which

some of those bytes were lost so let me

go ahead and back to my code and indeed

I think what I need to do is before I

clobber the value of list pointing it at

this new chunk of memory instead of the

old I think I now need to First

proactively say free the old list of

memory and then change its value so if I

now do make list and do dot slash list

the output is still the same and if I

cross my fingers and run valgrind again

after increasing my window size

hopefully here

ah still a bug so better it seems like

less memory is lost

what have I now forgotten to do

I forgot to free it at the very end too

because I still have a chunk of memory

that I got from malloc so let me go to


the very bottom of the program now and

after I'm done sort of uh sort of

senselessly just printing this thing out

let me free the new list
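Putting the two fixes together, here is one possible sketch of the whole grow-by-copying dance with the frees in the right places. The helper name append_by_copy and its pointer-to-pointer signature are assumptions, not the lecture's code. One deliberate difference: on failure this sketch leaves the old list alone rather than freeing it and quitting.

```c
#include <stdlib.h>

// Grow *list from *size ints to *size + 1 ints and append `value`.
// Returns 0 on success, 1 if memory ran out (the old list is left intact).
// (Hypothetical helper; the lecture does these steps inline in main.)
int append_by_copy(int **list, size_t *size, int value)
{
    int *tmp = malloc((*size + 1) * sizeof(int));
    if (tmp == NULL)
    {
        return 1;
    }

    for (size_t i = 0; i < *size; i++)
    {
        tmp[i] = (*list)[i];
    }
    tmp[*size] = value;

    // First fix: free the old chunk BEFORE clobbering the only pointer to it
    free(*list);

    // Only now remember the new chunk in the caller's list variable
    *list = tmp;
    *size += 1;
    return 0;
}
```

The second fix lives with the caller: whatever list points at when the program is done still has to be freed once, at the very end.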

and now let me do make list dot slash

list it still works visually now let's

do valgrind of

dot slash list enter and now hopefully

all Heap blocks were freed no leaks are

possible so this is perhaps the best

kind of output you can see from a tool

like valgrind I used the heap but I freed

all the memory as well so there were two

fixes needed there are any questions

then on this array-based approach the

first of which is statically allocating

an array so to speak by just hard coding

the number three this second version now

is dynamically allocating the array

using not the stack but the Heap but it

too suffers from the slowness we

described earlier of having to copy all

those values from one to the other okay

a hand was over here

good question why did I not have to free

the temp I essentially did eventually

because temp was pointing at the chunk

of four integers but on line 33 here I


assigned list to be identical to what

temp was pointing at and so when I

finally freed the list that was the same

thing as freeing temp in fact if I

wanted to I could say free temp here and

it would be the same but conceptually

it's sort of wrong because at this point

in the story I should be freeing the

actual list not that temporary variable

but they were the same at that point in

the story yeah

good question and long story short

everything we're doing thus far is still

in the world of arrays the only

distinction we're making is that in

version one when I said int list bracket

three close bracket that was an array of

fixed size so-called statically

allocated on the stack as per last week

this version now is still dealing with

arrays but I'm kind of flexing my

muscles and using dynamic memory

allocation so that I can still use an

array per the first pictures we started

talking about but I can at least grow

the array if I want so we haven't even

now solved this even better in a sense

with linked list that's going to come

next yeah

how am I able to free list I

freed the original address of list I

then changed what list is storing I'm

moving its Arrow to a new chunk of

memory and that is perfectly reasonable

for me to now manipulate because now

list is pointing at the same value as

temp and temp is what was given the

return value of malloc the second time

so that chunk of memory is valid so

these are just

um you know squares on the board right

there's just pointers inside of them so

what I'm technically saying is I'm not

pointing I'm not freeing list per se I

am freeing the chunk of memory that

begins at the address currently in list

therefore if a few lines later I change

what the address is in list totally

reasonable to then touch that memory and

eventually free it later because you're

not freeing the variable per se you're

freeing the address in the variable good

distinction all right so let me back up

here and just now make one final edit so

let's finish this with one final

Improvement here because it turns out

there's a somewhat better way to

actually resize an array as we've been


doing here and there's another function

in standard lib that's called realloc

for reallocate and I'm just going to go

in and make a little bit of a change

here so that I can do the following let

me go ahead and first comment this now

just so we can keep track of what's been

going on this whole time so dynamically

allocate an array of size three

assign three numbers to that array

time passes

allocate new array of size four

copy numbers from old array into new

array

and add fourth number to new array

free old array

um remember if you will new array using

my same list variable and now print new

array

free new array hopefully that helps and

we'll post this code online after too

which tells a more explicit story so it

turns out that we can reduce some of the

labor involved with this not so much

with the printing here but with this

copying turns out C does have a function

called realloc that can actually handle

the resizing of an array for you as

follows I'm going to scroll up to where


I previously allocated a new array of

size four and I'm instead going to say

this

resize old array to be of size 4. now

previously this wasn't necessarily

possible because recall that we had

painted ourselves into a corner with the

example on the screen where hello world

happened to be right after the original

array but let me do this let me use

realloc for reallocate and pass in not

just the size of memory we want this

time but also the address that we want

to resize which again is this array

called list

all right the code thereafter is pretty

much the same but what I don't need to

do

is this so realloc is a pretty handy

function that will do the following if

at the very beginning of class when we

had one two three on the board and

someone's Instinct was to just plop the

four right at the end of the list if

there's available memory realloc will just

do that and boom it will just grow the

array for you in the computer's memory

if though it realizes sorry there's also

there's already a string like hello


world or something else there realloc

will handle the trouble of moving that

whole array from one chunk of memory

originally to a new chunk of memory and

then realloc will return to you the

address of that new chunk of memory and

it will handle the process of freeing

the old chunk for you so you do not need

to do this yourself so in fact let me go

ahead and get rid of this as well
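As a sketch of the realloc version: the copy loop and the explicit free of the old chunk are gone, because realloc either grows the chunk in place or moves it and frees the old one itself. The helper name append_with_realloc is an assumption for illustration. The result goes into a temporary first, since on failure realloc returns NULL and leaves the original chunk valid.

```c
#include <stdlib.h>

// Resize *list to hold *size + 1 ints and append `value`.
// Returns 0 on success, 1 if realloc fails (the old list is still valid).
// (Hypothetical helper, not the lecture's exact code.)
int append_with_realloc(int **list, size_t *size, int value)
{
    // Pass in the address to resize as well as the new size in bytes
    int *tmp = realloc(*list, (*size + 1) * sizeof(int));
    if (tmp == NULL)
    {
        return 1; // *list was not freed or moved
    }

    // No copying by hand: realloc carried the old values along
    tmp[*size] = value;

    *list = tmp;
    *size += 1;
    return 0;
}
```

Assigning realloc's return value straight to list would risk losing the only pointer to the old chunk when it fails, which is the same orphaning problem as before.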

so realloc just condenses a lot of what

we just did into a single function

whereby

realloc handles it for you all right so

that's the final Improvement on this

array-based approach so what now knowing

what your memory is what can we now do

with it that solves that kind of problem

because the world is going to get really

slow and our apps and our phones and our

computers are going to get really slow if

we're just constantly wasting time

moving things around in memory what

could we perhaps do instead well there's

just one new piece of syntax today that

builds on these three pieces of syntax

from the past recall that we've looked

at struct which is a keyword in C that

just lets you invent your own structure

your own variable if you will in


conjunction with typedef which lets you

say a person has a name and a number or

something like that or a candidate

has a name and some number of votes you

can encapsulate multiple pieces of data

inside of just one using struct what did

we use the dot notation for now a couple

times

what does the dot operator do in C

and

perfect to access the field inside of a

structure so if you've got a person with

a name and a number you could say

something like person.name or

person.number if person is the name of

one such variable star of course we've

seen now in a few ways like way back in

week one we saw it as like

multiplication uh last week we began to

see it in the context of pointers

whereby you use it to declare a pointer

like int star P or something like that

but we also saw it in one other context

which was like the opposite which was

the dereference operator which says if

this is an address that is if this is a

variable like a pointer and you put a

star in front of it then with no int or

no char No data type in front of it that


means go to that address and it

dereferences the pointer and goes to

that location so it turns out that using

these three building blocks you can

actually start to now use your

computer's memory almost any way you

want and even next week when we

transition to Python and you start to

get a lot of features for free like a

single line of code will just do so much

more in Python than it does in C it

boils down to those basic Primitives and

just so you've seen it already it turns

out that it's so common in C to use this

operator to go inside of a structure and

this operator to go to an address that

there's shorthand notation for it AKA

syntactic sugar that literally looks

like an arrow so recall last week I was

in the habit of pointing even with the

big foam finger this arrow notation a

hyphen and an angle bracket denotes

going to an address and looking at a

field inside of it but we'll see this in

practice in just a bit
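To make the three building blocks concrete, here is a small sketch using a candidate with a name and some number of votes. The two function names are made up for illustration; they do the same thing, once with star and dot, once with the arrow shorthand.

```c
// A structure that encapsulates two pieces of data in one type
typedef struct
{
    const char *name;
    int votes;
} candidate;

// Go to the address in c (star), then look at the votes field (dot)
int votes_via_star_dot(const candidate *c)
{
    return (*c).votes;
}

// Exactly the same access, written with the arrow shorthand
int votes_via_arrow(const candidate *c)
{
    return c->votes;
}
```

With a plain variable, say candidate c, the dot alone is enough (c.votes); the arrow only comes into play when what you have is a pointer to the structure.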

so what might be the solution now to

this problem we saw a moment ago whereby

we had painted ourselves into a corner

and our memory a few moments ago looked

like this we could just copy the whole


existing array to a new location add the

four and go about our business what

would another perhaps better solution

longer term be that doesn't require

constantly moving stuff around

and maybe hang in there for your

instincts if you know the sort of Buzz

phrase we're looking for from past

experience hang in there

but if we want to avoid moving the one

two and the three but we still want to

be able to add endless amounts of data

what could we do

yeah so maybe create some kind of list

using pointers that just kind of point

at a new location right in an ideal

world even though this uh piece of

memory is being used by this H in the

string hello world maybe we could

somehow use a pointer from last week

like an arrow that says after the three

oh I don't know go down over here to

this location in memory and you just

kind of stitch together these integers

in memory so that each one leads to the

next it's not necessarily the case that

it's literally back to back that would

have the downside it would seem of

costing us a little bit of space like a


pointer which recall takes up some

amount of space typically eight bytes or

64 bits but I don't have to copy

potentially a huge amount of data just

to add one more number and so these

things do have a name and indeed these

things are what generally would be

called a

linked list a linked list captures

exactly that kind of intuition of

linking together things in memory so

let's take a look at an example here's

computer's memory in the abstract

suppose that I'm trying to create an

array no let's generalize it as a list

now of numbers an array has a very

specific meaning it's memory that's

contiguous back to back to back at the

end of the day I as the programmer just

care about the data one two three four

and so forth I don't really care how

it's stored until uh I don't care how

it's stored when I'm writing the code I

just want it to work at the end of the

day so suppose that I first insert my

number one and who knows it ends up up

there at location 0x123 for the sake of

discussion all right maybe there's

something already here and heck maybe

there's something already here but


there's plenty of other options for

where this thing can go and suppose that

for the sake of discussion the first

available spot for the next number

happens to be over here at uh location

0x456 for the sake of discussion so

that's where I'm going to plop the

number two and where might the number

three end up oh I don't know maybe down

over there at 0x789 the point being I

don't know what is or really care about

everything else that's in the computer's

memory I just care that there are at

least three locations available where I

can put my one my two and my three but

the catch is now that we're not using an

array we can't just naively assume that

you just add one to an index and boom

you're at the next number add two to an

index and boom you're at the next next

number now you kind of have to leave

these little breadcrumbs or use the

arrow notation to kind of lead from one

to the other and sometimes it might be

close a few bytes away maybe it's a

whole gigabyte away in an even bigger

computer's memory

so how might I do this like where do

these pointers go as you proposed


right all I have access to here are

bytes I've already stored the one the

two and the three so what more should I

do

okay yeah so let me put the pointers

right next to these numbers so let me at

least plan ahead so that when I ask the

computer like malloc recall from last

week for some memory I don't just ask it

now for space for just the number let me

start getting into the habit of asking

malloc for enough space for the number

and a pointer to another such number so

it's a little more aggressive of me to

ask for more memory but I'm kind of

planning ahead and here's an example of

a trade-off almost any time in CS when

you start using more space you can save

time or if you try to conserve space you

might have to lose time it's being that

kind of trade-off there so how might I

solve this well let me abstract this

away and either next to or below I'm

just drawing it uh vertically just for

the sake of discussion so the arrows are

a bit prettier

I've asked malloc for now twice as much

space it would seem than I previously

needed but I'm going to use this second

chunk of memory to refer to the next


number and I'm going to use this chunk

of memory to refer to the next essentially

stitching this thing together so what

should go in this first box well I claim

the number

0x456 and it's written in hex because it

represents a memory address but this is

the equivalent of sort of drawing an

arrow from one to the other

as a little check here what should go in

this second box if the goal is to stitch

these together in order one two three

feel free to just shout this out

okay oh okay that worked well so 0x789

indeed and you can't do that with the

hands because I can't count that fast so

0x789 should go here because that's like

a little breadcrumb to the next and then

we don't really have terribly many

possibilities here this has to have a

value right because at the end of the

day it's gotta use its 64 bits in

some way so what value should go here if

this is the end of this list

so it could be 0x123 the implication

being that it would kind of be a

cyclical list which is okay but

potentially problematic if any of you

have accidentally sort of lost control


over your codespace because you had

an infinite Loop this would seem a very

easy way to give yourself The Accidental

uh probability of an infinite Loop what

might be simpler than that and Ward that

off

say again

so just the null character not n-u-l

confusingly which is at the end of

strings but n-u-l-l as we introduced it

last week which is the same as 0x0 so

this is just a special value that

programmers decades ago decided that if

you store the address zero that's not

a valid address there's never going to

be anything useful at 0x0 therefore it's

a sentinel value just a special value

that indicates that's it there's nowhere

further to go it's okay to come back to

your suggestion of making a cyclical

list but we'd better be smart enough to

maybe remember where did the list start

so that you can detect Cycles if you

start looping around in this structure

otherwise all right but these addresses

who really cares at the end of the day

if we abstract this away it really just

now looks like this and indeed this is

how most anyone would draw this on a

whiteboard if having a discussion at


work talking about what data structure

we should use to solve some problem in

the real world we don't care generally

about the addresses we care that in code

we can access them but in terms of the

concept alone this would be perhaps the

right way to think about this all right

let me pause here and see if there's any

questions on this idea of creating a

linked list in memory by just storing

not just the numbers like one two three

but twice as much data so that you have

little breadcrumbs in the form of

pointers that can lead you from one to

the next
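As a rough sketch of the picture, each box pairs a number with a pointer to the next box, and NULL marks the end. The struct layout below anticipates the one the lecture is about to build, and the function name sum_list is an assumption for illustration. It follows the breadcrumbs from the first node until it hits NULL.

```c
#include <stddef.h>

// One box in the picture: a number plus a breadcrumb to the next box
typedef struct node
{
    int number;
    struct node *next;
} node;

// Follow the pointers from the first node until NULL, adding up the numbers.
// (sum_list is a hypothetical helper, not from the lecture.)
int sum_list(const node *list)
{
    int sum = 0;
    for (const node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        sum += ptr->number;
    }
    return sum;
}
```

Stitching three nodes together in order means the first node's next holds the address of the second, the second's holds the address of the third, and the third's holds NULL. If the last pointer led back to the first node instead, this loop would never end, which is the infinite loop risk of a cyclical list.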

any questions on these linked lists

any questions no all right oh yeah over

here

this does take more memory than an array

because I now need space for these

pointers and to be clear I technically

didn't really draw this to scale thus

far in the class we've generally thought

about integers like one two and three as

being four bytes or 32 bits I made the

claim last week that on Modern computers

pointers tend to be eight bytes or 64

bits so technically this box should


actually be a little bigger it was just

going to look a little stupid in the

picture so I abstracted it away but

indeed you're using more space as a

result
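To put rough numbers on that, here is a small sketch comparing the bytes needed for the same count of numbers stored as an array versus one per node. The function names are made up for illustration. The exact sizes depend on the machine: on many 64-bit systems an int is 4 bytes, a pointer is 8, and padding rounds a node up to 16, but none of that is guaranteed.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Bytes needed to store n numbers back to back in an array
size_t array_bytes(size_t n)
{
    return n * sizeof(int);
}

// Bytes needed to store n numbers one per node, each with its own pointer
// (ignores any extra bookkeeping malloc itself may do per allocation)
size_t list_bytes(size_t n)
{
    return n * sizeof(node);
}
```

Whatever the platform, a node can never be smaller than an int plus a pointer, so the list always costs more space than the array for the same numbers.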

oh how does the sorry how does the

computer identify useful data from uh

used data so for instance garbage values

or non-garbage values for now think of

that as the job of malloc so when you

ask malloc for memory as we started to

last week malloc keeps track of the

addresses of the memory it has handed to

you as valid values the other type of

memory you use not just from the Heap

because recall we briefly discussed that

malloc uses space from the Heap which

was drawn at the top of the picture

pointing down there's also stack memory

which is where all of your local

variables go and where all of the memory

used by individual functions go and that

was drawn in the picture as working its

way up that's just an artist's rendition

of Direction

the compiler essentially will also help

keep track of which values are valid or

not inside of the stack or really the

underlying code that you've written will

keep track of that for you so it's


managed for you at that point

all right good question sorry it took me

a bit to catch on so let's now translate

this to actual code how could we

implement this idea of let's call these

things nodes and that's a term of art

in CS whenever you have some kind of

data structure that encapsulates

information node node is the generic

term for that so each of these might be

said to be a node well how can we do

this well a couple of weeks ago we saw

how we could represent something like a

student or a candidate and a student or

rather a person we said well has a name

and a number and we used a few pieces of

syntax here one we use the struct

keyword which gives us a data structure

we use typedef which defines the name

person to be our new data type

representing that whole structure so we

probably have the right ingredients here

to build up this thing called a node and

just to be clear what should go inside

of one of these nodes do we think it's

not going to be a name or a number

obviously but what should a node have

in terms of those fields perhaps yeah

so a number like a number and a pointer


in some form so let's translate this to

actual code so let's rename person to

node to capture this notion here and

the number is easy if it's just going to

be an int that's fine we can just say

int number or int n or whatever you want

to call that particular field the next

one's a little non-obvious and this is

where things get a little weird at first

but in retrospect it should all kind of

fit together let me propose that ideally

we would say something like node star

next and I could call the word next

anything I want next just means what

comes after me is the notion I'm using

it at so a lot of Cs people would just

use next to represent the name of this

pointer but there's a catch here C and C

compilers are pretty naive recall they

only look at code top to bottom left to

right and anytime they encounter a word

they have never seen before bad things

happen like you can't compile your code

you get some cryptic error message or

the like and that seems to be about to

happen here because if the compiler is

reading this code from top to bottom

it's going to say oh inside of this

struct should be a variable called Next

which is of type node star what the heck


is a node because it literally does not

find out until two lines later after

that semicolon so the way to avoid this

which we haven't quite seen before is

that you can temporarily name this whole

thing up here struct node

and then down here inside of the data

structure you say struct node star and

then you leave the rest alone this is

kind of a workaround this is possible

because now you're teaching the compiler

from the first line that here comes a

data structure called struct node down

here you're shortening the name of this

whole thing to just node why it's just a

little more convenient than having to

write struct everywhere but you do have

to write struct node star inside of the

data structure but that's okay because

it's already come into existence now as

of that first line of code so that's the

only fundamental difference between what

we did last week with a person or a

candidate we just now have to use this

this struct work around syntactically
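Written out, the workaround looks roughly like this. The field name number is one of the options mentioned above, and the small function second_number is a made-up helper that only exists to show both spellings of the type in use.

```c
#include <stddef.h>

// Naming the struct on the first line teaches the compiler that
// "struct node" exists before the typedef is finished
typedef struct node
{
    int number;
    struct node *next; // must say "struct node" here; plain "node" is not known yet
} node;                // from here on, "node" is shorthand for "struct node"

// Both spellings name the same type once the typedef is complete.
// (second_number is a hypothetical helper for illustration only.)
int second_number(void)
{
    struct node second = {2, NULL};
    node first = {1, &second};
    return first.next->number;
}
```

Writing node *next inside the braces instead would typically fail to compile with an unknown type name error, because the short name does not come into existence until after the closing brace.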

all right yeah question

why is the next variable a struct node

star pointer and not an int star pointer


for instance so think about the picture

we are trying to draw technically yes

each of these arrows I deliberately Drew

is pointing at the number but that's not

alone they need to point at the whole

data structure in memory because the

computer ultimately and the compiler in

turn needs to know that this chunk of

memory is not just an int it is a whole

node inside of a node is a number and

also another pointer so when you draw

these arrows it would be incorrect to

point at just the number because that

throws away information that would leave

the compiler wondering okay I'm at a

number where the heck is the pointer you

have to tell it that it's pointing at a

whole node so it knows a few bytes away

is that corresponding pointer good

question yeah

really good question it would seem that

just as copying the array earlier

required twice as much memory because we

copied from old to new so technically

twice as much plus one for the new

number here too it looks like we're

using twice as much memory also and to my

comment earlier it's even more than

twice as much memory because these

pointers are eight bytes and not just


four bytes like a typical integer is the

differences are these in the context of

the array you were using that memory

temporarily so yes you needed twice as

much memory but then you were quickly

freeing the original array so you

weren't consuming long term more memory

than you might need the difference here

too is that as we'll see in a moment it

turns out it's going to be relatively

quick for me potentially to insert new

numbers in here because I'm not going to

have to do a huge amount of copying and

even though I might still have to follow

all of these arrows which is going to

take some amount of time I'm not going

to have to be asking for more memory

freeing more memory and certain

operations in a computer anything

involving asking for or giving back

memory tends to be slower so we get

to avoid that situation as well there's

going to be some downsides though this

is not all upside but we'll see in a bit

just what some of those trade-offs

actually are all right so from here if

we go back to the structure in code as

we left it let's start to now build up a

linked list with some actual code how do


you go about in C representing a

linked list in code well at the moment

it would actually be as simple as this

you declare a variable called list for

instance that itself stores the address

of a node that's what node star means

the address of a node so if you want to

store a linked list in memory you just

create a variable called list or

whatever else and you just say that this

variable is going to be pointing at the

first node in a list wherever it happens

to end up because malloc is ultimately

going to be the tool that we use just to

go get at any one particular node in

memory all right so let's actually do

this in pictorial form when you write a

line of code like I just did here and I

do not initialize it to anything with

the assignment operator an equal sign it

does exist in memory as a box as I'll

draw it here called list but I've

deliberately drawn Oscar inside of it

why to connote what exactly

it's a garbage value I have been

allocated the variable in memory called

list which is going to give me 64 bits

or 8 bytes somewhere drawn here with

this box but if I myself have not used

the assignment operator it's not going


to get magically initialized to any

particular address for me it's not going

to even give me a node this is literally

just going to be an address of a future

node that exists so what would be a

solution here suppose that I'm beginning

to create my linked list but I don't

have any nodes yet what would be a

sensible thing to initialize list to

perhaps

yeah again

so just null right when in doubt with

pointers generally it's a good thing to

initialize things to null so at least

it's not a garbage value it's a known

value invalid yes but it's a special

value you can then check for with a

conditional or the like so this might be

a better way to create a linked list

even before you've inserted any numbers

into the thing itself all right so after

that how can we go about adding

something to this linked list so now the

story looks like this Oscar is gone

because inside of this box is all zero

bits just because it's nice and clean

and this represents an empty linked list

well if I want to add the number one to

this linked list what could I do well


perhaps I could start with code like

this borrowing inspiration from last

week let's ask malloc for enough space

for the size of a node and this kind of

gets to your question earlier like what

is it I'm manipulating here I don't just

need space for an int and I don't just

need space for a pointer I need space

for both and I gave that thing a name

node so size of node figures out and

does the arithmetic for me and gives me

back the right number of bytes this then

stores the address of that chunk of

memory in what I'll temporarily call n

just to represent a generic new node and

it's of type node star because just like

last week when I asked malloc for enough

space for an int and I stored it in an

int star pointer this week if I'm asking

for memory for a node I'm storing it in

a node star pointer so technically

nothing new there except for this new

term of Art and data structure called

node all right so what does that do for

me it essentially draws a picture like

this in memory I still have my list

variable from my previous line of code

initialized to null and that's why I've

drawn it blank I also now have a

temporary variable called n which I


initialize to the return value of malloc

which gave me one of these nodes in

memory but I've drawn it having garbage

values too because I don't know what int

is there I don't know what pointer is

there it's garbage values because malloc

does not magically initialize memory for

me there is another function for that

but malloc alone just says sure use this

chunk of memory deal with whatever's

there so how can I go about initializing

this to known values well suppose I want

to insert the number one and then leave

it at that a list of size one I could do

something like this

and this is where you have to think back

to some of these Basics my conditional

here is asking the question if n does

not equal null so that is if malloc gave

me valid memory and I don't have to quit

altogether because my computer is out of

memory if n does not equal null that is

it's equal to a valid address I'm going to

go ahead and do this and this is cryptic

looking syntax now

but does someone want to take a stab at

translating this inside line of code to

English

in some sense
how might you explain what that inner

line of code is doing star n dot number

equals one uh let me go further back

no okay over here yeah

perfect the place that n is pointing to

set it equal to one or using the

vernacular of going there go to the

address in n and set its number field to

one however you want to think about it

that's fine but the star again is the

dereference operator here and we're

doing the parentheses which we haven't

needed to do before because we haven't

dealt with pointers and data structures

together until today this just means go

there first and then once you're there

go access number you don't want to do

one thing before the other so this is

just enforcing order of operations the

parentheses just like in grade school

math all right so this line of code is

cryptic it's ugly it's not something

most people easily remember thankfully

there's that syntactic sugar that

simplifies this line of code to just

this and this even though it's new to

you today should eventually feel a

little more familiar because this now is

shorthand notation for saying start at n

go there as by following the arrow and


when you get there change the number

field in this case to one so most people

would not write code like this it's just

ugly it's a couple extra keystrokes

this just looks more like the artist's

Renditions we've been talking about and

how most CS people would think about

pointers as really just being arrows in

some form all right so what have we just

done the picture now after setting

number to one looks a little something

like this so there's still one step

missing and that's of course to

initialize it would seem the pointer in

this new node to something known like

null so I bet we could do this like this

with a different line of code I'm just

going to say if n does not equal null

then set n's next field to null or more

pedantically go to n Follow the arrow

and then update the next field that you

find there to equal null and again this

is just doing some nice bookkeeping

technically speaking we might not need

to set this to null if we're going to

keep adding more and more numbers to it

but I'm doing it step by step so that I

have a very clean picture and there's no

bugs in my code at this point
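
The steps just described can be sketched as one small function, assuming the lecture's node type. The function name make_node is hypothetical. It deliberately uses both spellings so the equivalence is visible: (*n).number with the parentheses, and the arrow shorthand n->next.

```c
#include <stdlib.h>

// The node type from the lecture: a number plus a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: allocate one node and set both fields to known values.
// Returns NULL if malloc could not find enough memory.
node *make_node(int number)
{
    // malloc does not initialize memory, so both fields start as garbage values
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        (*n).number = number; // go to the address in n, then access number
        n->next = NULL;       // arrow shorthand for (*n).next = NULL
    }
    return n;
}
```

Whoever calls make_node still owns the memory and should eventually pass the returned address to free.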


but I'm still not done there's one last

thing I'm going to have to do here if

the goal ultimately was to insert the

number one into my linked list what's

the last step I should perhaps do here

just in English is fine yeah

yes I now need to update the actual

variable that represents my linked list

to point at this brand new node that is

now perfectly initialized as having an

integer and a null pointer yeah

technically this is already pointing

there but I described this deliberately

earlier as being temporary I just needed

this to get it back from malloc and sort

of clean things up initially this is the

long-term variable I care about so I'm

going to want to do something simple

like this list equals n and this seems a

little weird that list equals n but

again think about what's inside this box

at the moment this is null because there

is no linked list at the beginning of

our story n is the address of the

beginning and it turns out end of our

linked list so it stands to reason that

if you set list equal to n that has the

effect of copying this address up here

or really just copying the arrow into


that same location so that now the

picture looks like this and Heck if this

was a temporary variable it will

eventually go away and now this is the

picture so kind of an annoying number of

steps certainly to walk through

verbally like this but it's just malloc

to give yourself a node initialize the

two fields inside of

it update the linked list and boom

you're on your way I didn't have to copy

anything I just had to insert something

in this case

or let me pause here to see if there's

any questions on those steps and we'll

see before long it all in context with

some larger code

yes I drew them separately just for

the sake of the voiceover of doing each

thing very methodically in real code as

we'll transition to now I could have and

should have just done it all inside of

one conditional after checking if n is

not equal to null I could set number to a

value like one and I could set the

pointer itself to something like null

all right well let's translate then this

into some similar code that allows us to


build up a linked list now using code

similar in spirit to before but now

using this new primitive so I'm going to

go back into vs code here I'm going to

go ahead now and delete the entirety

of this old version that was entirely

array based and now inside of my main

function I'm going to go ahead and first

do this I'm going to first give myself a

uh a list of size 0 and I'm going to

call that node star list and I'm going

to initialize that to null as we

proposed earlier but I'm also now going

to have to take the additional step of

defining what this node is so recall

that I might do something like typedef

struct node inside of this struct node

I'm going to have a number which I'll

call number of type int and I'm going to

have a structure called node with a star

that says the next pointer is called

Next and I'm going to call this whole

thing more succinctly node instead of

struct node now as an aside for those of

you wondering what the difference really

is between struct and node technically I

could do something like this not use

typedef and not use the word node alone

this syntax here would actually create

for me a new data type called verbosely


struct node and I could use this

throughout my code saying struct node

struct node it just gets a little

tedious and it would be nicer just to

refer to this thing more simply

as a node so what typedef has been doing

for us is it again lets us invent our

own word that's even more succinct and

this just has the effect now of calling

this whole thing node without the need

subsequently to keep saying struct all

over the place just FYI all right so now

that this thing exists in main let's

go ahead and do this let's add a number

to list and to do this I'm going to give

myself a temporary variable I'll call it

n for consistency I'm going to use

malloc to give myself the size of a node

just like in our slides and then I'm

going to do a little safety check if n

equals equals null I'm going to do the

opposite of the slides I'm just going to

quit out of this program because there's

nothing useful to be done at this point

but most likely my computer is not going

to run out of memory so I'm going to

assume we can keep going with some of

the logic here if n does not equal null

and that is it's a valid memory address


I'm going to say n arrow I'm going to

build this up backwards well

that's okay let's go ahead and do this n

arrow number equals one and then n

arrow next equals null

and now

uh update list to point to new node list

equals n

so at this point in the story we've

essentially constructed what was that

first picture which looks like this this

is the corresponding code via which we

built up this node in memory suppose now

we want to add the number two to the

list so let's do this again

add a number to list how might I do this

well I don't need to redeclare n because

I can use the same temporary variable as

before so this time I'm just going to

say n equals malloc and the size of a

node I'm again going to have my safety

check so if n equals equals null then

let's just quit out of this all together

but I have to be a little more

careful now

technically speaking what do I still

need to do before I quit out of my

program to be really proper

free the memory that did succeed a

little higher up so I think it suffices


to free what is now called list way at

the top all right now if all was well

though let's go ahead and say n arrow

number equals two and now

n arrow next equals null and

now let's go ahead and add it to the

list if I go ahead and do uh

list Arrow next equals N I think what

we've just done is build up the

equivalent now of this in the computer's

memory

by going to the list's next field

which is synonymous with the number one node's

bottom most box and store the address of

what was n which a moment ago looked

like this and I'm just throwing away in

the picture the temporary variable all

right one last thing to do let me go

down here and say add a number to list n

equals malloc let's do it one more time

size of node and clearly in a real

program we might want to start using a

loop and do this dynamically or a

function because it's a lot of

repetition now but just to go through

the syntax here this is fine if n equals

equals null out of memory for some

reason let's return one but first we

should free the list

itself and even the second node list

arrow next but I've deliberately done

this poorly

all right this is a little more subtle

now and let me get rid of the

highlighting just so it's a little more

visible

if n happens to equal equal null and

something really just went wrong there

out of memory why am I freeing two

addresses now

and again it's not that I'm freeing

those variables per se I'm freeing the

addresses in those variables

but there's also a bug with my code here

and it's subtle

let me ask more pointedly this line here

43 what is that freeing specifically can

I go to you

no so that's okay I'm

not freeing list two times technically

I'm freeing list once and list arrow next once

but let me just ask the more explicit

question what am I freeing with line 43

at the moment which node

I think node number one why because if

one is at the beginning of the list list

contains the address of that number one

node and so this frees that node this

line of code you might think now


intuitively okay it's probably freeing

the node number two but this is bad and

this is subtle valgrind might help you

catch this but by eyeing it it's not

necessarily obvious you should never

touch memory that you have already freed

and so the fact that I did this in this

order very bad because I'm telling the

operating system I don't

need the list address anymore do with it

what you want and then literally one

line later you're saying wait a minute

let me actually go to that address for a

moment and look at the next field of

that first node it's too late you've

already sort of given up control over

the node so it's an easy fix in this

case logically but we should be freeing

the second node first and then the first

one so that we're doing it in

essentially reverse order and again

valgrind would help you catch that but

that's the kind of thing one needs to be

careful about when touching memory at

all you cannot touch memory after you've

freed it but here is my last step let me

go ahead and update the

number field of n to be three and

the next field of n to be null and then


just like in the slide earlier I think I

can do list arrow next arrow next equals n and that

has the effect now of building up in the

computer's memory essentially this data

structure very manually very

pedantically like in a better world we'd

have a loop and some functions that are

automating this process but for now

we're doing it just to play around with

the syntax
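
Pulling the three manual insertions together, here is a sketch of what the walkthrough builds, wrapped in a function so it can be reused. The function name build_list is hypothetical, and returning NULL on failure (rather than returning 1 from main, as in the lecture) is an assumption made for illustration. Note the cleanup order in the last failure branch: the second node is freed before the first, so list is never read after it has been freed.

```c
#include <stdlib.h>

// The node type from the lecture: a number plus a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical function: build the list 1 -> 2 -> 3 by hand.
// Returns NULL, after freeing anything already allocated, if malloc fails.
node *build_list(void)
{
    // List of size 0
    node *list = NULL;

    // Add the number 1 to the list
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = 1;
    n->next = NULL;
    list = n;

    // Add the number 2 to the list
    n = malloc(sizeof(node));
    if (n == NULL)
    {
        free(list);
        return NULL;
    }
    n->number = 2;
    n->next = NULL;
    list->next = n;

    // Add the number 3 to the list
    n = malloc(sizeof(node));
    if (n == NULL)
    {
        free(list->next); // free the second node first
        free(list);       // then the first, so list is not touched after free
        return NULL;
    }
    n->number = 3;
    n->next = NULL;
    list->next->next = n;

    return list;
}
```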

so at this point unfortunately suppose I

want to print the numbers it's no longer

as easy as int i equals zero i less than

three i plus plus because you cannot

just do something like this

because pointer arithmetic no longer

comes into play when it's you who are

stitching together the data structure in

memory in all of our past examples with

arrays you've been trusting that all of

the bytes in the array are back to back

to back so it's perfectly reasonable for

the compiler and the computer to just

figure out oh well if you want bracket

zero that's at the beginning bracket one

it's one location over bracket two it's

two locations over this is way less

obvious now because even though you

might want to go to the first element in

the linked list or the second or the


third you can't just jump to those

arithmetically by doing a bit of math

instead you have to follow all of those

arrows so with linked lists you can't

use this square bracket notation anymore

because one node might be here over here

over here over here you can't just use

some simple offset so I think our code

is going to have to be a little fancier

and this might look scary at first but

it's just an application of some of the

basic definitions here let me do a for

Loop that actually uses a node star

variable initialized to the list itself

I'm going to keep doing this so long as

temp does not equal null

and on each iteration of this Loop I'm

going to update temp to be whatever temp

Arrow next is and I'll rewind in a

moment and explain in more detail but

when I print something here with printf

I can still use percent I because it's

still a number at the end of the day but

what I want to print out is the number

in this temporary variable so maybe the

ugliest for Loop we've ever seen because

it's mixing not just the idea of a for

Loop which itself was a bit cryptic

weeks ago but now I'm using pointers


instead of integers but I'm not

violating the definition of a for Loop

recall that a for Loop has three main

things in parentheses what do you want

to initialize first what condition do

you want to keep checking again and

again and what update do you want to

make on every iteration of the loop so

with that basic definition in mind this

is giving me a temporary variable called

temp that is initialized to the

beginning of the list so it's like

pointing my finger at the number one

node then I'm asking the question does

temp not equal null well hopefully not

because I'm pointing at a valid node

that is the number one node so of course

it doesn't equal null yet null won't be

until we get to the end of the list so

what do I do I start at this temp

variable I Follow the arrow and go to

the number field therein

what do I then do the for Loop says

change temp to be whatever is at temp by

following the arrow and grabbing the

next field that then has the result of

being checked against this conditional

no of course it doesn't equal null

because the second node is the number

two node null is still at the very end


so I print out the number two next step

I update temp one more time to be

whatever is next that then does not yet

equal null so I go ahead and print out

the number three node then one last time

I update temp to be whatever temp is in

the next field but after one two three

that last next field is null and so I

break out of this for Loop all together

so if I do this in pictorial form all

we're doing if I now use my finger to

represent the temp variable I initialize

temp to be whatever list is so it points

here that's obviously not null so I

print out whatever is at temp follow

the arrow in number and I print that out

then I update temp to point here then

I update temp to point here then I

update temp to point here wait that's

null the for Loop ends

so again admittedly much more cryptic

than our familiar int i equals zero and

so forth but it's just a different

utilization of

the for Loop syntax
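
The loop just walked through can be sketched as follows, assuming the lecture's node type. The function name print_list and its return value (a count of the nodes visited) are additions for illustration only; the lecture's version simply prints inside main.

```c
#include <stdio.h>

// The node type from the lecture: a number plus a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: print every number in the list, one per line.
// Returns how many nodes were visited.
int print_list(const node *list)
{
    int count = 0;

    // Initialize temp to the first node, keep going while temp is not NULL,
    // and on each iteration follow the arrow to the next node.
    for (const node *temp = list; temp != NULL; temp = temp->next)
    {
        printf("%i\n", temp->number);
        count++;
    }
    return count;
}
```

The three parts of the for loop are the same as ever (initialize, check, update); only the type of the loop variable has changed from an integer to a pointer.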

yes

good question how is it that I'm

actually printing numbers and not


printing out addresses instead the

compiler is helping me here because I

taught it in the very beginning of my

program what a node is which looks like

this here the compiler knows that a node

has a number field and a next field down

here in the for Loop because I'm

iterating using a node star pointer and

not an INT star pointer the compiler

knows that anytime I'm pointing at

something I'm pointing at the whole node

doesn't matter where specifically in the

rectangle I'm pointing per se it's

ultimately pointing at the whole node

itself and the fact that I then use temp

Arrow number means okay adjust your

finger slightly so you're literally

pointing at the number field and not the

next field so that's sufficient

information for the computer to

distinguish

the two good question other questions

then on this approach here

yeah in the back

how would I use a for Loop to add

elements to a linked list you will do

something like this if I may in problem

set five we will give you some of the

scaffolding for doing this

in this coming week's materials

we'll guide you to that but let me

not spoil it just yet

fair question though yeah

okay

good question is line 49 acceptable even

if we freed it earlier we didn't free it

in line 43 in this case right you can

only reach line 49 if n does not equal

null and you do not return on line 45 so

that's safe I was only doing those

freeing if I knew on line 45 that I'm

out of here anyway at that point good

question and yeah

correct if you're asking about temp

because it's in a for Loop does that

mean you don't have to free it you never

have to free pointers per se you should

only free addresses that were returned

to you by malloc so I haven't finished

the program to be fair but you're not

freeing variables you're not freeing

like Fields you are freeing specific

addresses whatever they may be so the

last thing and I was kind of stalling on

showing this just because it too is a

little cryptic here is how you can free

now a whole linked list in the world of


arrays recall it was so easy you just

say free list you return zero and you're

done not with a linked list because

again the computer doesn't know what you

have stitched together using all of

these pointers all over the computer's

memory you need to follow those arrows

so one way to do this would be as

follows while the list itself is not

null so while there's a list to be freed

what do I want to do I'm going to give

myself a temporary variable called temp

again and it's a different temp because

it's in a different scope it's inside of

the while loop instead of the for Loop

before a few lines earlier

I am going to initialize temp to be the

address of the next node just so I can

get one step ahead of things why am I

doing this because now I can boldly free

the list itself which does not mean the

whole list again I'm freeing the address

in list which is the address of the

number one node that's what list is it's

just the address of the number one node

so if I first use temp to point at the

number two slightly in the middle of

the picture then it is safe for me on

line 61 at the moment to free list that

is the address of the first node now I'm


going to say all right once I freed the

the first node in the list I can

update the list itself to be literally

temp

and now the loop repeats so what's

happening here if you think about this

picture temp is initially pointing at

not the list but list Arrow next so temp

represented by my right hand here is

pointing at the number two totally safe

and reasonable to free now the list

itself aka the address of the number one

node that has the effect of just

throwing away the number one node

telling the computer you can reuse that

memory for you the last line of code I

wrote updated list to point at the

number two at which point my Loop

proceeded to do the exact same thing

again and only once my finger is

literally pointing at nowhere the null

symbol will the loop by nature of a

while loop as I'll toggle back to break

out and there's nothing more to be freed

so again what you'll see ultimately in

problem set five more on that later is

an opportunity to play around with just

this syntax but also these ideas but

again even though the syntax is


admittedly pretty cryptic we're still

using Basics like these for Loops or

while Loops we're just starting to now

follow explicit addresses rather than

letting the computer do all of the

arithmetic

for us as we previously benefited from

at the very end of this thing I'm going

to return 0 as though all is well and I

think then

we're good to go
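
The freeing loop described above can be sketched like this, assuming the lecture's node type. The function name free_list and the returned count of freed nodes are hypothetical additions for illustration; the lecture writes the loop directly at the end of main.

```c
#include <stddef.h>
#include <stdlib.h>

// The node type from the lecture: a number plus a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: free every node in the list, one at a time.
// Returns how many nodes were freed.
size_t free_list(node *list)
{
    size_t freed = 0;
    while (list != NULL)
    {
        // Get one step ahead: remember the next node before freeing this one
        node *temp = list->next;

        // Free the address in list, which is only the current first node
        free(list);

        // Move on; the freed memory is never touched again
        list = temp;
        freed++;
    }
    return freed;
}
```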

questions on this linked list code

now

and again we'll walk through this again

in the coming week's spec yeah

sure can we explain this while loop

here for freeing the list so notice that

first I'm just asking the obvious

question is the list null because

if it is there's no work to be

done however while the list is not null

according to line 58 what do we want to

do I want to create a temporary variable

that points at the same thing that list

Arrow next is pointing at so what does

that mean here's list

list Arrow next is whatever this thing

is here so if my right hand represents

the temporary variable I'm literally


pointing at the same thing as list arrow next

is itself the next line of code recall

was free the list and unlike in our

world of arrays like half an hour ago

where that just meant free the whole

darn list you now have taken over

control over the computer's memory with

a linked list in ways that you didn't

with the array the computer knew how to

free the whole array because you malloc

the whole thing at once you are now

mallocking the linked list one node at a

time and the operating system does not

keep track of for you where all these

nodes are so when you free list you are

literally freeing the value of the list

variable which is just this first node

here

then my last line of code which I'll

flip back to in a second updates list to

now ignore the freed memory and point at

two

and the story then repeats so again it's

just a very pedantic way of using this

new syntax of star notation and the

arrow notation and the like to sort of

do the equivalent of walking down all of

these arrows following all of these

breadcrumbs but it does take admittedly


some getting used to

syntax you only have to do for one week but

again next week in Python we will begin

to abstract a lot of this complexity

away but none of this complexity is

going away it's just that someone else

the authors of Python for instance will

have automated this kind of stuff for us

the goal this week is to understand what

it is we're going to get for free so to

speak next week

all right questions on these linked

lists

all right just oh yeah and back

fair question let me summarize as could

we have freed this with a for Loop

absolutely

um it just is a matter of style it's a

little more elegant to do it in a while

loop according to me but other people

will reasonably disagree

um anything you can do with a while loop

you can do with a for Loop and vice

versa do while Loops recall are a little

different but they will always do at

least one thing but for loops and while

Loops behave the same in this case
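
For comparison, here is one way the same freeing logic might look with a for loop instead of a while loop, assuming the lecture's node type. The name free_list_for and the returned count are hypothetical. This is a sketch of one possible arrangement, not the lecture's code: the "update" slot of the loop does the job of the last line of the while loop's body.

```c
#include <stddef.h>
#include <stdlib.h>

// The node type from the lecture: a number plus a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: free every node using a for loop.
// Returns how many nodes were freed.
size_t free_list_for(node *list)
{
    size_t freed = 0;

    // After each pass through the body, list advances to the saved next node
    for (node *temp = NULL; list != NULL; list = temp)
    {
        temp = list->next; // save the next address before freeing
        free(list);
        freed++;
    }
    return freed;
}
```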

sure other questions

all right well let's just vary things a


little bit here just to see what some of

the pitfalls might now be without

getting into the weeds of code indeed

we'll try to save some of that for

problem set five's exploration but

instead let's imagine that we want to

create a list here of our own um I can

offer in exchange for a few volunteers

uh some foam fingers to bring to the

next game perhaps uh could we get maybe

just one volunteer first come on up you

will be our linked list

from the get-go what's your name

Pedro come on up

all right thank you to Pedro

and if you wanted to stand roughly over

here but you are a null pointer so just

point sort of at the ground as though

you're pointing at zero all right so

Pedro is our linked list of size zero

which pictorially might look a little

something like this for consistency with

our past pictures now suppose that we

want to go ahead and malloc oh how about

uh the number two can we get a volunteer

to be on camera here okay you kind of

jumped out of your seat do you want to

come up

okay you really want the foam finger I


say all right round of applause sure

[Applause]

okay and what's your name

say again hey when Caleb Caleb sorry

all right so here is your number two for

your number field and here is your

pointer and come on let's say that there

was room for Caleb like right there

that's perfect so Caleb got malloc'd if you

will over here so now if we want to

insert Caleb and the number two into

this linked list well what do we need to

do I already initialized you to two and

pointing as you are to the ground means

you're initialized to null for your next

field Pedro what you should do perfect

what's your Pedro dupe Point that's fine

too so Pedro's now pointing at the list

so now our list looks a little something

like this so so far so good all is well

so the first couple of these will be

pretty straightforward let's insert one

more if anyone really wants another foam

finger here how about right in the

Middle come on down

and just in anticipation how about let's

malloc someone else okay your friends

are pointing at you do you want to come

down too preemptively this is a pool of

memory if you will what's your name


Hannah all right Hannah you are number

four

and hang there for just a moment all

right so we've just uh mallocked Hannah

and Hannah how about Hannah suppose you

ended up over there in just some random

location all right so what should we now

do if the goal is to keep these things

sorted how about so Pedro do you have to

update yourself

no all right Caleb what do you have to

do okay and Hannah what should you be

doing

oh it's just you for

now so point at the ground representing

null okay so again demonstrating the

fact that unlike in past weeks where we

had our nice clean array back to back to

back contiguously these guys are

deliberately all over the stage so let's

malloc another how about number five

what's your name

Jonathan all right Jonathan you are our

number five

and pick your favorite place in memory

okay

all right so Jonathan's now over there

and Hannah's over there so five we want

to point Hannah at number five so you of


course are gonna point there and where

should you be pointing down to represent

null as well okay so pretty

straightforward but now things get a

little interesting and here we'll use a

chance to without the weeds of code

point out how order of operations is

really going to matter suppose that I

next want to allocate say the number one

and I want to insert the number one into

this list yes this is what the code

would look like but if we sort of act

this out could we get one more volunteer

uh how about on the end there in the

sweater yeah come on down

we have what's your name

Lauren okay Lauren come on down

and how about Lauren why don't you go

right in here in front if you don't mind

here is your number here is your pointer

so I've initialized Lauren to the number

one and your pointer will be null pointing

at the ground uh where do you belong if

we're maintaining sorted order looks

like right at the beginning what should

happen here

okay so Pedro has presumed to point now

at Lauren

but how do you know where to point

Pedro's undoing what he did a moment


ago so this was deliberate and that was

perfect that Pedro presumed to point

immediately at Lauren why you literally

just orphaned all of these folks all of

these chunks of memory why because if

Pedro was our only variable pointing at

that chunk of memory this is the danger

of using pointers and dynamic memory

allocation and building your own data

structures the moment you point

temporarily if you could to Lauren I

have no idea where he's pointing to I

have no idea how to get back to Caleb uh

or Hannah or anyone else on stage so

that was bad so you did undo it so

that's good I think we need Lauren to

make a decision first who should you

point at so pointing at Caleb why

because you're pointing at literally who

Pedro is pointing at Pedro now what are

you safe to do good so order of

operations there matters and if we had

just done this line of code in red here

list equals n that was like Pedro's

first instinct bad things happen and we

orphan the rest of the list but if we

think through it logically and do this

as Lauren did for us instead we've now

updated the list to look a little


something more like this let's do

one last one we've got one more foam

finger here for the number three how

about on the end yeah you want to come

down

all right one final volunteer

all right what's your name

sorry Miriam all right so here is your

number three here's your pointer if you

want to go maybe in the middle of the

stage in a random memory location so

here too the goal is to maintain sorted

order

so let's ask the audience who or what

number should point at whom first here

so we don't screw up and orphan some of

the memory and this is if we do orphan

memory this is what's called again per

last week a memory leak your Mac your PC

your phone can start to slow down if you

keep asking for memory but never give it

back or lose track of it so we want to

get this right

who should point at whom or what number

say again

three should point at four so three do

you want to point at four

and not uh so okay good and how did you

know Miriam whom to point at

perfect okay so copying Caleb why


because if you look at where this list

is currently constructed and you can

cheat on the board here 2 is pointing to

four if you point at whoever Caleb

number two is pointing at that indeed

leads you to Hannah for number four

so now what's the next step to stitch

this together

our voice in the crowd

two to three so two to three so Caleb I

think it's now safe for you to decouple

because someone is already pointing at

Hannah we haven't orphaned anyone so now

if we follow the breadcrumbs we've got

Pedro leading to one to two to three to

four to five we need the numbers back

but you can keep the foam fingers thank

you to our volunteers here
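
The on-stage insertion of the number three can be sketched in C roughly as follows. This is a minimal sketch, not the lecture's own code: the function name `insert_sorted`, its `node **` parameter, and its `bool` return value are assumptions made for illustration. The part that mirrors the demonstration is the order of the last two pointer assignments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

// A list node as described in the lecture: a number plus a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: inserts number into an ascending list.
// Returns false (and leaves the list unchanged) if malloc fails.
bool insert_sorted(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    n->next = NULL;

    // Empty list or new smallest value: the new node becomes the head.
    if (*list == NULL || number < (*list)->number)
    {
        n->next = *list;
        *list = n;
        return true;
    }

    // Walk to the last node whose successor is still smaller than number.
    node *prev = *list;
    while (prev->next != NULL && prev->next->number < number)
    {
        prev = prev->next;
    }

    // Order matters: first point the new node at whatever prev points at...
    n->next = prev->next;
    // ...and only then redirect prev, so the rest of the list is never orphaned.
    prev->next = n;
    return true;
}
```

Swapping the last two assignments would overwrite `prev->next` before it was copied, which is the memory leak described above.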

thank you

you can just put the numbers here

thank you to all so this is only to say

that when you start looking at the code

this week and the problem set it's

going to be very easy to sort of lose

sight of the forest for the trees

because the code does get really dense

but the ideas again really do bubble up

to these higher level descriptions and

if you think about data structures at


this level if you go off and program

after a class like cs50 and you're

whiteboarding something with a friend or

a colleague most people think at and

talk at this level and they just assume

that yeah if we went back and looked at

our textbooks or class notes we could

figure out how to implement this but the

important stuff is the conversation and

the ideas up here even though it's this

week where we get some practice with the

actual code so when it comes to

analyzing an algorithm like this let's

consider the following what might be now

the running time of operations like

searching and sorting and searching and

inserting into a linked list we talked

about arrays earlier and we had some

binary search possibility still as

long as it's an array but as soon as we

have a linked list these arrows like our

volunteers could be anywhere on stage

and so you can't just assume that you

can jump arithmetically to the middle

element to the middle element to middle

one you pretty much have to follow all

of these breadcrumbs again and again
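
Following the breadcrumbs one node at a time might look like the sketch below. The function name `list_contains` is an assumption for illustration; the node layout matches the number-plus-next-pointer structure discussed in the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: linear search through a linked list.
// There is no way to jump to the middle, so each node is visited in turn.
bool list_contains(const node *list, int number)
{
    for (const node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        if (ptr->number == number)
        {
            return true;
        }
    }
    // Reaching NULL means every node was checked without a match.
    return false;
}
```

In the worst case the loop runs once per node, which is where the linear running time comes from.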

so how might that inform what we see

well consider this too even though I

keep drawing all these pictures with all


of the numbers exposed and all of us

humans in the room can easily spot where

the one is where the two is where the

three is the computer again just like

with our lockers and arrays can only see

one location at a time and the key thing

with a linked list is that the only

address we've fundamentally been

remembering is what Pedro represented a

moment ago he was the link to all of the

other nodes and in turn each person led

to the next but without Pedro we would

have lost some of or all of the linked

list

so when you start with a linked list if

you want to find an element via

search you have to do it linearly

following all of the arrows following

all of the pointers on the stage in

order to get to the node in question and

only once you hit null can you conclude

yep it was there or no it was not so

given that if a computer essentially can

only see the number one or the number

two or the number three or the number

four or the number five one at a time

how might we think about the running

time of search it is indeed Big O of n

but why is that well in the worst case


the number you might be looking for is

all the way at the end and so obviously

you're going to have to search all of

the N elements and I drew these things

with boxes on top of them because again

even though you and I can immediately

see where the 5 is for instance the

computer can only figure that out by

starting at the beginning and going

there so there too is another trade-off

it would seem that overnight we have

lost the ability to do a very powerful

algorithm from week zero known as binary

search right it's gone because there's

no way in this picture to jump

mathematically to the middle node unless

you remember where it is and then

remember where every other node is and

at that point you're back to an array

linked lists by Design only remember the

next node in the list all right how

about something like insert

in the worst case perhaps how many steps

might it take to insert something into a

linked list someone else

someone else yeah

say again

N squared fortunately it's not that bad

it's not as bad as N squared that

typically means doing N Things N times


and I think we can stay under that but

not a bad

thought yeah

why would it be n


okay so you're to summarize you're

proposing n because to find where the

thing goes you have to Traverse

potentially the whole list because if

I'm inserting the number six or the

number 99 that numerically belongs at

the very end I can only find its

location by looking for all of them at

this point though in the term and really

this point in the story you should start

to question these kinds of very

simplistic questions to be honest

because it the answer is almost always

going to depend right if I've just got a

linked list that looks like this the

first question back to someone asking

this question would be well does the

list need to be sorted right I've drawn

it as sorted and it might imply as much

so that's a reasonable assumption to

have made but if I don't care about

maintaining sorted order I could

actually insert into a linked list in

constant time why I could just keep


inserting into the beginning into the

beginning into the beginning and even

though the list is getting longer the

number of steps required to insert

something before the first element is

not growing at all you just keep kind of

inserting inserting if you want to keep

it sorted though yes it's going to be

indeed Big O of n but again these kinds

of now assumptions are going to start to

matter so let's for the sake of

discussion say it's Big O of n if we do

want to maintain sorted order but what

about um in the case of not caring it

might indeed be Big O of one and now

these are the kinds of decisions that

we'll start to leave to you what about in

the best case here if we're thinking

about big Omega notation then frankly we

could just get lucky in the best case

and the element we're looking for

happens to be at the beginning or heck

we just blindly insert to the beginning

irrespective of the order that we want

to keep things in all right so besides

that

how can we improve further on this

design we don't need to stop at linked

lists because honestly it's not been a

clear win like linked lists allow us to


use more of our memory because we don't

need massive growing chunks of

contiguous memory so that's a win but

they still require Big O of n time to

find the end of it if we care about

order we're using at least twice as much

memory for the darn pointer so that

seems like you know a side step it's not

really a step forward
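
For contrast with the linear-time cases, here is a minimal sketch of the constant-time insertion mentioned earlier, where sorted order is not maintained and every new node simply goes at the front. The name `prepend` and its signature are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: inserts at the beginning of the list.
// The work done does not depend on how long the list already is.
bool prepend(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    // Point the new node at the old head before updating the head itself.
    n->next = *list;
    *list = n;
    return true;
}
```

Prepending 1, then 2, then 3 leaves the list in the order 3, 2, 1, which is the price of skipping the search for the right position.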

so can we do better here's where we can

now accelerate the story by just

stipulating that hey even if you haven't

used this technique yet we would seem to

have an ability to stitch together

pieces of memory just using pointers and

anything you could imagine drawing with

arrows you can implement it would seem

in code so what if we leverage a second

dimension instead of just stringing

together things laterally left to right

essentially even though they were

bouncing around on the screen what if we

start to leverage a second dimension

here so to speak and build more

interesting structures in the computer's

memory well it turns out that in a

computer's memory we could create a tree

similar to a family tree if you've ever

seen or drawn a family tree with


grandparents and parents and siblings

and so forth it's kind of this uh you

know inverted branch of a tree that

grows typically when it's drawn downward

instead of upward like a typical tree

but that's something we could translate

into code as well specifically let's do

something called a binary search tree

which is a type of tree and what I mean

by this is the following notice this

this is an example of an array from like

week two when we first talked about

those and we had the lockers on stage

and recall that

what was nice about an array is if one it's

sorted and two all of its numbers are

indeed contiguous which is by definition

of an array we can just do some simple

math for instance if there's seven

elements in this array and we do seven

divided by two that's what three and a

half round down through truncation

that's three zero one two three that

gives me the middle element

arithmetically in this thing and even

though I have to be careful about

rounding using simple arithmetic I can

very quickly with a single line of code

or math find for you the middle of the

left half of the left half of the right


half or whatever that's the power of

arrays and that's what gave us binary

search and how did binary search work

well we looked at the middle and then we

went left or right and then we went left

or right again sort of as implied by

this color scheme here wouldn't it be

nice if we somehow

preserved the new upsides today of

dynamic memory allocation giving

ourselves the ability to just add

another element add another element add

another element but retain the power of

binary search because log of n was much

better than n certainly for large data

sets right even the phone book

demonstrated as much weeks ago
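
The midpoint arithmetic described above is what makes binary search on a sorted array possible. A minimal sketch, assuming an ascending array of ints; the function name `binary_search` and its parameters are illustrative, not taken from the lecture's code:

```c
#include <stdbool.h>

// Hypothetical helper: binary search over a sorted (ascending) array.
bool binary_search(const int numbers[], int size, int target)
{
    int low = 0;
    int high = size - 1;
    while (low <= high)
    {
        // Integer division truncates, so for indices 0..6 the middle is 3.
        int middle = low + (high - low) / 2;
        if (numbers[middle] == target)
        {
            return true;
        }
        else if (target < numbers[middle])
        {
            // Discard the right half.
            high = middle - 1;
        }
        else
        {
            // Discard the left half.
            low = middle + 1;
        }
    }
    return false;
}
```

Each pass halves the remaining range, so the loop runs on the order of log n times. This only works because array elements are contiguous and any index can be computed directly.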

so what if I kind of draw this same

picture in two dimensions and I preserve

the color scheme just so it's obvious

what came where

what do these things kind of look like

now


maybe like things we might Now call

nodes right a node is just a generic

term for like storing some data what if

the data these nodes are storing are

numbers so still integers but what if we


kind of connected these cleverly like an

old family tree whereby every node has

not one pointer now but as many as two

maybe zero like in the leaves at the

bottom there in green but other nodes on

the interior might have as many as two

like having two children so to speak and

indeed the vernacular here is exactly

that this would be called the root of

the tree or this would be a parent with

respect to these children the green ones

would be grandchildren with respect to these

the green ones would be siblings with

sorry these green ones would be siblings

with respect to each other and over

there too so all the same jargon you

might use in the real world applies in

the world of data structures and CS

trees

but this is interesting because I think

we could build this now this kind of

data structure in the computer's memory

how well suppose that we defined a node

to be no longer just this a

number and a next field what if we sort

of give ourselves a bit more room here

and give ourselves a pointer called left

and another one called right both of

which are pointers to a struct node so

same idea as before but now we just make


sure we think of these things as

pointing this way and this way not just

this way not just a single Direction but

two so you could imagine in code

building something up like this with a

node that creates in essence this

diagram here but why is this compelling

suppose I want to find the number three

I want to search for the number three in

this tree it would seem just like Pedro

was the beginning of our linked list in

the world of trees the root so to speak

is the beginning of your data structure

you can retain and remember this entire

tree just by pointing at the root node

ultimately one variable can hang on to

this whole tree
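
In C, the node with two pointers described above could be declared as in this sketch. The field names `number`, `left`, and `right` follow the lecture's description; the `new_node` helper is a hypothetical convenience added here for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

// A binary search tree node: one number and up to two children.
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Hypothetical helper: allocates a leaf (a node with no children).
// Returns NULL if malloc fails; the caller is responsible for freeing it.
node *new_node(int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->left = NULL;
    n->right = NULL;
    return n;
}
```

A single variable of type `node *` pointing at the root is enough to reach every other node in the tree.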

so how can I find the number three well

if I look at the root node and the

number I'm looking for is less than

notice I can go this way or if it's

greater than I can go this way so I've

preserved that property of the phone

book or just a sorted array in general

what's true over here if I'm looking for

three I can go to the right of the two

because that number is going to be

greater if I go left it's going to be

smaller instead and here's an example of


actually recursion recursion in a

physical sense much like Mario's

pyramid which was kind of recursively

defined notice this I claim this whole

thing is a tree specifically a binary

search tree which means every node has

two or maybe one or maybe zero children

but no more than two hence the bi in

binary and it's the case that every left

child is smaller than the root and every

right child is larger than the root that

definition certainly works for two four

and six but it also works recursively

for every sub tree or branch of this

tree notice if you think of this as the

root it is indeed bigger than this left

child and it's smaller than this right

child and if you look even at the leaves

so to speak the grandchildren here this

root node is bigger than its left child

if it existed so it's sort of a

meaningless statement and it's less

than its right child or it's not

greater than certainly so that's

meaningless too so we haven't violated

the definition even for these leaves as

well and so now how many steps does it

take to find in the worst case any

number in a binary search tree it would

seem

so it seems too literally and the height

of this thing is actually three and so

long story short especially if you're a

little less comfy with your your

logarithms from yesteryear log base 2 is

just like the number of times you can

divide something in half and half and

half until you get down to one this is

kind of like a logarithm in the reverse

Direction here's a whole lot of elements

and we're halving we're halving until we

get down to one so the height of this

tree that is to say is log base 2 of n

which means that even in the worst case

the number you're looking for maybe is

all the way at the bottom in the leaves

doesn't matter it's going to take log

base 2 of n steps or log of n steps to

find maximally any one of those numbers

so again binary sorry binary search is

back

but we've paid a price

right this isn't a linked list anymore

it's a tree

but we've gained back binary search

which is pretty compelling right that's

where the whole class began on making

that distinction but what price have we


paid to retain binary search

in this new world yeah

it's no uh it's no longer sorted left to

right but this is I claim sorted

according to the binary search tree

definition where again left tree is left

child is smaller than root and right

child is greater than root so it is

sorted but it's sorted in a

two-dimensional sense if you will not

just one but another price paid


exactly every node now needs not one

number but two three pieces of data a

number and now two pointers so again

there's that kind of trade-off again

where well if you want to save time you

got to give something if you start

giving space and you start using more

space you can speed up time like you've

got it there's always a price paid and

it's very often in space or time or

complexity or developer time uh the

number of bugs you have to solve I mean

all of these are sort of finite

resources that you have to juggle among

so if we consider now the code with

which we can implement this here might

be the node and how might we actually

use something like this well let's take


a look at maybe one final program in C

here before we transition to higher

level Concepts ultimately let me go

ahead here and let me just open a

program I wrote here in advance so let

me in a moment copy over a file called

tree.c which we'll have on the

course's website and I'll walk you

through some of the logic here that I've

written for tree.c all

right so what do we have here first so

here is a implementation of a binary

search tree for numbers and as before

I've just kind of played around and I've

inserted the numbers manually so

what's going on first here is my

definition of a node for a binary search

tree copied and pasted from what I

proposed on the board a moment ago

here are two prototypes for two

functions that I'll show you in a moment

that allow me to free an entire an

entire tree one node at a time and that

also allow me to print the tree in order

so it's even though they're not sorted

left to right I bet if I'm clever about

what child I print first I can

reconstruct the idea of printing this

tree properly so how might I Implement a


binary search tree here's my main

function here is how I might represent a

tree of size zero it's just a null

pointer called tree

here's how I might add a number to that

list so here for instance is me

mallocing space for a node storing it

in a temporary variable called n here is

me just doing a safety check make sure n

does not equal null and then here is me

initializing this node to contain the

number two first then initializing the

left child of that node to be null and

the right child of that node to be

null and then initializing the tree

itself to be equal to that particular

node so at this point in the story

there's just one rectangle on the screen

containing the number two with no

children

all right let's just add manually to

this a little further let's add another

number to the list by mallocing another

node I don't need to redeclare n as a

node star because it already exists at

this point here's a little safety check

I'm going to not bother with my let me

do this uh free memory here just to be

safe do I want to do this

um we'll go on a free memory too which


I've not done here but I'll save that

for another time here I'm going to

initialize the number to one I'm going

to initialize the children of this

node to null and null and now I'm going

to do this initialize the tree's left

child to be n so what that's essentially

doing here is if this is my root node

the single rectangle I described a

moment ago that currently has no

children neither left nor right here's

my new node with the number one I want

it to become the new Left child so that

line of code on the screen there

tree->left = n is like stitching these

two together with a pointer from 2 to

the one all right the next line of code

next lines of code you can probably

guess are me adding another number to

the list just the number three so this

is a simpler tree with two one and three

respectively and this code let me wave

my hands is almost the same except for

the fact that I'm updating the tree's

right child to be this new and third

node let's now run the code before

looking at those two functions let me do

make tree

dot slash tree and voila one two three
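
The manual construction just walked through (2 at the root, 1 on the left, 3 on the right) can be reconstructed roughly as below. This is a sketch based on the spoken description, not the actual tree.c file: wrapping the steps in a function called `build_example_tree`, and freeing earlier nodes when a later malloc fails, are assumptions made here.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Hypothetical wrapper: builds the three-node tree by hand.
// Returns NULL if any allocation fails, after freeing what was built so far.
node *build_example_tree(void)
{
    // A tree of size zero is just a null pointer.
    node *tree = NULL;

    // Root: the number 2 with no children yet.
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = 2;
    n->left = NULL;
    n->right = NULL;
    tree = n;

    // Left child: the number 1.
    n = malloc(sizeof(node));
    if (n == NULL)
    {
        free(tree);
        return NULL;
    }
    n->number = 1;
    n->left = NULL;
    n->right = NULL;
    tree->left = n;

    // Right child: the number 3.
    n = malloc(sizeof(node));
    if (n == NULL)
    {
        free(tree->left);
        free(tree);
        return NULL;
    }
    n->number = 3;
    n->left = NULL;
    n->right = NULL;
    tree->right = n;

    return tree;
}
```

The variable `n` is reused for each allocation, as described in the lecture, because once a node is attached to the tree it can be reached through `tree` instead.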


so it sounds like the data structure is

sorted to your concern earlier but how

did I actually print this and then

eventually free the whole thing well

let's look at the definition of first

print tree and this is where things get

kind of interesting print tree returns

uh nothing so it's a void function but

it takes a pointer to a root element as

its sole argument node star root here's

my safety check if root equals equals

null there's obviously nothing to print

just return that sort of goes without

saying but here's where things get a

little magical otherwise print your left

child

then print your own number

then print your right child
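
Those three steps, plus the null check, might be written as in the sketch below. The function name `print_tree` and its `node *root` parameter come from the lecture's description; the exact `printf` format is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Prints every number in the tree in ascending order, one per line.
void print_tree(node *root)
{
    // Base case: an empty tree has nothing to print.
    if (root == NULL)
    {
        return;
    }
    // Everything in the left subtree is smaller, so print it first.
    print_tree(root->left);
    // Then this node's own number.
    printf("%i\n", root->number);
    // Everything in the right subtree is larger, so print it last.
    print_tree(root->right);
}
```

Swapping the two recursive calls, so the right subtree is visited first, would print the same numbers in descending order.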

what is this an example of even though

it's not mentioned by name here

what programming technique here

so yeah so this is actually perhaps the

most compelling use of recursion yet it

wasn't really that compelling with the

Mario thing because we had such an easy

implementation with a for Loop weeks ago

but here is kind of a perfect

application of recursion where your data

structure itself is recursive right if

you take any snip of any branch it all


still looks like a tree just a smaller

one that lends itself to recursion so

here is this leap of faith where I say

ah print my left tree or my left sub

tree if you will via my child at the

left then I'll print my own root node

here in the middle then go ahead and

print my right subtree and because we

have this base case that just makes sure

that if the tree the root is null

there's nothing to do you're not going

to recurse infinitely you're not going

to call yourself again and again and

again infinitely many times so it just

kind of works out and prints the one the

two and the three and notice what we

could do too if you wanted to print the

tree in reverse order you could do that

print your right tree first the greater

element then yourself then then your

smaller subtree and if I do make tree

here in dot slash tree voila now I've

reversed the order of the list and

that's pretty cool you could do it with

a for Loop in an array but you can also

do it even with this two-dimensional

structure let's lastly look just at this

free tree function and this one's almost

the same order doesn't matter in quite


the same way but it does still matter

here's what I did with free tree well if

the root of the tree is null there's

obviously nothing to do just return

otherwise go ahead and free your left

child and all of its descendants then

free your right child and all of its

descendants and then free yourself and

again free literally just frees the

address in that variable doesn't free

the whole darn thing it just frees

literally what's at that address why was

it important that I did line 72 last

though

why did I free the left child and the

right child before I freed myself so to

speak
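
A sketch of the function being described, assuming the same node layout as before. The name `free_tree` is taken from the lecture; the body is reconstructed from the spoken description rather than copied from the actual file.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Frees every node in the tree, children before parents.
void free_tree(node *root)
{
    // Base case: nothing to free.
    if (root == NULL)
    {
        return;
    }
    // Free both subtrees while root's pointers are still valid to read.
    free_tree(root->left);
    free_tree(root->right);
    // Only now is it safe to free the node itself.
    free(root);
}
```

Calling `free(root)` first and then reading `root->left` would access memory that has already been given back, which is undefined behavior in C.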

exactly if you free yourself first if I

had done incorrectly this line higher up

you're not allowed to touch the left

child subtree or the right child subtree

because the memory address is no longer

valid at that point you would get some

kind of memory error perhaps the program

would crash valgrind definitely wouldn't

like it bad things would otherwise

happen but here then is an example of

recursion and again just a recursive use

of an actual data structure and what's

even cooler here is relatively


speaking suppose we wanted to search

something like this binary search

actually gets pretty straightforward to

implement too for instance here might be

the prototype for a search function for

a binary search tree you give me the

address the root of a tree and you give

me a number I'm looking for and I can

pretty easily now return true if it's in

there or false if it's not how well

let's first ask a question if tree

equals equals null then you just return

false because if there's no tree there's

no number so it's obviously not there

return false else if the number you're

looking for is less than the tree's own

number

which direction should we go

okay left how do we express that well

let's just return the answer to this

question search the left subtree by way

of my left child looking for the same

number and you just assume through the

beauty of recursion you're kicking the

can and let yourself figure it out with

a smaller problem just that snipped left

tree instead else if the number you're

looking for is greater than the tree's

own number go to the right as you might


infer so I can just return the answer to

this question search my right subtree

for that same number

and there's a fourth and final condition

what's the fourth scenario we have to

consider explicitly yeah

if the number itself is right there so

else if the number I'm looking for

equals the tree's own number then and

only then should you return true and if

you're you're thinking quickly here

there's an optimization possible better

design opportunity

think back to even our scratch days what

could we do a little better here you're

pointing at it

exactly an else suffices because if

there's logically only four things that

could happen you're wasting your time by

asking a fourth gratuitous question an

else here suffices so here too more so

than the Mario example a few weeks ago

there's just this Elegance arguably to

recursion and that's it this is not

pseudocode this is the code for binary

search on a binary search tree and so

recursion tends to work in lockstep with

these kinds of data structures that have

this kind of structure to them as we're

seeing here
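
Put together, the four cases walked through above might look like this. The sketch is reconstructed from the spoken description, with the final `else` standing in for the explicit equality check as suggested.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Returns true if number is somewhere in the binary search tree.
bool search(node *tree, int number)
{
    // No tree, so no number.
    if (tree == NULL)
    {
        return false;
    }
    // Smaller values can only be in the left subtree.
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    // Larger values can only be in the right subtree.
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    // Neither smaller nor larger: this node holds the number.
    else
    {
        return true;
    }
}
```

Each recursive call hands off a strictly smaller tree, so the recursion always ends at either a match or a null pointer.
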
are any questions then on

binary search

as implemented here with a tree yeah


uh good question so when returning a

Boolean value true and false are values

that are defined in a library called

standard bool stdbool.h is the

header file that you can use it is the

case that true is

it's not well defined what they are but

they would map indeed yes to zero and

one essentially but you should not

compare them explicitly to zero and one

when you're using true and false you

should compare them to each other


so if I am in my own code from earlier

in a void function it is totally fine to

return you just can't return something

explicitly so return just means that's

it quit out of this function you're not

actually handing back a value so it's a

way of short-circuiting the execution if

you don't like that and some people do

frown upon having code return from

functions prematurely you could invert

the logic and do something like this if


the root does not equal null do all of

these things and then indent all three

of these lines underneath that's

perfectly fine too I happen to write it

the other way just so that there was

explicitly a base case that I could

point to on the screen whereas now it's

kind of implicitly there for us only but

a good observation too all right so

let's ask the question as before about

like running time of this it would look

like binary search is back and we can

now do things in logarithmic time but we

should be careful is this a binary

search tree just to be clear

and again a binary search tree is a tree

where the root is greater than its left

child and smaller than its right child

that's the essence so you're Shake

you're nodding your head you agree

okay I agree so this is a binary search

tree

is this a binary search tree

okay I'm hearing yeses or I'm hearing

just my delay changing the vote it would

seem so this is kind of one of those

trick questions this is a binary search

tree because I've not violated the

definition of what I gave you right is

there any example of a left child that


is greater than its parent or is there

any example of a right child that's

smaller than its parent that's just the

opposite way of describing the same

thing no this is a binary search tree

unfortunately it also looks like albeit

at a different axis what

a linked list but you could imagine this

happening right suppose that I hadn't

been as thoughtful as I was earlier by

inserting two and then one and then

three which kind of nicely balanced

everything out suppose that instead

because of what the user is typing in or

whatever you contrive in your own code

suppose you insert a one and then a two

and then a three like you've kind of

created a problem for yourself because

if we follow the same logic as before

going left or going right this is how

you might Implement a binary search tree

accidentally if you just blindly keep

following that definition I mean this

would be better designed as what if we

kind of like rotated the whole thing

around and that's totally fine and those

kinds of trees actually have names

there's trees called AVL trees in

computer science there are red black


trees in computer science there are

other types of trees that additionally

add some logic that tell you when you

got to Pivot the thing and rotate it and

kind of snip off the root and fix things

in this way but a binary search tree in

and of itself does not guarantee that it

will be balanced so to speak and so if

you consider the worst case scenario of

even using a binary search tree if

you're not smart about the code you're

writing and you just blindly follow this

definition you might accidentally create

a crazy long and stringy binary search

tree that essentially looks like a

linked list because you're not even

using any of the left children so

unfortunately the literal answer to the

question here is what's the running time

of search well hopefully log n but not

if you don't maintain the balance of the

tree both insert and search could

actually devolve into instead of Big O

of log n literally Big O of n if you

don't somehow take into account and

we're not going to do the code for that

here sort of a higher level thing you

might explore Beyond down the road it

can devolve into something that you

might not have intended and so now that


we're talking about two Dimensions it's

really the onus is on the programmer to

consider what kinds of perverse

situations might happen where the thing

devolves into a structure that you don't

actually want it to devolve into
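
To make the effect of insertion order concrete, here is a sketch of a naive insert that blindly follows the binary search tree definition with no rebalancing, plus a function that measures the resulting height. None of this code is from the lecture: `insert`, `height`, and `free_tree` as written here are hypothetical helpers, and sending duplicates to the right is an arbitrary choice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Hypothetical naive insert: go left if smaller, right otherwise, never rebalance.
bool insert(node **tree, int number)
{
    if (*tree == NULL)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            return false;
        }
        n->number = number;
        n->left = NULL;
        n->right = NULL;
        *tree = n;
        return true;
    }
    if (number < (*tree)->number)
    {
        return insert(&(*tree)->left, number);
    }
    return insert(&(*tree)->right, number);
}

// Number of nodes on the longest path from the root down to a leaf.
int height(const node *tree)
{
    if (tree == NULL)
    {
        return 0;
    }
    int left = height(tree->left);
    int right = height(tree->right);
    return 1 + (left > right ? left : right);
}

// Frees children before the parent.
void free_tree(node *tree)
{
    if (tree == NULL)
    {
        return;
    }
    free_tree(tree->left);
    free_tree(tree->right);
    free(tree);
}
```

Inserting 2, 1, 3 with this function gives a tree of height 2, while inserting 1, 2, 3 gives height 3 with every left pointer null, which is a linked list in all but name. Self-balancing trees such as AVL trees and red-black trees add rotation logic to prevent this.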

all right we've got just a few

structures to go let's go ahead and take

one more five minute break here when we

come back we'll talk at this level about

some final applications of this see you

in five

all right

so we are back and as promised we'll

sort of operate now at this higher level

where if we take for granted that even

though we haven't had an opportunity to

play with these techniques yet you have

the ability now in code to kind of

stitch things together both in one

Dimension and even two Dimensions to

build things like lists and trees so if

we have these building blocks things

like now arrays and lists and trees what

if we start to kind of amalgamate them

such that we build things out of

multiple data structures can we start to

get some of the Best of Both Worlds by

way of for instance something called a


hash table so a hash table is sort of a

Swiss army knife of data structures in

that it's so commonly used because it

allows you to associate keys with values

so to speak so for instance it allows

you to associate a username with a

password or a name with a number or

anything where you have to take

something as input and get as output a

corresponding piece of information a

hash table is often the data structure of

choice and here's what it looks like

it actually looks like an array at

first glance but for discussion's sake

I've drawn this array vertically which

is totally fine it's still just an array

but it allows you a hash table to jump

to any of these locations randomly that

is instantly so for instance there's

actually 26 locations in this array

because I want to for instance store

initially names of people for instance

and wouldn't it be nice if the person's

name starts with a I have a go-to place

for it maybe the first box and if it

starts with z i put them at the bottom

so that I can jump instantly

arithmetically using a little bit of

ASCII or Unicode fanciness exactly to

the location that they wanna they need


to go so for instance here's our array zero

indexed 0 through 25 if I think of this

though as a through z I'm going to think

of these 26 locations now in the context

of a hash table as what we'll generally

call buckets so buckets into which you

can put values so for instance suppose

that we want to insert a value one name

into this data structure and that name

is say Albus so Albus starting with a

Albus might go at the very beginning

of this list all right and then we want

to insert another name this one happens

to be Zacharias starting with z so it

goes all the way at the end of this data

structure in location 25 AKA Z and then

maybe a third name like Hermione and

that goes at location H according to

that position in the alphabet so this is

great because in constant time I can

insert and conversely search for any of

these names based on the first letter of

their name a or Z or H in this case
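
The "little bit of ASCII" arithmetic that maps a name to one of the 26 buckets could be sketched as follows. The function name `hash`, the `BUCKETS` constant, and the choice to return -1 for names that do not start with a letter are assumptions for illustration. The subtraction relies on the letters A through Z having consecutive codes, as they do in ASCII.

```c
#include <ctype.h>
#include <stddef.h>

#define BUCKETS 26

// Hypothetical hash function: maps a name to 0..25 by its first letter.
// Returns -1 if the name is NULL, empty, or does not start with a letter.
int hash(const char *name)
{
    if (name == NULL || !isalpha((unsigned char) name[0]))
    {
        return -1;
    }
    // 'A' - 'A' is 0, 'H' - 'A' is 7, 'Z' - 'A' is 25.
    return toupper((unsigned char) name[0]) - 'A';
}
```

With this function Albus maps to bucket 0, Hermione to bucket 7, and Zacharias to bucket 25. Harry and Hagrid also map to bucket 7, which is the collision problem that comes up next.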

let's fast forward and assume we put a

whole bunch of other names that might look

familiar into this hash table it's great

because every name has its own location

but if you're thinking of names you

don't yet see it on the screen we


eventually encounter a problem with this

right when could something go wrong

using a hash table like this if we

wanted to insert even more names

what's going to eventually happen

yeah there's already someone with the

first letter right like I haven't even

mentioned Harry for instance or Hagrid

and yet Hermione is already using that

spot so that sort of invites the

question well what happens maybe if we

want to insert Harry next do we maybe

cheat and put him at location I but then

if there's someone at location I where do we

put them and it just feels like the

situation could very quickly devolve but

I've deliberately drawn this data

structure that I claim is a hash table

sort of in two directions an array

vertically here

but what might this be hinting I'm using

horizontally even though I'm drawing the

rectangles a little differently from

before

yeah maybe another array to be fair but

honestly arrays are such a pain with the

allocating and reallocating and so forth

these kind of look like the beginnings

of a linked list if you will where the

name is where the number used to be even


though I'm drawing it horizontally now

just for discussion sake and this seems

to be like a pointer that isn't pointing

anywhere yet but it looks like the array

is 26 pointers some of which are null

that is empty some of which are pointing

at the first node in a linked list so

that's really what a hash table might be

in your mind an amalgam of an array

whose elements are linked lists and in

theory this kind of gives you the best

of both worlds right you get Random

Access with high probability right you

get to jump immediately to the location

you want to put someone but if you run

into this perverse situation where

there's someone already there okay fine

it starts to devolve into a linked list

but it's at least 26 smaller linked

lists not one massive linked list which

would be Big O of N and quite slow to

solve so if Harry gets inserted and

Hagrid yeah you have to kind of chain

them together

so to speak in this way but at least

you've not painted yourself

into a corner and in fact if we fast

forward and put a whole bunch of

familiar names in the data structure


starts to look like this so the chains are

not terribly long and some of them are

actually of size zero because there's

just some unpopular letters of the

alphabet among these names but it seems

better than just putting everyone in one

big array or one big linked list we're

sort of trying to balance these

trade-offs a little bit in the middle

here well how might we represent

something like this here's how we could

describe this thing a node in the

context of a linked list could be this I

have an array called word of type char

and it's big enough to fit the longest

word in the alphabet plus one and the

plus one why probably

the null character so I'm assuming that

longest word is like a constant defined

elsewhere in the story and it's

something big like 40 100 whatever

whatever the longest word in the uh

Harry Potter universe is or the English

alphabet or English dictionary is

longest word plus one should be

sufficient to store any name in the

story here and then what else does each

of these nodes have well it has a

pointer to another node so here's how we

might implement the notion of a node in


the context of storing not integers but

names instead like this but how do we

decide what the hash table itself is

well if we now have a definition of a

node we could have a variable in main or

even globally called hash table that

itself is an array

of node star

pointers that is an array of pointers to

nodes the beginnings of linked lists
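
A minimal sketch of the structure just described, written in Python (the language the course moves to next) rather than the C on the lecture's slides, which the transcript does not capture. The names `Node`, `bucket_index`, `insert`, and `search` are illustrative assumptions, and the sketch assumes every name starts with a letter A through Z.

```python
N_BUCKETS = 26  # one bucket per letter, A through Z


class Node:
    """One link in a bucket's chain: a name plus a reference to the next node."""

    def __init__(self, word, next_node=None):
        self.word = word
        self.next = next_node


def bucket_index(word):
    """Map a name to 0..25 using only its first letter (assumes A-Z or a-z)."""
    return ord(word[0].upper()) - ord("A")


def insert(table, word):
    """Prepend the name to the chain in its bucket."""
    i = bucket_index(word)
    table[i] = Node(word, table[i])


def search(table, word):
    """Walk the single chain the name could be in."""
    node = table[bucket_index(word)]
    while node is not None:
        if node.word == word:
            return True
        node = node.next
    return False


# The table itself: 26 references, all initially None (the analogue of NULL)
table = [None] * N_BUCKETS
for name in ["Albus", "Zacharias", "Hermione", "Harry", "Hagrid"]:
    insert(table, name)
```

With these five names, Hermione, Harry, and Hagrid are chained together in bucket 7, while Albus and Zacharias sit alone in buckets 0 and 25.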

number of buckets is kind of up to me I

proposed verbally that it'd be 26 but

honestly if you get a lot of collisions

so to speak a lot of H names trying to

go to the same place well maybe we need

to be smarter and not just look at the

first letter of their name but maybe the

first and the second so it's H A and H E

but wait no then Harry and Hagrid still

collide but we start to at least make

the problem a little less impactful by

tinkering with something like the number

of buckets in a hash table like this but

how do we decide where someone goes in a

hash table in this way well it's an old

school problem of input and output the

input to the problem is going to be

something like the name and the

algorithm in the middle as of today is


going to be something called a hash

function a hash function is generally

something that takes as input a string a

number whatever and produces as output

a location in our context like a number

0 through 25 or 0 through 16,000 or

whatever the number of buckets you want

is it's going to just tell you where to

put that input at a specific location so

for instance Albus according to the

story thus far gave me back zero as

output Zacharias gave me 25. so the hash

function in the middle of that black box

is pretty simplistic in this story it's

just looking at like the ASCII value it

seems of the first letter in their name

and then subtracting off what capital A

is 65 so like doing some math to get

back a number between 0 and 25. so

that's how we got to this point in the

story and how might we then resolve the

problem further and use this notion of

hashing more generally well just for

demonstrations sake here here's actually

some buckets literally and we've labeled

in advance these buckets with the

suits from a deck of cards so we've got

some Spades and we've got diamonds

here

and we've got what else here uh


clubs

and hearts

so we have a deck of cards here for

instance right and this is something you

yourself might do instinctively If

you're sort of getting ready to start

playing a game of cards you're just kind

of cleaning up or you want things in

order like here is literally a jumbo

deck of cards what would be the easiest

way for me to sort these things well

we've got a whole bunch of sorting

algorithms from the past so I could go

through like here's the three of

diamonds and I could here let me throw

this up on the screen just so if you're

far in back so here's uh you know

diamonds I could put this here three

four I could do this in order here but a

lot of us honestly if given a deck of

cards and you just want to kind of clean

it up and sort it in order you might do

things like this well here's my input

three of diamonds let's put it in this

bucket four of diamonds this bucket five

of diamonds this bucket and if you keep

going through the cards here's seven of

hearts hearts bucket eight bucket uh

Queen of Spades over here and it's still


going to take you 52 steps but at the

end of it you have hashed all of the

Cards into four distinct buckets and now

you have problems of size 13 which is

a little more tenable than doing one

massive 52 card problem you can now do

four 13 size problems and so hashing is

something that even you and I might do

instinctively taking as input some card

some name and producing as output some

location a sort of temporary pile in

which you want to stage things so to

speak
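
The card demonstration can be sketched in Python as below. This is an illustration under assumptions, not code from the lecture: ranks are encoded as 1 through 13, and the "hash function" is simply the card's suit.

```python
import random

SUITS = ["spades", "diamonds", "clubs", "hearts"]
RANKS = list(range(1, 14))  # assumed encoding: 1 = ace ... 13 = king

deck = [(rank, suit) for suit in SUITS for rank in RANKS]
random.shuffle(deck)

# One pass over all 52 cards: each card is hashed to the bucket for its suit
buckets = {suit: [] for suit in SUITS}
for rank, suit in deck:
    buckets[suit].append(rank)

# Four problems of size 13 instead of one problem of size 52
for suit in SUITS:
    buckets[suit].sort()
```

The bucketing pass still touches every card once, as the lecture notes, but each remaining sort works on only 13 cards.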

but these collisions are kind of

inevitable and honestly if we kept going

through the Harry Potter Universe some

of these chains would get longer and

longer and longer which means that

instead of getting someone's name

quickly by searching for them or

inserting them it might start taking a

decent amount of time so what could we

do instead to resolve situations like

this if the problem fundamentally is

that the first letter is just too darn

popular H we need to take in more input

not just the first letter but maybe the

first two letters so if we do that we

can go from a through z to something

more extreme like maybe h a h b h c h d


h e h f and so forth so that now Harry

and Hermione end up at different

locations but you know darn it Hagrid

still collides with Harry so it's better

than before the chains aren't quite as

long but the problem isn't fundamentally

gone and in this case here

anyone know how many buckets we just

increased to if we now look at not just

a through z but a a through z z

roughly

yeah so okay good so the answer is 26

squared or 676 so that's a lot more

buckets and this is why I only showed a

few of them on the screen so that's a

lot more and it spreads things out

specific uh in particular what if we

take this one step further instead of H

A we do like h a a h a b h a c h z and

so forth well now we have an even better

situation because Hermione has her one

spot Harry has his one spot Hagrid has

his one spot
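
The arithmetic behind looking at more letters can be sketched in Python. The function names `hash_prefix` and `bucket_count` are hypothetical, and the sketch assumes names contain only the letters A through Z and are at least as long as the prefix being hashed. It treats the first few letters as a base-26 number.

```python
ALPHABET_SIZE = 26


def hash_prefix(word, letters):
    """Treat the first `letters` letters of word as a base-26 number."""
    index = 0
    for ch in word[:letters].upper():
        index = index * ALPHABET_SIZE + (ord(ch) - ord("A"))
    return index


def bucket_count(letters):
    """Number of buckets needed when hashing on that many letters."""
    return ALPHABET_SIZE ** letters


one = [hash_prefix(n, 1) for n in ["Harry", "Hermione", "Hagrid"]]    # all collide
two = [hash_prefix(n, 2) for n in ["Harry", "Hermione", "Hagrid"]]    # Harry and Hagrid collide
three = [hash_prefix(n, 3) for n in ["Harry", "Hermione", "Hagrid"]]  # all distinct
```

Each extra letter multiplies the number of buckets by 26, which is where the memory cost comes from.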

but there's a trade-off here the upside

is now arithmetically we can find their

locations in constant time maybe

technically three steps but three is

constant no matter how many other names

are in here it would seem but what's the


downside here

sorry say again

memory so significantly more we're now

up to

17,576 buckets which itself isn't that big

a deal right computers have a lot of

memory these days but as you can kind of

infer you know I can't really think of

someone whose name started with heq for

instance in the Harry Potter universe

and if we keep going definitely don't

know of anyone whose name started with

zzz or AAA there's a lot of sort of not

useful combinations that have to be

there mathematically so that you can do

a bit of math and jump to randomly so to

speak the precise location but they're

just going to be empty so it's a very

sparsely populated array so to speak so

what does that really mean for

performance ultimately well let's

consider again in the context of our Big

O notation it turns out that a hash

table technically speaking is still just

going to give us Big O of n in the worst

case why if you have some crazy perverse

case where everyone in the universe has

a name that starts with a or starts with

H or starts with z you just get really

unlucky and your chain is massively long


well then at that point it's just a

linked list it's not a hash table it's

like the perverse situation with the

tree where if you insert it without any

mind for balancing it keeping it

balanced it just devolves but there's a

difference here between sort of a

theoretical performance and an actual

performance if you look back at the tree

the hash table here this is absolutely

in practice going to be faster than a

single linked list you know

mathematically asymptotically Big O

notation sure it's all the same Big O of

n but if what we're really caring about

is real humans using our software

there's something to be said for

crafting a data structure that

technically if this data were uniformly

distributed is 26 times faster than a

linked list alone and so there's this

tension too in between like systems

types of CS and theoretical CS where

yeah theoretically these are all the

same but in practice we're making real

world software you know improving this

speed by a factor of 26 in this case let

alone 676 or more might actually make a

big difference but there's going to be a


trade-off and that's typically some

other resource like giving up more space
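
The gap between the worst case and the uniformly distributed case can be made concrete with a small Python sketch. The helper `chain_lengths` and the made-up names are assumptions for illustration only. It counts how many names land in each first-letter bucket.

```python
def chain_lengths(names, n_buckets=26):
    """Count how many names land in each first-letter bucket (assumes A-Z/a-z)."""
    lengths = [0] * n_buckets
    for name in names:
        lengths[ord(name[0].upper()) - ord("A")] += 1
    return lengths


# Worst case: every name starts with H, so one chain holds everything
unlucky = ["H" + str(i) for i in range(2600)]
# Idealized case: names spread evenly over all 26 letters
even = [chr(ord("A") + i % 26) + str(i) for i in range(2600)]

worst = max(chain_lengths(unlucky))  # 2600: the table is really one linked list
best = max(chain_lengths(even))      # 100: each chain is 26 times shorter
```

Both cases are O(n) on paper, but the evenly spread table searches chains one twenty-sixth as long.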

all right how about another data

structure we could build let me fast

forward to something here called a trie

so a trie sort of a weird name and

pronunciation short for retrieval

pronounced try typically a trie is a tree

that actually gives us constant time

lookup even for massive

data sets what do I mean by this in the

world of a trie you create a tree out of

arrays so we're really getting into like

the Frankenstein territory of just

building things up with like spare parts

of data structures that we have here but

the root of a trie is itself an array for

instance of size 26 where each element

in that array points to another node which

is to say another array and each of

those locations in the array represents

a letter of the alphabet like A through

Z so for instance if you wanted to store

the names of the Harry Potter Universe

not in a hash table not in a linked list

not in a tree but in a trie what you

would do is hash

on every letter in the person's name one

at a time so a trie is like a multi-tier

hash table in a sense where you first


look at the first letter then the second

letter then the third and you do the

following for instance each of these

locations represents a letter A through

Z suppose I wanted to insert someone's

name into this that starts with the

letter H like Hagrid for instance well

I go to the location H I see it's null

which means I need to malloc myself

another node or another array and that's

depicted here then suppose I want to

store the second letter in Hagrid's name

an A so I go to that location in the

second node and I see okay it's

currently null there's nothing below it

so I allocate another node using malloc

or the like and now I have h a g and I

continue this with r i d and then when I

get to the bottom of this person's name

I just have to indicate here in color

but probably with a Boolean value or

something like a true value that says a

name stops here so that it's clear that

the person's name is not h or h-a or h-a-g

or h-a-g-r or h-a-g-r-i it's

h-a-g-r-i-d and the D is green just to

indicate there's like some other Boolean

value that just says yes this is the

node in which the name stops and if I


continue this logic here's how I might

insert someone like Harry

and here's how I might insert someone

like Hermione and what's interesting

about the design here is that some of

these names share a common prefix which

starts to get compelling because you're

reusing space you're using the same

nodes for names like h a g and h-a-r

because they share an H and an A in common

and they all share an H in common

so you have this data structure now that

itself is a tree each node in the tree

is itself an array and we therefore

might implement this thing using code

like this every node is containing

I'll do it in reverse order an array

I'll call it children because that's

what it really represents up to 26

children for each of these nodes size of

the alphabet so I might have used a

constant for number 26 to give myself 26

letters of the alphabet and each of

those arrays stores that many node stars

that many pointers to another node and

here's an example of the bool this is

what I represented in green on the slide

a moment ago I also need another piece

of data just a zero or one a true or

false that says yes a name stops in this


node or it's just a path to the rest of

the person's name but the upside of this

is that the height of this tree is only

as tall as the person's longest name

h-a-g-r-i-d or

h-e-r-m-i-o-n-e and notice that no

matter how many other people are in this

data structure there's three at the

moment if there were three million it

would still take me how many steps to

search for Hermione

h-e-r-m-i-o-n-e so eight steps total no

matter if there's two other people two

million 10 million other people because

the path to her name is always on the

same path and if you assume that there's

uh there's a maximum limit on the length

of names in the human world maybe it's

40 100 whatever whatever the longest

name in the world is that's constant

maybe it's 40 100 but that's constant

which is to say that with a trie

technically speaking it is the case that

your lookup time in Big O

notation would be Big O of one it's

constant time because unlike every other

data structure we've looked at with a

trie the amount of time it takes you

to find one person or insert one person


is completely independent of how many

other pieces of data are already in the

data structure and this holds true even

if one name is a prefix of another I

don't think there was a Daniel or

Danielle in the Harry Potter universe

that I could think of but

d-a-n-i-e-l could be one name and

therefore we have a true there in green

and if there's a longer name like

Danielle then you keep going until you

get to the E so you can still have with

a trie one name that's a substring of

another name so it's not as though we've

created a problem there that too is

still possible
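
A trie along the lines described might be sketched in Python as follows. The lecture's own version is in C and is not captured in the transcript, so the names `TrieNode`, `is_word`, `insert`, and `search` are assumptions, and the sketch only handles the letters A through Z.

```python
ALPHABET_SIZE = 26


class TrieNode:
    def __init__(self):
        self.is_word = False                    # True if a name stops at this node
        self.children = [None] * ALPHABET_SIZE  # one slot per letter, mostly None


def insert(root, word):
    node = root
    for ch in word.upper():
        i = ord(ch) - ord("A")
        if node.children[i] is None:
            node.children[i] = TrieNode()       # allocate only when first needed
        node = node.children[i]
    node.is_word = True


def search(root, word):
    node = root
    for ch in word.upper():
        node = node.children[ord(ch) - ord("A")]
        if node is None:
            return False
    return node.is_word


root = TrieNode()
for name in ["Hagrid", "Harry", "Hermione", "Daniel", "Danielle"]:
    insert(root, name)
```

Searching for Hermione follows eight links no matter how many other names are stored, and the `is_word` flag is what lets Daniel and Danielle coexist on the same path.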

but at the end of the day it only takes

a finite number of steps to find any of

these people and again that's what's

particularly compelling that you

effectively have constant time lookup

so that's amazing right we've gone

through this whole story for weeks now

of like linear time and then it went up

to like n squared and then log n and

now constant time what's the price paid

for a data structure like this

this so-called trie

what's the downside here there's got to

be a catch
and in fact tries are not actually used

that often amazing as they might sound

on some CS level here

memory what why in what sense

exactly if you're storing all of these

darn arrays it's again a sparsely

populated data structure and you can

kind of see it here granted there's only

three names but most of those boxes most

of those pointers are going to remain

null so this is an incredibly wide data

structure if you will it uses a huge

amount of memory to store the names but

again you got to pick a lane either

you're going to minimize space or you're

going to minimize time it's not really

possible to get truly the best of both

worlds you have to decide where the

inflection point is for the device

you're writing software for how much

memory it has how expensive it is

and again taking all of these kinds of

things into account

so lastly let's do one further

abstraction so even higher level to

discuss things that are generally

known as abstract data structures it

turns out we could spend like all day

all week talking about different things


we could build with these data

structures but for the most part now

that we have arrays now that we have

linked lists or their cousins trees

which are two-dimensional and beyond

that there's even graphs where the

arrows can go in multiple directions not

just down so to speak now that we have

this ability to stitch things together

we can solve all different types of

problems so for instance a very common

type of data structure to use in a

program or even our human world are

things called queues a queue being a data

structure like a line outside of a store

where it has what's called a FIFO

property first in first out which is

great for fairness at least in the human

world and if you've ever waited outside

of uh Tasty Burger or Salsa Fresca or

some other restaurant nearby presumably

if you're queuing up at the counter you

want the store to maintain a FIFO

system first in first out so that

whoever's first in line gets their

food first and gets out first so

a queue is actually a computer science

term too and even if you're still in the

habit of printing things on paper there

are things you might have heard called


printer queues which also do things in

order the first person to send their

essay to the printer should ideally be

printed before the last person to send

their essay to the printer again in the

interest of fairness but how can you

implement a queue well you typically

have to implement like two fundamental

operations enqueue and dequeue so adding something

to it and removing something from it and

the interesting thing here is that how

do you implement a queue well in the

human world you would just have like

literally physical space for humans to

line up from left to right or right to

left same in a computer like a printer

queue if you send a whole bunch of jobs

to be printed a whole bunch of essays or

documents well you need a chunk of

memory like an array all right well if

you use an array what's a problem that

could happen in the world of printing

for instance if you use an array to

store all of the documents that need to

be printed

it could be filled right so if the

programmer decided HP or whoever makes

the printer decides oh you can send like

a megabyte worth of documents to this


printer at once at some point you might

get an error message which says sorry

out of memory wait a few minutes which

is maybe a reasonable solution but a

little annoying or HP could write code

that maybe dynamically resizes the array

or so forth but at that point maybe they

should just use a linked list and they

could so there too you could implement

the notion of a queue

using a linked list instead you're going

to spend more memory but you're not

going to run out of space in your array

which might be more compelling you know

this happens even in the physical world

you go to the store and you know you

start having to line up outside and down

the road and like for a really busy

store they kind of run out of space so

they they make do but in that case it

tends to be more of an array just

because of the physical notion of humans

lining up but there's other data

structures too if you've ever gone to

the dining hall and picked up like a

Harvard or Yale tray right you're

typically picking up the last tray that

was just cleaned not the first tray that

was cleaned why because these uh

cafeteria trays stack up on top of each


other and indeed a stack is another type

of abstract data structure in the

physical world it's literally something

physical like a stack of trays which

have what we would call a LIFO property

last in first out so as these things

come out of the washer they're putting

the most recent ones on the top and then

you the human are probably taking the

most recently cleaned one which means in

the extreme no one on campus might

ever use that very first tray which is

probably fine in the world of trays but

would really be bad in the world of like

Tasty Burger lining up for food if LIFO

were the property being implemented but

here too it could be an array it could

be a linked list and you see this

honestly every day if you're using Gmail

and your Gmail inbox that is actually

kind of a stack at least by default

where your newest messages last in are

the first ones at the top of the screen

that's kind of a LIFO data structure and

it means that you see your most recent

emails but if you have a busy day you're

getting a lot of emails it might not be

a good thing because now you're kind of

ignoring the people who wrote you way


earlier in the day or the week so LIFO

and FIFO are just properties that you

can achieve with these very specific

types of data structures and the

parlance in the world of stacks is to

push something onto a stack or pop

something off of it here for

instance is an example of like why might

you always wear the same color well if

you're storing all of your clothes in a

stack you might not ever get to like the

different colored clothes at the bottom

of the list and in fact to paint this

picture we have a couple-minute uh

video here just to paint this here

made by a faculty member elsewhere let's

go ahead and dim the lights for just a

minute or two here so that we can take a

look at Jack learning some facts

once upon a time there was a guy named

Jack when it came to making friends Jack

did not have the Knack so Jack went to

talk to the most popular guy he knew he

went up to Lou and asked what do I do

Lou saw that his friend was really

distressed well Lou began just look how

you're dressed don't you have any

clothes with a different look yes said

Jack I sure do come to my house and I'll

show them to you so they went off to


Jack's and Jack showed Lou the box where

he kept all his shirts and his pants and

his socks Lou said I see you have all

your clothes in a pile why don't you

wear some others once in a while Jack

said well when I remove clothes and

socks I wash them and put them away in

the box then comes the next morning and

up I hop I go to the box and get my

clothes off the top Lou quickly realized

the problem with Jack he kept clothes

CDs and books in a stack when he reached

for something to read or to wear he

chose the top book or underwear then

when he was done he would put it right

back back it would go on top of the

stack I know the solution said a triumphant

Lou you need to learn to start using

a queue Lou took Jack's clothes and hung

them in a closet and when he had emptied

the box he just tossed it then he said

now Jack at the end of the day put your

clothes on the left when you put them

away then tomorrow morning when you see

the sun shine get your clothes from the

right from the end of the line don't you

see said Lou it will be so nice you'll

wear everything once before you wear

something twice and with everything in


queues in his closet and shelf Jack

started to feel quite sure of himself

all thanks to Lou and his wonderful queue
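
The two disciplines from the video can be sketched in Python using the standard library. The job and shirt names are made up for illustration. `collections.deque` and a plain list are one common way to get a queue and a stack in Python, not necessarily how the course implements them.

```python
from collections import deque

# Queue (FIFO): enqueue at one end, dequeue from the other
queue = deque()
for job in ["essay1", "essay2", "essay3"]:
    queue.append(job)            # enqueue
first_printed = queue.popleft()  # dequeue: "essay1", the first one sent

# Stack (LIFO): push and pop happen at the same end
stack = []
for shirt in ["red", "blue", "black"]:
    stack.append(shirt)          # push
worn = stack.pop()               # pop: "black", the most recently added
stack.append(worn)               # Jack washes it and puts it right back on top
worn_again = stack.pop()         # "black" again; "red" is never reached
```

With the stack, the same item keeps coming off the top, which is Jack's problem. With the queue, every item is eventually served in arrival order.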

that these things are everywhere

in the world even in our human world if

you've ever lined up at this place

anyone recognize this

okay so Sweetgreen a little salad place

in the Square this is if you order

online or in advance your food ends up

according to the first letter in your

name which actually sounds awfully

reminiscent of something like a hash

table and in fact no matter whether you

implement a hash table like we did with

an array and linked lists or with like

three shelves like this this is actually

an abstract data type called a

dictionary and a dictionary just like in

our human world has keys and values

words and their definitions this just

has uh letters of the alphabet and salads

as their value but here too there's a

real world constraint in what

kind of scenario does this system at

Sweetgreen devolve into a problem for

instance because they too are using only

finite space finite storage what could

go wrong yeah
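
The shelf arrangement maps naturally onto a Python dictionary. The sketch below is an illustration only: the customer names and the helper `place_order` are hypothetical. Keys are first letters and each value is the list of orders on that shelf.

```python
# Keys are first letters; values are the orders waiting on that shelf
shelves = {}


def place_order(shelves, customer):
    """File an order under the first letter of the customer's name."""
    key = customer[0].upper()
    shelves.setdefault(key, []).append(customer)


for customer in ["Dana", "Dev", "Emma", "Carl"]:
    place_order(shelves, customer)
```

Unlike a physical shelf, each list here can keep growing, so a popular letter makes its list longer rather than spilling into a neighbor's space.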
yeah if they run out of space on the

shelf and there's a lot of people whose

names start with D or E or whatever and

so they just pile up and then maybe they

kind of overflow into the E's or the F's

and you know they probably don't really

care because any human is going to come

by and just eyeball and figure it out

anyway but in the world of a computer

you're the one coding and have to be

ever so precise we thought we would

lastly do one final thing here

um in advance we prepared a a linked

list of sorts in the audience since this

has become a bit of a thing I am

starting to represent the beginning of

this linked list insofar as I have a

pointer here with seat location G9 uh

whoever's in G9 would you mind standing

up

and what letter is on your sheet there

okay so you have S15 and your letter

say again F 15. so I see you're holding

a c in your node you are pointing to if

you could physically F15 F15 what do you

hold

you have an S and who should you be

pointing at

F5 could you stand up F5 you're holding


a five I see what no what address

F12 big finale F12 if you'd like to

stand up holding a zero and null which

means that was CS50

all right we'll see you next time

[Music]

and this is already week six and this is

the week in which you learn yet another

language but the goal is not just to

teach you another language for

language's sake as we transition today

and in the coming weeks from C where we

spent the past several weeks now to

Python the goal ultimately is to teach

you all how to teach yourselves new

languages so that by the end of this

course it's not in your mind the fact

that you learned how to program in C or

learned some weeks back how to program in

Scratch but really how you learned how

to program fundamentally in a paradigm

known as procedural programming as well

as with some taste today and in the

weeks to come of other aspects of

programming languages like object

oriented programming and more so recall


though back in week zero hello world

looked a little something like this and

the world was quite simple all you had

to do was drag and drop these puzzle

pieces but there were still functions

and conditionals and loops and variables

and all of those kinds of primitives we

then transitioned of course to a much

more arcane language that looked a

little something like this and even now

some weeks later you might still be

struggling with some of the syntax or

getting annoying bugs when you try to

compile your code and it just doesn't

work but there too the past few weeks

we've been focusing on functions and

loops and variables conditionals and

really all of those same ideas and so

what we begin to do today is to one

simplify the language we're using

transitioning from C now to Python this

now being the equivalent program in

Python and look at its relative

simplicity but also transitioning to

look at how you can implement these same

kinds of features just using a different

language so we're going to see a lot of

code today and you won't have nearly as

much practice with Python as you did


with C but that's because so many of the

ideas are still going to be with us and

really it's going to be a process of

figuring out all right I want to do a

loop I know how to do it in C how do

I do this in Python how do I do the same

with conditionals how do I declare

variables and the like and moving

forward not just in CS50 but in life in

general if you continue programming and

learn some other language after the

class if in five ten years there's a new

more popular language that you pick up

it's just going to be a matter of

Googling and looking at websites like

Stack Overflow and the like to look at

just basic building blocks of

programming languages because you

already speak after these past six plus

weeks you already speak programming

itself fundamentally all right so let's

do a few quick comparisons left and

right of what something might have

looked like in Scratch and what it then

looked like in C but now as of today

what it's going to look like in Python

then we'll turn our attention to the

command line ultimately in order to

implement some actual programs so in

Scratch we had functions like this say


hello world a verb or an action in C it

looked a little something like this and

a bit of a cryptic mess the first week

you had the printf you had the double

quotes you had the semicolon the

parentheses there's a lot more syntax

just to do the same thing we're not

going to get rid of all of that syntax

now but as of today in Python that same

statement is going to look a little

something like this and just to perhaps

call out the obvious what is different

or now simpler in Python versus C even

in this simple example here yeah
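
The slide itself is not captured in the transcript. A plausible Python counterpart to the C `printf` line, offered as a sketch rather than the lecture's exact code, looks like this:

```python
message = "hello, world"

print(message)          # print adds the trailing newline by default
print(message, end="")  # the end parameter overrides that default
print()                 # prints just a newline
```

The `end` parameter is part of Python's built-in `print`. Storing the text in `message` is only a convenience for this sketch.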

good so it's now print instead of printf

and there's also no semicolon and

there's one other subtlety over here

yeah so no new line and that doesn't

mean it's not going to be printed it

just turns out that one of the

differences we'll see is that with print

you get the new line for free it

automatically gets outputted by default

being sort of a common case but you can

override it we'll see ultimately too how

about in Scratch we had multiple

functions like this that not only said

something on the screen but also asked a

question thereby being another function


that returned a value called answer in C

we saw code that looked a little

something like this whereby that first

line declares a variable called answer

sets it equal to the return value of get

string one of the functions from the

CS50 library and then the same double

quotes and parentheses and semicolon

then we had this format code in C that

allowed us with percent s to actually

print out that same value in Python this

too is going to look a little bit

simpler instead we're going to have

answer equals get string quote unquote

what's your name and then print with a

plus sign and a little bit of new syntax

but let's see if we can't just infer

from this example what it is that's

going on well first missing on the left

is what

to the left of the equal sign there's no

what this time

feel free to just call it out

so there's no type there's no type like

the word string which even though that

was a type in cs50 every other variable

in C did we use int or string or float

or bool or something else in Python

there's still going to be data types

today onward but you the programmer


don't have to bother telling the

computer what types you're using the

computer is going to be smart enough the

language really is going to be smart

enough to just figure it out from

Context meanwhile on the right hand side

get string is going to be a function

we'll use today in this week which comes

from a python version of the cs50

library but will also start to take off

those training wheels so that you'll see

how to do things without any cs50

Library moving forward using a different

function instead as before no semicolon

but the rest of the syntax is pretty

much the same here this starts of course

to get a little bit different though

we're using print instead of printf but

now even though this looks a little

cryptic perhaps if you've never

programmed before cs50 what might that

plus be doing

just based on inference here what do you

think

[Music]

yeah so adding answer to the string

hello and adding so to speak not

mathematically but in the form of

joining them together much like we saw


the join Block in scratch or

concatenation was the term of art there

this plus sign appends if you will

whatever is in answer to whatever is

quoted here and I deliberately left a

space there so that grammatically it

looks nice after the comma as well now

there's another way to do this and it

too is going to look cryptic at first

glance but it just gets easier and more

convenient over time you can also change

this second line to be this instead
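Neither version of the second line appears in the transcript. The sketch below shows both forms side by side, assuming a variable named `answer` as in the lecture; the value "David" is a hypothetical stand-in for whatever `get_string` would return.

```python
# Hypothetical stand-in for the value the user would type in.
answer = "David"

# Version 1: concatenation with +, with a space left after the comma.
greeting_plus = "hello, " + answer

# Version 2: a format string (f-string); {answer} is replaced by its value.
greeting_f = f"hello, {answer}"

print(greeting_plus)  # hello, David
print(greeting_f)     # hello, David
```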

so what's going on here this is actually

a relatively new feature of python in

the past couple of years where now what

you're seeing is yes a string between

these same double quotes but this is

what python would call a format string

or F string and it literally starts with

the letter F which admittedly looks I

think a little weird but that just

indicates that python should

assume that anything inside of curly

braces inside of the string should be

interpolated so to speak which is a

fancy term saying substitute the value

of any variables therein and it can do

some other things as well so answer is a

variable declared of course on this

first line this F string then says to


python print out hello comma space and

then the value of answer if by contrast

you omitted the curly

braces just take a guess what would

happen What would the symptom of that

bug be if you accidentally forgot the

curly braces but maybe still had the F

there

yeah it would literally print hello comma

answer because it's going to take you

literally so the curly braces just kind

of allow you to plug things in and again

it looks a little more cryptic but it's

just going to save us time over time and

if any of you programmed in Java in high

school for instance you saw Plus in that

context too for concatenation this just

kind of makes your code a little tighter

a little more succinct so it's a

convenient feature now in Python all

right this was an example in scratch of

a variable setting a variable like

counter equal to zero in C it looked

like this where you specify the type the

name and then the value with a semicolon

in Python it's going to look like this
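The slide itself is missing from the transcript, so here is a small sketch of the variable assignment, together with the two increment forms covered in the next few sentences.

```python
counter = 0            # no type name and no semicolon

counter = counter + 1  # the verbose form, same as in C minus the semicolon
counter += 1           # the shorthand form

# counter++ and counter-- do not exist in Python; use += 1 or -= 1 instead.
print(counter)  # 2
```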

and I'll State the obvious here you

don't need to mention the type just like

before with string and you don't need a


semicolon so it's a little simpler if

you want a variable just write it and

set it equal to some value but the

single equal sign still behaves the same

as in C suppose we wanted to increment

counter by one in scratch we use this

puzzle piece here in C we could do this

actually in a few different ways there

was this way if counter already exists

you just say counter equals counter plus

one there was the slightly less verbose

way where you could whoops sorry in let

me do the first sentence first in Python

that same thing as you might guess is

actually going to be almost the same you

just throw away the semicolon and the

mathematics are ultimately the same

copying from right to left via the

assignment operator now recall in C

that we had this shorthand notation

which did the same thing in Python you

can similarly do the same thing just no

need for the semicolon the only step

backwards we're taking if you were a big

fan of counter plus plus that doesn't

exist in Python nor minus minus you just

can't do it you have to do the plus

equals one or minus equals

one to achieve that same result all

right how about in Python too here in


scratch recall was a conditional asking

a silly question like is X less than y

and if so just say as much in C that

looked a little something like this

printf and if with the parentheses the

curly braces the semicolon and all of

that in Python this is going to get a

little more pleasant to type too it's

going to be just this
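The conditional on the slide is not in the transcript. A minimal sketch follows, with example values for `x` and `y` chosen only for illustration; the extra lines show how indentation decides what belongs to the `if`.

```python
x = 1  # illustrative values, not from the lecture
y = 2

if x < y:
    print("x is less than y")     # indented: runs only when x < y
    print("still inside the if")  # same indentation, same block
print("done")                     # unindented: runs either way
```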

and if someone wants to call it some of

the obvious changes here what has been

simplified now in Python for a

conditional it would seem

yeah what's missing or changed

so no curly braces

and sorry

and we're using the colon instead so I

got rid of the curly braces in Python

but I'm using a colon instead and even

though this is a single line of code so

long as you indent subsequent lines

along with the print that's going to

imply that everything if the if

condition is true should be executed

below it until you start to unindent and

start writing a different line of code

altogether so indentation in Python is

important so this is among the reasons

we've emphasized uh axes like Style just


how well styled your code is and

honestly we've seen certainly in office

hours and you've seen in your own code

sort of a tendency sometimes to be a

little lax when it comes to indentation

right if you're one of those folks who

likes to indent everything on the left

hand side of the window yeah it might

compile and run but it's not

particularly readable by you or anyone

else python actually addresses this by

just requiring indentation when

logically needed so python is going to

force you to start indenting properly

now if that's been perhaps a tendency

otherwise what else is missing well we

have no semicolon here of course it's

print instead of printf but otherwise

those seem to be the primary differences

what about something larger in scratch

an if else block like this you can

perhaps guess what it's going to look

like in C it looked like this curly

braces semicolons and so forth in Python

it's going to now look like this almost

the same but indentation is important

the colons are important and there's one

other difference that's now again

visible here but we didn't call it out a

second ago what else is different in


python versus C for these conditionals

yeah

perfect we don't have any parentheses

around the condition the Boolean

expression itself and why not well it's

just simpler to type it's less to type

you can still use parentheses and in

fact you might want to or need to if you

want to like uh combine thoughts and do

this and that or this or that but by

default you no longer need or should

have those parentheses just say what you

mean lastly with conditionals we had

something like this an if else if else

statement in C it looked a little

something like this in Python it's going

to get really tighter now it's just if

and this is the curiosity elif x greater

than y so it's not else if it's

literally one keyword elif and the

colons remain now on each of the three

lines but the indentation is important

and if we did want to do multiple things

we could just indent below each of these

conditionals as well all right let me

pause there first to see if there's any

questions on the syntactic differences
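To pull the three conditional forms together, here is a sketch of the if, elif, else chain. It is wrapped in a function named `compare` (a name made up for this example, using `def`, which the lecture introduces later) so that each branch can be exercised.

```python
def compare(x, y):
    # No parentheses around the conditions, a colon on each line,
    # and indentation instead of curly braces.
    if x < y:
        return "x is less than y"
    elif x > y:  # one keyword, not "else if"
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))  # x is less than y
print(compare(2, 1))  # x is greater than y
print(compare(3, 3))  # x is equal to y
```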

yeah

all right
[Music]

uh in between it's between what and what

ah good question is Python uh

sensitive to spaces and where they go

sometimes no sometimes yes is the short

answer stylistically though you should

be practicing what we're preaching here

whereby you do have spaces to the left

and right of binary operators as

they're called something like less than

or greater than is a binary operator

because there's two operands to the left

and to the right of them and in fact in

Python more so than the world of C

there's actually formal style

conventions not only within cs50 have we

had a style guide on the course's

website for instance that just dictates

how you should write your code so that

it looks like everyone else's in the

python Community they take this one step

further and there's an actual standard

whereby you don't have to adhere to it

but generally speaking in the real world

someone would reprimand you would reject

your code if you're trying to contribute

it to another project if you don't adhere

to these standards so while you could be

lax with some of this white space do

make things readable and that's Python's


theme for the code to be as readable as

possible all right so let's take a look

at a couple of other constructs before

transitioning to some actual code this

of course in scratch was a loop meowing

forever in C the closest we could get

was doing something while true because

true never changes so it's sort of a

simple way of just saying do this

forever in Python it's pretty much the

same thing but a couple small

differences here the parentheses are

gone the colon is there the indentation

is there no semicolon and there's one

other subtle difference what do you see

true is capitalized just because both

true and false are Boolean values in

Python but you got to start capitalizing

them just because all right how about a

loop like this where you repeat

something a finite number of times like

meowing three times in C we could do

this a few different ways there's this

very mechanical way where you initialize

a variable like I to zero you then use a

while loop and check if I is less than

three the total number of times you want

to meow then you print what you want to

print you increment i using this syntax


or the longer more verbose syntax with

plus equals or whatnot and then you do

it again and again and again in Python

you can do it functionally the same way

same idea slightly different syntax you

just don't bother saying what type of

variable you want python will infer from

the fact that there's a zero right there

you don't need the parentheses you do

need the colon you do need the

indentation you can't do the I plus plus

but you can do this other technique as

we could have done in C as well how else

might we do this though too well it

turns out in C we could do something

like this which again while sort of

cryptic at first glance became

perhaps more familiar where you have

initialization a conditional and then an

update that you do after each iteration

in Python there isn't really an analog

there is no analog in Python where you

have the parentheses and the multiple

semicolons in the same line instead

there is a for Loop but it's meant to

read a little more like English for I in

0 1 and 2. so we'll see in a bit these

square brackets represent an array now

to be called a list in Python so lists

in Python are more like linked lists


than they are arrays

more on that soon so this just means for

I in the following list of three values

and on each iteration of this Loop

python automatically for you it first

sets I to zero then it sets I to one

then it sets I to two so that you

effectively do things three times
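The loops on the slides are not in the transcript, so the sketch below reconstructs the three forms described so far. The `break` in the first loop is an addition of this sketch, not something from the lecture, so that the example terminates instead of meowing forever.

```python
# Forever loop: True is capitalized, no parentheses, colon plus indentation.
count = 0
while True:
    print("meow")
    count += 1
    if count == 3:
        break  # added here only so the example stops

# Counting loop with while: no type for i, and i += 1 instead of i++.
i = 0
while i < 3:
    print("meow")
    i += 1

# for loop over a hard-coded list: i is set to 0, then 1, then 2.
for i in [0, 1, 2]:
    print("meow")
```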

but this doesn't necessarily scale as

I've drawn it on the board suppose you

took this at face value as the way you

iterate some number of times in Python

using a for loop at what point does this

approach perhaps get bad or bad design

let me give folks just a moment to think

yeah I'm back

[Music]

sure if you don't know how many times

you want to Loop or iterate you can't

really create a hard-coded list like

that of zero one two other thoughts

where you wanted

[Music]

yeah if you're iterating a large number

of times this list is going to get

longer and longer and you're just kind

of stupidly going to be typing out like

comma 3 comma four comma five comma dot

dot comma 99 comma 100 I mean your code


would start to look atrocious eventually

so there is a better way in Python there

is a function or technically a type

called range that essentially magically

gives you back a range of values from

zero on up to but not through a value

so the effect of this line of code for I

in the following range essentially hands

you back a list of three values thereby

letting you do something three times and

if you want to do something 99 times

instead you of course just change the

three to a 99 question
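A short sketch of the `range` version follows. Converting the range to a list is done here only to make its values visible.

```python
# range(3) yields 0, 1, 2: from zero up to, but not through, 3.
for i in range(3):
    print("meow")

values = list(range(3))
print(values)  # [0, 1, 2]
```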

is there a way to start to start getting

points at that range

or an integer that's higher than zero or

is there never really

a really good question can you start

counting at a higher number so not zero

which is the implied default but

something larger than that yes so it

turns out the range function takes

multiple arguments not just one but

maybe two or even three that allows you

to customize this Behavior so you can

customize where it begins you can

customize the increment by default it's

one but if you want to do every two

values for like evens or odds you could

do that as well and a few other things


and before long we'll take a look at

some python documentation that will

become your authoritative source for

answers like that like what can this

function do other questions on this thus

far
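To make the answer about extra arguments concrete, here is a sketch of `range` called with two and three arguments (start, stop, step). The specific numbers are illustrative.

```python
# Two arguments: start at 1 instead of the default 0, stop before 4.
from_one = list(range(1, 4))
print(from_one)  # [1, 2, 3]

# Three arguments: the third is the increment (step).
evens = list(range(0, 10, 2))
odds = list(range(1, 10, 2))
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```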

seeing none so what else might we uh

compare and contrast here well in the

world of C recall that we had a whole

bunch of built-in data types like these

here

um bool and char and double and float

and so forth string which happened to

come from the cs50 library but uh the

the language C itself certainly

understood the idea of strings because

the backslash zero the support for

percent s in printf that's all native

built into c not a cs50 simplification

all we did and revealed as of a couple

of weeks ago is that string this data

type is just a synonym or a typedef

for char star which is part of the

language natively in Python now this

list actually gets a little shorter at

least for these common primitive data

types still going to have bools we're

gonna have floats and ints and we're

going to have strings but we're going to


call them strs and this is not a cs50

thing from the library str is in

fact a data type in Python that's going

to do a lot more than strings did for us

automatically in C ints and floats

meanwhile don't need the

corresponding Longs and doubles because

in fact among the problems python solves

for us too ints can get as big as you

want integer overflow is no longer going

to be an issue per week one the language

solves that for us floating point

imprecision unfortunately is still a

problem that remains but there are

libraries code that other people have

written as we briefly discussed in weeks

past that allow you to do scientific or

financial Computing using libraries that

build on top of these data types as well
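A quick sketch of both claims: ints that grow past any fixed width, and floats that are still imprecise. The lecture does not name a particular library here, so the use of the standard library's `decimal` module below is only one example of the kind of library being alluded to.

```python
from decimal import Decimal

# ints do not overflow: this value needs far more than 64 bits.
big = 2 ** 100
print(big)  # 1267650600228229401496703205376

# floats are still imprecise.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# One library-based workaround (an assumption, not named in the lecture).
exact = Decimal("0.1") + Decimal("0.2")
print(exact)  # 0.3
```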

so there's other data types too in

Python which we'll see actually gives us

a whole bunch of more power and

capability things called ranges like we

just saw lists like I called out

verbally with the square brackets things

called tuples for things like X comma y

or latitude comma longitude dictionaries

or dicts which allow you to store keys

and values much like our hash tables

from last time and then sets in the


mathematical sense where they filter out

duplicates for you and you can just put

a whole bunch of numbers a whole bunch

of words or whatnot and the language

via this data type will filter out

duplicates for you now there's going

to be a few functions we give you this

week and Beyond training wheels that

we're then going to very quickly take

off just because as we'll see today they

just simplify the process of getting

user input correctly without

accidentally writing buggy code just

when you're trying to get hello world or

something similar to work and we'll give

you functions not like not as long as

this list in C but a subset of these get

float get int and get string that'll

automate the process of getting user

input in a way that's more resilient

against potential bugs but we'll see

what those bugs might be and the way

we're going to do this is similar in

spirit to C instead of doing include

cs50.h like we did in C you're going to

now start saying import cs50 python

supports similar to C libraries but

there aren't header files anymore you

just use the name of the library in


Python and if you want to import cs50's

functions you just say import cs50 or if

you want to be more precise and not just

import the whole thing which could be

slow if you've got a really big library

with a lot of functionality in it you

can be more precise and say from cs50

import get float from cs50 import get int

from cs50 import get string or you can

just separate them by commas and import

three and only three things from a

particular Library like ours but

starting today and onward we're going to

start making much more heavy use of

libraries code that other people wrote

so that we're no longer Reinventing the

wheel we're not making our own linked

lists our own trees our own dictionaries

we're going to start standing on the

shoulders of others so that you can get

real work done so to speak Faster by

building your software on top of others

code as well
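The import forms described above can be sketched as follows. Because the cs50 library is a separate install, the runnable lines use the standard library's `math` module as a stand-in, with the corresponding cs50 lines shown as comments.

```python
# Import the whole library, then reach into it with a dot.
#   cs50 equivalent:  import cs50
import math

print(math.sqrt(16))  # 4.0

# Import only specific names, separated by commas.
#   cs50 equivalent:  from cs50 import get_float, get_int, get_string
from math import floor, sqrt

print(sqrt(16))    # 4.0
print(floor(2.7))  # 2
```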

all right so that's it for the syntactic

tour of the language and the sort of

core features soon we'll transition to

application thereof but let me pause

here to see if there's any questions on

syntax or Primitives

or otherwise
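As a recap of the container types mentioned in this tour (lists, tuples, dicts, and sets), here is a small sketch. All of the data in it is made up for illustration.

```python
numbers = [0, 1, 2]                      # list, written with square brackets
location = (42.37, -71.11)               # tuple, e.g. (latitude, longitude)
ages = {"alice": 20, "bob": 19}          # dict: keys mapped to values
animals = {"cat", "dog", "cat", "bird"}  # set: duplicates are filtered out

print(numbers[0])     # 0
print(location[0])    # 42.37
print(ages["alice"])  # 20
print(len(animals))   # 3
```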

oh yes and back

[Music]

oh sorry say it again why doesn't python

have what kind of operators

[Music]

sorry someone called when you said

something operators

[Music]

the oh the increment operator I'd have

to check the history honestly python has

tended to be a fairly minimalist

language and if you can do something one

way the community arguably has tended to

not give you multiple ways to do the

same thing syntactically

um there's probably a better answer and

I'll see if I can dig in and post

something online uh to follow up on that

all right so before we transition to now

writing some actual code let me go ahead

and consider exactly how we're going to

write code in the world of C recall that

it's generally been a two-step process

we create a file called like hello.c and

then step one make hello step two dot

slash hello or if you think back to week

two when we sort of peeled back the

layer of what hello what make was doing


you could more verbosely type out the

name of the actual compiler clang in our

case command line arguments like Dash o

hello to specify what name you want to

create and then you can specify the file

name and then you can specify what

libraries you want to link in so that

was a very verbose approach but it was

always a two-step approach and so even

as you've been doing recent problem sets

odds are you've realized that anytime

you want to make a change to your code

or make a change to your code and try

and test your code again you're

constantly doing those two steps moving

forward in Python it's going to become

simpler and it's going to be just this

the file name is going to change but

that might go without saying it's going

to be something like hello dot py

instead of hello.c and that's just a

convention using a different file

extension but there's no compilation

step per se you jump right to the

execution of your code and so python it

turns out is the name not only of the

language we're going to start using it's

also the name of a program on a Mac a PC

assuming it's been pre-installed that

interprets the language for you this is


to say that python is generally

described as being interpreted not

compiled and by that I mean you get

to skip from the programmer's

perspective that compilation step there

is no manual step in the world of python

typically of writing your code and then

compiling it to zeros and ones and then

running the zeros and ones instead these

kind of two steps get collapsed into the

illusion of one whereby you instead are

able to just run the code and let the

computer figure out how to actually

convert it to something the computer

understands and the way we do that is

via this whole process input and output

but now when you have source code it's

going to be passed into an interpreter

not a compiler and the best analog of

this is just to perhaps point out that

in the human world if you speak or don't

speak multiple human languages it can be

a pretty slow process from going from

one language to another for instance

here are step-by-step instructions for

finding someone in a phone book

unfortunately in Spanish unfortunately

if you don't speak or read Spanish you

could figure this out you could run this


algorithm but you're going to have to do

some Googling or you're going to have to

open up a literal dictionary from Spanish

to English and convert this and the

catch with translating any language

human or computer or otherwise is that

you're going to pay a price typically

some time and so converting this in

Spanish to this in English is just going

to take you longer than if this were

already in your native language and

that's going to be one of the subtleties

with the world of python yes it's a

feature that you can just run the code

without having to bother compiling it

manually first but we might pay a price

and things might be a little slower now

there's ways to chip away at that but

we'll see an example thereof in fact let

me transition now to just a couple of

examples that demonstrate how python is

not only easier for many people to use

perhaps yourselves too because it throws

away a lot of the Annoying syntax it

shortens the number of lines you have to

write and also it comes with so many

darn libraries you can just do so much

more without having to write the code

yourself so as an example of this let me

switch over here to this image from


problem set four which is the weeks

Bridge down by the Charles River here in

Cambridge and this is the original photo

pretty clear and it's even higher res if

we looked at the original version of the

photo but there have been no filters a la

Instagram applied to this photo

recall for problem set four you had to

implement a few filters and among them

might have been blur and blur was

probably among the more challenging of

the ones because you had to iterate over

all of the pixels you had to take into

account what's above what's below to the

left to the right I mean there's a lot

of math and arithmetic and if you

ultimately got it it was probably a

great sense of satisfaction but that was

probably several hours later in a

language like python where there might

be libraries that have been

written by others on Whose shoulders you

can stand we could perhaps do something

like this let me go ahead and run a

program or write a program called

blur.py here and in blur.py in vs

code let me just do this let me import

from a library not the cs50 library but

the Pillow library so to speak a keyword


called Image and another one called

ImageFilter then let me go ahead and

say let me open the current version of

this image which is called bridge.bmp so

the before version of the image will be

the result of calling Image.open

quote unquote bridge.bmp

and then let me create an after version

so you'll see before and after after

equals the before version dot filter of

ImageFilter and there is if I read the

documentation I'll see that there's

something called BoxBlur that allows

you to blur in box format like one pixel

above below left and right so I'll do

one pixel there and then after that's

done let me go ahead and save the file

as something like out dot BMP
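The program dictated above can be sketched as follows, assuming the Pillow library is installed (it is imported under the name `PIL`). The function name `blur_file` and the existence check are additions of this sketch so that it runs harmlessly where bridge.bmp is absent; the lecture's version is simply the three statements inside the function, written at the top level.

```python
import os

from PIL import Image, ImageFilter


def blur_file(source, destination, radius=1):
    """Open an image, apply a box blur, and save the result."""
    before = Image.open(source)
    after = before.filter(ImageFilter.BoxBlur(radius))
    after.save(destination)


# Only run on the lecture's file names if the input is actually present.
if os.path.exists("bridge.bmp"):
    blur_file("bridge.bmp", "out.bmp", 1)
```

A larger radius, such as the 10 tried later in the demo, averages over a bigger box and so blurs more heavily.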

that's it assuming this Library works as

described I am opening the file in

Python using line three and this is

somewhat new syntax in the world of

python we're going to start making use

of the dot operator more because in the

world of python you have what's called

object oriented programming or oop as a

term of Art and what this means is that

you still have functions you still have

variables but sometimes those functions

are embedded inside of the variables or


more specifically inside of the data

types themselves think back to C when

you wanted to convert something to

uppercase there was a toupper function

that takes as input an argument that's a

Char and you can pass in any Char you

want and it will upper case it for you

and give you back a value well you know

what if that's such a common Paradigm

where uppercasing chars is a useful

thing what the world of python does is

it embeds into the string data type or

Char if you will the ability just to

uppercase any Char by treating the Char

or the string as though it's a struct in

C recall that structs encapsulate

multiple types of values in

object-oriented programming in a

language like python you can encapsulate

not just values but also functionality

functions can now be inside of structs

but we're not going to call them structs

anymore we're going to call them objects

but that's just a different vernacular
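The idea of a function living inside the data type can be sketched with the string method `upper`, which is built into Python's `str` type. The example strings are illustrative.

```python
s = "hello"

# In C you would pass a char to toupper; in Python the function is
# part of the str object itself and is reached with the dot operator.
print(s.upper())    # HELLO
print("a".upper())  # A

# upper returns a new string; the original is unchanged.
print(s)  # hello
```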

so what am I doing here inside of the

image Library there's a function called

open and it takes an argument the name

of the file to open once I have a

variable called before that is a struct


or technically an object inside of which

is now because it was returned from this

function a function called filter that

takes an argument the argument here

happens to be ImageFilter.BoxBlur one

which itself is a function but it just

Returns the filter to use and then

after.save does what you might think it

just saves the file so instead of using

fopen and fwrite you just say dot save

and that does all of that messy work for

you so it's just what four lines of code

total let me go ahead and go down to my

terminal window let me go ahead and show

you with ls that at the moment whoops

sorry let me not bother showing that

because I have other examples to come

I'm going to go ahead and do python of

blur dot pi

nope sorry wrong place I did need to

make a command there we go okay let me

go ahead and type LS inside of my filter

directory which is among the sample code

online today there's only one file

called bridge.b damn it I'm trying to

get these things ready at the same time

let me rewind let me move this code into

place

all right I've gone ahead and moved this

file blur.py into a folder called


filter inside of which there's another

file called bridge.bmp which we can

confirm with ls let me now go ahead and

run python which is my interpreter and

also the name of the language and run

python on this file so much like running

the Spanish algorithm through Google

translate or something like that as

input to get back the English output

this is going to translate the Python

language to something this computer or

this cloud-based environment understands

and then run the corresponding code top

to bottom left to right I'm going to go

ahead and enter no error message is

generally a good thing if I type LS

you'll now see out.bmp let me go ahead

and open that and you know what just to

make clear what's really happening let

me blur it even further let's make a box

that's not just one pixel around but 10

so let's make that change and let me

just go ahead and rerun it with python

of blur.py I still have out.bmp

let me go ahead and open out.bmp and

show you first the before

which looks like this that's the

original and now crossing my fingers

four lines of code later


the result of blurring it as well so the

library is doing all of the same kind of

leg work that you all did for the

assignment but it's encapsulated it all

into a single library that you can then

use instead those of you who might have

been feeling more comfortable might have

done a little something like this let me

go ahead and open up one other file

called edges.py and in edges.py I'm

again going to import from the Pillow

library the Image keyword and the

ImageFilter then I'm going to go ahead and

create a before image that's a result of

calling Image.open of the same thing

bridge.bmp then I'm going to go ahead

and run a filter

on that called image whoops uh

ImageFilter.FIND_EDGES which is like a

constant if you will defined inside of

this library for us and then I'm going

to do after dot save quote unquote

out.bmp using the same file name I'm now

going to run python of edges.py
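A sketch of the edge-detection program as dictated, again assuming Pillow is installed. The function name `find_edges_file` and the existence check are additions of this sketch; the lecture's version is the three statements inside the function, written at the top level.

```python
import os

from PIL import Image, ImageFilter


def find_edges_file(source, destination):
    """Open an image, apply Pillow's edge-detection filter, and save it."""
    before = Image.open(source)
    # FIND_EDGES is a ready-made filter constant, so it is not called.
    after = before.filter(ImageFilter.FIND_EDGES)
    after.save(destination)


# Only run on the lecture's file names if the input is actually present.
if os.path.exists("bridge.bmp"):
    find_edges_file("bridge.bmp", "out.bmp")
```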

after sorry user error we'll see what

syntax error means soon let me go ahead

and run the code now edges.py let me now

open that new file out.bmp and before we

had this and now especially what will

look familiar if you did the more


comfortable version of pset four

we now get this after just four lines of

code so again suggesting the power of

using a language that's better optimized

for the tool at hand and at the risk of

really making folks sad let's go ahead

and re-implement if we could problem set

five real quickly here let me go ahead

and open another version of this code

wherein I have a c version just from

problem set five when you implemented a

spell checker loading a hundred thousand

plus words into memory and then you

kept track of just how much time and

memory it took and that probably took a

while implementing all of those

functions in dictionary.c let me instead

now go into

a new file called dictionary.py and let

me stipulate for the sake of discussion

that we already wrote in advance

speller.py which corresponds to

speller.c you didn't write either of

those recall for problem set five we

gave you speller.c assume that we're

going to give you speller.py so the onus

on us right now is only to implement

dictionary.py all right so

I'm going to go ahead and Define a few


functions and we're going to see now the

Syntax for defining functions in Python

I want to go ahead and define first a

hash table which was the very first

thing you defined in dictionary.c I'm

going to go ahead then and say words

gets this give me a dictionary otherwise

known as a hash table all right now let

me Define a function called check which

was the first function you might have

implemented check is going to take a

word and you'll see in Python the syntax

is a little different you don't specify

the return type you use the word def

instead to Define you still specify the

name of the function and any

arguments there too but you omit any

mention of types but you do use a colon

and indent so how do I check if a word

is in my dictionary or in my hash table

well in Python I can just say if word in

words go ahead and return true else go

ahead and return false done with the

check function all right now I want to

do like load that was the heavy lift

where you had to load the big file into

memory so let me Define a function

called load it takes a string the name

of a file to load so I'll call that

dictionary just like in C but no data


type let me go ahead and open a file by

using an open function in Python by

opening that dictionary in read mode so

this is a little similar to fopen a

function in C you might recall then

let me iterate over every line in the

file in Python this is pretty Pleasant

for line in file colon indent how now do

I get at the current word and then strip

off the new line because in this file of

words 140,000 words there's word

backslash N word backslash n all right

well let me go ahead and get a word from

the current line but strip off from the

right end of the string the new line

which the rstrip function in Python

does for me then let me go ahead and add

to my dictionary or hash table that word

done let me go ahead and close the file

for good measure and then let me go

ahead and return true because all was

well that's it for the load function in

Python how about the size function this

did not take any arguments it just

Returns the size of the hash table or

dictionary in Python I can do that by

returning the length of the dictionary

in question and then lastly gone from

the world of python is malloc and free


memory is managed for you so no matter

what I do there's nothing to unload the

computer will do that for me so I give

you in these functions problem set 5 in

Python so I'm sorry we made you write it

in C first but the implication now is

that what are you getting for free in a

language like python well encapsulated

in this one line of code is much of what

you wrote for problem set five

implementing your array for all of your

letters of the alphabet or more all of

the linked lists that you implemented to

create chains to store all of those

words all of that is happening it's just

someone else in the world wrote that

code for you and you can now use it by

way of a dictionary and actually I can

change this a little bit because add is

technically not the right function to

use here I'm actually treating the

dictionary as something simpler a set so

I'm going to make one tweak set recall

was another data type in Python but set

just allows it to handle duplicates and

it allows me to just throw things into

it by literally using a function as

simple as ADD and I'm going to make one

other tweak here because when I'm

checking a word it's possible it might


be given to me in uppercase or

capitalized it's not going to

necessarily come in in the same lower

case format that my dictionary did I can

force every word to lowercase by using

word dot lower and I don't have to do it

character for character I can do the

whole darn string At Once by just saying

word dot lower
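
Pulling the walkthrough above together, here is a minimal sketch of what dictionary.py might look like. It follows the steps described in the lecture (a set for storage, lowercasing in check, stripping newlines in load), but two details are assumptions rather than the lecture's exact code: it uses a with block instead of an explicit close, and it includes an unload function that simply returns True, since there is nothing to free.

```python
# dictionary.py: a sketch of the spell-checker functions described above

# A set stands in for the hash table; it stores each word only once
words = set()


def check(word):
    """Return True if word is in the dictionary, ignoring case."""
    return word.lower() in words


def load(dictionary):
    """Load every line of the file named by dictionary into memory."""
    # The with block closes the file automatically (assumption: the
    # lecture opens and closes the file by hand instead)
    with open(dictionary, "r") as file:
        for line in file:
            # Strip the trailing newline before storing the word
            words.add(line.rstrip())
    return True


def size():
    """Return the number of words loaded so far."""
    return len(words)


def unload():
    """Nothing to free: Python manages memory for us (assumed helper)."""
    return True
```

Calling load with the path to a word list, one word per line, fills the set; after that check("CAT") and check("cat") give the same answer because of the call to lower.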

all right let me go ahead and open up a

terminal window here and let me go into

first my C version on the left and

actually I'm going to go ahead and split

my terminal window into two and on the

right I'm going to go into a version

that I essentially just wrote but it's

also available online if you want to

play along afterward I'm going to go

ahead and make speller in C on the left

and note that it takes a moment to

compile then I'm going to be ready to

run speller of dictionaries let's do

like the Sherlock Holmes text which is

pretty big and then over here let me get

ready to run python of speller on texts

holmes.txt too so the syntax is a little

different at the command prompt I just

on the left have to compile the code

with make and then run it with DOT slash


speller on the right I don't need to

compile it but I do need to use The

Interpreter so even though the lines are

wrapping a little bit here let me go

ahead and run it on the right and I'm

going to count how long it takes

verbally for demonstration's sake

One Mississippi two Mississippi three

Mississippi okay so it's like three

seconds give or take

now running it in Python keeping in mind

I spent way fewer hours implementing a

spell checker in Python than you might

have in problem set five but what's the

trade-off going to be and what kinds of

design decisions do we all now need to

be making consciously here we go on the

right in Python

One Mississippi two Mississippi three

Mississippi four Mississippi five

Mississippi six Mississippi seven

Mississippi eight Mississippi nine

Mississippi 10 Mississippi 11

Mississippi all right so 10 or 11

seconds

so which one is better let's like go to

the group here which of these programs

is the better one how might you answer

that question based on demonstration

alone
what do you think

[Music]

okay so python to summarize is better

for the programmer because it was way

faster to write but C is maybe better

for the computer because it's much

faster to run I think that's a

reasonable formulation other opinions

yeah

I think it depends on the size of the

project that you're dealing with so if

it's going to be something that's

relatively quick I might not care that

it takes 10 seconds to do it and it took

me way faster with python whereas in

C if I'm dealing with something like

dudes that that time is going to really

build up on it might be worth it to put

in The Upfront effort and just loading

it in C so the process continually

runs faster over

okay absolutely a really good answer and

let me summarize is it depends on the

workload if you will if you were to you

if you have a very large data set you

might want to optimize your code to be

as fast and performant as it can be

especially if you're running that code

again and again maybe you're a company


like Google people are searching a huge

database all the time you really want to

squeeze every bit of performance as you

can out of the computer you might want

to have someone smart take a language

like C and write it at a very low level

it's going to be painful they're going

to have bugs they're going to have to

deal with memory management and the like but

if and when it works correctly it's

going to be much faster it would seem by

contrast if you have a data set that's

big and 140,000 words is not small but

you don't want to spend like five hours

10 hours a week of your time building a

spell checker or dictionary you can

instead leverage a different language

with different libraries and build on

top of it in order to prioritize the

human time instead other thoughts

[Music]

that perfect segue to exactly the next

point we wanted to make which was is

there something in between and indeed

there is I'm oversimplifying what this

language is actually doing it's not as

Stark a difference as saying like hey

python is four times slower than C like

that's not the right takeaway there are

absolutely ways that Engineers can


optimize languages as they have already

done for Python and in fact I've

configured my settings in such a way

that I've kind of dramatized just how

big the difference is it is going to be

slower python typically than the

equivalent C program but it doesn't have

to be as big of a gap as it is here

because indeed among the features you

can turn on in Python is to save some

intermediate results technically

speaking yes python is interpreting uh

dictionary dot pi and these other files

translating them from one language to

another but that doesn't mean it has to

do that every darn time you run the

program as you propose you can save or

cache C-A-C-H-E the result of that process

so that the second time and the third

time are actually notably faster and in

fact python itself The Interpreter the

most popular version thereof itself is

actually implemented in C so you can

make sure that your interpreter is as

fast as possible and what then is maybe

the high level takeaway yes if you are

going to try to squeeze every bit of

performance out of your code and maybe

code is constrained you maybe you have


very small devices maybe it's like a

watch nowadays or maybe it's a sensor

that's installed in some small format in

an appliance or in infrastructure where

you don't have much battery life and you

don't have much size you might want to

minimize just how much work is being

done and so the faster the code runs and

the better it's going to be if it's

implemented something low level so C is

still very commonly used for certain

types of applications but again if you

just want to solve real world problems

and get real work done and your time is

just as if not more valuable than the

device you're running it on long term

you know what python is among the most

popular languages as well and

frankly if I were implementing a spell

checker moving forward I'm probably

starting with python I'm not going to

waste time implementing all of that low

level stuff because the whole point of

using newer modern languages is to use

abstractions that other people have

created for you and by abstraction I

mean something like the dictionary

function that just gives you a

dictionary or hash table or the

equivalent version that I used which in


this case was a set

all right any questions then on python

thus far

no all right let's oh yeah in the middle

[Music]

I'm not sure

it feels like

really good question or observation

could you just compile python code yes

absolutely this idea of compiling code

or interpreting code is not native to

the language itself it tends to be

native to the conventions that we humans

use so you could actually write an

interpreter for C that would read it top

to bottom left to right converting it to

on the fly something the computer

understands but

historically that's not been the case C

is generally a compiled language but it

doesn't have to be what python nowadays

is actually doing is what you described

earlier it technically is sort of

unbeknownst to us compiling the code

technically not into zeros and ones

technically into something called byte

code which is this intermediate step

that just doesn't take as much time as

it would to recompile the whole thing


and this is an area of research for

computer scientists working in

programming languages to improve these

kinds of paradigms why well honestly for

you and I the programmer it's just much

easier to one run the code and not worry

about the stupid second step of

compiling it all the time why it's

literally half as many steps for me the

human and that's a nice thing to

optimize for and ultimately two you

might want all of the fancy features

that come with these other languages so

you should really just be fine-tuning

how you can enable these features as

opposed to shying away from them here

and in fact the only time I personally

ever use C is from like September to

October of every year during cs50 almost

every other month do I reach for python

or another language called JavaScript to

actually get real work done which is not

to impugn C it's just that those other

languages tend to be better fits for the

amount of time I have to allocate and

the types of problems that I want to

solve all right let's go ahead and take

a five minute break here and when we

come back we'll start writing some

programs from scratch


all right so let's go ahead and start

writing some code from the beginning

here whereby we start small with some

simple examples and then we'll build our

way up to more sophisticated examples in

Python but what we'll do along the way

is First Look side by side at what the C

code looked like way back in week one or

two or three and so forth and then write

the corresponding python code at right

and then we'll transition just to

focusing on python itself what I've done

in advance today is I've downloaded some

of the code from the course's website my

source 6 directory which contains all of

the pre-written C code from weeks past

but it'll also have copies of the Python

code we'll write here together and look

at so first here is hello.c back from

week zero this was version zero of it

I'm going to go ahead and do this I'm

going to go ahead and split my code

window up here I'm going to go ahead and

create a new file called hello.py and

this isn't something you'll typically

have to do laying your code out side by

side but I've just clicked the little

icon in vs code that looks like two

columns that splits my code editor into


two places so that we can in fact see

things for now side by side with my

terminal window down below all right now

I'm going to go ahead and write the

corresponding Python program on the

right which recall was just print quote

unquote hello world and that's it now

down in my terminal window I'm going to

go ahead and run python of hello.py

enter and voila we've got hello.py

working so again I'm not going to play

any further with the c code it's there

just to jog your memory left and right

so let's now look at a second version of

hello world from that first week whereby

if I go and get hello1.c I'm going to

drag that over to the right whoops I'm

going to go ahead and drag that over to

the left here and now on the right let's

modify hello.py to look a little more

like this second version in C all right

I want to get a uh answer from the user

as a return value but I also want to get

some input from them so from cs50 I'm

going to import the function called get

string for now we're going to get rid of

that eventually but for now it's a

helpful training wheel and then down

here I'm going to say answer equals get

string quote unquote what's your name


question mark space but no semicolon no

data type and then I'm going to go ahead

and print just like the first example on

the slide hello comma space plus answer

and now let me go ahead and run this

python hello.py all right it's asking me

what's my name David hello comma David

but it's worth calling attention to the

fact that I've also

simplified further it's not just that

the individual functions are simpler

what is also now glaringly omitted from

my python code at right both in this

version and the previous version what

did I not bother implementing

yeah so I didn't even need to implement

main we'll revisit the main function

because having a main function actually

does solve problems sometimes but it's

no longer required in C you have to

have that to kick start the entire

process of actually running your code

and in fact if you were missing main as

you might have experienced if you

accidentally compiled helpers.c instead

of the file that contained main you

would have seen a compiler error in

Python it's not necessary python you can

just jump right in start programming and


boom you're good to go especially if

it's a small program like this you don't

need the added overhead or complexity of

a main function so that's one other

difference here all right there are a

few other ways we could say hello world

recall that I could use a format string

so I could put this whole thing in

quotes I could use this F prefix and

then let me go ahead and run python of

hello.py again you can perhaps see where

we're going with this let me type my

name David and here we go okay that's

the mistake that someone identified

earlier you need the curly braces

otherwise no variables are interpolated

that is substituted with their actual

values so if I go back in and add those

curly braces to the F string now let me

run python of hello.py type in my name

and there we go we're back in business

which one's better I mean it depends but

generally speaking making shorter more

concise code tends to be a good thing so

stylistically the F string is probably a

reasonable instinct to have all right

well what more can we do besides this

well let me go ahead here and let's get

rid of the training wheel altogether

actually so same C code at left let me


get rid of the cs50 library which we

will ultimately in a couple of weeks

anyway I can't use get string but I can

use a function that comes with python

called input and in fact this is

actually a one for one substitution

pretty much there's really no downside

to using input instead of get string we

implemented get string just for

consistency with what you saw in C

python of hello.py what's your name

David still actually works the same so

gone are the cs50 specific training

wheels but we're going to bring them

back shortly just to deal with integers

or Floats or other values too because

it's going to make our lives a little

simpler with error checking all right

any questions before we now pivot to

revisiting other examples from week one

but now in Python
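
The three ways of building the greeting discussed above can be sketched side by side. The function names here are hypothetical, chosen only so the variants can be compared; the lecture writes the same expressions directly at the top of hello.py without wrapping them in functions.

```python
def greet_concat(name):
    """Join two strings with +, as in the first version."""
    return "hello, " + name


def greet_fstring(name):
    """Use an f-string; the curly braces interpolate the variable."""
    return f"hello, {name}"


def greet_missing_braces(name):
    """The mistake from the lecture: no braces, so nothing is interpolated."""
    return f"hello, name"


print(greet_concat("David"))
print(greet_fstring("David"))
print(greet_missing_braces("David"))
```

To make this interactive without the cs50 library, the built-in input function can supply the name, for example print(greet_fstring(input("What's your name? "))).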

all right let me go ahead and open up

now let's say calculator0.c which was

one of the first examples we did

involving math and operators like that

as well as functions like get int let me

go ahead and create a new file now

called calculator.py at right so

that I have my C code at left still and


my python code at right all right let me

go dive into a translation of this code

into python I am going to use get int

from the cs50 library so let me import

that I'm going to go ahead now and get

an INT from the user so x equals get int

and I'll ask them for an x value just

like we did weeks ago no need to specify

a semicolon though or a int for the X it

will just figure it out Y is going to

get another int via y colon and then

down here I'm going to go ahead and say

print of X Plus y so this is already a

bit new recall the C version required

that I use this format string as well as

printf itself Python's just a little

more user friendly all you want to do is

print out a value like X Plus y just

print it don't fuss with any percent

signs or format codes it's not printf

it's indeed just print now all right let

me go ahead and run python of

calculator.py enter and just do a quick sample

one plus two indeed equals three as an

aside suppose I had taken a different

approach to importing the whole cs50

Library functionally it's the same

you're not going to notice any

performance impact here it's a small

library but notice what does not work


now whereas it did work in C python of

calculator.py enter

we see our first Trace back deliberately

here so a trace back is just a term of

art that says here is a trace back

through all of the functions that just

got executed in the world of C you

might call this a stack Trace stack

being the operative word recall

that when we talked about the stack and

the Heap the stack like a stack of trays

was all of the functions that might get

called one after the other we had main

we had swap then swap went away and then

main finished recall so here's a

traceback of all the functions or code

that got executed there's not really any

functions other than my file itself

otherwise there'd be more detail but

even though it's a little cryptic we can

perhaps infer from the output here name

error so something related to the name

of something name get int is not defined

and this of course happens on line three

over there all right so why is that well

python essentially allows us to name

space our functions that come from

libraries there was a problem in C if

you were using the cs50 library and thus


had access to get int get string and so

forth you could not use another

library that had the same function names

they would Collide and the compiler

would not know how to link them together

correctly in Python and other languages

like JavaScript and in Java you have

support for effectively what would be

called namespaces you can isolate

variables and function names to like

their own namespace like their own

container in memory and what this means

is if you import all of cs50 you have to

say that the get int you want is inside

the cs50 Library so just like with the

image blurring and the image edges

before where I had to specify Image dot

and ImageFilter dot similarly here am I

specifying with a dot operator albeit a

little differently that I want

cs50.get_int in both places and now if I

rerun python of calculator.py one and two

now we're back in business which one is

better generally speaking

it depends on just how many functions

you're using from the library if you're

using a whole bunch of functions just

import the whole thing if you're only

using maybe one or two import them line

by line
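
The two import styles compared above can be tried with any module. The sketch below swaps in the standard library's math module for the cs50 library so that it runs without installing anything; math.sqrt plays the role that cs50.get_int plays in the lecture, and the unimported floor stands in for the unimported get_int that triggered the traceback.

```python
# Style 1: import one name from the module and call it directly
from math import sqrt

print(sqrt(16))

# Style 2: import the whole module; its names stay inside the
# module's namespace and must be prefixed with "math."
import math

print(math.sqrt(16))

# floor was never imported by name, so the bare name is not defined,
# just like the bare get_int after importing all of cs50
try:
    floor(2.5)
except NameError as error:
    print(error)
```

Writing math.floor(2.5) instead would work, which mirrors the fix of writing cs50.get_int once the whole library is imported.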
all right so let's go ahead and make a

little tweak here let's get rid of this

library and take this training wheel off

too as quickly as we introduced it

though for the problem set six you'll be

able to use all of these same functions

suppose I get rid of this and I just use

the input function just like I did

by replacing get string earlier let me

go ahead now and run this version of the

code python of calculator.py

okay how about one plus two equals three

huh

all right obviously wrong incorrect can

anyone explain what just happened based

on instincts

what just happened here yeah

[Music]

sure yeah

[Music]

exactly python is interpreting both or

treating both X and Y as strings which

is actually what the input function

returns by default and so plus is now

being interpreted as concatenation as we

defined it earlier so X Plus Y isn't X

Plus y mathematically but in terms of

string joining just like in scratch so

that's why we're getting 12 or really


one two which isn't itself a number it too

is another string so we somehow need to

convert things and we didn't have this

ability quite as easily in C we did have

like the atoi function ASCII to

integer which did allow you to do this

the analog in Python is actually just to

do a cast a Typecast using int so just

like in C you can use the keyword int

but use it a little differently notice

that I'm not doing parenthesis int close

parenthesis before the value I'm using

int as a function so indeed in

Python int is a function float is a

function that you can pass values into

to do this kind of conversion so now now

if I run python of calculator.py 1

and 2 now we're back in business and

getting the answer of 3. but there's

kind of a catch here there's always

going to be a trade-off like that sounds

amazing that it just works in this way

we can throw away the cs50 library

already but what if the user

accidentally types or maliciously types

in like a cat instead of a number damn

well there's one of these Trace backs

like now my program has crashed this is

similar in spirit to the kinds of seg

faults that you might have had in C but


they're not seg faults per se it doesn't

necessarily relate to memory this time

it relates to actual runtime values not

being as expected so this time it's not

a name error it's a value error invalid

literal for INT with base 10 quote

unquote cat so again it's written for

sort of a programmer more than

um sort of a typical person because it's

pretty arcane the language here but

let's try to interpret it invalid

literal a literal is just something

someone typed for INT which is the

function name with base 10 it says

defaulting to decimal numbers cat is

apparently not a decimal number doesn't

look like it therefore can't be treated

like it therefore there's a value error

so what can we do unfortunately you

would have to somehow catch this error

and the only way to do that in Python

really is by way of another feature that

c did not have namely what are called

exceptions an exception is exactly what

just happened name error value error

they are things that can go wrong when

your python code is running that aren't

necessarily going to be detected until

you run your code so in Python and in


JavaScript and in Java in other more

modern languages there's this ability to

actually try to do something except if

something goes wrong and in fact I'm

going to introduce a bit of syntax here

even though we won't have to use this

much just yet instead of just blindly

converting X to an INT let me go ahead

and try to do that and if there's an

exception go ahead and say something

like uh

print

uh

that is not an INT and then I'm going to

do something like exit right there and

let me go ahead and do this here let me

try to get y except if there's an

exception then let me go ahead and say

again that is not an INT exclamation

point and then I'm going to exit from

there too otherwise I'll go ahead and

print X Plus y if I run python of

calculator.py now whoops uh

oh forgot my close quote sorry

all right so close quote python of

calculator.py uh one and two still

work but if I try to type in something

wrong like cat now it actually detects

the error so what is the cs50 library in

Python doing it's actually doing that


try and except for you because suffice

it to say otherwise your programs for

something simple like a calculator start

to get longer and longer so we factored

that kind of logic out to the cs50 get

int function and get float function but

underneath the hood they're essentially

doing this try except but they're being

a little more precise they're detecting

a specific error and they are doing it

in a loop so that these functions will

get executed again and again in fact the

best way to do this is to say except if

there's a value error then print that

error message out to the user and again

let's not get too into the weeds here

with this feature we've already put it

into the cs50 library but that's why for

instance we bootstrap things by just

using these functions out of the box
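
The try and except idea described above, combined with a loop and a specific ValueError, can be sketched as follows. This is not the cs50 library's actual source, only an approximation of the behavior the lecture describes; the read parameter is a hypothetical addition that lets the function be exercised without a keyboard.

```python
def get_int(prompt, read=input):
    """Keep prompting until the user types something int() accepts."""
    while True:
        text = read(prompt)
        try:
            # int() converts a str such as "12" to the number 12
            return int(text)
        except ValueError:
            # int("cat") raises ValueError; report it and ask again
            print("That is not an int!")


# input() returns strings, so + joins them instead of adding
print("1" + "2")

# Converting each string first gives ordinary arithmetic
print(int("1") + int("2"))
```

Catching only ValueError, rather than every possible exception, keeps unrelated problems visible instead of silently swallowing them.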

all right let's do something more with

our calculator here how about this in

the world of C we had another version of

this code which actually did some

division by way of

um which actually did division of

numbers not just

the addition herein so let me go ahead

and close the C version and let's focus


only on python now doing some of these

same lines of codes but I'm going to go

ahead and just assume that the user is

going to cooperate and use proper input

so from cs50 import get int that'll deal

with any errors for me X gets uh get int

ask the user for an int x y equals get

int ask the user for an int y and then

let's go ahead and do this let's declare

a variable called Z set it equal to x

divided by y then let's go ahead and

print Z still no need for a format

string I can just print out the

variables value let me go ahead and run

python of calculator.py let me do 1 10

and I get 0.1 what did I get in C though

if you think back

what would have happened in C

yeah we would have gotten zero in C but

why in C when you divide one int by

another and those ints are like one and

ten respectively

it will give you what

it will give you an integer back and

unfortunately 0.1 the integer part of it

is indeed zero so this was an example of

truncation so truncation was an issue in

C but it would seem as though this is no

longer a problem in python insofar as

the division operator actually handles


that for us as an aside if you want the

old Behavior because it actually is

sometimes useful for rounding or

flooring values you can actually use two

slashes and now you get the C Behavior

so that now one divided by ten is zero

so you don't give up that capability but

at least it does a more sensible default

most people especially new programmers

when dividing one value by another would

want to get 0.1 not zero for reasons

that indeed we had to explain weeks ago

but what about another problem we had

with the world of floats before whereby

there was imprecision let me go ahead

and somewhat cryptically print out the

value of Z as follows I'm going to

format it using an F string and I'm

going to go ahead and format not just Z

because this is essentially the same

thing notice this if I do python of

calculator.py 1 and 10 I get by default

just one significant digit but if I use

this syntax in Python which we won't

have to use often I can actually do and

see like I did before 50 significant

digits after the decimal point so now

let me rerun python of calculator.py 1

and 10 and let's see if floating point


imprecision is still with us

unfortunately it is and you can see as

much here the F string the format string

is just showing us now 50 digits instead

of the default one so we've not solved

all problems but we have solved at least

some all right before we pivot away from

a mere calculator any questions now

on syntax or concepts or the like yeah
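
The division behavior and the 50-digit formatting just described can be reproduced in a few lines. The values 1 and 10 are the ones used in the lecture; they are hard-coded here, rather than read with get int, so the sketch runs on its own.

```python
x = 1
y = 10

# / is true division: the result is a float even for two ints
print(x / y)

# // is floor division: it rounds down to a whole number
print(x // y)

# .50f asks for 50 digits after the decimal point, which exposes
# the floating-point imprecision hiding behind the short 0.1
z = x / y
print(f"{z:.50f}")
```

One caveat worth knowing: // rounds down rather than toward zero, so it matches C's integer division for positive values like these but not for negative ones, where -1 // 10 is -1 in Python while C would give 0.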

[Music]

how do you what oh how do you comment

really good question if you're using

double slash for division with flooring

or truncation like I describe how do you

do a comment in Python this is a comment

and the convention is actually to use a

complete sentence like uh with a capital

T here you don't need a period unless

there's multiple sentences and

technically it should be above the line

of Code by convention so you would use a

hash symbol instead good question

haven't seen those yet all right let's

go ahead and make something else here

how about let me go ahead and open up

for instance an example called

points1.c which we saw a few weeks back

and let me go ahead on the other side

and create a file called points.py this

was a program recall that asked the user


how many points they lost on uh the

first assignment and then it went ahead

and just printed out whether they lost

fewer points than me because I lost two

if you recall the photo more points than

me or the same points as me let me go

ahead and zoom out so we can see a bit

more of this and let me now on the top

right here go about implementing this in

Python so I want to First prompt the

user for some number of points so from

cs50 let's import get int so it handles

the error checking let's then do points

equals get int and ask the user how many

points did you lose question mark then

let's go ahead and say if points less

than 2 which was my value print you lost

fewer points than me otherwise if it's

else if points greater than 2 go ahead

and print

uh you lost more points than me

else let's go ahead and handle the final

scenario which is you lost the same

number of points as me before I run this

does anyone want to point out a mistake

I've already made yeah

[Music]

yeah so else if in C is actually now elif

in Python it's a single word so let


me change this to elif and now cross my

fingers python of points.py suppose you

lost three points on some assignment you

lost more points than my two if you only

lost one point you lost fewer points

than me so the logic is the same but

notice the code is much tighter in 10

total lines we did what was 24 lines

because we've thrown away a lot of the

syntax the curly braces are no longer

necessary the parentheses are gone the

semicolons so this is why it just tends

to be more pleasant pretty quickly using

a language like this
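
The corrected conditional, with elif as a single word, might look like the sketch below. Wrapping it in a function named compare with a default of 2 for the instructor's points is an assumption made so the three branches can be exercised directly; the lecture writes the same if, elif, else chain at the top level of points.py after calling get int.

```python
def compare(points, mine=2):
    """Describe how points compares with the instructor's two lost points."""
    if points < mine:
        return "You lost fewer points than me."
    elif points > mine:
        return "You lost more points than me."
    else:
        return "You lost the same number of points as me."


print(compare(3))
print(compare(1))
print(compare(2))
```

Note what is absent compared with C: no parentheses around the conditions, no curly braces, and no semicolons, with the colon and indentation marking each branch instead.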

all right let's do one other example

here

um in C recall that we were able to

determine the parity of some number if

something is even or odd well in Python

let me go ahead and create a file called

parity.py and let's look for a moment at

the C version at left here was the code

in C that we used to determine the

parity of a number and really the key

takeaway from all these lines was just

the remainder operator and that one is

still with us so this is a simple

demonstration just to make that point if

in Python I want to determine whether a

number is even or odd well let's go


ahead and from cs50 import get int then

let's go ahead and get a number like n

from the user using get int and ask them

for n and then let's go ahead and say if

n percent sign 2 equals zero then let's

go ahead and print quote unquote even

else let's go ahead and print out odd

but before I run this

anyone want to instinctively even though

we've not talked about this point out a

mistake here

what did I do wrong

yeah so double equals again so even

though some of the stuff is changing

some of the same ideas are the same so

this too should be a double equal sign

because I'm comparing for equality here

and why is this the right math well if

you divide a number by two it's either

going to have zero or one as a remainder

and that's going to determine if it's

even or odd for us so let's run python

of parity.py type in a number like

50 and hopefully we get indeed even so

again same idea but now we're down to

eight lines of code instead of the 20.
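A small sketch of the parity check described above. The lecture reads n with get_int; the function name parity is an assumption made here so the check can be called directly.

```python
def parity(n):
    """Return "even" or "odd" for an integer n."""
    # n % 2 is the remainder after dividing by 2: 0 for even, 1 for odd.
    if n % 2 == 0:  # == compares; a single = here would be a syntax error
        return "even"
    else:
        return "odd"


print(parity(50))
print(parity(7))
```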

well let's now do something a little

more interactive and a little

representative of tools that actually


ask the user questions in C recall that

we had this agreement program agree.c

and then let's go ahead and Implement a

corresponding version in Python in a

file called agree.py and let's look

at the C version first on the left we

used get_char here and then we use the

double vertical bars to check if C is

equal to Capital y or lowercase Y and

then we did the same thing for n for no

and so let's go over here and let's do

from cs50 import get okay get_char is

not a thing and this here is another

difference with python there is no data

type for individual characters you have

strings strs and honestly those are

fine because if you have a str that's

just one character for all intents and

purposes it is just a single character

so it's just a simplification you don't

have to think as much you don't have to

worry about double quotes single quotes

in fact in Python you can use double

quotes or single quotes so long as

you're consistent so long as you're

consistent the single quotes do not mean

something different like they do in C so

I'm going to go ahead and use get_string

here although strictly speaking I could

just use the input function as we saw


before I'm going to get a string from

the user that asks them this get_string

quote unquote do you agree like a little

check box or interactive prompt do we

have to say yes or no you want to agree

to the following terms or whatnot and

then let's translate the conditionals to

python now too so if s equals equals

quote unquote capital Y

or s equals equals lowercase y let's go

ahead and print out agreed just like in

C elif s equals equals capital N or s equals

equals little n let's go ahead then and

print out not agreed and you can already

see perhaps one of the differences here

too Python is a little more English-like

in that you just literally use the

English word or instead of the two

vertical bars but it's ultimately doing

the same thing
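The first version of agree.py can be sketched as follows. The lecture reads the answer with get_string; the function name agreement_with_or is hypothetical and is used here only so the conditionals can be exercised without typing.

```python
def agreement_with_or(s):
    """Mirror the C logic, with the word `or` in place of ||."""
    if s == "Y" or s == "y":
        return "Agreed."
    elif s == "N" or s == "n":
        return "Not agreed."
    return None  # any other input matches neither branch


print(agreement_with_or("y"))
print(agreement_with_or("N"))
print(agreement_with_or("yes"))  # None: "yes" is not handled yet
```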

can we simplify this code a bit though

this would be a little Annoying if we

wanted to add support not just for Big Y

and little y but yes or big yes or

little yes or big Y lowercase e

capital s right there's a lot of

permutations of y-e-s or just y that we

ideally should tolerate otherwise the

user is going to have to type exactly what


we want which isn't very user friendly

any intuition for how we could logically

even if you don't know how to do it in

code make this better yeah

nice yeah we saw an example of a list

before just 0 1 2 why don't we take that

same idea and ask a similar question If

s is in the following list of values y

or little y or heck let me add to the

list now yes or maybe all capital yes

and it's going to get a little Annoying

admittedly but this is still better than

the alternative with all the ors I

could do things like this and so forth

there's a whole bunch more permutations

but let's leave this alone and let me

just go into here and change this to If

s is in the following list of n

or little n or no and I won't do as

let's just not worry about the weird

capitalizations there for now let's go

ahead and run this python of agree.py do

I agree y okay how about yes all right

how about big yes okay that does not

seem to work notice it did not say

agreed and it did not say not agreed it

didn't detect it so how can I do this

well you know what I could do

what I don't really need the uppercase

and lowercase let me tighten this list


up a little bit and why don't I just

force s to be lowercase s dot lower

recall whether it's one character or

more is a function built into strs now

strings in Python that forces the whole

thing to lower case so now watch what I

can do python of agree.py little y

that works Big Y that works big yes that

works Big Y little e big S that also

works so we've now handled in one Fell

Swoop a whole bunch more logic and you

know what we can tighten this up a bit

here's an opportunity in Python for

slightly better design what have I done

in here that's a little redundant

does anyone see an opportunity

to eliminate a redundancy doing

something more times than you need

is that a stretch or no yep

we could move the s dot lower above

notice that I'm using s dot lower twice

but it's going to give me the same

answer both times so I could do a couple

of things here I could first of all get

rid of this lower and get rid of this

lower

and then Above This maybe I could do

something like this s equal I can't just

do this because that throws the value


away it does the math but it doesn't

convert the string itself it's going to

return a value so I have to say s equals

s dot lower I could do that or honestly

I can chain these things together and

this is not something we saw in C if

get string returns a string and strings

have functions like lower in them you

can chain these functions together like

this and do dot this dot that dot this

other thing and eventually you want to

stop because it's going to become crazy

long but this is reasonable still fits

on the screen it's pretty tight it does

in one place what I was doing in two so

I think that's okay let me go ahead and

do python of agree.py one last time

let's try it one last time and it's

still working as intended it's also if I

tried those other inputs as well yeah

question
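The tightened version can be sketched like this: lowercase the answer once, then test membership in a short list with `in`. The function name agreement is an assumption for illustration; in the lecture the same chaining is applied directly to the value returned by get_string.

```python
def agreement(answer):
    """Normalize the answer once, then check it against lowercase lists."""
    s = answer.lower()  # str.lower returns a new, lowercased string
    if s in ["y", "yes"]:
        return "Agreed."
    elif s in ["n", "no"]:
        return "Not agreed."
    return None


# With real input, the call could be chained in one place, for example:
#   s = input("Do you agree? ").lower()
for typed in ["y", "Y", "YES", "YeS", "no", "maybe"]:
    print(typed, "->", agreement(typed))
```

Because the input is forced to lowercase, the lists only need lowercase entries; picking all uppercase with upper() would work just as well, as long as the two sides agree.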

[Music]

case as well or like upper and then

cover all the functions where it's

lowercase four all functions

of course they do not put those together

um let me summarize could we could we

handle uppercase and lowercase together

in some form I'm actually doing that

already I just have to pick a lane I


have to either be all lowercase in my

logic or all uppercase and not worry

about what the human types in because no

matter what the human types in I'm

forcing their input to lower case and

then I am using a lowercase list of

values if I want to flip that fine I

just have to be self-consistent but I'm

handling that already yeah

[Music]

a really good loaded question are

strings no longer an array of characters

conceptually yes underneath the hood no

they're a little more sophisticated than

that because with strings you have a few

changes not only do they have functions

built into them because strings are now

what we call objects in what's called

object oriented programming and we're

going to keep seeing examples of this

dot operator they are also immutable so

to speak

i-m-m-u-t-a-b-l-e immutable means they

cannot be changed which means unlike C

you can't go into a string and change

its individual characters you can make a

copy of the string that makes a change

but you can't change the original string

itself this is both a little Annoying


maybe sometimes but it's also pretty

protective because you can't do

screw-ups like I did weeks ago when I

was trying to copy s and call it t and

then one affected the other python

underneath the hood is handling all of

the memory management and the pointers

and all of that there are no pointers in

Python if that wasn't clear all of that

pain if you will all of that power is

now handled by the language itself not

by us the programmers
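A short sketch of what immutability means in practice. The variable names s, t, and u are illustrative only.

```python
s = "hi!"
t = s.upper()  # returns a new string; s itself is untouched
print(s, t)

try:
    s[0] = "H"  # strings do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# To "change" a character, build a new string from pieces of the old one.
u = "H" + s[1:]
print(u)
```

The original string never changes, so a copy cannot accidentally affect it the way the s and t pointers did in C.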

all right so let's introduce maybe some

Loops like we've been uh in the habit of

doing let me open up meow.c which was an

example in C just meowing a bunch of

times textually let me create a file

called meow.py here on the right and

notice on the left this was correct code

in C but it was kind of poorly designed

why because it was a missed opportunity

for a loop why say something three times

when you can say it just once so in

Python let me do it the poorly designed

way first let me print out meow and like

I generally should not let me copy paste

it three times run python of meow.py and

it works okay but not good practice so

let me go ahead and improve this a

little bit and there's a few ways to do


this if I wanted to do this three times

I could instead do something like this

for I in range of three recall that that

was the better version rather than

arbitrarily enumerate numbers yourself

let me go ahead and print out quote

unquote meow now if I run python of meow

still seems to work so it's a little

tighter and my God like programs can't

really get much shorter than this we're

down to two lines of code no main

function no gratuitous syntax let's now

improve the design further like we did

in C by introducing a function called

meow that actually does the meowing so

this was our first abstraction recall

both in scratch and in C let me Focus

now entirely on the python version here

let me go ahead and first uh Define a

function

let me first go ahead and do this for I

in range of three let's assume for the

moment that there's a meow function that

I'm just going to call let's now go

ahead and Define using the def keyword

which we saw briefly with the speller uh

demonstration a function called meow

that takes no arguments and all it does

for now is print meow let me now go


ahead and run python of meow.py enter

one of those tracebacks so this is

another NameError and again name meow

is not defined
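The failure can be reproduced in miniature. This sketch wraps the loop in try/except only so the error message can be printed instead of ending the program; the variable name message is an assumption made for illustration.

```python
message = None

try:
    for i in range(3):
        meow()  # at this point in the file, the name meow does not exist yet
except NameError as error:
    message = str(error)


def meow():
    print("meow")


print(message)  # prints: name 'meow' is not defined
```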

what's your instinct here even though

we've not tripped over this yet in

Python where does your mind go here

[Music]

perfect as smart as Python

seems to be it still makes

certain assumptions and if it hasn't

seen a keyword yet it just doesn't exist

so if you want it to exist we have to be

a little clever here I could just

flip it around like this but this

honestly isn't particularly good design

why because now if you the reader of

your code whether you wrote it or

someone else you kind of have to go

fishing now like where does this program

begin and even though yes it's obvious

that it begins on line four logically

like if the file were longer you're

going to be annoyed in fishing visually

for the right lines of code so let's

reintroduce Main and indeed this would

be a common Paradigm when you want to

start having abstractions and your own

functions just put your own code in main

so that one you can leave it up top and


two you can solve the problem we just

encountered so let me Define a function

called main that has that same Loop

meowing three times but now Watch What

Happens let me go into my terminal and

run python of meow.py enter

nothing

all right

investigate this what could explain this

symptom I have not told you the answer

yet so all you have is your instinct

assuming you've never touched python

before

What might explain this symptom where

nothing is meowing

yeah

yeah I didn't run the main function so

in C this is functionality you get for

free you have to have a main function

but heck so long as you make it it will

be called for you in Python this is just

a convention to create a main function

borrowing a very common name for it but if

you want to call that main function you

have to do it so this looks a little

weird admittedly that you have to call

your own main function now and it has to

be at the bottom of the file because

only once The Interpreter gets to the


bottom of the file have all of your

functions been defined higher up but

this solves both problems it keeps your

code that's the main part of your code

at the very top of the file so it's just

obvious to you and a TF or any reader in

the future where the program logically

starts but it also ensures that main is

not called until everything else main

included has been defined so this is

another perfect example of where we're

learning a new language for the first

time you're not going to have heard all

the answers before just apply some logic

as to like all right what could explain

this symptom start to infer how the

language does or doesn't work if I now

go and run this python of meow.py now

we're back in business and just so you

have seen it there is a quote unquote

better way of doing this that solves

different problems that we are not going

to encounter certainly in these initial

days typically you would see in online

tutorials or books something that looks

like this where you actually have a

weird conditional with multiple

underscores that's functionally the same

thing but it solves problems with

libraries if we ourselves were


implementing a library or something

similar in spirit but we're going to

keep things simpler and just write main

at the bottom because we're not going to

encounter that problem just yet all

right let's make one change to this just

to show how it's done in C the last

version of meow also took command line

argue sorry also took arguments to the

function meow so suppose that I want to

factor this out and I want to just call

meow as a better abstraction where I

just say meow this number of times and I

figure out how many times by just like

putting in number three or using get int

or something like that to figure out how

many times to say meow well now I have

to define inside my meow function an

input let's call it n and then use that

by doing this for i in range of n

let me go ahead and print out meow that

many times so again the only thing

that's different from C is we don't bother

specifying return types for any of these

functions and we don't bother specifying

the type of our arguments or our

variables so same idea is simpler in

some sense we're just throwing away

keystrokes all right let me run this one


final time python of meow.py and we

still have the same program
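Putting the pieces together, the final meow.py might look roughly like this. The lecture simply calls main() on the last line; the conditional with underscores shown at the bottom is the variant mentioned earlier that online tutorials tend to use, and it is included here as an assumption about which form a reader may prefer.

```python
def main():
    meow(3)


def meow(n):
    # No return type and no parameter type are declared, unlike in C.
    for i in range(n):
        print("meow")


# main is only called once every function above has been defined.
# Writing a bare main() here would behave the same when the file is run directly.
if __name__ == "__main__":
    main()
```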

all right let me pause here any

questions and I know this is going fast

but hopefully the C code is still

somewhat familiar

yeah

good question is there any difference

between Global and local variables short

answer yes and we would run into that

same problem if we declare a variable in

one function another function is not

going to have access to it we can solve

that by putting variables globally but

we don't have all of the features we had

in C like there's no such thing as a

constant in Python the mentality in the

python Community is if you don't want

some value to change don't touch it like

just don't screw up so there's

trade-offs here too some languages are

stronger or more defensive than that but

that too is part of the mindset with

this particular language

[Music]

oh sorry where's say a ladder

[Music]

that is an amazing segue let's come to

that in just a moment because we're

going to recreate also that Mario


example where we had like the um

question marks for the coins and the

vertical bar so let's come back to that

in a second and your question

[Music]

correct strings are immutable anytime

you seem to be modifying it as with the

lower function you're getting back a

copy so it's taking a little more memory

somewhere but you don't have to deal

with it Python's doing that for you

[Music]

say it again you don't need what

[Music]

you don't free anything so if you

weren't a big fan over the past couple

of weeks of malloc or free or memory or

addresses or all of those low-level

implementation details python is the

language for you because all of that is

handled for you automatically Java does

the same JavaScript does the same yeah

[Music]

how do you define a global variable if

there's no main function in Python

Global variables by definition always

need to be outside of main as well so

that's not a problem if I wanted to have

a function that's outside of and


therefore Global to all of these like

Global actually don't use the word

Global that's a special word in Python a

variable equals foo foo just as an

arbitrary string value that a computer

scientist would typically use that is

now global there are some caveats though

as to how you access that but let's come

back to that another time but that

problem is solvable too all right so

let's go ahead and do this to come back

to the question about the print command

let me go ahead and create a file now

called mario.py won't bother showing the

C code anymore we'll focus just on the

new language here but recall that in

Python in Mario we wanted to First do

something like this this was a random

screen from the side scroller version

one of Super Mario Brothers and we

just want to print like three hashes to

represent those three blocks well in

Python we could do something like this

print oh sorry for I in the range of

three go ahead and print out quote

unquote hash and I think this is pretty

straightforward python of mario.py we

get our three hashes you could imagine

parameterizing this now though and

getting actual user input so let's do


that let me go up here and let me go and

say from cs50 import get_int and then

let's get the input from the user so it

actually is a value n like all right

get_int the height of the column of bricks

that you want to do and then let's go

ahead and print out n hashes instead of

three so let me run this let's print out

like five hashes okay one two three four

five that seems to work too and it's

going to work for any positive value but

it's not going to work for how about

negative one that just doesn't do

anything but that seems okay but also

recall that it's not going to work if if

the user types in something weird like

oh sorry it is going to work if the user

types in something weird like cat why

we're using CS50's get_int function which

is handling all of those headaches for

us but what if the user indeed types a

negative number we're tolerating that so

that was the bug I wanted to highlight

it would be nice to re-prompt them and

reprompt them and in C what was the

programming construct we used when we

wanted to ask the user a question and

then if they didn't cooperate prompt

them again prompt them again


what was that yeah yeah do while loop

right that was useful because it's

almost the same as a while loop but

instead of checking a condition and then

doing something you do something and

then check a condition which makes sense

with user input because what are you

even going to check if the user hasn't

done anything yet you need that inverted

logic unfortunately in Python there is

no do while loop there is a for Loop

there is a while loop and frankly those

are enough to recreate this idea and the

way to do this in Python the pythonic

way which is another term of Art in the

community is to say this deliberately

induce an infinite loop while True with

capital T for True and then do what you

got to do like get an int from a user

asking them for the height of this thing

and then if that is what you want like a

number greater than zero go ahead and

break out of the loop so this is how in

Python you could recreate the idea of a

do while loop you deliberately induce an

infinite Loop so something's going to

happen at least once then if you get the

answer you want you break out of it

effectively achieving the same logic so

this is the pythonic way of doing a do


while loop let me go ahead and run

python of mario.py type in three this

time and now I get back just the three

hashes as well what if though I wanted

to get rid of

uh how about ultimately that cs50

Library function and also encapsulate

this in a function well let's go ahead

and tweak this a little bit let me go

ahead and remove this temporarily give

myself a main function so I don't make

the same mistake as I did initially

earlier and let me give myself a

function called get_height that takes no

arguments and inside of that function is

going to be that same code but I don't

want to break in this case I want to

return n so recall that if you return

from a function you're done you're going

to exit from right that point so this

would be fine you can just say return n

inside of the loop or if you would

prefer to break out you could do

something like this instead break

and then down here you could return

down here you could return n as well and

let me make one point here before we go

back up to main this is a little

different from C and this one's subtle


what have I done here that in C would

have been a bug

but is apparently not I claim in Python

super subtle this one yeah

[Music]

so so similar it's not quite that we're

using it for so it's okay not to declare

a variable with like the data type we've

addressed that before but on line nine

we're assigning n a value it seems and

then we return uh n on line 12 but

notice the indentation in the world of C

if we had declared a variable inside of

a loop on line nine it would have been

scoped to that Loop which means as soon

as you get out of that Loop like further

down in the program n would not exist it

would be local to the curly braces

therein here logically curly braces

are gone but the indentation makes clear

that n is still inside of this Loop

between lines 8 through 11 but n is

actually still in scope in Python the

moment you create a variable in Python

For Better or For Worse it is available

everywhere within that function even

outside of the loop in which you defined

it so this logic is actually okay in

Python in C recall to solve this same

problem we would have had to do


something a little hackish like this

like Define n up here on line eight so

that it exists now on line 10 and so

that it exists on line 13 that is no

longer an issue or need in Python once

you create a variable even if it's

nested nested nested inside of Some

Loops or conditionals it still exists

within the function itself
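A sketch of the do-while idiom and the scoping point together. In the lecture, get_height takes no arguments and calls get_int from the cs50 library; here a hypothetical read_int parameter stands in for it so that canned answers can be supplied instead of typed input.

```python
def get_height(read_int):
    """Keep asking until the value is positive, then return it."""
    while True:  # deliberately infinite, so the body runs at least once
        n = read_int("Height: ")
        if n > 0:
            break
    # n was first assigned inside the loop, yet it is still in scope here,
    # because Python variables are scoped to the function, not the block.
    return n


# Simulate a user who answers -1, then 0, then 3.
answers = iter([-1, 0, 3])
height = get_height(lambda prompt: next(answers))

for i in range(height):
    print("#")
```

Returning n directly from inside the loop, in place of break, is equally valid and slightly shorter.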

all right any questions then on on this

before we now run this and then get rid

of the cs50 library again

[Music]

okay so let me go ahead and get the

height from the user let's go ahead and

create a variable in main called height

let's call this get_height function and

then let's use that height value instead

of something hard coded there and let me

see if this all works now python of

mario.py hopefully I haven't messed up

but I did but this is an easy fix now

yeah

I gotta call main so again I deleted

that earlier but let me bring it back so

I'm actually calling main let me rerun

uh python of mario.py there we go height

three now it seems to be working so

let's do one last thing with Mario just


to tie together that idea now of

exceptions from before again exceptions

are a feature of python whereby you can

try to do something and if there's a

problem you can handle it in any way you

see fit previously I handled it by just

yelling at the user that that's not an

INT but let's actually use this to

re-implement CS50's own get_int function

let me throw away CS50's get_int function

and now let me go ahead

and replace uh get_int with input but

it's not sufficient to just use input

what do I have to add to this line of

code on line 8

if I want to get back an int

yeah I have to cast it to an int by

calling the int function around that

value or I could do it on a separate

line just to be clear I could also do n

equals int of n that would work too but

it's sort of an unnecessary extra line

This is not sufficient because that does

not change the value it creates the

value but then it throws it away you

need to assign it to the conventional

way to do this would probably be in one

line just to keep things nice and tight

so that works fine now if I run python

of mario.py I can still type in 3 and


all is well I can still type in negative

one because that is an INT that I am

handling what I'm not yet handling is

weird input like cats or some string

that is not a base 10 number so here

again is my traceback and notice that

here let me scroll up a little bit

here we can actually see more detail in

the traceback notice that just like in

C or just like in the debugger in vs

code you can see a few things you can

see mention of module that just means

your file main which is my main function

and get height so notice it's kind of

backwards it's top to bottom instead of

bottom up as we drew it on the board the

other day and as we envisioned stacks of

trays in the cafeteria but this is your

stack of functions that have been called

from top to bottom get height is the

most recent main is the very first

ValueError is the problem so let's try to do

let's try to do this literally except if

there's an error so what do I want to do

I'm going to go in here and I'm going to

say try to do the following

whoops try to do the following except

if there's a ValueError ValueError

then go ahead and say something like


well like before print that's not an

integer exclamation point But the

difference this time is because I'm in a

loop the user is going to have a chance

to recover from this issue so if I run

mario.py 3 still works as before if I

run mario.py and type in cat I detect it

now and because I'm still in that Loop

and because the program hasn't crashed

because I've caught so to speak the

ValueError using this line of code here

that's the way in Python to detect these

kinds of errors that would otherwise end

up being on the user's own screen if I

type in cat dog that doesn't work if I

type in the two I get my two hashes

because that's indeed an INT
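A sketch of get_height without the cs50 library, using try and except as described. The read parameter is an assumption added here: it defaults to the built-in input, but lets the example feed in canned answers so it can run unattended.

```python
def get_height(read=input):
    """Prompt until the user supplies a positive integer."""
    while True:
        try:
            n = int(read("Height: "))  # int() raises ValueError for "cat"
            if n > 0:
                return n
        except ValueError:
            # The error is caught, so the program keeps looping
            # instead of showing a traceback to the user.
            print("That's not an integer!")


# Simulate a user who types cat, then cat dog, then -1, then 2.
typed = iter(["cat", "cat dog", "-1", "2"])
height = get_height(read=lambda prompt: next(typed))

for i in range(height):
    print("#")
```

Calling get_height() with no argument would prompt at the keyboard in the usual way.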

are any questions on this and we're not

going to spend too much time on

exceptions but just wanted to show you

what's involved with getting rid of

those training wheels yeah

okay so let's do this that actually

comes to the earlier question about

printing the hashes on the same line or

maybe something like this where we have

little bricks in the sky or little

question marks let's recreate this idea

because the problem with print as was

noted earlier is you're automatically


printing out new lines but what if we

don't want that well let's change this

program entirely let me throw away all

the functions let's just go to a simpler

world where we're just doing this so let

me start fresh in mario.py I'm

not going to bother with exceptions or

functions let's just do a very simple

program to create this idea for I in

range of four this time because there

are four of these things in the sky

let's go ahead and just print out a

question mark to represent each of those

bricks

odds are you know this is not going to

end well because these are unfortunately

as you predicted on separate lines

so it turns out that the print function

actually takes in multiple arguments not

just the thing you want to print but

also some additional arguments that

allow you to specify what the default

line ending should be but what's

interesting about this is that if you

want to change the line ending to be

something like quote unquote that is

nothing instead of backslash n this is

not sufficient because in Python you can

have two types of arguments or


parameters some arguments are positional

which is the fancy way of saying it's a

comma separated list of arguments and

that's what we did all the time in C

something comma something comma

something we did in printf all the time

and in other functions that took

multiple arguments in Python you have

not only positional arguments where you

just separate them by commas to give one

or two or three or more arguments

there are also named arguments which look

weird but is helpful for reasons like

this if you read the documentation you

will see that there is a named argument

that python accepts called end and if you

set that equal to something that will be

used as the end of every line instead of

the default which the documentation will

also say is quote unquote backslash n so

this line here has no effect on my logic

at the moment but if I change it to just

quote unquote essentially overriding the

default new line character and now Run

Mario again now I get all four on the

same line there's a bit of a bug though

my prompt is not meant to be on the same

line so I can fix that by just printing

nothing but really it's not nothing

because you get the new line for free so


let me run python of

mario.py again and now we have what I

intended in the first place which was a

little something that looked like this

and this is just one example of an

argument that has a name but this is a

common paradigm in Python too to not just

separate things by commas but to be very

specific because the print function

might take 5 10 even 20 different

arguments and my God if you had to

enumerate like 10 or 20 commas you're

going to screw up you're going to get

things in the wrong order named

arguments allow you to be resilient

against that so you only specify

arguments by name and it doesn't matter

what order they are in
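A brief sketch of the named argument end, plus sep as a second named argument that print also accepts. The function name print_row is hypothetical and exists only so the row can be printed on demand.

```python
def print_row(width):
    for i in range(width):
        print("?", end="")  # end="" overrides the default "\n"
    print()  # a bare print() supplies the one newline we do want


print_row(4)

# Named arguments come after the positional ones, in any order.
print("a", "b", sep="-", end="!\n")
print("a", "b", end="!\n", sep="-")
```

Both of the last two calls print the same line, because named arguments are matched by name and not by position.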

are any questions then on on this and

the overriding of new line and to be

clear you can do something like very

weird but logically expected like this

by just changing the line ending too but

the right way to solve the Mario problem

would be just to override it to be

nothing like this

all right how about this for cool and

this is why a lot of people like python

suppose you don't really like Loops you


don't really like three line programs

because that was kind of three times

longer than it needs to be what if you

just printed out a question mark four

times

python whoops python of mario.py that

also works so it turns out that just

like the plus operator in Python can

join things together the multiply

operator is not arithmetic in this case

it actually means take this and

concatenate it four times over so that's

a way of just distilling into one line

what would have otherwise taken multiple

lines in C fewer but still multiple

lines in Python but is really now rather

succinct in Python by doing that

instead let's do one last Mario example

which looked a little something like

this if this is another part of the

Mario interface this is like a grid of

like three by three bricks for instance

so two dimensions now not just

vertical not just horizontal but now both

let's print out something like that

using hashes well how about

how do I do this so how about for i in

range of 3

then I could do for J in range of 3 just

because J comes after I and that's


reasonable for counting I could now

print out a hash symbol

uh well let's see what this does python

of mario.py okay that's just one crazy

long

column what's what do I need to fix and

where here to make this look like this

so three by three bricks instead of one

long column

any instincts

[Music]

okay so after printing three we want to

skip a line so maybe like print out a

blank line here okay let's try that I

like that instinct right print three new

line print three new line let's go ahead

and run python of mario.py okay it's

more visible what I'm doing but still

wrong what can I what's the remaining

fix though yeah

[Music]

yeah I'm getting an extra new line here

which I don't want while I'm on this row

so let me do end equals quote unquote

and now together your Solutions might

take us the whole way there uh

python of mario.py voila now we've got it in

two dimensions and even this we can

tighten up like we could just use the


little trick we learned so we could just

say print a hash times three times and

we can get rid of one of those Loops

altogether all it's doing is Autumn

whoops all it's doing is automating that

process but no I don't want to do that

what do I how do I fix this here oh I

don't think I want this anymore right

because that's giving me an extra new

line so now this program is really

tightened up same thing two lines of

code but we're now implementing the same

two-dimensional structure here
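The two versions of the grid discussed above can be sketched as follows. This is a minimal illustration rather than the exact code typed in lecture; the function names `grid_with_loops` and `grid_with_repetition` are invented here so the two approaches can sit side by side.

```python
def grid_with_loops(n):
    """Print an n-by-n grid of hashes using two nested loops."""
    for i in range(n):
        for j in range(n):
            # end="" suppresses print's automatic newline within a row
            print("#", end="")
        # a bare print() ends the row
        print()


def grid_with_repetition(n):
    """Print the same grid; "#" * n builds a whole row at once."""
    for i in range(n):
        print("#" * n)


grid_with_loops(3)
grid_with_repetition(3)
```

Both calls print the same three rows of three hashes. The second version needs no `end=""` because each `print` call already emits one complete row.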

are any questions here

on these yeah

[Music]

if print n any space say that once more

uh

[Music]

[Music]

oh

um oh yes good question I see what

you're saying so in a previous version

Let Me rewind in time when we had this I

did not put spaces the convention in

Python is not to do that why it just

starts to add too much space and this is

a little inconsistent because earlier

when we talked about like pluses or

spaces around the less than or equal


signs I did say add it here it's actually

clearer and recommended to keep them

tighter together otherwise it just

becomes harder to read where the gaps

are good observation all right let's do

how about

um

another five minute break let's do that

and then we're going to dive into some

more sophisticated problems and then

ultimately build with some audio and

visual examples as well see you in five

all right

so almost all of the examples we just

did were Recreations of what we did in

week one and recall that week one was

like our most syntax heavy week it was

when we were first learning how to

program in C but after week one we

began to focus a bit more on ideas like

arrays and other higher level constructs

and we'll do that again here condensing

some of those first early weeks into a

fewer set of examples in Python and

we'll culminate by actually taking

python out for a spin and doing things

that would be way harder to do and way

more time consuming to do in C even

more so than the speller example but how


do you go about figuring out like what

functions exist if you didn't hear it in

class you don't see it online but you

want to see it officially you can go to

the python documentation

docs.python.org here and I will disclaim

that honestly the python documentation

is not terribly user friendly Google

will often be your friend so Googling

something you're interested in to find

your way to the appropriate page on

python.org or stackoverflow.com is

another popular website as always though

the line should be Googling things like

how do I convert a string to lower case

like that's reasonable to Google or how

to convert to uppercase or how to implement

function in Python but Googling of

course things like how to implement

problem set 6 in cs50 of course crosses

the line but moving forward and really

with programming in general like Google

and stack Overflow are your friends but

the line is between the reasonable and

the unreasonable so let me officially

use the python documentation search just

to search for something like the

lowercase function like I know I can

lowercase things in Python I don't quite

remember how so let me just search for


the word lower you're going to get often

an overwhelming number of results

because Python's a pretty big language

with lots of functionality and you're

going to want to look for familiar

patterns for whatever reason str.lower

which is probably more popular or

more commonly used than these other ones

is third on the list but it's purple

because I clicked it a moment ago when

looking for it so str.lower is

probably what I want because I am

interested at the moment in lowercasing

strings when I click on that this is an

example of what Python's documentation

tends to look like it's in this General

format here's my str.lower function

this returns a copy of the string with

all the cased characters converted to

lowercase and the lowercasing algorithm

dot dot dot so that doesn't give me much

it doesn't give me sample code but it

does say what the function does and if

we keep looking you'll see mention of lstrip

which is left strip I used its

analog rstrip before right strip which

allows you to remove that is strip

from the end of a string something like

white space like a new line or even


something else and if you scroll through

string this web page here and we're

halfway down the page already if you see

my scroll bar tiny on the right there's

a huge amount of functionality built

into string objects here and this is

just Testament to just how rich the

language itself is but it's also reason

to uh to assure that the goal when

playing around with some new language

and learning it is not to learn it

exhaustively just like in English or any

human language there's always going to

be vocab words you don't know uh ways of

presenting the same information in some

language that's going to be the case

with python and what we'll do today and

this week in problem set 6 is really get

your footing with this language but you

won't know all of python just like you

won't know all of c and honestly you

won't know all of any of these languages

on your own unless you're perhaps using

them full-time professionally and even

then there's more libraries than one

might even retain themselves so let's

actually now pivot to a few other ideas

that we'll Implement in Python in a

moment let me switch back over to vs

code here and let me whip up say a


recreation of our scores example from

week two where we average like three

scores together and that was an

opportunity in week two to play with

arrays to realize how constrained arrays

are they can't grow or Shrink you have

to decide in advance but let's see

what's different here in Python so let

me do scores.py

and let me give myself an array in

Python called score sorry let me give

myself a variable in Python called

scores set it equal to a list of three

scores which are the same ones we've

used before 72 73 33 in this context

meant to be scores not ASCII values and

then let's just do the average of these

so average will be another variable and

it turns out I can do

well how did I sum these before I

probably had a for Loop to add one then

I knew how long there were turns out in

Python you can just say sum of scores

divided by the length of scores that's

going to give me my average so sum is a

function that takes a list in this case

as input and it just does the sum for

you with a for Loop or whatever

underneath the hood len gives you the


length of the list how many things are

in it so I can dynamically figure that

out now let me go ahead and print out

using print the word average and then in

curly braces the actual average close

quote all right so let's run this code

python of scores.py and there's my

average in this case

59.33333 and so forth based on the math
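The averaging step described above might look like the sketch below. The three scores come from the lecture; the exact label text inside the f-string is an assumption.

```python
# A list of three scores, as in the week 2 example
scores = [72, 73, 33]

# sum() adds the elements and len() counts them, so no explicit loop is needed
average = sum(scores) / len(scores)

# An f-string interpolates the value inside the curly braces
print(f"Average: {average}")
```

Note that `/` in Python 3 always performs floating-point division, so the fractional part of the average is kept without any casting.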

well let's actually now change this a

little bit and make it a little more

interesting and actually get input from

the user rather than hard coding this

let me go back up here and use from cs50

import get_int because I don't want to

deal with all the exceptions and the

loops like I just want to use someone

else's function here let me give myself

an empty list called scores and this is

not something we were able to do in C

right because in C if you try to make an

empty array well that's pretty stupid

because you can't add things to it it's

a fixed size so it wouldn't even let you

do that but I can just create an empty

list in Python because lists unlike

arrays are really linked lists they'll

grow and Shrink but you and I are not

dealing with all the pointers underneath

the hood Python's doing that for us so


now let's go ahead and get a whole bunch

of scores from the user I'll have about

three of them in total so for i in

range of three let's go ahead and grab a

score from the user using get_int asking

them for score and then let's go ahead

and append to the scores uh list that

particular score so it turns out that a

list and I could read the python

documentation to confirm as much lists

have a function built into them and

functions built into objects are

generally known as methods if you've

heard that term before same idea but

whereas a function kind of stands on its

own a method is a function built into

an object like a list here that's going

to achieve the same result strictly

speaking I don't need the variable just

like in C I could tighten this up and do

something like this as well but I don't

know I kind of like it this way it's

more clear to me at least that what I'm

doing here getting the score and then

appending it to the list now the rest of

the code can stay the same python of

scores.py score will be 72 73 33 and I

get back the math but now the program's

a little more Dynamic which is nice but


there's other syntax I could use here

just so you've seen it python does have

some neat syntactic tricks whereby if

you don't want to do scores dot append

you can actually say scores plus equals

this score so you can actually

concatenate lists together in Python too

just as we use plus to join two strings

together you can use plus to join two

lists together the catch is you need to

put the one score I'm adding here in a

list of its own which is kind of silly

but it's necessary so that this thing

and this thing are both lists to do this

more verbosely which most programmers

wouldn't do but just for clarity this is

the same thing as saying scores plus

this score so now maybe it's a little

more clear that scores and bracket score

plural sorry singular are both lists

themselves being concatenated or joined

together so two different ways not sure

one is better than the other this way is

pretty common but dot append is also

quite reasonable as well
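The two ways of growing a list mentioned above can be compared in a short sketch. To keep it runnable without the CS50 library, the scores are passed in as an ordinary list instead of being read with `get_int`; that substitution and the function names are assumptions made for illustration. One detail worth knowing: CPython implements lists as resizable arrays rather than as linked lists in the strict sense, but they do grow and shrink on demand as described.

```python
def collect_with_append(values):
    """Start from an empty list and add each score with the append method."""
    scores = []
    for value in values:
        scores.append(value)
    return scores


def collect_with_plus_equals(values):
    """Same result using +=; the single score must be wrapped in a list."""
    scores = []
    for value in values:
        scores += [value]  # list + list concatenation
    return scores


print(collect_with_append([72, 73, 33]))
print(collect_with_plus_equals([72, 73, 33]))
```

Writing `scores += value` without the brackets would raise a `TypeError` for an integer, because `+=` on a list expects something iterable on the right-hand side.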

all right how about another example from

week two uh this one was called uh

uppercase so let me do this in uppercase.py

though this time and let me

import from cs50 get_string again


and let me go ahead and say before will

be my first variable let me get a string

from the user asking them for a before

string and then let me go ahead and say

after just to demonstrate some changes

upper casing to this string uh let me

change my line ending to be that using

our new trick and this is where things

get cool in Python relatively speaking

if I want to iterate over all of the

characters in a string and print them

out in uppercase one way to do that

would be this for c in the before string

go ahead and print out c dot uppercase

sorry c.upper but don't end the line

yet because I want to keep these all on

the same line until I'm all done so what

am I doing python of uppercase.py let me

type in hello in all lowercase I've

just uppercased the whole string how I

first get string calling it before I

then just print out some fluffy text

that says after colon and I get rid of

the line ending just so I can kind of

line these up notice I hit the space bar

a couple times just so letters line up

to be pretty for c in before this is

new this is powerful in C I'm sorry in

Python whereby you don't have to do like


int i equals zero and i less than this

you could just say for c in the string

in question for c in before and then

here is just uppercasing that specific

character and making sure we don't

output a new line too soon but this is

actually more work than I need to do

based on what we've seen thus far like

from our agree example can I tighten

this up further can I Collapse lines

five and six maybe even seven all

together if I the goal of this program

is just to uppercase

the before string

how might I do this

yeah and back

[Music]

str.upper yeah so I could do

something like this after gets

before.upper so it's not str literally dot

upper str just represents the string in

question so it would be before.upper

but right idea otherwise and so let me

go ahead and just tweak my print

statement a little bit let me just go

ahead and print out the after variable

here after creating it so this line is

the same I'm getting a string called

before I'm creating another variable

called after and as you propose I'm


calling upper on the whole string not

one character at a time why because it's

allowed and again in Python there aren't

technically characters individually

there's only strings anyway so I might

as well do them all at once so if I

rerun the code now python of

uppercase.py now I'll type in hello in

all lowercase and oh so close I think I

can get rid of this override because I'm

printing the whole thing out at once not

character by character so now if I type

in hello before now I have an even

tighter version of the program here
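The progression just described, from uppercasing one character at a time to uppercasing the whole string, can be sketched like this. The input is hard-coded as "hello" instead of being read with `get_string`, which is an assumption made so the sketch runs without the CS50 library.

```python
before = "hello"

# Version 1: iterate over the string and uppercase each character
print("After:  ", end="")
for c in before:
    print(c.upper(), end="")  # end="" keeps the output on one line
print()

# Version 2: call upper() once on the whole string
after = before.upper()
print(f"After:  {after}")
```

Both versions print `After:  HELLO`. The second works because `upper` is a method on every `str` object, and a single character in Python is itself just a string of length one.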

all right any questions then on lists or

on strings

and what this kind of function upper

represents with its Docs

[Music]

all right so a couple other building

blocks before we start oh

where was that

to the right the right right yes thank

you

[Music]

yes do I have to create this variable

upper no I don't I could actually

tighten this up and if you really want

to see something neat inside of the


curly braces you don't have to just put

the names of variables you can put a

small amount of logic so long as it

doesn't start to look stupid and kind of

overwhelmingly complex such that it's

sort of bad design at that point I can

tighten this up like this and now run

python of uppercase.py writing hello

again and that too worked but I would be

careful about this you want to resist

the temptation of having like a long

line of code that's inside the curly

braces because it's just going to be

harder to read but absolutely you could

indeed do that too all right how about

uh command line arguments which is one

thing we introduced in week two also so

that we could actually have the ability

to take input from the user whoops so we

could actually take input from the user

at the command line so as to take

literally command line arguments these

are a little different but it follows

the same paradigm there's no main by

default and there's no def main int argc

char or we called it string argv by

default there's none of this so if you

want access to the argument vector argv

you import it and it turns out there's

another module in python or library in


Python called sys and you can import

from the system this thing called argv

so same idea different place now I'm

going to go ahead and do this let's

write a program that just requires that

the user types in a word after the

program's name or none at all so if the

length of argv equals two let's go

ahead and print out how about hello

comma argv bracket one

close quote else if they don't type two

words total at the prompt let's just say

the default like we did weeks ago hello

world so the only thing that's new here

is we're importing argv from sys and

we're using this fancy F string format

which kind of to your point too it's

it's putting more complex logic in the

curly braces but that's okay this in

this case it's a list called argv and

we're getting bracket one from it let's

do python of argv.py enter hello

world what if I do argv.py David at

the command line now I get hello David

so there's one curiosity here python is

not included in ARG V whereas in C dot

slash whatever was the first thing

the analog in Python is that the name of

your Python program is the first thing


in bracket zero which is why David is in

bracket one the word python does not

appear in the argv list just to be

clear but otherwise the idea of these

arguments is exactly the same as before
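A minimal sketch of the greeting program described above follows. The logic is wrapped in a hypothetical helper, `greeting`, so it can be exercised with different argument lists; the lecture version used `argv` directly at the top level of the file.

```python
import sys


def greeting(argv):
    """Return a greeting based on a sys.argv-style list."""
    # argv[0] is the script name, so exactly one extra word means a length of 2
    if len(argv) == 2:
        return f"hello, {argv[1]}"
    return "hello, world"


print(greeting(sys.argv))
```

Running `python argv.py David` would pass `["argv.py", "David"]` to the helper; the word `python` itself never appears in `sys.argv`.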

and in fact what you can do which is

kind of cool is because argv is a list

you can do things like this for arg in

argv go ahead and print out each

argument so instead of using a for Loop

an i and all of this if I do python of

argv.py enter it just writes the program's

name if I do python of argv.py foo it puts

argv.py and foo if I do sorry if I

do foo and bar those words all print out

if I do foo bar baz those print out too

and foo and bar and baz are like a

mathematician's x and y and z for a

computer scientist when you just need

some placeholder words so this is just

nice it reads a little more like English

and a for Loop is just much more concise

allows you to iterate very quickly when

you want something like that suppose I

only wanted the real words that the

human typed after the program's name

like suppose I want to ignore

argv.py I mean I could do something

hackish like this if arg equals argv.py

I could just ignore you know let's


invert the logic I could do this for

instance so if the arg does not equal

the program name then go ahead and print

out the word so I get foo bar and baz only

or this is what's kind of neat about

Python too let me undo that and let me

just take a slice of the array of the

list instead so it turns out if argv is

a list I can actually say you know what

go into that list start at element one

instead of zero and then go all the way

to the end and we have not seen the

syntax in C but this is a way of slicing

a list in Python so now watch what

happens if I run python of argv.py foo

bar baz enter I get only a subset of the

list starting at position one going all

the way to the end and you can even do

kind of the opposite if for whatever

reason you want to ignore the last

element you can say colon

we could say colon negative one and use

a negative number which we've not seen

before which slices off the end of the

list as well so there's some syntactic

tricks that tend to be powerful in

Python too even if at first glance you

might not need them for typical things
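The slicing syntax described above can be tried on an ordinary list. In this sketch the list is hard-coded to mimic what `sys.argv` would hold after running `python argv.py foo bar baz`; using a stand-in list instead of the real `sys.argv` is an assumption made so the output is predictable.

```python
# Stand-in for sys.argv after: python argv.py foo bar baz
argv = ["argv.py", "foo", "bar", "baz"]

# Loop over every argument, program name included
for arg in argv:
    print(arg)

# argv[1:] starts at index 1 and runs to the end, skipping the program name
for arg in argv[1:]:
    print(arg)

# argv[:-1] stops just before the last element
for arg in argv[:-1]:
    print(arg)
```

A slice always produces a new list, so the original `argv` is left unchanged by either expression.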

all right let's do one other example


with exit and then we'll start actually

applying some algorithms to make things

interesting so in one last program here

let's do exit.py just to do one more

mechanic before we introduce some

algorithms and

let's do this let's import

um from sys import argv let's now do

this let's make sure the user gives me

one command line argument so if the

length of argv does not equal 2 in

total then let's go ahead and print out

something like missing command line

argument just to explain what the

problem is and then let's do this we can

exit but I'm going to use a better

version of exit here let me import two

functions from sys turns out the better

way to do this is with sys.exit because

I can then exit specifically with

this exit code otherwise down here I'm

going to go ahead and print out

something like uh hello comma argv bracket

one same as before and then I'm going to

exit with zero so again this was a

subtle thing we introduced in week two

where you can actually have your

programs exit with some number where

zero signifies success and anything else

signifies error this is just the same


idea in Python so if I for instance just

run the program like this oops I screwed

up I meant to say exit here and exit

here let me do that again if I run this

like this I'm missing a command line

argument so let me rerun it with like my

name at the prompt so I have exactly two

command line arguments the file name and

my name hello comma David and if I do

David Malan it's not going to work

either because now the length of argv does not equal

two but the difference here is that

we're exiting with one so that special

programs can detect an error or zero in

the event of success and now there's one

other way to do this too suppose that

you're importing a lot of functions and

you don't really want to make a mess of

things and just have all of these

function names available without it

being clear where they came from let's

just import all of sys and let's just

change our syntax kind of like I

proposed for cs50 where we just prepend

to all of these library functions sys

just to be super explicit where they

came from and if there's another uh exit

or argv value that we want to import

from a library this is one way to avoid


Collision so if I do it one last time

here here missing command line argument

but David still actually works all right
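The exit-status idea can be sketched as below, using the `import sys` style so that each name is visibly qualified. As an assumption made for this sketch, the logic lives in a hypothetical `main` function that returns the status instead of calling `sys.exit` directly as the lecture version did; that keeps the example testable without terminating the interpreter.

```python
import sys


def main(argv):
    """Return 1 if the command-line argument is missing, else greet and return 0."""
    if len(argv) != 2:
        print("Missing command-line argument")
        return 1  # any nonzero status signals an error
    print(f"hello, {argv[1]}")
    return 0  # zero signals success


# In a real script, the final line would hand the status to the shell:
#     sys.exit(main(sys.argv))
status = main(["exit.py", "David"])
print(f"exit status would be {status}")
```

After such a script runs in a typical Unix shell, `echo $?` shows the status it exited with.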

only to demonstrate how we can Implement

that same idea let's now do something

more powerful like a search algorithm

like binary search I'm going to go ahead

and open up a file called numbers.py and

let's just do some searching or linear

search rather on a list of numbers

let's go ahead and do this how about

import sys as before let me give myself

a list of numbers like four six eight

two seven five zero so just a bunch of

integers and then let's do this if you

recall from week three we searched for

the number zero at the end of the

lockers on stage so let's just ask that

question in Python no need for a loop or

anything like that if 0 is in the

numbers go ahead and print out found and

then let's just exit successfully with

zero else if we get down here let's just

say print not found and then we'll

sys.exit with one

so this is where python starts to get

powerful again here's your list here is

your Loop that's doing all of the

checking for you underneath the hood

python is going to use linear search you


don't have to implement it yourself no

while loop no for Loop you just ask a

question if zero is in numbers then do

the following so that's one feature we

now get with python and get to throw

away a lot of that code we can do it

with strings too let me open in a file

called names.py instead and do something

that was even more involved in C because

we needed strcmp and the for loop and

so forth let me import sys for this file

let's give myself a bunch of names like

we did in C and those were Bill and

Charlie and Fred and George and Ginny

and two more uh Percy and lastly Ron and

recall at the time we looked for Ron and

so we had to iterate through the whole

thing doing strcmp and i plus plus

and all of that now just ask the

question if Ron is uh in names then

let's go ahead and whoops let me hide

that I hit the command too soon let me

go ahead and say print uh found as

before sys.exit zero just to indicate

success and then down here if we get to

this point we can say not found and then

we'll just sys.exit one instead so again

this just does linear search for us by

default python of names.py we found Ron


because indeed he's there and at the end

of the list but we don't need to deal

with all of the mechanics of it all
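The membership tests described above can be sketched with the same numbers and names. The helper `report` is hypothetical, introduced so one function can serve both lists; the lecture versions printed and called `sys.exit` inline.

```python
numbers = [4, 6, 8, 2, 7, 5, 0]
names = ["Bill", "Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]


def report(needle, haystack):
    """Return "Found" if needle is in haystack, else "Not found"."""
    # For a list, the in operator checks elements one by one (linear search)
    if needle in haystack:
        return "Found"
    return "Not found"


print(report(0, numbers))
print(report("Ron", names))
print(report("Hermione", names))
```

String comparison here uses `==` semantics under the hood, so there is no need for anything like C's `strcmp`.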

right let's take things one step further

in week three we also implemented the

idea of a phone book that actually

Associated keys with values but remember

the phone book in C was kind of a hack

right because we first had two arrays

one with names one with numbers then we

introduced structs and so we gave you a

person structure and then we had array

of we had an array of persons you can do

this in Python using objects and things

called classes but we can also just use

a general purpose dictionary because

just like in pset five you can

associate keys with values using a hash

table using a trie well similarly can

python just do this for us from cs50

let's import get string and now let's

give myself a dictionary of people dict

open paren close paren gives you a

dictionary or you can simplify the

syntax actually and a dictionary again

is just keys and values words and

definitions you can also just use curly

braces instead that gives me an empty

dictionary but if I know what I want to

put in it by default let's put Carter in


there with a number of plus one six one

seven four nine five one thousand just

like last time and put myself David with

plus one nine four nine four six eight

2750 and it came to my attention

tragically after class that day that we

had a bug in our little Easter egg if

today you would like to call me or text

me at that number we have fixed the code

that underlies that little Easter egg

spoiler ahead all right so this now

gives me a variable called people that's

associating keys with values there is

some new syntax here in Python not just

the curly braces but the colons and the

quotes on the left and the right this is

a way in Python of associating keys with

values words with definitions anything

with anything else and it's going to be

a super common Paradigm including in

week seven when we look at CSS and HTML

and web programming keys and values are

like this omnipresent idea in computer

science and programming because it's

just a really useful way of associating

one thing with another so at this point

in the story we have a dictionary a hash

table if you will of people associating

names with phone numbers just like a


real world phone book let's write a

program that gets a string from the user

and asks them whose number they would

like to look up

then let's go ahead and say if that name

is in the people dictionary go ahead and

print out that person's number by going

into the people dictionary and going to

that specific name within there using an

F string for the whole thing so this is

similar to in spirit to before linear

search and Dictionary lookups will just

happen automatically for you in Python

by just asking the question if name in

people and this line's just going to

print out whoever is in the people

dictionary at that name so I'm using

square brackets because here's the

interesting thing in Python just like

you can index into an array or a list in

Python using numbers 0 1 2 you can very

conveniently index into a dictionary in

Python using square brackets as well and

just to make clear what's going on here

let me go and create a temporary

variable person equals people bracket

name and then let's just or sorry let's

say number equals people bracket name

and they'll just print out the number in

question
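A sketch of the dictionary-based phone book follows. The names and digits are the ones read out in the lecture; the hyphenated formatting of the numbers, the helper name `lookup`, and the `Number:` label are assumptions made for illustration, and the name is passed as a parameter instead of being read with `get_string`.

```python
# Keys are names and values are phone numbers
people = {
    "Carter": "+1-617-495-1000",
    "David": "+1-949-468-2750",
}


def lookup(name):
    """Return a formatted number for name, or None if the name is absent."""
    if name in people:
        # Square brackets index into a dict by key, much like a list by position
        number = people[name]
        return f"Number: {number}"
    return None


print(lookup("David"))
print(lookup("Carter"))
```

Indexing with a key that is not present, such as `people["Nobody"]`, raises a `KeyError`, which is why the `in` check comes first.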
in C and previously in Python anything

with square brackets like this would

have been go to a location in a list or

an array using a number but that can

actually be a string like a word the

human has typed and this is what's

amazing about dictionaries it's not like

a big line of a big linear thing it's

this table that you can look up in one

column the name and get back in the

other column the number so let's go

ahead and run python of phonebook.py

found not that oh wait uh

that's not what's supposed to happen at

all

I think I'm in the wrong place

what's going on

print found

I am confused okay

let's run this again python of phonebook.py

what the

[Music]

okay stand by

[Music]

uh-huh


what am I not understanding here


[Music]

okay wrong Shin Carter do you see what

I'm doing wrong

[Music]

what the

say again

[Music]

oh what yeah uh I found okay we're gonna

do this one sec

[Music]

whoa okay

[Music]

um all this is coming out of the video

um so

thanks

all right I will try to figure out what

was going wrong the best I can tell it

was running the wrong program I don't

quite understand why so we will diagnose

this later I just put the file into a

temporary directory for now to run it so

let me go ahead and just run this python

of phonebook.py type in for instance my

name and there's my corresponding number

I have no idea what was just happening

but I will get to the bottom of it and

update you if we can put our finger on

it so this was just an example now of

implementing a phone book Let's now

consider what we can do that's a little


more powerful in these examples like a

phone book that actually keeps this

information around thus far these simple

phone book examples throw the

information away but using CSV files

comma separated values maybe we could

actually keep around the names and

numbers so that like on your phone you

can actually keep your contacts around

long term so I'm going to go ahead now

and do a slightly different example and

let me just hide this detail so it's not

confusing whoops I'm going to change my

prompt temporarily so let me go ahead

now and refine this example as follows

I'm going to go into

phonebook.py and I'm going to import a

whole Library called CSV and this is a

powerful one because python comes with a

library that just handles CSV files for

you a CSV file is just a file with comma

separated values and in fact to

demonstrate this let me check on one

thing here just to make this a little

more real to demonstrate this let's go

ahead and do this let me import the CSV

library from cs50 let me import get

string let me then open a file using the

open function open a file called


phonebook.csv in append format in

contrast with read format and write

format write just blows it away if it

exists append adds to the bottom of it

so I keep this phone book around just

like you might keep adding contacts to

your phone now let me go ahead and get a

couple values from the user let me say

get string and ask the user for a name

then let me get get string again

and ask the user for their number and

now let me go ahead and do this and this

is new and this is python specific and

you would only know this by following a

tutorial or reading the documentation

let me give myself a variable called

writer and ask the CSV library for a

writer to that file

then let me go ahead and use that writer

variable use a function or a method

inside of it called writerow to write

out a list containing that person's name

and number notice the square brackets

inside the parentheses because I'm just

printing a list to that particular Row

in the file and then I'm just going to

close the file so what is the effect of

all of this well let me go ahead and run

this version of phonebook.py and I'm

prompted for a name let's do Carter's


first plus one six one seven four four

nine five

one thousand

and then let's go ahead and LS notice in

my current directory there's two files

now phonebook.py which I wrote and

apparently

phonebook.csv CSV just stands for comma

separated values and it's like a very

simple way of storing data in a

spreadsheet if you will where the comma

represents the separation between your

columns there's only two columns here

name and number but because I'm writing

to this file in append mode let me run

it one more time python of phonebook.py

and let me go ahead and do David and

plus one

949-468-2750 enter and notice what

happened in the CSV file it

automatically updated because I'm now

persisting this data to the file in

question so if I wanted to Now read this

file in I could actually go ahead and do

linear search on the data using a read

function to actually read from the CSV

but for now we'll just leave it a little

simply as write and let me make one

refinement here it turns out that if


you're in the habit of re opening a file

you don't have to even close it

explicitly you can instead do this you

can instead say with the opening of a

file called phonebook.csv in append mode

calling the thing file go ahead and do

all of these lines here so the with

keyword is a new thing in Python and

it's used in a few different ways but

one of the ways it's used is to tighten

up code here and I'm going to move my

variables to the outside because they

don't need to be inside of the with

statement where the file is open this

just has the effect of ensuring that you

the programmer don't screw up and

accidentally not close your file in

fact you might recall from C Valgrind

might have complained at you if you had

a file that you didn't close you

might have had a memory leak as a result

the with keyword takes care of all of

that for you as well
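Appending rows to a CSV file inside a `with` block might look like the sketch below. Several details are assumptions made for illustration: the helper name `add_contact`, the hyphenated number format, writing to a temporary directory instead of the current one, and passing `newline=""` to `open`, which the `csv` module's documentation recommends but the lecture did not mention.

```python
import csv
import os
import tempfile


def add_contact(path, name, number):
    """Append one name/number row to the CSV file at path."""
    # "a" appends to the file; the with block closes it automatically
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, number])


# Use a throwaway directory so the sketch leaves the current directory alone
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
add_contact(path, "Carter", "+1-617-495-1000")
add_contact(path, "David", "+1-949-468-2750")

# Read the rows back to confirm that both calls persisted their data
with open(path, newline="") as file:
    rows = list(csv.reader(file))
print(rows)
```

Because the file is opened in append mode on every call, each run adds a row below the existing ones instead of overwriting the file, which is the behavior the phone book relies on.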

let me go ahead and propose that on your

phone or laptop here or online go to

this URL here where you'll find a Google

form and just to show that these csvs

are actually kind of omnipresent and if

you've ever like used a Google form or

manage a student group or something


where you've collected data via Google

forms you can actually export all of

that data via CSV files so go ahead to

this URL here

and those of you watching on demand

later will find that the form is no

longer working since we're only doing

this live but that will lead to a Google

form that's going to let everyone input

their answer to a question like what

house do you want to end up into sort of

an approximation of the Sorting Hat in

Harry Potter and via this form we will

then have the ability to export as we'll

see a CSV file

so let's give you a moment to do that

in just a moment I'll share my version

of the screen which is going to let me

actually open the file the form itself

and in just a moment I'll switch over

okay so this is now my version of the

form here where we have 200 plus

responses to a simple question of the

form what house do you belong in

Gryffindor Hufflepuff Ravenclaw or

Slytherin if I go over to responses I'll

see all of the responses in the GUI form

here so graphical user interface and we

could flip through this and it looks


like uh interestingly 40% of Harvard

students want to be in Gryffindor

um 22% in Slytherin and everyone else in

between the others but you might have

noticed if ever using a Google form this

Google spreadsheet link so I'm going to

go ahead and click that and that's going

to automatically open in this case

Google spreadsheets but you can

do the same thing with Office 365 as

well and now you see the raw data as a

spreadsheet but in Google spreadsheets

if I go to file and then I go to

download notice I can download this as

an Excel file a PDF and also a CSV comma

separated values so let me go ahead and

do that that gives me a file in my

downloads folder on my computer I'm

going to now go back to my code editor

here and what I'm going to go ahead

and do is upload this file from my

downloads folder to vs code so that we

can actually see it within here and now

you can see this open file and I'm going

to shorten its name just so it's a

little easier to read I'm going to

rename this using the mv command to just

hogwarts.csv and then we can see in the

file that there's two columns timestamp

comma house where you have a whole


bunch of time stamps when people filled

out the form with someone very early in

class and then everyone else just a

moment ago and the second value after

each comma is the name of the house well

let me go ahead here and Implement a

program in a file called hogwarts.py

that processes this data so in

hogwarts.py let's just write a program

that now reads a CSV in this case not a

phone book but everyone's Sorting Hat

information and I'm going to go ahead

and import csv and suppose I want to

answer a reasonable question ignoring

the fact that Google's GUI or graphical

user interface can do this for me I just

want to count up who's going to be in

which house so let me give myself a

dictionary called houses that's

initially empty with curly braces and

let me pre-create a few keys let me say

Gryffindor is going to be initialized

to zero Hufflepuff will be initialized

to zero as well Ravenclaw will be

initialized to zero and finally

Slytherin will be initialized to zero so

here's another example of a dictionary

or a hash table just being a very

general purpose piece of data you can


have keys and values the keys in this

case are the houses the values are

initially zero but I'm going to use this

instead of like four separate variables

to keep track of everyone's answer to

this form so I'm going to do this with

opening hogwarts.csv in read mode

not append I don't want to change it I

just want to read it as file as my

variable name let's go ahead and create

a reader this time

that is using the reader function in the

CSV Library by opening that file I'm

going to go ahead and ignore the first

line of the file because recall that the

first line is just time stamp and house

I want to get the real data so this next

function is just a little trick for

ignoring the first file the first line

of the file then let's do this for every

other Row in the reader that is line by

line get the current person's house

which is in Row Bracket one this is what

the CSV Reader Library is doing for us

it's handling all of the reading of this

file it figures out where the comma is

and for every Row in the file it hands

you back a list of size 2 in bracket

zero is the timestamp and bracket one is

the house name so in my code I can say


house equals Row Bracket one I don't

care about the timestamp for this

program and then let's go into my

dictionary called houses plural index

into it at the house location by its

name and increment that zero to one

and now at the end of this block of code

that has the effect of iterating over

every line in the file updating my

dictionary in four different places

based on whether someone typed

Gryffindor or uh Slytherin or anything

else and notice that I'm using the name

of the house to index into my dictionary

to essentially go up to this little

cheat sheet and change the zero to a one

the one to a two the two to a three

instead of having like four separate

variables which would just be much

more annoying to maintain down at the

bottom let's just print out the results

for each house in those houses iterating

over the keys therein by default in

Python let's go ahead and print out an F

string that says the current house has

the current uh count and count will be

the result of indexing into houses for

that given house and let me close my

quote
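
The counting program just described can be sketched as follows. The sample rows and timestamps are made up and read from an in-memory string instead of the real hogwarts.csv, and the function name count_houses is an assumption for illustration, so the numbers printed here are not the class's actual results.

```python
import csv
import io

# made-up stand-in for hogwarts.csv: a header row, then timestamp,house rows
SAMPLE = """Timestamp,House
10/25/2021 13:00:01,Gryffindor
10/25/2021 13:00:04,Slytherin
10/25/2021 13:00:09,Gryffindor
10/25/2021 13:00:12,Ravenclaw
10/25/2021 13:00:15,Hufflepuff
"""


def count_houses(file):
    """Count how many rows name each house (hypothetical helper)."""
    # one dictionary instead of four separate counter variables
    houses = {"Gryffindor": 0, "Hufflepuff": 0, "Ravenclaw": 0, "Slytherin": 0}
    reader = csv.reader(file)
    next(reader)  # skip the header row (Timestamp,House)
    for row in reader:
        house = row[1]  # row[0] is the timestamp, row[1] is the house
        houses[house] += 1
    return houses


houses = count_houses(io.StringIO(SAMPLE))
for house in houses:  # iterating over a dictionary yields its keys
    count = houses[house]
    print(f"{house}: {count}")
```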
so let's run this to summarize the data

hogwarts.py 140 of you answered

Gryffindor 54 Hufflepuff 72 Ravenclaw

and 80 of you Slytherin and that's just

my now way of code and this is oh my God

so much easier than C to actually

analyze data in this way and one of the

reasons that Python's so popular for

data science and analytics more

generally is that it's actually really

easy to manipulate data and run

analytics like this and let me clean

this up slightly it's a little Annoying

that I just have to know and trust that

the house name is in bracket zero

bracket one and timestamp is in bracket

zero let's clean this up there's

something called a dictionary reader in

the CSV library that I can use instead

capital D capital R this means I can

throw away this next thing because what

a dictionary reader does is it still

returns to me every Row from the file

one after the other but it doesn't just

give me a list of size 2 representing

each row it gives me a dictionary and it

uses as the keys in that dictionary

array time stamp and house for every Row

in the file which is just to say it

makes my code a little more readable


because instead of doing this little

trickery bracket one I can say quote

unquote bracket house with a capital H

because it's capitalized in the Google

form itself so the code now is just

minorly different but it's way more

resilient especially if I'm using Google

spreadsheets and I'm moving the columns

around or doing something like that

where the numbers might get messed up

now I can run this on hogwarts.py again

and I get the same answers but I now

don't have to worry about where those

individual columns are
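
To make the resilience point concrete, here is a small sketch, assuming made-up rows and a hypothetical helper named houses_in. It reads the same data twice with csv.DictReader, once with the columns in their original order and once with them swapped, and looks up the house by its column name both times.

```python
import csv
import io

# the same two made-up responses, with the columns in two different orders
ORIGINAL = (
    "Timestamp,House\n"
    "10/25/2021 13:00:01,Gryffindor\n"
    "10/25/2021 13:00:04,Slytherin\n"
)
REORDERED = (
    "House,Timestamp\n"
    "Gryffindor,10/25/2021 13:00:01\n"
    "Slytherin,10/25/2021 13:00:04\n"
)


def houses_in(text):
    """Return the House value of every row (hypothetical helper)."""
    # DictReader consumes the header row itself, so no next(reader) is needed
    reader = csv.DictReader(io.StringIO(text))
    # each row is a dictionary keyed by column name, capital H as in the header
    return [row["House"] for row in reader]


original_houses = houses_in(ORIGINAL)
reordered_houses = houses_in(REORDERED)
print(original_houses)
print(reordered_houses)
```

With a plain reader and row[1], the second file would have produced timestamps instead of house names.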

all right any questions on those

capabilities there

it's a teaser of sorts for some of

the manipulation we'll do in pset six

all right so some final examples and

Flair to Intrigue with what you can do

with python I'm going to actually switch

over to a terminal window on my own Mac

so that I can actually use audio a

little more effectively so here's just a

terminal window on Mac OS I before class

have pre-installed some additional

python libraries that won't really work

in vs code in the cloud because they

require audio that the browser won't


necessarily support but I'm going to go

ahead and write an example here that

involves writing a speech-based program

that actually does something with speech

and I'm going to go ahead and import a

library that again I pre-installed

called python text to speech and I'm

going to go ahead and per its

documentation give myself a speech

engine by using that library's init

function for initialize I'm then going

to use this engine's say function to do

something fun like hello world and then

I'm going to go ahead and tell this

engine to run and wait while it says

those words all right I'm going to save

this file I'm not using vs code at the

moment I'm using another popular program

that we used in cs50 back in the day called

Vim which is a command line program

that's just in this black and white

window let me go ahead now and run

python of speech.py and

hello world

all right so it's a little computerized

but it is speech that has been

synthesized from this example let's

change it a little bit to be more

interesting let's do something like this

let's ask the user for their name like


what's your name question mark and then

let's use a little F string and say not

hello world but hello to that person's

name let me save my file run

python of speech.py enter

David hello David all right so it

pronounced my name okay might struggle

with different names depending on the

phonetics but that one seemed to be okay

let's do something else with python

using similarly just a few lines of code

uh let me go into

uh today's examples and I'm going to go

into a folder called detect whoops a

folder called faces.py sorry faces

and in this folder that I've written in

advance are a few files detect.py

recognize.py and two photos

office.jpg and toby.jpg if you're

familiar with the show here for instance

is the cast photo from the office here

so here's a photo as input suppose I

want to do something very Facebook style

where I want to analyze all of the faces

or find detect all of the faces in there

well let me go ahead and show you a

program I wrote in advance

that's not terribly long much of it is

actually comments but let's see what I'm


doing I'm importing the pillow Library

again to get access to images I'm

importing a library called face

recognition which I downloaded and

installed in advance but it does what it

says according to its documentation you

go into that library and you call a

function called load image file to load

something like office.jpg and then you

can use a line of code like this call a

function called face locations passing

the image as input and you get back a list

of all of the faces in the image and

then down here a for Loop that iterates

over all of those face locations and

inside of this loop I just do a bit of

trickery I figure out the top right

bottom and left corners of those

locations and then using these lines of

code here I'm using that image library

to just draw a box essentially and the

code looks cryptic honestly I would have

to look this up to write it again but

per the documentation this just draws a

nice little box around the image so let

me go ahead and zoom out here and run

this now on office.jpg

all right

it's analyzing analyzing and you can see

in the sidebar here here's the original


and here is every face that my what 10

lines of python code found within that

file what's a face presumably the

library is looking for something maybe

without a mask that has two eyes a nose

and a mouth and some kind of arrangement

some kind of pattern so it would seem

pretty reliable at least on these fairly

easy to read faces here what if we want

to look for someone specific for

instance someone that's always getting

picked on well we could do something

like this recognize.py which is taking

two files as input that image and the

image of one person in particular and if

you're trying to find Toby in a crowd

here I conflated the program sorry this

is the version that draws a box around

the given face here we have Toby as

identified why because that program

recognize.py has a few more lines of

code but long story short it

additionally loads as input

toby.jpg in order to recognize that

specific face and that specific face is

a completely different photo but it

looks similar enough to the person

that it all worked out okay let's do one

other that's a little sensitive to


microphones let me go into

um how about my listen folder here which

is available online too and let's just

run python of listen0.py I'm

going to type in like David

oh sorry no I'm gonna

[Music]

hello world

[Music]

oh no that's the wrong version okay I

look like an idiot okay hello there we

go hello to you too and if I say goodbye

I'm talking to my laptop like an idiot

okay

now it's detecting what I'm saying here

so this first version of the program is

just using some relatively simple if elif

elif and it's just asking for input

forcing it to lower case and that was my

mistake with the first example and then

I'm just checking is hello in the user's

words is how are you in the user's words

didn't see that but it's there is

goodbye in the user's words now let's do

a cooler version using a library just by

looking at the effect python of

listen1.py

hello world

[Music]

huh let's do version two of this that


uses a audio a speech to text Library

hello world

okay so now it's artificial intelligence

now let's do something a little more

interesting the third version of this

program that actually analyzes the words

that are said

hello world my name is David how are you

[Music]

okay so that time it not only analyzed

what I said but it plucked my name out

of it let's do two final examples this

one will generate a QR code let me go

ahead and write a program called qr.py

that very simply does this let me import

a library called os let me import a

library called qrcode let me grab an

image here that's qrcode.make and let

me give you like the URL of like a

lecture video on YouTube or something

like that with this ID

let me just type this so I don't get it

wrong

okay so if I now use this URL here of a

video on YouTube making sure I haven't

made any typos I'm now going to go ahead

and do two lines of code in Python I'm

going to first save that as a file

called qr.png which is a two dimensional


barcode or QR code and indeed I'm going

to use this format and I'm going to use

the os.system function to open qr.png

automatically and if you'd like to take

out your phone at this point you can see

the result of my barcode that's just

been dynamically generated

hopefully from afar that will scan

[Music]

I think that's an appropriate line to

line take it for cs50 we will see you

next time

[Applause]

[Music]


and this is week seven the week here of

Halloween indeed special thanks to

cs50's own Valerie and her mom for

having created this very festive scenery

and all past ones as well today we pick

up where we left off last time which

recall we introduced Python and that was

our big transition from C where suddenly

things started to look new again

probably syntactically but also probably

things hopefully started to feel easier

well with that said like problem sets


certainly added some challenges and you

did some new things but hopefully you're

beginning to appreciate that

with python just a lot more stuff is

easier to do you get more out of the box

with the language itself and that's

going to be so useful over the coming

weeks as we transition further to

introducing something called databases

today uh web programming next week and

the week after so that by term's end and

perhaps even for your final project you

really are building something from

scratch using all of these various tools

somehow together so before we do that

though today let's consider what we

weren't really able to do last week

which was actually create and store data

ourselves right in Python we've played

around with the CSV comma separated

values library and you've been able to

read in csvs from disk so to speak that

is from files in your programming

environment but we haven't necessarily

started saving data persisting data

ourselves and that's a huge limitation

because pretty much all of the examples

we've done thus far with a couple of

exceptions have involved my providing


input at the keyboard or even vocally

but then nothing happens to it it

disappears The Moment the program quits

because it was only being stored in

memory but today we'll start to focus

all the more on storing things on disk

that is storing things in files and

folders so that you can actually write

programs that remember what it is the

human did last time and ultimately you

can actually make mobile or web apps

that actually begin to grow and grow and

grow their data sets as might happen if

you get more and more users for instance

on a website to play then with this new

capability of being able to write files

let's go ahead and just collect some

data in fact those of you here in person

if you want to pull up this URL on your

phone or laptop that's going to lead you

to a Google form and that Google form is

going to ask you

in just a moment for really just your

favorite TV show and it's going to ask

you to categorize it according to a

genre like comedy or drama or action or

musical or something like that and this

is useful because if you've ever used a

Google form before or Microsoft's

equivalent with Office 365 it's a really


useful mechanism of just collecting data

from users and then ultimately putting

it into a spreadsheet form so this is a

screenshot of the form that those of you

here in person or tuning in on Zoom are

currently filling out it's asking only

two questions What's the title of your

favorite TV show and what are one or

more genres into which your TV show

Falls and I'll go ahead and pivot now to

the view that I'll be able to see as the

person who created this form which is

quite simply a Google spreadsheet Google

forms has this nice feature if you've

ever noticed that allows you to export

your data to a Google spreadsheet and

then from there we can actually grab the

file and download it to my own Mac or

your own PC so that we can actually play

around with the data that's come in so

in fact let me go ahead and slide over

to this the live Google spreadsheet and

you'll see probably a whole bunch of

familiar TV shows here all coming in and

if we keep scrolling and scrolling and

scrolling only 46 47 there we go up to

50 plus already if you need that URL

again here if you're just tuning in you

can go to this URL here


and in just a moment we'll have a bunch

of data with which we can start to

experiment I'll give you a moment or so

there

[Music]

all right

let me hang in there a little longer

okay we've got over 100 submissions good

good even more coming in now

and we can see them coming in live here

let me switch back to the spreadsheet

the list is growing and growing and

growing and in just a moment let me give

Carter a moment to help me export it in

real time Carter just give me a heads up

when it's reasonable for me to download

this file

all right and I'll begin to do this very

slowly so I'm going to go up to the file

menu if you've never done this before

download you can download a whole bunch

of formats one in Excel but more simply

and the one we'll start to play with

here is comma separated values so CSV

files we use this past week why are they

useful now that you've played with them

or used them in past real world like

what's the utility of a CSV file versus

something like Excel for instance

why CSV in the first place


any instincts yeah

okay so storage is compelling a simple

text file with ASCII or Unicode text is

probably pretty small I like that other

thoughts

yeah well said it's just a simple text

format but using conventions like commas

you can represent the idea of columns

using new lines backslash n's invisibly

at the end of your lines you can create

the idea of rows so it's a very simple

way of implementing what we might call a

flat file database it's a way of storing

data in a flat that is very simple file

that's just pure ASCII or Unicode text
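
As a rough illustration of that flat-file idea, the sketch below treats a CSV as nothing more than text, splitting on newlines to get rows and on commas to get columns. The rows are made up, and the naive split is only for illustration on fields that contain no commas of their own; it is not how the csv library parses files.

```python
# a made-up flat-file database: one string, with \n ending each row
text = (
    "Timestamp,title,genres\n"
    "10/25/2021 13:00:00,The Office,Comedy\n"
    "10/25/2021 13:00:05,Friends,Comedy\n"
)

# repr makes the normally invisible \n characters visible
print(repr(text))

# newlines give the idea of rows, commas give the idea of columns
rows = [line.split(",") for line in text.splitlines()]
for row in rows:
    print(row)
```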

and more compellingly I dare say is that

with a CSV file it's completely portable

something's portable in the world of

computing if it means you can use it on

a Mac or a PC running this operating

system or this other one and portability

is nice because if I were to download an

Excel file there'd be a whole bunch of

people in this room and online who

couldn't download it because they

haven't bought Microsoft Excel or

installed it or if they have a Mac they

might not or if it's a DOT numbers file

in the Mac World a PC user might not be


able to download it so a CSV is indeed

very portable so I'm going to go ahead

and download quite simply the CSV

version of this file that's going to put

it onto my own Mac's downloads folder and

let me go ahead here and in just a

moment let me just simplify the name

because it actually downloads it at a

pretty small a pretty large name and

give me just one moment here and you'll

see that indeed on my Mac I have a file

called

favorites.csv I shorten the name real

quick and now what I'm going to do is go

over to vs code and in vs code I'm going

to open my File Explorer and if I

minimize my window here for a moment

it's a handy feature of vs code is that

you can just drag and drop a file for

instance into your Explorer and voila

it's going to automatically upload it

for you so let me go ahead in full

screen here close my Explorer

temporarily close my terminal window and

you'll see here a CSV file favorites.csv

and the first row by convention has

whatever the columns were in Google

spreadsheets or Office 365 in Excel

online timestamp comma title comma

genres then we have timestamps which


indicates when people started submitting

looks like a couple of people were super

eager to get started an hour or two ago

and then you have the title next after a

comma but there's kind of a curiosity

after that sometimes I see the genre

like comedy comedy comedy but sometimes

it's like crime comma drama or action

comma crime comma drama and those things

are quoted and yet I didn't do any

quotes you probably didn't type any

quotes where are those quotes coming

from in the CSV file

why are they there if we infer yeah

[Music]

yeah so you have a kind of a corner case

if you will because if you're using

commas as you describe to separate your

data into what are effectively columns

well you've kind of painted yourself

into a corner if your actual data has

commas in itself so what Google has done

what Microsoft does what Apple does is

they quote any strings of text that

themselves have commas so that these are

now sort of English grammatical commas

not CSV specific commas so it's a way of

escaping your data if you will and

escaping just means to call out a symbol


in a special way so it's not

misinterpreted as something else all

right so this is all to say that we now

have all of this data with which we can

play in the form of what we'll start

calling a flat file database so suppose

I wanted to now start manipulating this

data and I want to store It ultimately

indeed in this CSV format how can I

actually start to read this data Maybe

clean it up maybe do some analytics on

it and actually figure out what's the

most popular show among those who

submitted here over the past few minutes
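
The quoting behavior described a moment ago can be reproduced with Python's own csv library, which by default quotes only the fields that need it. This is a sketch with made-up rows (the show titles and timestamps are assumptions); it writes to an in-memory buffer rather than a real favorites.csv.

```python
import csv
import io

# write three rows into an in-memory buffer instead of a file on disk
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Timestamp", "title", "genres"])
writer.writerow(["10/25/2021 13:00:00", "The Office", "Comedy"])
writer.writerow(["10/25/2021 13:00:05", "Breaking Bad", "Crime, Drama"])

text = buffer.getvalue()
print(text)  # only the field containing a comma comes out wrapped in quotes

# splitting the third line on every comma wrongly yields four pieces
naive = text.splitlines()[2].split(",")

# csv.reader honors the quotes and hands back three fields per row
rows = list(csv.reader(io.StringIO(text)))
print(rows[2])
```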

well let me go ahead and close this let

me go ahead then and open up for

instance just my terminal window and

let's code up a file called

favorites.py and let's go ahead and

iteratively start simple by just opening

up this file and printing out what's

inside of it so you might recall that we

can do this by doing something like

import csv to give myself some CSV

reading functionality then I can go

ahead and do something like with open

the name of the file that I want to open

in read mode quote unquote r means to

read it and then I can say as file or

whatever other name for a variable to


say that I want to open this file and

essentially store some kind of reference

to it in that variable called file then

I can give myself a reader and I can say

csv.reader passing in that file as input

and this is the magic of that Library it

deals with the process of opening it

reading it and giving you back something

that you can just iterate over like with

a for loop I do want to skip the first

row and recall that I can do this next

reader is this little trick that just

says ignore the first row because the

first one's special it said timestamp

title genres that's not your data that

was mine but this means now that I've

skipped that first row everything

Hereafter is going to be the title of a

show that you all like so let me do this

for Row in the reader let's go ahead and

print out the title of the show each of

you typed in how do I get at the title

of the show each of you typed in it's

somewhere inside of row row recall is a

list so what do I want to type next in

order to get at the title of the current

row

just as a quick check here

what I want to type to get at the title


of the row keeping in mind again that it

was timestamp

title genres yeah

so Row Bracket one would give me the

second column zero index that is the one

in the Middle with the title so this

program isn't that interesting yet but

it's a quick and dirty way to figure out

all right what's my data look like let

me actually just do a little bit of a

check here and see if it contains the

data I think it does let me maximize my

terminal window here let me run python

of favorites.py hitting enter and you'll

see now a purely textual list of all of

the shows you all seem to like here

but what's noteworthy about it specific

shows aside judgment aside as to

people's TV tastes like what's

interesting or noteworthy about the data

that might create some problems for us

if we start to analyze this data and

figure out what's the most popular how

many people like this or that

[Music]

what do you think yeah

[Music]

yeah there might be user errors or just

sort of stylistic differences that give

the appearance that one show is


different from the other for instance

here let's see if I can see an example

on the screen here yeah so friends here

is an all lower case friends here is

capitalized no no big deal we can sort

of mitigate that but this is just a tiny

example of where data in the real world

can get messy fast and that probably

wasn't even a typo it was just sort of

just someone not caring as much to

capitalize it and that's fine your users

are going to type what they're going to

type so let's see if we can't now begin

to get at more specific data and maybe

even clean some of this data up let me

go back into my uh file called

favorites.py here and let's actually do

something a little more user friendly

for me instead of a reader recall that

there was this dictionary reader that's

just a little more user friendly and it

means I can type in DictReader

here passing in the same file but now

when I iterate over this reader variable

what is each row when using a dict

reader instead of a reader recall and

this is just a particular peculiarity of

the CSV Library this gives me back not a

list of cells but what instead


which is marginally more user-friendly

for me yeah

yeah I can now use Open Bracket quotes

and the title because what's coming back

now is a dict object that is a

dictionary which has keys and values the

keys of which are the column headings

the values of which are the data I

actually care about so this is just

marginally better because one it's just

way more obvious to me the author of

this code what it is I'm getting at I

mean I don't remember what column the

title was was it zero was it one was it

two that's something you're going to

forget over time and God forbid someone

changes the data by just dragging and

dropping the columns in Excel or apple

numbers or Google spreadsheets that's

going to break all of your numeric

indices and so a dictionary reader is

arguably just better design because it's

more robust against changes and

potential errors like that now the

effect of this program of this change

isn't going to be really any different

if I run python of favorites.py voila I

get all of the same results but I've now

not made any assumptions as to where

each of the columns actually is


numerically all right well let's go

ahead and now filter out some duplicates

because there's a lot of commonality among

some of the shows here so let's see

if we can't filter out duplicates

if I'm reading a CSV file top to bottom

what intuitively might be the like logic

I want to implement to filter out

duplicates

it's not going to be quite as simple as

a simple function that does it for me

I'm going to have to build this

but logically if you're reading a file

from top to bottom how might you go

about in python or just any context

getting rid of duplicate values

yeah what do you think

[Music]

sure I could use a list and I could add

each title to the list but first check

if I've put this into the list before so

let's try a little something like that

let me go ahead and create a a variable

at the top of my program here I'll call

it titles for instance initialize to an

empty list Open Bracket close bracket

and then inside of my Loop here instead

of printing it out let's start to make a

decision so if the current row's title

is in the titles list I don't want to put

it there and actually let me invert the

logic so I'm doing something proactively

so if it's not the case that row

bracket title is in titles then go ahead

and do something like titles dot

append the current row's title and recall

that we saw dot append uh a week or so

ago where it just allows you to append

to the current list and then what can I

do at the very end after I'm all done

reading the whole file why don't I go

ahead and say for title in titles go

ahead and print out the current title so

it's two Loops now and we can come back

to the quality of that design but let me

go ahead here and rerun python of

favorites.py let me increase the size of

my terminal window so we can focus just

on this and hit enter and now

[Music]

I'm just skimming

I don't think I'm seeing duplicates

although I am seeing some near

duplicates
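
The list-based filtering just described can be sketched like this. The sample responses are made up and read from an in-memory string rather than the real favorites.csv; they are chosen so that exact duplicates disappear while near duplicates, which differ only in capitalization or a leading space, survive.

```python
import csv
import io

# made-up stand-in for favorites.csv
SAMPLE = """Timestamp,title,genres
10/25/2021 13:00:00,Friends,Comedy
10/25/2021 13:00:03,The Office,Comedy
10/25/2021 13:00:05,friends,Comedy
10/25/2021 13:00:08,Friends,Comedy
10/25/2021 13:00:11, Friends,Comedy
"""

titles = []
reader = csv.DictReader(io.StringIO(SAMPLE))
for row in reader:
    # only append a title the first time it is seen
    if row["title"] not in titles:
        titles.append(row["title"])

# second loop: print whatever survived the filter
for title in titles:
    print(title)
```

The exact repeat of Friends is dropped, but the lowercase version and the one with a leading space are still treated as different strings.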

for instance there's friends again and

if we keep going and going and going and

going

there's friends again yeah oh interesting so


that's curious

that I seem to have multiple friends and

I have this one here too so how might we

clean this up further I like your

instincts and that's it's a step closer

to it what are we gonna have to do to

really filter out those near duplicates

any thoughts

aware

[Music]

yeah what are the common mistakes to

summarize we could ignore the

capitalization altogether and maybe just

force everything to lower case or

everything to uppercase doesn't matter

which but let's just be consistent and

for those of you who might have

accidentally or instinctively hit like

the space bar at the beginning of your

input or even at the end we can strip

that off too stripping white space is a

common thing just to clean up user input

so let me go back into my code here and

let me go ahead and tweak the title a

little bit let me say that the current

title inside of this Loop is not going

to be just the current rows title but

let me go ahead and strip off from the

left and the right implicitly any white


space if you read the documentation for

the strip function it does just that it

gets rid of white space to the left

white space to the right and then if I

want to force everything to maybe

uppercase I can just uppercase the

entire string and remember what's handy

about python is you can chain some of

these function calls together by just

using dots again and again and that just

takes whatever just happened like the

white space got stripped off then it

additionally up cases the whole thing as

well so now I'm going to check whether

this specific title is in titles and if

not I'm going to go ahead and append

that title massaged into this different

format if you will so I'm throwing away

some information like I'm sacrificing

all of the nuances of your uh grammar

and input to the form itself but at

least I'm trying to canonicalize that is

standardize what the data actually looks

like so let me go ahead and run python of

favorites.py again and hit enter oh

and this is just user error maybe you

haven't seen this before this just looks

like a

mistake on my part I meant to say not

even uppercase that's completely wrong


the function is called upper now that I

think of it all right let's go and

increase the size of the terminal window

again run python of favorites.py and now

you know it's a little more overwhelming

to look at because it's not sorted yet

and it's all capitalized but

I don't think I'm seeing multiple

friends so to speak there's one friends

up here and that's it I'm back up at my

prompt already so we seem now to be

filtering out duplicates now before we

dive in further and clean this up uh

further than this what else could we

have done well it turns out that in

python too you often do get a lot of

functionality built into the language

and I'm kind of implementing myself the

idea of a set if you think back to

mathematics a set is typically something

with a bunch of values that has

duplicates filtered out recall that

python already has this for us and we

saw it really briefly when I whipped up

the dictionary implementation a couple

of weeks back so I could actually Define

my titles to be a set instead of a list

and this would just modestly allow me to

refine my code here such that I don't


have to bother checking for duplicates

anyway I can instead just say something

like titles dot add the current uh title

like this you know marginally better

design if you know that a set exists or

you're just getting more functionality

out of this alright so let's clean the

data up further we've now gone ahead and

fixed the problem of case sensitivity we

threw away white space in case someone

hit the space bar with some of the input

let's go ahead now and sort these things

by the titles themselves so instead of

just printing out the titles in the same

order you all inputted them but in

filtering out duplicates as we go let me

go ahead and use another function in

Python you might not have seen which is

literally called sorted and will take

care of the process of actually sorting

titles for you let me go ahead and

increase the font size of my terminal

run python of favorites.py and hit enter

and now you can really see how many of

these shows start with the word the or

do not now it's a little easier to wrap

our minds around just because it's at

least sorted alphabetically but now you

can really see some of the differences

in people's inputs so far so good but a


few of you decided to stylize Avatar in

three different ways here Brooklyn 99s a

couple of different ways here and I

think if we keep going we'll see further

and further variances that we did not

fix by focusing on white space and

capitalization alone so already here

this is only what 100 plus 200 rows

already real world data starts to get

messy quickly and that might not bode

well when we actually want to keep

around real data from real users you can

imagine an actual website or a mobile

application dealing with this kind of

thing at scale well let's go ahead and

do this let's actually figure out the

popularity of these various shows by now

iterating over my data and keeping track

of how many of you inputted a given

title we're going to ignore the problems

like Brooklyn Nine-Nine and uh the

Avatar sorry uh yeah uh Avatar where

there were things that were different

Beyond just white space and

capitalization but let's go ahead and

keep track of now how many of you

inputted each of these titles
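
The cleanup steps so far (strip, upper, a set instead of a list, then sorted) might be combined like this. It is only a sketch under assumptions: the sample rows are invented, and the title column name is assumed as before.

```python
import csv
import io

# Hypothetical stand-in for favorites.csv.
SAMPLE_CSV = """title
Friends
The Office
friends
 The Office
Avatar
Avatar: The Last Airbender
"""

titles = set()
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    # Canonicalize: drop surrounding whitespace, then force uppercase.
    title = row["title"].strip().upper()
    # A set ignores values it already holds, so no explicit check is needed.
    titles.add(title)

# sorted returns a new alphabetized list; the set itself has no order.
for title in sorted(titles):
    print(title)
```

Case and whitespace variants collapse into one entry, while the two spellings of Avatar stay separate, since they differ by more than capitalization or spacing.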

so how can I do this I'm still going to

take this approach of iterating over the


CSV file from top to bottom

we've used a couple of data structures

thus far a list to keep track of titles

or a set to keep track of titles but

what if I now want to keep around a

little more information for each title I

want to keep around how many times I've

seen it before

I'm not doing that yet I'm throwing away

the total number of times I see these

shows how could I start to keep that

around

we could use a dictionary and how what

elaborate on that

perfect really good instincts using a

dictionary insofar as it lets us store

keys and values that is associate

something with something else this is

why a dictionary or hash tables more

generally are such a useful practical

data structure because they just let you

remember stuff in some kind of

structured way so if the keys are going

to be the titles I've seen the values

could be the number of times I've seen

each of those titles and so it's kind of

like just having a two column a two

column table on paper for instance if I

were going to do this on a piece of

paper I might just have two columns


here where maybe this is the title that

I've seen and this is the count over

here this is in effect a dictionary in

Python it's two columns keys on the left

values on the right and this if I can

Implement in code will actually allow me

to store this data and then maybe do

some simple arithmetic to figure out

which is the most popular so let's do

this let me go ahead and change my

titles to not be a list not be a set

let's have it be a dictionary instead

either doing this or more succinctly two

curly braces that are empty gives me an

empty dictionary automatically what do I

now want to do I think most of my code

can stay the same but down here I don't

want to just blindly add titles to the

data structure I somehow need to keep

track of the count

and unfortunately if I just do this

let's do titles bracket title

plus equals one

this is a reasonable first attempt at

this because what am I doing if titles

is a dictionary and I want to look up

the current title therein the syntax

for that like before is titles bracket

and then the key you want to use to


index into the dictionary it's not a

number in this case it's an actual word

a title and you're just gonna increment

it by one and then eventually I'll come

back and finish my second Loop and do

things uh in terms of the order but for

now let's just keep track of the total

counts let me go ahead and increase my

terminal window let me do python of

favorites.py and hit enter and huh How I

Met Your Mother is giving me a key error

what does that mean

and why am I seeing this and in fact

just just to give a little bit of a

breadcrumb here let me zoom out here let

me open up the CSV file again real

quickly and wow we didn't even get past

the second row in the file or the first

show in the file notice that How I Met

Your Mother somewhat lowercased is the

very first show therein what's your

instinct for why this is happening

I don't have a starting point right I'm

adding one to what like I'm blindly

indexing into the dictionary using a key

How I Met Your Mother that doesn't yet

exist in the dictionary and so python

throws what's called a key error because

the key you're trying to use just

doesn't exist yet so logically how could


we fix this
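
The failure just described can be reproduced in isolation. This is a small sketch with a hard-coded title rather than the CSV loop.

```python
titles = {}  # empty dictionary: no keys yet
title = "HOW I MET YOUR MOTHER"

caught = None
try:
    # += first reads titles[title], and that read fails on a missing key.
    titles[title] += 1
except KeyError as error:
    caught = error
    print("KeyError:", error)
```

Because the lookup fails before any assignment happens, the dictionary is still empty afterwards.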

we're close we got like half of the

problem solved but I'm not handling the

obvious now case of nothing being there

yeah

creating a

creating the counter itself so maybe I

could do something like this let me

close my terminal window and let me ask

a question first if the current title is

in the dictionary already

if title in titles that's going to give

me a true false answer it turns out then

I can safely say titles bracket title

plus equals one and recall this is just

shorthand notation for the same thing as

in C title plus one whoops typo don't do

that

that's the same thing as this but it's a

little more succinct just to say plus

equals one else if it's logically not

the case that the current title is in

the title's dictionary then I probably

want to say titles bracket title equals

feel free to just shout it out

zero I just have to put some value there

so that the key itself is also there all

right so now that I've got this going on

let me go ahead and undo my sorting


temporarily and now let me go ahead and

do this I can as a quick check let me go

ahead and just run the code as is python

of favorites.py I'm back in business

it's printing correctly no key errors

but it's not sorted and I'm not seeing

any of the counts let me just quickly

add the counts and there's a couple of

ways I could do this I could say print

out the title and then maybe let's do

something like uh how about just comma

titles bracket title so I'm going to

print two things at once both the

current title in the dictionary and

whatever its value is by indexing into

it let me increase my terminal window

let me run python of favorites.py enter

and okay huh

none of you said a whole lot of TV shows

it seems what's the logical error here

what did I do wrong if I look back at my

code here

[Music]

yeah why so many zeros

[Music]

exactly to summarize I initialized the

count to zero the first time I saw it

but I should have initialized it at

least to one because I just saw it or I

should change my code a bit so for


instance if I go back in here the

simplest fix is probably to initialize

to one because on this iteration of the

loop obviously I'm seeing this title for

the very first time or I could change my

logic a little bit I could do something

like this instead if the current title

is not in titles then I could initialize

it to zero and then I could get rid of

the else and now blindly index into the

titles dictionary because now on line 11

I can trust that lines 9 and 10 took

care of the initialization for me if

need be which one is better I don't know

this one's a little nicer maybe because

it's one line fewer but I think both

approaches are perfectly reasonable and

well designed but the key thing no pun

intended is that we have to make sure

the key exists before we presume to

actually increment oh this is wrong

don't this is this is incorrect out what

did I do wrong

okay yes there we go so otherwise

everyone would have liked this show once

and no matter how many people said the

same thing now the code is as it should

be so let me go ahead and open up my

terminal window again let me run python


of favorites.py and now we see more

reasonable counts some of the shows

weren't that popular there's just ones

and maybe twos but I bet if we sort

these things we can start to see a

little more uh detail so how else can we

do this Well turns out when dealing with

um when dealing with a dictionary like

this let's go ahead and just sort the

titles themselves so let's reintroduce

the sorted function as I did before but

no other changes let me go ahead now and

run python of favorites.py now it's just

a little easier to wrap your mind around

it because at least it's alphabetical

but it's not sorted by value it's sorted

by key but sure enough if we scroll down

there's something down here for instance

like uh let's see the office that's

definitely going to be a contender for

most popular 15 responses but let's see

what's actually going to Bubble up to

the top unfortunately the sorted

function only sorts dictionaries by keys

by default not by values but it turns

out in Python if you read the

documentation for the sorted function

you can actually pass in other arguments

that tell it how to sort things

for instance if I want to do things in


reverse order I can add a second

parameter to the sorted function called

reverse and it's a named parameter you

literally say reverse equals true so

that the position of it in the comma

separated list doesn't matter if I now

rerun this after increasing my terminal

window you'll see now that it's in the

opposite order now adventure and and

with an E is at the bottom of the output

instead of the top
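
Putting the counting logic and the reverse sort together might look like this. Treat it as a sketch: the rows are invented, and the initialize-to-zero-then-increment variant is used, which is one of the two approaches discussed.

```python
import csv
import io

# Hypothetical stand-in for favorites.csv.
SAMPLE_CSV = """title
Friends
The Office
friends
 The Office
The Office
Community
"""

titles = {}
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    title = row["title"].strip().upper()
    # Make sure the key exists before incrementing it.
    if title not in titles:
        titles[title] = 0
    titles[title] += 1

# Iterating a dictionary yields its keys, so this sorts by title,
# and reverse=True flips the alphabetical order.
for title in sorted(titles, reverse=True):
    print(title, titles[title])
```

The counts are right, but the order still comes from the keys, not from the values.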

how can I tell it to sort by values

instead of by key well let's go

ahead and do this let me go ahead and

Define a function I'm just going to call

it f to keep things simple and this F

function is going to take a title as

input and given a given title it's going

to return the value of that title so

actually maybe a better name for this

would be get_value or we could come

up with something else as well the

purpose of the get_value function to be

clear is to take as input a title and

then return the corresponding value why

is this useful well it turns out that

the sorted function in Python according

to its documentation also takes a key


parameter where you can pass in crazy

enough the name of a function that it

will use in order to determine what it

should sort by by the key or by the

value or in other cases even other types

of data as well so there's a curiosity

here though that's very deliberate key

is the name of the parameter just like

reverse was the name of this other

parameter the value of it though is not

a function call it's a function name

notice I am not doing this no

parentheses I'm instead passing in

get_value the function I wrote by its name

and this is a feature of python and

certain other languages just like

variables you can actually pass whole

functions around so that they can be

called for you later on by someone else
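
Passing a function by name as the key argument could be sketched like this. The counts below are invented for illustration (only the 15 for the office is mentioned in the lecture), and get_value is written with an underscore because a Python name cannot contain a space.

```python
# Hypothetical counts; in the real program these come from the CSV loop.
titles = {"COMMUNITY": 7, "THE OFFICE": 15, "FRIENDS": 9}


def get_value(title):
    # Given a key (a title), return its value (the count).
    return titles[title]


# key=get_value passes the function itself (no parentheses);
# sorted calls it once per title to decide the ordering.
ranked = sorted(titles, key=get_value, reverse=True)

for title in ranked:
    print(title, titles[title])
```

Writing key=get_value() instead would call the function immediately with no argument and raise a TypeError, which is why the parentheses are left off.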

so what this means is that the sorted

function written by python they didn't

know what you're going to want to sort

by today but if you provide them with a

function called get_value or anything

else now their sorted function will use

that function to determine okay if you

don't want to sort by the key of the

dictionary what do you want to sort by

this is going to tell it to sort by the

value by returning the specific value we


care about so let me go ahead now and

rerun this after increasing my terminal

python of favorites.py enter here we have

now an example of all of the titles you

all typed in albeit forced to

uppercase and with any white

space thrown out and now the office

is an easy win over friends versus

Community versus Game of Thrones

Breaking Bad and then a lot of variants

thereafter so there's a lot of steps to

go through like this you know isn't that

bad once you've done it once and you

know what these functions are and you

know that these parameters exist but

it's a lot of work I mean that's 17

lines of code just to analyze a CSV file

that you all created by way of those

Google form submissions but it took me a

lot of work just to get simple answers

out of it and indeed that's going to be

among the goals for today ultimately is

how can we just make this easier it's

one thing to learn new things in Python

but if we can avoid writing code or this

much code that's going to be a good

thing and so one other Technique we can

introduce here that does allow us to

write a little less code is we can


actually get rid of this function it

turns out in Python if you just need to

make a function but it's going to be

used and then essentially thrown away

it's not something you're going to be

reusing in multiple places it's not like

a library function that you want to keep

around you can actually just do this you

can change the value of this key

parameter to be what's called a Lambda

function which is a fancy way of saying

a function that technically has no name

it's an anonymous function why does it

have no name well it's kind of stupid

that I invented this name on line 13. I

used it on line 16 and then I never

again used it right if it's only

being used in one place why bother

giving it a name at all so if you

instead in Python say Lambda and then

type out the name of the parameter you

want this Anonymous function to take you

can then say go ahead and return this

value now notice the

inconsistencies here when you use this

special Lambda keyword that says hey

python give me an anonymous function a

function with no name it then says

python this Anonymous function will take

one parameter notice there's no


parentheses and that's deliberate if

confusing it just tightens things up a

little bit

notice that there's no return keyword

which similarly tightens things up a bit

albeit inconsistently but this line of

code I've just highlighted is actually

identical in functionality to this but

it throws away the word def it throws

away the word get_value it throws away

the parentheses and it throws away the

return keyword just to tighten things up

and it's well suited for a problem like

this where I just want to pass in a tiny

little function that does something

useful but it's not something I'm going

to reuse it doesn't need multiple lines

to take up space it's just a nice

elegant one-liner that's all the Lambda

function does it allows you to create an

anonymous function right then and there

and then the function you're passing it

to like sorted will use it as before and

indeed if I run python of favorites.py

after growing my terminal window the

result is exactly the same and we see at

the bottom here all of those small

results
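
The named function and the lambda version can be compared side by side. This is a sketch with invented counts, and the names by_name and by_lambda are made up for the comparison.

```python
# Hypothetical counts; in the real program these come from the CSV loop.
titles = {"COMMUNITY": 7, "THE OFFICE": 15, "FRIENDS": 9}


def get_value(title):
    return titles[title]


# Named function passed by name.
by_name = sorted(titles, key=get_value, reverse=True)

# Anonymous function: one parameter, no parentheses around it,
# no return keyword, just the expression to hand back.
by_lambda = sorted(titles, key=lambda title: titles[title], reverse=True)

for title in by_lambda:
    print(title, titles[title])
```

Both calls produce the same ordering; the lambda only saves defining a name that would be used once.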

are any questions then


on this syntax on these ideas the goal

here has been to write a Python program

that just starts to analyze or clean up

data like this

[Music]

could you use the Lambda if it's just

returning immediately it's really meant

for one line of code generally so you

don't use the return keyword you just

say what it is you want to return

[Music]

good question could you do more in that

one line if it's got to be a more

involved algorithm yes but you would

just ultimately return the value in

question in short if it's getting at all

sophisticated you don't use the Lambda

function in Python you go ahead and

actually just Define a name for it even

if it's a one-off name JavaScript

another language we'll look at in a few

weeks makes heavier use I dare say of

Lambda functions and those that can

actually be multiple multiple lines but

python does not support that that

instinct

all right so let's go ahead and do one

other thing office was clearly popping

out of the code here quite a bit let's

go ahead and write a slightly different


program that maybe just focuses on the

office for the moment just focuses on

the office so let me go ahead and throw

most of this code away up until this

point when I'm inside of my inner loop

and let me go ahead and I don't even

want the global variable here all I want

to do is focus on the current title how

could I detect if someone likes the

office well I could say something like

uh how about this so counter equals zero

we'll just focus on the office if title

equals equals the office I could then go

ahead and say counter

uh plus equals one I don't need a key

there's no dictionary involved now it's

just a simple integer variable and then

down here I'll say something like uh uh

number of people who like the office is

whatever this value is and I'll put in

counter and curly braces and then I'll

turn this whole thing into an F string

all right let me go ahead and run this

python of favorites.py enter number of

people who like the office is 15. all

right so that's great but let's go ahead

now and deliberately muddy the data a

bit all of you were very nice in that

you typed in the office but you could


imagine someone just typing office for

instance maybe there maybe there and

many people might just write office you

could imagine didn't happen here but

suppose it did and probably would have

if we had even more and more submissions

over time now let's go ahead and rerun

this program no changes to the code now

only 13 people like the office so let's

fix this the data is now as I mutated it

to have a couple of offices and many the

offices how can I change my python code

to now count both of those situations

what could I change up here

in order to improve this situation any

thoughts

yeah so I could just ask two questions

like that if title equals equals the office or

title equals equals uh just office for

instance and I still don't have to

worry about capitalization I don't have

to worry about spaces because I at least

threw that all away now I can go ahead

and rerun this code let me go over it

run it a third time okay so we're back

up to 15. so I like that

um but this could you can imagine this

not scaling very well right like Avatar

had three different permutations and

there were some others if we dug deeper


that there might have been more variants

could we do something a little more

general purpose well we could do

something like this if uh office in the

title this is kind of a cool thing you

can do with python it's very

english-like just ask the question

albeit tersely this interesting just got

me into trouble now all of a sudden

we're up to 16. does anyone know what

the other one is

what office

[Music]

oh interesting yes so they hit V and

okay

okay someone did that sure so the v office

um we so okay this one's actually gonna

be hard to correct for like I can't

really think of a general well

this is I mean this is actually a good

example of like data gets messy fast and

you could imagine doing something where

okay we could have like 26 conditions if

someone said the a office or the B

office or right you could imagine doing

that but then there's surely going to be

other typos that are possible so that's

actually a hard one to fix but it


turns out we got lucky and now this is

actually the accurate count
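
The three ways of counting just tried (exact match, two matches joined with or, and a substring check with in) can be compared on a small list. The raw titles below are invented, THEVOFFICE is an assumed rendering of the typo, and count_matches is a hypothetical helper that wraps the counter loop.

```python
# Hypothetical raw inputs, including a stray space and a typo.
raw_titles = ["The Office", "the office", "Office", " office", "Thevoffice", "Friends"]
cleaned = [title.strip().upper() for title in raw_titles]


def count_matches(titles, matches):
    # Same counter pattern as in the lecture, with the test passed in.
    counter = 0
    for title in titles:
        if matches(title):
            counter += 1
    return counter


exact = count_matches(cleaned, lambda t: t == "THE OFFICE")
either = count_matches(cleaned, lambda t: t == "THE OFFICE" or t == "OFFICE")
substring = count_matches(cleaned, lambda t: "OFFICE" in t)

print(f"number of people who like the office is {substring}")
```

The substring check is the most forgiving of the three, which is how it also picks up the typo; it would equally match any unrelated title that happens to contain the word.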

um but the data is itself messy let me

show another way that just adds another

tool to our toolkit it turns out that

there's this feature in many programming

languages python among them called

regular expressions and this is actually

a really powerful technique that we'll

just scratch the surface of here but

it's going to be really useful actually

maybe toward final projects in web

programming anytime you want to clean up

data or validate data and actually just

to make this clear give me a moment

before I switch screens here and let me

open up a Google form from scratch give

me just a moment to create something

real quick

if you've never noticed this before when

creating a Google form you can do like a

a question and if you want the user to

type in something very specific as a

short text answer like this you might

know that there's toggles like this in

Google's world like you can require it

or you can do response validation like

you could say what's your email

and then you could say something like

text is an email so here's an example in

google forms how you can validate users' input

but a feature most of you have probably

never noticed or cared about or used is

this thing called a regular expression

where you can actually Define a pattern

and I could actually re-implement that

same idea by doing something like this I

can say let the user type in anything

represented by dot star then an at sign

then something else then a literal

period Then for instance something else

so it's very cryptic admittedly at first

glance but this means any character zero or

more times this means any character zero or

more times this means a literal period

because apparently dot means any

character in the context of these

patterns then this thing means any

character zero or more times so I should

actually be a little more nitpicky you

don't want zero or more times you want

one or more times so this with a plus

means any character one or more times so

there has to be something there and I

think I want the same thing here one or

more times one or more times or Heck if

I want to restrict this form in some


sense to edu addresses I could change

that last thing to literally dot edu and so

long story short even though this looks

I'm sure pretty cryptic there's this

sort of mini language built into Python

and JavaScript and Java and other

languages that allows you to express

patterns in a standardized way and this

pattern is actually something we can

Implement in code too and let me switch

back to python for a second just to do

the same kind of idea let me um so I'll

go back to my code here let me put up

for instance a summary of what it is you

can do and here's just a quick summary

of all of the available

uh some of the available symbols a

period represents any character dot star

or dot asterisk means zero or more

characters so the dot means anything so

it can be a or nothing it can be b or

nothing it can be a b a b c it can be

any combination of zero or more

characters change that to a plus and you

now Express one or more characters

question mark means something is

optional uh caret symbol means start

matching at the beginning of the user's

input dollar sign means stop matching

at the end of the user's input so we

won't play with all of these just now

but let me go over here and actually

tackle this office problem let me go

ahead and import a new library called

the regular expression Library import re

and then down here let me say this if

re dot search

this pattern uh let's just search for

office quote unquote in the current

title then we're going to go ahead and

increase increase the counter so it

turns out that the regular expression

library has a function called search

that takes as its first argument a

pattern and then as its second argument

the string you want to analyze for that

pattern so it's sort of look for a

needle in this Haystack from left to

right let me go ahead now and run this

version of the program enter and now I

screwed up because I forgot my colon but

that's old stuff enter

huh number of people who like the office

is now zero so this seems like a big

thank you big step backwards

what did I do wrong

yeah

yeah so my I forced all my input to


uppercase so I probably need to do this

so we'll come back to other approaches

there let me rerun it now okay now we're

back up to 16. but I could even let's

say I could tolerate just the office how

about this or how about something like

or the office let me do this instead and

let me use these other special

characters this caret sign means the

beginning of the string this dollar sign

weirdly represents the end of the string

I'm adding in some parentheses just like

in math just to add another symbol here

the or symbol here and this is saying

start matching at the beginning of the

user string check if the beginning of

the string is office or the beginning of

the string is the office and then you

better be at the end of the string so

they can't keep typing words before or

after that input let me go ahead and

rerun the program and now we're down to

15 which used to be our correct answer

but then we notice the all the V office

yes how can we deal with that it's going

to be Messier to deal with that but I

could how about if I tolerate any

character represented by dot in between

the and office now if I rerun it now I

really have this expressive capability
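
The progression of patterns just walked through can be sketched with re.search. The titles are invented and already uppercased, count is a hypothetical helper around the counter loop, and the email pattern at the end is only an approximation of the form validation idea, with the caret and dollar sign added here as an assumption so the whole string has to match.

```python
import re

# Hypothetical titles, already stripped and uppercased.
cleaned = ["THE OFFICE", "OFFICE", "THEVOFFICE", "THE OFFICE US", "FRIENDS"]


def count(pattern):
    counter = 0
    for title in cleaned:
        # re.search(pattern, string) returns a match object or None.
        if re.search(pattern, title):
            counter += 1
    return counter


lowercase = count("office")              # matching is case-sensitive
anywhere = count("OFFICE")               # pattern may appear anywhere
anchored = count("^(OFFICE|THE OFFICE)$")  # whole string, two alternatives
tolerant = count("^(OFFICE|THE.OFFICE)$")  # dot allows any one character

# One or more characters, an at sign, one or more characters, a literal dot, edu.
edu_pattern = r"^.+@.+\.edu$"
is_edu = bool(re.search(edu_pattern, "student@example.edu"))
not_edu = bool(re.search(edu_pattern, "student@example.com"))

print(lowercase, anywhere, anchored, tolerant)
```

The dot in the last office pattern matches the space in THE OFFICE as well as the stray letter in the typo, which is why a single pattern covers both.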


so this is only to say there are so many

ways in languages in general to solve

problems and some of these tools are

more sophisticated than others this is

one that you've actually probably

glanced at but never used in the context

of Google forms for years if you're in

the habit of creating these for student

groups or other activities but it's now

something you can start to leverage and

we're just scratching the surface of

what's actually possible with this but

let's now do one final example just

using some python code here and let's

actually write a program that's a little

more general purpose that allows me to

search for any given title and figure

out its popularity so let me go ahead

and simplify this let's get rid of our

regular expressions

let's go ahead and continue capitalizing

the title and let's go ahead to at the

beginning of this program and first ask

the user for the title they want to

search for so title equals let's ask the

user for input which is essentially the

same thing as our cs50 get_string

function ask them for the title and then

whatever they type in let's go ahead and


strip white space and uppercase the

thing again and now inside of my loop I

could say something like this uh if the

current rows title after stripping white

space and forcing it to uppercase too

equals the user's title then go ahead

and maybe increment a counter so I still

need that counter back so let me go

ahead and Define this maybe in here

counter equals zero and then at the very

end of this program let me go ahead and

print out just the popularity of

whatever the human typed in so again the

only difference is I'm asking the human

for some input this time I'm

initializing my counter to zero then I'm

searching for their title in the CSV

file by doing the same massaging of the

data by forcing it to uppercase and

getting rid of the white space so now

when I run python of favorites.py enter

I could type in the office all lowercase

even and now we're down to 13.

[Music]

13.

oh that's correct because I'm the one

that went in and removed those the

keywords a bit ago if we fixed those we

would be back up to 15 if we had added


support for the V office we would be up

to 16 as well all right any questions

then on these various manipulations and

if you're feeling like oh my God this is

so much python code just to do simple

things that's the point and indeed even

though it's a powerful language and can

solve these kinds of problems we had to

write almost 20 lines of code just to

ask a single question like this but any

questions on how we did this

or on any of these building blocks along

the way

anything here
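
The final search program might be sketched as below. To keep the example runnable without a terminal, the query is passed as an argument; in the lecture's version it would come from input. The sample rows and the popularity function name are assumptions made for illustration.

```python
import csv
import io

# Hypothetical stand-in for favorites.csv.
SAMPLE_CSV = """title
The Office
 the office
Office
Friends
THE OFFICE
"""


def popularity(query, csv_text):
    # Massage the query the same way as every row, so the two compare equal.
    title = query.strip().upper()
    counter = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["title"].strip().upper() == title:
            counter += 1
    return counter


# Interactive use would look like: popularity(input("Title: "), SAMPLE_CSV)
print(popularity("the office", SAMPLE_CSV))
```

Rows that say just Office are not counted here, since this version compares whole titles after cleanup and no longer uses the regular expression.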

no all right that was a lot let's take a

five minute break here when we come back

we'll do it better

so we are back and the rest of today is

ultimately about how can we store and

manipulate and change and retrieve data

more efficiently than we might by just

writing raw code this isn't to say that

you shouldn't use Python to do the kinds

of things that we just did and in fact

it might be super common if you're

getting a lot of like messy input from

users that you might want to clean it up

and maybe the best way to do that is to

write a program so that step by step you


can make all of the requisite changes

and fixes like we did with the office

for instance again and again and reuse

that code especially if more and more

submissions are coming through but

another theme of today ultimately is

that sometimes there are different if

not better tools for the same job and in

fact now at this point in the term as we

begin to introduce not just python but

in a moment a language called SQL and

next week a language called JavaScript

and the week after that synthesizing a

whole lot of these languages together is

to just kind of paint a picture of like

how you might decide what the trade-offs

are between using this tool or this tool

or this other tool because

undoubtedly you can solve problems

moving forward in many different ways

with many different tools so let's give

you another tool one with which you can

Implement a proper relational database

what we just saw in the form of CSV

files are what we might call flat file

databases again just a very simple file

flat in that there's no hierarchy to it

it's just like rows and columns and that

is all ultimately storing ASCII or

Unicode text a relational database


though is something that's actually

closer to a proper spreadsheet program

right like a CSV is an individual sheet

if you will from a spreadsheet when you

export it if you had multiple sheets in

a spreadsheet you would have to export

multiple csvs and that gets annoying

quickly in code if you have to open up

this CSV this CSV all of which represent

different sheets or tabs in a proper

spreadsheet a relational database is

more like a spreadsheet program that you

a programmer now can interact with you

can write data to it you can read data

from it and you can have multiple sheets

AKA tables storing all of your data so

whereas Excel and Numbers and Google

spreadsheets are meant to be used

really by humans with their Mouse and

their keyboard clicking and pointing and

manipulating things graphically a

relational database using a language

called SQL is one in which the

programmer has similar capabilities but

doing so in code specifically using a

language called SQL and at a scale

that's much grander than spreadsheets

alone in fact if you try on your Mac or

PC to open a spreadsheet that's got tens


of thousands of rows it'll probably work

fine hundreds of thousands of rows

millions of rows no way like at some

point your Mac or PC is going to

struggle to open particularly large data

sets and that too is where proper

databases come into play and proper

languages for databases come into play

when it's all about scale and indeed

most any mobile app or web app today

that you or someone else might write

should probably plan on lots of data if

it's successful so we need the right

tools for that problem so fortunately

even though we're about to learn yet

another language it only does four

things fundamentally known by this silly

acronym crud

uh SQL this language for databases

supports the ability to create data read

data update data and delete data like

that's it there's a few more keywords

that exist in this language called SQL

that we'll soon see but at the end of

the day even if you're starting to feel

like this is a lot very quickly it all

boils down to these four basic

operations and the four commands in SQL

if you will functions in a sense that

Implement those four ideas happen to be


these they're almost the same but with

some slight variance the ability to

create or insert data is the C the

ability to select data is the r or read

update is the same delete is the same

but drop is also a keyword as well so

we'll see these and a few other keywords

in SQL that at the end of the day just

allow you to create read update and delete data

using verbs if you will like these so

to do that what's the syntax going to be

well we won't get into the weeds too

quickly on this but here's a

representative syntax of how you can

create using this language called SQL in

your very own database a brand new table

right this is so easy in Excel in Google

spreadsheets and Apple Numbers you want

a new sheet you like click the plus

button you get a new tab you give it a

name and boom you're done in the world

of programming though if you want to

create the analog of that spreadsheet in

the computer's memory you create

something called a table like a sheet

that has a name and then in parentheses

has one or more columns but unlike

Google spreadsheets and Apple Numbers and

Excel you have to decide as the


programmer what types of data you're

going to be storing in each of these

columns now even though Excel and Google

spreadsheets and Numbers do allow you

to format or present data in different

ways it's not strongly typed data like

it is for instance when we were using C

and heck even in Python there's

underlying data types even if you don't

have to type them explicitly databases

you're going to want to know are you

storing integer years are you storing

real numbers or floats are you storing

text why because especially as your data

scales the more hints you give the

database about your data the more

performant it can be the faster it can

help you get at and store that data so

types are about to be important again

but there's not going to be that many of

them fortunately now how can I go about

converting for instance some real data

like that from you my favorites.csv file

into a proper relational database well

it turns out that using SQL I can do

this in vs code on my own Mac or PC or

in the cloud here by just importing the

CSV into a database we'll see eventually

how to do this manually for now I'm

going to use more of an automated


process so let me go over to vs code

here let me type LS to see where we left

off before I had two files favorites.csv

which I downloaded from Google

spreadsheets recall that I made a couple

changes we deleted a couple of those

from the file for the office but this is

the same file as before and then we have

favorites.py which we'll set aside for

now I'm going to go ahead now and run a

command sqlite3 so in the world of

relational databases there's many

different products out there many

different software that implements the

SQL language Microsoft has their own

there's something called MySQL

that's been very popular for years

Facebook for instance used it early on

uh PostgreSQL Microsoft Access server

Oracle and maybe a whole bunch

of other product names you might have

encountered over time which is to say

there's many different types of tools

and servers and software in which you

can use SQL we're going to use a very

lightweight version of the SQL language

today called SQLite this is the

version of SQL that's generally used on

iPhones and Android devices these days


if you download an app that stores data

like your own contacts typically is

stored using SQLite because it's

fairly lightweight but you can still

store hundreds thousands even tens of

thousands of pieces of data even using

this lightweight version thereof sqlite3

is like version three of this tool we're

gonna go ahead and run sqlite3 with a

file called favorites.db it's

conventional in the world of SQLite to

name your file something.db I'm going to

create a database called favorites.db

once I'm inside of the program now I'm

going to go ahead and enter CSV mode

again not something you have to memorize

just something you can look up as needed

and then I'm going to import

favorites.csv into a table that is a

sheet if you will called favorites as

well now I'm going to hit enter and I'm

going to go ahead and exit the program

altogether and type LS now I have three

files in my current directory the CSV

file the python file from before and now

favorites.db but if I did this right all

of the data you all typed into the CSV

file has now been loaded into a proper

database where I can now use this SQL

language to access it instead so let's


go ahead again and run sqlite3 of

favorites.db which now exists and now at

the SQLite prompt I can start to play

around and see what this data is for

instance I can look by typing dot schema

at what the schema is of my data what's

the design now no thought was put into

the design of this data at the moment

because I automated the whole process

once we start creating our own databases

we'll give more thought to the data

types and the columns that we have but

we can see what SQLite presumed I

wanted just by importing the data by

default what the import command did for

me a moment ago is essentially the

syntax it automated the process of

creating a table if it doesn't exist

called favorites and then notice in

parentheses it gave me three columns

timestamp title and genres which were

inferred obviously from the CSV all

three of which have been decreed to be

text again once we're more comfortable

we'll create our own tables choose our

own types and column names but for now I

just automated the whole process just to

get us started by using this built-in

import command as well
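For readers following along in Python, the effect of that automated import can be approximated with the standard library's csv and sqlite3 modules. This is only a sketch under assumptions: the sample rows are made up, the database lives in memory rather than in favorites.db, and every column is declared as TEXT to mirror what the import decided.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for favorites.csv (assumption: same three columns).
SAMPLE_CSV = """timestamp,title,genres
10/18/2021 9:00:00,How I Met Your Mother,Comedy
10/18/2021 9:00:05,The Office,"Comedy, Drama"
10/18/2021 9:00:09,the office,Comedy
"""

# ":memory:" keeps the sketch self-contained; a real run would pass "favorites.db".
db = sqlite3.connect(":memory:")

# Every column is TEXT, mirroring the schema the automated import produced.
db.execute(
    "CREATE TABLE IF NOT EXISTS favorites (timestamp TEXT, title TEXT, genres TEXT)"
)

# Read each CSV row as a dictionary and insert it with placeholders.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = [(row["timestamp"], row["title"], row["genres"]) for row in reader]
db.executemany("INSERT INTO favorites VALUES (?, ?, ?)", rows)
db.commit()

imported = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
print(imported)
```

The question-mark placeholders let the database library handle quoting, which matters once titles contain commas or apostrophes.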


all right so what now can I begin to do

well if I wanted to for instance start

playing around with data therein I might

execute a couple of different commands

um one of which

let me find the right one here one of

which would be

select select being one of our most

versatile tools to select data from this

database so if I have these three

columns here timestamp title and genres

suppose I want to select all of the

titles doing that earlier in Python

required importing the CSV Library

opening the file creating a reader or a

dict reader iterating over every row

adding every title to a dictionary or

just printing it out and dot dot dot right

there's like a dozen or so lines of code

when we first began now how about this

SELECT title FROM favorites semicolon

done
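To make the comparison concrete, here is a hedged sketch of running that one-line query from Python through the standard sqlite3 module. The table contents are invented for illustration and the database is in memory, so nothing here depends on the real favorites.db.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (timestamp TEXT, title TEXT, genres TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [
        ("t1", "How I Met Your Mother", "Comedy"),  # made-up rows
        ("t2", "The Office", "Comedy, Drama"),
    ],
)

# One SQL statement replaces the open-file, make-reader, loop-over-rows boilerplate.
titles = [row[0] for row in db.execute("SELECT title FROM favorites")]

# Selecting two columns only means listing both names, separated by a comma.
pairs = db.execute("SELECT title, genres FROM favorites").fetchall()

print(titles)
print(pairs)
```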

so now with this particular language the

output is very textual and it's

simulating what it looks like if it were

more graphical by creating this table so

to speak SELECT title FROM favorites is

a distillation in a different language

called SQL of all the lines of code I

wrote early on when we first started


playing with favorites.py SQL is

therefore optimized for reading and

creating and updating and ultimately

deleting data so here's a perhaps a

better tool for the job once you have

the data tossing it into a more powerful

versatile format might allow you now to

get more work done more quickly without

having to reinvent the wheel someone

else has figured out how to select data

like this what more can I do here well

let me go ahead and pull up in a moment

just a little bit of a cheat sheet here

give me one second to find this

[Music]

so suppose I want to now select data a

little more powerfully so here's what I

just did in a canonical way so select

typically works like this you select

columns from a specific table semicolon

unfortunately stupid semicolons are back

select columns from table then is the

sort of generic form of what I just did

more specifically I selected one column

called title from favorites favorites is

the name of the table semicolon ends my

thoughts suppose I wanted to get two

things like the genres that each of you

inputted I could instead do SELECT title


comma genres FROM favorites and then a

semicolon and enter it's going to look a

little ugly on my screen because some of

these titles and some okay one of you

really went all out with Community

you can see that it's just wrapping in

an ugly way but it's just now showing me

two columns if we scroll up to the very

top again the leftmost of one Black

Mirror went all out too thank you and

now okay we're gonna have to clean some

of these up Game of Thrones good comedy

yes

keep going keep going keep going so now

we've selected two of the columns that

we care about there it is okay so it's

crazy wide because of all of those

genres but it allows me to select

exactly the data I want let's go back to

the titles though and perhaps start

playing around with some modifiers here

for instance it turns out using SQL

there's a lot of functionality built

into the language you've got a lot of

functions similar to Excel or Google

spreadsheets we can have formulas SQL

provides you with some of the same

heuristics that allow you to apply

operations like these on entire columns

for instance you can take averages count


the total get the distinct values Force

things to lower case uppercase Min and

Max and so forth so let's try distinct

for instance let me go back to my

terminal and let's say select how about

the distinct titles from the favorites

table enter I didn't bother selecting

the genres because I want it to be a

little prettier and you can see here

that we have just the distinct titles

except for issues of formatting right so

white space is going to be an issue

again capitalization is going to be a

thing again so there's a trade-off I

mean one of the things I was doing in

Python was forcing everything to

uppercase and then getting rid of white

space but we could combine some of these

I could do something like force every

title to uppercase then get the distinct

value and that's actually going to get

rid of some of those values as well and

again I did it all in one simple line

that was fast so let me pull up at the

bottom of the screen again I selected

DISTINCT UPPER of title FROM favorites and

that did everything for me at once in

just one breath suppose I want to get

the total number of counts of titles how


about select uh count of all of those

titles from uh favorites semicolon enter
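The three functions just mentioned (DISTINCT, UPPER, COUNT) can be tried on a tiny made-up table. This is a sketch, not the lecture's data: the four titles below are assumptions chosen so that capitalization differences show up in the counts.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (title TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?)",
    [("The Office",), ("the office",), ("THE OFFICE",), ("Friends",)],
)

# DISTINCT alone treats differently capitalized titles as different values.
distinct_titles = db.execute("SELECT DISTINCT title FROM favorites").fetchall()

# Forcing uppercase first collapses the capitalization variants.
distinct_upper = db.execute("SELECT DISTINCT UPPER(title) FROM favorites").fetchall()

# COUNT returns a one-row, one-column result holding the total.
total = db.execute("SELECT COUNT(title) FROM favorites").fetchone()[0]

print(len(distinct_titles), len(distinct_upper), total)
```

Stray whitespace would still produce extra distinct values here, which is the trade-off noted above.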

and now you get back sort of mini table

that contains just your answer 158 in

this case so that's the total number of

not distinct but total titles that we

had in the file and we could continue to

manipulate the the data further using

again functions like these here but

there's also additional filtration we

can do we can also qualify our

selections by saying where some

condition is true so just as in Scratch

and C and Python you have Boolean

Expressions you can have the same in SQL

as well where I can filter my data where

something is true or false LIKE allows

me to do approximations if I want to get

something that's like the office but not

necessarily t-h-e space office I could do

pattern matching using LIKE here ORDER

BY LIMIT and GROUP BY are other commands

I can execute too so let me go back and

do a couple of these here how about let

me just get uh oh I don't know all of

the titles from favorites but limit it

to 10 results that might be one thing

that's helpful to see if you just care

about some of the data at the top there

instead how about select all of the


titles from favorites where the title

itself is like

quote unquote office and this will give

me only two answers those are the two

rows recall that I mutated by getting

rid of the word the notice that LIKE

allows me to tolerate uppercase and

lowercase because if I instead just use

the equal sign and in SQL a single

equal sign does in fact mean equality

it's not uh for comparison's sake it's

not doing assignment this is not how you

assign data in SQL I got back no answers

there so indeed the equal sign is giving

me literal answers that searches just

for what I typed in how could I get all

of these well similar in spirit to

regular Expressions but not quite as

powerful in SQL I could do something

like this I can select the title from

favorites where the title is like quote

unquote office but I can add a bit

weirdly percent signs to the left and

the right
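The difference between the equal sign, a bare LIKE, and LIKE with percent signs can be checked on a few invented titles. This is a minimal sketch: the rows are assumptions, and the behavior shown is SQLite's default, where LIKE ignores ASCII case and the equal sign does not.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (title TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?)",
    [("Office",), ("office",), ("The Office",), ("the office ",), ("Friends",)],
)

# Equal sign: literal, case-sensitive comparison.
exact = db.execute(
    "SELECT COUNT(title) FROM favorites WHERE title = 'office'"
).fetchone()[0]

# LIKE without wildcards: whole value must match, but case is ignored.
like_plain = db.execute(
    "SELECT COUNT(title) FROM favorites WHERE title LIKE 'office'"
).fetchone()[0]

# Percent signs: zero or more characters on either side of "office".
like_wild = db.execute(
    "SELECT COUNT(title) FROM favorites WHERE title LIKE '%office%'"
).fetchone()[0]

print(exact, like_plain, like_wild)
```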

so the language SQL supports the same

notion of pattern matching but much more

limited out of the box if we want more

powerful regular Expressions we probably

do want to use Python instead but the uh


percent sign here means zero or more

characters on the left zero or more

characters on the right so this will

just grab any title that contains o f f

i c e in it in that order and now I get

all 16 it would seem of those results

again how do I know it's 16 well I can

just get the count of those titles and

get back that answer instead as well so

again take some getting used to the

library the vocabulary and sort of the

syntax that you can use there's these

building blocks and others but SQL is

really designed again for creating

reading updating deleting data for

instance I've never really been a fan of

friends for instance so right now if I

do select how about title from favorites

where title like quote-unquote friends

with the percent signs we can see that

there's a whole bunch of them that's how

many exactly let's just do a quick count

so that's nine of them uh well delete

from

favorites okay you and me delete from

favorites where title like

friends enter nothing seems to happen

but bye bye friends so now we've

actually thank you

so now we've actually changed the data


and this is what's compelling about a

database a proper database yes you could

technically write python code that not

only reads the CSV file but also writes

it like you can change using quote

unquote a for append or quote unquote w

for write instead of quote unquote r for

read alone but it's definitely a little

more involved to do that in Python but

with SQL you can update the data in real

time and if I were actually running like

a web application here or a database for

a mobile app that change theoretically

would be reflected everywhere on your

own devices if you're somehow talking to

this application so that's sort of the

direction we're headed this other thing

has been bothering me so select uh how

about title from favorites

where uh title equals what was it the

V office was it yeah it was that one

how about we update favorites by setting

title equal to the office where title

equals quote unquote the V office

semicolon and now if I select the same

thing again I can go up and down with my

arrow keys quickly now there is no the V

office we've actually changed that value

how about genres select genres from


favorites where the title is title

equals Game of Thrones semicolon these

were kind of long and you know I don't

really agree with all of that so how

about we update favorites set genres

equal to I mean sure action adventure

sure drama okay so it's a decent list

fantasy sure Thriller War okay anything

really but comedy I would say

let's go ahead and hit enter now and now

if I select genres again same query now

we've sort of canonicalized that we've

thrown data away so whether or not that

is right is probably a bit subjective

and argumentative but I have at least

cleaned up my data which is again the U

in crud create read update delete you

can do it that easily beware using

delete beware worse using drop whereby

you can drop an entire table but via

these kinds of commands can we actually

now manipulate our data much more

rapidly and with single thoughts and in

fact if you're an aspiring statistician

or data scientist or analyst in the real

world I mean SQL is such a commonly used

language because it allows you to really

dive into Data quickly and ask questions

of the data and get back answers quite

quickly and this is a simple data set


you can do this with much larger data

sets as we soon will

too

are any questions on what we've seen of

SQL thus far only scratched the surface

but again it boils down to creating

reading updating and deleting data
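The update and delete steps can be sketched in Python as follows. The rows and the misspelled title are made up, and checking rowcount is an addition of this sketch rather than something done in the demo. Each statement carries a WHERE clause, since an UPDATE or DELETE without one applies to every row in the table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (title TEXT, genres TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [
        ("Friends", "Comedy"),              # made-up rows
        ("friends ", "Comedy"),
        ("The V Office", "Comedy"),         # stand-in for a misspelled title
        ("Game of Thrones", "Action, Adventure, Comedy, Drama"),
    ],
)

# D in CRUD: delete only the rows whose title contains "friends".
deleted = db.execute(
    "DELETE FROM favorites WHERE title LIKE '%friends%'"
).rowcount

# U in CRUD: fix one specific title.
retitled = db.execute(
    "UPDATE favorites SET title = 'The Office' WHERE title = 'The V Office'"
).rowcount

# U again: change genres for one show only, thanks to the WHERE clause.
regenred = db.execute(
    "UPDATE favorites SET genres = 'Action, Adventure, Drama' "
    "WHERE title = 'Game of Thrones'"
).rowcount
db.commit()

# rowcount reports how many rows each statement touched, a cheap sanity check.
print(deleted, retitled, regenred)
remaining = db.execute("SELECT title, genres FROM favorites ORDER BY title").fetchall()
```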

[Music]

questions here

all right well let's consider the design

of this data recall that if I do dot

schema that shows me the design of my

table the so-called schema of my data

this is okay it gets the job done and

frankly everything the user typed in was

arguably text including the timestamp

which is the date and time but so the

data set itself is somewhat simple but

if we look at the data set itself

especially genres let's do this select

genres from favorites and let me point

out one other thing stylistically too I

am very deliberately capitalizing all of

the special SQL keywords and I'm

lowercasing all of the column names and

the table names this is a convention and

honestly it just helps you

read I think the code when you're

commingling your names for columns and


tables with proper SQL keywords but I

could just as easily do select genres

from favorites but again the SQL

specific keywords don't quite jump out

as much so stylistically we would

recommend this selecting genres from

favorites semicolon so here is where

oh

okay that was not intended I

accidentally made every show including

the office about action adventure Drama

Fantasy Thriller and War

how did I do that accidentally

what did I do wrong

yeah so beware it's funny I think I did

say beware around this time so I the SQL

database took me literally I updated

favorites setting genres equal to that

semicolon end of thought I really wanted

to say where title equals quote-unquote

Game of Thrones unfortunately there

isn't an undo command or time machine

with a SQL database so the best we can

do here is let's actually get rid of

favorites.db let's run sqlite3 of

favorites.db again which now will be

recreated let me change myself into CSV

mode let me import into my favorites

table the CSV file and now

um friends is back for better for worse


but so are all of our genres so if I now

do select uh if I now reload the file

and do select star from sorry select

genres from favorites that was the

result I was getting it's much Messier

but that's because some of these are

quite long but now we're back to the

original data lesson here be sure to

back up your work all right so what more

can we now do with this data well I

don't love the design of the genres

column for a couple of reasons one I mean

we didn't have any sort of validation

but user input is going to be messy

there's just a lot of redundancy in here

right like let's go ahead and do this uh

let me select all the comedies you all

typed in so SELECT title FROM

uh favorites where genres equals

quote-unquote comedy

okay so there's

all of the shows that are explicitly

comedies but I think there might

actually be others let me scroll back up

here

comedy drama what was a comedy and a

drama how about let's search for the

oops let me copy paste comedy comma

drama
okay so the office in this case was

considered comedy and drama billions

It's Always Sunny in Philadelphia and

Gilmore Girls as well but notice that I

get many more when I just search for

comedy so the catch here is that because

I have all of these genres implemented

the way Google did as a comma separated

list it's actually really hard and messy

to get at any show all of the shows that

are somewhere described as comedy right

because if I search for quote-unquote

comedy the only answers I'm gonna get

are this one Whatever that show is this

one Whatever that show is this one but

I'm not going to get this one I'm not

gonna get this one why if I'm searching

for where genres equals quote-unquote

comedy why am I missing those other

shows

why am I missing yeah

[Music]

exactly it's not just a comedy it's a

comedy and a drama and a comedy or a

news show and so forth so I have to

search for these commas so this gets

messy quickly right like let me copy

this so I can do this let me search for

where genres equals comedy uh how about

uh or genres equals comedy drama or


genres equals this whole thing comedy

News Talk Show I'm gonna get more and

more results but that's not going to

scale well what could I do instead of

enumerating with ORs all of the

different permutations of genres do you

think

[Music]

yeah so I could use the keyword is

similar in Python to the word in I could

use the like keyword so that so long as

the genres is like comedy somewhere in

there that's going to give me all of

them so long as the word comedy is in

there but let me go ahead and just open

the form from earlier uh the form had

let me see if I can open this real quick

before I toggle over if we look back at

the form recall that there were all of

those radio buttons asking for the

specific genres into which something

fell and if I open this let me full

screen here and now open the original

form you'll see all of the genres here

none of which are that worrisome except

for

a corner case is jumping out at me where

might the like keyword alone get me into

trouble
it's not with comedy I'm okay with

comedy but

yeah music and musical are deliberately

on the list here because one they are

separate genres but if I just search for

something that's like music I'm going to

accidentally suck in all of the musicals

which might not be what I intend if

music is like a music video or whatever

and musical is actually a different type

of show I don't want to just do that so

it seems just very messy like I could

probably hack something together with

maybe add some commas in there or

something like this but this is just not

a good design for the data Google has

done it this way because it's just

simple to actually keep the user's data

all in a single column and just as they

did separate it by commas but this is a

real messy way to use csvs by putting

comma separated values in your comma

separated values arguably the folks at

Google probably just did this because

it's just simpler and they didn't want

to give people multiple sheets or

complicate things using some other

weirder character than commas alone but

I bet there's a better way for us to do

this and let me go ahead and do this let


me go back into my code here and in

just a moment I'm going to grab a

program that I wrote in advance that's

going to use Python to open up the CSV

file iterate over all of the rows and

load the data into two tables this time

two tables one called shows and one

called genres so as to actually separate

these two things out give me just a

moment to grab the code and when I run

this I'll only have to run it once let

me go ahead and run python in a moment

and I'll reveal the results in a sec

uh this is going to be version 8 of the

code online when I do this let me go

ahead and open up this file

give me a second to move it into this

directory

version eight okay so here we have

version 8 of this that's available

online that's going to do the following

and I'll gloss over some of the details

just so that it uh we don't get stuck in

the weeds of some of this code I'm going

to be using at the top of this program

as we'll soon see a cs50 library not for

the sake of get string or get int or get

float but because there's some built-in

SQL functionality that we didn't discuss


a couple of weeks back with the cs50

library itself but inside of the cs50

library we'll see there is a special

function called SQL that gives you the

ability using this weird URL like

looking thing technically called a URI

that allows me to open a file called

favorites.db and long story short all of

the subsequent code is going to iterate

over this favorites.csv file that we

downloaded and it's going to import it

into the SQLite database but it's

going to use two tables instead of just

one so give me just a moment to run this

and then I'll reveal the actual results

this is going to be run on favorites uh

dot CSV
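The program itself is not shown in full here, so the following is only a sketch of the idea, not the course's version 8. It swaps in Python's standard sqlite3 module for the cs50 library's SQL function, uses made-up CSV rows and an in-memory database, and assumes the table and column names (shows with id and title, genres with show_id and genre) and a comma-plus-space separator between genres.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for favorites.csv.
SAMPLE_CSV = """timestamp,title,genres
t1,How I Met Your Mother,Comedy
t2,the sopranos ,"Crime, Drama"
"""

db = sqlite3.connect(":memory:")  # a real run would use a file such as favorites.db
db.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")
db.execute(
    "CREATE TABLE genres (show_id INTEGER, genre TEXT NOT NULL, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    # Light cleanup: trim whitespace and force uppercase.
    title = row["title"].strip().upper()

    # Insert the show; SQLite assigns the next id, exposed as lastrowid.
    cursor = db.execute("INSERT INTO shows (title) VALUES (?)", (title,))
    show_id = cursor.lastrowid

    # Tear the comma-separated genres apart: one row per genre.
    for genre in row["genres"].split(", "):
        db.execute(
            "INSERT INTO genres (show_id, genre) VALUES (?, ?)", (show_id, genre)
        )
db.commit()

shows = db.execute("SELECT id, title FROM shows ORDER BY id").fetchall()
genres = db.execute(
    "SELECT show_id, genre FROM genres ORDER BY show_id, genre"
).fetchall()
print(shows)
print(genres)
```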

[Music]

and taking a look here

give me just a moment

oh uh give me a sec

come on

come on this program should not be

taking this long

sorry

let's open this real fast

[Music]

whoops not that file

okay let me just skim this code real

quick to see where we've gone wrong with


favorites.csv reader

reader title show ID insert into shows

genre split until we execute

all right this is me debugging in real

time all those times we encourage you to

use print this is me actually using

print

we'll see how quickly I can recover from

this python favorites version 8.

okay so here's me debugging in real time

it's printing oh maybe I just didn't

wait long enough okay so here we go what

I'm doing is printing out the dictionary

that represents each row that you all

typed in and we're actually making

progress all right I just didn't I was

too impatient and didn't wait long

enough so in a moment there we go all

right so all you have to do sometimes is

wait let me go ahead now and open this

file using sqlite3 so in sqlite3

I now have a different version of

favorites.db I named it number eight for

consistency once I've run the program I

can do dot schema to look inside of it

and here's what the two tables in this

database are going to look like I've

created a table called shows this time

to represent all the TV shows that are


favorites that has two columns one is

called ID one is called title but now

I'm going to start taking out for a spin

some of the other features of SQL and

besides there being text it turns out

there's a data type called integer

besides there being a data type called

text there's also a special key phrase

that you can specify that the title can

never be null think back to our use of

null in C think back to the keyword none

in Python this is a database constraint

that allows you to ensure that none of

you can have a favorite TV show with no title like

if you submit the form you have to have

typed in a title for it to end up in our

database here and you'll notice one

other new feature and turns out on this

table I'm defining what's called a

primary key specifically to be the ID

column more on that in just a moment

meanwhile the second table my code has

created for me as we'll soon see gives

me a column called show ID and then a

genre the value of which is text that

can also not be null and then more on

this in a moment this table has what

we're going to call a foreign key

specifically the show ID column that

references shows ID so before we get


into the weeds of this this is now a way

of creating the relation in

relational database if I have two tables

now not just one they can somehow be

linked together by a common column in

other words the shows column

shows table is going to give me a table

with two columns an ID and a title every

title you gave me I'm going to assign a

unique value the genres table meanwhile

is going to associate individual genres

singular with that same ID and the

result of this to pop back to the uh the

terminal here is let's do this select

star from shows of this new database

and you'll see that I've given indeed

all of the shows you all typed in unique

identifiers I didn't filter out

duplicates or do anything beyond just

forcing everything to uppercase so

there's going to be some duplicates here

because I didn't want to get rid of

anyone's data but you'll see that indeed

I've given everyone a unique identifier

from the very first person who typed How

I Met Your Mother all the way down to

input number 158. meanwhile if I do

select star from genres which is now a

table not just a column in the original


data now you'll see a much better design

for this data notice what I've done here

let me go all the way to the top and

you'll see two columns one of which is

called show ID the other of which is

called genre
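The constraints described for these two tables (NOT NULL, a primary key, a foreign key) can be exercised in a small sketch. The exact CREATE TABLE statements are assumptions reconstructed from the description, and the PRAGMA line is an addition of this sketch: SQLite only enforces foreign keys when that setting is switched on.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
db.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")
db.execute(
    "CREATE TABLE genres (show_id INTEGER, genre TEXT NOT NULL, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

# Primary key: the id is assigned automatically when it is left out.
first_id = db.execute(
    "INSERT INTO shows (title) VALUES ('HOW I MET YOUR MOTHER')"
).lastrowid

# NOT NULL: a show without a title is rejected.
try:
    db.execute("INSERT INTO shows (title) VALUES (NULL)")
    null_title_rejected = False
except sqlite3.IntegrityError:
    null_title_rejected = True

# Foreign key: a genre pointing at a show that does not exist is rejected.
try:
    db.execute("INSERT INTO genres (show_id, genre) VALUES (99, 'Comedy')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# One show, many genres: the same show_id simply appears on several rows.
db.executemany(
    "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
    [(first_id, "Comedy"), (first_id, "Romance")],
)
genre_count = db.execute(
    "SELECT COUNT(genre) FROM genres WHERE show_id = ?", (first_id,)
).fetchone()[0]
print(first_id, null_title_rejected, orphan_rejected, genre_count)
```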

and again I wrote some code to do this

because I had to take Google's messy

output where everything was separated by

commas I had to tear away the commas and

then put each genre into this table by

itself even though we haven't introduced

the syntax via which we can reconstitute

the data and reassociate your genres

with your titles why at a glance might

this be a better design now

even though I've doubled the number of

tables from one to two

why is this probably

on the direction toward a better design

what might your instincts be

why is this cleaner again first time

with SQL why is it better perhaps that

we've done this with our genres table

can I come to you

why might this be better yep oh just

because we had the conversation before

about the commas

exactly it's as simple as that we've

cleaned up the data by giving every


genre every word in the genres column

in the original Google spreadsheet its

own cell in this table if you will and

now notice show ID might appear multiple

times whoever typed in How I Met Your

Mother they only Associated one genre

with it and so we see that show ID 1 is a

comedy but whoever typed in I forget the

name of the second show offhand but that

person whoever was assigned show ID 2

checked off a whole bunch of the genres

boxes that happened again with uh show

ID three four persons five six seven

only checked one box and so you can see

now that we've Associated the data with

what we might call a one-to-many

relationship a one-to-many relationship

whereby for every one show in the shows

table it can now have many genres

associated with it Each of which is

represented by a separate

row here so again if I go

ahead and select star from shows let's

limit it to the first 10 just to focus

on a subset of the data How I Met Your

Mother The Sopranos was the second input

there it would seem that now that I've

created the data in this way I could

ideally somehow
search the data but a little more

correctly I don't have to worry about

the commas I don't have to worry about

the hackish approach of Music being a

substring of musical but how can I

actually get back at this data well

let's go ahead and do this suppose I did

want to get back maybe all of the

comedies all of the comedies no matter

whether the person checked just the

comedy box or multiple boxes instead

how now given that I have two tables

could I go about selecting

only the titles of comedies like I've

actually made the problem a little

harder but again SQL is going to give me

a solution for this the problem is that

if I want to search for comedies I have

to check the genres table first and then

that what's that going to give me like

if I search the genres table for

comedies what's that going to give me

back

potentially yeah

maybe show ID so let me try that let me

do select show ID from genres where the

genre in a given row equals

quote-unquote comedy no commas no like

no percent signs because literally that

column now is singular words like comedy


or drama or the like let me go ahead and

hit enter here okay so I got back a

whole bunch of ID numbers now this could

very quickly get annoying it looks like

show ID one two four five six seven nine

and so forth are all comedies so I could

do something really crazy like select

title from shows where ID equals one or

ID equals two or ID equal I mean this is

not going to scale very well but this is

why SQL is especially powerful you can

actually compose one SQL question from

multiple ones so let's do this why don't

I select the title where the ID of the

show is in the following list of IDs

select show ID from genres where the

specific genre is quote unquote comedy

so I've got two SQL queries one is

deliberately nested inside of

parentheses that's going to give me back

that whole list of show IDs but that's

exactly what I want to then look up the

titles for by selecting title from shows

where the ID of the show is in that big

tall list and so now if I hit enter

I get back only those shows that were

somehow flagged as comedy whether you in

the audience checked one box for comedy

two boxes or all of the boxes somehow we


teased out comedy again just by using

that python script which loaded this

data not into one big table but instead

two and if we want to clean this up

let's do a couple of things let's

outside of the parentheses do order by

title this is a way of sorting the data

in SQL very easily now we have a whole

list of the same titles that are now

sorted and what was the keyword with

which I could filter out duplicates

yeah distinct so let's try this same

query but let's select only the distinct

titles from that whole query and notice

I've very deliberately done it this way and

to this day anytime I'm using SQL I

don't just start at the beginning and

type out my whole thought and just get

it right on the first try I very

commonly start with the sub query if you

will the thing in parentheses just to

get myself one step toward what I care

about then I add to it then I add to it

then I add to it just like we've

encouraged in Python and C taking baby

steps in order to get to the answer you

actually care about like this one now

and other than this mistake which we

didn't fix because I re-imported the

data after accidentally changing


everyone's genre we now have an

alphabetized list of all of the same

data but now it's better designed

because we have it split across these

two tables
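The two-table design and the nested query above can be sketched in Python with the standard-library sqlite3 module. This is only an illustration under stated assumptions: the sample responses, the in-memory database, and the column names show_id and genre are stand-ins, not the actual class data or the script used in lecture.

```python
import sqlite3

# Hypothetical stand-in for the form responses: (title, comma-separated genres).
responses = [
    ("How I Met Your Mother", "Comedy"),
    ("The Sopranos", "Crime, Drama"),
    ("The Office", "Comedy, Drama"),
    ("the office", "Comedy"),
    ("Hamilton", "Musical"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
db.execute(
    "CREATE TABLE genres (show_id INTEGER, genre TEXT NOT NULL, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

for title, genres in responses:
    # Force titles to uppercase but keep duplicates, as described above.
    cursor = db.execute(
        "INSERT INTO shows (title) VALUES (?)", (title.strip().upper(),)
    )
    show_id = cursor.lastrowid
    # Tear away the commas: each genre gets its own row.
    for genre in genres.split(","):
        db.execute(
            "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
            (show_id, genre.strip()),
        )

# The inner query finds the IDs of comedies; the outer query looks up
# their titles, removes duplicates, and sorts them.
query = """
    SELECT DISTINCT title FROM shows
    WHERE id IN (SELECT show_id FROM genres WHERE genre = 'Comedy')
    ORDER BY title
"""
comedies = [row[0] for row in db.execute(query)]
print(comedies)
```

Because every genre sits in its own row, the comparison is a plain equals sign: no LIKE, no percent signs, and no risk of Music matching Musical.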

oh thank you we're okay just thanks


what questions do we have if any here

questions on this approach


oh now that we have a database how do we

transfer it to a CSV

there are ways to do that and in fact

there's a command within SQLite that

allows you to export your data back to a

CSV file if you want to email it to

someone you want them to be able to open

it in Excel or Google spreadsheets or

Apple Numbers or the like you can go in

the other direction generally though

once you're in the world of SQL you're

probably storing your data there long

term and you're probably updating it

maybe deleting it adding to it and so

forth for instance the one command I did

not show earlier is suppose someone uh

forgot a show let's see do I did I see

this in the output all right so Curb

Your Enthusiasm saw that last night is


Just yeah did anyone see it last night

yeah all right well just the one person

that checked that box so you and me uh

what's another show that didn't make the

list how about uh Seinfeld is now on

Netflix apparently so insert into

uh shows uh what do we want to insert

well we want to insert maybe an ID and a

title but you know I don't actually care

what the ID is so I'm just going to

insert a title and the value I'm going

to give to that title is going to be

quote unquote Seinfeld and then let me

go ahead and hit semicolon nothing seems

to happen but let me rerun the big query

from before looking for comedies and

unfortunately Seinfeld has not yet been

flagged as a comedy so let's get this

right too what intuitively I'm gonna

have to do to associate now Seinfeld

with my comedies

I just inserted into the shows table

what more needs to happen before we can

flag Seinfeld as a comedy

say again

yeah so I need to insert into the genres

table two things now a show ID

like this and then the name of the genre

which presumably is comedy what values

do I want to insert well the show ID I


better grab that oh I don't even know

what it is I'm gonna have to figure out

what that is so I could do this in a

couple of ways let me do a one-time

thing select star from shows where title

equals quote unquote Seinfeld semicolon

159. so now I could do insert into

genres a show ID and a genre name the

values 159 and quote unquote comedy

semicolon enter and now if I scroll back

in my history and execute that really

big query again looking for all distinct

comedies now Seinfeld has made the list
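The sequence of statements above can be sketched in Python with the standard-library sqlite3 module. The tiny in-memory tables and the pre-existing row are assumptions for illustration only. The last statement anticipates the capitalization cleanup, and its WHERE clause is what stops it from rewriting every title in the table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
db.execute("CREATE TABLE genres (show_id INTEGER, genre TEXT NOT NULL)")

# Hypothetical existing data so the new show is not the only row.
db.execute("INSERT INTO shows (title) VALUES ('CURB YOUR ENTHUSIASM')")
db.execute("INSERT INTO genres (show_id, genre) VALUES (1, 'Comedy')")

# Insert only a title; SQLite assigns the next id automatically.
db.execute("INSERT INTO shows (title) VALUES ('Seinfeld')")

# Look the new id up rather than guessing it.
seinfeld_id = db.execute(
    "SELECT id FROM shows WHERE title = 'Seinfeld'"
).fetchone()[0]

# Associate the show with a genre via its id.
db.execute(
    "INSERT INTO genres (show_id, genre) VALUES (?, 'Comedy')", (seinfeld_id,)
)

# Match the uppercase convention of the other titles.
# Without the WHERE clause this would change every row.
db.execute("UPDATE shows SET title = 'SEINFELD' WHERE title = 'Seinfeld'")

comedies = [
    row[0]
    for row in db.execute(
        "SELECT DISTINCT title FROM shows "
        "WHERE id IN (SELECT show_id FROM genres WHERE genre = 'Comedy') "
        "ORDER BY title"
    )
]
print(seinfeld_id, comedies)
```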

but I did this manually so I didn't

actually capitalize it let's clean that

up let's do update uh let's do update my

shows set title equals to Seinfeld

semicolon

no okay thank you where title equals

quote unquote Seinfeld let's not make

that mistake again enter and now if I

execute that really big query now

Seinfeld is indeed considered a

um a comedy so where are we going with

this well thus far we've been doing all

this pretty manually and this is

absolutely what an analyst a data

scientist type person might do if just

manipulating a pretty large data set


just to get at interesting answers that

might be across one two or even many

more tables eventually in a few weeks

we're going to start to automate all of

this by writing code in Python that

generates SQL to do this right if you go

to most any website on the internet

today and you for instance log in odds

are you're typing a username and

password clicking submit what's then

happening well the website might not be

implemented in Python but it's probably

implemented in some language python

JavaScript Java Ruby something else and

that language is probably using

something like a relational database to

use SQL to get your username get your

password and compare the two against

what you've typed in and actually

it's hopefully not getting your actual

password but something called the hash

thereof but there's probably a database

involved doing that when you buy

something on amazon.com and you click

check out odds are there's some code on

Amazon's server that's looking at what

it is you added to your shopping cart

and then maybe using a for Loop of some

sort in python or another language it's

doing a whole bunch of SQL inserts to


store in their database what it is you

bought there's other types of databases

too but SQL databases or relational

databases are quite popular so let's go

ahead and write one other program here

in Python that now merges these two

languages together whereby I'm going to

use SQL inside of a Python program so I

can Implement my sort of logic of my

program in Python step by step line by

line but when I want to get at some data

I can actually talk to a SQL database so

let me go ahead and open

favorites.py here and let

me go ahead and

throw away some of what we did earlier

and really just now add SQL to the

mix from the cs50 library let's import

the SQL function this will be useful to

use because most third-party libraries

that deal with SQL and python are more

complicated than they need to be so I

think you'll find this Library easier to

use let's then do the following create a

variable called DB for database I could

call it anything I want let's use that

URI which is a fancy way of saying

something that looks like a URL but that

actually opens up a database


on disk that is in the current folder

let's now ask the user for a title by

prompting them for a quote unquote title

like this and let's strip off any white

space just so that the data is not messy

and then let's go ahead and do this and

this is the new logic I'm going to go

ahead now and write a line of code that

uses python to talk to the original

favorites.db so again I'm not using the

two table database which is in

favorites8.db I'm using the original

that we imported from your own data and

I'm going to do the following I'm going

to use db.execute to execute a SQL

command inside of python I'm going to

select the count of

um shows

from the favorites table where the title

you typed in where the title user typed

in is like this question mark and why

I'm doing that is as follows just like

in C when we had percent s in SQL for

now the analog is going to be a question

mark So same idea different syntax

instead of percent s it's just a

question mark and using a comma outside

of this first string using cs50's

execute function I can pass in a SQL

string a command then any arguments I


want to plug into the question marks

therein so the goal at hand is to

actually write a program that's going to

search favorites.csv AKA

favorites.db for the total number of

people that liked a particular show so

this is going to select the count of

people from the favorites table where

the title they typed in is like whatever

the user has just now typed in this DB

execute function returns a list it

returns a list of rows and you would

only know that by my telling you or

reading the documentation and therefore

if I want to get back the total count

I'm going to go ahead and grab the first

row from those rows because it's only

going to give me back the count and then

I'm going to go ahead and print out that

row's first value

but it's going to be a little weird

technically the column is going to be

called count star quote unquote which is

a little weird let me add one more

feature to the mix you can actually give

nicknames to columns that are coming

back especially if they are the result

of functions like this I can just call

that column counter in all lower case


that means I can now say get back the

counter

key inside of this dictionary so just to

recap what have we done we've imported

the cs50 library SQL function we've with

this line of code opened the

favorites.db file that you and I created

earlier by importing your CSV into

SQLite I'm now just asking the user for a

title they want to search for I'm now

executing this SQL query on that

database plugging in whatever the human

typed in as their title in order to get

back a total count and I'm giving the

count a nickname an alias of counter

just so it's more self-explanatory this

function DB execute no matter what

always returns a list of rows even if

there's only one row inside of it so

this line of code just gives me the

first and only row and then this goes

inside of that row which it turns out is

a dictionary

and gives me the key counter and the

value it corresponds to so what to be

clear is this doing let's go ahead and

run this manually in my terminal window

first let me run sqlite3 on favorites

uh dot oh let's do this on favorites.db

let me import the data again so dot mode

csv

dot import from favorites.csv into a

favorites table

so I've just recreated the same data set

that you all gave me earlier in

favorites.db if I were to do this

manually let's search for the office

again select count star from favorites

where title like and let's just manually

type it in for now uh the office we'll

search for the one with the the word the

semicolon I get back 12. but technically

notice what I get back I technically get

back a miniature table containing one

column and one row

what if I want to rename that column

that's where the AS keyword comes in so

select count star as counter notice what

happens enter I just get back same

simple table but I've renamed the column

to be counter just because it's a little

more self-explanatory as to what it is

so what am I doing with this line of

code this line of code is returning to

me that miniature temporary table in the

form of a list of dictionaries the list

contains one row as we'll see and it

contains one column as we'll see the

column the key for which is counter so


let's now run the code itself I'm going

to get out of sqlite3 and I'm going

to run python of favorites.py enter

I'm being prompted for a title I'm going

to type in the office and cross my

fingers and there's that 12. why is it

12 well there's a typo again because I

re-imported the CSV I had deleted two of

those so we're back at the original data

set so there's 12 total that have quote

unquote the office in the title like

that

so what have we done we've combined some

python with some SQL but we've relegated

all of the complexity of searching for

something the selecting of something

gotten rid of all of the with keyword

the open keyword the for Loop the reader

the dict reader and all of that and it's

just one line of SQL Now sort of using

the best of both worlds
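The same idea can be sketched without the cs50 library. This version swaps in Python's built-in sqlite3 module, so it is not the lecture's code: the in-memory table, the sample titles, and the helper name count_title are all assumptions for illustration. What carries over is the question-mark placeholder, the COUNT(*) AS counter alias, and the result coming back as a list of rows that can be indexed like dictionaries.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Let each row be indexed by column name, much like a dictionary.
db.row_factory = sqlite3.Row

# Hypothetical sample data standing in for the class's favorites.
db.execute("CREATE TABLE favorites (title TEXT)")
db.executemany(
    "INSERT INTO favorites (title) VALUES (?)",
    [("The Office",), ("the office",), ("THE OFFICE",), ("Friends",)],
)


def count_title(title):
    """Count the rows whose title matches, ignoring case."""
    # The ? is a placeholder; the value is passed separately, after the comma.
    rows = db.execute(
        "SELECT COUNT(*) AS counter FROM favorites WHERE title LIKE ?",
        (title.strip(),),
    ).fetchall()
    # Even a single count comes back as a list containing one row.
    row = rows[0]
    return row["counter"]


print(count_title(" the office "))
```

Passing the title as a separate argument, rather than pasting it into the SQL string, leaves the quoting to the database library.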

are any questions

on what we've just done here or how

any of this works

any questions here yeah


does this function return more than one

row well let me let me is was that the

question yeah so let's do that by


changing the problem at hand this

program was designed just to select the

total count let's go ahead and select

for instance

um

all of the ways you all typed in the

office by selecting the title this time

if I do this in sqlite3 whoops if I

do this in sqlite3

let me go ahead and do this again after

increasing my terminal window let's do

it manually select title from favorites

where the title is like quote unquote uh

the office semicolon I get back all of

these different rows and we didn't even

notice this one there's actually another

little typo in there with some

capitalization of the E and the C and

the E that would be an example of a

query that gives me back therefore

multiple rows so let's now change my

Python program if I now in my Python

program do this

I get back a whole bunch of rows

containing all of those titles I can now

do for row in rows I can print out the

current row's title and now manipulate

all of those things together let me keep

both on the screen let me run python of


favorites.py and that for loop now

should iterate what 10 or more times

once for each of those titles and indeed

if I type in the office again enter

oops uh

row title what did I do wrong oh I

should not be renaming title to counter

this time so that's just a dumb mistake

on my part let me rerun it again and now

I should see after typing in the office

enter a whole bunch of the offices and

because I'm using like even the

miscapitalizations are coming through

because like is case insensitive doesn't

matter if it's uppercase or lowercase

whereas had I used the equal sign I

would get back only the ones

capitalized exactly as I typed them
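The difference between LIKE and the equal sign, and the loop over multiple rows, can be sketched as follows. This again uses the standard-library sqlite3 module rather than the cs50 library, and the sample titles are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # rows can be indexed by column name

# Hypothetical sample data with inconsistent capitalization.
db.execute("CREATE TABLE favorites (title TEXT)")
db.executemany(
    "INSERT INTO favorites (title) VALUES (?)",
    [("The Office",), ("the office",), ("ThE OffiCE",), ("Friends",)],
)

# LIKE ignores case for ordinary English letters; = requires an exact match.
like_rows = db.execute(
    "SELECT title FROM favorites WHERE title LIKE ?", ("The Office",)
).fetchall()
equal_rows = db.execute(
    "SELECT title FROM favorites WHERE title = ?", ("The Office",)
).fetchall()

# A query that matches several rows returns several rows to iterate over.
for row in like_rows:
    print(row["title"])

like_titles = [row["title"] for row in like_rows]
equal_titles = [row["title"] for row in equal_rows]
```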

all right any questions on this next

all right so let's transition to a

larger juicier data set and consider

some of the issues that arise when

actually now using SQL and skating

toward a world in which we're using SQL

for mobile apps web apps and generally

speaking very large data sets so let's

start with a larger data set just like

that give me just a moment to switch

screens over to

what we have for you today which is an


actual relational database that we've

created out of a real world data set

from IMDb so the Internet Movie Database

is a website where you can search for TV

shows and movies and actors and so forth

all using their database behind the

scenes IMDb wonderfully makes their data

set available as not CSV files but tsv

files tab separated values and so what

we did is before class we downloaded

those tsv files we wrote a Python

program similar to my favorites8.py

file earlier that read in

all of those tsv files created some SQL

tables in an IMDb

database for you in SQLite that has

multiple tables and multiple columns so

let's go and wrap our minds around

what's actually in this data set let me

go back to vs code here and in just a

moment I'm going to go ahead and copy

the file which we've named shows.db

and I'm going to go ahead and increase

my terminal and do sqlite3 of shows.db

whenever playing around with a SQLite

database for the first time typing dot

schema is perhaps a good place to start

to give you a sense of what's in there

and things just escalated quickly like


there's a lot in this data set because

indeed there's going to be tens or

hundreds of thousands of rows in this

data set and also problem set 7 where

we'll look at the movie side of things

and not just the TV shows so what is the

schema that we have created for you from

IMDb's actual real world data one

there's a table called shows and notice

we've just added white Space by hitting

enter a bunch of times to make it a

little more stylistically readable the

shows table has an ID column a title

column a year and the total number of

episodes for a given show and the types

of those columns are integer text

numeric and integer so it turns out

there's actually a few different data

types that are worth being aware of when

it comes to creating tables themselves

in fact in SQLite there's five

data types and only five fortunately one

of which is indeed integer negative or

positive numeric which is kind of a

catch-all for dates and times things

that are kind of numeric but are not

just integers and not just real numbers

for instance real number is what we've

generally thought of as float up until

now text of course is just text but


notice that you don't have to worry

about how big it is like in Python it

will size to fit and then there's blob

which is binary large object which is

for just like raw zeros and ones like

for files or things like that but we'll

generally use the other four of these

and so indeed when we imported this data

for you we decided that every show would

be given an ID which is just an integer

every show has of course a title which

should not be null otherwise why is it in

the database every show has a year which

is numeric according to that definition

a moment ago and the total number of

episodes for a show is going to be an

integer what now is with these primary

keys that we mentioned earlier too a

primary key is the column that

uniquely identifies all of the data in

our case with the favorites I

automatically gave each of your

submissions a unique ID so that even if

two or more of you typed in the office

your submission still had a unique

identifier a number that allowed me to

then correlate it with your genres just

as we saw a moment ago in this version

of IMDb's data there's also genres but they


don't come from us they come from

imdb.com and so a genre has a show ID

and a genre just like our database but

these are real world genres with a bit

more filtration notice though just like

my version there's a foreign key a

foreign key is the appearance of another

table's primary key

in its own table so when you have a

table like genres which is somehow

cross-referencing the original shows

table

if shows have a primary key called ID

and those same numbers appear in the

genres table under the column called

show ID by definition show ID is a

foreign key it's the same numbers but

it's foreign in the sense that the

number is being used in this table even

though it's officially defined primarily

in this other table this is what we mean

by relational databases you have

multiple tables with some column in

common numbers typically and those

numbers allow you to line the two tables up

in such a way that you can reconnect the

shows with their genres just like we did

with our smaller data set a moment ago
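A schema along the lines just described can be sketched through Python's sqlite3 module. The column names and types follow the description above, but the exact CREATE TABLE text is an assumption, as is switching on foreign key enforcement, which SQLite leaves off unless asked.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
db.execute("PRAGMA foreign_keys = ON")

db.executescript("""
    CREATE TABLE shows (
        id INTEGER,
        title TEXT NOT NULL,
        year NUMERIC,
        episodes INTEGER,
        PRIMARY KEY(id)
    );
    CREATE TABLE genres (
        show_id INTEGER NOT NULL,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
""")

db.execute(
    "INSERT INTO shows (id, title, year, episodes) "
    "VALUES (1, 'The Office', 2005, 188)"
)
# show_id 1 exists in shows, so this row is accepted.
db.execute("INSERT INTO genres (show_id, genre) VALUES (1, 'Comedy')")

# show_id 999 refers to no show, so the foreign key rejects it.
try:
    db.execute("INSERT INTO genres (show_id, genre) VALUES (999, 'Drama')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

genre_count = db.execute("SELECT COUNT(*) FROM genres").fetchone()[0]
print(rejected, genre_count)
```

The foreign key is the same number as the primary key in shows; declaring it lets the database refuse genre rows that point at a show that does not exist.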

this logic is extended further notice

that the IMDb database we've created for


you has a stars table like a TV show's

stars the actors therein and that

table interestingly has no mention of

people and no mention of shows per se it

only has a column called show ID which

is an integer and a person ID which is

an integer meanwhile if we scroll

down to the bottom you

will see a table called people and we

have decided in IMDB's world that every

person in the movie in the TV show world

will have a unique identifier that's a

number a name that's text a birth date

which is numeric and then again

specifying that ID is going to be their

primary key so what's going on here well

it turns out that TV stars and writers

are both types of people so using this

relational database notice the

road we're going down we're sort

of factoring out commonalities and if a

person can be different things in life

well we're defining them first and

foremost as people and then notice these

two tables are almost the same the Stars

table has a show ID which is a number

and a person ID which is a number which


allows us via this middleman table if

you will to link people with TV shows

similarly the writers table allows us

to connect shows with people too by just

recording those numbers so if we go into

this data set let's do the following

let's do select star from people

semicolon a huge amount of data is

coming back right this is hundreds of

thousands of rows now based on the ID

numbers alone so this is real world

data now flying across the screen

there's a lot of people in the TV show

business not just actors and writers but

others as well

it's still going there's a lot of data

there so my God like if you had to do

anything manual in this data set it's

probably not going to work out very well

and actually we're up to what a million

people in this data set plus which would

mean this probably isn't even going to

open very well in Excel or Google

spreadsheets or Apple Numbers SQL

probably is the better approach here

let's search for someone specific like

select star from people where name

equals like Steve Carell for instance

sticking with comedies all right so

there's Steve Carell he is person number


136797 born in 1962 and that's as much data

as we have on Steve Carell here how do

we figure out what shows for instance

he's in well let's see select star from

shows semicolon

there's a crazy number of shows out

there in the IMDb database and you can

see it here again flying across the

screen feels like we're going to have to

employ some techniques in order to get

at all of Steve Carell's shows

so how are we going to do that well God

this is a lot of data here and in fact

yeah we have what uh 15 million shows

Plus in this data set too so doing

things efficiently is now going to start

to matter so let's actually do this let

me select a specific show select star

from shows where title equals quote

unquote the office and there presumably

shouldn't be typos in this data because

it comes from the real website imdb.com

let's get back the show turns out

there's been a lot of the offices out in

the world the one that started in 2005

is the one that we want presumably the

most popular with 188 episodes how can

we get just that maybe we could do like


and year equals uh how about 2005. all

right so now we've got back just the ID

of the office that we care about and

let's do this too let me turn on a timer

within SQLite just to get a sense of

running time now let me do that again

select star from shows where title

equals the office and year equals 2005

and let's keep it simple let's just do

titles for now enter all right so not

terribly long it found it pretty fast

but it looks like it took how much real

time 0.02 seconds not bad for just a

title but just to plant a seed it turns

out that we can probably speed even this

up let me do this let me create

something called an index which is

another use of the C in CRUD for

creating something and I'm going to call

this like title index and I'm going to

create it on the shows table

uh specifically on the title column and

we'll see in a moment what this is going

to do for me enter

took a moment like .349 seconds to

create something called an index but now

watch if I select star from shows

searching for the office again

previously it took me .021 seconds not

bad but now


wow like literally no time at all or so

low that it wasn't really measurable and

I'll do it again just to get a sense of

things still quite low now even though

0.021 seconds not crazy long imagine now

having a lot of data a lot of users

running a real website a real mobile app

every millisecond we can start to shave

off is going to be compelling so what is

it we just did well we actually just

created something called an index and

this is a nice way to tie in now some of

our week five discussion of data

structures and our week three discussion

of running times an index in a database

is some kind of fancy data structure

that allows the database to do better

than linear search I mean literally as

you just saw these tables are crazy long

or tall right now very linear that is

and so when I first searched for the

office it was literally doing linear

search top to bottom looking at as many

as like what a million plus rows that's

relatively slow I mean it's not that

slow 0.021 seconds but that's relatively

slow just theoretically

algorithmically doing anything linearly but

if you instead create an index using


syntax like this which I just did

creating an index on the title column of

the shows table that's like giving the

database a clue in advance saying hey I

know I'm going to search on this column

in this table a lot do something with

data structures to speed things up and

so if you think back to our discussion

of data structures maybe it's using a

tree maybe it's using a trie or a hash

table some fancier two-dimensional data

structure is generally going to lift the

data up creating right maybe a tree

structure so it's just much faster to

find data especially if it's sorting it

now based on title and not just storing

in one long list and in fact in the

world of relational databases the type

of structure that's often used in a

database is something called a B-tree

it's not a binary tree different use of

the letter B but it looks a little

something like the trees we've

seen it's not binary because some of the

nodes might have more than two children

or fewer but it's a very wide but

relatively shallow tree it's not very

tall and the upside of that is that if

your data is stored in this tree the

database can find it more quickly and


the reason it took like half a second a

third of a second to build the index is

because SQLite needed to take some

non zero amount of time to just build up

this tree in memory and it has

algorithms for doing so based on like

alphabetization or other techniques but

you spend a bit of time up front a third

of a second and then thereafter wow like

every subsequent query if I keep doing

it again and again is going to be crazy

low 0.000 maybe 0.001 but an order of

magnitude a factor of 10 or 100 faster

than it previously was earlier
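One way to see what the index changes, without relying on timings, is to ask SQLite how it plans to run the query before and after the index exists. The sketch below uses Python's sqlite3 module with a made-up table of placeholder titles; the helper name plan and the row count are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
# Hypothetical placeholder titles, just to give the table some height.
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    ((f"Show {i}",) for i in range(10000)),
)


def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql))


lookup = "SELECT * FROM shows WHERE title = 'Show 4242'"

before = plan(lookup)  # no index yet: a scan of the whole table
db.execute("CREATE INDEX title_index ON shows (title)")
after = plan(lookup)   # the plan now names title_index

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, but the first should describe a scan of the table and the second a search that uses title_index.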

so we have these indexes which allow us

to get at data faster but what if we

want to actually get data that's now

across these multiple tables how can we

do that and how might these indices or

indexes help further Well turns out

there is a way that we've seen already

indirectly to join two tables together

previously when I selected the ID of the

office and then I searched for it in the

other table using select in a nested

query I was kind of joining two tables

together and it turns out there's a

couple of ways to do this let's go ahead

now and for instance find all of like


Steve Carell's TV shows not just the

office but all of them too unfortunately

if we look at our schema

shows up here have no mention of the TV

stars in them

and people have no mention of shows we

somehow need to use this table here to

connect the two and this is called a

join table in the sense

that using two integer columns it kind

of joins the two tables together

logically and so if you're kind of Savvy

enough with SQL you can kind of do what

I did with my hands earlier and like

recombine Tables by using these common

IDs these integers together so let me do

this let me go ahead and figure out step

by step Steve Carell's shows so how am I

going to do this well if I select star

from people where name equals Steve

Carell fortunately there's only one of

them so this gives me back his

name his ID and his birth year

but it's really only his ID that I care

about why because in order to get back

his shows I need to link person ID with

show ID right so I need to know his ID

number so what could I do with this well

remember the schema and the Stars table


I've just gotten from the people table

Steve Carell's ID I bet by transitivity

I could now use his person ID his ID to

get back all of his show IDs and then

once I've got all of his show IDs I can

take it one step further and get back

all of his shows' titles so the answer is

actually English words and not just

seemingly random integers so let me go

ahead and do this let me again get Steve

Carell's

ID number but not star star represents

everything it's a wild card character in

SQL let me just select the ID of Steve

Carell and that gives me back 136797

and it's only giving me back one value

the thing called ID is just the column

heading up above now suppose I want to

select all of the show IDs that Steve

Carell is affiliated with let me select

show ID from Stars where the person ID

in Stars

happens to equal Steve Carell's ID so

again I'm sort of building up my answer

in reverse and taking these baby steps

on the right in parentheses I'm getting

Steve Carell's ID on the left I am now

selecting all of the show IDs that have

some connection with that person ID in


the Stars table this answer too is not

going to be that Illuminating it's just

a whole bunch of integers that have no

meaning to me the human but let's take

this one step further and even though my

code is getting long I could hit

enter and kind of format it nicely

especially if I were doing this in a

code file but I'm just doing it

interactively for now let's now select

all of the titles from the shows table

where the ID of the show is in this

previous query so again the

query's getting long but notice it's the

third and last step Select Title

from the shows table where the ID of the

show is in the list of all of the show

IDs that came back from the Stars table

searching for Steve Carell's person ID

how did we get that person ID let me

scroll to the end well I selected in my

innermost parentheses Steve Carell's

own ID so now when I hit enter voila I

get all of Steve Carell's TV shows up

until now and if I want to tidy this up

further I can use the same tricks as

before order by title semicolon now I've

got it all alphabetized as before so

again with SQL comes the ability to

search I mean look how quickly we did


this .094 seconds to search across three

different tables to get back this answer
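
The three nested queries built up above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the lecture's actual database: the table layout follows the schema as described, and the handful of rows and ID numbers are made up for the example.

```python
import sqlite3

# Tiny in-memory stand-in for the shows database; rows and IDs are
# illustrative assumptions, not the real data set.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birth INTEGER);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (1, 'Steve Carell', 1962), (2, 'Someone Else', 1970);
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'Space Force'), (12, 'Another Show');
    INSERT INTO stars VALUES (10, 1), (11, 1), (12, 2);
""")

# Innermost query: the person's ID. Middle query: that person's show IDs.
# Outermost query: the titles of those shows, alphabetized.
query = """
    SELECT title FROM shows WHERE id IN
        (SELECT show_id FROM stars WHERE person_id =
            (SELECT id FROM people WHERE name = ?))
    ORDER BY title
"""
titles = [row[0] for row in db.execute(query, ("Steve Carell",))]
print(titles)  # ['Space Force', 'The Office']
```

Reading the query from the inside out mirrors the baby steps taken interactively: each inner SELECT produces the integers that the next one out filters on.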

but my data is now all kind of neatly

designed in individual tables which is

going to be important now that the data

set is so large but let me take this

one step further let me go ahead and do

this let me go ahead and

point out that with this query notice

that I'm searching on

uh let's say

I'm searching on a person ID here and at

the end here I'm searching on a name

column here so let me actually go ahead

and do this let me go ahead and

see if we can't speed this up this query

at the moment takes .092 seconds let's

see if we can't speed this up further by

just quickly creating a few more of

those B-trees in the database's memory

create an index called person index and

I'm going to do this on the Stars table

uh on the person ID column enter it's

taking a moment taking a moment that's

almost a full second because that's a

big table let's create another index

called show index on the Stars table why

because I want to search by the show ID

also that was part of my big query it

takes a moment okay about two-thirds of

a second now

let's create one last one another index

called name index but I could call these

things anything I want on the people

table why because I'm also searching on

the name column so in short I'm creating

indexes on each of the columns that are

somehow involved in my search query

going from one table to the other now

let's go back to the previous query

which recall took Point o whoops that

took

um

I think I erased it

.091 all right well it was roughly this

order of magnitude we're not seeing the

data now but let me go ahead and run my

original big query once and boom we're

down to almost nothing so again creating

these indexes in memory has the effect

of rapidly speeding up our computation

time now if you've ever used for

instance uh the my.harvard course shopping

tool here on campus or Yale's analog you

might wonder like why is the thing so

slow this could be one of the reasons

why large data sets with thousands of

rows thousands of courses tend to be

slow if and I'm only conjecturing if the


database isn't properly indexed if

you're building your own web application

and you're finding that users are

waiting and waiting and things are

spinning and spinning what might be

among the problems well it could absolutely

just be bad algorithms and bad code that

you wrote or it might be that you

haven't thought about well what columns

should be optimized for searches and

filtration like I've done here in order

to speed up subsequent queries again

from the outside in we can only

conjecture but ultimately this is just one

of the things that explains performance

problems as well all right let's point

out just a couple of final syntactic

things and then we'll consider bigger

picture some problems that might arise

in this world
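
The three indexes created above can be sketched as follows, again using Python's sqlite3 module. Timings vary from machine to machine, so this sketch asks SQLite for its query plan instead; the empty tables are stand-ins, and the exact wording of the plan text depends on the SQLite version.

```python
import sqlite3

# Empty stand-in tables; only the structure matters for the query plan.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
""")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable description.
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]

lookup = "SELECT show_id FROM stars WHERE person_id = 1"
before = plan(lookup)

# The same three indexes as in the lecture; the names are arbitrary.
db.execute("CREATE INDEX person_index ON stars (person_id)")
db.execute("CREATE INDEX show_index ON stars (show_id)")
db.execute("CREATE INDEX name_index ON people (name)")
after = plan(lookup)

print(before)  # expected to describe a full scan of stars
print(after)   # expected to describe a search using person_index
```

Without the index the database has to examine every row of stars; with it, the lookup can go through the index's tree structure instead.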

if these nested queries start to

get a little much there are other ways

just so you've seen it that you can

execute similar logic in SQL for

instance if I know in advance that I

want to connect Steve Carell to his show

IDs and to their titles we can do

something more like this

Select Title from the people table


joined with the Stars table on

people.id equals stars.person ID so what

am I doing new syntax and again this is

not something you'll have to memorize or

ingrain right away but just so you've

seen other approaches Select Title from

people join Stars this is an explicit

way to say take the people table in one

hand the Stars table in the other

hand and somehow join them as I keep

doing with my fingertips here how to

join them join them so that

the ID column in the people table lines

up with the person ID in the Stars table

but that's not quite everything I could

also say join further on the shows table

where uh the Stars show ID

equals the show's ID column so what am I

doing here that's saying go further and

join the star the people table sorry the

Stars table with the shows table joining

the show ID column with the ID column

again this starts to get a little

messy to think about but now I can just

say where name equals quote unquote

Steve Carell I can do in one query what

previously took me three nested queries

and get back the same answers and I can

still add in my order by title to get

back the result and if I do this a


little more

uh neatly let me type this out a little

differently let me type this out by

adding a new line uh I can't do that

here
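
Typed across multiple lines, the explicit JOIN version might look like the sketch below. It reuses the same made-up miniature of the database (an assumption for illustration, not the real data) and is run through Python's sqlite3 module.

```python
import sqlite3

# Illustrative stand-in data; the real database has far more rows.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birth INTEGER);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (1, 'Steve Carell', 1962), (2, 'Someone Else', 1970);
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'Space Force'), (12, 'Another Show');
    INSERT INTO stars VALUES (10, 1), (11, 1), (12, 2);
""")

# Line up people.id with stars.person_id, then stars.show_id with shows.id,
# and only then filter by name.
query = """
    SELECT title FROM people
    JOIN stars ON people.id = stars.person_id
    JOIN shows ON stars.show_id = shows.id
    WHERE name = ?
    ORDER BY title
"""
titles = [row[0] for row in db.execute(query, ("Steve Carell",))]
print(titles)  # ['Space Force', 'The Office']
```

One query replaces the three nested ones, and the ON clauses state explicitly which integer columns tie the tables together.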

I'm going to leave it alone for now we

can type it on multiple lines in other

contexts and let me do one last

thing so I want to show that I'm going

to show it but this is not something you

should ingrain just yet either Select

Title from people stars and shows if you

know in advance that you want to do

something with all three tables you can

just enumerate them one table name after

the other and then you can say where

people.id equals stars.person ID and now

I'm hitting enter so that it formats a

little more readably on my screen and

stars.show ID equals shows.id and lastly

name equals Steve Carell in short you

specify that you want to select data

from all three of these tables and then

you tell the database how to combine

foreign keys with primary keys that is

the columns that have those integers in

common if I hit enter now I get

these same exact results ever more so if

I also add in an order by title oops uh


that's why I didn't want to do this

earlier right I'd have to hit I'd have

to go back through my history multiple

times to actually get back the

multi-line query this time all right

that was a lot all at once but this is

only to say that even as we sort of make

the design of the data more

sophisticated and we put some of it over

here some of it over here some of it

over here so as to avoid duplication of

data weird hacks like putting commas in

the data we can still get back all of

the answers that we might want across

these several tables and using indexes

we can significantly speed up these

processes so as to handle 10 times as

many a hundred times as many users on

the same actual database there is going

to be a downside and thinking back to

our discussion of algorithms and data

structures in past weeks what might be a

downside of creating these indexes

because as of now I created four

separate indexes on the name column the

title column and some other columns too

like why wouldn't I just go ahead and

index everything if it's clearly

speeding things up

memory so space anytime you're starting


to benefit time wise in computer science

odds are you're sacrificing space or

vice versa and probably indexing

absolutely everything is a little dumb

because you're going to waste way more space

than you might actually need so figuring

out where the right inflection point is

is part of the process of Designing and

just getting better at these things

now unfortunately a whole lot of things

can go wrong in this world and they

continue to in the real world with

people using SQL databases in fact here

on out if you're reading something

technical about SQL databases and

websites being hacked in some form and

passwords leaking out unfortunately all

too often it is because of what are

called SQL injection attacks and just to

give you a sense now to counterbalance

maybe any enthusiasm for like oh that

was neat how we can do things so quickly

with great power comes responsibility in

this world too and so many people

introduce bugs into their code by not

quite appreciating how it is that

data is getting into your

application so what do I mean by that


here for instance is a typical login

screen for Yale and here's the analog

for Harvard where you're prompted like

every day probably for your username and

your password your email address and

your password Here suppose though that

behind this login page whether Harvard's

or Yale's there's some website and that

website is using SQL underneath the hood

to store all of the Harvard or Yale

people's usernames passwords ID numbers

courses transcript all of that stuff so

there's a SQL database underneath the

website well what might go wrong with

this process unfortunately there's some

special syntax in SQL just like there is

in C in Python for instance there are

comments in SQL too if you do two

hyphens dash dash that's a comment in

SQL and if you the programmer aren't

sufficiently distrustful of your users

such that you defend against potentially

adversarial attacks you might do

something like this suppose that I

somewhat maliciously or uh curiously log

in by typing my username

malan@harvard.edu and then maybe a

single quote and a Dash Dash why because

I'm trying to suss out if there is a

vulnerability here to a SQL injection


attack do not do this in general but if

I were the owner of the website trying

to see if I've made any mistake I might

try using potentially dangerous

characters in my input dangerous how

because single quote is used for quoting

things in SQL as we've seen single

quotes or double quotes dash dash I

claim now is used for commenting but

let's now imagine what the code

underneath the hood might be for

something like Yale's login or Harvard's

login what if it's code that looks like

this

so let me read it from left to right

suppose that they are using something

like cs50's own execute function and

they've got some SQL typed into the

website that says select star from users

where username equals this and password

equals that and they're plugging in

username and password

so what am I doing here well when the

user types their username password hits

enter I probably want to select that

user from my database to see if the

username and passwords match so the

underlying SQL might be select star from

users where username equals question


mark and password equals question mark

users is the table one column is

username one column is password all

right and if we get back one row

presumably malan@harvard.edu

exists with that password we should let

him proceed from there on out so that's

like some pseudo code if you will for

this scenario what if though this code

is not as well written as it currently

is and isn't using question marks so the

question mark syntax is a fairly common

SQL thing where the question marks are

used as placeholders just like in printf

percent s was but this function

db.execute from CS50's library and

third-party libraries as well is also

doing some good stuff with these

question marks and defending against the

following attack suppose that you were

not using a third-party Library like

ours and you were just manually

constructing your SQL queries like this

you were to do something like this

instead using an F string in Python

you're comfortable with format strings

now you've gotten into the habit of

using curly braces and plugging in

values suppose that you the aspiring

programmer are just using techniques that


you've been taught so you have an F

string with select star from users where

username equals quote unquote username

and curly braces and password equals

quote unquote password in curly braces

right like as of what two weeks ago

this was perfectly legitimate technique

in Python to plug in values into a

string but notice if you are using

single quotes yourself and the user has

typed in single quotes to their input

what could go wrong here like where are

we going with this

if you're just blindly plugging user

input into your own prepared string of

text yeah

[Music]

yeah worst case they could insert what

is actually SQL code into your database

as follows generally speaking if you're

using special syntax like single quotes

to surround the user's input you'd

better hope that they don't have an

apostrophe in their name or you

better hope that they don't type a

single quote as well because what if

their single quote finishes your single

quote instead and then the rest of this

is somehow ignored well let's consider


how this might happen let me go ahead in

here it's got a little blurry here but

let me plug in here wow that looks awful

let me fix the red

just change this to White so it's more

readable what happens if the user does

this instead

they type in like I did into the

screenshot malan@harvard.edu single

quote dash dash what has just happened

logically even though we've only just

begun with SQL today well select star

from users where username equals

malan@harvard.edu end quote

what's bad about the rest of this

dash dash I claim means a comment which

means my color coding is going to be a

little blurry again but everything after

the dash dash is just ignored the logic

then of the SQL query then is to just

say select malan@harvard.edu from the

database not even checking the password

anymore therefore you will get back at

least one row so length of rows will

equal one and so presumably the rest of

the pseudo code logs the user in gives

them access to my my.harvard account

or whatever it is and they've pretended

to be me simply by using a single quote

and a Dash Dash in the username field


again please don't go start doing this

later today on Harvard Yale or other

websites but it could be as simple as

that why because the programmer

practiced what they were taught which

was just to use curly braces to plug

values into f-strings but if you don't

understand how the user's input is going

to be used and if you don't distrust

your users fundamentally for every good

person out there there's going to be

unfortunately some adversary who just

wants to try to find fault in your code

or hack into your data set this is

what's known as a SQL injection attack

because the user can type something that

happens to be or look like SQL and trick

your database into doing something it

didn't intend to like for instance

logging the user in worst case they

could even do something else maybe the

user types a semicolon then the word

drop or the word update you could

imagine doing semicolon update table

grades where name equals malan and set

the grade equal to a instead of b or

something like that the ability to

inject SQL into the database means you

can do anything you want with the data


set either constructively or worse

destructively all right and now just a

quick little cartoon that should now

make sense
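
The attack described above can be reproduced safely against a throwaway in-memory database. This is a minimal sketch using Python's sqlite3 module rather than CS50's library; the users table, the stored password, and the two function names are hypothetical, and a real application would not store passwords in plain text.

```python
import sqlite3

# Throwaway database with one hypothetical user.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES (?, ?)", ("malan@harvard.edu", "secret"))

def login_unsafe(username, password):
    # Vulnerable: user input is pasted straight into the SQL text, so a
    # single quote in the input can end the string literal early.
    sql = (f"SELECT * FROM users WHERE username = '{username}' "
           f"AND password = '{password}'")
    return len(db.execute(sql).fetchall()) == 1

def login_safe(username, password):
    # Placeholders: the values are passed separately and never parsed as SQL.
    sql = "SELECT * FROM users WHERE username = ? AND password = ?"
    return len(db.execute(sql, (username, password)).fetchall()) == 1

attack = "malan@harvard.edu'--"
print(login_unsafe(attack, "wrong"))  # True: the password check is commented out
print(login_safe(attack, "wrong"))    # False: no user has that literal username
```

In the unsafe version the query that actually runs ends at the two hyphens, so only the username is checked. With question-mark placeholders the quote and the hyphens are treated as ordinary characters in the username.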

[Music]

okay it's like one of us two of us

awkwardly so much funny all right so

let's move on to one last condition

there's one other problem that can go

wrong here oh and I should explain this

so this is an allusion to uh the son

Robert having typed in semicolon the

word drop table students and doing some

of the same technique this is sort of

humor that only CS people would

understand because it's the mom

realizing oh her son's doing a SQL

injection attack onto the database less

funny when you explain it but once

you notice the syntax that's all this is

an allusion to all right so one final

threat now that you are graduating to

the world of proper databases and away

from CSV files alone things can go wrong

when using databases and honestly even

using CSV files if you have multiple

users and thus far you and I have had

the luxury in almost every program we've

written that it's just me using my code

it's just you using your code and even


if your teaching fellow or TA is using

it probably not at the same time but the

world gets interesting if you start

putting your code on phones on websites

such that now you might have two

users literally trying to log in at the

same time literally clicking a button at

the same or nearly the same time what

happens then if a computer is trying to

handle requests from two different

people at once as might happen all the

time on a website you might get what are

called race conditions and this is a

problem in Computing in general not just

with SQL not just with python really

just any time you have shared data like

a database as follows this apparently is

one of the most liked Instagram posts

ever it is literally just a picture of

an egg has anyone clicked on this egg it

was like okay wow all right so yes so go

search for this photo if you'd like to

add to the likes on Instagram the

account is world record egg this is just

a screenshot of Instagram of that

picture of an egg if you're in the habit of

using Instagram or like any social media

site there's some equivalent of a like

button or a heart button these days and


that's actually a really hard problem

such a simple idea to like count the

number of likes something has but that

means someone has to click on it your

code has to detect The Click your code

has to update the database and then do

it again and again even if multiple

people are perhaps right now clicking on

that same egg and unfortunately bad

things can happen if two people try to

do something like this at the same time on a

computer

how might this happen so here's some

more code half pseudocode half Python

code here as follows suppose that what

happens when you literally right now

maybe click on the like button on the

Instagram post

suppose that code like the following is

executed on Facebook servers db.execute

of Select likes from Posts where ID

equals question mark all right so what

am I assuming here I'm assuming that

that photograph has a unique ID it's

like some big integer whatever it was

randomly assigned I'm assuming that when

you click on the heart the unique ID is

somehow sent to Instagram's server so that

their code can call it ID and I'm

assuming that Instagram's using its SQL


database and selecting from a posts

table the current number of likes of

that egg

for that given ID number why because I

need to know how many likes it already

has if I want to add one to it and then

update the database right I need to

select the data then I need to update

the data here all right so in some

python code here let's store in a

variable called likes whatever comes

back in the first row from the likes

column again this is new syntax specific

to our library but a common way of

getting back first row and the column

called likes therein so at this point in

the story likes is storing the total

number of likes in the millions or

whatever it is of that particular egg

then I do this execute update posts set

the number of likes equal to this value

where the ID of the post equals this

value what do I want to update the likes

to whatever likes currently is plus one

and then plugging in the ID so a simple

idea right I'm checking the value of the

likes and maybe it's 10. I'm changing 10

to 11 and then updating the table but a

problem can arise if two people have


clicked on that egg at roughly the same

time or literally the same time why is

that well in the world of databases and

servers and the Instagrams of the world

have thousands of physical servers

nowadays so they can support millions

billions even of users nowadays what can

go wrong well typically code like this

is not what we'll call Atomic to be

Atomic means that it all executes

together or not at all rather code

typically is executed as you might

imagine line by line and if your code is

running on a server that multiple people

have access to which is absolutely the

case for an app like Instagram if you

and I click on the heart at roughly the

same time for efficiency the computer

the server owned by Instagram might

execute this line of code for me then it

might execute this line of code for you

then this line of code for me then this

line of code for you then this line of

code for me then this line of code for

you that is to say our queries kind of

might get intermingled uh

chronologically because it'd be a little

obnoxious if when you're using Instagram

I'm blocked out while you're interacting

with the site it'd be a lot nicer for


efficiency and fairness if somehow they

do a little bit of work for me a little

bit of work for you and back and forth

and back and forth equitably on the

server so that's what typically happens

by default these lines of code get

executed independently and they can

happen in alternating order with other

users you can get them sort of combined

like this same same order top to bottom

but other things might happen in between
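
The interleaving just described can be replayed by hand. The sketch below does not use real concurrency; it simply runs two users' steps in alternating order against a stand-in posts table (the table, the starting count of 10, and the helper names are assumptions for illustration).

```python
import sqlite3

# Stand-in posts table with one post that starts at 10 likes.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts VALUES (1, 10)")

def read_likes(post_id):
    row = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]

def write_likes(post_id, likes):
    db.execute("UPDATE posts SET likes = ? WHERE id = ?", (likes, post_id))

# Each user's request is "read, then write", but the steps alternate.
likes_a = read_likes(1)       # user A sees 10
likes_b = read_likes(1)       # user B also sees 10
write_likes(1, likes_a + 1)   # A stores 11
write_likes(1, likes_b + 1)   # B stores 11 again, so one like is lost

print(read_likes(1))  # 11, although two people clicked
```

Both requests ran their lines top to bottom in the right order; the count is wrong only because the second read happened before the first write.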

so suppose that the number of likes at

the very beginning was like 10 and

suppose that Carter and I both click on

that egg at roughly the same time and

suppose this line of code gets executed

for me and that gives me a value in

likes ultimately of 10. suppose then

that the computer takes a break from

dealing with my request does the same

code for Carter and gets back what

value for the current number of likes

also 10 for Carter because mine has not

been recorded yet at this point in the

story somewhere in the computer's memory

there's a likes variable for me storing

10 there's a likes variable

storing 10 for Carter then this line of

code executes for me it updates the


database to be likes plus one which

stores 11 in the database then Carter's

code is executed updating the same Row

in the database to

11 unfortunately because his value of

likes happened to be the same value as

mine and so the metaphor here that if we

had a refrigerator on stage we would

actually act out is something that was

taught to me years ago in an operating

systems class whereby the kind of most

similar analog in the real world would

be if like you've got like a mini fridge

in your dorm room and one of you or your

roommates

comes home opens the fridge and realizes

like oh we're out of milk was how the

story went in my day so you close the

refrigerator and you walk across the

street go to CVS and get in line to buy

some milk Meanwhile your roommate comes

home they too inspect the state of your

refrigerator AKA a variable open the

door and realize oh we're out of

milk I'll go get more milk close the

fridge go across the street and head to

maybe a different store or the line is

long enough that you don't see each

other at the store so long story short

you both eventually get home open the


door and damn it now there's milk from

your other roommate there because you

both made a decision on this based on

the state of a variable that you

independently examined and you didn't

somehow communicate now in the real

world this is absolutely solvable like

how would you fix this or avoid this

problem in the real world

literally own roommate own fridge

perfect let them know so somehow

communicate and in fact the terminology

here would be multiple threads can

somehow intercommunicate by having

shared State like the iMessage thread on

your phone you could leave a note you

could more dramatically lock the

refrigerator somehow thereby making the

milk purchasing process Atomic the

fundamental problem is that for

efficiency again computers tend to

intermingle logic that needs to happen

when it's happening across multiple

users just for fairness's sake for

scheduling's sake you need to make sure

that all three of these lines of code

execute for me and then for Carter and

then for you if you want to ensure that

this count is correct and for years when


social media was first getting off the

ground this was a super hard problem

Twitter used to go down all of the time

and tweets and retweets were a thing

that were similarly happening with a

very high frequency these are hard

problems to solve and thankfully there

are solutions and we won't get into the

weeds of how you might use these things

but know that there are solutions in the

form of things called locks which I use

that word deliberately with the fridge

software locks can allow you to protect

a variable so no one else can look at it

until you're done with it there are

things called transactions which allow

you to do the equivalent of sending a

message to or really locking out your

roommate from accessing that same

variable too but for slightly less

amount of time there are solutions to

these problems so for instance in Python

the same code now in green might look a

little something like this when you know

that something has to happen all at once

all together you first begin a

transaction and you do your thing and

then you commit the transaction at the

very end here too though there's going

to be a downside typically the more you


use transactions in this way the

higher the probability

that you're going to box someone out or

make Carter's request a little slower

why because we can't interact at the

same time or you might make his request

fail if he tries to update something

that's already been updated so you

generally want to have as few lines of

code together in between these

transactions so that you get in and you

get out and you go to CVS and you get

back really fast so as to not cause

these kind of performance things so
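
The begin, do your thing, commit pattern described above might look like this with Python's sqlite3 module. It is a sketch under assumptions: the lecture's own code uses CS50's library, and BEGIN IMMEDIATE is an SQLite-specific way to take the write lock before reading. The like function and the posts table are hypothetical.

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit
# BEGIN / COMMIT / ROLLBACK statements below.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts VALUES (1, 10)")

def like(post_id):
    db.execute("BEGIN IMMEDIATE")  # lock for writing before the read happens
    try:
        row = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchone()
        db.execute("UPDATE posts SET likes = ? WHERE id = ?", (row[0] + 1, post_id))
        db.execute("COMMIT")       # the read and the write take effect together
    except Exception:
        db.execute("ROLLBACK")     # undo everything if any step failed
        raise

like(1)
like(1)
total = db.execute("SELECT likes FROM posts WHERE id = 1").fetchone()[0]
print(total)  # 12
```

Keeping only the read and the write between BEGIN and COMMIT follows the advice above to get in and out quickly. For a simple counter, a single statement such as UPDATE posts SET likes = likes + 1 WHERE id = ? avoids the separate read altogether.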

things indeed escalated quickly today

the original goal was just to solve

problems using a different language more

effectively than python but as soon as

you have these more powerful techniques

a whole new set of problems arises takes

practice to get comfortable with but

ultimately this is all leading us toward

the introduction next week of web

programming with HTML CSS and some

JavaScript the week after bringing

Python and SQL back into the mix so that

by term's end we've really now used all of

these different languages for what

they're best at and over the next few


weeks the goal is to make sure you're

understanding and comfortable with what

each of these things is good and

bad for let's go ahead and wrap here

I'll Stick Around for questions we'll

see you next time

[Music]

this is cs50 and this is already week

eight and if we think back to like the

past several weeks now recall that

things started pretty interestingly

pretty interactively in like week zero

when we were using scratch because with

scratch we had a GUI a graphical user

interface so even as we explored

variables and loops and conditionals and

all of that you had kind of a fun

environment in which to express those

ideas and then in week one we sort of

took a lot of that away when we

introduced c and a terminal window and a

command line because now all of your

programs became very textual uh very

keyboard based and gone was the mouse

the animations the menus and so forth

and so now fast forward to week eight

we're gonna bring those kinds of user


interface UI elements back in the form

of web programming and this goes beyond

just laying out websites this will

this week and next week combine elements

of the back end server stuff that we've

been doing for the past several weeks

using python using SQL and now

introducing a couple of other languages

on the so-called client side on your own

Mac your own PC your own phone that's

going to talk to those back-end services

so indeed at this end of cs50 does

everything rather come together into a

user interface that's just super

familiar all of us are on our phones

desktops laptops every day and

increasingly even the mobile apps that

you all are using are implemented not

necessarily in languages like Swift or

Java if you're familiar with those but

with languages called HTML CSS and

JavaScript which we'll focus on here

today but before we do that let's kind

of provide a foundation on which these

apps can run because indeed we'll start

to look underneath the hood of how the

internet itself Works albeit quickly so

that we have kind of a mental model for

where all of this code is running how


you can troubleshoot issues and how

really ultimately after cs50 you can

learn by just poking around other actual

websites so the internet we're all on it

literally right now what is it in your

own words

what is the internet

[Music]

it's this utility nowadays that we all

rather take for granted how would you

describe it

okay big storage and indeed that's how

the cloud is described which is kind of

an abstraction if you will for a whole

lot of wires and cables and Hardware

and the internet other formulations of

the term

how else

okay a bunch of data that we can all

reach by way of being interconnected

somehow with wires or wirelessly and so

really the internet too is is a hardware

thing there's a whole lot of servers out

there that are somehow interconnected

via physical cables via internet service

providers via Wireless connectivity and

the like and once you start to have

networks of networks of networks you get

the internet indeed Harvard has its own

network and Yale has its own network and


your own home probably has its own

network but once you start connecting

those networks do you get the

interconnected Network that is the

internet as we now know it so there's

this whole alphabet soup that goes with

the internet some of whose acronyms and

terms you've probably seen before but

let's at least peel back some of those

layers and consider what some of the

building blocks are so here's a picture

of the internet before it was known as

the internet back in 1969 when it was

something called arpanet from the

advanced research projects agency and

the intent originally was just to

interconnect a few universities here in

Utah and California literally servers or

computers in each of

those areas somehow interconnected with

wires so that people could start to

share data a year later it expanded to

include MIT and Harvard and others and

now fast forward to today you have a

huge number of systems around the world

that are on this same network and in

fact if I just pull up a web page here

that's sort of constantly changing a

visualization of the internet as it


might now be today this here in the

abstract all of these lines and

interconnections represent just how

interconnected the world is today and it

just means that there's all the more

servers all the more cabling all the

more Hardware giving us this underlying

infrastructure but if we focus really on

just these nodes these individual dots

whether Back in 1970 or now in 2021 each

of these dots you can think of as yes a

server but a certain type of server

namely known as a router and a router as

the name implies just routes data left

to right top to bottom from one point to

another and so there's all these servers

here on campus at Harvard on Yale's

campus in Comcast's network or Verizon's

network your own home network you have

your own routers out there whose purpose

in life is to take in data and then

decide should I send it this way or this

way or this way so to speak assuming

there are multiple options with multiple

cables you in your home probably have

just one cable coming in or going out

but certainly if you're a place like

Harvard or Yale or Comcast or the like

there's probably a whole bunch of

interconnections that the


data can then travel across ultimately
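
A router's job as described here (take in data, then decide which way to send it) can be sketched as a lookup in a table of next hops. This is only a toy model: the router names and the contents of the table are made up for illustration, and real routers build their tables with routing algorithms rather than by hand.

```python
# Toy forwarding tables: each router maps a destination to the neighbor
# it should hand the packet to next. All names here are hypothetical.
FORWARDING_TABLES = {
    "laptop": {"mail-server": "campus-router"},
    "campus-router": {"mail-server": "city-router"},
    "city-router": {"mail-server": "remote-router"},
    "remote-router": {"mail-server": "mail-server"},
}


def route(source, destination, max_hops=16):
    """Follow next-hop entries from source until destination is reached."""
    path = [source]
    current = source
    while current != destination:
        if len(path) > max_hops:
            raise RuntimeError("too many hops; giving up")
        table = FORWARDING_TABLES.get(current, {})
        if destination not in table:
            raise LookupError(f"{current} has no route to {destination}")
        current = table[destination]
        path.append(current)
    return path


if __name__ == "__main__":
    print(" -> ".join(route("laptop", "mail-server")))
```

Each router only knows the next step, not the whole path, which is the point of the sketch: the sender hands the data to its local router and does not need to know how it travels the rest of the distance.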

so how do we get data among these

routers for instance if you want to send

an email to someone at Stanford in

California from here on the East Coast

or if you want to visit

www.stanford.edu how does your laptop

your phone your desktop actually get

data from point A to point B well

essentially your laptop or phone knows

when it boots up at the beginning of the

day what the local router is what the

address of that local router is so if

you want to send an email from like my

laptop over here my laptop is

essentially going to hand it to the

nearest Harvard router and then from

there I don't know I don't care how it

gets the rest of the distance but

hopefully within some small number of

steps later Harvard's router is going to

send it to maybe Boston's router is

going to send it to California's router

is going to send it to Stanford's router

until finally it reaches Stanford's

email server and we can depict this

actually a bit playfully

thankfully the course's staff kindly

volunteered to create a visualization for


this using a familiar technology so here

we have some of our TFs and TAs and CAs

present and past let me go ahead and

full screen this window here give me

just a moment to pull it up on my screen

here and we'll consider what happens if

we want to send a packet of information

from one person or router namely Phyllis

in this case in the bottom right hand

corner up to Brian in this case in the

top left hand corner so each of the

staff members here represents exactly

one of these routers on the internet

oh

[Music]

[Applause]

[Music]

[Applause]

my God

the Applause is appreciated it actually

took us a significant number of attempts

to get that ultimately right so

here we have just physically what

it was the staff were passing around so

Phyllis started with an envelope inside

of which was that email presumably on

the east coast and she wanted to send it

to Brian on the west coast top left-hand

corner and so she had all of these


different options different connections

between her and point B namely Brian she

could go up down uh in her case and then

each of those subsequent routers could

go up down left or right until it

finally reaches Brian and long story

short there's algorithms that figure out

how you decide to send a packet up down

left or right so to speak but they do so

by taking an input and the form of that

input is this envelope and there's at

least a couple of things on the outside

of this because all of these routers and

in turn all of our Macs and PCs and

phones these days speak something called

TCP/IP a set of acronyms you've probably

seen somewhere on your phone your Mac or

PC in print somewhere which refers to

two protocols two conventions that

computers use to intercommunicate these

days now what's a protocol a protocol is

like a set of rules for how you behave in

healthier times I might extend my hand

and someone like Carter might extend his

hand thereby interacting with me based

on a human protocol of like literally

physically shaking hands nowadays we

have mask protocols whereby what you

need to do is wear a mask indoors but


that too is just a set of rules that we

all follow and adhere to that someone

standardized and documented so computers

use protocols all the time to govern how

they are sending information and

receiving information and TCP and IP are

two such protocols that standardize this

as follows what TCP/IP tells someone like

Phyllis to do is if she wants to send an

email to Brian is put the email in a

virtual envelope so to speak but on the

outside of that virtual envelope put

Brian's unique address and I'll describe

this as destination on the middle of the

envelope just like in our human world

you would write the destination address

on the envelope and then she's going to

put her own source address in the top

left hand corner just like you the

sender would put your own source address

in the human world but instead of these

addresses being like something Kirkland

Street Cambridge Massachusetts 02138

USA you probably know that computers on

the internet have unique addresses of

their own known as IP addresses and an

IP address is just a numeric identifier

on the internet that allows computers

like Phyllis and Brian to address these

envelopes to and from each other and


you've probably seen the format at some

point typically the format of IP

addresses is something dot something dot

something dot something each of those

somethings represented here with the

hash symbol is a number from zero

through

[Music]

255 and based on that little hint if

each of these hashes represents a number

from 0 to 255 each of those hashes is

represented with how many bytes or bits

eight bits or one byte which is to say

we can extrapolate from that that an IP

address must use 32 bits or 4 bytes if

we rewind now to some of the Primitives

we looked at in week zero and what that

means is at least at a glance it looks

like we have 4 billion some odd IP

addresses available to us now

unfortunately there's a huge number of

humans in the world these days

many of whom have multiple

devices certainly in places like this

where you have a laptop and a phone and

you have other Internet of Things type

devices all of which need to be

addressed so there's another type of IP

address that's starting to be used more


commonly this is version 4 of IP there's

also version 6 which instead of 32 bits

uses 128 bits which gives us a crazy

number of possible addresses for

computers so we can at least handle all

of the additional devices we

now have today so this is to say what

ultimately is going on this envelope is

the destination address that is Brian's

IP address and the source address that

is Phyllis's IP address so that this

packet can go from point A to point B

and if need be back by just flipping

the source and the destination but on

the internet you presumably know that

there's not just email servers there's

web servers there's chat servers video

servers game servers like there's all of

these different functions on the

internet nowadays and so when Brian

receives that envelope how does he know

it's an email versus a web page

versus a Skype call versus something

else altogether well it turns out that

we can look at the other part of this

acronym the TCP in TCP/IP and what TCP

allows us to do for instance is specify

a couple of things one the type of

service whose data is in this envelope

that is it does this with a


numeric identifier and I'm going to go

ahead and write down a colon and the

word port p-o-r-t and I'm going to write

that in the source address too colon

port so technically now what's on this

envelope is not just the addresses but

also a unique number that represents

what kind of service is being

sent from point A to point B whether

it's email or web traffic or Skype or

something else these numbers are

standardized and here are just two of

the most common ones not even in the

context of email but in the context of

the web port 80 is typically used

whenever an envelope contains a web page

or a request for one or the number 443

when that request is actually encrypted

using that thing you probably know in

URLs known as HTTPS where the S

literally means secure more on what the

HTTP means later if it's email the

number might be 25 or 465 or 587 like

these are the kinds of things you Google

if you ultimately care about them but if

you've ever had to configure like

Outlook or even Gmail to talk to another

account you might very well have seen

these numbers by typing in something


like

smtp.gmail.com and then a number which

is only to say these numbers are

omnipresent but they're typically not

things you and I have to care about

because servers and computers nowadays

automate much of this process but that's

all it takes ultimately for Phyllis to

get this message to Brian
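
What ends up written on the outside of the envelope (a source address and port, a destination address and port) can be sketched as a small data structure. This is an illustration only: the class name, the sample addresses, and the sample source port are assumptions, and the 0 to 65535 port range is a general fact about TCP that the lecture does not state.

```python
from dataclasses import dataclass
from ipaddress import ip_address

# Address-space sizes implied by the 32-bit and 128-bit formats.
IPV4_ADDRESSES = 2 ** 32    # the "4 billion some odd" addresses
IPV6_ADDRESSES = 2 ** 128

# Port numbers mentioned in the lecture.
WELL_KNOWN_PORTS = {"http": 80, "https": 443, "smtp": 25}


@dataclass(frozen=True)
class Envelope:
    """Hypothetical model of the addressing on one packet."""
    source_ip: str
    source_port: int
    destination_ip: str
    destination_port: int

    def __post_init__(self):
        # ip_address() raises ValueError for malformed addresses.
        ip_address(self.source_ip)
        ip_address(self.destination_ip)
        for port in (self.source_port, self.destination_port):
            if not 0 <= port <= 65535:
                raise ValueError(f"port out of range: {port}")

    def reply(self):
        """Flip source and destination to address a response."""
        return Envelope(self.destination_ip, self.destination_port,
                        self.source_ip, self.source_port)


if __name__ == "__main__":
    # Placeholder addresses from ranges reserved for documentation.
    outbound = Envelope("192.0.2.10", 50000,
                        "198.51.100.20", WELL_KNOWN_PORTS["https"])
    print(outbound)
    print(outbound.reply())
```

The reply method mirrors the remark that a packet can go back the other way just by flipping the source and the destination.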

but what if it's a really big message if

it's a short email it might fit

perfectly in one single packet so to

speak but suppose that Phyllis wants to

send Brian a picture of a cat like this

or worse a video of a cat it would be

kind of inequitable if no one else could

do anything on the internet just because

Phyllis wants to send Brian a really big

picture a really big video of a cat it

would be nice if we could kind of

timeshare the interconnections across

these routers so that we can give a

little bit of time to Phyllis a little

bit of time to someone else a little bit

of time to someone else so that

eventually Phyllis's entire cat gets

through the internet but

in terms of fairness she doesn't

monopolize the bandwidth of the network

in question and this then allows us to


use one other feature of TCP/IP which is

fragmentation where we can temporarily

and Phyllis's computer would do this

automatically fragment the big packet in

question or the big file in question and

then use not just a single envelope but

maybe a second a third and a fourth or

more
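
The fragmentation just described, chopping one big file into several envelopes, can be sketched in a few lines. The function name and the chunk size are assumptions chosen for illustration; real networks pick fragment sizes based on what the links can carry.

```python
def fragment(data, size):
    """Split data (bytes) into chunks of at most `size` bytes each."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    # A stand-in for the cat picture: 18 bytes split into 5-byte parts.
    for part in fragment(b"a picture of a cat", 5):
        print(part)
```

With these inputs the message happens to need four envelopes, and the last one is only partly full.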

if we do that though we're probably

going to need one other piece of

information just logically on these

envelopes like if you were implementing

this chopping up this picture of a cat

into four parts like intuitively what

might you want to put virtually on the

outside of this envelope now

yeah

the order of them somehow so probably

something like part one of four part two

of four part three of four and so forth

so I'm gonna write one more thing in

like the memo line of the envelope here

I'll put some kind of sequence number

that's just a little bit of a clue to

Brian to know in what order to

reassemble these things and even more

powerfully than that this actually gives

us this simple primitive of just using

ints on these envelopes on these packets


if Brian receives envelopes like these

with numbers like these in the memo

field what other feature does TCP

apparently enable Brian and Phyllis to

implement this is a bit subtle

but it's not just the ordering of the

packets what else might be useful about

putting numbers on these things might

you think

what might be useful here yeah in back

if you missed something that was

intended to be sent if I heard that

correctly so short answer exactly yes so

TCP because of this simple little

integer that we're including can quote

unquote guarantee delivery why because

if Brian receives one out of four two

out of four four out of four but not

three out of four he now knows

predictably that he needs to ask Phyllis

somehow to resend that packet and so

this is why pretty much always if you

receive an email you either receive the

whole thing or nothing at all like

sentences and words and paragraphs

should never really be missing from an

email or if you download a photograph on

the web it shouldn't just have a blank

hole in the middle just because

that packet of information happened to


be lost TCP if it is the protocol being

used to transmit data from point A to

point B ensures that it either all gets

there or ultimately none of it at all so

this is an important property but just

as a teaser there's other protocols out

there there's something called UDP which

is an alternative to TCP that doesn't

guarantee delivery and just as a taste

of why you might ever not want to

guarantee delivery maybe you're watching

like a streaming video like a sports

event online you probably don't

necessarily want the thing to buffer and

buffer and buffer just because you have

a slow connection because you're going

to start to miss things and then you're

going to be the only one in the world

watching the game that ended 20 minutes

ago when everyone else is sort of up to

speed similarly for a voice call it'd be

really annoying if our voice is

constantly buffered so UDP might be a

good protocol for making sure that even

if the person on the other end sounds a

little crappy at least you can hear them

it's not pausing and resending and

resending because that would really slow

down that sort of human interaction so


in short IP handles the addressing of

these packets and standardizes numbers

that every computer your own included

gets and TCP handles the standardization

of like what services can be used

between point A and point B
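
The two uses of sequence numbers described above (putting parts back in order, and noticing that part three of four never arrived) can be sketched as follows. This is a simplification under stated assumptions: the function names are made up, parts are numbered one per envelope, and real TCP numbers bytes and uses acknowledgements rather than a simple count.

```python
def missing_parts(received, total):
    """Return the sequence numbers (1..total) not yet received.

    `received` maps a sequence number to that part's bytes.
    """
    return [n for n in range(1, total + 1) if n not in received]


def reassemble(received, total):
    """Join parts in sequence order; refuse if anything is missing."""
    missing = missing_parts(received, total)
    if missing:
        raise ValueError(f"ask the sender to resend parts: {missing}")
    return b"".join(received[n] for n in range(1, total + 1))


if __name__ == "__main__":
    # Parts may arrive in any order; part 3 is lost at first.
    received = {4: b"cat", 1: b"a pic", 2: b"ture "}
    print(missing_parts(received, 4))
    received[3] = b"of a "  # the resent part arrives
    print(reassemble(received, 4))
```

The all-or-nothing behavior falls out of reassemble refusing to return anything while a part is missing.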

all right this is great but presumably

when Phyllis sends a message to Brian

like she doesn't really know and

probably shouldn't care what his IP

address is right these days it's like I

don't know most of the phone numbers

that my friends have I instead look them

up in some way and indeed when you visit

a website what do you type in it's

typically not something dot something

dot something dot something where each

of those somethings is a number what do

you typically type into a browser

so a domain name right something like

stanford.edu harvard.edu yale.edu

gmail.com or any other such domain name

and so thankfully there's another system

on the internet one more acronym for

today called DNS domain name system and

pretty much every Network on the

internet Harvard's Yale's Comcast's your own

home network somewhere somehow has a DNS

server you probably didn't have to

configure it yourself someone else did


your campus your job your internet

service provider but there's some server

connected somehow to the network you're

on via wires or wirelessly that just has

a really big table in its memory a big

spreadsheet if you will or if you prefer

a hash table that has at least two

columns of keys and values respectively

where on the left hand side is what

we'll call domain name something like

harvard.edu yale.edu and IP address on

the right hand side that is to say a DNS

server's purpose in life is just to

translate domain names to IP addresses

and vice versa if you want to go in the

other direction and technically just to

be precise it translates fully qualified

domain names to IP addresses and we'll

see what those are in just a moment but

again all of this just kind of happens

magically when you turn on your phone

your laptop today because all of these

things are pre-configured for us

nowadays so how can we actually start to

see some of these things in action
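
The two-column table that a DNS server keeps can be sketched as a dictionary. This is only a toy: the function names are made up, and the IP addresses are placeholders from a range reserved for documentation, not the real addresses of these sites.

```python
# A toy DNS table: fully qualified domain names as keys, IP addresses
# as values. The addresses are placeholders, not real ones.
DNS_TABLE = {
    "www.harvard.edu": "192.0.2.1",
    "www.yale.edu": "192.0.2.2",
    "www.stanford.edu": "192.0.2.3",
}


def resolve(name):
    """Translate a domain name to an IP address, or None if unknown."""
    return DNS_TABLE.get(name.lower().rstrip("."))


def reverse(address):
    """Translate an IP address back to a domain name, or None."""
    for name, ip in DNS_TABLE.items():
        if ip == address:
            return name
    return None


if __name__ == "__main__":
    print(resolve("www.harvard.edu"))
    print(reverse("192.0.2.2"))
```

To ask a real DNS server instead, Python's standard library offers socket.gethostbyname("www.harvard.edu"), which needs a network connection and returns whatever address is current.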

well let's go ahead and poke around

for instance at a couple of URLs here

let's see what we can actually do now

with these basic Primitives if we now


have the ability to move data from point

A to point B and what can be in that

envelope could be yes an email but today

onward it's really going to be web

content there's going to be content that

you're requesting like give me today's

homepage and there's content you're

receiving which would be the contents of

that actual home page and so just to go

one level deeper now that we have these

packets that are getting from point A to

point B using TCP/IP let's put something

specific inside of them not just an

email and a bunch of text but something

called HTTP which stands for hypertext

transfer protocol you've seen this for

decades now probably in the form of URLs

so much so that you probably don't even

type it nowadays your browser just adds

it for you automatically and you just

type in harvard.edu or yale.edu or the

like but HTTP is just a final protocol

that we'll talk about here that just

standardizes how web browsers and web

servers intercommunicate so this is a

distinction now between the internet and

the web the internet is really like the

low level Plumbing all of the cables all

of the technology that just moves

packets from left to right right to left


top to bottom that gets data from point

A to point B you can do anything you

want on top of that internet nowadays

email and web and video and chat and

gaming and all of that so HTTP or

the web is just one application that is

conceptually built on top of

the internet once you take for granted

that there is an internet you can do

really interesting things with it just

like in our physical world once you have

electricity you can just assume you can

do really interesting things with that

too without even knowing or caring how

it works but now that you'll be

programming for the web it's useful to

understand how some of these things

indeed work so let's take a peek at

the format of the things that go inside

of these messages these days it's

usually actually HTTPS that's in play

where again the S just means secure more

on that later but the HTTP is what

standardizes what kinds of messages go

inside of these envelopes and

wonderfully it's just textual

information typically there's a simple

text format that humans decided on years

ago that goes inside of these envelopes


that tells a browser how to request

information from a server and how to

respond from the server to that client

with information so here's for instance

a canonical URL

https://www.example.com what might you see at

the end of this you might sometimes see

a slash browsers nowadays kind of

simplify things and don't show it to you

but slash as we'll see just represents

like the default folder the root of the

web server's hard drive like whatever

the base is of it it's like C colon

backslash on Windows or it's you know

my computer on Mac OS

but a URL can have more than that it can

have slash path where path is just a

word or multiple words that sort of

describe a longer part of the URL that

path could actually be a specific file

we'll see like something called

file.html more on HTML in just a bit or

it can even be slash folder maybe with

another slash or maybe it can be slash

folder slash file.html now these days

Safari and even Chrome to some extent

and other browsers are in the habit of

trying to hide more and

more of these details

from you and me ultimately though

it'll be useful to understand what URL

you're at because it maps directly to

the code that we're ultimately going to

write but this is only to say that all

this stuff in yellow refers to

presumably a specific file and or folder

on the web server on which you're

programming all right what's this

example.com this is the domain name as

we described it earlier
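
The pieces of a URL named so far (the protocol at the front, the host, and the path after it) can be pulled apart with Python's standard library. This is a small sketch: the describe function and its dictionary keys are made up for illustration, and treating an empty path as slash follows the remark that slash is the default.

```python
from urllib.parse import urlsplit


def describe(url):
    """Break a URL into protocol, host, and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # No path typed at all means the root of the site.
        "path": parts.path or "/",
    }


if __name__ == "__main__":
    print(describe("https://www.example.com"))
    print(describe("https://www.example.com/folder/file.html"))
```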

example.com is the so-called domain name

this whole thing

www.example.com is the fully qualified

domain name and what the www is referring

to is specifically the name of a

specific server in that domain so back

in the day there was a

www.example.com web server there might

have been a mail.example.com mail server

there might have been a chat.example.com

chat server nowadays this host name or

subdomain depending on the context can

actually refer to a whole bunch of

servers right when you go to

www.facebook.com that's not one server

that's thousands of servers nowadays so

long story short there's technology that

somehow gets your data to one of those

servers but this whole thing is what we


meant by fully qualified domain name

this thing here hostname in the context

of an email address it might

alternatively be called a subdomain

this thing here top level domain

you probably know that .com means commercial

although anyone can buy it these

days .org similarly .net some of them are a

bit restricted .mil is just for the

US military .edu is just for accredited

educational institutions but there are

hundreds if not more top level domains

nowadays some more popular than others

CS50's tools for instance use cs50.io .io

sort of connotes input output it

actually belongs though to a small

island nation a country whose country

code is .io and you see other two

letter top level domains that are

country specific indeed

something.uk something.jp and the like

typically refer to countries but some of

them have been rather co-opted .tv as

well because they have these meanings in

English as well lastly this is what

we'll call the protocol that specifies

how the server uses this URL to get data

from point A to point B so what is

inside of this envelope let's now start

poking around a little bit more what is


inside of this envelope it's essentially

for our purposes today one of two verbs

either GET or POST and if any of you have

dabbled with HTML or made your own

website you might have seen some of

these terms before but these two verbs

describe just how to send

information from you to the server long

story short more on this next week GET

means put any user input in the URL POST

means hide it so that things you're

searching for credit card numbers you're

typing in usernames and passwords you're

inputting don't show up in the URL and

are therefore not visible to anyone with

access to your computer in your search

history but rather they're somehow

provided elsewhere deeper into that

envelope but for now we'll focus almost

entirely on GET which is perhaps the

most common one that we're always going

to use and what we're going to do is

this let me switch over just to a blank

screen here and if we assume that little

old me is this laptop here

and I'm connected to the cloud and in

that cloud is some server that I want to

request the webpage of harvard.edu or

yale.edu it's really going to be a


two-step process there's going to be a

request that goes from point A to point

B and then hopefully the server that

hears that request is going to reply

with what we'll typically call a

response and other terms that are

relevant here is my laptop is the

so-called client harvard.edu yale.edu

whatever it is is the so-called server

and just like in a restaurant where you

might request something to eat the

server might bring it to you it's again

that kind of bi-directional relationship

one request one response for each such

web page we request

all right so what's inside these

envelopes and what do we actually see

well this Arrow this line I just drew

from left to right representing the

request technically looks a little more

like this

when you visit a web page using your

browser on your phone laptop or desktop

what's going inside that envelope and

the textual message your Mac or PC or

phone is automatically generating looks

a little something like this the verb

GET the URL or rather the path that you

want to get slash represents the default

page on the website HTTP/1.1 is just


some mention of what version of HTTP

you're speaking now we're up to versions

two and version three but 1.1 is quite

common and the envelope contains some

mention of the host that was typed in

the fully qualified domain name this is

because single servers can actually host

many different websites if you're using

Squarespace or Wix or one of these

popular hosting websites nowadays you

don't get your own personal server most

likely you're on the same server as

dozens hundreds of other customers but

when your customers your users browsers

include a little mention of your

specific fully qualified domain name in

the envelope Squarespace and Wix just

know to send it to your web page or my

webpage or or some other customer

altogether dot dot dot there's some

other stuff there but

that gives a sense of what's in these

requests
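
The text inside a request envelope, as just described, can be sketched by building it as a string. The function name is made up for illustration; the line endings and the blank line that ends the headers are part of how HTTP/1.1 messages are written, although the lecture does not dwell on them.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, path, protocol version
        f"Host: {host}",         # which site on this server is wanted
        "",                      # blank line: end of the headers
        "",
    ]
    # HTTP separates lines with carriage return plus line feed.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_get_request("www.example.com"))
```

A real browser sends more headers than these two; the sketch keeps only the ones the lecture names.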

hopefully then when your browser

requests this webpage from the server

what comes back well hopefully a

response that looks like this HTTP/1.1

so the same version some status code

like a number 200 and then literally a


short phrase like OK which means

exactly that like okay this request was

satisfied then it contains some other

information like the type of content

that's coming back and we'll see that

this too is standardized text/html means

here comes some HTML which is just a

text language it could instead be

image/jpeg or image/png or video/mp4

there are these different

content types otherwise known as MIME

types that uniquely identify types of

files that come back similar in spirit

to file extensions but a little more

standardized this way then there's

some more stuff dot dot dot but in general

what you see here is our familiar

pattern keys and values these keys and

values are otherwise known as HTTP

headers and your browser has been

sending these every time you visit a

website and indeed we can see this right

now ourselves let me go over in just a

second to Chrome on my computer though

you can do this kind of thing with most

any browser today I'll go ahead and

visit http://harvard.edu enter

and voila I'm at Harvard's homepage for

today the content often changes but this

is what it looks like right now well I


typed in the URL but notice it changed a

little bit it actually sent me to https

and it added www even though I didn't

type that but it turns out we can poke

around at what my browser is actually

doing let me open another page and I'm

going to start to use incognito mode

this time not because I care that people

know I'm visiting harvard.edu but

because it throws away any history that

I just did so that every request is

going to look like a brand new one and

that's just useful diagnostically

because we're always going to see fresh

information my browser's not going to

remember what I previously already

requested but I'm going to go up to view

developer developer tools which is

something that all of you have if you

use Chrome and there's something

analogous for Firefox and Edge and

Safari and other browsers developer

tools is going to open up these tabs

down here I don't really care what's new

so I'm going to close the bottom thing

there and I'm going to hover over the

network tab for just a moment and now

I'm going to go and say

http://harvard.edu so the shorter version I'm


going to hit enter

and a whole bunch of stuff just flew

across the screen and it's still coming

in and if I zoom in down here my God

visiting harvard.edu still going is

downloading what 17 18 19 megabytes 20

megabytes millions of bytes of

information over 111 HTTP requests in

other words a bit of a simplification

but my browser unbeknownst to me sent

one envelope initially with the request

then the server said okay by the way

there's 110 other things you need 112

other things you need to get so my

computer went back and forth requesting

even more content for me why well inside

of Harvard's web page is a whole bunch

of images and maybe sound files and

videos and other stuff that all need to

be downloaded to compose what is

ultimately the web page but I don't care

about like 100 plus of these things

let's focus on the very first one first

the very first request I sent was up

here and I'm going to click on this row

under the network Tab and then I'm going

to see a bit of diagnostic information

to an average person using the web they

needn't care about this just as you

probably didn't care about it until right


now and even then perhaps not but if I

scroll down to request headers you will

see if I click view Source literally

everything that was in the request my

Mac just sent to harvard.edu two of the

lines are familiar GET / HTTP/1.1

Host: harvard.edu and then other

stuff that for now is not that

interesting for us but let's look at the

response that came back from the server

I'm going to scroll up now

and see response headers view source and

this is interesting it is not okay

there's no 200 there's no word okay

curiously harvard.edu has moved

permanently

what does that mean well there's a whole

bunch of stuff here that's not that

interesting for us but this line

location is interesting this is an HTTP

header a standardized key value pair

that's part of the HTTP protocol that is

conventions and if I highlight just this

one it's telling me Harvard is not at

http://harvard.edu Harvard's website

is now and perhaps forever at

https://www.harvard.edu so what's the value here

probably someone at Harvard wants you to


use a secure connection so they

redirected you from HTTP to https maybe

the marketing people want you to be at

www instead of just harvard.edu just to

standardize things but there are

technical reasons to use a host name and

not just the raw domain name and all

this other stuff is sort of

uninteresting for our purposes now

because a browser that receives a 301

response knows by design by the

definition of HTTP to automatically

redirect the user and that's why in my

browser all of this happened in like a

split second because I didn't really

know or care about all of those headers

but that's why and how I ended up at

this URL here my browser was told to go

elsewhere via that new location and the

browser just followed those breadcrumbs

if you will at which point it downloaded

all of the other images and files and so

forth that compose

this particular page
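
What the browser does with a 301 response can be sketched by parsing the status line and the key value headers, then reading Location. This is an illustration under stated assumptions: the function names are made up, the sample response is a shortened stand-in modeled on the one shown in the lecture, and only the text before the blank line is handled.

```python
def parse_response_head(text):
    """Split a response's status line and headers into usable pieces."""
    lines = text.split("\r\n")
    parts = lines[0].split(" ", 2)  # version, code, optional phrase
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # blank line ends the headers
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return code, reason, headers


def redirect_target(text):
    """Return the Location to follow for a redirect, else None."""
    code, _reason, headers = parse_response_head(text)
    if code in (301, 302, 303, 307, 308):
        return headers.get("location")
    return None


if __name__ == "__main__":
    sample = ("HTTP/1.1 301 Moved Permanently\r\n"
              "Location: https://www.harvard.edu\r\n"
              "\r\n")
    print(redirect_target(sample))
```

Header names are lowercased before lookup because HTTP treats them as case-insensitive.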

well let me zoom out and let me

actually go into VS Code if only because

it's a little more pleasant to do things

in just a terminal window without

actually using a full-fledged browser so

now let's just use an equivalent program


it's called curl for connecting to a URL

that's going to allow me to play with

websites and just see those headers

without bothering to download all the

images and text and so forth from the

website it's going to allow me to do

something like this let me go ahead and

run for instance curl -I -X GET

which is just the command line argument

that says simulate a GET request

textually as though you're a browser and

let's go to

http://harvard.edu enter now by way of how curl

works I'm just seeing the headers it

didn't bother downloading the whole

website and you see exactly the same

thing 301 moved permanently location is

indeed this one here so that's kind of

interesting but let's follow it manually

now let's now do what it's telling me to

do let's go to the location with https

and the www and hit enter and now

what's a good sign with this output most

of it's irrelevant

200 okay that means I'm seeing

presumably if I were using a real

browser the actual content of the web

page looks like Harvard's version of

HTTP is even newer than the one I'm


using it's using HTTP version 2 which is

fine but 200 is indeed indicative of

things being okay well what if I try

visiting some bogus URL like harvard.edu

and this file does not exist something

completely random probably doesn't exist

and hit enter

what do you see now that's perhaps

familiar in the real world yeah

yeah error 404 all of us have seen this

probably endlessly from time to time

when you screw up by mistyping a URL or

someone deletes the web page in question

but all that is is a status code that a

browser is being sent from the server

that's a little clue as to what the

actual problem is underneath the hood so

instead of getting back for instance

something like OK or moved permanently

what I've just gotten back quite simply

is 404 not found well it turns out

there's other types of status codes that

you'll start to see over time as you

start to program for the web 200 is okay

301 is moved permanently 302 304 307 are

all similar in spirit they're related to

redirecting the user from one place to

another

401 403

unauthorized or forbidden if you ever


mess up your password or you try

visiting a URL you're not supposed to

look at you might see one of these codes

indicating that you just don't have

authorization for those 404 not found

418 I'm a teapot was an April Fool's joke

by the tech community years

ago 500 is bad and unfortunately all of

you are probably on a path now to

creating HTTP 500 errors once next week

we start writing code because all of us

are going to screw up we're going to

have typos logical errors and this is on

the horizon just like seg faults were in

the world of C but solvable with the

right skills 503 service unavailable

means maybe the server is overloaded or

something like that and there's other

codes there but those are perhaps some

of the most common ones

um has anyone I can we can get away with

this here less so in New Haven has

anyone um ever visited uh

safetyschool.org

http://safetyschool.org

there we do this enter

oh look at that where did we end up

Okay so

so this has been like a joke for like 10


or 20 years someone out there has been

paying for the domain name

safetyschool.org just for this two

second demonstration but we can now

infer how did this work the person who

bought that domain name and somehow

configured DNS to point to like their

web server the IP address of their web

server what is their web server

presumably spitting out whenever a

browser requests the page

what status code perhaps well we can

simulate this let me go over to vs code

let me go back over here let me increase

my terminal window let me do curl -I

-X GET http://safetyschool.org

enter and that's all this

website does there's not even an actual

website there no HTML no CSS languages

we're about to see it literally just

exists on the internet to do that

redirect there in fairness

um there are others let me actually do

another one here instead of

safetyschool.org

turns out someone some years ago bought

harvardsucks.org enter

and when we do this you'll see that oh I

they don't need us to be secure but I do

need the www let's do this one enter


oh that one is not found this demo

actually worked for so many years but

someone has stopped paying for the

Squarespace account recently apparently

so

okay

so fortunately we did save the YouTube

video to which this thing refers and so

just to put this into context since it's

been quite a few years Harvard and Yale

of course have this long-standing

rivalry uh there is this tradition of

pranking each other and honestly hands

down one of the best pranks ever done in

this rivalry was by Yale to Harvard it's

about a three minute retrospective it's

one of the earliest videos I dare say on

YouTube so the quality is uh

representative of that but let me go

ahead and full screen my page

here and what used to live at

harvardsucks.org is this video here if

we could dim the lights for about three

minutes

[Music]

[Applause]

pass them down

[Music]
[Applause]

we're nice

[Music]

[Applause]

what do you think of Yale they don't

think good

dude

is there another stuff

[Music]

what houses sometimes

[Applause]

all right now

[Applause]

[Music]

[Applause]

[Applause]

one more time

[Applause]

oh there it goes again

[Applause]

[Applause]

all right so thanks to our friends at

Yale for that one

let's go ahead here and consider in just

a moment what further is deeper down

inside of the envelope because we now

have the ability to get data from oh

okay oh YouTube autoplay again gotta

stop doing that let's consider for just

a moment that we

now have this ability to get data from

point A to point B and we have the

ability to specify in those envelopes

what it is we want from the website we

want to get the home page we want to get

back the HTML but what is that HTML in

fact we don't yet have the language with

which the web pages themselves are

written namely HTML and CSS but let's go

ahead and take a five minute break here

and when we come back we'll learn those

two languages

all right we are back so we got three

languages to look at today but two of

them are not actually programming

languages what makes something a

programming language like C or Python

and SQL is that there are these

constructs via which you can express

conditionals you might have variables

you might have looping constructs you

have the ability ultimately to express

logic HTML and CSS aren't so much about

logic as they are about structure and

the Aesthetics of a page and so we're

going to create like the skeleton of a

web page using this pair of languages

HTML and CSS and then toward the end of


today we'll introduce an actual

programming language that actually is

pretty similar in spirit and

syntactically to both C and python but

that's going to allow us to make these

web pages not just static things that

you look at but interactive applications

as well and then next week again in week

nine we'll reintroduce Python and SQL

and tie all of this together so that you can

actually have a browser or a phone

talking to a back-end server and

creating the experience that you and I

now take for granted for most any app or

website today well let's go ahead and do

this let's quickly whip up something in

this language called HTML I'm in vs code

here I'm going to go ahead and create a

file quite simply called

hello.html the convention is typically

to end your file names in HTML and I'm

going to go ahead and bang this out real

quick but then we'll more slowly step

through what the constructs are here

so I'm going to say <!DOCTYPE html> and

then <html> and then notice I'm going

to do </html>

and I'm leveraging a feature of

vs code and programming environments

more generally to do a bit of


autocomplete so you'll see that there's

this symmetry to much of what I'm going

to type but I'm not typing all of these

things vs code is automatically

generating the end of my thought for me

if you will let me go ahead and

say uh open the head tag open the title

tag I'll say something cute like hello

title and then down here I'm going to

create the body of this web page and say

something like hello body and let me

specify at the very top that all of this

is really in English lang equals "en" so

at this moment I have a file in my vs

code environment called hello.html vs

code as we're using it of course is

cloud-based we're using it in a browser

even though you can also download it and

run it on a Mac and PC so we're kind of

in this weird situation where I'm using

the cloud to create a web page and I

want that web page to also live in the

cloud that is on the internet

but the thing about vs code or really

any website that you might use in a

browser by default that website is using

probably TCP port number 80 or TCP port

number 443 which is HTTP and https

respectively but here I am sort of a


programmer myself trying to create my

own website on an existing website so

it's a bit of a weird situation but

that's okay because what's nice about

TCP is that you and I can just pick port

numbers to use and run our own web

server on a web server that is we can

control the environment entirely by just

running our own web server via this

command http-server

in my terminal window this is a

command that we pre-installed in vs code

here and you'll notice a pop-up just

came up your application running on port

8080 is available that's a commonly used

TCP port number when 80 is already used

and 443 is already used you can run your

own server on your own port 8080 in this

case I've opened that tab in advance and

if I go into another browser tab here

I see a so-called directory listing

of the web server I'm running so I don't

see any of my other files I don't see

anything belonging to vs code itself I

only see the file that I've created in

my current directory called hello.html

and so if I click on this file now I

should see Hello body I don't see the

title but that's because the title of a

web page nowadays is typically embedded


in the tab and if I'm full screen in my

browser there are no tabs so let me

minimize the window a bit and now you

can see just in this single browser

window and my own URL here that hello

body is in the top left hand corner and

if I zoom in there's hello title so what

have I done here I have gone ahead and

created my own web page in HTML in a

file called

hello.html and then I have opened up

a web server of my own configured it to

listen on TCP port 8080 which just says

to the internet Hey listen for requests

from web browsers not on the standard

port number 80 or 443 listen on 8080 and

this means I can develop a website using

a web-based tool like this one here

which is increasingly common today all

right so now let's consider what it is I

actually just typed out HTML is

characterized really by just two

features two vocab words tags and

attributes most of what I just typed

were tags but there was at least one

attribute already here's the same source

code that I typed out in HTML from top

to bottom let's consider what this is
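For reference, the file being walked through here probably looks something like the sketch below. The comma in the title and body text is an assumption, since the audio only gives "hello title" and "hello body".

```html
<!DOCTYPE html>

<!-- lang="en" is the one attribute in this file; everything else is tags -->
<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
```

Saved as hello.html and opened through a local web server, this should show the body text in the page and the title text in the browser tab.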

the very first line of code here doctype


HTML is the only anomalous one it's the

only one that starts with an Open

Bracket a less than sign and an

exclamation point there's no more

exclamation points thereafter for now

this is the document type declaration

which is a fancy way of saying it's just

got to be there nowadays it's like a

little breadcrumb at the beginning of a

file that says to the browser you

are about to see a file written in HTML

version five that line of code has

changed over time over the years the

most recent version of it is nice and

succinct like this and it's just a clue

to the browser as to what version of

HTML is being used by you the programmer

all right what comes after that well

after that and I've highlighted two

things in yellow this is what we're

going to start calling an open tag or a

start tag Open Bracket HTML then

something close bracket is the so-called

start or open tag then the corresponding

close or end tag is down here and it's

almost the same you use the same tag

name you use the same angle brackets but

you do add a slash and you don't repeat

yourself with any of the things called

attributes because what is this thing


here lang equals quote unquote en means

that my page is written in

the English language humans have

standardized two and three letter codes

for every human language right now and

so this is just a clue to the browser

for like automatic translation and

accessibility purposes what language the

web page itself is and not the tags but

the words like hello title and hello

body which while minimalists are indeed

in English so when you close a tag you

close the name of it with the Slash and

the angle brackets you don't repeat the

attribute that would just be annoying to

have to type everything again but notice

the pattern here it's new syntax but

this is another example of key value

pairs in Computing the key is Lang the

value is en for English the attribute is

called lang the value is en

so again it's just key value pairs in

just yet another context probably the

browser is using a hash table underneath

the hood to keep track of this stuff

like a two column table with keys and

values again humans keep using the same

Paradigm in different languages what's

inside of that the nesting is important


visually not to the computer but to us

the humans because it implies that

there's some hierarchy here and indeed

what is inside of the HTML tag here well

we have what we'll call the head tag the

head tag says Hey browser here comes the

head of the page and then the body tag

says Hey browser here comes the body of

the page the body is like 99% of the

user's experience the big rectangular

window the head is really just the

address bar and other such stuff at top

like the title that we saw a moment ago

just to introduce the vernacular then

the HTML tag otherwise known as an

element has two children the head child

and the body child which is to say that

head and body are now siblings so you

can use the same kind of family tree

terminology that we used when talking

about trees weeks ago if we look at the

head tag how many children does it seem

to have

I'm saying one and indeed at least if we

ignore all the white space the

spaces or tabs or new line characters

there's just one child a title element

and an element is the terminology that

includes the start tag and the end tag

and everything in between so this is the


title element and the title element has

one child which is just pure text

otherwise known as a text node recall

node from our discussions of data

structures weeks ago if we jump into the

body which is the other child of the

HTML tag it too has one child which is

just another chunk of text a text node

that says quote unquote hello body
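One way to see the family-tree vocabulary is to annotate the same page with comments. This is only an illustrative sketch: the comments are additions, not part of the file as typed in the lecture, and the child counts in them ignore whitespace and the comments themselves.

```html
<!DOCTYPE html>

<!-- html: the root element, with two children (head and body) -->
<html lang="en">
    <!-- head: first child of html, sibling of body -->
    <head>
        <!-- title: the only child of head; its one child is a text node -->
        <title>hello, title</title>
    </head>
    <!-- body: second child of html; its one child is a text node -->
    <body>
        hello, body
    </body>
</html>
```

Each level of indentation in the source corresponds to one level deeper in the tree.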

what's nice about this indentation even

though the browser technically is not

going to care is that it implies this

kind of structure and this is where we

connect like week five and now week

eight here is the tree structure we

began to talk about even in our world of

C it's not a binary tree even though

this one happens to have no more than

two children it's an arbitrary tree that

can have zero or any number of children

but if we have a special node here that

refers to the document the root node so

to speak is HTML drawn with a rectangle

here just for discussion sake it has two

children head and body also rectangles

head has a title child and then it and

body have text nodes which I've drawn

with ovals instead which is only to say

that when your browser Chrome Safari


whatever downloads a web page opens up

that envelope and sees the contents that

have come back from the server it

essentially reads the code that someone

wrote the HTML code top to bottom left

to right and creates in the browser's

memory in your Mac or your PC or your

phone's memory or Ram this kind of data

structure that's what's going on

underneath the hood and that's why

aesthetically it's just nice as a human

to indent things stylistically because

it's very clear then to you and to other

programmers what the structure

actually is so that's it for like the

fundamentals of HTML we'll see a bunch

of tags and a bunch of examples now but

HTML is just tags and attributes and

it's the kind of thing where you look

them up when you need to eventually many

of them get ingrained I constantly

check the reference guides or Stack

Overflow if I'm trying to figure out how

do I lay something out it's really just

these building blocks that allow you to

assemble the structure of a web page

this one being super simple but it's

just tags and attributes any questions

on this framework before we start to add

more tags more vocabulary if you will


and in the middle yeah

[Music]

if we put the title tag around body uh

that's a good question let's try it so

let me actually go to

uh this and say Open Bracket title

whoops sometimes you don't want it to

finish your thought for you but it did

that time I've gone ahead and uh changed

the file let me go and open up give me a

second to open my terminal window and go

back to the URL that has my page

give me a second

there's my hello.html let me zoom in on

this and let me

go ahead now and click on hello.html

and in this case it looks like we don't

actually see anything so the browser is

hiding it technically speaking

browsers tend to be pretty generous and

half the time when you make mistakes in

HTML it will display but it might

not display as you intended it might not

display the same on Macs or PCs or

Chrome or on Firefox there is a tool

though that we'll see that can help

answer this question for you for

instance if I go to

validator.w3.org W3 is the worldwide Web


Consortium a group of people that

standardize this kind of stuff I can

click on validate by direct input and

just copy paste my sample HTML into this

box and click check and I should see

hopefully that indeed it's an error what

you propose that I do the browser just

did its best to do something which was

to show me nothing at least rather than

the incorrect information but if I

revert that change and let me undo what

we just did let me copy my original code

back into this text box and click check

now you can see conversely my code is

now correct and there's automated tools

to check that but we'll encourage you

for problem sets and projects to use

that particular manual tool all right so

let's go ahead and enhance this a little

bit by introducing a whole bunch of tags

just to give you a sense of some of the

building blocks here so I'm going to go

ahead and create a new file called

paragraphs.html and I'm just going to do

a bunch of copy paste just to start

things off so I'm not constantly typing

all this darn stuff again and again

because I want everything to be the same

here except I'm going to change my title

to be paragraphs for this demo and


inside of the body I need a whole bunch

of paragraphs of text and I don't really

want to come up with some text so let me

go to some random website here and grab

lorem ipsum text which if you're

involved in like student newspaper or

just design this is just placeholder

text kind of looks like Latin but

technically isn't here though I have a

handy way of just getting three long

paragraphs and something that looks like

Latin and I've put those notice inside

of the body and they're indeed long look

how long the uh the made up words here

are so let me go now into my

browser

tab here

let me reload this page and you'll see

two files have now appeared

paragraphs.html which is my new one and

hello.html let me click on

paragraphs.html and what clearly seems

to be wrong

yeah

yeah it's obviously one massive

paragraph instead of three so that's

interesting but it's just a little hint

as to how pedantic HTML is it will only

do what you say and each of these tags


tells the browser to start doing

something and then maybe stop doing

something like hey browser here comes my

HTML hey browser here comes the head of

my page hey browser here comes the title

of my page hello title hey browser

that's it for the title that's it for

the head here comes the body tag so it's

kind of having this conversation between

the HTML and the

browser doing literally what it says so

if you want a paragraph you're probably

going to want to use the P tag for

paragraph and I'm going to go ahead and

add this to my code I'm going to keep

things neat even though the browser

won't care by indenting things here let

me create another paragraph tag here

and close it right after that one

indenting again and I'm keeping

everything nice and orderly let me do

one more here

let me indent that and then let me add

it to the end of my page here
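As a rough sketch, the resulting paragraphs.html might look like the following. The placeholder sentences are shortened stand-ins chosen for illustration; the actual lorem ipsum text pasted in the lecture is much longer and is not reproduced here.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- Each p element starts and then ends one paragraph -->
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
        <p>
            Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        </p>
        <p>
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.
        </p>
    </body>
</html>
```

Without the p tags, the browser collapses the line breaks in the source and renders all three chunks as one run of text.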

so again a little tedious but now I have

three paragraphs of text that say Hey

browser start a paragraph hey browser

stop that paragraph start stop and so

forth let me go back to the browser

window here let me hit command r or


control R to reload the page and voila

now I have three cleaner paragraphs all

right so there's a P tag for paragraphs

so now we have that particular building

block what if I want to add for instance

some headings to this page well that's

something that's possible too let me go

ahead and create a new file called

headings.html let me copy and paste that

same code as before but now let's

preface each paragraph with maybe H1 and

I'm going to just write the word one

and here I'm going to say H2 two and down

here I might say H3 three so this is another

tag another three tags H1 H2 H3 as you

might have inferred by the file name I

chose this gives you headings like in a

book different chapters or

sections or subsections or in an

academic paper you have different

hierarchies to the text that you're

writing so now that I've added an H1 tag

and the word one h2 tag the word two H3

tag and the word three let's go back to

the browser reload the page again and

voila

once the page reloads

I'll do it with the manual button reload

the page
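A minimal sketch of what headings.html presumably contains is shown below. The capitalization of the heading words and the shortened placeholder paragraphs are assumptions made for illustration.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>headings</title>
    </head>
    <body>
        <!-- h1 is the top-level heading; h2 and h3 are progressively lower levels -->
        <h1>One</h1>
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
        <h2>Two</h2>
        <p>
            Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        </p>
        <h3>Three</h3>
        <p>
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.
        </p>
    </body>
</html>
```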
oh what am I doing wrong

yeah

right I'm not in the headings file

so let me go back a page now there's

headings.html let me click on that okay

now we see some evidence of this again

it's nonsensical content but you can

kind of see that H1 is apparently big

and bold H2 is slightly less big but

still bold H3 is the same but a little

smaller and it goes all the way down to

H6 after that you should probably

reorganize your thoughts but there's six

different hierarchies here as you might

use for chapters sections subsections

and so forth all right so those are

headings as an HTML tag in our

vocabulary what's a common thing too

well let me go to

uh vs code again let me go ahead and get

some boilerplate here create a file

called

list.html let's create a simple list

inside of my body and I'll give this a

title of a list and let me fix the title

of this one to be headings as well so in

list.html suppose I want to have a list

of things uh foo bar and baz are like a

computer scientist's go-to words just like

a mathematician might say x y z foo bar baz


is in list.html let me go back to my

browser

hit the back button there's list.html

and hopefully I'll see Foo bar and baz

one on each line like a nice little list

but of course I do not and this is not

English Chrome thinks it might be Arabic

but that's

um curious too because the Lang

attribute should be overriding that so

Google is trying to override it all

right what's the obvious explanation why

we're seeing foo bar and baz on the same

line and not three separate ones

we didn't tell it to do that so we need

paragraph tags or maybe something else

turns out there is something else there

is a UL tag for an unordered list in

HTML inside of which you can have Li

tags for list item inside of which you

can put your words so there's my Foo

there's my bar there's my baz and again

notice that vs code is finishing my

thought for me but notice the hierarchy

open ul open li close li open li close li

open li close li close ul so it's

sort of done in reverse order here let

me go back to my browser reload the same

page list.html and voila a default


bulleted list that still seems to be in

Arabic what if I want this list to be

numbered well you can probably guess if

you don't want an unordered list but an

ordered list what type should I use
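The unordered list described above would look roughly like this sketch of list.html. The title text is taken loosely from the audio and may differ from what was typed.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>list</title>
    </head>
    <body>
        <!-- ul opens the list, each li is one item, and tags close in reverse order -->
        <ul>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ul>
    </body>
</html>
```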

[Music]

ol sure so let's try it's not

always that easy as just guessing but in

this case ol is going to do the trick

let me go back to my other browser let

me reload the page and now it's going to

automatically number for me it's a tiny

thing but this is actually useful if you

have a very long list of data and maybe

you might add some things in the middle

the beginning or the end it would just be

annoying to have to go and renumber it

the computer is doing it for us

instead just numbering from top to

bottom here all right what about another

type of layout not just paragraphs not

just lists but what about tabular data

you've got some research data you want

to present some financial data you want

to present a phone book that you want to

present how might we go about laying out

data a la a table well let me create a

file called table.html

and I'll just copy paste where we

started earlier let me start to close


some of these other files and in

table.html this is going to be a bit

more HTML but I'm going to go ahead and

do this table and close table tables can

have table headings so thead is the

name of that tag and tables can have

tbodies table bodies so I'm going to add

that tag and this is a common technique

sort of start your thought finish your

thought and then go back and fill in

what's in between what do I want to put

in this table how about a bunch of names

and numbers so for instance like left

column name right column number so let's

create a table row with What's called

the TR tag let's create a table heading

with the th tag and let's say name here

let's create another table heading

called number here and all of that to be

clear is in one table row meanwhile in

the table body let me create another

table row but this time it's not a

heading now I'm in the guts of my table

let's do table data which is synonymous

with like the cell of the table in uh

like an Excel spreadsheet or Google

spreadsheet in this TD I'm going to say

like Carter's name and then let's

grab Carter's number from our past demo


617-495-1000 then let's put me into the

mix and I'll go ahead and copy paste

here which often is not good but we'll

see that there's a lot of shared

structure with HTML let me go ahead and

do mine

949-468-2750 and now save this page so

we're getting to be a lot of indentation

I'm using four spaces by default some

people use two spaces by default so long

as you're consistent that's considered

good style but let me go back to my

browser here and hit back

that then brings me to my directory

listing again here's table.html and this

is not that interesting yet but you can

see that there's two columns name and

number because it's a table heading th

the browser made it bold-faced for me and

there in the table are two rows below

that Carter and David it's a little oh I

forgot my number one sorry about that

one and one it's not the prettiest table

right I feel like I kind of want to

separate things a little more maybe put

some Borders or the like but with HTML

alone I'm really focusing on the

structure alone so we'll make this

prettier soon but for now this is how

you might lay out tabular data all right


let me pause here just to see if there's

any questions but again the goal right

now is just to kind of throw at you some

basic building blocks that again can be

easily looked up in a reference but

we're going to start stylizing these

things soon too
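Putting the table tags together, table.html might look something like this sketch. The names and numbers are the ones read out in the lecture; the "+1-" prefix and the capitalization of the column headings are assumptions about how they were typed.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>table</title>
    </head>
    <body>
        <table>
            <!-- thead holds the heading row; th cells are bold by default -->
            <thead>
                <tr>
                    <th>Name</th>
                    <th>Number</th>
                </tr>
            </thead>
            <!-- tbody holds the data rows; each td is one cell -->
            <tbody>
                <tr>
                    <td>Carter</td>
                    <td>+1-617-495-1000</td>
                </tr>
                <tr>
                    <td>David</td>
                    <td>+1-949-468-2750</td>
                </tr>
            </tbody>
        </table>
    </body>
</html>
```

With HTML alone there are no borders or spacing between cells; that kind of styling is left to CSS.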

and yeah in the middle

how do you indent paragraphs really good

question for that we're probably going

to want something called CSS cascading

style sheets so let me come back to that

in just a little bit for the stylization

of these things beyond the basics like

big and bold we're going to need a

different language all together

all right well let's now create what the

web is full of which is uh like

photographs and images and the like let

me go ahead and create a new file called

image.html and let me go ahead and

change the title here to be say image

and then in the body of this page let's

go ahead and put an image the

interesting thing about an image is that

it's actually not going to have a start

tag and an end tag because that's kind

of illogical like how can you start an

image and then eventually finish it it's


either there or it isn't so some tags do

not have end tags so let me do image img

src equals

harvard.jpg and let me go ahead and in

my terminal window I actually came with

a photo of Harvard let me grab this for

just a second

uh let me grab harvard.jpg and put it

into my directory pretend that I

downloaded that in advance and so I'm

referring to now a file called

harvard.jpg that apparently is in the

same folder as my image.html file if

this image were on the internet like

Harvard's server I could also say like

https://www.harvard.edu slash

folder name whatever it

is slash harvard.jpg but if you've in

advance uploaded a file to your own vs

code environment like I did before class

by dragging and dropping this file

this photo of Harvard you can just refer

to it relatively so to speak this would

be the same thing as saying dot slash

harvard.jpg go to the current directory

and get the file called harvard.jpg but

that's unnecessary to type for

accessibility purposes though for

someone who's vision impaired it's ideal

if we also give this an alternative text


something like Harvard University in

the so-called alt attribute and this is so

that screen readers will recite what it

is the photo is for folks who can't see

it and if you're just on a slow

connection sometimes you'll see the text

of what you're about to see before the

image itself downloads especially on a

mobile device so let's now go back to my

open browser tab
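A sketch of image.html as described might be the following. It assumes a file named harvard.jpg sits in the same folder as the page, as in the lecture.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- img has no end tag; src is a relative path and alt is the text for screen readers -->
        <img alt="Harvard University" src="harvard.jpg">
    </body>
</html>
```

Writing src="./harvard.jpg" would refer to the same file, since both paths are relative to the current directory.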

and let's look in the directory I now

have harvard.jpg which I downloaded in

advance and image.html let me click on

image.html and here we have a really big

picture of Memorial Hall the building

we're currently in suffice it to say I

should probably fix this and maybe make

it only so wide but to do that we're

going to probably want to use this other

language CSS there are some historical

attributes that you can still use to

control width and height and so forth

but we're going to do it the better way

so to speak with a language designed for

just that how about a video though I

also came prepared with let me grab

another file here let me grab a file

called uh

halloween.mp4 which is an MPEG file and


let me go ahead and

change this now to be a file called

video.html I'll change my title to be

video and let's go ahead and now

introduce another tag a video tag Open

Bracket video and then let me go ahead

and close that tag proactively and then

inside of the video tag you can say the

source of the video is going to be

specifically Halloween dot MP4 the type

of this file I know is video slash MP4

because I looked up its content type or

mime type and the video tag actually has

a few attributes I can have this thing

auto play I can have it Loop forever I

can mute it so that there's no sound

which is necessary nowadays most

browsers to prevent ads don't auto play

videos if they have sound so if you mute

your video it will auto play but

presumably not annoy users and let me

set the width of this thing to be like

oh 1280 pixels wide but I can make it

any size I want so I know this just from

having you know looked up the Syntax for

this tag but notice one curiosity

sometimes attributes don't have values

they're empty attributes they're just

single words autoplay Loop muted and

that kind of makes sense for any


attribute that really does what it says

like it doesn't make sense to say muted

equals something like it's either muted

or not the attribute is there or not

similarly for these others as well so

let me go back to my other browser tab

reload the directory listing there is

both my mp4 and also video.html which is

the web page that embeds it and this is

actually a video that was just on

Harvard's website yesterday and it

was amazing so we included it in this

demo here
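For reference, video.html as described might look roughly like this. The sketch assumes halloween.mp4 is in the same folder as the page.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>video</title>
    </head>
    <body>
        <!-- autoplay, loop, and muted take no values: their presence alone turns them on -->
        <video autoplay loop muted width="1280">
            <source src="halloween.mp4" type="video/mp4">
        </video>
    </body>
</html>
```

Removing muted would likely stop the video from playing automatically in most browsers, for the reason given above.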

this is the video that was on

harvard.edu last night

same photo but you can see here that an

image alone probably would not have the

same effect this is actually a movie a

small video file that's now looping now

there's some artifacts here like there's

a white border around the top I feel

like it'd be nice to fill the screen but

again we'll come back to a language that

can allow us to do exactly that well

it's not just videos like this that you

might want to put into a web page let me

create another file called

iframe.html if you've ever poked around

with it if you have your own YouTube


account or you had your own blog or

WordPress site or Wix or Squarespace you

might have been in the habit of

embedding videos in websites using like

embedded YouTube players well this is

possible too using what's called an

inline frame an iframe and an iframe is

just a tag that is literally iframe it

has src equals and then a URL and if

it happens to be a YouTube video there's

a certain URL format you need to follow

per YouTube's documentation so you might

do

www.youtube.com/embed/ and then here's an

ID of a video
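A hedged sketch of what iframe.html might contain. VIDEO_ID is a placeholder, not a real video identifier, and the allowfullscreen attribute is the one the lecture goes on to mention for permitting full screen.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>iframe</title>
    </head>
    <body>
        <!-- VIDEO_ID is hypothetical; substitute the ID of an actual YouTube video -->
        <iframe allowfullscreen src="https://www.youtube.com/embed/VIDEO_ID"></iframe>
    </body>
</html>
```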

uh so this is essentially what we do if

we want to embed uh cs50's own lecture

videos in the course's website or the

video player does literally this if I

want to allow full screen I can add this

attribute too that I know exists by just

having checked the documentation and if

I now go back to my browser here

reload my directory listing there's

iframe.html it's not going to fill the

screen because I haven't customized the

Aesthetics yet but it does seem to embed

a tiny little video there for you to

play with later if you'd like so we

could change the width change the height


get rid of that margin and so forth but

an iframe is a way of embedding someone

else's web page in your web page if they

allow it so as to create all the more of

an interactive experience for them on

say your site all right well what

the web is of course known for things

like links let's go ahead and create a

file called

link.html and if we want to create a web

page that actually links from itself

somewhere else let's go ahead and do

this something very simple like visit

harvard.edu period now in like Facebook

Instagram a lot of websites nowadays if

you just type in a domain name or a

fully qualified domain name it

automatically becomes a link that's

because those websites have code in them

that automatically detect something that

looks like a URL and turns it into a

proper link HTML itself does not do that

for you and so if I go back to my web

page here click on link.html if you type

visit harvard.edu period that's all

you're literally going to see but

instinctively even if you've never

written HTML before what should we


probably do here to solve this problem

[Music]

what could we do to solve this problem

what do I probably want to add yeah

yeah so I want to surround the URL with

some kind of Link text and you wouldn't

necessarily know this until someone told

you or you looked it up but the tag for

creating a link is somewhat weirdly

called the a tag for anchor it has an

attribute called href for hyper

reference which is like a link in the

virtual world to a URL so let me type in

Harvard's full and proper URL here then

I'm going to close the tag

and then I can still say harvard.edu and

make that what the human sees but the

place they're going to go should be a

full URL protocol and all HTTP or https

and all now if I go back here and reload

the page now it automatically gets

underlined it happens to be purple by

default why because we visited

harvard.edu a few minutes ago so my

browser by default is indicating in

purple that I've been there before but

now I have a link that I can click on

and if I hover over it but don't click

you'll see that in most browsers there's

a little clue as to where you will go if


you click subsequently on this link and

without going too far down a rabbit hole

but to tie together our discussion of

cyber security recently what if I were

to do something like this

right now you have the beginnings of a

phishing attack of sorts

p-h-i-s-h-i-n-g whereby you can create

clearly a web page or heck even an email

using HTML that tells the user they're

going to go one place but they're really

going to go someplace else altogether

and that is the essence of phishing

attacks these days if you've ever gotten

a bogus email pretending to be from

PayPal or your bank or some other

website odds are they've just written

HTML that says whatever they want but

the underlying tags might do something

very different and so having the

instinct to look in the bottom left-hand

corner or be a little suspicious when

you're just told blindly to click on a

link it's this easy to socially engineer

people that is deceive them by just

saying one thing and linking to another
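The two links discussed above can be sketched side by side. The second destination, example.com, is an assumption chosen only for illustration; the transcript does not say which address was typed during the demonstration.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- An honest link: the visible text matches the destination in href -->
        Visit <a href="https://www.harvard.edu/">harvard.edu</a>.

        <!-- A deceptive link: the text says one thing, href points somewhere else -->
        Visit <a href="https://example.com/">harvard.edu</a>.
    </body>
</html>
```

Hovering over each link in a browser should reveal the real destination, which is the habit the lecture recommends.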

well what if I want to link my page to

another page I already created well if I

want to link to like that photo of


Harvard I can just do href equals quote

unquote in the name of a file in my same

account that is itself a web page so

this is how you can create relative

links and multi-page

websites yourself so if I now reload

this page hover over harvard.edu you'll

see in the bottom left hand corner a

very long URL but that's because I'm in

code spaces right now vs code and it's

appending automatically to the end of my

current URL the file name image.html but

this should work when I click on this I

go immediately to that file we created

earlier with a crazy big version of the

image but that's just a way that one

page on a website can link to another

page on a website
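A small sketch of the relative link just described, assuming image.html lives in the same directory as link.html.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- No protocol or host: the browser resolves image.html against the current page's URL -->
        Visit <a href="image.html">harvard.edu</a>.
    </body>
</html>
```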

let's do one other thing here uh making

things more responsive because in fact

that wasn't a particularly responsive

website responsive means responding to

the size of the user's device which is

so important when someone might be on a

screen like this or on a screen like

this these days there are special tags

we can use to tell the browser to modify

its display based on the hardware so let

me create a file called responsive.html

I'm going to copy paste some


starting point here call this title

responsive and let me go ahead and just

grab let me grab some of that lorem

ipsum text from before just so that we

have a sizable paragraph to play with

here and let me go ahead and grab this

text here

and I'm just going to paste this into

the body of this page and that's it so I

just have a big paragraph at the moment

inside of my body let me go back to my

browser let me open up this file called

responsive.html to make the point that

it is not yet responsive let me go ahead

and click on responsive.html that looks

fine but here's another trick you can do

using Chrome or Edge or other browsers

these days you can pretend to be another

device let me go to view developer

developer tools again last time we used

this to use the network tab which was

kind of interesting because we could see

what the underlying Network traffic is

but notice we can also click on this

icon in Chrome at least that looks like

a mobile phone I can turn my laptop into

what looks like a mobile device by

clicking this I'm going to click the dot

dot menu over here and just move the


dock instead of on the bottom where it

might be by default I'm going to move it

to the right hand side so that now on

the left you see what looks more like

the shape of a vertical phone and in

fact if I go to my Dimensions here I'll

choose something like iPhone x so a few

years back here's what that same website

might look like on an iPhone x you know

that looks pretty damn small you know to

be able to read it and that's because

the website has not automatically

responded to The Fairly narrow

dimensions of the iPhone in question or

Android device or whatnot so let me go

ahead and do this let me go back into my

code and let me go into the head of the

page and for the first time add another

tag up here this word is now all over

the internet but there is a meta tag

that allows you to

specify the name of some kind of

configuration detail here or property if

you will viewport is the technical term

for the rectangular region that the

human sees in a browser it's essentially

the body of the page but only the part

the human is currently seeing and you

can specify the content of the viewport

should have an initial scale of one so


it shouldn't be zoomed in or out and the

width that the browser should assume

should be equal to the device's width

these are sort of magical statements

that you just have to know or copy paste

or transcribe that just express to the

browser assume that the width of the

page is the same thing as the width of

the device don't assume the luxury of a

big laptop or desktop computer now

making only that change let me go back

to my pretend iPhone here using Chrome's

developer tools let me reload the page

and now it's not very effective on this

screen if I were showing you this on

[Music]

is there well I'm there we go let's do

this there we go so if I zoom into a

hundred percent this would be on an

actual physical device much more

readable than it would have been a

moment ago even though I realized that

demo was not necessarily persuasive but

it's as simple as telling the browser to

resize the thing to the width of the

page all right let me pause here to see

if there's any questions because that

feels like enough HTML tags We'll add

just a couple more but for the


most part like HTML tags are things you

Google and figure out over time just to

build up your vocabulary the basic

building blocks are tags attributes some

attributes have values some do not and

that's sort of the structure of HTML in

essence
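As a recap of the responsive.html example from this segment, here is a minimal sketch. The paragraph text is a shortened placeholder for the lorem ipsum used in the lecture; the meta tag carries the two settings described, an initial scale of one and a width equal to the device's width.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- name is the configuration detail; content holds its comma-separated settings -->
        <meta name="viewport" content="initial-scale=1, width=device-width">
        <title>responsive</title>
    </head>
    <body>
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
    </body>
</html>
```

It also shows the structure summarized above: a tag (meta), attributes (name, content), and values in quotes.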

questions on any of these though yeah

do attributes have an order uh no

attributes can be in any order from left

to right I tend to be a little nitpicky

and so I alphabetize them if only

because then I can easily spot if

something's missing if it's

not there alphabetically most people on

the internet don't seem to do that

yeah in the middle

yeah good question I mentioned that HTML

is starting to replace other languages

for user interfaces and it's not just

HTML alone it's HTML with CSS with

JavaScript both of which we'll get a

taste of here today that rather has been

the trend for portability and the

ability for companies for individual

programmers to write one version of an

app and have it work on Android devices

and iPhones and Macs and PCs and the

like it is very expensive it is very

time consuming to learn a language like


Java and write an Android app learn

another language called Swift and make

an iOS app not to mention make them look

and behave the same not to mention fix a

bug in one and then remember to fix it

in the other I mean this is just very

painful and time consuming and costly so

this this standardization on HTML CSS

and JavaScript even for mobile apps and

web apps has been increasingly

compelling because it solves problems

like that

all right so let's go ahead and now do

something that's finally interactive all

of these pages thus far are really just

tastes of static content content that

does not change well let's go ahead and

do this let me introduce one other

format of URLs which looks a little

something like it did before so slash

path but it could actually be something

like this slash path question mark key

equals value you might not have noticed

or cared to notice the URLs in your url

bar every day but these things are

everywhere often when you type into a

search engine like Google a search query

whatever you just typed ends up in the

URL when you click on a link that


contains some information there might be

a question mark and then some keys and

values there might be an ampersand and

more keys and values here again is that

very common programming Paradigm of just

associating keys with values we can see

this as follows let me actually go to

google.com in a browser here and let me

search for something the internet is

filled with cats enter

notice now that my URL changed from

google.com to google.com/search?q=cats

ampersand and then a

bunch of stuff that I don't understand

or know so let's just delete it for now

and leave it with the essence of that

URL and that still works if I zoom out

here years ago you would get pictures of

cats now you get videos of the movie

and the top query there's cats a bad

movie

um but we can also of course click on

images and they are the adorable creepy

cats all right this didn't used to

happen when we searched for cats but

anyhow the point is that the URL changed

to include the user's input and this is

such a simple but such a powerful thing

this is how humans provide input to

servers they don't manually create the


URLs like I sort of just did but when

you fill out a form on the web and you

hit enter typically the URL suddenly

changes to include whatever you typed in

in the URL assuming the form is

using the verb get that's not ideal if

you're typing in a username a password

credit card information because you

don't want the next person to sit down

at your laptop to see literally

everything you typed in saved in your

history so there's another verb post

that can hide all of that and it's just

sent a little differently but things

like this are typically sent via get and

what that means underneath the hood is

that your browser is just making a

request like this GET /search?q= followed by

whatever you typed in

the host that you visited and

so forth and hopefully what comes back

is a page full of search results

including cats and what's interesting

here now is if I go back to vs code on

my own computer

and let me go ahead and create a file

called how about search.html

in search.html I'm going to start with

some copy paste from before change my


title to search and in the body of this

page I'm going to introduce a form tag

and in this form tag I'm going to have a

couple of inputs the type of the first

input is going to be text

and the type of the second is going to be

submit

and this isn't that interesting yet but

let's see what's happening in the page

itself let me go back to my directory

listing let me click on search.html I

seem to have the beginning of my own

search engine it's not very interesting

it's just a text box and a submit button

but let's finish my thoughts here so

let's specifically give this text box a

name of Q which if you roll back to the

late 90s when Larry and Sergey of Google

Fame created google.com Q represented

query the query that the human's typing

in so the name of this text box shall be

q the form is

going to use what method technically it

uses get by default but I'll be explicit

and say method equals quote unquote get

stupidly it's lowercase in HTML even

though what's in the envelope is indeed

uppercase by convention

the action of this form specifically

would ideally go to my own server but we


don't really have time today to

implement Google itself so we're just

going to send the user's request to

google.com/search

so I'm creating a form the action of

which is to send the data to Google's

search path using the get method it's

going to send an input called q whenever

I click this submit button let me go

back to the browser reload the page

nothing seems to have changed yet but if

I search for let me zoom out so we can

see the URL bar right now I'm in

search.html
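A minimal sketch of search.html as described so far, assuming Google's search path is reachable at https://www.google.com/search (the lecture says only google.com search).

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>search</title>
    </head>
    <body>
        <!-- action is where the data goes; method="get" puts the inputs in the URL -->
        <form action="https://www.google.com/search" method="get">
            <!-- name="q" becomes the key; whatever the user types becomes the value -->
            <input name="q" type="text">
            <input type="submit">
        </form>
    </body>
</html>
```

Typing cats and submitting this form should send the browser to a URL ending in /search?q=cats.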

if I zoom out and search for cats now

and click submit I'm whisked away to

google.com but notice that the URL is

parameterized with those key value pairs

that key value pair and I get back a

whole bunch of cat results and I can

very easily now make this a little

prettier right now it's not ideal that

like the human has to move their cursor

and click in the box and it's a little

obnoxious that autocomplete is enabled

if I don't want to search for cats

anymore well according to html's

documentation I can say something like

this autocomplete equals off to turn off


autocomplete auto focus to automatically

put the cursor inside of that text box

if I want some explanatory text I can

put placeholder text like quote unquote

query
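The three refinements just described might be added to the text input like this. This is a sketch of the form only; the capitalization of the placeholder text is a guess.

```html
<form action="https://www.google.com/search" method="get">
    <!-- autocomplete="off" hides past entries; autofocus places the cursor in the box;
         placeholder shows gray hint text until the user types -->
    <input autocomplete="off" autofocus name="q" placeholder="Query" type="text">
    <input type="submit">
</form>
```

Note that autofocus, like muted on the video tag, is an attribute with no value.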

and now if I go back to this page and

reload now it's a little more user

friendly you see query and kind of gray

text the cursor is already there and

blinking I don't have to even move my

cursor I can search for dogs now and you

didn't see any autocomplete at all hit

enter to submit and now I'm searching

for there we go adorable dogs instead so

what have I done I've implemented the

front end of google.com just not the

back end so to implement the back end we're

obviously going to need like a really

big database maybe something like SQL

we're going to need some code that like

searches the database for dogs or cats

or anything else we're going to need

python for something like that and in

fact that's the direction we're steering

next week when we Implement that back

end but today it's all about this front

end any questions then about

forms or these URL parameters

before we now transition to making

things look a little prettier with CSS


and then by making things a

little more functional with JavaScript

anything at all

no all right so let's start to answer a

couple of the questions that came up by

making these Pages a little more

aesthetically interesting let's go ahead

now and introduce to the mix one other

language as follows let me go ahead and

create a file called home.html as though

I'm making a home page for the very

first time and in this page I'm going to

give a title of home I'm just going to

have like three things first I'm going

to have maybe a paragraph

of text up here at the top that says

something welcoming for my home page

like my name John Harvard for instance

for John Harvard's homepage then in the

middle of the page I'm going to have

some text like uh welcome to my home

page exclamation point and at the bottom

of the page I'm going to have a final

paragraph that says something like

copyright the copyright symbol John

Harvard or something like that all right

so it's like a web page with three

different structural areas made with

text this isn't that interesting if I


open this page called home.html let me

go ahead and create three quick

paragraphs a first paragraph for John

Harvard inside the middle I'm going to

say something like welcome to my home

page exclamation point and at the bottom

whoops at the bottom a little footer

that says something like copyright a

little simple copyright symbol and John

Harvard's name all right now let me

reload the page and there we go it's

a very simple very underwhelming web page

that has three main sections let's start

to now stylize this in an interesting

way so that it's a little more

aesthetically pleasing first these

aren't really paragraphs they're sort of

like areas of the page divisions like

the header is up here there's like the

main part of my screen and then there's

the footer of my screen so paragraphs

isn't quite right if these aren't really

paragraphs of texts I might more

properly call them divs or divisions of

the page which is a very commonly used

tag in HTML which just has this generic

rectangular region to it it does not do

anything aesthetically no bold facing no

size changes it just creates an

invisible rectangular region inside of


which you can start to style the text or

I can take this one step further there's

some other tags in HTML known as

semantic tags that literally have names

that describe the parts of your page

which is all the more compelling these

days for accessibility too for screen

readers for search engines because now a

screen reader a search engine can

realize that footer is probably a little

fluffy the header might be a little

interesting the main part of the page is

probably the juicy part that I want

users to be able to search for or read

aloud substantively so let's start to

stylize this page somehow let's

introduce a style attribute in HTML

inside of which is going to be text like

this font-size colon large semicolon text-align

colon center

on Main I'm going to add a style

attribute and say font size medium text

align Center

and then on the footer I'm going to say

style equals font size small text align

Center
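Put together, the three style attributes just dictated might look like the sketch below. The semantic header, main, and footer tags and the text content follow the lecture; the copyright symbol is written here as the numeric entity that the lecture introduces a little later.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>
    </head>
    <body>
        <!-- Each style value is CSS: key: value pairs separated by semicolons -->
        <header style="font-size: large; text-align: center;">
            John Harvard
        </header>
        <main style="font-size: medium; text-align: center;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small; text-align: center;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```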

what's going on here well in blue is the

language we promised called CSS for

cascading style sheets we're not really


seeing the cascading style sheet of it

yet but in blue here notice is another

very common Paradigm it's different

syntax now

but how would you describe what you're

looking at here in blue this is another

example of what kind of programming

convention

yeah it's just more key value pairs

right it'd be nice if the world

standardized how you write key value

pairs because we've now seen equal signs

and arrows and colons and semicolons and

all this but it's just different

languages different choices the key here

is font Dash size the value is large the

other key is text dash align

the value is Center the semicolon just

separates one key value pair from

another just like in the URL the

Ampersand did in the context of HTTP the

designers of CSS use semicolons instead

strictly speaking this semicolon isn't

necessary I tend to include it just for

symmetry but it doesn't matter because

there's nothing after that this is a bit

of a weird example this is the

commingling of CSS inside of HTML

so as of now you can use the CSS

language inside of the quote marks in


the value of a style attribute we did

something a little similar

a week plus ago when we included

some SQL inside of python so again

languages can kind of cross barriers

together but we're going to clean this

up because this is going to get messy

quickly certainly for large web pages

the size of Harvard's or Yale's or the

like

so let's see what this looks like let me

go back to my browser window here reload

the page

and it's not that different but it's

indeed centered and it's indeed large

medium and small text and let me make

one refinement the copyright symbol

actually can be expressed but there's no

key on my US keyboard here I can

actually magically say uh Ampersand hash

169 semicolon using what's called an

HTML entity turns out there are numeric

codes with this weird syntax that allow

you to specify symbols that exist in

Macs and PCs and phones but that don't

exist on most keyboards if I reload the

page now now it's a proper copyright

symbol so minor aesthetic but it

introduces us to these HTML entities
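A short sketch of the entity in use. The numeric form is the one spoken in the lecture; the named form &copy; is an equivalent added here for comparison.

```html
<!-- Numeric entity: ampersand, hash, the character's code, semicolon -->
<footer>Copyright &#169; John Harvard</footer>

<!-- Named entity for the same symbol -->
<footer>Copyright &copy; John Harvard</footer>
```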


so even if you've never seen CSS before

you can probably find something kind of

dumb about what I did here like poor

design it is correct if my goal was

small medium and large bottom up

what looks like a bad design perhaps

even if you've never seen this language

before yeah

yeah I've used the same style three

times like copy paste or typing the

exact same thing again and again it has

rarely been a good thing well here's

where we can take advantage of the

design of CSS because it supports what

we might call inheritance whereby

children inherit the properties the key

value pairs of their parents or

ancestors and what that means is I can

do this let me get rid of this text

align let me get rid of this text

align let me get rid of this one I could

get rid of the semicolon too but I'll

leave it for now and let me add all of

that style to the parent element the

body so that it sort of Cascades down to

the header the main and the footer tags

as well and let me close my quotes there

too now if I go back to my browser and

hit reload nothing changes but it's a

little better designed right because if


I want to change the text alignment to

maybe be right aligned I can now reload

the page and voila now it's over there I

change it in one place not in three

different places so that would seem to

be marginally better design
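A sketch of the refactoring just described: the alignment moves up to the body once and is inherited by its children, while each child keeps only its own font size.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>
    </head>
    <!-- text-align is set once here and inherited by header, main, and footer -->
    <body style="text-align: center;">
        <header style="font-size: large;">
            John Harvard
        </header>
        <main style="font-size: medium;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

Changing center to right in the body's style attribute would realign all three sections with a single edit.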

and could we do this any more

differently well it's not that elegant

that it's all just kind of in line with

my HTML this generally tends to be bad

practice where you co-mingle your HTML

and your CSS especially since some of

you might be really good at like laying

out the structure of web pages and the

content and the data and you might have

a horrible sense of design or just not

care about the Aesthetics you might work

with a designer an artist who's much

better at all of these fine tunings

aesthetically wouldn't it be nice if you

could work on the HTML they could work

on the CSS and you don't have to somehow

like literally edit the same lines of

code as each other well just like we can

move stuff into header files in C or

packages in Python we can do the same in

CSS so I'm actually going to go ahead

and do this let me get rid of all of

these style attributes and let me now


start to practice a Convention of not

co-mingling CSS with my HTML let me

instead move it into the head of the

page in a style tag instead of an

attribute this is one of the rare

examples where there are attributes

that have the same names as tags or

vice versa it's not very common but this

one does exist

here's a slightly different Syntax for

expressing the same key value pairs if I

want to apply CSS properties that is key

value pairs to the header of the page I

say header and then I use curly braces

and inside of those I say font Dash size

large text Dash align Center

then if I want to apply some properties

to the main section of the page I again

do font size say medium and then I can

do text align Center then lastly on the

footer of the page I can assign some

properties like font size small and then

text align Center semicolon and I don't

have to do anything more in my HTML it

all just represents the structure of

my page but because of this style tag in

the head of the page the browser knows

in advance that the moment it encounters

a header tag a main tag or a footer tag

it should apply those properties those


Styles if I reload the page other than

it being re-centered now there's no

other changes all we're doing is sort of

iteratively improving the design here

but now everything's in the top of the

file but there's still a bad design here

what could I now do that would be

smarter

similar problem to before

yeah

okay create a new file with just the CSS

I like that let's go there in just one

second but even as we're here there's

still a redundancy we can probably chip

away at yeah get rid of the text align

Center in three different places which

doesn't seem necessary and perhaps

someone else if I get rid of text align

Center what should I add to my style tag

in order to bring it back but apply it

to everything in the page and the page

if I scroll down looks like this in HTML

yeah

yeah so the body tag so let me go ahead

and say body and then in here put text

align Center and that now if I reload

the page has no visual effect but it's

just better design because now I

factored out that kind of commonality
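The page at this point might look like the following sketch, with every rule in a style tag in the head and the shared alignment factored out into a rule for body.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* Each selector here is a tag name; the braces hold its key: value pairs */
            body
            {
                text-align: center;
            }

            header
            {
                font-size: large;
            }

            main
            {
                font-size: medium;
            }

            footer
            {
                font-size: small;
            }
        </style>
        <title>home</title>
    </head>
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

The body of the page now contains no styling at all, only structure.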


and so just to make clear what we've

been doing here these are all again CSS

properties these key value Pairs and

there's different types of ways of using

them and there's this whole taxonomy

what we've been doing thus far are what

we're going to call type selectors where

the type is the name of a tag and so it

turns out there's other ways though to

do this and let's head in this direction

let's go ahead and maybe write our CSS

slightly differently because you know it

would be nice I bet after today once I

start creating other files for my home

page or John Harvard's homepage I might

want to have centered text on other pages

and I might want to have large text or

medium text or small text it'd be nice

if I could reuse these properties again

and again and kind of create my own

Library maybe even ultimately putting it

in a separate file so let me do this

instead of explicitly applying text

align Center to the body let me create a

new noun or an adjective rather for

myself called centered it has to start

with the dot because what I'm doing is

inventing my own class so to speak this

has nothing to do with classes in Java

or python class here is this aesthetic


feature and actually let me rename these

to be dot large dot medium and Dot small

what this is doing for me is it's

inventing new words well named words

that I can now use in this file or

potentially in other web pages I make as

follows I can now say if I want to

Center the whole body I can say class

equals centered on the header tag I can

say class equals large on the main tag I

can say class equals medium on the

footer tag I can say class equals small
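A sketch of the class-based version just described. The class names centered, large, medium, and small are the ones invented in the lecture; the brace layout is a stylistic choice.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* A leading dot defines a class that any tag can opt into */
            .centered
            {
                text-align: center;
            }

            .large
            {
                font-size: large;
            }

            .medium
            {
                font-size: medium;
            }

            .small
            {
                font-size: small;
            }
        </style>
        <title>home</title>
    </head>
    <body class="centered">
        <header class="large">
            John Harvard
        </header>
        <main class="medium">
            Welcome to my home page!
        </main>
        <footer class="small">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

In the class attribute the name appears without the dot; the dot belongs only to the CSS selector.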

but let me take this one step further as

you suggested why don't I go ahead now

and let me actually get rid let me grab

all of the CSS copy it to my clipboard

let me get rid of the style tag here and

create a new file called home.css

and let me just save all of that same

text in a separate file ending in dot

CSS nothing else no HTML whatsoever but

let me go back to my home.html page and

this is one of the most annoyingly named

tags because it doesn't really mean what

it does this link href equals home.css rel

equals stylesheet so ideally we would

have used the link tag for links in web

pages but this is Link in the sort of

conceptual sense we're linking this file


to this other one so that they work

together

using this hyper-reference home.css the

relationship of that file to this one is

that of style sheet a style sheet is a

file containing a whole bunch of

stylizations a whole bunch of properties

as we just did so here too it's

underwhelming the effect if I reload the

page nothing changed but now I not only

have a better design here because I can

now use those same classes in my second

page that I might make my third page my

fourth page my bio my you know resume

page whatever it is I'm making on my

website here I can reuse those styles by

just including one line of code instead

of copying and pasting all of that style

stuff into file after file after file

and Heck if the rest of the world is

really impressed by my centered class

and my large and medium and small

classes I could bundle this up let other

people on the internet download it and I

have my own Library my own CSS library

that other people can use why should you

ever invent a centered class again if I

already did it for you stupid and small

as this one is but it would be nice now

to package this up in a way that's


usable

by other people as well

so this is perhaps the best design when

it comes to CSS use classes where you

can use external style sheets where you

can but don't use the style attribute

where we began which while explicit

starts to get messy quickly especially

for large files
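A sketch of the recommended arrangement, assuming home.css sits next to home.html. To keep the example in one block, the contents of the separate CSS file are shown inside an HTML comment; in practice they would live only in home.css.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- rel states the relationship: home.css is this page's stylesheet -->
        <link href="home.css" rel="stylesheet">
        <title>home</title>
    </head>
    <body class="centered">
        <header class="large">
            John Harvard
        </header>
        <main class="medium">
            Welcome to my home page!
        </main>
        <footer class="small">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>

<!--
Contents of home.css (CSS only, no HTML):

.centered { text-align: center; }
.large { font-size: large; }
.medium { font-size: medium; }
.small { font-size: small; }
-->
```

Any other page on the site could reuse the same classes by including the same single link line.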

all right any questions then on this

no all right so that's class selectors

when you specify dot something that

means you're selecting all of the tags

in the page that have that particular

class and applying those properties so

there's a couple of others here just to

give you a taste now of what's possible

there's so much more that you can

actually do with HTML and CSS together

let me go ahead and open up a few

examples that I did here in advance let

me go ahead and open up vs code and let

me go ahead and copy

um my source 8 directory

give me one sec to grab the source 8

directory for today's lectures so that I

can now go into my browser

go into some of the pre-made examples in

source 8 and let me open up

paragraphs1 here so here's something it's a

little subtle

but does anyone notice how this is

stylized

this is just some generic lorem ipsum

text again but what's noteworthy

stylistically

a book might do this

yeah

a little bigger why who knows it's just

a stylistic thing at the beginning of

the chapter the first paragraph is

bigger how did we do that well we can

actually explore this in a couple of

ways one I can obviously go into vs code

and show you the code but now that we're

using Chrome and we're using these

developer tools let's again go into them

view developer developer tools and now

notice let me turn off the mobile

feature and let me move the dock back to

the bottom just so that it's fully wide

we looked at the network tab before we

looked at the mobile button before now

let me click on elements what's nice

about the elements tab is you can see a

pretty printed version of the web page's

HTML nicely color coded syntax

highlighted for you so that you can now

henceforth learn from and look at the source


code the HTML source code of any web

page on the internet notice that my own

web page here it's not that interesting

there's a bunch of paragraph tags of

lorem ipsum text but notice what I did

the very first one I gave an ID to this

is something that you as a web designer

can do you can give an ID attribute to

any tag in a page to give it a unique

identifier the onus is on you not to

reuse the word anywhere else if you

you reuse it you've screwed up it's

incorrect Behavior but I chose an ID of

first just so that I have some way of

referring to the very first paragraph in

this file if I look in the head of the

page and the style tag here notice that

I have hash first so just as I use dot

for classes the world of CSS uses a hash

symbol to represent IDs unique IDs and

what this is telling the browser

whatever element has the first ID f i r

s t without the hash apply font size

larger to it and that's why the first

paragraph and only the first paragraph

is actually stylized if I actually go

into vs code now and let me go into my

source 8 directory let me open up

paragraphs1.html
here's the actual file if I want to

change the color of that first paragraph

to Green for instance I can do color

colon green let me close the developer

tools reload the page and now that paragraph

is green as well you don't have to just

use words you can use

hexadecimal what was the hex code for

Green in RGB

like no red lots of green no blue so you

could do #00ff00 using a hash which

coincidentally is the same symbol but it

has nothing to do with IDs this is just

how Photoshop and web pages represent

Colors Let's go back here and reload

it's the same although it's a slightly

different version of green this is pure

green here if I want to change it to Red

that would be let's see RGB

#ff0000 and here I can go and reload now

the first paragraph is red this actually

gets pretty tedious quickly like if

you're a web designer trying to make a

website for the first time it actually

might be fun to Tinker with the website

before you open up your editor and you

start making changes and save and reload

that's just more steps so notice what

you can do with developer tools too in


Chrome and other browsers when I

highlight over this paragraph under the

elements tab notice that one it gets

highlighted in blue if I move my cursor

it doesn't get highlighted if I move it

it gets highlighted so it's showing me

what that tag represents but notice over

here on the right you can also see

all of the stylizations of that

particular element some of them are

built in the italicized ones here at the

bottom means user agent style sheet that

means this is what Google makes all

paragraphs look like by default but in

non-italicized here you see hash first

which is my code that I just changed and

if I want to start tinkering with colors

I can do like #0000ff

enter
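The hex codes used here pack a red, a green, and a blue amount into two hexadecimal digits each. As a rough sketch, assuming a hypothetical helper named toHexColor that is not part of the lecture's code, the mapping could be written like this in JavaScript. The CSS keyword green is, as far as the standard color names go, the darker #008000, which would explain why #00ff00 looks slightly different.

```javascript
// Hypothetical helper (not from the lecture): build a CSS hex color
// from red, green, and blue amounts, each an integer from 0 to 255.
function toHexColor(red, green, blue) {
    // Convert one amount to exactly two lowercase hexadecimal digits.
    const twoDigits = (amount) => amount.toString(16).padStart(2, "0");
    return "#" + twoDigits(red) + twoDigits(green) + twoDigits(blue);
}

const pureGreen = toHexColor(0, 255, 0); // no red, lots of green, no blue
const pureRed = toHexColor(255, 0, 0);   // lots of red, nothing else
const pureBlue = toHexColor(0, 0, 255);  // lots of blue, nothing else

console.log(pureGreen, pureRed, pureBlue);
```

The leading hash in these values only marks a color code; it is unrelated to the hash that selects an ID in CSS.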

I changed it to Blue but notice if I go

back to vs code I didn't change my

original code in vs code this is now purely

client-side and this is a key detail

when I drew that picture earlier of the

browser making a request to the

cloud the server in the cloud and the

response coming back the browser your

Mac your PC your phone has a copy of all

the HTML and CSS so you can change it


here however you actually want and for

instance you can do this with any

website let's go uh say on a field trip

here to

uh how about stanford.edu

so here's Stanford's website as of today

uh let's go ahead here and let's see

there's their admissions page campus

life and so forth let me go ahead and

view developer tools on Stanford's page

developer tools elements you can see all

of their HTML and notice it's collapsed

so here is their header

here's their main part and I'm

using my keyboard shortcuts to just open

and close the tags to dive in deeper and

deeper suppose you want to kind of mess

with Stanford you can actually like

right click on any element of a page or

control click inspect and that's going

to jump you automatically to the tag in

the elements tab that shows you that

link and notice if I hover over this Li

notice Stanford's using a list as an

unordered list from left to right though

it doesn't have to be a bulleted list

top to bottom they've used CSS to change

it to be a list from news events

academics research Healthcare campus

admission about well so much for


admission that's gone so now if I close

developer tools now it's gone from

Stanford's website but of course what

have I really done

I've just like mutated my own local copy

so this is not hacking even though this

might be how they do it on TV and in the

movies it's still there if I reload the

page but it's a wonderfully powerful way

to one just iterate quickly and try

different things stylistically figure

out how you want to design something and

two just learn how Stanford did

something so for instance if I right

click or control click on admission

again go to inspect and let me go to the

LI tag let me keep going up up up up to

the UL tag there's going to be a lot

going on here but notice they have

applied all of these CSS properties to

that particular UL tag but notice here

this is how it's something like this and

we'd have to read more to learn how this

works list style type none this is how

they probably got rid of the bullets and

what you can do is just Tinker like all

right well what does this do well let me

uncheck it all right didn't really

change anything font weight uncheck this


there we go so now the margin is changed

the padding around it has changed let's

get rid of this we can just start

turning things on and off just to get a

sense of how the web page works I'm not

really learning anything here so far let

me go to the LI here for uh let's go to

the admissions one here

um margin

there we go okay so there's a

display property in CSS that's

apparently effectively changing things

from vertical to horizontal if I turn

that off now Stanford's links all look

like this and there are those bullets so

again just default styles that they've

somehow overridden and a good web

designer just knows ultimately how to do

these kinds of things all right how

about a couple final building blocks

before we'll take one more break and

then we'll dive in with JavaScript to

manipulate this stuff programmatically

let me go ahead and open up how about

paragraphs two here let me close this

tab let me go into paragraphs two which

is pre-made and this one looks the same

except when I go ahead and inspect this

first paragraph notice that I was able

to get rid of the ID somehow which is


just to say there's many many ways to

solve problems in HTML and CSS just like

there are in C and in Python let me look in

the head and the style of the page now

this is

another type of selector that allows us

to specify a paragraph tag that

itself happens to be the first child

only so you can apply CSS to a very

specific child namely first child

there's also Syntax for last child if

just the first one is supposed to look a

little different so here I've just

gotten out of the business of creating

my own unique identifier and instead I'm

using this type of selector as well well

what more can we do let me go into

another example here called

link1.html and here we have a very

simple page that just says visit Harvard

but notice it's purple by default

because we've been to harvard.edu before

let's see if we can't maybe stylize

Harvard's links to be a little different

let me go into link version 2 now

which looks like this and now Harvard is

very red how did I do that well let me

right click on it click inspect and I

can start to poke around it looks like


my HTML is not at all noteworthy it's

just very simple HTML anchor tag with an

href so let's look at the style let me

zoom out and we can look at it in two

different ways we can literally look at

the style contents here or we can look

at Chrome's pretty version of it over

here it looks like my style sheet in the

style tag has changed the color to be

red and the text decoration which is a

new thing but it's another CSS property

To None notice if I turn that off links

on the internet are underlined by

default which tends to be good for

familiarity for visibility for

accessibility but if it's very obvious

what is text and what is a link maybe

you change text decoration to none but

maybe watch this maybe the link comes

back the line comes back when you hover

over it well let's look at how I did

this in style notice that I have

stylization and I put my curly braces on

the same line here as tends to be

convention in CSS color is red text

decoration is none but whenever an

anchor tag is hovered over

you can change the decoration text

decoration to be back to the default

underline so again just little ways of


playing around with the Aesthetics of

the page once you understand that really

there's just different types of

selectors and you might have to remind

yourself look them up occasionally as to

what the syntax is but it's just another

way of scoping your properties to

specific tags let's look at a version

three of this here which adds Yale to

the mix if I go to link3.html maybe I

want to have Harvard links red

Yale links blue how might I have done

this well let's right click and click

inspect

and here we have two links with

a couple of techniques just to again

emphasize you can do this so many

different ways I gave my Harvard link an

ID of Harvard my Yale link an ID of Yale

in my CSS if we go to the head of the

page I then did this

the tag with the Harvard ID AKA hash

Harvard should be red hash Yale should

be blue and then any anchor tag should

have no text decoration unless you hover

over it at which point it should be

underlined and so if I hover over

Harvard it's red underlined Yale it's

blue underlined if I want to get rid of


the IDS I can do this a slightly

different way let me go into link four

same effect but notice I got rid of the

IDS now how else can I express myself

well let's look at the CSS here the

anchor tag has no text decoration by

default unless you're hovering over it

and this is kind of cool this is what we

would call on our list here an attribute

selector where you select tags

using CSS notation based on an attribute

so this is saying go ahead and find any

anchor tag whose href value happens to

equal this URL and make it red do the

same for Yale and make it blue now this

might not be ideal because if there's

something after the slash these equal

signs don't work because if it's a

different Harvard or different Yale link

this is a little too precise so let me

look at version 5 here of link.html look

at this style and I did this a little

smarter this is new syntax and again

just the kind of thing you look up star

equals means change any anchor tag whose

href contains anywhere in it harvard.edu

to red and do the same thing for Yale

based on Star equals so star here

connotes wild card so search for

harvard.edu or yale.edu anywhere in the


href and if it's there colorize the link
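To make the difference between equals and star equals concrete, here is a small JavaScript sketch of what the two attribute selectors test. The helper names and the example URLs are assumptions for illustration only; the browser does this matching itself and the lecture's files contain no such functions.

```javascript
// Hypothetical helpers that mirror two CSS attribute selectors.

// a[href="value"] matches only when the whole href equals the value.
function matchesExactly(href, value) {
    return href === value;
}

// a[href*="value"] matches when the value appears anywhere in the href.
function matchesAnywhere(href, value) {
    return href.includes(value);
}

// Pick a link color the way the star-equals version of the page would.
function colorFor(href) {
    if (matchesAnywhere(href, "harvard.edu")) {
        return "red";
    }
    if (matchesAnywhere(href, "yale.edu")) {
        return "blue";
    }
    return null; // no rule applies, so the default link color stays
}

// Assumed example links, including one with something after the slash.
const links = [
    "https://www.harvard.edu/",
    "https://www.harvard.edu/admissions",
    "https://www.yale.edu/",
];

for (const href of links) {
    const exact = matchesExactly(href, "https://www.harvard.edu/");
    console.log(href, "exact:", exact, "color:", colorFor(href));
}
```

The second link is the case described above: the exact comparison misses it because of the extra path, while the substring comparison still colors it.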

and again we could do this all day long

with diminishing returns to actually

achieve the same kind of stylizations in

different ways and as projects just get

larger and larger you just have more and

more decisions to make and so you have

certain conventions you start to adopt

and indeed if I may you have the

introduction of what are called

Frameworks ultimately if you're a

full-time web developer or you're

working for a company doing the same you

might have internal conventions that you

adhere to for instance the company might

say always use classes don't use IDs or

always use attribute selectors or don't

use this and it wouldn't be necessarily

as draconian as that

but they might have a style guide of

sorts but what many people and many

companies do nowadays is they do not

come up with all of their own CSS

properties they start with something off

the shelf a framework typically a free

and open source framework that just

gives them a lot of pretty stylizations

for free just by using a third-party

library and one of the most popular ones


nowadays is something called bootstrap

that cs50 uses on all of its websites

super popular in Industry as well it's

at

getbootstrap.com and this is just to

give you a taste of it a website that

documents the library that they offer

and there's so much documentation here

but let me just go to things like how

about components it just gives you out

of the box the CSS with which you can

create little alerts if you've ever

noticed on cs50's website little

colorful warnings at the top of the

page or call outs to draw your attention

to things how did we do that it's

probably a paragraph tag or a div tag

and maybe we change the font color we

change the background color right it's a

lot of stuff we could absolutely do from

scratch but you know what why do I

reinvent the wheel if we can just use

bootstrap so for instance let me just

scroll down if you've ever seen on

cs50's website a yellow warning alert

like this let me just zoom in

on this we are just using HTML like this

we're using a div tag which again is an

invisible division a rectangular region

of the page but we're using classes


called alert and another class called

alert-warning those are classes that the

folks at bootstrap

invented they Associated certain text

colors and background colors and padding

and margin and like other Aesthetics

with so all we have to do is use those

classes role equals alert just makes

clear to like a screen reader that this

is an alert that should probably be

recited and whatever's in between the

open tag and close tag is what the human

would see how do you use something like

bootstrap well you just read the

documentation under getting started

there is a link tag you copy paste into

your own so let me do this so in

table.html we had code like this let me

actually read bootstraps documentation

really fast and they tell me dot dot

copy paste this code I'm going to put

this into the head of my page and it's

quite long but notice it's a link tag

which I used earlier for my own CSS file

the href of which is this CDN link

content delivery Network that's

referring to a specific version of

bootstrap that's available on this day

and the file that I'm including is


called

bootstrap.min.css this is an actual file

I can visit with my browser if I open

this in a separate tab this is the CSS

that bootstrap has made freely available

to us crazy long no white space that's

because it's been minified just to not

waste Space by adding lots of white

space and comments but this contains a

whole lot hundreds of CSS properties

that we can reuse thanks to classes that

they invented if I want to use some

JavaScript code I can also copy this

script tag but we'll come back to that

before long let me now just make a

couple of tweaks to this table if I

go into my browser from before this is

what it looked like previously where

name and number were bold but centered

and then Carter and David were on the

left and the numbers were to the right

you know it's fine it's not that pretty

but it'd be nice if it were a little

prettier than that so if we add

bootstrap into it notice one thing

happens first when I reload the page

no longer are Chrome's default styles

used Now bootstraps default styles are

used which is a way of enforcing

similarity across Chrome Edge Firefox


Safari and others notice it went from a

serif font to a sans serif font and

something cleaner like this it still

looks pretty ugly but let me go into

bootstraps documentation

let me go under their uh content tab for

tables and if I just kind of start

skimming this these are some good

looking tables right like there's some

underlining here some uh it's Bolder

font there's a dark line if I keep going

oh that's getting pretty too if I want

to have a colorful table like I could

figure all of this stuff out myself if I

want sort of dark mode here uh if I want

to have uh alternating highlights and so

forth there's so many different

stylizations of tables that I could do

myself but I care about making a phone

book not about Reinventing these wheels

so if I read the documentation closely

it turns out that all I need to do is

add bootstraps table class to my table

tag and watch with a simple reload what

my now table.html file looks like much

nicer right might not be what you want

but my god with like two lines of code I

just really prettied things up and so

here then is the value of using


something like a framework it allows you

to actually create much prettier much

more user-friendly websites

than you might otherwise be able to make

on your own certainly quickly in fact

let's iterate one more time on one other

example before we introduce a bit of

that code let me go ahead and open up uh

search.html from before which recall

looked like this and search.html in my

browser

was this very simple Google search and

suppose I want to reinvent google.com's

UI a bit more here's a screenshot of

google.com on a typical day it's got an

about link a store link Gmail images

these weird dots sign in their logo it's

not appearing well on the screen here

but there's a big text box in the middle

and then two buttons Google search and

I'm feeling lucky well could I maybe go

about implementing this UI myself using

some HTML some CSS and maybe bootstraps

help just so I don't have to figure out

all of these various stylizations well

here's my starting point in search.html

let's go and add in bootstrap first and

foremost so that we have access to all

of their classes that are reusable now

and let me go ahead and


figure out how to do this well just like

Stanford's site had like its nav

navigation bar using a UL but they

changed it from being a bulleted list to

being left to right I bet I can do

something like this myself so let me go

into the body of my page and first based

on bootstrap's documentation let me add a

div with a class of

container-fluid container-fluid is just

a class that comes with bootstrap that

says make your web page fluid that is

grow to fill the window so that way it's

going to resize nicely I'm going to go

ahead and fix my indentation here if you

haven't discovered this yet if you

highlight multiple lines in vs code you

can hit Tab and indent them all at once

so now I have all of that inside of this

div now just like in Stanford's site

let's create an unordered list

that has maybe an li with a

class of nav-item

and then in here whoops in here let me

go ahead and say

uh a

href equals https://about.google

which is the real URL of Google's

about page and I'll put the about text


in there then I'm going to close my li

tag here and I want to do one other

thing because I'm using bootstrap

bootstraps documentation if I read it

closely says to add a class to your

links like nav-link and text-dark

to make it dark like black or dark gray

instead of the default blue all right so

I think I have now an about Link in a

navigation part of my screen let me go

ahead and save this and reload

all right so not exactly what I wanted

it's a bulleted list still so I need to

override this somehow let me read

bootstraps documentation a little more

clearly and let me pretend to do that

for time's sake if I go under content

oops if I go under components and I go

to navs and tabs long story short if you

want to create a pretty menu like this

where your links are from the left to

the right just like Stanford I

essentially need HTML like this and this

is subtle but I left off this class I

should have added a class called nav on

my UL so that was my bad let me go in

here and add class equals nav and

then again this class nav-item bootstrap

told me to nav-link text-dark bootstrap

told me to let me go back to my page


here reload and okay it's still kind of

ugly but at least the about link is in

the top left hand corner just like it

should be in the real google.com now let

me whip up a couple of more links real

fast let me go and do a little copy

paste so I bet next week we can avoid

this kind of copy paste let me change

this link to be store.google.com the

text will be store

let me go ahead and create another one

here for Gmail

so this one's going to go to officially

how about technically it's

www.google.com/gmail

normally it just redirects and let me

grab one more of these and for Google

images and I'm going to paste this

whoops I'm gonna come on I'm going to

put this here too this is going to be

images and imghp is the URL

all right let me go ahead and reload the

browser page now it's coming along right

about store Gmail images it's not

quite what I want so I'd have to read

the documentation to figure out how to

maybe nudge one of these over to start

right aligning it and there's a couple

of ways to do this but one way is if I


want Gmail to move all the way over and

push everything else I can

add some margin

to the Gmail list item ms-auto margin start auto

this is in bootstraps documentation a

way of saying whatever space you have

just automatically shove everything

apart and now if I reload the page again

now voila Gmail and images are over to

the right all right so now we're kind of

moving along let me go ahead and add the

big blue button to sign in so here with

sign in let me go ahead and over in my

same nav yeah so let's go ahead and do

one more Li class equals nav item and

then inside of this Li tag what am I

going to do turns out there is a class

that can turn a link into a button if

you say btn for button and then btn-primary

makes it blue the href for this

one is going to be

https://accounts.google.com/ServiceLogin which

is literally where you go if you click

on the big blue button the role of this

link is that of button and then sign in

is going to be the text on it if I now

reload the page now we're getting even

closer although it looks a little stupid

notice that sign in is way in the top

right hand corner whereas the real


google.com has a little bit of margin

around it okay that's an easy fix too

let me go back into my HTML here let me

add m-3 this too is a

bootstrap thing they have a class called

m-something the something is a

number from like one to five I believe

that adds just some amount of

white space so if I reload now okay

it's just a little prettier and now let

me accelerate just to demonstrate how I

can take this home let me go ahead and

open up my pre-made version of this

whereby

I added to this

some final flourishes if I go to

search2.html I decided to replace their

logo with just this image of a cat and

notice that I re-implemented essentially

google.com here's a text box here's two

buttons even though they're a little

washed out on the screen I even figured

out how to get dots that look pretty

similar to Google's and if we view

Source you can see how I kind of finish

this code if I go to view developer

tools and I go to elements

and I go into this div and I go into

this div you'll see that here's an image


tag for happy cat and I added some

classes there to make it fluid and width

25% of the screen if I go into the form

tag this is the same form tag as before

but notice I used button tags this time

with btn and btn-light classes and

then I stylize them in a certain way and

so in the end result if I want to go

ahead and search now for birds and click

Google search voila I've implemented

something that's pretty darn close to

google.com without even touching raw CSS

myself and now here's the value then of

a framework you can just start to use

off-the-shelf functionality that someone

else created for you but if you want to

make refinements you don't really like

the shade of blue that bootstrap chose

or the gray button or you want to curve

things a bit more that's where you can

create your own CSS file and do the last

mile sort of fine tuning things and that

tends to be best practice stand on the

shoulders of others as much as you can

using libraries and then if you really

don't like what the library is doing

then use your own skills and

understanding of HTML and CSS to refine

things

a bit further
but still after all of that all of these

examples we've done thus far are still

static other than the Google one which

searches on the real google.com let's

take a final five minute break and we'll

give you a sense of what we can do

next week onward with JavaScript see you

in five

all right so I think it's fair to say

we're about to see our very last

language next week and final projects

are ultimately going to be about

synthesizing so many of these thankfully

this language called JavaScript is quite

similar syntactically to both C and

Python and indeed if you can imagine

doing something in either of those you

can probably do it in some form in

JavaScript the most fundamental

difference today though is that when you

have written C code and python code thus

far you've done it on the server you've

done it in the terminal window

environment and when you run the code

it's running in the cloud on the server

the difference now today with JavaScript

is even though you're going to write it

in the cloud using vs code recall that

when a browser gets the page containing


this code it's going to get a copy of

the HTML the CSS and the JavaScript code

so JavaScript that we see today is all

going to be executed in the browser on

users' own Macs PCs and phones not on the

server JavaScript can be used on the

server using an environment called

node.js it's an alternative to python or

Ruby or Java or other languages we are

using it today client side which is a

key difference so in scratch let's do

this one last time if you wanted to

create a variable in scratch setting

counter equal to zero in JavaScript it's

going to look like this you don't

specify the type but you do use the

keyword let and there's a few others as

well that say let counter equals zero

semicolon if you want to increment that

variable by one you in JavaScript could

say something like counter equals

counter plus one or you can do it more

succinctly with plus equals or the plus

plus is back in JavaScript you can now

say counter plus plus semicolon again in

scratch if you wanted to do a

conditional like this asking if x less

than y it looks pretty much like C the

parentheses are unfortunately back the

curly braces here are back if you have


multiple statements in particular but

syntactically it's pretty much the same

as it was for if for if else and even

for if else if else unlike python it's

two words again else if so quite

like C nothing new beyond that if

you want to do something forever in

scratch you'd use this block in

JavaScript you can do it a few ways

similar to python similar to C you'd say

while true in JavaScript booleans are

lowercase again just like in C so it's

lowercase true if you want to do

something a finite number of times like

repeat three times looks almost like C

as well the only difference really is

using the word let here instead of int

and again you'll use let to create a

string or an int or any other type of

variable in JavaScript the browser will

figure out what type you mean from

Context in C we would have said int

instead
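Put together, the pieces just described might look like the following sketch. The function name compare, the strings it returns, the word meow, and the break that stops the forever loop are assumptions added so the example can run on its own; only the syntax itself comes from the discussion above.

```javascript
// Create a variable with let; no type is specified.
let counter = 0;

// Three equivalent ways to add one to it.
counter = counter + 1;
counter += 1;
counter++;

// A conditional: parentheses and curly braces as in C,
// and "else if" written as two words.
function compare(x, y) {
    if (x < y) {
        return "x is less than y";
    } else if (x > y) {
        return "x is greater than y";
    } else {
        return "x is equal to y";
    }
}

// Repeat three times: the same shape as in C, with let in place of int.
const sounds = [];
for (let i = 0; i < 3; i++) {
    sounds.push("meow");
}

// A forever loop uses lowercase true. The break below is an assumption
// added only so that this sketch terminates.
let ticks = 0;
while (true) {
    ticks++;
    if (ticks === 3) {
        break;
    }
}

console.log(counter, compare(1, 2), sounds, ticks);
```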

and that's it

for our tour of JavaScript syntax

there's Bunches of other features but

syntactically it's going to be that

accessible relatively speaking the power

of JavaScript running in the user's


browser is going to be that you can

change this thing in memory think about

most any website that's at all

interesting today that you use it's

typically very interactive and dynamic

if you're sitting in front of Gmail on a

laptop or desktop with the browser tab

open and someone sends you an email all

of a sudden another row appears in your

inbox another row another row how is

that implemented honestly it could be an

HTML table maybe it's a bunch of divs

top to bottom the point though is you

don't have to hit command r or control R

to reload the page to see more email it

automatically appears every few seconds

or minutes how is that working when you

visit gmail.com You are downloading not

just HTML and CSS with your initial

inbox presumably you're downloading some

JavaScript code that is designed to keep

talking every second every 10 seconds or

something to Gmail servers and they then

are using their code to add another

element another element another element

to the existing Dom document object

model which is the fancy term for tree

in memory that represents HTML so that

the web page can continue to update in

real time Google Maps same thing if you


click and drag and drag and drag your

browser did not download the entire

world to your Mac or PC by default it

only downloaded what's in your

viewport the rectangular region but when

you click and drag it's going to get

some more tiles up there some more

images some more images as you keep

dragging using JavaScript again behind

the scenes so let's actually use

JavaScript to start interacting with

Pages how can we do this we can put the

JavaScript code in the head of the page

in the body of the page or even Factor

it out to a separate file so let's take

a look here is a new version of

hello.html that during the break I just

added a form to because it'd be nice if

this page didn't just say hello title

hello body it said hello David hello

Carter hello whoever uses it I've got a

form that I borrowed from some of our

earlier code and that form has an input

whose ID is name and that also

has a submit button but there's no code

in this yet so let's add a little bit of

JavaScript code as follows suppose that

when this form is submitted I want to

greet the user how can I do that well


let's do it the somewhat messy way first

I can add an attribute called onsubmit

to the form element and I can say on

submit call a function called greet

close quotes unfortunately this function

doesn't yet exist but I can make it

exist but there's another detail here

when the user clicks submit normally

forms get submitted to the server I

don't want to do that today I want to

just submit the form to the browser

keep on the same page and just print on

the screen hello David or so forth so

I'm also going to go ahead and say

return false and this is a JavaScript

way of telling the browser even when the

user tries to submit the form return

false like no don't let them actually

submit the form but do call this

function called greet in the head of my

page I'm going to add a script tag

wherein the language is implicitly JavaScript

and has no relationship for those of you

who took apcs with Java just a similarly

named language but no relation I am

going to name a function called greet

apparently in JavaScript the way you

create a function is you literally say

the word function instead of def you

don't specify a return type and in


this function I could do something like

this

alert quote unquote uh how about hello

there initially I'm going to keep it

simple using a built-in function called

alert which is not a good user interface

there are better ways to do this but

we're doing something simple first let

me now go ahead and load this page again

it still looks as simple as before with

just a simple text box I'll zoom in to

make it bigger I'm going to type my name

but I think it's going to be ignored

when I click submit it just says hello

there and this is again this is an ugly

user interface it literally says the

whole Codespace URL of the web page is

saying this to you it's really just

meant for simple interactions like this

for now all right let's have it say

hello David somehow well how can I do

this well if this element on the page

was given by me a unique ID it'd be nice

if just like in CSS I can go grab the

value of that text box using code and I

actually can let me go ahead and do this

let me store in a variable called name

the result of calling a special function

called document.querySelector


this querySelector function is

JavaScript's version of what we were

doing in CSS to select nodes using

hashes or dots or other syntax it's the

same syntax so if I want to select the

element whose unique ID is name I can

literally just pass in single or double

quotes hash name just like in CSS that

gives me the actual node from the tree

it gives me one of these rectangles from

the DOM the Document Object Model if I

actually want to get at the specific

value therein I need to go one step

further and say dot value so similar in

spirit to python where we saw a lot of

dot notation where you can go inside an

object inside of an object that's what's

going on long story short in JavaScript

there is a special Global variable

called document that lets you just do

stuff with the document the web page

itself one of those functions is called

querySelector that function returns to

you whatever it is you're selecting and

.value means go inside of that

rectangle and grab the actual text that

the human typed in so if I want to

now say hello to that person the syntax

is a little different from C and python

I can use concatenation which actually


does exist in Python but we didn't use

it much I can go ahead and say hello

quote unquote uh hello plus name all

right now if I go back to the browser

window reload the page to get the latest

version of the code Type in David and

click submit now I see hello David not

the best website but it does demonstrate

how I can start to interact with the

page but let me stipulate that this

commingling of languages is never a good

thing it's fine to use classes but using

style equals quote unquote and a whole

bunch of CSS that was not going to scale

well once you have lots and lots of

properties same here once you have more

and more code you don't want to just put

your code inside of this onsubmit

handler so there's a better way let's

get rid of that onsubmit attribute and

literally never use it again that was

for demonstration's sake only and let's

do this let me move the script tag

actually just below the form but still

inside the body so that the script tag

exists only after the form tag exists

logically just like in Python your code

is read top to bottom left to right and

let me now do this let me Define this


function called greet and then let me do

this document.querySelector let me

select the form on the page it doesn't

have a unique ID it doesn't need to I

can just reference it by name form

because there's only one of them and let

me call this special function

addEventListener this is a function that listens

for events now this is actually a term

of art within programming many different

languages are governed by events and

pretty much any user interface is

governed by events especially phones on

phones you have touches and you have

drags and you have long press and you

have pinch and all of these other

gestures on your Mac or PC you have

click you have drag you have key down

key up as you're moving your hands up

and down on the keyboard this is a

non-exhaustive list of all of the events

that you can listen for in the context

of web programming and this might be a

throwback to Scratch where recall

Scratch let you broadcast events and we

had the two puppets sort of talking to

one another via events in the world of

web programming game programming any

human physical device these days they're

just governed by events and you write


code that listens for these events

happening so what do I want to listen

for well I want to add an event listener

for the submit event and when that

happens I want to call the Greet

function like this

so this is kind of interesting

I have my greet function as

before no changes but I'm adding one

line of code down here I'm telling the

browser to use document.querySelector to

select the form then I'm adding an event

listener specifically for the submit

event and when that happens I call greet

notice I'm not using parentheses after

greet I don't want to call greet right

away I want to tell the browser to call

greet when it hears this submit event
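The difference between passing greet and calling greet() is not unique to JavaScript. As a rough sketch of the same idea in Python rather than JavaScript, the toy class below stores a function now and only calls it later, when the event is dispatched. The names EventTarget, add_event_listener, and dispatch are hypothetical and chosen for illustration; this is not the browser's actual API.

```python
# A toy event system. EventTarget, add_event_listener, and dispatch are
# hypothetical names for illustration, not the browser's real API.
class EventTarget:
    def __init__(self):
        self.listeners = {}  # maps an event name to a list of functions

    def add_event_listener(self, event, callback):
        # Store the function itself; do not call it yet.
        self.listeners.setdefault(event, []).append(callback)

    def dispatch(self, event):
        # Call every function registered for this event, in order.
        return [callback() for callback in self.listeners.get(event, [])]


def greet():
    return "hello, David"


form = EventTarget()
form.add_event_listener("submit", greet)  # no parentheses: pass, don't call
results = form.dispatch("submit")         # greet runs only now
print(results)
```

Writing greet() in the registration line would instead run the function immediately and register its return value, a string, which could not be called later.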

now let me go ahead and

deliberately I think trip over something

here let me type in my name David submit

and there we go all right hello David

all right but let's now make this

slightly better designed right now I'm

defining a function greet which is fine

but I'm only using it in one place and

you might recall we stumbled on this


in Python where I was like why are we

creating a special function called get

value when we're only using it like one

line later and we introduced what type

of function in Python the other day

yeah so lambda functions anonymous

functions you can actually do this in

JavaScript as well if I want to define a

function all at once I can actually do

this let me cut this onto my clipboard

paste it over here let me fix all of the

alignment let me get rid of the name and

I can actually now

do this the syntax is a little weird but

using now just these four lines of code

I can do this I can tell the browser to

add an event listener for the submit

event and then when it hears that call

this function that has no name and

unlike python this function can have

multiple lines which is actually a nice

thing it looks a little weird there's a

lot of indentation curly braces going on

now but you can think of this as just

being run these two lines of code when

the form is submitted but if I want to

block the form from actually being

submitted I got to do one other thing

and you would only know this from being

told it or reading the documentation I


need to call this function preventDefault

passing in this e argument which is a

variable that represents the event more

on that another time that just allows us

to prevent whatever the default handling

of that particular event is so long

story short this is representative of

the type of code you might write in

JavaScript whereby you can actually

interact via your code with the user's

actual form and we can do interesting

things too built into browsers nowadays

is functionality like this so here's a

very simple example that has just three

buttons in it one red one green one blue

well it turns out using JavaScript you

can control the CSS of a page

programmatically I can change the

background of the body of the page to

Red to green to blue just by listening

for clicks on these buttons and then

changing CSS properties just to give you

a taste of this if I view the page's

source similar code here I can select

the red button by an ID that I

apparently defined on it right up here I

can add an event listener this time not

for submit but for click and when it's

clicked I execute this one line of code


and this one line of code we haven't

seen before but you can go into the body

of the page its style property and you

can change its backgroundColor to red

this is one example of like two

different groups not talking to one

another in advance in CSS properties

that have two words are usually

hyphenated like background-color

unfortunately in JavaScript if you do

something Dash something that's

subtraction which is logically

nonsensical here so you can

convert CSS's background-color to in

JavaScript backgroundColor where you

capitalize the C and you get rid of the

minus sign what else can we do here well
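The naming rule just described can be sketched in a few lines of Python. The helper name css_to_js is made up for this illustration, and the sketch only covers the common case of ordinary hyphenated property names.

```python
def css_to_js(prop):
    """Convert a hyphenated CSS property name to its camelCase form."""
    first, *rest = prop.split("-")
    # Keep the first word as-is; capitalize the first letter of the others.
    return first + "".join(word.capitalize() for word in rest)


print(css_to_js("background-color"))
print(css_to_js("font-size"))
```

So background-color becomes backgroundColor and font-size becomes fontSize, while a one-word property like color is left unchanged.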

back in the day there used to be a blink

tag and it's one of the few historical

examples of a tag that was removed from

HTML because in the late 90s early 2000s

this is what the web looked like there's

a lot of this kind of stuff there was

even a marquee that would move text from

left to right over the screen and

the web was a very ugly place I will

admit my very first web page probably

used both of these tags but how can we

bring it back well this is a version of

the blink tag implemented in JavaScript


how I wrote some code in this example

that Waits every 500 milliseconds to

change the CSS of the page to be visible

invisible visible invisible because

built into JavaScript is support for a

clock so you can just do something on

some sort of schedule let me go ahead

and open up this example autocomplete

let me zoom back out in

autocomplete.html I whipped up an

example that has just a text box but I

also grabbed the dictionary from problem

set five speller so that if I want to

search for something like apple this

searches those 140,000 words using

JavaScript to create what we know in the

world of the web as autocomplete when

you start searching for something you

start to see words that start with

that phrase and sure enough if I search

for something like banana here's the

three variants of bananas that appear in

that file and so forth how is that

working just JavaScript when it finds

matching words it's just updating the

DOM the tree in the computer's memory

to show more and more text or less and
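The matching step behind that demo can be sketched in Python (the demo itself runs in JavaScript in the browser). The function name autocomplete, the limit parameter, and the five-word list are assumptions standing in for the real 140,000-word dictionary file.

```python
def autocomplete(words, prefix, limit=10):
    """Return up to `limit` words that start with `prefix`, ignoring case."""
    prefix = prefix.lower()
    if not prefix:
        return []  # nothing typed yet, so nothing to suggest
    matches = [word for word in words if word.lower().startswith(prefix)]
    return matches[:limit]


# A tiny stand-in for the dictionary used in the demo.
WORDS = ["apple", "banana", "bananas", "band", "cat"]
print(autocomplete(WORDS, "banan"))
```

In the browser version, each keystroke would rerun a search like this and then rewrite part of the page with the matches.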

for one final example this is how

programs like DoorDash and Google Maps


and Uber Eats and so forth work you have built

into browsers today some fancy APIs

application programming interfaces

whereby you can ask for information

about the user's device for instance

here I wrote a program in

geolocation.html that's apparently

asking to know my location

all right let me go ahead and allow it

this time if that's something you're

comfortable with on your own device

it's taking a moment because sometimes

these things take a little while to

analyze but hopefully in just a moment

there are apparently my GPS coordinates

and as a final flourish today for what

you can do with a little bit of HTML for

your structure CSS for your style and

now JavaScript for your logic which will

tie in again next week let me go ahead

and search Google for those GPS

coordinates zoom in here

on Google Maps and if we zoom in okay

we're pretty close we're not on that

street but there oh there it is actually

there's the marker it had put for us

we're indeed here in Memorial Hall so

all that with JavaScript the basic

understanding of the DOM the

Document Object Model we'll pick up


where we left off next week and now add

a back end see you next time

[Music]

this is CS50 and this is week nine and

this is kind of it in terms of

programming fundamentals today we come

rather full circle with so many of the

languages that we've been looking at

over the past several weeks and with

HTML and CSS and JavaScript last week

we're going to add back into the mix

Python and SQL and with that do we have

the ability to program for the web and

even though this isn't the only user

interface out there increasingly are

people certainly using laptops and

desktops and a browser to access

applications that people have written

but it's also increasingly the way that

mobile apps are written as well there

are languages called Swift for iOS there

are languages called Java for Android

but coding applications in both of those

languages means knowing twice as many

languages building twice as many

applications potentially so we're


increasingly seeing For Better or For

Worse that the world is starting to

really standardize at least for the next

some number of years on HTML CSS and

JavaScript coupled with other languages

like Python and SQL on the so-called

back end and so today we'll tie all of

those together and give you the last of

the tools in your toolkit with which to

tackle final projects to go off into the

real world ultimately and somehow solve

problems with programming but we need an

additional tool today and we've sort of

outgrown http-server this is just a

program that comes on certain computers

that you can install for free happens to

be written in a language called

JavaScript but it's a program that we've

been using to run a web server in VS

Code but you can run it on your own Mac

or PC or anywhere else but all this

particular HTTP server does is serve up

static content like HTML files CSS files

JavaScript files maybe images Maybe

video files but just static content it

has no ability to really interact with

the user beyond simple clicks you can

create a web form and serve it visually

using http-server but if the human types

input into a form and clicks submit


unless you submit it elsewhere to

something like google.com like we did

last time it's not actually going to

go anywhere because this server can't

actually process the requests that are

coming in so today we're going to

introduce another type of server that

comes with python that allows us to not

only serve web pages but also process

user input and recall that all that

input is going to come ultimately from

the URL or more deeply inside of those

virtual envelopes so here's like the

canonical URL we talked about last week

for random website like

www.example.com and I've highlighted the

slash just to connote the root of

the web server like the default folder

where presumably there's a file called

index.html or something else in there

otherwise you might have a more explicit

mention of the actual file name

file.html you can have folders as you

probably gleaned from the most recent

problem set you can have files in

folders like this and these are all

examples of what a programmer would

typically call a path so it might not

just be a single word it might have


multiple slashes and multiple folders

and subfolders and files but this is

just more generally known as a path but

there's another term of art that's

essentially equivalent that we'll use

today this is also synonymously called a

route which is maybe a better generic

description of what these things are

because it turns out they don't have to

map to that is refer to a specific

folder or a specific file you can come

up with your own routes in a website and

just make sure that when the user visits

that you give them a certain web page if

they visit something else you give them

a different web page it doesn't have to

map to a very specific file as we'll

soon see and if you want to get input

from the user just like Google does like

Q equals cats you can add a question

mark at the end of this route the key or

the HTTP parameter name that you want to

Define for yourself and then equal some

value that presumably the human typed in

if you have more of these you can put an

ampersand and then more key equals value

pairs Ampersand repeat repeat repeat
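To make the anatomy of such a URL concrete, here is a small sketch using Python's standard library to split a URL into its route and its key-value pairs. The URL itself, including the second parameter lang, is invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# A hypothetical URL: a route, then "?", then key=value pairs joined by "&".
url = "https://www.example.com/search?q=cats&lang=en"

parts = urlsplit(url)
# parse_qs maps each key to a list of values; keep the first value of each.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)  # the route, before the question mark
print(params)      # the parameters, after the question mark
```

Here the route is /search and the parameters come back as a dict with the keys q and lang.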

the catch though is that using the

tools that we had last week alone we

don't really have the ability to parse


that is to analyze and extract things

like Q equals cats you could have

appended question mark q equals cats

or anything else to any of your URLs in

your home page for problem set 8 but it

doesn't actually do anything useful

necessarily unless you use some fancy

JavaScript the server is not going to

bother even looking at that for you but

today we're going to introduce using a

bit of python and in fact we're going to

use a web server implemented in Python

instead of using http-server alone to

automatically for you look for any key

value pairs after the question mark and

then hand them to you in the form of a

python dictionary recall that a

dictionary in Python a dict object is

just key value pairs that seems like a

perfect fit for these kinds of

parameters and you're not going to have

to write that code yourself it's going

to be handed to you by way of what's

called a framework so this will be the

second of two Frameworks really that we

look at in the class and a framework is

essentially a bunch of libraries that

someone else wrote and a set of

conventions therefore for doing things


so those of you who really started

dabbling with bootstrap this past week

to make your home pages prettier and

nicely laid out you were using a

framework why well you're using

libraries code that someone else wrote

like all the CSS maybe some of the

JavaScript that the bootstrap people

wrote for you but it's also a framework

in the sense that you kind of have to go

all in like you have to use bootstraps

classes and you have to kind of lay out

your divs or your spans or your table

tags in a sort of bootstrap friendly way

and it's not too onerous but you're

following conventions that a bunch of

humans standardized on so similarly in

the world of python is there another

framework we're going to start using

today and whereas bootstrap is used for

CSS and JavaScript flask is going to be

used for Python and it just solves a lot

of common problems for us it's going to

make it easier for us to analyze the

URLs and get at key value pairs it's

going to make it easier for us to find

files or images that the human wants to

see when visiting our website it's even

going to make it easier to send emails

automatically like when someone fills


out a form you can dynamically using

Code send them an email as well so flask

and with it some related libraries

it's just going to make stuff like that

easier for us and to do this all we have

to do is adhere to some pretty

minimalist requirements of this

framework we're going to have to create

a file for ourselves called app.py this

is where our web app or application is

going to live if we have any libraries

that we want to use the convention in

the python world is to have a very

simple text file called requirements.txt

where you list the names of those

libraries top to bottom in that text

file similar in spirit to the include or

the import statements that we saw in C

and Python respectively we're going to

have a static folder or static directory

which means any files you create that

are not ever going to change like images

CSS files JavaScript files they're going

to go in this folder and then lastly any

HTML that you write web pages you want

the human to see are going to go in a

folder called templates so this is again

evidence of what we mean by a framework

like do you have to make a web app like


this no but if you're using this

particular framework this is what people

decided would be the human conventions

if you've heard of other Frameworks like

Django or ASP.NET or bunches of others

there are just different conventions out

there for creating applications flask is

a very nice microframework and

that's it like all you have to do is

kind of adhere to these pretty

minimalist requirements to get some code

up and running

all right

so let's go ahead and make a web app let

me go ahead and switch over to vs code

here and let me practice what I'm

preaching here by first creating

app.py and let's go ahead and create an

application that very simply maybe says

hello to the user so something that

initially is not all that Dynamic pretty

static in fact but we'll build on that

as we've always done so in app.py what

I'm going to do first is exactly the

line of code I had on the screen earlier

from flask import Flask with a lowercase f

first and a capital F second and I'm

also going to preemptively import a

couple of functions render_template and

request more on those in just a bit and


then below that I'm going to say go

ahead and do this give me a

variable called app that's going to be

the result of calling the Flask function

and passing into it this weird incantation

here __name__ so we've seen this a few weeks

back when we played around with python

and we had that if main thing at the

bottom of the screen for now just know

that underscore underscore name

underscore underscore refers to the name

of the current file and so this line

here simple as it is tells python hey

python turn this file into a flask

application flask is a function that

just figures out then how to do the rest

the last thing I'm going to do for this

very simple web application is this I'm

going to say that I'm going to have a

function called index that takes no

arguments and whenever this function is

called I want to return the results of

rendering a template called

index.html and that's it so let's assume

there's a file somewhere haven't created

it yet called index.html but

render_template means render this file that is

print it to the user's screen so to speak

the last thing I'm going to do is I have


to tell flask when to call this index

function and so I'm going to tell it to

define a route for quote-unquote Slash

and that's it so let's take a look at

what I just created here this is

slightly new syntax and it's really the

only weirdness that we'll have today in

Python this is what's known in Python as

a decorator a decorator is

a special type of function that modifies

essentially another function for our

purposes just know that on line six this

says Hey python Define a route for Slash

the default page on my web application

the next two lines seven and eight say

hey python Define a function called

index takes no arguments and the only

thing you should ever do is return

render_template of quote unquote

index.html
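Written out, the app.py described so far might look roughly like the sketch below. It assumes Flask is installed and that a templates/index.html file exists by the time the route is actually visited. The comments are added here for explanation, so line numbers will not match the ones mentioned on screen.

```python
# app.py: a sketch of the application as described; requires Flask.
from flask import Flask, render_template, request

# Turn this file into a Flask application.
app = Flask(__name__)


# The decorator registers index() as the handler for the route "/".
@app.route("/")
def index():
    # Looks for templates/index.html when the route is visited.
    return render_template("index.html")
```

The request import is unused at this point; it is brought in preemptively, as in the walkthrough, for reading user input later.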

all right so that's it so really the

next question naturally should be like

all right well what is in uh

index.html well let me go ahead and do

that next let me create a directory

called templates practicing again what I

preached earlier so I'm going to create

a new empty directory called templates

I'm going to go and cd into that

directory and then do code of index.html


so here's going to be my index page and

I'm going to do a very simple web page

doctype HTML just going to borrow some

stuff from last week HTML language

equals English I'll close that tag I'll

then do a head tag I'll do a meta tag

the name of which is viewport this makes

my site recall responsive that is it

just grows and shrinks to fit the size of

the device the initial scale for which

is going to be one and the width of

which is going to be device-width so I'm

typing this out I have it printed here

this is stuff I typically copy paste but

then lastly I'm going to add in my title

which will just be hello for the name of

this app and then the body whoops

the body of this tag will be

there we go the body of this page rather

will just be hello comma world so very

uninteresting and really kind of a

regression to where we began last week

but let's go now and experiment with

these two files I'm not going to bother

with a static folder right now because I

don't have any other files that I want

to serve up no images no CSS nothing

like that and honestly requirements.txt

is going to be pretty simple I'm going


to go requirements.txt and just say make

sure the system has access to the flask

Library itself all right but that's the

only thing I'm going to add in there for

now all right so now I have two files

app.py and I have index.html but

index.html is inside of my

templates directory so how do I actually

start a web server last week I would

have said http-server but http-server is

not a Python thing it has no idea about

Flask or Python or anything I just wrote

http-server will just spit out static

files so if I ran http-server and then I

clicked on app.py I would literally see

my Python code it would not get

executed because http-server is just for

static content but today I'm going to

run a different command called flask run

so this framework flask that I actually

pre-installed in advance so it

wasn't strictly necessary that I create

that requirements.txt file just yet

comes with a program called flask takes

command line arguments like the word run

and when I do that you'll see somewhat

similar output to last week whereby

you'll see the URL for your

unique preview of that you might see a

pop-up saying that your application is


running on TCP Port something or other

by default last week we used port 8080

flask just because prefers Port 5000 so

that's fine too I'm going to go ahead

and open up this URL now

and once it authenticates and redirects

me just to make sure I'm allowed to

access that particular Port let me zoom

in voila there's the extent of this

application if I view Source by

right-clicking or control clicking

there's my HTML that's been spit out so

really I've just reinvented the Wheel

from last week because there's no

dynamism now nothing at all but what if

I do this let me close the source and

let me zoom out so you can see my URL

bar let me zoom in now and I have a very

unique cryptic URL but the point is that

it ends with nothing or implicitly it

ends with Slash this is just Chrome

being a little helpful it doesn't bother

showing you a slash even though it's

implicitly there but let me do something

explicit like

question mark name equals David so

there's a key value pair that I've

manually typed into my URL bar and hit

enter
okay nothing happens nothing changes it

still says hello world but the

opportunity today is to now dynamically

get at the input from that URL and start

displaying it to the user so let me go

back over here to my terminal window and

code let me move that down to the bottom

there and what if I want to say huh

hello name I ideally want to say

something like I don't want to hard code

David because then it's never going to

say any hello to anyone else I kind of

want to put like a variable name here

like name should go here but it's not an

HTML tag so I need some kind of

placeholder well here's what I can do if

I go back to my python code I can now

Define a variable called name and I can

ask Flask to go into the current request

into its args that is its URL arguments as

they're called and get whatever the

value of the parameter called name is

that puts that into a variable for me

and then in render_template this is one

of those functions that can take more

than one argument if it takes another

argument you can pass in the name of any

variable you want so if I want to pass

in my name I can literally say name

equals name so this is the name of a


variable I want to give to the template

this is the actual variable that I want

to get the value from and now lastly in

my index.html the syntax as of today in

flask is to do two curly braces and then

put the name of the variable that you

want to plug in
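Flask's templates are rendered by the Jinja library, so the placeholder syntax can be tried on its own. The sketch below uses the jinja2 package directly (it is installed alongside Flask) instead of render_template, and the one-line template string is a stand-in for a full index.html.

```python
from jinja2 import Template

# Two curly braces mark a placeholder to be filled in at render time.
page = Template("<body>hello, {{ name }}</body>")
html = page.render(name="David")
print(html)

# If literal braces are ever needed, one option is to output them
# as a string expression.
braces = Template("{{ '{{' }}").render()
print(braces)
```

The first render yields the body with David plugged in; the second shows one way to print a literal pair of opening braces.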

so here's what we mean by a template a

template is kind of like a blueprint in

the real world where it's like a plan

to make something this is the plan to

make a web page that has all of this

code literally but there's this

placeholder with two curly braces here

and here that says go ahead and plug in

the value of the name variable right

there so in this sense it's similar in

spirit to our f-strings or format

strings in Python the syntax is a little

different just because reasonable people

disagree different people different

Frameworks come up with different

conventions the convention in flask in

their templates is to use two curly

braces here the hope is that you the

programmer will never want to display

two curly braces in your actual web page

but even if you do there's a work around

we can escape that so now let me go


ahead and go back to my browser tab here

previously even though I added name

equals David to the end of the URL with

a question mark it still said hello

world but now hopefully if I made these

changes let me go ahead and open up my

terminal window let me restart Flask

so it loads my changes by default let me

go back to my hello Tab and click reload

so it grabs the page Anew from the

server

and there we go hello David I can play

around now and I can change the url up

here to for instance Carter zoom out hit

enter and now we have something more

Dynamic so the new pieces here are in

Python we have some code here that

allows us to access programmatically

everything that's after the question

mark in the URL and the only thing we

have to do to do that is call this

function

request.args.get you and I don't have to

bother figuring out where's the question

mark where is the equal sign where are

the ampersands potentially the framework

flask does all of that for us
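Putting those pieces together, here is a single-file sketch that can be run without starting a server. It departs from the walkthrough in three labeled ways: it uses render_template_string with an inline template instead of a templates/index.html file, it supplies a fallback of "world" when no name is given, and it uses Flask's test client to simulate browser requests.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-in for templates/index.html so the sketch fits in one file.
INDEX_HTML = "<!DOCTYPE html><title>hello</title><body>hello, {{ name }}</body>"


@app.route("/")
def index():
    # request.args holds the key-value pairs after the "?" in the URL.
    # The "world" fallback is an assumption for when no name is given.
    name = request.args.get("name", "world")
    return render_template_string(INDEX_HTML, name=name)


# Simulate two browser requests without starting a server.
client = app.test_client()
print(client.get("/?name=David").get_data(as_text=True))
print(client.get("/").get_data(as_text=True))
```

In a real project the template would live in templates/index.html and the app would be started with flask run, as in the walkthrough.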

okay any questions then on these

principles thus far

yeah in back


[Music]

why do you need a question mark in the

URL

the short answer is uh just because that

is where key value pairs must go if

you're making a get request from a

browser to a server the convention

standardized by the HTTP protocol is to

put them in the URL after the so-called

route or path then a question mark and

it delineates what's part of the route

or the path and what's part of the human

input to the right

other questions yeah

sure this is this annoying thing about

Python when you pass in parameters to

functions that have names you typically

say something equals something else so

let me make a slight tweak here uh how

about I say name of person here this

allows me to invent my own

variable for my template and assign it

the value of name I now though have to

go into my index file and say name of

person did I get that right name of

person yeah so these two have to match

and so this is just kind of stupid

because it's unnecessarily verbose so

what typically people do is they just


use the same name as the variable itself

even though it looks admittedly kind of

stupid but it has two different roles

the thing to the left of the equal sign

is the name of the variable you plan to

use in the template the thing on the

right is the actual value you're

assigning it and this is because it's

general purpose I could override this

and I could say something like name

always equals Emma no matter what that

variable is and now if I go back to my

browser and reload no matter what's in

the URL David or Carter it's always okay

Emma broke the server uh

what did I do oh I didn't change

my template back there we go let me

change that back to be name so that it's

name there and it's name here but I've

hard-coded Emma's name so now we're only

ever going to see Emma no matter whose

name is in the URL that's all

right so this is kind of bad user

interface if in order to get a greeting

for the day you the user have to

manually change the url which none of us

ever do this is not like how web pages

work what is the more normal

mechanism for getting input from the

user and putting it in that URL


automatically

how did we do that last week

with Google if you recall


okay so we did make something in order

to get the input from the user and

specifically what was the the tag or the

terminology we used last week

sorry a little louder

oh no

but yeah

so the input tag inside of the form tag

so in short forms are of course like how

the web works and how we typically get

input from the user whether it's a

button or a text box or a drop down menu

or something else so let's go ahead and

add that into the mix here so let's

enhance this Hello app to do a little

something more by this time just doing

this let me get rid of this name stuff

and let me just have a very simple

index.html file but that by default is

going to Simply ask the user for some

input as follows I'm going to go back

into my

index.html and instead of printing out

the user's name this is the page I'm

going to use to actually get input from


the user so I'm going to create a form

tag the method I'm going to use for now

is going to be quote unquote get then

inside of that form I'm going to have an

input tag and I'm going to turn off

autocomplete like we did last week I'm

going to turn on auto focus so it puts

the cursor in the text box for me I'm

going to give the name of this input the

name name not to be too confusing but

I'm asking the human for their name so

it makes sense that the name of

the input should be quote unquote name

the placeholder I want the human to see

in light gray text will be Name with a

capital N just so it's a little

grammatical and then type of this text

field type of this input is going to be

text then I'm just going to give myself

like last week a submit button and I

don't care what it says it's just going

to say the default submit terminology

let me go ahead now and open up my

terminal window again let me go to that

same URL

so that I can see whoops

there we go so that was just cached from

earlier let me go back to that same URL

my githubpreview.dev URL and here I have

the form and now I can type in anything


I want the catch though is when I click

submit where is it going to go well

let's be explicit it does have a default

value but let me go into my index.html

and let me add just like we did last

week for Google whereas previously I

said something like

www.google.com/search but today we're

not going to rely on some third party

I'm going to implement the so-called

back end and I'm going to have the user

submit this form to a second route not

just slash but how about Slash greet I

can make it up whatever I want greet

feels like a nice operative word so

slash greet is where the user will be

sent when they click submit on this form

alright so let's go ahead now and go

back to my browser tab let me go ahead

actually and let me reload flask here so

that it reloads all of my changes let me

reload this tab so that I get the very

latest HTML and indeed quick safety

check if I view page source we

indeed see that my browser has

downloaded the latest HTML so it

definitely has changed let's go ahead

and type in David and when I click

submit here
what's going to happen

hypotheses

what's going to happen visually

functionally however you want to

interpret when I click submit

yeah

okay the user is going to go to an empty

page pretty good Instinct because

nowhere else have I mentioned slash

greet doesn't seem to exist how's the

URL going to change just to be clear

what's going to appear suddenly in the

URL

yeah

404 no not in the URL specifically in

the URL something's going to get added

automatically when I click

the key value pair right that's how

forms work that's why our Google trick

last week worked I sort of recreated a

form on my own website and even though I

didn't get around to implementing

google.com itself I can still send the

information to Google Just relying on

browsers standardizing to your question

earlier that whenever you submit a form

it automatically ends up after a

question mark in the URL if you're using

get so this both of you are right this

is going to break and all three of you


are right in effect 404 not found you

can see it in the tab here that's the

error that has come back but what's

interesting and most important the URL

did change and it went to slash greet

question mark name equals David so I

just now need to add some logic that

actually looks for that so-called route
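
A sketch of why the submission fails at this point, using Flask's built-in test client instead of a browser (an assumption made so the example runs on its own). Only the slash route exists, so a request to slash greet comes back as 404 even though the key-value pair arrived in the URL.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Placeholder body; the real page would render the form.
    return "form goes here"


client = app.test_client()
response = client.get("/greet?name=David")   # no route is defined for /greet yet
print(response.status_code)
```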

so let me go back to my app.py let me

Define another route for quote unquote

slash greet and then under

this let me Define another function I'll

call it uh greet but I could call it

anything I want no arguments for now for

this and then let me go ahead and do

this in my app.py this time around I do

want to get the human's name so let me

say

request.args.get quote unquote name and

let me store that in a variable called

name then let me return a template and

you know what I'm going to give myself a

new template greet.html because this has

a different purpose it's not a form I

want to say hello to the user in this

HTML file and I want to pass into it the

name that the human just typed in
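
Putting the pieces described so far together, a rough sketch of app.py with both routes. To keep it self-contained it assumes inline template strings in place of index.html and greet.html; the lecture's version uses render_template with files in a templates directory.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for templates/index.html and templates/greet.html.
FORM = """
<form action="/greet" method="get">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <input type="submit">
</form>
"""
GREET = "hello, {{ name }}"


@app.route("/")
def index():
    return render_template_string(FORM)


@app.route("/greet")
def greet():
    # request.args holds the key-value pairs from the URL's query string.
    name = request.args.get("name")
    return render_template_string(GREET, name=name)
```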

all right so now if I go up and reload

the page what might happen now


other logical check here

if I go ahead and hit reload or resubmit

the form what might happen now

any instincts

let me try so let's try this let's go

ahead and reload the page previously it

was not found now it's worse and this is

the 500 error internal server error that

I promise next week we will all

encounter accidentally ultimately but

here we have an internal server error

because it's an internal error this

means something's wrong with your code
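
A sketch of the difference between the two status codes. The RuntimeError is a hypothetical stand-in for whatever bug the route's code contains, and the test client is used so the example runs without a browser.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/greet")
def greet():
    # Hypothetical failure standing in for any bug inside the route's function.
    raise RuntimeError("something is wrong in this function")


client = app.test_client()
found_but_broken = client.get("/greet?name=David").status_code   # route exists, code fails
not_found = client.get("/missing").status_code                   # no such route at all
print(found_but_broken, not_found)
```

The traceback that Flask logs for the failing route is the same kind of output the terminal window shows in the lecture.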

so the route was actually found because

it's not a 404 this time but if we go

into vs code here and we look at the

console the terminal window you'll see

that this is actually a bit misleading

uh do I want to do this let me reload

this

let me reload here oh standby

come on

there we go

come on

okay here we have this error here and

this is where your terminal window is

going to be helpful in your terminal

window by default is typically going to

show helpful stuff like a log of what

it is the server is seeing from the


browser for instance here's what the

server just saw in purple GET slash

greet question mark name equals David

using HTTP version 1.0 here though is

the status code that the server returned

500 why what's the error well here's

where we get these annoying pretty

cryptic Python messages that help50

might ultimately help you with or here

we might just have a clue at the bottom

and this is actually pretty clear even

though we've never seen this error

before what did I screw up here I just

didn't create greet.html right template

not found all right so that must be the

last piece of the puzzle and again

representative of how you might diagnose

problems like these let me go into my

terminal window after hitting Ctrl C

which cancels or interrupts a process

let me go into my templates directory if

I type ls I only have index.html so

let's code up greet.html and in this

file let's quickly do doctype

doctype HTML Open Bracket HTML language

equals English inside of this I'll have

the head tag inside of here I'll have

the meta the name is viewport the

content of which is uh see I always


forget this too

the content of which is initial scale

equals one width equals device-width quote

unquote title is still going to be I'll

call this greet because this is my

template and then here in the body I'm

going to have hello comma name so I

could have kept around the old

version of this but I just recreated

essentially my second template so

index.html now is almost the same but

the title is different and it has a form

greet.html is almost the same but it

does not have a form it just has the

hello comma name so let me now go ahead

and rerun in the correct directory you

have to run flask wherever app.py is not

in your templates directory so let me do

flask run to get back to where I was

let me go into my other tab cross my

fingers this time that when I go back to

slash and I get index.html's form now I

type in David and click submit now

we get hello David and now we have kind

of a full-fledged web app that has two

different routes slash and slash greet

the latter of which takes input like

this and then using a template spits it

out but something could go wrong let's

see what happens here suppose I don't


type anything in let me go here and just

click submit now

I mean it looks kind of stupid so

there's Bunches of ways we could solve

this I could require that the user have

input on the previous page I could have

some kind of error check for this but

there's another mechanism I can use that

I'll just show you it turns out this get

function in the context of HTTP and also

in general with python dictionaries you

can actually Supply a default value so

if there is no name parameter or no

value for a name parameter you can

actually give it a default value like

this so I'll say world for instance now

let me go back here let me type in

nothing again and click submit and

hopefully this time I'll in oops sorry

let me restart flask to reload the

template let me go ahead and type

nothing this time clicking submit and

hopefully

we now oh

interesting I should have faked this uh

suppose that

the reason this uh suppose I just get

rid of name altogether like this and hit

enter now I see Hello World and this is


a subtlety that I didn't intend to get

into here when you have question mark

name equals nothing you're passing in

what's called whoops when you have greet

question mark name equals nothing you

actually are giving a value to name it

is quote unquote with nothing in between

that is different from having no value

at all so allow me to just propose that

the error here uh we would want to

require this in a different way and

probably the most robust way to do this

would be to go in here in my HTML and

say that the name field is required
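
The subtlety about the default value can be sketched with the standard library alone. The lookup helper is a hypothetical name for illustration; it mimics what a get with a default does to a parsed query string.

```python
from urllib.parse import parse_qs


def lookup(query):
    # keep_blank_values=True keeps "name=" as an empty string instead of dropping it.
    pairs = parse_qs(query, keep_blank_values=True)
    args = {key: values[0] for key, values in pairs.items()}
    # The default applies only when the key is absent altogether.
    return args.get("name", "world")


absent = lookup("")        # no name parameter at all
blank = lookup("name=")    # name is present, but its value is the empty string
fallback = blank or "world"  # one way to treat an empty value like a missing one
```

So an empty text box still submits a value for name, and the default never kicks in unless the code also checks for emptiness.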

now if I go back to my form

after restarting flask here

and I go ahead and click reload on my

form and type in nothing and click

submit now the browser is going to yell

at me but just as a teaser for something

we'll be doing in the next problem set

in terms of error checking you should

never ever ever rely on client-side

safety checks like this because we know

from last week that a curious programmer

can go to inspect and let me poke around

the HTML here let me go into the body

the form okay you say required I say

not required you can just delete what's

in the Dom in the browser and now I can


go ahead and submit this form and it

appears to be broken not a big deal with

a silly little greeting application like

this but if you're trying to require

that humans actually provide input that

is necessary for the correct operation

of the site you don't want to trust that

the HTML is not altered by some

adversary
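
One possible shape for a server-side check, offered only as a sketch: the 400 status code and the "missing name" message are assumptions for illustration, not what the lecture goes on to build.

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)


@app.route("/greet")
def greet():
    name = request.args.get("name", "").strip()
    if not name:
        # Enforced on the server, so deleting "required" in the browser changes nothing.
        return "missing name", 400
    # escape() keeps user input from being interpreted as HTML.
    return f"hello, {escape(name)}"
```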

all right any questions then on this

particular app before we add another

feature here

any questions here

yeah

sorry a little louder in the index

function

sorry

would it be a problem if what


no I mean no this is okay what you

should really do is something we're

going to do with another example where

I'm going to start error checking things

so let me wave my hands at that and

propose that we'll solve this better in

just a bit but it's not bad to do what I

just did here it's only going to handle

one of the scenarios that I was worried

about not all of them all right so even


though this is new to most of us here

consider index.html my first template

and consider greet.html my second

template

what might be arguably badly designed

even though this might be the first time

you've ever touched web programming like

this

like what's kind of bad or dumb about

this design of these two templates alone

and there's a reason too that I kind of

bored Us by typing it out that second

time yeah

Chicago


yeah there's so much repetition I mean

it was kind of deliberately tedious

that I was retyping everything the

doctype the HTML tag the head tag the

title tag and little things did change

along the way like the title and

certainly the content of the body but so

much of this I mean almost all of the

page is a copy of itself in multiple

files and God forbid we have a third

template a fourth template a hundredth

template for a really big website this

is going to get very tedious very

quickly and suppose you want to change

something in one place you're gonna have


to change it now in two three a hundred

different places instead so just like in

programming more generally we have this

ability to factor out commonalities so

do you in the context of web programming

and specifically templating have the

ability to factor out all of those

commonalities the syntax is going to be

a little curious but it functionally is

pretty straightforward let me go ahead

and do this let me go ahead and copy the

contents of index.html let me go into my

templates directory and create a file that

by default is called layout.html and

let me go ahead and per your answer copy

all of those commonalities into this

file now instead so here I have a file

called layout.html I don't want to give

every page the same title maybe but for

now that's okay I'm going to call

everything hello but in the body of the

page what I'm going to do here is just

have a placeholder for actual contents

that do change so in this layout I'm

going to go ahead and here

and just put in the body of my page how

about this syntax and this is admittedly

new block body

and then percent sign close curly brace


and then I'm going to do end block so

kind of a curious syntax here but this

is more template syntax the other

template syntax we saw before was the

two curly braces that's for just

plugging in values there's this other

syntax with flask that allows you to say

a single curly brace a percent sign and

then some functionality like this

defining a block and this one's a little

weird because there's like literally

nothing between the close curly and

the open curly brace here but let's see

what this can do for us let me now go

into my

index.html which is where I kind of

borrowed most of that code from and let

me focus on what is minimally different

the only thing that's really different

in this page title aside is the form

so let me go ahead and just cut that

form out to my clipboard let me change

the first line of index.html to say this

this file is going to extend layout.html

and notice I'm using the curly braces

again and this file is going to have its

own body block inside of which is just

the HTML that I actually want to make

specific to this page and I'll keep my

indentation nice and neat here and let's


consider what I've done this is starting

to look weird fast and this is now a mix

of HTML with templating code

index.html first line now says Hey flask

this file extends layout.html whatever

that is these next lines 3 through 10 say

Hey flask here is what I consider my

body block to be plug this into the

layout placeholder therefore so if I now

go back to layout.html

in layout.html it's almost all HTML by

contrast but there is this placeholder

and if I want to put a default value I

could say whoops if I want to put a

default value I could put a default

value there just in case some page does

not have a body block but in general

that's not going to be relevant so this

is just a placeholder albeit a little

verbose that says plug in the page

specific content right here
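
A sketch of the layout and placeholder idea using the Jinja library directly. It assumes the templates are held in an in-memory dictionary rather than in a templates directory, and the HTML is abbreviated compared with the lecture's files.

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for templates/layout.html and templates/index.html.
templates = {
    "layout.html": (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "    <head><title>hello</title></head>\n"
        "    <body>{% block body %}{% endblock %}</body>\n"
        "</html>"
    ),
    "index.html": (
        '{% extends "layout.html" %}\n'
        "{% block body %}"
        '<form action="/greet" method="get"><input name="name" type="text"></form>'
        "{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))
page = env.get_template("index.html").render()
print(page)
```

The printed page is plain HTML: the form lands where the body block was, and none of the curly-brace syntax survives.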

so if I go now into greet.html this

one's even easier I'm going to cut this

content and get rid of everything else

greet.html too is going to extend

layout.html extends plural and then I'm

going to have my body block here simply

be this one line of code and then I'm

going to go ahead and end that block


here these are not HTML tags this is not

HTML syntax technically the syntax we

keep seeing with the curly braces and

these now curly braces with percent

signs is an example of Jinja syntax

j-i-n-j-a which is a language that some

humans invented for this purpose of

templating and the people who invented

flask decided we're not going to come up

with our own syntax we're going to use

these other people's syntax called

Jinja syntax so again there starts to

be at this point in the course and really

in Computing a lot of sharing now of

ideas and sharing of code so flask is

using this syntax but other libraries

and other languages might also too all

right so now index.html you know is half

HTML half templating code Jinja syntax

greet.html is almost all Jinja syntax

no tags even but because they both

extend layout.html now I think I've kind

of improved the design of this thing if

I go back to app.py none of this really

needs to change I don't change my

templates to mention layout.html that's

already implicit in the fact that we

have the extends keyword so now if I go

ahead and open my terminal window go

back to the same folder as app.py and do


flask run

all right my application is running on

Port 5000 let me now go back to the

slash route in my browser and hit enter

I have this form again and just as a

little check let me view the source of

the page that my browser is seeing and

there's all of the code no mention of

Jinja no curly braces no percent signs

just HTML it's not quite pretty printed

in the same way but that's fine because

now we're starting to dynamically

generate websites and by that I mean

this isn't quite indented nicely or

perfectly that's fine if it's indented

in the source code version doesn't

matter what the browser really sees let

me now go ahead and type in my name

click submit I should see yep hello

David let me go ahead and view the

source of this page and we'll see almost

the same thing with what's plugged in

there so this is now web programming in

the literal sense I did not hard code a

page that says hello comma David hello

comma Carter hello comma Emma I hard

coded a page that has a template with a

placeholder and now I'm using actual

logic some code in app.py to


actually tell the server what to send

to the browser

are any questions then

on where we're at here this is now a web

application simple though it is it's no

longer just a web site

is it better for design or for memory uh

both it's definitely better for design

because truly if we had a third page

fourth page I would really start just

resorting to copy paste and as you saw

with home page often in the head of your

page you might want to include some CSS

files like bootstrap or something else

you might want to have other information

up there if you had to upgrade the

version of bootstrap or you change

libraries you want to change one of

those lines you would literally have to

go into like three four a hundred

different files to make one simple

change so that's bad design and in terms

of memory yes theoretically the server

because it knows there's this common

layout it can theoretically do some

optimizations underneath the hood flask

is probably doing that but not in the

mode we're using it we're using it in

development mode which means it's

typically reloading things each time


other questions

on this application

anything at all all right so let me ask

a question not just in terms of uh the

code design what about the implications

for privacy like why is this maybe not

the best design for users how I've

implemented this I've used a web form

but yeah

yeah I mean if you have a nosy sibling

or roommate and they have access to your

laptop and they just go trolling through

your autocomplete or your history like

literally what you typed into a website

is going to be visible not a big deal if

it's your name but if it's your password

your credit card or anything else that's

mildly sensitive you probably don't want

it ending up in the URL at all even if

you're in incognito mode or whatnot like

you just don't want to expose yourself

or your users to that kind of risk so

perhaps we can do better than that

fortunately this one's actually an

easy change let me go into my

index.html where my form is and in my

form I can just change the method from

get to post it's still going to send key

value pairs to the server but it's not


going to put them in the URL the upside

of which is that we can assuage this

privacy concern but I'm going to have to

make one other change too because now if

I go ahead and run flask again after

making that change and I now reload the

form to make sure I have the latest

version you should be in the habit of

going to view developer view Source or

developer tools just to make sure that

what you're seeing in your browser is

what you intend and yes I do see what I

wanted method equals post now let me go

ahead and type in David and click submit

now I get a different error this one is

HTTP 405 method not allowed well why is

that well in my flask application I've

only defined a couple of routes so far

one of which is for slash then that

worked fine one of which is for slash

greet and that used to work fine but

apparently what flask is doing is it

only supports

get by default so if I want to change

this route to support different methods

I can say quote unquote uh post inside

of this parameter here so that now I can

actually support post not just get and

if I now restart flask so flask run

enter
and I go back to this URL let me go back

one screen to the form reload the page

just to make sure I have the latest even

though nothing there has changed type

David and click submit now now I should

see Hello World notice that I'm at the

slash greet route but there's no mention

of

name equals anything in the URL
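
A sketch of the app's state at this point in the lecture, exercised with the test client rather than a browser. The slash get-only route is a hypothetical addition to show the default behavior; the greet route accepts post but still reads from request.args as before.

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)


@app.route("/get-only")
def get_only():
    # No methods argument, so only GET (plus HEAD and OPTIONS) is allowed.
    return "ok"


@app.route("/greet", methods=["POST"])
def greet():
    # Unchanged from the GET version: still looks in the URL's key-value pairs.
    name = request.args.get("name", "world")
    return f"hello, {escape(name)}"


client = app.test_client()
rejected = client.post("/get-only", data={"name": "David"})
accepted = client.post("/greet", data={"name": "David"})
print(rejected.status_code, accepted.get_data(as_text=True))
```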

all right so that's kind of an

interesting takeaway right like it's a

simple change but whereas get puts

things in the URL post does not but it

still works so long as you tweak the

back end to look

as a post request which kind of means

look deeper in the envelope it's not

going to be as simple as looking at the

URL itself why shouldn't we just always

use post


like why not use post everywhere

any thoughts right because it's kind of

obnoxious to be putting any information

in URLs if you're leaving these little

breadcrumbs in your history and people

can poke around and see what you've been

doing

yeah what do you think

yeah I mean if you get rid of get

requests and put nothing in the URL your

history like your autocomplete gets

pretty less useful right because none of

the information is therefore stored so

you can't just go through the menu and

hit enter you'd have to like refill out

the form and there's this other symptom

that you can see here let me zoom out

and let me just reload this page notice

that you'll get this warning and it'll

look different in Safari and Firefox and

Edge and chrome here confirm form

submission so your browser might

remember what your inputs were and

that's great but just while you're on

the page and this is in contrast to get

where the state state is information

like key value pairs is embedded in

the URL itself and if you looked at an

email I sent earlier today I

deliberately linked to

https://www.google.com/search?q=what+time+is+it

this is by definition a get request when

you click on it because it's going to

grab the information that a key value

pair from the URL send it to Google

server and it's just going to work and


the reason I sent this via email earlier

was I wanted people to very quickly be

able to check what is the current time

and so I can sort of automate the

process of creating a Google search for

you but that you induce when you click

that link if Google did not support get

they only supported this the best I

could do is send you all to this URL

which unfortunately has no useful

information I would have had to add to

my email by the way type in the words

what time is it so it's just bad for

usability so there too we might have

design when it comes to the low level

code but also the design when it comes

to the user experience or ux as a

computer scientist would call it just in

terms of what you want to optimize for

ultimately so get and post both have

their roles it depends on what kind of

functionality you want to provide and

what kind of sensitivity there might be

around it
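
The shareable link idea can be sketched with the standard library: because a get request keeps its state in the URL, a program can build the whole search ahead of time. The variable names here are just for illustration.

```python
from urllib.parse import urlencode

base = "https://www.google.com/search"
# urlencode turns the key-value pair into query-string form, with spaces as plus signs.
query = urlencode({"q": "what time is it"})
link = f"{base}?{query}"
print(link)
```

No equivalent link can be built for a form that only accepts post, since the inputs are not part of the URL.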

all right any questions then on this our

first web application super simple just

gets someone's name and prints it back

out but we kind of now have all the

plumbing with which to create


really most anything we want

no all right let's go ahead and take a

five minute break and when we come back

we'll add to this some first year

intramural sports all right so we are

back and recall that the last thing we

just changed was the route to use post

instead of get So Gone is my name and

any value in the URL but there was kind

of a subtle bug or change here that we

didn't call out earlier like I did type

David into the form and I did click

submit and yet here it is saying hello

comma world

so that seems to be broken all of a

sudden even though we added support for

post

but something must be wrong logically it

must be the case here intuitively that

if I'm seeing hello world that's the

default value I gave the name variable

it must be that it's not seeing a key

called name in request.args

which gives you access to everything

after the question mark in the URL that's because

there's this other thing we should know

about which is not just request.args but

request.form these are horribly named

but request.args is for get requests

request.form is for post requests


otherwise they're pretty much

functionally the same but the onus is on

you the user or the programmer to make

sure you're using the right ones so I

think if we want to get rid of the world

and actually see what I the human typed

in I think I can just change request.args

to request dot form

still dot get still quote unquote name
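
A sketch of the corrected route, again driven by the test client so it runs on its own. The only change from before is where the name is looked up; the string response with escape is an assumption that stands in for rendering greet.html.

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)


@app.route("/greet", methods=["POST"])
def greet():
    # request.form holds key-value pairs sent in the body of a POST request.
    name = request.form.get("name", "world")
    return f"hello, {escape(name)}"


client = app.test_client()
reply = client.post("/greet", data={"name": "David"}).get_data(as_text=True)
print(reply)
```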

and now if I go ahead and rerun flask

in my terminal window go back to my

browser go back to and actually I won't

even go back to the form I will

literally just reload command r or

control R and what this warning is

saying is it's going to submit the same

information to the website when I click

continue now I should see hello comma

David so again you too are going to

encounter probably all these kinds of

like little subtleties but if you focus

on really the first principles of last

week like what is HTTP how does a get

request work how does a post request

work now you should have a lot of the

mental building blocks with which to

solve problems like these and let me

give you one other mental model now for

what it is we're doing this framework


called flask is just an example of many

different Frameworks that all implement

the same Paradigm the same way of

thinking and the same way of programming

applications and that's known as MVC

model view controller and here's a very

simple diagram that represents the

process that you and I have been

implementing thus far and actually it's

this is more than we've been

implementing thus far app.py is what

a programmer would typically call the

controller that's the code you're

writing the so-called business logic

that makes all of the decisions decides

what to render what values to show and

so forth in layout.html index.html

greet.html is the so-called view

templates that is the visualizations

that the human actually sees the user

interface those things are kind of dumb

they pretty much just say plop some

values here all of the hard work is done

in app.py so the controller AKA app.py

is where your Python code

generally is and in your view is where

your HTML and your Jinja code your

Jinja templating the curly braces the

curly braces with percent signs usually

is we haven't added an m to MVC yet


model that's going to refer to things

like CSV files or databases the model

where do you keep actual data typically

long term so we'll come back to that

this picture where you have one of these

each of these components kind of inter

communicating with one another is

representative of how a lot of

Frameworks work what we're teaching

today this week is not really specific

to python it's not really specific to

flask even though we're using flask it

really is a very common Paradigm that

you could Implement in Java C sharp or

Bunches of other languages as well all

right so let's now pivot back to uh vs

code here let me stop running flask and

let me go ahead and create a new folder

altogether

after closing these files here

and let me go ahead and create a folder

called frosh IMS representing freshman

intramural sports or first year

intramural sports that I can now CD into

and now I'm going to code an app.py

and in anticipation I'm going to create

another templates directory this one in

the frosh IMs folder and then in my

templates directory I'm going to create


a layout.html and I'm just going to get

myself started here frosh IMS will go

here I'm just copying my layout from

earlier because most of my interesting

work this time is now going to be

initially in app.py so what is it we're

creating so like literally the very

first thing I wrote as a web application

like 20 years ago was a site that

literally looked like this so I was like

a sophomore or Junior at the time I

taken CS50 and a follow-on class only I

had no idea how to do web programming

neither of those two courses taught web

programming back in the day so I taught

myself at the time a language called

Perl and I learned a little something

about CSV files and I sort of read

enough can't even say Googled enough

because Google didn't come out for a

couple of years later read enough online

to figure out how to make a web

application so that students on campus

First Years could actually register via

a website for intramural sports back in

my day you would literally fill out a

piece of paper and then walk it across

the yard to Wigglesworth Hall one of the

dorms slide it under the door of the

Proctor or ra and thus you were


registered for sports so 1996 1997 like

we could do better by then there was an

internet just wasn't really being used

much on campus or more generally so uh

background images that repeat infinitely

was kind of in Vogue apparently at the

time all of this was like images that I

had hand made because we did not have

the features that JavaScript and CSS

nowadays have so it was really just HTML

and it was really just controller code

written not in Python but in Perl and

it was really just the same building

blocks that we here already today now

have so we'll get rid of all of the

imagery and focus more on the

functionality than the aesthetics but

let's see if we can't whip up a web

application via which someone could

register for one such intramural sport

so in app.py let me go ahead and import

some familiar things now from flask

let's import capital flask which is that

function we need to kick everything kick

start everything render_template so we

have the ability to render that is print

out those templates and request so that

we have the ability to get at input from

the human let me go ahead and create the


application itself using this magical

incantation here and then let's go ahead

and Define a route

for slash for instance first I'm going

to define a function called index but

just to be clear this function could be

anything Foo bar baz anything else but I

tend to name them in a manner that's

consistent with what the route is called

but you could call it anything you want

it's just the function that will get

called for this particular route let me

go ahead here and just get things

started return render_template of

index.html just keep it simple nothing

more so there's nothing really frosh

IM-specific about this here I just want

to make sure I'm doing everything

correctly meanwhile I've got my layout

okay let me go ahead and in my templates

directory code a file called index.html

and let's just do extends layout dot

HTML at the top just so that we get

benefit from that template and down here

I'm just going to say TODO just so that

I have something going on visually to

make sure I've not screwed up yet in my

froshims directory let me do flask run

let me now go back to my previous URL

which used to be my hello example but


now I'm serving up the frosh IM site

oh and I'm seeing nothing that's because

I screwed up accidentally

what did I do wrong in index.html

[Music]

what am I doing wrong this file extends

layout.html but

[Music]

yeah I forgot to tell flask what to plug

into that layout so I just need to say

block body and then in here I can just

say TODO or whatever I want to

eventually get around to then end the

block let me end this tag here okay so

now it looks kind of ugly more cryptic

but this is again the essence of doing

templating let me now restart flask up

here let me go back to the page let me

reload crossing my fingers this time and

there we go TODO so it's not the

application I want but at least I know I

have some of the plumbing there by

default all right so if I want the user

to be able to register for one of these

Sports let's enhance now index.html to

actually have a form that's maybe got

like a drop down menu for all of the

sports for which you can register so let

me go into this template here and


instead of TODO let's go ahead and give

myself

um how about an H1 tag that just says

register so the user knows what it is

they're looking at how about a form tag

that's going to use post just because

it's not really necessary to put this

kind of information in the URL the

action for that how about we plan to

create a register route so that we're

sending information from slash to a

register route so we'll have to come

back to that in here let me go ahead and

create

um how about an input with autocomplete

equals off autofocus uh how about a

name equals name because I'm going to

ask the student for their name using

placeholder text of quote unquote name

and the type of this box will be text so

this is pretty much identical to before

but if you've not seen this yet let's

create a select menu a so-called drop

down menu in HTML and maybe the first

option I want to be in there is going to

be oh how about the current three sports

for the fall which are uh

basketball

and another option is going to be soccer

and the third option's going to be


Ultimate Frisbee for first year

intramurals right now so I've got

those three options I've got my form I

haven't implemented my route yet but

this feels like a good chance a good

chance a good time to go back now and

check if my form has reloaded so let me

go ahead and stop and start flask you'll

see there's ways to automate the process

of restarting the server that we'll do

for you for problem set nine so you

don't have to keep stopping flask let me

reload my index route and okay it's not

that pretty though maybe

nor was this but it now has at least

some functionality where I can type in

my name and then type in the sport now I

might be biasing people toward

basketball like UX-wise user experience

wise it's kind of obnoxious to like

pre-check basketball but not the others

so there's some little tweaks we can

make there let me go back into

index.html let me create like an empty

option up here that technically

this option is not going to have the

name of any sport but it's just going to

have a word I want the human to see so

I'm actually going to disable this


option and make it selected by default

but I'm going to say sport up here and

there's different ways to do this this

is just one way of creating essentially

a whoops option yep that looks right

creating a placeholder sport so that the

user sees something in the drop down let

me go ahead and restart flask reload the

page and now it's just going to be

marginally better now you see sport

that's checked by default but you have

to check one of these other ones

ultimately all right so that's pretty

good so let me now type in David I'll

register for Ultimate Frisbee okay I

definitely forgot something

submit button so let's add that all

right so input type equals submit all

right let's put that in restart flask

reload okay getting better submit could

be a little prettier recall that we can

change some of these HTTP these HTML

attributes the value of this button

should be register maybe just to make

things a little prettier let me now

reload the page

and register all right so now we really

have the beginnings of the user

interface that I created some years ago

to let people actually register for the


sport so let's go now and create maybe

the other route that we might need let

me go into app.py and in here if we want

to allow the user to register let's do a

little bit of error checking which I

promised we'd come back to like what

could the user do wrong because assume

that they will one they might not type

their name two they might not choose the

sport so they might just submit an

empty form so that's two things we could

check for just so that we're not storing

like bogus entries in our database

ultimately so let's create another route

called greet slash greet and then in

this route let's create a function

called greet but it can be called anything

we want and then let's go ahead and in

the greet function let's go ahead and

validate the submission so a little

comment to myself here how about if

there is not a request.form.get

name value

so that is if that function returns

nothing like quote unquote or the

special word None in Python or

request.form

dot get quote unquote sport

uh not in quote unquote what were they


basketball

uh the other one was soccer and the last

was Ultimate

frisbee getting a little long but notice

what I'm the question I'm asking if the

user did not give us a name that is if

this function returns the equivalent of

false which is quote unquote or

literally None if there's no such

parameter or if the sport the user

provided is not some value in basketball

soccer or Ultimate Frisbee which I've

defined as a python list then let's go

ahead and just yell at the user in some

way let's return uh render_template of

failure.html and that's just going to be

some error message inside of that file

otherwise if they get this far let's go

ahead and confirm registration by just

returning whoops returning

render_template quote unquote success.html
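As a minimal sketch of the check just described, the same condition can be tried outside of Flask. Here a plain Python dict stands in for request.form (both offer a get method that returns None for a missing key), and the function name validate is a hypothetical helper, not something from the lecture's app.py.

```python
def validate(form):
    """Return True only if the form has a name and one of the known sports."""
    # .get returns None when the key is absent; None and "" are both falsy,
    # so "not form.get(...)" catches a missing field and an empty one alike.
    if not form.get("name") or form.get("sport") not in [
        "Basketball",
        "Soccer",
        "Ultimate Frisbee",
    ]:
        return False
    return True


print(validate({}))                                              # empty form
print(validate({"name": "David"}))                               # no sport chosen
print(validate({"name": "David", "sport": "Ultimate Frisbee"}))  # complete form
```

Only the last call passes the check; the first two would lead to failure.html in the route described above.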

all right so a couple quick things to do

let me first go in and in my templates

directory let's create this

failure.html file and this is just meant

to be a message to the user that they

failed to provide the information

correctly so let me go ahead and in

failure.html not repeat my past mistake

so let me extend layout.html and in the


block body you are not registered I'll

just yell at them like that so that they

know something went wrong and then let

me create one other file called

success.html that similarly is mostly

just Jinja syntax and I'm just going to

say for now even though they're not

technically registered in any database

you are registered that's what we mean

by success
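The template inheritance used for failure.html and success.html, and the earlier mistake of extending the layout without a block, can be reproduced with Jinja directly. This is only a sketch: the template contents are assumptions (the lecture's layout.html is not shown here), and an in-memory DictLoader stands in for the templates directory.

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for files in templates/ (contents are assumed).
templates = {
    "layout.html": "<body>{% block body %}{% endblock %}</body>",
    # The earlier mistake: extending the layout without saying what goes in the block.
    "broken.html": '{% extends "layout.html" %}TODO',
    "failure.html": (
        '{% extends "layout.html" %}'
        "{% block body %}You are not registered!{% endblock %}"
    ),
    "success.html": (
        '{% extends "layout.html" %}'
        "{% block body %}You are registered!{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))

broken = env.get_template("broken.html").render()
failure = env.get_template("failure.html").render()
success = env.get_template("success.html").render()

print(broken)   # the layout alone: text outside any block is dropped
print(failure)
print(success)
```

In a child template, anything written outside a block is not plugged into the layout, which is why the page came up empty before block body was added.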

all right so let me go ahead and back in

my froshims directory run flask run

let me go back to the form and reload

should look the same

all right so now let me not cooperate

and just immediately click register and

patiently

okay what did I do wrong register oh I'm

confusing our two examples

all right I spotted the error what did I

do wrong

unintentional

there's where I am what did I actually

invent over here

[Music]

where did I screw up

anyone

thank you so register not greet I had

last example on my mind so the route


should be register ironically the

function could be greet because that

actually doesn't matter but to keep

ourselves sane let's use the one and

the same words there let me go ahead now

and start flask as intended let me

reload the form just to make sure all is

working now let me not cooperate and be

a bad user clicking register

[Music]

okay

other unintended mistake but this one

we've seen before notice that by default

routes only support GET so if I want to

specifically support POST I have to pass

in via a methods parameter a list of

allowed

route methods that could be GET comma

POST but if I have no need for GET

in this context I can just do POST all

right now let's do this one last time

reload the form make sure everything's

okay click register and you are not

registered so it's catching that all

right let me go ahead and at least give

them my name register you are not

registered all right fine I'm going to

go ahead and be David with Ultimate

Frisbee register

huh
okay

what should I what did I mean to do here

all right so let's figure this out how

to debug something like this which is my

third and final unintended unforced

error
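Stepping back to the previous fix for a moment, the methods parameter can be tried in isolation. The sketch below assumes Flask is installed and uses Flask's built-in test client in place of a browser; the route body is a placeholder, not the lecture's actual register function.

```python
from flask import Flask, request

app = Flask(__name__)


# Without methods=[...], a route answers GET only; here it answers POST only.
@app.route("/register", methods=["POST"])
def register():
    # request.form holds the fields submitted in the body of a POST.
    name = request.form.get("name")
    return f"received name={name!r}"


client = app.test_client()
get_status = client.get("/register").status_code
post_status = client.post("/register", data={"name": "David"}).status_code

print(get_status)   # 405, Method Not Allowed
print(post_status)  # 200
```

A 405 status is what the browser was reporting when the form posted to a route that only supported GET.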

how can we go about troubleshooting this

turn this into the teachable moment

all right well first some like safety

checks like what did I actually submit

let me go ahead and view page source a

good rule of thumb look at the HTML that

you actually sent to the user so here I

have an input with a name name so that's

what I intended that looks okay ah I see

it already even though you if you've

never used a select menu might not know

what apparently is missing from here

that I did have for my

text input

just intuitively

like logically what's going through my

head embarrassingly is like all right if

my form thinks that it's missing a name

or a sport how did I create a situation

in which name is blank or sport is blank

well name I don't think is going to be

blank because I explicitly gave this

text field a name name and that did work


last time I've now given a second input

in the form of the select menu but what

seems to be missing here that I'm

assuming exists

here

it's just a dumb mistake I made

[Music]

what might be missing here

if request.form gives you all of the

inputs that the user might have typed in

let me go into my actual code here in my

form and

name equals sport I just didn't give a

name to that input so it exists and the

browser doesn't care it's still going to

display the form to you it just hasn't

given it a unique name to actually

transmit to the server so now if I'm not

going to put my foot in my mouth I think

that's what I did wrong and again my

process for figuring that out was

looking at my code thinking through

logically is this right is this right no

I was missing the name there so let's

run flask

let's reload the form just to make sure

it's all defaults again type in my name

and type in Ultimate Frisbee crossing my

fingers extra hard this time

and there you are registered so I can't


emphasize I did not intend to screw up

in that way but that's exactly the right

kind of thought process to diagnose

issues like this go back to the basics

go back to what HTTP and what HTML forms

are all about and just rule things in

and out there's only a finite number of

ways I could have screwed that up yeah

[Music]

it's gonna say a little louder

why did name equals sport address the

problem well let's first go back to the

HTML previously it was just the reality

that I had this user input drop down

menu but I never gave it a name but

names or more generally key value pairs

is how information is sent from a form

to the server so if there's no name

there's no key to send even if the human

types a value like it would be like

nothing equals Ultimate Frisbee and that

just doesn't work the browser is just

not going to send it however in app.py I

was naively assuming that in my request.form

there would be a name called

quote-unquote sport it could have been

anything but I was assuming it was sport

but I never told the form that and if I

really wanted to dig in we could do a


little something more let me go back to

the way it was a moment ago let me get

rid of the name of the sport drop down

menu let me rerun flask down here and

reload the form itself after it finishes

being served and now let me do this view

developer tools

and then let me watch the network tab

which recall we played around with a

little bit last week and we also played

around with curl which let us see the

HTTP requests here's another here's what

I would have done if I still wasn't

seeing the error and was really

embarrassed on stage I would have typed

in my name as before I would have chosen

Ultimate Frisbee I would have clicked

register and now I would have looked at

the HTTP request and I would click on

register here and just like we did last

week I would go down to the request down

here and there's a whole lot of stuff

that we can typically ignore but here

let me zoom in way at the bottom what

Chrome's developer tools are doing for

me it's showing me all of the form data

that was submitted so this really would

have been my Telltale clue I'm just not

sending the sport even if the human

typed it in and logically because I've


done this before that must mean I didn't

give the thing a name but another good

tool like good programmers web

developers are using these kinds of

tools all the time they're not writing

bug-free code that's not the point to

get to the point to get to is being a

good diagnostician I would say in these

cases okay other questions on this
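To make the key value pair point concrete, here is a rough imitation of what the browser does when it builds the form data shown in the network tab. The helper encode_form is hypothetical and only mimics the one rule that matters here: a control without a name contributes nothing to the request.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Mimic a browser: only controls that have a name become key=value pairs."""
    return urlencode([(name, value) for name, value in fields if name])


# (name, value) pairs; None means the control has no name attribute.
without_name = encode_form([("name", "David"), (None, "Ultimate Frisbee")])
with_name = encode_form([("name", "David"), ("sport", "Ultimate Frisbee")])

print(without_name)  # name=David
print(with_name)     # name=David&sport=Ultimate+Frisbee

# On the server side, a missing key is simply absent, so .get returns None.
print(parse_qs(without_name).get("sport"))  # None
print(parse_qs(with_name).get("sport"))     # ['Ultimate Frisbee']
```

The first request body is what was being sent before name equals sport was added to the select menu, which is why the sport check on the server kept failing.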

I'm sorry sorry a little louder

[Music]

uh so if how would you edit a uh CSS if

you have these templates that process

we'll actually see before long is almost

going to be the exact same just to give

you a teaser for this and you'll do this

in the problem set but we'll give you

some distribution code to automate this

process you can absolutely still do

something like this link href uh equals

quote unquote styles.css rel equals

stylesheet that's one of the techniques

we showed last week the only difference

today using flask is that all of your

static files by convention should go in

your static folder so the change you

would make in your layout would be to

say that styles.css is in your static

folder and then if I go into my


froshims directory I can create a static

folder I can CD into it nothing's there

by default but if I now code a file

called styles.css I could now do

something like this body and in here I

could say background uh background-color

say uh

[Music]

#ff0000 to make it red let me go ahead now

and restart flask in the froshims

directory crossing my fingers because I'm

doing this on the Fly

go back to my form and reload

voila now we've tied together last

week's stuff as well

if I answered the right question

[Music]

if you want to change one page and not

the other in terms of CSS

that depends in that case you

might want to have different CSS files

for each page if they're that different

you could use different classes in one

template than you did in the other

there's different ways to do that you

could even have a placeholder in your

layout that allows you to plug in the

URL of a specific style sheet in your

individual files but that starts to get

more complicated quickly so in short you


can absolutely do it but typically I

would say most websites

try not to use different style sheets

per page they reuse the Styles as much

as they can

okay all right let me go ahead and

revert this real quick and let's start

to add a little bit more functionality

here I'm going to go ahead and just

remove the static folder just so as to

not complicate things just yet and let's

go ahead and just play around with a

different user interface mechanism in my

form here the drop down menu is

perfectly fine nothing wrong with it but

suppose that I wanted to change it to

like check boxes instead maybe I want

students to be able to register for

multiple Sports instead well it might

make sense to clean this up in a couple

of ways and let's do this before we even

get into the check boxes there's one

subtle bad design here notice that I've

hard-coded basketball soccer and

Ultimate Frisbee here and if you recall

in app.py I also enumerated all three of

those here and anytime you see like copy

paste or the equivalent thereof feels

like we could do better so what if I


instead do this what if I instead give

myself like a global variable of SPORTS

I'll capitalize the word just to connote

that it's meant to be constant even

though python does not have constants

per se the first sport will be

basketball the second will be soccer the

third will be Ultimate Frisbee

now I have one convenient place to store

all of my sports if it changes next

semester or next year or whatnot but

notice what I could do too I could now

do something like this let me pass into

my index template a variable called

sports that's equal to that global

variable SPORTS let me go into my index

now and this is really now going to hint

at the power of templating and Jinja in

this case here let me go ahead and get

rid of all three of these hard-coded

options and let me show you some

slightly different syntax for sport in

sports

then endfor

we've not seen this endfor syntax there's

like endblock syntax but it's as simple

as that so you have a start and an end

to your block without indentation

mattering watch what I can do here

option uh curly brace sport close curly


brace let me save that let me go back

into my terminal window do flask run and

if I didn't mess up here let me go back

to this the Red's going to go away

because I deleted my CSS and now I still

have a sport drop down and all of those

sports are still there I can make one

more Improvement now I don't need to

mention these same Sports manually in

app.py I can now just say if the user's

inputted sport is not in my global

variable SPORTS and ask the same

question and this is really handy

because if there's another sport for

instance that gets added like say

football all I have to do is change my

Global variable and if I reload the form

now and look in the drop down boom now I

have support for a fourth Sport and I

can keep adding and adding there so

here's where templating starts to get

really powerful in that now in this

template I'm using Jinja's for loop

syntax which is almost identical to

python here except you need the curly

brace and the percent sign and you need

the weird ending endfor but it's the

same idea as in Python iterating over

something with a for Loop lets you


generate more and more HTML and this is

like every website out there for

instance Gmail when you visit your inbox

and you see all of this big table of

emails you know Google has not

hard-coded your emails manually they

have grabbed them from a database they

have some kind of for Loop like this and

are just outputting table row after

table row or div after div

dynamically all right so now let's go

ahead and change this maybe to oh how

about uh little uh check boxes or radio

buttons so let me go ahead and do this

instead of a select menu I'm going to go

ahead and do something like this for

each of these Sports let me go ahead and

output not an option but let me go ahead

and output an input tag the name for

which is quote unquote sport the type of

which is check box the value of which is

going to be the current sport quote

unquote and then afterward I need to

redundantly seemingly output the sport

so you see a word next to the checkbox

and we'll look at the result of this in

just a moment so it's actually a little

simpler than a select menu a drop down

menu because now watch what happens if I

reload my form different user interface


and you know it's not as pretty but it's

going to allow users to sign up for

multiple Sports at once now it would

seem now I can click on basketball and

football and soccer or some other

combination thereof if I view the Page's

Source this is again the power of

templating I didn't have to type out

four inputs I got them now automatically

and these things all have the same name

but that's okay it turns out with flask

if it sees multiple values for the same

name it's going to hand them back to you

as a list if you use the right function
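Putting the last few steps together, the sketch below generates one checkbox per sport from a single SPORTS list and reads back every checked value as a list. It assumes Flask is installed, keeps the template inline via render_template_string instead of a separate index.html, and leaves out the layout, so it is an approximation of the lecture's files, not a copy of them. The function the lecture alludes to for getting a list back is taken here to be request.form.getlist.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# One place to list the sports; uppercase only signals "treat as constant".
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee", "Football"]

# Inline stand-in for templates/index.html (layout omitted for brevity).
INDEX = """
<form action="/register" method="post">
    {% for sport in sports %}
        <input name="sport" type="checkbox" value="{{ sport }}"> {{ sport }}
    {% endfor %}
    <input type="submit" value="Register">
</form>
"""


@app.route("/")
def index():
    return render_template_string(INDEX, sports=SPORTS)


@app.route("/register", methods=["POST"])
def register():
    # getlist returns every value submitted under the same name.
    chosen = request.form.getlist("sport")
    if not chosen or any(sport not in SPORTS for sport in chosen):
        return "failure"
    return ", ".join(chosen)


client = app.test_client()
page = client.get("/").get_data(as_text=True)
reply = client.post(
    "/register", data={"sport": ["Basketball", "Soccer"]}
).get_data(as_text=True)

print(page)
print(reply)
```

Adding or removing a sport in SPORTS changes both the generated form and the validation, with no other edits needed.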

all right but suppose we don't want

users registering for multiple Sports

maybe capacity is an issue let me go

ahead and change this checkbox to radio

button which a radio button is mutually

exclusive so you can only sign up for

one so now once I reload the page it now

there we go it now looks like this and

because I've given each of these inputs

the same name quote unquote sport that's

what makes them mutually exclusive the

browser knows all four of these things

are types of sports therefore I'm only

going to let you select one of these

things and that's simply because they


all have the same name again if I view

page Source notice all of them name

equals sport name equals sport name

equal sport but what differs is the

value that each one is going to have

all right any questions then on this

approach

[Music]

all right well let me go ahead and open

a version of this that I made in advance

that's going to now start saving the

information so thus far we're not quite

at the point of where this website was

which actually allowed the Proctors to

see like in a database everyone who had

registered for sports now we're

literally telling students you are

registered or you are not registered but

we're literally doing nothing with this

information so how might we go about

implementing this well let me go ahead

and close these tabs and let me go into

what I called version three of this in

the code for today and let me go into my

source 9 directory froshims3 and let me

go ahead and open up app.py so this is a

pre-made version I've gotten rid of

football in this case but I've added one

thing at the very top what in English

does this represent on line seven


what would you describe what that thing

is

[Music]

what are we looking at what do you think

yeah it's an empty dictionary right

registrants is apparently a variable on

the left it's being assigned an empty

dictionary on the right and a dictionary

again is just key value pairs here again

is where dictionaries are just such a

useful data structure why because this

is going to allow me to remember that

David registered for Ultimate Frisbee

Carter registered for soccer Emma

registered for something else you can

associate keys with values names with

sports assuming a model where you can

only register for one sport for now and

so let's see what the logic is that

handles this

here in my register route in the code

I've pre-made notice that I'm validating

the user's name slightly differently

from before but same idea I'm using

request.form.get to get the human's name

if not name so if the human did not type

a name I'm going to Output

error.html but notice I've started to

make the user interface more expressive


I'm telling the user apparently with a

message what they did wrong well how I'm

apparently passing to my error template

instead of just failure.html a specific

message so let's go down this Rabbit

Hole let me actually go into

templates slash error.html and sure

enough here's a new file I created here

that adorably is apparently going to

have a grumpy cat as part of the error

message but notice what I've done in my

block body I've got an H1 tag that just

says error big and bold I then have a

paragraph tag that plugs in whatever the

error message is that the controller

app.py is passing in and then just for

fun I have a picture of a grumpy cat

connoting that there was in fact an

error let's keep looking how do I

validate sport I do similarly

request.form dot get of sport and I

store it in a variable called sport if

there's no such sport that is the human

did not check any of the boxes then I'm

going to render error.html too but I'm

going to give a different message

missing sport else if the sport they did

type in is not in my SPORTS global

variable I'm going to render error.html

but complain differently you gave me an


invalid sport somehow they you know as

like a hacker went into the HTML of the

page changed it to add their own sport

like volleyball even though it's not

offered they submitted volleyball but

that's okay I'm rejecting it even though

they might have maliciously tried to

send it to me by changing the DOM

locally and then really the magic is

just this I remembered that this person

is registered by indexing into the

registrants dictionary using the name

the human typed in as the key and

assigning it a value of sport why is

this useful well I added one final route

here I have a slash registrants route

with a registrants function that renders

a template called registrants.html

but it takes as input that Global

variable

just like before so let's go down this

Rabbit Hole let me go into

templates

registrants.html here's this template or

it looks a little crazy big but it

extends the layout here comes the body

I've got an H1 tag that says registrants

big and bold then I've got a table that

we saw last week this has a table head


that just says name sport for two

columns then it has a table body where

in using this for loop in Jinja syntax

I'm saying for each name in the

registrants variable output a table row

start tag and end tag inside of which

two table datas two cells table data for

name table data for registrants bracket

name
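A rough reconstruction of that table, rendered with Jinja directly, is below. The markup is an assumption based on the description (a header row, then one row per name), and the layout is omitted.

```python
from jinja2 import Template

# Assumed shape of the table in registrants.html, without the layout.
REGISTRANTS_TABLE = Template("""
<table>
    <thead>
        <tr><th>Name</th><th>Sport</th></tr>
    </thead>
    <tbody>
        {% for name in registrants %}
            <tr><td>{{ name }}</td><td>{{ registrants[name] }}</td></tr>
        {% endfor %}
    </tbody>
</table>
""")

# Iterating over a dictionary yields its keys, exactly as in Python.
registrants = {"David": "Ultimate Frisbee", "Carter": "Basketball"}
html = REGISTRANTS_TABLE.render(registrants=registrants)
print(html)
```

Each key in the dictionary produces one table row, so the table grows as more people register.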

so it's very similar to python syntax it

essentially is python syntax albeit with

these curly braces and the percent sign

so the net effect here is what let me

open up my terminal window run flask run

let me now go into the form that I

pre-made here so gone is football let me

go ahead and type in David let me choose

oh no sport register

error missing Sport and there is the

grumpy cat so missing sport though

specifically was outputted all right

fine let me go ahead and say uh no name

but I'll choose basketball register

missing name all right let me

maliciously now do this right now I'm

hacking let me go into this I'll type my

name sure but let me go into the body

tag down here let me maliciously go down

in Ultimate Frisbee uh heck with that

let's volleyball change that and change


this to

volleyball enter so now I

can register for any sport I want to

create let me click register but invalid

sport so again that speaks to the power

and the need for checking things on the

back end and not trusting users it is

that easy to hack websites otherwise if

you're not validating data server side
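The server-side checks walked through above can be sketched without Flask. A plain dict stands in for request.form, the returned strings stand in for the message passed to error.html, and the function name register is reused only for illustration.

```python
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# In-memory "database": name -> sport. Lost as soon as the program stops.
registrants = {}


def register(form):
    """Return an error message, or None once the registrant is remembered."""
    name = form.get("name")
    if not name:
        return "Missing name"
    sport = form.get("sport")
    if not sport:
        return "Missing sport"
    # Never trust the client: the HTML can be edited to submit any value.
    if sport not in SPORTS:
        return "Invalid sport"
    registrants[name] = sport
    return None


print(register({"name": "David"}))                         # Missing sport
print(register({"sport": "Basketball"}))                   # Missing name
print(register({"name": "David", "sport": "Volleyball"}))  # Invalid sport
print(register({"name": "David", "sport": "Ultimate Frisbee"}))  # None
print(registrants)
```

The volleyball submission is rejected even though the form in the browser was altered to offer it, because the check happens against the server's own list.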

all right finally let's just do this for

real David is going to register for

Ultimate Frisbee clicking register and

now the output's not very pretty but

notice I'm at the registrants route

and if I zoom out I have an HTML table

two columns name and Sport David and

Ultimate Frisbee let me go back to the

form let me pretend like Carter walked

up to my laptop and registered for

basketball register now we see two rows

in this table David Ultimate Frisbee

Carter basketball and if we do this one

more time maybe Emma comes along and

registers for soccer register all of

this information is being stored in this

dictionary now

all right so that's great now we have a

database albeit in the form of like a

python dictionary but why is this maybe


not the best implementation

why is it not great yeah

[Music]

yeah so we're only storing this

dictionary in the computer's memory and

that's great until I hit Ctrl C and kill

flask stopping the web server or the

server reboots or maybe I close my

laptop or whatever if the server stops

running memory is going to be lost right

Ram is volatile it's thrown away when

you lose power or stop the program so

maybe this isn't the best approach maybe

it would be better to use a CSV file and

in fact some 20 years ago that's

literally what I did I stored everything

in a CSV file but let's skip that step

because we already saw last week or a

couple of weeks ago now how we can use

SQLite let's see if we can't marry in

some SQL here to store an actual

database for the program let me go back

here and let me open up say version four

of this which is almost the same but it

adds a bit more functionality let me

close these tabs and let me open up

app.py now in version four

so notice it's almost the same but at

the top I'm creating a database

connection to a database called


froshims.db so that's a database I

created in advance so let's go down that

rabbit hole what does it look like let

me make my terminal window bigger let me

run sqlite3 of froshims.db okay I'm

in let's do dot schema and let's just

infer what I designed this to be I have

a table called registrants which has one

two three columns an ID column that's an

integer a name column that's text but

cannot be null and a sport column that's

also text cannot be null and the primary

key is just ID so that I have a unique

ID for every registration let's see if

there's anyone in there yet select star

from registrants okay there's no one in

there no one is yet registered for

sports so let's go back to the code and

continue on in my code now I've got the

same Global variable for validation and

generation of my HTML looks like my

index route is the same it's dynamically

generating the menu of sports

interestingly we'll come back to this

there's a deregister route that's going

to allow someone to deregister

themselves if they want to exit the

sport or undo their registration but

this is the juicy part here's my new and


improved register route still works on

post so some mild privacy there

I'm validating the submission as follows

I'm getting the user's inputted name the

user's inputted Sport and if it is not a

name or the sport is not in sports I'm

going to render failure.html so I kept

it simple there's no cat in this version

it just says failure otherwise recall

how we co-mingled SQL and python before

we're using CS50's SQL library but that

just makes it a little easier to execute

SQL queries and we're executing this

insert into registrants name comma sport

what two values the name and the sport

that came from that HTML form and then

lastly and this is a new function that

we're calling out explicitly now flask

also gives you access to a redirect

function which is how

um which is how safetyschool.org

harvardsucks.org and those other

sites we played around with last week

were all implemented redirecting the

user from one place to another this

flask function redirect comes from my

just having imported it at the very top

of this file it handles the HTTP 301 or

302 or 307 code whatever the appropriate

one is it does that for me all right so


that's it for

registering via this route let's look at

what the slash registrants route is

here we have a new route for Slash

registrants and instead of just

iterating over a dictionary like before

we're getting back let's see db.execute

of Select star from registrants so

that's like literally the programmatic

version of what I just did manually that

gives me back a list of dictionaries

Each of which represents one row in the

table then I'm going to render

registrants.html passing in literally

that list of dictionaries just like

using cs50s library in the past
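The register and registrants logic described above can be sketched without Flask, using Python's built-in sqlite3 module in place of CS50's SQL library. This is only an illustration under assumptions: the SPORTS list, the table schema, and the function names open_db, register and registrants are hypothetical and are not taken from the lecture's app.py.

```python
import sqlite3

# Assumed list of sports; the lecture's global list is not shown in this excerpt.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def open_db(path=":memory:"):
    """Open a database and create the (assumed) registrants table if needed."""
    db = sqlite3.connect(path)
    db.row_factory = sqlite3.Row  # lets each row be converted into a dict
    db.execute(
        "CREATE TABLE IF NOT EXISTS registrants ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
    )
    return db


def register(db, name, sport):
    """Validate a submission, then insert it. Returns True on success."""
    if not name or sport not in SPORTS:
        return False  # the lecture renders failure.html at this point
    # Placeholders (?) pass the user's input separately from the SQL text.
    db.execute("INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport))
    db.commit()
    return True  # the lecture redirects the user at this point


def registrants(db):
    """Return every row as a list of dictionaries, one per registrant."""
    rows = db.execute("SELECT * FROM registrants").fetchall()
    return [dict(row) for row in rows]
```

The list of dictionaries returned by registrants has the same shape as what the lecture passes into registrants.html, so a template loop can read each row's id, name and sport by key.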

so let's go and look at that form

if I go into templates and open up

registrants.html oh okay it's just a

table like before

and actually let me change this

syntactically for consistency

we have a Jinja for Loop that iterates

over each registrant and for each of

them outputs a table row but this is

interesting instead of just having two

columns with the person's name and Sport

notice that I'm also outputting a

full-fledged form all right this is


starting to get kind of Juicy so let's

actually go back to my terminal window

run flask and actually see what this

example looks like now let me reload the

page

all right and the home page looks

exactly the same but let me now register

for something David for Ultimate Frisbee

register

oh damn it uh

never let's try this again David

registering for Ultimate Frisbee

register

okay so good thing I have deregister so

this is what it should now look like I

have a page at the route called slash

registrants that has a table with two

columns name and Sport David Ultimate

Frisbee oh wait a third column why

because if I view the page Source notice

that it's not the prettiest UI for every

Row in this table I'm also going to be

outputting a form just to deregister

that user but before we see how that

works let me go ahead and register

Carter for instance so Carter we'll give

you basketball again register the table

grows now let me go back and let's

register Emma for soccer and the table

should grow
before we look at that HTML let's go

back to my terminal window let's go

into sqlite3 froshims uh

let me go into

froshims and let me open up with

sqlite3

froshims.db and now do select star from

registrants and whereas previously when

I executed this there were zero people

now

there's indeed three so now we see

exactly what's going on underneath the

hood

so let's look at this form now at this

page now if I want to unregister

deregister one of these people

specifically

how do we do this

clicking one of those buttons will

indeed delete the row from the database

but how do we go about linking a web

page with python code with a database

like this is the last piece of the

puzzle up until now everything's been

with forms and also with URLs but what

if the user is not typing anything in

they're just clicking a button

well watch this

let me go ahead and sniff the traffic


which you could be in the habit of doing

now anytime you're curious how a website

works let me go to the network Tab and

uh Carter shall we deregister you from

basketball

let's deregister Carter and let's see

what just happened if I look at the

deregister request notice that it's a

post the status code that eventually

came back is 302 but let's look at the

request itself all the headers there

we'll ignore

the only thing that button submits kind

of cleverly is an ID parameter a key

equaling two what does two presumably

represent or map to

like where did this two come from

it doesn't say Carter it doesn't say

basketball

what is it

the second person that registered so

those primary keys that we started

talking about a couple of weeks ago why

it's useful to be able to uniquely

identify a row in a table here is just

one of the reasons why it suffices

for me just to send the ID number of the

person I want to delete from the

database because I can then have code

like this if I go into app.py


and I look at my deregister route now

the last of them notice that I got this

I first go into the form and I get the

ID that was submitted hopefully if there

was in fact an ID and the form wasn't

somehow empty I execute this line of

code delete from registrants where ID

equals question mark and then I plug in

that number deleting Carter and only

Carter and I'm not using his name

because what if we have two people named

Carter two people named Emma or David

you don't want to delete both of them

that's why these unique IDs are so

important and here's another reason

why you don't want to store some things

in URLs suppose we

went to this URL deregister question

mark ID equals three suppose I

maliciously

emailed this URL to Emma it doesn't

matter so much what the beginning is but

suppose I emailed her this URL slash

deregister question mark ID equals three

and I said hey Emma click this

and it uses get instead of post what did

I just trick her into doing

what's going to happen if Emma clicks

this yeah
you would trick her into de-registering

herself why because if she's logged into

this froshims website and the URL

contains her ID just because I'm being

malicious and she clicked on it and the

website is using get unfortunately get

URLs are again stateful they have state

information in the URLs and in this case

it's enough to delete the user and boom

she would have accidentally deregistered

herself and this is pretty innocuous

suppose that this was her bank account

trying to make a withdrawal or a deposit

suppose that this were some other

website a Facebook URL trying to trick

her into posting something automatically

here too is another consideration when

you should use post versus get because

get requests can be plugged into emails

sent via Slack messages text

messages or the like and unless there's

a prompt saying are you sure you want to

deregister yourself you might blindly

trick the user into being vulnerable to

what's called a cross-site request

forgery a fancy way of saying you trick

them into clicking a link that they

shouldn't have because the website

was using get alone
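The deregister logic described above, deleting by ID and only in response to POST, might be sketched as follows. This is a hedged illustration, not the lecture's app.py: it uses Python's built-in sqlite3 module rather than CS50's SQL library, plain functions rather than Flask routes, and the names demo_db and deregister, the sample rows, and the returned status codes are assumptions made for the example. Requiring POST is the measure discussed here; it is not by itself a complete defense against cross-site request forgery.

```python
import sqlite3


def demo_db():
    """Build an in-memory table with two different people named Carter (sample data)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE registrants (id INTEGER PRIMARY KEY, name TEXT, sport TEXT)"
    )
    db.executemany(
        "INSERT INTO registrants (name, sport) VALUES (?, ?)",
        [("David", "Ultimate Frisbee"), ("Carter", "Basketball"), ("Carter", "Soccer")],
    )
    db.commit()
    return db


def deregister(db, method, form):
    """Delete one registrant by id, but only when the request method is POST."""
    if method != "POST":
        return 405  # Method Not Allowed: following an emailed link (GET) changes nothing
    registrant_id = form.get("id")
    # Only act if an id was actually submitted and looks like a number.
    if registrant_id and registrant_id.isdigit():
        # The placeholder (?) keeps the submitted value out of the SQL text itself.
        db.execute("DELETE FROM registrants WHERE id = ?", (int(registrant_id),))
        db.commit()
    return 302  # stand-in for redirecting back to /registrants
```

Because the row is matched by its primary key and not by name, deregistering ID 2 removes one Carter and leaves the other untouched.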

all right any question then on these


building blocks

[Music]

when three columns you mean

[Music]

uh the three forward slashes I'm not

sure I follow

[Music]

sorry it's in where

which file

[Music]

sorry the other direction

okay

[Music]

it keeps growing more oh this thing okay

sorry

um this is just a URI

that's typical syntax

that's referring to the SQLite

protocol so to speak which means use

SQLite to talk to a file locally colon

slash slash is just like you and I see

in URLs the third slash essentially

means current folder that's all so it's

it's a weird curiosity um but it's

typical whenever you're referring to a

local file and not one that's Elsewhere

on the internet that's a bit of an

oversimplification but that's indeed a

convention sorry for that not


clicking earlier all right let's do one

other iteration of froshims here just

to show what I was actually doing too

back in the day was not only storing

these things in CSV files as I recall I

was also automatically generating an

email to the Proctor in charge of the

intramural sports program so they would

have sort of a running history of people

registering and they could easily reply

to them as well let me go into froshims

version 5 which I pre-created here

and let me go ahead and open up say

app.py this time and this is some code

that I wrote in advance and it looks a

little scary at first glance but I've

done the following I have now added the

Flask-Mail library to the picture by

adding Flask-Mail to requirements.txt and

running a command to automatically

install email support for flask as well

and this is a little bit cryptic but

it's honestly mostly copy paste from the

documentation what I'm doing here is I'm

configuring my flask application with a

few configuration variables if you will

this is the Syntax for that app.config

is a special dictionary that comes with

flask that is automatically created when

you create the app up here on line nine


and I just had to fill in a whole bunch

of configuration values for the default

sender address that I want to send email

as the default password I want to use to

send email the port number the TCP Port

that we talked about last week the mail

server I'm going to use Gmail's

smtp.gmail.com server use TLS this means

use encryption and so I set that to True

mail username this is going to grab it

from my environment so for security

purposes I didn't want to hard code my

own Gmail username and password into the

code so I'm actually storing those in

what are called environment variables

you'll see more of these in problem set

9 and it's a very common convention

on a server in the real world to store

sensitive information in the computer's

memory so that it can be accessed when

your website is running but not in your

source code it's way too easy if you put

credentials sensitive stuff in your

source code to post it to GitHub or to

screenshot it accidentally or for

information to leak out so for today's

purposes know that the os.environ

dictionary refers to what are called

environment variables and this is like


an out of band a special way of defining

key value pairs in the computer's memory

by running a certain command but that

never show up in your actual code

otherwise there would be so many

usernames and passwords accidentally

visible on the internet so I've

installed this in advance let me see if

I can do this correctly let me go over

to another tab in just a moment

and here I have on my second screen here

John Harvard's inbox it's currently

empty and I'm going to go ahead and

register for some Sport AS John Harvard

here hopefully so let me go ahead and

run flask run on this version 5.

let me go ahead and reload the main

screen not that one let me reload the

main screen here this time clearly I'm

asking for name and email so name will

be John Harvard

jharvard at

cs50.harvard.edu

he'll register for how about soccer

register

and if I did this correctly not only is

John Harvard on his screen seeing you

are registered but when he checks his

email

on this other screen


Crossing his fingers that this actually

works as a demonstration

[Music]

promise it did right before class

[Music]

fortifying

I don't think there's a mistake this

time

let me try something over here real

quick but I don't think this is broken

it wouldn't have said success if it were

I just tried submitting again so I just

did another you are registered

[Music]

well I'm really sad right now

[Music]

what's that

I could check spam but then it's

not sure we want to show spam here on

the internet that every one of us gets

oh maybe

oh

okay wow that was a risky click I

worried all right so you are registered

is the email that I sent out and it

doesn't have any actual information in

it but back in the day it would have

because I included like the student's


name and their dorm and all the other

fields of information that we asked for

so let's just take a quick look at how

that code might work I did have to

configure Gmail in a certain way to

allow what they call less secure apps

using SMTP which is the protocol used

for outbound email but besides setting

these things let's look at the register

route down here it's actually pretty

straightforward in my register route I

validated the submission just like

before nothing new there I then

confirmed the registration down here

nothing new there all I did was use two

new lines of code and it's this easy to

automate the sending of emails I

apparently have done it too many times

which is why it ended up in spam I

created a variable called message I used

a message function that I must have

imported higher up so we'll go back to

that here's apparently the subject line

as the first argument and the second

argument is the uh named parameter

recipients which takes a list of emails

that should get the confirmation email

so in brackets I just put the one user's

email and then mail.send that message so

let's scroll back up to see what message


uh and what mail actually is mail I

think we saw yep mail is this which I

have as a variable because I followed

the documentation for this Library you

simply configure your current app with

mail support capital M here and if you

look up here now on line seven here's

the new library from flask_mail I

imported capital Mail capital Message so

that I had the ability to create a

message and send a mail so such a simple

thing whether you want to confirm things

for users you want to do password resets

it can be this easy to actually generate

emails provided you have the requisite

access and software installed and just

to make clear that I did add something

here let me open up my requirements.txt

file and indeed I have both flask and

Flask-Mail

ready to go but I ran the command in

advance to actually do that all right

any questions then

on these examples here
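The two ideas above, keeping credentials in environment variables and building a confirmation message addressed to a list of recipients, can be sketched with the standard library alone. This is an assumption-laden illustration, not the lecture's code: it swaps in Python's email.message module for Flask-Mail, the port number 587 is an assumed value that the lecture does not state, and the function names mail_config and confirmation are hypothetical. The dictionary keys mirror Flask-Mail's configuration names.

```python
import os
from email.message import EmailMessage


def mail_config(environ=os.environ):
    """Collect mail settings, reading the sensitive ones from environment variables."""
    return {
        "MAIL_DEFAULT_SENDER": environ.get("MAIL_DEFAULT_SENDER"),
        "MAIL_PASSWORD": environ.get("MAIL_PASSWORD"),  # never hard-coded in source
        "MAIL_PORT": 587,  # assumed value; the lecture only says "the port number"
        "MAIL_SERVER": "smtp.gmail.com",
        "MAIL_USE_TLS": True,  # use encryption
        "MAIL_USERNAME": environ.get("MAIL_USERNAME"),
    }


def confirmation(config, recipients):
    """Build (but do not send) a confirmation email for a list of recipients."""
    msg = EmailMessage()
    msg["Subject"] = "You are registered"
    msg["From"] = config["MAIL_DEFAULT_SENDER"]
    msg["To"] = ", ".join(recipients)
    return msg
```

Actually sending the message would additionally require an SMTP connection and real credentials, which is the part Flask-Mail's mail.send handles in the lecture.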

no all right so what other pieces might

actually remain for us let me flip over

here it turns out that a key component

of most any web application nowadays

that we haven't touched on yet but it'll


be one of our final flourishes today is

the notion of a session and a session is

actually a feature that derives from all

of the basics we talked about today and

last week and a session is the technical

term for like what you and I know as a

shopping cart when you go to amazon.com

and you start adding things to your

shopping cart they follow you from page

to page to page Heck if you close your

browser come back to the next day

they're typically still in your shopping

cart which is great for Amazon because

they want your business they don't want

you to have to like start from scratch the

next day similarly when you log into any

website these days even if it's not an

e-commerce thing but it has usernames

and passwords you and I are not in the

habit of logging into every darn page we

visit on a website typically you log in

once and then for the next hour day week

year you stay logged into that website

so somehow the website is

remembering that you have logged in and

that is being implemented by way of this

thing called a session and perhaps a

more familiar term that you might know

and worry about called cookies let's

go ahead and take one more five minute


break here and when we come back we'll

look at cookies sessions and these final

features

all right so the promise now is that

we're going to implement this notion of

a session which is going to allow us to

like log users in and keep them logged

in and even Implement things like a

shopping cart and the overarching goal

here is to build an application that is

quote unquote stateful Again State

refers to information and something

that's stateful remembers information

and in this context the curiosity is

that HTTP is technically a stateless

protocol like once you visit a URL HTTP

colon slash something hit enter web page

is downloaded to your browser like

that's it like you can unplug from the

internet you can turn off your Wi-Fi but

you still have the web page locally and

yet we somehow want to make sure that

the next time you click on a link on

that website it doesn't forget who you

are or the next thing you add to your

shopping cart it doesn't forget what was

already there so we somehow want to make

HTTP stateful and we can actually do

this using the building blocks we've


seen thus far so concretely here's like

a form you might see occasionally but

pretty rarely when you log into Gmail

right and I say kind of rarely because

most of you don't log into Gmail

frequently you just stay logged in

pretty much endlessly in your browser

and that's because Google has made the

conscious choice to give you a very long

session time maybe a day a week a month

a year because they don't really want to

add friction to using their tool and

making you log in every darn Day by

contrast there's other applications on

campus including some of CS50's own

that make you log in every time because

we want to make sure that it's indeed

you accessing the site and not a

roommate or friend or someone

maliciously so once you do fill out this

form how does Google subsequently know

that you are you and when you reload the

page even or open a second tab for your

same Gmail account how do they know that

you're still David or Carter or Emma or

someone else well let's look underneath

the hood of what's going on when you log

into Gmail

essentially you initially see a form

like this using a get request and the


website responds like we saw last week

with some kind of HTTP response

hopefully 200 OK with the form

meanwhile the website might also respond

with an HTTP header that last week we

didn't care about this week we now do

whenever you visit a website it is very

commonly the case that the website is

putting a cookie on your computer and

you may generally know that cookies can

be bad and they kind of track you in

some way and that's both a blessing

and a curse without cookies you could

not Implement things like shopping carts

and logins as we know them today

unfortunately they can also be used for

ill purposes like tracking you on every

website and serving you ads more

effectively and so forth so with good

comes some bad but the basic primitive

for us the computer scientist boils down

to just HTTP headers a cookie is

typically a big number a big seemingly

random value that a server tells your

browser to store in memory or even

longer term store on disk so you can

think of it like a file that a server is

planting on your computer and the

promise that HTTP makes is that if a


server sets a cookie on your computer

you will present that same cookie or

that same value on every subsequent

request so when you visit the website

like Gmail they plop a cookie on your

computer like this with some session

equals value some long random value one

two three ABC or something like that and

when you then visit another page on

gmail.com or any other website you send

the opposite header not Set-Cookie but

just cookie colon and you send the exact

same value it's similar to going to a

club or an amusement park where you pay

once you go through the gates once you

get checked by security once and then

they you know very often take like a

little stamp and say okay now you can

come and go and then for you efficiency

wise if you come back later in the day

or later in the evening you can just

present your hand you've been stamped

presumably they've already you've

already paid you've already been

searched or whatnot and so it's this

sort of Fast Track ticket back into the

club back into the park that's

essentially what a cookie is doing for

you whereby it's a way of reminding the

website we've already done this you


already asked me for my username and

password this is my pass to now come and

go now unlike this hand stamp which can

you know kind of be easily copied or

transferred or duplicated or you know

kept on over multiple days these cookies

are really big seemingly random values

letters and numbers so statistically

there's no way someone else is just

going to guess your cookie value and

pretend to be you it's just very low

probability statistically but this is

all it boils down to is this agreement

between browser and server to send these

values back and forth in this way so

when we actually translate this now to

code let's do something like a simple

login app let me go into a folder I made

in advance today called login and let me

code up

app.py and let's take a look in here

so what's going on a couple of new

things up top if I want to have the

ability to stamp my users hands

virtually and Implement sessions I'm

going to have to import from flask

support for sessions so this is another

feature you get for free by using a

framework and not having to implement


all this yourself and from the flask

session Library I'm going to import

session capital S why I'm going to

configure this session as follows long

story short there's different ways to

implement sessions the server can store

these cookies in a database in a file in

memory in RAM in other places too we

are telling it to store these cookies on

the server's hard drive so in fact

whenever you use sessions as you will

for problem set 9 you'll actually see a

folder suddenly appear called flask

underscore session inside of which are

the cookies essentially for any users or

friends or yourself who've been visiting

your particular application so I'm

setting it to use the file system and I

don't want them to be permanent because

I want when you close your browser the

session to go away they could be made to

be permanent and last much longer then I

tell my app to support sessions and

that's it for now let's see what this

application actually does before we

dissect the code let me go over to my

terminal window run flask run

and then let me go ahead and reload

my preview URL

give it a second to kick back in


let me go ahead and open my URL come on

oops let me go ahead

too long of a break there we go so this

website simply has a login form there's

no password though I could certainly add

that and check for that too it just asks

for your name so I'm going to log in as

myself David and click login and now

notice I'm currently at the slash login

route but notice this if I try to go to

the default route just slash which is

where most websites live by default

notice that I magically get redirected

to log in so somehow my code knows hey

if you're not logged in you're going to

slash login instead let me type in my

name David and click login and now

notice I am back at slash Chrome is sort

of annoyingly hiding it but this is the

same thing as just a single slash and

now notice it says you are logged in as

David log out what's kind of cool is

notice if I reload the page it still

knows that if I create a second Tab and

go to the same URL it still knows that I

could even

um I could keep doing this in multiple

tabs it's still going to remember me on

both of them as being logged in as David


so how does that work especially when I

click log out then I get uh forgotten

altogether all right so let's see how

this works and some basic building

blocks under my slash route notice I

have this

if there is no name in the session

redirect the user to slash login so

these two lines together are what

Implement that automatic redirection

using HTTP 301 or 302 automatically it's

handled for me with these two lines

otherwise show index.html all right

let's go down that rabbit hole what's in

index.html well if I look in

my let me look in my templates folder

uh for my login demo and look at

templates slash index.html

all right so what's going on here I

extend layout.html I have a block body

and then I've got some other syntax so

we haven't seen this yet but it's more

Jinja stuff which again is almost

identical to python if there's a name in

the session variable then literally say

you are logged in as curly braces

session bracket name and then notice

this I've got a simple HTML link to log

out via slash logout else if there is no

name in the session then it apparently


says you are not logged in and it leads

me to an HTML link to slash login and

then endif so again Jinja does not

rely on indentation recall that HTML and

CSS don't really care about indentation

only the human does but in code with

Jinja you need these end tags endblock

endfor endif to make super obvious that

you're done with that thought

so session is just this magic variable

that we now have access to because we've

included these two lines of code and

these that handle that whole process of

stamping every user's hand with a

different unique identifier if I made my

code space public and I let all of you

visit the exact same URL all of you

would be logged out by default you could

all type your own names individually all

log in at the same URL using different

sessions and in fact I would then see if

I go into my terminal window here and my

login directory notice the flask session

directory I mentioned and if I C D into

that and type LS notice that I had two

tabs open or actually I think I started

the server twice I have two files in

there I would ultimately have one file

for every one of you and that's what's


beautiful about sessions is it creates

the illusion of per user storage inside

of my session is my name inside of your

session so to speak is your name and the

same is going to apply to shopping cart

ultimately as well let's see how login

works here my login route supports both

get and post so I could play around if I

want and notice this this login route is

kind of interesting as follows if the

user got to this route via post my

inference is that they must have

submitted a form why because that's

how I'm going to design the HTML form in

a second and if they did submit the form

via post I'm going to store in the

session at the name key whatever the

human's name is and then I'm going to

redirect them back to slash otherwise

I'm going to show them the login form so

this is what's kind of cool if I go to

this login form which lives at literally

slash login by default when you visit a

URL like that you're visiting it via get

and so that's why I see the form however

notice this the form very cleverly

submits to itself like the one route

slash login submits to its same self

slash login but it uses post when you

submit the form and this is a nice way


of having one route but for two

different types of operations or views

when I'm just there visiting slash login

via a URL it shows me the form but if I

submit the form then this logic these

three lines kick in and this just avoids

my having to have both an index route

and a greet route for instance I can

just have one route that handles both

get

and post how about logout what does this

do well it's as simple as this change

whatever name is in the session to be

none which is Python's version of like

null essentially and then redirect the

user back to slash because now in

index.html I will not notice a name

there anymore this will be false and so

I'll tell the user instead you are not

logged in
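The three routes just described (slash, slash login, and slash logout) come down to a few lines of logic each. The sketch below models them as plain Python functions operating on a dictionary that stands in for Flask's session variable; that substitution, the tuple return values, and the page text are assumptions made so the example runs without Flask, and it is not the lecture's app.py.

```python
def index(session):
    """GET /: show the page only if a name is stored in the session."""
    if not session.get("name"):
        return ("redirect", "/login")
    return ("page", f"You are logged in as {session['name']}")


def login(session, method, form=None):
    """One route, two behaviors: GET shows the form, POST remembers the name."""
    if method == "POST":
        # The form was submitted, so store the name under the "name" key.
        session["name"] = (form or {}).get("name")
        return ("redirect", "/")
    return ("page", "login form")


def logout(session):
    """Forget the user by setting the stored name to None, then go back to /."""
    session["name"] = None
    return ("redirect", "/")
```

Because index checks whether the stored name is truthy, setting it to None on logout is enough to send the next visit to slash back to the login form.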

so like it's I want to say as simple as

this is though I realize this is a bunch

of steps involved this is the essence of

every website on the internet that has

usernames and passwords and we skipped the

password step for that more on that

in problem set nine but this is how

every website out there remembers that

you're logged in and how this works


ultimately is that as soon as you use in

Python lines like this and lines like

this Flask takes care of stamping the

virtual hand of all of your users and

whenever flask sees the same cookie

coming back from a user it grabs the

appropriate file from that folder loads

it into the session Global variable so

that your code is now unique to that

user and their name
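What the framework is described as doing here, handing each browser a long random value and later using that value to find the right user's data, can be sketched with the standard library. This is an illustration under assumptions, not Flask's implementation: the in-memory SESSIONS dictionary stands in for the files in the flask_session folder, and the names start_session and load_session are hypothetical.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> that user's private dictionary (stand-in for files on disk)


def start_session():
    """Create a new session and the Set-Cookie header value that announces it."""
    session_id = secrets.token_hex(16)  # long random value, impractical to guess
    SESSIONS[session_id] = {}
    cookie = SimpleCookie()
    cookie["session"] = session_id
    # OutputString() yields the header value, for example "session=1a2b3c..."
    return session_id, cookie["session"].OutputString()


def load_session(cookie_header):
    """Given a request's Cookie header value, return that user's data, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "session" not in cookie:
        return None
    return SESSIONS.get(cookie["session"].value)
```

The server sends the value once in a Set-Cookie header; the browser then sends it back in a Cookie header on each later request, which is what lets load_session return the same dictionary every time.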

let's do one other example with sessions

here that'll show how we might use these

now for shopping carts let me go into

the store example here let me go ahead

and run this thing first if I run store

in my same Tab and go back over here

we'll see a very ugly e-commerce site

that just sells seven different books

here but each of these books has a

button via which I can add it to my cart

all right well where are these books

coming from let's kind of poke around

let me go into my terminal window again

let me go into this example which is

called store and let me open up about uh

index.ht whoops let's open up index

how about

books.html is the default one not Index

this time so if I look here notice that

that route that we just saw uses a for


Loop in Jinja to iterate over a whole

bunch of books apparently and it outputs

in an H2 tag the title of the book and

then another one of these forms so

that's kind of interesting let's go back

one step let's go ahead and open up

app.py because that must be excuse me

what's kicking all of this off notice

that this file is importing session

support it's configuring sessions down

here but it's also connecting to a

store.db file so it's adding some

SQLite and notice this in my slash route

I'm selecting star from books which is

going to give me a list of dictionaries

Each of which represents a row of books

and I'm going to pass that list of books

into my books.html template which is why

this for Loop works the way it does

let's look at this actual database let

me increase my terminal window and do

sqlite3 of store.db dot schema will

show me everything there's not much

there it's a book it's a table called

books with two columns ID and title

let's do select star from books

semicolon there are the seven books Each

of which has a unique ID and you might

see where this is going if I go to the


UI and I look at each of these buttons

for add to cart just like Amazon might

have notice that each of these buttons

is just a form and what's magical here

just like deregister even though I

didn't highlight it at the time there's

another type of input that allows you to

specify a value without the human being

able easily to change it instead of type

equals text or type equals submit type

equals hidden will put the value in the

form but not reveal it to the user so

that's how I'm saying that the ID of

this book is one the ID of this book

is two the ID of this book is three

and so forth and each of these forms

then will submit apparently to slash

cart using post and that would seem to

be what adds things to cart so let's try

this let me click on one or two of these

let's add the first book add to cart

here's my cart notice my route changed

to slash cart all right let's go back

and let's add the book number two

there we have that one and let's skip

ahead to the seventh book Deathly

Hallows and now we have all three

books here so what does the cart route

do at slash cart well let's look if I go

back to my terminal window look at


app.py and look at slash cart okay

there's a lot going on here but let's

let's see so the slash cart route

supports both get or post which is a

nice way to consolidate things into one

URL

all right this is interesting if there

is not a quote-unquote cart key in

session we haven't technically seen the

syntax but long story short these lines

here do ensure that the cart exists what

do I mean by that it makes sure that

there's a cart key in the session Global

variable and it's by default going to be

an empty list why that just means you

have an empty shopping cart but if the

user visits this route via post

and the user did provide an ID they

didn't muck with the form in any way and

like try to hack into the website they

gave me a valid ID then I'm going to use

this syntax if session bracket cart is a

list recall from a couple of weeks ago

that dot append just adds something to

the list so I'm going to add the ID to

the list and return the user to cart

otherwise if the user is at slash cart

via get implicitly we just do this

select star from books where ID is in


and this might be syntax you recall from

pset 6 it lets you look for multiple IDs

all at once because if I have a

list of IDs in my cart I can

get all of those books at once so long

story short what has happened here I am

storing in the cart

the books that I myself have added to my

cart my browser is sending the same

handstamp again and again which is how

this website knows that it's me adding

these books to my cart and not you or

not Carter or not Emma indeed if all of

us visited the same long URL and I made

it public and allowed that then we would

all have our own illusions of our own

separate carts and each of those carts

in practice would just be stored in this

flask_session directory on the server so

that the server can keep track of each

of us using again these cookie values

that are being sent back and forth via

these headers
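The cart logic described above can be sketched without Flask at all. This is a simplified model, not the lecture's actual app.py: the function name handle_cart, the helper make_demo_db, the tuple return values, and the placeholder book titles are all assumptions made for illustration, and a plain dictionary stands in for Flask's session.

```python
import sqlite3


def handle_cart(session, method, form, db):
    """Model of the /cart logic: POST adds a book id, GET lists the cart."""
    # Ensure the cart exists; an empty list means an empty shopping cart.
    if "cart" not in session:
        session["cart"] = []

    # POST: the user clicked "add to cart" and the form supplied an id.
    if method == "POST":
        book_id = form.get("id")
        # Ignore missing or tampered ids rather than trusting the form.
        if book_id and book_id.isdigit():
            session["cart"].append(int(book_id))
        return ("redirect", "/cart")

    # GET (implicitly): fetch every book in the cart with one IN (...) query.
    if not session["cart"]:
        return ("render", [])
    placeholders = ", ".join("?" for _ in session["cart"])
    rows = db.execute(
        f"SELECT id, title FROM books WHERE id IN ({placeholders}) ORDER BY id",
        session["cart"],
    ).fetchall()
    return ("render", rows)


def make_demo_db():
    """Hypothetical in-memory database with placeholder titles."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany(
        "INSERT INTO books (id, title) VALUES (?, ?)",
        [(1, "Book One"), (2, "Book Two"), (7, "Book Seven")],
    )
    return db
```

In a real Flask route the same three steps would presumably use flask.session, request.method, and request.form in place of the plain arguments here.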

all right I know that's a lot but again

it's just the new python way of just

leveraging those HTTP headers from last

week in a clever way
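To make the hand stamp idea concrete, here is a minimal sketch of how a server might keep one private session per cookie value. It is an assumption-laden toy, not how Flask is implemented: SESSIONS, get_session, and the cookie name "session" are hypothetical, and everything lives in memory rather than in a directory on disk.

```python
import secrets

# Server-side storage: one private dictionary per cookie value.
SESSIONS = {}


def get_session(cookies):
    """Return (session, cookie_value), issuing a new cookie if needed."""
    value = cookies.get("session")
    if value not in SESSIONS:
        # First visit (or an unknown stamp): hand out a fresh,
        # hard-to-guess value and start with an empty session.
        value = secrets.token_hex(16)
        SESSIONS[value] = {}
    return SESSIONS[value], value
```

Each browser that presents a different cookie value gets a different dictionary, which is why every visitor sees only their own cart.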

any questions before we look at one

final

set of examples yeah


[Music]

so I think you're asking about using the

get and post in the same function so

this is just a nice uh aesthetic if you

will if I had to have separate routes

for get and post I mean it literally might

mean I need twice as many routes in my

file and it just starts to get a little

Annoying and these days too in terms of

user experience this is you know maybe

only appeals to The Geek in us but like

having clean URLs is actually a thing

like you don't want to have lots of

words in the URL it's nice if the URLs

are nice and succinct and canonical if

you will so it's nice if I can

centralize all of my shopping cart

functionality in slash cart only and not

in multiple routes one for get one for

post it's a little you know nitpicky

of me but this is commonly done here

so what this code here means is that

this route this function henceforth will

support both get requests and post

requests but then I kind of need to

distinguish between whether it's get or

post coming in because if it's a get

request I want to show the cart if it's

a post request I want to update the cart


and the simplest way to do that is just

to check this value here in the request

variable that we imported from flask up

above you can check what is the current

type of request is it a get is it a post

or is it something else altogether there

are other verbs if it's a post that must

mean because I created the web form that

uses post that the user clicked the add

to cart button

otherwise if it's not post it's

implicitly going to be logically get

then I just want to show the user the

contents of the cart and I use these

lines instead so it's just one way of

avoiding having two routes for two

different HTTP verbs you can combine

them so long as you have a check like

this if I really wanted to be pedantic I

could do this uh elif request.method

== "GET" this would be

more symmetric but it's not really

necessary because I know there's only

two possibilities
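The more symmetric version mentioned above might look like the following sketch. The function name cart_action and the returned strings are hypothetical stand-ins for the real work of updating or rendering the cart.

```python
def cart_action(method):
    """Decide what the /cart route should do for a given HTTP verb."""
    if method == "POST":
        return "update cart"
    elif method == "GET":
        return "show cart"
    # Only reachable if the route were registered for other verbs too.
    return "405 Method Not Allowed"
```

When the route only accepts get and post, the elif adds symmetry but no new behavior, which is why a bare fall-through is usually enough.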

hope that helps

all right let's do one final set of

examples here that's going to tie the

last of these features together to

something that you probably see quite

often in real world applications and


that For Better or For Worse is now

going to involve tying back in some

JavaScript from last week the goal at

hand of these examples is not to

necessarily Master how you yourself

would write the python code the SQL code

the JavaScript code but just to give you

a mental model for how these different

languages work so that for final

projects especially if you do want to

add JavaScript functionality much more

interactive user interface you at least

have like the bare bones of a mental

model for how you can tie these

languages together even though our Focus

generally has been more on Python and

SQL than on JavaScript from last week

let me go ahead and open up an example

called shows version zero of this and

let me do flask run and let me go into

my URL here and see what this

application looks like by default this

has just a simple query text box with a

search box let's take a look at the HTML

that just got sent to my browser all

right there's not much going on here at

all so there's a form whose action is

slash search it's going to submit via

get it's going to use a q parameter just


like Google it seems and submit it so

this actually looks like the Google form

we did last week so let's see what

goes on here let me search for something

like cat

enter

okay so it looks like all right so this

is actually a somewhat familiar file

what I've gone ahead and done is I've

grabbed all of the titles of TV shows

from a couple of weeks ago when we first

introduced SQL and I loaded them into

this demo so that you can search by

keyword for any word you want I just

searched for cat if we were to do this

again with dog we would see all the titles of TV

shows that contain dog

as a substring somewhere and so forth so

this is a traditional way of doing this

just like in Google it uses slash search

question mark q equals cat q equals

dog and so forth how does that work well

let's just take a quick look at app.py

here

let me go into my zero example here shows

zero and open up app.py and see what's

going on all right very simple here's

the form that's kind of how we started

today and here is the slash search route

well what's going on here this gets a


little interesting so I first select a

whole bunch of shows by doing this

select star from shows where title like

question mark and then I'm using some

percent signs from SQL on both the left

and the right and I'm plugging in

whatever the user's input was for Q if I

didn't use like and I used equal instead

I could get rid of these

percent signs but then it would have to

be a show called cat or called dog as

opposed to it being like cat or like dog

this whole line returns to me a list of

dictionaries Each of which represents a

show in the database and then I'm

passing all of those shows to a template

called search.html so let's just follow

that breadcrumb let's open up

search.html
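The substring search just described can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: search_shows and make_shows_db are hypothetical names, the four titles are made-up sample data, and the lecture's own code uses the CS50 library rather than sqlite3 directly.

```python
import sqlite3


def search_shows(db, q):
    """Return shows whose title contains q, as a list of dictionaries."""
    db.row_factory = sqlite3.Row
    rows = db.execute(
        # The percent signs are SQL wildcards, so q may appear anywhere.
        "SELECT id, title FROM shows WHERE title LIKE ? ORDER BY id",
        ("%" + q + "%",),
    ).fetchall()
    return [dict(row) for row in rows]


def make_shows_db():
    """Hypothetical in-memory stand-in for shows.db."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany(
        "INSERT INTO shows (id, title) VALUES (?, ?)",
        [(1, "Cat"), (2, "The Cat Show"), (3, "Dog Days"), (4, "Concatenate")],
    )
    return db
```

Passing the user's input as a placeholder value, rather than pasting it into the SQL string, keeps the query safe even if q contains quotes.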

all right so this is where templating

gets kind of cool so I just passed back

hundreds of results potentially but the

only thing I'm outputting is an

unordered list and using a Jinja for

loop an li tag containing the titles of

each of those shows and just to prove

that this is indeed a familiar data set

and I actually simplified it a bit if I

look at shows.db with SQLite I threw


away all the other stuff like ratings

and actors and everyone else and I just

have for instance select star

from shows limit 10 just so we can see

10 of them there's 10 of the shows from

that database so that's all that's in

the database itself so it would look

like this is a pretty vanilla web

application it uses get it submits it to

the server the server spits out a

response and that response then looks

like this which is a huge number of Li

tags one for each cat or one for each

dog match but everything else comes from

a layout.html all the stuff at the top

and at the bottom all right so these

days though we're in the habit of seeing

autocomplete and you start typing

something and you don't have to hit

submit you don't have to click a button

you don't have to go to a new page web

applications nowadays are much more

Dynamic so let's take a look at this

version one of this thing let me go into

shows one

and close my previous tabs and run flask

run in here and it's almost the same

thing but watch the behavior change a

little bit I'm reloading the form

there's no button now so gone is the


need for a submit button I want to

implement autocomplete now so let's go

ahead and type in C okay there's every

show that starts with C A there's every

show that has C A in it rather T there's

every show with cat in it I can start it

again and do dog but notice how

instantaneous it was and notice my URL

never changed there's no slash search

route and it's just immediate like with

every keystroke it is searching again

and again and again that's kind of a

nice ux user experience because it's

immediate this is what users are used to

these days but if I look at the source

code here

notice that in the core source code

there's just an empty UL by default but

there is some fancy JavaScript code so

let's see what's going on here this

JavaScript code is doing the following

let me zoom in a little bit more

this JavaScript code is first selecting

with query selector which you used this

past week quote unquote input all right

so that's just getting the text box then

it's adding an event listener to that

input for the input event we didn't talk

about this last week but literally when


you provide any kind of input by typing

by pasting by

any other user interface mechanism it

triggers an event called input so

similar to key press or key up I then

have a function no worries about this

async function for now then what do I do

inside of this all right so this is new

and this is the part that let's just

focus on the ideas and not the syntax

JavaScript nowadays comes with a

function called Fetch that allows you to

get or post information to a server

without reloading the whole page you can

sort of secretly do it inside of the

page what do I want to fetch slash

search question mark Q equals whatever

the value of that input is when I get

back a response I want to get the text

of that response and store it in a

variable called shows and I'm

deliberately bouncing around ignoring

special words like await and await

here but for now just focus on what came

back a response came back from the

server I'm getting the text from it

storing it in a variable called shows

what am I then doing I'm using query

selector to select my UL which is empty

by default and I'm changing its inner


HTML to be equal to the shows that came

back from the server so let's poke

around here's where again developer

tools are quite powerful let me go ahead

and reload this page to get rid of

everything

and let me now

open up inspect let me go to the network

Tab and let's just sniff the traffic

going between my browser and server I'm

going to search for C notice that

immediately triggered an HTTP request to

slash search question mark Q equals c so

I didn't even finish my cat thought but

notice what came back a bunch of

response headers but let's actually

click on the raw response

this is literally the response from the

server just a whole bunch of Li tags no

UL no HTML no title no body nothing just

Li tags and we can actually simulate

this let me manually go to that same URL

Q equals c enter we are just going to

get back whoops sorry

slash search Q equals c we are just

going to get back this stuff which if I

view source it's not even a complete web

page the browser is trying to show it to

me as a complete web page with bullets


but it's really just partial HTML but

that's perfect because this is literally

what I essentially want my python code

to copy paste into the otherwise empty

UL tag and that's what this JavaScript

code then

here is doing once it gets back that

response from the server it's using

these lines of code to plug all of those

Lis into the UL after the fact again

changing the so-called DOM
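What the server sends back in this version is only a fragment of HTML. A minimal sketch of producing such a fragment follows; render_fragment is a hypothetical name, and the lecture's app presumably does this with a Jinja template rather than string building.

```python
from html import escape


def render_fragment(shows):
    """Build only the <li> tags, with no <ul>, <html>, or <body> around them."""
    # Escape each title so characters like < and & display as text.
    return "".join(f"<li>{escape(show['title'])}</li>" for show in shows)
```

The browser-side code then drops this string into the otherwise empty ul, so the page never needs to reload.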

but there's a slightly better way to do

this because honestly this is not the

best design because if you've got a

hundred shows or more you're sending all

of these tags unnecessarily like why do

I need to send all of these stupid HTML

tags why don't I just create those when

I'm ready to create them well here's the

final flourish whenever making a web

application nowadays where client and

server keep talking to one another

Google Maps does this Gmail does this

like literally every cool application

nowadays you load the page once and then

it keeps on interacting with you without

you reloading or having to change the

url

let's actually use a format called JSON

JavaScript object notation which is to


say there's just a better more efficient

better designed way to send that same

data I'm going to go into shows 2 now

and do flask run

and I'm going to go back to my page here

the user interface is exactly the same

and it still works exactly the same

here's c c a c a t and so forth but

let's see what's coming back now if I go

to slash search question mark Q equals

cat enter

notice that I get this crazy looking

syntax but the fact that it's so compact

is actually kind of a good thing this is

actually going to let me format it a

little nicer well or a little worse this

is what's called JavaScript object

notation in JavaScript a

square bracket means here comes an array

in JavaScript a curly bracket says here

comes an object AKA a dictionary and you

might recall from uh

did we do use kind of sort of recall

that you can now have keys and values in

JavaScript notation using colons like

this so long story short cryptic as this

is to you and me and not very human

friendly it's very machine friendly

because for every title in that database


I get back its ID and its title its ID

and its title its ID and its title and

this is a very generic format that an

API an application programming interface

might return to you and this is how apis

nowadays work you get back very raw

textual data in this format JSON format

and then you can write code that

actually programmatically turns that

JSON data into any language you want for

instance HTML so here's the third and

final version of this program I again

select my input I again listen for input

I then when I get input call this

function I fetch slash search q equals

whatever that input was C or C A

or c a t i then wait for the response

but instead of getting text I'm calling

this other function that comes with

JavaScript these days called json that

just parses that it turns it into a

dictionary for me or really a list of

dictionaries for me and stores it in a

variable called shows and this is where

you start to see the convergence of HTML

with JavaScript let me initialize a

variable called HTML to nothing quote

unquote using single quotes but I could

also use double quotes this is

Javascript Syntax for a loop let me


iterate over every ID in the shows list

that I just got back from the server that

big chunk of Json data let me create a

variable called title that's equal to

the shows the title of the show at that

ID but for reasons we'll come back to

let me replace a couple scary characters

then let me dynamically add to this

variable An Li tag the actual title and

a close Li tag and then very lastly

after this for Loop let me update the

ul's inner HTML to be the HTML I just

created on the fly so in short don't

worry too much about the syntax because

you won't need to use this unless you

start playing with more advanced

features uh quite soon but what we're

doing is with JavaScript we're creating

a bigger and bigger and bigger string of

HTML containing all of the Open brackets

the LI tags the close brackets but we're

just grabbing the raw data from the

server and so in fact in problem set 9

you're going to use a real world

third-party API application programming

interface for which you sign up the data

you're going to get back from that API

is not going to be show titles but

actually stock quotes and stocks'

ticker symbols and the prices

at which stocks were last bought or sold

and you're going to get that data back

in Json format and you're going to write

a bit of code that's then going to

convert that to the requisite HTML on

the page so the final result here is

literally the kind of autocomplete that

you and I see and take for granted every

day and that's ultimately how it works
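The JSON round trip from the third version of the shows example can be sketched in Python, even though the lecture does the second half in JavaScript. The names to_json and json_to_html are hypothetical, and html.escape stands in for the step that replaces a couple of scary characters; it escapes quotes as well as < and &.

```python
import json
from html import escape


def to_json(shows):
    """Server side: serialize a list of dictionaries as compact text."""
    return json.dumps(shows)


def json_to_html(payload):
    """Client side (modeled in Python): parse the JSON, then build li tags."""
    shows = json.loads(payload)
    html = ""
    for show in shows:
        # Escape the title before embedding it in markup.
        title = escape(show["title"])
        html += "<li>" + title + "</li>"
    return html
```

Sending raw data and letting the receiving code decide how to present it is the same pattern a third-party API follows when it returns stock quotes as JSON.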

HTML and CSS are used to present the

data your so-called view python might be

used to actually send or get the data on

the backend server and then lastly

JavaScript is going to be used to make

things Dynamic and interactive so I know

that's a whole bunch of building blocks

but the whole point of problem set nine

is to tie everything together set the

stage for hopefully a very successful

final project why don't we go ahead and

wrap up there and we'll see you one last

time next week for emoji

[Music]

this is cs50 and this is week 10 our

very last together before we dive in

today
um just wanted to acknowledge how much

work we know this course is for

everyone we know there's still a tad bit

of work remaining but we do hope

ultimately that you're really proud of

what you've pulled off over the past few

months only and indeed the final project

whatever it is you end up building

really is meant to be this Capstone

where you're finally standing on your

own there's no distribution code there's

not really a specification and really

just an opportunity to take all this

knowledge out now for a spin and we do

hope it serves you well longer term

before we dive in too just wanted to offer

a number of thanks for so much of the

team that helps out behind the scenes in

particular the Memorial Hall team our

hosts here who make all of the space and

the activities behind the scenes

possible the education Support Services

team who helps with audio and video and

more and then especially cs50's own team

all here in the darkness helping out in

front of the camera behind the camera if

we could a huge round of applause for

everyone that makes this possible

you might have noticed that these have


been unusual times and we've had some

unusual guests in the front of the room

here since we weren't sure what to

expect early on as to just what

protocols would be on campus and so we

have of course all of these plush

figures behind the scenes who have been

helping out uh behind the camera behind

the monitors and so forth and what many

of you'll see if you've been watching

right now or in the future of these

videos online you'll see a lot of backs

of heads so that there's a little bit of

characteristic to some of the shots that

we have here but this is actually born

of an inspiration that comes from who

will be ultimately today's special guest

Jennifer 8. Lee in fact whom we'll meet

in just a little bit was ultimately the

good friend of the class that inspired

this tradition of using puppetry in some

form in the class here what I see down

below is a shot like this here and

funny enough it seems that with machine

learning what it is nowadays artificial

intelligence so to speak on social media

and the like like literally no

joke I pulled up Twitter earlier today

and among my suggestions for whom I

should follow now were literally the


suggestions here

um this is uh perhaps not surprising

though because some weeks back I

actually started following uh Count von

Count whom you might remember from

Sesame Street if you're not following

him already this is an amazing account

to follow an actual count to follow I

mean it's actually an amazing use of

programming so this account joined in

April of 2012 it's got 198,000

followers as of today

and what it's been doing for like nine

plus years is tweeting out a number one

per day this morning's was 3,327

yesterday's was 3,326

and so presumably someone's just

written a program python or something

else that's just generating these tweets

once a day even more amusing though is

that like every tweet for the past nine

years has like 20 or 30 comments on it

from people who are following it so

perhaps consider following this same

account and the same application of CS

as well wanted to also thank cs50's team

behind the cameras you might recall uh

the teaching fellows last year in


particular when everything was on Zoom

kindly put together this visualization

of TCP/IP and the passing of

messages among routers and in turn

computers for instance from Phyllis at

bottom right to Brian at top left just

wanted to thank the team but also reveal

to you all that uh these takes were not

perfect by any means and in fact here's

just 60 seconds or so of outtakes of us

trying to get data from point A to point

[Music]

if we could too a round of applause for

all the teaching fellows teaching

assistants and course assistants

who make the course possible

as well before we now do a bit of review

of the semester thought we'd take first

a higher level view of where we've come

from recall of course from the syllabus

and literally week zero we claimed this

that what ultimately matters in this

course is not so much where you end up

relative to classmates but where you end

up relative to yourself when you began

and we really do mean that there are


certainly classmates of yours who have

been programming since they were 10 years

old but there are two-thirds of your

classmates for whom that was not in fact the

case and so behind you in front of you

to the left and to the right today are

so many classmates who have had a very

shared experience with you but the only

person that really matters at the end of

the day in terms of how you've

progressed in this class truly is where

you in fact began and I realized that

with cs and especially this course and

with programming assignments especially

it can feel like week after week that

you're not really making progress

because it might feel like you're

struggling every darn week but that's

just really because we kind of keep

moving the bar higher and higher pushing

the Finish Line a little further and

further ahead because think back to like

week one when this for instance whoops

when this alone was hard and you were

just trying to get Mario to ascend a

pyramid that might look a little

something like this or the week after

when you started dabbling with

readability or two weeks after Mr and


Mrs Dursley of number four Privet Drive

and so forth trying to analyze just how

complex a sentence like that was and

manipulating strings and characters for

the first time and then of course we

progressed to deeper uh dives into

algorithms and actually implementing

something that's all too real world

these days and implementing electoral

algorithms in a few different forms

dabbling thereafter in a bit of

forensics a bit of imagery and taking

images like this here and filtering it

in a number of ways ultimately

understanding hopefully how these things

are implemented underneath the hood so

that henceforth when all you're doing is

tapping an icon on your phone or

clicking a command on your computer you

can infer even if you didn't write that

particular code how the thing is likely

working and even if you had started to

get your footing then around week four

then things escalated quickly further to

data structures but recall for your

spell checker you implemented a fairly

sophisticated data structure known as a

hash table and even if you struggle to

get that working again think back five

weeks prior you were just


trying to get for loops to work and

variables to work and so each week

realize there was significant progress

and then if you aggregate all these most

recent weeks with python and SQL HTML

JavaScript and CSS I mean you built your

very own web application and many of you

will go on and build something grander

for your own final project or Focus

again on C or on python alone or the

like but ultimately aggregating all of

these Technologies and kind of stitching

together something that you yourself

created we might have kind of put some

of the foundation there in place but the

the end result ultimately is yours so at

the end of the day as we promised in

week zero this course is really about

computational thinking cleaning up your

thought process getting you to think a

little more logically more methodically

and to express yourself just as

logically and methodically but it's also

about in some form critical thinking and

at the end of the day what computer

science is is really just taking input

producing output ideally correct output

and all the hard stuff is in the middle

there but what we do hope you have in

your toolkit so to speak is all the more

of a mental model all the more of an

understanding of like first principles

from which you can derive new outputs

new conclusions based on those inputs

and certainly today right there's so

much misinformation or miseducation in

the world and just being able to take

input and produce proper output in and

of itself is a compelling skill and

indeed when you all find yourselves

invariably in engineering positions

where you're asked to build something

because you now can or perhaps you're in

a managerial role where you decide you

should build something because you know

people who can I would also start to

consider even though the past 10 plus

weeks have all been about build this

because we asked you to to really start

to consider whether it's for fun for

profession for political purposes or the

like should you build something and

actually considering now that you have

this skill how you can use it most

responsibly and not just make a website

do something or make an app do something

because it can be done but really start

to ask and ask of others like should we

be doing this it's just a skill that you


can but don't necessarily have to use

now when it comes to writing some actual

code keep in mind that you might

continue to evaluate or your employer or

your colleagues might continue to

evaluate your code along these same axes

these are not cs50 specific correctness

does it do what it's supposed

to do design like how well qualitatively

is it implemented and then style how

readable is it how pretty is it and

these three axes should really guide all

of your thinking whether it's for a test

or a project or an open source project

or the like like all three of these

things really matter and so if you're in

the mindset of wondering oh do I have to

worry about style for this do I have to

comment this the answer is always yes

this is what it means to be a good

programmer a good engineer to optimize

these kinds of axes now what about sort

of

two of those tools in the toolkit well

let's focus on just a couple here uh

full circle at the end of the semester

abstraction recall was one of the tools

in the tool kit that we proposed is all

about taking like complicated problems


complicated ideas and simplifying them

to really the essence so you can focus

on really just what matters or what

helps you get real work done and then

related to that was also this notion of

precision even as you abstract things

away you still have to be super precise

when you're writing code for a computer

or just giving instructions to another

human so that they are implementing your

ideas your your algorithms correctly and

sometimes these two goals abstraction

and precision can rather be at odds with

one another and what we thought we'd do

is give everyone a sheet of paper today

which you probably received on the way

in if not a pen as well if you didn't

receive hopefully you or a friend near

you has a sheet of paper and a pen or a

pencil do go ahead and grab that and we

thought we'd uh come full circle too and

see if we can't get a brave volunteer to

come up on the stage here and we just

need someone to give some stage

directions all right I like it when

people start pointing and pointing how

about you being pointed at yes

yes you yes come on down

will there be one more opportunity after

this come on down what's your name


[Music]

Claire okay a round of applause for

Claire for being so enthusiastic

come on over here would you like to make

a quick introduction to the group yeah

hey

I'm Claire uh yeah that's all you need

to know about me all right so what I'm

about to hand Claire is a sheet of paper

that has a drawing on it and the goal at

hand is for you all to ultimately follow

Claire's hopefully very precise

instructions because she's going to give

you step-by-step instructions an

algorithm if you will for drawing

something on that sheet of paper all

right we're going to keep it in this

manila envelope so that folks can't see

through it but this is what we would

like you to give verbal instructions to

the audience to draw and you can say

anything you want but you may not make

physical hand gestures or the like and

or dip it down so everyone can see it

that's so true all right go ahead step

one wait I could say whatever I want

related to this problem yes idea

oh my God give them instructions for

recreating this picture on their paper


okay

start with uh like

like

um a square

but

but it's no hand gestures okay okay

sorry sorry start with a square but it's

like a diamond cut like there's a point

on Top

[Music]

wait I should not be the one doing this

okay so it's like a square but yeah

start with a square

okay step two step two

is that on one of the sides of the

square there's another Square

[Laughter]

doing really well on the abstraction I

don't feel like I'm doing too hot okay

does this affect my grade in

any way no no

okay two squares okay and then there's

like another Square

but they're like not squares they're

like kind of slanted

um there's another Square in between

like next

next to those squares connecting those

squares okay

any step four


step four is that it should look like a

cube

okay so let's go ahead pause here pause

here let's let's thank Claire for coming

on up bravely I'll take this

if uh let's go ahead and collect just a

few of these if maybe Carter and Valerie

you wouldn't mind helping me grab just a

few sheets of paper if you'd like to

volunteer what it is you drew in those

seconds just hand it over if you would

like no need for a name or anything like

that

okay all right it's very eager thank you

okay thank you

all right

thank you thank you okay

uh sorry okay that's that's plenty let's

come on up if you want to oh you want to

hand me yours too

okay sorry to reach all right so Carter

if you want to meet me up on stage for a

second

so we have a whole bunch of submissions

here

that represent what it was Claire was

describing

let me go ahead and uh just project here

in a moment use my camera so here we


have one let's see Carter if you prefer to just

bring those on up here

okay so here we have

one I'll hold up

all right so some squares overlapping

started to look more like a cube thank

you so much uh here maybe in more

primitive form

with another one

this one kind of started to have wheels

which was kind of

and then things started to take shape

perhaps at the very end both Big Cube

and small cube what it was that Claire

was showing us now if we project it was

in fact this and it's actually exactly

what Claire you just went through it's

actually a perfect example of like why

abstraction can be hard and where the

line is when you're just trying to

communicate instructions So In fairness

might have been nice to just start with

we're going to draw a cube and like

here's how because that was kind of a

spoiler at the end but that too a cube

is an abstraction but it's not very

precise right like how big is the cube

at what angle is it rotated what is how

are you looking at it and so when you

were struggling to describe these


squares but no they're kind of like

diamonds or whatnot I mean that's

because of this tension between what it

is you're trying to abstract but what it

is you're trying to communicate you

could have gone maybe the complete other

direction and maybe have been super

precise and not abstract this thing in a

way as a cube but say to everyone all

right everyone put your pen down on the

paper now draw a diagonal line to say

Southwest at 45 degrees now do another

one that's South you could really

get into the weeds and tell people to go

up down left right of course it could

get a little tricky if they sort of

follow the direction incorrectly but it

would be hard for us all to know what it

is we're drawing if all we're hearing

are these very low level instructions

but that's what you're doing when you're

writing code you might implement a

function called cube how it works is via

those low level instructions but after

that you just don't care you'd much

rather think about it as a cube function

maybe with some arguments that speak to

the size or the rotation of it or the

like and that's where again abstraction


can come in so as we've discussed for so

many weeks now these trade-offs were

manifest even in week zero even if we

didn't necessarily put our finger on it

just then why don't we do things in a

slightly different direction if we could

get one other volunteer okay come on

down I saw your hand first one other

volunteer who this time we're going to

give the pen to

we're going to give the pen to

and what's your name Jonathan come on up

so I'm going to make this screen be

drawable in just a moment but what we

need you to do first on the honor System

is close your eyes

all right eyes are closed everyone else

in the audience is about to see the

picture that we want you to draw and you

all the audience are going to give

Jonathan the step-by-step instructions

this time around so eyes stay closed this

is what we're going to want Jonathan to

draw so kind of ingrain it in your mind

if you need a refresher we can have him

close his eyes again but that's what we

want him to draw I'm going to go back to

the blank screen all right Jonathan you

can open your eyes we have a blank

canvas and now step one what would you


like Jonathan to draw first

draw a circle I heard

okay it's a little smaller I'm hearing

now okay you can move it

look no don't do that all right let's

we'll do we'll give you one redo use

three fingers to delete everything

uh three fingers all together

yep there we go

uh farther apart there we go no it's

back

okay I'll do this part yeah okay all

right so I heard thank you I heard draw

a circle would anyone like to finish the

sentence more precisely

a smaller Circle

on top

a medium-sized Circle

at the top all right that's pretty good

medium-sized circle at the top and no

more deleting after this good all right

step two

a line straight down

[Music]

[Applause]

yeah okay good all right that was step

two nicely done

what's that step three

draw a line down from the bottom to the


left

[Applause]

at okay

good all right next let's go over here

next one

[Music]

same thing but on the right

yes all right uh that's what one two

three four step five

stay yes step five

[Music]

do that again but higher closer to the

circle

on the right side

[Applause]

oh

are we gonna have to go with it step six

step six

[Music]

starting from the neck draw a line down

and to the right

you don't like that he's

what do you want him to do step six

[Music]

can't no one do

[Music]

where the other line ends say again

where the other line ends

[Music]

near the vertical line


where the other line ends

draw a line that goes down

okay a couple more Steps step seven

seven

draw a horizontally slanting line from

the end of the line you just drew

diagonally

[Music]

okay we're resorting to hand gestures

now but I think that's what you mean yes

okay good or good good all right I have

one or two final steps let's get as

close as we can

[Music]

say hi

make him say hi

no

[Applause]

[Music]

okay hi

[Music]

okay and maybe one final step we'll give

them one more

say again

[Music]

put one of those lines from hi to the

circle

[Music]

a line between hi and the circle


all right let's let's show Jonathan

that's pretty darn close let's show him

what what we had in mind was this so a

round of applause for Jonathan too if we

could

a bigger round of applause for Jonathan

if we could

all right so I mean this is actually

there is this thing in in computer

science you know this is pair

programming we're actually programming

with someone else and it's actually not

all that dissimilar trying to

communicate your ideas to someone else

but notice just all of the ambiguities

and it certainly doesn't help that we're

in a big space but all the ambiguities

that arise when you're just trying to

convey something precisely so this is

not necessarily as constrained as a

program would be but it's representative of

how at the end of the day even after all these

weeks this stuff is hard and in fact

it's not necessarily ever going to be

completely straightforward because the

problems you're going to try solving

down the road presumably if you continue

to apply these skills themselves are

just going to get more and more

sophisticated but hopefully the the


feeling you get from accomplishing

something as a result is just going to

rise with them as well before we now do

a bit of review just wanted to offer a

few suggestions in answer to an FAQ

which is like what do I do after a class

like cs50 typically about half of you

will go on and take one or more other

classes in CS which is great building on

this kind of foundation and about half

of you will not like this will be it but

very very likely certainly given how the

world is trending will you have

opportunities in the arts humanities

social sciences or beyond to just apply

programming to data sets to problems in

your own domains and so toward that end

we would encourage you to start thinking

about how you can transition from what

has been your codespace in the

cloud to something client-side like

using your own Mac or PC from here on out so

that you're not reliant on a course's

infrastructure a particular website and

even though we used a fairly industry

standard tool you can actually get

almost all of that stuff running with

some effort perhaps on your own Mac and

PC so terminal windows actually come

built into macOS if you go to your

applications folder utilities there is a

program literally called Terminal that

has always been there even if you've

never used it that will behave very

similar to what VS Code's does as well in

the world of Windows can you similarly

install a version of the terminal

window software that we used in the

cloud too to actually run similar commands

like cd and ls and much more we

would encourage you ultimately to learn

git you've been indirectly using git

this semester when you run certain

commands we have been using git

underneath the hood of some of CS50's

tools that essentially push your code so

to speak to the cloud to a place like

github.com but git itself is an

incredibly powerful and just useful tool

for one backing up your code somewhere

else to the cloud which is effectively

what we've used it for but two

collaboration so that you can actually

share your code more readily with other

people and three building much bigger

pieces of software where each of you

work on different files different

folders or even just different parts of

the same file and then somehow merge all


of your handiwork together at the end of

the day to build something much bigger

than you as one person could alone VS

Code itself now too we've been hosting

it in the cloud a real version of VS

Code but it's much more commonly used on

people's own Macs and PCs and you can

download it onto your own Mac and PC you

might have to jump through a few more

Hoops to get things like C working

though python is much easier to get

working as well some of the

configuration won't be quite the same

like your prompt might look a little

different and the like but that's just

going to be the case anytime you sit

down in the future at a different system

it's going to look and feel a little

different to things you've used before

but hopefully there'll be enough

familiarities that you can get yourself

up and running pretty quickly

nonetheless hosting a website not

necessarily something you have to do or

will do for your final project depending

on your proposal but there's lots of

ways to just host your own portfolio

page home page website whatever on the

internet itself using tools like these


GitHub or Netlify or other tools too

most of which have like free

student-friendly plans some of these are

indeed paid services but they very often

have entry level plans that are totally

fine if it's just you on the internet

and you don't expect having thousands

tens of thousands of users it's a drop in

the bucket for these companies and so

they very often have free tiers of

service if you want to host something

more dynamic something like CS50 Finance

that takes user input and output uses

sessions uses databases you might

like something like Heroku and for

instance we have some documentation on

one of cs50's websites for actually

moving your implementation of cs50

Finance over to this third-party

application called Heroku so you can

actually run it or something like it in

the cloud as well here too using a free

tier of service all of these providers

these are big cloud providers these days

Amazon Microsoft Google and others all

have student friendly accounts that you

can sign up for during or shortly after

you're in school that just give you

free compute time and storage GitHub

itself has this whole student pack that


by transitivity gives you access to a

whole bunch of discounts on other things

as well so if you're liking this stuff

and you just want to like learn more

perhaps over break by playing on your

own these then would be some good

starting points and as for just keeping

abreast of Trends in programming and

technology or the like there's so many

different blogs and websites out there

but here are just a couple of

different subreddits so to speak on

Reddit that are very programming

specific Stack Overflow with which

you've probably

interacted Server Fault which is similar

TechCrunch Y Combinator and other sites

too and ultimately we would encourage

all of you to stay in touch certainly

Beyond Today by the time you finish your

final projects we'll have something

waiting for you and if you want to stay

engaged either on the teaching staff or

just as a lifelong learner of Cs and

programming by all means check out any

of these URLs here but in just a few

weeks time will you have one of these to

your name your very own I took cs50

t-shirt which we will distribute before


long

as well and now if we may uh we have an

opportunity here to synthesize the past

several weeks of material if you would

like to go ahead and open up the URL

that we put on the screen earlier I'll

toss it up here again you can use your

phone or your laptop you might recall

for a previous problem set we asked you

to propose a whole bunch of review

questions multiple choice or the like

that synthesize the past several weeks

of material we took some of our favorite

submissions of those ported them to this

Poll Everywhere platform so that we

could interactively see where everyone's

minds are at where understanding is at and I

think you'll find all of these are

written by you and your classmates though

we slipped in a few fun ones that are also

written by you along the way Carter if

you want to come on up here to get us

ready if you haven't yet opened the

website go to this URL here on your

phone or your laptop

and let me go ahead and switch this over

here before Carter takes control of this

machine here

here's that same 2D barcode again feel free

to background that now and in just a


moment we've got a 20 question quiz show

it's all multiple choice so long as you

have internet access whether you're here

physically or online right now you

should be able to buzz in within 10 to

20 seconds of seeing a question and I'll

read each one aloud I think Carter we're

just about good to go so does everyone

have the software up and running on

their phone or their laptop if not no

big deal just look on with a friend but

otherwise Carter do you want to say

hello too and tee us up absolutely hey

everyone we're going to go ahead and get

started here with our first question

speed here matters so our first question

David go ahead all right what does CSS

stand for it's the first question

written by you four possible options are

cascading style sheet coding style sheet

cascading style system coded style sheet

15 seconds up to 300 responses already

both here in person and online

give folks a few more seconds what does

CSS stand for these are the four options

that were provided three two one Carter

cascading style sheets at 86 percent is

indeed the right answer so congrats to

those of you 86 percent who got that one


here's the leaderboard you all have

fairly random usernames but if your

username is on this board here or really

any of the 86 percent of you that just got that

right all of you are currently in the

lead but we'll see if this shifts before

long

question two which best describes the

role of a compiler is our next question

debug one's code run the written program

distinguish between functions and

arguments turn source code into machine

code

300 responses in so far 10 seconds to go

which best describes the role of a

compiler

three seconds just crossed 400 and

Carter

turning source code into machine code at

92 percent some excellent progress there

is indeed the correct answer and indeed

more generally a compiler just converts

one language to another the use cases

we've seen for it have been only source

code to machine code but as you go out

into the real world you'll actually find

there to be compilers from one source

code language to another source code

language that itself might be runnable

or compilable thereafter good job to all


of you guests and Carter number three

what is the type of argc asks a

classmate int str char float

[Music]

what is the type of argc

all right about 350 responses seven

seconds to go

about to cross the 400 threshold and

three two one the type of argc is

indeed int but we're now starting

to distinguish folks only 55 percent

there uh char is not correct you might

be thinking of argv in C but even that

is not a char it's a char star array or

a char star star in fact so it's not

just a char str is in Python but even

that too if you were thinking of sys.argv

that would be a list of strs not a

single str all right Carter let's see the

leaderboard
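To make the argc and argv answer concrete, here is a minimal Python sketch. The helper name describe_args is hypothetical and only for illustration; the point it shows is that the argument count is always an integer, while each individual argument in sys.argv is a str.

```python
import sys


def describe_args(argv):
    """Return the argument count and the type name of each argument."""
    argc = len(argv)  # the count is always an int, like argc in C
    kinds = [type(arg).__name__ for arg in argv]  # each entry is a str
    return argc, kinds


if __name__ == "__main__":
    # sys.argv is a list of str; its length plays the role of argc
    count, kinds = describe_args(sys.argv)
    print(count, kinds)
```

Running the file with extra words on the command line changes the count, but the types stay the same.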

all right the guests are all still

tied and number four what is the

searching efficiency of a balanced

binary search tree

Big O of n Big O of N squared Big O of

log n Big O of n log n

what is the searching efficiency of a

balanced binary search tree the balance

being key because as folks continue


buzzing in recall that binary

search trees can degrade devolve into

linked lists Big O of log n is correct

for 54 percent
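As a rough illustration of why balance matters, the sketch below builds two binary search trees from the same seven values, one inserted in a balanced order and one inserted in sorted order, and counts how many nodes a search visits. The names Node, insert, build, and search_steps are made up for this example, and the insert deliberately does no rebalancing.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value without any rebalancing and return the root."""
    if root is None:
        return Node(value)
    node = root
    while True:
        if value < node.value:
            if node.left is None:
                node.left = Node(value)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(value)
                break
            node = node.right
    return root


def build(values):
    root = None
    for value in values:
        root = insert(root, value)
    return root


def search_steps(root, target):
    """Return how many nodes are visited while looking for target."""
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if target == node.value:
            return steps
        node = node.left if target < node.value else node.right
    return steps


balanced = build([4, 2, 6, 1, 3, 5, 7])    # height 3, roughly log2(n)
degenerate = build([1, 2, 3, 4, 5, 6, 7])  # every node hangs to the right

balanced_steps = search_steps(balanced, 7)      # visits 4, 6, 7
degenerate_steps = search_steps(degenerate, 7)  # visits all seven nodes
```

With sorted input the tree is effectively a linked list, so the search is linear rather than logarithmic.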

all right now people are getting annoyed

but let's keep going number five

leaderboard's not yet that interesting

more subtle what was the CS50 duck's

Halloween costume he's here in winter

dress today thanks to Valerie a skeleton

a vampire Frankenstein or a ghost

what was its costume at Halloween a few

weeks back

answers are coming in a little slower

this time

people online or perhaps clicking on the

video

and vampire is correct at 69 percent nicely done

all right guests are still shuffled in

the top oh and we're starting to see

some leaders pull ahead the time in

which you buzz in is also taken into

account now in C how can we unify

several variables of different types

into a single new type

trees arrays structs tables

[Music]

in C how can we unify several

variables of different types into a

single new type


eight seconds

400 responses in

[Music]

450 and the answer is structs are indeed

correct recall that we had a student

struct and we saw structs later on for

nodes that allowed us to cluster

multiple variables or data types inside

of our own brand new structure that we

then typedef to a name Carter should we

see the leaderboard now

all right whoever guests 4045 and 4383 are

have eked ahead ever so slightly so

buzzing in fast can now benefit your

score too next question Carter in Python

which of the following statements is

false

tuples are an ordered immutable set of

data dictionaries associate keywords

with values arrays in Python are a fixed

size python is an object-oriented

language which of those statements is

false

[Music]

three seconds answers coming in more

slowly

but the most popular answer is correct

arrays in Python are indeed not of a

fixed size which is why that's false


they're not even called arrays they're

called lists and recall that they

dynamically grow and Shrink effectively

implemented for you as a linked list all

right how do we all right we have a

leader whoever 4383 is nicely done

what does strcmp return in C

s-t-r-c-m-p does it return a Boolean an

integer a string or a char

[Music]

what does strcmp return in C used to

compare two strings of course

recall that it returns

potentially not just true false but ooh

an integer is indeed correct does anyone

recall why why is it an int and not just

a simple true false

why are three values helpful

exactly it returns zero if they're equal

or returns a negative value or a positive

value based on whether one string comes

before or after the other ASCIIbetically

so to speak based on its ASCII codes
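The three-way result can be sketched in Python as follows. This is not C's strcmp itself, just a hypothetical compare function written in its spirit: it returns zero for equal strings and otherwise a negative or positive integer. As in C, only the sign of the result should be relied on, not its exact magnitude.

```python
def compare(s, t):
    """Return 0 if s equals t, a negative int if s sorts first, else positive."""
    for a, b in zip(s, t):
        if a != b:
            # Compare by character code, so uppercase sorts before lowercase
            return ord(a) - ord(b)
    # One string is a prefix of the other (or they are equal)
    return len(s) - len(t)


same = compare("apple", "apple")       # zero
before = compare("apple", "banana")    # negative: "a" comes before "b"
after = compare("banana", "apple")     # positive
case = compare("Zebra", "apple")       # negative: "Z" has a lower code than "a"
```

A single Boolean could only say equal or not equal; the sign is what makes the function usable for sorting.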

the results Carter all right 4383 still

doing quite well but

being caught up with here what is David

Malan's phone number

949-468-2750 play when you call it

the Harvard alma mater a parody of Yale's

song a recording of David Malan singing


Never Gonna Give You Up

feel free to call or text I can't get it

now but we have nicely automated that

process

four seconds 400 responses in and the

answer of course is Never Gonna Give You

Up thanks to a little programming and a

script that our friend Rongxin wrote

that essentially answers the phone

automatically and replies with a URL or

a song Carter

[Music]

oh

dethroned dethroned 2688 nicely done next

question

from which of the following places does

malloc get free memory for a program to

use

heap stack array or pointer

from which of the following places does

malloc get free memory for a program to

use

[Music]

answers are a little slower this time

five seconds

[Music]

and the answer is in

okay that's the answer we were given in

the problem set but I think uh we would


beg to differ pretty sure Carter would

you go with I would go with the heap I

think it's indeed the heap so this

answer is not correct

I know I know we just transcribed what

you gave us though let's see how that

affects the scores

okay 2688 is still doing okay next

question about 10 or so to go suppose I

have an unsorted list of items store

receipts perhaps should I sort the items

before searching for an element

yes you should always sort before

searching no you should never sort

before searching if you will be

searching the list many times then yes

you should sort first if you will be

searching the list many times then no

you should not sort first

some nuanced replies five seconds

a few fewer answers than usual at this

point and if you will be searching the

list many times then yes you should

sort first an example that we discussed

of trade-offs because if you're just

going to do a one-off search and never

again why bother incurring n log n or n

squared time to actually sort the thing
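The trade-off can be sketched in Python like this, assuming a small list of receipt numbers invented for the example. A one-off lookup just scans the unsorted list, while repeated lookups pay the sorting cost once and then use binary search via the standard bisect module.

```python
from bisect import bisect_left


def linear_search(items, target):
    """O(n) per search, no sorting required."""
    for item in items:
        if item == target:
            return True
    return False


def binary_search(sorted_items, target):
    """O(log n) per search, but only valid on a sorted list."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


receipts = [42, 7, 19, 3, 88, 61]  # hypothetical unsorted receipts

# One-off search: sorting first would cost more than it saves
one_off = linear_search(receipts, 19)

# Many searches: sort once (O(n log n)), then each lookup is O(log n)
sorted_receipts = sorted(receipts)
hits = [binary_search(sorted_receipts, t) for t in (3, 19, 50, 88)]
```

How many searches it takes before sorting pays off depends on the size of the list, so this is a sketch of the idea rather than a rule.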

all right some shuffling happening but

2688 nicely done next question


when you run the CREATE INDEX command in

SQL what type of data structure do you

create arrays B-trees linked lists hash

tables

when you run create index recall we did

this with like the movie titles the TV

show titles to speed things up so that

things wouldn't be super long and linear

we did a different data structure

all right about 400 responses in the

answer

is indeed B-trees B-trees not to be

confused with binary trees a B-tree

typically has other children besides two

that pulls the data even higher up from

the leaves of the tree could use a hash

table could use a linked list but indeed

the technology in databases is

generally these things called B-trees

certainly in SQLite Carter
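One way to see the effect of CREATE INDEX is to ask SQLite for its query plan before and after creating the index. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name shows, the index name title_index, and the helper plan are assumptions made for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("Game of Thrones",), ("Stranger Things",)],
)


def plan(query, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = db.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    return " ".join(row[3] for row in rows)  # the fourth column is the detail text


query = "SELECT * FROM shows WHERE title = ?"

before = plan(query, ("The Office",))  # a full scan of the table
db.execute("CREATE INDEX title_index ON shows (title)")
after = plan(query, ("The Office",))   # a search that uses title_index
```

Printing before and after shows the plan switching from a scan of the whole table to a search through the index.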

oh dethroned but 4179 has now pulled

ahead nicely done next question

what HTTP status code means I'm a teapot

000 418 007 128

this recall was an April Fool's joke by

technical people uh some years ago that

has become part of computing lore

it's still there though in the document


in two seconds we'll know that it's 418

indeed let's see how that affected

things

4179 is way down on the

list 7280 is number one now

nicely done what is an example of a SQL

injection attack when someone submits

malicious SQL commands via web form

physically destroying a computer

hardware that stores a SQL database

overwhelming a server with thousands of

requests to access a database injection

attacks are only in movies or TV

five seconds some fun answers

400 responses about in and indeed when

someone submits malicious SQL commands

via a web form because you the

programmer are not escaping the code

using the question mark syntax that

we've seen using CS50's library or other

third-party libraries like it Carter
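Here is a minimal sketch of that attack and its fix, using Python's built-in sqlite3 module rather than CS50's library. The users table, the login_unsafe and login_safe functions, and the sample credentials are all invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
db.execute("INSERT INTO users (username, password) VALUES ('alice', 'secret')")


def login_unsafe(username, password):
    # Vulnerable: user input is pasted directly into the SQL text
    query = (
        f"SELECT id FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return db.execute(query).fetchall()


def login_safe(username, password):
    # Question-mark placeholders: input is passed as data, never as SQL
    return db.execute(
        "SELECT id FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchall()


malicious = "alice' --"  # closes the quote, then comments out the password check

unsafe_rows = login_unsafe(malicious, "wrong")  # logs in without the password
safe_rows = login_safe(malicious, "wrong")      # no match, as it should be
```

With placeholders, the whole malicious string is compared against the username column as plain text, so the trailing comment never reaches the SQL parser.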

7280 is still the guest to beat nearing

the end few more questions how are the

elements of an array stored in memory

contiguously in random locations that

happen to be available as a linked list

as a binary tree

how are the elements of an array stored

in memory

[Music]
about five seconds to go almost have

everyone in

two

one and contiguously is indeed the right

answer back to back to back in random

locations that happen to be available

is probably describing your use of

malloc in the heap but you would then

need a linked list or some other

structure to stitch those locations

together an array by definition is

contiguous Carter
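Python lists hide their memory layout, but the standard ctypes module can create a C-style array whose addresses can be inspected. In this sketch the variable names are arbitrary; it computes where each element should live if the array is contiguous, then reads an int directly from each of those addresses.

```python
import ctypes

numbers = (ctypes.c_int * 4)(10, 20, 30, 40)  # a C-style array of four ints

base = ctypes.addressof(numbers)    # address of the first element
size = ctypes.sizeof(ctypes.c_int)  # bytes per element

# If the array is contiguous, element i lives exactly i * size bytes past base
addresses = [base + i * size for i in range(len(numbers))]

# Read an int straight from each computed address
values = [ctypes.c_int.from_address(address).value for address in addresses]
```

Reading back 10, 20, 30, 40 from those back-to-back addresses is what contiguous means in practice.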

7280 is hanging on to that lead by about

uh 499 points next up is which SQL query

would allow you to select the ID of a

specific movie star Zendaya in a table

of movie stars

select ID where name equals Zendaya

select star ID for movie stars where

name equals Zendaya select ID from

movie stars where name equals Zendaya

select ID from movie stars where name

equals quote unquote Zendaya

and I I'm spoiling it uh I should have

read out some quotes earlier too one

second

the last one is correct and indeed this

one's almost correct but lacks the

single quotes Zendaya is not a single uh


it's not a SQL keyword it's of course a

string so it does need to be quoted

there but 63 percent of you realized that 7280 is

still in the lead I think we have a few

more questions to go
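To see why the quotes matter, the sketch below runs both versions of the query against an in-memory SQLite database via Python's sqlite3 module. The table name movie_stars and its two rows are assumptions for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE movie_stars (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany(
    "INSERT INTO movie_stars (name) VALUES (?)",
    [("Zendaya",), ("Tom Holland",)],
)

# With single quotes, Zendaya is a string literal
quoted = db.execute(
    "SELECT id FROM movie_stars WHERE name = 'Zendaya'"
).fetchall()

# Without quotes, SQLite tries to treat Zendaya as a column name
try:
    db.execute("SELECT id FROM movie_stars WHERE name = Zendaya")
    unquoted_error = None
except sqlite3.OperationalError as error:
    unquoted_error = str(error)
```

The unquoted version fails because there is no column called Zendaya in the table.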

why is a hash table faster to search

than a linked list even though the run

time for both is Big O of n

the hash table actually has a big O of N

squared run time the hash table

optimally has Omega of 1 run time the

hash table creates shorter linked lists

to search rather than one long linked

list the hash table takes less memory

and this was an example of practical

versus theoretical differences

and indeed that was interesting with 83

percent of you buzzing in the hash table

creates shorter linked lists ideally if

you have a good hash function rather

than one long linked list even though

technically it's still in Big O of n
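A small sketch of that practical difference: the hash table below uses 26 buckets keyed on the first letter of each name, with a Python list standing in for each bucket's linked list. The HashTable class, its first-letter hash function, and the list of names are assumptions made for this example, and the hash assumes non-empty names.

```python
class HashTable:
    def __init__(self, buckets=26):
        # Each bucket is a chain; a Python list stands in for a linked list
        self.buckets = [[] for _ in range(buckets)]

    def _index(self, word):
        # Simplistic hash: first letter of a non-empty word, a..z -> 0..25
        return (ord(word[0].lower()) - ord("a")) % len(self.buckets)

    def insert(self, word):
        self.buckets[self._index(word)].append(word)

    def search_steps(self, word):
        """Return (found, number of entries compared)."""
        steps = 0
        for entry in self.buckets[self._index(word)]:
            steps += 1
            if entry == word:
                return True, steps
        return False, steps


names = ["Albus", "Cedric", "Draco", "Fred", "George",
         "Harry", "Hermione", "Luna", "Ron"]

table = HashTable()
for name in names:
    table.insert(name)

# One long chain: Ron is the ninth entry, so nine comparisons
list_steps = names.index("Ron") + 1

# Hash table: jump to the "r" bucket, then walk a much shorter chain
found, hash_steps = table.search_steps("Ron")
```

If every name happened to start with the same letter, all of them would land in one bucket and the search would be linear again, which is why the worst case is still in Big O of n.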

7280 seemed to know that and is pulling ahead

of the crowd still a few questions

Game of Thrones is a dot dot dot comedy

drama historical fantasy documentary

romance sci-fi or all of the above

this is written by your classmates

recall based on

our SQL week


in five seconds

we'll be reminded that according to our

CSV file they were all of the okay all

of the above

all right 7280 did okay with that next

question which of the following is a

golden rule when allocating memory

every block of memory that you malloc

must be freed only memory that you

malloc should be freed do not free a

block of memory more than once all of

the above

more into the nuances of C this Golden

Rule when allocating memory didn't have

to worry about this in Python we did in

C in two seconds we'll know that all of

the above are indeed things you must do

not doing those would be in fact bugs

Carter the leaderboard

still doing well 7280 whoever you are

last few questions last question in fact

last question what do the binary bulbs

on stage spell today the answers could

be

uh face with medical mask face with

tears of joy snowman without snow or red

heart

what do the binary bulbs on stage spell

six five four three


[Music]

the answer is the red heart taking a

look at the leaderboard here who's our

winner

[Music]

the winner is hello guest 3487 a big

round of applause for our guest thank

you to Carter

so it's it's nice that there's some

opportunity here

um because recall that in week zero we

did start talking about emoji and really

about data and representation and we

talked not about just binary but ASCII

and then Unicode and then when we had

Unicode we had all of these additional

bits that we could play with and we

could start to represent not just

letters of the English alphabet as in

ASCII but really letters of any human

alphabet and even alphabets that are

continuing to develop and indeed this

was face with medical mask which we

claimed at the time was just how a Mac

or PC or Android phone or iPhone

nowadays would interpret and display a

pattern of bits like this this happening

to be for the four bytes that represent

that particular emoji and over time

humans have been deciding to use


different patterns for new and uh new

emojis that might not have existed

yesterday and indeed most anytime you

update your Mac or your PC or your phone

these days at least on a semi-annual

basis are you getting some new and

improved emojis and they're not just

these faces now they're of course

representing different human emotions

different physical objects and

ultimately among the Unicode

consortium's goals is to be able to

represent all human languages but were

it not for certain groups of people and

certain individuals these things would

all rather look fairly similar and

indeed today we're so pleased to be

joined by an old classmate of mine

Jennifer 8. Lee who was class of '99 here

at the college who's gone off to do many

many different things in life

prolifically so not only has she been a

writer an author a journalist for the

New York Times a producer of films like

The Harvard Computers The Search for

General Tso and The Emoji Story which

focuses on exactly today's topic Jenny

and her colleagues have been involved

particularly with um championing


representation of different types of

people and cultures and languages and

these are just a few of the emojis that

our friend Jenny has indeed brought into

creation on our phones and laptops Jenny

too is the original inspiration for what

has become it seems my Twitter

recommendations and all of these puppets

I was visiting her in Manhattan one time

some years ago she had on her shelf a

couple of puppets known as Muppet

Whatnots at the time you could go to FAO

Schwarz or the website thereof an old

toy store and you could actually

configure your very own Muppets and I

thought this was the coolest thing and

literally on the cab ride home from her

place was I logging into the website

configuring a couple of Puppets a couple

weeks later they arrived and then rather

sat on my shelf for a couple of years as

I wondered why I had just bought two

Muppets in the back of a cab but brought

them into the office at one point a

colleague saw them Drew inspiration from

them and now have they been woven really

into the fabric of this course in

particular and a lot of the courses

pedagogy at least incarnated here just

for fun but also in video form as well


which is only to say so glad that our

friend Jenny 8. Lee is here for us

today to talk about these Emoji Jenny

all right

well this is very exciting I took cs50

in 1994

um to give you a sense one of my block

mates was the first intern for Netscape

if you guys have ever heard of Netscape

and I graduated just as Google was come

like we did not have Google when we were

undergrads so

um it's an honor obviously to be in um at

cs50 it's also very impressive to see

how David has turned it from uh

entry-level computer science course into

a Lifestyle brand that is world renowned

so it's an honor and I'm going to talk

to you today about how an emoji becomes

an emoji

um so first I'm going to talk about my

journey down the rabbit hole of how I

got involved with Emoji so this is my

friend Yiying Lu she is a designer famous

for designing the Twitter Fail Whale

which was like this kind of image that

popped up

when Twitter went down which back in the

day was rather often so she's Chinese


Australian American so which is like a

weird interesting combination and so one

day we were texting about dumplings

because that is what chinese-ish women

do we talked about food and so I sent

her this picture of dumplings and then

she said yum yum yum yum yum

um you know knife and fork knife and

fork knife and fork and then she was

like oh I'm surprised that Apple doesn't

have a dumpling emoji and I'm like oh

yeah that's kind of weird and you know

it's one of those things where

you know the thought comes to your head

and then it leaves I I was you know

which is sort of an observation but then

half an hour later on to my phone pops

up this like dumpling Emoji with hearts

actually you can't see it here but it

actually had like blinking eyes so she

called it bling bling dumpling

she's a designer so she decided she was

gonna fix

um this like lack of dumpling Emoji

problem and I was actually like really

puzzled like how could there be no

dumpling Emoji right

because you know I knew that emoji were

originally Japanese this by the way was

back in 2015 so Japanese foods super

well represented on the emoji keyboard

you have ramen you have tempura you have

curry you have actually bento box curry

then tempura you have you even have like

kind of slightly weird foods like

um

let's see you had these like

things on a stick which are fish cakes I

discovered then you have this white and

pink swirly thing which is also a fish

cake you even have this like triangle

thing that looks like it's had a bikini

wax but in essence there were all these

foods that were on the keyboard but

there was no dumpling right

and I was like dumplings are this kind

of universal food like every culture has

some version of a dumpling whether or

not it's empanadas or ravioli or

um God what else ravioli pierogi Momos

you know the whole idea is all cultures

have basically found the idea like this

concept of like yummy goodness within a

carbohydrate shell whether or not it's

baked or steamed or fried so dumplings

are Universal Emoji I didn't use them

that much but I was like they're also

kind of universal so the fact there was

no dumpling emoji told me like whatever


system

was in place failed and I actually had

no idea I was like who controls Emoji

I'm going to go fix this problem like

there's something wrong with the

universe if there's no dumpling emoji

and I took it upon myself to like go fix

that so I Googled

um and I basically discovered there was

this thing you know called the Unicode

Consortium which is a non-profit based

in count uh let's see Mountain View

California that when I looked had these

like 12 full voting members so this is

late 2015.

of those 12 nine were

multinational U.S tech companies so

there's Oracle IBM Microsoft Adobe

Google Apple Facebook and Yahoo so these

were

um eight I think and then

you had the German software company SAP

the Chinese company called Huawei and

then the government of Oman so these

were like basically the people who were

in charge and had full voting power on

unicode

so they paid eighteen thousand dollars a

year

um to have this full voting power which


is a lot of money I was like kind of

very indignant on like how this cabal of

tech companies basically control This

Global

um curated image based language on your

keyboard so there was a little bit of a

kind of loophole which is you could you

could pay $18,000 a year to have full

voting power or

um you can pay $75 a year as an

individual you had no voting power but

you had the ability

to sign up for the email list and also

show up at the meetings so put in my

credit card got an email list

and

like

was like kind of checking my email one

day when there was an invite that said

they were going to have a quarterly

meeting and I think this was going to be

October 2015.

and I looked it was in Sunnyvale I

looked at my calendar I looked at you

know the point that I was actually going

to be able to be in Silicon Valley at

that time so I took a bus to Apple where

they were having that meeting and I

don't know completely what I thought I


was going to see like I think maybe it

was going to be like maybe like a

Sanders Theater or like a little mini

Congress like people making Emoji

decisions but that was not what it was

basically this is the room where it

happens these in 2015 were the people

who were deciding Emoji you know these

were Emoji decision makers which were

not like the most demographically kind

of um diverse group they had a sense of

humor about it one guy had a shirt that

said shadowy Emoji Overlord

and so I decided along with my friend

Yiying Lu to create a group called

EmojiNation whose motto is Emoji by the

people for the people and it kind of

kind of brought the voice of like the

normal world into the decision-making

chain

so

um you know we launched a little

campaign about dumpling emojis we made a

Kickstarter video

um let's see dumplings are one of the

most universal cross-cultural Foods in

the world Georgia has khinkali Japan has

gyoza Korea has mandu Italy has ravioli

Poland has pierogi Russia has pelmeni

Argentina has empanadas Jewish people


have kreplach China has potstickers

Nepal and Tibet have Momos yet somehow

despite their popularity there is no

dumpling Emoji in the standard set why

is that emoji exists for pizza tempura

Sushi spaghetti hot dog and now tacos

which Taco Bell takes credit for we need

to right this disparity dumplings are

Global Emoji are Global isn't it time we

brought them together

oh yeah oh while we're at it how about an

emoji for Chinese takeout

[Music]

um

this is

in 2015 I wrote a dumpling Emoji

proposal this is it

um you know kind of different styles

like whether or not it's a head-on view

or a slightly diagonal view

um and so that's Yiying with the then

um one of the co-chairs of the Emoji

subcommittee

and so along with dumpling we also did

take out box we got Chopsticks and then

fortune cookie which actually I have to

be honest I don't think fortune cookie

would have gotten in on its own merits

were it not on the coattails of the other


three so we got these four through

um and you know that that is how they

look today and I have to say that that

dumpling looks really photo realistic in

the Apple World unlike

the fortune cookie which has like no it

looks like a dead Pac-Man I don't know

what is going on with that

um design but uh so very proud you know

I also did a lot of research on Chinese

food in America and wrote a book called

The Fortune Cookie Chronicles produced a

documentary called The Search for

General Tso so like I had a lot of moral

Authority on the issues of uh you know

Asian food in America not all things but

this one I felt like I had like made a

mark on the 2500 year history of emoji

oh sorry of uh dumplings by moving them

into emoji so it kind of gets in this

very complicated thing like how does an

emoji become an emoji and it's actually

fairly complex

um so let's say you have an idea for an

emoji you write a proposal

and then you submit it to the Emoji

subcommittee

um that then like debates and thinks

about it sometimes they have feedback

and they kick it back to you and if so


then you have to revise it and it kind

of goes around around in a circle and

and then they once they're happy with it

they kick it to the full Unicode

technical committee which is sort of

like a sort of a governing body within

Unicode on things Technical and encoding

so what are the kinds of things that

impact

um whether an emoji can be an emoji so

one is there popular demand is it

frequently requested

um

and at this point one of the very crude

ways that we measure is if you search

for it on Google does it have more than

500 million uh kind of like results

which is what elephant gets in English

and that's sort of like a median Like

Elephant is like kind of right in the

middle of like popular emoji and not

popular emojis so we use that as a

benchmark

there's a plus if there's multiple

usages and meanings for example

um

like sloth

that was an emoji that we did it also

you know it's both literally kind


of an emoji of an animal but it also has

lots of connotations so if something has

lots of multiple meanings that kind of

gives it a bump

um one thing is visually distinctive

like does it work at little tiny Emoji

sizes and that's actually really hard

because there's some things that I think

could have been Emoji but don't

completely work when you try to shrink

it down and I'll give some examples of

that later and then

kind of filling the gap or completeness

is another Factor so for a long time we

had Red Heart yellow heart green heart

blue heart Purple Heart there was no

orange heart and so there was

um a gay designer from Adobe who was

like actually very heartbroken by that

so he had been substituting the pumpkin

to get the orange to get the rainbow and

so he proposed an orange heart and that

was you know obviously at that point

you're like yes that will complete a set

and another thing is is it already

something that you know one of the

companies

um has and therefore everyone else

is going to like adopt it and so a good

example for that is


um the binary I think it was a

non-gender binary Emoji the pink blue

and white flag so I have to say WhatsApp

is by far one of the most Rogue

um platforms so they just like randomly

like added it one day and we just

noticed it and we're like oh God given

that they have to do it now we have to

build it into the standard

um so factors of exclusion

or against inclusion to be more PC

sometimes if it's too specific or narrow

um that works against being included so

poutine which the Canadians love is kind

of really specific and I know it's

really important to the Canadians but it

just kind of didn't have enough sort of

global appeal if it's redundant so an

example for that is a couple years ago

Butterball proposed like a roasted

turkey Emoji but we already had like an

unroasted live emoji of a turkey so it

wasn't clear that we needed the cooked

version to go with like the live version

so that didn't pass

um not visually discernible so this

one's actually really tricky

um and knocks out a lot of things so it

knocked out kimchi for example really


hard to do kimchi at Emoji sizes like

how do you is it in the jar is it like you

know just sort of in a little bowl so

kimchi kind of got kind of died on that

another one that was really hard was

cave Emoji actually really hard at

Emoji sizes and then this is interesting

no logos Brands deities or celebrities

and this is a new policy we just

introduced which is no more flags flags

were killing us in terms of all kinds of

complicated reasons and there was much

regret that we ever added flags and

um and and lots of politics so at this

point we're no more Flags

so once it kind of gets passed into the

the full Unicode technical committee The

Proposal gets voted on like once a year

and then they pass all the emoji for the

next year we just actually did that a

couple weeks ago and it takes a while it

gets sent to all the companies like

Apple Google Adobe Facebook and then

they add it to all your devices and

and then ta-da it takes about 18 to 24

months from when you first have

your proposal

to when it lands onto your devices so

EmojiNation has worked on a bunch of

emoji and so we've kind of shepherded those


through so one of the interesting

questions is why is it that Unicode

controls Emoji so a lot of it has to

um kind of has to do with the

history of emoji they were originally

popularized in Japan there was a very

one of the initial sets is from 1999

from DoCoMo these were actually recently

collected by the Museum of Modern Art in

New York City

um and so all the the Japanese

um vendors had these like little glyphs

that they added to their character set

and the main problem is like if you were

DoCoMo you had like you know one set if

you were on SoftBank you had another set

so no matter what you couldn't you could

only kind of text the people who are on

your platform not across platforms and

that was a real big problem when Apple

and Google started introducing

smartphones into

Japan and there was sort of this kind of

understanding and expectation that if

you if you did something in your

smartphone you also want it to show up

in email and be sent into you know Into

The Ether and someone else is supposed

to get the same uh image that you sent


so that was not the case so in 2007 they

went to Unicode and asked them to like

basically unify the Emoji set and

unicode is interesting because its

mission is to enable everyone

uh speaking every language on Earth to

be able to use their language on

computers and smartphones and they

actually see this as a human right

because at a certain point if your

language cannot be captured digitally

it's going to disappear so you know they

spend a lot of time doing Chinese Arabic

uh Cyrillic in the very early days

in 2001 they actually had a proposal for

Klingon which they did not actually

accept at that point so they have three

major projects they encode characters

including Emoji that's actually what

they're most famous for they also have a

bunch of localization resources so

um that's like you know in this country

they use this as a currency and they use

this kind of

um time format and like it's you know

whether or not it's month month date

date year year in some countries it's

you know date date month month year

year in other countries so they kind

of tell you what country cares about


what and then they also then have the

libraries

um so that no one's basically

programming things from scratch
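The locale data described here can be pictured as a lookup table keyed by region. What follows is a minimal sketch in Python, assuming a tiny hand-written table: the `LOCALE_DATA` entries and the `format_for` helper are hypothetical names made up for illustration and are not taken from CLDR itself, whose real data is far richer.

```python
from datetime import date

# Hypothetical, hand-written stand-in for locale data (NOT real CLDR data).
# Each region records how it conventionally orders month, day, and year,
# plus the currency code it uses.
LOCALE_DATA = {
    "US": {"date_pattern": "%m/%d/%Y", "currency": "USD"},
    "GB": {"date_pattern": "%d/%m/%Y", "currency": "GBP"},
    "JP": {"date_pattern": "%Y/%m/%d", "currency": "JPY"},
}


def format_for(region: str, day: date) -> str:
    """Format a date using the (illustrative) pattern stored for a region."""
    return day.strftime(LOCALE_DATA[region]["date_pattern"])


if __name__ == "__main__":
    halloween = date(2015, 10, 31)
    for region, data in LOCALE_DATA.items():
        print(region, format_for(region, halloween), data["currency"])
```

The point of shared data like this is that a program asks the table how to format a date instead of hard-coding one country's convention.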

so what's really funny is if you say CLDR

really fast it sounds like seal deer and

this really confused one of the

girlfriends of one of the engineers why he

was always talking about seal deers and

so she uh basically surgically attached

a bunch of antlers to this little guy

um and made a seal deer and so it took

three years between 2007 to 2010 to

introduce the first Unicode Emoji set so

this these were the ones that kind of

came out it took many many years to

figure out like how to reconcile all the

different images and like which one

should we include which ones we

shouldn't include

um and as you guys probably know from

cs50 a Unicode code point is a unique

number assigned to each Unicode

character

so you can represent that emoji face

with tears of joy as this or

this or the binary code
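The slide itself is not in the transcript, so as a sketch only, here is how the same emoji can be shown in several forms in Python. It assumes the forms on screen were the hexadecimal code point, the decimal number, and the binary of its UTF-8 bytes; the `describe` helper is a name made up for this example.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return several equivalent views of a single Unicode character."""
    code_point = ord(ch)  # the unique number assigned to the character
    return {
        "name": unicodedata.name(ch),
        "hex": f"U+{code_point:X}",
        "decimal": code_point,
        # UTF-8 is one common way to store the code point as bytes
        "utf8_bits": " ".join(f"{b:08b}" for b in ch.encode("utf-8")),
    }


if __name__ == "__main__":
    for key, value in describe("\U0001F602").items():
        print(f"{key}: {value}")
```

All of these are the same character; only the notation changes.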

so Emoji were just kind of hanging out on

your phone
after 2010 until 2011 when Apple

suddenly made them much easier to access

on your phone and one of the kind of

confusing things of course is like Emoji

are very ambiguous and it's not always

clear what they mean and that's one of

the great Joys right it's it can be more

um there's there's much more

interpretation on on on in terms between

the sender and the receiver so if you

actually look if you start doing that on

Google the the autocompletes are like

what does it mean when a guy sends it to

you what does it mean when the girl

sends it to you and

um clearly many many people have been

confused by that emoji when it's been

sent to them so who can propose emoji

and the short answer is basically anyone

uh there's a Google form that is open

between April and August

um so the hijab Emoji actually was

originally proposed by a 15 year old

girl who is Saudi Arabian but lived in

Germany

who actually got into Harvard and then

chose Stanford so I'm always giving her

a hard time about that

she wrote the proposal and it went

through and she's actually the subject

of the documentary that we put together

called The Emoji Story we also have a

group of Argentinians who fought really

hard for the mate Emoji their national

drink and then there was this non-profit

for girls advocacy that really wanted a

menstruation emoji and they sent in this

bloody Underpants proposal which is like

really terrible I'll be honest so we

kind of worked with them and got blood

drop which actually is one of like

actually has done pretty well statistically

like we were kind of surprised

actually how popular it is

um the skin tone Emoji were actually

proposed not from within Unicode clearly

it was done by a mom from Houston who's

also an entrepreneur because her

daughter asked her came home one day and

said

um

I'd really like an emoji that looks like

me and her mom Katrina Parrott was like

that's great honey what's an emoji and

so but she actually had worked in

procurement with NASA and so she

understood forms and proposals and


she actually was the one we should thank

for having like five skin tones today

um women's flat shoe and sort of the

one-piece bathing suit and as opposed to

just the you know uh yellow you know

teeny weeny yellow polka dot bikini was a

mother of three now four who

um just wrote that because she was very

offended that all of the shoe Emoji had

high heels for women

um I actually really like this guy some

random guy in Germany came up with this

uh Emoji as we like to say it's the Colbert

Emoji he wrote a proposal and got

accepted because it was a really good

proposal then you even have governments

The Finnish government

like literally the Finnish government

their equivalent of the Department of

State uh proposed a sauna Emoji which

these are the images and I think they're

really ugly for I mean they're all

there's there's so many problems with

this Emoji but we helped them as

EmojiNation first we like got rid of the club

feet and then you know gave them you

know sort of examples like you know do

you want them to hold the ladle do you

want like the the sort of steam around

it do you want like it um


you know with like

clothing or not clothing we actually did

um a little you know a little bit of a

towel for the more modest in us so it

got passed and then the way it ended up

is basically person in a steamy room so

this is how it kind of evolved so you

can see that is what Finland kind of

submitted that is what we submitted and

then that is how it's ended up on your

phone and that is basically supposed to

mean sauna Emoji

um so one of the questions is like why

do I care so much about emoji and

representation of emoji and a lot of it

has to do with the fact that I grew up

speaking Chinese and like going to

Sunday school or Saturday

um Chinese School

and

and as you can see there's sort of like

some really interesting parallels

between modern day emoji and like

Chinese radicals and characters from a

long time ago so this is fire this is

mouth this is tree this is moon this is

Sun uh you can mix and match them in

Chinese as well so one of the

interesting ones is like you know two


trees together

basically makes a forest you have like a

sun and a moon together and that means

bright in Chinese

it's kind of fun then

um this one's fun right so it's

basically a pig underneath

a roof

so you're like oh maybe that means Farm

or like I don't know like a barn or some

kind of like animal thing but actually

that in Chinese means home jiā or family

so like home is where your pigs are

which I think says a lot about

society and what people cared about way

back in ancient China

um

this is one of my favorites so this is a

character for woman or female nǚ and I

guess it kind of looks like this like

you know she's like curtsying or

something so

um super interesting character if you

like grow up

like you know writing your characters

you know so

um so this is a woman underneath a roof

and you're like oh that might mean like

wife or family or something but um it

actually doesn't it means peace ān so


the idea is like things are at peace

when the woman is under a roof which I

always thought kind of like I felt like

kind of weird about that growing up

um another one is okay there's a woman

and then you have a child or Boy Child

specifically

so you're like oh that might mean family

or mother or something but actually it

means good so the standard for good in

ancient China was a woman with a Boy

Child which I thought was also you know

as a six-year-old was I found

problematic as well

um and all kinds of interesting things

in Chinese use the female radical so

three women together means evil this one

means greedy this one means slave

this one means jealous

this one means betrayal or adultery

which I think is interesting

so in case you want to bring this to

your favorite 10 year old we have a

Chinese and emoji uh kids book

coming out from MIT press in the fall

called Hanmoji so it's from

MITeen Press it's super fun so it's a lot

of these Concepts so like a little bit

more rigorous and


um this idea of

like gender in emoji was really

important to a bunch of us as we were

kind of working through the issues so

for a long time you know on the Emoji

Keyboard there are all kinds of jobs you

could have as a man like you could be a

police officer you could be a detective

you could be a Buckingham Palace guard

you could even be Santa you could be

Black Santa right but as of 2015

if you're a woman there are only four

jobs you could have on the emoji

keyboard so you could be a princess you

could be a bride you could be a dancer or

you could be a Playboy bunny so those are

your like four choices and

um so we worked really hard on like

trying to diversify what women could be

and one of the ways we did it was

through this idea of like combining

Emoji so in Emoji Land there's something

called ZWJ a zero width joiner and a

lot of emoji you see are actually glued

together

so the rainbow plus flag is how you get

rainbow flag and this is actually how

we've worked on introducing a bunch of

the

um the occupations in Emoji Land so a


lot of these are like you know the chef

is a woman plus

um like the frying pan or a teacher

is a woman Plus or a man actually plus a

school
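The gluing described here can be sketched in Python by placing the zero width joiner, U+200D, between existing emoji. The code points below are the standard ones as far as I know; the helper names `zwj_join` and `code_points` are made up for this example. Note that the rainbow flag's base is the white flag followed by a variation selector, which the talk shortens to "flag".

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER: invisible glue between emoji

WOMAN = "\U0001F469"
MAN = "\U0001F468"
COOKING = "\U0001F373"     # frying pan
SCHOOL = "\U0001F3EB"
WHITE_FLAG = "\U0001F3F3\uFE0F"  # white flag + emoji variation selector
RAINBOW = "\U0001F308"


def zwj_join(*parts: str) -> str:
    """Glue emoji together with zero width joiners."""
    return ZWJ.join(parts)


def code_points(text: str) -> list[str]:
    """List the code points that make up a string, in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text]


woman_cook = zwj_join(WOMAN, COOKING)
man_teacher = zwj_join(MAN, SCHOOL)
rainbow_flag = zwj_join(WHITE_FLAG, RAINBOW)

if __name__ == "__main__":
    for emoji in (woman_cook, man_teacher, rainbow_flag):
        print(emoji, code_points(emoji))
```

On a platform that supports the sequence, each result renders as a single picture; on one that does not, the parts typically show up side by side.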

um and

so one of the interesting things is you

can actually have

um

as a result of all the gender parity

stuff we actually had to make male and

female versions of all the Emoji because

some of them originally were passed as

like man in tuxedo and now because we

had gendered versions of everything we

now have woman in tuxedo I don't know

if you noticed there's also a man in a

wedding dress to kind of complement

the woman in a wedding dress

um there's now actually also bearded

woman I don't know if you've noticed

that

so it gets interesting because

originally we had passed

woman breastfeeding and then there was

like all this like complaints coming

into Unicode about what about men as

caretakers you can't actually tell she's

breastfeeding it's more just like she's


holding it so people were like what

about the man As a caretaker like

paternity leave and so

um there was now like like man kind of

nursing the child and

um

the other kind of ways you can combine

the Emoji are through skin tones so

unfortunately those are not through

ZWJs this is sort of older kind of

technology where all the skin

tones are basically the yellow character

plus like a little square box at the end

we call them skin tone modifiers
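Skin tones can be sketched in Python as a base emoji followed directly by one of five modifier characters, with no joiner in between. The modifier code points U+1F3FB through U+1F3FF are the standard ones as far as I know; the dictionary keys and the `with_skin_tone` helper are names made up for this example.

```python
# The five skin tone modifier characters (U+1F3FB..U+1F3FF).
# The key names are informal labels chosen for this sketch.
SKIN_TONE_MODIFIERS = {
    "light": "\U0001F3FB",
    "medium-light": "\U0001F3FC",
    "medium": "\U0001F3FD",
    "medium-dark": "\U0001F3FE",
    "dark": "\U0001F3FF",
}

WAVING_HAND = "\U0001F44B"  # shown in the default yellow when unmodified


def with_skin_tone(base: str, tone: str) -> str:
    """Append a skin tone modifier directly after the base emoji."""
    return base + SKIN_TONE_MODIFIERS[tone]


if __name__ == "__main__":
    for tone in SKIN_TONE_MODIFIERS:
        print(tone, with_skin_tone(WAVING_HAND, tone))
```

A device that does not understand the modifier shows the yellow base followed by a separate colored square, which is the "little square box at the end".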

um in terms of what are the things that

we worked on at EmojiNation one

of the hardest ones was to create

the interracial couples and we worked on

that with Tinder which really cared

about it because apparently which I

thought was interesting when you

introduce online dating into a community

the rates of interracial marriage go up

and there's a pretty interesting

academic paper that kind of

systematically looks at the rollout

through different countries and

different communities

so it was really nice to see it

introduced on the phone one of my


friends cried

um in terms of EmojiNation Emoji we've

worked on a lot so these are just a

sampling of the ones that we've

um done I really liked let's see

DNA I feel really good about Lobster on

behalf of people from Maine

yarn and thread for all the people who

like knitting there was Bagel emoji on

behalf of like all New Yorkers

this Emoji actually which we called

microbe was like very sleepy on the

keyboard until 2020 and it really had

its moment I'm really really kind of

proud of that one and

um

there is Yoga Emoji sponge so these are

just a sampling of the ones that we

worked on and this is a sampling of the

people who have contributed you too if

you feel really passionate about Emoji

could like impact billions of keyboards

worldwide

so it's interesting to see in terms of

frequency of use it's very power law

right so here sort of these are actually

um

like order magnitude like so one is half

of this two is half of one all the way


down and one of the most stunning things

I was surprised to see is that face with

tears of joy by itself is like almost

10% of all Emoji sent 9.9% of emoji is

just that one character number two is

red heart which I guess you guys can see

in its binary form and then it falls off

like pretty quickly so I know I'm

hearing that face with tears of joy is

very Boomer or very Gen X and that it's

uh maybe among you guys it's a little

bit kind of um

blasé or déclassé at this point

um so the future of Emoji we really don't

Unicode does not want to be encoding Emoji

um and along the way I became a vice

chair of the Unicode Emoji subcommittee

so I went from like kind of shaking my

my fists at the institution to becoming

part of the institution

um so there's one idea this coded hash

of arbitrary images can we create a

system where instead of

um just using a binary code to represent

a different Emoji we actually can

do specific images we create hashes and

then like you look and you can look up

like by the hash which image you're

looking you know at so that was the idea
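The talk does not say how that proposal computed or stored its hashes, so the following is only a sketch of the general idea of looking up an image by its hash. It assumes SHA-256 as the hash and an in-memory dictionary as the registry; `image_hash` and `ImageRegistry` are hypothetical names made up for this example.

```python
import hashlib


def image_hash(image_bytes: bytes) -> str:
    """Hash the raw bytes of an image (SHA-256 is an assumption here)."""
    return hashlib.sha256(image_bytes).hexdigest()


class ImageRegistry:
    """Toy lookup table from hash to image bytes, for illustration only."""

    def __init__(self) -> None:
        self._images: dict[str, bytes] = {}

    def register(self, image_bytes: bytes) -> str:
        """Store an image and return the hash that now identifies it."""
        digest = image_hash(image_bytes)
        self._images[digest] = image_bytes
        return digest

    def lookup(self, digest: str) -> bytes:
        """Fetch the image that a hash refers to."""
        return self._images[digest]


if __name__ == "__main__":
    registry = ImageRegistry()
    key = registry.register(b"pretend these are the bytes of a PNG")
    print(key)
    print(registry.lookup(key))
```

The sender would transmit only the hash, and the receiver would use it to find the exact image, instead of relying on a code point that each vendor draws differently.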

this is from a Stanford professor it didn't

really take off then there was this

idea using Wikipedia

um or Wikimedia the Wikidata QID numbers

which I didn't know this until this

proposal came along but everything in

Wikipedia has a number

and that allows it to sort of match

things between different languages so in

Chinese the page for Obama is

matched with the English page with this

you know Arabic page and that went

nowhere so

um what I'm going to finish with is

telling you what the new Emoji are you

guys are among the first people to hear

about this because no one's really

paying attention so this probably this

was published a couple weeks ago but

like like it made no news because you

have to be looking at the Unicode

register

um so first off more Hearts because you

guys all love heart so there's light

blue heart gray heart and pink heart

there was kind of debate do we need more

pink hearts the answer seems to be yes

um light blue is really interesting

because In some cultures light blue and

dark blue are different colors in our


culture we just call them like versions

of blue it's sort of like how in

English pink and red are different

colors but

um In some cultures there isn't a

difference between pink and red

then there are a bunch of bird things

the wing Emoji is coming Blackbird and

Goose I don't really don't really know

why

um hyacinth uh as a flower this is

like very popular in Iranian culture

jellyfish I don't know I I

I'm very suspicious of jellyfish because

um they used to they used man o' war as

one of their like phrases that they

searched for and that had a billion I

think like it had a lot of entries and I

feel like those were not about the

actual invertebrate like there was

something else going on there but kind

of rode in on that

um moose on behalf of the Canadians

donkey on behalf of I guess the

Democrats

um so that was interesting because like

you had to have the donkey look

different from a horse and there was a

whole debate like do you want a donkey

head or do you want a donkey body do you


want donkey with fluffy ears do you want

like all kinds of donkey debate and it

was actually originally proposed in 2019

and just got in this year

um Ginger uh and Peapod these are these

are kind of weird like the food things

kind of got in in a weird way Ginger was

good because it also represented root

and then Wireless got in which is

interesting because we couldn't use the

phrase Wi-Fi because that's actually

trademarked by like the Wi-Fi people and

then on behalf of um Sikhs khanda finally

got in it was the largest religion that

wasn't already represented on the emoji

keyboard and then on behalf of like the

faces shaking face so

I don't I'm glad you guys are really

excited by that it is it is unclear to

me

um like I was not a big proponent of

this but your excitement about it makes

me change my mind

um then folding hand fan I actually find

that one interesting because I think it

was just like college students or

fresh out of college students who are

like we want to do a proposal that

passes and they were very opportunistic


and just sort of like chose fan and then

first they submitted electric fan and

then we

told them like oh the longevity for

electric fan isn't great even though

it's been around for a couple hundred

years why don't we go with the folding

hand fan which has a much longer history

and then this one is actually a big deal

is um afro hair pick on behalf there's a

lot of controversy about debate about

curly hair and it's supposed to

represent afros and then Apple did not

do that so everyone else was very

afro-looking hair Apple just makes it

look wavy and so there was like

upsetness that like black hair wasn't

represented in emoji set and so this was

a proposal that someone worked on

um and then animals uh sorry not animals

instruments Maracas and flute

um and that's a so in terms of if you

have any questions you can look at

emojination.org you can email me for all

things Emoji

um jenny at emojination.org and

remember you guys can actually impact

billions of keyboards around the world I

mean it's a little bit of impact for

humans but billions so it adds up to a


lot and if you have any more questions I

am here and can give you know lots of

answers and questions and I'm really

thrilled actually to kind of bring the

Emoji

um flag waving to such a large crowd and

especially a large you know diverse and

very motivated crowd and

um one of the interesting things is

we've kind of like

this is I'm not a proponent of this but

they slowly decrease the number of emoji

per year it was like 70 then it was 50

then it was 30 and this year we only did

20 and I'm

um I'm a little bit sad about that but I

hope that you know if there's more you

know excited

um proposals that can be submitted to

Unicode we might be able to dial that

number back up so

that is me am I good yay so thank you

[Applause]

well I think this is about 20 years late

but thank you so much Jenny we have an I

took CS50 t-shirt for you

on the way out too we have some cs50

stress balls for you cannot wait to see

your final projects coming up if you'd


like to chat with Jenny this was cs50

see you soon

[Applause]

[Music]

thank you

[Music]

thank you

[Music]

all right this is cs50 and this is first

year family weekend here at Harvard so

welcome to all the moms and dads

brothers sisters cousins Aunts Uncles

grandparents and Beyond cs50 here is

Harvard University's introduction to the

intellectual Enterprises of computer

science and the Art of programming and

what that means is that what we've been

doing in here over the past several

weeks is introducing students to

computational thinking the process of

cleaning up one's thoughts and

expressing oneself all the more

correctly all the more precisely and

ultimately translating those thoughts of

course to a computer in the form of

programming which is where we've spent

quite a bit of time programming writing

code over the past several weeks but


toward that end we've also been

equipping students with some basic

building blocks you might already know

if a parent that computers only

somehow speak zeros and ones even if

you're not necessarily a computer person

yourself or know what that means but

with those zeros and ones can we

represent numbers and letters and colors

and videos and more and in fact

your child perhaps sitting next to

you could perhaps tell you what today's

message says here we have 64 light bulbs

on stage and if you look at eight of

them at a time there's a pattern of

bulbs that are either on or off that if

you know the code so to speak you can

actually convert these bits these zeros

and ones in light bulb form to today's

particular message now before we begin

we thought we'd make this as engaging as

interactive as possible rather than

focus on any assumptions of Prior

Computing knowledge you need to know

nothing today other than how to operate

for instance your own phone or a laptop

or desktop or the like and indeed we'll

assume a general audience and in this

Halloween week we'll also see if we


can't scare you a little bit into

practicing better practices when it

comes specifically to the security or

cyber security of the device you carry

with you every day in your pocket use on

your desk on your laptop or Beyond so if

you haven't already whether you're here

in person or tuning in online go to this

URL here which will lead you to an

interactive polling tool any phone or

laptop or desktop suffices if it's a

little easier than typing in this URL

you can just scan this code with your

phone's camera

take a moment to just open your camera

and hopefully if you're at a good enough

angle and we've made this thing big

enough this is a two-dimensional barcode

or QR code embedded in which is that

exact same URL we're increasingly seeing

this throughout the world as a mechanism

for doing what many of you are doing

right now linking the physical world to

the virtual but that URL again is simply

this one here and in a moment you'll see

on your screen it's okay if you weren't

quite able to get that working feel free

to glance to the left or to the right of

you for someone else who did let me go

ahead and full screen a question just to


ask of everyone here as we focus today

on cyber security

is your phone secure

whether an Android phone an iPhone or

anything else if you're holding it in

your hand right now here in person or

online you should see three possible

answers yes or no or unsure we've got

over 300 responses come in already in a

moment I'll flip over and reveal the

results and see if we can't see how much

work we have to do together here today A

few more seconds almost up to 400

answers

almost up to 400 it's okay if those keep

coming in I'm going to toggle back and

show the results in just a moment here

and the results are now in according to

a response rate of over 400 it looks

like 36 percent of you don't need what

we're about to do here today which is

great we'll see if we can't poke some

holes though and maybe some assumptions

you all are making 31 32 percent maybe

of you are saying no your

phone is not secure so so glad you came

and then understandably another third

of you are unsure so in very good

company today and we'll see if we can't


open the eyes of everyone in each of

these disparate audiences well let's

consider first for a moment exactly how

we might think about the security of our

phones representative of just any

Computing device and in fact everything

we discussed today could be extrapolated

to laptops and desktops and servers but

all of us being so familiar with phones

let's start with phones themselves now

odds are you have on your phone like so

many other things in your life a

password or a passcode and in fact

without raising your hands and therefore

leaking information think to yourself

well what is my password or passcode

it's probably four digits it's maybe

four letters maybe it's even longer

maybe it's even nothing and I think

maybe from the chart earlier we can

assume that we have a third of each of

those possible responses so a password

of course is the super common mechanism

that you and I are all using all the

time to keep our devices secure but do

passwords keep things secure like how

many of you thinking about your phone

right now and that specific password

might think it's secure and if so

why do you think it's secure


we have at least 33 percent of you are

ready to say that your password's secure

don't want to know it but why might it

be in your mind secure

why might you think it's secure or more

generally what makes your password

secure

it's random okay so it's random so

random letters and numbers and the like

and that's great because it's not just a

word in the dictionary that someone

could guess and type in downside of

course I dare say is that it might take

you as well as anyone else quite a bit

of time to guess or figure out what or

just to remember what it is if it was

indeed random but Randomness is going to

be a primitive that really actually

helps us unfortunately you and I and

really the whole world are not very good

even at passwords as omnipresent as they

are as a defense against adversaries in

fact if we look at the

most common passwords from the past year

in 2020 we thought we'd share with you some

of those results this is the result of

security researchers having found big

exploited compromised databases

analyzing them for what passwords are in


them and then inferring from that what

the most common passwords you and I are

all using unfortunately in 2020 the most

common password according to one measure

was one two three four five six

how funny yes but if you're seeing your

password on the screen already not so

funny perhaps

the number two password was not much

better

number three picture one presumably for

a device a website that requires that it

not just be a word to have at least one

number which this person took these

hundreds of thousands of people took

literally password was number four

this past year one two three four five

six seven eight one one one one one one

one really not trying hard there one two

three one two three varying it a little

bit one two three four five was number

eight one two three four five six seven

eight nine zero was number nine and then

number ten in 2020 was senha which for any

Portuguese speakers here means

password so password made the list

twice in this case so one takeaway

already today should be if your

password's on this list like probably

you're in one of those other 33 whereby


we can do better than this why I mean

really the obvious if you're in this

list there's so many bad guys so to

speak out there that are going to try

guessing your password first why because

just statistically if they try one two

three four five six one two three four

five six seven eight nine they're just

gonna get into a lot of devices quickly

because they're just so commonly used

those passwords you don't want to be on

this list ideally you want to be random

but we want to somehow balance

Randomness with memorability so that you

don't actually keep forgetting your

password which of course defeats the

whole point of these things in the first

place but in a class like this cs50 and

computer science more generally let's be

a little more thoughtful as to what we

mean by a device being secure like what

does it mean to be secure and can we

even slap some numbers on it so that we

can make measurements so that we can

ideally compare and contrast one

system versus another one password

versus another so it's not just our

instincts arguing that my password is

better than these but how can you


quantify that perhaps well let's start

simply a lot of Android phones and

iPhones these days require minimally

that you have like a four digit passcode

you're minimally encouraged to have at

least this bar set so that you're not

having no passcode altogether so if you

do have a four digit passcode well let

me go ahead and ask this question how

much time might it take to go about

cracking so to speak that is figuring

out what a four digit passcode is in

fact let me go ahead if you want to pull

up your devices again you should see on

the screen this question now

how long might it take to crack that is

figure out guess a four digit passcode

for instance on someone's phone a few

seconds a few minutes a few hours a few

days thinking here from the adversarial

perspective if someone got a hold of

your phone somehow

how long do they need to get into your

phone if it has a four

digit passcode a few seconds few minutes

a few hours few days got about 300

responses so far

let's give folks another few seconds

here

another few seconds here all right up to


350 or so in a moment let me go ahead

and flip screens over to the results so

we'll see the preliminary results here

and if I now pull this screen up we see

that 50 percent of you claim that it's

going to take only a few seconds a few

of you say about a third a few fewer of

you are saying that it takes a few

minutes a few hours and even a few days

well let's answer that first because

honestly if it's already a few days or

even longer our work here is probably

already pretty done unfortunately the

problem with things like four digit

passcodes is that anyone who grabs your

phone you step out of the room you leave

it behind you lose it they could

certainly mimic your input device and

just use their finger pretending to be

you trying zero zero zero zero nope zero

zero zero one nope zero zero zero two

nope and it's a little slow to be fair

it would take me a while to count all

the way up to

9999 that's ten thousand total

possibilities there but let's go ahead

and consider exactly how else you could

do it for instance here uh it's an

example of in computer science what we


call a Brute Force attack and just an

adversary using their finger is a Brute

Force attack if they're trying all

possible passcodes the problem is even

if your passcode is way at the end of

the list of numbers eventually they're

going to get it by brute force sort of

like in yesteryear using you know a

battering ram or the like to brute force

your way into a building a castle or the

like in software sense it just means

trying all possibilities and you don't

even have to just use your finger right

anyone with some programming Savvy who's

good with Hardware could maybe do

something like this here's a quick video

I'll hit play on no sound but a little

bit of a robot that has an Android phone

underneath it and it's got

a little robotic finger that's doing the

work for you you can step out of the

room now as the adversary let the robot

do its work trying zero zero zero zero

through nine nine nine nine and

ultimately presumably get into that

phone so let's see if we can't quantify

then exactly how fast the human or the

robot could get in well how many total

possibilities are there that's the right

way to begin thinking about it if you


have 10 digits for the first one zero

through nine and then another 10

possibilities another 10 another 10 the

total number of possibilities of course

between zero zero zero zero and nine

nine nine nine is ten thousand ten times

ten times ten times ten which gives us

that much of a search space a universe

of possible passcodes to choose among
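The counting argument above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and not from the lecture.

```python
# Size of the search space for a four-digit passcode:
# ten choices (0 through 9) for each of four positions.
choices_per_position = 10
positions = 4

total = choices_per_position ** positions  # 10 * 10 * 10 * 10

# The first and last codes an adversary would try when counting upward.
first = "0" * positions
last = str(total - 1).zfill(positions)

print(total, first, last)  # 10000 0000 9999
```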

unfortunately you can do even better

than your own finger or even that robot

anyone in cs50 now who knows a bit of

programming in languages called C or

python or anything else could open up a

programming window and actually just

start writing some code and so let me do

that what you're seeing here if a family

member is a programming environment

called Visual Studio code that students

have been using for the past several

weeks up here we have a tabbed window

where we can type our code down here we

have what's called a terminal window

where I can type commands to make the

computer run that code and then over

here is just a menu bar so crack.py

means I'm going to write a program to

crack that is figure out passwords using

this language called Python and you know


even though most cs50 students wouldn't

know what code to start writing they'd

have to look up some of what I'm about

to do it's only going to be a few lines

so I'm going to go up here and say from

string import digits this is a fancy way

of saying hey python give me access to

all decimal digits it just avoids my

having to type out zero through nine

manually alright then I'm going to say

from itertools import product this is

another feature of python that cs50

students for the most part have not yet

seen that just says Hey python give me

the ability to do like the cross product

of a whole bunch of numbers so these 10

times these 10 times these 10 times

these 10 and then what am I going to do

with that well for each possible

passcode in the product of those digits

repeated four times I'm going to go

ahead and for now let's just print out

what the passcode is in other words

assume that I am now the adversary I

don't want to waste time using my finger

I don't have a robot that I made but I

am good at writing software and heck

I've got like a USB or a lightning cable

in my bag that I could connect your

phone to my Mac or PC and I could just


have my code that I'm writing now send

all the possible codes from laptop to

phone to automate this process just

using the little port at the bottom of

all of our phones well let me go ahead

and maximize this so-called terminal

window which is again where I'm going to

run this code and again the question a

moment ago was does it take seconds

minutes hours days well let me go ahead

and run python of crack.py I'm

pretending for the moment that I did

grab that cable from my bag and plug it

into the phone hitting enter and it

doesn't uh didn't actually do anything

that was not supposed to happen

so in cs50 we spend a lot of time

introducing students to bugs which

are mistakes in programs sometimes

not so deliberate let me go ahead and

apologize let me open this file

this didn't technically happen Okay

python or character there we go okay
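The program described above can be sketched roughly as follows. The two imports match what is named in the lecture (`digits` from `string`, `product` from `itertools`); the `passcodes` helper and the exact printing format are assumptions made here for illustration, and nothing in this sketch talks to a phone over a cable, which the lecture only pretends to do.

```python
from itertools import product
from string import digits


def passcodes(length=4):
    """Yield every digit-only passcode of the given length, in counting order."""
    # product(digits, repeat=4) is the "these 10 times these 10 ..." cross product.
    for combo in product(digits, repeat=length):
        yield "".join(combo)


if __name__ == "__main__":
    # Print all 10,000 candidates, from 0000 through 9999.
    for passcode in passcodes():
        print(passcode)
```

A real attack would replace the `print` call with whatever mechanism submits a guess to the device; that part is deliberately left out.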

in cs50 we now will run the code here

and I'm going to go ahead and run a

command called python of crack.py I had

the file in the wrong location a moment

ago and this is the equivalent on a Mac

or PC of double clicking an icon here we


go is it seconds uh minutes hours or

days barely one second to try all 10 000

possibilities you can't even see them

all on the screen but this printed out

zero zero zero zero all the way down of

course to nine nine nine nine plug in

that cable and boom the adversary

doesn't need to be in that room for very

long in order to get into that

phone all right so what would be better

then like clearly four digit passcodes

bad if you have someone in your life who

has a finger or a robot or the ability

to write code and unfortunately because

of us you now all have someone in the

family with at least a third of those

how might we do better than this

what's better than a four digit passcode

anyone

yeah

okay so six digits heck or seven digits

or eight digits why because that's going

to make the of course the passcode

longer which means we're gonna have to

try more possibilities which doesn't

mean that the adversary is fundamentally

stopped but it is going to slow them

down it's going to take them more time

probabilistically to get to your

passcode and in a sense then increases


the cost to the adversary and indeed

that's the theme in cyber security

raising the cost to the adversary either

financially or time wise or the like

just like in the real physical world

most of you go home you lock your doors

at night you might have invested in a

better deadbolt than another why is that

you really just want to be more secure

than the house next door you want to

make sure that it takes too much time

too much effort too much risk to the

adversary to get into your home and

that's again what cyber security is all

about to say my phone is secure is

sort of nonsensical but to say that your

phone is more secure than someone else's

that's really a reasonable Fair

statement to make so I like this

Instinct let's see if we can't make

things a little harder and actually

let's go one step further rather than

just numbers you've probably noticed on

your phones you can use letters of the

alphabet too if you click the right

option on the phone you can start typing

in words and letters so how might we do

that instead well let's transition to

four letter passcodes

and if we do four letter passcodes

where the letters of the

alphabet for instance are a through z in

English alone let's go ahead and ask

this question here if you have four

letters of the alphabet so let's not

increase length yet let's just change to

a bigger vocabulary

now we have a through z instead of zero

through nine how many four letter

passcodes are possible how big is that

universe that the adversary is going to

have to search

via Brute Force so I'm seeing a lot of

seven Millions a bunch of fifty two

thousands twenty six thousands ten

thousands nine nine nine nine a few

smaller numbers here hopefully it's not

this low right because we've already set

the bar at ten thousand possibilities

for numbers alone hopefully if we've got

English letters a through z we can at

least do better than ten thousand so I

think we'll start to see maybe some of

these bars change a little bit but we've

got sixty percent of you proposing seven

million well let's go let's go to the

math so here we might have a way of

thinking about this both uppercase and

lowercase even better if you consider it


that way lowercase A through Z uppercase

A through Z that's 52 possibilities for

the first digit times 52 times 52 times

52 or 52 to the fourth power that indeed

gives you seven million plus

possibilities all right well let's now

translate this to code that already

sounds way better ten thousand versus

seven million this is definitely going

to slow that hacker down well let's

consider exactly how fast or slow it

might now be let me go into my crack.py

program and let me make a little tweak

so that instead of just using digits

this time I'm going to use letters

otherwise known as ASCII letters that

cs50 students will know that just means

familiar English letters of the alphabet

and I'm going to change my code to use

these ASCII letters four of them still

instead of digits alone and that's the

only change now I'm going to pretend to

plug my phone that I just stole from

someone into a USB or a lightning cable

let me maximize my window just so we can

see things a bit more let me run python

of crack.py now and let's consider how

long it takes to do 7 million possible

codes
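The one-line change described above, swapping `digits` for `ascii_letters`, might look like the sketch below. To keep it quick to run, this version counts the search space and previews only the first few candidates instead of printing all of them; the `passcodes` helper and the preview are assumptions added here, not the lecture's code.

```python
from itertools import islice, product
from string import ascii_letters  # lowercase a-z followed by uppercase A-Z


def passcodes(length=4):
    """Yield every passcode of the given length built from English letters."""
    for combo in product(ascii_letters, repeat=length):
        yield "".join(combo)


# 52 choices for each of four positions.
total = len(ascii_letters) ** 4

# Peek at the first three candidates without generating all of them.
preview = list(islice(passcodes(), 3))

print(total)    # 7311616
print(preview)  # ['aaaa', 'aaab', 'aaac']
```

Because `ascii_letters` lists lowercase letters first, the enumeration runs through every code starting with a lowercase letter before reaching the capitals.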
okay slower

slower

can't dramatically just say in one

breath that we're done but we're already

at the G's and then the H's

and it's kind of flying by you know this

is where the adversary is probably

getting nervous in the TV show or movie

right someone is tiptoeing around in the

other room you don't want them to come

in you only have this much time to crack

the code

and we're at the R's the S's

T's U's V's so you know this feels like

what a minute or so it's a good number

of seconds but it's still pretty brief

certainly if someone has the ability to

now we got to do the capital letters too

certainly if someone has the ability not

to just secretly do it like in Hollywood

in the Next Room but just take it with

them and do it over the course of a

minute or two at home this seems to be

faster sorry this seems to be slower

because we're trying so many more

possibilities but you know if the

adversary takes your phone has it long

enough this doesn't feel like terribly

long so what might be better than this

let's take it one step further


what might be better than four letters

what what do most websites ask you to

add to the mix

so special characters right and those

things are darn annoying right because

sometimes they even tell you what

letters punctuation symbols you have to

use and then you type one and you're

like oh it's not on the damn list I mean

it's frustrating why well it's going to

raise the bar though to the adversary

and that's indeed going to be the goal

here again just to increase the cost or

time required for the adversary so that

it doesn't finish like it did just now

after a couple of minutes but it's going

to keep going and going hopefully such

that they're going to lose interest in

your phone and go try to crack into

someone else's presumably so let's try

this let me now go over to how about one

other question here and this question

will now just be let's go from four

characters how about let's take it one

step further and mix the two ideas here

more digits and longer passcodes how

many eight character passcodes are

possible and by character as a cs50

student will know I mean number or


letter or punctuation symbol now and

there's like 32 or so standard

punctuation symbols so we're up to a

good set of numbers now how many eight

character passcodes do you think are

possible million billion trillion

quadrillion or quintillion all of which

of course are better than 10 000

possibilities so we're in a whole

different space now

looks like these answers are coming in a

little more slowly perhaps as folks

think about this

this is 10 digits plus 52 letters plus

32 punctuation symbols

much more secure it would seem

all right we're up to 230 responses give

folks another second or so

if you're trying to do the math 10 plus

52 plus 32 that's going to give you 94

possibilities for each of the digits

all right we're just about at our

350 all right I'm going to toggle over

the screen here going to click over to

the results show them in just a second

on the screen now and this is an

interesting distribution I think some of

you perhaps have the Instinct now just

go for the biggest one


um it's not quintillion nice as that

would be maybe it's quadrillion trillion

billion or million we have more of a

split there so let's consider the math

so if we've got eight characters and I

claim uh that that's 94 possibilities

for each 10 digits

52 letters 32 punctuation symbols that's

94 to the eighth power essentially and

that indeed is six quadrillion

possibilities now that's crazy big at

this point I dare say we're pretty safe

from the human finger now we're probably

pretty safe from that robot which is

going to take a while too but Macs and

PCs are pretty darn fast and you know

God forbid the adversary have a big

server or use the cloud so to speak and

really use a big expensive machine how

long does it take to get into six

quadrillion possible passcodes well how

might we think about this suppose just

for the sake of discussion it takes the

adversary one second per code just so we

have some unit of measure to start with

one second per code which means in the

worst case the adversary really gets

screwed and my passcode is like nine

nine nine nine nine nine nine or with a


lot of crazy punctuation symbols in it

if each passcode takes a second to guess

how long is it going to take the

adversary if in the worst case they

spend six quadrillion seconds

how many hours or minutes

or days or

years I'm hearing a lot a lot is in fact

correct I did do the math the adversary

if they're lucky and get all this way

they're going to be 193

000 years old by the time they get to

all of those possible passcodes so this

sounds alluring and in fact let's just

change our code one final time just to

get a sense of how this might look and

behave in this version here let me go

back into my code

and let me change this now to use not

just ASCII letters but digits and I'm

going to add in punctuation for cs50

students there is again this Library

called the string library that lets

you just import all of these symbols

automatically so we don't have to type

out every character on my keyboard

manually and then down here I'm going to

take the product of those ASCII letters

again plus those digits plus the

punctuation repeated eight times I claim


this time I'm going to now increase the

size of my window just so we can see

more on the screen rerun the code and

this is going to take us you know some

hundreds of thousands of years so we

won't run to the end of this demo now we

seem to be in a better place all right

so what's the takeaway here clearly you

should use a passcode a password that's

eight characters with letters and

numbers and punctuation yes
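The final version of the program, and the time estimate that goes with it, can be sketched as below. The three constants from the `string` library are the ones named in the lecture; the `passcodes` helper, the preview, and the year calculation are assumptions added here. Under the stated assumption of one guess per second, this arithmetic comes out to roughly 193 million years in the worst case, so the spoken figures are best read as rough orders of magnitude.

```python
from itertools import islice, product
from string import ascii_letters, digits, punctuation

# 52 letters + 10 digits + 32 punctuation symbols = 94 possible characters.
ALPHABET = ascii_letters + digits + punctuation


def passcodes(length=8):
    """Lazily yield every passcode of the given length over ALPHABET."""
    for combo in product(ALPHABET, repeat=length):
        yield "".join(combo)


total = len(ALPHABET) ** 8  # 94 to the eighth power, about 6 quadrillion

# Worst case at one guess per second (an assumption made for discussion).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
worst_case_years = total / SECONDS_PER_YEAR

# Only peek at the first few candidates; enumerating them all is infeasible.
preview = list(islice(passcodes(), 3))

print(f"{total:,} possible passcodes")
print(f"about {worst_case_years:,.0f} years at one guess per second")
print(preview)
```

Since `product` is lazy, starting the loop costs nothing; it is finishing it that takes the time.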

okay you know there's a mix here some of

you are saying yes some I don't know

how about someone who says no why

why no yeah

recaptcha okay so there's other

mechanisms more on that in a second

other instincts yeah

yes I'm kind of cheating with my verbal

simplification here even this computer

is way faster than one code per second

so it's not going to be hundreds of

thousands of years might be tens of

thousands of years or hundreds of years

but it's it's not going to be quite as

dramatic as this so that's a concern

yes so maybe there's other mechanisms so

maybe we don't have to be so Extreme as

to introduce all of this Randomness as


was proposed before because honestly

there's this theme in computer science

too and really information technology of

trade-offs right sure I can come up with

I can use a really big random password

but my God I'm going to end up writing

it on my monitor on a Post-It note which

I suspect statistically some of you are

guilty of right and you shouldn't

necessarily just blame yourself or you

know your colleague who's doing this

like this is a symptom perhaps of bad IT

policy if we don't have necessarily very

usable systems maybe we shouldn't blame

the human for forgetting their very

random password maybe we shouldn't

require the human to have a very random

password so what could we do a couple of

technical mechanisms were just proposed

let's go down this road of how we might

try to defend against this and I'll keep

this running just for fun in the

background let me switch back over to a

visual here now that we've considered

that many codes what if we do something

that some of your own phones already

have

that slow the adversary down and some of

you might have seen on your iPhone a

screen like this let me zoom in iPhone


is disabled try again in one minute has

anyone locked themselves out of their

phone like this I have this is not I

mean it's embarrassing to admit but it's

not leaking any information all right so

many of you have done that already but

why is this actually a compelling

feature just to be clear annoying as

this might be because you probably don't

want your phone locked at the very

moment you're trying to get into it why

might it be a good thing

yeah uh oh let's let's go somewhere else

if we may yeah I'm back

sorry

it slows down the process it annoys you

to be fair like you pay a bit of this

price but it really slows down the

adversary now they're going to be able

to type in not one code per second but

one code per minute a 60 times

difference that's really going to force

them to pump the brakes and unless that

adversary is after you specifically odds

are they're gonna go take someone else's

phone or lose interest because you've

raised the bar high enough to their

getting in on Android if you do this it

depends on the operating system version


here might be something similar on

Android too many attempts try again

later I mean this is even more annoying

it doesn't even tell you when to try

again later but it does slow down the

adversary so if you don't have features

like this enabled you should and if

you're particularly security conscious

or or paranoid even you can even enable

a feature on these phones nowadays where

they self-destruct so to speak after 10

wrong guesses right why 10 you know the

presumption is among Apple and Google

and others that if you type your

passcode 10 times wrong you're probably

not who you say you are you're probably

probably someone else although you know

if you're a little groggy first thing in

the morning or if you've been out late

and having a good time ten might

not be a high enough threshold to sort

of protect your phone from you and so

there too is this trade-off again and

that's an extreme one if your phone

deletes itself which is what I meant

by self-destruct then that might

actually be to your detriment unless you

have backups and all of that but that's

another technology question altogether

so there too this theme of trade-offs


you raise the bar to the adversary but

you've got to pay the price you're not

going to get any such feature for free
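To put rough numbers on how much a lockout raises the cost, here is a small sketch for a four-digit passcode. It assumes, purely for illustration, a fixed delay per guess (one second versus one minute) and a wipe after 10 wrong guesses; real phones may use different or escalating delays, and the function name is hypothetical.

```python
TOTAL_CODES = 10 ** 4  # every four-digit passcode, 0000 through 9999


def worst_case_hours(seconds_per_guess):
    """Hours needed to try every code at a fixed rate (a simplification)."""
    return TOTAL_CODES * seconds_per_guess / 3600


unthrottled = worst_case_hours(1)   # one guess per second
throttled = worst_case_hours(60)    # one guess per minute

# If the phone erases itself after 10 wrong guesses, the adversary gets at
# most 10 tries, so the chance of hitting a random code is 10 in 10,000.
wipe_after = 10
chance_before_wipe = wipe_after / TOTAL_CODES

print(f"{unthrottled:.1f} hours without a delay")
print(f"{throttled / 24:.1f} days with a one-minute delay")
print(f"{chance_before_wipe:.1%} chance of success before a wipe")
```

The 60 times slowdown turns a few hours into about a week, and the wipe option caps the adversary's odds outright, at the price of possibly erasing your own phone.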

all right what's another mechanism that

many of us increasingly thankfully are

doing might be when you log into a

website like Gmail to have two Factor

authentication sometimes called two-step

authentication I mean how many of you

use two-factor two-step authentication

with at least one account all right so

that's amazing how many of you use it

with all of your accounts

all right fewer of us and there too

that's not necessarily the wrong answer

right I have a lot of stupid websites

that I have accounts on like I bought

something once on them I don't really

care about it so there's a judgment call

there in terms of what you really care

about but maybe your financial websites

your Healthcare websites or anything

that's mildly sensitive to you probably

should be raising the bar to the

adversary by enabling this so what is

this particularly for those of you who

didn't raise your hand someone else what

is two-factor or two-step authentication

what's two Factor yeah


yeah so when you have to pull out your

phone and verify that it's really you

or in the corporate world you might

have a little dongle a key fob on your

keychain that's got a little number on

it but generally speaking two-factor

authentication is all about indeed a

second factor it's kind of

oversimplified as two steps but it's

really key technologically that it'd be

a different Factor it is not two-factor

authentication if you just have two

passwords that you have to remember

because both of those could be forgotten

by you both of those could be stolen by

someone else if you write them down on a

Post-it note or the like two-factor

authentication is about having a

fundamentally different Factor available

to you so that the odds that someone get

at something you know like your password

and something you have like your phone

is just much much smaller than the

threat of just figuring out something

you know like a password alone so the

factor is something that's fundamentally

different from the other thing and so

once you configure this the user

typically sees a screen like this for

instance in the context of Gmail the


screens vary here at Harvard and Yale

students are familiar with something

called Duo Mobile which is the exact

same idea and they typically use

one-time code six digits thereabouts and

you can only use that code once and the

idea is it's texted to you or pushed to

your device so that you and only you can

use it
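The idea of a six-digit code that works only once can be sketched as follows. This is a toy illustration, not how Gmail or Duo Mobile actually implement it; the class and method names are hypothetical, and delivery of the code to the phone is left out entirely.

```python
import hmac
import secrets


class OneTimeCode:
    """Toy server-side record of a single-use six-digit code."""

    def __init__(self):
        self._code = None

    def issue(self, length=6):
        """Create a fresh random code; in practice it is sent to the user's device."""
        self._code = f"{secrets.randbelow(10 ** length):0{length}d}"
        return self._code

    def verify(self, attempt):
        """Accept the code at most once, then forget it."""
        if self._code is None:
            return False
        # compare_digest avoids leaking information through timing.
        ok = hmac.compare_digest(self._code.encode(), attempt.encode())
        if ok:
            self._code = None  # a used code can never be replayed
        return ok


login = OneTimeCode()
code = login.issue()
print(login.verify(code))  # True
print(login.verify(code))  # False
```

The password is something you know and the device receiving the code is something you have, which is what makes this a second factor rather than a second password.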

does this fundamentally secure your

account

is this enough to just have a good

password and two-factor Authentication

does that keep the adversaries out

altogether

not if someone what

okay not if someone really wants to get

in then you have other problems that are

are certainly of concern but you do want

to ideally keep most adversaries at Bay

and there too all we're doing is like

raising the bar right there's nothing

stopping someone in physical proximity

to me stealing my phone and getting into

all of those accounts I just raised my

hand about but you at least protect

yourself against the billions of other

potential adversaries in the world that

are geographically not near us so at


least narrow the threat so that's a good

thing

but what else could we do because I feel

like it's not fair for us to say all

right everyone go home start using

better passwords longer more complicated

because again there's this trade-off we

don't want to send everyone home

essentially with a pad of Post-it notes

to then counterbalance what's an

unrealistic expectation so how many of

you perhaps with a show of physical

hands use a password manager already

this is something practical we can equip

you with okay so that was relatively few

hands and those of you who are in The

Habit still of memorizing your password

or Worse writing down the password there

are better Solutions today but here too

there's going to be a caveat there's no

clear win necessarily a password manager

is a piece of software that you install

on your Mac or PC or your phone that

manages your passwords for you and these

come either built into the operating

system Windows has Credential

Manager macOS has something called

Keychain there's third-party software

like 1Password or LastPass companies

and universities often have site


licenses so that students in particular

can use these kinds of things for free

but the ones that come with your

operating system or phone are themselves

already free and not using them is

really the missed opportunity here so

what is a password manager it's a

program that yes manages your passwords

but it does a few things more it

generates passwords for you typically I

mean honestly it's been years since I

have chosen my own password on a website

I instead click a button in my password

manager software or I use a keyboard

shortcut to generate something that's

eight characters heck maybe 16 24 32

characters long I don't care because the

software's job is to manage that

password for me that is the software

remembers this crazy long password for

me and better yet it comes with a button

or a keyboard shortcut that will

automatically fill out forms for me on

the web when I say log me in it will

grab my password from my computer plug

it in and voila I'm logged in the upside

of this is that even if that website is

compromised and my password leaks out

I'm not using that password presumably


anywhere else because this

software's job is generally to create

unique passwords for each website and

it's not going to be guessed via Brute

Force by one of you writing code because

it's just too long probabilistically you

know we're all going to be gone by the

time your computer finishes trying to

crack it
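As a rough sketch of what "click a button to generate a password" and "too long to brute force" might look like in code: the alphabet, the function names, and especially the guessing rate below are assumptions made for illustration, not how any particular password manager works.

```python
import secrets
import string

# Characters a generated password may draw from (an assumption; real
# password managers usually let you configure this).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=24):
    """Return a random password of the given length."""
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def years_to_try_all(length, guesses_per_second=1e9):
    """Estimate the years needed to try every password of this length.

    guesses_per_second is a made-up rate chosen only for illustration.
    """
    combinations = len(ALPHABET) ** length
    seconds = combinations / guesses_per_second
    return seconds / (60 * 60 * 24 * 365)


if __name__ == "__main__":
    for n in (8, 16, 24, 32):
        print(n, generate_password(n), f"{years_to_try_all(n):.3g} years")
```

Each extra character multiplies the number of possibilities by the size of the alphabet, which is why the estimates grow so quickly from 8 to 32 characters.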

so what's the downside I mean this

sounds great if the software generates

passcodes for you and plugs them in for

you where's the downside

anyone yeah

yeah if you use someone else's computer

or you're in like a you know a library

environment a lab environment you don't

have your passwords accessible now

there's a way to mitigate that so long

as you sync the same software to your

phone you might have to pay another

dollar ninety nine or twenty dollars to

have the same software on your phone you

can at least mitigate that by sharing

the passcodes across your devices not as

user friendly you're still gonna have to

now manually type out this really long

password and that too is annoying if you

get one character wrong but that's one

way to mitigate that other concerns


that's maybe the biggest threat I mean

you're kind of putting all of your

proverbial eggs in the same basket if

someone now gets into my password

manager which I should stipulate is

supposed to itself have a really big

long password that I do have to remember

but only one such long password I mean

then I'm really out of luck now every

single account I own is compromised

except for those that at

least have two Factor unless the

adversary also steals my phone or my key

fob other concerns

exactly if someone gets physical access

to your device honestly in general all

bets are off and this is why some of

today's lessons are really important

it's only going to matter when you first

lose your phone or someone walks off

with your laptop there

are certain things you can do to defend

against that inevitability I dare say but

you want to make sure that if you are

using some of these Solutions like a

password manager that that long primary

password you use for it is itself really

hard to guess and you know I would say

I'm okay with you writing that down even


but putting it in like a safe deposit

box or hiding it somewhere in the house

that's just very low probability of

someone finding because the other

problem with putting all of your eggs in

one basket if you forget your password

then you lose everything and that too

seems like a pretty serious price to pay

but this is a constant battle in

Computing nowadays usability and

security and finding that inflection

point but there too you can be

selective right I called out financial

information health information your

personal email your calendar anything

that's mildly more sensitive to you or

important raise the bar at least on

those accounts even if you're not quite

ready to go all in on all of these

other factors well let's consider then

where we're using these passwords

consider just a couple of specific

examples email of course Gmail is the

example I used earlier Gmail and email

accounts more generally are increasingly

offering us features and in fact there's

one that I thought we could highlight as

an example of something that as a cs50

student or a cs50 family member you should

really start viewing the world with a

more skeptical eye a little more

paranoid eye and not

necessarily just believe things that

websites say I mean it's mostly

meaningless when a website says

sometimes with a pretty little logo or

emblem our website is secure like what

does that even mean and it's again all

about relativity and even Gmail I dare

say somewhat irresponsibly has this

feature in recent years confidential

mode like is anyone if you're using G

Suite or Google apps at work or

Workspace nowadays in the habit of using

confidential mode

I mean it sounds okay no one's using

this so this is great and I worry now

that I'm introducing you to a feature

that you shouldn't necessarily use but

all this time if you're a Gmail user

there is along the little menu bar an

icon that lets you enable confidential

mode and later tonight play around with

it just look for it and you'll see

exactly the screenshot which I took

yesterday according to google recipients

won't have the option to forward copy

print or download this email right great

for lawyers it would seem great for


business great for private

correspondence

but why is this perhaps a bit misleading

what's the

where should the skepticism come from

here even a company like Google I dare

say you know they've probably buried the

caveats that I'm hinting at under the

learn more but unfortunately that might

be too late yeah in back

yeah I mean those of you who know how to

take a screenshot that's the simplest

way if you don't know how to do that

well here's a phone I can just take a

picture of what it is I see on the

screen and so these are software

defenses that are in place that

essentially disable the forward button

disable the print button but honestly as

you probably already know once something

is already digital I mean it's out there

and there are other ways to get it it

might not be as high quality if you're

taking out your phone to do it but you

should view things like this with

skepticism and even I when I

occasionally receive something like this

I kind of roll my eyes but regret that

the user thinks what they're doing is

consistent with this language but it


isn't necessarily and so indeed in part

from an introduction to computer science

you begin to I mean get a little scared

from what's going on out there because

there's so many different threats and so

many things that you can't in fact do

and the onus is unfortunately often on

us users to read between the lines and

see what actually is possible here's

another one that you might be more in

the habit of using incognito mode or

private mode in Chrome or Safari or

Firefox or Edge or the like what does

incognito mode do if familiar

what's incognito mode yeah

it doesn't log locally what you're doing

exactly most people here probably

generally know about things called

cookies even if you're not quite sure

how they work but they're like these

little remnants or breadcrumbs you leave

behind when visiting websites that allow

the websites to keep track of who

you are in some sense according to

google here when you're using incognito

mode Chrome won't save your browsing

history so that's good cookies and site

data information entered into forms but

to their credit they do disclaim that


your activity might still be visible to

the websites you visit your employer or

school your internet service provider so

they're getting better at at least

helping you evaluate by giving more of

the facts whether you do or don't want

to do this but this doesn't mean that

the websites you're visiting

don't know who you are all of our

computers have unique addresses these

things called IP addresses that you

might have heard about and cs50 will

explore these in another week's time

your computer is constantly leaking

information that could be used to infer

who you are so this is really just best

left for when you don't want to accidentally

on like a friend's computer or a lab

computer remain logged in because

cookies are typically used to just

remember that you've logged in so if you

use a friend's computer you use

incognito mode and just close the window

boom you're effectively logged out but

even as Google disclaims there's other

caveats there too
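A toy sketch of the cookie idea described here, assuming a hypothetical server that keeps a table of sessions and a browser whose "cookie jar" is just a dictionary. The names sessions, log_in, and who_am_i are made up for illustration; real browsers and websites are considerably more involved.

```python
import secrets

# Server side: remembers which session ID belongs to which user.
sessions = {}


def log_in(username, cookie_jar):
    """Create a session and hand the browser a cookie for it."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = username
    cookie_jar["session"] = session_id


def who_am_i(cookie_jar):
    """Return the logged-in user for this browser, or None."""
    return sessions.get(cookie_jar.get("session"))


# A normal window keeps its cookie jar, so the login persists.
normal_jar = {}
log_in("alice", normal_jar)
print(who_am_i(normal_jar))  # alice

# An incognito window gets a temporary jar that is thrown away on close.
incognito_jar = {}
log_in("alice", incognito_jar)
incognito_jar = {}  # closing the window discards the cookies
print(who_am_i(incognito_jar))  # None
```

Note that in this sketch the server still holds both session records. Discarding the cookie only means this browser can no longer present it, which matches the "effectively logged out" wording rather than any stronger privacy claim.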

so what else might we keep in mind how

about let's consider one other big one

that's another thing to start looking

for increasingly in order to keep


yourself secure and this one's a little

more technical encryption and cs50

students will know this is something you

can Implement in code and in fact let me

ask this question what does it mean to

encrypt

something

think back to pset 2 and Caesar and the

like let me look a little farther back

almost any student hand should

theoretically be up here yeah

exactly encryption is all about

substituting one letter for another and

generally scrambling the appearance of

some message up so that the recipient

knows how to reverse that process and

see what you actually sent but anyone

intervening in between you can't

actually see the information between you

so just to impress uh the parents in the

room any students what does this say

we're not ending here but

this was cs50 that's what it would say
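A minimal sketch of the rotation being decoded here, in the spirit of the Caesar problem from pset 2. The ciphertext on the slide is not in the transcript, so the string below is reconstructed on the assumption that every letter is shifted by one and digits are left alone; the function name rotate is also just illustrative.

```python
def rotate(text, key):
    """Shift each ASCII letter forward by key positions, wrapping past Z."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            # Digits, spaces, and punctuation pass through unchanged.
            result.append(ch)
    return "".join(result)


ciphertext = rotate("THIS WAS CS50", 1)
print(ciphertext)              # UIJT XBT DT50
print(rotate(ciphertext, -1))  # THIS WAS CS50
```

Decrypting is the same operation with the key negated, which is why anyone who guesses the key (there are only 25 useful ones) can reverse the message.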

and notice the scramble let me go back

and forth back and forth uh in this

message T becomes U

H becomes I

I becomes J S becomes T this is what we

called a few weeks ago in cs50 a

rotational cipher or a Caesar cipher that

literally does this as you described

substitutes one letter for the next but

it does so in a very predictable way a

becomes b b becomes C and so forth and

we also talked weeks ago that you don't

have to keep it that simplistic you can

use a bigger mathematical formula to

make it at least harder for some

adversary to figure out but you and I as

users these days are constantly

thankfully using encryption you probably

generally know that you should be hoping

for expecting this these days like https

is a good thing s means secure literally

and any website that has that in its URL

indicates to you that you and the

website are having an encrypted a

scrambled communication which means if

you type in your password your credit

card information anything else

personal no one between you

theoretically points A and B should be

able to know what it is you've typed

into that web page the web page

absolutely can because they have the

process the ability to decrypt that

information to reverse the process but

at least encryption is generally a good

thing but today let's take that one step


further and encourage you all to be

looking for expecting if you will as

consumers increasingly in the coming

years something better than encryption

alone but end to end encryption and

you're starting to hear about read about

this a little bit more but it's perhaps

a little less familiar someone in the

room who's familiar what is end to end

encryption

let me give folks a moment what is end

to end

encryption
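One way to picture the question being asked: in the toy sketch below, a relay server forwards a message it cannot read, because only the two end users hold the key. This assumes the key has somehow already been shared between sender and recipient, and it uses a one-time XOR pad purely for illustration. RelayServer, xor_bytes, and send_end_to_end are hypothetical names, and real messaging apps use far more sophisticated protocols.

```python
import secrets


def xor_bytes(data, key):
    """XOR each byte of data with the matching byte of key."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


class RelayServer:
    """Stands in for a provider's server that only forwards bytes."""

    def __init__(self):
        self.seen = []

    def forward(self, payload):
        # The server can log what passes through, but it has no key.
        self.seen.append(payload)
        return payload


def send_end_to_end(message, shared_key, server):
    """Encrypt on the sender's device, relay, decrypt on the recipient's."""
    ciphertext = xor_bytes(message, shared_key)   # sender's device
    delivered = server.forward(ciphertext)        # provider's servers
    return xor_bytes(delivered, shared_key)       # recipient's device


message = b"see you at noon"
key = secrets.token_bytes(len(message))  # known only to the two end users
server = RelayServer()
received = send_end_to_end(message, key, server)
print(received)        # b'see you at noon'
print(server.seen[0])  # scrambled bytes, different on every run
```

With ordinary (non end-to-end) encryption, the server would hold a key of its own and could decrypt what it relays. Here the only copy of the key lives at the two ends.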

good so it's when an app like WhatsApp

encrypts a message but it's encrypted

all the way to the other side to the

recipient even though Facebook in this

case owns WhatsApp even though your

message is going through Facebook or

meta servers they do not have

theoretically the ability to decrypt

your message whatever chat message

you've sent to a friend they are just

sending seemingly random zeros and ones

all the way to the end user who can then

decrypt it if you're an iPhone user

iMessage for instance does this

automatically so long as your text

messages are blue and not green that


means you're using iMessage and Apple's

platform that does this but let's let's

focus perhaps on something that's been

all too familiar to most of us over this

past year Zoom right Zoom actually took

some flak some months ago because in

their marketing literature they were

advertising end-to-end encryption they

were not implementing end-to-end

encryption at least initially this was

probably marketing gone awry not quite

understanding what end-to-end encryption

means they were using encryption and

what that meant is that if I were having

a meeting with a colleague or you were

sitting in on a class with a teacher you

might have an encrypted connection all

of you to Zoom centrally but they

had the ability early on and still now

if you leave this feature off to decrypt

that information and see and listen to

theoretically anything going on in that

meeting or that classroom now

technologically there's not really a

good defense against that if using that

older approach all it really is is

policy or hopefully there's rules in

place there's contracts in place that

say well yeah that's possible but don't

do that end-to-end encryption is a


stronger guarantee for you that

circumvents that risk altogether by

ensuring that if you're tuning into that

class or you're logging into that

meeting all of the zeros and ones are

going through Zoom's servers just like

Facebook's but only the end users only

the students and teachers only the

colleague and colleague can actually

decrypt and see and hear what it is

that's being said and if you're one who

schedules Zoom meetings you can actually

see this for instance here's a

screenshot I took yesterday too

scheduling like a zoom meeting for today

and you'll see that you can choose the

Day and The Time the password haha and

also down here the encryption level and

by default it's typically enhanced

encryption which is stupid like enhanced

encryption it's just encryption and in

fact it's sort of worse encryption than

the other checkbox which is end-to-end

encryption but there's this little

caveat and here too consistent with this

reality in Computing there's always a

trade-off right it's not all upside and

all win several features will be

automatically disabled when using


end-to-end encryption including Cloud

recording and some phone stuff I mean

that's already kind of a big loss for a

class for instance a conference that

wants to keep the sessions but it kind

of makes sense right if the data is

encrypted between all of the end users

and therefore Zoom has no eyes into the

data or ears then it makes sense that

they can't record it for you in the

cloud because it's completely completely

scrambled to them too so a good

primitive to have in place but also

something that you need to sacrifice in

terms of usability well let me in our

final moments here let me flip back over

to where our hacking tool is it would

seem that eight characters is doing

really well because we still got three

A's at the beginning of this so that

might be in fact one takeaway and in

fact let me flip over and propose three

pieces of homework for everyone here one

use a password manager the one that's

built into your phone or your operating

system or pay a little something more

for something that you might like a

little better two use two-factor

authentication for more of your accounts

maybe not all but at least more of your


accounts and that's certainly a net

Improvement and then three use not just

encryption but end-to-end encryption and

unfortunately these features are not all

quite as simple as oh well let me just

check the box and turn on

something that's always been

available to me because it's not always

been available and Zoom only once they

sort of got in trouble for this did they

acquire some other company that

implemented this feature and then add it

to their own software but as users as

consumers as parents as students

considering choosing one tool or another

because of these features is really

something you are empowered to do and do

not use those tools that you don't think

meet some threshold of comfort for you

for more on this and computer science

more generally any of you can take cs50

online at edx.org cs50 it's been so nice

to see you happy to chat one-on-one but

otherwise have a wonderful day here on

campus this was cs50

[Music]

thank you

[Music]

[Music]

[Music]
