BMS Prep For The Tech PM Interview
You have a ladder of N steps (rungs). You can go up the ladder by taking either 1 step or two
steps at a time, in any combination. How many different routes are there (combinations of 1
steps or 2 steps) to make it up the ladder?
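One way to reason about the ladder question: the last move onto rung N was either a 1-step (from N-1) or a 2-step (from N-2), so the number of routes follows the Fibonacci pattern. A minimal sketch, assuming Python; the name `count_routes` is made up for illustration.

```python
def count_routes(n: int) -> int:
    """Count the ways to climb n rungs taking 1 or 2 steps at a time."""
    # routes(n) = routes(n - 1) + routes(n - 2), because the final move
    # is either a single step or a double step.
    prev, curr = 1, 1  # routes(0) = 1 (stay put), routes(1) = 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr


for rungs in range(1, 6):
    print(rungs, count_routes(rungs))
```

The loop does one addition per rung, so this runs in linear time.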
Time complexity is a way to express the relationship between the number of operations an algorithm
will perform and the size of the input to the algorithm.
If an algorithm performs 1 operation (or any fixed number of operations) for an input size N, this is
constant time, because the number of operations does not change depending on the size of the
input.
If an algorithm performs a fixed number of operations for each element of the input, this is linear time. Increasing the
input size by 1 increases the number of operations by some fixed amount; in other
words, the number of operations grows in proportion to the size of the input.
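A small sketch of the difference, assuming Python lists as the input; the function names are hypothetical.

```python
def first_item(items):
    # Constant time, O(1): one lookup no matter how long the list is.
    return items[0]


def total(items):
    # Linear time, O(N): one addition for every element in the list.
    result = 0
    for value in items:
        result += value
    return result


print(first_item([7, 8, 9]))
print(total([1, 2, 3]))
```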
Technology
● How do computers work? Bits/Bytes? Computer Software?
● How does the internet work? What is HTTP and TCP/IP? How do you make it faster?
● How do servers work?
● What are the various search algorithms? Sorting? Graph search?
● What is Big O notation and what are the various algorithm run times?
○ Big O Cheat Sheet
○ Time + Space Complexity / Complexity Analysis
● What is NoSQL?
● What is a Hash Table?
○ It’s a key-value lookup
● What is a Hash Map?
○ Has
● What is a Key-Value Store?
○ Ke
● What is Markdown as it relates to Storage?
○ Markdown is a text-to-HTML conversion tool for web writers
○ Used for when there’s a very simple
○ HTML can be really hard to understand
○ Markdown is very simple and not a replacement for HTML
○ Video
● What is the difference between a normalized and denormalized
Database?
○ The objective of DB normalization is to isolate and eliminate redundancies.
○ A denormalized DB removes the connections between tables by merging their data.
○ The reason you would denormalize a DB is to flatten the data
(ie create redundant but paired data to increase efficiency by reducing the
need for joins, or queries across multiple tables or DBs.)
○ Good analogy:
■ Say you wanted to go grocery shopping for lunch and dinner.
● In a normalized DB, you go to each section (meat, veggies, etc) 2x
because you first shop for lunch then need to go back and get
everything separately for dinner.
● In a denormalized DB you can shop for both in one trip.
○ The downside of a denormalized DB is that it has a lot of redundant data and can
become difficult to stay organized.
● What is a LAN?
○ Local Area Network (think LAN parties with Command and Conquer)
● What is Fuchsia?
○ Possible successor to Android.
● What is a protocol?
○ Video
○ The rules for how computers can communicate with each other. They need very
rigid rules.
○ Especially important for packet based communication.
○ Two things protocols need to do
■ 1. Define the structure for the packets being shared
■ 2. Set the rules of how the computers will be able to communicate moving
forward (ie unidirectionally, bidirectionally, open communication, etc)
○ Examples of Protocols:
■ TCP/IP - Used for transporting data, often for communication apps (ie WhatsApp)
■ HTTP/HTTPS - Used for web access (ie Google, eBay)
● Note HTTP is fine for when you are just browsing the web (ie
Gif.com) but anyone can theoretically see what you are doing.
With HTTPS the data that is entered (ie SSN, CC info, etc) is
encrypted.
○ HTTPS uses TLS (Transport Layer Security), the successor to SSL (Secure Sockets Layer)
■ RTP - Used for video calls + streaming (ie Netflix, Zoom)
● What is an API?
○ An API is something that lets apps / software work with each other. It prevents
redundancies (ie every website needing to build its own payment service) by
allowing different apps/services to use each other’s software/hardware.
○ 3 types of API
■ Feature API - Example: Uber has an API to Braintree to outsource
payment collection.
■ Data API - How Google might connect with ESPN to get Sports Scores
■ Hardware API - How Instagram asks your phone for access to the
phone’s camera.
● How does Chromecasting work?
○ Video
■ To cast there are two core pieces; a Sender application and a Receiver
application
● The Sender application just sends messages between the
device and the service (ie Netflix). It’s not actually streaming the
media. This is built into the app (ie Netflix) and controlled on the
phone / tablet / computer.
● The Receiver application is what’s actually streaming the media
and is the software on the Chromecast. The request is made via
the phone and the service returns the specific media to the Chromecast,
which acts as the display.
■ The Chromecast is essentially a lightweight Chrome browser. It has its
own IP address, loads HTML, CSS, and JS from the internet and can
communicate with the various sites.
■ The phone sends a URL request via the WiFi to a service that includes a
return address, which is the ChromeCast.
● Technical Article.
■ Queues are FIFO, imagine a line of people waiting to get into a movie
theatre
● Used for things like accessing a website (ie if 5 people access it you
want the first person who asked to get it first) and print jobs.
■ Neither queues nor stacks have indexes, so they can’t do insertions
into the middle and can only add or remove at the ends (the top of a stack, the front or back of a queue)
● When people talk about a tech Stack, what is that? (blonde dictionary)
○ It’s all of the technology (programming, languages, tools) needed to run your
software / service.
○ Examples would be:
■ Server (ie Apache, Nginx)
■ Database (ie MongoDB, MySQL, Postgres)
■ Back End (PHP, Java, Python, Ruby)
■ Front End (HTML, CSS, JavaScript) ← these 3 are often used together.
Bits/Bytes
● A bit is like an atom: it’s the smallest unit of storage and can be a 0 or a 1. It’s too small
to be useful on its own. A byte is 8 bits and can store a letter like an “x”. There are 256 different patterns
of 1s and 0s in a byte. For example, 01101100 = 108 = “l”.
● KB is 1000 bytes. Email is about 2KB.
● MB is 1 million bytes. MP3 audio is about 1MB per minute.
● GB is one billion bytes. A flash drive may hold 16GB. A hard drive might hold 750GB.
● Terabyte is one trillion bytes.
● Email: 2KB, Image: 200KB to 1.5MB, Song: 5MB, Video: 20MB for 90 seconds
● Numbers: 13 is 1101
● Text: Each letter gets a number. For example HELLO is 72, 69, 76, 76 and 79. Each
number then has a corresponding binary code. Unicode is a standard for giving
characters a number
● Colors: Red, Green, Blue make up all colors. To get the binary version of a color you say how
much red, green and blue is in it on a scale of 0 to 255. Magenta is
111111110000000011111111 (255 for red and blue, 0 for green)
● Image: A table of pixels. Each pixel is located at [X,Y] and has a RGB color value.
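The number, text and color examples above can be checked with a few lines of Python. This is just a sketch; the variable names are made up for illustration.

```python
# Numbers: 13 written in binary.
number_bits = format(13, "b")

# Text: each letter maps to a number, and each number to 8 bits.
codes = [ord(letter) for letter in "HELLO"]
letter_bits = [format(code, "08b") for code in codes]

# Colors: 8 bits each for red, green and blue (0 to 255).
magenta = (255, 0, 255)
magenta_bits = "".join(format(channel, "08b") for channel in magenta)

print(number_bits)
print(codes)
print(letter_bits)
print(magenta_bits)
```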
Data Sizes
TB 1t Bytes
● It’s important to know that when the video is uploaded there are two things to immediately
consider:
○ Different formats the video may need to be stored in:
■ ie the codec / compression level, based on viewers with higher or lower
bandwidths.
○ Different resolutions
■ If someone is going to watch it on their TV they will likely want
1080p or 4k, whereas someone watching it on their phone is likely fine
with 720p.
○ So the total number of videos that Netflix would want to save is (F x R), or the # of
formats x the number of resolution sizes.
● Doing all of this at the time of upload would be pretty time consuming if it was done by
one computer / server, so what Netflix does is break the problem into much smaller
chunks and distribute them across a bunch of servers. For example, one server might
process the first scene of the video in 1080p in a low compression format, whereas
another server processes the scene in 720p in a high compression format.
All of these chunks are processed concurrently.
○ Note: They don’t chunk by raw time (ie 5 min chunks) because it’d be annoying if
buffering was required at the climax of a great scene.
● What is Buffering?
○ Buffering is preloading a portion of the content so that the video can run
smoothly.
○ Think of it as a head start. If the computer knows that it needs X time to get 30
seconds of video, it will start your video once it has y 30-second clips so that
the computer can continue to fetch the movie data while you watch (and you
don’t need to wait for the whole movie to be transferred).
○ The little buffering icon appears when there’s a break or decrease in your
bandwidth and the video that you are viewing catches up to the amount of video
that has been retrieved.
○ The buffering process stores the video/music you want to
watch in your RAM. Having more RAM means your computer can store more
video in the buffer.
○ F
■ Scrum teams typically have a daily standup to discuss where everyone is and
highlight any blockers
■ Sprints typically end with a Sprint Review and a Sprint Retrospective.
■ Then the cycle repeats.
○ Kanban
■ Kanban is a continuous process, so there is no Sprint backlog
■ A board is used but it’s based off of the team’s eng capacity.
■ When an item is completed another is just pulled from the build column
(ie below) and something from the product backlog is pulled into the build
column
What is a VPN?
● Good Analogy: It’s like a PO box.
○ Say you wanted to get “XYZ monthly” but you didn’t want the people @ XYZ or
the postman to know it’s for you. You’d get a PO box to keep your privacy because
your address is associated with you.
○ The VPN obfuscates your IP address. When the data request gets to the server
it will not have your IP address but rather the IP address of the VPN server.
Once the server returns the requested data to the VPN, the VPN will then pass it
back to you.
● Why do people use a VPN?
○ Privacy
■ IP addresses are usually assigned in blocks so the server you are
reaching out to knows where you are, often down to the city or portion of
the city level.
○ Security
■ If you get on the “free internet” at Starbucks someone might be able to spoof
a wifi connection and collect information you are accessing (ie bank
passwords, usernames, etc.)
○ Gaining access to blocked content
■ For example, if you are traveling to India and want to access your Netflix
but Netflix doesn’t serve India and can tell your IP address is
coming from India, you are blocked unless you use a VPN.
- A person types in a website (ie Facebook.com) on a computer, which is a “client” that can
connect to an ISP via a router.
- The request goes from the person’s IP address to the ISP (via router) which accesses
the DNS (Domain Name System) to determine the correct IP address of the domain
facebook.com. (This is only the case if you’ve never been to facebook.com. If you have, it
will be stored in your cache.)
- It is then sent via a series of routers that bring it closer to the server
where facebook.com is being stored. Each packet carries the sender’s IP address
so that when the information is sent back it knows where it needs
to go.
- The request is sent as a series of 1s and 0s that are split up into “packets” that
contain the sequence number (for reassembly) and the source and destination
IP addresses needed to deliver it and bring the information back to the correct IP address.
- That request is routed from the user’s computer through optical cables (via pulses of light)
to a data center where the information for Facebook.com is being stored on an SSD (solid
state drive).
- That request for information triggers a response from the Facebook Servers to serve the
correct data back to the requesting IP address.
- That information is sent back to your home via fiber optic cables till it gets to your router
where it converts the light signals into electrical signals that are then passed back to
your computer to display the data you requested.
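A toy sketch of the packet idea from the steps above: split a message into chunks, tag each with a sequence number and addresses, then reassemble on arrival. This is not how real TCP/IP headers are laid out; the dictionary fields, function names and IP addresses are made up for illustration.

```python
import random


def to_packets(message: bytes, size: int, src: str, dst: str) -> list[dict]:
    """Split a message into packets tagged with a sequence number and addresses."""
    return [
        {"seq": index, "src": src, "dst": dst, "payload": message[start:start + size]}
        for index, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[dict]) -> bytes:
    """Rebuild the message by ordering packets on their sequence number."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return b"".join(packet["payload"] for packet in ordered)


packets = to_packets(b"GET facebook.com", size=4, src="203.0.113.7", dst="198.51.100.20")
random.shuffle(packets)  # packets can arrive out of order
print(reassemble(packets))
```

The sequence number is what lets the receiver put the pieces back in order even when they take different routes.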
Protocols set the rules for data packet conversion and destination addresses for each packet.
- Example of protocols are: TCP / IP, Http /Https, RTP.
- The Router
- Comes after the Modem if you want to have more than one device to access the
internet or have a wireless connection.
- Routes your internet connection to your various devices
- Increasingly folks are selling modem / router combined machines.
Cable or Fiber connections mostly come from cable companies (ie Comcast)
DSL connections mostly come from phone companies. (ie AT&T)
What is Recursion:
- Analogy: Russian doll
- It’s a function that calls itself and is useful when a problem can be solved more easily by
breaking it into smaller problems that use the same algorithm
- It must have an end condition to say when it’s achieved its goal (ie when you get to the
front of the movie theatre’s rows and there is no row in front of you, come back)
- Uses:
- Sorting algorithms
- Working with files in a directory structure
- Parsing XML/html
What is Recursion
- Recursion is code that references itself to arrive at the answer
- Analogy: Say you are in a movie theatre and you want to know what row you are in but
it’s dark. So to find out you ask the person in front of you, “What row am I in?” They don’t know so they
ask the person in front of them, and it continues until the question gets to the person who’s in
the 1st (#1) row. They then say, “I’m in the first row,” so the person behind them knows they are
in the second row, and it comes back to where you are and you learn you are in the x
row.
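The movie theatre analogy can be written as a short recursive function. A minimal sketch, assuming Python; `row_number` and its parameter are hypothetical names.

```python
def row_number(rows_in_front: int) -> int:
    """Return the row of a person with `rows_in_front` rows ahead of them."""
    if rows_in_front == 0:
        # End condition: nobody in front, so this must be row 1.
        return 1
    # Ask the person in front, then add one to their answer on the way back.
    return row_number(rows_in_front - 1) + 1


print(row_number(4))
```

Without the end condition the function would keep asking forever, which is why every recursive function needs one.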
What is an Array
- An array is a list of data (ie bubbles)
- Code is int [ ]
- Array indexes start at zero (in most languages).
- Think of an array as a series of X slots.
What is Caching
- Caching is when a computer stores the results of an operation so that future requests
return faster.
- Helpful video
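A small sketch of caching using Python's built-in `functools.lru_cache`. The function `slow_square` and the `calls` counter are made up to show that the second request skips the real work.

```python
from functools import lru_cache

calls = 0  # counts how many times the real work actually runs


@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    global calls
    calls += 1
    return n * n


slow_square(12)  # computed and stored
slow_square(12)  # returned from the cache, no recomputation
print(calls)
```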
What is Sharding
- Analogy: A really big pizza that’s a lot to eat by itself so you cut it into slices and share it
across folks that can “eat” it
- Useful for when you have A LOT of data that you need to pull from, so you “partition” the
pizza across multiple database servers with a key (ie every transaction from March with
a user whose name is A-M is on server A156, every transaction from March with a
user whose name is N-Z is on server A157)
- Examples of ways you can shard the data
- Location
- User ID
- Age of users
- etc
- Sharding is a form of horizontal partitioning.
- Partitioning can be either horizontal or vertical.
- Problems with Sharding
- Sometimes data can cut across two shards (ie joins)
- There is a fixed number of partitions set at the start and thus shards can get too
big (when that happens you need to break it into a “mini slice”)
- Big shards can get slow (in this case establish an index on the shard)
- Helpful video
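A minimal sketch of routing by shard key, using the A-M / N-Z split and the server names A156 and A157 from the example above. The function name and the sample users are made up for illustration.

```python
def shard_for(user_name: str) -> str:
    """Pick the database server for a user based on the first letter of their name."""
    first_letter = user_name[0].upper()
    # Names starting with A-M go to A156; everything else goes to A157.
    return "A156" if "A" <= first_letter <= "M" else "A157"


for name in ["Alice", "Mia", "nina", "Zed"]:
    print(name, shard_for(name))
```

A range key like this is easy to read but can get lopsided if many users share a letter, which is the "shards can get too big" problem listed above.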
What is SQL?
- Structured Query Language
- Example: SELECT id, name, price FROM products
- Uses relational databases and tables
- Example.You have Users, Products and Orders tables. When a User picks their
products, they’ve made an order which connects the Users with the Product they
ordered.
- Fields are the columns. Records are the rows
- All data must conform to the Schema (ie the rules of the table)
- Joins are special commands that pull data from various tables (ie Users and Order Dates
for example)
- Downsides:
- It is harder to scale SQL DBs because they have traditionally relied on
vertical scaling. (Horizontal scaling is possible, ie via sharding or read replicas, but takes more work.)
- SQL doesn’t do large data set analysis well. (Largely because of the # of data
tables it needs to cut across.)
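A small sketch of the Users / Products / Orders example with a join, using Python's built-in `sqlite3` module and an in-memory database. The column names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL);
    -- An order connects a user with the product they ordered.
    CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users(id),
                           product_id INTEGER REFERENCES products(id));
    INSERT INTO users VALUES (1, 'Ana');
    INSERT INTO products VALUES (1, 'Keyboard', 49.0), (2, 'Mouse', 19.0);
    INSERT INTO orders VALUES (1, 1, 2);
""")

# The joins pull data from all three tables in one query.
rows = conn.execute("""
    SELECT users.name, products.name, products.price
    FROM orders
    JOIN users    ON users.id = orders.user_id
    JOIN products ON products.id = orders.product_id
""").fetchall()
print(rows)
```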
What is NoSQL?
- MongoDB is an example of a NoSQL database
- The NoSQL structure (for a document store like MongoDB) is:
- Database
- Collections
- Documents
- People like NoSQL DBs because
- Don’t have a schema requirement
- More Flexible to future Changes
- Faster
- Downsides
- The lack of a schema means it can be less
reliable in terms of querying the data.
- If you are trying to make a retroactive change (ie BBall becomes Basketball)
there’s a bunch of places that need to be changed and it’s not as organized
as it would be in a SQL db
There is no one that’s better than the other. Most big companies use a combination of both.
What is MySQL?
- The most popular SQL database
What is MongoDB
- The most popular NoSQL
Estimating Capacity
Rate Limiting
System Design
Logarithmic scale
How does Rate Limiting work and how does it relate to the Thundering
Herd Problem (video)
- The Thundering Herd problem (ie too many folks suddenly hit the servers) can lead to
the “Cascading Failure” issue if other servers were already maxed out. It can become a
race against time.
- When this happens you have a few options:
- 1) [A bad solution] You can NOT load balance the requests out to the other
servers and just have the experience be down for the % of users that were
being served by the failed server. (Better one goes down than all via a cascade of
failures.)
- 2) [Usually the right solution]
- Once the first server falls, set up a queue of requests from the fallen server
- Have the system take into account the capacity of all the other servers
and pass requests over to the server with the most free capacity.
- Have the queue fill that capacity to the limit but not above. Once it hits the limit,
have the system respond with a failure notification.
- Send overflow requests to a failover server to pick up where possible.
- Once exhausted, only show failures for the requests that can’t be served
(ie a much smaller portion than would have been seeing failures if you
just cut it off).
- Future-proofing
- If you have a “predictable surge” coming (ie black Friday) you can and should
“rent” additional server capacity.
- Setup Auto-scaling for the future (can get very expensive very quickly)
- Use Job Scheduling (ie if you are going to send a happy new year email to 1M
users, you don’t want to do it all at 12:01am. You want to schedule it out over a
period the system can handle.)
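A rough sketch of option 2 above: hand the fallen server's queued requests to the servers with the most free capacity, fill each one up to its limit but not above, and fail only what is left over. The function name, server names and numbers are all hypothetical.

```python
def redistribute(queued: int, free_capacity: dict[str, int]) -> tuple[dict[str, int], int]:
    """Assign queued requests to servers, most free capacity first.

    Returns (requests assigned per server, requests that must be failed).
    """
    assigned = {}
    by_most_free = sorted(free_capacity.items(), key=lambda item: item[1], reverse=True)
    for server, free in by_most_free:
        take = min(queued, free)  # fill up to the limit, never above it
        if take:
            assigned[server] = take
        queued -= take
    return assigned, queued  # whatever is left gets a failure response


assigned, failed = redistribute(100, {"server_b": 30, "server_c": 50})
print(assigned, failed)
```

Here 80 of the 100 requests are still served, so only 20 users see a failure instead of all 100.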
Antipatterns
- These are practices or structures that should be avoided.
- An example would be using your database to communicate with your various servers.
The reason is that a setup like that requires that the servers pull from the database (ie
frequent polling), which impacts the servers, and it requires the DB to do a good
amount of read operations, which eat up the DB’s capacity.
Sorting
- The fastest comparison-based sorts (ie merge sort) run in O(n log n) time
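Merge sort is one well-known n log n sort: the list is halved about log n times, and each level does about n work to merge. A minimal sketch, assuming Python; in practice you would just call the built-in `sorted`.

```python
def merge_sort(items: list[int]) -> list[int]:
    """Sort a list by splitting it in half, sorting each half, and merging."""
    if len(items) <= 1:
        return items  # a list of 0 or 1 items is already sorted
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge: repeatedly take the smaller front item of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


print(merge_sort([5, 2, 9, 1, 5]))
```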
How would you think about scaling data systems (video)
- Imagine you have a service (ie a website) on a server. Your site has become so popular
that the demands on that server are greater than it can handle. What do you do?
- You need SCALABILITY. To achieve Scalability you can go with either vertical or
horizontal scaling
- Vertical Scaling - Buy a bigger machine (think of it as adding a server on top)
- Horizontal Scaling - Buy more machines (think of it as setting up a bunch of
distributed but equally sized servers.)
The CAP theorem says a distributed data system can fully guarantee only two of the three
following items at the same time, so it must always balance tradeoffs between them:
1) Consistency - ie Keeping data consistent on 50 servers is a lot harder than doing it on
1 massive server.
2) Availability -
3) Partition Tolerance - If you have everything on
People often use the ATM Example. (video) This highlights the trade offs of Consistent vs
Available design. If the link between 2 ATMs is broken (a partition) and someone goes to
withdraw money, does it:
A) Say here’s your money: ie Available Design
B) Say we can’t give you money because we can’t talk to other ATMs right now: Consistent
Design.
C) What ends up happening in the real world is people make trade offs between
Available Design and Consistent Design
For example, if you are coming to make a deposit and the network is down, it will accept it (ie
Available), but if you are trying to withdraw it will reject it (ie Consistent)
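The deposit vs withdrawal trade off can be sketched as a tiny decision function. This is only an illustration of the policy described above, not how any real ATM network works; the function and argument names are made up.

```python
def atm_decision(action: str, link_up: bool) -> str:
    """Decide whether an ATM accepts a request, given the state of the network link."""
    if link_up:
        return "accept"  # no partition, so no trade off is needed
    if action == "deposit":
        return "accept"  # favor availability: taking money in is low risk
    return "reject"      # favor consistency: the balance can't be verified


for action in ("deposit", "withdraw"):
    print(action, atm_decision(action, link_up=False))
```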
Great analogy: Scaling a restaurant. As a restaurant grows and you have a lot more traffic you
have a few choices.
1) Get the same chef to work more hours (ie vertical scaling.)
a) To do that you’d need the chef to do more prep / chopping in off peak
hours (optimizing)
b) DOWNSIDE is a lack of resiliency because what if the Chef gets sick. No
one else can make the food. (ie single point of failure)
i) Mitigation is to have a backup (ie a trained chef that knows the
recipes and can come in when the primary chef gets sick). This is similar
to backing up the data on the server to another server.
2) Hire more chefs (ie horizontal scaling.)
a) DOWNSIDE is that some chefs might not use the right recipe if you
update it, so you risk a lack of consistency. Also, it’s possible that some
chefs might be better at soups while others are better at desserts.
i) Mitigation is a microservice. When orders come in, you have some
chefs just get the soup orders and some chefs just get the dessert
orders. This is great because A) when you need to change the
soup recipe you don’t need to do it with all chefs and B) the chefs
that are great at making soup, just make soup instead of
struggling with making dessert.
3) If the demand for food gets REALLY big you might need to buy another
restaurant so that if the first one goes down the 2nd one can pick up the orders
(distributed system)
4) If say we had A LOT of pizza shops and we wanted to make sure that when a
customer called for a pizza delivery they got the pizza as fast as possible, instead
of them calling each shop to find out the delivery times, you’d probably just want
all customers to call a central call center that knows each pizza shop’s delivery time to that
location (load balancer)
Describe Extensibility
- It’s essentially flexibility that comes from building the solution around its core role and not the
specifics of that role.
- Good analogy: If you were to build a delivery robot, you don't want to build it so it can
just deliver pizza. It’s possible that in the future you want to deliver hamburgers so create
the robot with extensibility (ie flexibility to adapt to future needs)
Weighted Contribution: .25, 2, 2
- Multiply Users * Queries per User per Day, then divide down to per second
- 2.4B users * 4.25 queries per user per day = 10.2B queries per day
- Denominator for Days to Seconds = 1 day = 60 secs * 60 mins * 24 hrs = 86,400
- 10.2B/86,400 ≈ 118,056 queries per second
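The same estimate as a quick calculation, assuming the 4.25 figure is the average number of queries per user per day (that reading is an assumption; it is the only one that produces the 10.2B total). Variable names are made up.

```python
users = 2_400_000_000
queries_per_user_per_day = 4.25       # assumed meaning of the 4.25 figure
seconds_per_day = 60 * 60 * 24        # 86,400

queries_per_day = users * queries_per_user_per_day
queries_per_second = queries_per_day / seconds_per_day

print(f"{queries_per_day:,.0f} queries per day")
print(f"{queries_per_second:,.0f} queries per second")
```

For capacity planning you would usually size for peak traffic, which can be several times this average.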
- You will also need to have a Profile service to allow for account set up.
- Later on the profile service will need to pass tokens to authenticate that
the person signing in is the right person / they should have access.
- Making Recommendations:
- The biggest problem with recommendations will be understanding / predicting
which users will be interesting to a given user.
- You can start with the user’s constraints and compare those against information on
other users you’ve stored and indexed (ie age, location, height, etc)
- This will need to pull information from the Profile service described above.
- Note you may think you can store all of this in one DB but you are going to want it
sorted and indexed, and a single table can’t be physically sorted by two things at once (ie you
can't have it sorted by age and city at the same time.)
- Your choices would either be 3 sorted DBs or a NoSQL Db.
- You will need to create a recommendation service.
- Noting Recommendations/Connections:
- Matching should be handled by a matcher service which is just a place that will
know if person A is actually connected to B. This matcher service will also
need to pass that information to the sessions service below as messaging
sessions can only happen between matched users.
- Direct Messaging:
- Since each mobile phone is a client and there is a server, you don’t want the
client to constantly be pinging the service saying “hey is there a new message for
me” cause that’s super inefficient. As a result you want a protocol that allows
the server to push messages, so you use a Peer 2 Peer protocol like the XMPP
protocol
- NOTE: this is different from the HTTP protocol, which is a Client-Server
protocol. See below.
- You will also need to build a sessions service which will handle the messaging.