System Design
-> http://nginx.org/en/docs/
note :
-> user  -> https/http => reverse proxy => server => services on different ports => database
-> admin -> https/http => reverse proxy => server => services on different ports => database
note : openssl
-> certificates and keys are generated with the help of openssl
-> we can generate ssl certificates with openssl on both windows and linux
-> configuring ssl certificates with nginx :
http://nginx.org/en/docs/http/configuring_https_servers.html
-> https server optimization with nginx :
http://nginx.org/en/docs/http/configuring_https_servers.html
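A minimal sketch of an nginx HTTPS server block in the spirit of the docs linked above; the domain, certificate paths and upstream port here are placeholders, not values from this document.

```nginx
server {
    listen 443 ssl;
    server_name example.com;                              # placeholder domain

    # cert/key can be generated with openssl, e.g. (self-signed, for testing):
    # openssl req -x509 -newkey rsa:2048 -keyout example.com.key \
    #   -out example.com.crt -days 365 -nodes
    ssl_certificate     /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_protocols       TLSv1.2 TLSv1.3;

    # reverse proxy to a service running on a different port
    location / {
        proxy_pass http://127.0.0.1:3000;                 # placeholder upstream
    }
}
```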
note : types of load balancing strategies (server picking strategies) : round robin ;
least connections ; ip_hash
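The three picking strategies above can be sketched as follows; the server list, connection counts and hash function are made-up stand-ins for illustration, not nginx's actual implementation.

```javascript
const servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]; // hypothetical backends

// round robin: rotate through the servers in order
let rrIndex = 0;
function roundRobin() {
  const server = servers[rrIndex % servers.length];
  rrIndex++;
  return server;
}

// least connections: pick the server with the fewest active connections
const activeConnections = { "10.0.0.1": 5, "10.0.0.2": 2, "10.0.0.3": 9 };
function leastConnections() {
  return servers.reduce((best, s) =>
    activeConnections[s] < activeConnections[best] ? s : best
  );
}

// ip_hash: hash the client IP so the same client always lands on the same server
function ipHash(clientIp) {
  let hash = 0;
  for (const ch of clientIp) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return servers[hash % servers.length];
}
```

ip_hash is what gives "sticky" sessions: a given client IP deterministically maps to one backend.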
********************************************************************************
note : CDN benefits
-> places static files closer to the user
-> reduces latency and cost
=> tradeoff : increases the complexity of the system
********************************************************************************
note : caching
-> code <-> cache <-> storage
-> improves read performance and reduces load
-> increases complexity and consumes resources
-> caching stores : redis, memcached, dynamodb (with DAX)
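A minimal cache-aside sketch of the code <-> cache <-> storage flow above; the Map stands in for redis/memcached and `db` is a hypothetical stub for real storage.

```javascript
const cache = new Map();                      // stand-in for redis/memcached
const db = { "user:1": { name: "alice" } };   // stand-in for real storage

let dbReads = 0; // counts how often we actually hit storage
function getUser(key) {
  if (cache.has(key)) return cache.get(key);  // cache hit: storage untouched
  dbReads++;                                  // cache miss: read storage
  const value = db[key];
  cache.set(key, value);                      // populate cache for next read
  return value;
}
```

The second read for the same key comes entirely from the cache, which is exactly the "improve read performance and reduce load" point.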
********************************************************************************
note : queues
-> suppose you have a pizza delivery system
-> 10 requests come to the pizza service for customization
-> then customers pay at the payment service (1 payment per sec)
-> then the payment service sends the response back to the pizza service
=> this is a sync system and extremely slow : performance = speed of the slowest
component of the system
=> https://developer.confluent.io/get-started/nodejs/
=> https://docs.confluent.io/home/overview.html
note :
-> pros : scalability ; reliability ; buffering ; durability ; smoothing of request spikes
-> cons : increases system complexity ; increases latency
=> even if the service crashes, the requests remain safe in the queue ; once the
service restarts it resumes processing the queue
1) message queue
-> one of the producers (one instance of the pizza service) sends a message to one
of the consumers (one instance of the payment service)
-> if that instance of the payment service is unavailable, the message goes to
another consumer instance (another payment service instance)
=> action ; exactly-once delivery ; messages can arrive out of order
2) publisher subscriber
-> if the payment service wants to announce a payment event to the billing and
receipt services
-> then it publishes that event on a channel, which in turn is subscribed to by the
billing and receipt services
-> notification ; at-least-once delivery ; messages are always in order
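The two delivery models above can be sketched in memory; the class names and the round-robin consumer choice are illustrative assumptions, not how rabbitMQ or kafka are actually implemented.

```javascript
// Message queue: each message is delivered to exactly ONE consumer
// (round robin across consumers here, for simplicity).
class MessageQueue {
  constructor() { this.consumers = []; this.next = 0; }
  subscribe(fn) { this.consumers.push(fn); }
  send(msg) {
    const consumer = this.consumers[this.next % this.consumers.length];
    this.next++;
    consumer(msg);
  }
}

// Pub/sub: every subscriber (billing, receipt, ...) receives EVERY event.
class PubSub {
  constructor() { this.subscribers = []; }
  subscribe(fn) { this.subscribers.push(fn); }
  publish(event) { this.subscribers.forEach((fn) => fn(event)); }
}

// usage: two payment-service workers share a queue, while billing and
// receipt each get their own copy of every published payment event
const q = new MessageQueue();
const worker1 = [], worker2 = [];
q.subscribe((m) => worker1.push(m));
q.subscribe((m) => worker2.push(m));
q.send("pay#1"); q.send("pay#2");   // split between workers

const ps = new PubSub();
const billing = [], receipt = [];
ps.subscribe((e) => billing.push(e));
ps.subscribe((e) => receipt.push(e));
ps.publish("paid#1");               // both lists get the event
```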
note : rabbitMQ
-> AMQP protocol : it stores messages until the consumer retrieves them
-> offloads heavy tasks
-> distributes tasks
1) routing keys
=> when a producer puts a message, it contains a routing key (it says into which
queue this message needs to be dropped, like the payment queue)
=> the actual body then contains information like amount and orderId, which again
helps to route the message to a queue related to a particular running service
instance
2) exchanges
=> rabbitMQ acts as a message queue in direct, topic and header exchanges ; but it
works as pub/sub in the fanout exchange
3) channels : concurrency
-> a rabbitMQ consumer uses a tcp connection ; multiple tcp connections from
different threads in the same service would consume messages faster
-> but tcp connections are expensive : instead rabbitMQ opens multiple channels
over a single tcp connection from the different threads within the same service
note : kafka : most popular pub/sub system : event streaming platform : messages
are stored for a period of time
-> topic : it is basically a queue name, and >1 consumers are connected/subscribed
to that topic
-> each msg = event ; each event has a key, value and timestamp
-> kafka topics are usually sharded / partitioned : e.g. with rules such as (#1 :
0,1,2 ; #2 : 3,4 ; #3 : 5,6,7 ; #4 : 8,9)
=> if a producer sends a message whose key ends with '9', it will go into the 4th
partition (kafka's sharding logic)
=> consumers divide the partitions among themselves : if we have three instances of
a service then one instance can be connected to #1,#2 and the other two can be
connected to #3 and #4 respectively
=> if there are more consumers than partitions, the extra consumers sit idle
(wastage of resources)
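The routing and assignment above can be sketched as follows, mirroring the example digit ranges; `partitionFor` and `assignPartitions` are illustrative assumptions, not kafka's real partitioner or rebalance protocol.

```javascript
// example rule from above: partition -> key digits it owns
const partitionRanges = { 1: [0, 1, 2], 2: [3, 4], 3: [5, 6, 7], 4: [8, 9] };

// route a message by the last digit of its key ("key ending with 9" rule)
function partitionFor(key) {
  const lastDigit = Number(String(key).slice(-1));
  for (const [p, digits] of Object.entries(partitionRanges)) {
    if (digits.includes(lastDigit)) return Number(p);
  }
}

// divide partitions among consumer instances as evenly as possible
function assignPartitions(partitions, consumers) {
  const assignment = consumers.map(() => []);
  partitions.forEach((p, i) => assignment[i % consumers.length].push(p));
  return assignment;
}
```

With 4 partitions and 3 consumers one instance ends up with two partitions; with 5 consumers one instance would get nothing, which is the "more consumers than partitions" waste.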
note :
-> instances of the same service come under the same consumer group
-> two instances of the same group will not both receive a message from a partition
they are connected to : each message goes to only one instance per group (ex : two
instances of receipt)
-> but if two instances of different groups are connected to the same topic, one
instance of each group will receive the same message (ex : an instance each of
receipt and billing)
=> appending at the end ; deleting at the start (log structure)
=> it can process on the order of 100k events per second
********************************************************************************
note : protocols
-> tcp and udp
=> tcp > web sockets (web sockets run over tcp)
=> tcp > http (http runs over tcp)
=> http > rest, gRPC, GraphQL
note : tcp : reliable, ordered, error checked :: slower than other protocols
-> the sender establishes a connection to the receiver
-> the sender breaks the message into pieces (payload) and then sends the packets
one by one
-> if the receiver does not ack, the sender will send the packet again
=> udp : unreliable, fast ; good if there is a constant stream of data
note : http : hypertext transfer protocol (runs over tcp) : text with links and
other docs
-> client sends a request and waits for the response
-> the response contains status, headers, body
=> status codes :
i) 100 - 199 : informational
ii) 200 - 299 : success (200 : ok ; 201 : created)
iii) 300 - 399 : redirection
iv) 400 - 499 : client error (401 : unauthorized ; 403 : forbidden ; 404 : not found)
v) 500 - 599 : server error (500 : internal server error ; 503 : service unavailable)
note : graphQL
-> it solves the issues of overfetching and underfetching
=> overfetching : a get request fetches multiple columns from different tables but
only one gets used in the frontend
-> instead of get, we can use post and send a query (the list of required fields)
as the body, and fetch only those fields
=> QL : query language ; http based ; req & res in json format ; the query defines
which fields/nested entities to return
-> reads (get) : query
-> writes (post, put, delete) : mutation
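A small sketch of how such a request body is built: a GraphQL query naming only the fields the frontend needs, posted as json. The field names (`user`, `name`, `orders`, `total`) are hypothetical, not from any real schema.

```javascript
// instead of GET /users/1 returning every column, name only what you need
const query = `
  query {
    user(id: 1) {
      name
      orders { total }
    }
  }`;

// the request body is plain json with the query string inside
const body = JSON.stringify({ query });

// a real call would then be something like:
// fetch("/graphql", { method: "POST", headers: { "Content-Type": "application/json" }, body })
```

Columns the frontend never uses (say, `email`) simply never appear in the query, so they are never fetched.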
note : socket.io
-> https://socket.io/
-> https://socket.io/get-started/chat
-> https://socket.io/docs/v4/
note : notification
-> https://www.npmjs.com/package/nodemailer
********************************************************************************
note : process
-> each process has a separate memory space
-> if you start a java server then you see only one process
-> if you start a nodejs server (e.g. in cluster mode) then you see multiple
processes
=> if 10,000 users need to be cached in memory then nodejs can consume a lot more
memory than java, because the data is replicated multiple times (each process will
have its own copy of the data)
=> check what kind of process model your tech uses against your requirements
********************************************************************************
note : database
note : indexes
=> indexes are made on a column so that searches on that column become much faster
=> if we search something by the name column and there is no index then we need to
scan all rows and match the string
=> with an index, each name is mapped to an id and is fast to search
=> hashmap / dictionary index : exact-match (string) queries
=> b-tree (it can have more than 2 children, and the leaf nodes form a linked
list, as in a B+ tree) : range queries like : where age between 25 and 50
note : sharding
=> if we need to add one more shard then we split an existing range in the middle :
most of the key-to-shard mappings (by ending hash) remain the same
-> 1 : 100-127
-> 2 : 0-24
-> 5 : 25-49
-> 3 : 50-74
-> 4 : 75-99
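The split above can be sketched as a range lookup; here shard 2's old range 0-49 is split in the middle to make room for the new shard 5, so only keys hashing to 25-49 move.

```javascript
// ranges are [shard, lo, hi]; a key's hash falls into exactly one range
function shardFor(hash, ranges) {
  return ranges.find(([, lo, hi]) => hash >= lo && hash <= hi)[0];
}

const before = [[2, 0, 49], [3, 50, 74], [4, 75, 99], [1, 100, 127]];
const after  = [[2, 0, 24], [5, 25, 49], [3, 50, 74], [4, 75, 99], [1, 100, 127]];
```

Keys outside the split range keep their shard, which is the whole point of splitting a range instead of re-hashing everything.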
note : partitioning
-> sharding : breaking one large db into smaller dbs
-> partitioning : breaking one large table into smaller tables
=> benefits : smaller files = fast queries ; easier indexing
2) range of dates : partitioning orders : 2020 ; 2021 ; 2022 ; 2023 ; 2024 ; 2025
(more queries will come to 2025 ; the decreased partition size will make those
queries faster)
=> cons : uneven key distribution : e.g. if people purchase more in a particular
month
3) hash of keys : take the primary key, hash it modulo the number of tables, and
put the row in that table ; this gives an even distribution ; works well if data
is accessed by key
=> cons : it is worse than keeping the data in a single table when we need to find
an order by date, as we must query all tables
=> cons
-> complexity ; scanning all partitions is expensive ; hard to maintain uniqueness
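The hash-of-keys scheme can be sketched in a few lines; the table count and id range are made-up, but they show both the even spread and why a non-key query has to touch every table.

```javascript
// primary key modulo the number of tables decides where a row lives
const NUM_TABLES = 4;
const tableFor = (orderId) => orderId % NUM_TABLES;

// simulate 1000 sequential order ids: the spread comes out even
const counts = new Array(NUM_TABLES).fill(0);
for (let id = 0; id < 1000; id++) counts[tableFor(id)]++;

// but "orders from 2024" gives no key to hash, so all NUM_TABLES tables
// must be scanned - the con noted above
```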
note :
-> cap theorem : consistency ; availability ; partition tolerance
-> acid properties : atomicity ; consistency ; isolation ; durability
********************************************************************************
2) serialization
-> data serialization != request serialization
-> serialize and deserialize data structures and objects to store them on disk or
transfer them over the network
-> use a queue to serialize requests ; avoid data races ; achieve serializability
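Data serialization as described above can be sketched with json (one common wire format; the `order` object is a made-up example):

```javascript
// an in-memory object becomes text (bytes) for disk or the network,
// and deserializes back to an equal structure on the other side
const order = { id: 42, items: ["margherita"], paid: false };

const serialized = JSON.stringify(order);  // object -> string for storage/transport
const restored   = JSON.parse(serialized); // string -> object again
```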
3) CQRS
-> suppose there are buyers and sellers
-> a seller adds records to a table and a buyer fetches/updates/deletes records
from the table
-> buyer queries can be made faster by increasing the number of indexes on the
table
-> but that will lead to request timeouts for the seller, as the db takes longer to
add a record while maintaining indexes
=> solution : there will be two tables
-> the seller will add records to a non-indexed table
-> at regular intervals (or with the help of websockets) we will copy the added
records into the indexed table
-> the buyer will read from the indexed table
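The two-table flow above can be sketched in memory; the array plays the non-indexed write table, a Map plays the indexed read model, and the names are hypothetical.

```javascript
const writeTable = [];        // seller side: fast appends, no index maintenance
const readModel = new Map();  // buyer side: "indexed" by product name

function sellerAdd(product) { writeTable.push(product); }

// in a real system this runs on an interval (or is pushed via websockets)
function syncReadModel() {
  for (const p of writeTable) readModel.set(p.name, p);
}

function buyerFind(name) { return readModel.get(name); }
```

The tradeoff is visible in the sketch: until the sync runs, the buyer does not see the new record (eventual consistency), but the seller's write is never slowed down by index updates.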
********************************************************************************
=> estimate the number of insert operations and decide the number of shards and the
sharding technique
=> estimate db size
=> estimate throughput : number of requests per second
-> number of users : x
-> number of times a user opens the app per day : y
-> amount of time a user spends in the app each time : z
-> estimate user sessions from the y and z tradeoff
-> estimate the number of api calls a user makes in z time : a
-> find the requests per second : x*y*a / (24*60*60) (z informs a, so it does not
appear in the formula itself)
=> split the estimate into reads and writes
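A worked version of the estimate with made-up numbers (x, y, a are assumptions chosen purely for illustration):

```javascript
const x = 1_000_000; // users
const y = 2;         // sessions per user per day
const a = 50;        // api calls per session (informed by session length z)

const requestsPerDay = x * y * a;                     // 100,000,000 calls/day
const requestsPerSecond = requestsPerDay / (24 * 60 * 60); // ~1157 req/sec
```

An average like this hides peaks; real traffic bunches into busy hours, so capacity is usually planned for a multiple of the average.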
note : we place limits on reads and sessions to optimize the system with the help
of pagination
=> services
=> caches
=> queues
=> load balancing
=> database
********************************************************************************