Scalability PDF

04/02/2023, 10:55 What is Scalability?
- Web Application and Software Architecture 101
What is Scalability?
This lesson introduces scalability.
We'll cover the following
• What is scalability?
• What is latency?
• Measuring latency
• Network latency
• Application latency
• Why is low latency so crucial for online services?
I am pretty sure that, being in the software development universe, you’ve
come across the word scalability numerous times. What is it? Why is it so
important? Why is everyone talking about it? What are your plans or
contingencies to scale when your app or platform experiences
significant traffic growth?
This chapter is a deep dive into scalability. It covers the frequently
asked questions about it, including what scalability means in the
context of web applications and distributed systems.
So, without further ado, let’s get started.
What is scalability?
Scalability means the application’s ability to handle and withstand
increased workload without sacrificing performance.
For example, if your app takes x seconds to respond to a user request, it
should take the same x seconds to respond to each of a million
concurrent user requests.
The app’s back-end infrastructure should not crumble under a load of a
million concurrent requests. It should scale well when subjected to a
heavy traffic load and maintain the system’s latency.
What is latency?
Latency is the time a system takes to respond to a user request. Let’s say
you send a request to an app to fetch an image and the system takes 2
seconds to respond to your request. The latency of the system is 2 seconds.
Minimum latency is what efficient software systems strive for. No matter
how much the traffic load on a system builds up, the latency should not go
up. This is what scalability is.
If the latency remains the same, we can say that the application scaled
well with the increased load and is highly scalable.
Let’s see scalability in terms of Big-O notation. Ideally, the complexity of a
system or an algorithm should be O(1), that is, constant time, like an
average-case lookup in a hash map or a key-value database.
A program with a complexity of O(n^2), where n is the size of the data set,
does not scale well. As the data set grows, the work grows quadratically,
and the system needs ever more computational power and other resources to
process the tasks.
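To make the two growth rates concrete, here is a minimal Python sketch. The function names are illustrative and not part of the lesson. It simply counts the basic steps performed by a dictionary lookup and by an all-pairs comparison as n grows.

```python
def lookup_steps(n: int) -> int:
    """Count the steps for one dictionary lookup in a table of n entries."""
    table = {i: i * i for i in range(n)}
    steps = 0
    _ = table[n - 1]  # average-case O(1): one lookup, whatever n is
    steps += 1
    return steps


def pairwise_steps(n: int) -> int:
    """Count the steps for comparing every element with every other one."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1  # executed n * n times in total
    return steps
```

Growing n from 10 to 100 leaves the lookup at a single step, while the pairwise count grows from 100 to 10,000 steps.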
So, how do we measure latency?
Measuring latency
Latency is measured as the time difference between the action that a user
takes on the website and the system’s response in reaction to that action.
The action can be an event like clicking a button, scrolling down a web
page, etc.
This latency is generally divided into two parts:
• Network latency
• Application latency
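As a rough illustration of the measurement itself, the sketch below times a single action from start to finish. The helper measure_latency and the stand-in fake_request are assumptions made for this example; a real measurement would wrap an actual request, and the number it reports would include both network and application latency.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def measure_latency(action: Callable[[], T]) -> Tuple[T, float]:
    """Run `action` and return its result plus the elapsed time in seconds."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_request() -> str:
    """Hypothetical stand-in for a real request; sleeps to simulate work."""
    time.sleep(0.01)
    return "image-bytes"
```

Calling measure_latency(fake_request) returns the response together with the time the caller had to wait for it.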
Network latency
Network latency is the time that the network takes to send a data packet
from point A to point B. The network should be efficient enough to handle
the increased traffic load on the website. To cut down network latency,
businesses use a CDN (Content Delivery Network) to deploy their servers
across the globe, as close to the end user as possible. These locations
close to the user are also known as edge locations.
If you wish to understand edge locations and how apps are
deployed in the cloud, check out my cloud computing 101 course on
my platform.
After having spent a decade in the industry writing code, I firmly
believe that every software engineer should have knowledge of
cloud computing. It’s the present and the future of application
development and deployment.
Moving on.
Application latency
Application latency is the time the application takes to process a user
request. There are more than a few ways to cut down the application
latency. The first step is to run stress and load tests on the application and
scan for the bottlenecks that slow down the system as a whole. I’ll talk
more about it in the upcoming lessons.
Why is low latency so crucial for online services?

Latency plays a significant role in determining if an online business wins
or loses a customer. Nobody likes to wait for a response on a website.
There is a well-known saying, “If you want to test a person’s patience, give
them a slow internet connection.”
If the visitor gets the response within a stipulated time, great; otherwise,
they’ll bounce off to another website. There is ample market research that
concludes high latency in applications is a big factor in customers
bouncing off a website. If there is money involved, zero latency is what
businesses want. If only this were possible.
Think of massively multiplayer online (MMO) games. A slight lag in an
in-game event ruins the whole experience. A gamer with a high-latency
internet connection will have a slow response time despite having the best
reaction time of all the players in an arena.
Algorithmic trading services need to process events within milliseconds.
Fintech companies have dedicated networks to run low latency trading.
The regular network just won’t cut it.
We can gauge the importance of low latency from the fact that in 2011
Huawei and Hibernia Atlantic started laying a fiber-optic cable across
the Atlantic Ocean between London and New York. The project was
estimated to cost approximately $300M just to save traders six
milliseconds of latency.
Types of Scalability
In this lesson, we will explore the two types of scaling: Vertical and Horizontal.
• What is vertical scaling?
• What is horizontal scaling?
• Cloud elasticity
To scale well, an application needs solid computing power. The servers
should be powerful enough to handle increased traffic loads.
There are two ways to scale an application:
• Vertically
• Horizontally
What is vertical scaling?

Vertical scaling means adding more power to our server. Let’s say our app
is hosted by a server with 16 gigs of RAM. To handle the increased load,
we now augment the RAM to 32 gigs. Here, we have vertically scaled the
server.
Ideally, when the traffic starts to build on the app, the first step should be
to scale vertically. Vertical scaling is also called scaling up.
In this type of scaling, we augment the power of the hardware running the
app. This is the simplest way to scale, as it requires neither code
refactoring nor complex configuration. I’ll
discuss in the next lesson why code refactoring is needed when we
horizontally scale our app.
However, there is only so much we can do when scaling vertically. There
is a limit to the compute power we can augment for a single server.
A good analogy would be to think of a multi-story building. We can keep
adding floors to it but only up to a certain point. What if the number of
people in need of a flat keeps rising? We can’t scale the building up to the
moon for obvious reasons.
Now is the time to build more buildings. This is where horizontal
scalability comes in.
When the traffic is too large to be handled by a single server, we bring in
more servers to work together.
What is horizontal scaling?

Horizontal scaling, also known as scaling out, means adding more
hardware to the existing hardware resource pool. This increases the
computational power of the system as a whole.
With this, the increased traffic influx can be dealt with efficiently. And
there is no limit to how much we can scale horizontally, assuming we
have infinite resources. We can keep adding server after server, setting
up data center after data center.
Horizontal scaling also allows us to scale dynamically in real time as the
traffic on our website climbs and drops over a period of time. Dynamic
scaling of this kind is not practical when scaling vertically.
Cloud elasticity
The most prominent reason cloud computing became mainstream in the
industry is the ability of the cloud to scale dynamically. When the
traffic climbs, the cloud adds servers to the hardware resource
pool, and when it drops, the added servers are removed.
The ability to use and pay only for the hardware resources used by the
website got popular with businesses for obvious economic reasons.
The process of adding and removing servers on the fly, stretching and
returning to the original infrastructural computational capacity, is
popularly known as cloud elasticity. It saves businesses truckloads of
money every single day.
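The elastic behavior described above can be sketched as a small sizing rule. This is only an illustration under stated assumptions: the function name, the per-server capacity figure, and the minimum and maximum pool sizes are all hypothetical, and real cloud autoscalers use richer signals than a single request rate.

```python
import math


def desired_servers(requests_per_sec: float,
                    capacity_per_server: float,
                    min_servers: int = 1,
                    max_servers: int = 20) -> int:
    """Return how many servers the pool should have for the current load."""
    # Servers needed to carry the load, rounded up to a whole machine.
    needed = math.ceil(requests_per_sec / capacity_per_server)
    # Never shrink below the floor or grow past the ceiling of the pool.
    return max(min_servers, min(max_servers, needed))
```

With an assumed capacity of 100 requests per second per server, a load of 450 requests per second yields 5 servers, and when the load falls back to 100 the pool shrinks to 1.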
If you wish to know in detail how cloud platforms scale our apps and
make them highly available, I’ve discussed in my cloud
computing 101 course how clustering works and how cloud companies
deploy our apps across continents.
Having multiple server nodes on the backend also helps the website stay
online even if a few server nodes crash. This is known as high availability.
We’ll get to that in the upcoming lessons.
Which Scalability Approach is Right for our App?

In this lesson, you will learn which type of scaling is best for a given scenario.
• Pros and cons of vertical and horizontal scaling
• What about the code? Why does the code need to change when it
has to run on multiple machines?
• Which scalability approach is right for our app?
Pros and cons of vertical and horizontal scaling

This is where I talk about the pluses and minuses of both scaling
approaches.
Vertical scaling, as we learned before, is simpler than
horizontal scaling because we do not have to touch the code or make any
complex system configurations. It takes much less administrative,
monitoring, and management effort than managing a distributed
environment when scaling horizontally.
A significant downside of vertical scaling is the availability risk. The
servers are powerful but few in number. There is always a risk of them
going down and the entire website going offline, which is far less likely
when the system is scaled horizontally. In that scenario, the system is
more highly available.
What about the code? Why does the code need to change when it has to run on multiple machines?
If you intend to run the code in a distributed environment, it needs to be
stateless. There should be no state in the code. What do I mean by this?
There should be no static instances in the class. Static instances hold
application data, and when a particular server goes down, all the static
data/state is lost. The app is left in an inconsistent state.
In object-oriented programming, instance variables hold the state of an
object. Static variables, moreover, hold state that spans multiple
objects. They generally hold state per classloader. Now, if the server
instance running that classloader goes down, all the data is lost.
Also, whatever data static variables hold is not application-wide. For this
reason, distributed in-memory stores like Redis, Memcached, etc., are used to
maintain a consistent state application-wide. When writing applications
for distributed systems, it’s a good practice to avoid using static instances
in the class. The state is typically persisted in a distributed memory store;
this allows the components to be stateless.
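The difference between the two styles can be sketched in a few lines of Python. Everything here is illustrative: SharedStore is a hypothetical in-process stand-in for an external store such as Redis, not a real client, and the class names are made up for the example.

```python
from typing import Dict


class StatefulCounter:
    """Anti-pattern: the count lives only in this process's memory."""
    visits = 0  # class-level (static) state, lost if this process dies

    def handle_request(self) -> int:
        StatefulCounter.visits += 1
        return StatefulCounter.visits


class SharedStore:
    """Hypothetical stand-in for a distributed store such as Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


class StatelessHandler:
    """Holds no state of its own, so any server node can serve any request."""

    def __init__(self, store: SharedStore) -> None:
        self._store = store

    def handle_request(self) -> int:
        return self._store.incr("visits")
```

Two StatelessHandler instances that share one store see the same count, and replacing one of them (as after a node crash) loses nothing, because the state never lived in the handler.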
This is why functional programming got popular with distributed systems.
The functions don’t retain any state. However, the same behavior can also
be achieved with prominent OOP languages.
Which scalability approach is right for our app?

Always have a ballpark estimate in mind when designing your app. How
much traffic will it have to deal with?
Today, development teams are adopting a distributed microservices
architecture right from the start, and workloads (applications) are meant
to be deployed on the cloud. So, inherently, the workloads are horizontally
scaled out on the fly.
The upsides of horizontal scaling include no limit to augmenting the
hardware capacity. Data is replicated across different geographical
regions as nodes and data centers are set up across the globe.
If your app is a utility or tool expected to receive minimal, predictable
traffic, for instance, an internal tool of an organization or something
similar that is not mission-critical, why bother hosting it in a
distributed environment? A single server is enough to manage the traffic,
so go ahead with vertical scaling when you know that the traffic load will
not spike in the future.
If your app is a public-facing social app like a social network, a fitness app,
an online game, or something similar, where the traffic is unpredictable,
both high availability and horizontal scalability are important to you.
Build these apps to deploy them on the cloud, and always have horizontal
scalability in mind right from the start.
Primary Bottlenecks That Hurt the Scalability of our Application
• Database
• Application design
• Not using caching in the application wisely
• Inefficient configuration and setup of load balancers
• Adding business logic to the database
• Not picking the right database
• At the code level
There are several points in a web application that can become a bottleneck
and hurt the scalability of our application. Let’s take a look at them.
Database
Imagine we have an application that appears to be well architected.
Everything looks good. The workload runs on multiple nodes, and it can
scale horizontally.
However, the database is a single monolith, where just one server
has the onus of handling the data requests from all the server nodes of the
workload.
This scenario is a bottleneck. The server nodes work well and handle millions
of requests at a point in time efficiently, yet the response time of these
requests and the latency of the application are abysmal due to the
presence of a single database. There is only so much it can handle.
Just like the workload, the database needs to scale well.
Make wise use of database partitioning and sharding across multiple database
servers to make your system efficient.
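One simple way to picture sharding is a function that maps each record key to one of several database servers. The sketch below uses hash-based routing with a modulo; the function name and the key format are assumptions for illustration, and the lesson does not prescribe this particular scheme.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in the range [0, num_shards)."""
    # A stable hash is used so every app server routes a key the same way.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every application node that calls shard_for("user:42", 4) gets the same shard index, so reads and writes for a key always land on the same server. A known trade-off of plain modulo routing is that changing the number of shards remaps most keys, which is why schemes such as consistent hashing are often used instead.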
Application design
A poorly designed application architecture can become a major
bottleneck as a whole.
A typical architectural mistake is not using asynchronous processes and
modules wherever required; instead, all the processes are scheduled
sequentially.
For example, if a user uploads a document on the portal, tasks such as
sending a confirmation email to the user and sending a notification to all
subscribers/listeners of the upload event should be done asynchronously.
Tasks like these should be forwarded to a messaging server or a task queue
for asynchronous processing, as opposed to being processed sequentially,
making the user wait.
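The upload example can be sketched with an in-process queue and a background worker. This is a simplified stand-in for a real messaging server or task queue; the names handle_upload, worker, and sent are hypothetical, and the "email" and "notification" tasks only record a string so the behavior is observable.

```python
import queue
import threading
from typing import List

task_queue = queue.Queue()  # holds zero-argument callables to run later
sent: List[str] = []        # records side effects so the sketch is observable


def worker() -> None:
    """Background worker: pulls tasks off the queue and runs them."""
    while True:
        task = task_queue.get()
        try:
            task()
        finally:
            task_queue.task_done()


def handle_upload(user: str, filename: str) -> str:
    """Respond right away; defer the email and notifications to the queue."""
    task_queue.put(lambda: sent.append(f"email to {user} about {filename}"))
    task_queue.put(lambda: sent.append(f"notify subscribers of {filename}"))
    return "upload accepted"


# A daemon thread plays the role of the asynchronous consumer.
threading.Thread(target=worker, daemon=True).start()
```

The request handler returns as soon as the tasks are enqueued, so the user is not kept waiting while the email and notifications are processed.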
Not using caching in the application wisely

Caching can be deployed at several layers of the application. It speeds up
the response time considerably. A cache cuts down the overall load on the
app by intercepting requests before they hit the origin servers.
We should use caching extensively throughout the application to speed
things up significantly.
If the system has a lot of static data, caching can bring down the
deployment costs significantly. I’ve written an article on my blog: How
PolyHaven manages 5 million page views and 80TB traffic a month for less
than 400 USD.
PolyHaven is a 3D asset library with a large amount of static data. The
article delineates how it leverages caching to bring down its deployment
costs.
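As a rough sketch of how a cache keeps requests away from the origin, here is a tiny cache-aside helper with a time-to-live. The class name, the origin_hits counter, and the loader callback are assumptions made for the example; production systems would typically use a dedicated cache rather than a dictionary.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny cache-aside helper: serve from memory, fall back to the origin."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.origin_hits = 0  # how often we had to go to the origin

    def get(self, key: str, load_from_origin: Callable[[str], str]) -> str:
        entry = self._entries.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: the origin is not touched
        value = load_from_origin(key)  # cache miss or expired entry
        self.origin_hits += 1
        self._entries[key] = (now, value)
        return value
```

Requesting the same key twice within the time-to-live reaches the origin only once; the second request is answered from memory.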
Inefficient configuration and setup of load balancers
Load balancers are the gateway to our application. Using too many or too
few of them impacts the latency of our application. More on load
balancers in the upcoming lessons.
Adding business logic to the database

No matter what justification anyone provides, I’ve never been a fan of
adding business logic to the database.
The database is just not the place to put business logic. Business logic in
the database makes the application components tightly coupled. Imagine
how much code refactoring this would require when migrating to a
different database. Also, the testing gets complex.
Not picking the right database

Picking the right database technology is vital for businesses. Need
transactions and strong consistency? Pick a relational database. If you can
do without strong consistency but need horizontal scalability, pick
a NoSQL database.
Trying to pull things off with a not-so-suitable tech tends to hurt
the latency of the entire application. More on
this in the upcoming lessons.
At the code level

This shouldn’t come as a surprise, but inefficient and poorly written code
has the potential to bring down the entire service in production. This
typically includes:
• Using unnecessary loops or nested loops
• Writing tightly coupled code
• Not paying attention to the Big-O complexity while writing the code
(be ready to do a lot of firefighting in production)
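To illustrate the first and third points in the list above, here is a small Python sketch that finds the items two lists have in common, first with a nested loop and then with a set. The function names are made up for the example.

```python
from typing import List


def common_items_nested(a: List[int], b: List[int]) -> List[int]:
    """O(len(a) * len(b)): scans b again for every element of a."""
    result = []
    for x in a:
        for y in b:
            if x == y:
                result.append(x)
                break
    return result


def common_items_set(a: List[int], b: List[int]) -> List[int]:
    """O(len(a) + len(b)) on average: build a set once, then probe it."""
    seen = set(b)
    return [x for x in a if x in seen]
```

Both functions return the same result, but the nested version does work proportional to the product of the two list sizes, which is the kind of cost that stays hidden on small test data and surfaces in production.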
Ideally, we should always do a DENTTAL (Documentation, Exception
Handling, Null pointers, Time complexity, Test coverage, Analysis of code
complexity, Logging) check of our code when doing a dry run.
In this lesson, don’t worry if a few things are not clear to you, such as
strong consistency, how the message queue facilitates asynchronous
behavior, or how to pick the right database. I’ll discuss all that in the
upcoming lessons. Stay tuned.
Moving on to the next lesson.
How to Improve and Test the Scalability of our Application?
In this lesson, we will cover how to improve and test the scalability of our application.
• Tuning the performance of the application – Enabling it to scale
better
• Profiling
• Caching
• CDN
• Data compression
• Avoid unnecessary request-response cycles
• Testing the scalability of our application
Here are some of the standard strategies to fine-tune the performance of
our web application. If the application is performance-optimized, it can
withstand more traffic load with less resource consumption than an
application that is not optimized for performance.
Now you might be wondering, “Why are you talking about performance
when you should be talking about scalability? Isn’t it what the lesson title
says?”
Well, the application’s performance is closely tied to its scalability. If
an application is not performant, it will certainly not scale well.
These performance optimization strategies can be implemented even
before the pre-production testing stage of the application.
Let’s see what they are.
Tuning the performance of the application – Enabling it to scale better
Profiling
Profile the hell out of your app. Run an application profiler and a code
profiler. See which processes are taking too long and eating up too many
resources. Find the bottlenecks and get rid of them.
Profiling is the dynamic analysis of our code. It helps us measure the
space and time complexity of our code and enables us to figure out
issues like concurrency errors and memory errors, and to assess the
robustness and safety of the program. Wikipedia’s list of performance
analysis tools is a good overview of the tools used in the industry.
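As one possible starting point, the sketch below profiles a deliberately wasteful function with cProfile and pstats from the Python standard library. The functions slow_sum and profile_report are invented for this example; the lesson itself does not name a specific profiler.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Deliberately wasteful function so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def profile_report(n: int) -> str:
    """Profile slow_sum(n) and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    slow_sum(n)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()
```

Printing profile_report(2000) lists the functions that consumed the most time, which is where the search for bottlenecks would start.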
Caching
Cache wisely, and cache everywhere. Cache all the static content. Hit the
database only when it is really required. Try to serve all the read requests
from the cache. Use a write-through cache.
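A write-through cache can be sketched as follows. This is a minimal illustration, assuming a plain dictionary as a hypothetical stand-in for the database; the class name and the db_reads counter are made up for the example.

```python
from typing import Dict, Optional


class WriteThroughCache:
    """Writes go to the backing store and the cache in the same operation."""

    def __init__(self, database: Dict[str, str]) -> None:
        self._db = database  # hypothetical stand-in for a real database
        self._cache: Dict[str, str] = {}
        self.db_reads = 0

    def write(self, key: str, value: str) -> None:
        self._db[key] = value     # persist the value first
        self._cache[key] = value  # then keep the cache in step with it

    def read(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]  # served from cache, no database read
        self.db_reads += 1
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = value
        return value
```

Because every write updates the cache as well as the database, a read that follows a write is served from the cache without touching the database.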
CDN
Use a Content Delivery Network (CDN). Using a CDN further reduces the
application’s latency due to the proximity of the data to the requesting
user.
Data compression
Compress data. Use apt compression algorithms to compress data and
store data in compressed form. Since compressed data consumes less
bandwidth, the data download on the client will be faster.
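As a small illustration, the sketch below compresses a text payload with gzip from the Python standard library. gzip is just one reasonable choice here, and the helper names are hypothetical.

```python
import gzip


def compress_payload(text: str) -> bytes:
    """Compress a text payload before storing or sending it."""
    return gzip.compress(text.encode("utf-8"))


def decompress_payload(blob: bytes) -> str:
    """Restore the original text from its compressed form."""
    return gzip.decompress(blob).decode("utf-8")
```

Repetitive payloads such as JSON or HTML usually shrink substantially, so fewer bytes travel over the network, at the cost of some CPU time on both ends.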
Avoid unnecessary request-response cycles
Avoid unnecessary round trips between the client and server. Try to batch
multiple requests into one.
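The effect of combining requests can be shown with a stub that counts round trips. FakeServer and its methods are hypothetical and exist only for this sketch; a real API would expose its own batch endpoint, if it has one.

```python
from typing import Dict, List


class FakeServer:
    """Hypothetical server stub that counts the round trips it receives."""

    def __init__(self) -> None:
        self.round_trips = 0
        self._users = {1: "ada", 2: "linus", 3: "grace"}

    def get_user(self, user_id: int) -> str:
        self.round_trips += 1  # one request per user
        return self._users[user_id]

    def get_users(self, user_ids: List[int]) -> Dict[int, str]:
        self.round_trips += 1  # one request, many results
        return {uid: self._users[uid] for uid in user_ids}
```

Fetching three users one at a time costs three round trips, while a single get_users([1, 2, 3]) call returns the same data in one.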
These are a few of the things we should bear in mind in the context of
application performance.
Testing the scalability of our application

Once we are done with the essential performance testing of the
application, it is time for capacity planning, provisioning the right amount
of hardware—compute and storage power.
The right approach for testing the application for scalability largely
depends on the design of our system. There is no standard formula for
this.
Testing can be performed at both the hardware and the software level.
Different services and components need to be tested—individually and
collectively.
During the scalability testing, different system parameters are taken into
account, such as:
• CPU usage
• Network bandwidth consumption
• Throughput
• Number of requests processed within a stipulated time
• Latency
• Memory usage of the program
• End-user experience when the system is under heavy load, and so on
In this testing phase, simulated traffic is routed to the system to study how
the system behaves and scales under the heavy load. Contingencies are
planned for unforeseen situations.
As per the anticipated traffic, the appropriate hardware and
computational power are provisioned to handle the traffic smoothly with
some buffer.
Several load and stress tests are run on the application. Tools like JMeter
are pretty popular for running concurrent user tests on the application if
you are in the Java ecosystem. There are a lot of cloud-based testing tools
available that help us simulate test scenarios with just a few mouse clicks.
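To give a feel for what such a tool does, here is a minimal load-test sketch in Python. It is not a substitute for JMeter or a cloud-based service: run_load_test is a hypothetical helper, the handler is whatever callable stands in for a request, and the 95th percentile is a rough nearest-rank approximation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_load_test(handler: Callable[[], None],
                  total_requests: int,
                  concurrency: int) -> Dict[str, float]:
    """Fire `total_requests` calls at `handler` from `concurrency` threads."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        handler()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    wall_time = time.perf_counter() - wall_start

    return {
        "throughput_rps": total_requests / wall_time,
        "avg_latency_s": sum(latencies) / len(latencies),
        # Rough nearest-rank estimate of the 95th percentile latency.
        "p95_latency_s": latencies[int(0.95 * (len(latencies) - 1))],
    }
```

Running it against a stub such as lambda: time.sleep(0.001) with increasing concurrency shows how throughput and latency move together, which is the relationship a real scalability test tracks.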
Businesses test for scalability all the time to get their systems ready to
handle a traffic surge. If it’s a sports website, it prepares itself for the
sports event day. If it’s an e-commerce website, it gets ready for the
festive season sale.
Here are a couple of good reads on the topic:
• How production engineers support global events on Facebook.
• How Hotstar, a video streaming service, scaled with over 10 million
concurrent users.
In the industry, tech like cAdvisor, Prometheus, and Grafana are pretty
popular for tracking the system profile via web-based dashboards.
I’ve written an article if you want to read more about pre-production
monitoring.