
Troubleshooting is finding what changed.

Performance Tuning is removing bottlenecks. Tuning a business process, not a machine.

UNITS

International System of Units (SI)

Electrical engineers have a strong professional background in physics and, just like physicists, they tend to use powers of ten. (base 10)

Computer science professionals, on the other hand, tend to count everything in powers of two since it makes more sense in their context. (base 2)
SI system:
kilo (k) – 10^3 = 1,000
mega (M) – 10^6 = 1,000,000
giga (G) – 10^9 = 1,000,000,000

IEC system:
kibi (Ki) – 2^10 = 1,024
mebi (Mi) – 2^20 = 1,048,576
gibi (Gi) – 2^30 = 1,073,741,824

Blank HDD = base 10 ---> format it ---> Filesystem = base 2
(disk vendors advertise capacity in SI units; filesystem tools report sizes in IEC units)
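A quick worked example of that gap (the 500 GB size is just for illustration; numfmt is from GNU coreutils):

500 GB on the label = 500 × 10^9 bytes
500 × 10^9 / 2^30 ≈ 465.7 GiB as reported by base-2 tools

# numfmt --to=iec 500000000000
466G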

PAGING – PROCESS OF DECIDING WHICH PAGE TO TAKE OUT / IN

VMSTAT – LOOKING AT THE SYSTEM FROM THE MEMORY PERSPECTIVE

SAR – ARCHIVE – NOT LIVE


DMESG – IS A RING BUFFER IN MEMORY – NOT A FILE. THE KERNEL RING BUFFER IS A DATA STRUCTURE THAT RECORDS MESSAGES RELATED TO THE OPERATION OF THE KERNEL. IT IS ALWAYS CONSTANT IN SIZE; IT REMOVES THE OLDEST MESSAGE WHEN A NEW MESSAGE COMES IN.
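Common ways to read the ring buffer (a sketch; these flags are from the util-linux dmesg):

# dmesg -T | tail -n 20        (human-readable timestamps, last 20 messages)
# dmesg -w                     (follow new messages as they arrive)
# dmesg --level=err,warn       (only errors and warnings)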

vmstat – virtual memory statistics reporter. Provided by the “procps-ng” package.

# vmstat                  (one-shot report; averages since boot)
# vmstat -a               (active/inactive memory instead of buff/cache)
# vmstat -f               (number of forks since boot)
# vmstat -s               (table of memory and event counters)
# vmstat -d               (disk statistics)
# vmstat -t 2 10          (append a timestamp; sample every 2 s, 10 times)
# vmstat -SM 2 10         (report in MiB units; every 2 s, 10 times)
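One common use (a sketch): sample every second and watch the si/so columns (memory swapped in/out per second); sustained non-zero values usually indicate memory pressure.

# vmstat 1 5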

sar – part of the “sysstat” package. System Activity Report.

# sar -V                  (version)
# sar -u 2 6              (CPU utilization; every 2 s, 6 samples)
# sar -r 2 6              (memory utilization)
# sar -d 2 6              (block device / disk activity)
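Because sar archives its samples to daily files, past data can be read back as well (a sketch; /var/log/sa/ is the RHEL-style default location, other distros may use /var/log/sysstat/):

# sar -u -f /var/log/sa/sa15       (CPU report from the archive for the 15th of the month)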

/proc – file system

some files are read-only and some are writable

/proc is dynamic – e.g. when a USB module is loaded, new entries appear

kernel parameters can be changed through it at runtime

everything is a file

cat “/proc/sys/kernel/osrelease”

ll -i “/proc/sys/kernel/osrelease”

changing kernel parameters

“/proc/sys/net/ipv4/icmp_echo_ignore_all”

echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all     (ignore all pings)
echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_all     (respond again – the default)

“/proc/sys” is a human-readable view of the kernel’s tunable parameters
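For example, to browse and read the VM tunables (a sketch; swappiness comes up again below – 60 is the usual default):

# ls /proc/sys/vm/
# cat /proc/sys/vm/swappiness
60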

sysctl – allows changing kernel parameters

persistent changes:

1. create a file -- *.conf

“/etc/sysctl.d/swappiness.conf”
vm.swappiness = 10

2. read that file using sysctl

sysctl -p “/etc/sysctl.d/swappiness.conf”

[ sysctl -w vm.swappiness=10 makes the same change at runtime only – it is not persistent ]
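An end-to-end sketch of the persistent route (the file name is just an example; files under /etc/sysctl.d/ are also applied at boot by systemd-sysctl):

# echo "vm.swappiness = 10" > /etc/sysctl.d/swappiness.conf
# sysctl -p /etc/sysctl.d/swappiness.conf
vm.swappiness = 10
# sysctl vm.swappiness            (or: cat /proc/sys/vm/swappiness)
vm.swappiness = 10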

remember :

“/usr/lib/sysctl.d/*.conf”
- vendor settings
- never change

“/run/sysctl.d/*.conf”
- runtime settings, generated at boot

“/etc/sysctl.d/*.conf”
- my configurations

SYSFS - file system

- mounted on /sys
- access info & parameters for devices, filesystems and other software loaded as kernel modules
- vendor drivers are segregated for stability
- can’t fully trust drivers supplied by 3rd-party vendors

- “/sys/module” – modules that are currently loaded

- show usb storage example

- “/sys/module/usb_storage/parameters/delay_use”

Try changing it

# modprobe usb_storage delay_use=5


- make it permanent by creating a conf file
under “/etc/modprobe.d/*.conf”

“/etc/modprobe.d/usb_storage.conf” file

options usb_storage delay_use=5
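Putting it together (a sketch – parameters passed to modprobe only take effect when the module is (re)loaded, and delay_use can also be changed on the fly through sysfs if the parameter is writable on your kernel):

# cat /sys/module/usb_storage/parameters/delay_use
# modprobe -r usb_storage                 (unload first; fails if a USB storage device is in use)
# modprobe usb_storage delay_use=5
# echo 5 > /sys/module/usb_storage/parameters/delay_use    (runtime change, no reload)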

Attaching a cache to each CPU increases performance in many ways. Bringing memory closer to the CPU reduces the average memory access time and, at the same time, reduces the bandwidth load on the memory bus.

The challenge with adding cache to each CPU in a shared memory architecture is that it allows multiple copies of a memory block to exist. This is called the cache-coherency problem.

Cache snooping protocols were invented to keep these copies consistent. The most popular protocol, write invalidate, erases all other copies of the data before writing to the local cache.

Any subsequent read of this data by other processors will detect a cache miss in their local cache and will be serviced from the cache of another CPU containing the most recently modified data.
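To see the cache hierarchy the kernel has discovered on a given machine (a sketch; lscpu is part of util-linux):

# lscpu | grep -i cache
# ls /sys/devices/system/cpu/cpu0/cache/
# cat /sys/devices/system/cpu/cpu0/cache/index0/level
# cat /sys/devices/system/cpu/cpu0/cache/index0/size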

Uniform Memory Access Architecture

CPUs are connected via a system bus (Front-Side Bus) to the Northbridge.

The Northbridge contains the memory controller, and all communication to and from memory must pass through the Northbridge. The I/O controller, responsible for managing I/O to all devices, is connected to the Northbridge.

Therefore, every I/O has to go through the Northbridge to reach the CPU.

NON-UNIFORM MEMORY ACCESS ORGANIZATION

NUMA moves away from a centralized pool of memory and introduces topological properties.

By classifying memory locations based on signal path length from the processor to the memory, latency and bandwidth bottlenecks can be avoided.

Introduced by AMD in their Opteron family of processors.

The memory connected to the memory controller of CPU1 is considered to be local memory. Memory connected to another CPU socket (CPU2) is considered to be foreign or remote for CPU1.

Remote memory access has additional latency overhead compared to local memory access, as it has to traverse an interconnect (point-to-point link) and connect to the remote memory controller.

As a result of the different memory locations, this system experiences “non-uniform” memory access time.

Point-to-Point Interconnect
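On a NUMA machine the topology and per-node counters can be inspected with the numactl / numastat tools (from the numactl package; ./myapp below is just a placeholder):

# numactl --hardware          (nodes, their CPUs and memory, and the node distance matrix)
# numastat                    (numa_hit / numa_miss / other_node counters per node)
# numactl --cpunodebind=0 --membind=0 ./myapp     (pin a process and its memory to node 0)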

CFS Scheduler

works by dividing the CPU time equally among processes

The main idea behind the CFS is to maintain balance (fairness) in providing processor time to tasks. This means processes should be given a fair amount of the processor. When the time for tasks is out of balance (meaning that one or more tasks are not given a fair amount of time relative to others), then those out-of-balance tasks should be given time to execute.

To determine the balance, the CFS maintains the amount of time provided to a given task in what’s called the virtual runtime. The smaller a task’s virtual runtime (meaning the smaller amount of time a task has been permitted access to the processor), the higher its need for the processor.
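On kernels built with scheduler debugging (CONFIG_SCHED_DEBUG, enabled on most distro kernels), the per-task virtual runtime can be seen directly; a sketch:

# grep vruntime /proc/$$/sched        (prints the se.vruntime counter for the current shell)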

The CFS also includes the concept of sleeper fairness to ensure that tasks that are not currently runnable (for example, waiting for I/O) receive a comparable share of the processor when they eventually need it.

But rather than maintaining the tasks in a run queue, as has been done in prior Linux schedulers, the CFS maintains a time-ordered red-black tree.

Divide processor time equally among processes.

Ideal Fairness: if there are N processes in the system, each process should get (100/N)% of the CPU time.
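For example, with N = 4 runnable processes, each would ideally receive 100/4 = 25% of the CPU over any interval.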

Ideal fairness is not realizable:
• A single processor can’t be shared simultaneously and equally among several processes
• Time slices that are infinitely small are not feasible
• The overheads due to context switching and scheduling will become significant
• CFS uses an approximation of ideal fairness
