MPHY0020 Notes
(Erwin Alles)
Introduction to Computing (Computer Programming)
Binary Numbering Systems
Binary numbers are the basis for:
Digital signal
A digital signal can only take a finite number of values; in a binary system there are only two: 0 or 1. In reality, a digital signal carries noise and may be distorted, but as long as it stays within tolerance the digits it represents are unaffected.
Digital electronic circuits are made to operate on such digital signals. Such circuits are often printed on integrated circuits
and usually are made from large assemblies of logic gates – simple electronic representations of Boolean logic functions:
e.g. AND, OR and NOT gates.
Floating-point
o $(-1)^{\text{sign}} \times 2^{\text{exponent}-\text{bias}} \times 1.\text{significand}$
o We bias the exponent because it has to be signed to represent small and large values and the usual two’s
complement would make comparisons more difficult.
o Floating point accuracy is limited since an infinite set of real numbers is represented by finitely many
floating-point numbers, thus leading to rounding errors.
o Floating-point relative error is defined as $\frac{|r - f_r|}{|r|}$, where $r$ is the real number and $f_r$ is its floating-point representation
o The error of rounding a real number to the nearest floating-point number, in a floating-point system with base $b$ and $s$ bits for the significand, is bounded by $\frac{b^{1-s}}{2}$, which is called machine epsilon (see the sketch below)
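A minimal Python sketch of these limits, assuming IEEE 754 double precision (b = 2, s = 53 significand bits):

```python
import sys

# Machine epsilon as defined above: b**(1 - s) / 2 = 2**-53 for doubles.
# sys.float_info.epsilon is the gap between 1.0 and the next float, 2**-52.
eps = 2.0 ** -53
print(eps == sys.float_info.epsilon / 2)  # True

# Rounding error in action: 0.1 has no exact binary representation
print(0.1 + 0.2 == 0.3)        # False
print(abs((0.1 + 0.2) - 0.3))  # ~5.55e-17, a few machine epsilons
```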
Fixed-point (integer)
o Stored as binary data
o Unsigned integers can represent values from $0$ to $2^n - 1$, where $n$ is the number of bits
o e.g. 16 bits can represent values between $0$ and $2^{16} - 1$
o Signed integers – need an extra bit to convey the sign
Two’s complement: method for storing signed integers
Most significant bit is 0 if number is positive; 1 if negative
An n-bit two’s complement system can represent every integer from -2n-1 to 2n-1-1
n-bit two’s complement of a number is the value obtained by subtracting that number
from 2n
Advantages:
o Adding and subtraction can occur naturally without having to examine sign
o Zero has a unique representation
To calculate two’s complement: invert all bits and add 1. For example for number 5,
we have:
o 5: 0b0101001
o Negation: 0b1010110
o Adding 1: 0b1010111
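A small Python sketch of this rule (the 8-bit width is an arbitrary choice for illustration):

```python
def twos_complement(value: int, bits: int) -> str:
    """Return the bits-wide two's-complement pattern of value as a binary string."""
    # Masking with 2**bits - 1 is equivalent to subtracting from 2**bits for negatives
    return format(value & (2 ** bits - 1), f"0{bits}b")

print(twos_complement(5, 8))   # 00000101
print(twos_complement(-5, 8))  # 11111011
# Same result as "invert all bits and add 1":
print(twos_complement(-5, 8) == format((~5 + 1) & 0xFF, "08b"))  # True
```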
Boolean
o Simplest data type – one bit – 1 or 0, True or False
o Used for output of logic and conditions
e.g. if A > 10 then… A > 10 is either true or false
List
Array
String
Dictionary
What is a computer?
Digital computers: e.g. calculator, laptop, bank card, smartphone. They work on discrete
data (often binary).
Analogue computers: e.g. abacus, speedometer. They work on continuous data
Hybrid computers: combine digital components (for the logic) with analogue components (for the arithmetic).
-often in hospitals, they have hybrid systems as inputs and outputs are analogue
but all the computation happens in a digital system.
A computer is: a programmable device that accepts inputs, processes these and produces outputs.
Turing Machine
Mathematical model of a (digital) computer
Used to prove properties of computation e.g. fundamental limitations
If a programming language can express all tasks accomplishable by (digital) computers, it is said to be Turing
complete
Transistors
e.g. Bipolar Junction Transistor and Field Effect Transistor
A transistor is a device that regulates current or voltage flow and acts as a switch or gate for electronic signals; it regulates the flow of electronic signals by amplifying, controlling and generating electrical signals.
Clock rate: this is tied to CPU speed but not directly. A faster clock rate does
not necessarily lead to a faster computer; you should compare clock speeds
within brand and generation. Clock speeds tend to increase with a decrease in
transistor size i.e., smaller transistors mean shorter interconnects and smaller
capacitances.
BIOS (Basic Input/Output System)
Used by the CPU to perform start-up procedures when the computer is turned on
Initialises hardware
Finds bootable device (HDD)
Hardware test (power on self test: POST)
Load operating system (OS)
Transfer control to OS
Read only memory
Bus
Memory
Types of memory: volatile and non-volatile (storage)
Volatile memory loses data on power off e.g., random access memory (RAM)
whereas non-volatile retains data on power off e.g., hard disk drives
(HDD), flash memory, optical disks. Non-volatile memory consists of
magnetic storage devices (with a magnetised medium where polarisation
represents bits but has slow access) or flash based [solid state] memory
(which has no mechanical parts, and transistors represent bits. It is faster
than magnetic storage but has a limited lifetime).
Read only memory (ROM) [non-volatile] uses diodes instead of transistors to store data (this data is stored
permanently). Modern ROM can be reprogrammed: EPROM (erasable programmable read only memory), EEPROM
(electrically erasable programmable read only memory).
Without memory, a computer could not retain data short-term (volatile) or long-term (non-volatile).
The GPU is a very specialised processor. It was originally developed for graphics and graphics rendering but is now also used in scientific computing due to its parallel computation power. It has many more processor cores than a CPU, making it much faster for parallel workloads (a processor core is a processing unit that reads instructions to perform specific actions; instructions are chained together so that, when run in real time, they make up your computing experience).
Cooling
Electrical components generate heat and, if too much is generated (overheating), failures may occur.
So, cooling is required to keep a computer working optimally
CPUs typically run at 40-50 degrees Celsius
o They typically shut down/fail above 80 degrees
o Raising clock speed (overclocking) increases CPU speed at the cost of additional heat generation
GPUs can withstand higher temperatures (up to 100 degrees Celsius)
There are two ways to transfer heat: conduction (through thermal plates or heat sinks) and convection (through air moved by fans or liquid in pumps). Computer heat always needs to go somewhere: either into the room or into the cabinet the computer is kept in.
Types of Computers
Centralised Computing
Servers
Provides a service to other computers by processing requests from a client e.g. in the form of a website (a Web server is a
computer that uses the HTTP protocol to send Web pages to a client's computer when the client requests them) or hosting
a database. It is often without a screen as it is set up to be maintained remotely. Hence, it requires good network access and connectivity, especially as a single server serves multiple computers at a time and so, depending on its use, may need quite a powerful CPU.
A specific type of server is a mainframe. Mainframes are used where the system must run continuously, such as for bank transactions. This is achieved by building them so that broken components can be hot-swapped, i.e. replaced without turning off the service the computer is giving to the client. There is also built-in redundancy, with multiple hard drives acting as a single drive: when one of them fails, the data is moved over to another drive.
The largest centralised computer we will come across is the supercomputer, often used for very specific mathematical computations. Supercomputers are orders of magnitude faster than mainframes, as they are designed specifically for computational work. Their rate, measured in floating-point operations per second (FLOPS), i.e. the unit of measurement that quantifies the performance capability of a supercomputer, is orders of magnitude higher than that of a standard computer.
Distributed Computing
A distributed systems cluster is a group of similar machines that are virtually or
geographically separated and that work together to provide the same service or application to
clients i.e. set of computers that work together so that they can be viewed as a single system.
Cloud computing is a general term for anything that involves delivering hosted services over the internet, i.e. using the internet to store and manage data on remote servers, which is then accessed via the internet. Cloud computing can also be thought of as utility computing or on-demand computing, e.g. Google Cloud. There are 3 types of cloud computing: infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS).
Embedded systems
These are typically very small and use very little power. Often, you will find them in toys, medical systems (heart rate monitors, blood pressure monitors) and household appliances. They are hard to reprogramme as they are designed for only one purpose.
Compression is important to reduce file size. There are two types: lossless (the original data can be perfectly reconstructed from the compressed data, so the compression can be reverted) and lossy (inexact approximations are used and some data from the original file is discarded, making the compression irreversible, but this also gives the potential for much greater compression).
FOSS (free and open-source software) is software for which the source code is distributed and which you are free to modify and redistribute. This principle allows other people to contribute to the development and improvement of the software, like a community.
Backups
Version control is a form of backing up; it is the practice of tracking and managing changes to software code e.g.
subversion, git. It is beneficial as it allows traceability (providing evidence of all revisions and changes made over time),
it reduces duplication and allows management overview.
Medical imaging
Patient monitoring
Medical research
Hospital administration
Electronic health care records
Image files
Image files contain a header, holding extra information (e.g. a fixed size, the image dimensions, data type, patient information), and image data, which holds the actual data.
In an image file the header data and image data are generally stored separately, sometimes in different files for some formats, but usually within the same file. The header contains the patient name, DOB, ID, etc.; scan parameters; scanner and patient coordinate systems; display parameters; and the dimensions of the image: any data that provides useful information on the image and that enables it to be displayed correctly, measurements to be made, etc. The image data part stores the pixel/voxel values (intensities) of the image.
Data that might be displayed on a radiological workstation at the same time as the image are patient name, patient ID
(hospital number), the date of the scan, the hospital/department name where the scan was performed, the size of the image
in pixels, the type of scan (e.g. MRI sequence), etc.
In a hospital setting, it is important that there are standardised data formats so that if your GP were to take an x-ray,
hospital staff can look at it or if an MRI were to be taken at one hospital, another hospital could also read it. So, DICOM
was released, as the international standard to communicate and manage medical images and data. (Neuroimaging
Informatics Technology Initiative i.e. NIfTI has also been released and although not a licensed standard, it is commonly
used within brain imaging).
DICOM usually has one file per 2D slice/frame with a variable-size header. Each vendor (GE/Siemens/Philips) often has its own vendor-specific header fields too, as well as the header containing patient information.
NIfTI has a fixed-size header (348 bytes) so, on converting DICOM to NIfTI, the conversion tools may need to remove some header information from DICOM.
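As an illustration, the widely used Python library nibabel exposes exactly this header/data split; the file name scan.nii.gz below is a hypothetical example:

```python
import nibabel as nib  # common Python library for NIfTI I/O

img = nib.load("scan.nii.gz")  # hypothetical file path
hdr = img.header               # the fixed-size NIfTI header
print(hdr["sizeof_hdr"])       # 348 for NIfTI-1
print(img.shape)               # image dimensions
print(hdr.get_zooms())         # voxel sizes (e.g. in mm)
data = img.get_fdata()         # the image data (pixel/voxel intensities)
```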
Computational statistics focuses on computer intensive statistical methods, especially in cases where the sample sizes of
collected datasets are huge (in thousands) with non-homogenous datasets, e.g., pooled from different medical centres. In
such cases using traditional statistics (without computers) is almost impossible.
The term statistical computing usually means the application of computer science to statistics. Computational statistics,
however, goes further as it is aiming at the design of algorithms for implementing statistical methods on computers,
which were unthinkable before the computer age, such as:
the bootstrap
computer simulations
artificial neural networks, etc
Computational statistics also copes with analytically intractable problems.
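For instance, a minimal NumPy sketch of the bootstrap (the sample below is synthetic, purely for illustration): resample with replacement many times and recompute the statistic each time to estimate a confidence interval.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100)  # pretend measurements

# Bootstrap: resample with replacement, recompute the mean each time
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])

# Percentile 95% confidence interval for the mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```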
Healthcare Informatics: If small amounts of data from many patients are linked up and pooled, researchers and doctors
can look for patterns in the data, helping them develop new ways of predicting or diagnosing illness, and identify ways to
improve clinical care.
Computer-aided diagnosis (CADx) or computer-aided detection (CADe) are systems that assist physicians in the
interpretation of medical images.
CADe systems are usually confined to marking conspicuous structures and sections, while CADx systems evaluate or classify the conspicuous structures.
Although CAD has been around for decades, it does not substitute the doctor or other professional, but rather plays a
supporting role. The doctor is generally responsible for the initial interpretation of a medical image. However, the goal of
some CAD systems is to detect the earliest signs of abnormality in patients that human doctors cannot.
Probability Distribution
The probability distribution of a statistical data set or population is a
mathematical function that provides the probabilities of occurrence (y-axis)
of different possible outcomes in an experiment (x-axis).
That is, with the data being organised (e.g., ordered from low to high), one
can see the number or percentage of individuals in each group. This can then
be visualised in graphs and charts to examine the shape, centre, and amount
of variability in the data.
The sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived
from a random sample of size n
Accuracy describes how well a binary classification test correctly identifies or excludes a condition. Specifically,
accuracy is the proportion of true results (TP and TN) among the total number of cases examined.
In rare diseases:
High accuracy can be achieved simply by ignoring all evidence and calling all cases negative. If only 5% of patients have
the disease, a physician who always blindly states that the disease is absent will be right 95% of the time!
A good test has high sensitivity AND high specificity. Depending on the application, one may choose to reduce
specificity to maximise sensitivity or vice versa.
Confusion Matrix:
TPF + FNF = 1
TNF + FPF = 1
Each decision fraction represents an estimate of the probability of a particular decision, given that (condition) an
individual case has a particular health/disease state.
Let D represent the disease in question, and let T represent the result of a diagnostic test (decision). So, we have the
probability of a positive test given the absence of disease:
$\text{FPF} = P(T^+ \mid D^-)$
$\text{TPF} = P(T^+ \mid D^+)$, $\text{FNF} = P(T^- \mid D^+)$, $\text{TNF} = P(T^- \mid D^-)$
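A small Python sketch computing these decision fractions from confusion-matrix counts (the counts are invented to mirror the rare-disease example above):

```python
def decision_fractions(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Decision fractions from confusion-matrix counts."""
    return {
        "TPF (sensitivity)": tp / (tp + fn),  # P(T+ | D+)
        "FNF":               fn / (tp + fn),  # P(T- | D+) = 1 - TPF
        "TNF (specificity)": tn / (tn + fp),  # P(T- | D-)
        "FPF":               fp / (tn + fp),  # P(T+ | D-) = 1 - TNF
        "accuracy":          (tp + tn) / (tp + fn + fp + tn),
    }

# Rare-disease example: calling every case negative (5 diseased, 95 healthy)
# gives 95% accuracy but zero sensitivity.
print(decision_fractions(tp=0, fn=5, fp=0, tn=95))
```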
Performance of CAD systems:
CAD systems cannot yet detect 100% of pathological changes (nor can human doctors). The hit rate (sensitivity) can be up to 90% depending on the system and application. The fewer false positives indicated, the higher the specificity. A low specificity reduces the acceptance of the CAD system because the user has to identify all of these wrong hits.
ROC (Receiver Operating Characteristic) Analysis
Since the ROC curve is a graph of TPF versus FPF, both of which are independent of disease prevalence P(D+), it does
not depend on the prevalence of disease in the actual population to which the test may be applied. Thus, ROC analysis
provides a description of disease detectability that is independent from both disease prevalence and decision threshold
effects.
The Pearson correlation coefficient (denoted by r or ρ) is a measure of the strength (or 'tightness') of a linear association between two variables. The correlation coefficient indicates how far the data points lie from the best linear fit, i.e. how well the data points fit this line of best fit.
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y}$$

where $\operatorname{cov}(X, Y)$ is the covariance of $X$ and $Y$, and $\sigma_x$, $\sigma_y$ are their standard deviations
The p-value is the probability of obtaining the current value of r if the true correlation were in fact zero (the null hypothesis). If this probability is lower than the conventional 5% (p < 0.05), the correlation coefficient may be called statistically significant. The p-value is obtained from the sampling distribution of r, which in this case follows the Student t-distribution with n − 2 degrees of freedom (the higher t, the lower the p-value):

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
The sampling distribution of a statistic is the distribution of that statistic (in our case it is r), considered as a random
variable, when derived from a random sample of size n.
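A short Python sketch on synthetic data (assuming NumPy and SciPy are available), comparing scipy.stats.pearsonr with the t-statistic formula above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.8 * x + rng.normal(scale=0.5, size=30)  # linearly related, with noise

r, p = stats.pearsonr(x, y)

# Same p-value via the t statistic with n - 2 degrees of freedom
n = len(x)
t = r * np.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided
print(f"r = {r:.3f}, p = {p:.2e}, manual p = {p_manual:.2e}")
```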
Voxel-based analyses assume that the data from a particular voxel all derive from the same part of the brain.
Violations of this assumption will introduce artifactual changes in the voxel values that may obscure changes, or
differences, of interest.
This assumption is often violated due to subject motion during a series of scans of the same subject. Image
realignment (rigid transformation) is used to correct for this.
In case of aligning functional and structural images, we use image co-registration (rigid transformation).
After realignment the data are then transformed using linear (affine) or nonlinear spatial normalisation into a standard
anatomical space in order to perform common analysis for all subjects.
Registration optimises the parameters that describe a spatial transformation between the source and reference (e.g.,
template) images.
The word registration often encompasses many types of alignment optimisations and the corresponding transformations,
i.e.:
2D affine transformations
TYPE OF TRANSFORMATION | EQUATION
Translation by $t_x$ and $t_y$ | $x_1 = 1 \cdot x_0 + 0 \cdot y_0 + t_x$; $\; y_1 = 0 \cdot x_0 + 1 \cdot y_0 + t_y$
Rotation around the origin by $\theta$ radians | $x_1 = \cos(\theta)\, x_0 + \sin(\theta)\, y_0 + 0$; $\; y_1 = -\sin(\theta)\, x_0 + \cos(\theta)\, y_0 + 0$
Zoom/scale by $s_x$ and $s_y$ | $x_1 = s_x \cdot x_0 + 0 \cdot y_0 + 0$; $\; y_1 = 0 \cdot x_0 + s_y \cdot y_0 + 0$
Shear by $h$ | $x_1 = 1 \cdot x_0 + h \cdot y_0 + 0$; $\; y_1 = 0 \cdot x_0 + 1 \cdot y_0 + 0$
2D affine matrix representation
An affine transformation is a composition of two functions: a linear mapping and a translation:

$$\vec{y} = A\vec{x} + \vec{b}$$

Matrix multiplication is used to represent linear maps, and vector addition to represent translations. It is possible to represent both the translation and the linear map using a single matrix multiplication through augmented matrices and vectors:

$$\begin{bmatrix} \vec{y} \\ 1 \end{bmatrix} = \begin{bmatrix} A & \vec{b} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \vec{x} \\ 1 \end{bmatrix}$$
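A minimal NumPy sketch verifying that the augmented-matrix form reproduces $\vec{y} = A\vec{x} + \vec{b}$ (the rotation angle and translation are arbitrary example values):

```python
import numpy as np

theta = np.deg2rad(30)
A = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])  # rotation (linear part)
b = np.array([2.0, -1.0])                        # translation

# Augmented 3x3 matrix combines y = A x + b into one multiplication
M = np.eye(3)
M[:2, :2] = A
M[:2, 2] = b

x = np.array([1.0, 1.0])
y_direct = A @ x + b
y_augmented = (M @ np.append(x, 1.0))[:2]
print(np.allclose(y_direct, y_augmented))  # True
```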
Spatial Normalisation
Why do we want to perform spatial normalisation?
1. Inter-subject averaging
a. Increase sensitivity with more subjects
b. Extrapolate findings to the population as a whole
2. Image data in standard coordinate system
a. e.g. Talairach & Tournoux space
Therefore, spatial normalisation minimises the mean squared difference from the template image(s).
Image Segmentation
= the process of partitioning a digital image into multiple segments—sets of pixels, also known as super-pixels
The goal of segmentation is to simplify and change the representation of an image into something that is more meaningful
and easier to analyse. Image segmentation is typically used to locate objects and boundaries in images. During
segmentation a label is assigned to every pixel in an image such that pixels with the same label share certain
characteristics.
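As a toy NumPy illustration (synthetic image, arbitrary threshold), the simplest segmentation assigns a binary label to each pixel by intensity thresholding:

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.normal(loc=100, scale=10, size=(64, 64))
image[20:40, 20:40] += 80  # a bright "object" on the background

# Simplest segmentation: global intensity threshold -> binary label per pixel
threshold = 140
labels = (image > threshold).astype(np.uint8)  # 1 = object, 0 = background
print(labels.sum(), "pixels labelled as object")
```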
Data Protection
The Data Protection Act 2018 is the UK's implementation of the General Data Protection Regulation (GDPR). The Data Protection Act controls how personal information is used by organisations, businesses or the government. The aim of the GDPR is to protect all EU citizens from privacy and data breaches in today's data-driven world.
Name
Identification number
Location data or online identifier
Everyone responsible for using personal data has to follow strict rules called ‘data protection principles’:
race
ethnic background
political opinions
religious beliefs
trade union membership
genetics
biometrics (when used for identification, e.g., from images)
health
sex life or orientation
In research activities it is mandatory to evaluate data in different ways. It is thus not possible to specify all processing in advance, because, for example, new image-processing tools will be developed and should be evaluated against former processing tools in imaging databanks. The development of platforms for long-term storage and image organisation should allow image data to be shared between researchers from all over Europe. Explicit consent should not be required for the use of anonymised and key-coded image data for historical, statistical, educational and scientific research purposes.
Data Anonymisation
Data anonymisation is the process of removing or encrypting sensitive information from a document, record or message, with the intent of privacy protection. The sensitive information includes all personally identifiable information in the data sets, so that the people whom the data describe remain anonymous. The EU's GDPR demands that stored data on people in the EU undergo either an anonymisation or a pseudonymisation process.
Pseudonymisation vs anonymisation
Pseudonymisation is a procedure by which personally identifiable information fields within a data record are replaced by
one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field makes the data record less
identifiable while remaining suitable for data analysis and data processing. Pseudonymised data can be restored to its original state with the addition of information which then allows individuals to be re-identified.
Breach notification:
Notification is mandatory where a data breach is likely to "result in a risk for the rights and freedoms of individuals".
This must be done within 72 hours of first having become aware of the breach.
Data processors are also required to notify their customers, the controllers, “without undue delay” after first
becoming aware of a data breach.
Privacy by design:
Inclusion of data protection from the onset of the designing of systems, rather than an addition.
Right to access:
Data subjects have the right to obtain confirmation from the data controller as to whether or not personal data
concerning them is being processed, where and for what purpose.
The controller shall provide a copy of the personal data, free of charge, in an electronic format.
This is among the latest changes and marks a dramatic shift towards data transparency and the empowerment of data subjects.
Right to be forgotten:
The right entitles the data subject to have the data controller erase their personal data, cease further dissemination of
the data, and potentially have third parties halt processing the data.
The conditions for erasure include the data no longer being relevant to original purposes for processing, or a data
subject withdrawing consent.
Difficulties for healthcare providers
Busy doctors already use popular smartphone apps for clinical communications, which store information online, e.g.:
• Calendars,
• Dropbox,
• Google Drive,
• PDF creator apps.
With data in so many locations it will be difficult for trusts to identify where information is stored when faced with a
“subject access request” from a patient.
The most popular apps have the right to access all of the data on users’ devices, including:
• contact lists,
• calendars,
• email,
• SMS,
• instant messages,
• microphone,
• image gallery,
• camera,
• location
All such data on a clinicians’ device are potentially sensitive.
Advantages:
• Rapid access
• No duplicates (i.e. copies on every computer)
• No lost images
• Enhances analysis
• Better collaboration and ease of sharing
Disadvantages:
Research Ethics
- of Human and Animal participants (or tissue, data)
Studies require approval by a Research Ethics Committee in order to safeguard participants' dignity, rights, safety and wellbeing. The committee will consider whether the research is justified, whether it complies with legislation and law, whether the risks outweigh the benefits, whether the research will be completed (e.g. given the funding available), and whether the researchers are qualified to carry out the research.
Sampling converts an analogue signal into a set of values that specify the signal amplitude at pre-set
intervals
Quantization converts the signal amplitude into one of a discrete set of values (codes)
Quantization error = original signal amplitude – quantized value
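A tiny NumPy sketch of quantization (3 bits, i.e. 8 levels; the signal values are made up): the quantization error is bounded by half a quantization step.

```python
import numpy as np

signal = np.array([0.12, 0.49, 0.77, 0.31])  # analogue amplitudes in [0, 1]

# Quantize to 8 discrete levels (3 bits)
levels = 8
codes = np.round(signal * (levels - 1)).astype(int)  # integer codes
quantized = codes / (levels - 1)                     # reconstructed amplitudes

error = signal - quantized                           # quantization error
print(codes, np.abs(error).max())  # max error <= half a quantization step
```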
Information loss
Information loss may occur if we do not take enough samples, since we then cannot be sure what we are measuring. This may also lead to aliasing: an effect that causes different signals to become indistinguishable (i.e. aliases of one another) when sampled. Aliasing also refers to the distortion or artifact that results when the signal reconstructed from samples differs from the original continuous signal.
When $f_s$ is limited (e.g. due to hardware limitations), we can use an anti-aliasing filter to remove frequencies above $f_s/2$ so that the Nyquist criterion is satisfied
When the Nyquist theorem/criterion is satisfied, the original (analogue/continuous) signal can be perfectly
reconstructed (i.e. recovered) from the sampled signal without distortion or error
Applying the Nyquist Criterion:
For a compact disc (CD), digital audio uses $f_s = 44.1$ kHz, as humans cannot hear above 20 kHz
Sampling has the effect of a low-pass frequency filter with a frequency cut-off at $f_s/2$ Hz
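A short NumPy sketch of aliasing (the frequencies are arbitrary example values): a 70 Hz tone sampled at 100 Hz produces exactly the same samples as a 30 Hz tone.

```python
import numpy as np

fs = 100.0                   # sampling frequency (Hz); Nyquist limit = fs/2 = 50 Hz
t = np.arange(0, 1, 1 / fs)  # sample instants

f_true = 70.0                # above fs/2, so the Nyquist criterion is violated
f_alias = fs - f_true        # the 70 Hz tone aliases to 30 Hz

violating = np.cos(2 * np.pi * f_true * t)
alias = np.cos(2 * np.pi * f_alias * t)
print(np.allclose(violating, alias))  # True: the samples are indistinguishable
```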
Signal Filtering
Filtering changes the nature of a signal in some way e.g. by removing certain components
Filters can:
o Be implemented in hardware or software
o Be analogue or digital
o Be defined in the time (or spatial) domain or the frequency domain
Filtering in the time (or spatial) domain is done using convolution
Filtering in the frequency domain makes use of the Fourier Transform
It allows us to understand the Nyquist Criterion, filtering and a wide range of other signal processing operations.
i.e. the convolution formula can be described as the area under the function f(τ) weighted by the function g(−τ) shifted by
amount t. As t changes, the weighting function g(t − τ) emphasizes different parts of the input function f(τ).
Commutativity: $f \otimes g = g \otimes f$
Associativity: $f \otimes (g \otimes h) = (f \otimes g) \otimes h$
$(I \otimes G_1) \otimes G_2 \otimes G_3 \otimes \ldots = I \otimes (G_1 \otimes G_2 \otimes G_3 \otimes \ldots)$, where $I$ is an image and the $G_n$ define filter masks
Linear filtering itself is not an associative operation, i.e. in general $f(f(X; G_1); G_2) \neq f(X; f(G_1; G_2))$
Implication: serial linear image filtering is often easier and faster if we pre-compute a single filter kernel/mask, compared with applying a series of filtering operations and storing intermediate filtered images.
Distributivity: $f \otimes (g \pm h) = (f \otimes g) \pm (f \otimes h)$
Differentiation: $\frac{d}{dx}(f \otimes g) = f \otimes \frac{dg}{dx}$
The Fourier transform: $F(s) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i s x}\, dx$, where $s$ is frequency.
Complex numbers
Polar form: $z = r(\cos\theta + i\sin\theta)$
Exponential form: $\cos\theta + i\sin\theta = e^{i\theta} \;\rightarrow\; z = re^{i\theta}$
Scaling: $F\{f(ax)\} = \frac{1}{|a|}\, F\!\left(\frac{s}{a}\right)$
Convolution:
$F\{f \ast g\} = F\{f\} \times F\{g\}$, i.e. convolution in time/space = multiplication in the frequency domain
$F\{f \times g\} = F\{f\} \ast F\{g\}$
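A minimal NumPy check of the convolution theorem on made-up vectors (note that the DFT version yields circular convolution):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 0.5, 0.0, 0.0])

# Circular convolution via the DFT: multiply spectra, transform back
conv_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# Direct circular convolution for comparison
n = len(f)
conv_direct = np.array(
    [sum(f[k] * g[(m - k) % n] for k in range(n)) for m in range(n)]
)
print(np.allclose(conv_fft, conv_direct))  # True
```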
Understanding sampling in the frequency domain:
If the Nyquist criterion is not satisfied, adjacent copies overlap, and it is not possible in general to discern an unambiguous $X(f)$. Any frequency component above $f_s/2$ is indistinguishable from a lower-frequency component, called an alias, associated with one of the copies.
Reconstructing a sampled signal by filtering:
We need to low-pass filter the sampled signal with a cut-off at $f_s/2$: this removes the spectral copies and leaves the original spectrum. In the time domain, this filtering is a discrete convolution (see the sketch after the definition):
Definition: $y[n] = x[n] \otimes h[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$
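As a small NumPy illustration of this definition (impulse input and smoothing kernel chosen arbitrarily), convolving an impulse with a kernel returns the kernel itself:

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # unit impulse
h = np.array([0.25, 0.5, 0.25])          # simple smoothing kernel

# y[n] = sum_k x[k] h[n - k]; 'same' keeps the output the length of x
y = np.convolve(x, h, mode="same")
print(y)  # the impulse is replaced by the kernel: [0, 0.25, 0.5, 0.25, 0]
```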
Image filtering in 2D
By using a kernel, we can smooth the image, reducing noise by replacing each data point with some kind of local average of surrounding data points (at the expense of reducing fine image detail). Pixel averaging therefore blurs the image. If we instead increase the value of the central pixel of the kernel, we obtain a sharpening filter.
$\int_{-\infty}^{\infty} \delta(t)\, dt = 1$
$\delta$ can be defined as the limit of a symmetric 'spikey' function with an integral of 1 (e.g. a Gaussian), as the width approaches zero.
Useful properties:
Sifting: $\int_{-\infty}^{\infty} f(t)\,\delta(t - a)\, dt = f(a)$
Sampling function (Dirac comb)
Definition: $\operatorname{III}(x) = \sum_{n=-\infty}^{\infty} \delta(x - nT_s)$
Choose how pixel values are defined outside the image. Possible options:
Constant
Specified value (e.g. zero – sometimes called zero padding)
Use value at image edges/corner (so-called replication)
Varying
Mirror image, periodic, etc.
Gaussian Filter
$$G(x, y; \sigma_x, \sigma_y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left(-\left(\frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2}\right)\right)$$
Separable Filters
Some filter masks/kernels are separable. This means that the 2D kernel can be written as the outer product of two 1D kernels, e.g. $G(x, y) = g_1(x)\, g_2(y)$, so 2D filtering can be performed as two consecutive (and much cheaper) 1D convolutions, as the sketch below shows.
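A NumPy/SciPy sketch of this equivalence (kernel size and image are arbitrary): two 1D Gaussian passes reproduce the full 2D Gaussian convolution.

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

def gaussian_1d(sigma: float, radius: int) -> np.ndarray:
    """Sampled, normalised 1D Gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()  # normalise so overall image brightness is preserved

g = gaussian_1d(sigma=1.0, radius=3)
kernel_2d = np.outer(g, g)  # separability: G(x, y) = g(x) * g(y)

image = np.random.default_rng(3).random((32, 32))

# One 2D convolution vs. two 1D passes (rows, then columns)
smoothed_2d = convolve(image, kernel_2d, mode="nearest")
smoothed_sep = convolve1d(
    convolve1d(image, g, axis=0, mode="nearest"), g, axis=1, mode="nearest"
)
print(np.allclose(smoothed_2d, smoothed_sep))  # True, at ~2k vs k**2 ops/pixel
```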
In digital signal processing, the function is any quantity or signal that varies over time, such as the pressure of a sound
wave, a radio signal, or daily temperature readings, sampled over a finite time interval (often defined by a window
function).
In image processing, the samples can be the values of pixels along a row or column of a raster image. The DFT is also
used to efficiently solve partial differential equations, and to perform other operations such as convolutions or multiplying
large integers.
$f[n] = [f_1, f_2, f_3, \ldots, f_N]^T$: discrete 1D function (samples represented as a column vector)
$F[m] = \operatorname{DFT}(f) = [F_1, F_2, F_3, \ldots, F_N]^T$: the DFT of $f$, also a discrete function (column vector)
$$F_m = \frac{1}{N} \sum_{n=1}^{N} f_n \exp\left(\frac{-2\pi i\,(m-1)(n-1)}{N}\right), \quad 1 \le m \le N$$
$$F_{u,v} = \sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n} \exp\left(-2\pi i \left(\frac{(m-1)(u-1)}{M} + \frac{(n-1)(v-1)}{N}\right)\right), \quad 1 \le u \le M,\; 1 \le v \le N$$
$$f_{m,n} = \frac{1}{MN} \sum_{u=1}^{M} \sum_{v=1}^{N} F_{u,v} \exp\left(2\pi i \left(\frac{(m-1)(u-1)}{M} + \frac{(n-1)(v-1)}{N}\right)\right) \quad \longleftarrow \text{inverse DFT}$$
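A short NumPy sketch checking the 1D definition above against numpy.fft.fft (note numpy places the 1/N factor in the inverse transform rather than the forward one):

```python
import numpy as np

N = 8
f = np.arange(1.0, N + 1)  # f_1 ... f_N

# DFT exactly as defined above (1-based indices, 1/N in the forward transform)
m = np.arange(1, N + 1).reshape(-1, 1)
n = np.arange(1, N + 1).reshape(1, -1)
F_notes = (1 / N) * np.sum(f * np.exp(-2j * np.pi * (m - 1) * (n - 1) / N), axis=1)

# numpy's convention puts the 1/N factor in the *inverse* transform instead
F_numpy = np.fft.fft(f)
print(np.allclose(F_notes, F_numpy / N))  # True: same transform up to scaling
```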
Properties of 2D DFT:
Filtering:
Low-pass filtering