MPHY0020 Notes
(Erwin Alles)
Introduction to Computing (Computer Programming)
Binary Numbering Systems
Binary numbers are the basis for:
Digital signal
A digital signal can only take a finite number of values; in a binary system there are only two: 0 or 1. In reality, a digital signal carries noise and may be distorted, but as long as it stays within tolerance the digits it represents are unaffected.
Digital electronic circuits are made to operate on such digital signals. Such circuits are often printed on integrated circuits
and usually are made from large assemblies of logic gates – simple electronic representations of Boolean logic functions:
e.g. AND, OR and NOT gates.
Floating-point
o $(-1)^{\text{sign}} \times 2^{\text{exponent}-\text{bias}} \times 1.\text{significand}$
o We bias the exponent because it has to be signed to represent small and large values and the usual two’s
complement would make comparisons more difficult.
o Floating point accuracy is limited since an infinite set of real numbers is represented by finitely many
floating-point numbers, thus leading to rounding errors.
o Floating-point relative error is defined as $\frac{|r - f_r|}{|r|}$, where $r$ is the real number and $f_r$ is its floating-point representation
o The error of rounding a real number to the nearest floating-point number, in a floating-point system with base $b$ and $s$ bits for the significand, is bounded by $\frac{b^{1-s}}{2}$, which is called machine epsilon (see the sketch below)
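A minimal Python sketch of these limits, assuming IEEE 754 double precision (b = 2, s = 53 significand bits):

```python
import sys

# Machine epsilon as defined above: b**(1 - s) / 2 = 2**-53 for doubles.
# sys.float_info.epsilon is the gap between 1.0 and the next float, 2**-52.
eps = 2.0 ** -53
print(eps == sys.float_info.epsilon / 2)  # True

# Rounding error in action: 0.1 has no exact binary representation
print(0.1 + 0.2 == 0.3)        # False
print(abs((0.1 + 0.2) - 0.3))  # ~5.55e-17, a few machine epsilons
```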
Fixed-point (integer)
o Stored as binary data
o Unsigned integers can represent values from $0$ to $2^n - 1$, where $n$ is the number of bits
o e.g. 16 bits can represent values between $0$ and $2^{16} - 1$
o Signed integers – need an extra bit to convey the sign
Two’s complement: method for storing signed integers
Most significant bit is 0 if number is positive; 1 if negative
An n-bit two’s complement system can represent every integer from -2n-1 to 2n-1-1
n-bit two’s complement of a number is the value obtained by subtracting that number
from 2n
Advantages:
o Adding and subtraction can occur naturally without having to examine sign
o Zero has a unique representation
To calculate two’s complement: invert all bits and add 1. For example for number 5,
we have:
o 5: 0b0101001
o Negation: 0b1010110
o Adding 1: 0b1010111
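A small Python sketch of this rule (the 8-bit width is an arbitrary choice for illustration):

```python
def twos_complement(value: int, bits: int) -> str:
    """Return the bits-wide two's-complement pattern of value as a binary string."""
    # Masking with 2**bits - 1 is equivalent to subtracting from 2**bits for negatives
    return format(value & (2 ** bits - 1), f"0{bits}b")

print(twos_complement(5, 8))   # 00000101
print(twos_complement(-5, 8))  # 11111011
# Same result as "invert all bits and add 1":
print(twos_complement(-5, 8) == format((~5 + 1) & 0xFF, "08b"))  # True
```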
Boolean
o Simplest data type – one bit – 1 or 0, True or False
o Used for output of logic and conditions
e.g. if A > 10 then… A > 10 is either true or false
List
Array
String
Dictionary
What is a computer?
Digital computers: e.g. calculator, laptop, bank card, smartphone. They work on discrete
data (often binary).
Analogue computers: e.g. abacus, speedometer. They work on continuous data
Hybrid computers: combine digital components (for the logic) with analogue components (for the arithmetic).
-often in hospitals, they have hybrid systems as inputs and outputs are analogue
but all the computation happens in a digital system.
A computer is: a programmable device that accepts inputs, processes these and produces outputs.
Turing Machine
Mathematical model of a (digital) computer
Used to prove properties of computation e.g. fundamental limitations
If a programming language can express all tasks accomplishable by (digital) computers, it is said to be Turing
complete
Transistors
e.g. Bipolar Junction Transistor and Field Effect Transistor
A transistor is a device that regulates current or voltage flow and acts as a switch or gate for electronic signals; it regulates the flow of electronic signals by amplifying, controlling and generating electrical signals.
Clock rate: this is tied to CPU speed but not directly. A faster clock rate does
not necessarily lead to a faster computer; you should compare clock speeds
within brand and generation. Clock speeds tend to increase with a decrease in
transistor size i.e., smaller transistors mean shorter interconnects and smaller
capacitances.
BIOS (Basic Input/Output System)
Used by the CPU to perform start-up procedures when the computer is turned on
Initialises hardware
Finds bootable device (HDD)
Hardware test (power on self test: POST)
Load operating system (OS)
Transfer control to OS
Read only memory
Bus
Memory
Types of memory: volatile and non-volatile (storage)
Volatile memory loses data on power off e.g., random access memory (RAM)
whereas non-volatile retains data on power off e.g., hard disk drives
(HDD), flash memory, optical disks. Non-volatile memory consists of
magnetic storage devices (with a magnetised medium where polarisation
represents bits but has slow access) or flash based [solid state] memory
(which has no mechanical parts, and transistors represent bits. It is faster
than magnetic storage but has a limited lifetime).
Read only memory (ROM) [non-volatile] uses diodes instead of transistors to store data (this data is stored
permanently). Modern ROM can be reprogrammed: EPROM (erasable programmable read only memory), EEPROM
(electrically erasable programmable read only memory).
Without memory, a computer could not retain data short-term (volatile) or long-term (non-volatile).
The GPU is a very specialised processor. It was originally developed for graphics and graphics rendering but is now also used in scientific computing due to its parallel computation power. It has many more processor cores than a CPU, making it much faster for parallel workloads (a processor core is a processing unit that reads instructions to perform specific actions; instructions are chained together so that, when run in real time, they make up your computing experience).
Cooling
Electrical components generate heat and, if too much is generated (overheating), failures may occur.
So, cooling is required to keep a computer working optimally
CPUs typically run at 40-50 degrees Celsius
o They typically shut down/fail above 80 degrees
o Raising clock speed (overclocking) increases CPU speed at the cost of additional heat generation
GPUs can withstand higher temperatures (up to 100 degrees Celsius)
There are two ways to transfer heat: conduction (through thermal plates or heat sinks) and convection (through air moved by fans or liquid in pumps). Computer heat always needs to go somewhere: either into the room or into the cabinet the computer is kept in.
Types of Computers
Centralised Computing
Servers
Provides a service to other computers by processing requests from a client e.g. in the form of a website (a Web server is a
computer that uses the HTTP protocol to send Web pages to a client's computer when the client requests them) or hosting
a database. It is often without a screen as it is set up to be maintained remotely. Hence, it requires good network access and connectivity, especially as a single server serves multiple computers at a time and so, depending on its use, may need quite a powerful CPU.
A specific type of server is a mainframe. Mainframes are used where the system must run continuously, such as for bank transactions. This is achieved by building them so that broken components can be hot-swapped, i.e. replaced without turning off the service the computer is giving to the client. There is also built-in redundancy, with multiple hard drives acting as a single drive: when one of them fails, the data is moved over to another drive.
The largest centralised computer we will come across is the supercomputer, often used for very specific mathematical computations. Supercomputers are orders of magnitude faster than mainframes, as they are designed specifically for computational work. Their rate, measured in floating-point operations per second (FLOPS), i.e. the unit of measurement that quantifies the performance capability of a supercomputer, is orders of magnitude higher than that of a standard computer.
Distributed Computing
A distributed systems cluster is a group of similar machines that are virtually or
geographically separated and that work together to provide the same service or application to
clients i.e. set of computers that work together so that they can be viewed as a single system.
Cloud computing is a general term for anything that involves delivering hosted services over the internet, i.e. using the internet to store and manage data on remote servers, which is then accessed via the internet. Cloud computing can also be thought of as utility computing or on-demand computing, e.g. Google Cloud. There are 3 types of cloud computing: infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS).
Embedded systems
These are typically very small and use very little power. Often, you will find them in toys, medical systems (heart rate monitors, blood pressure monitors) and household appliances. They are hard to reprogramme as they are designed for only one purpose.
Compression is important to reduce file size. There are two types: lossless (the original data can be perfectly reconstructed from the compressed data, so the compression can be reverted) and lossy (inexact approximations are used and some data from the original file is discarded, making the compression irreversible, but this also gives the potential for much greater compression).
FOSS (free and open-source software) is software for which the source code is distributed and which you are free to modify and redistribute. This principle allows other people to contribute to the development and improvement of the software, like a community.
Backups
Version control is a form of backing up; it is the practice of tracking and managing changes to software code e.g.
subversion, git. It is beneficial as it allows traceability (providing evidence of all revisions and changes made over time),
it reduces duplication and allows management overview.
Medical imaging
Patient monitoring
Medical research
Hospital administration
Electronic health care records
Image files
Image files contain a header, holding extra information (e.g. a fixed size, the image dimensions, data type, patient information), and image data, which holds the actual data.
In an image file the header data and image data are generally stored separately, sometimes in different files for some formats, but usually within the same file. The header contains the patient name, DOB, ID, etc.; scan parameters; scanner and patient coordinate systems; display parameters; and the dimensions of the image: any data that provides useful information on the image and that enables it to be displayed correctly, measurements to be made, etc. The image data part stores the pixel/voxel values (intensities) of the image.
Data that might be displayed on a radiological workstation at the same time as the image are patient name, patient ID
(hospital number), the date of the scan, the hospital/department name where the scan was performed, the size of the image
in pixels, the type of scan (e.g. MRI sequence), etc.
In a hospital setting, it is important that there are standardised data formats so that if your GP were to take an x-ray,
hospital staff can look at it or if an MRI were to be taken at one hospital, another hospital could also read it. So, DICOM
was released, as the international standard to communicate and manage medical images and data. (Neuroimaging
Informatics Technology Initiative i.e. NIfTI has also been released and although not a licensed standard, it is commonly
used within brain imaging).
DICOM usually has one file per 2D slice/frame with a variable-size header. Each vendor (GE/Siemens/Philips) often has its own vendor-specific header fields too, as well as the header containing patient information.
NIfTI has a fixed-size header (348 bytes) so, on converting DICOM to NIfTI, the conversion tools may need to remove some header information from DICOM.
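As an illustration, the widely used Python library nibabel exposes exactly this header/data split; the file name scan.nii.gz below is a hypothetical example:

```python
import nibabel as nib  # common Python library for NIfTI I/O

img = nib.load("scan.nii.gz")  # hypothetical file path
hdr = img.header               # the fixed-size NIfTI header
print(hdr["sizeof_hdr"])       # 348 for NIfTI-1
print(img.shape)               # image dimensions
print(hdr.get_zooms())         # voxel sizes (e.g. in mm)
data = img.get_fdata()         # the image data (pixel/voxel intensities)
```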
Computational statistics focuses on computer intensive statistical methods, especially in cases where the sample sizes of
collected datasets are huge (in thousands) with non-homogenous datasets, e.g., pooled from different medical centres. In
such cases using traditional statistics (without computers) is almost impossible.
The term statistical computing usually means the application of computer science to statistics. Computational statistics,
however, goes further as it is aiming at the design of algorithms for implementing statistical methods on computers,
which were unthinkable before the computer age, such as:
the bootstrap
computer simulations
artificial neural networks, etc
Computational statistics also copes with analytically intractable problems.
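For instance, a minimal NumPy sketch of the bootstrap (the sample below is synthetic, purely for illustration): resample with replacement many times and recompute the statistic each time to estimate a confidence interval.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100)  # pretend measurements

# Bootstrap: resample with replacement, recompute the mean each time
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])

# Percentile 95% confidence interval for the mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```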
Healthcare Informatics: If small amounts of data from many patients are linked up and pooled, researchers and doctors
can look for patterns in the data, helping them develop new ways of predicting or diagnosing illness, and identify ways to
improve clinical care.
Computer-aided diagnosis (CADx) or computer-aided detection (CADe) are systems that assist physicians in the
interpretation of medical images.
CADe systems are usually confined to marking conspicuous structures and sections, while CADx systems evaluate or classify the conspicuous structures.
Although CAD has been around for decades, it does not substitute the doctor or other professional, but rather plays a
supporting role. The doctor is generally responsible for the initial interpretation of a medical image. However, the goal of
some CAD systems is to detect the earliest signs of abnormality in patients that human doctors cannot.
Probability Distribution
The probability distribution of a statistical data set or population is a
mathematical function that provides the probabilities of occurrence (y-axis)
of different possible outcomes in an experiment (x-axis).
That is, with the data being organised (e.g., ordered from low to high), one
can see the number or percentage of individuals in each group. This can then
be visualised in graphs and charts to examine the shape, centre, and amount
of variability in the data.
The sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived
from a random sample of size n
Accuracy describes how well a binary classification test correctly identifies or excludes a condition. Specifically,
accuracy is the proportion of true results (TP and TN) among the total number of cases examined.
In rare diseases:
High accuracy can be achieved simply by ignoring all evidence and calling all cases negative. If only 5% of patients have
the disease, a physician who always blindly states that the disease is absent will be right 95% of the time!
A good test has high sensitivity AND high specificity. Depending on the application, one may choose to reduce
specificity to maximise sensitivity or vice versa.
Confusion Matrix:
TPF + FNF = 1
TNF + FPF = 1
Each decision fraction represents an estimate of the probability of a particular decision, given that (condition) an
individual case has a particular health/disease state.
Let D represent the disease in question, and let T represent the result of a diagnostic test (decision). So, we have the
probability of a positive test given the absence of disease:
$\text{FPF} = P(T^+ \mid D^-)$
$\text{TPF} = P(T^+ \mid D^+)$, $\text{FNF} = P(T^- \mid D^+)$, $\text{TNF} = P(T^- \mid D^-)$
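A small Python sketch computing these decision fractions from confusion-matrix counts (the counts are invented to mirror the rare-disease example above):

```python
def decision_fractions(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Decision fractions from confusion-matrix counts."""
    return {
        "TPF (sensitivity)": tp / (tp + fn),  # P(T+ | D+)
        "FNF":               fn / (tp + fn),  # P(T- | D+) = 1 - TPF
        "TNF (specificity)": tn / (tn + fp),  # P(T- | D-)
        "FPF":               fp / (tn + fp),  # P(T+ | D-) = 1 - TNF
        "accuracy":          (tp + tn) / (tp + fn + fp + tn),
    }

# Rare-disease example: calling every case negative (5 diseased, 95 healthy)
# gives 95% accuracy but zero sensitivity.
print(decision_fractions(tp=0, fn=5, fp=0, tn=95))
```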
Performance of CAD systems:
CAD systems cannot yet detect 100% of pathological changes (nor can human doctors). The hit rate (sensitivity) can be up to 90% depending on the system and application. The fewer false positives indicated, the higher the specificity. A low specificity reduces the acceptance of the CAD system because the user has to identify all of these wrong hits.
ROC (Receiver Operating Characteristic) Analysis
Since the ROC curve is a graph of TPF versus FPF, both of which are independent of disease prevalence P(D+), it does
not depend on the prevalence of disease in the actual population to which the test may be applied. Thus, ROC analysis
provides a description of disease detectability that is independent from both disease prevalence and decision threshold
effects.
The Pearson correlation coefficient (denoted by r or ρ) is a measure of the strength (or 'tightness') of a linear association between two variables. The correlation coefficient indicates how far the data points lie from the best linear fit, i.e. how well the data points fit this line of best fit.
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y}$$

where $\operatorname{cov}(X, Y)$ is the covariance of $X$ and $Y$, and $\sigma_x$, $\sigma_y$ are their standard deviations
The p-value is the probability of obtaining the current value of r if the true correlation were in fact zero (the null hypothesis). If this probability is lower than the conventional 5% (p < 0.05), the correlation coefficient may be called statistically significant. The p-value is obtained from the sampling distribution of r, which in this case follows the Student t-distribution with n − 2 degrees of freedom (the higher t, the lower the p-value):

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
The sampling distribution of a statistic is the distribution of that statistic (in our case it is r), considered as a random
variable, when derived from a random sample of size n.
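A short Python sketch on synthetic data (assuming NumPy and SciPy are available), comparing scipy.stats.pearsonr with the t-statistic formula above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.8 * x + rng.normal(scale=0.5, size=30)  # linearly related, with noise

r, p = stats.pearsonr(x, y)

# Same p-value via the t statistic with n - 2 degrees of freedom
n = len(x)
t = r * np.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided
print(f"r = {r:.3f}, p = {p:.2e}, manual p = {p_manual:.2e}")
```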
Voxel-based analyses assume that the data from a particular voxel all derive from the same part of the brain.
Violations of this assumption will introduce artifactual changes in the voxel values that may obscure changes, or
differences, of interest.
This assumption is often violated due to subject motion during a series of scans of the same subject. Image
realignment (rigid transformation) is used to correct for this.
In case of aligning functional and structural images, we use image co-registration (rigid transformation).
After realignment the data are then transformed using linear (affine) or nonlinear spatial normalisation into a standard
anatomical space in order to perform common analysis for all subjects.
Registration optimises the parameters that describe a spatial transformation between the source and reference (e.g.,
template) images.
The word registration often encompasses many types of alignment optimisations and the corresponding transformations,
i.e.:
2D affine transformations
TYPE OF TRANSFORMATION | EQUATION
Translation by $t_x$ and $t_y$ | $x_1 = 1 \cdot x_0 + 0 \cdot y_0 + t_x$; $\; y_1 = 0 \cdot x_0 + 1 \cdot y_0 + t_y$
Rotation around the origin by $\theta$ radians | $x_1 = \cos(\theta)\, x_0 + \sin(\theta)\, y_0 + 0$; $\; y_1 = -\sin(\theta)\, x_0 + \cos(\theta)\, y_0 + 0$
Zoom/scale by $s_x$ and $s_y$ | $x_1 = s_x \cdot x_0 + 0 \cdot y_0 + 0$; $\; y_1 = 0 \cdot x_0 + s_y \cdot y_0 + 0$
Shear by $h$ | $x_1 = 1 \cdot x_0 + h \cdot y_0 + 0$; $\; y_1 = 0 \cdot x_0 + 1 \cdot y_0 + 0$
2D affine matrix representation
An affine transformation is a composition of two functions: a linear mapping and a translation:

$$\vec{y} = A\vec{x} + \vec{b}$$

Matrix multiplication is used to represent linear maps, and vector addition to represent translations. It is possible to represent both the translation and the linear map using a single matrix multiplication through augmented matrices and vectors:

$$\begin{bmatrix} \vec{y} \\ 1 \end{bmatrix} = \begin{bmatrix} A & \vec{b} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \vec{x} \\ 1 \end{bmatrix}$$
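A minimal NumPy sketch verifying that the augmented-matrix form reproduces $\vec{y} = A\vec{x} + \vec{b}$ (the rotation angle and translation are arbitrary example values):

```python
import numpy as np

theta = np.deg2rad(30)
A = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])  # rotation (linear part)
b = np.array([2.0, -1.0])                        # translation

# Augmented 3x3 matrix combines y = A x + b into one multiplication
M = np.eye(3)
M[:2, :2] = A
M[:2, 2] = b

x = np.array([1.0, 1.0])
y_direct = A @ x + b
y_augmented = (M @ np.append(x, 1.0))[:2]
print(np.allclose(y_direct, y_augmented))  # True
```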
Spatial Normalisation
Why do we want to perform spatial normalisation?
1. Inter-subject averaging
a. Increase sensitivity with more subjects
b. Extrapolate findings to the population as a whole
2. Image data in standard coordinate system
a. e.g. Talairach & Tournoux space
Therefore, spatial normalisation minimises the mean squared difference from the template image(s).
Image Segmentation
= the process of partitioning a digital image into multiple segments—sets of pixels, also known as super-pixels
The goal of segmentation is to simplify and change the representation of an image into something that is more meaningful
and easier to analyse. Image segmentation is typically used to locate objects and boundaries in images. During
segmentation a label is assigned to every pixel in an image such that pixels with the same label share certain
characteristics.
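As a toy NumPy illustration (synthetic image, arbitrary threshold), the simplest segmentation assigns a binary label to each pixel by intensity thresholding:

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.normal(loc=100, scale=10, size=(64, 64))
image[20:40, 20:40] += 80  # a bright "object" on the background

# Simplest segmentation: global intensity threshold -> binary label per pixel
threshold = 140
labels = (image > threshold).astype(np.uint8)  # 1 = object, 0 = background
print(labels.sum(), "pixels labelled as object")
```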
Data Protection
The Data Protection Act 2018 is the UK's implementation of the General Data Protection Regulation (GDPR). The Data Protection Act controls how personal information is used by organisations, businesses or the government. The aim of the GDPR is to protect all EU citizens from privacy and data breaches in today's data-driven world.
Name
Identification number
Location data or online identifier
Everyone responsible for using personal data has to follow strict rules called ‘data protection principles’:
race
ethnic background
political opinions
religious beliefs
trade union membership
genetics
biometrics (when used for identification, e.g., from images)
health
sex life or orientation
In research activities it is mandatory to evaluate data in different ways. It is thus not possible to specify all processing in advance, because, for example, new image-processing tools will be developed and should be evaluated against former processing tools in imaging databanks. The development of platforms for long-term storage and image organisation should allow image data to be shared between researchers from all over Europe. Explicit consent should not be required for the use of anonymised and key-coded image data for historical, statistical, educational and scientific research purposes.
Data Anonymisation
Data anonymisation is the process of removing or encrypting sensitive information from a document, record or message, with the intent of privacy protection. The sensitive information includes all personally identifiable information in the data sets, so that the people whom the data describe remain anonymous. The EU's GDPR demands that stored data on people in the EU undergo either an anonymisation or a pseudonymisation process.
Pseudonymisation vs anonymisation
Pseudonymisation is a procedure by which personally identifiable information fields within a data record are replaced by
one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field makes the data record less
identifiable while remaining suitable for data analysis and data processing. Pseudonymised data can be restored to its original state with the addition of information which then allows individuals to be re-identified.
Breach notification:
Notification is mandatory where a data breach is likely to "result in a risk for the rights and freedoms of individuals".
This must be done within 72 hours of first having become aware of the breach.
Data processors are also required to notify their customers, the controllers, “without undue delay” after first
becoming aware of a data breach.
Privacy by design:
Inclusion of data protection from the onset of the designing of systems, rather than an addition.
Right to access:
Data subjects have the right to obtain confirmation from the data controller as to whether or not personal data
concerning them is being processed, where and for what purpose.
The controller shall provide a copy of the personal data, free of charge, in an electronic format.
This is among the latest changes and marks a dramatic shift towards data transparency and the empowerment of data subjects.
Right to be forgotten:
The right entitles the data subject to have the data controller erase their personal data, cease further dissemination of
the data, and potentially have third parties halt processing the data.
The conditions for erasure include the data no longer being relevant to original purposes for processing, or a data
subject withdrawing consent.
Difficulties for healthcare providers
Busy doctors already use popular smartphone apps for clinical communications, which store information online, e.g.:
• Calendars,
• Dropbox,
• Google Drive,
• PDF creator apps.
With data in so many locations it will be difficult for trusts to identify where information is stored when faced with a
“subject access request” from a patient.
The most popular apps have the right to access all of the data on users’ devices, including:
• contact lists,
• calendars,
• email,
• SMS,
• instant messages,
• microphone,
• image gallery,
• camera,
• location
All such data on a clinicians’ device are potentially sensitive.
Advantages:
• Rapid access
• No duplicates (i.e. copies on every computer)
• No lost images
• Enhances analysis
• Better collaboration and ease of sharing
Disadvantages:
Research Ethics
- of Human and Animal participants (or tissue, data)
Studies require approval by a Research Ethics Committee in order to safeguard participants' dignity, rights, safety and wellbeing. The committee will consider whether the research is justified, whether it complies with legislation and law, whether the risks outweigh the benefits, whether the research will be completed (e.g. given the funding available), and whether the researchers are qualified to carry out the research.
Sampling converts an analogue signal into a set of values that specify the signal amplitude at pre-set
intervals
Quantization converts the signal amplitude into one of a discrete set of values (codes)
Quantization error = original signal amplitude – quantized value
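A tiny NumPy sketch of quantization (3 bits, i.e. 8 levels; the signal values are made up): the quantization error is bounded by half a quantization step.

```python
import numpy as np

signal = np.array([0.12, 0.49, 0.77, 0.31])  # analogue amplitudes in [0, 1]

# Quantize to 8 discrete levels (3 bits)
levels = 8
codes = np.round(signal * (levels - 1)).astype(int)  # integer codes
quantized = codes / (levels - 1)                     # reconstructed amplitudes

error = signal - quantized                           # quantization error
print(codes, np.abs(error).max())  # max error <= half a quantization step
```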
Information loss
Information loss may occur if we do not take enough samples, since we then cannot be sure what we are measuring. This may also lead to aliasing: an effect that causes different signals to become indistinguishable (i.e. aliases of one another) when sampled. Aliasing also refers to the distortion or artifact that results when the signal reconstructed from samples differs from the original continuous signal.
When $f_s$ is limited (e.g. due to hardware limitations), we can use an anti-aliasing filter to remove frequencies above $f_s/2$ so that the Nyquist criterion is satisfied
When the Nyquist theorem/criterion is satisfied, the original (analogue/continuous) signal can be perfectly
reconstructed (i.e. recovered) from the sampled signal without distortion or error
Applying the Nyquist Criterion:
For a compact disc (CD), digital audio uses $f_s = 44.1$ kHz, as humans cannot hear above 20 kHz
Sampling has the effect of a low-pass frequency filter with a frequency cut-off at $f_s/2$ Hz
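A short NumPy sketch of aliasing (the frequencies are arbitrary example values): a 70 Hz tone sampled at 100 Hz produces exactly the same samples as a 30 Hz tone.

```python
import numpy as np

fs = 100.0                   # sampling frequency (Hz); Nyquist limit = fs/2 = 50 Hz
t = np.arange(0, 1, 1 / fs)  # sample instants

f_true = 70.0                # above fs/2, so the Nyquist criterion is violated
f_alias = fs - f_true        # the 70 Hz tone aliases to 30 Hz

violating = np.cos(2 * np.pi * f_true * t)
alias = np.cos(2 * np.pi * f_alias * t)
print(np.allclose(violating, alias))  # True: the samples are indistinguishable
```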
Signal Filtering
Filtering changes the nature of a signal in some way e.g. by removing certain components
Filters can:
o Be implemented in hardware or software
o Be analogue or digital
o Be defined in the time (or spatial) domain or the frequency domain
Filtering in the time (or spatial) domain is done using convolution
Filtering in the frequency domain makes use of the Fourier Transform
It allows us to understand the Nyquist Criterion, filtering and a wide range of other signal processing operations.
i.e. the convolution formula can be described as the area under the function f(τ) weighted by the function g(−τ) shifted by
amount t. As t changes, the weighting function g(t − τ) emphasizes different parts of the input function f(τ).
Commutativity: $f \otimes g = g \otimes f$
Associativity: $f \otimes (g \otimes h) = (f \otimes g) \otimes h$
$(I \otimes G_1) \otimes G_2 \otimes G_3 \otimes \ldots = I \otimes (G_1 \otimes G_2 \otimes G_3 \otimes \ldots)$, where $I$ is an image and the $G_n$ define filter masks
Linear filtering itself is not an associative operation, i.e. in general $f(f(X; G_1); G_2) \neq f(X; f(G_1; G_2))$
Implication: serial linear image filtering is often easier and faster if we pre-compute a single filter kernel/mask, compared with applying a series of filtering operations and storing intermediate filtered images.
Distributivity: $f \otimes (g \pm h) = (f \otimes g) \pm (f \otimes h)$
Differentiation: $\frac{d}{dx}(f \otimes g) = f \otimes \frac{dg}{dx}$
The Fourier transform: $F(s) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i s x}\, dx$, where $s$ is frequency.
Complex numbers
Polar form: $z = r(\cos\theta + i\sin\theta)$
Exponential form: $\cos\theta + i\sin\theta = e^{i\theta} \;\rightarrow\; z = re^{i\theta}$
Scaling: $F\{f(ax)\} = \frac{1}{|a|}\, F\!\left(\frac{s}{a}\right)$
Convolution:
$F\{f \ast g\} = F\{f\} \times F\{g\}$, i.e. convolution in time/space = multiplication in the frequency domain
$F\{f \times g\} = F\{f\} \ast F\{g\}$
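A minimal NumPy check of the convolution theorem on made-up vectors (note that the DFT version yields circular convolution):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 0.5, 0.0, 0.0])

# Circular convolution via the DFT: multiply spectra, transform back
conv_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# Direct circular convolution for comparison
n = len(f)
conv_direct = np.array(
    [sum(f[k] * g[(m - k) % n] for k in range(n)) for m in range(n)]
)
print(np.allclose(conv_fft, conv_direct))  # True
```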
Understanding sampling in the frequency domain:
If the Nyquist criterion is not satisfied, adjacent copies overlap, and it is not possible in general to discern an unambiguous $X(f)$. Any frequency component above $f_s/2$ is indistinguishable from a lower-frequency component, called an alias, associated with one of the copies.
Reconstructing a sampled signal by filtering:
We need to low-pass filter the sampled signal with a cut-off at $f_s/2$: this removes the spectral copies and leaves the original spectrum. In the time domain, this filtering is a discrete convolution (see the sketch after the definition):
Definition: $y[n] = x[n] \otimes h[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$
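As a small NumPy illustration of this definition (impulse input and smoothing kernel chosen arbitrarily), convolving an impulse with a kernel returns the kernel itself:

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # unit impulse
h = np.array([0.25, 0.5, 0.25])          # simple smoothing kernel

# y[n] = sum_k x[k] h[n - k]; 'same' keeps the output the length of x
y = np.convolve(x, h, mode="same")
print(y)  # the impulse is replaced by the kernel: [0, 0.25, 0.5, 0.25, 0]
```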
Image filtering in 2D
By using a kernel, we can smooth the image, reducing noise by replacing each data point with some kind of local average of surrounding data points (at the expense of reducing fine image detail). Pixel averaging therefore blurs the image. If we instead increase the value of the central pixel of the kernel, we obtain a sharpening filter.
$\int_{-\infty}^{\infty} \delta(t)\, dt = 1$
$\delta$ can be defined as the limit of a symmetric 'spikey' function with an integral of 1 (e.g. a Gaussian), as the width approaches zero.
Useful properties:
Sifting: $\int_{-\infty}^{\infty} f(t)\,\delta(t - a)\, dt = f(a)$
Sampling function (Dirac comb)
Definition: $\operatorname{III}(x) = \sum_{n=-\infty}^{\infty} \delta(x - nT_s)$
Choose how pixel values are defined outside the image. Possible options:
Constant
Specified value (e.g. zero – sometimes called zero padding)
Use value at image edges/corner (so-called replication)
Varying
Mirror image, periodic, etc.
Gaussian Filter
$$G(x, y; \sigma_x, \sigma_y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left(-\left(\frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2}\right)\right)$$
Separable Filters
Some filter masks/kernels are separable. This means that the 2D kernel can be written as the outer product of two 1D kernels, e.g. $G(x, y) = g_1(x)\, g_2(y)$, so 2D filtering can be performed as two consecutive (and much cheaper) 1D convolutions, as the sketch below shows.
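A NumPy/SciPy sketch of this equivalence (kernel size and image are arbitrary): two 1D Gaussian passes reproduce the full 2D Gaussian convolution.

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

def gaussian_1d(sigma: float, radius: int) -> np.ndarray:
    """Sampled, normalised 1D Gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()  # normalise so overall image brightness is preserved

g = gaussian_1d(sigma=1.0, radius=3)
kernel_2d = np.outer(g, g)  # separability: G(x, y) = g(x) * g(y)

image = np.random.default_rng(3).random((32, 32))

# One 2D convolution vs. two 1D passes (rows, then columns)
smoothed_2d = convolve(image, kernel_2d, mode="nearest")
smoothed_sep = convolve1d(
    convolve1d(image, g, axis=0, mode="nearest"), g, axis=1, mode="nearest"
)
print(np.allclose(smoothed_2d, smoothed_sep))  # True, at ~2k vs k**2 ops/pixel
```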
In digital signal processing, the function is any quantity or signal that varies over time, such as the pressure of a sound
wave, a radio signal, or daily temperature readings, sampled over a finite time interval (often defined by a window
function).
In image processing, the samples can be the values of pixels along a row or column of a raster image. The DFT is also
used to efficiently solve partial differential equations, and to perform other operations such as convolutions or multiplying
large integers.
$f[n] = [f_1, f_2, f_3, \ldots, f_N]^T$: discrete 1D function (samples represented as a column vector)
$F[m] = \operatorname{DFT}(f) = [F_1, F_2, F_3, \ldots, F_N]^T$: the DFT of $f$, also a discrete function (column vector)
$$F_m = \frac{1}{N} \sum_{n=1}^{N} f_n \exp\left(\frac{-2\pi i\,(m-1)(n-1)}{N}\right), \quad 1 \le m \le N$$
$$F_{u,v} = \sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n} \exp\left(-2\pi i \left(\frac{(m-1)(u-1)}{M} + \frac{(n-1)(v-1)}{N}\right)\right), \quad 1 \le u \le M,\; 1 \le v \le N$$
$$f_{m,n} = \frac{1}{MN} \sum_{u=1}^{M} \sum_{v=1}^{N} F_{u,v} \exp\left(2\pi i \left(\frac{(m-1)(u-1)}{M} + \frac{(n-1)(v-1)}{N}\right)\right) \quad \longleftarrow \text{inverse DFT}$$
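A short NumPy sketch checking the 1D definition above against numpy.fft.fft (note numpy places the 1/N factor in the inverse transform rather than the forward one):

```python
import numpy as np

N = 8
f = np.arange(1.0, N + 1)  # f_1 ... f_N

# DFT exactly as defined above (1-based indices, 1/N in the forward transform)
m = np.arange(1, N + 1).reshape(-1, 1)
n = np.arange(1, N + 1).reshape(1, -1)
F_notes = (1 / N) * np.sum(f * np.exp(-2j * np.pi * (m - 1) * (n - 1) / N), axis=1)

# numpy's convention puts the 1/N factor in the *inverse* transform instead
F_numpy = np.fft.fft(f)
print(np.allclose(F_notes, F_numpy / N))  # True: same transform up to scaling
```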
Properties of 2D DFT:
Filtering:
Low-pass filtering