Final IEEE Penang


REAL TIME TASK SCHEDULING ON FPGAs

Dr. Amlan Chakrabarti, SMIEE


Associate Professor and Coordinator
A.K. Choudhury School of Information Technology
University of Calcutta
(amlanc@ieee.org)
Outline

 Real time Tasks Scheduling


 Real time issues for Embedded Systems
 Real time scheduling techniques for Multicore systems
 New real time task scheduling methodologies for Reconfigurable Platforms
 Advance design thoughts
 Conclusion
Real-time System

 A real-time system is a system whose specification includes both logical and temporal correctness requirements.
 Logical correctness: produces correct outputs.
Can be checked, for example, by Hoare logic.
 Temporal correctness: produces outputs at the right time.
It is not enough to say that "brakes were applied";
you want to be able to say "brakes were applied at the right time".
 Key property
 Predictability with respect to timing constraints
Types of Real Time Systems

 Hard real time systems


 Must always meet all deadlines
 System fails if deadline window is missed
 Soft real time systems
 Must try to meet all deadlines
 System does not fail if a few deadlines are missed
 Firm real time systems
 Result has no use outside deadline window
 Tasks that fail are discarded
Periodic, Sporadic, Aperiodic Tasks

 Periodic task:
 We associate a period pi with each task Ti.
 pi is the interval between job releases.

 Sporadic and Aperiodic tasks: Released at arbitrary times.


 Sporadic: Has a hard deadline.
 Aperiodic: Has no deadline or a soft deadline.
APERIODIC TASK SCHEDULING
Notation:

The quality criterion is to minimize the maximum lateness, where lateness = finish time - deadline.

With this objective, tasks can be scheduled via EDD (Earliest Due Date) or EDF (Earliest Deadline First).
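As a rough illustration of the lateness objective, the Python sketch below orders jobs by earliest due date and reports the maximum lateness. It assumes that all jobs are released at time 0 and run without preemption on a single processor, which is the setting in which EDD is usually stated. The `Job` type, the function name and the job values are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    exec_time: int  # execution time of the job
    deadline: int   # absolute deadline (due date)


def edd_schedule(jobs):
    """Run jobs in order of earliest due date and return (order, max lateness).

    Assumption: every job is released at time 0 and runs to completion.
    """
    order = sorted(jobs, key=lambda job: job.deadline)
    finish = 0
    lateness = []
    for job in order:
        finish += job.exec_time                 # job finishes here
        lateness.append(finish - job.deadline)  # lateness = finish time - deadline
    return [job.name for job in order], max(lateness)


# Hypothetical job set, not taken from the slides.
jobs = [Job("J1", 1, 3), Job("J2", 1, 10), Job("J3", 1, 7),
        Job("J4", 3, 8), Job("J5", 2, 5)]
order, max_lateness = edd_schedule(jobs)
print(order, max_lateness)
```

A negative maximum lateness means every job finishes before its deadline. When jobs arrive at different times, EDF with preemption plays the corresponding role.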
Task constraints

 Deadline constraint

 Resource constraints
 Shared access
 Exclusive access

 Precedence constraints
 T1 → T2: Task T2 can start executing only after T1 finishes its execution

 Fault-tolerant requirements
 To achieve higher reliability for task execution
 Redundancy in execution

Real-Time Workload

[Figure: timeline from 0 to 15 showing one job, with markers for job release and job deadline]

 Job (unit of work): a computation, a file read, a message transmission, etc.
 Attributes
 Resources required to make progress
 Timing parameters
 Example job on the timeline
• Job is released at time 3.
• Its (absolute) deadline is at time 10.
• Its relative deadline is 7.
• Its response time is 4.
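The timing attributes of the example job follow from two subtractions, sketched below in Python. The function names are illustrative, and the completion time of 7 is inferred from the stated response time of 4 rather than read from the slide.

```python
def relative_deadline(release, absolute_deadline):
    """Relative deadline = absolute deadline - release time."""
    return absolute_deadline - release


def response_time(release, completion):
    """Response time = completion time - release time."""
    return completion - release


release, absolute_deadline = 3, 10
completion = 7  # inferred: release (3) + response time (4)

print(relative_deadline(release, absolute_deadline))
print(response_time(release, completion))
```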
Real-Time Task

 Task: a sequence of similar jobs

 Periodic task (p, e)
Its jobs repeat regularly
Period p = inter-release time (0 < p)
Execution time e = maximum execution time (0 < e < p)
Utilization U = e/p

[Figure: timeline of a periodic task with marks at 0, 5, 10 and 15]
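A small Python sketch of the utilization formula U = e/p, using exact fractions so that sums stay exact. The task parameters are hypothetical, and tasks are written as (p, e) pairs to match the notation above.

```python
from fractions import Fraction


def utilization(p, e):
    """Utilization U = e/p of a periodic task (p, e), requiring 0 < e < p."""
    if not 0 < e < p:
        raise ValueError("expected 0 < e < p")
    return Fraction(e, p)


def total_utilization(tasks):
    """Sum of the utilizations of a list of (p, e) pairs."""
    return sum((utilization(p, e) for p, e in tasks), Fraction(0))


# Hypothetical task set: (period, execution time).
tasks = [(5, 2), (10, 3)]
print(utilization(5, 2))         # 2/5
print(total_utilization(tasks))  # 7/10
```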
REAL TIME ISSUES IN EMBEDDED SYSTEM
What is an Embedded System

 It's not a PC!
 Most computers in the world do not have a keyboard and screen
 The vast majority of computers in the world are small chips that are hidden inside all kinds of products
System Design

[Figure: flexibility (vertical axis) versus performance (horizontal axis), with points labelled Microprocessor, ASIP and configurable µP, ASIC, and Reconfigurable SoC (marked "?")]
Embedded Systems
 An embedded system is nearly any computing system (other than a general-purpose computer) with the following characteristics
 Single-functioned
 Typically designed to perform a predefined function
 Tightly constrained
 Tuned for low cost
 Built from a single component or a few components
 Performs functions fast enough
 Consumes minimum power
 Reactive and real-time
 Must continually monitor the desired environment and react to changes
 Hardware and software co-existence

Modern day Car as ES

 Mission: Reaching the destination safely.

 Controlled System: Car.

 Operating environment: Road conditions.

 Controlling System
- Human driver: Sensors - Eyes and Ears of the driver.
- Computer: Sensors - Cameras, Infrared receiver, and Laser telemeter.

 Controls: Accelerator, Steering wheel, Brake pedal.

 Actuators: Wheels, Engines, and Brakes.


Car example (contd)

 Critical tasks: Steering and braking.

 Non-critical tasks: Turning on radio.

 Cost of fulfilling the mission → Efficient solution.

 Reliability of the driver → Fault-tolerance needs to be considered.
Target Platforms for ES

 Microcontroller-based systems
 DSP processor-based systems
 ASIC technology
 FPGA technology

Integration in System Design
[Figure: integration of functions (vertical axis) over time (horizontal axis). Labels: FPGA (logic, memory, I/O) with Logic Design Tools; FPGA + Memory + IP + High Speed IO with Logic Design Tools and Embedded Software Tools; CPU + Logic + Memory + IP + Processors + RocketIO with Logic Design Tools and Embedded Software Tools; device families named: 4K & Virtex, Virtex]
Programmable systems usher in a new era of system design integration possibilities
What are FPGAs
 FPGAs are programmable digital logic chips
 We can program them to do almost any digital function
 Here's the general flow of working with FPGAs:
 We use a computer to describe a "logic function" that we want. We might draw a schematic
 We compile the "logic function" using software provided by the FPGA vendor
 That creates a binary file that can be downloaded into the FPGA
 The binary file can be downloaded to the FPGA through a connecting cable
 That's it! Our FPGA behaves according to our "logic function"
[Figure: Xilinx Spartan 3e & Altera Cyclone Board]
Why FPGAs are Favorable for ES
 Customization
 Complete flexibility to select any combination of peripherals and controllers
 New, unique peripherals can be connected directly to the processor's bus
 Component and cost reduction
 Multiple-component systems can be replaced with a single FPGA
 Board size and inventory management can be reduced, both of which save design time and cost
 Hardware acceleration
 Ability to trade off between hardware and software to maximize efficiency and performance
 For an algorithm that is a software bottleneck, a custom co-processing engine can be designed in the FPGA specifically for that algorithm
 A main-processor plus co-processor architecture accelerates the hardware system as a whole
Embedded Design in an FPGA

 Embedded design in an FPGA consists of the following:


 FPGA hardware design
 C drivers for hardware
 Software design
 Software routines
 Interrupt service routines (optional)
 Real Time Operating System (RTOS) (optional)
Multitasking in modern systems

 Software tasks: implemented in microprocessors


 Hardware tasks: implemented in field-programmable gate arrays (FPGAs)
 Task management requirements in any multitasking system
 Task scheduling
 Mechanism to begin task execution on a system resource
 Task suspension
 Preempting a task and saving its execution state: context save (CS)
 Task switching
 Replacing the preempted task with another task
 Task restoration
 Resuming the preempted task from its saved state: context restore (CR)
Reconfigurable Devices and Real-Time
 Reconfigurable FPGAs have received a great deal of attention for embedded and real-time systems

 Pro: HW logic is (often) more predictable than SW executing on complex microarchitectures
 Pro: HW logic is more efficient (per unit of chip area/power consumption) than a GP CPU on parallel math-crunching applications, although GPUs have somewhat negated this advantage nowadays
 Con: Programming the HW is more complex

Context Switching Analogy
Processor Context Switch (uP)
[Figure: an OS running software tasks 1 to N on one uP, with stack + MMU HW]
• One task executes at a time
– Time multiplex tasks
• One stack for all tasks
– Stack saves context for task restore
• No task relocation
• No reconfiguration of uP

FPGA Context Switch (FPGA)
[Figure: regions 1 to 3 hosting hardware tasks 1 to N, alongside a block labelled "uP or custom"]
• Multiple tasks execute concurrently
• Multiple regions time multiplex tasks
– Use context save and context restore for task switching
• Task relocation
– Challenging!
• Leveraging partial reconfiguration can be advantageous…
Reconfiguration on FPGAs
 Benefits to system designers and functionality
 Run-time hardware adaptation via time multiplexing of resources
 Reduced area/power requirements
 Two types of reconfiguration: full and partial reconfiguration
 Full reconfiguration:
 Entire FPGA configured with a full bitstream with a fixed hardware task set
 Reconfiguration halts all tasks, so task switching time is lengthy
 No context save/restore of hardware tasks, so tasks restart execution

[Figure: full bitstream 1 (HW tasks A1, B1, C1) and full bitstream 2 (HW tasks A2, B2, C2) loaded through the configuration port; callout reading approximately "unless state can be saved via checkpoint / data capture"]

Execution and state of all tasks is lost during full reconfiguration!


Partial Reconfiguration (PR)
 PR divides the FPGA fabric into two regions
 Static region: fixed functionality, never reconfigured after initial configuration at startup
 Reconfigurable region: multiple PR regions (PRRs)
 PRRs execute PR modules (PRMs) (hardware tasks)

[Figure: static region (ICAP, memory controller, embedded processor) beside the reconfigurable region (Modules A, B, C and D)]

• PR compared to full reconfiguration
– Dynamic, on-the-fly PR of individual PRRs
• No interruption of static region or other PRRs!
– Uses partial bitstreams, which are smaller than full bitstreams
• Faster reconfiguration time
• *May* require a bitstream for each PRM-to-PRR mapping
– Increased flexibility
– Increased task throughput/performance
– Enhances hardware multitasking
• Requires context save and context restore for effective task switching
Task context switches in FPGA

How to switch between tasks A & B?
(1) stop task A, save context A
(2) reconfigure task B, restore context B, run B
What is the context?
content of all state-holding elements, i.e. flip-flops, BRAMs, LUT-RAMs
Which approaches exist?
(i) task-specific preemption (scan chains)
(ii) FPGA-specific bitstream readback
Scan Chain Method

Add scan chains to hardware tasks
extract/insert context from/to the scan chain
Drawbacks
modifications at source-code or netlist level
considerable hardware overhead
Advantages
approach independent of FPGA architecture
only captures/restores used resources
Bitstream Readback Method

Read back the bitstream of a partial region using the internal configuration access port (ICAP)
read contents of all flip-flops, BRAMs, etc.
Drawbacks
bitstream format tailored to FPGA family
time-inefficient (captures/restores all resources)
Advantages
no modifications at task level
no hardware overhead
Basic Requirements for RTOS
 Requirements
 Multi-threading and preemptibility
 Thread priority
 Thread synchronization mechanisms
 Priority inheritance
 Predefined latencies
Example of RTOSes for FPGA
 RTOSes available on the market include VxWorks by Wind River Systems, ThreadX by Express Logic, Nucleus Plus by Mentor Graphics, XILKERNEL by Xilinx, and many more
XILKERNEL

 Small, robust and modular kernel
 RTOS
 Interface with POSIX API
 Works with both MicroBlaze & PowerPC
 Highly integrated with EDK
 Lightweight (16 Kb to 32 Kb)
FPGA Based system with Xilkernel
Thread Processing using Xilkernel on FPGA
 Using Xilkernel, thread synchronization can be done with semaphores and mutexes.
 Thread communication can be carried out using shared memory and message queues.
 A real-time resource sharing protocol such as the priority inheritance protocol (PIP) can be implemented.
 However, RMS (rate-monotonic scheduling, a real-time scheduling policy) cannot be applied here directly.
 Using virtualization, RMS can be implemented.

 Sangeet Saha, Amlan Chakrabarti, Ranjan Ghosh, "Exploration of Multi-thread Processing on XILKERNEL for FPGA Based Embedded Systems," 19th International Conference on Control Systems and Computer Science (CSCS), Bucharest, Romania, pp. 58-65, 29-31 May 2013
Real time scheduling techniques for Multicore systems
Multiprocessors

 A multi-core processor is one which combines two or more independent processors into a single package, often a single integrated circuit.

 Most high-end computers today have multiple processors
 In a busy computational environment, multiple processes compete for processor time
 More processors means more scheduling complexity
Real-Time Multiprocessor Scheduling
 Real-time tasks have workload deadlines
 Hard real-time = “Meet all deadlines!”

 Problem: Scheduling periodic, hard real-time tasks on multiprocessor systems. This is difficult.
Scheduling Three Tasks
 Example: 2 processors; 3 tasks, each with 2 units of work required every 3 time units

[Figure: timelines for Task 1, Task 2 and Task 3 over time 0 to 3, with markers for job release and deadline]
Global Schedule
 Example: 2 processors; 3 tasks, each with 2 units of work required every 3 time units

[Figure: schedule on CPU 1 and CPU 2 over time 0 to 3; Task 1 migrates between processors]
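One way to build a global schedule for this example is a wrap-around fill, often attributed to McNaughton: fill CPU 1 until the window ends, then continue the split task at the start of CPU 2. The Python sketch below is not necessarily the schedule drawn on the slide (here Task 2 is the one that migrates), and the function name is illustrative. It assumes all jobs are released at time 0 and share the same deadline at the end of the window.

```python
def wrap_around_schedule(work, window, cpus):
    """Return {cpu: [(task, start, end), ...]} using a wrap-around fill.

    work   : dict mapping task name -> units of work inside the window
    window : length of the scheduling window (common deadline)
    cpus   : number of identical processors
    """
    if any(w > window for w in work.values()):
        raise ValueError("a task needs more than one processor")
    if sum(work.values()) > cpus * window:
        raise ValueError("total work exceeds processor capacity")

    schedule = {cpu: [] for cpu in range(1, cpus + 1)}
    cpu, t = 1, 0
    for task, remaining in work.items():
        while remaining > 0:
            run = min(remaining, window - t)
            schedule[cpu].append((task, t, t + run))
            t += run
            remaining -= run
            if t == window:  # this CPU is full: wrap to the next one
                cpu, t = cpu + 1, 0
    return schedule


# The slide's example: 2 processors, 3 tasks, 2 units of work every 3 time units.
schedule = wrap_around_schedule({"T1": 2, "T2": 2, "T3": 2}, window=3, cpus=2)
for cpu, slots in schedule.items():
    print(f"CPU {cpu}: {slots}")
```

Because no task needs more than the window length, the two pieces of a split task never overlap in time: the tail on one CPU ends at the window boundary and the head on the next CPU starts at time 0.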
The Big Goal
 Design an optimal scheduling algorithm for periodic task sets on multiprocessors

 A task set is feasible if there exists a schedule that meets all deadlines

 A scheduling algorithm is optimal if it can always schedule any feasible task set
Necessary and Sufficient Conditions
 Any set of tasks needing at most
1) 1 processor for each task ($u_i \le 1$ for all $i$), and
2) m processors for all tasks ($\sum_i u_i \le m$)
is feasible

 Status: Solved
 Pfair (1996) was the first optimal algorithm
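The two conditions translate directly into a short check. The Python sketch below uses exact fractions; the function name and the (e, p) tuple layout are choices made for this illustration.

```python
from fractions import Fraction


def is_feasible(tasks, m):
    """Check u_i <= 1 for every task and sum(u_i) <= m.

    tasks : list of (e, p) pairs, execution time e and period p
    m     : number of identical processors
    """
    utils = [Fraction(e, p) for e, p in tasks]
    return all(u <= 1 for u in utils) and sum(utils) <= m


# Three tasks with 2 units of work every 3 time units on 2 processors.
print(is_feasible([(2, 3)] * 3, m=2))  # total utilization is exactly 2
print(is_feasible([(2, 3)] * 4, m=2))  # total utilization is 8/3
```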
New real time task scheduling methodologies for Reconfigurable Platforms
Why Involve Reconfigurable Platforms

 Reconfigurable FPGAs can be used as a performance-efficient backup platform for real-time tasks in a complex safety-critical system
 This backup platform is activated whenever one or more processors in the system fail
 It assumes the responsibility of executing the real-time tasks that were previously running on the failed processors
 So, it is essential to have well-defined scheduling methodologies, feasibility criteria and admission control mechanisms for real-time task sets on these reconfigurable platforms
 The goal is to meet all timing constraints while accounting for the reconfiguration overheads and allowing efficient resource utilization
Reconfigurable FPGA

 How to use it: dynamic design
 Implement I/O and interconnects as fixed logic on the FPGA.
 Use the rest of the FPGA area for reconfigurable HW tasks.

 HW task
 Has a period, deadline and WCET, like a SW task.
 Additionally has an area requirement.
 The requirement depends on the area model.
Area Model
 2D model
 HW tasks have variable width and height.
[Figure: tasks 1 to 4 placed as rectangles of different widths and heights]

 1D model
 HW tasks have variable width, fixed height.
 Easier implementation, but possibly more fragmentation.
[Figure: tasks 1 to 4 placed side by side]
Assumptions
 Processor Identity: All processors are equivalent

 Task Independence: Tasks are independent

 Task Unity: Tasks run on one processor at a time

 Task Migration: Tasks may run on different processors at different times


 Overhead: Context switch overhead (Full and partial reconfiguration overhead)

In practice: built into WCET estimates

Concept of Deadline Partitioning

[Figure: timelines for Task 1, Task 2, Task 3 and Task 4]
Deadline Partitioning (Contd..)

[Figure: timelines for CPU 1 and CPU 2]
Working Principle of DPSFR
Example:
 Let there be 6 tasks with weights 24/60, 36/90, 24/60, 72/90, 72/90, 36/90
 m = 4, tfrg = 6 ms, and the 1st time slice has length 60 ms
 The shares to be executed within this time slice (weight × time-slice length) are shr1 = shr2 = shr3 = shr6 = 24 and shr4 = shr5 = 48, so sum_shr = 192
 Total available context switches = $\lfloor (60 \times 4 - 192) / (6 \times 4) \rfloor = 2$
 Hence we may allow 2 frames in that time slice, each of length (60 - 6 × 2)/2 = 24
 Total number of frames required by all tasks: $\lceil 24/24 \rceil + \lceil 24/24 \rceil + \lceil 24/24 \rceil + \lceil 48/24 \rceil + \lceil 48/24 \rceil + \lceil 24/24 \rceil = 8$
 Total number of available frames: 2 × 4 = 8, so the task set is schedulable
Example (contd.)
 Time 0 to 6: full reconfiguration
 Time 6 to 30: 1st frame, executing T4, T5, T1, T2
 Time 30 to 36: full reconfiguration
 Time 36 to 60: 2nd frame, executing T3, T5, T4, T6

If the share of any task is increased by 1, then sum_shr = 193 and thus Cfrg = ⌊(60 × 4 - 193)/(6 × 4)⌋ = 1, so scheduling is infeasible
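The arithmetic of the DPSFR example can be replayed with a few lines of Python. This is only a sketch of the calculation shown on the slides, not the published scheduler: the function name and the returned dictionary are invented for this illustration, and the schedulability test (frames required ≤ frames available) is taken from the worked example.

```python
import math
from fractions import Fraction


def dpsfr_check(shares, m, t_frg, slice_len):
    """Replay the frame arithmetic of the DPSFR example for one time slice.

    shares    : task shares inside the time slice
    m         : resource count from the example (m = 4)
    t_frg     : full reconfiguration time
    slice_len : length of the time slice
    """
    sum_shr = sum(shares)
    # Available context switches: floor of slack / (t_frg * m).
    frames = (slice_len * m - sum_shr) // (t_frg * m)
    if frames < 1:
        return {"frames": frames, "schedulable": False}
    frame_len = Fraction(slice_len - t_frg * frames, frames)
    required = sum(math.ceil(Fraction(s) / frame_len) for s in shares)
    available = frames * m
    return {
        "frames": frames,
        "frame_len": frame_len,
        "required": required,
        "available": available,
        "schedulable": required <= available,
    }


print(dpsfr_check([24, 24, 24, 48, 48, 24], m=4, t_frg=6, slice_len=60))
print(dpsfr_check([24, 24, 24, 49, 48, 24], m=4, t_frg=6, slice_len=60))
```

With sum_shr = 193 only one frame fits, so 6 frames are required but only 1 × 4 = 4 are available, which matches the infeasible verdict above.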
DPSPR Working Principle
 DPSPR (Deadline Partitioning Scheduler for Partially Reconfigurable Systems)
 Context switching is not a global event
 It is localized to individual partitions
 Partial reconfiguration overhead is lower than full reconfiguration overhead

 Tasks are allocated in any order starting from the 1st tile
 Such that the sum of task shares along with the reconfiguration overhead is less than ts
Example:
 Consider the same task set as that of DPSFR
 shr1 = shr2 = shr3 = shr6 = 24, shr4 = 49, shr5 = 48 and sum_shr = 193, tprg = 1 ms
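To make the tile constraint concrete, the Python sketch below packs the example shares onto tiles with a simple first-fit rule, charging one partial reconfiguration (tprg) per task. This is an illustrative stand-in, not the published DPSPR algorithm. It assumes ts is the 60 ms time-slice length and that there are 4 tiles (both carried over from the DPSFR example), and it never splits a task across tiles.

```python
def allocate_to_tiles(shares, tiles, ts, t_prg):
    """First-fit placement of task shares onto tiles.

    Each placed task costs its share plus one partial reconfiguration.
    The slide says the per-tile sum must be "less than" ts; the boundary
    case is treated as allowed here, which is an assumption.
    """
    loads = [0] * tiles
    placement = {}
    for task, share in shares.items():
        need = share + t_prg
        for tile in range(tiles):
            if loads[tile] + need <= ts:
                loads[tile] += need
                placement[task] = tile + 1  # tiles are numbered from 1
                break
        else:
            raise ValueError(f"{task} does not fit on any tile")
    return placement, loads


shares = {"T1": 24, "T2": 24, "T3": 24, "T4": 49, "T5": 48, "T6": 24}
placement, loads = allocate_to_tiles(shares, tiles=4, ts=60, t_prg=1)
print(placement)
print(loads)
```

Under these assumptions the task set with sum_shr = 193, which full reconfiguration could not accommodate, fits because each tile pays only the small partial reconfiguration overhead.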

Scheduling Scenario using DPSPR


Promising Results
 Simulation-based experimental results reveal that DPSFR can achieve fair resource utilization of up to 80% (workload) with a task rejection rate (TRR) below 10%, and DPSPR can utilize resources up to 90% with 1% TRR

 Sangeet Saha, Arnab Sarkar, Amlan Chakrabarti, "Scheduling Dynamic Hard Real-Time Task Sets on Fully and Partially Reconfigurable Platforms", IEEE Embedded Systems Letters, vol. 7, pp. 23-26, 2015
Full Reconfiguration time vs TRR
What if task has dependencies?

[Figure: task graph and a task placement marked "Infeasible"]
Criticality of Task Placement

[Figure: task graph and a task placement marked "Feasible"]
Heterogeneous Implementations

 FPGAs contain heterogeneous components:
 Memory blocks
 Hardware multipliers
 Embedded processors
 Placement should consider multiple hardware implementations of tasks
 Problem: Resources are limited and available in specific locations on the FPGA
Advance Design Thoughts
 S/W task multitasking through a proper scheduling scheme among hard & soft cores
 H/W task multitasking via partial reconfiguration inside the FPGA logic
 Overall system utilization should be maximized and reliability should be maintained
 Lower use of Si area with lower power consumption
S/W H/W Partitioning
 Given a set of tasks T1, T2, ..., Tn in a multiprocessor system-on-chip (MPSoC)
 Tasks can be executed on either PE1 (as S/W tasks on a GPP) or PE2 (as H/W tasks on an FPGA) according to their criteria
 So, a proper allocation scheme is required!
[Figure label: Allocator]
Problem Formulation
A task can have both a software and a hardware configuration
The optimal allocation problem can be represented by:

where ci = 1 if task i is executed in software,
di = 1 if task i is executed in hardware

Minimize the total software utilization on the GPP

Transform this optimization problem into another allocation optimization problem.
Problem Formulation (contd.)
 The objective function is to maximize hardware utilization

where A is the FPGA area and HSi is the size of task i

The problem resembles the well-known 0-1 knapsack problem.

We solve the aforementioned optimization problem (0-1 KP) via
i. Greedy heuristic
ii. Dynamic programming
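The two solution strategies can be compared on a toy instance. The Python sketch below treats the value of a task as its size, so that maximizing value means maximizing used FPGA area; that choice, the largest-first greedy rule, the function names and the task sizes are all assumptions made for illustration, since the slides do not give the exact objective or the greedy variant.

```python
def greedy_allocate(sizes, area):
    """Largest-first greedy: place each task that still fits."""
    chosen, used = [], 0
    for i in sorted(range(len(sizes)), key=lambda k: sizes[k], reverse=True):
        if used + sizes[i] <= area:
            chosen.append(i)
            used += sizes[i]
    return sorted(chosen), used


def dp_allocate(sizes, area):
    """0-1 knapsack by dynamic programming with value = size."""
    best = [0] * (area + 1)  # best[a] = max used area within capacity a
    take = [[False] * (area + 1) for _ in sizes]
    for i, size in enumerate(sizes):
        # Go downwards so each task is used at most once.
        for a in range(area, size - 1, -1):
            if best[a - size] + size > best[a]:
                best[a] = best[a - size] + size
                take[i][a] = True
    # Walk the decisions backwards to recover the chosen tasks.
    chosen, a = [], area
    for i in range(len(sizes) - 1, -1, -1):
        if take[i][a]:
            chosen.append(i)
            a -= sizes[i]
    return sorted(chosen), best[area]


sizes = [6, 5, 5]  # hypothetical task sizes HS_i
area = 10          # hypothetical FPGA area A
print(greedy_allocate(sizes, area))
print(dp_allocate(sizes, area))
```

On this instance greedy commits to the largest task and strands 4 units of area, while dynamic programming fills the FPGA completely. The DP table costs O(n × A) time and memory, which is the price of optimality.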
Goals
 If task allocation can be done offline (as in some embedded systems)
 Does the use of an optimal algorithm (dynamic programming) make any significant improvement in the allocation scheme?
 Can the system resources be better utilized?
 How and under which conditions do greedy and dynamic programming differ?
 Is there a compromise in resource utilization for the sake of lower complexity?
Outcomes
 Greedy vs dynamic programming:
 In terms of achieved utilization:
 When task sizes are uniform and vary within a short range, greedy and dynamic programming exhibit the same performance
 When task sizes are skewed and vary over a large range, dynamic programming performs better
 When task sizes are small relative to the FPGA, greedy performs similarly to dynamic programming
 As task size increases, greedy's performance decreases with respect to dynamic programming
 It can be proved that "if the size of each task is less than 10% of the FPGA area, then the 2-step/3-step greedy heuristic gives a utilization of 90% of the optimal solution"
 In terms of memory consumption:
 Dynamic programming takes much more memory than greedy, and memory consumption increases with the number of tasks in both cases
 In terms of running time:
 Dynamic programming takes much more time to execute than greedy for the same number of tasks, and the running time increases as the FPGA size or the number of tasks increases
Mixed Critical Task Scheduling on Reconfigurable Hardware
Mix Criticalities

 Criticality is a designation of the level of assurance needed against failure for a system component
 Combination of optimistic and pessimistic worst-case execution time (WCET) estimates in a 2-level mixed-criticality system
LO mode: obtained by extensive experimentation
HI mode: cycle counting under pessimistic assumptions and rigorous conditions
Different sub-systems have different certification requirements
 Safety-critical: certified by certification authorities
 Mission-critical: validated by the design team
Example of a Mix-Critical system

Anti-lock Brake System (ABS)
 Drive/Brake: considered safety-critical or high-criticality tasks
 Car entertainment system: considered less critical tasks
Scheduling of Hardware Tasks with Multiple Variants on FPGA

 Proposed by Marconi (2013)
 Each task has different hardware versions depending on the desired performance (e.g. task T1 with its two hardware versions T1' and T1'').
 Each version has its own speed and required reconfigurable area for running on the FPGA. For example, with more required area, version T1'' runs faster than version T1'
 Choose the best hardware version on the fly among the alternatives, so that the task meets its deadline constraint with as few reconfigurable hardware resources as possible
Example of Mix Critical Scheduling with Multiple H/W variants

Ti   Li   Ai   Ci(Lo)   Area(Lo)   Ci(Hi)   Area(Hi)   Di
T1   Hi   0    8        10         12       5          17
T2   Hi   0    14       6          21       3          18
T3   Hi   0    8        12         14       6          19

 Li denotes the criticality level of the arriving task. All tasks are assumed to be "Safety Critical or High Critical"
Ci(Lo): optimistic execution time, where sufficient area is provided to each task
Ci(Hi): pessimistic execution time, where the least area is provided to each task (worst case)
Di denotes the deadline
Ai denotes the arrival time
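The table invites a small search over variant choices. The Python sketch below enumerates every Lo/Hi combination and keeps those in which all three tasks fit on the FPGA side by side from time 0 and each finishes by its deadline. The total area of 25 units is read off the axis of the scheduling figures, and running all tasks concurrently with no area reuse over time is a simplifying assumption of this sketch, not the scheduling strategy of the slides.

```python
from itertools import product

# (name, {variant: (execution time, area)}, deadline); all tasks arrive at 0.
TASKS = [
    ("T1", {"Lo": (8, 10), "Hi": (12, 5)}, 17),
    ("T2", {"Lo": (14, 6), "Hi": (21, 3)}, 18),
    ("T3", {"Lo": (8, 12), "Hi": (14, 6)}, 19),
]


def feasible_choices(tasks, total_area):
    """Return sorted (area, variant combination) pairs that meet all deadlines.

    Assumption: tasks start together at time 0 and run side by side, so the
    areas of the chosen variants must sum to at most total_area.
    """
    results = []
    for combo in product(("Lo", "Hi"), repeat=len(tasks)):
        picks = [variants[v] for (_, variants, _), v in zip(tasks, combo)]
        area = sum(a for _, a in picks)
        on_time = all(c <= d for (c, _), (_, _, d) in zip(picks, tasks))
        if area <= total_area and on_time:
            results.append((area, combo))
    return sorted(results)


choices = feasible_choices(TASKS, total_area=25)
for area, combo in choices:
    print(area, dict(zip(("T1", "T2", "T3"), combo)))
```

Under these assumptions neither the all-Lo nor the all-Hi choice works: the Lo variants need 28 area units, and the Hi variant of T2 finishes at 21, after its deadline of 18. Only mixed choices survive, which is the point of the mixed scheduling slide.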
Scheduling Strategy (using optimistic execution time)
[Figure: placement of T1, T3 and T2 on area (5 to 25) versus time (5 to 25) axes; a deadline miss is marked]
Scheduling Strategy (using pessimistic execution time)
[Figure: placement of T1, T3 and T2 on area (5 to 25) versus time (5 to 25) axes; a deadline miss is marked]
MIX Critical Scheduling
[Figure: placement of T1, T2 and T3 on area (5 to 20) versus time (5 to 25) axes]
Advance Design Thoughts
 Software components running on processors exhibit high flexibility but often poor performance
 Hardware components placed on FPGA modules offer high performance but low flexibility and higher cost
 A framework is needed to allow seamless mapping and scheduling of real-time tasks on these complex platforms
 Low run-time overhead, power budgets and reliability need to be satisfied
Scheduling a task on S/W or H/W is a crucial decision
 An improper choice would lead to deadline failure and higher power consumption
 So combined spatial-cum-temporal scheduling is a serious concern
If we conclude...

 Real-time scheduling algorithms need to be analyzed on heterogeneous multiprocessor architectures.
 A comprehensive simulation bed should be developed for real-time task scheduling on heterogeneous platforms, so that parameters such as real-time constraints, cache misses and power consumption may be analyzed.
 We need suitable dynamic and static strategies for task-to-core assignment in run-time reconfigurable FPGA systems with processor cores for real-time task scheduling.
 Software tasks must be executed by processing elements (ARM or MicroBlaze for the Zynq platform) with load balancing and low power consumption.
Group Members

Dr. Arnab Sarkar, Assistant Professor, Dept. of CSE, IIT Guwahati (mail: arnabsarkar@iitg.ernet.in)

Mr. Sangeet Saha, TCS RSP Fellow, AKCSIT, University of Calcutta (mail: sasakc_s@caluniv.ac.in)
Acknowledgements

Tata Consultancy Services (TCS) for all kinds of support and encouragement to carry out this research work.
Embedded Systems and VLSI Design Research work at School of IT, University of Calcutta (recent Publications)

 Multi core SSL/TLS security processor architecture and its FPGA prototype design with automated preferential algorithm, Elsevier Micro, 2015
 Error Resilient Secure Multi-gigabit Optical Link Design for High Energy Physics Experiment, VLSID 2016
 FPGA Based Novel High Speed DAQ System Design with Error Correction, ISVLSI 2015
 FPGA Implementation of High Speed Latency Optimized Optical Communication System Based on Orthogonal Concatenated Code, ATS 2015
 Integrated chip and package co-analysis for early data-driven package bump & ball optimization on Value-Tier Smartphone products, DAC 2016
 An Efficient Synthesis Method for Ternary Reversible Logic, ISCAS 2016
