MPHS EXAM 13/06/16 - Prof. Luca Benini   Name:   Nmat:
EXERCISE I
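[Figure: circuit schematic. An input register (4 FFs, EN = LOAD) samples the 4-bit input A; a comparator (GREATER) compares A_reg with 4'b1000; an adder (+) and a multiplexer feed an output register (4 FFs, EN = UPDATE) that drives OUT. Both registers use CLK and RST_N.]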
Solution
The circuit is composed of an input stage made of 4 flip-flops, an elaboration block (combinational logic) and an output stage (4 FFs). On the rising edge of the clock, the input data A is sampled if LOAD == 1. The registered data A_reg at the output of the first stage is then processed by the combinational logic. The comparator checks whether A_reg > 8 (4'b1000); if this is true, it sets the multiplexer so that the output stage performs the accumulation operation. If the comparison is false, A_reg is simply loaded into the output FFs. Note that if UPDATE is 1'b0, the output-stage FFs are not updated.
      1'b1:
        begin // Simple sampling
          OUT <= A_reg;
        end
      endcase
    end
  end
endmodule
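To make the behaviour described in the solution concrete, here is a cycle-level C model of the circuit. It is a sketch under stated assumptions, not the exam's reference design: the names circuit_t and circuit_clock_edge are hypothetical; the accumulation is assumed to be OUT <= OUT + A_reg truncated to 4 bits; RST_N is modeled as acting at the clock edge (in hardware it is probably an asynchronous active-low reset); and the comparator/multiplexer follow the prose (A_reg > 8 selects accumulation, otherwise simple sampling).

```c
#include <stdint.h>

/* Hypothetical model of the two 4-bit register stages. */
typedef struct {
    uint8_t a_reg; /* input stage: 4 FFs holding A_reg */
    uint8_t out;   /* output stage: 4 FFs driving OUT  */
} circuit_t;

/* Applies one rising edge of CLK. Next-state values are computed from the
   current register contents first, so the output stage sees the OLD A_reg,
   like non-blocking assignments (<=) in Verilog. */
void circuit_clock_edge(circuit_t *c, int rst_n, int load, int update, uint8_t a)
{
    if (!rst_n) {              /* active-low reset (assumed to act at the edge) */
        c->a_reg = 0;
        c->out = 0;
        return;
    }

    uint8_t next_a_reg = load ? (uint8_t)(a & 0xF) : c->a_reg; /* EN = LOAD   */
    uint8_t next_out = c->out;                                 /* hold by default */

    if (update) {                                              /* EN = UPDATE */
        int greater = c->a_reg > 0x8;                          /* A_reg > 4'b1000 */
        next_out = greater
            ? (uint8_t)((c->out + c->a_reg) & 0xF) /* accumulate, wraps at 4 bits (assumed) */
            : c->a_reg;                             /* simple sampling */
    }

    c->a_reg = next_a_reg;
    c->out = next_out;
}
```

For example, after a reset, loading A = 10 and then asserting UPDATE for two cycles makes OUT go 0 → 10 → 4 (10 + 10 = 20 wraps modulo 16), while a value A_reg <= 8 is copied to OUT unchanged.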
EXERCISE II
A microcontroller has to acquire the electrical activity of the human heart over a period of time using an analog
sensor placed on the skin. The selected microcontroller has the following operating modes:
Supply voltage range: 2V - 3V
Low Power Mode 2: 75 µA @ 2V (only the ultra-low-power 32 kHz clock is active)
The analog signal of the heart has a frequency of around 300 Hz, and the sensor output is in the voltage range
0 V - 0.5 V.
- is supplied by a battery.
- The sensor consumes 1 mA @ 3V while the ADC is acquiring data, and 0 mA while the
microcontroller is in sleep mode.
- The system continuously samples the sensor for 10 s and then sleeps for 990 s.
2. The power consumption of the system (microcontroller + sensor) in the active period (using the normal mode) and
in the sleeping period, according to the duty cycle (DC). Then evaluate the energy consumed in a single period
T = Tactive + Tsleep.
4. Minimize the energy consumption during the acquisition using one of the two available low power modes, and
comment on the decision. Evaluate the lifetime extension and the minimal sampling frequency.
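The quantities asked in question 2 combine as sketched below. This is a hedged set of C helpers with hypothetical names: the microcontroller currents for each mode come from the mode table of the exam, which is only partially reproduced above, so they are left as parameters. Only the sensor figure (1 mA @ 3V = 3 mW while acquiring, 0 mA while sleeping) and the 10 s / 990 s timing come from the text, giving DC = 10 / (10 + 990) = 1%.

```c
/* Sensor power while the ADC is acquiring: 1 mA @ 3 V = 3 mW (from the text).
   The sensor draws 0 mA while the microcontroller sleeps. */
static const double SENSOR_POWER_W = 1e-3 * 3.0;

/* Duty cycle DC = Tactive / (Tactive + Tsleep). */
double duty_cycle(double t_active_s, double t_sleep_s)
{
    return t_active_s / (t_active_s + t_sleep_s);
}

/* Active period: microcontroller in the chosen mode plus the sensor. */
double active_power_w(double v_mcu, double i_mcu_active_a)
{
    return v_mcu * i_mcu_active_a + SENSOR_POWER_W;
}

/* Sleep period: microcontroller only. */
double sleep_power_w(double v_mcu, double i_mcu_sleep_a)
{
    return v_mcu * i_mcu_sleep_a;
}

/* Energy of one period T = Tactive + Tsleep. */
double energy_per_period_j(double p_active_w, double t_active_s,
                           double p_sleep_w, double t_sleep_s)
{
    return p_active_w * t_active_s + p_sleep_w * t_sleep_s;
}

/* Average power over the period; equals DC*Pactive + (1-DC)*Psleep. */
double average_power_w(double energy_j, double period_s)
{
    return energy_j / period_s;
}

/* Battery lifetime [s]; battery_energy_j = capacity[Ah] * V * 3600. */
double lifetime_s(double battery_energy_j, double avg_power_w)
{
    return battery_energy_j / avg_power_w;
}
```

The battery capacity needed by lifetime_s is not reproduced in this copy of the exam, so it is also left as a parameter.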
Solution
2)
3) Lifetime estimation.
4) The best (and only) option to reduce the energy consumption is to use low power mode 1 (LPM1), because the ADC
can also be clocked by the secondary clock during the acquisition. LPM2 cannot be used, because none of the clock
sources for the ADC would be active. Moreover, the supply voltage can be lowered to 2V to reduce the power
consumption even further.
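The two numbers requested in question 4 can be sketched as follows, with hypothetical function names. The minimal sampling frequency follows from the Nyquist criterion: a signal whose highest component is around 300 Hz must be sampled faster than 2 × 300 Hz = 600 Hz. Since the battery energy is unchanged, the lifetime extension is the ratio of the average powers before and after the change; the actual power values must come from the mode table, which is not fully reproduced here.

```c
/* Nyquist: the sampling rate must exceed twice the highest signal frequency. */
double min_sampling_hz(double f_signal_max_hz)
{
    return 2.0 * f_signal_max_hz;
}

/* Same battery energy => lifetime scales inversely with average power. */
double lifetime_extension(double avg_power_before_w, double avg_power_after_w)
{
    return avg_power_before_w / avg_power_after_w;
}
```

min_sampling_hz(300.0) gives the 600 Hz bound; in practice the rate is chosen somewhat above it.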
In this condition
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i, niter = 1000000;
    double x, y, z, pi;
    double count = 0; // number of points in the 1st quadrant of unit circle
    for (i = 0; i < niter; i++) // repeat for a very large number of iterations
    {
        x = (double)rand() / RAND_MAX; // Select random value for x in [0,1]
        y = (double)rand() / RAND_MAX; // Select random value for y in [0,1]
        z = (x*x) + (y*y);
        if (z <= 1) count++; // if x^2+y^2 <= 1 this point belongs to the circle
    }
    pi = count / niter * 4;
    printf("pi ~= %f\n", pi);
    return 0;
}
1a) Describe how the main loop can be parallelized. Explain which data needs to be declared as private and
which as shared and why.
1b) Would you use dynamic scheduling or static scheduling for this loop? Explain why.
1c) Executing this loop in parallel is subject to a race condition. Explain how this can be protected. Could a
reduction clause be used? If so, how?
Solution
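1a) count should be declared as shared, since every parallel thread might increase its value. x, y and z must be
private to each thread: every iteration writes them, and their values must not be overwritten by other threads
between assignment and use. The loop index i is made private automatically by the parallel for construct.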
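#include <stdlib.h>
int main(void)
{
    int i, niter = 1000000;
    double x, y, z, pi, count = 0; // count: points in the 1st quadrant of unit circle
    #pragma omp parallel for private(x, y, z) shared(count) // i is private automatically
    for (i = 0; i < niter; i++) {
        x = (double)rand() / RAND_MAX;
        y = (double)rand() / RAND_MAX;
        z = (x*x) + (y*y);
        if (z <= 1) count++; // unprotected update of shared count: race condition (see 1c)
    }
    pi = count / niter * 4;
}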
1b) Dynamic scheduling is useful when loop iterations contain different amounts of work. In that case, assigning the
same number of iterations to each thread (which is what static scheduling does) might lead to load imbalance. On
the other hand, dynamic scheduling implies a higher runtime overhead than static scheduling.
The loop considered in this example has very little variance (the iterations for which the condition (z<=1) evaluates
to true execute one additional increment operation), so dynamic scheduling is unlikely to bring significant
improvements over static scheduling. On the other hand, the loop contains enough work to amortize the
runtime overhead, so using dynamic scheduling is not harmful either.
1c) When the loop executes in parallel, several threads might try to update count at the same time. To avoid the
race condition, i) reading count, ii) computing its new value and iii) writing it back to memory must be performed
atomically (i.e., this sequence of operations must not be interleaved with the same operations of other threads).
The simplest way to do so in OpenMP is to protect the update with a critical section:
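A minimal sketch of the critical-section version, written as a function (the name estimate_pi_critical is hypothetical). One assumption departs from the exercise: rand() is replaced by a counter-based generator (the k-th output of SplitMix64 started from seed 0, mapped to [0,1)), because the C standard does not require rand() to avoid data races, and a generator that depends only on the iteration index gives the same points whichever thread runs them. Compile with -fopenmp; without it the pragmas are ignored and the loop runs serially.

```c
#include <stdint.h>

/* k-th output of SplitMix64 from seed 0, mapped to [0, 1). */
static double uniform01(uint64_t k)
{
    uint64_t z = (k + 1) * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    z ^= z >> 31;
    return (double)(z >> 11) * (1.0 / 9007199254740992.0); /* top 53 bits / 2^53 */
}

double estimate_pi_critical(long niter)
{
    long i;                      /* loop index: private automatically */
    double x, y, z, count = 0;   /* count is shared, as in 1a */
    #pragma omp parallel for private(x, y, z) shared(count)
    for (i = 0; i < niter; i++) {
        x = uniform01(2 * (uint64_t)i);
        y = uniform01(2 * (uint64_t)i + 1);
        z = (x * x) + (y * y);
        if (z <= 1) {
            #pragma omp critical
            count++;             /* read-modify-write by one thread at a time */
        }
    }
    return count / niter * 4;
}
```

Since about π/4 ≈ 78.5% of the points fall inside the circle, most iterations enter the critical section, so this version may well run slower than the serial loop; for a single scalar increment, `#pragma omp atomic` is a lighter-weight OpenMP alternative.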
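The race condition can also be avoided with the reduction clause, which gives each thread a private copy of count (initialized to 0) and adds all the copies into the shared count at the end of the loop, as follows: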
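A minimal sketch of the reduction version (the function name estimate_pi_reduction is hypothetical). As an assumption, rand() is again replaced by a counter-based SplitMix64 generator, since rand() is not required to be free of data races; static scheduling is used, in line with the discussion in 1b. Compile with -fopenmp; without it the pragma is ignored and the loop runs serially.

```c
#include <stdint.h>

/* k-th output of SplitMix64 from seed 0, mapped to [0, 1). */
static double uniform01(uint64_t k)
{
    uint64_t z = (k + 1) * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    z ^= z >> 31;
    return (double)(z >> 11) * (1.0 / 9007199254740992.0); /* top 53 bits / 2^53 */
}

double estimate_pi_reduction(long niter)
{
    long i;
    double x, y, z, count = 0;
    /* Each thread increments a private copy of count initialized to 0;
       the copies are summed into count when the loop ends. */
    #pragma omp parallel for private(x, y, z) reduction(+:count) schedule(static)
    for (i = 0; i < niter; i++) {
        x = uniform01(2 * (uint64_t)i);
        y = uniform01(2 * (uint64_t)i + 1);
        z = (x * x) + (y * y);
        if (z <= 1) count++;   /* no synchronization needed inside the loop */
    }
    return count / niter * 4;
}
```

Because every partial count is a small integer held exactly in a double, the result does not depend on the number of threads or on the order in which the private copies are combined.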
-------------------------------------------------------------------------------------------------------------