IVA Question Bank


Image and Video Analytics CCS349

Question Bank

Unit 1:

1. What is the difference between image analysis (or computer vision) on one side
and computer graphics on the other side?
2. Define: (a) Spatial resolution, (b) Spectral resolution, (c) Radiometric resolution,
(d) Time resolution
3. Define: (a) Additive noise, (b) Multiplicative noise, (c) Gaussian noise, (d)
Impulsive noise, (e) Salt-and-pepper noise
4. Discuss the various factors that influence the brightness of a pixel in an image.
5. Define (a) The Euclidean metric (b) The city block metric (c) The chessboard
metric
6. For each uppercase printed letter of the alphabet, determine the number of lakes
and bays it has. Derive a look-up table that lists the candidate letters, given the
number of lakes and bays. Comment on the quality of this feature as an identifier
of letters.
7. Mention some factors which make computer vision difficult.
8. What is low-level image processing?
9. What is high-level image understanding?
10. Compare low-level image processing and high-level image understanding.
11. What are the 4 possible levels of image representation? Explain each of them.
12. Give examples of low-level operations
13. What information about the image does high level data represent?
14. What are the tasks involved in 3D vision from a user’s point of view?
15. What is meant by ‘continuous image’, ‘discrete image’ and ‘digital image’?
16. Why is ‘brightness’ considered a good physical quantity to represent the
continuous image function?
17. What is an intensity image?
18. Explain the process of image digitization.
19. Explain the process of sampling.
20. Explain the process of quantization.
21. If 4 bits are used to represent each pixel brightness level, what is the total number
of brightness levels in the image?
22. What are the 3 conditions to be satisfied for distance metric?
23. Explain (a) Euclidean distance DE, (b) city block or Manhattan distance D4, (c)
Chessboard distance D8.
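The three metrics in question 23 can be explored with a short Python sketch (the helper names are my own, not from the textbook):

```python
import math

def d_euclidean(p, q):
    # D_E: straight-line distance between pixels p and q
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d_city_block(p, q):
    # D_4: number of horizontal/vertical unit steps (Manhattan)
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_chessboard(p, q):
    # D_8: unit steps when diagonal moves are also allowed
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
print(d_euclidean(p, q))   # 5.0
print(d_city_block(p, q))  # 7
print(d_chessboard(p, q))  # 4
```

All three satisfy the metric conditions asked for in question 22 (identity, symmetry, triangle inequality).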
24. What is pixel adjacency? Explain 4-neighbors and 8-neighbors.
25. Draw the neighborhood representation for 4-neighborhood and 8-neighborhood.
26. Explain the distance transform.
27. What is an ‘edge’ in the context of an image?
28. What is a ‘crack edge’?
29. What is a ‘border’ in an image? What is an inner border and an outer border?
30. Distinguish between ‘edge’ and ‘border’.
31. What is a convex region in an image?
32. Explain ‘convex hull’, ‘deficit of convexity’, ‘lakes’ and ‘bays’
33. What are topological properties?
34. Explain brightness histogram of an image
35. What is entropy?
36. What are the different levels of image data representation?
37. Explain (a) iconic images (b) segmented images (c) geometric representation (d)
relational models, in the context of image data representation
38. Draw a representation of a 3x3 binary image
39. What is a multi-spectral image? How is it represented?
40. What are hierarchical image data structures?
41. What is a co-occurrence matrix?
42. For a given ‘image matrix’ with specified offset, construct its co-occurrence
matrix.
43. What is an integral image? How is it useful?
44. For a given ‘image matrix’, construct its integral image.
45. Demonstrate the usefulness of an integral image in summing up a region of
pixels.
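For questions 43-45, a minimal NumPy sketch of the idea (the 3x3 example matrix is my own):

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Integral image: ii[r, c] holds the sum of img[0:r+1, 0:c+1]
ii = img.cumsum(axis=0).cumsum(axis=1)

def region_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] from at most 4 table look-ups."""
    s = ii[r1, c1]
    if r0 > 0:
        s -= ii[r0 - 1, c1]
    if c0 > 0:
        s -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

print(region_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Once `ii` is built, any rectangular sum costs O(1) regardless of region size, which is the property exploited by Haar-feature detectors.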
46. What is a ‘chain’ or ‘chain code’?
47. Construct a chain code for a given image using 4-neighborhood/8-neighborhood.
48. What is run-length coding?
49. Construct a run-length code for a given image.
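A plain-Python sketch of run-length coding for one image row (questions 48-49); the (value, run-length) pair format shown is one common choice, not the only one:

```python
def run_length_encode(row):
    """Encode a sequence of pixel values as (value, run_length) pairs."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return [tuple(r) for r in runs]

print(run_length_encode([0, 0, 1, 1, 1, 0]))  # [(0, 2), (1, 3), (0, 1)]
```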
50. What are topological data structures? Explain region adjacency graph and
relational data structures.
51. What are hierarchical data structures?
52. Explain (a) M-pyramid, (b) T-pyramid, (c) Quad-trees
Unit 2:

1. What is the main aim of image pre-processing?


2. What are the 4 categories of image pre-processing?
3. Give examples of situations in which brightness transformations, geometric
transformations, smoothing, edge detection, and/or image restorations are
typically applied.
4. What is the main difference between brightness correction and gray-scale
transformation?
5. Explain position dependent brightness correction?
6. What is gray-scale transformation? Explain the different types of gray-scale
transformations?
7. Explain (a) Contrast stretching, (b) Gray-level slicing (c) logarithmic gray scale
transformation.
8. Explain the rationale of histogram equalization.
9. Explain why the histogram of a discrete image is not flat after histogram
equalization.
10. Consider the image given in Figure 5.3a. After histogram equalization (Figure
5.3b), much more detail is visible. Does histogram equalization increase the
amount of information contained in image data? Explain.
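A compact NumPy sketch of histogram equalization (questions 8-10); the tiny test image is my own:

```python
import numpy as np

def equalize(img, levels=256):
    """Map each gray level through the normalized cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum() / img.size           # normalized CDF in [0, 1]
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)
    return lut[img]

img = np.array([[52, 52, 60],
                [60, 60, 200]], dtype=np.uint8)
out = equalize(img)
print(np.unique(out))  # [ 85 212 255] — spread out, but still only 3 levels
```

Because all pixels sharing a gray level are remapped together through a discrete look-up table, the output histogram is stretched but not flat (question 9), and no information is added: entropy can only stay the same or decrease (question 10).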
11. What are the two main steps of geometric transforms?
12. What is the minimum number of corresponding pixel pairs that must be
determined if the following transforms are used to perform a geometric
correction? (a) Bilinear transform (b) Affine transform
13. Give a geometric transformation equation for (a) Rotation (b) Change of scale (c)
Skewing by an angle
14. What information does the Jacobian determinant of a geometric transformation
convey?
15. If the Jacobian determinant of a geometric transformation equals 1, what does it
mean?
16. If the Jacobian determinant of a geometric transformation equals 0, what does it
mean?
17. Consider brightness interpolation—explain why it is better to perform brightness
interpolation using brightness values of neighboring points in the input image
than interpolating in the output image.
18. Explain the principles of nearest-neighbor interpolation, linear interpolation, and
bicubic interpolation.
19. What is local pre-processing? Explain the 2 categories of local preprocessing.
20. Explain why smoothing and edge detection have conflicting aims.
21. Explain why Gaussian filtering is often the preferred averaging method.
22. Explain why smoothing typically blurs image edges.
23. Name several smoothing methods that try to avoid image blurring. Explain their
main principles.
24. Describe the ‘averaging’ method of image smoothing.
25. Write the averaging masks for a 3x3/4x4/5x5 etc. neighborhoods.
26. Explain ‘averaging with limited data validity’.
27. Explain averaging according to ‘inverse gradient’
28. Explain ‘averaging’ using ‘rotating mask’.
29. Explain why median filtering performs well in images corrupted by impulse noise.
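A direct (unoptimized) 3x3 median filter sketch illustrating question 29; the padding mode and test image are my own choices:

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter; borders handled by reflection padding."""
    p = np.pad(img, 1, mode='reflect')
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.median(p[r:r + 3, c:c + 3])
    return out

img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255                     # one 'salt' pixel of impulse noise
print(median_filter3(img)[2, 2])    # 100 — the outlier is discarded entirely
```

An averaging mask would have pulled the result toward 255; the median ignores the outlier completely, which is why it copes so well with salt-and-pepper noise.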
30. Give convolution masks for the following edge detectors: (a) Roberts (b) Laplace
(c) Prewitt (d) Sobel (e) Kirsch Which ones can serve as compass operators?
List several applications in which determining edge direction is important.
31. Explain why subtraction of a second derivative of the image function from the
original image results in the visual effect of image sharpening.
32. What are LoG? How do you compute LoG? How is it used?
33. Propose a robust way of detecting significant image edges using zero-crossings.
34. Explain why LoG is a better edge detector than Laplace edge detector.
35. Explain edge detection based on zero-crossings of the second derivative (i.e.
Marr-Hildreth edge detection).
36. Explain the notion of scale in image processing.
37. Explain ‘Canny edge detection’ in detail.
38. Explain the importance of hysteresis thresholding and non-maximal suppression
in the Canny edge detection process. How do these two concepts influence the
resulting edge image?
39. What are the 3 criteria used in ‘Canny edge detection’?
40. Explain the principles of noise suppression, histogram modification, and contrast
enhancement performed in adaptive neighborhoods.
41. What are parametric edge models?
42. What is meant by ‘facets’ in parametric edge models?
43. Explain ‘edge detection’ in multi-spectral images.
44. What is the advantage of local preprocessing in frequency domain.
45. Describe low-pass, high-pass and band-pass filtering and the effect of each of
these on an image.
46. Explain the procedure for homomorphic filtering. What kind of noise does it
remove from an image?
47. Write the 3x3 convolution masks (kernels) for line detection. Explain the process
of line detection using these kernels.
48. What are the common causes of image degradation?
49. What is image restoration? Describe the 2 broad categories of image restoration
techniques.
50. Explain the principles of image restoration based on (a) Inverse convolution (b)
Inverse filtration (c) Wiener filtration. List the main differences among the above
methods.
51. Give image distortion functions for (a) Relative camera motion, (b) Out-of-focus
lens, (c) Atmospheric turbulence.
Unit 3:

1. Explain the concept of anchor boxes.


2. What is non-max suppression?
3. How are bounding boxes important for object detection?
4. How are R-CNN, Fast R-CNN, and Faster R-CNN different and what are the
improvements?
5. What is object detection?
6. Compare and contrast object detection vs object localization vs object
classification.
7. Mention some applications (use cases) for object detection.
8. Mention some deep learning architectures used for object detection.
9. Explain the sliding window approach for object detection.
10. Explain the bounding box approach for object detection.
11. What is meant by binary classification and multilabel (multiclass) classification?
12. What is intersection over union (IoU)? Explain how it is used to distinguish
between useful bounding boxes and not useful bounding boxes.
13. What is non-max suppression? Explain the steps involved in non-max
suppression.
14. What is a good choice for the value of IoU threshold when implementing non-
max suppression?
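Questions 12-14 can be made concrete with a short sketch (the box format and threshold are my own choices; 0.5 is a common default for the IoU threshold):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedily keep the best box, drop boxes overlapping it above thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] — box 1 overlaps box 0 (IoU ≈ 0.68), so it is dropped
```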
15. What are anchor boxes? What information do anchor boxes capture?
16. Explain the use of anchor boxes in object detection.
17. Explain R-CNN in detail.
18. What is meant by region proposals in R-CNN?
19. What is the main reason for R-CNN being slow in terms of its computation speed?
20. What are the main drawbacks (challenges) with R-CNN?
21. Explain Fast R-CNN in detail.
22. What improvements are implemented in Fast R-CNN as compared to R-CNN?
23. What are the advantages of Fast R-CNN over R-CNN?
24. Explain Faster R-CNN in detail.
25. What is a region proposal network (RPN) in Faster R-CNN?
26. What are the algorithmic improvements in Faster R-CNN as compared to Fast R-
CNN?
27. Compare the computational complexities (speed) of R-CNN, Fast R-CNN and
Faster R-CNN.
28. Explain YOLO algorithm for object detection in detail.
29. What are the salient features of YOLO?
30. What is loss function in YOLO? What are the various losses that contribute to the
loss function?
31. Explain the architecture of YOLO algorithm in detail.
32. Mention some challenges (drawbacks) associated with YOLO. How are these
addressed in YOLO v2 and YOLO v3.
33. What are the key algorithmic features that make YOLO algorithm significantly
faster than R-CNN, Fast R-CNN and Faster R-CNN
Unit 4:

1. What is face recognition? How is it different from object detection?


2. What is the difference between face detection and face recognition?
3. Mention some challenges associated with face recognition.
4. Write some applications of face recognition.
5. Compare and contrast face authentication vs face recognition.
6. What are the 4 basic steps involved in face recognition?
7. List some of the deep learning based methods available for face recognition.
8. What is the concept of facial alignment?
9. Explain the DeepFace solution by Facebook for face recognition.
10. What is 3D frontalization? What is its purpose in Deepface face recognition
algorithm?
11. Explain Delaunay triangulation. What is its purpose in Deepface solution.
12. Deepface architecture uses locally-connected layers in addition to fully-
connected layers. What is the purpose/advantage of locally connected layers?
13. What features in Deepface implementation make the Deepface network sparse?
14. What verification metrics are used for face recognition? Explain the verification
metrics.
15. What are local binary patterns? How are they computed? What is the use of local
binary patterns in Deepface face recognition?
16. Explain the FaceNet architecture for face recognition in detail.
17. What is the key principle of FaceNet algorithm for face recognition?
18. What is meant by ‘embeddings’ in FaceNet?
19. Explain the concept of ‘triplet loss’ in FaceNet. What is the intuition behind triplet
loss.
20. FaceNet algorithm represents images in a Euclidean space. What is the
advantage of such a representation?
21. What principle is adopted for triplet selection, i.e. to select anchor, positive and
negative images in FaceNet?
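The triplet loss of questions 19-21 reduces to a few lines; the embeddings below are toy values, not real FaceNet outputs:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin): pull the positive
    closer to the anchor than the negative by at least the margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # same identity, already close
n = np.array([1.0, 0.0])   # different identity, already far
print(triplet_loss(a, p, n))  # 0.0 — an 'easy' triplet contributes nothing
```

This is also why triplet selection matters (question 21): easy triplets give zero loss and hence zero gradient, so training favors harder negatives.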
22. Explain the ZFNet CNN architecture used in FaceNet implementation.
23. What are deconvolution layers in ZFNet? What purpose do they serve?
24. What is inception model? What is the core principle of inception model?
25. Describe the implementation of inception module – naïve version.
26. Describe the implementation of the inception module with dimensionality reduction.
27. What metrics are used to evaluate the performance of face recognition models.
28. Explain the terms (i) True Accept (ii) False Accept (iii) Validation rate and (iv)
False Accept rate.
29. What is gesture recognition?
30. What are the key components (steps) involved in gesture recognition.
31. Mention some applications of gesture recognition.
Unit 5:

1. What is meant by video analytics?


2. What are some use cases of video analytics?
3. What information can be extracted from video analytics?
4. Explain vanishing gradient problem.
5. Explain exploding gradient problem.
6. Intuitively, a deeper neural network should perform better and generate a better
model. In practice, deeper networks often perform worse. Explain why.
7. Explain the terms ‘global minima’ and ‘local minima’ in loss function.
8. What effect does vanishing gradient have on the initial layers (layers towards the
left) of a deep neural network?
9. Why do initial layers undergo very less training (or training stops) in a deep
neural network?
10. What are the indicators (signs) that a deep convolutional neural network is
suffering from vanishing gradient problem?
11. What are the possible solutions to avoid vanishing gradient problem?
12. What are the indicators (signs) that a deep convolutional neural network is
suffering from exploding gradient problem?
13. What are the possible solutions to avoid exploding gradient problem?
14. Explain the ResNet architecture in detail.
15. What is the concept of ‘skip connections’ in ResNet?
16. Explain how ‘skip connections’ address vanishing gradient problem.
17. What are the two types of ‘residual blocks’ that are used in ResNet?
18. What is an ‘identity’ residual block and ‘convolutional’ residual block. When is
each one used?
19. What are the 2 methods to combine inputs and outputs of a residual block.
20. When combining the input and output of a residual block using ‘addition’ what
potential problem can arise? How is the problem resolved?
21. When combining the input and output of a residual block using ‘concatenation’
what potential problem can arise?
22. Why is ‘addition’ preferred over ‘concatenation’ in a residual block, when using
large number of residual blocks?
23. What is an inception network?
24. What is the main principle of inception network?
25. Describe the implementation of inception module – naïve version.
26. Describe the implementation of the inception module with dimensionality reduction.
27. What is the function of 1x1 convolution block in inception module with
dimensionality reduction.
28. Explain the working of 1x1 convolution?
29. Explain how 1x1 convolution can be used to control the depth (number of
channels) in the output of a convolution layer.
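A NumPy sketch of questions 28-29; the dimensions are chosen to echo a typical inception-module input and are illustrative only:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (H, W, C_in), w is (C_in, C_out).
    Every output pixel is a linear mix of that pixel's input channels."""
    return np.tensordot(x, w, axes=([2], [0]))

x = np.random.rand(28, 28, 192)   # deep feature map
w = np.random.rand(192, 16)       # sixteen 1x1x192 filters
y = conv1x1(x, w)
print(y.shape)  # (28, 28, 16) — spatial size preserved, depth reduced 192 -> 16
```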
30. Explain the GoogleNet architecture in detail.
31. What is the role (purpose) of auxiliary classification modules in GoogleNet
architecture.
32. What improvements are made in Inception v2 and v3 as compared to the first
version.
33. Explain the term ‘top-k accuracy’ or ‘top-k error’.
34. For a multi-class classifier with ‘k’ different classes of objects, what is the value of
top-k accuracy?
35. For the following convolutional layer implementation, compute the total number of
operations.

36. Compute the total number of operations for the following implementation which
uses 1x1 convolution.
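The figures for questions 35-36 are not reproduced here, but the counting method can be sketched with hypothetical dimensions (multiplications only, biases ignored):

```python
def conv_ops(h_out, w_out, c_out, k, c_in):
    """Multiplications in a conv layer: each of the h_out*w_out*c_out
    outputs needs a k*k*c_in dot product."""
    return h_out * w_out * c_out * k * k * c_in

# Hypothetical example: map 28x28x192 to 28x28x32 with 5x5 filters
direct = conv_ops(28, 28, 32, 5, 192)

# Same mapping with a 1x1 bottleneck down to 16 channels first
bottleneck = conv_ops(28, 28, 16, 1, 192) + conv_ops(28, 28, 32, 5, 16)

print(direct)      # 120422400
print(bottleneck)  # 12443648 — roughly 10x fewer operations
```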

NOTE: This question bank is prepared to help you inquire and explore the core
concepts of this course and develop a clear understanding of the topics. You can
search for answers in the textbook, lecture notes and supporting material
provided or on the internet. Happy exploration!

In addition to the above listed questions, make sure to prepare topics from your
studio sessions, involving Python implementations.

Do not use ChatGPT to find your answers. It will neither serve the purpose
of your learning/understanding nor benefit you in the final exam.
