
Module 1

Training the network


The neural network learns the characteristics that link grayscale images with their
colored versions. Suppose you had to color black-and-white images, but with the
restriction that you can only see nine pixels at a time. You could scan each
image from top left to bottom right and try to predict which color each pixel
should be. First, you look for simple patterns: a diagonal line, all black pixels,
and so on. You look for the same exact pattern in each nine-pixel square and remove
the pixels that don't match. You generate 64 new images from your 64 mini filters.
If you scanned the images again, you would only see the same small patterns you have
already detected. To gain a higher-level understanding of the image, you halve the
image size. You still only have a three-by-three filter to scan each image, but by
combining your new nine pixels with your lower-level filters you can detect more
complex patterns. One nine-pixel combination might form a half circle, a small dot,
or a line. Again, you repeatedly extract the same patterns from the image; this time
you generate 128 new filtered images. In this training process, a pre-processed
grayscale image is given as input and the network learns to predict the
corresponding color output.
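
The filter-scanning and downsampling steps described above correspond to stacked
convolutional layers. A minimal sketch in Keras is shown below, assuming a
256 × 256 grayscale input and the 64- and 128-filter counts mentioned in the text;
the actual architecture used in the project may differ.

from tensorflow.keras import layers, models

def build_encoder(input_shape=(256, 256, 1)):
    # Each Conv2D layer scans the image with 3 x 3 filters; the strided layer
    # halves the spatial size so the 128 filters can detect higher-level patterns.
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(64, (3, 3), strides=2, activation='relu', padding='same'),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    ])

encoder = build_encoder()
encoder.summary()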

Module 2
Feature extraction using Convolutional Neural Network
A method of extracting frames must therefore be able to detect duplicate
images and filter them out. Secondly, there are some parts of the cartoon that we
might not want to include in the data set, namely the opening and closing
credits. The reason for removing the frames depicting the credits is their
uniformity and their general prevalence in the set. If we were to leave the
credit frames in the data set, it could increase the tendency of the
network to, for example, use shades of blue (since the credits are mostly a
uniformly colored blue screen) even in images that are not credit frames. To
ensure that only reasonably differing images were extracted, we used simple
squared-mean thresholding acting on consecutive extracted frames.

In this module we concentrate on feature extraction using a Convolutional
Neural Network (CNN): the pre-processed grayscale images are taken as input
and passed to the convolutional layer, then to the max pooling layer. Once this
is done, the result goes to the fully connected layer, and finally the feature
maps, which constitute the output, are obtained.
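
A rough sketch of this pipeline, assuming Keras and illustrative layer sizes
(the text specifies only the layer types, not their dimensions), might look like
the following.

from tensorflow.keras import layers, models

# Pre-processed grayscale input -> convolutional layer -> max pooling layer
# -> fully connected layer producing the feature representation.
feature_extractor = models.Sequential([
    layers.Input(shape=(256, 256, 1)),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),   # illustrative size for the fully connected layer
])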
This effectively means that any noticeable difference between two consecutive
frames results in the newer frame being accepted as a unique frame.
Comparing against the last accepted frame rather than the immediately preceding
frame allows small changes to accumulate between accepted images. This approach
can be disadvantageous due to random noise present in the images, but working
under the assumption that the amount of noise is roughly constant and similar
across frames, we can disregard it by accounting for it in the threshold value.
This simple method accomplishes the desired effect of filtering out duplicate
frames and preserving only unique images. There are several instances where
images are repeated due to the nature of the content rather than consecutive-frame
similarity (for example, scenes that repeat the same sequence several times),
but these are not common enough to warrant specific handling, and such duplicates
are allowed into the final data set. Following the duplicate filtration, every
frame was resized to 256 × 256 pixels and saved in the lossless PNG file format,
as using a compressed format such as JPEG would result in a loss of precision
that is particularly visible around object edges in this dataset and negatively
impacts the performance of the models.
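
One possible implementation of this extraction step, sketched below with OpenCV,
compares each frame against the last accepted frame using the mean squared pixel
difference; the threshold value and the extract_unique_frames helper are
hypothetical and would need tuning against the actual footage.

import cv2
import numpy as np

def extract_unique_frames(video_path, out_dir, threshold=20.0):
    # Read the video frame by frame and keep only frames that differ enough
    # (by squared-mean thresholding) from the last accepted frame.
    cap = cv2.VideoCapture(video_path)
    last_accepted = None
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if last_accepted is None or np.mean(
            (frame.astype(np.float32) - last_accepted.astype(np.float32)) ** 2
        ) > threshold:
            last_accepted = frame
            resized = cv2.resize(frame, (256, 256))                   # resize to 256 x 256
            cv2.imwrite(f"{out_dir}/frame_{index:06d}.png", resized)  # lossless PNG
            index += 1
    cap.release()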

Module 3
Converting the LAB Image into RGB format
The neural model operates in a trial-and-error manner. It first makes a random
prediction for every pixel. Based on the error for each pixel, it works backward
through the model to improve its feature extraction. It starts by adjusting for
the conditions that generate the largest errors: in this case, whether or not to
color an object and how to locate different objects. It then colors all of the
objects brown, the color that is most similar to all other colors and therefore
produces the smallest error. Because most of the training data is quite similar,
the model struggles to distinguish between different objects; it will adjust
different tones of brown but fail to generate more nuanced colors. In this module
the conversion of the LAB image into RGB format takes place: the lightness (L)
channel of the pixels is taken as input and sent to the pre-trained network,
where the network predicts the two color channels; by combining the predictions
with the given input, the final result is produced. In this way the RGB colorized
image is obtained from the LAB image.
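
A minimal sketch of this recombination and conversion step is given below, using
scikit-image for the color-space conversion; the model object stands in for the
pre-trained network, and its input/output scaling is an assumption rather than
the project's exact configuration.

import numpy as np
from skimage import color, io

def colorize(gray_path, model):
    # Load the grayscale image and lift it to three channels so it can be
    # converted to LAB; only the L (lightness) channel is kept as network input.
    gray = io.imread(gray_path, as_gray=True)
    lab = color.rgb2lab(color.gray2rgb(gray))
    L = lab[:, :, 0:1]                                   # L channel, range [0, 100]
    # Hypothetical pre-trained network: assumed to map L (scaled to [0, 1]) to
    # the a and b channels (scaled to [-1, 1]).
    ab_pred = model.predict(L[np.newaxis, ...] / 100.0)[0] * 128.0
    # Combine the given L input with the predicted a/b channels and convert back.
    lab_out = np.concatenate([L, ab_pred], axis=-1)
    return color.lab2rgb(lab_out)                        # RGB colorized image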
