
Some Project Proposals

Project 1: Zero-shot Learning with Deep Neural Networks for Object Recognition

The core problem of supervised learning lies in the ability to generalize the prediction of a model
learned on some samples seen in the training set to other unseen samples in the test set. A key
hypothesis is that the samples of the training set allow a fair estimation of the distribution of the test
set, since both are drawn from the same distribution of independent and identically distributed random variables. Beyond
the practical issues linked to the exhaustiveness of the training samples, such a paradigm is not
adequate for all needs, nor does it reflect the way humans seem to learn and generalize. Although, to our knowledge, nobody has ever seen a real dragon, unicorn or any other beast of classical fantasy, one could easily recognize one if encountered. Indeed, from the textual description of these creatures alone, and by inference from knowledge of real wildlife, the entertainment industry has produced many drawings and other visual representations of them. Zero-shot learning (ZSL) addresses the
problem of recognizing categories of the test set that are not present in the training set. The categories
used at training time are called seen and those used at testing time are called unseen; contrary to classical supervised learning, no sample of the unseen categories is available during training. To compensate for this lack of information, each category is nevertheless described semantically, either with a list of attributes, a set of words, or sentences in natural language. The general idea of ZSL is thus to learn intermediate features from the training data that can be used at test time to map a sample to the unseen classes. These intermediate features can reflect colours or textures (fur, feathers, snow, sand...) or even parts of objects (paws, claws, eyes, ears, trunk, leaf...). Since such features are
likely to be present in both seen and unseen categories, and one can expect to infer a discriminative
description of more complex concepts from them (e.g., some types of animals, trees, flowers...), the
problem becomes tractable.
Some approaches to tackle ZSL are:
1. Ridge Regression Approaches: One approach to ZSL is to view this task as a regression problem, where we aim to predict continuous attributes from a visual instance. Linear regression is a straightforward baseline; ridge regression adds an L2 penalty to stabilize it (see the sketch after this list).
2. Triplet-loss Approaches: Triplet loss methods make more direct use of the compatibility function (a learned score between an image and a class description). The main idea behind these methods is that the compatibility of matching pairs should be much higher than the compatibility of non-matching pairs.
3. Generative Approaches: Generative methods applied to ZSL aim to produce visual samples belonging to unseen classes based on their semantic description; these samples can then be used to train standard classifiers. Partly for this reason, most generative methods directly produce high-level visual features, as opposed to raw pixels; another reason is that generating raw images is usually not as effective.
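
To make the first two approaches concrete, here is a minimal NumPy sketch; the function names and dimensions are illustrative placeholders, not taken from any specific paper. Ridge regression maps visual features to attribute vectors in closed form, prediction assigns the nearest unseen-class attribute vector, and a hinge-style triplet objective requires matching pairs to score above non-matching ones by a margin.

```python
import numpy as np

def fit_ridge(X, S, lam=1.0):
    # Closed-form ridge regression: W = (X^T X + lam*I)^(-1) X^T S,
    # mapping visual features X (n, d) to class attribute vectors S (n, a).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def predict_unseen(W, x, unseen_attributes):
    # Project the test sample into attribute space, then pick the
    # nearest unseen-class attribute vector (Euclidean distance).
    s_hat = x @ W
    dists = np.linalg.norm(unseen_attributes - s_hat, axis=1)
    return int(np.argmin(dists))

def triplet_loss(compat_pos, compat_neg, margin=0.1):
    # The compatibility of the matching (image, class) pair should
    # exceed that of a non-matching pair by at least `margin`.
    return max(0.0, margin - compat_pos + compat_neg)
```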

Project 2: Variational Single Image Dehazing for Enhanced Visualization

Due to the existence of turbid media in the atmosphere, images of outdoor scenes usually suffer from degradation such as reduced intensity contrast. This kind of degradation is particularly serious in bad weather conditions, i.e., abnormal atmospheric optical phenomena such as haze, fog or rain. For instance, as a common weather phenomenon, haze may produce an albedo effect, leading to scattering of the reflected light; as a result, objects in long-shot regions are less visible. Effective haze removal (or dehazing) methods, which are urgently needed in practical applications, have recently attracted increasing attention in imaging science. Compared with hazy images, haze-free images have some obvious merits:
a) they are visually clearer and more vivid;
b) they are more suitable for practical applications such as feature extraction, change detection and pattern recognition.
However, since the thickness of haze usually depends on the depth of the scene, which is unknown, the dehazing task is extremely challenging.
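
As background for why depth matters, dehazing methods typically build on the standard atmospheric scattering model, I(x) = J(x)t(x) + A(1 - t(x)), where the transmission t(x) = exp(-beta * d(x)) decays with scene depth d(x). Below is a minimal sketch of inverting this model, assuming the transmission map and atmospheric light have already been estimated; the project's variational formulation would estimate these quantities differently.

```python
import numpy as np

def dehaze(I, t, A, t_min=0.1):
    # I: hazy image (H, W, 3) in [0, 1]; t: transmission map (H, W);
    # A: atmospheric light (3,). Inverts I = J*t + A*(1 - t).
    t = np.clip(t, t_min, 1.0)[..., None]  # clamp t to avoid amplifying noise
    J = (I - A) / t + A                    # recovered scene radiance
    return np.clip(J, 0.0, 1.0)
```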

Project 3: Real-World Super-Resolution of Face Images from Surveillance Cameras

Face Super-Resolution (SR) is a special case of SR which aims to restore High-Resolution (HR) face
images from their Low-Resolution (LR) counterparts. This is useful in many different applications
such as video surveillance and face enhancement. Current State-of-the-Art (SoTA) face SR methods
based on Convolutional Neural Networks (CNNs) are able to reconstruct images with photo-realistic
appearance from artificially generated LR images. However, these methods often assume that the LR
images were downsampled with bicubic interpolation, and therefore fail to produce good results when
applied to real-world LR images. This is mostly because bicubic downsampling changes the natural image characteristics and reduces the number of artifacts. Hence, when algorithms trained with supervised learning on such artificial LR/HR image pairs are applied to real images, the reconstructed images usually contain strong artifacts due to the domain gap.
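
A small sketch of the two degradation pipelines discussed above, using Pillow; the blur and noise parameters are illustrative assumptions, and real surveillance footage involves further artifacts (compression, motion blur) not modeled here.

```python
import numpy as np
from PIL import Image, ImageFilter

def bicubic_lr(hr, scale=4):
    # The idealized degradation most SR training pairs assume:
    # plain bicubic downsampling of the HR image.
    w, h = hr.size
    return hr.resize((w // scale, h // scale), Image.BICUBIC)

def realistic_lr(hr, scale=4):
    # Toy approximation of a real camera pipeline: optical blur,
    # downsampling, then sensor-like noise.
    blurred = hr.filter(ImageFilter.GaussianBlur(radius=1.5))
    w, h = hr.size
    lr = blurred.resize((w // scale, h // scale), Image.BICUBIC)
    arr = np.asarray(lr).astype(np.float32)
    arr += np.random.normal(0.0, 5.0, arr.shape)  # additive Gaussian noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```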

Project 4: Unsupervised Novel View Synthesis from a Single Image

Novel View Synthesis (NVS) aims at generating novel viewpoints of an object or a scene given only
one or a few images of it. Given its general formulation, it is at the same time one of the most
challenging and well studied problems in computer vision, with several applications ranging from
robotics, image editing and animation to 3D immersive experiences. Traditional solutions are based
on multi-view reconstruction using a geometric formulation. Given several views of an object, these
methods build an explicit 3D representation (e.g., mesh, point cloud, voxels) and render novel views
of it. A more challenging problem is novel view synthesis given just a single view of the object as
input. In this case multi-view geometry cannot be leveraged, making this an intrinsically ill-posed
problem. Deep learning, however, has made it approachable by relying on inductive biases, similarly
to what humans do. For instance, when provided with the image of a car, we have the ability to
picture how it would look from a different viewpoint. We can do this because we have, unconsciously, learnt an implicit model of the 3D structure of a car; therefore, we can easily fit the
current observation to our mental model and use it to hallucinate a novel view of the car from a
different angle.
The underlying research question that we address in this project is: how can we learn to hallucinate
novel views of a certain object from natural images only?

Project 5: Multimodal Machine Learning Approaches to Detect Fake News in Social Media

With the development of social media networks, a huge amount of fake news appears every day, created for different political and commercial purposes, and it spreads online very quickly. This fake news has a more disturbing and negative effect on our lives than ever before. Detecting whether news is real or fake is therefore both very important and a big challenge. Current approaches to fake news identification share a key weakness: their inability to learn a shared representation of textual and visual, i.e., multimodal, information. Automatic classification of fake news is a tough task, since building a reliable model that judges whether news is fake or real from both its text and its images is still an open problem, and only a few existing models can be applied to it. Through a complete analysis and examination of fake data, numerous appropriate explicit features can be extracted from both text and images and used for fake news detection.
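
One common way to obtain such a shared representation is late fusion. Below is a minimal PyTorch sketch; the input dimensions (768 as in a BERT-style text encoder, 2048 as in a ResNet-50-style image encoder) and layer sizes are assumptions for illustration, not a proposed architecture.

```python
import torch
import torch.nn as nn

class MultimodalFakeNewsClassifier(nn.Module):
    # Toy late-fusion model: text and image embeddings are projected,
    # concatenated into a shared representation, and classified.
    def __init__(self, text_dim=768, image_dim=2048, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.image_proj = nn.Linear(image_dim, hidden)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # logits: real / fake
        )

    def forward(self, text_feat, image_feat):
        fused = torch.cat([self.text_proj(text_feat),
                           self.image_proj(image_feat)], dim=-1)
        return self.classifier(fused)
```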

Project 6: Face Recognition using 3D CNNs


The area of face recognition is one of the most widely researched in the domains of computer vision and biometrics. This is because the non-intrusive nature of the face biometric makes it comparatively more suitable for surveillance applications at public places such as airports. Early, primitive face recognition methods could not give very satisfactory performance. However, with the advent of machine and deep learning methods and their application to face recognition, several major breakthroughs were obtained. The use of 2D Convolutional Neural Networks (2D CNNs) in face recognition surpassed human accuracy, reaching 99%. Still, robust face recognition under real-world conditions such as variation in resolution, illumination and pose remains a major challenge for researchers. So, 3D CNN architectures, which capture both spatial and temporal information from video, can be considered for face recognition in real-world environments.
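
A minimal PyTorch sketch of this idea; the layer sizes and number of identities are illustrative assumptions. The point is that 3D convolutions slide over (time, height, width), so the network sees motion as well as appearance.

```python
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    # Input: a face-video clip of shape (batch, channels, frames, H, W).
    def __init__(self, num_identities=100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),   # spatio-temporal filters
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                      # global pooling over T, H, W
        )
        self.fc = nn.Linear(64, num_identities)

    def forward(self, clip):
        x = self.features(clip).flatten(1)
        return self.fc(x)
```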

Project 7: Fine-tuning Handwriting Recognition Systems with Temporal Dropout

Dropout is a regularization technique that is often used to prevent neuron co-adaptation which reduces
overfitting and improves network generalization. The technique consists of randomly disabling
individual neurons by masking out their activations (outputs). A generalization of dropout,
DropConnect, works by randomly disabling individual neuron weights. Dropout was primarily
intended to be applied to intermediate layers of the network.
Most handwriting recognition systems (working at the line level) use a sliding window approach for
feature extraction. The window is swept along the horizontal axis, usually in the direction of writing
(e.g., left to right for Latin handwriting), to extract specific characteristics of the script. At each
position of the window, features are extracted as a vector, and then concatenated to a sequence. This
spatio-temporal mapping from image pixels to feature sequence is achieved through an encoder
network (usually a CNN). The feature sequence is then modeled by a decoder network (usually an
RNN).
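
Given this sequential view, one natural reading of "temporal dropout" is to drop entire time steps of the feature sequence rather than individual neurons. A minimal PyTorch sketch of that interpretation follows; the project's exact scheme may differ.

```python
import torch

def temporal_dropout(seq, p=0.1, training=True):
    # Randomly zero out whole time steps of a feature sequence of shape
    # (batch, time, features), analogous to dropout but applied along the
    # temporal axis, with inverted scaling to preserve expected values.
    if not training or p == 0.0:
        return seq
    keep = (torch.rand(seq.shape[0], seq.shape[1], 1,
                       device=seq.device) > p).float()
    return seq * keep / (1.0 - p)
```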

Project 8: Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

Human action recognition has a wide range of application scenarios, such as human-computer
interaction and video retrieval. In recent years, skeleton-based action recognition has been attracting increasing interest. The skeleton is a type of well-structured data, with each joint of the human body identified by a joint type, a frame index, and a 3D position. There are several advantages to using the skeleton for action recognition. First, the skeleton is a high-level representation of the human body that abstracts the human pose and motion. Biologically, humans are able to recognize the action category by observing only the motion of the joints, even without appearance information. Second, advances in cost-effective depth cameras and pose estimation technology make skeletons much easier to obtain. Third, compared with RGB video, the skeleton representation is robust to variations in viewpoint and appearance. Fourth, it is also computationally efficient because of its low-dimensional representation. Moreover, skeleton-based action recognition is complementary to RGB-based action recognition.
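
The structure described above maps directly to a small tensor. A NumPy sketch follows, where the 25-joint layout and 30 fps clip length are assumptions borrowed from common depth-camera skeletons.

```python
import numpy as np

# A skeleton sequence is a well-structured, low-dimensional tensor:
# (frames, joints, 3) holds the 3D position of each joint per frame,
# with the joint type fixed by its index (e.g., 0 = head, 1 = neck, ...).
num_frames, num_joints = 60, 25          # e.g., a 2-second clip at 30 fps
skeleton = np.zeros((num_frames, num_joints, 3), dtype=np.float32)

# Motion cues, such as per-joint velocity, come from simple
# frame-to-frame differences of this representation.
velocity = np.diff(skeleton, axis=0)     # shape: (frames - 1, joints, 3)
```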

Project 9: Detect-and-Track: Efficient Pose Estimation in Videos


In recent years, visual understanding, such as object and scene recognition, has witnessed a significant bloom thanks to deep visual representations. Modeling and understanding human behaviour in images has been at the epicenter of a variety of visual tasks due to its importance for numerous downstream
practical applications. In particular, person detection and pose estimation from a single image have
emerged as prominent and challenging visual recognition problems. While single-image
understanding has advanced steadily through the introduction of tasks of increasing complexity, video
understanding has made slower progress compared to the image domain. Here, the preeminent task
involves labeling whole videos with a single activity type. While still relevant and challenging, this
task shifts the focus away from one of the more interesting aspects of video understanding, namely
modeling the changes in appearance and semantics of scenes, objects and humans over time.

Project 10: Fast Online Object Tracking and Segmentation


Tracking is a fundamental task in any video application requiring some degree of reasoning about
objects of interest, as it allows establishing object correspondences between frames [38]. It finds use in
a wide range of scenarios such as automatic surveillance, vehicle navigation, video labelling, human-
computer interaction and activity recognition. Given the location of an arbitrary target of interest in
the first frame of a video, the aim of visual object tracking is to estimate its position in all the
subsequent frames with the best possible accuracy. For many applications, it is important that tracking
can be performed online, while the video is streaming. In other words, the tracker should not make
use of future frames to reason about the current position of the object. This is the scenario portrayed by visual object tracking benchmarks, which represent the target object with a simple axis-aligned or
rotated bounding box. Such a simple annotation helps to keep the cost of data labelling low; what is
more, it allows a user to perform a quick and simple initialisation of the target. Similar to object
tracking, the task of semi-supervised video object segmentation (VOS) requires estimating the
position of an arbitrary target specified in the first frame of a video. However, in this case the object
representation consists of a binary segmentation mask which expresses whether or not a pixel belongs
to the target. Such a detailed representation is more desirable for applications that require pixel-level
information, like video editing and rotoscoping. Understandably, producing pixel-level estimates
requires more computational resources than a simple bounding box. As a consequence, VOS methods
have been traditionally slow, often requiring several seconds per frame. Very recently, there has been
a surge of interest in faster approaches. However, even the fastest still cannot operate in real-time.
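
The online constraint has a simple shape in code. In this sketch, `Tracker` is a hypothetical interface with `init` and `update` methods, not the API of any specific library.

```python
def track_online(frames, first_box, tracker):
    # Initialize from the first-frame annotation, then process each
    # frame exactly once, never looking ahead at future frames.
    tracker.init(frames[0], first_box)       # first-frame ground truth
    results = [first_box]
    for frame in frames[1:]:                 # streaming: one pass only
        box = tracker.update(frame)          # estimate for the current frame
        results.append(box)
    return results
```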

Project 11: Sentiment Analysis Techniques for Twitter Data


Sentiment analysis is also known as “opinion mining” or “emotion Artificial Intelligence” and refers to the use of natural language processing (NLP), text mining, computational linguistics, and biometrics to systematically identify, extract, quantify, and study emotional states and subjective information. Sentiment analysis is generally concerned with the voice of the customer in materials such as surveys and reviews on the Web and on social networks. As a rule, sentiment analysis attempts to determine the attitude of a speaker, writer, or other subject towards some topic, or the emotional reaction to a document, interaction, or event. The attitude may be a judgment or evaluation, an emotional state (in other words, the emotional condition of the author or speaker), or an intended emotional effect (in other words, the impact the author intends to have on the reader). Vast numbers of user reviews and recommendations on all topics are available on the Web these days; they may cover products, customer experiences, critiques of films, and so on. Reviews are growing rapidly, because people like to share their views on the Web. Large numbers of reviews are available for a single product, which makes it problematic for users, as they must read each one in order to make a choice. Consequently, mining this information, identifying user opinions, and organizing them is a vital undertaking. Sentiment mining takes advantage of NLP and information extraction (IE) approaches to analyse a large number of documents and gather the sentiments expressed by different authors. This process incorporates various strategies, including computational linguistics and information retrieval (IR). The basic idea of sentiment analysis is to detect the polarity of text documents or short sentences and classify them on this basis. Sentiment polarity is categorized as “positive”, “negative”, or “neutral”. It is important to highlight that sentiment mining can be performed at three levels, as follows (a minimal classification sketch follows the list):

• Document-level sentiment classification: At this level, a document is classified in its entirety as “positive”, “negative”, or “neutral”.

• Sentence-level sentiment classification: At this level, each sentence is classified as “positive”, “negative”, or “neutral”.

• Aspect- and feature-level sentiment classification: At this level, sentences/documents are categorized as “positive”, “negative”, or “neutral” with respect to certain aspects of the sentences/documents; this is commonly known as “aspect-level sentiment classification”.
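
As a minimal illustration of document-level polarity classification, here is a scikit-learn baseline; the two toy tweets and binary labels are made-up examples, and a real system would also handle the neutral class and train on a labeled Twitter corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: 1 = positive, 0 = negative.
tweets = ["I love this phone, the camera is brilliant",
          "Worst service ever, totally disappointed"]
labels = [1, 0]

# TF-IDF features over unigrams and bigrams, then a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)
print(model.predict(["the film was surprisingly good"]))
```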
