EE20BTECH11035 AutoRegression


Image Generation Using Autoregression Models

NAROPANTH SRIKAR RAO

IIT HYDERABAD

Introduction

Auto-Regression:

Autoregression, or AR, is a way to predict what will happen in the future based on what happened in the past. It works by looking at how something has changed over time and using that information to make predictions. This is useful in many areas, such as predicting stock prices, modelling time series, and even forecasting the weather. In autoregression, the variable's current value is expressed as a linear combination of its previous values, with coefficients determined through statistical analysis. Autoregression captures the connections between past and future events, helping us make informed guesses about what might happen next. Here, we apply autoregression models to image generation.


The word autoregression refers to regression of a variable on itself: we predict a target value from its own previously generated values (and, optionally, from input features). Here is a time-series equation that shows how a variable at time t depends on its previous values:

y_t = c + φ_1 y_{t−1} + φ_2 y_{t−2} + ... + φ_p y_{t−p} + ε_t
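As a minimal sketch of this equation, the snippet below simulates an AR(2) series with known coefficients and recovers c, φ_1, and φ_2 by ordinary least squares. The coefficient values and noise level are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) series: y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + eps_t
c_true, phi = 0.5, [0.6, -0.3]
y = np.zeros(500)
for t in range(2, len(y)):
    y[t] = c_true + phi[0] * y[t - 1] + phi[1] * y[t - 2] + 0.1 * rng.standard_normal()

# Build the lagged design matrix [1, y_{t-1}, y_{t-2}] and solve for (c, phi1, phi2).
X = np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]])
coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
print(coef)  # close to the true values [0.5, 0.6, -0.3]
```

This is exactly "statistical analysis determining the coefficients": the model is linear in its parameters, so a least-squares fit on lagged copies of the series suffices.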

Image Generation Application :

Unlike GANs, autoregressive (AR) models provide sample diversity: they model the entire data distribution, which guarantees a diverse set of generated samples. However, AR models tend to be limited to low-resolution images, since memory and computation requirements grow with the size of the image.

Let us consider an image as a sequence of pixel values. The probability of each pixel value is conditioned on the previously generated pixels, and the total probability is the product of all the pixels' conditional probabilities, i.e.


p(x) = ∏_{i=1}^{n²} p(x_i | x_1, ..., x_{i−1})
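This factorisation is exact, not an approximation: the chain rule of probability guarantees the product of conditionals recovers the joint. The toy check below verifies this for four binary "pixels" with a random joint distribution; the table and the chosen image are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random joint distribution over four binary pixels, normalised to sum to 1.
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()

def conditional(i, x):
    """p(x_i | x_1..x_{i-1}) obtained by marginalising the joint table."""
    marg = joint.sum(axis=tuple(range(i + 1, 4)))  # p(x_1..x_i)
    num = marg[tuple(x[: i + 1])]
    if i == 0:
        return num
    prev = joint.sum(axis=tuple(range(i, 4)))      # p(x_1..x_{i-1})
    return num / prev[tuple(x[:i])]

x = (1, 0, 1, 1)  # one particular "image"
p_chain = np.prod([conditional(i, x) for i in range(4)])
print(np.isclose(p_chain, joint[x]))  # the product of conditionals equals the joint
```

An autoregressive image model parameterises each conditional with a network instead of a table, but the factorisation it relies on is the same.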

Now consider colour images, where each pixel contains R, G, and B values. The process is similar to the previous one: we simply order the R, G, and B values within each pixel, so the probability of R is computed first, followed by G and then B. Note that in both cases the pixel matrix (n x n, or n x n x 3 for colour) is flattened into a 1-D array and given as input to the network.

p(x_i) = p(x_{i,R} | x_{<i}) · p(x_{i,G} | x_{<i}, x_{i,R}) · p(x_{i,B} | x_{<i}, x_{i,R}, x_{i,G})
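The flattening described above can be sketched with NumPy. A `reshape(-1)` on an array stored in the default C (row-major) order walks the last axis fastest, which yields exactly the ordering the text describes: pixels in raster-scan order, and R, G, B within each pixel. The size n = 4 and the arange fill are illustrative.

```python
import numpy as np

n = 4
# (row, col, channel) with channel ordered R, G, B; values are just placeholders.
image = np.arange(n * n * 3).reshape(n, n, 3)

# C-order reshape walks the channel axis fastest, so the flat sequence is
# R(0,0), G(0,0), B(0,0), R(0,1), G(0,1), B(0,1), ... row by row.
sequence = image.reshape(-1)

print(sequence[:6])  # the R, G, B values of the first two pixels
```

The network then consumes `sequence` one value at a time, predicting each entry from all the entries before it.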

PixelRNN, developed by Google DeepMind, was the first autoregressive model applied to image generation. PixelCNN was developed later as a "workaround" that avoids much of PixelRNN's computational cost.

While performing convolution operations, PixelCNN can learn the distributions of all the pixels in an image in parallel. But this action violates the working principle of autoregression, where each pixel is predicted sequentially: when processing a central pixel, the convolutional filter takes in all the pixels around it, including both past pixels and upcoming ones, to compute the output feature map. To avoid this issue, we apply a technique called masking to block the information flow from pixels that are yet to be predicted.

Masking is done by zeroing out the contributions of all unpredicted pixels and considering only the previous ones. We can implement this by taking a matrix of the same size as the filter, filled with 1s at the positions of previous pixels and 0s at the positions of unpredicted pixels, and multiplying it element-wise with the weight tensor before performing the convolution. There are two types of masking: Mask A is applied only at the first convolutional layer, where it also zeroes out the pixel of interest, whereas Mask B keeps the entry for the current pixel set to 1.
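A minimal sketch of the two masks for a k x k filter, ignoring the R/G/B sub-pixel ordering for simplicity (a full PixelCNN also masks across channels). The helper name `make_mask` is ours, not from the text.

```python
import numpy as np

def make_mask(k, mask_type):
    """Return a k x k 0/1 mask: 1 for already-generated pixels, 0 for future ones."""
    mask = np.zeros((k, k))
    centre = k // 2
    mask[:centre, :] = 1           # all rows above the centre row
    mask[centre, :centre] = 1      # pixels to the left of the centre in its row
    if mask_type == "B":
        mask[centre, centre] = 1   # Mask B also passes the current pixel
    return mask

mask_a = make_mask(3, "A")  # first layer: centre entry stays 0
mask_b = make_mask(3, "B")  # later layers: centre entry is 1

# Applying the mask: multiply element-wise with the filter weights before
# convolving, so no information flows from not-yet-predicted pixels.
weights = np.ones((3, 3))
masked_weights = weights * mask_a
print(mask_a)
```

Mask A is needed only once, at the input, so the model never sees the raw value of the pixel it is predicting; from the second layer on, the centre of the feature map already encodes only past pixels, so Mask B can safely include it.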
