This post is based on my graduate course 'Deep Learning', the textbook 'Understanding Deep Learning', and additional material I studied on my own.
So far we have only looked at supervised learning: we define a model that maps observed data x to an output value y, and a loss function that measures the quality of that mapping on a training dataset {xi, yi}.
In unsupervised learning, models learn from a set of observed data in the absence of labels.
Individual unsupervised models pursue different goals, but they all share this property.
Unsupervised learning goals
- understand the data
- capture the essential nature of the data
- uncover its underlying principles
Taxonomy of unsupervised learning models
latent variable models
: define a mapping between an underlying explanatory (latent) variable z and the data x
For example, the well-known k-means algorithm maps the data x to a cluster assignment z ∈ {1,2,...,K} (a small sketch follows below).
In general, z is hard to interpret; there is no fixed definition of what it must represent. It can be used to summarize the data distribution or to generate new data.
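To make the k-means example concrete, here is a minimal sketch (my own illustration, not from the textbook) of k-means in numpy, where the cluster assignment plays the role of the latent variable z:

```python
import numpy as np

def kmeans(x, K=3, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), K, replace=False)]   # initialize centers from the data
    for _ in range(iters):
        # assignment step: each point gets a latent cluster label z
        z = np.argmin(np.linalg.norm(x[:, None] - centers[None], axis=-1), axis=1)
        # update step: move each center to the mean of its assigned points
        centers = np.array([x[z == k].mean(axis=0) if np.any(z == k) else centers[k]
                            for k in range(K)])
    return z, centers

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(m, 0.3, size=(100, 2)) for m in (0.0, 2.0, 4.0)])
z, centers = kmeans(x)
print(np.bincount(z))   # number of points assigned to each latent cluster
```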
Generative models
: synthesize (generate) new examples with similar statistics to the training data
A subset of these are probabilistic models, which define a distribution over the data from which we draw samples to generate new examples.
In the following chapters
The next four chapters all describe generative models that use latent variables (i.e., they map a latent variable z to the data x).
Generative adversarial networks
GANs learn to generate data examples x* from latent variables z, using a loss that encourages the generated samples to be indistinguishable from real examples.
Normalizing flows, variational autoencoders, diffusion models
These three are probabilistic generative models: given the model parameters Φ, they assign a probability Pr(x|Φ) to each data point x. We want to maximize the probability of the observed data, so the loss is the negative log-likelihood.
(Probabilistic models, including variational autoencoders, normalizing flows, and diffusion models, learn a probability distribution over the training data. As training proceeds, the likelihood of the real examples increases under this distribution, which can then be used to draw new samples and to assess the probability of new data points.)
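As a toy illustration of this loss (my own sketch, not from the textbook), the following fits a one-dimensional Gaussian model Pr(x|Φ) by minimizing the negative log-likelihood with gradient descent in PyTorch; the parameters mu and log_sigma stand in for Φ:

```python
# Fit a toy probabilistic model Pr(x | phi) -- a 1-D Gaussian with
# phi = (mu, log_sigma) -- by minimizing the negative log-likelihood.
import torch

torch.manual_seed(0)
data = torch.randn(1000) * 2.0 + 3.0            # observed data x

mu = torch.zeros(1, requires_grad=True)          # model parameters phi
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(500):
    sigma = log_sigma.exp()
    # Gaussian NLL per point: 0.5*((x - mu)/sigma)^2 + log(sigma) + const
    nll = (0.5 * ((data - mu) / sigma) ** 2 + log_sigma).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

# the estimates should approach the data's true mean and std (3.0 and 2.0 here)
print(mu.item(), log_sigma.exp().item())
```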
What makes a good generative model? (six properties)
Efficient sampling
"computationally inexpensibe and take advantage of the parallelism of modern hardware"
High-quality sampling
"indistinguishable from the real data with which the model was trained"
Coverage
"represent entire training distribution. insufficient to generate samples that all look like a subset of the training examples"
Well-behaved latent space
"every latent variable z corresponds to a plausible data example x. smooth changes in z correspond to smooth changes ins x"
Disentangled latent space
"manipulating each dimension of z should correspond to changing an interpretable property of the data. for example, in a model of language, it might change the topic, tense or verbosity"
Efficient likelihood computation
"if the model is probabilistic, we would like to be able to calculate the probability of new examples efficiently and accurately"
Quantifying performance
four quantitative measures of success for generative models
1) test likelihood
2) inception score
3) Frechet inception distance
4) manifold precision/recall
Test likelihood
: how well the model generalizes from the training data and also the coverage
- not applicable to GANs (they do not assign a probability to data) and expensive to estimate for VAEs and diffusion models
- only really practical for normalizing flows, where the likelihood can be computed exactly and efficiently
Inception score
(calculated using a pretrained classification model, which is why it is called the "inception" score)
The inception score (IS) uses the "inception" model to calculate the average KL-divergence between the individual class probabilities and the average class probabilities of the generated images.
The highest inception score is achieved when the model generates diverse classes of images, each with a confident (definite) class prediction.
a) A pretrained network classifies the generated images. If an image is realistic, the resulting class probability Pr(yi|x*) peaks at the correct class.
b) If the model generates all classes with similar frequency, the marginal (average) class probability will be flat. The inception score measures the average distance (KL-divergence) between the distribution in (a) and the distribution in (b).
(However, this metric is only sensible for generative models of the ImageNet database and is sensitive to the particular classification model. Moreover, it does not reward diversity within an object class; it returns a high value even if the model generates only one realistic example of each class.)
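For illustration, here is a hedged sketch of how the inception score can be computed from classifier outputs; the helper name inception_score and the toy probs array are my own, and the pretrained Inception classifier itself is assumed to have already produced the class probabilities:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (N, C) array of class probabilities Pr(y | x*) for N generated images
    p_y = probs.mean(axis=0, keepdims=True)                    # marginal Pr(y)
    # per-image KL divergence KL( Pr(y|x*) || Pr(y) )
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))                            # IS = exp(E[KL])

# toy check: confident predictions spread evenly over 10 classes -> score near 10
probs = np.eye(10)[np.arange(100) % 10]
print(inception_score(probs))
```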
Frechet inception distance
: computes a 'symmetric distance' between the distributions of generated images and real images by approximating both distributions with multivariate Gaussians and estimating the distance between them using the 'Frechet distance'
However, it does not measure the distance with respect to the original data but rather with respect to the activations in the deepest layer of the inception classification network, ignoring the more fine-grained details of the images (so it does not directly measure image "quality").
Also, this metric does take account of diversity within classes, but it relies heavily on the information retained by the features of the inception network; any information discarded by the network does not contribute to the result.
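Below is a minimal sketch of the Frechet distance between two Gaussians fitted to feature activations; the function name frechet_distance and the toy feature arrays are assumptions for illustration, and the inception feature extractor itself is omitted:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_gen):
    # feats_*: (N, D) arrays of deep-layer activations for real / generated images
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov1 = np.cov(feats_real, rowvar=False)
    cov2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(cov1 @ cov2)                  # matrix square root of cov1 @ cov2
    if np.iscomplexobj(covmean):                  # drop tiny imaginary parts from numerics
        covmean = covmean.real
    diff = mu1 - mu2
    # ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(2000, 8))
b = rng.normal(0.5, 1.0, size=(2000, 8))
print(frechet_distance(a, a))   # close to 0 for identical distributions
print(frechet_distance(a, b))   # grows with the mean shift between the two sets
```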
Manifold precision/recall
The Frechet inception distance is sensitive both to the realism of samples and to their diversity, but it does not distinguish between these two factors.
To disentangle these qualities, consider the overlap between the data manifold (i.e., the subset of the data space where the real examples lie) and the model manifold (i.e., where the generated samples lie).
- precision
: fraction of model samples that fall into the data manifold (measures the proportion of generated samples that are realistic)
- recall
: fraction of data examples that fall within the model manifold (measures the proportion of the real data the model can generate)
To estimate a manifold, place a hypersphere around each example, whose radius is the distance to that example's kth nearest neighbor.
The union of these spheres is an approximation of the manifold, and it is easy to determine whether a new point lies within it (see the sketch below).
This manifold is also typically computed in the feature space of a classifier, with the advantages and disadvantages that entails.
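Here is a hedged sketch of this k-NN hypersphere estimate; the helper names and the choice k=3 are my own, and real/gen stand for feature vectors of real data and generated samples:

```python
import numpy as np

def knn_radii(points, k):
    # radius around each point = distance to its k-th nearest neighbor
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]               # column 0 is the point itself (distance 0)

def in_manifold(queries, points, radii):
    # a query lies in the manifold if it falls inside any point's hypersphere
    d = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=-1)
    return (d <= radii[None, :]).any(axis=1)

def precision_recall(real, gen, k=3):
    precision = in_manifold(gen, real, knn_radii(real, k)).mean()   # realistic fraction of samples
    recall = in_manifold(real, gen, knn_radii(gen, k)).mean()       # covered fraction of real data
    return float(precision), float(recall)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 2))
gen = rng.normal(0.0, 1.0, size=(500, 2))
print(precision_recall(real, gen))    # both should be close to 1 when the distributions match
```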
Summary
- unsupervised models learn about 'the structure of a dataset in the absence of labels'
- a subset of these models is 'generative' and can synthesize new data examples; a further subset is probabilistic, in that these models can both generate new examples and assign a probability to observed data
- such models start with a latent variable 'z' which has a known distribution
- a deep neural network then maps from the latent variable to the observed data space. We considered desirable properties of generative models and introduced metrics that attempt to quantify their performance