Deep Convolutional Generative Adversarial Networks (DCGAN)

This operation is sometimes called "deconvolution" after Deconvolutional Networks, but is actually the transpose (gradient) of conv2d rather than an actual deconvolution.

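Defining the weights and biases by hand every time is tedious, so I wrote the function below.
As noted in "conv_2d_transpose output_shape does not match · Issue #112 · tflearn/tflearn · GitHub", thinking in terms of the inverse operation (the forward convolution) saves you from agonizing over padding and stride.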

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

def conv_transposed(input, kernel_shape, bias_shape, output_shape):
    # kernel_shape is [height, width, output_channels, input_channels]
    weights = tf.get_variable("weights", kernel_shape,
                              tf.float32,
                              initializer=tf.random_normal_initializer(stddev=0.02))
    biases = tf.get_variable("biases", bias_shape,
                             tf.float32,
                             initializer=tf.constant_initializer(0.0))
    # Stride 2 with "SAME" padding doubles the spatial size
    convt = tf.nn.conv2d_transpose(input, weights,
                                   strides=[1, 2, 2, 1],
                                   output_shape=output_shape, padding="SAME")

    return convt + biases
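
The "think of the inverse operation" rule can be checked with a little arithmetic. The following is a minimal sketch in plain Python, assuming padding="SAME", where a forward convolution produces ceil(input / stride) along each spatial dimension. The helper names are hypothetical and are not part of TensorFlow.

```python
import math

def conv_same_output_size(input_size, stride):
    """Spatial size produced by a forward convolution with padding="SAME"."""
    return math.ceil(input_size / stride)

def is_valid_transpose_output(input_size, output_size, stride):
    """An output size for the transposed convolution is consistent when the
    forward convolution maps output_size back to input_size."""
    return conv_same_output_size(output_size, stride) == input_size

# The Generator upsamples 7 -> 14 -> 28 with stride 2.
print(is_valid_transpose_output(7, 14, stride=2))   # True
print(is_valid_transpose_output(14, 28, stride=2))  # True
print(is_valid_transpose_output(7, 15, stride=2))   # False: ceil(15 / 2) = 8
```

Under this rule a size of 13 also maps back to 7, which suggests why output_shape has to be passed explicitly rather than inferred.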

Flatten

There does not seem to be a built-in flatten operation, so I defined my own (TF-Slim does have one).

def flatten(input):
    shape = input.get_shape()[1:].as_list()
    dim = np.prod(shape)
    return tf.reshape(input, [-1,dim]), dim

Leaky ReLU

This activation function is not provided as a built-in, but it corresponds to the following operation.
${ f(x) = \begin{cases} x & (x > 0) \\ 0.01x & (x \leq 0) \end{cases} }$

def leaky_relu(x):
    # max(0.01x, x) equals x for x > 0 and 0.01x otherwise
    return tf.maximum(0.01 * x, x)
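
To see why a single max reproduces the piecewise definition, here is a small scalar sketch in plain Python. The function name and the alpha parameter are introduced only for illustration; the slope 0.01 comes from the formula above.

```python
def leaky_relu_scalar(x, alpha=0.01):
    """Scalar Leaky ReLU: x for x > 0, alpha * x otherwise (0 < alpha < 1)."""
    return max(alpha * x, x)

# Negative inputs are scaled by alpha; positive inputs pass through unchanged.
for x in (-2.0, 0.0, 3.0):
    print(x, leaky_relu_scalar(x))
```

For positive x, x is larger than alpha * x, and for negative x the order flips, so the max always picks the correct branch.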

Discriminator

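The Discriminator consists of:
conv + leaky ReLU layer x2
fc + leaky ReLU layer
dropout layer
fc layer (logits)
sigmoid (probs)

I defined a Discriminator class and implemented the model-building function as a class method.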

@classmethod
def build_D_model(cls, X, p, batch_size, reuse):
    # conv_leaky_relu and fully_connected are helper functions not shown here
    with tf.variable_scope("conv1", reuse=reuse):
        conv1 = conv_leaky_relu(X, [5, 5, 1, 64], [1, 2, 2, 1], [64])
    with tf.variable_scope("conv2", reuse=reuse):
        conv2 = conv_leaky_relu(conv1, [5, 5, 64, 128], [1, 2, 2, 1], [128])

    flt, dim = flatten(conv2)

    with tf.variable_scope("fc1", reuse=reuse):
        fc1 = fully_connected(flt, dim, 256)
        fc1 = tf.maximum(0.01 * fc1, fc1)  # leaky ReLU

    dropout = tf.nn.dropout(fc1, p)  # p is the keep probability

    with tf.variable_scope("fc2", reuse=reuse):
        fc2 = fully_connected(dropout, 256, 1)  # logits
    return tf.nn.sigmoid(fc2)  # probs

Update

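The Discriminator is updated by ascending the following stochastic gradient.
In the implementation, the sign is therefore flipped to turn this into a minimization problem.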
${ \displaystyle \nabla_{\theta^d} \frac{1}{m} \sum_{i=1}^m { \lbrack \log {D(x^{(i)})} + \log {(1-D(G(z^{(i)})))} \rbrack } }$

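# With a batch size of 128, the first 64 images come from the dataset
# and the last 64 come from the Generator
# probs is the Discriminator's output, shape [batch_size, 1]
# A scalar 1.0 keeps the subtraction elementwise on the [64, 1] slice
tf.reduce_mean(-tf.reduce_sum(
               tf.log(self.probs[0:64]) +
               tf.log(1.0 - self.probs[64:batch_size]),
               axis=1))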
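
To get a feel for the values this loss takes, here is a minimal sketch in plain Python that evaluates the negated objective on hand-picked probabilities. The function name and the sample numbers are made up for illustration and are not results from training.

```python
import math

def discriminator_loss(real_probs, fake_probs):
    """Negated GAN objective for D: -(1/m) * sum(log D(x) + log(1 - D(G(z))))."""
    assert len(real_probs) == len(fake_probs)
    m = len(real_probs)
    total = sum(math.log(r) + math.log(1.0 - f)
                for r, f in zip(real_probs, fake_probs))
    return -total / m

# A Discriminator that separates real from generated images well
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
# A Discriminator that outputs 0.5 for everything
unsure = discriminator_loss([0.5, 0.5], [0.5, 0.5])
print(confident, unsure)
```

The loss shrinks as D assigns high probability to real images and low probability to generated ones; a Discriminator that always answers 0.5 sits at 2 log 2.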

Generator

The Generator consists of:

fc + batch normalization (bn) + ReLU layer x2
transposed conv (convt) + bn + ReLU layer
convt + tanh layer (scales the output to the range -1 to 1)

I defined a Generator class and a function that builds the model.

input = tf.random_uniform([batch_size, 100], minval=-1.0, maxval=1.0)
with tf.variable_scope("fc1", reuse=reuse):
    fc1 = fully_connected(input, 100, 1024)
    batch_mean1, batch_variance1 = tf.nn.moments(fc1, axes=[0])
    bn1 = tf.nn.batch_normalization(fc1, batch_mean1, batch_variance1,
                                    None, None, 1e-5)
    relu1 = tf.nn.relu(bn1)
with tf.variable_scope("fc2", reuse=reuse):
    fc2 = fully_connected(relu1, 1024, 128*7*7)
    batch_mean2, batch_variance2 = tf.nn.moments(fc2, axes=[0])
    bn2 = tf.nn.batch_normalization(fc2, batch_mean2, batch_variance2,
                                    None, None, 1e-5)
    relu2 = tf.nn.relu(bn2)

    img = tf.reshape(relu2, [-1, 7, 7, 128])

with tf.variable_scope("convt1", reuse=reuse):
    convt1 = conv_transposed(img, kernel_shape=[5, 5, 64, 128],
                             bias_shape=[64],
                             output_shape=[batch_size, 14, 14, 64])
    batch_mean3, batch_variance3 = tf.nn.moments(convt1, axes=[0, 1, 2])
    bn3 = tf.nn.batch_normalization(convt1, batch_mean3, batch_variance3,
                                    None, None, 1e-5)
    relu3 = tf.nn.relu(bn3)

with tf.variable_scope("convt2", reuse=reuse):
    convt2 = conv_transposed(relu3, kernel_shape=[5, 5, 1, 64],
                             bias_shape=[1],
                             output_shape=[batch_size, 28, 28, 1])
return tf.nn.tanh(convt2)

Sharing the Discriminator

To compute the Generator's loss, the Generator holds a Discriminator model internally.
The Sharing Variables mechanism is used to share its parameters.
Sharing Variables | TensorFlow

with tf.variable_scope("D", reuse=True):
    probs = Discriminator.build_D_model(self.out_img, self.p, batch_size, True)

On reflection, this amounts to the Generator containing a Discriminator, which is not an appropriate structure for a GAN and was needlessly roundabout. I deeply regret it...

Update

The Generator is updated by descending the following stochastic gradient.

${ \displaystyle \nabla_{\theta^g} \frac{1}{m} \sum_{i=1}^m \log(1-D(G(z^{(i)}))) }$

In the implementation, however, the loss passes through the Discriminator, so the Discriminator's variables must be excluded from the update.
Here I used name scopes to do this.

self.grads_and_vars = self.optimizer.compute_gradients(self.loss)
self.grads_and_vars = [[grad, var] for grad, var in self.grads_and_vars \
                       if grad is not None and var.name.startswith("G")]
self.train_op = self.optimizer.apply_gradients(self.grads_and_vars)
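
The filtering step can be tried without TensorFlow. The sketch below uses a hypothetical FakeVar stand-in that has only a name attribute, and the variable names are assumed examples of what scopes named "G" and "D" might produce.

```python
from collections import namedtuple

# Hypothetical stand-in for a TensorFlow variable; only .name is used.
FakeVar = namedtuple("FakeVar", ["name"])

def select_by_scope(grads_and_vars, scope_prefix):
    """Keep (grad, var) pairs that have a gradient and whose variable name
    starts with scope_prefix."""
    return [(g, v) for g, v in grads_and_vars
            if g is not None and v.name.startswith(scope_prefix)]

pairs = [
    (0.1, FakeVar("G/fc1/weights:0")),    # Generator variable with a gradient
    (0.2, FakeVar("D/conv1/weights:0")),  # Discriminator variable
    (None, FakeVar("G/fc2/biases:0")),    # Generator variable without a gradient
]
selected = select_by_scope(pairs, "G")
print([v.name for _, v in selected])
```

A bare prefix such as "G" would also match any other scope whose name begins with G, so a stricter prefix like "G/" may be safer when more scopes are added.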

# probs is the Discriminator's output for generated images, shape [batch_size, 1]
tf.reduce_mean(tf.reduce_sum(tf.log(1.0 - probs),
               axis=1))

A well-known trick is to use the following loss instead of the one above.

${ \displaystyle -\frac{1}{m} \sum_{i=1}^m \log(D(G(z^{(i)}))) }$

tf.reduce_mean(-tf.reduce_sum(tf.log(probs), axis=1))
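
One way to see why the alternative loss helps is to compare the gradients of the two Generator losses with respect to the Discriminator's logit a, where D(G(z)) = sigmoid(a). The sketch below is plain Python with hypothetical function names; the derivatives follow from the derivative of the sigmoid, sigmoid(a) * (1 - sigmoid(a)).

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the original Generator loss."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), the alternative Generator loss."""
    return -(1.0 - sigmoid(a))

# A very negative logit means D confidently rejects the generated image.
for a in (-6.0, 0.0, 6.0):
    print(a, grad_saturating(a), grad_non_saturating(a))
```

When the Discriminator confidently rejects generated images (a very negative logit, typical early in training), the original loss yields a gradient close to zero, while the alternative keeps a gradient close to -1.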

Results

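I confirmed that the model learns reasonably well,
but I was bothered that some generated images look like a filled-in zero.
The generated images also show a slight bias.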
