def forward(image, label):
  '''
  Completes a forward pass of the CNN and calculates the accuracy and
  cross-entropy loss.
  - image is a 2d numpy array
  - label is a digit
  '''
  # We transform the image from [0, 255] to [-0.5, 0.5] to make it easier
  # to work with. This is standard practice.
  out = conv.forward((image / 255) - 0.5)
  out = pool.forward(out)
  out = softmax.forward(out)
  # Calculate cross-entropy loss and accuracy. np.log() is the natural log.
  loss = -np.log(out[label])
  acc = 1 if np.argmax(out) == label else 0
MNIST CNN initialized!
[Step 100] Past 100 steps: Average Loss 2.302 | Accuracy: 11%
[Step 200] Past 100 steps: Average Loss 2.302 | Accuracy: 8%
[Step 300] Past 100 steps: Average Loss 2.302 | Accuracy: 3%
[Step 400] Past 100 steps: Average Loss 2.302 | Accuracy: 12%
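As a sanity check, this is exactly what an untrained network should produce: if all 10 digits are predicted with roughly equal probability, the cross-entropy loss is \(-\ln(1/10) = \ln 10 \approx 2.302\) and the accuracy hovers around 10%, which matches the numbers above.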
class Softmax:
  # ...

  def forward(self, input):
    '''
    Performs a forward pass of the softmax layer using the given input.
    Returns a 1d numpy array containing the respective probability values.
    - input can be any array with any dimensions.
    '''
    self.last_input_shape = input.shape
  def backprop(self, d_L_d_out):
    '''
    Performs a backward pass of the softmax layer.
    Returns the loss gradient for this layer's inputs.
    - d_L_d_out is the loss gradient for this layer's outputs.
    '''
    # We know only 1 element of d_L_d_out will be nonzero
    for i, gradient in enumerate(d_L_d_out):
      if gradient == 0:
        continue

      # e^totals
      t_exp = np.exp(self.last_totals)

      # Sum of all e^totals
      S = np.sum(t_exp)

      # Gradients of out[i] against totals
      d_out_d_t = -t_exp[i] * t_exp / (S ** 2)
      d_out_d_t[i] = t_exp[i] * (S - t_exp[i]) / (S ** 2)
\(\partial L / \partial input\) is passed back to the previous layer.
To compute these three gradients, we start from the following equation:
\[
t = w \cdot input + b
\]
These gradients are straightforward:
\[\begin{aligned}
&\frac{\partial t}{\partial w}=input\\
&\frac{\partial t}{\partial b}=1\\
&\frac{\partial t}{\partial input}=w
\end{aligned}\]
Putting everything together with the chain rule:
\[\begin{aligned}
\frac{\partial L}{\partial w} &=\frac{\partial L}{\partial out} * \frac{\partial out}{\partial t} * \frac{\partial t}{\partial w} \\
\frac{\partial L}{\partial b} &=\frac{\partial L}{\partial out} * \frac{\partial out}{\partial t} * \frac{\partial t}{\partial b} \\
\frac{\partial L}{\partial input} &=\frac{\partial L}{\partial out} * \frac{\partial out}{\partial t} * \frac{\partial t}{\partial input}
\end{aligned}\]
class Softmax:
  # ...

  def backprop(self, d_L_d_out):
    '''
    Performs a backward pass of the softmax layer.
    Returns the loss gradient for this layer's inputs.
    - d_L_d_out is the loss gradient for this layer's outputs.
    '''
    # We know only 1 element of d_L_d_out will be nonzero
    for i, gradient in enumerate(d_L_d_out):
      if gradient == 0:
        continue

      # e^totals
      t_exp = np.exp(self.last_totals)

      # Sum of all e^totals
      S = np.sum(t_exp)

      # Gradients of out[i] against totals
      d_out_d_t = -t_exp[i] * t_exp / (S ** 2)
      d_out_d_t[i] = t_exp[i] * (S - t_exp[i]) / (S ** 2)

      # Gradients of totals against weights/biases/input
      d_t_d_w = self.last_input
      d_t_d_b = 1
      d_t_d_inputs = self.weights

      # Gradients of loss against totals
      d_L_d_t = gradient * d_out_d_t

      # Gradients of loss against weights/biases/input
      d_L_d_w = d_t_d_w[np.newaxis].T @ d_L_d_t[np.newaxis]
      d_L_d_b = d_L_d_t * d_t_d_b
      d_L_d_inputs = d_t_d_inputs @ d_L_d_t
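To make the shapes in those last three lines concrete, here is a small standalone sketch (my own, not part of the original code). The sizes assume the network built earlier, i.e. 8 conv filters and a 13x13 pooled output, so 13 * 13 * 8 = 1352 flattened inputs and 10 output classes:

import numpy as np

# Assumed sizes: 1352 flattened inputs (13 * 13 * 8), 10 classes.
d_L_d_t = np.random.randn(10)              # dL/dt: one value per class
d_t_d_w = np.random.randn(1352)            # dt/dw = flattened input
d_t_d_inputs = np.random.randn(1352, 10)   # dt/dinput = weight matrix

# (1352, 1) @ (1, 10) -> (1352, 10): one gradient per weight
d_L_d_w = d_t_d_w[np.newaxis].T @ d_L_d_t[np.newaxis]
d_L_d_b = d_L_d_t * 1                      # (10,): one gradient per bias
d_L_d_inputs = d_t_d_inputs @ d_L_d_t      # (1352,): one gradient per input

print(d_L_d_w.shape, d_L_d_b.shape, d_L_d_inputs.shape)  # (1352, 10) (10,) (1352,)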
class Softmax:
  # ...

  def backprop(self, d_L_d_out, learn_rate):
    '''
    Performs a backward pass of the softmax layer.
    Returns the loss gradient for this layer's inputs.
    - d_L_d_out is the loss gradient for this layer's outputs.
    - learn_rate is a float
    '''
    # We know only 1 element of d_L_d_out will be nonzero
    for i, gradient in enumerate(d_L_d_out):
      if gradient == 0:
        continue

      # e^totals
      t_exp = np.exp(self.last_totals)

      # Sum of all e^totals
      S = np.sum(t_exp)

      # Gradients of out[i] against totals
      d_out_d_t = -t_exp[i] * t_exp / (S ** 2)
      d_out_d_t[i] = t_exp[i] * (S - t_exp[i]) / (S ** 2)

      # Gradients of totals against weights/biases/input
      d_t_d_w = self.last_input
      d_t_d_b = 1
      d_t_d_inputs = self.weights

      # Gradients of loss against totals
      d_L_d_t = gradient * d_out_d_t

      # Gradients of loss against weights/biases/input
      d_L_d_w = d_t_d_w[np.newaxis].T @ d_L_d_t[np.newaxis]
      d_L_d_b = d_L_d_t * d_t_d_b
      d_L_d_inputs = d_t_d_inputs @ d_L_d_t

      # Update weights / biases with the learning rate
      self.weights -= learn_rate * d_L_d_w
      self.biases -= learn_rate * d_L_d_b

      # Reshape the input gradient back to the shape the pooling layer produced
      return d_L_d_inputs.reshape(self.last_input_shape)
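One more piece is needed before we can train: the gradient \(\partial L / \partial out\) that we feed into softmax.backprop(). Since the loss is \(L = -\ln(out_c)\) for the correct class \(c\), and the other outputs do not appear in the loss, only one element of this gradient is nonzero:

\[
\frac{\partial L}{\partial out_i}=\begin{cases}0 & \text{if } i \neq c \\ -\frac{1}{out_c} & \text{if } i=c\end{cases}
\]

This is exactly why backprop() can skip every index where the incoming gradient is zero.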
def train(im, label, lr=.005):
  '''
  Completes a full training step on the given image and label.
  Returns the cross-entropy loss and accuracy.
  - image is a 2d numpy array
  - label is a digit
  - lr is the learning rate
  '''
  # forward
  out, loss, acc = forward(im, label)

  # Calculate initial gradient: -1 / out[label] for the correct class, 0 elsewhere
  gradient = np.zeros(10)
  gradient[label] = -1 / out[label]

  # Backprop (only the softmax layer has a backward pass so far)
  gradient = softmax.backprop(gradient, lr)

  return loss, acc
print('MNIST CNN initialized!')

# Train!
loss = 0
num_correct = 0
for i, (im, label) in enumerate(zip(train_images, train_labels)):
  # Print stats every 100 steps.
  if i > 0 and i % 100 == 99:
    print(
      '[Step %d] Past 100 steps: Average Loss %.3f | Accuracy: %d%%' %
      (i + 1, loss / 100, num_correct)
    )
    loss = 0
    num_correct = 0

  l, acc = train(im, label)
  loss += l
  num_correct += acc
Note that this training procedure is not very rigorous: we feed the examples in one at a time, and we measure accuracy on the training set itself.
Running it produces:
MNIST CNN initialized!
[Step 100] Past 100 steps: Average Loss 2.239 | Accuracy: 18%
[Step 200] Past 100 steps: Average Loss 2.140 | Accuracy: 32%
[Step 300] Past 100 steps: Average Loss 1.998 | Accuracy: 48%
[Step 400] Past 100 steps: Average Loss 1.861 | Accuracy: 59%
[Step 500] Past 100 steps: Average Loss 1.789 | Accuracy: 56%
[Step 600] Past 100 steps: Average Loss 1.809 | Accuracy: 48%
[Step 700] Past 100 steps: Average Loss 1.718 | Accuracy: 63%
[Step 800] Past 100 steps: Average Loss 1.588 | Accuracy: 69%
[Step 900] Past 100 steps: Average Loss 1.509 | Accuracy: 71%
[Step 1000] Past 100 steps: Average Loss 1.481 | Accuracy: 70%
The accuracy improves substantially; our CNN has already started to learn.
4 Backprop: Max Pooling
The Max Pooling layer has no parameters of its own, but we still need to implement backprop for it so that gradients keep flowing to the layers before it. As usual, we start from the forward pass, caching what the backward pass will need:
class MaxPool2:
  # ...

  def forward(self, input):
    '''
    Performs a forward pass of the maxpool layer using the given input.
    Returns a 3d numpy array with dimensions (h / 2, w / 2, num_filters).
    - input is a 3d numpy array with dimensions (h, w, num_filters)
    '''
    self.last_input = input

    # The rest of forward() is unchanged.
    # ...
  def iterate_regions(self, image):
    '''
    Generates non-overlapping 2x2 image regions to pool over.
    - image is a 3d numpy array
    '''
    h, w, _ = image.shape
    new_h = h // 2
    new_w = w // 2

    for i in range(new_h):
      for j in range(new_w):
        im_region = image[(i * 2):(i * 2 + 2), (j * 2):(j * 2 + 2)]
        yield im_region, i, j

  def backprop(self, d_L_d_out):
    '''
    Performs a backward pass of the maxpool layer.
    Returns the loss gradient for this layer's inputs.
    - d_L_d_out is the loss gradient for this layer's outputs.
    '''
    d_L_d_input = np.zeros(self.last_input.shape)

    for im_region, i, j in self.iterate_regions(self.last_input):
      h, w, f = im_region.shape
      amax = np.amax(im_region, axis=(0, 1))

      for i2 in range(h):
        for j2 in range(w):
          for f2 in range(f):
            # If this pixel was the max value, copy the gradient to it
            if im_region[i2, j2, f2] == amax[f2]:
              d_L_d_input[i * 2 + i2, j * 2 + j2, f2] = d_L_d_out[i, j, f2]

    return d_L_d_input
This implementation uses nested loops, so it is not very efficient, but it keeps the logic easy to follow.
That is all we need for the Max Pooling backprop.
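To see the gradient routing in isolation, here is a tiny standalone sketch (my own, not part of the original code) of what backprop does for a single 2x2 region with one filter channel: the upstream gradient lands only on the position that held the max during the forward pass.

import numpy as np

im_region = np.array([[1.0, 3.0],
                      [2.0, 0.0]])   # one 2x2 region, one filter channel
upstream = 7.0                        # dL/d(pooled output) for this region

# Only the pixel that was the max in the forward pass receives the gradient.
d_L_d_region = np.zeros_like(im_region)
d_L_d_region[im_region == im_region.max()] = upstream

print(d_L_d_region)
# [[0. 7.]
#  [0. 0.]]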
5 Backprop: Conv
The conv layer is the heart of the CNN. Following the same pattern as before, we start by caching the input during the forward pass:
class Conv3x3:
  # ...

  def forward(self, input):
    '''
    Performs a forward pass of the conv layer using the given input.
    Returns a 3d numpy array with dimensions (h, w, num_filters).
    - input is a 2d numpy array
    '''
    self.last_input = input
Here we mainly care about the gradient of the loss with respect to the filters. The MaxPool layer already hands the Conv layer its \(\partial L / \partial out\), so all we need to compute is \(\partial out / \partial filters\). To work out how to do that, first ask: how does changing a filter weight affect the Conv layer's output?
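Each output pixel \(out(i, j)\) is the 3x3 image region at \((i, j)\) multiplied elementwise by the filter and summed, so the derivative of \(out(i, j)\) with respect to any filter weight is just the corresponding pixel of that region. Applying the chain rule and summing over all output pixels gives the formula that backprop() below implements:

\[
\frac{\partial L}{\partial filter(f)}=\sum_{i, j} \frac{\partial L}{\partial out(i, j, f)} * im\_region(i, j)
\]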
class Conv3x3:
  # ...

  def iterate_regions(self, image):
    '''
    Generates all possible 3x3 image regions using valid padding.
    - image is a 2d numpy array.
    '''
    h, w = image.shape

    for i in range(h - 2):
      for j in range(w - 2):
        im_region = image[i:(i + 3), j:(j + 3)]
        yield im_region, i, j
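Because we use valid padding, every 3x3 region must fit entirely inside the image, so an h x w input yields (h - 2) * (w - 2) regions, one for each output pixel of the conv layer.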
  def backprop(self, d_L_d_out, learn_rate):
    '''
    Performs a backward pass of the conv layer.
    - d_L_d_out is the loss gradient for this layer's outputs.
    - learn_rate is a float.
    '''
    d_L_d_filters = np.zeros(self.filters.shape)

    for im_region, i, j in self.iterate_regions(self.last_input):
      for f in range(self.num_filters):
        d_L_d_filters[f] += d_L_d_out[i, j, f] * im_region

    # Update filters with the learning rate
    self.filters -= learn_rate * d_L_d_filters

    # We aren't returning anything here since we use Conv3x3 as
    # the first layer in our CNN. Otherwise, we'd need to return
    # the loss gradient for this layer's inputs, just like every
    # other layer in our CNN.
    return None
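As a sanity check on the filter-gradient formula, here is a small standalone sketch (my own, using assumed toy sizes) that compares the accumulated gradient against a finite-difference estimate for one 3x3 filter, a 4x4 input, and an arbitrary upstream gradient:

import numpy as np

np.random.seed(0)
image = np.random.randn(4, 4)        # toy input image
filt = np.random.randn(3, 3)         # one 3x3 filter
G = np.random.randn(2, 2)            # pretend upstream gradient dL/dout

def conv_out(f):
  # Valid 3x3 convolution of `image` with filter `f` (2x2 output).
  out = np.zeros((2, 2))
  for i in range(2):
    for j in range(2):
      out[i, j] = np.sum(image[i:i + 3, j:j + 3] * f)
  return out

# Analytic gradient: sum of upstream gradient times the matching image region.
d_L_d_filt = np.zeros((3, 3))
for i in range(2):
  for j in range(2):
    d_L_d_filt += G[i, j] * image[i:i + 3, j:j + 3]

# Numerical gradient of L = sum(G * out) via central differences.
eps = 1e-6
numeric = np.zeros((3, 3))
for a in range(3):
  for b in range(3):
    fp, fm = filt.copy(), filt.copy()
    fp[a, b] += eps
    fm[a, b] -= eps
    numeric[a, b] = (np.sum(G * conv_out(fp)) - np.sum(G * conv_out(fm))) / (2 * eps)

print(np.allclose(d_L_d_filt, numeric, atol=1e-6))  # True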
import mnist
import numpy as np
from conv import Conv3x3
from maxpool import MaxPool2
from softmax import Softmax
# We only use the first 1k examples of each set in the interest of time.
# Feel free to change this if you want.
train_images = mnist.train_images()[:1000]
train_labels = mnist.train_labels()[:1000]
test_images = mnist.test_images()[:1000]
test_labels = mnist.test_labels()[:1000]
def forward(image, label):
  '''
  Completes a forward pass of the CNN and calculates the accuracy and
  cross-entropy loss.
  - image is a 2d numpy array
  - label is a digit
  '''
  # We transform the image from [0, 255] to [-0.5, 0.5] to make it easier
  # to work with. This is standard practice.
  out = conv.forward((image / 255) - 0.5)
  out = pool.forward(out)
  out = softmax.forward(out)

  # Calculate cross-entropy loss and accuracy. np.log() is the natural log.
  loss = -np.log(out[label])
  acc = 1 if np.argmax(out) == label else 0

  return out, loss, acc
def train(im, label, lr=.005):
  '''
  Completes a full training step on the given image and label.
  Returns the cross-entropy loss and accuracy.
  - image is a 2d numpy array
  - label is a digit
  - lr is the learning rate
  '''
  # Forward
  out, loss, acc = forward(im, label)

  # Calculate initial gradient
  gradient = np.zeros(10)
  gradient[label] = -1 / out[label]

  # Backprop through every layer
  gradient = softmax.backprop(gradient, lr)
  gradient = pool.backprop(gradient)
  gradient = conv.backprop(gradient, lr)

  return loss, acc
# Train the CNN for 3 epochs
for epoch in range(3):
  print('--- Epoch %d ---' % (epoch + 1))
  # Shuffle the training data
  permutation = np.random.permutation(len(train_images))
  train_images = train_images[permutation]
  train_labels = train_labels[permutation]
  # Train!
  loss = 0
  num_correct = 0
  for i, (im, label) in enumerate(zip(train_images, train_labels)):
    if i > 0 and i % 100 == 99:
      print(
        '[Step %d] Past 100 steps: Average Loss %.3f | Accuracy: %d%%' %
        (i + 1, loss / 100, num_correct)
      )
      loss = 0
      num_correct = 0

    l, acc = train(im, label)
    loss += l
    num_correct += acc
# Test the CNN
print('\n--- Testing the CNN ---')
loss = 0
num_correct = 0
for im, label in zip(test_images, test_labels):
  _, l, acc = forward(im, label)
  loss += l
  num_correct += acc

num_tests = len(test_images)
print('Test Loss:', loss / num_tests)
print('Test Accuracy:', num_correct / num_tests)
MNIST CNN initialized!
--- Epoch 1 ---
[Step 100] Past 100 steps: Average Loss 2.254 | Accuracy: 18%
[Step 200] Past 100 steps: Average Loss 2.167 | Accuracy: 30%
[Step 300] Past 100 steps: Average Loss 1.676 | Accuracy: 52%
[Step 400] Past 100 steps: Average Loss 1.212 | Accuracy: 63%
[Step 500] Past 100 steps: Average Loss 0.949 | Accuracy: 72%
[Step 600] Past 100 steps: Average Loss 0.848 | Accuracy: 74%
[Step 700] Past 100 steps: Average Loss 0.954 | Accuracy: 68%
[Step 800] Past 100 steps: Average Loss 0.671 | Accuracy: 81%
[Step 900] Past 100 steps: Average Loss 0.923 | Accuracy: 67%
[Step 1000] Past 100 steps: Average Loss 0.571 | Accuracy: 83%
--- Epoch 2 ---
[Step 100] Past 100 steps: Average Loss 0.447 | Accuracy: 89%
[Step 200] Past 100 steps: Average Loss 0.401 | Accuracy: 86%
[Step 300] Past 100 steps: Average Loss 0.608 | Accuracy: 81%
[Step 400] Past 100 steps: Average Loss 0.511 | Accuracy: 83%
[Step 500] Past 100 steps: Average Loss 0.584 | Accuracy: 89%
[Step 600] Past 100 steps: Average Loss 0.782 | Accuracy: 72%
[Step 700] Past 100 steps: Average Loss 0.397 | Accuracy: 84%
[Step 800] Past 100 steps: Average Loss 0.560 | Accuracy: 80%
[Step 900] Past 100 steps: Average Loss 0.356 | Accuracy: 92%
[Step 1000] Past 100 steps: Average Loss 0.576 | Accuracy: 85%
--- Epoch 3 ---
[Step 100] Past 100 steps: Average Loss 0.367 | Accuracy: 89%
[Step 200] Past 100 steps: Average Loss 0.370 | Accuracy: 89%
[Step 300] Past 100 steps: Average Loss 0.464 | Accuracy: 84%
[Step 400] Past 100 steps: Average Loss 0.254 | Accuracy: 95%
[Step 500] Past 100 steps: Average Loss 0.366 | Accuracy: 89%
[Step 600] Past 100 steps: Average Loss 0.493 | Accuracy: 89%
[Step 700] Past 100 steps: Average Loss 0.390 | Accuracy: 91%
[Step 800] Past 100 steps: Average Loss 0.459 | Accuracy: 87%
[Step 900] Past 100 steps: Average Loss 0.316 | Accuracy: 92%
[Step 1000] Past 100 steps: Average Loss 0.460 | Accuracy: 87%
--- Testing the CNN ---
Test Loss: 0.5979384893783474
Test Accuracy: 0.78