Deep learning on the MNIST dataset using the Keras API

- 9 mins

MNIST with deep learning using Keras

import pandas as pd
import numpy as np
import os

The data was acquired from Kaggle and the kernel itself is public; it can be reached at this link. The code has been slightly modified because I had to run it on my computer in order to export it to markdown.

data = pd.read_csv('train.csv')
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize
# splitting into train, test and validation sets
X_train, X_test, y_train, y_test = train_test_split(normalize(data.values[:, 1:]), data.values[:, 0],
                                                   test_size=0.33, shuffle=True, random_state=42)

# splitting the test set to a small validation set
X_test, X_valid, y_test, y_valid = train_test_split(X_test, y_test, test_size=0.33, shuffle=True, random_state=137)
X_train.shape, y_train.shape, X_test.shape, y_test.shape, X_valid.shape, y_valid.shape
((28140, 784), (28140,), (9286, 784), (9286,), (4574, 784), (4574,))
# the test data that needs to be submitted to Kaggle
test = pd.read_csv('test.csv')
test.shape
(28000, 784)
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPool2D, Dropout, Flatten, BatchNormalization, AvgPool2D
from keras.preprocessing.image import ImageDataGenerator

The ImageDataGenerator from Keras can shift, rotate, zoom and apply other transformations to the images, which is extremely useful when one wants to make the training set artificially bigger and more diverse.

datagen = ImageDataGenerator(height_shift_range=5, rotation_range=10, width_shift_range=5, zoom_range=0.1)
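
As a quick sanity check (not part of the original kernel, and assuming matplotlib is available), one batch can be drawn from the generator and plotted to see what the shifts and rotations actually look like; the flattened 784-pixel rows have to be reshaped to 28x28 images first:

import matplotlib.pyplot as plt

# reshape a few flattened rows into images and pull one augmented batch
sample = X_train[:9].reshape(-1, 28, 28, 1)
augmented = next(datagen.flow(sample, batch_size=9, shuffle=False))

fig, axes = plt.subplots(3, 3, figsize=(6, 6))
for ax, img in zip(axes.ravel(), augmented):
    ax.imshow(img.squeeze(), cmap='gray')
    ax.axis('off')
plt.show()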

Convolutional layers are usually stacked on top of each other, with the number of filters increasing from one block to the next. I apply average pooling and batch normalization in each convolutional block to keep the number of parameters relatively low, and after that a fully connected dense layer finishes the job with a softmax activation at the end. Dropout is used to prevent overfitting.

model = Sequential()

model.add(Conv2D(32, kernel_size=(2, 2), activation='relu', input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(2, 2), padding='same', activation='relu'))
model.add(AvgPool2D())
model.add(Dropout(0.5))


model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(AvgPool2D())
model.add(Dropout(0.5))

model.add(Flatten())

model.add(Dense(256, activation='relu'))
model.add(Dropout(0.33))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 27, 27, 32)        160       
_________________________________________________________________
batch_normalization_1 (Batch (None, 27, 27, 32)        128       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 27, 27, 64)        8256      
_________________________________________________________________
average_pooling2d_1 (Average (None, 13, 13, 64)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 13, 13, 64)        36928     
_________________________________________________________________
batch_normalization_2 (Batch (None, 13, 13, 64)        256       
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 11, 11, 64)        36928     
_________________________________________________________________
average_pooling2d_2 (Average (None, 5, 5, 64)          0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 5, 5, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 256)               409856    
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570      
=================================================================
Total params: 495,082
Trainable params: 494,890
Non-trainable params: 192
_________________________________________________________________
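
The parameter counts above can be verified by hand. As a small aside (my own arithmetic, not part of the original kernel): a Conv2D layer has kernel_height * kernel_width * input_channels * filters weights plus one bias per filter, and a BatchNormalization layer has four parameters per channel, half of which (the moving mean and variance) are non-trainable:

# re-deriving the numbers reported by model.summary()
conv2d_1 = 2 * 2 * 1 * 32 + 32    # 160
batchnorm_1 = 4 * 32              # 128 (gamma, beta, moving mean, moving variance)
conv2d_2 = 2 * 2 * 32 * 64 + 64   # 8256
conv2d_3 = 3 * 3 * 64 * 64 + 64   # 36928
batchnorm_2 = 4 * 64              # 256
conv2d_4 = 3 * 3 * 64 * 64 + 64   # 36928
dense_1 = 1600 * 256 + 256        # 409856
dense_2 = 256 * 10 + 10           # 2570
print(conv2d_1 + batchnorm_1 + conv2d_2 + conv2d_3 + batchnorm_2 + conv2d_4 + dense_1 + dense_2)
# 495082 -- matching the total above; the 192 non-trainable parameters
# are the moving means and variances of the two batch normalization layers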
from keras.utils import to_categorical
from keras.callbacks import EarlyStopping, LearningRateScheduler
from sklearn.preprocessing import normalize

X_train_ = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test_ = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_valid_ = X_valid.reshape(X_valid.shape[0], 28, 28, 1)

y_train_cat = to_categorical(y_train)
y_test_cat = to_categorical(y_test)
y_valid_cat = to_categorical(y_valid)

hist = model.fit_generator(datagen.flow(X_train_, y_train_cat, batch_size=64),
                epochs=15, verbose=1, validation_data=(X_valid_, y_valid_cat),
                callbacks=[EarlyStopping(patience=3, restore_best_weights=True, monitor='val_acc', baseline=0.95)])
Epoch 1/15
440/440 [==============================] - 190s 431ms/step - loss: 0.7767 - acc: 0.7378 - val_loss: 0.3413 - val_acc: 0.8964
Epoch 2/15
440/440 [==============================] - 186s 423ms/step - loss: 0.2336 - acc: 0.9272 - val_loss: 0.1307 - val_acc: 0.9600
Epoch 3/15
440/440 [==============================] - 169s 384ms/step - loss: 0.1704 - acc: 0.9467 - val_loss: 0.1260 - val_acc: 0.9606
Epoch 4/15
440/440 [==============================] - 167s 380ms/step - loss: 0.1390 - acc: 0.9585 - val_loss: 0.0544 - val_acc: 0.9823
Epoch 5/15
440/440 [==============================] - 186s 424ms/step - loss: 0.1171 - acc: 0.9644 - val_loss: 0.0700 - val_acc: 0.9777
Epoch 6/15
440/440 [==============================] - 242s 549ms/step - loss: 0.1095 - acc: 0.9666 - val_loss: 0.0523 - val_acc: 0.9849
Epoch 7/15
440/440 [==============================] - 229s 519ms/step - loss: 0.1030 - acc: 0.9684 - val_loss: 0.0801 - val_acc: 0.9762
Epoch 8/15
440/440 [==============================] - 244s 554ms/step - loss: 0.0998 - acc: 0.9704 - val_loss: 0.1402 - val_acc: 0.9552
Epoch 9/15
440/440 [==============================] - 184s 417ms/step - loss: 0.0920 - acc: 0.9741 - val_loss: 0.0269 - val_acc: 0.9919
Epoch 10/15
440/440 [==============================] - 247s 562ms/step - loss: 0.0907 - acc: 0.9736 - val_loss: 0.6707 - val_acc: 0.8242
Epoch 11/15
440/440 [==============================] - 177s 403ms/step - loss: 0.0877 - acc: 0.9745 - val_loss: 0.0237 - val_acc: 0.9913
Epoch 12/15
440/440 [==============================] - 168s 381ms/step - loss: 0.0830 - acc: 0.9756 - val_loss: 0.0261 - val_acc: 0.9919
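
The fluctuation in the validation accuracy is easier to see on a plot. A minimal sketch using the hist object returned by fit_generator (again assuming matplotlib is available; this was not part of the original kernel):

import matplotlib.pyplot as plt

# training vs. validation accuracy per epoch from the History object
plt.plot(hist.history['acc'], label='training accuracy')
plt.plot(hist.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()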

Early stopping is used with a patience of three epochs because of the fluctuation in the validation accuracy, and restore_best_weights=True makes sure the best weights are kept. The baseline is set to 95% validation accuracy: if the model fails to improve beyond it within the patience window, training stops early; otherwise it runs for at most 15 epochs, stopping once the validation accuracy no longer improves for three epochs in a row. The model is then evaluated on the test set.

score = model.evaluate(X_test_, y_test_cat, batch_size=32)
score
9286/9286 [==============================] - 13s 1ms/step
[0.028520030430235262, 0.991600258453586]

One should not forget (I did for the first 12 tries) that the test data must be normalized the same way the training data was. This is pretty straightforward in the case of the MNIST dataset; when dealing with other datasets, one must make sure that the very same scaling is applied to the test set that was learned on the training set (a short sketch of the fitted-scaler pattern follows the code below).

test_normed = normalize(test)
test_normed = test_normed.reshape(test_normed.shape[0], 28, 28, 1)
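
The normalize call above is stateless, it simply rescales every row to unit norm, so applying it separately to the test set gives the same result as on the training data. With a fitted scaler, the parameters must come from the training set only; the sketch below only illustrates that fit/transform pattern (StandardScaler and the *_raw names are hypothetical and not part of this kernel):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train_raw)      # hypothetical unscaled training matrix
X_test_scaled = scaler.transform(X_test_raw)    # reuse the parameters learned on the training set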

y_pred = model.predict(test_normed)
submission = pd.DataFrame(np.argmax(y_pred, axis=1), columns=['Label'])
submission.index += 1
submission.tail()
Label
27996 9
27997 7
27998 3
27999 9
28000 2
submission.to_csv('./submission.csv', index_label='ImageId', columns=['Label'])

This submission resulted in ~98% accuracy on the test set provided by Kaggle, which currently places it among the top 52% of submissions. The leaderboard can be found at this link.

@Regards, Alex
