From the time I started this blog, during my undergrad, one of my goals was to create a system for automatically detecting galaxies, one that runs on minimal computational power and without the help of the internet. At the time I had no idea how to do that; the only thing I knew was that neural networks were the way to go.

At that time, I created a Convolutional Neural Network based on LeNet-5, which is starting to show its age, as there have been breakthroughs since. Some of those breakthroughs were introduced and used in the MobileNet v1 and v2 models. MobileNet's focus is not only model performance but also reducing runtime, model size, and training time. The improvements are detailed in the MobileNet v1 and v2 papers, (v1 link) (v2 link). This post won't focus on evaluating their performance, as there is plenty of information on that in the linked papers; instead, I will focus on their implementation. Also, I didn't implement some features of the MobileNets in order to keep the implementation faster and simpler.

MobileNet v1 Architecture

MobileNet v1 revolves around what are called depthwise convolution layers. These layers have a smaller computational cost than standard convolution layers, which lets us create a model with fewer trainable parameters and, in turn, a lower computational cost for training and using the said model.
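To make the saving concrete, here is a minimal sketch (not part of the implementation below) comparing the parameter counts of a standard $3\times 3$ convolution and the equivalent depthwise-plus-pointwise pair in Keras; the input shape and filter counts are arbitrary example values.

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, DepthwiseConv2D

inputs = tf.keras.Input(shape=(56, 56, 128))

# Standard 3x3 convolution to 256 filters: 3*3*128*256 = 294,912 weights
standard = tf.keras.Model(inputs, Conv2D(256, (3, 3), padding='same', use_bias=False)(inputs))

# Depthwise 3x3 followed by a pointwise 1x1: 3*3*128 + 1*1*128*256 = 33,920 weights
x = DepthwiseConv2D((3, 3), padding='same', use_bias=False)(inputs)
x = Conv2D(256, (1, 1), use_bias=False)(x)
separable = tf.keras.Model(inputs, x)

print(standard.count_params())   # 294912
print(separable.count_params())  # 33920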

In MobileNet, a convolution block is a convolution layer followed by a batch normalization layer and a ReLU layer. A depthwise convolution block starts with a depthwise convolution layer followed by a batch normalization layer, a ReLU layer, and a convolution block.

The network starts with a convolution block whose convolution layer has a filter size of $3\times 3\times 32$. The network then has a series of depthwise convolution blocks until reaching an average pooling layer that reduces the tensor to $1\times 1\times 1024$, followed by a fully connected layer with a softmax activation that does the classification. You can see the exact details of the layers in the MobileNet v1 paper linked above.

Implementation

Keras allows us to create intricate model designs using its functional API. Seeing as it is built into TensorFlow, I decided to use it.

First, here are the layer imports we are going to need:

from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, DepthwiseConv2D, AveragePooling2D, Dense, ZeroPadding2D, Flatten, Add

We build the convolution block:

def convolution_block(input_tensor, filters, kernel_size=(1, 1), strides=(1, 1), padding='same'):
    # Convolution -> batch normalization -> ReLU
    x = Conv2D(filters=filters, kernel_size=kernel_size, strides=strides,
               padding=padding, use_bias=False)(input_tensor)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

In the function above, we take a temporary tensor which remembers the operations performed on it. This is useful when performing backpropagation or, more generally, automatic differentiation. The function also takes in the number of filters in the convolution layer and the kernel size to use.
The pattern for operating on these tensors is to first create the layer object with the desired parameters, then pass the tensor to that object, which returns another tensor with the memory of having passed through the layer. As described above, we pass the tensor through the convolution layer, the batch normalization layer, and then the ReLU activation layer.
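As a tiny illustration of that pattern (an isolated fragment, not part of the model code above), the two steps can be written out separately:

conv = Conv2D(filters=32, kernel_size=(3, 3), padding='same', use_bias=False)  # create and configure the layer object
y = conv(input_tensor)  # calling it on a tensor returns a new tensor that records the operation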
The depthwise convolution block:

def depth_wise_convolution_block(input_tensor, filters, depth_wise_strides, depthwise_padding):
    # Depthwise convolution -> batch normalization -> ReLU, then a 1x1 convolution block
    x = DepthwiseConv2D((3, 3), depth_wise_strides, padding=depthwise_padding, use_bias=False)(input_tensor)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return convolution_block(x, filters)

In the function above, we take in the input tensor, the number of filters used in the convolution block, the stride used in the depthwise convolution layer, and its padding. We pass the tensor first through the depthwise convolution, then the batch normalization, the ReLU activation, and finally a convolution block.
As I said, the network starts with a convolution block, but due to the way the convolution block works, we have to add some zero padding before every convolution layer that uses valid padding. So, we start with:

def mobile_net(input_tensor, n_classes):
    x = ZeroPadding2D(((0, 1), (0, 1)))(input_tensor)
    x = convolution_block(x, 32, (3, 3), (2, 2), padding='valid')

Then, the depthwise convolution blocks start:

    x = depth_wise_convolution_block(x, 64, (1, 1), 'same')
    x = ZeroPadding2D(((0, 1), (0, 1)))(x)
    x = depth_wise_convolution_block(x, 128, (2, 2), 'valid')
    x = depth_wise_convolution_block(x, 128, (1, 1), 'same')
    x = ZeroPadding2D(((0, 1), (0, 1)))(x)
    x = depth_wise_convolution_block(x, 256, (2, 2), 'valid')
    x = depth_wise_convolution_block(x, 256, (1, 1), 'same')
    x = ZeroPadding2D(((0, 1), (0, 1)))(x)
    x = depth_wise_convolution_block(x, 512, (2, 2), 'valid')
    for i in range(5):
        x = depth_wise_convolution_block(x, 512, (1, 1), 'same')
    x = ZeroPadding2D(((0, 1), (0, 1)))(x)
    x = depth_wise_convolution_block(x, 1024, (2, 2), 'valid')
    x = ZeroPadding2D(((3, 3), (3, 3)))(x)
    x = depth_wise_convolution_block(x, 1024, (2, 2), 'same')

And finally, the average pooling layer and the classifier. I added an extra 1000-neuron fully connected layer for added accuracy:

    x = AveragePooling2D((7, 7), strides=(1, 1))(x)
    x = Flatten()(x)
    x = Dense(1000)(x)
    x = Activation('relu')(x)
    x = Dense(n_classes, activation='softmax')(x)
    return x
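As a quick usage sketch (not part of the original post), the function above can be wired into a complete Keras model; the class count below is a placeholder, not the value used for the galaxy dataset.

import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))
outputs = mobile_net(inputs, n_classes=5)  # n_classes=5 is a placeholder value
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(0.0005),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()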

MobileNet v2 Architecture

MobileNet v2 revolves around the bottleneck block. This block consists of a convolution block, a depthwise convolution block, another convolution layer, and a batch normalization layer. If the depthwise convolution layer has a stride of 1 and the number of output filters equals the size of the third dimension (the channels) of the input tensor, we add the input tensor to the tensor coming out of the batch normalization layer, forming a residual connection. Inside the bottleneck block, the number of filters is first expanded to a multiple of the third dimension of the input tensor. For example, with a 24-channel input and an expansion factor of 6, the block expands to 144 filters before projecting back down.

To build the block, we first get the size of the third dimension of the input tensor and calculate the number of filters to expand to.

Implementation

def bottleneck(input_tensor, filters, depth_wise_strides, expansion_factor=6):
    # Expand the channel dimension by the expansion factor
    expansion = int(input_tensor.shape[3]) * expansion_factor
    x = convolution_block(input_tensor, expansion)
    x = depth_wise_convolution_block(x, expansion, depth_wise_strides, 'same')
    # Project back down to the desired number of filters
    x = Conv2D(filters=filters, kernel_size=(1, 1), strides=(1, 1), padding='same')(x)
    x = BatchNormalization()(x)
    # Residual connection when the spatial and channel dimensions of input and output match
    if depth_wise_strides == (1, 1) and filters == int(input_tensor.shape[3]):
        x = Add()([x, input_tensor])
    return x

The MobileNet v2 network starts with a convolution block followed by a series of bottleneck blocks. Again, the exact numbers come from the papers linked above. Also, I changed the ending of the network by adding a fully connected layer with a ReLU activation.


def mobile_net_v2(input_tensor, n_classes):
    x = ZeroPadding2D(((0, 1), (0, 1)))(input_tensor)
    x = convolution_block(x, 32, (3, 3), (2, 2), 'valid')
    x = bottleneck(x, 16, (1, 1), 1)
    x = bottleneck(x, 24, (2, 2))
    x = bottleneck(x, 24, (1, 1))
    x = bottleneck(x, 32, (2, 2))
    x = bottleneck(x, 32, (1, 1))
    x = bottleneck(x, 32, (1, 1))
    x = bottleneck(x, 64, (2, 2))
    for i in range(3):
        x = bottleneck(x, 64, (1, 1))
    for i in range(3):
        x = bottleneck(x, 96, (1, 1))
    x = bottleneck(x, 160, (2, 2))
    x = bottleneck(x, 160, (1, 1))
    x = bottleneck(x, 160, (1, 1))
    x = bottleneck(x, 320, (1, 1))
    x = convolution_block(x, 1280, (1, 1), (1, 1), 'same')
    x = AveragePooling2D((7, 7))(x)
    x = Flatten()(x)
    x = Dense(1000)(x)
    x = Activation('relu')(x)
    x = Dense(n_classes, activation='softmax')(x)
    return x
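Since model size is one of MobileNet's main selling points, both builder functions can be instantiated on the same input to compare their trainable parameter counts. This is only a sketch with a placeholder class count; the exact numbers depend on n_classes and the extra 1000-neuron layer.

import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))
v1 = tf.keras.Model(inputs, mobile_net(inputs, n_classes=5))      # n_classes=5 is a placeholder value
v2 = tf.keras.Model(inputs, mobile_net_v2(inputs, n_classes=5))
print(v1.count_params(), v2.count_params())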

Performance

Both models were trained on the Galaxy Zoo galaxy image dataset hosted on Kaggle, (link). An input tensor size of $224\times 224\times 3$ is used. The optimizer used is Adam with a starting learning rate of $0.0005$. The learning rate is halved every time the training plateaus, with a patience of $2$ epochs. An image generator was also used that randomly rotates the images and changes their width, height, and zoom. It also randomly flips the images, both horizontally and vertically, and rescales pixel values to between $0$ and $1$. $10\%$ of the images were used for validation. Each model was trained for 30 epochs, mostly reaching convergence, as shown in the training plots for both below.
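The training setup described above can be sketched with standard Keras utilities; the exact augmentation ranges and the data-loading code are assumptions on my part, not taken verbatim from the project source.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau

datagen = ImageDataGenerator(
    rotation_range=90,        # assumed rotation range
    width_shift_range=0.1,    # assumed shift range
    height_shift_range=0.1,   # assumed shift range
    zoom_range=0.1,           # assumed zoom range
    horizontal_flip=True,
    vertical_flip=True,
    rescale=1. / 255,         # rescale pixel values to [0, 1]
    validation_split=0.1)     # 10% of the images held out for validation

# Halve the learning rate whenever training plateaus, with a patience of 2 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2)

# model.fit(train_generator, validation_data=val_generator,
#           epochs=30, callbacks=[reduce_lr])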

Training plot for MobileNet V1
Training plot for MobileNet V2

The top graph is the accuracy and the bottom is the loss. MobileNet v1 reached an accuracy of $80\%$ and MobileNet v2 reached $81\%$.

You can see and use the saved Keras models, as well as the source code for generating them, on the GitHub page at the link below.

SOURCE CODE

I know that this does not exactly detect galaxies in images but rather classifies them. The plan is to convert this model into a Faster R-CNN in a future post, which can do exactly that.