Ongoing deep-sky surveys like the Sloan Digital Sky Survey (SDSS) have amassed an enormous amount of information to sift through; in the case of SDSS's 14th data release, it amounts to over 125 terabytes. This raises the problem of visually classifying such a large amount of data, and Artificial Neural Networks (ANNs) are great tools for it.
Artificial Neural Networks
An ANN's basic anatomy consists of neurons, the weighted connections between the neurons, and a propagation function. The neurons, each of which applies an activation function, are organized into layers: an input layer, an output layer, and one or more hidden layers in between. Each neuron in a layer connects, with a weight, to every neuron in the layers immediately before and after it. The propagation function carries values through the network.
Figure 1: A conceptual image of an ANN
Convolutional Neural Network
A CNN, like an ANN, has an input layer and an output layer, but its hidden layers consist of convolutional, pooling, ReLU, fully connected, and normalization layers, among others. Before explaining what these layers do, it is pertinent to explain how images are stored. Color images are stored as three-dimensional matrices; in this case, 220x220x3 matrices.
The convolutional layer (conv layer) is the most computationally costly part of the CNN. It consists of a set of learnable filters that are smaller than the image in height and width but match it in depth, in this case 3. Each filter is passed along the entire image; as the filter moves across the image, the dot product is computed between the filter entries and the values of the image at the filter's position, producing an activation map. In training, the CNN learns to recognize when certain filters activate on an image and classifies accordingly. For efficiency, each neuron of the conv layer is connected only to the region covered by its filter. For example, in this post, small 3x3x3 filters are used, resulting in 27 weights and connections per filter. Other parameters of the conv layer include the stride with which the filter moves along the image, the size of the padding around the image, and the number of filters used.
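The sliding dot product above can be sketched in a few lines of NumPy. This is an illustrative toy, not the MATLAB code used in this post; the function and variable names are made up here. The output size follows the standard formula (W − F + 2P)/S + 1 per spatial dimension.

```python
import numpy as np

def convolve(image, filt, stride=1, pad=0):
    """Slide one filter over an image and return its activation map."""
    if pad:
        image = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    fh, fw, _ = filt.shape
    ih, iw, _ = image.shape
    # Output size per spatial dimension: (W - F + 2P) / S + 1
    oh = (ih - fh) // stride + 1
    ow = (iw - fw) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            patch = image[y*stride:y*stride+fh, x*stride:x*stride+fw, :]
            out[y, x] = np.sum(patch * filt)  # dot product of the 27 weights
    return out

image = np.random.rand(220, 220, 3)   # one 220x220x3 color image
filt = np.random.rand(3, 3, 3)        # one 3x3x3 learnable filter
activation = convolve(image, filt)
print(activation.shape)  # (218, 218), since (220 - 3)/1 + 1 = 218
```

A real conv layer holds many such filters and stacks their activation maps along the depth axis.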
A pooling layer reduces the size of the previous layer's output and, with it, the number of computations required in the following layers. The most commonly used kind applies the max function and is referred to as a max pooling layer. Other pooling layers use the L2-norm or the average, but for the purposes of this post only max pooling is discussed.
The max pooling layer reduces the dimensions of its input by, like the convolutional layer, sliding a window across the input, taking the maximum value in each window, and placing that value in a newly constructed matrix, thereby downsampling the image.
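A minimal sketch of 2x2 max pooling with stride 2 (again illustrative Python, not the post's MATLAB code):

```python
import numpy as np

def max_pool(mat, size=2, stride=2):
    """Downsample a 2D array by taking the max of each size x size window."""
    h, w = mat.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = mat[y*stride:y*stride+size,
                            x*stride:x*stride+size].max()
    return out

a = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]])
# The 4x4 input becomes 2x2: the maxima of the four blocks are 4, 8, 9, 7.
print(max_pool(a))
```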
A ReLU layer applies the function f(x) = max(0, x) to every value of its input; that is, it sets each negative element to 0. This reduces training time. The output of this layer has the same dimensions as the input.
The normalization layer first calculates the batch mean and the batch variance, then uses them to normalize each of the 3 color channels. With x_i an element of a batch of m values, the batch mean is calculated as μ_B = (1/m) Σ_i x_i, the batch variance as σ_B² = (1/m) Σ_i (x_i − μ_B)², and the normalized value as x̂_i = (x_i − μ_B) / √(σ_B² + ε), where ε is a small constant added to improve numerical stability. Finally, the output is y_i = γx̂_i + β, where the offset β and the scale factor γ are learnable.
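A sketch of these batch-normalization formulas applied to one channel; the γ, β, and ε values below are illustrative defaults, not learned parameters:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of values, then scale and shift."""
    mu = x.mean()                          # batch mean
    var = x.var()                          # batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize (eps for stability)
    return gamma * x_hat + beta            # learnable scale and offset

channel = np.random.rand(220, 220)  # one color channel of a 220x220 image
normed = batch_norm(channel)
print(f"mean={normed.mean():.3f}, std={normed.std():.3f}")  # mean ≈ 0, std ≈ 1
```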
Figure 2: Mosaic of images formed using MATLAB's deep learning tools with an image of the Andromeda Galaxy
Gradient descent (GD) is used to find local minima of functions. GD relies on the observation that a function decreases fastest in the direction of the negative gradient, so repeatedly stepping in that direction approaches a minimum. The problem is that the full gradient must be computed at every step, which is where Stochastic Gradient Descent (SGD) comes in: it uses stochastic approximation. The gradient is approximated from a single sample with an update of the form w = w − l∇Q_i(w), where l is the learning rate, iterating over the samples until convergence.
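The update rule above can be demonstrated on a toy least-squares problem, where Q_i(w) = (x_i·w − y_i)² has gradient 2(x_i·w − y_i)x_i. The data and learning rate here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))      # 200 samples, 2 features
true_w = np.array([2.0, -1.0])
y = X @ true_w                     # noiseless targets

w = np.zeros(2)
lr = 0.05                          # the learning rate l
for epoch in range(20):
    for i in range(len(X)):        # one sample at a time
        grad = 2 * (X[i] @ w - y[i]) * X[i]
        w = w - lr * grad          # w = w - l * ∇Q_i(w)
print(np.round(w, 3))              # ≈ [ 2. -1.]
```

Each step uses only one sample's gradient, yet iterating over the dataset recovers the true weights.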
Adaptive Moment Estimation (Adam), like SGD, minimizes a function, but unlike SGD its learning rate is adaptive. Adam keeps exponential moving averages of the gradient and of the squared gradient and uses them to adapt the step size. You can find the full specification in the paper "Adam: A Method for Stochastic Optimization" by Kingma and Lei Ba.
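A sketch of the Adam update from Kingma and Lei Ba's paper, minimizing the toy function f(w) = w² whose gradient is 2w; the hyperparameters are the paper's suggested defaults:

```python
import numpy as np

def adam_minimize(grad, w, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function given its gradient, using Adam."""
    m = v = 0.0                        # moving averages of g and g^2
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps) # adaptive step
    return w

w = adam_minimize(lambda w: 2 * w, 5.0)
print(w)  # close to the minimum at 0
```

Dividing by √v̂ rescales each step by the recent gradient magnitude, which is what makes the effective learning rate adaptive.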
Galaxy Morphology Classification
There are three main types of galaxies: elliptical, spiral, and irregular. Elliptical galaxies are smooth, textureless, and featureless in character, with a bright nucleus. Spiral galaxies, on the other hand, are characterized by spiral arms of stars emanating from the nucleus of the galaxy. Irregular galaxies have no specific shape and fall into neither of the other classes.
In the classification system used by Galaxy Zoo 2, the shorthand for a galaxy closer to elliptical than spiral starts with 'E', and the shorthand for a galaxy closer to spiral than elliptical starts with 'S'. Classes that start with 'A' mean the SDSS object is a star or the image contains an artifact.
Image and Classification Data
Galaxy Zoo 2 is a data release built by volunteers who go through the Sloan Digital Sky Survey (SDSS), a large photometric sky survey, and classify galaxies. The names and classifications of the galaxies were obtained from Galaxy Zoo 2. A program was created in C++ that uses the libcurl library to iterate through Galaxy Zoo 2's database, querying the SDSS with each galaxy's RA and Dec to obtain a JPG image for every entry. It then places each image in a folder named after the classification of the galaxy it contains, which makes importing them into MATLAB easier. In total, 175,906 classifications and images of galaxies were collected.
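The fetching step can be sketched in Python rather than the C++/libcurl program used here. The SkyServer cutout endpoint and parameter names below are assumptions about the SDSS web API and may need adjusting per data release; the folder layout at the end mirrors the one described above.

```python
# Assumed SkyServer image-cutout endpoint (illustrative; verify against
# the SDSS documentation for the data release in use).
BASE = "http://skyserver.sdss.org/dr14/SkyServerWS/ImgCutout/getjpeg"

def cutout_url(ra, dec, width=220, height=220, scale=0.4):
    """Build a URL requesting a width x height JPG cutout at (ra, dec)."""
    return (f"{BASE}?ra={ra}&dec={dec}&scale={scale}"
            f"&width={width}&height={height}")

print(cutout_url(10.68, 41.27))
# Each downloaded JPG would then be saved under a folder named for its
# Galaxy Zoo 2 class, e.g. images/S/<objid>.jpg
```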
The architecture of the CNN was modeled after LeNet-5, a classic CNN whose purpose is to classify images of handwritten digits. LeNet-5 alternates convolutional and pooling layers, followed by fully connected (FC) layers and an output layer, and the CNN used here follows the same pattern.
The filter size of all the conv layers is 3x3, with the number of filters starting at 64 and doubling every other conv layer until it reaches 256 per conv layer. The max pooling layers are 2x2 with a stride of 2. The input layer takes RGB images of dimension 220x220, and the output layer gives the most probable classification for the image, 'E', 'S', or 'A', using Galaxy Zoo 2's classification system.
To train the CNN, the Adam algorithm was used. The network was trained over 10 epochs, meaning it was trained over the entire dataset 10 times. The CNN was built and trained using MATLAB on a Graphics Processing Unit (GPU), whose many slower cores allow for faster training. More specifically, the training was done on an NVIDIA GTX Titan, which has 2688 CUDA cores, and took about 100 hours. The CNN was trained on 80% of the 175,906 images, leaving 20% of the images to test and validate that the CNN works.
Figure 3: A confusion matrix is a common tool for evaluating classification systems. It is used to visualize classifications and misclassifications of a given dataset. For example, the square where the second row and third column intersect displays how many elliptical galaxy images were misclassified as spiral galaxies according to the classifications from Galaxy Zoo 2.
After training, the remaining 20% of the images were used to check the CNN. From Figure 3, you can see that 82.3% of those images were classified correctly according to the Galaxy Zoo 2 classifications. 84.4% of the spiral galaxy images were classified correctly and 15.6% incorrectly; for elliptical galaxy images, 80.1% were correct and 19.9% incorrect. However, 98.3% of the 'A' class images were classified incorrectly. So while the CNN's performance on spiral and elliptical galaxies is above 80%, its performance on 'A' class images was poor. 'A' class images are images that could be galaxies or stars, but if a galaxy is there, it is obstructed in the image, disallowing proper classification.
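The per-class percentages above come straight from the confusion matrix: divide each diagonal entry by its row total. The counts below are made up for illustration (chosen to roughly echo the quoted percentages), not the actual test results.

```python
import numpy as np

classes = ["A", "E", "S"]
# Rows are true classes, columns are predicted classes (illustrative counts).
conf = np.array([[   2,   40,   75],
                 [   5, 1100,  270],
                 [   8,  300, 1700]])

per_class = conf.diagonal() / conf.sum(axis=1)  # correct / total per true class
overall = conf.diagonal().sum() / conf.sum()    # total correct / all images
for name, acc in zip(classes, per_class):
    print(f"{name}: {acc:.1%}")
print(f"overall: {overall:.1%}")
```

Off-diagonal entries, like row 'E' column 'S', count specific kinds of misclassification, which is what makes the matrix more informative than a single accuracy number.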
While training a CNN is extremely computationally expensive, cheap and widely available computing power from GPUs and Tensor Processing Units (TPUs), together with the small computational cost of using a trained CNN, means the benefits outweigh the cost. Once further optimized, this system will be able to classify large amounts of visual information in a very short time compared to human visual classification of galaxies.
Possible improvements to the CNN's performance include changing the architecture to a non-linear layout similar to Google's Inception-v4 pretrained network, using more computational power than a single GPU in order to train the network closer to convergence and to increase the complexity of the CNN, and including more 'A' class images so the CNN can properly learn to recognize obstructions and objects that are not galaxies. Expanding the CNN to cover galaxy subclasses and/or even other objects is a natural next step.
References

Diederik P. Kingma and Jimmy Lei Ba, Adam: A Method for Stochastic Optimization, arXiv:1412.6980.

Ross E. Hart, Steven P. Bamford, Kyle W. Willett, Karen L. Masters, Caroline Cardamone, Chris J. Lintott, Robert J. Mackey, Robert C. Nichol, Christopher K. Rosslowe, Brooke D. Simmons, Rebecca J. Smethurst, Galaxy Zoo: comparing the demographics of spiral arm number and a new method for correcting redshift bias, Monthly Notices of the Royal Astronomical Society, Volume 461, Issue 4, 1 October 2016, Pages 3663-3682.

CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University, May 2017, cs231n.github.io/convolutional-networks/

Sergey Ioffe and Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, arXiv:1502.03167v3.

de Vaucouleurs, G., Classification and Morphology of External Galaxies, Astrophysik IV: Sternsysteme / Astrophysics IV: Stellar Systems, Springer Berlin Heidelberg, 1959, pp. 275-310, doi:10.1007/978-3-642-45932-0_7.

Kyle W. Willett, Chris J. Lintott, Steven P. Bamford, Karen L. Masters, Brooke D. Simmons, Kevin R. V. Casteels, Edward M. Edmondson, Lucy F. Fortson, Sugata Kaviraj, William C. Keel, Thomas Melvin, Robert C. Nichol, M. Jordan Raddick, Kevin Schawinski, Robert J. Simpson, Ramin A. Skibba, Arfon M. Smith, Daniel Thomas, Galaxy Zoo 2: detailed morphological classifications for 304,122 galaxies from the Sloan Digital Sky Survey.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, Gradient-Based Learning Applied to Document Recognition, Proc. of the IEEE, November 1998.

Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv:1602.07261.