Recently, I started work on a C++ library with the purpose of easily creating and training Neural Networks.

Introduction

At the Large Hadron Collider (LHC), according to the CERN website, 600 million particles collide per second. That produces 30 petabytes of data per year, and sifting through it manually is simply not feasible. So, various statistical methods are used instead, including decision trees and shallow neural networks.

For this post, I will create a shallow neural network, a network with only 1 hidden layer, train it, and test its performance using data from Monte Carlo simulations from the following site. As stated before, I used a C++ neural network utility library that I'm currently working on called CppNNet.

Higgs Boson

What is the Higgs Boson? It is an elementary particle of the Standard Model which, according to quantum field theory, results from a perturbation of the Higgs field. In the 1950s, scientists found similarities between neutrons and protons that hinted at a sort of local symmetry, which we now understand in terms of quarks. But then the problem of symmetry breaking arises, since the up and down quarks have different masses and electric charges. This can be traced back to the existence of the Higgs boson, and even the fact that fermions, like the electron, have mass can be traced back to the Higgs.

If a fermion were massless, it would always travel at the speed of light, because, as Sean Carroll puts it in his book "The Particle at the End of the Universe", "That is what mass-less particles do." Fermions have a quantity called helicity, which is the projection of the spin onto the momentum. This means that as our relative velocity to a fermion changes, so does its helicity; but for a massless fermion the helicity would be constant, just as the speed of light in a vacuum is. The importance of this is that, in nature, the gauge bosons of the weak force couple to particles with helicity in one direction and not the other, unlike the strong force, gravity, and electromagnetism, which couple to both directions. This doesn't really make sense when helicity and direction depend on relative velocity, unless the fermion is massless. With the Higgs field we see that the symmetry is broken, bosons and leptons have mass, and both directions of helicity are allowed. For more information refer to appendix 1 of the book mentioned above.

Measured data

The data contains various physical, kinematic aspects of particle collisions that would be detected at the LHC, simulated using Monte Carlo methods. Each instance collects various aspects of a simulated particle collision, including the amount of missing energy and its direction. All in all, there are 21 collected features and 7 high-level features which were calculated from the 21 collected features. Each row of the data file provided in the archive starts with a 0 when the sample is background or a 1 when it contains the Higgs, followed by the 28 total features mentioned above. (link)

To make things easier to calculate, I made a few modifications to the format of the data file. Instead of having the classification 0 or 1 at the start, I moved it to the end and changed the format so that there is an additional output classification column. For instance, a background sample in the original file has a classification column at the front with a 0, but in the modified file it has two columns at the end with a 0 in the first and a 1 in the second. Similarly, for a sample with the Higgs, the modified last columns have a 1 in the front and a 0 at the end. Here is a link to the modified data. I recommend decompressing the file before using it, as I haven't tested CppNNet with large compressed files. It might work, though.
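If you want to reproduce this conversion yourself, here is a minimal sketch of the idea. It is not part of CppNNet; it assumes a plain comma-separated file with the classification as the first column, reads the original file from standard input, and writes the modified rows to standard output.

#include <iostream>
#include <string>

int main() {
  std::string line;
  while (std::getline(std::cin, line)) {
    // The first column is the class label: 0 = background, 1 = Higgs signal
    std::size_t comma = line.find(',');
    if (comma == std::string::npos) continue;
    std::string label = line.substr(0, comma);
    std::string features = line.substr(comma + 1);
    bool is_higgs = std::stof(label) > 0.5f;
    // Append the two one-hot output columns: Higgs -> 1,0  background -> 0,1
    std::cout << features << (is_higgs ? ",1,0" : ",0,1") << '\n';
  }
  return 0;
}

Something like ./convert < HIGGS.csv > HIGGS_modified.csv would then produce the reformatted file.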

The Code

Now, as I have said, I used a C++ neural network library that I'm working on called CppNNet. I just have to say, if you would like to help with the development of the library, I welcome it!!!

First of all, I haven't really tested the library in a Windows environment, so to build it there you're on your own for now. To build it on Linux, you need at least version 7.3.0 of the gcc and g++ compilers together with at least version 2.8.9 of CMake and git. The Hunter package manager should automatically download and compile any dependencies while compiling.

To build on Ubuntu:

sudo apt install git cmake build-essential
git clone https://github.com/anhydrous99/CppNNet
mkdir CppNNet/build && cd CppNNet/build
cmake ..
make

All you have to do then is link against the static library that is created, using the g++ -l option, and point -I at the include folder.
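For example, assuming the build step above produced a static library named libCppNNet.a in the build directory and that the headers live in the repository's include folder (the exact library name and paths may differ on your machine), the compile command might look something like this:

g++ -std=c++11 main.cpp -I../CppNNet/include -L../CppNNet/build -lCppNNet -o higgs_classifier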

Let's create a main.cpp and include the following headers in the file:

#include "Activation_Functions.h"
#include "Neural_Layer.h"
#include "Neural_Trainer.h"
#include "CSV_Importer.h"
#include "Normalizer.h"
#include <iostream>

The first include is a small repository of activation functions in the form of std::function, for better modularity. The second contains the Neural_Layer class, which holds the weights and biases for a layer of the network as well as its activation function. The third has the class which will train the network. The fourth contains the class tasked with importing CSV files, which can also import csv files compressed with gzip. The fifth is the class which can squish or expand data to fit a certain interval, which is useful for accelerating and stabilizing the training of neural networks. The standard library headers at the end are for timing, console output, and the string and vector containers used below. Now, let's create our main function.

int main(int argc, char *argv[]) {
  if (argc != 2) {
    std::cerr << "Error: wrong number of arguments. Exiting...\n";
    return 1;
  }

  // TODO
}

The argc != 2 check makes sure that an argument has been passed containing the path to the data file. Then, let's declare the number of inputs, the number of outputs, and the number of neurons as integers in the main function.

// Number of inputs
int inp = 28;
// Number of outputs
int out = 2;
// Number of neurons
int nn = 400;

I set up the layers so that they only depend on each other by using a C++11 feature called shared pointers. These pointers keep track of the number of copies made of them and delete the managed object once all of the copies are destroyed. Here is the reference for shared pointers. And here is the include for the Neural_Layer class. To create a shared pointer properly, it is required to use the std::make_shared template function, passing the constructor's arguments as the arguments of the function. In this case we will create a two layer network; currently I'm working on a batch normalization layer which will combat the vanishing gradient problem in larger networks.

// Create Layers
std::shared_ptr<Neural_Layer> layer1 = std::make_shared<Neural_Layer>(nn, inp, Logistic_Function);
std::shared_ptr<Neural_Layer> layer2 = std::make_shared<Neural_Layer>(out, nn, layer1);

In the first constructor, the arguments are the number of neurons to use and the number of inputs each neuron will receive; the third argument is optional, as it defaults to the linear function, but we will use the logistic function as the activation function. The declaration of Logistic_Function is in the Activation_Functions.h header. Let's now import the data.

// Import Data
std::string path = argv[1];
CSV_Importer imp(path, inp, out);
std::vector<Evector> samples = imp.GetSamples();
std::vector<Evector> targets = imp.GetTargets();

CppNNet uses the Eigen3 library as a backend, and I have typedef'ed Eigen::VectorXf to Evector and Eigen::MatrixXf to Ematrix, reference. Each sample gets its own Eigen vector inside a standard library vector, resulting in std::vector<Evector>. To help the optimization and avoid vanishing gradients from saturating the logistic activation, given the large number of samples, we will normalize the inputs to the output range of the logistic function, between 0 and 1.

// Normalize data
Normalizer samplen(samples, 0, 1);
std::vector<Evector> normed_samples = samplen.get_batch_norm(samples);

Unfortunately, to train the network, the derivative of each activation function is needed, and I have not yet programmed an intuitive way of detecting when a function from the Activation_Functions.h header is used and automatically using its derivative. For now we have to tell the training class which derivative functions to use via a standard library vector. I have typedef'ed the std::function type as function.

// Create derivative function vector
std::vector<function> derv_funs;
derv_funs.push_back(Logistic_Function_D);
derv_funs.push_back(Identity_Function_D);

Now, we declare the training class

// Create Trainer
Neural_Trainer trainer(layer2, derv_funs);

The trainer needs to know where the last layer is located and which derivative functions to use. The first is accomplished by passing it the pointer to the last layer, and the second by the vector of derivative functions. Let's train! The best way to train the network is to use the function that trains it via mini-batches, that is, to split up the samples and have the split-up batches train the network individually; this allows for an efficient multi-threaded implementation using OpenMP. I trained in 100,000-sample batches out of 11,000,000 total samples.

std::cout << "Starting to Train\n";
auto start = std::chrono::steady_clock::now();
for (int i = 0, sizei = 100; i < sizei; i++) {
  trainer.train_minibatch(normed_samples, targets, 100000);
  std::cout << "Epoch: " << i << std::endl;
}
auto end = std::chrono::steady_clock::now();

As shown above, I trained for 100 epochs, meaning over the entire dataset 100 times. The lines that use the chrono library measure how much time the training took. Finally, we need some kind of indicator of performance. I have included multiple performance metrics in the Neural_Layer class, but I used the Mean Squared Error for simplicity.
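For reference, one common definition of the Mean Squared Error over $N$ samples (CppNNet's exact averaging convention may differ slightly) is

$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \lVert \hat{y}_i - y_i \rVert^2$

where $y_i$ is the two-component target vector and $\hat{y}_i$ is the network's output for sample $i$.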

float mse = layer2->mse();
std::cout << "MSE: " << mse << std::endl;
return 0;

The Mean Squared Error after training was $0.249682$.
Output:

MSE:  0.249682

SOURCE CODE
LIBRARY

We now have a pretty good classifier for data from the LHC as to whether it contains the Higgs boson.

Citation

Baldi, P., P. Sadowski, and D. Whiteson. “Searching for Exotic Particles in High-energy Physics with Deep Learning.” Nature Communications 5 (July 2, 2014). (link, link)

Further Reading