Neural networks are often discussed in a purely conceptual way. That leaves plenty of room for misunderstanding for anyone trying to understand their mechanics.

# Intro

Here, I'm going to give a mathematical description of Feedforward Neural Networks specifically. The basic anatomy of a neural network consists of neurons and the connections between them. At this point I could draw an analogy to the neurons of the human brain, but I am not going to. I'll start by explaining the simplest possible Feedforward Neural Network (FNN). As a reference, I used the book Neural Network Design, 2nd Edition, by Martin T. Hagan et al.

Neural Networks (NNs) are sets of nonlinear equations whose inputs are data and whose outputs are the desired results. Such a set of nonlinear equations can be written in vector notation as $$\vec{a}=\vec{h}(\vec{x})$$ where $\vec{a}$ is the output vector, $\vec{h}$ is a nonlinear vector function, and $\vec{x}$ is the input vector containing the input data.

## Notation

Here I denote a column vector of arbitrary size with an arrow overhead, for example, $$

\vec{a} = \begin{bmatrix}

a_1 \\

a_2 \\

\vdots \\

a_n

\end{bmatrix}

$$ where $n$ is the size of the vector. Its transpose is a row vector $$

\vec{a}^T = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}

$$

$R^n$ denotes real $n$-space, which can be used to describe a vector. For instance, $\vec{a}\in R^n$ means that $\vec{a}$ is a vector of $n$ real numbers.

For matrices, the notation is $R^{n\times m}$, denoting a real matrix of size $n\times m$ ($n$ rows and $m$ columns). I will denote a matrix with a line over the letter, for instance, $\bar{a}\in R^{n\times m}$.
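To make these conventions concrete, here is a minimal plain-Python sketch. It assumes, purely for illustration, that a vector in $R^n$ is stored as a list of $n$ floats and a matrix in $R^{n\times m}$ as a list of $n$ rows, each holding $m$ floats. The helpers `shape` and `transpose` are hypothetical names, not part of any library.

```python
from typing import List, Tuple

Vector = List[float]        # assumption: an element of R^n stored as n floats
Matrix = List[List[float]]  # assumption: an element of R^{n x m} stored as n rows of m floats


def shape(a: Matrix) -> Tuple[int, int]:
    """Return (n, m) for a matrix stored as n rows of length m."""
    n = len(a)
    m = len(a[0]) if n > 0 else 0
    if any(len(row) != m for row in a):
        raise ValueError("all rows must have the same length m")
    return (n, m)


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns: an n x m matrix becomes an m x n matrix."""
    n, m = shape(a)
    return [[a[i][j] for i in range(n)] for j in range(m)]


# A column vector in R^3 viewed as a 3 x 1 matrix; its transpose is a 1 x 3 row vector.
col = [[1.0], [2.0], [3.0]]
print(shape(col), transpose(col))  # (3, 1) [[1.0, 2.0, 3.0]]
```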

### Inner product

The inner product of two vectors results in a scalar and is defined by $$<\vec{a},\vec{b}>:=\vec{a}^T\vec{b}=\sum_{i=1}^n a_i b_i\bigg|\vec{a},\vec{b}\in R^n$$ To simplify notation, I will assume that whenever two vectors are written as a product, the inner product is meant: $\vec{a}\vec{b}:=<\vec{a}, \vec{b}>$.
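As a quick check of the definition, here is a minimal plain-Python sketch of the inner product. It assumes vectors are stored as lists of floats; `inner` is a hypothetical helper name.

```python
from typing import Sequence


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """<a, b> = a^T b = sum_i a_i * b_i for a, b in R^n."""
    if len(a) != len(b):
        raise ValueError("the inner product needs two vectors of the same size n")
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(inner(a, b))  # 1*4 + 2*5 + 3*6 = 32.0
```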

### Matrix-Vector Multiplication

Matrix-vector multiplication follows from the definition of matrix multiplication by treating the vector as an $m\times 1$ matrix. So, for a matrix $\bar{a}\in R^{n\times m}$ and a vector $\vec{b}\in R^m$,

\begin{align}

\bar{a}\vec{b}&=\begin{bmatrix}

a_{1,1} & a_{1,2} & \dots & a_{1,m} \\

a_{2,1} & a_{2,2} & \dots & a_{2,m} \\

\vdots & \vdots & \ddots & \vdots \\

a_{n,1} & a_{n,2} & \dots & a_{n,m}

\end{bmatrix}\begin{bmatrix}

b_1 \\

b_2 \\

\vdots \\

b_m

\end{bmatrix} \\&

=\begin{bmatrix}

\sum_{i=1}^m a_{1,i}b_i \\

\sum_{i=1}^m a_{2,i}b_i \\

\vdots \\

\sum_{i=1}^m a_{n,i}b_i

\end{bmatrix}

\end{align}

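It can also be written in component notation: $$(\bar{a}\vec{b})_{i} = \sum_{j=1}^m a_{i,j}b_j\bigg|i\in \mathbb{Z}\cap[1,n]$$ In other words, component $i$ of the result is the inner product of row $i$ of $\bar{a}$ with $\vec{b}$.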
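Here is a minimal plain-Python sketch of the component formula. It assumes the matrix is stored as a list of $n$ rows and the vector as a list of $m$ floats; `matvec` is a hypothetical name. Python indexes from 0, so component $i$ of the formula is `result[i - 1]` in the code.

```python
from typing import List, Sequence


def matvec(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Compute a b for a in R^{n x m} and b in R^m.

    Component i of the result is sum_j a_{i,j} b_j, i.e. the inner
    product of row i of a with b.
    """
    m = len(b)
    result = []
    for row in a:
        if len(row) != m:
            raise ValueError("each row of a must have m = len(b) entries")
        result.append(sum(a_ij * b_j for a_ij, b_j in zip(row, b)))
    return result


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # n = 2, m = 3
b = [1.0, 0.0, -1.0]   # b in R^3
print(matvec(a, b))    # [-2.0, -2.0]
```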

# Single Neuron Single Input

To keep the introduction simple, I think it's best to start with the simplest possible FNN: one with a single scalar input and a single scalar output. In this case, the function $h$ is $$a(x)=f(wx+b)$$ where $w$ (the weight) and $b$ (the bias) are constants, and $f$ is a transfer function, which can be linear or nonlinear and whose purpose is to turn the input information into the output. For example, a piecewise function can turn the input data into 0s and 1s. As you can see, we are basically passing a line through an activation (transfer) function.
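The single-input neuron can be sketched in a few lines of plain Python. The transfer functions are assumptions chosen for illustration: `hardlim` is a step function that produces the 0s and 1s mentioned above, and `logsig` is the logistic sigmoid. The names follow common neural-network usage, and the weight and bias values are arbitrary.

```python
import math
from typing import Callable


def hardlim(n: float) -> float:
    """Piecewise step transfer function: 1 if n >= 0, else 0."""
    return 1.0 if n >= 0 else 0.0


def logsig(n: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-n}), written to avoid overflow for large |n|."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def neuron(x: float, w: float, b: float,
           f: Callable[[float], float] = hardlim) -> float:
    """Single-input neuron: a = f(w x + b)."""
    return f(w * x + b)


w, b = 2.0, -1.0                    # example weight and bias (arbitrary values)
print(neuron(0.25, w, b))           # w x + b = -0.5 -> 0.0
print(neuron(1.0, w, b))            # w x + b =  1.0 -> 1.0
print(neuron(0.5, w, b, f=logsig))  # w x + b =  0.0 -> 0.5
```

Swapping `f` changes the neuron's behaviour without touching the line $wx+b$ itself, which is the point the paragraph above makes.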

# Single Neuron Multiple Input

The next simplest FNN is one with $n$ inputs and a single output. In this case, the function is $$a(\vec{x})=f(<\vec{w},\vec{x}>+b)$$ where $<\vec{w},\vec{x}>$ is the inner product and is equal to $$<\vec{w},\vec{x}>=\Bigg<\begin{bmatrix}

w_1 \\

w_2 \\

\vdots \\

w_n

\end{bmatrix},\begin{bmatrix}

x_1 \\

x_2 \\

\vdots \\

x_n

\end{bmatrix}\Bigg>:=\vec{w}^T\vec{x}=\sum_{i=1}^n w_ix_i$$ so $$a(\vec{x})=f\bigg(\sum_{i=1}^n w_ix_i + b\bigg)$$ In the image above, $p$ is used instead of $x$.
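A minimal plain-Python sketch of the multi-input neuron, assuming $\vec{w}$ and $\vec{x}$ are lists of $n$ floats. The weights, input, and bias are arbitrary example values, and `logsig` (the logistic sigmoid) is just one possible choice for $f$.

```python
import math
from typing import Callable, Sequence


def logsig(n: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-n}), written to avoid overflow for large |n|."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def neuron(x: Sequence[float], w: Sequence[float], b: float,
           f: Callable[[float], float]) -> float:
    """Multi-input neuron: a = f(<w, x> + b) = f(sum_i w_i x_i + b)."""
    if len(w) != len(x):
        raise ValueError("w and x must both have n entries")
    net = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return f(net)


w = [0.5, -1.0, 2.0]  # example weights (arbitrary values)
x = [2.0, 1.0, 0.5]   # example input in R^3
b = -1.0              # example bias
# net input: 0.5*2 + (-1)*1 + 2*0.5 + (-1) = 0.0
print(neuron(x, w, b, f=lambda n: n))  # linear transfer: 0.0
print(neuron(x, w, b, f=logsig))       # logistic sigmoid: 0.5
```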

# The Neuron Layer

In a layer of neurons, we can have multiple inputs feeding multiple neurons, resulting in multiple outputs. The weight vector, $\vec{w}$, becomes a weight matrix, $\bar{w}$, whose size is determined by both the number of inputs and the number of neurons. The scalar $b$ becomes the vector $\vec{b}$. We can select different activation functions for different neurons, and the number of outputs of the layer equals the number of neurons in the layer. So, $$ \vec{a}(\vec{x})=\vec{f}(\bar{w}\vec{x}+\vec{b}) $$ Let's say that there are $N$ neurons and $M$ inputs. Then $\vec{x}\in R^M$, $\vec{a},\vec{b}\in R^N$, and $\bar{w}\in R^{N\times M}$, so row $i$ of $\bar{w}$ holds the weights of neuron $i$. In component notation, $$a_i=f_i\bigg(\sum_{j=1}^M w_{i,j} x_j + b_i\bigg)\bigg|i\in \mathbb{Z}\cap[1,N]$$
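Here is a minimal plain-Python sketch of one layer, assuming $\bar{w}$ is stored as $N$ rows of $M$ weights and that each neuron gets its own transfer function in the list `fs`. The helper names (`layer`, `purelin`, `hardlim`) and all numbers are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Transfer = Callable[[float], float]


def purelin(n: float) -> float:
    """Linear transfer function: a = n."""
    return n


def hardlim(n: float) -> float:
    """Step transfer function: 1 if n >= 0, else 0."""
    return 1.0 if n >= 0 else 0.0


def layer(x: Sequence[float], w: Sequence[Sequence[float]],
          b: Sequence[float], fs: Sequence[Transfer]) -> List[float]:
    """One layer of N neurons with M inputs: a_i = f_i(sum_j w_{i,j} x_j + b_i).

    w is N x M (row i holds neuron i's weights); b and fs have one entry per neuron.
    """
    n_neurons, m_inputs = len(w), len(x)
    if len(b) != n_neurons or len(fs) != n_neurons:
        raise ValueError("b and fs need one entry per neuron (N)")
    a = []
    for i in range(n_neurons):
        if len(w[i]) != m_inputs:
            raise ValueError("each row of w needs one weight per input (M)")
        net_i = sum(w[i][j] * x[j] for j in range(m_inputs)) + b[i]
        a.append(fs[i](net_i))
    return a


w = [[1.0, -1.0],
     [0.5, 0.5],
     [2.0, 0.0]]                # N = 3 neurons, M = 2 inputs
b = [0.0, -1.0, -5.0]
x = [2.0, 1.0]
fs = [purelin, hardlim, hardlim]  # a different transfer function per neuron
print(layer(x, w, b, fs))         # [1.0, 1.0, 0.0]
```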

# Neural Network

In a neural network, we have the input layer, the output layer, and the hidden layers in between.

Using the equation for a neuron layer, we can build FNNs by feeding the output of each layer in as the input of the next layer. So, let's consider an FNN with 3 layers: the first contains $N_1$ neurons and has $M$ inputs; the second, the hidden layer, contains $N_2$ neurons; and the third, whose size is tied to the number of outputs, has $N_3$ neurons. Since the activation functions can differ between neurons in the same layer and between layers, I will write the activation function $f$ and the other variables in component notation, with a superscript denoting the layer in which they reside and a subscript showing which neuron they belong to.

\begin{align}

&\text{first layer} \\

a _i^1&=f_i^1\bigg( \sum _{j=1}^{M} w _{i,j}^1 x _j + b _i^1\bigg)&\bigg|i\in \mathbb{Z}\cap [1,N _1] \\

&\text{second layer} \\

a _i^2&=f_i^2\bigg( \sum _{j=1}^{N_1} w _{i,j}^2 a _j^1 + b _i^2 \bigg) & \bigg | i \in \mathbb{Z} \cap [1,N _2] \\

&\text{third layer} \\

a _i^3&=f_i^3\bigg( \sum _{j=1}^{N_2} w _{i,j}^3 a _j^2 + b _i^3 \bigg) & \bigg | i \in \mathbb{Z} \cap [1,N _3]

\end{align} After setting $\vec{x}=\vec{a}^0$ and $N_0=M$, and letting $P$ be the number of layers, we can combine these into a single recurrence relation: $$a_j^i=f_j^i\Bigg( \sum _{k=1}^{N _{i-1}} w _{j,k}^i a_k^{i-1} + b_j^i \Bigg) \Bigg|j\in \mathbb{Z}\cap [1,N_i] :i\in \mathbb{Z}\cap [1,P]$$ The output of the network is then $\vec{a}^P$.
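The recurrence translates directly into a loop that starts from $\vec{a}^0=\vec{x}$ and applies one layer at a time. This is a minimal plain-Python sketch under the same storage assumptions as before (matrices as lists of rows). The 3-layer example ($M=2$, $N_1=3$, $N_2=2$, $N_3=1$) uses arbitrary weights and only linear and step transfer functions so the result can be checked by hand; `forward`, `layer`, `purelin`, and `hardlim` are hypothetical helper names.

```python
from typing import Callable, List, Sequence

Transfer = Callable[[float], float]


def purelin(n: float) -> float:
    """Linear transfer function: a = n."""
    return n


def hardlim(n: float) -> float:
    """Step transfer function: 1 if n >= 0, else 0."""
    return 1.0 if n >= 0 else 0.0


def layer(x: Sequence[float], w: Sequence[Sequence[float]],
          b: Sequence[float], fs: Sequence[Transfer]) -> List[float]:
    """a_j = f_j(sum_k w_{j,k} x_k + b_j) for every neuron j of one layer."""
    if len(b) != len(w) or len(fs) != len(w):
        raise ValueError("b and fs need one entry per neuron")
    a = []
    for w_row, b_j, f_j in zip(w, b, fs):
        if len(w_row) != len(x):
            raise ValueError("each row of w needs one weight per input")
        a.append(f_j(sum(w_jk * x_k for w_jk, x_k in zip(w_row, x)) + b_j))
    return a


def forward(x: Sequence[float], weights, biases, transfers) -> List[float]:
    """Apply a^i = f^i(w^i a^{i-1} + b^i) for i = 1..P, starting from a^0 = x."""
    if not (len(weights) == len(biases) == len(transfers)):
        raise ValueError("need one weight matrix, bias vector and transfer list per layer")
    a = list(x)  # a^0 = x, so N_0 = M
    for w, b, fs in zip(weights, biases, transfers):
        a = layer(a, w, b, fs)  # the output of layer i-1 is the input of layer i
    return a  # a^P, the network output


weights = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]],  # w^1: N_1 x M = 3 x 2
    [[1.0, 1.0, 1.0], [1.0, -1.0, 2.0]],    # w^2: N_2 x N_1 = 2 x 3
    [[2.0, 3.0]],                           # w^3: N_3 x N_2 = 1 x 2
]
biases = [[0.0, -1.0, 0.0], [0.0, 0.0], [0.5]]
transfers = [[purelin] * 3, [hardlim] * 2, [purelin]]

# a^1 = [1, 1, -1], a^2 = [1, 0], a^3 = [2*1 + 3*0 + 0.5]
print(forward([1.0, 2.0], weights, biases, transfers))  # [2.5]
```

Python lists are 0-indexed, so layer $i$ in the recurrence corresponds to `weights[i - 1]` here.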