Consider a simple neural network architecture. It has three layers: layer 0, the input layer; layer 1, the hidden layer; and layer 2, the output layer. The last layer is denoted \(L\). Layer 0 has three input neurons, the hidden layer has four neurons, and the last layer has two neurons corresponding to the two outputs, in our case the probabilities of two events. This gives a hierarchical function specification of the form:
\[
f(x;w)=\sigma ^{2}\left( z^{2}\left( \sigma ^{1}\left( z^{1}(x,w^{1})\right)
,w^{2} \right) \right) \equiv f_{w^{2}}^{2}\circ f_{w^{1}}^{1}(x).
\]
At each layer \(i\), the function \(z^{i}(a^{i-1},w^{i})=w^{i}\cdot a^{i-1}\) is a linear aggregator of the previous layer's activations. The function \(\sigma ^i\), applied elementwise and of the same dimension as \(z^i\), is a squashing function, known as the activation function.
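To make the composition concrete, here is a minimal sketch of the forward pass \(f_{w^{2}}^{2}\circ f_{w^{1}}^{1}(x)\) for the 3-4-2 network above. The text does not pin down the squashing functions, so the logistic sigmoid for \(\sigma^{1}\) and a softmax for \(\sigma^{2}\) (so the two outputs form event probabilities) are assumptions here, as is the absence of bias terms, matching \(z^{i}=w^{i}\cdot a^{i-1}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random weights: w^1 maps the 3 inputs to the 4 hidden
# neurons, w^2 maps the 4 hidden activations to the 2 outputs.
w1 = rng.normal(size=(4, 3))
w2 = rng.normal(size=(2, 4))

def sigmoid(z):
    """sigma^1: elementwise squashing function (assumed logistic)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """sigma^2: squashes the 2 outputs into probabilities summing to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, w1, w2):
    """f(x; w) = sigma^2(z^2(sigma^1(z^1(x, w^1)), w^2))."""
    a1 = sigmoid(w1 @ x)   # z^1 = w^1 . a^0, with a^0 = x
    a2 = softmax(w2 @ a1)  # z^2 = w^2 . a^1
    return a2

x = np.array([0.5, -1.2, 3.0])  # a^0: the three input values
print(forward(x, w1, w2))       # two event probabilities
```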
Such a network is called a deep neural network, or deep learning model, because it can have more than one hidden layer, which yields better approximations than the original neural network models with only one hidden layer.