If I had to sum up backpropagation, the best option would be to write pseudocode for it. Before that, this is a good point to understand what backpropagation is. Some of you might be wondering why we need to train a neural network, or what training even means. In the drawing above, the circles represent neurons while the lines represent synapses.

- Backpropagation is a very important part of the field of neural networks because it makes it possible to train deep neural networks with many layers.
- The data points p are updated by taking the dot product between the current activations p and the weight matrix for the current layer, followed by passing the output through our sigmoid activation function (Line 146).
- To finish off the computation of the delta, we multiply it by the derivative of the sigmoid evaluated at the activation for the layer (Line 110).
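
The forward step and the delta computation described in the bullets above can be sketched in NumPy as follows. This is only an illustration: the code that the line numbers refer to is not shown here, so the function names `sigmoid` and `sigmoid_deriv`, the 3-2-1 layer sizes, and the random weights are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(a):
    # Assumes `a` is already the OUTPUT of the sigmoid, so the derivative is a * (1 - a).
    return a * (1.0 - a)

# Hypothetical 3-2-1 network with random weights (no bias terms, to keep the sketch short).
rng = np.random.default_rng(0)
W = [rng.standard_normal((3, 2)), rng.standard_normal((2, 1))]

p = np.array([[0.0, 1.0, 1.0]])   # one input data point
y = np.array([[1.0]])             # its target value

# Forward pass: dot product with each weight matrix, then the sigmoid.
activations = [p]
for w in W:
    p = sigmoid(p.dot(w))
    activations.append(p)

# Delta of the output layer: error times the sigmoid derivative of its activation.
error = activations[-1] - y
delta_out = error * sigmoid_deriv(activations[-1])

# Delta of the hidden layer: push the output delta back through the weights,
# then multiply by the sigmoid derivative of the hidden activation.
delta_hidden = delta_out.dot(W[1].T) * sigmoid_deriv(activations[1])
```

Each delta has the same shape as the activation of the layer it belongs to, which is a quick sanity check when writing this by hand.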

For now, let’s see the error of our network based on an error function. The output layer is the last layer which returns the network’s predicted output. Like the input layer, there can only be a single output layer. If the objective of the network is to predict student scores in the next semester, then the output layer should return a score. The architecture in the next figure has a single neuron that returns the next semester’s predicted score.

## Backpropagation

In this function, o is our predicted output, and y is our actual output. Now that we have the loss function, our goal is to get it as close as we can to 0. As we are training our network, all we are doing is minimizing the loss.
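
The loss function itself is not reproduced in this text, so the sketch below assumes a common choice, half the sum of squared errors between the predicted output `o` and the actual output `y`. The function name `sum_squared_error` is hypothetical.

```python
import numpy as np

def sum_squared_error(o, y):
    # Assumed loss: 0.5 * sum((o - y)^2). It is 0 only when every prediction is exact.
    o = np.asarray(o, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * np.sum((o - y) ** 2)

perfect = sum_squared_error([1.0, 0.0], [1.0, 0.0])   # predictions match the targets
off = sum_squared_error([0.8, 0.3], [1.0, 0.0])       # predictions miss the targets
```

A perfect prediction gives a loss of 0, and the loss grows as the predictions move away from the targets, which is why training amounts to pushing this value toward 0.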

Once we reach an equation that contains the parameters (weights and biases), we have reached the end of the derivative chain. The next figure presents the chain of derivatives to follow to calculate the derivative of the error with respect to the parameters. The output of the activation function from the output neuron is the predicted output for the sample. There is usually a difference between the desired output and the predicted output.

Because calculating the derivative of the error with respect to the parameters directly from this equation is complex, it is preferable to use the multivariate chain rule. To make the example concrete, the values for all inputs, weights, and bias will be added to the network diagram. We’ll also discuss how to implement a backpropagation neural network in Python from scratch using NumPy, based on this GitHub project.
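
As a rough illustration of the chain rule at work, the sketch below uses a single sigmoid neuron with one input and a squared-error function. The neuron, the error function, and the numeric values are assumptions made for this example, not the network from the figures. It multiplies the three links of the chain and compares the result with a finite-difference estimate.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def error(w, b, x, y):
    # Assumed error function: E = 0.5 * (y - o)^2 with o = sigmoid(w * x + b).
    o = sigmoid(w * x + b)
    return 0.5 * (y - o) ** 2

def grad_w_chain_rule(w, b, x, y):
    s = w * x + b
    o = sigmoid(s)
    dE_do = o - y            # derivative of the error w.r.t. the predicted output
    do_ds = o * (1.0 - o)    # derivative of the sigmoid w.r.t. its input
    ds_dw = x                # derivative of the weighted sum w.r.t. the weight
    return dE_do * do_ds * ds_dw

w, b, x, y = 0.5, 0.1, 1.5, 1.0   # made-up values
analytic = grad_w_chain_rule(w, b, x, y)

# Central finite difference as an independent check of the chain-rule result.
eps = 1e-6
numeric = (error(w + eps, b, x, y) - error(w - eps, b, x, y)) / (2 * eps)
```

The two values agree closely, which suggests the product of the individual derivatives really is the derivative of the error with respect to the weight.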

## Loss function

The backpropagation algorithm computes the gradient of the loss function with respect to each weight using the chain rule. It works efficiently one layer at a time, unlike a naive direct computation. Backpropagation computes the gradient, but it does not define how the gradient is used: it calculates the gradients of the error with respect to the weights and biases in the neural network, and gradient descent then uses those gradients to update the weights and biases. You can have many hidden layers, which is where the term deep learning comes into play.

## Step – 3: Putting all the values together and calculating the updated weight value

Basically, we need to figure out whether to increase or decrease the weight value. Once we know that, we keep updating the weight value in that direction until the error reaches a minimum. You might reach a point where, if you update the weight any further, the error will increase. At that point you need to stop, and that is your final weight value.
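
The stopping idea above can be sketched with a single weight. The model (`o = w * x`), the training sample, and the learning rate below are made up for illustration. The loop keeps stepping against the gradient and stops as soon as a further update would no longer reduce the error.

```python
x, y = 2.0, 4.0   # one made-up training sample; the ideal weight is 2.0
lr = 0.1          # assumed learning rate

def error(w):
    return 0.5 * (y - w * x) ** 2

def gradient(w):
    # dE/dw for E = 0.5 * (y - w * x)^2
    return (w * x - y) * x

w = 0.0
steps = 0
for _ in range(1000):
    candidate = w - lr * gradient(w)   # the sign of the gradient picks the direction
    if error(candidate) >= error(w):   # a further update would not lower the error
        break
    w = candidate
    steps += 1
```

In this toy setting the weight settles very close to 2.0. A real network repeats the same update for every weight, using the gradients that backpropagation provides.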

Once we have the net output, we add it to our list of activations (Line 84). On the far left of Figure 2, we present the feature vector (0, 1, 1) (and target output value 1) to the network. Here we can see that 0, 1, and 1 have been assigned to the three input nodes in the network.

## How Forward Propagation Works

There is no shortage of papers online that attempt to explain how backpropagation works, but few that include an example with actual numbers. This post is my attempt to explain how it works with a concrete example that folks can compare their own calculations to in order to ensure they understand backpropagation correctly. How does it work, how is it calculated, and where is it used?

Just as the human nervous system is built from biological neurons, neural networks are built from artificial neurons, which are mathematical functions modeled on biological neurons. The human brain is estimated to have roughly 86 billion neurons, each connected to thousands of other neurons. Each neuron receives a signal through a synapse, which controls the effect of the signal on the neuron.

Given an input from the design matrix, our goal is to correctly predict the target output value. Based on the value of the cost C, the model “knows” how much to adjust its parameters in order to get closer to the expected output y. The output y is part of the training dataset (x, y), where x is the input (as we saw in the previous section).

The pace of training depends on the method you choose. Stochastic gradient descent speeds up training, but fine-tuning the backpropagation algorithm can be tedious. Batch gradient descent, on the other hand, is easier to perform, but the overall learning process takes longer. For these reasons the stochastic approach is often preferred, but it’s important to pick the training method that best fits your circumstances.

The calculate_loss function requires that we pass in the data points X along with their ground-truth labels, targets.
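
To make the difference between the batch and stochastic approaches concrete, here is a minimal sketch on a one-parameter linear model. The dataset (`y = 3x`), the learning rate, and the function names `batch_gd` and `sgd` are assumptions chosen for illustration; they are not taken from the code discussed in this article.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0]   # hypothetical noise-free target, so the ideal weight is 3.0

def grad(w, xb, yb):
    # Gradient of the mean of 0.5 * (w * x - y)^2 over the given samples.
    return np.mean((w * xb - yb) * xb)

def batch_gd(epochs=50, lr=0.5):
    w = 0.0
    for _ in range(epochs):
        w -= lr * grad(w, X[:, 0], y)   # one update per pass over ALL samples
    return w

def sgd(epochs=50, lr=0.5):
    w = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):   # one update per SINGLE sample
            w -= lr * grad(w, X[i:i + 1, 0], y[i:i + 1])
    return w

w_batch = batch_gd()
w_sgd = sgd()
```

For the same number of passes over the data, the stochastic version performs 100 times as many weight updates here, which is one reason it tends to make faster progress. On noisy data its updates also fluctuate more, which is where the extra tuning effort usually goes.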

## Backpropagation

Backpropagation can be considered the cornerstone of modern neural networks and deep learning. In other words, backpropagation aims to minimize the cost function by adjusting the network’s weights and biases. The level of adjustment is determined by the gradients of the cost function with respect to those parameters. The backpropagation algorithm looks for the minimum value of the error function in weight space using a technique called the delta rule or gradient descent. The weights that minimize the error function are then considered to be a solution to the learning problem. Our neural network will model a single hidden layer with three inputs and one output.
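
A network of that shape (three inputs, one hidden layer, one output) can be sketched in a few lines of NumPy. Everything specific below is an assumption made for illustration: the toy dataset, the hidden layer size of 4, the learning rate, the squared-error loss, and the omission of explicit bias terms (the constant third input column stands in for a hidden-layer bias).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(a):
    # Expects the sigmoid OUTPUT, so the derivative is a * (1 - a).
    return a * (1.0 - a)

# Toy dataset: three inputs per sample, one target per sample.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1 = rng.uniform(-1, 1, size=(3, 4))   # input -> hidden (4 hidden units assumed)
W2 = rng.uniform(-1, 1, size=(4, 1))   # hidden -> output
lr = 0.5

def forward(X):
    hidden = sigmoid(X @ W1)
    return hidden, sigmoid(hidden @ W2)

def loss(X, y):
    return 0.5 * np.sum((forward(X)[1] - y) ** 2)

initial_loss = loss(X, y)
for _ in range(5000):
    hidden, out = forward(X)
    # Backward pass: deltas flow from the output layer back to the hidden layer.
    delta_out = (out - y) * sigmoid_deriv(out)
    delta_hidden = (delta_out @ W2.T) * sigmoid_deriv(hidden)
    # Gradient descent step on both weight matrices.
    W2 -= lr * hidden.T @ delta_out
    W1 -= lr * X.T @ delta_hidden
final_loss = loss(X, y)
```

Comparing `initial_loss` with `final_loss` shows whether the updates are moving the weights toward a lower error. How close the outputs get to the targets depends on the initialization and the number of iterations.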

In this case, we will be using a partial derivative to allow us to take into account another variable. As you may have noticed, we need to train our network to calculate more accurate results. To find the class label with the largest probability for each data point, we use the argmax function on Line 35 — this function will return the index of the label with the highest predicted probability. We then display a nicely formatted classification report to our screen on Line 36. We’ll also encode our class label integers as vectors, a process called one-hot encoding.
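
One-hot encoding and the argmax step can both be sketched with plain NumPy. The labels and predicted probabilities below are made up for illustration, and the classification report mentioned above is not reproduced.

```python
import numpy as np

# Integer class labels for four hypothetical samples, with three classes in total.
labels = np.array([0, 2, 1, 2])
num_classes = 3

# One-hot encoding: row i of the identity matrix is the vector for class i.
one_hot = np.eye(num_classes)[labels]

# Made-up predicted probabilities, one row per sample and one column per class.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
])

# argmax along axis 1 returns the index of the most probable class for each row.
predicted = probs.argmax(axis=1)
```

Applying `argmax(axis=1)` to `one_hot` recovers the original integer labels, so the two operations are inverses of each other on valid one-hot rows.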

One important operation in the backward pass is calculating derivatives. Before getting into the derivative calculations of the backward pass, we can start with a simple example to make things easier. For example, the backpropagation algorithm could tell us useful information, such as that increasing the current value of W1 by 1.0 increases the network error by 0.07. This shows us that a smaller value for W1 is better for minimizing the error. To get a practical feel for the importance of the backpropagation algorithm, let’s try to update the parameters directly without using this algorithm.