## Introduction

Backpropagation (or backward propagation) is a computer-aided mathematical method for optimizing predictions in data mining and machine learning. It is a fast algorithm for computing derivatives with great accuracy. During the learning phase, backpropagation computes the gradient of the loss function with respect to the weights of an artificial neural network, which gradient descent then uses to update those weights. The method compares the predicted outputs with the actual system outputs, then carefully adjusts the connection weights to minimize the discrepancy. The backpropagation learning algorithm takes its name from its distinctive approach of propagating weight adjustments backwards, from the output layer towards the input layer.
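As an illustration of this compare-and-adjust loop, here is a minimal sketch (all names, values, and the learning rate are hypothetical) that trains a single sigmoid neuron by backpropagation, repeatedly nudging its weight and bias to shrink the error between prediction and target:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical setup: map input x = 1.0 to target output 0.0
x, target = 1.0, 0.0
w, b = 0.5, 0.5   # initial connection weight and bias
lr = 0.5          # learning rate

for _ in range(100):
    # Forward pass: compute the predicted output
    z = w * x + b
    y = sigmoid(z)

    # Backward pass: gradient of the squared error 0.5 * (y - target)^2
    # with respect to w and b, obtained via the chain rule
    dL_dy = y - target
    dy_dz = y * (1.0 - y)
    grad_w = dL_dy * dy_dz * x
    grad_b = dL_dy * dy_dz

    # Adjust the weights backwards, from output towards input
    w -= lr * grad_w
    b -= lr * grad_b

final = sigmoid(w * x + b)
print(final)  # prediction has moved close to the target 0.0
```

The same forward-then-backward pattern generalises to multi-layer networks, where the chain rule is applied layer by layer from the output back to the input.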

Neural network applications did not gain much momentum until the early 2000s, owing to the sheer difficulty of understanding how changes in weights and biases influence the overall behaviour of an artificial neural network. With the advent of more powerful computers this changed, and artificial intelligence (AI) applications such as image processing, optical character recognition, and natural language processing began to leverage neural networks.

When computing the gradient of the loss function, backpropagation requires each input value to be paired with a known, desired output. This methodology is therefore an instance of supervised machine learning. Over time, backpropagation has become a fundamental element of machine learning applications that employ predictive analytics, complementing classifiers such as decision trees and naive Bayes filters.

## Backpropagation Through Time

Backpropagation operates by treating the entire neural network as a function composed of functions. If the network cannot be reduced to a single expression of composite functions, or cannot be represented as a directed acyclic graph, this strategy cannot be applied.
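A minimal sketch of this composite-function view (the function names and values are hypothetical): a two-layer network reduced to a composition y = g(f(x)), whose derivative backpropagation obtains by applying the chain rule from the output back towards the input:

```python
def f(x):          # inner function, e.g. a linear hidden layer
    return 3.0 * x + 1.0

def f_prime(x):    # its derivative
    return 3.0

def g(u):          # outer function, e.g. a quadratic output layer
    return u * u

def g_prime(u):    # its derivative
    return 2.0 * u

def forward(x):
    # Evaluate the whole network as one composite function
    return g(f(x))

def backward(x):
    # Chain rule: dy/dx = g'(f(x)) * f'(x), computed output-to-input
    return g_prime(f(x)) * f_prime(x)

print(forward(2.0))   # g(f(2)) = 7^2 = 49.0
print(backward(2.0))  # 2 * 7 * 3 = 42.0
```

Because each factor of the chain rule is local to one layer, the same backward sweep works for arbitrarily deep compositions, provided the computation forms a directed acyclic graph.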

In a recurrent neural network that processes incoming time-series data, the output of a node at one timestep is fed back into the network at the next timestep. Because a recurrent neural network contains cycles, it cannot be categorised as a directed acyclic graph.

Nonetheless, a recurrent neural network can be depicted as a feedforward neural network when unrolled. In this model, one copy of the original neural network represents each timestep.

Backpropagation through time is accomplished by unrolling a recurrent neural network and treating it as a feedforward network. Because of the replication involved, unrolling can produce a very large feedforward network, and the algorithm risks getting trapped in local optima. In addition, the network can struggle to detect interactions between inputs that are separated in time, because the gradient contributions from such interactions are small compared with local effects. Mitigations for this vanishing gradient problem include choosing ReLU activation functions and adding regularization to the network.
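The unrolling described above can be sketched with a hypothetical one-weight linear recurrent cell, h_t = w * h_{t-1} + x_t. The forward pass stores every hidden state; the backward pass then walks the unrolled timesteps in reverse, accumulating each timestep's contribution to the weight gradient. Note how the running gradient is multiplied by w at every step, so contributions from early timesteps shrink when |w| < 1, which is the vanishing gradient issue in miniature:

```python
def bptt(xs, w, target):
    """Backpropagation through time for h_t = w * h_{t-1} + x_t,
    with loss 0.5 * (h_T - target)^2 on the final hidden state."""
    # Forward pass: unroll the recurrence, storing every hidden state
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)

    # Backward pass: propagate dL/dh_t from the last timestep backwards
    grad_w = 0.0
    dh = hs[-1] - target          # dL/dh_T
    for t in range(len(xs), 0, -1):
        grad_w += dh * hs[t - 1]  # timestep t's contribution to dL/dw
        dh *= w                   # shrinks each step back when |w| < 1
    return hs[-1], grad_w

out, grad = bptt([1.0, 1.0, 1.0, 1.0], w=0.5, target=0.0)
print(out, grad)
```

With four unit inputs and w = 0.5, the final state is w^3 + w^2 + w + 1 = 1.875, and the accumulated gradient matches the analytic derivative h_T * (3w^2 + 2w + 1) = 5.15625.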

## Implementation of Backpropagation

Backpropagation is used to train practically every type of neural network, fuelling the recent surge of deep learning. Here are a few applications of backpropagation:

- Face Recognition: As a standard deep learning methodology for image recognition and processing, convolutional neural networks employ the backpropagation algorithm for training. In 2015, Parkhi, Vedaldi, and Zisserman conducted an in-depth analysis of face recognition using an 18-layer convolutional neural network and a database of celebrity faces.
- Speech Recognition: Backpropagation finds several uses in the sphere of speech recognition. Sony Corporation in Japan developed a speech recognition system, in both English and Japanese, that can run on embedded devices and offers users a limited range of voice commands. A Fast Fourier Transform divides the incoming audio stream into time frames. Using the backpropagation algorithm with a softmax cross-entropy loss function, the researchers trained the network's layers on Japanese commands; the same network was later trained on English voice commands, adapting it to recognize orders in English.
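As a sketch of the loss mentioned in the speech recognition example above (not Sony's actual implementation; the three-command setup and logit values are hypothetical), here is softmax cross-entropy together with its conveniently simple gradient with respect to the logits, which is what backpropagation feeds backwards into the network:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, true_index):
    # Negative log-probability assigned to the correct class
    return -math.log(probs[true_index])

logits = [2.0, 0.5, -1.0]   # raw network outputs for three voice commands
probs = softmax(logits)
loss = cross_entropy(probs, true_index=0)

# A well-known property that makes backpropagation cheap here: the gradient
# of the combined softmax cross-entropy loss with respect to the logits is
# simply probs - one_hot(true_index).
grads = [p - (1.0 if i == 0 else 0.0) for i, p in enumerate(probs)]
print(loss, grads)
```

This closed-form gradient is one reason softmax cross-entropy is the default loss for classification networks trained by backpropagation.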