


Definition: AI backpropagation


The method used to train a neural network, including a large language model (LLM). The network learns by recognizing patterns in the training data and repeatedly adjusting the weights and biases of its neurons so that it better predicts what comes next. With text models, words are split into tokens, which are represented numerically, and the model predicts the next token. For images, audio and video, the predictions are the next group of pixels, time slices or frames, respectively. See AI token.

Step 1 - Forward and Compare
Using text and its token representations as the example, training starts with a forward pass through the network, which makes a prediction. The difference between the predicted next token (next word) and the actual next token is then computed. This error is known as the "loss," and the step that calculates it is the "loss computation."
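As a minimal sketch of the loss computation, the Python below scores a made-up four-token vocabulary and measures the error with cross-entropy, one common choice of loss for next-token prediction. The scores, the vocabulary size and the function names are assumptions chosen for illustration, not values from any real model.

```python
import math

def softmax(scores):
    # Subtract the largest score before exponentiating for numerical stability.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_loss(scores, actual_token_id):
    # The loss is the negative log of the probability given to the actual next token.
    probs = softmax(scores)
    return -math.log(probs[actual_token_id])

# Hypothetical output of a forward pass: one score per token in a 4-token vocabulary.
scores = [2.0, 0.5, 0.1, -1.0]

loss_when_right = cross_entropy_loss(scores, 0)  # actual token is the one the model favored
loss_when_wrong = cross_entropy_loss(scores, 3)  # actual token got very little probability

print(loss_when_right, loss_when_wrong)
```

The loss is small when the model put most of its probability on the actual next token and large when it did not.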

Step 2 - Backward and Adjust Weights
Using this error information, backpropagation works backward through the network to calculate how much each weight and bias contributed to the error. A "gradient descent" algorithm then increases or decreases the neurons' weights and biases to generate a better result. A backpropagation pass generally follows a forward pass.
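The backward step can be sketched with a single neuron that has one weight and one bias. The example assumes a squared-error loss and a learning rate of 0.1; all names and numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def forward(w, b, x):
    # One neuron with no activation function: prediction = weight * input + bias.
    return w * x + b

def loss_fn(pred, target):
    # Squared error between the prediction and the target.
    return (pred - target) ** 2

def backward(w, b, x, target):
    # Chain rule: dL/dw = dL/dpred * dpred/dw and dL/db = dL/dpred * dpred/db.
    pred = forward(w, b, x)
    dloss_dpred = 2.0 * (pred - target)
    grad_w = dloss_dpred * x    # dpred/dw = x
    grad_b = dloss_dpred * 1.0  # dpred/db = 1
    return grad_w, grad_b

w, b = 0.5, 0.0          # hypothetical starting weight and bias
x, target = 2.0, 3.0     # one training example
learning_rate = 0.1      # assumed step size

loss_before = loss_fn(forward(w, b, x), target)

# Gradient descent: move each value a small step against its gradient.
grad_w, grad_b = backward(w, b, x, target)
w -= learning_rate * grad_w
b -= learning_rate * grad_b

loss_after = loss_fn(forward(w, b, x), target)
print(loss_before, loss_after)
```

A real network applies the same chain rule layer by layer, from the output back to the input, so that every weight and bias receives its own gradient.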

Before backpropagation, training networks with many layers was extremely slow, and backpropagation is considered one of the major breakthroughs in training AI models. AI programmers designing gradient descent algorithms require a strong calculus background because the method relies on derivatives and the chain rule. See AI weights and biases.

Quadrillions and Quintillions of Changes
The forward and backward passes are repeated over and over until the model achieves the desired outcome. Very large language models, which are trained on billions of tokens and contain billions of weights, can perform the backpropagation step millions of times, which means the total number of individual weight adjustments to the neural network can be in the quadrillions (see space/time).
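The arithmetic behind that estimate is simple multiplication. The figures below are hypothetical round numbers, not measurements of any particular model.

```python
# Hypothetical round numbers for illustration only.
num_weights = 7_000_000_000    # values adjusted in each backward pass
num_training_steps = 1_000_000 # number of forward/backward passes

# If every weight is adjusted once per step, the total is the product.
total_adjustments = num_weights * num_training_steps

QUADRILLION = 10 ** 15
print(total_adjustments / QUADRILLION, "quadrillion adjustments")
```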




Adjust the Weights and Biases
The backpropagation and gradient descent algorithms repeatedly adjust the mathematical values (the weights) on the connections between neurons, along with each neuron's bias. In a large language model, there can be billions of these values. See neural network and AI weights and biases.
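Because the weights sit on the connections between neurons, they quickly outnumber the neurons themselves. The sketch below counts them for a tiny, fully connected network. The layer sizes are made up, and real language models use more elaborate layer types than this.

```python
def count_parameters(layer_sizes):
    # Each neuron connects to every neuron in the next layer: one weight per connection.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # Each neuron after the input layer has one bias.
    biases = sum(layer_sizes[1:])
    return weights, biases

# Hypothetical network: 4 inputs, 8 hidden neurons, 3 output neurons.
layer_sizes = [4, 8, 3]
weights, biases = count_parameters(layer_sizes)
neurons = sum(layer_sizes[1:])

print(neurons, "neurons,", weights, "weights,", biases, "biases")
```

Here 11 neurons account for 56 weights and 11 biases, and the gap widens as the layers grow.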