Dear All,
A little more technical today. On Wednesday, I was a guest at Top Young 100 - a very valuable initiative that offers students from the fields of logistics and IT the opportunity to get to know companies by actively contributing to selected projects.
The leading topic of the event was "AI" and how it will change different aspects of professional life. During the discussions, I was asked an excellent question: what exactly are the "parameters" inside an LLM?
And I realised that my knowledge in this area was pretty general. So I immersed myself in articles and YouTube videos, and step by step the picture became clearer.
In what follows, I would like to share my understanding of "LLM parameters" with you, in the hope that it is coherent.
There are two types of parameters: hyperparameters and "normal" parameters, also called "weights".
Hyperparameters are defined by the engineers and specify the framework for how the model should learn and generate text, e.g. how many layers the neural network has, how large the training dataset should be, and the sampling settings temperature, top-k and top-p (illustrated in the sketch below).
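To make temperature, top-k and top-p more concrete, here is a minimal sketch of how they shape the choice of the next token. The tiny vocabulary, the logit values and the use of numpy are my own assumptions for illustration, not how any particular LLM implements this:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.9):
    """Pick the next token id from the model's raw scores (logits).

    temperature, top_k and top_p are set by the engineer,
    not learned by the model."""
    # Temperature rescales the logits: < 1.0 sharpens the distribution
    # (more predictable), > 1.0 flattens it (more random).
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Softmax turns the logits into probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-k: keep only the k most likely tokens.
    order = np.argsort(probs)[::-1][:top_k]

    # Top-p (nucleus): of those, keep the smallest set whose
    # cumulative probability reaches p.
    cumulative = np.cumsum(probs[order])
    order = order[: np.searchsorted(cumulative, top_p) + 1]

    # Renormalize what is left and draw one token at random.
    kept = probs[order] / probs[order].sum()
    return int(np.random.choice(order, p=kept))

# Toy vocabulary of five tokens with made-up logits:
print(sample_next_token([2.0, 1.0, 0.5, 0.1, -1.0],
                        temperature=0.8, top_k=3, top_p=0.9))
```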
Parameters / weights are the strengths of the connections between the neurons. Each weight has a numeric value, which can change during the learning process and determines how strongly one neuron's output influences the next.
Another important parameter is the bias. It is an extra value added to a neuron's weighted input and, like the weights, it is adjusted during the learning process in order to increase the accuracy of the results. The sketch below shows both at work in a single neuron.
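A minimal sketch of a single artificial neuron, assuming numpy and made-up input, weight and bias values, just to show where the weights and the bias enter the calculation:

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Each weight scales one input; the bias shifts the weighted sum
    # before the activation function squashes it into the range (0, 1).
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

# Made-up values: three inputs, three learned weights, one learned bias.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.5])
b = 0.1
print(neuron(x, w, b))  # ~0.33
```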
The learning process can look something like this:
Forward Pass: The model attempts to make a prediction or generate text based on the current values of its parameters. For example, given the beginning of a sentence, it tries to predict the next word.
Loss Calculation: After making a prediction, the model calculates the error (loss) between its prediction and the actual correct output (from the training data). This error is a measure of how wrong the model’s predictions are.
Backward Pass (Backpropagation): This is where learning happens. The error is propagated back through the network, and the model calculates how each weight (parameter) contributed to the error. Using calculus (the chain rule), it computes a gradient for each weight. A gradient is a value that tells the model in which direction and by how much to change a weight to reduce the error.
Update Parameters: The weights are then updated using the gradients. This is typically done using an optimization algorithm like Gradient Descent. The basic idea is to adjust each weight (parameter) slightly in the opposite direction of its gradient (new weight = old weight - learning rate * gradient) to minimize the loss. The sketch after this list walks through all four steps on a toy example.
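To walk through all four steps, here is a deliberately tiny sketch: a "model" with one single weight that learns the rule y = 3x by gradient descent. The data, the learning rate and the number of steps are made up; a real LLM does the same with billions of weights and backpropagation through many layers:

```python
import numpy as np

# Toy training data: the correct outputs follow the rule y = 3x.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([3.0, 6.0, 9.0, 12.0])

w = np.random.randn()  # random starting value for the single weight
learning_rate = 0.01

for step in range(200):
    # 1. Forward pass: predict with the current parameter value.
    predictions = w * xs
    # 2. Loss calculation: mean squared error vs. the true outputs.
    loss = np.mean((predictions - ys) ** 2)
    # 3. Backward pass: gradient of the loss with respect to w
    #    (derived by hand here; frameworks do this via backpropagation).
    gradient = np.mean(2 * (predictions - ys) * xs)
    # 4. Update: nudge w opposite to its gradient to reduce the loss.
    w -= learning_rate * gradient

print(w)  # ~3.0 after training
```

After a couple of hundred updates the weight settles near 3, the value that minimizes the loss. An LLM repeats exactly this loop, just at an enormous scale.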
Conclusion: The LLM itself adjusts the parameters during the learning process; it is, in effect, a self-learning system. The model compares its predictions with the actual text and automatically adjusts the parameters in a way that reduces the error in its predictions.
At the very beginning, before any data is processed, the parameters are typically initialized with random values. This is like starting with a blank slate. The initial values don't have any specific meaning; they are simply a starting point for the learning process.
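For illustration, one common initialization scheme (the layer sizes here are arbitrary): small random values scaled by the layer size, so that signals neither explode nor vanish as they flow through the network. Starting all weights at the same value would make every neuron compute and learn exactly the same thing, which is why the randomness matters:

```python
import numpy as np

rng = np.random.default_rng()

# "Xavier"-style initialization: small random values scaled by the
# layer size, so signals neither explode nor vanish between layers.
n_in, n_out = 512, 256  # arbitrary layer sizes for illustration
weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
biases = np.zeros(n_out)  # biases are often simply started at zero

print(weights.mean(), weights.std())  # ~0.0 and ~0.044
```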
Please let me know if I have mixed anything up or not presented it correctly.
Of course, I don't want to leave you alone with my explanations. That's why I'm including some links to very informative YouTube videos on the subject.