Fine-tuning (deep learning)

In deep learning, fine-tuning is an approach to transfer learning in which the parameters of a pre-trained neural network model are trained on new data.[1] Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen" (i.e., not changed during backpropagation).[2] A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter-efficient way by tuning the weights of the adapters and leaving the rest of the model's weights frozen.[3]
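
A minimal sketch of the adapter idea in PyTorch (the module, sizes, and hyperparameters are illustrative, not from any particular adapter library): a small bottleneck module with a residual connection is trained while every original parameter stays frozen.

    import torch
    import torch.nn as nn

    class BottleneckAdapter(nn.Module):
        """Residual adapter: down-project, nonlinearity, up-project."""
        def __init__(self, hidden_dim: int, bottleneck_dim: int = 16):
            super().__init__()
            self.down = nn.Linear(hidden_dim, bottleneck_dim)
            self.up = nn.Linear(bottleneck_dim, hidden_dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # The residual connection preserves the frozen model's output
            # as a baseline; the adapter learns a small correction.
            return x + self.up(torch.relu(self.down(x)))

    # Freeze the pre-trained weights; only adapter parameters are trained.
    base_model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))
    for p in base_model.parameters():
        p.requires_grad = False

    adapter = BottleneckAdapter(hidden_dim=768)
    optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)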

For some architectures, such as convolutional neural networks, it is common to keep the earlier layers (those closest to the input layer) frozen, as they capture lower-level features, while later layers often capture higher-level features more closely tied to the task the model is fine-tuned for.[2][4]
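
As an illustration, a hedged sketch using torchvision's pre-trained ResNet-18 (which layers to unfreeze is a task-dependent choice):

    from torchvision import models

    # Load a network pre-trained on ImageNet (weights API of torchvision >= 0.13).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze everything, then unfreeze only the last residual stage and
    # the classifier, whose features are most task-specific.
    for param in model.parameters():
        param.requires_grad = False
    for param in model.layer4.parameters():
        param.requires_grad = True
    for param in model.fc.parameters():
        param.requires_grad = True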

Models that are pre-trained on large, general corpora are usually fine-tuned by reusing their parameters as a starting point and adding a task-specific layer trained from scratch.[5] Fine-tuning the full model is also common and often yields better results, but is more computationally expensive.[6]
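
A sketch of both setups, continuing the ResNet-18 example above (the class count and learning rates are illustrative): the final layer is replaced with a randomly initialized, task-specific head, and the optimizer determines whether only the head or the full model is updated.

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Replace the pre-trained classifier with a new task-specific head.
    num_classes = 10
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    # Option 1: train only the new head (cheaper; backbone stays fixed).
    head_optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)

    # Option 2: fine-tune the full model (often better, more expensive).
    full_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)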

Fine-tuning is typically accomplished via supervised learning, but there are also techniques to fine-tune a model using weak supervision.[7] Fine-tuning can be combined with a reinforcement learning from human feedback-based objective to produce language models such as ChatGPT (a fine-tuned version of GPT models) and Sparrow.[8][9]

Robustness

Fine-tuning can degrade a model's robustness to distribution shifts.[10][11] One mitigation is to linearly interpolate a fine-tuned model's weights with the weights of the original model, which can greatly increase out-of-distribution performance while largely retaining the in-distribution performance of the fine-tuned model.[12]
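
A hedged sketch of this interpolation in PyTorch (alpha is a hyperparameter to be tuned; the snippet assumes both models share an architecture and that all state-dict entries are floating-point tensors):

    def interpolate_weights(pretrained_state, finetuned_state, alpha=0.5):
        """Linearly interpolate two compatible state dicts, key by key."""
        return {
            key: (1 - alpha) * pretrained_state[key] + alpha * finetuned_state[key]
            for key in pretrained_state
        }

    # merged = interpolate_weights(pretrained.state_dict(), finetuned.state_dict())
    # model.load_state_dict(merged)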

Variants

Low-rank adaptation

Low-rank adaptation (LoRA) is an adapter-based technique for efficiently fine-tuning models. The basic idea is to keep the original weight matrices frozen and learn low-rank update matrices that are added to them.[13] An adapter, in this context, is the collection of these low-rank matrices which, when added to a base model, produces a fine-tuned model. LoRA approaches the performance of full-model fine-tuning with much lower storage requirements: a language model with billions of parameters may be LoRA fine-tuned by training only several million adapter parameters.
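
A minimal sketch of a LoRA update for one linear layer, following the common formulation y = W0x + (alpha/r)BAx with frozen W0 (initialization and scaling conventions vary across implementations):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pre-trained weights stay frozen
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            # B starts at zero, so the model initially matches the base exactly.
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(768, 768))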

LoRA-based fine-tuning has become popular in the Stable Diffusion community.[14] Support for LoRA was integrated into the Diffusers library from Hugging Face.[15] Support for LoRA and similar techniques is also available for a wide range of other models through Hugging Face's Parameter-Efficient Fine-Tuning (PEFT) package.[16]
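
For instance, a hedged sketch of applying LoRA to a causal language model via PEFT (model name and hyperparameters are illustrative):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")

    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the adapter weights are trainable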

Representation fine-tuning

Representation fine-tuning (ReFT) is a technique developed by researchers at Stanford University aimed at fine-tuning large language models (LLMs) by modifying less than 1% of their representations. Unlike traditional parameter-efficient fine-tuning (PEFT) methods, which mainly focus on updating weights, ReFT targets specific parts of the model relevant to the task being fine-tuned. This approach is based on the understanding that deep learning models encode rich semantic information in their representations, suggesting that modifying representations might be a more effective strategy than updating weights.[17]

ReFT methods operate on a frozen base model and learn task-specific interventions on its hidden representations: trained edits that manipulate a small fraction of the model's representations at inference time to steer its behavior toward a downstream task. One specific method within the ReFT family is Low-rank Linear Subspace ReFT (LoReFT), which intervenes on hidden representations in the linear subspace spanned by a low-rank projection matrix.[17] LoReFT can be seen as the representation-based equivalent of Low-rank Adaptation (LoRA).
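
A hedged re-implementation sketch of the LoReFT edit, which rewrites a hidden state h as h + R^T(Wh + b - Rh) with a low-rank projection R (orthonormality of R's rows, enforced during training in the original method, is omitted here; this is illustrative, not the authors' code):

    import torch
    import torch.nn as nn

    class LoReFTIntervention(nn.Module):
        """Edit a hidden state h as h + R^T (W h + b - R h)."""
        def __init__(self, hidden_dim: int, rank: int = 4):
            super().__init__()
            self.R = nn.Parameter(torch.randn(rank, hidden_dim) * 0.01)
            self.proj = nn.Linear(hidden_dim, rank)  # computes W h + b

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            # Move h toward a learned target within the low-rank subspace.
            return h + (self.proj(h) - h @ self.R.T) @ self.R

    intervention = LoReFTIntervention(hidden_dim=768)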

Applications

Natural language processing

Fine-tuning is common in natural language processing (NLP), especially in the domain of language modeling. Large language models like OpenAI's series of GPT foundation models can be fine-tuned on data for specific downstream NLP tasks (tasks that use a pre-trained model) to improve performance over the unmodified pre-trained model.[6]
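
A hedged sketch of one common pattern, a single training step fine-tuning a pre-trained Hugging Face transformer for sequence classification (model name, data, and hyperparameters are illustrative):

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2  # new head, randomly initialized
    )

    batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    loss = model(**batch, labels=labels).loss  # cross-entropy from the new head
    loss.backward()
    optimizer.step()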

Commercial models

Commercially offered large language models can sometimes be fine-tuned if the provider offers a fine-tuning API. As of June 19, 2023, language-model fine-tuning APIs are offered by OpenAI and Microsoft Azure's Azure OpenAI Service for a subset of their models, as well as by Google Cloud Platform for some of their PaLM models, and by others.[18][19][20]
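
As a hedged illustration using the OpenAI Python client (the file name and model are placeholders, and the exact interface differs across providers and client versions):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Upload training examples, then launch a fine-tuning job.
    training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")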
