GPT-2
2019 text-generating language model
Generative Pre-trained Transformer 2 (GPT-2) is a large language model developed by OpenAI and the second in its foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages.[2] It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.[3][4][5][6][7]
Original author(s) | OpenAI
---|---
Initial release | 14 February 2019
Repository | https://github.com/openai/gpt-2
Predecessor | GPT-1
Successor | GPT-3
Type | Large language model; generative pre-trained transformer
License | MIT[1]
Website | openai
GPT-2 was created as a "direct scale-up" of GPT-1,[8] with a ten-fold increase in both its parameter count and the size of its training dataset.[7] It is a general-purpose learner: its ability to perform various tasks was a consequence of its general ability to accurately predict the next item in a sequence,[2][9] which enabled it to translate texts, answer questions about a topic from a text, summarize passages from a larger text,[9] and generate text output on a level sometimes indistinguishable from that of humans,[10] although it could become repetitive or nonsensical when generating long passages.[11] It was superseded by the GPT-3 and GPT-4 models, which are no longer open source.
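As a concrete illustration of this next-token prediction loop, the sketch below loads the released GPT-2 weights and samples a continuation of a prompt. It uses the Hugging Face transformers library rather than the original OpenAI TensorFlow code (an assumption for brevity; the prompt and sampling settings are arbitrary), with the "gpt2" model id referring to the smallest released checkpoint:

```python
# Requires: pip install transformers torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # smallest (124M-parameter) released variant
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("GPT-2 was pre-trained on", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,                   # length of the sampled continuation
    do_sample=True,                      # sample tokens instead of greedy decoding
    top_k=50,                            # consider only the 50 most likely next tokens
    pad_token_id=tokenizer.eos_token_id, # GPT-2 has no pad token; reuse EOS to silence a warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Every task listed above (translation, question answering, summarization) reduces to this same loop: the model repeatedly predicts the most plausible next token given everything generated so far.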
Like its predecessor GPT-1 and its successors GPT-3 and GPT-4, GPT-2 has a generative pre-trained transformer architecture, implementing a deep neural network, specifically a transformer model,[8] which uses attention in place of older recurrence- and convolution-based architectures.[12][13] Attention mechanisms allow the model to focus selectively on the segments of input text it predicts to be most relevant.[14][15] This architecture allows for greatly increased parallelization, and outperforms previous benchmarks set by RNN-, CNN-, and LSTM-based models.[8]
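As a rough illustration of the core operation behind such attention mechanisms, the following is a minimal NumPy sketch of single-head, unmasked scaled dot-product attention. GPT-2 itself uses masked multi-head attention inside much larger transformer blocks, and all names and sizes here are illustrative assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention over one sequence.

    Q, K: (seq_len, d_k) query and key vectors; V: (seq_len, d_v) value vectors.
    Each output position is a weighted average of the value vectors, with
    weights reflecting how strongly its query matches each key.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # pairwise query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over the keys
    return weights @ V                                     # mix value vectors by attention weight

# Toy self-attention: 4 tokens with 8-dimensional representations (arbitrary sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because every position attends to every other position in a single matrix product, the whole sequence can be processed in parallel, which is the parallelization advantage over recurrent architectures noted above.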