Natural Language Processing
2019 was an especially significant year for NLP. Several research breakthroughs took place, most notably OpenAI's introduction of the GPT-2 model for text generation. GPT-2 achieved previously unseen quality in text generation, raising serious concerns about potential misuse. Its weights were not released all at once: OpenAI opted for a staged release, publishing progressively larger models over a span of several months to make sure the model was not being used for malicious purposes.
Other research groups followed suit, with NVIDIA's Megatron, Google's BERT, and models from Hugging Face and the Allen Institute, culminating in Microsoft's Turing-NLG, the largest language model at the time of its release in early 2020. These efforts demonstrated that pre-trained language models can solve a wide variety of NLP tasks. All of them required massive datasets and considerable computing power: they were trained on large amounts of unlabeled text from the web (e.g. pages linked from Reddit posts), and their underlying architecture was based on the Transformer.
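To illustrate what "pre-trained language model" means in practice, here is a minimal sketch, assuming the Hugging Face transformers library is installed; it loads the publicly released GPT-2 checkpoint and generates a continuation of a prompt. The prompt text is an arbitrary example, not from any of the papers mentioned above.

```python
# Minimal text-generation sketch using the publicly released GPT-2 weights.
from transformers import pipeline, set_seed

set_seed(42)  # fix the sampling seed so the output is reproducible

# "gpt2" is the smallest released checkpoint (124M parameters); the weights
# are downloaded automatically on first use.
generator = pipeline("text-generation", model="gpt2")

prompt = "In 2019, natural language processing"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(outputs[0]["generated_text"])
```

The same pipeline interface wraps many of the pre-trained models mentioned above, which is part of why such models spread so quickly across NLP tasks.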