Unleash the power of AI with this ultimate guide, and impress everyone from your crush to your future employer!
Let’s cut to the chase: by now, all of us have heard about ChatGPT, and if you are even remotely curious about technology or spend any time on social media, you have probably seen plenty of AI buzzwords floating around. In this article, we will cut through the AI clutter and bring you up to speed on the latest in AI. Maybe your aim is to impress an interviewer, impress your crush, or sound informed at a party. Whatever your end goal may be, this quick article will help. So let’s begin.
What does GPT in ChatGPT mean?
GPT stands for “Generative Pre-trained Transformer”. It is a type of machine learning model that uses deep neural networks to generate natural language text. ChatGPT is a specific implementation of the GPT architecture that has been trained on a large corpus of text data to allow it to generate human-like responses to user inputs in a conversational setting.
In other words, GPT is like a computer that can learn how to talk like a person. It does this by reading a lot of books and stories, and then it uses what it learned to write new stories that sound like they were written by a person. It’s like having a robot that can be your friend and tell you stories that it made up all on its own.
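To make this concrete, here is a minimal sketch (in Python, using the Hugging Face transformers library) of generating text with GPT-2, a small, openly available member of the GPT family. ChatGPT itself is not downloadable, so GPT-2 stands in here, and the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: text generation with GPT-2, an openly available GPT-family model.
# Requires: pip install transformers torch
from transformers import pipeline

# Load a small pre-trained GPT-2 model behind a simple text-generation interface.
generator = pipeline("text-generation", model="gpt2")

# Ask the model to continue a prompt; the settings here are illustrative, not tuned.
result = generator(
    "Once upon a time, a friendly robot",
    max_new_tokens=40,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```

The output changes from run to run because the model samples its continuation, which is exactly the "making up new stories" behavior described above.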
Since GPT is a transformer (we will talk about Transformers in a bit), what are the other Transformer models?
BERT
Family: BERT
Application: General Language Understanding and Question Answering. Many other language applications followed
Date (of first known publication): 10/2018
Num. Params: Base = 110M, Large = 340M
Lab: Google
BART
Family: BERT for encoder, GPT for Decoder
Application: Mostly text generation but also some text understanding tasks
Date (of first known publication): 10/2019
Num. Params: 10% more than BERT
Lab: Facebook
ChatGPT
Family: GPT
Application: Dialog agents
Date (of first known publication): 10/2022
Num. Params: Same as GPT-3
Lab: OpenAI
GPT
Family: GPT
Application: Text generation, but adaptable to many other NLP tasks when fine-tuned.
Date (of first known publication): 06/2018
Num. Params: 117M
Lab: OpenAI
GPT-2
Family: GPT
Application: Text generation, but adaptable to many other NLP tasks when fine-tuned.
Date (of first known publication): 02/2019
Num. Params: 1.5B
Lab: OpenAI
GPT-3
Family: GPT
Application: Initially text generation, but over time used for a wide range of applications in areas such as code generation, as well as image and audio generation
Date (of first known publication): 05/2020
Num. Params: 175B
Lab: OpenAI
GPT-3.5
Family: GPT
Application: Dialog and general language, but there is a code-specific model too
Date (of first known publication): 10/2022
Num. Params: 175B
Lab: OpenAI
LaMDA
Family: Transformer
Application: General language modeling
Date (of first known publication): 01/2022
Num. Params: 137B
Lab: Google
Wu Dao 2.0
Family: GLM (General Language Model)
Application: Language and multimodal (particularly image)
Date (of first known publication): 06/2021
Num. Params: 1.75T
Lab: Beijing Academy of Artificial Intelligence
Turing-NLG
Family: GPT
Application: Same as GPT-2/3
Date (of first known publication): 02/2020
Num. Params: 17B originally, up to 530B more recently
Lab: Microsoft
Stable Diffusion
Family: Diffusion
Application: Text to image
Date (of first known publication): 12/2021
Num. Params: 890M (although there are different, smaller, variants)
Lab: LMU Munich + Stability.ai + Eleuther.ai
T5
Family: Transformer
Application: General language tasks including machine translation, question answering, abstractive summarization, and text classification
Date (of first known publication): 10/2019
Num. Params: Up to 11B
Lab: Google
Trajectory Transformers
Family: GPT, "Control Transformers" (not a family per se, but grouping here those transformers that try to model more general control, RL-like tasks)
Application: General reinforcement learning (RL) tasks
Date (of first known publication): 06/2021
Num. Params: Smaller architecture than GPT
Lab: UC Berkeley
Sparrow
Family: GPT
Application: Dialog agents and general language generation applications like Q&A
Date (of first known publication): 09/2022
Num. Params: 70B
Lab: DeepMind
MT-NLG (Megatron-Turing NLG)
Family: GPT
Application: Language generation and others (similar to GPT-3)
Date (of first known publication): 10/2021
Num. Params: 530B
Lab: NVIDIA
Flamingo
Family: Chinchilla
Application: Visual language model that generates text from combined image and text inputs (e.g., image captioning and visual question answering)
Date (of first known publication): 04/2022
Num. Params: 80B (largest)
Lab: DeepMind
DALL-E
Family: GPT
Application: Text to image
Date (of first known publication): 01/2021
Num. Params: 12B
Lab: OpenAI
You can refer to the full list here.
Now that we have a fair idea about the universe beyond ChatGPT, let’s talk about transformer models and why they have been disruptive in bringing AI to the masses. Believe it or not, the transformer architecture originated at Google, but OpenAI is credited with taking the gamble to bring AI to the masses. It’s interesting to consider why Google hasn’t made similar efforts, which could be the topic of another blog post.
What are Transformers?
Transformers are a class of deep learning models that are defined by some architectural traits. They were first introduced in the now-famous “Attention Is All You Need” paper by Google researchers in 2017 (the paper has accumulated a whopping 38k citations in only 5 years). The Transformer architecture is a specific instance of the encoder-decoder models that had become popular over the 2–3 years prior. Up until that point, however, attention was just one of the mechanisms used by these models, which were mostly based on LSTM (Long Short-Term Memory) and other RNN (Recurrent Neural Network) variations. The key insight of the Transformers paper was that, as the title implies, attention could be used as the only mechanism to derive dependencies between input and output.
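To give a rough feel for what “attention” actually computes, here is a minimal sketch of scaled dot-product attention, the core operation from that paper, written in plain NumPy. The array shapes and values are made up purely for demonstration; real models add learned projections, multiple heads, and masking on top of this.

```python
# Minimal sketch of scaled dot-product attention from "Attention Is All You Need":
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    # Each query scores every key; scaling by sqrt(d_k) keeps the logits stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns the scores into attention weights that sum to 1 per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # The output mixes the values according to those weights.
    return weights @ V

# Toy example: 3 tokens, each represented by a 4-dimensional vector (values arbitrary).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```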
In other words, a transformer is a specific type of deep learning model that’s really good at understanding sequences of data, like sentences or paragraphs of text. In a transformer model, there are two main components: an encoder and a decoder. The encoder takes in a sequence of input data, like a sentence, and turns it into a set of numbers that represent the meaning of the sentence. The decoder then takes those numbers and uses them to generate a new sequence of data, like a translation of the original sentence into a different language.
Think of it like a person who is translating a book from one language to another. The encoder is like the person who reads the original book and understands the meaning of the words and sentences. The decoder is like the person who takes that understanding and uses it to write a new book in the other language that conveys the same meaning. The transformer model is a really powerful tool for understanding and generating language, and it’s being used in many applications today, like language translation and chatbots.
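Continuing the translation analogy, here is a minimal sketch of an off-the-shelf encoder-decoder transformer translating English to German via the Hugging Face pipeline. The Helsinki-NLP/opus-mt-en-de checkpoint is just one commonly used example, not the only option.

```python
# Minimal sketch: an encoder-decoder transformer translating English to German.
# Requires: pip install transformers torch sentencepiece
from transformers import pipeline

# The encoder reads the English sentence; the decoder writes the German one.
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")

result = translator("Transformers are powerful tools for understanding language.")
print(result[0]["translation_text"])
```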
What are Transformers used for and why are they so popular?
Although the original transformer model was created to translate English to German, the architecture demonstrated remarkable versatility and proved to be effective for various other language-related tasks, as highlighted in the original research paper. This trend was soon noticed by the research community, and within a few months, transformer models dominated most of the leaderboards for language-related machine-learning tasks. An example of this is the SQuAD leaderboard for question answering, where all the top-performing models are ensembles of transformer models.
Transformers have become so successful in natural language processing because of their ability to easily adapt to new tasks through a process called transfer learning. Pretrained transformer models can quickly adjust to tasks they haven’t been specifically trained for, which is a big advantage. As a machine learning practitioner, you no longer need to train a large model on a vast amount of data. Instead, you can reuse a pre-trained model and fine-tune it for your task, often with just a few tweaks.
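As a rough illustration of this transfer-learning workflow, here is a minimal sketch that reuses a pre-trained checkpoint and fine-tunes it for sentiment classification with the Hugging Face Trainer. The DistilBERT checkpoint, the IMDB dataset, and the hyperparameters are illustrative choices only, not recommendations.

```python
# Minimal sketch: reuse a pre-trained transformer and fine-tune it for a new task.
# Requires: pip install transformers datasets torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# 1. Start from a model that was already pre-trained on a huge general corpus.
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# 2. Prepare a small labeled dataset for the new task (movie-review sentiment here).
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

# 3. Fine-tune briefly, reusing everything the model already learned during pre-training.
args = TrainingArguments(output_dir="sentiment-model", num_train_epochs=1,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)))
trainer.train()
```

The point of the sketch is the shape of the workflow: a few dozen lines and a modest labeled dataset, rather than training a large model from scratch.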
Transformers are extremely versatile in adapting to various tasks, and although they were originally created for language-related tasks, they can now be used for a wide range of applications, from vision, audio, and music to playing chess or performing mathematical calculations. Transformers have enabled this breadth of applications thanks to various tools that have made them accessible to anyone who can write a few lines of code. Transformer models were quickly integrated into the main artificial intelligence frameworks, including PyTorch and TensorFlow. These frameworks made transformers more accessible and even paved the way for an entire company, Hugging Face, to be built around them.
If you ever get to the stage where you have some sample data that you would like to build an AI model around (that is, you want to create an AI model, train it, validate it, and deploy it), then https://huggingface.co/ is your friend. The how-to is, of course, beyond the scope of this post.
How can you access GPT models for your own work?
- OpenAI’s GPT models repository: OpenAI has made the GPT models publicly available on their website along with pre-trained weights and code to finetune the models for specific tasks. Users can download the models and use them for research or commercial purposes, following the licensing terms set by OpenAI.
- Hugging Face Transformers library: Hugging Face is a company that provides pre-trained models for natural language processing tasks, including GPT models. Their Transformers library provides a simple interface to load and use GPT models for various text generation tasks.
- OpenAI’s GPT-3 API: OpenAI provides an API (application programming interface) to access GPT-3, a more powerful version of GPT with 175 billion parameters. The API requires a paid subscription and allows users to access the GPT-3 model for a variety of natural language processing tasks (a minimal example call is sketched after this list).
- Cloud-based AI platforms: Many cloud-based AI platforms such as Google Cloud Platform, Amazon Web Services, and Microsoft Azure provide pre-trained GPT models that can be accessed through their services.
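As referenced in the GPT-3 API item above, here is a minimal sketch of what such a call looked like with the openai Python package at the time of writing. The model name, parameters, and method signature are assumptions that change over time, so treat this as a sketch and check the current OpenAI documentation before relying on it.

```python
# Minimal sketch: calling a GPT-3-family model through the OpenAI API.
# Requires: pip install openai  (and a paid API key from OpenAI)
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; never hard-code a real key

# Ask the hosted model to complete a prompt; model name and parameters are illustrative.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain what a transformer model is in one sentence.",
    max_tokens=60,
    temperature=0.7,
)

print(response["choices"][0]["text"].strip())
```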
So, a few points before we wrap things up:
- GPT is a language model developed by OpenAI, an artificial intelligence research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc.
- GPT is a widely used language model in the field of natural language processing and is employed by many AI companies and organizations for various applications.
- OpenAI has made the GPT models publicly available, along with pre-trained weights and code to finetune the models for specific tasks. This means that anyone can use the GPT models for research or commercial purposes, provided that they follow the licensing terms set by OpenAI.
- In addition, OpenAI offers access to a more powerful version of GPT, called GPT-3, through an API (application programming interface) that requires a paid subscription. The API allows users to access the GPT-3 model for a variety of natural language processing tasks, such as text generation, translation, and sentiment analysis.