Podcast Transcript

Welcome to our third episode of EVO AI, titled ‘Large Language Models’. Join us today as we look at the major Large Language Models and their capabilities, so you can choose what works best for you or your company.

If you like our podcast and want to know more, please remember to subscribe on our website: evoai.ai and follow and subscribe to our Spotify, YouTube and Apple channels.

So, let’s start.

What are Large Language Models (LLMs):

LLMs, like OpenAI’s GPT series, are a type of deep learning model specifically designed to process and generate human-like text. They fall under the category of transformers, a model architecture introduced in a paper titled “Attention is All You Need” by Vaswani et al. in 2017.

We’ve covered transformer models at length in some of the articles published on our website: evoai.ai.
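For listeners who like to see the mechanics, here is a minimal, illustrative sketch of the scaled dot-product attention mechanism at the heart of the transformer architecture described in “Attention is All You Need”. It is a simplified single-head version in NumPy, not production code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention (illustrative sketch).

    Q, K, V: arrays of shape (sequence_length, d_model).
    Returns the attended values with shape (sequence_length, d_model).
    """
    d_k = Q.shape[-1]
    # Similarity between every query and every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mixture of the value vectors
    return weights @ V

# Toy example: 4 tokens, 8-dimensional embeddings, self-attention
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
output = scaled_dot_product_attention(x, x, x)
print(output.shape)  # (4, 8)
```

In a full transformer this operation is repeated across many heads and layers, but the core idea, mixing token representations according to learned relevance scores, is exactly what the small function above computes.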

What are the key characteristics of LLMs:

  1. Size: LLMs have an enormous number of parameters, often tens or hundreds of billions. The large size enables them to store a vast amount of information.
  2. Training Data: They are trained on vast amounts of text data sourced from the internet, libraries or elsewhere, enabling them to generate relevant and coherent responses in a wide variety of contexts.
  3. Capabilities: LLMs can answer questions, write essays, generate creative content, and more. However, they generate responses based on patterns in the training data and do not have true understanding or consciousness.
  4. Generality: Unlike models that are trained for a specific task, LLMs can perform a wide range of language tasks without task-specific training data (see the short sketch after this list).
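To make the generality point concrete, here is a hedged sketch of how one pretrained model can be pointed at several different tasks purely through prompting, using the Hugging Face transformers library. The model name and prompts are illustrative placeholders; output quality depends heavily on which model you load.

```python
from transformers import pipeline

# Any instruction-capable causal LM can be substituted here; "gpt2" is just
# a small, freely available placeholder for illustration.
generator = pipeline("text-generation", model="gpt2")

prompts = {
    "translation": "Translate to French: The weather is nice today.\nFrench:",
    "summarization": "Summarize in one sentence: Large language models are "
                     "trained on huge text corpora and can perform many tasks.\nSummary:",
    "question answering": "Q: What year was the transformer architecture introduced?\nA:",
}

# The same model handles every task; only the prompt changes.
for task, prompt in prompts.items():
    result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
    print(task, "->", result[0]["generated_text"])
```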

Foundation Models:

The term “foundation models” was coined by researchers at Stanford’s Center for Research on Foundation Models (CRFM) to describe models that serve as foundational building blocks for a wide range of applications. These models, due to their vast size and training data, have generalized abilities that can be fine-tuned or adapted for specific tasks.

Characteristics of foundation models:

  1. Broad Utility: These models can be used as starting points for a wide variety of applications, ranging from natural language processing to image recognition and beyond.
  2. Fine-Tuning: Although foundation models are trained on vast datasets, they can be adapted to specific tasks or domains through fine-tuning on smaller, task-specific datasets.
  3. Societal Impact: Because of their broad utility, foundation models have the potential to impact numerous sectors and domains, leading to both positive applications and new challenges.
  4. Ethical Considerations: The use and deployment of foundation models bring forth numerous ethical, societal, and technological considerations, including fairness, transparency, and safety.

In essence, all LLMs could be considered a type of foundation model when they are used as a basis for various downstream tasks. However, the term “foundation models” isn’t limited to language models; it can also refer to large-scale models in other domains, such as computer vision, where a foundational, pre-trained model can serve multiple purposes.
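As a concrete, hedged illustration of the fine-tuning idea mentioned above, here is a minimal sketch of adapting a small pretrained model to a toy sentiment-classification task with the Hugging Face transformers and datasets libraries. The model name, labels, and three-example dataset are illustrative placeholders rather than a recommended setup.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Tiny illustrative dataset; a real fine-tune would use thousands of examples.
data = Dataset.from_dict({
    "text": ["Great product, works perfectly.",
             "Terrible experience, it broke in a day.",
             "Absolutely love it."],
    "label": [1, 0, 1],  # 1 = positive, 0 = negative
})

model_name = "distilbert-base-uncased"  # a small, general-purpose encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=64)

tokenized = data.map(tokenize, batched=True)

# Fine-tuning: a short, task-specific training run on top of the pretrained weights.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-demo", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
)
trainer.train()
```

The key point is that the heavy lifting was already done during pretraining; the fine-tuning step only nudges the general-purpose model toward one narrow task.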

Introduction to Leading Language Models

Segment 1: Falcon LLM by Technology Innovation Institute (TII)

According to the Technology Innovation Institute (TII), Falcon LLM outperforms GPT-3 at a fraction of the training cost and matches the performance of similarly sized LLMs from DeepMind (Chinchilla), Google (PaLM-62B), and Anthropic. It is also more efficient in terms of training compute, using only 75 percent of the training compute of OpenAI’s GPT-3, 40 percent of DeepMind’s Chinchilla, and 80 percent of Google’s PaLM-62B.

Falcon LLM is open-source and available for anyone to use. It is being used by researchers and developers around the world for a variety of applications, including natural language processing, machine translation, and text generation.

The development of Falcon LLM is a significant achievement for the UAE and the global AI community. It demonstrates the UAE’s commitment to technological progress and its ability to compete with the world’s leading AI research centres. Falcon LLM is a powerful tool that has the potential to revolutionize a wide range of industries.

It’s notable for its efficiency, requiring only 75% of the energy needed for GPT-3 and outperforming it in terms of speed. Its unique system filters high-quality information from the internet, ensuring that it learns from the best available data. Falcon’s design emphasises performance and energy conservation, making it a groundbreaking development in the field of AI.

Segment 2: LLaMA by Meta AI

LLaMA (Large Language Model Meta AI) is a family of large language models (LLMs) released by Meta AI starting in February 2023. For the first version of LLaMA, four model sizes were trained: 7, 13, 33 and 65 billion parameters. LLaMA’s developers reported that the 13B-parameter model’s performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters) and that the largest model was competitive with state-of-the-art models such as PaLM and Chinchilla.

As a brief aside: Natural Language Processing (NLP) is a vast field of research. It spans many tasks, such as machine translation, question answering, text summarization, image captioning, and sentiment analysis.
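As a quick, hedged illustration of one of those tasks, here is how sentiment analysis can be run in a few lines with the Hugging Face transformers pipeline; the default model it downloads is an English sentiment classifier, and the exact scores will vary.

```python
from transformers import pipeline

# The pipeline downloads a default English sentiment model on first use.
classifier = pipeline("sentiment-analysis")

print(classifier("The new episode of this podcast was fantastic!"))
# Example output (scores will vary):
# [{'label': 'POSITIVE', 'score': 0.99...}]
```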

Llama 2 was released in July 2023. It comes in model sizes of up to 70 billion parameters and was trained on a dataset of 2 trillion tokens. Llama 2 outperforms other open-source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.

Llama 2 is available free of charge for research and commercial use, whereas the original LLaMA was released under a more restrictive, research-focused license. With these releases, Meta AI allows everyone to explore and utilize these advanced models.

Segment 3: Claude by Anthropic PBC

Claude is a large language model (LLM) developed by Anthropic PBC, an AI safety and research company. Claude is a chatbot reportedly built on a model of roughly 52 billion parameters, although Anthropic has not officially confirmed its size. Its second version, Claude 2, was built with a focus on ethical considerations and safety. Claude is still under active development, with Anthropic working to make it more reliable, interpretable, and steerable, and to ensure it is used for good rather than harm. The team behind Claude prioritizes responsible AI, integrating principles from international guidelines to ensure that Claude operates within defined boundaries. This responsible approach reflects a thoughtful shift in AI development.

Claude is currently in beta and is only available to a limited number of users. However, Anthropic plans to make it more widely available in the future.

Segment 4: ERNIE 3.0 TITAN by Baidu

One of the most prominent Chinese LLMs, ERNIE 3.0 Titan by Baidu, is an impressive language model with 260 billion parameters. It’s designed to understand and analyze text in both Chinese and English and uses an online distillation framework to train smaller models alongside the main one. ERNIE’s methodology and extensive data library enable it to excel in various language tasks, marking a significant advancement in multilingual AI.

ERNIE 3.0 Titan is still under development, but it has the potential to be a powerful tool for a variety of applications.

Segment 5: BLOOM’s Multilingual Capabilities

BLOOM stands out as a multilingual large language model with 176 billion parameters. Developed by the BigScience research collaboration and trained on NVIDIA GPUs, it can generate text in 46 natural languages and 13 programming languages. While its primary strength lies in text generation, BLOOM’s ability to tackle various language tasks demonstrates its versatility and opens up numerous applications across languages.

Section 2: Exploration of Additional Cutting-Edge Language Models

Segment 6: Chinchilla by DeepMind

Introduced in March 2022 by DeepMind’s research team, Chinchilla represents a new generation of expansive language models. The name “Chinchilla” was chosen as this model is an evolution from their prior “Gopher” model family. The primary objective behind the development of both these models was to delve deeper into the scaling dynamics of large language models.

The Chinchilla model boasts performance metrics surpassing those of GPT-3. Insights from the training of preceding models revealed that for every doubling of a model’s size, the volume of training tokens should also double. DeepMind capitalized on this principle while training Chinchilla. Even though it was trained with roughly the same compute budget as Gopher, Chinchilla has 70B parameters (a quarter of Gopher’s 280B) and was trained on about four times as much data, roughly 1.4 trillion tokens.
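As a rough worked example of that scaling idea, here is a small calculation sketch assuming the frequently quoted Chinchilla rule of thumb of roughly 20 training tokens per parameter; the constant is an approximation, not an exact law.

```python
def chinchilla_optimal_tokens(num_parameters, tokens_per_parameter=20):
    """Approximate compute-optimal training tokens per the Chinchilla heuristic."""
    return num_parameters * tokens_per_parameter

for params in (1e9, 10e9, 70e9):
    tokens = chinchilla_optimal_tokens(params)
    print(f"{params / 1e9:.0f}B parameters -> ~{tokens / 1e12:.2f}T training tokens")

# 70B parameters -> ~1.40T tokens, which lines up with the data volume
# reported for Chinchilla itself.
```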

On the MMLU benchmark (Massive Multitask Language Understanding), Chinchilla recorded an impressive 67.5% accuracy, outperforming Gopher by roughly 7 percentage points.

However, as of January 12, 2023, Chinchilla remains in its experimental stages.

Definition: CPUs (Central Processing Units), GPUs (Graphics Processing Units), and TPUs (Tensor Processing Units)

A TPU, or Tensor Processing Unit, is a type of application-specific integrated circuit (ASIC) developed by Google specifically for accelerating machine learning tasks. It’s optimized for TensorFlow, Google’s open-source machine learning framework, but can be used with other frameworks as well.

Here’s a breakdown of the TPU:

  • Purpose-built for Deep Learning: Traditional CPUs (Central Processing Units) and GPUs (Graphics Processing Units) can handle machine learning tasks, but TPUs are specifically designed for the high computational demands of large-scale machine learning applications.
  • Matrix Multiplication: One of the most computationally-intensive parts of training deep learning models is matrix multiplication. TPUs are designed to handle these operations at high speeds.
  • Reduced Precision Arithmetic: TPUs use reduced precision arithmetic (as compared to many CPUs and GPUs) to speed up computations. This means they use fewer bits to represent numbers. For many machine learning tasks, this reduced precision doesn’t significantly affect the model’s accuracy but offers computational advantages.
  • Memory: TPUs have a large amount of high-bandwidth memory, reducing the need for data transfers that can slow down computations.
  • Integration with Google Cloud: TPUs are available on Google Cloud, enabling developers and researchers to use them for training and inference without needing to invest in physical hardware.

Google has released multiple versions of the TPU, each improving upon the previous in terms of computational capabilities and other features.

It’s worth noting that while TPUs are powerful for many machine learning tasks, whether they’re the best choice depends on the specific requirements of the project. For some applications, traditional GPUs or even CPUs might be more appropriate or cost-effective. However, for large-scale machine learning tasks, especially deep learning, TPUs offer significant acceleration and efficiency benefits.
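To make the reduced-precision point from the list above concrete, here is a small PyTorch sketch comparing 32-bit floats with the 16-bit bfloat16 format used heavily on TPUs; it is illustrative only and assumes PyTorch is installed.

```python
import torch

x = torch.tensor([1.0000001, 3.1415926, 123456.789])

x_fp32 = x.to(torch.float32)
x_bf16 = x.to(torch.bfloat16)

# bfloat16 keeps float32's dynamic range but has only 7 explicit mantissa bits,
# so values are stored more coarsely while using half the memory per number.
print(x_fp32)                      # full single precision
print(x_bf16.to(torch.float32))    # the same values after a round trip through bfloat16
print(x_fp32.element_size(), "bytes vs", x_bf16.element_size(), "bytes per value")
```

For many training workloads this loss of precision barely affects accuracy, which is exactly the trade-off the bullet on reduced precision arithmetic describes.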

Segment 7: Google’s PaLM

PaLM, or Pathways Language Model, is a brainchild of Google AI, boasting a massive 540 billion parameters. This model was developed using a novel approach called Pathways, which allows for efficient model training across multiple TPU v4 Pods. In the case of PaLM, it was trained on a staggering 6144 TPU v4 chips. PaLM’s size is not its only impressive feature; it also excels in tasks from common-sense reasoning to joke explanation, code generation, and translation. Specialized versions like Med-PaLM have outperformed previous models in medical question-answering. Google’s continued innovations with PaLM, including PaLM 2 and AudioPaLM, represent a technical marvel pushing AI’s boundaries.

Segment 8: OpenAI’s GPT-3

GPT-3, by OpenAI, is an autoregressive model launched in June 2020. With a staggering 175 billion parameters, it outshines its predecessor, GPT-2, which had 1.5 billion. GPT-3 introduced breakthroughs in zero-shot and few-shot learning capabilities, enabling it to undertake tasks like translation, question-answering, and more. It opened doors for various applications across industries. In September 2020, Microsoft obtained an exclusive license for GPT-3, but OpenAI still offers access through a public API. GPT-3 is renowned for its robust performance across a wide range of general natural language processing tasks, but it has its limitations in terms of fine control, creativity, and efficiency in certain specific domains.
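As a hedged illustration of the few-shot idea, here is a sketch of how such a prompt is typically assembled before being sent to a text-completion model; the examples and the send_to_model function are placeholders, not an actual OpenAI API call.

```python
# A few-shot prompt: a handful of worked examples followed by the new input.
# The model infers the task pattern from the examples alone, with no fine-tuning.
few_shot_prompt = """Translate English to French.

English: Good morning.
French: Bonjour.

English: Thank you very much.
French: Merci beaucoup.

English: Where is the train station?
French:"""

def send_to_model(prompt):
    """Placeholder for a call to a completion endpoint or a local model."""
    raise NotImplementedError("Wire this up to your preferred LLM API.")

print(few_shot_prompt)  # inspect the prompt; the model would complete the last line
```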

Segment 9: OpenAI’s GPT-4

OpenAI’s GPT-4, the successor to GPT-3, builds upon its predecessor’s foundations, although OpenAI has not publicly disclosed its parameter count. GPT-4 takes generative text creation to new heights, offering greater precision, creativity, and more nuanced control over text generation. Its ability to generate content in multiple languages and across many subjects sets a new standard in large language models. Improved training techniques and architecture have allowed it to outperform GPT-3 on several benchmarks, providing more coherent and context-aware responses. Its enhancements in fine-tuning and adaptability make it a significant step forward from GPT-3.

The key differences between GPT-4 and GPT-3 lie in their generative capabilities, control over text generation, efficiency, and specific improvements in architecture and training techniques. GPT-4 represents a refined evolution, focusing on enhancing the quality of output and expanding its range of applications.

Segment 10: Google’s BERT (BERT-Base and BERT-Large)

Introduced by Google’s AI team in 2018, BERT (Bidirectional Encoder Representations from Transformers) has become vital in natural language processing. BERT-Base, with its 12 encoder layers and 110 million parameters, is efficient and nimble. BERT-Large, the larger variant, boasts 24 encoder layers and 340 million parameters. Trained on text data from the Toronto Book Corpus and English Wikipedia, BERT’s ability to understand and represent human-like text has revolutionized NLP.
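For a hands-on feel of BERT’s bidirectional training objective, here is a short sketch using the Hugging Face transformers fill-mask pipeline with the publicly available bert-base-uncased checkpoint; it predicts a masked word from the context on both sides.

```python
from transformers import pipeline

# BERT was pretrained to predict masked words using context on both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
# The top prediction is typically "paris".
```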

In October 2019, Google announced that they were using BERT to better understand the context of words in search queries. This was one of the biggest leaps forward in the history of Search in terms of improving the understanding of the context around search queries.

BERT helps Google Search better understand the nuance and context of words in searches and better match those queries with relevant results. It’s particularly useful for understanding the intent of search queries that might have been previously difficult for Google’s algorithms to interpret correctly.

As of the announcement, BERT was used for English language queries in the U.S., but Google had plans to roll this out to more languages and locales over time.

Segment 11: Microsoft’s Turing-NLG

Microsoft’s Turing-NLG, with its 17 billion parameters, is a significant player despite its relatively smaller size. Its ability to generate cohesive text and perform across multiple applications proves that effective design can achieve excellence, even without an enormous parameter count.

Let’s wrap up our episode for today.

These remarkable language models, from Falcon’s efficiency and LLaMA’s openness to Claude’s ethical considerations, ERNIE’s distillation framework, and BLOOM’s multilingual talents, represent the forefront of AI technology.

From the highly innovative Chinchilla to the ground-breaking BERT, the creativity driving AI’s advancement is astonishing. PaLM’s adaptability and technological marvel, GPT-3’s zero-shot learning, GPT-4’s generative capabilities, and Turing-NLG’s efficient design showcase the tremendous diversity in today’s AI landscape. All of these models are transforming our understanding of language processing and communication, bridging gaps, and fostering innovation. The future of AI is here, and it’s as exciting, promising and diverse as the models we’ve explored.

About The Author

Bogdan Iancu

Bogdan Iancu is a seasoned entrepreneur and strategic leader with over 25 years of experience in diverse industrial and commercial fields. His passion for AI, Machine Learning, and Generative AI is underpinned by a deep understanding of advanced calculus, enabling him to leverage these technologies to drive innovation and growth. As a Non-Executive Director, Bogdan brings a wealth of experience and a unique perspective to the boardroom, contributing to robust strategic decisions. With a proven track record of assisting clients worldwide, Bogdan is committed to harnessing the power of AI to transform businesses and create sustainable growth in the digital age.