Open Source LLMs You Should Try Right Now
14 June 2023
Since the release of revolutionary technologies like GPT-4 and ChatGPT, Large Language Models, or LLMs, have been the talk of the town. LLMs are everywhere these days, and every large corporation is striving to become a giant in the newly emerged LLM space. OpenAI pioneered the race with ChatGPT and GPT-4, Microsoft followed with its GPT-powered Bing, and Google has since released Bard.
Alongside these proprietary offerings, however, companies and research groups have developed and released open-source alternatives. In this blog, let's explore some of the best open-source LLMs you can use right now.
LLaMA
Meta announced LLaMA at the end of February 2023. Unlike its counterparts, OpenAI's ChatGPT and Microsoft's Bing, LLaMA is not open to the general public. Still, Meta made it available as an open-source package that the AI community could request access to.
Unlike OpenAI and Google, rather than giving users a chat window, Meta open-sourced its codebase for everyone to understand how LLMs are built.
Blog
Paper
BLOOM
The BLOOM model and its various versions were produced through the BigScience Workshop. BigScience is inspired by other open-science initiatives where researchers pool their time and resources to achieve a higher impact collectively. BLOOM's architecture is similar to GPT-3's (an auto-regressive model for next-token prediction), but it has been trained on 46 natural languages and 13 programming languages. Several smaller versions of the model have been trained on the same dataset.
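If you want to try BLOOM without a large GPU, one of the smaller public checkpoints is an easy entry point. Here is a minimal sketch using the Hugging Face transformers library; the checkpoint name and prompt are just illustrative:

```python
# Minimal sketch: text generation with a small public BLOOM checkpoint.
# Assumes `pip install transformers torch`; bigscience/bloom-560m is one
# of the smaller variants trained on the same multilingual dataset.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")
result = generator("The BigScience Workshop was created to", max_new_tokens=40)
print(result[0]["generated_text"])
```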
Hugging Face
BERT
BERT stands for Bidirectional Encoder Representations from Transformers and is a language representation model by Google. It uses two steps, pre-training and fine-tuning, to create state-of-the-art models for various tasks.
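To see what the pre-training step alone gives you, before any task-specific fine-tuning, here is a minimal sketch of masked-token prediction with the transformers library (the checkpoint name and prompt are illustrative):

```python
# Minimal sketch: masked-token prediction with a pre-trained BERT model.
# Assumes `pip install transformers torch`. BERT fills in the [MASK] token
# using context from both directions, hence "bidirectional".
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("Language models learn patterns from [MASK] amounts of text."):
    print(candidate["token_str"], round(candidate["score"], 3))
```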
There are many popular variants based on BERT's architecture. Here are some of them:
- ALBERT by Google
- RoBERTa by Facebook
- ERNIE by Baidu
- DistilBERT by Hugging Face
Here are the important links about BERT:
Research paper
GitHub repository
T5
T5 (Text-to-Text Transfer Transformer) is Google's family of general-purpose Large Language Models (LLMs). It's helpful in many tasks like summarization, classification, and translation, and comes in several sizes, from "small" (~60M parameters) to quite large (~11B parameters). These sizes are increasingly powerful but also increasingly expensive to wield.
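Because T5 casts every task as text-to-text, a prefix in the input string selects the task. A minimal sketch with the ~60M-parameter t5-small checkpoint (the prefix and prompt are illustrative):

```python
# Minimal sketch: T5's text-to-text interface, where a task prefix in the
# input selects the task. Assumes `pip install transformers sentencepiece torch`.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```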
GitHub repository
Research paper
T5X
T5X is a modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of sequence models (starting with language) at many scales.
It is a new and improved implementation of the T5 codebase (based on Mesh TensorFlow) in JAX and Flax. To learn more, see the T5X Paper.
GitHub repository
Documentation
PEGASUS
PEGASUS is a state-of-the-art model for abstractive text summarization developed by Google. It is a Transformer encoder-decoder model pre-trained with a self-supervised objective called gap-sentence generation to improve finetuning performance on abstractive summarization.
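You can see the gap-sentence pre-training pay off by running one of the released fine-tuned checkpoints. A minimal sketch, assuming the google/pegasus-xsum checkpoint from the Hugging Face Hub:

```python
# Minimal sketch: abstractive summarization with a fine-tuned PEGASUS
# checkpoint. Assumes `pip install transformers sentencepiece torch`.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

article = ("PEGASUS is pre-trained by removing whole sentences from a document "
           "and training the model to regenerate them, a task that transfers "
           "well to abstractive summarization.")
inputs = tokenizer(article, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```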
Announcement blog
GitHub repository
FinGPT: Open-Source Financial LLMs
FinNLP provides a playground for everyone interested in LLMs and NLP in finance. It democratizes financial data, offering an accessible alternative to proprietary models.
It provides full pipelines for LLM training and finetuning in finance; the full architecture is illustrated in the research paper linked below. It utilizes RLHF for personalized financial language modeling and employs a lightweight low-rank adaptation (LoRA) technique, allowing easy and cost-effective finetuning.
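The low-rank adaptation idea is easy to sketch. The following is not FinGPT's actual configuration, just an illustration of LoRA-style finetuning with the Hugging Face peft library on a generic causal LM; the base model and hyperparameters here are assumptions:

```python
# Illustrative LoRA setup with the `peft` library -- not FinGPT's real
# config. LoRA freezes the base weights and trains small low-rank adapter
# matrices instead, which is what makes finetuning cheap.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         task_type="CAUSAL_LM")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction
```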
Research paper
GitHub repository
StableLM
StableLM is the first LLM launched by StabilityAI, in April 2023. The Alpha model is available in 3 billion and 7 billion parameter versions, with 15 billion to 65 billion parameter models to follow. Developers can freely inspect, use, and adapt the StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA 4.0 license.
StabilityAI is the same company that released Stable Diffusion in 2022, and it has committed to making AI more transparent, accessible, and supportive.
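Trying the Alpha checkpoints is a standard transformers workflow. A minimal sketch, assuming the 3 billion parameter base model under the stabilityai/stablelm-base-alpha-3b name on the Hugging Face Hub:

```python
# Minimal sketch: sampling from a StableLM Alpha base checkpoint.
# Assumes `pip install transformers torch`; the weights are several GB.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-base-alpha-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open-source language models matter because",
                   return_tensors="pt")
tokens = model.generate(**inputs, max_new_tokens=40,
                        do_sample=True, temperature=0.7)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```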
Blog
GitHub Repository
Falcon-40B
Falcon-40B is a 40B-parameter causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license.
At release, TII claimed Falcon-40B was the best open-source model available, outperforming LLaMA, StableLM, RedPajama, MPT, and other open-source LLMs.
Hugging Face
Note: This is a raw, pre-trained model that should be further finetuned for most use cases. TII recommends using Falcon-40B-Instruct if you need an LLM that takes generic instructions in a chat format.
Falcon-7B
Falcon-7B can be considered the younger sibling of Falcon-40B. It is also made available under the Apache 2.0 license. It is a 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.
Falcon-7B claims to outperform comparable open-source LLMs like MPT-7B, StableLM, RedPajama, etc.
Note: This version is also a raw, pre-trained model that must be finetuned for most use cases. TII recommends using Falcon-7B-Instruct if you need an LLM that takes generic instructions in a chat format.
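Loading the instruct variant follows the usual Hugging Face Hub pattern. A minimal sketch; the dtype and device settings are assumptions you may need to adjust for your hardware:

```python
# Minimal sketch: loading Falcon-7B-Instruct from the Hugging Face Hub.
# Assumes `pip install transformers accelerate torch` and a GPU with
# enough memory; trust_remote_code allows Falcon's custom model code to run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

prompt = "Explain what RefinedWeb is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```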
Orca
Orca is a 13-billion-parameter model from Microsoft Research that learns to imitate the reasoning process of models like GPT-4. It outperforms Vicuna, a previously leading open-source model, on many benchmarks and sometimes matches or exceeds the capabilities of GPT-3.5.
It achieves this by imitating the logic and explanations of larger models and training on diverse tasks with an order of magnitude more examples.
Research paper
Alpaca.cpp
Alpaca.cpp combines the LLaMA foundation model with an open reproduction of Stanford Alpaca (a finetuning of the base model to follow instructions, akin to the RLHF used to train ChatGPT) and a set of modifications to llama.cpp that add a chat interface.
The most important thing is that you can run it almost anywhere: on Linux, Windows, or a Mac, and even on an Android phone!
GitHub Link
Frequently Asked Questions
What are LLMs?
LLM stands for Large Language Model. LLMs are natural language processing systems that use artificial neural networks to generate text. They are trained on massive amounts of text data to learn the patterns and relationships between entities in a language.
What are the most notable LLMs?
There are many proprietary and open-source LLMs available to use today. Some notable ones are GPT-4 (OpenAI), LaMDA (Google, the model behind Bard), BLOOM (BigScience), and LLaMA (Meta). LLMs power many applications, such as AI chatbots and AI search engines.
Why consider open-source LLMs?
Open-source LLMs are those whose codebase is made available to the public to use, modify, and redistribute. There are several reasons for choosing an open-source LLM over proprietary ones.
- Code transparency and increased trust
- Cost efficiency
- Community support
opensource
llm
t5
bert