
Open Source LLMs You Should Try Right Now

14 June, 2023

Since the release of revolutionary technologies like GPT-4 and ChatGPT, Large Language Models (LLMs) have been the talk of the town. LLMs are everywhere these days, and every large corporation is striving to become a giant in the newly emerged LLM space. OpenAI pioneered the race with ChatGPT and GPT-4, Microsoft followed with the new Bing, and Google has now released Bard.
However, alongside these proprietary systems, big tech corporations and research groups also develop and release open-source LLMs. In this blog, let's explore some of the best open-source LLMs you can use right now.

LLaMA

Meta announced LLaMA at the end of February 2023. Unlike its counterparts, OpenAI's ChatGPT and Microsoft's Bing, LLaMA is not accessible to the general public. Instead, Meta released it as an open-source package that members of the AI community could request access to.


Unlike OpenAI and Google, rather than giving users a chat window to talk to the model, Meta open-sourced its codebase so everyone can understand how LLMs are built.

Blog

https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

Paper

https://arxiv.org/abs/2302.13971

BLOOM

The BLOOM model and its various versions were developed through the BigScience Workshop. BigScience is inspired by other open-science initiatives where researchers pool their time and resources to achieve a higher impact collectively. BLOOM's architecture is similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 natural languages and 13 programming languages. Several smaller versions of the model have been trained on the same dataset.
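
Auto-regressive next-token prediction simply means: given the tokens so far, predict the most likely next token. A toy illustration in pure Python, using a frequency-based bigram model (a deliberate simplification, not BLOOM's actual Transformer):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which tokens most often follow it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent successor of `token` (one auto-regressive step)."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

corpus = [
    "the model predicts the next token",
    "the next token depends on the context",
]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # most common follower of "the" in the corpus
```

A real LLM replaces the frequency table with a neural network conditioned on the entire context, but the generation loop, predict one token, append it, repeat, is the same idea.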

Hugging Face

https://huggingface.co/docs/transformers/model_doc/bloom

BERT

BERT stands for Bidirectional Encoder Representations from Transformers and is a language representation model by Google. It uses two steps, pre-training and fine-tuning, to create state-of-the-art models for various tasks.
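
One of BERT's pre-training objectives is masked language modeling: some input tokens are replaced with a [MASK] token and the model learns to recover them from both left and right context. A minimal sketch of the masking step (the 15% rate follows the paper; real BERT samples positions randomly and sometimes keeps or swaps tokens, while this version is deterministic for illustration):

```python
def mask_tokens(tokens, mask_rate=0.15):
    """Replace every k-th token with [MASK], where k is about 1 / mask_rate.

    Returns the masked sequence plus a mapping from masked positions to
    the original tokens, which serve as the prediction targets.
    """
    step = max(1, round(1 / mask_rate))
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if i % step == step - 1:   # every `step`-th position
            targets[i] = tok       # label the model must predict
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

tokens = "bert reads context from both directions to fill gaps in".split()
masked, targets = mask_tokens(tokens)
print(masked)
```

Because the model sees tokens on both sides of each gap, the learned representations are bidirectional, which is exactly what the fine-tuning step then builds on.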

There are many popular variants based on BERT's architecture. Here are some of them:

  • ALBERT by Google
  • RoBERTa by Facebook
  • ERNIE by Baidu
  • DistilBERT by Hugging Face

Here are the important links about BERT:

Research paper

https://arxiv.org/abs/1810.04805

GitHub repository

https://github.com/google-research/bert

T5

T5 (Text-to-Text Transfer Transformer) is Google's family of general-purpose Large Language Models (LLMs). It's helpful in many tasks like summarization, classification, and translation, and comes in several sizes, from "small" (~60M parameters) to quite large (~11B parameters). The larger sizes are increasingly powerful but also increasingly expensive to run.
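
What makes T5 general-purpose is its text-to-text framing: every task is named in a plain-text prefix, and the answer is always generated as text. A small sketch of that input formatting (the prefixes below mirror common examples from the T5 paper; the exact strings expected by a given checkpoint are an assumption you should verify):

```python
# Task prefixes in the style used by T5; verify against your checkpoint's docs.
TASK_PREFIXES = {
    "summarize": "summarize: ",
    "translate_en_de": "translate English to German: ",
    "cola": "cola sentence: ",  # grammatical-acceptability classification
}

def to_t5_input(task, text):
    """Prepend the task prefix so one model can serve every task."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return TASK_PREFIXES[task] + text

print(to_t5_input("translate_en_de", "The house is wonderful."))
# → translate English to German: The house is wonderful.
```

The payoff of this design is that classification, translation, and summarization all share one model, one loss, and one decoding procedure.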

GitHub repository

https://github.com/google-research/text-to-text-transfer-transformer

Research paper

https://arxiv.org/abs/1910.10683

T5X

T5X is a modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of sequence models (starting with language) at many scales.

It is a new and improved implementation of the T5 codebase (based on Mesh TensorFlow) in JAX and Flax. To learn more, see the T5X Paper.

GitHub repository

https://github.com/google-research/t5x

Documentation

https://t5x.readthedocs.io/

PEGASUS

PEGASUS is a state-of-the-art model for abstractive text summarization developed by Google. It is a Transformer encoder-decoder model pre-trained with a self-supervised objective called gap-sentence generation to improve finetuning performance on abstractive summarization.
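
In gap-sentence generation, whole sentences (rather than individual tokens) are removed from a document and the model must generate them, which closely mirrors summarization itself. A pure-Python sketch of the masking step (PEGASUS actually selects the most "important" sentences with a ROUGE-based score; the longest-sentence heuristic here is a stand-in for illustration):

```python
def gap_sentence_mask(sentences, n_gaps=1):
    """Remove the n 'most important' sentences, replacing each with [MASK-SENT].

    Returns the masked document and the removed sentences, which become
    the generation targets during pre-training.
    """
    # Longest-sentence heuristic as a stand-in for PEGASUS's ROUGE scoring.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(sentences[i]), reverse=True)
    chosen = set(ranked[:n_gaps])
    masked = ["[MASK-SENT]" if i in chosen else s
              for i, s in enumerate(sentences)]
    targets = [sentences[i] for i in sorted(chosen)]
    return masked, targets

doc = ["Pegasus was trained on web text.",
       "Its pre-training objective removes entire sentences from a document.",
       "The model then generates them."]
masked, targets = gap_sentence_mask(doc)
print(masked)
```

Generating a missing sentence from the rest of the document is, in effect, writing a one-sentence summary of the surrounding context, which is why this objective transfers so well to abstractive summarization.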

Announcement blog

https://ai.googleblog.com/2020/06/pegasus-state-of-art-model-for.html

GitHub repository

https://github.com/google-research/pegasus

FinGPT: Open-Source Financial LLMs

FinNLP provides a playground for all people interested in LLMs and NLP in Finance. It democratizes financial data, offering an accessible alternative to proprietary models.

It provides full pipelines for LLM training and finetuning in finance. The full architecture is shown in the following picture. FinGPT utilizes RLHF for personalized financial language modeling and employs a lightweight low-rank adaptation (LoRA) technique, allowing easy and cost-effective finetuning.

[Image: FinGPT architecture overview (FinGPT.jpg)]
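
Low-rank adaptation is what keeps that finetuning cheap: the pre-trained weight matrix W is frozen, and only a small rank-r update B·A is trained, so the effective weight is W + B·A. A pure-Python sketch of the parameter savings and the forward pass (the dimensions and rank are illustrative, not FinGPT's actual configuration):

```python
def lora_param_counts(d_out, d_in, r):
    """Trainable parameters: full finetune vs. a rank-r LoRA update."""
    full = d_out * d_in
    lora = d_out * r + r * d_in  # B is d_out x r, A is r x d_in
    return full, lora

def matmul(A, B):
    """Plain nested-list matrix multiply (stand-in for a tensor library)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(W, B, A, x, alpha=1.0):
    """y = (W + alpha * B @ A) @ x, computed without materializing B @ A."""
    base = matmul(W, x)
    low_rank = matmul(B, matmul(A, x))
    return [[w + alpha * l for w, l in zip(rw, rl)]
            for rw, rl in zip(base, low_rank)]

full, lora = lora_param_counts(4096, 4096, r=8)
print(full, lora)  # a rank-8 update trains roughly 0.4% of the full matrix
```

For a 4096x4096 layer, the rank-8 update has 65,536 trainable parameters versus 16.8 million for a full finetune, which is why LoRA makes finetuning large models affordable.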

Research paper

https://arxiv.org/abs/2306.06031

GitHub repository

https://github.com/AI4Finance-Foundation/FinGPT

StableLM

StableLM is the first LLM launched by StabilityAI, in April 2023. The Alpha model is available in 3 billion and 7 billion parameter versions, with 15 billion to 65 billion parameter models to follow. Developers can freely inspect, use, and adapt the StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.

StabilityAI is the same company that released Stable Diffusion in 2022. The company commits to being more transparent, accessible, and supportive in AI fields.

Blog

https://stability.ai/blog/stability-ai-launches-the-first-of-its-stablelm-suite-of-language-models

GitHub Repository

https://github.com/stability-AI/stableLM/

Falcon-40B

Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license.

Falcon-40B claims to be the best open-source model available today, outperforming LLaMA, StableLM, RedPajama, MPT, and other open-source LLMs.

Hugging Face

https://huggingface.co/tiiuae/falcon-40b
Note: This is a raw, pre-trained model, which should be further finetuned for most use cases. Hence TII itself recommends using Falcon-40B-Instruct if you need an LLM that takes generic instructions in a chat format.

Falcon-7B

Falcon-7B can be considered the younger sibling of Falcon-40B. It is also made available under the Apache 2.0 license. It is a 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.
Falcon-7B claims to outperform comparable open-source LLMs like MPT-7B, StableLM, and RedPajama.

Note: This version is a raw, pre-trained model which must be finetuned for most use cases. Hence TII itself recommends using Falcon-7B-Instruct if you need an LLM that takes generic instructions in a chat format.

Orca

Orca is a 13-billion parameter model that learns to imitate the reasoning process of models like GPT-4. It outperforms Vicuna, a previously leading open-source model, on many benchmarks and sometimes matches or exceeds the capabilities of GPT-3.5.

It achieves this by imitating the logic and explanations of larger models and training on diverse tasks with an order of magnitude more examples.


Research paper

https://arxiv.org/abs/2306.02707

Alpaca.cpp

Alpaca.cpp combines the LLaMA foundation model with an open reproduction of Stanford Alpaca (a finetuning of the base model to obey instructions, akin to the RLHF used to train ChatGPT) and a set of modifications to llama.cpp that add a chat interface.

The most important thing is that you can run this anywhere: on Linux, Windows, or macOS, and even on an Android phone!

GitHub Link

https://github.com/rupeshs/alpaca.cpp

Frequently Asked Questions

What are LLMs?

LLM stands for Large Language Model. LLMs are natural language processing programs that use artificial neural networks to generate text. They are trained on massive amounts of text data to learn patterns and entity relationships in the language.

What are the most notable LLMs?

There are a lot of proprietary and open-source LLMs available to use today. Some notable ones are GPT-4 (OpenAI), LaMDA (Google), BLOOM (BigScience), and LLaMA (Meta). LLMs power many applications, such as AI chatbots and AI search engines.

Why consider open-source LLMs?

Open-source LLMs are those whose codebase is made available to the public to use, modify, and redistribute. There are several reasons for choosing an open-source LLM over proprietary ones.

  • Code transparency and increased trust
  • Cost efficiency
  • Community support

opensource

llm

t5

bert

Kaushal Joshi
Learning👨‍💻 Documenting📝 Sharing🔗
