Choosing the right LLM

Choosing the right model for your specific use case can be a complex endeavor. The list of LLMs has grown significantly over the last few months, and it keeps getting bigger. Choosing the right model is not only about the quality of the results. Other criteria must be taken into account if the final solution is to make business sense: inference cost, latency and performance, hosting infrastructure, training data, data privacy, and licensing are just some of the criteria to consider.
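One way to make these trade-offs explicit is a weighted scoring matrix. The following is a minimal sketch, not a prescribed method: the model names, the 0 to 5 ratings, and the weights are all hypothetical placeholders that you would replace with your own measurements and priorities.

```python
# Hypothetical weights: how much each criterion matters for one use case.
WEIGHTS = {"quality": 4, "cost": 2, "latency": 2, "privacy": 1, "licensing": 1}

# Hypothetical candidates, each rated from 0 (poor) to 5 (excellent).
CANDIDATES = {
    "model-a": {"quality": 5, "cost": 2, "latency": 3, "privacy": 3, "licensing": 2},
    "model-b": {"quality": 4, "cost": 4, "latency": 4, "privacy": 4, "licensing": 5},
}


def weighted_score(ratings, weights):
    """Return the weighted average of the ratings for one model."""
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


def rank_models(candidates, weights):
    """Return (name, score) pairs sorted from best to worst score."""
    scored = [(name, weighted_score(r, weights)) for name, r in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_models(CANDIDATES, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

With these placeholder numbers, the model with the best raw quality does not come out on top, which is the point of weighing the other criteria alongside quality.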

Hyperscalers may be a good place to start. You will only be dealing with a curated set of models, much of the heavy lifting of matching them to specific tasks has already been done, and you will have something up and running more quickly. Major cloud vendors such as Azure, AWS, and GCP abstract away the complexity of provisioning and running models and generally provide capabilities for dealing with sensitive data.

Transformer Architectures

By now it is widely known that the Transformer architecture (proposed in the 2017 paper Attention Is All You Need) sparked the rise of Generative AI. Since 2017, transformers have evolved, and understanding the different flavors is key to matching models to the tasks they are suited for.

Essentially, there are three types of transformer architectures:

  1. Encoder-only (Auto-Encoding): Used for tasks where the input needs to be understood, such as sentiment analysis, named entity recognition, and word classification. BERT is an example of an encoder-only transformer.
  2. Decoder-only (Auto-Regressive): Used for text generation use cases, such as story-writing. The GPT family of models (GPT-1, GPT-2, GPT-3), BLOOM, and LLaMA are examples of decoder-only transformers.
  3. Encoder-decoder (Sequence-to-Sequence): Used for tasks where text has to be generated from the input, such as summarization, question answering, and translation. FLAN-T5 and BART are examples of the encoder-decoder architecture.

As a rule of thumb, the transformer architecture provides an important hint on which use cases the resulting model is better suited for. Auto-regressive models perform well on conversational AI tasks, question answering, and text summarization, while auto-encoders excel at “understanding” and structuring language as in sentiment analysis tasks. 
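The rule of thumb above can be written down as a simple lookup. This is only a sketch of the task lists given in this section: the names `Architecture`, `TASK_TO_ARCHITECTURE`, and `suggest_architecture` are hypothetical, and real models often handle tasks outside their "home" category.

```python
from enum import Enum


class Architecture(Enum):
    ENCODER_ONLY = "encoder-only (auto-encoding)"
    DECODER_ONLY = "decoder-only (auto-regressive)"
    ENCODER_DECODER = "encoder-decoder (sequence-to-sequence)"


# Task lists copied from the three architecture types described above.
TASK_TO_ARCHITECTURE = {
    "sentiment analysis": Architecture.ENCODER_ONLY,
    "named entity recognition": Architecture.ENCODER_ONLY,
    "word classification": Architecture.ENCODER_ONLY,
    "text generation": Architecture.DECODER_ONLY,
    "story-writing": Architecture.DECODER_ONLY,
    "summarization": Architecture.ENCODER_DECODER,
    "question answering": Architecture.ENCODER_DECODER,
    "translation": Architecture.ENCODER_DECODER,
}

# Example models named in this section, grouped by architecture.
EXAMPLE_MODELS = {
    Architecture.ENCODER_ONLY: ["BERT"],
    Architecture.DECODER_ONLY: ["GPT-1", "GPT-2", "GPT-3", "BLOOM", "LLaMA"],
    Architecture.ENCODER_DECODER: ["FLAN-T5", "BART"],
}


def suggest_architecture(task):
    """Return the architecture listed for a task, or None if it is not listed."""
    return TASK_TO_ARCHITECTURE.get(task.strip().lower())


if __name__ == "__main__":
    arch = suggest_architecture("Translation")
    print(arch.value, EXAMPLE_MODELS[arch])
```

Returning `None` for unlisted tasks is a deliberate choice here: the table is a starting hint, not an exhaustive classification.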

Model Types

There are three main model types to choose from:

  1. Base Models: The direct output of the pre-training stage of LLMs, in which the model is trained using unsupervised learning on massive amounts of data. Their function is tied to what transformers do best: predicting the next word in a sequence for a given context.
  2. Instruction-Tuned Models: A second stage of the training process fine-tunes the model to perform better at specific tasks. Pre-trained (or base) models are generally fine-tuned with input-output pairs of data that include instructions and attempts to follow those instructions.
  3. RL-Tuned Models: A third stage often employed to refine the model further is Reinforcement Learning, with the aim of making the model better at being helpful, honest, and harmless.

Instruction-tuned and especially RLHF (Reinforcement Learning from Human Feedback) models are less likely to generate problematic text and are more suitable for practical applications.

As an example, when a base model is prompted with the text “What is the capital of France?”, the result might be “What is the capital of Germany?”, whereas an RLHF model’s output may be “Paris” or even “The capital of France is Paris” or, as ChatGPT would reply, “As of my last update in September 2021, the capital of France is Paris.”
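This difference affects how a prompt is best written for each model type. The sketch below assumes a common prompting convention that is not described in this article: a base model is given text shaped so that the most likely continuation is the answer, while an instruction-tuned or RL-tuned model is given the question directly. The function name `build_prompt` and the model-type labels are hypothetical.

```python
def build_prompt(question: str, model_type: str) -> str:
    """Shape a question for a given model type (illustrative labels only)."""
    if model_type == "base":
        # A base model continues text, so frame the question as a
        # document whose natural continuation is the answer.
        return f"Question: {question}\nAnswer:"
    if model_type in ("instruction-tuned", "rl-tuned"):
        # These models were trained to follow instructions directly.
        return question
    raise ValueError(f"unknown model type: {model_type!r}")


if __name__ == "__main__":
    question = "What is the capital of France?"
    for model_type in ("base", "instruction-tuned", "rl-tuned"):
        print(f"[{model_type}]")
        print(build_prompt(question, model_type))
```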

Where to find models?

Models are offered by different companies through different channels. The two main options are managed services through hyperscalers’ Generative AI solutions (Google Vertex AI / Model Garden, AWS SageMaker / JumpStart, and Azure OpenAI Service / Model Catalog) and hubs like HuggingFace with its Open LLM Leaderboard.

In Google’s Model Garden, models are classified as Foundation models (pre-trained), Fine-tunable models, and Task-specific models. Each model entry includes a detailed description of the use cases it is best suited for.

Figure 1. Google's Model Garden - Foundation Models examples.

The HuggingFace leaderboard provides a bit more detail on the models. A type column shows whether a model is pre-trained, fine-tuned, or fine-tuned with RL.

Figure 2. HuggingFace Open LLM Leaderboard.

HuggingFace also provides additional details on the data the models were trained on, ethical considerations and limitations, and licensing.
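Once this metadata has been collected, a shortlist can be filtered programmatically. The following is a minimal sketch over hand-entered records; the `ModelCard` fields, the model names, and the license labels are hypothetical and do not reflect any catalog's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    name: str
    model_type: str  # e.g. "pre-trained", "fine-tuned", "rl-tuned"
    license: str     # license label as recorded by you


def shortlist(cards, allowed_types, allowed_licenses):
    """Keep only models whose type and license are both acceptable."""
    return [
        card
        for card in cards
        if card.model_type in allowed_types and card.license in allowed_licenses
    ]


# Hypothetical records, entered by hand for illustration.
CARDS = [
    ModelCard("model-a", "pre-trained", "apache-2.0"),
    ModelCard("model-b", "rl-tuned", "research-only"),
    ModelCard("model-c", "fine-tuned", "apache-2.0"),
]

if __name__ == "__main__":
    picks = shortlist(CARDS, {"fine-tuned", "rl-tuned"}, {"apache-2.0"})
    print([card.name for card in picks])
```

Filtering on license before comparing quality keeps models you cannot legally deploy out of the evaluation altogether.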

Key Takeaways

Selecting a model need not be a daunting task. Knowing the origin and key characteristics of models makes it much easier to leverage LLMs for specific tasks and avoids the frustration of not getting the desired results.

Of course, there are plenty of other factors to consider, such as latency, speed of inference, cost of inference, and hallucinations, just to name a few and excluding legal aspects. But the quality of results generally comes first when deciding whether an LLM-based solution will yield the expected benefits.

About Encora

Fast-growing tech companies partner with Encora to outsource product development and drive growth. Contact us to learn more about our software engineering capabilities.

Author Bio

Rodrigo Vargas is a leader of Encora’s Data Science & Engineering Technology Practice. In addition, he oversees project delivery for the Central & South America division, guaranteeing customer satisfaction through best-in-class services.

Rodrigo has 20 years of experience in the IT industry, going from a passionate Software Engineer to an even more passionate, insightful, and disruptive leader in areas that span different technology specializations, including Enterprise Software Architecture, Digital Transformation, Systems Modernization, Product Development, IoT, AI/ML, and Data Science, among others.

Passionate about data, human behavior, and how technology can help make our lives better, Rodrigo focuses on finding innovative, disruptive ways to better understand how data democratization can deliver best-in-class services that drive better decision making. Rodrigo has a BS degree in Computer Science from Tecnológico de Costa Rica and an MSc in Control Engineering from Tecnológico de Monterrey.
