
What are Large Language Models (LLMs)?

The power of LLMs and the challenges they face

Bryan Chappell

Key takeaways:

  • Large language models (LLMs) combine attention mechanisms, neural network architecture, pattern recognition, and vast amounts of data to generate human-like responses. But they face limitations, such as data-upkeep challenges, potential for hallucinations, and data-leakage risks.
  • LLMs have multiple use cases, including customer support, education and training, content creation, and text categorization.
  • Retrieval-augmented generation (RAG) can enhance LLM functionality by grounding responses in current, factual information from external sources. RAG reduces hallucinations and improves accuracy.

In late 2022, a new generation of consumer-facing artificial intelligence models captivated users. Thanks to LLMs, anyone could query a vast library of digitized information — books, online articles, websites, and more — and receive immediate, hopefully accurate answers.

And while LLMs have seen rapid growth across various industries, they face some challenges. For example, LLMs don’t automatically absorb new data, which means their responses might soon become inaccurate or outdated. Data scientists must retrain them on new public or proprietary data, often at great expense.

Challenges aside, pre-trained models like LLMs aren’t going anywhere. In this article, we’ll cover how LLMs work, their limitations, key use cases, and how retrieval-augmented generation (RAG) can enhance their functionality.

How Do Large Language Models Work?

Four key components power state-of-the-art LLMs: massive amounts of data, attention mechanisms, a neural network architecture, and pattern recognition and learning.

Vast Amounts of Data

LLMs are trained on massive datasets so they can respond accurately to queries and pick up the patterns of human speech.

These datasets most often include public information like online books, websites, and social media.

Attention Mechanisms

Attention mechanisms give LLMs the ability to contextualize data for more relevant answers to queries. They function like a book index, helping the model pinpoint which parts of a mass of text matter most for the task at hand.

The most common attention mechanism is “self-attention,” which allows the LLM to understand the context of each word in a sequence.

This capability is the difference between texting and speaking with someone face-to-face. In person, body language, tone, and inflection add another layer of meaning to the words themselves, whatever the language. Self-attention gives the LLM a comparable layer of context, drawn from the text alone.
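
For the curious, here’s a minimal NumPy sketch of scaled dot-product self-attention, the core computation. Real LLMs add learned weights at enormous scale, multiple attention heads, and masking; the toy dimensions here are illustrative only.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv         # project tokens into query/key/value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows become attention weights
    return weights @ V                       # each output mixes all tokens by relevance

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8): one context-aware vector per token
```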

Neural Network Architecture

Neural networks are machine learning architectures that process information in ways loosely inspired by the human brain. Most modern LLMs are built on a type of large-scale neural network known as the transformer.

To answer a question, your brain scans everything it knows about the subject. Likewise, the transformer architecture lets an LLM scan its learned information and turn it into text, code, images, and more.
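
For a small taste of a transformer in action, here’s a sketch using the open-source Hugging Face transformers library and GPT-2, an early open transformer model. Both choices are illustrative; any causal language model would work the same way.

```python
from transformers import pipeline

# Load a small pre-trained transformer and generate a continuation.
generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=20)
print(result[0]["generated_text"])
```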

Pattern Recognition and Learning

With extensive training, LLMs learn to connect the patterns of human speech with the patterns in their datasets. The result is outputs that are, for the most part, contextually relevant.

For example, imagine the sentence: “The rabbit runs away from the big [blank].” An LLM trained on extensive text data recognizes the pattern and predicts the next word of the sentence: “fox.” It knows not to respond with “air traffic controller” because of the patterns and sequences it’s learned.
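
Here’s an illustrative sketch of that next-word prediction at work, again using the open-source Hugging Face transformers library and GPT-2. GPT-2 is a small, dated model, so its top guesses may differ from a modern LLM’s, but the mechanism is the same.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The rabbit runs away from the big", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for every possible next token

top_ids = torch.topk(logits, 5).indices
print([tokenizer.decode(i) for i in top_ids.tolist()])
# Plausible continuations like " bad" or " dog" — never "air traffic controller".
```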

Limitations of LLMs

LLMs have several limitations, and some of them (most notably hallucinations) could have costly consequences.

Data Upkeep

LLMs require vast amounts of data to understand a subject and provide relevant answers to queries relating to that subject.

However, that data can become outdated rather quickly. Data scientists have to continually train the LLM on new data to avoid output errors and remove biases. The training process can be costly and time-consuming.

Tokenization

Tokenization breaks down text into smaller units called tokens. These units can be words, subwords, or characters.

Each token receives a numerical representation, or unique ID, that allows the LLM to process the user’s text input.
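
Here’s a quick, illustrative look at tokenization using GPT-2’s tokenizer from the open-source Hugging Face transformers library (the sentence and model are assumptions for demonstration):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Tokenization breaks text into pieces."
print(tokenizer.tokenize(text))  # subword pieces, e.g. ['Token', 'ization', ...]
print(tokenizer.encode(text))    # the unique numerical IDs the model actually sees
```

But tokenization has a few challenges: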

Limits What the LLM Can Understand

A tokenizer’s vocabulary is fixed during training and learned from the general-purpose information in the LLM’s dataset.

If terminology specific to your industry rarely appeared in that data, the tokenizer breaks those terms into unfamiliar fragments, and the LLM can struggle to make sense of them.

Loss of Context

Tokenization can also cause a loss of context, because it breaks the words in the prompt into subwords and numerical IDs to process the request.

This reduction of words into numbers can sometimes erase the semantic nuances we take for granted in face-to-face communication.

Token-Processing Limits

Perhaps the biggest challenge is that LLMs can only process a fixed number of tokens at once (a limit known as the context window), which caps the length of the input. This makes it harder for the LLM to process long documents or large chunks of text.

This is a huge issue in the customer service world, for example, since it’s essential to maintain long conversation histories to understand the context of customer questions and complaints.
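
One common workaround is to count tokens before sending a request and truncate when necessary. A minimal sketch, assuming a hypothetical 4,096-token limit and the GPT-2 tokenizer for counting:

```python
from transformers import AutoTokenizer

MAX_TOKENS = 4096  # hypothetical context window; check your model's actual limit
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def truncate_to_context(conversation_history: str) -> str:
    """Keep only the most recent tokens so the request fits the context window."""
    ids = tokenizer.encode(conversation_history)
    if len(ids) <= MAX_TOKENS:
        return conversation_history
    return tokenizer.decode(ids[-MAX_TOKENS:])  # drop the oldest tokens first
```

Dropping the oldest turns of a conversation is crude; production systems often summarize older history instead, so context isn’t simply lost.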

Potential for Hallucinations

Hallucinations are plausible-sounding but inaccurate LLM responses, often caused by out-of-date or missing training data for a specific scenario or question. Compounding this challenge, LLMs typically provide answers without sources, which makes the answers difficult to cite or verify.

Hallucinations aren’t just inconveniences. They can impact a company’s reputation and bottom line. Here are three examples:

  • Google’s Bard AI incorrectly stated that the James Webb Space Telescope had taken the very first pictures of a planet outside our solar system. This error led to a sharp drop in Alphabet’s stock price, wiping out approximately $100 billion in market value.
  • In 2023, two lawyers submitted a court filing in an aviation injury claim that cited nonexistent cases generated by ChatGPT. The lawyers and their law firm were fined $5,000, and their reputations were tarnished.
  • The AI-powered MyCity chatbot for New York City was caught providing incorrect information about city statutes and policies. In one instance, it falsely told a user they could not be evicted for non-payment of rent.

These situations illustrate the challenge of training LLMs to recognize when they don’t have enough data to answer a question. If you can’t account for every scenario or question, then the LLM can’t either.

Data Leakage

LLMs can inadvertently disclose sensitive or personal information in responses to queries.

In April 2023, employees from Samsung's semiconductor division inadvertently leaked confidential company information by using ChatGPT. This incident highlights the risk of exposing sensitive data when using LLMs for internal communications or data processing without sufficient output filtering.

Output filtering involves training the LLM to recognize and withhold certain outputs from the end user. Once filtering is in place, organizations must monitor it closely and run regular security audits to determine whether further training is necessary.

Other data-leakage prevention strategies include data redaction, masking, anonymization, and encryption.
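
For a flavor of the simplest layer of that defense, here’s a minimal, illustrative redaction filter. Real deployments combine filters like this with trained classifiers, monitoring, and audits.

```python
import re

# Illustrative patterns only; real systems cover many more data types.
PATTERNS = {
    "[REDACTED EMAIL]": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "[REDACTED PHONE]": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def filter_output(text: str) -> str:
    """Redact sensitive-looking strings before a response reaches the user."""
    for replacement, pattern in PATTERNS.items():
        text = pattern.sub(replacement, text)
    return text

print(filter_output("Reach Jane at jane@example.com or +1 555-123-4567."))
# Reach Jane at [REDACTED EMAIL] or [REDACTED PHONE].
```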

Lack of Domain-Specific Knowledge

Since LLMs are trained on broad datasets, they often lack the specialized knowledge required for certain tasks or industries. Without customization, they struggle to adapt to individual needs and preferences.

Customized LLMs still face the challenge of frequent maintenance. Highly customized LLMs face another trade-off: they perform well in their specific domain but struggle to generalize to other tasks.

LLM Applications

Most industries are using LLMs already, and new use cases keep popping up. Here are just a few examples of LLM applications.

Customer Support

Companies are using LLMs to build chatbots that offer a more personalized and human response to inquiries. Because LLMs can handle thousands of interactions simultaneously, both day and night, they’re a cost-effective solution for scaling customer service operations.

Suppose a customer wants to return a piece of clothing. If you’ve trained your LLM on your return policy, a chatbot can generate an answer to a return-related query in seconds. You’ve just freed up your human agents to handle more complex issues.

Education and Training

LLMs can provide training to new staff, creating scenarios that mimic face-to-face conversations. This feature is particularly helpful for sales and customer service teams.

If you train your LLM on your company’s sales training manual and product specifications, new salespeople can query the LLM for advice on dealing with common objections from leads.

Content Creation

Companies are using AI models like GPT-4 to automate and scale content creation.

Marketers can brainstorm potential article topics by submitting detailed queries and receiving rough outlines or the articles themselves. Humans then step in to add brand voice, quotes, and statistics from recent, authoritative sources.

Individuals are also producing LLM-generated blogs, creative writing pieces, ebooks, and more.

Text Categorization

Users can ask LLMs trained on large amounts of text data to categorize, filter, or analyze the documents contained in those datasets.

Say you send out a customer survey heavily focused on qualitative answers. With an LLM, your marketing team can segment responses to find the most common problems respondents reported. Feedback related to products can be sent to the product team, while customer support teams can quickly access feedback related to their agents and bots.

How to Improve LLM Functionality

Here are a few ways to overcome the limitations of LLMs and enhance their ability to return more customized and accurate responses to queries.

Prompt Engineering

Prompt engineers work behind the scenes to teach LLMs how to understand user queries. They create “prompts”: templates that serve as an interface between the user’s query and the LLM, supplying the instructions and context the model needs to answer well.

The better the templates — or prompts — the better the responses from your LLMs. Here’s how they work.

Consider a user who enters this general query — “Where to purchase dog food” — into an AI chatbot. A good prompt engineer anticipates these queries and helps the LLM contextualize this input.

The behind-the-scenes engineered prompt might look like this: “You work at a grocery store. A user, based in Indiana, United States, is asking you where to purchase dog food. Respond with the five nearest store locations that have dog food in stock.”

Without the prompt, the LLM might have responded, “At the grocery store.” But with the engineered prompt, the user gets five local stores that have the food in stock.
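
In code, an engineered prompt is often just a template with slots your application fills in at runtime. A minimal sketch of the example above (the template wording and variables are illustrative):

```python
# Illustrative template; a real system would pull location and
# inventory context from the application, not hard-code them.
PROMPT_TEMPLATE = (
    'You work at a grocery store. A user, based in {location}, '
    'is asking you: "{query}". Respond with the five nearest '
    'store locations that have the item in stock.'
)

def build_prompt(query: str, location: str) -> str:
    return PROMPT_TEMPLATE.format(query=query, location=location)

print(build_prompt("Where to purchase dog food", "Indiana, United States"))
```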

Fine-Tuning

When you “fine-tune” an LLM, you train it on domain-specific data that adapts it to particular tasks or industries.

Fine-tuning an entire LLM could mean adjusting billions of parameters. This approach is expensive and time-consuming. But here’s the good news for AI teams: Fine-tuning only needs to touch portions of the LLM to be effective, and there are four primary ways to do it.

Parameter-Efficient Fine-Tuning

This type of fine-tuning only updates a subset of an LLM’s parameters. For example, say your LLM has been trained on a vast library of customer chatbot interactions. Data scientists don’t need to fine-tune the entire LLM if they want to apply sentiment analysis to the database of interactions.

Data scientists only need to fine-tune the parameters that help the LLM understand whether a conversation skews positive or negative.

For example, an LLM might not understand initially that a customer’s use of the word “okay” indicates neutral sentiment. Or that “useless” indicates negative sentiment. Parameter-efficient fine-tuning teaches the LLM to recognize these distinctions and sort conversations accordingly.
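
One popular technique for parameter-efficient fine-tuning is LoRA (low-rank adaptation). Here’s a hedged sketch using the open-source Hugging Face transformers and peft libraries; the model name and hyperparameters are illustrative choices, not a recommendation.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# A small base model with a 2-label head for positive/negative sentiment.
base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# LoRA adds small trainable matrices to selected attention layers;
# everything else in the base model stays frozen.
config = LoraConfig(
    task_type="SEQ_CLS", r=8, lora_alpha=16, target_modules=["q_lin", "v_lin"]
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # prints the tiny trainable fraction
```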

Prompt Tuning

Prompt tuning modifies the LLM’s behavior with small, trainable updates that help it better understand particular inputs or queries. These updates can be learned from snippets of text or samples of human speech. It’s like adding a few extra tokens to the billions of tokens an LLM already uses.

Note the contrast with prompt engineering, which doesn’t involve training the LLM at all. That discipline simply looks to create prompts that “get the most” out of a pre-trained LLM.

Suppose the LLM wasn’t trained with the understanding that you can bake cakes without flour. Prompt tuning adds this enhancement with a few text snippets. Now, the LLM can process queries about baking with more authority.
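
Conceptually, those learned updates are a short run of trainable “virtual token” embeddings prepended to the input, while the model’s own weights stay frozen. A minimal PyTorch sketch of the idea (the dimensions are illustrative):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable virtual tokens prepended to a frozen model's input embeddings."""

    def __init__(self, num_virtual_tokens: int = 8, embed_dim: int = 768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim))

    def forward(self, input_embeddings: torch.Tensor) -> torch.Tensor:
        batch_size = input_embeddings.shape[0]
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([prompt, input_embeddings], dim=1)

soft_prompt = SoftPrompt()
dummy = torch.randn(2, 10, 768)  # batch of 2 sequences, 10 tokens each
print(soft_prompt(dummy).shape)  # torch.Size([2, 18, 768])
```

During training, only `soft_prompt.prompt` receives gradient updates; the billions of original parameters never change.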

Instruction Fine-Tuning

Instruction fine-tuning trains the LLM to handle specific tasks or commands. AI teams simply train the LLM on datasets of examples that show it how to respond to these new types of queries.

Legal teams needing to summarize all information related to certain court cases can use instruction fine-tuning to help their LLM summarize legal information more effectively. Suppose their LLM’s past attempts at summarization left out key information. Instruction fine-tuning helps the LLM recognize what key information teams need.
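
In practice, instruction fine-tuning data is just a collection of instruction/response pairs. An illustrative, hypothetical example for the legal-summary scenario:

```python
# Hypothetical training examples; the fields and wording are illustrative.
instruction_examples = [
    {
        "instruction": (
            "Summarize the following court filing. Always include the "
            "parties, the claims, and the requested relief."
        ),
        "input": "<full text of a court filing>",
        "output": "<a summary naming the parties, claims, and relief>",
    },
    # ...hundreds or thousands more examples in the same format
]
```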

Domain Adaptation

Domain adaptation is a type of fine-tuning that trains the LLM on domain- or industry-specific information.

Consider biotechnology, an industry that uses several terms and phrases unrecognizable to individuals not working in the field. Domain adaptation helps the LLM understand these terms, making responses to queries from research and development teams more accurate.

Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation improves the functionality of generative AI models like LLMs without fully retraining them. It works by fetching facts from external sources that aren’t in the LLM’s original training datasets. These sources can include social media, product documentation, market data, customer data, and sales manuals.

With access to this “new knowledge,” RAG augments the general knowledge contained in LLMs. RAG converts the text in external sources into mathematical representations called embeddings, then uses similarity between embeddings to find the passages most relevant to a user’s query.

The retrieved passages are added to the LLM’s prompt, allowing it to build on its existing knowledge and generate more accurate responses.

With access to external sources, RAG reduces the likelihood of hallucinations and the presence of outdated information in responses.
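
Here’s a minimal, end-to-end flavor of RAG, using the open-source sentence-transformers library for embeddings. The documents, model name, and prompt are all illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

documents = [
    "Our return policy allows returns within 30 days with a receipt.",
    "Store hours are 9am to 9pm, Monday through Saturday.",
    "We ship to the continental United States within 5 business days.",
]
doc_embeddings = embedder.encode(documents)  # one vector per document

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    q = embedder.encode([query])[0]
    scores = doc_embeddings @ q / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(q)
    )  # cosine similarity
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "Can I return a sweater I bought last week?"
context = retrieve(question)[0]
prompt = f"Answer using only this context: {context}\n\nQuestion: {question}"
# The prompt, now grounded in retrieved facts, goes to the LLM as usual.
```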

Make Your LLMs More Functional With Scout

LLMs are becoming table stakes for both employee efficiency and customer happiness. Scout provides you with the tools you need to leverage the benefits of LLMs while bypassing some of the limitations. One of those tools is RAG, and Scout makes it easy to harness its power.

Create your first RAG chatbot today in minutes using Scout’s “no code” tool. Click here to get started.



