Understanding Large Language Models (LLMs)
Artificial intelligence (AI) is rapidly moving out of the realm of science fiction and becoming mainstream. Online retailers and media companies like Amazon and Netflix have been using AI-supported recommendation engines for some time now. However, AI has really become part of the general conversation with the rise of so-called large language models (LLMs) such as ChatGPT and Google’s Bard.
Wherever you look, it seems like someone is talking about or experimenting with ChatGPT or Bard. However, what are these models, what do they do—and more importantly, what do they not do? This page answers these questions and helps you to understand when you might use LLMs, when you should avoid them, and what precautions to take.
What are Large Language Models?
Large language models (LLMs) are computer models built on systems called neural networks.
Unlike conventional computer programmes, these systems do not have to be ‘programmed’ to do something. Instead, they use self-learning techniques, which means that they are given data, and they learn from those data.
In an LLM, the data provided for training is an enormous amount of text. ChatGPT, for example, was trained on a vast collection of text gathered from the internet before 2021.
These models then use their training data to teach themselves how to respond appropriately to questions and queries, using natural language processing. This means breaking down the training data (in the form of words) into numbers and analysing those numbers. This process allows them to ‘understand’ the intent behind users’ questions, and respond to those questions in a way that is most likely to be helpful.
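The idea of breaking words down into numbers can be sketched in a few lines of Python. This is purely illustrative: real LLMs use far more sophisticated ‘subword’ tokenisers, and the words and IDs here are invented for the example.

```python
# A toy illustration of how text becomes numbers before a model can use it.
# Real LLMs use more sophisticated 'subword' tokenisers (e.g. byte-pair
# encoding); this sketch simply gives each distinct word an integer ID.

def build_vocabulary(texts):
    """Assign a unique integer ID to every distinct word in the training texts."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Turn a sentence into the list of numbers the model actually sees."""
    return [vocab[word] for word in text.lower().split()]

training_texts = ["the cat sat on the mat", "the dog sat too"]
vocab = build_vocabulary(training_texts)
print(encode("the cat sat", vocab))  # prints [0, 1, 2]
```

Everything the model subsequently ‘learns’ is learned from lists of numbers like this, not from words as we read them.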
LLMs are limited by their training data
An AI-based model learns from the data that it is given—but only from those data.
For example, if you give an AI romantic novels for training data, it will be able to write a ‘romantic novel’. However, it will not be able to write a thriller, because it doesn’t ‘know’ anything about thrillers.
An AI system that is shown pictures of cats will learn to recognise cats. It will not recognise a dog or a horse. It may also not recognise a picture of a Sphynx cat (the hairless kind) or a Manx cat (tailless) if it has ‘learned’ that all cats have hair and tails.
LLMs seem to be closer to a generalised AI than many other models (like recommendations engines) because they have been trained on a very broad data set. However, they are still limited by their training data.
The ‘large’ comes from the number of parameters: the numbers in the equations that the model uses to calculate its answers. AI models are considered ‘large’ if they have at least 100 billion parameters. A large language model is therefore a computer programme that uses equations containing 100 billion or more numbers to predict which words should come next in a sentence. For example, GPT-3.5, one of the models behind ChatGPT, has 175 billion parameters.
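To see what a ‘parameter’ is at the smallest possible scale, here is a toy Python sketch with just three of them. The numbers and inputs are invented purely for illustration:

```python
# A drastically scaled-down picture of what 'parameters' are: numbers in an
# equation that turn inputs into an output. This toy equation has just 3
# parameters; GPT-3.5 has 175 billion, but the basic idea is the same.

parameters = [2, -1, 3]   # the numbers the model has 'learned' from its data
inputs = [1, 2, 1]        # a numeric representation of some input text

# The output is a weighted sum: each input multiplied by its parameter.
score = sum(w * x for w, x in zip(parameters, inputs))
print(score)  # 2*1 + (-1)*2 + 3*1 = 3
```

Training an LLM is essentially the process of adjusting billions of such numbers until the outputs look right.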
Effectively, LLMs are chatbots. They do not ‘know’ anything. Instead, they have looked at a lot of text, and learned how to talk to people, and respond appropriately to questions.
Some LLMs also have search functionality—but the LLM is not the search engine. The LLM only helps to present the results of the search in a way that is more like human speech.
What LLMs Do Well and Badly
It is worth repeating that LLMs are chatbots. They are therefore designed to communicate with users in a way that sounds human.
This is what they do really well.
These models are really good at creating sentences by predicting which word is most likely to follow the ones before it. They can also be used as translators, because of their ability to assess the patterns between words. If you want a bit of fun, ChatGPT can also write you a poem or short story in the style of a named author or genre. It is also good at taking a lot of information and summarising it in a simple form.
LLMs have been described as ‘stochastic parrots’.
They are parrots in that they ‘parrot’ what they have heard or seen before. Like a parrot, they have no real understanding of the words that they are using.
Stochastic means that they use probability to predict what should come next.
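The ‘stochastic parrot’ idea can be sketched as a toy Python program: count which word follows which in some training text, then pick the next word at random, weighted by those counts. Real LLMs are vastly more sophisticated, but the spirit is the same (the training sentence here is invented for the example).

```python
import random
from collections import defaultdict

# A toy 'stochastic parrot': it parrots word pairs it has seen before,
# and uses probability (a weighted random choice) to pick what comes next.

def train_bigrams(text):
    """Count how often each word follows each other word in the text."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word(counts, word):
    """Sample the next word with probability proportional to its count."""
    followers = counts[word]
    choices = list(followers.keys())
    weights = list(followers.values())
    return random.choices(choices, weights=weights)[0]

model = train_bigrams("the cat sat on the mat and the cat slept")
print(next_word(model, "the"))  # 'cat' is twice as likely as 'mat'
```

The program has no idea what a ‘cat’ or a ‘mat’ is; it only knows which words have followed which before, and how often.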
Fundamentally, most LLMs are designed to act as intermediaries between organisations and clients or customers. They are simply advanced and powerful chatbots. As part of this process, they should provide useful information.
However, there are one or two drawbacks to the information provided. In particular:
LLMs do not ‘know’ anything
They merely respond to queries in a way that seems appropriate based on the query and the previous words. This might be right—but it might also not be what you wanted. The model has no way of knowing that.
LLMs do not always distinguish between information that they have seen and are paraphrasing, and information that they have generated which merely sounds like it.
This means that LLMs can provide inaccurate information. In fact, they can simply make things up. This is known technically as ‘hallucinating’.
For example, in one real-world case, ChatGPT was asked about an expert who had written several articles on a particular topic. It correctly responded that this person was an author who writes about that topic, and provided some example articles. The topic was accurate, but the articles did not exist.
There are ways to prevent this from happening. For example, some of the AI-based systems now available have built-in ‘guardrails’ that require the AI to quote from sources, rather than paraphrase. However, you need to check your LLM to make sure that is the case—and you should always check the accuracy of the information provided. You should, of course, be doing this with any source of information, but we are used to thinking of computers as ‘always right’. It may therefore be counterintuitive to consider this need.
There is more about how to assess information in our pages on Critical Reading and Critical Thinking.
Some analogies for LLMs
One commentator pointed out that when you are asking an internet-trained LLM a question, you are effectively asking it ‘how would the internet respond?’ (as in, all the people who use the internet and have provided content). If you wouldn’t trust ‘a bunch of internet strangers’ to answer your question, then don’t trust an LLM. Another analogy is to think of it as a bit like Koko, the gorilla who was taught to use sign language. Researchers claimed that she had learned language. However, later analysis suggested that she had no real understanding of ‘language’. She was just copying her handlers, and because she used the same sequences of words, it looked like language.
LLMs are not search engines
Some LLMs are linked to search engines. For example, Google’s Bard is linked to the Google search platform. However, many are not, and that includes ChatGPT. You therefore cannot rely on the information that it provides being up-to-date. It will not, for example, be able to tell you about today’s news. You may therefore get more accurate information from a simple internet search using a standard search engine. You might then ask ChatGPT to summarise it for you to make it easier to understand.
The takeaway from all this is that it is important to be wary about how much you rely on the accuracy of the information provided by any LLM.
LLMs and Misinformation
We have already mentioned that LLMs can make up information. Sometimes this is fun. At other times, it could be a real problem.
For example, let’s just consider the impact of asking an LLM to come up with some ‘stories’ about particular politicians or other well-known figures. Inaccurate claims that they have engaged in illegal activity could damage their careers, and the careers and lives of those around them.
In the light of concerns about recent elections around the world, and the spread of misinformation, it is reasonable to expect this to happen with increasing regularity.
We all therefore have a responsibility to check information before sharing it.
However, there is another more insidious problem with LLMs: inadvertent spreading of misinformation.
Let’s consider what could happen if you asked an LLM for details about the career of a particular person in the public eye, and it told you about a scandal that you hadn’t previously heard of (because, in fact, it had never happened). Suppose you then shared that information with others, asking them if they had heard about this scandal, and it spread around the internet.
That scandal was a hallucination—but by sharing that information, you risked damaging that person’s reputation.
As with any information that you find on the internet, it is worth considering the context and content very carefully before spreading it further.
Some LLMs and other AI-based Systems
The best known LLMs are probably ChatGPT and Bard. However, there are many other AI-powered systems available for different purposes.
The key when using any LLM is to understand its purpose, and what it has been designed to do. If you use it for that purpose, you are most likely to get useful results.
If you try to stretch it beyond those purposes, the results may be significantly less accurate or reliable.
ChatGPT is a chatbot that has been trained on a huge amount of text.
If you ask it a question, it will reply drawing on text on the internet in a way that is designed to mimic human speech patterns as much as possible. It will be like having a conversation. However, it doesn’t seem to distinguish between information that was published on the internet, and information that it has made up that is like something published on the internet.
It will therefore not provide accurate sources or references. If you ask ChatGPT about any topic, it will (usually) give you a brief summary, but entirely without sources. If you ask it to include references, it will add references—but some of those references may not exist, even though they look convincing, and may even be in genuine journals.
Bard is Google’s answer to ChatGPT. Like ChatGPT, it is a chatbot, but pulls information from the internet.
Its responses should therefore be more accurate and reliable than those of ChatGPT. However, there is some doubt about this, because one of the answers given during its launch demo was factually inaccurate.
Consensus is a search engine designed to use research questions as search terms.
If you ask it a question, it will search through peer-reviewed journals, and provide you with a summary of the research on the topic. It will summarise each article in turn, and also provide you with a broad consensus—hence its name—of the findings on the topic, including how many articles are positive, negative and neutral on your question.
Consensus uses GPT-4 (one of the models behind ChatGPT) to prepare its responses, especially the summary box. However, it has ‘guardrails’ around the responses. For example, all the text used is direct quotes from the articles referenced in the response. This prevents the model from ‘hallucinating’ (making up text).
Scite.ai has been designed to provide sources for claims made by language models like ChatGPT.
When you ask it a question, it will provide reliable answers quoting from the full text of research papers. It therefore speeds up the process of doing a literature review, by providing some sources of supporting and contrasting findings.
Inciteful.xyz provides tools to speed up the process of academic research.
It currently provides two tools. The Paper Discovery tool is designed to build a network of papers from citations. It uses network analysis to provide information about the most prolific authors, and the most popular publishing sources (journals and institutions). It will also tell you which authors and papers are most influential, based on the number of citations.
The Literature Connector is designed to help with interdisciplinary research. If you give it details of two scientific papers in different fields, it will provide you with an analysis of how the papers are connected via citations.
Perplexity.ai is a chatbot, but with search functionality.
It is therefore similar to Bard, and will also show you where it found the information. It works a bit like Google, but is better at summarising the results of its search.
This may or may not be helpful in practice. For example, if you ask it about a particular person, it will bring together all the results about people with that name, without distinguishing between them. Your summary paragraph is therefore likely to include results about different people, and you will have to go back to the sources to check the accuracy of the summary.
The Bottom Line
The bottom line with any LLM is that it is only a computer programme.
It may be very powerful, and it may even be able to sound relatively human—but it is not human.
You therefore need to use any LLM for the purposes for which it has been trained. You also need to be careful to check the accuracy of the information that it provides. Within those parameters, AI-based systems can be enormously helpful in speeding up work and making you more productive. However, you still need to be involved. They can’t do it for you.