Common misunderstandings about LLMs within Data and Analytics

I firmly believe that the recent hype surrounding Generative AI (GenAI) and Large Language Models (LLMs) is justified, but it has naturally led to misunderstandings about what these tools are capable of, especially in the data and analytics space. Let’s call it a classic case of over-selling.

In this blog, I explain the most common misconceptions about GenAI and LLMs, and why people are still needed to make sure these tools produce reliable results.

“Copilot will do it”

Many software vendors are building copilots into their data and analytics platforms to accelerate development, migrate code and provide automated insights. While this can be beneficial, you can’t simply roll out these tools to all users. It’s important to understand the limitations of copilots and accept that you’ll still need developers to build applications.

Copilots can do many exciting things, like automatically building reports with built-in trending and anomaly detection. Again, this can be helpful, but the reality is you can’t assume the reports have been built correctly, with all the necessary filters and definitions to ensure that the data is 100% accurate. Reports generated using copilots will all need to be validated by an expert before being rolled out.

“We’ll get the LLM to do all the maths”

These platforms are based on neural networks, which are loosely inspired by the human brain and give them some powerful capabilities. But no matter how powerful they are, a human brain is not a calculator and neither is an LLM.

LLMs can make mathematical errors just like humans do, so you can’t rely on them being 100% accurate. That being said, some of these tools can be good at doing basic maths on small data sets, as long as the calculations involved are relatively simple such as sums and averages.

Whilst simple calculations are sometimes possible, LLMs cannot by themselves be relied upon for highly complex mathematical tasks like forecasting and what-if analysis.

If you want LLMs to work reliably with your data, you should make sure all the information you provide is already pre-summarised and pre-calculated, so they have less room to make a mistake.
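To illustrate the idea, here is a minimal sketch in Python. The sample orders, the function names and the prompt wording are all hypothetical; the point is simply that the totals and averages are calculated in code first, so the LLM is only asked to comment on numbers that are already correct.

```python
from statistics import mean

# Hypothetical sample data; in practice this would come from your database or BI tool.
orders = [
    {"region": "Europe", "value": 120.0},
    {"region": "Europe", "value": 80.0},
    {"region": "Asia", "value": 200.0},
]


def summarise_by_region(rows):
    """Calculate totals and averages in code, so the LLM never does the arithmetic."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["region"], []).append(row["value"])
    return {
        region: {
            "total": sum(values),
            "average": mean(values),
            "orders": len(values),
        }
        for region, values in grouped.items()
    }


def build_prompt(summary):
    """Turn the pre-calculated figures into text that could be sent to an LLM."""
    lines = ["Comment on the following pre-calculated order figures:"]
    for region, stats in sorted(summary.items()):
        lines.append(
            f"- {region}: total {stats['total']:.2f}, "
            f"average {stats['average']:.2f}, orders {stats['orders']}"
        )
    return "\n".join(lines)


summary = summarise_by_region(orders)
prompt = build_prompt(summary)
print(prompt)
```

The resulting text is small and contains only finished figures, which is the kind of input these tools tend to handle best.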

“We no longer need a data analyst”

Using LLMs to provide commentary on data is an increasingly common use case. These tools are good at summarising information and even picking out the most interesting points by identifying outliers within data.

However, they can only comment on information you share, and the amount of data you can provide is relatively small compared to most databases.

If you use LLMs to generate commentary that draws on their base model (i.e. the general knowledge they picked up from their training data), you could also be at risk of hallucination, where the model makes up its own story.

Furthermore, they won’t always have domain-specific knowledge or be aware of all the internal and external factors that may be relevant when explaining your data, unless a data analyst provides it first. In summary, you will always need a data analyst to manage and oversee your data.

“We don’t need a database any more”

You could theoretically put an LLM on top of all your data and ask it questions, which would mean you no longer need to write SQL queries and so on.

This approach can work quite well when dealing with specific records. For example, you could ask “What is the status of order 12345?” and you’ll likely get the correct result.

The primary issue, though, is that these solutions can only process small amounts of data at a time. So whilst you could ask a question about an individual record, you wouldn’t be able to ask questions such as “What is my total order value?”, as this would need to scan many records and the amount of data would exceed the LLM’s input limits. You might still get an answer, but you couldn’t rely on it being correct.

They would also not know how to calculate and apply precise business definitions that often exist within data models or be able to process complex relationships.
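By way of contrast, here is a minimal sketch of how a database handles both kinds of question. It uses Python’s built-in sqlite3 module with a made-up orders table, and the rule that cancelled orders are excluded from the total is an assumed business definition purely for illustration.

```python
import sqlite3

# Hypothetical in-memory table standing in for a real orders database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, status, value) VALUES (?, ?, ?)",
    [
        (12345, "shipped", 150.0),
        (12346, "cancelled", 90.0),
        (12347, "shipped", 60.0),
    ],
)


def order_status(connection, order_id):
    """Single-record lookup: the kind of question an LLM layer can often answer."""
    row = connection.execute(
        "SELECT status FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None


def total_order_value(connection):
    """Aggregate over every record, applying an assumed business definition
    (cancelled orders do not count towards the total)."""
    row = connection.execute(
        "SELECT COALESCE(SUM(value), 0) FROM orders WHERE status <> 'cancelled'"
    ).fetchone()
    return row[0]


print(order_status(conn, 12345))
print(total_order_value(conn))
```

The aggregate query scans every row however large the table is, and the business rule is written down once in the query rather than left for a model to guess.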

“We’ll use Generative AI to generate our charts”

GenAI platforms can create sample code, but that doesn’t mean you can rely on them to accurately generate the code needed to produce a chart. The code they give you may or may not work, so someone still needs to test and correct it.

They can also create images, but can’t be used for generating something as precise as a bar chart with 10 different bars and labels. They can only produce something imaginative and general like “show me a dog riding a bike”.

For example, if you asked GenAI to produce an image containing a company’s logo, it would produce something that looks similar but it wouldn't be a perfect copy. So, when it comes to data and analytics, these tools can only be used for generating raw text and people will be needed to use this information to create what is required.

“We’ll use an LLM to fix our data quality issues”

LLMs have some potential to help with data quality issues. If your data contains “Eurpe”, for example, they could identify you have a spelling mistake and correct it, but this is a very general example.

Because of scalability limitations, LLMs are not suitable when dealing with large volumes of data or complex relationships or definitions. They’re also not subject matter experts that can natively understand all the nuances within your data.

Finally, these platforms are trained using things that exist (i.e. documents on the internet); they have not been trained on what doesn’t exist. This means they may struggle to identify incomplete data.
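Spotting what is missing usually means comparing the data against an explicit expectation that someone has to define. The sketch below shows the idea in Python; the list of expected regions, the sample records and the function names are all hypothetical.

```python
# Hypothetical expectation, defined by someone who knows the business.
EXPECTED_REGIONS = {"Europe", "Asia", "Africa"}

# Hypothetical sample records with two problems: no rows for Africa,
# and one row with an empty value.
records = [
    {"region": "Europe", "month": "2024-01", "value": 100.0},
    {"region": "Asia", "month": "2024-01", "value": None},
]


def missing_regions(rows, expected):
    """Return the expected regions that have no rows at all."""
    present = {row["region"] for row in rows}
    return sorted(expected - present)


def rows_with_missing_values(rows, required_fields):
    """Return the positions of rows where any required field is absent or empty."""
    return [
        index
        for index, row in enumerate(rows)
        if any(row.get(field) is None for field in required_fields)
    ]


print(missing_regions(records, EXPECTED_REGIONS))
print(rows_with_missing_values(records, ("region", "month", "value")))
```

Neither check is clever, but both depend on knowing what should be there, and that knowledge comes from people rather than from the data itself.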

People-first technology is key

GenAI and LLMs have many powerful capabilities when used in the right way. But they cannot be seen as a substitute for the key roles that developers, analysts, databases and BI tools currently play in the data and analytics space. These technologies hold vast potential, but people are vital to ensuring they succeed and help organisations thrive.

Antony Heljula

Technology Director
