In today’s market, it feels like every investor is chasing buzzwords like AI. So naturally, product companies are jumping on the bandwagon, eager to sprinkle in some flashy features that weren’t doable before.
But here’s the big question: how do you go from playing around with ChatGPT (we’ve all been there) to running something in production? And what key things should you keep in mind when making that leap?
The arrival of LLMs: from months to minutes
The arrival of large language models (LLMs), the enormous models driving generative AI such as ChatGPT, has shaken up the AI world big time. In the past, you’d tackle AI through machine learning, which meant hiring a data scientist, diving into research, gathering loads of data, and training complex models from scratch. It would often take months before you saw any kind of output. Now? You ask an LLM, and boom: it gives you an answer. This has massively lowered the barrier to getting started with artificial intelligence.
But does that mean it's a good idea to skip over traditional machine learning altogether and ride the generative AI wave? Often, yes. If an LLM can handle your use case, it’s a fantastic way to quickly test and validate your idea in a fast-paced environment.
This proposition obviously comes with a disclaimer. How well an LLM performs on predictive use cases will always depend on the use case itself. Use cases that demand more deterministic output, such as in legal, medical, or financial domains (think fraud detection), won’t always fit this approach due to the unpredictable nature of an LLM. That doesn’t mean these domains can’t benefit from generative AI: the technology can still be a game changer for data exploration and recommendations, cutting down human time investment and optimising workflows.
Which model is the best?
When it comes to choosing which model to use, most people gravitate towards what they know, and ChatGPT is a popular go-to. But there are also Llama, Gemini, Claude and many more. They all differ in things like the size and focus of their training data, reasoning skills and context length limits.
So, which one should you choose? Honestly, there’s no one-size-fits-all answer here. Sure, we have industry benchmarks that compare the biggest names based on reasoning skills and training data (which is important). But in the end, each use case might perform better on a different model. Why? Well, it's a bit of a mystery. These models are like black boxes—you never really know exactly what’s going on inside. That’s why it’s crucial to test your use case across different models, a step many companies seem to skip these days.
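One lightweight way to do that testing is to run the exact same prompt against a couple of models and compare the answers side by side. Here’s a minimal sketch using the OpenAI Python client; the model names and prompt are just placeholder assumptions, and you’d add other providers the same way.

```python
# Minimal sketch: run one prompt against several models and eyeball the results.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "Summarise this support ticket in one sentence: ..."

for model in ["gpt-4o", "gpt-4o-mini"]:  # swap in whichever models you want to compare
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```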
Enhancing LLM performance
Once you’ve achieved a satisfactory output, it’s time to crank up the quality. As is customary in the AI world, you’ll need some benchmarks to test against. You can then make adaptations to the prompt, the context or even the model selection, and measure how the benchmarks change.
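In practice, this can be as simple as a small labelled test set and a scoring loop. The sketch below assumes a hypothetical `generate` function that wraps your model call, plus a toy ticket-classification task; the point is the structure, not the specifics.

```python
# A tiny benchmark harness: score a prompt template against a labelled test set.
test_cases = [
    {"input": "Order #123 never arrived", "expected_label": "shipping"},
    {"input": "I was charged twice", "expected_label": "billing"},
]

def score(prompt_template: str, generate) -> float:
    """Return the fraction of test cases the model labels correctly."""
    hits = 0
    for case in test_cases:
        answer = generate(prompt_template.format(ticket=case["input"]))
        if case["expected_label"] in answer.lower():
            hits += 1
    return hits / len(test_cases)

# Compare two prompt variants against the same benchmark:
# score("Classify this ticket as shipping or billing: {ticket}", generate)
# score("You are a support triage bot. Ticket: {ticket}\nLabel:", generate)
```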
It’s good to know beforehand that this approach suffices for generative use cases but will reach a limit for other types of AI, such as predictive AI. LLMs can answer in an unpredictable way due to some of their limitations, such as tokenization. These limitations can manifest themselves as inconsistencies, hallucinations and even straight-up wrong answers. Most of the limitations can be overcome by either prompt engineering, Retrieval-Augmented Generation (RAG) or fine-tuning. But that is an art in itself.
Will traditional machine learning offer better results for predictive tasks? Probably, yes. But you can always switch to that later, once you’ve validated your idea and are aiming for higher-quality performance of your use case.
How do I host an LLM?
There’s always the question of where the LLM is hosted and what it’s going to cost.
The most significant cost will always be the time it takes to set up and maintain the infrastructure. It’s a good idea to limit this as much as possible by using off-the-shelf solutions such as OpenAI or the AWS foundation models.
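For illustration, here’s roughly what the off-the-shelf route looks like with AWS Bedrock’s Converse API via boto3. The model ID and region are assumptions; swap in whichever foundation model is enabled in your account.

```python
# Rough sketch: calling a managed foundation model through AWS Bedrock.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-west-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Summarise our refund policy."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```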
On the other hand, some companies are concerned about data security, and let’s be honest: once you send data to an external API, you never fully know what happens to it.
So, what are the go-to options for your average startup?
| Model | Hosting Type | Service | Summary |
|---|---|---|---|
| GPT | External | OpenAI | The most popular, easiest and cheapest option, but the weakest in terms of data-security compliance. |
| GPT | Serverless | Azure OpenAI | GPT, but with data compliance and high availability from Microsoft Azure. |
| Llama, Claude, Mistral, Titan | Serverless | AWS Bedrock | Other industry-leading models, fully managed and with data compliance by AWS. |
| Llama, Claude, Mistral, Titan | Provisioned | AWS SageMaker (JumpStart) | The AWS foundation models, self-provisioned; you have full control of the hosting. |
| Gemini | Serverless | GCP Vertex AI | The Gemini model on Google Cloud. |
Pick the best model for your use case first, then figure out the hosting solution—not the other way around. There’s nothing wrong with going multi-cloud if that’s what works. Stick with serverless options for as long as possible to avoid the complexity and cost of managing infrastructure.
Some other important considerations
Build with flexibility in mind. The market’s evolving fast, and a model’s quality or pricing could change overnight. Set up your code to easily switch providers or add features like prompt chaining later. A great framework for this is LangChain (available in Python and JavaScript), which helps you keep things adaptable.
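As a rough example of what that flexibility looks like, the sketch below builds a LangChain chain where swapping the provider is a one-line change. Package names and model IDs are what they are at the time of writing, so pin your versions and double-check them.

```python
# Keeping the provider swappable behind LangChain's common chat-model interface.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You summarise support tickets in one sentence."),
    ("human", "{ticket}"),
])

# Swapping providers is a one-line change; both models expose the same interface.
llm = ChatOpenAI(model="gpt-4o-mini")
# llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")

chain = prompt | llm
print(chain.invoke({"ticket": "My invoice shows a double charge."}).content)
```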
Productize your prompts. Don’t let your end-users handle the prompting. Instead, craft your prompts carefully on the back-end, shaping them for one specific task based on user input to keep the process smooth and efficient.
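In code, that usually means the end-user only submits raw input and the back-end wraps it in a fixed, task-specific prompt. A small sketch (with an invented review-summarising task) could look like this:

```python
# "Productized" prompting: the user supplies raw input, the back-end owns the prompt.
SYSTEM_PROMPT = (
    "You are a product-review summariser. Return exactly three bullet points: "
    "overall sentiment, main praise, main complaint. Do not add anything else."
)

def build_messages(user_review: str) -> list[dict]:
    """Turn raw end-user input into a fully controlled prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Review:\n{user_review.strip()}"},
    ]

# The resulting messages list can be passed to whichever client or provider you use.
```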
Don’t overload the context. LLMs have a limited context length (and the limit differs per model), and they’re notorious for ignoring large chunks of it when it gets too long. If your model needs a lot of background knowledge, it might be time to explore RAG (Retrieval-Augmented Generation), but that’s a topic for another post!
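If you just want a rough guard against blowing past the context window, counting tokens before you send the request goes a long way. Here’s a small sketch using OpenAI’s tiktoken tokenizer; the 8,000-token budget is an arbitrary example, so check your model’s actual limit.

```python
# Guard against overlong context by counting tokens before calling the model.
import tiktoken

MAX_CONTEXT_TOKENS = 8_000  # example budget; use your model's real context window
encoding = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt: str, documents: list[str]) -> bool:
    total = len(encoding.encode(prompt)) + sum(len(encoding.encode(d)) for d in documents)
    return total <= MAX_CONTEXT_TOKENS
```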
Test your prompts. LLMs are unpredictable—ask the same question twice, and you might get two different answers. Make sure your model is consistently delivering what it’s supposed to by rigorously testing your prompts.
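A quick-and-dirty way to do that is to fire the same prompt a handful of times and count how many distinct answers come back. The `generate` function below is a placeholder for your own model call.

```python
# Consistency check: run one prompt several times and tally the distinct answers.
from collections import Counter

def consistency_report(generate, prompt: str, runs: int = 5) -> Counter:
    answers = [generate(prompt).strip().lower() for _ in range(runs)]
    return Counter(answers)

# A Counter with a single key means the prompt answered identically on every run.
# print(consistency_report(generate, "Is this ticket about billing? Answer yes or no."))
```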
Steer clear of fine-tuning (at the start). Sure, fine-tuning a model with your specific domain data sounds tempting, but it’s a rabbit hole you don’t want to dive into right away. Save it for when your use case is validated and you’re ready to invest in boosting quality. That’s when it makes sense to explore fine-tuning—but until then, keep things simple.
In a fast-evolving landscape, building flexibility into your approach will keep you ahead, letting you adapt as new models and capabilities emerge. Remember, it’s all about finding the right balance between innovation and practicality to keep your AI implementation efficient, effective, and sustainable.