Foundation Models
Analysts believe that we are entering the industrial era of artificial intelligence. Foundation models (FMs), large pretrained AI models that can easily be adapted to new use cases, are revolutionizing creative work and are expected to augment or take over ever more knowledge work in the coming years as FM-based AI tackles use cases across industries. This page provides an introduction to the impressive potential of foundation models and describes how to set them up so that they do your bidding.

OVERVIEW
Understanding Foundation Models: how do foundation models work?
Foundation Models (FMs) have arrived, and they are bound to change the landscape of AI systems for good. These large pretrained AI models can easily be adapted to new use cases, are already reshaping creative work, and are expected to augment or take over ever more knowledge work as use cases across industries are tackled.
Below, we focus on Foundation models in general, across modalities and applications. For more specific information on Large Language Models (LLMs) - Foundation models for Natural Language - please visit this page. More on Generative AI applications and use cases can be found here.
What are foundation models?
Foundation models are models that are trained on large and diverse data at scale. They can be used or adapted for a wide range of downstream tasks and as such form the "foundation" for other models.
In the past few years, several dozen FMs have been developed, most of which were generative AI models ‘translating’ from one modality to another, e.g. text to text (GPT), text to image (DALL-E), image to text (BLIP), speech to text (Whisper), text to 3D (DreamFusion), text to short video (Make-A-Video), text to longer video (Phenaki), video to video (Gen-1) and text to 3D video (Make-A-Video3D). Connecting text and images (CLIP) and image segmentation (SAM) are two examples of other tasks tackled by FMs.
Foundation models are trained to perform a very simple task, e.g. reconstructing an image or predicting the next word. Yet they develop complex, unexpected behavioural strategies to achieve this (emergence). They can also be steered to perform a wide range of tasks, making them nearly universal (homogenization). This also means they can exhibit unexpected behaviour, however, so they need to be aligned with expectations.
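As a rough illustration of how simple this training objective really is, the sketch below computes the next-token prediction loss in plain PyTorch; the embedding and linear layers are toy placeholders standing in for a real transformer.

```python
# Minimal sketch of the next-token prediction objective most text FMs
# are trained on: given a token sequence, predict each following token.
# The "model" here is a toy stand-in, not a real transformer.
import torch
import torch.nn.functional as F

vocab_size = 1000
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),   # token ids -> vectors
    torch.nn.Linear(64, vocab_size),      # vectors -> next-token logits
)

tokens = torch.randint(0, vocab_size, (1, 16))  # a toy "sentence"
logits = model(tokens[:, :-1])                  # predict from each prefix
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),             # (batch*seq, vocab)
    tokens[:, 1:].reshape(-1),                  # the actual next tokens
)
loss.backward()  # one self-supervised training step
```

Everything beyond this objective, including the emergent behaviour described above, comes from scale: a far larger model and vastly more data.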

Foundation models for Natural Language Processing
Large language models - foundation models specialized on outputting text - are advanced artificial intelligence (AI) systems that have revolutionized natural language processing and understanding. These models, such as OpenAI's GPT-3 (Generative Pre-trained Transformer 3) and its successors, are designed to understand and generate human-like text, making them powerful tools for a wide range of applications.
Learn all about Large Language Models here >
Foundation Models for image generation
Foundation models for image generation are powerful AI systems able to create images based on text prompts, other images, image primitives or other types of guidance. The currently most popular models, such as Stable Diffusion, DALL-E 2, or Midjourney, are deep learning models that generate images from natural language descriptions (prompts) and can often also be applied to other tasks such as inpainting or outpainting. AI image generation has made rapid progress in recent years, and we expect this trend to continue. New applications keep emerging, for example across modalities such as text to 3D, text to video or video to video.
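As a minimal sketch of how accessible this has become, the snippet below generates an image from a prompt using the open-source Stable Diffusion weights via the Hugging Face diffusers library; the model id, prompt and GPU assumption are illustrative.

```python
# Sketch: text-to-image generation with Stable Diffusion via diffusers.
# The model id and prompt are examples; a CUDA GPU is assumed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("a photo of a cozy scandinavian living room").images[0]
image.save("interior.png")
```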
Can AI generate truly original logos? >
Foundation Models for computer vision
Foundation models for computer vision are large, high-performance models that have been pretrained on huge amounts of data. Examples include Vision Transformer (ViT) for image classification, You Only Look Once (YOLO) for object detection and Segment Anything (SAM) for segmentation. Typically you will select one of these models based on their performance on a specific task and then fine-tune it further on your use case-specific dataset.
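As a minimal sketch of that workflow, the snippet below loads a pretrained ViT from the Hugging Face hub and swaps in a fresh classification head; the checkpoint name and the 5-class problem are example assumptions.

```python
# Sketch: load a pretrained ViT and replace its classification head
# for a custom dataset (an assumed 5-class problem).
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=5,                   # your use-case-specific classes
    ignore_mismatched_sizes=True,   # drop the original 1000-class head
)
# ... then fine-tune on your images, e.g. with the transformers Trainer.
```

The pretrained backbone keeps its general visual knowledge, while the new head is fine-tuned on your use-case-specific dataset.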
How foundation models are trained
Foundation models are trained on large datasets for significant amounts of time. While this used to be possible only for large companies with huge budgets, techniques such as Low-Rank Adaptation (LoRA) and quantization make adaptation far cheaper, and increasingly powerful hardware is bringing even training from scratch within reasonable budgetary constraints. Typically, foundation models are trained in multiple stages with different datasets and different loss functions (self-supervised, supervised, reinforcement learning from human feedback). Recent findings also show that data quality matters more than quantity for specialized use cases. Feel free to contact us if you would like to know more.
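As a hedged sketch of what parameter-efficient adaptation looks like in practice, the snippet below wraps a small open model with LoRA adapters using the peft library; the base model id and LoRA hyperparameters are illustrative choices, not recommendations.

```python
# Sketch: parameter-efficient fine-tuning with LoRA via peft.
# Base model and hyperparameters are example values.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction is trained
```

Because only the low-rank adapter weights are trained, memory and compute requirements drop by orders of magnitude compared to full fine-tuning.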
The future of foundation models
Foundation models bring unprecedented creative, reasoning and problem solving power. They are trained once and can be used for a multitude of tasks, providing unique opportunities and potential for positive societal impact, if implemented with the proper ethical considerations and safeguards. We see a few important topics for the future of Foundation models:
Domain expertise
While foundation models can perform general tasks very well, they are still outperformed by expert models on specific tasks. How to efficiently teach a foundation model to be an expert in a specific area or topic is the next big question. One technique to further adapt foundation models is fine-tuning, whereby the foundation model can be specialised with added data and knowledge or taught a specific style of generation.
Multimodality
Foundation models are increasingly becoming multimodal, meaning that a single model can process and relate several modalities of data (text, image, video, audio, ...).
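CLIP, mentioned above, is an early example: it embeds text and images into one shared space so they can be compared directly. The sketch below shows this with the public CLIP checkpoint on Hugging Face; the local image file is an assumed input.

```python
# Sketch: CLIP scores how well each text matches an image by embedding
# both into a shared space. The image path is an assumed local file.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("interior.png")  # any local image
inputs = processor(text=["a living room", "a dog"], images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))  # text-image match scores
```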
Commoditization
AI solutions are increasingly becoming standardized and productized, especially for traditional AI tasks. We expect this trend to continue and expand towards generative AI as well.
Watch the webinar: Generative AI for corporate use: How to get started with LLM >
Applications
Foundation model applications
From efficiency and productivity gains for knowledge workers and growth opportunities in new business lines, all the way to new applications such as drug and material discovery, the rise of foundation models will have a significant impact.
Creative sector
Image generation models can become powerful assistants to designers and create business value by proposing designs based on their input and style. Mastering a specific task, such as generating guided realistic-looking interior designs, requires the power of foundation models as well as fine-tuning on specially prepared domain data.
Customer support
Chatbots and virtual assistants powered by LLMs are able to understand and respond to a wide range of customer or employee questions in an accessible, conversational, multilingual and efficient manner, improving your customer satisfaction or enhancing the productivity of your employees. Importantly, these virtual assistants can be enhanced by providing them with a specific source of data (e.g. your company knowledge base) to further raise the accuracy of the provided responses. Check out our blogpost for more information.
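A minimal sketch of this pattern (often called retrieval-augmented generation) is shown below; the embed and generate functions are hypothetical stand-ins for your embedding model and LLM of choice.

```python
# Sketch: ground a chatbot answer in your own knowledge base by
# retrieving the most relevant passage and adding it to the prompt.
# `embed` and `generate` are hypothetical stand-ins for real models.
import numpy as np

def answer(question, passages, embed, generate):
    # rank knowledge-base passages by embedding similarity to the question
    q = embed(question)
    scores = [float(np.dot(q, embed(p))) for p in passages]
    context = passages[int(np.argmax(scores))]
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return generate(prompt)
```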
Legal
Drafting and reviewing legal documents, such as contracts, deeds, or patents, or summarising long documents in the legal sector is time- and resource-intensive. Foundation models, especially LLMs, can speed up these processes and enhance the productivity of employees. These models can be specialised (fine-tuned) to understand legal language, making them powerful assistants in the legal sector.
And many more!
The field of foundation models is rapidly expanding, with new solutions and use cases emerging almost daily. As a leading provider of ML services, ML6 is committed to staying at the forefront of these developments, constantly exploring new ways to apply FMs to solve real-world business problems.
Use case in practice: Tailor-made AI image generation services for crafters for Creative Fabrica >
Challenges
Typical LLM challenges: functional & technical
FMOps
As we enter the foundation model age, MLOps is undergoing a profound change: combining multiple task-specific models with downstream business logic is giving way to smart upstream data preparation, fine-tuning and guidance of emergent FM behaviour, plus further post-processing and chaining of foundation model outputs. Foundation Model Ops (FMOps) refers to the operational capabilities required to adapt, deploy, optimize and monitor foundation models as part of an application. Adaptation, which includes fine-tuning, guidance (prompting), post-processing and chaining, is the most radical change that foundation models bring compared to traditional MLOps.
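As a small sketch of the chaining pattern FMOps has to operationalize, the function below feeds the output of one model call into the next, with a validation step in between; call_llm is a hypothetical stand-in for whatever model endpoint you use.

```python
# Sketch of FM chaining: one model call's output feeds the next,
# with a guard in between. `call_llm` is a hypothetical endpoint.
def summarize_then_translate(document, call_llm):
    summary = call_llm(f"Summarize in 3 bullet points:\n{document}")
    if not summary.strip():              # guard against empty outputs
        raise ValueError("empty model output")
    return call_llm(f"Translate to Dutch:\n{summary}")
```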
More on FMOps >
AI Alignment
One of the main differences between traditional machine learning models and foundation models is their emergent behaviour - i.e. the fact that they develop complex, unexpected behavioural strategies to perform the simple task they were trained for (e.g. predicting the next word). In the future, one of the main challenges for machine learning professionals will be to guide a model and ensure that its behaviour is aligned not only in terms of performance at a certain task, but also in terms of norms, values and human expectations and ethical principles.
Advising clients on compliance & risks
Our experience in conducting detailed risk assessments enables us to effectively identify and mitigate potential high-risk scenarios. We have streamlined our sales process to systematically identify ethical and legal risks, so that we can advise our clients and make sure the necessary risk mitigation measures are taken into account from the start.
Trustworthy AI
Foundation models face ethical challenges, including the potential for biases and the perpetuation of harmful stereotypes due to biased training data, the risk of AI hallucinations generating incorrect or nonsensical content that could be maliciously exploited to spread misinformation, a lack of transparency about training data, data privacy issues, and missing knowledge of recent events or facts. These and other ethical risks need to be considered and mitigated.
Copyright and other legal challenges
Regulation often lags behind innovation, and the foundation model domain is no exception. New legal issues are arising with foundation models, such as privacy and copyright considerations for the data used to train these models, as well as how to ensure transparency and risk management for downstream use cases building on foundation models. The most recent draft of the upcoming EU AI Act adds provisions on foundation models and generative AI.
Solution
High-level outline of solutions with LLMs
Foundation Model solutions are available in various forms:
- For some use cases, incorporating an existing LLM is all that is needed.
- For other use cases, fine-tuning an existing LLM is required.
- Yet in other scenarios, a custom language model trained from scratch may be the way to go.
Crucial steps in building an LLM solution include:
Large Language Model Choice
When choosing the right LLM for your use case, you need to take the following trade-offs into account: open-source models (e.g. LLaMA) vs. commercial models (e.g. GPT), self-hosting vs. API access, licenses that allow commercial usage vs. research-only licenses, as well as model performance and latency, pricing, and data ownership and protection.
Analysing Cost and TCO
To prevent unpleasant surprises, it's crucial to take initial capital expenditure and operating expenses into account when designing your solution. Depending on how you intend your solution to be used, some options (e.g. token-based API pricing) may or may not be feasible.
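A back-of-the-envelope sketch of such a calculation is shown below; all numbers (traffic, token counts, unit price) are illustrative assumptions, not actual rates.

```python
# Sketch: rough monthly cost estimate for a token-priced API.
# All figures below are illustrative assumptions.
requests_per_month = 100_000
tokens_per_request = 1_500     # prompt + completion
price_per_1k_tokens = 0.002    # in USD, example rate

monthly_cost = (requests_per_month * tokens_per_request / 1000
                * price_per_1k_tokens)
print(f"~${monthly_cost:,.0f}/month")  # ~$300/month at these assumptions
```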
Prompt Engineering
Prompt engineering involves defining, refining and optimising a prompt template to get the most accurate and relevant results from the language model. By providing additional information, you can steer the model's answers in the desired direction in terms of content, company style/tone and structure.
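As a simple sketch, the template below fixes tone, structure and grounding once, while the placeholders are filled in per request; the wording is an illustrative example.

```python
# Sketch of a reusable prompt template: tone, structure and grounding
# are fixed, the placeholders vary per request. Wording is illustrative.
TEMPLATE = """You are a support assistant for {company}.
Answer in a friendly, concise tone, in the user's language.
Use only the context below; say "I don't know" if it is not covered.

Context: {context}
Question: {question}
Answer:"""

prompt = TEMPLATE.format(company="ACME", context="...", question="...")
```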
MLOps/LLMOps/FMOps
A crucial part of working with LLM solutions in production relates to version control, fine-tuning pipelines, model swappability, performance monitoring, and more. To bring (and keep!) your model in production, you have to consider the fine art of MLOps for LLMs. Implementing a user feedback loop is invaluable here: imagine a generative solution that suggests content which users can correct however they see fit; we want to know exactly what was changed and feed that back, so the model can be re-trained appropriately at the right time.
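A minimal sketch of such a feedback loop is shown below: it stores what the model suggested next to what the user changed, so corrections can later feed re-training. The record schema and JSONL storage are illustrative choices.

```python
# Sketch of a user feedback loop: log the model suggestion and the
# user's correction side by side for later re-training.
# Schema and JSONL storage are illustrative choices.
import json, datetime

def log_feedback(prompt, suggestion, user_edit, path="feedback.jsonl"):
    record = {
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "prompt": prompt,
        "model_suggestion": suggestion,
        "user_correction": user_edit,   # exactly what the user changed
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```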
Read our Blogpost: Developing AI Systems in the Foundation Model Age (1)
Read our Blogpost: Developing AI Systems in the Foundation Model Age (2)
Data Processing
While some believe that AI is a model-centric field, it is our view that, especially with the dawn of foundation models, the field is becoming increasingly data-centric. Getting the right processes in place to process and store the data relevant to your use case, such as third-party or internal data, still plays a crucial role.
Prompt Analytics
When working with non-deterministic systems, it's crucial to capture exactly how your LLM solution is being used (inputs as well as outputs). For example, you can monitor the usage of your LLM solution or even explore the content provided to it, such as the top 10 most asked questions in the context of a chatbot.
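As a minimal sketch, the snippet below counts the most frequent questions from captured usage logs; it assumes JSONL records like the feedback sketch above, which is an illustrative format.

```python
# Sketch of simple prompt analytics: find the top 10 most asked
# questions in captured usage logs (assumed JSONL format, as above).
import json
from collections import Counter

with open("feedback.jsonl") as f:
    questions = [json.loads(line)["prompt"] for line in f]

for question, count in Counter(questions).most_common(10):
    print(count, question)
```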