A data-centric framework for fine-tuning foundation models
Foundation models bring unprecedented opportunities for increasing productivity and enabling growth. Out of the box, they are able to perform generic tasks such as generating text, images and videos. However, in order to perform well on complex and domain-specific tasks, such as generating images in a certain visual style or working with legal or medical jargon, they need to be specialised.
Fondant has been developed to make this process, known as fine-tuning, as easy and efficient as possible.
Foundation models are models that are trained on large and diverse data sources and can be used for a wide range of downstream tasks. As such, they form the "foundation" for other models.
For example, GPT-3 is used as the foundation for ChatGPT, a model adapted for question answering. Other examples of foundation models include Stable Diffusion, CLIP, Segment Anything (SAM), and many more.
Fondant is an open source framework for data preparation and fine-tuning of foundation models, developed by ML6 together with the open source community. Our goal is to make it easy and efficient to fine-tune large foundation models based on specific knowledge domain data.
Data quality and quantity are the main factors determining the power of fine-tuned AI models. However, preparing the data often consumes 80 to 90% of the budget in real-world scenarios. Through Fondant, we aim to make this process as painless as possible by providing an easy-to-use programming interface, composable pipelines, and reusable components that can process terabyte-scale data loads in hours.
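As a minimal sketch of what such a composable pipeline can look like in code (the component names, arguments, and exact API below are illustrative and may differ between Fondant releases):

```python
from fondant.pipeline import Pipeline

# A pipeline chains reusable components that operate on a shared dataset.
pipeline = Pipeline(
    name="image-finetuning-data",
    base_path="./artifacts",  # where intermediate datasets are written
)

# Read raw data, then filter it with a reusable component.
# Component names and arguments here are illustrative.
dataset = pipeline.read(
    "load_from_hf_hub",
    arguments={"dataset_name": "my-org/my-dataset"},
)
dataset = dataset.apply(
    "filter_image_resolution",
    arguments={"min_image_dim": 512, "max_aspect_ratio": 2.5},
)
```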
Visit our GitHub page and start testing and contributing to Fondant! On GitHub, you will find all the information on how to install and test Fondant and how to create your own pipelines and components. Share your feedback with us; we are continuously adding features and components based on our users' needs.
A model’s performance is directly determined by the quantity and quality of data on which it was fine-tuned. Fondant makes it easy to collect, enrich and curate large-scale data for fine-tuning.
Fondant is compatible with data and model hubs such as Hugging Face. It supports all major clouds, giving you freedom and control and avoiding vendor lock-in. We also aim to support all data modalities (images, text, video, …) to enable fine-tuning of any foundation model.
Fondant makes it possible to create highly scalable pipelines of reusable components for enriching data and fine-tuning large foundation models. It facilitates the smart collection, filtering, and transformation of data and optimises fine-tuning. Fondant is easy to reuse and extend, as sketched below.
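To give a flavour of how extension works, here is a rough sketch of a custom component, assuming Fondant's pandas-based transform interface (the base class and exact signatures may vary by version, and the column name and argument are hypothetical):

```python
import pandas as pd
from fondant.component import PandasTransformComponent


class FilterShortTexts(PandasTransformComponent):
    """Illustrative custom component that drops rows with short text."""

    def __init__(self, *, min_length: int, **kwargs):
        # Arguments are supplied via the component specification (illustrative).
        self.min_length = min_length

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        # Keep only rows whose "text" column meets the minimum length.
        return dataframe[dataframe["text"].str.len() >= self.min_length]
```

A component like this can then be dropped into a pipeline alongside the reusable ones.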
For optimal performance, foundation models need to be fine-tuned on large amounts of data, so Fondant is built to scale. In future releases, we aim to enable fine-tuning, or even training, of large models through distributed compute and highly scalable pipelines.
Fondant is designed with datasets as the interface between components and is built around a central manifest. This enables write-once-read-many access and minimal data movement, reducing cost.
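Conceptually, the manifest records where each field of the evolving dataset lives, so a component only writes the fields it produces and references the rest in place. A simplified, purely illustrative manifest (shown here as a Python dict; the real schema is more elaborate) might look like:

```python
# Illustrative, simplified manifest. Each component writes only the fields it
# produces and records their location; earlier fields are referenced rather
# than copied, which is what enables write-once-read-many.
manifest = {
    "metadata": {
        "pipeline_name": "image-finetuning-data",
        "base_path": "./artifacts",
        "run_id": "run-2023-10-01",  # hypothetical run identifier
    },
    "index": {"location": "/load_from_hf_hub/index"},
    "fields": {
        "image": {"type": "binary", "location": "/load_from_hf_hub/image"},
        "caption": {"type": "string", "location": "/generate_captions/caption"},
    },
}
```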