December 7, 2023

Unveiling Google's Gemini: a game-changer in the world of AI?

Contributors
Matthias Feys
Q / CTO

What is Gemini, Google’s newest AI model?

Yesterday, Google launched the first version of their new family of models: Gemini. The launch immediately sparked discussion - can Google, with these new models, really compete with OpenAI’s models? In this blog, we share our view on the Gemini launch.

Text generation - patience, please! 

Until now, GPT4 has set the standard for text generation capabilities. With Gemini, Google wants to challenge that position. Gemini Ultra - the biggest, most capable version of the model - promises to compete with GPT4, according to several benchmarks reported in the technical paper. Next to Gemini Ultra, there is also Gemini Pro, with performance more comparable to GPT3.5, and Gemini Nano, which is aimed at running LLMs on-device.

We should always be careful about jumping to conclusions based on these benchmarks: specifically for LLMs, it’s difficult to isolate the impact of the prompt used. Still, these results show the potential of Gemini in text generation: for the first time, a model performs better than human experts on the MMLU dataset, a benchmark generally used to evaluate LLMs against human expert performance.

As of December 13th, Gemini Pro will be available via Google AI Studio and Vertex AI. In many regions (but not yet in Europe), a fine-tuned version is already available in Bard, allowing for easy experimentation with the new model. The broad release of Gemini Ultra is planned for early next year, as mentioned in this release statement, after further refinement and safety testing.
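
Once Gemini Pro is live on Vertex AI, calling it should only take a few lines of code. The sketch below is based on the preview of the Vertex AI Python SDK at the time of the launch; the model name ("gemini-pro"), project ID and region are placeholders and may differ in your setup.

```python
import vertexai
from vertexai.preview.generative_models import GenerativeModel

# Placeholders: use your own Google Cloud project and a supported region.
vertexai.init(project="your-project-id", location="us-central1")

# Model name as announced for the preview; may change once generally available.
model = GenerativeModel("gemini-pro")
response = model.generate_content("Summarise the Gemini launch in two sentences.")
print(response.text)
```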

To experience the full text capabilities of Gemini Ultra, you’ll need some patience - or, if you’re lucky, early access to experiment as a selected partner or customer. As a Google Cloud Premium Partner, we are in close contact about early access and keen to experiment and learn what this model can mean for our customers.

How Google's Gemini differentiates through multimodal capabilities

So, why are we so excited about Gemini today? While text generation has been a major focus in the GenAI landscape, what really sets Gemini apart from other generative models is its multimodal capabilities: Gemini was trained to be natively multimodal, so it can take audio and visuals as input alongside text, and return text and images as output. This enables:

  • Multimodal combinations: take a picture of your fridge, add a voice message asking for a recipe, and receive written instructions plus a picture of how to plate the dish (a code sketch of this scenario follows the list). This type of interaction will become the new standard with this capability.
  • Chart understanding: interpreting & reasoning on charts, enabling businesses to quickly extract insights from extensive reports, or make them more transparent.
  • Video understanding: Gemini supports video as input natively, a capability not available in the GPT4 models.
  • Transcription: Gemini Pro outperforms specialised audio models such as Whisper in transcription tasks. 
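
To make the first bullet concrete, here is a hedged sketch of a multimodal request through the same preview SDK, passing an image alongside a text prompt. The model name ("gemini-pro-vision"), the Cloud Storage path and the prompt are placeholders, not something taken from Google’s announcement.

```python
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part

# Placeholders: project, region and the Cloud Storage URI of your image.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-pro-vision")
fridge_photo = Part.from_uri("gs://your-bucket/fridge.jpg", mime_type="image/jpeg")

# Mixed input: an image part followed by a text instruction.
response = model.generate_content([
    fridge_photo,
    "Suggest a recipe I can cook with the ingredients visible in this picture.",
])
print(response.text)
```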

This more generalist, multimodal approach seems to us to be the most important differentiator for Gemini in the current landscape. As we are learning from building GenAI solutions with customers, businesses hold information in very different formats. These multimodal capabilities open up new possibilities for businesses to interact with that information in a more intuitive way, for example by using Multimodal RAG (sketched below).
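
By Multimodal RAG we mean retrieving the most relevant pieces of company information - whether text chunks or images - and handing them to a multimodal model together with the user’s question. The sketch below is purely illustrative: embed_text and generate are hypothetical stand-ins for whichever embedding and multimodal generation APIs you use; nothing in it comes from the Gemini documentation.

```python
# Purely illustrative multimodal RAG sketch; not production code.
from dataclasses import dataclass
from typing import List, Optional
import numpy as np

@dataclass
class Chunk:
    text: str                        # text chunk or caption describing an asset
    image_uri: Optional[str] = None  # optional image belonging to this chunk

def embed_text(text: str) -> np.ndarray:
    """Hypothetical: return a unit-length embedding vector for `text`."""
    raise NotImplementedError("plug in your embedding model here")

def generate(parts: List[object]) -> str:
    """Hypothetical: call a multimodal model with mixed text/image parts."""
    raise NotImplementedError("plug in your multimodal model here")

def retrieve(query: str, chunks: List[Chunk], k: int = 3) -> List[Chunk]:
    """Rank chunks by cosine similarity between query and chunk embeddings."""
    q = embed_text(query)
    return sorted(chunks, key=lambda c: -float(np.dot(q, embed_text(c.text))))[:k]

def answer(query: str, chunks: List[Chunk]) -> str:
    """Retrieve the most relevant chunks (and their images) and ask the model."""
    parts: List[object] = [query]
    for chunk in retrieve(query, chunks):
        parts.append(chunk.text)
        if chunk.image_uri:
            parts.append(chunk.image_uri)  # real code would load/encode the image
    return generate(parts)
```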

How Gemini compares with PaLM 2

PaLM 2 deals with text, while Gemini is more versatile, handling various types of data like text, images, and code. What makes Gemini different from PaLM 2 is its ability to learn from these diverse sources. Since Bard (Google’s chat-based AI tool) is moving from PaLM 2 to Gemini models, we expect it to benefit from Gemini’s capabilities as time goes on.

Our conclusion on Gemini

Over the next few weeks, we’ll start using these models at ML6 and find out what value they can bring to our customers. Open questions remain (pricing, regional availability, timing), but this much is clear: by launching Gemini, Google has made a significant move in the GenAI race, and we can’t wait to see what’s next.

If you’re interested in finding out more about Gemini, or in what GenAI can mean for your business, don’t hesitate to reach out. We’d love to think along with you.
