Google’s multimodal AI model Gemini

Thursday 07 December 2023



I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it. This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company.
Alphabet CEO Sundar Pichai

Google has released details of its Gemini AI model, created by Google DeepMind and built for multimodality and reasoning. According to Google, it is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most widely used tests of the knowledge and problem-solving abilities of AI models. Gemini has sophisticated multimodal reasoning and advanced coding capabilities, enabling it to generalise across and combine different types of information, including text, images, audio, video and code. The model will be available in three sizes: Nano, Pro and Ultra.

The Nano version, designed for on-device tasks, has been released to developers to create AI-driven apps for Android and is already running on the latest Pixel phones. The Pro version powers Google’s Bard, providing more advanced reasoning, planning and understanding from text-based prompts, initially in English, with other languages and modalities to follow. From early 2024, Bard Advanced will access the more advanced Gemini Ultra model, which is currently undergoing safety checks and testing through Google’s trusted tester program. Gemini will eventually be integrated into all of Google’s services and products.

In the video below, a Google employee interacts with Gemini, demonstrating its multimodal and reasoning capabilities. (Google confirms that sequences of events in the video have been accelerated.)

Google Cloud contributed to the AI announcement, confirming that the new Cloud TPU v5p chip will run the Gemini model. It is Google’s most powerful and scalable chip, capable of supporting the expanding size and training requirements of large language models (LLMs). Greater operating efficiency means the chip will lower the cost of running AI models while maintaining performance and compute power. Early testing found that it trains LLMs 2.8 times faster than the previous chip version.

A specialised version of Gemini is the backbone of an update to Google’s AI code generation system, AlphaCode. AlphaCode 2 excels at solving competitive programming problems involving complex maths and theoretical computer science. Compared with the original version, it solves nearly twice as many problems and performs better than an estimated 85% of competition participants, reflecting its potential value in collaborative programming.

Google DeepMind was established earlier this year through the merger of the DeepMind team and the Brain team from Google Research. Google acquired DeepMind, a UK-based AI startup run by Demis Hassabis, in 2014, and the team remained a separate unit within Google until the merger with Brain. The team made history in 2016 when its AlphaGo AI program beat the world champion Go player, Lee Sedol, and ultimately defeated every other computer program at the strategy game.

About the Author
Swell Investment Team


Members of the investment team contributed to this article.