Google Gemini: Everything you need to know about the generative AI models

Google is trying to make a splash with Gemini, its flagship suite of generative AI models, apps, and services. But what is Gemini? How can you use it? And how does it stack up against other generative AI tools like OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?

We’ve created this helpful guide to help you stay up to date on the most recent Gemini developments, and we’ll update it as new Gemini models, features, and information about Google’s plans for Gemini become available.

What is Gemini?

Gemini is Google’s long-promised family of next-generation generative AI models, developed by Google’s AI research teams DeepMind and Google Research. It comes in four flavors:

  • Gemini Ultra, a very large model.
  • Gemini Pro, a large model, though smaller than Ultra. The most recent version, Gemini 2.0 Pro Experimental, is Google’s flagship.
  • Gemini Flash, a “distilled,” speedier version of Pro. It also comes in two variants: Gemini Flash-Lite, which is slightly smaller and faster, and Gemini Flash Thinking Experimental, which has reasoning capabilities.
  • Gemini Nano, two small models: Nano-1 and the slightly more capable Nano-2, which is designed to run offline.

All Gemini models were trained to be natively multimodal, meaning they can work with and analyze more than just text. Google says the models were pre-trained and fine-tuned on a variety of codebases, text in different languages, and public, proprietary, and licensed audio, images, and videos.

This distinguishes Gemini from models like Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (emails, essays, and the like), but Gemini models can.

We’ll note here that the ethics and legality of training models on public data, in some cases without the data owners’ knowledge or consent, are open questions. Google has an AI indemnification policy that shields certain Google Cloud customers from lawsuits over these issues, but the policy contains carve-outs. Proceed with caution, particularly if you intend to use Gemini commercially.

What’s the difference between the Gemini apps and Gemini models?

Gemini is separate from the Gemini apps for web and mobile (formerly Bard).

The Gemini apps are clients that connect to various Gemini models and layer a chatbot-like interface on top. Think of them as front ends for Google’s generative AI, analogous to ChatGPT and Anthropic’s Claude family of apps.

Google Gemini mobile app

On Android, the Gemini app has replaced Google Assistant. On iOS, the Google and Google Search apps serve as that platform’s Gemini clients.

Android users can now also bring up a Gemini overlay on top of any app to ask questions about what’s on the screen (a YouTube video, for example). To summon the overlay, press and hold a supported smartphone’s power button or say, “Hey Google.”

Gemini apps can accept text, voice commands, and files like PDFs (and, soon, videos) uploaded or imported from Google Drive, and they can generate images. As you’d expect, conversations carry over between the Gemini mobile apps and Gemini on the web, and vice versa, if you’re signed in to the same Google Account on both.

Gemini Advanced

The Gemini apps aren’t the only way to tap Gemini models for help with tasks. Gemini-powered features are gradually making their way into staple Google apps like Gmail and Google Docs.

To take advantage of most of them, you’ll need the Google One AI Premium Plan. Technically part of Google One, the $20-per-month AI Premium Plan provides access to Gemini in Google Workspace apps like Docs, Maps, Slides, Sheets, Drive, and Meet. It also enables what Google calls Gemini Advanced, which brings the company’s more capable Gemini models to the Gemini apps.

Gemini Advanced users also get extras here and there, such as priority access to new features, the ability to run and edit Python code directly in Gemini, and a larger “context window.” Gemini Advanced can reason over and recall roughly 750,000 words of conversation (or 1,500 pages of documents), compared with the 24,000 words (48 pages) the standard Gemini app can handle.

Screenshot of a Google Gemini commercial

Gemini Advanced users also get access to Google’s Deep Research tool, which uses “advanced reasoning” and “long context capabilities” to generate research briefs. After you enter a prompt, the chatbot drafts a multistep research plan, asks you to approve it, and then spends several minutes searching the web and producing a lengthy report on your query. It’s meant for broader questions, such as “Can you help me redesign my kitchen?”

Google also gives Gemini Advanced users a memory feature that lets the chatbot draw on past exchanges with Gemini for context in the current conversation, as well as expanded usage of NotebookLM, the company’s tool that turns PDFs into AI-generated podcasts.

Gemini Advanced users can also tap Google’s flagship model, Gemini 2.0 Pro, which is optimized for difficult coding and math problems.

Another Gemini Advanced exclusive is trip planning in Google Search, which builds custom travel itineraries from prompts. Factoring in things like flight times (pulled from emails in a user’s Gmail inbox), meal preferences, and information about nearby attractions (drawn from Google Search and Maps data), as well as the distances between those attractions, Gemini generates an itinerary that updates automatically to reflect any changes.

Corporate customers can also bring Gemini to Google services through two plans, Gemini Business (an add-on for Google Workspace) and Gemini Enterprise. Gemini Business starts at $6 per user per month, while Gemini Enterprise, which adds meeting note-taking, translated captions, and document classification and labeling, is generally pricier and is priced according to a business’s needs. (Both plans require an annual commitment.)

In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. A similar panel appears in Docs, where it helps with writing, refining content, and brainstorming ideas. In Slides, Gemini generates slides and custom images. And in Google Sheets, Gemini tracks and organizes data, creating tables and formulas.

Google’s AI chatbot recently made its way to Maps, where Gemini can summarize reviews of coffee shops and suggest ways to spend a day in a new place.

In Drive, Gemini can summarize files and folders and give quick facts about a project. Meanwhile, in Meet, Gemini provides translated captions in multiple languages.

Gemini in Gmail

Gemini recently came to Google’s Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it will take into account the web page you’re on when making recommendations.

Gemini is also present in Google’s database products, cloud security tools, and app development platforms (including Firebase and Project IDX), as well as in apps like Google Photos, YouTube, and NotebookLM, where it helps with tasks such as video brainstorming and note-taking.

Gemini powers Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation. It also underpins Google’s security products, such as Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and let users perform natural-language searches for ongoing threats or indicators of compromise.

Gemini extensions and Gems

Announced at Google I/O 2024, Gemini Advanced users can create Gems, custom chatbots powered by Gemini models. Gems can be generated from natural language descriptions, for example, “You’re my running coach. Give me a daily running plan,” and can be kept private or shared with others.

Gems are available on desktop and mobile in 150 countries and most languages. Eventually, they’ll be able to tap an expanded set of integrations with Google services, including Google Calendar, Keep, Tasks, and YouTube Music, to complete custom tasks.

Gemini Gems

Speaking of integrations, the Gemini apps on web and mobile can tap into Google services via what Google calls “Gemini extensions.” Today, Gemini integrates with Google Drive, Gmail, and YouTube to respond to queries such as “Could you summarize my last three emails?” Later this year, Gemini will be able to take additional actions with Google Calendar, Keep, Tasks, YouTube Music, and Utilities, the Android-only apps that control on-device features like timers and alarms, media controls, the flashlight, volume, Wi-Fi, Bluetooth, and more.

Gemini Live in-depth voice chats

An experience called Gemini Live lets users have “in-depth” voice chats with Gemini. It’s available through the Gemini apps on mobile and on the Pixel Buds Pro 2, where it can be accessed even when your phone is locked.

With Gemini Live enabled, you can interrupt Gemini while it’s speaking (in one of several new voices) to ask a clarifying question, and it will adapt to your speech patterns in real time. At some point, Gemini is expected to gain visual understanding, allowing it to see and respond to your surroundings via photos or video captured by your smartphone’s cameras.

Gemini Live

Live is also designed to serve as a virtual coach of sorts, helping you brainstorm, prepare for events, and so on. For instance, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can give public speaking advice.

You can read our review of Gemini Live here. Spoiler alert: We think the feature has a ways to go before it’s truly useful, though it’s still early days.

Image generation via Imagen 3

Gemini users can generate artwork and images using Google’s built-in Imagen 3 model.

Google says that Imagen 3 can more accurately understand the text prompts it translates into images than its predecessor, Imagen 2, and that it’s more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and visual errors (at least according to Google), and it’s the best Imagen model yet for rendering text.

Google Imagen 3

Back in February 2024, Google was forced to pause Gemini’s ability to generate images of people after users complained of historical inaccuracies. But in August, the company reintroduced people generation for certain users, specifically English-language users signed up for one of Google’s paid Gemini plans (e.g., Gemini Advanced), as part of a pilot program.

Gemini for teens

Google launched its teen-focused Gemini experience in June, letting students sign up using their Google Workspace for Education school accounts.

The teen-focused Gemini comes with “additional policies and safeguards,” including a tailored onboarding process and an “AI literacy guide” to, as Google puts it, “help teens use AI responsibly.” Otherwise, it’s nearly identical to the standard Gemini experience, aside from a “double check” feature that searches the web to verify whether Gemini’s responses are accurate.

Gemini in smart home devices

A growing number of Google-made devices tap Gemini for enhanced functionality, from the Google TV Streamer to the Pixel 9 and 9 Pro to the newest Nest Learning Thermostat.

On the Google TV Streamer, Gemini draws on your interests to summarize reviews and even entire TV seasons.

Google TV Streamer set up

On the latest Nest thermostat (as well as Nest speakers, cameras, and smart displays), Gemini will soon bolster Google Assistant’s conversational and analytic capabilities.

Subscribers to Google’s Nest Aware plan will get a preview of new Gemini-powered features later this year, including AI descriptions of Nest camera footage, natural language video search, and recommended automations. Nest cameras will understand what’s happening in real-time video feeds (e.g., when a dog is digging in the garden), while the Google Home app will surface videos and create device automations from a description (e.g., “Did the kids leave their bikes in the driveway?” or “Have my Nest thermostat turn on the heating when I get home from work every Tuesday”).

Google Gemini in smart home

Later this year, Google Assistant will also get a few upgrades on Nest-branded and other smart home devices to make conversations feel more natural, including improved voices and the ability to ask follow-up questions and “[more] easily go back and forth.”

What can the Gemini models do?

Because the Gemini models are multimodal, they can perform a range of tasks, from transcribing speech to captioning images and videos in real time. As noted in the previous section, several of these capabilities have reached the product stage, and Google is promising many more in the not-too-distant future.

Of course, it’s a bit hard to take the company at its word. Google seriously underdelivered with the original Bard launch, and more recently it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to be more aspirational than live.

Also, Google offers no fix for some of the underlying problems with today’s generative AI tech, such as its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it’s something to keep in mind when considering using or paying for Gemini.

Assuming for the purposes of this article that Google is being truthful in its recent claims, here’s what the different tiers of Gemini can do now and what they’ll be able to do once they reach their full potential:

What you can do with Gemini Ultra

Google says that Gemini Ultra’s multimodality means it can be used to help with things like physics homework, solving problems step by step on a worksheet, and pointing out possible mistakes in already filled-in answers.

We haven’t seen much of Gemini Ultra in recent months, though. The model doesn’t appear in the Gemini app or on the Google Gemini API pricing page. That doesn’t rule out Google bringing Gemini Ultra back to its lineup in the future, however.

Ultra can also be applied to tasks such as identifying scientific papers relevant to a problem, Google says. The model can extract information from several papers and, for example, update a chart from one of them by generating the formulas necessary to recreate the chart with more timely data.

Gemini Ultra technically supports image generation. But that capability hasn’t yet made its way into the productized version of the model, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feeding prompts to an image generator (DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.

Ultra is available as an API through Vertex AI, Google’s fully managed AI development platform, and AI Studio, Google’s web-based tool for app and platform developers.

Gemini Pro’s capabilities

Google says that Gemini 2.0 Pro, its most recent Pro model, is its best yet for coding performance and complex prompts. Because it’s currently available only as an experimental version, unforeseen issues could crop up.

Gemini 2.0 Pro outperforms Gemini 1.5 Pro on benchmarks measuring coding, reasoning, math, and factual accuracy. The model can take in up to 1.4 million words, two hours of video, or 22 hours of audio, and it can reason across or answer questions about that data.

Google’s Deep Research feature is still powered by Gemini 1.5 Pro, however.

Code execution, a feature of Gemini 2.0 Pro that launched in June alongside Gemini 1.5 Pro, aims to reduce bugs in code the model generates by iteratively refining that code over several steps. (Code execution also works with Gemini Flash.)
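The general shape of that loop can be sketched locally: generate code, try to run it, and feed any error back into the next attempt. The toy below is not Google's implementation, just an illustration of the iterative-refinement pattern, with a stub standing in for the model:

```python
# Toy sketch of generate-run-refine. `generator` stands in for a model call;
# here it's a stub that "fixes" its code once it sees an error message.
def refine_code(generator, max_steps: int = 3) -> str:
    feedback = None
    code = ""
    for _ in range(max_steps):
        code = generator(feedback)
        try:
            exec(compile(code, "<candidate>", "exec"), {})
            return code  # ran without raising: accept this version
        except Exception as exc:
            feedback = f"{type(exc).__name__}: {exc}"  # pass the error back
    return code  # best effort after max_steps attempts

# Stub "model": the first attempt has a NameError, the second is fixed.
def stub_generator(feedback):
    if feedback is None:
        return "result = undefined_name + 1"
    return "result = 1 + 1"

print(refine_code(stub_generator))  # prints: result = 1 + 1
```

In the real feature, the generator is the Gemini model itself and the execution happens in Google's sandbox, but the feedback loop is the same idea.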

Developers can tailor Gemini Pro to specific use cases and contexts within Vertex AI via a fine-tuning or “grounding” process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo, and MSCI, or to source information from corporate datasets or Google Search instead of its own knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a back-office workflow.
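As a rough illustration of what grounding looks like at the API level, here's a sketch of a Gemini API request body with Google Search grounding enabled. The endpoint path and the `google_search_retrieval` tool name reflect the public Gemini REST API for the 1.5-series models at the time of writing; treat them as assumptions and check the current documentation before relying on them:

```python
import json

# Hypothetical sketch: building a generateContent request body that asks a
# Gemini model to ground its answer in Google Search results.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-1.5-pro:generateContent"
)

def build_grounded_request(prompt: str) -> dict:
    """Return a request body with Google Search grounding enabled."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        # The empty object enables the tool with default settings.
        "tools": [{"google_search_retrieval": {}}],
    }

body = build_grounded_request("What changed in the latest Gemini release?")
print(json.dumps(body, indent=2))
# To actually send it, you would POST `body` to API_URL with your API key,
# e.g. requests.post(API_URL, params={"key": KEY}, json=body).
```

Grounding against corporate datasets or third-party providers follows the same request shape but uses different tool configurations, which vary by data source.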

AI Studio offers templates for creating structured chat prompts with Pro. Developers can control the model’s creative range, provide examples to set tone and style instructions, and tune Pro’s safety settings.

Gemini-powered “agents” can be created within Vertex AI using the Vertex AI Agent Builder. For example, a company could build an agent that analyzes previous ad campaigns to understand a brand’s style and then applies that knowledge to help generate new ideas consistent with the style.

Gemini Flash is lighter but packs a punch

Google calls Gemini 2.0 Flash its AI model for the agentic era. In addition to text, the model can natively generate images and audio, and it can use tools like Google Search and interact with external APIs.

The 2.0 Flash model is faster than Gemini’s previous generation of models and even outperforms some of the larger Gemini 1.5 models on benchmarks measuring coding and image analysis. Gemini 2.0 Flash is available in the Gemini web and mobile apps as well as through Google’s AI development platforms.

In December, Google unveiled a “thinking” version of Gemini 2.0 Flash that’s capable of “reasoning,” meaning the model takes a few extra seconds to work back through a problem before giving an answer.

In February, Google released Gemini 2.0 Flash Thinking in the Gemini app. That same month, it also unveiled Gemini 2.0 Flash-Lite, a scaled-down model. The company claims this model outperforms its Gemini 1.5 Flash equivalent at the same price and speed.

Flash, a smaller and more efficient offshoot of Gemini Pro, is designed for narrow, high-frequency generative AI workloads. Like Gemini Pro, it’s multimodal, meaning it can analyze text, images, video, and audio (though it can only generate text). Google says Flash is particularly well suited for tasks like chat apps, image and video captioning, and data extraction from long documents and tables.

Developers using Flash and Pro can optionally leverage context caching, which lets them store large amounts of information (say, a knowledge base or a database of research papers) in a cache that Gemini models can access quickly and relatively cheaply. Context caching does add fees on top of other Gemini model usage fees, however.
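Whether caching pays off is simple arithmetic: compare resending the full context on every call against paying once to store it plus a discounted per-read rate. The per-token prices below are placeholders for illustration, not Google's actual rates:

```python
# Hypothetical example rates (dollars per token); check the real price list.
INPUT_PRICE = 1.25 / 1_000_000             # cost to send one input token
CACHED_PRICE = 0.3125 / 1_000_000          # cost to read one cached token
STORAGE_PER_TOKEN_HOUR = 1.00 / 1_000_000  # cost to keep one token cached/hour

def cost_without_cache(ctx_tokens: int, calls: int) -> float:
    """Resend the whole context as fresh input on every call."""
    return ctx_tokens * INPUT_PRICE * calls

def cost_with_cache(ctx_tokens: int, calls: int, hours: float) -> float:
    """Pay the cheaper cached-read rate per call, plus storage over time."""
    reads = ctx_tokens * CACHED_PRICE * calls
    storage = ctx_tokens * STORAGE_PER_TOKEN_HOUR * hours
    return reads + storage

ctx = 500_000  # e.g., a cached corpus of research papers
print(cost_without_cache(ctx, calls=100))        # resend the corpus 100 times
print(cost_with_cache(ctx, calls=100, hours=1))  # cache once, read it 100 times
```

At these placeholder rates, caching wins handily once the same context is reused across many calls; for a context read only once or twice, the storage fee can make caching the more expensive option.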

Gemini Nano can run on your phone

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) devices instead of sending the task to a server. So far, it powers a couple of features on the Pixel 8, Pixel 8 Pro, Pixel 9, Pixel 9 Pro, and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other audio clips. Users get these summaries even without a signal or Wi-Fi connection, and in a nod to privacy, no data leaves their phone in the process.


Nano is also in Gboard, Google’s keyboard replacement, where it powers a feature called Smart Reply that suggests the next thing you might want to say in messaging apps like WhatsApp.

On supported devices, Nano powers Magic Compose in the Google Messages app, which can craft messages in styles like “excited,” “formal,” and “lyrical.”

Google says that a future version of Android will tap Nano to alert users to potential scams during calls. The new weather app on Pixel phones uses Gemini Nano to generate tailored weather reports. And TalkBack, Google’s accessibility feature, employs Nano to create aural descriptions of objects for low-vision and blind users.

How much do the Gemini models cost?

Developers can build apps and services on Gemini 1.5 Pro, 1.5 Flash, 2.0 Flash, and 2.0 Flash-Lite through Google’s Gemini API, and all of these models have free options. But the free tiers impose usage limits and leave out certain features, like context caching and batching.

Otherwise, Gemini models are pay-as-you-go. Here’s the base pricing as of September 2024, not including add-ons like context caching:

  • Gemini 1.5 Pro: $1.25 per 1 million input tokens (for prompts up to 128K tokens) or $2.50 per 1 million input tokens (for prompts longer than 128K tokens); $5 per 1 million output tokens (for prompts up to 128K tokens) or $10 per 1 million output tokens (for prompts longer than 128K tokens)
  • Gemini 1.5 Flash: 7.5 cents per 1 million input tokens (for prompts up to 128K tokens) or 15 cents per 1 million input tokens (for prompts longer than 128K tokens); 30 cents per 1 million output tokens (for prompts up to 128K tokens) or 60 cents per 1 million output tokens (for prompts longer than 128K tokens)
  • Gemini 2.0 Flash: 10 cents per 1 million input tokens and 40 cents per 1 million output tokens. For audio specifically, input costs 70 cents per 1 million tokens, while output remains 40 cents per 1 million tokens
  • Gemini 2.0 Flash-Lite: 7.5 cents per 1 million input tokens and 30 cents per 1 million output tokens
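Using the listed Gemini 2.0 Flash and Flash-Lite text rates, estimating a bill is straightforward arithmetic. This is a sketch; real invoices depend on exact token counts and any add-ons like context caching:

```python
# Per-million-token text prices in dollars, taken from the list above.
PRICES = {
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
    "gemini-2.0-flash-lite": {"input": 0.075, "output": 0.30},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the pay-as-you-go cost in dollars for one workload."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g., 2 million input tokens and 500K output tokens on 2.0 Flash:
print(round(estimate_cost("gemini-2.0-flash", 2_000_000, 500_000), 2))  # 0.4
```

That works out to 20 cents for the input plus 20 cents for the output, or about 40 cents total.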

Tokens are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 700,000 words. “Input” refers to tokens fed into the model, while “output” refers to tokens the model generates.
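Given that rough ratio (1 million tokens to about 700,000 words), you can convert between words and tokens with a simple heuristic. Actual tokenization varies by language and content, so treat the results as ballpark figures only:

```python
WORDS_PER_TOKEN = 700_000 / 1_000_000  # ~0.7 words per token, per the rough ratio

def tokens_to_words(tokens: int) -> int:
    """Rough estimate of how many words a given token budget covers."""
    return round(tokens * WORDS_PER_TOKEN)

def words_to_tokens(words: int) -> int:
    """Rough estimate of how many tokens a text of a given word count uses."""
    return round(words / WORDS_PER_TOKEN)

print(tokens_to_words(1_000_000))  # 700000
print(words_to_tokens(750_000))    # roughly 1.07 million tokens
```

This is handy for sanity-checking costs and context windows: 750,000 words of conversation, the figure quoted for Gemini Advanced, translates to a bit over a million tokens by this estimate.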

Gemini 2.0 Pro pricing has yet to be announced, and Nano is still in early access.

What’s the latest on Project Astra?

Project Astra is Google DeepMind’s effort to create AI-powered apps and “agents” for real-time, multimodal understanding. In demos, Google has shown how the AI model can process live video and audio simultaneously. Google released an app version of Project Astra to a small group of trusted testers in December, but it has no plans for a broader launch at this time.

The company would like to build Project Astra into a pair of smart glasses. In December, Google also gave a prototype of glasses with Project Astra and augmented reality capabilities to a few trusted testers. However, there’s no firm product at this point, and it’s unclear when, or whether, Google will actually release anything like it.

Project Astra is just that, a project, not a product. However, the Astra demos offer a glimpse of what Google wants its AI products to do in the future.

Is Gemini coming to the iPhone?

Apple has said that it’s in talks to put Gemini and other third-party models to use for a number of features in its Apple Intelligence suite. Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with models including Gemini, but he didn’t divulge any additional details.
