Updated production-ready Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more

Learn about the latest updates to Google's Gemini models, including reduced pricing for Gemini 1.5 Pro, increased rate limits, faster performance, enhanced quality, and more.

Logan Kilpatrick, Shrestha Basu Mallick

Overview

The article discusses the release of updated production-ready Gemini models, specifically Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, highlighting significant improvements in pricing, performance, and usability for developers. Key enhancements include reduced pricing, increased rate limits, faster output, and updated filter settings aimed at improving overall model quality and helpfulness.

What You'll Learn

1. How to access and use the updated Gemini models for various applications
2. Why the pricing changes for Gemini 1.5 Pro benefit developers
3. How to implement context caching to reduce costs when using Gemini models

Key Questions Answered

What improvements have been made in the Gemini 1.5 models?
The updates bring a price reduction of more than 50% on Gemini 1.5 Pro, 2x higher rate limits on 1.5 Flash, and approximately 3x higher rate limits on 1.5 Pro. The models also deliver 2x faster output and 3x lower latency, along with improved overall quality in math, long-context, and vision tasks.
How do the new Gemini models enhance developer experience?
The updated models respond more concisely, with output roughly 5-20% shorter, and are more helpful overall. Developers can also apply safety filters suited to their specific use cases, which allows more customization.
What are the new rate limits for Gemini 1.5 models?
The paid tier rate limits for Gemini 1.5 Flash have been increased to 2,000 RPM, while Gemini 1.5 Pro has been raised to 1,000 RPM, up from the previous limits of 1,000 and 360 RPM respectively.
What are the pricing changes for Gemini 1.5 Pro?
Effective October 1st, 2024, there will be a 64% reduction on input tokens, a 52% reduction on output tokens, and a 64% reduction on incremental cached tokens for the Gemini 1.5 Pro model, specifically for prompts less than 128K tokens.
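
To make the percentages above concrete, here is a minimal sketch of the arithmetic. The reduction figures come from this summary; the old price of 10.00 per million tokens is a purely hypothetical placeholder (the summary does not state absolute prices), and `reduced_price` is an illustrative helper, not part of any Google SDK.

```python
from decimal import Decimal

# Reductions stated in this summary for Gemini 1.5 Pro
# (prompts under 128K tokens, effective October 1st, 2024).
REDUCTIONS = {
    "input": Decimal("0.64"),
    "output": Decimal("0.52"),
    "cached": Decimal("0.64"),
}


def reduced_price(old_price: Decimal, kind: str) -> Decimal:
    """Return the price after applying the stated reduction for `kind`."""
    return old_price * (Decimal("1") - REDUCTIONS[kind])


# HYPOTHETICAL old price per 1M tokens, used only to illustrate the math.
hypothetical_old_price = Decimal("10.00")

for kind in REDUCTIONS:
    new_price = reduced_price(hypothetical_old_price, kind)
    print(f"{kind}: {hypothetical_old_price} -> {new_price}")
```

Substituting the real list prices from the official pricing page gives the actual post-reduction rates.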

Key Statistics & Figures

Price reduction on input tokens for Gemini 1.5 Pro: 64%. Effective October 1st, 2024, for prompts less than 128K tokens.
Price reduction on output tokens for Gemini 1.5 Pro: 52%. Effective October 1st, 2024, for prompts less than 128K tokens.
Rate limit for Gemini 1.5 Flash: 2,000 RPM. Increased from the previous limit of 1,000 RPM.
Rate limit for Gemini 1.5 Pro: 1,000 RPM. Increased from the previous limit of 360 RPM.
Improvement in MMLU-Pro benchmark: ~7%. Indicates overall quality enhancement of the models.
Improvement in math benchmarks: ~20%. Significant improvement in performance on MATH and HiddenMath benchmarks.

Technologies & Tools


Gemini (AI/ML): Used for various text, code, and multimodal tasks.
Google AI Studio (Platform): Platform for accessing the latest Gemini models.
Gemini API (API): API for integrating Gemini models into applications.
Vertex AI (Cloud Service): Available for larger organizations and Google Cloud customers.

Key Actionable Insights

1. Take advantage of the reduced pricing for Gemini 1.5 Pro to optimize your AI/ML projects. With significant reductions in token costs, developers can experiment more freely and build scalable applications at lower cost.
2. Use the increased rate limits to raise the throughput of applications built on the Gemini models. Higher rate limits allow more requests per minute, so applications can handle larger workloads and stay responsive under load.
3. Implement context caching to further reduce costs when using the Gemini models. Caching shared context avoids reprocessing the same tokens on every request, which lowers token costs and makes applications more efficient.
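
Even with higher limits, a client that bursts past its quota will still be rejected, so it can help to pace requests on the client side. The following is a minimal sketch of a sliding-window throttle using the paid-tier figures from this summary (2,000 RPM for 1.5 Flash, 1,000 RPM for 1.5 Pro). `RpmThrottle` is a hypothetical helper written for illustration, not part of the Gemini SDK, and it assumes a single process owns the whole quota.

```python
import time
from collections import deque


class RpmThrottle:
    """Client-side sliding-window limiter: at most `rpm` calls per 60 seconds."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock    # injectable so the throttle can be tested without waiting
        self.sleep = sleep
        self.sent = deque()   # timestamps of requests inside the current window

    def wait_time(self):
        """Return seconds until the next request may be sent (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def acquire(self):
        """Block until a slot is free, then record the request timestamp."""
        while (delay := self.wait_time()) > 0:
            self.sleep(delay)
        self.sent.append(self.clock())


# Limits taken from this summary; adjust to whatever your tier actually allows.
flash_throttle = RpmThrottle(rpm=2000)
pro_throttle = RpmThrottle(rpm=1000)
```

Calling `pro_throttle.acquire()` immediately before each request keeps the client at or below 1,000 requests in any 60-second window. A production client would typically also retry with backoff on rate-limit errors, since the server-side accounting may differ from this local estimate.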

Common Pitfalls

1. Failing to utilize the updated filter settings may lead to less tailored responses from the models. Developers should apply the appropriate safety filters based on their specific use cases to ensure the models provide the most relevant and safe outputs.
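
As a sketch of what setting filters explicitly can look like, the snippet below builds (but does not send) a `generateContent` request body with a `safetySettings` list. The category and threshold names are the enum strings used by the Gemini API as far as I am aware; verify them against the current API reference before relying on them. The `build_request` helper and the choice of `BLOCK_ONLY_HIGH` are illustrative assumptions, not recommendations from the article.

```python
import json

# Harm categories and thresholds as named in the Gemini API (verify before use).
CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]
THRESHOLDS = {
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
}


def build_request(prompt, threshold="BLOCK_ONLY_HIGH"):
    """Build a request body that applies one threshold to every category."""
    if threshold not in THRESHOLDS:
        raise ValueError(f"unknown threshold: {threshold}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "safetySettings": [
            {"category": category, "threshold": threshold}
            for category in CATEGORIES
        ],
    }


# Print the JSON that would be sent for an updated model such as gemini-1.5-pro-002.
print(json.dumps(build_request("Summarize this support ticket."), indent=2))
```

Choosing the threshold per category, rather than one value for all, is a straightforward extension when a use case tolerates some kinds of content more than others.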