Introducing Gemma models in Keras

Martin Görner

The Keras team is happy to announce that Gemma, a family of lightweight, state-of-the-art open model...

Google

Fine-tuning, Gemini, Google Cloud, JavaScript, JAX, Keras, Mistral, PyTorch, TensorFlow, Transformer

Overview

The article introduces Gemma models in Keras, a family of lightweight, state-of-the-art open models that leverage the same technology as the Gemini models. It highlights the new features in Keras 3, including support for multiple backends and enhancements for large language models.

What You'll Learn

1. How to get started with Gemma models in Keras
2. How to fine-tune Gemma models using LoRA
3. How to implement distributed training for Gemma models on multiple GPUs/TPUs

Prerequisites & Requirements

Familiarity with Keras and large language models
Access to JAX, PyTorch, or TensorFlow (optional)

Key Questions Answered

What are Gemma models and how do they compare to other models?

Gemma models are lightweight, state-of-the-art open models built using the same technology as the Gemini models. The Gemma 7B model scores 64.3% on the MMLU benchmark, outperforming Mistral-7B and Llama2-13B, making it a competitive choice in language understanding tasks.

How can I fine-tune Gemma models efficiently?

You can fine-tune Gemma models using the new LoRA API in Keras, which allows for parameter-efficient tuning. By enabling LoRA with a rank of 4, the number of trainable parameters can be reduced from 2.5 billion to just 1.3 million, making the process more efficient.
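
The drop from billions to roughly a million trainable parameters follows from simple arithmetic. The sketch below counts LoRA adapter weights under assumptions that are not stated in the text above: that the 2.5 billion figure refers to Gemma 2B; that the model has 18 decoder layers, hidden size 2048, 8 query heads, 1 key/value head, and head dimension 256; and that adapters are attached only to the query and value projections, each factored into a matrix of shape (heads, hidden, rank) and a matrix of shape (rank, head_dim). Treat it as an illustration, not as the library's exact accounting.

```python
# Assumed Gemma 2B configuration (not stated in the article summary).
NUM_LAYERS = 18
HIDDEN_DIM = 2048
NUM_QUERY_HEADS = 8
NUM_KV_HEADS = 1
HEAD_DIM = 256


def lora_params(num_heads: int, hidden_dim: int, head_dim: int, rank: int) -> int:
    """Count adapter weights for one projection with kernel (heads, hidden, head_dim).

    Assumed factorization: A has shape (heads, hidden, rank),
    B has shape (rank, head_dim).
    """
    a_weights = num_heads * hidden_dim * rank
    b_weights = rank * head_dim
    return a_weights + b_weights


def total_lora_params(rank: int) -> int:
    """Sum adapter weights for the query and value projections of every layer."""
    per_layer = (
        lora_params(NUM_QUERY_HEADS, HIDDEN_DIM, HEAD_DIM, rank)  # query projection
        + lora_params(NUM_KV_HEADS, HIDDEN_DIM, HEAD_DIM, rank)   # value projection
    )
    return NUM_LAYERS * per_layer


trainable = total_lora_params(rank=4)
print(f"{trainable:,} trainable LoRA parameters")
```

Under these assumptions the count comes to 1,363,968, in line with the roughly 1.3 million quoted above, and it scales linearly with the rank.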

What are the new features introduced in Keras 3 for large language models?

Keras 3 introduces several new features for large language models, including a new LoRA API for efficient fine-tuning and large-scale model-parallel training capabilities. These features enhance the flexibility and scalability of model training.

How do I set up distributed training for Gemma models?

To set up distributed training for Gemma models, you can use the Keras distribution API to configure a device mesh for model parallelism. This allows you to efficiently train the model across multiple GPUs or TPUs, optimizing resource utilization.
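
A minimal sketch of that setup is shown below, assuming `keras` (version 3) and `keras_nlp` are installed and that the backend in use supports the distribution API (to my knowledge, primarily JAX). The function name `configure_model_parallel` is hypothetical. The `ModelParallel` constructor arguments have changed between Keras 3 releases, so the keyword form used here may need adjusting for your version, and `GemmaBackbone.get_layout_map` is used on the assumption that it returns a sharding layout suitable for Gemma.

```python
def configure_model_parallel():
    """Shard Gemma weights across all visible accelerators (illustrative sketch)."""
    # Lazy imports so that defining this function does not require Keras.
    import keras
    import keras_nlp

    devices = keras.distribution.list_devices()

    # A 1 x N mesh: no data parallelism, weights split N ways on the "model" axis.
    device_mesh = keras.distribution.DeviceMesh(
        shape=(1, len(devices)),
        axis_names=["batch", "model"],
        devices=devices,
    )

    # Assumed helper: maps Gemma weight names to mesh axes.
    layout_map = keras_nlp.models.GemmaBackbone.get_layout_map(device_mesh)

    distribution = keras.distribution.ModelParallel(
        device_mesh=device_mesh,
        layout_map=layout_map,
        batch_dim_name="batch",
    )
    # Models created after this call are built with sharded weights.
    keras.distribution.set_distribution(distribution)
    return distribution


# Call configure_model_parallel() before loading the model so that the
# weights are placed across devices as they are created.
```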

Key Statistics & Figures

MMLU benchmark score

64.3%

Gemma 7B outperforms Mistral-7B (62.5%).

GSM8K benchmark score

46.4%

Gemma 7B achieves this score, compared with 35.4% for Mistral-7B.

HumanEval coding challenge score

32.3%

Gemma 7B's performance surpasses Mistral-7B (26.2%).

Technologies & Tools

Framework

Keras

Used for building and training the Gemma models.

Backend

JAX

One of the backends supported by Keras 3 for running Gemma models.

Backend

PyTorch

Another backend option available in Keras 3 for model execution.

Backend

TensorFlow

A supported backend in Keras 3 for running Gemma models.

Key Actionable Insights

1. Leverage the LoRA API for efficient fine-tuning of Gemma models.
   Using the LoRA API can significantly reduce the number of parameters you need to train, making it easier and faster to fine-tune models for specific tasks.

2. Utilize the Keras distribution API for large-scale training.
   The Keras distribution API lets you spread training across multiple GPUs or TPUs, which can shorten training times and makes it possible to train models that do not fit on a single accelerator.

3. Experiment with different backends in Keras 3.
   Keras 3 supports JAX, PyTorch, and TensorFlow, giving you the flexibility to choose the backend that best fits your project's needs and infrastructure.
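
As a starting point for the first insight, loading a Gemma preset and enabling LoRA might look like the sketch below. It assumes `keras_nlp` is installed, that the `gemma_2b_en` preset is available to you (downloading the weights may require accepting the model license and configuring credentials), and that a rank of 4 suits your task. The helper names `load_gemma_with_lora` and `demo` are hypothetical.

```python
def load_gemma_with_lora(preset: str = "gemma_2b_en", lora_rank: int = 4):
    """Load a Gemma causal LM and freeze all weights except LoRA adapters."""
    # Lazy import so that defining this function does not require keras_nlp.
    import keras_nlp

    gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset(preset)
    # Adds low-rank adapters to the backbone; the original weights stay frozen.
    gemma_lm.backbone.enable_lora(rank=lora_rank)
    return gemma_lm


def demo():
    """Print the parameter summary and a short generation."""
    gemma_lm = load_gemma_with_lora()
    gemma_lm.summary()  # shows trainable vs. non-trainable parameter counts
    print(gemma_lm.generate("Keras is a", max_length=30))


# demo()  # uncomment to run; this downloads the model weights
```

The same code should run unchanged on any of the three backends, since the backend is selected outside the model code.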

Common Pitfalls

1. Failing to configure the backend properly can lead to runtime errors.
   Keras selects its backend at import time, so set the backend before importing Keras to avoid compatibility issues and to get the features of the backend you intend to use.
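
One way to avoid this pitfall is to set the `KERAS_BACKEND` environment variable at the very top of the program. The sketch below wraps that in a small helper; `select_backend` and `SUPPORTED_BACKENDS` are hypothetical names introduced for illustration, and the list covers only the three backends named in this article.

```python
import os
import sys
import warnings

# Values Keras 3 accepts for the three backends discussed here.
SUPPORTED_BACKENDS = ("jax", "torch", "tensorflow")


def select_backend(name: str) -> str:
    """Set KERAS_BACKEND; call this before the first `import keras`."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    if "keras" in sys.modules:
        # Keras reads the variable at import time, so a late change is ignored.
        warnings.warn("Keras is already imported; the backend will not change.")
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("jax")

# Import Keras only after the variable is set, for example:
#   import keras
#   print(keras.backend.backend())
```

Note that the value for PyTorch is `torch`, not `pytorch`.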

Related Concepts

Distributed Training Techniques

Parameter-efficient Fine-tuning Methods

Large Language Model Architectures