EasyLM

Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax. EasyLM can scale LLM training to hundreds of TPU/GPU accelerators by leveraging JAX's pjit functionality.

Building on top of Hugging Face's transformers and datasets, this repo provides an easy-to-use and easy-to-customize codebase for training large language models without the complexity of many other frameworks.

EasyLM is built with JAX/Flax. By leveraging JAX's pjit utility, EasyLM can train large models that don't fit on a single accelerator by sharding the model weights and training data across multiple accelerators. Currently, EasyLM supports multi-TPU/GPU training on a single host as well as multi-host training on Google Cloud TPU Pods.

Currently, the following models are supported:

Discord Server

We are running an unofficial Discord community (unaffiliated with Google) for discussion related to training LLMs in JAX. Follow this link to join the Discord server. We have dedicated channels for several JAX-based LLM frameworks, including EasyLM, JaxSeq, Alpa, and Levanter.

Models Trained with EasyLM

OpenLLaMA

OpenLLaMA is our permissively licensed reproduction of LLaMA, which can be used for commercial purposes. Check out the project main page here. OpenLLaMA can serve as a drop-in replacement for the LLaMA weights in EasyLM. Please refer to the LLaMA documentation for more details.

Koala

Koala is our new chatbot fine-tuned on top of LLaMA. If you are interested in our Koala chatbot, you can check out the blog post and the documentation for running it locally.

Installation

The installation method differs between GPU hosts and Cloud TPU hosts. The first step is to pull from GitHub.

git clone https://github.com/young-geng/EasyLM.git
cd EasyLM
export PYTHONPATH="${PWD}:$PYTHONPATH"
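
As a quick sanity check that the export took effect, the sketch below asks Python whether it can locate the EasyLM package. The helper name is_importable is hypothetical and not part of the repository; find_spec only locates a top-level package without importing it, so the check should also work before the dependencies are installed.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# "EasyLM" is the package directory at the root of the cloned repository.
if is_importable("EasyLM"):
    print("EasyLM is on the Python path.")
else:
    print("EasyLM not found; re-run the export PYTHONPATH line from the repository root.")
```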

Installing on GPU Host

The GPU environment can be installed via Anaconda.

conda env create -f scripts/gpu_environment.yml
conda activate EasyLM

Installing on Cloud TPU Host

The TPU host VM comes with Python and pip pre-installed. Simply run the following script to set up the TPU host.

./scripts/tpu_vm_setup.sh

Documentation

The EasyLM documentation can be found in the docs directory.

Reference

If you find EasyLM useful in your research or applications, please cite it using the following BibTeX:

@software{geng2023easylm,
  author = {Geng, Xinyang},
  title = {EasyLM: A Simple And Scalable Training Framework for Large Language Models},
  month = mar,
  year = 2023,
  url = {https://github.com/young-geng/EasyLM}
}

Credits

The LLaMA implementation is from JAX_llama
The JAX/Flax GPT-J and RoBERTa implementations are from transformers
Most of the JAX utilities are from mlxu
The codebase is heavily inspired by JAXSeq

Gemma-3-27B Fine-Tuning with Hugging Face

This repository contains scripts for fine-tuning the Google Gemma-3-27B model using Hugging Face's Transformers and PEFT libraries. The setup is optimized for 2x A100 80GB GPUs.

Setup Overview

Model: google/gemma-3-27b-pt (27B parameters)
Training Method: LoRA fine-tuning with 4-bit quantization
Hardware: 2x NVIDIA A100 80GB GPUs
Distributed Training: DeepSpeed ZeRO-3 with parameter and optimizer offloading

Requirements

Make sure you have:

2x A100 80GB GPUs
CUDA 12.x installed
Python 3.10+
Hugging Face account with access to Gemma-3 models
WANDB account (optional but recommended for tracking)

Installation

Install dependencies with the provided script:

bash install_requirements.sh

Data Format

The training scripts expect a JSONL file with examples and a YAML template file. The default template format is:

sequence:
  - no_loss: "{instruction}{input}\n"
  - no_loss: '<msg username="{author}">'
  - with_loss: "{output}"
  - with_loss: '</msg>\n'

Make sure your JSONL file has the corresponding fields (instruction, input, author, output).
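
To make the template concrete, here is a minimal sketch of how one JSONL record could be rendered into segments that are either excluded from or included in the loss. It is an illustration under assumptions, not the training script's actual code: the template is mirrored as a Python list instead of being loaded from YAML, "\n" is taken to mean a newline in every segment, and the names TEMPLATE, REQUIRED_FIELDS, and render_example are hypothetical.

```python
import json

# Python mirror of the YAML template above: (segment kind, format string).
# Assumption: "\n" stands for a newline in every segment.
TEMPLATE = [
    ("no_loss", "{instruction}{input}\n"),
    ("no_loss", '<msg username="{author}">'),
    ("with_loss", "{output}"),
    ("with_loss", "</msg>\n"),
]
REQUIRED_FIELDS = ("instruction", "input", "author", "output")


def render_example(jsonl_line: str):
    """Render one JSONL record into a list of (text, with_loss) segments."""
    record = json.loads(jsonl_line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return [(fmt.format(**record), kind == "with_loss") for kind, fmt in TEMPLATE]


# A made-up record, serialized the way one line of the JSONL file would be.
line = json.dumps({
    "instruction": "Reply to the greeting.",
    "input": " Hello!",
    "author": "alice",
    "output": "Hi there.",
})
for text, with_loss in render_example(line):
    print(f"{'loss' if with_loss else 'skip'} | {text!r}")
```

Only the segments marked with_loss (the output and the closing tag) would contribute to the training loss; the prompt and the opening tag serve as context.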

Training

To run the training:

bash run_gemma_sft.sh

This script:

Creates necessary directories
Configures DeepSpeed ZeRO-3 with optimal settings for 2x A100 GPUs
Launches distributed training with 4-bit quantization
Uses gradient checkpointing and other memory optimizations
Logs metrics to WANDB
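
The DeepSpeed configuration that the script generates is not reproduced here. As a rough sketch of what a ZeRO-3 configuration with parameter and optimizer offloading can look like, the snippet below builds one as a Python dictionary and serializes it to JSON. The keys are standard DeepSpeed options, but the specific values (including bf16 and the pinned-memory flags) are assumptions rather than the script's actual settings.

```python
import json

# Hypothetical values; the config that run_gemma_sft.sh writes is not shown here.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "bf16": {"enabled": True},  # assumption: bf16 training on A100
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer state and parameters in host memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

config_json = json.dumps(ds_config, indent=2)
print(config_json)
```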

Key Configuration Settings

For 2x A100 80GB GPUs:

Batch size: 1 per GPU
Gradient accumulation steps: 16
Effective batch size: 32 (1 × 2 GPUs × 16 accumulation steps)
4-bit quantization (NF4)
Gradient checkpointing enabled
LoRA rank: 32, alpha: 64
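
The arithmetic behind these settings can be checked in a few lines. The LoRA scaling factor below assumes the standard alpha / rank convention; rank-stabilized variants scale differently.

```python
per_device_batch_size = 1
num_gpus = 2
gradient_accumulation_steps = 16

# Number of sequences that contribute to each optimizer step.
effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 32

lora_rank = 32
lora_alpha = 64
# Standard LoRA multiplies the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank
print(lora_scaling)  # 2.0
```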

Inference

To run inference with your fine-tuned model:

python inference.py --adapter_path /mnt/disk2/gemma_sft_output --load_in_4bit --interactive

Additional Tools

trl_example.py: Example script for TRL (RLHF) fine-tuning
merge_lora.py: Utility to merge LoRA weights with the base model

Memory Considerations

The configuration is optimized for 2x A100 80GB GPUs. Key memory optimizations:

4-bit quantization
DeepSpeed ZeRO-3 with CPU offloading
Gradient checkpointing
Small per-device batch size with gradient accumulation
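
For a sense of scale, the back-of-the-envelope sketch below compares the memory needed for the base model weights alone at 16-bit and 4-bit precision. It assumes a nominal 27 billion parameters and ignores quantization constants, LoRA adapter weights, activations, and optimizer state, so real usage will be higher. The helper weight_memory_gb is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 27e9  # nominal parameter count taken from the model name
bf16_gb = weight_memory_gb(num_params, 16)
nf4_gb = weight_memory_gb(num_params, 4)
print(f"16-bit weights: {bf16_gb:.1f} GB")
print(f"4-bit weights:  {nf4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights come to about 54 GB and the 4-bit weights to about 13.5 GB, which is why quantization leaves far more headroom on each 80 GB GPU.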

Troubleshooting

If you encounter out-of-memory errors:

Reduce per_device_train_batch_size to 1 (already set)
Increase gradient_accumulation_steps (e.g., 16 to 24)
Enable more aggressive CPU offloading in DeepSpeed config
Reduce sequence length if possible
