
Add --model-tag to scripts.chat_cli call in runcpu.sh/speedrun.sh example #542

@obriensystems


Excellent work on this repo. I am sure the entire community greatly appreciates the ability to experiment with frontier models from scratch.
I have been using the runcpu.sh script, primarily on the following local GPUs (while also working out distribution across both Win/WSL and OSX Thunderbolt 5 clusters), and everything is working very well.

Using up to 75% of RAM on ARM GPUs (Apple or NVIDIA) and 100% of VRAM on discrete NVIDIA GPUs:
DGX Spark 128g, m4Max 40c 48g, m3Ultra 60c 96g, m2Ultra 60c 64g, m4Pro 16c 24g, dual 4090 24g, RTX-A6000 48g, dual RTX-A4500 20g, RTX-A4000 16g, RTX-3500ada 12g

Target a specific model in chatsft_checkpoints as d$DEPTH

A minor issue occurs at inference when running scripts.chat_cli after multiple models have been created under the local cache (~/.cache/nanochat). Initially the model would only reply with the default BOS token when I chatted with it. It turns out I needed to target my latest model directly: by default the largest depth is picked, and in my case that model had SFT issues while I was running a smaller depth.

When running SFT, the default will be d16 even though we may be running a depth of 8:
2026-02-17 21:19:02,572 - nanochat.checkpoint_manager - INFO - No model tag provided, guessing model tag: d16

In that case, add the following to the call to scripts.chat_sft:

--model-tag=d8
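To check ahead of time which tag the default guess in the log above would pick, it can be roughly approximated in shell. This is a sketch only, assuming (as described in this issue) that the default is simply the largest `d<N>` directory; `largest_depth_tag` is a hypothetical helper, not part of nanochat, and the real selection logic in `nanochat.checkpoint_manager` may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nanochat): approximate the default
# "guessing model tag" behaviour by picking the largest d<N> directory.
largest_depth_tag() {
  local ckpt_root="$1" name
  for name in "$ckpt_root"/d*/; do
    name="${name%/}"   # drop the trailing slash added by the glob
    name="${name##*/}" # keep only the directory name, e.g. d16
    case "$name" in
      d | d*[!0-9]*) continue ;; # skip names that are not "d" followed by digits
    esac
    printf '%s\n' "${name#d}" # emit just the depth number
  done | sort -n | tail -n 1 | sed 's/^/d/'
}

# Example: with both d8 and d16 present, the guess is d16 even if DEPTH=8
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/d8" "$demo_root/d16"
largest_depth_tag "$demo_root" # prints d16
rm -rf "$demo_root"
```

Passing `--model-tag` explicitly sidesteps the guess entirely, whatever other depths are sitting in the cache.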

Before

Assistant: <|bos|>

After

Assistant: The capital of France is Paris. It is a city known for its historical landmarks....

I can put up a one-line PR from my fork with the line below. It can stay commented out and is for reference only, to help developers new to this excellent project.

Changes around line 78:
https://github.com/obriensystems/nanochat/blob/dev/runs/runcpu.sh#L79

export DEPTH=8
export BATCH_SIZE=16
export MODEL_TAG="d${DEPTH}"
# ..
# target output from SFT in ~/.cache/nanochat/chatsft_checkpoints/d$DEPTH = $MODEL_TAG
python -m scripts.chat_cli -p "What is the capital of France?" --model-tag "$MODEL_TAG"
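Before inference, a small pre-flight check could stop early when `d$DEPTH` has no SFT checkpoint and list the tags that do exist, so a missing or failed SFT run is caught before `scripts.chat_cli` is launched. This is a sketch under assumptions: `check_model_tag` is a hypothetical helper (not part of nanochat), and the checkpoint path is taken from the comment in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of nanochat): confirm that an SFT
# checkpoint directory exists for the requested tag before chatting with it.
check_model_tag() {
  local ckpt_root="$1" tag="$2"
  if [ -d "$ckpt_root/$tag" ]; then
    return 0
  fi
  echo "No SFT checkpoint for $tag under $ckpt_root. Available entries:" >&2
  ls -1 "$ckpt_root" >&2 || true # list whatever does exist, if the dir is there
  return 1
}

DEPTH="${DEPTH:-8}"
MODEL_TAG="d${DEPTH}"
CKPT_ROOT="$HOME/.cache/nanochat/chatsft_checkpoints" # path from the runcpu.sh comment
if check_model_tag "$CKPT_ROOT" "$MODEL_TAG"; then
  python -m scripts.chat_cli -p "What is the capital of France?" --model-tag "$MODEL_TAG"
fi
```

Deriving MODEL_TAG from DEPTH in one place keeps the tag passed to scripts.chat_sft and scripts.chat_cli consistent.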

Metadata


Assignees

No one assigned

Labels

scripts: Edits in the bash scripts

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
