
Ollama Plugin

JavaScript

The Ollama plugin provides interfaces to any of the local LLMs supported by Ollama.

Install the plugin:

npm install genkitx-ollama

This plugin requires that you first install and run the Ollama server. You can follow the instructions on the Download Ollama page.

You can use the Ollama CLI to download the model you are interested in. For example:

ollama pull gemma

To use this plugin, specify it when you initialize Genkit:

import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [
    ollama({
      models: [
        {
          name: 'gemma',
          type: 'generate', // type: 'chat' | 'generate' | undefined
        },
      ],
      serverAddress: 'http://127.0.0.1:11434', // default local address
    }),
  ],
});

The type field selects which Ollama API the plugin calls for the model: 'chat' for Ollama's chat endpoint or 'generate' for its completion endpoint.

If you would like to access remote deployments of Ollama that require custom headers (static, such as API keys, or dynamic, such as auth headers), you can specify those in the Ollama plugin config:

Static headers:

ollama({
  models: [{ name: 'gemma' }],
  requestHeaders: {
    'api-key': 'API Key goes here',
  },
  serverAddress: 'https://my-deployment',
}),

You can also dynamically set headers per request. Here’s an example of how to set an ID token using the Google Auth library:

import { GoogleAuth } from 'google-auth-library';
import { ollama } from 'genkitx-ollama';
import { genkit } from 'genkit';

const ollamaCommon = { models: [{ name: 'gemma:2b' }] };

const ollamaDev = {
  ...ollamaCommon,
  serverAddress: 'http://127.0.0.1:11434',
};

const ollamaProd = {
  ...ollamaCommon,
  serverAddress: 'https://my-deployment',
  requestHeaders: async (params) => {
    const headers = await fetchWithAuthHeader(params.serverAddress);
    return { Authorization: headers['Authorization'] };
  },
};

const ai = genkit({
  plugins: [ollama(isDevEnv() ? ollamaDev : ollamaProd)],
});

// Lazily instantiated GoogleAuth client
let auth: GoogleAuth;
function getAuthClient() {
  if (!auth) {
    auth = new GoogleAuth();
  }
  return auth;
}

// Fetch headers, reusing tokens when possible
async function fetchWithAuthHeader(url: string) {
  const client = await getIdTokenClient(url);
  const headers = await client.getRequestHeaders(url); // Auto-manages token refresh
  return headers;
}

async function getIdTokenClient(url: string) {
  const auth = getAuthClient();
  const client = await auth.getIdTokenClient(url);
  return client;
}

This plugin doesn’t statically export model references. Specify one of the models you configured using a string identifier:

const llmResponse = await ai.generate({
  model: 'ollama/gemma',
  prompt: 'Tell me a joke.',
});
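
Streaming works through Genkit's standard generateStream API. A minimal sketch, assuming the model you configured streams its output (the model name and prompt are just examples):

const { stream, response } = ai.generateStream({
  model: 'ollama/gemma',
  prompt: 'Tell me a long joke.',
});

// Print each chunk as it arrives, then await the complete response.
for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}
const final = await response;
console.log(final.text);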

The Ollama plugin supports embeddings, which can be used for similarity searches and other NLP tasks.

const ai = genkit({
  plugins: [
    ollama({
      serverAddress: 'http://localhost:11434',
      embedders: [{ name: 'nomic-embed-text', dimensions: 768 }],
    }),
  ],
});

async function getEmbeddings() {
  const embeddings = (
    await ai.embed({
      embedder: 'ollama/nomic-embed-text',
      content: 'Some text to embed!',
    })
  )[0].embedding;
  return embeddings;
}

getEmbeddings().then((e) => console.log(e));
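
For example, you can compare two embeddings with cosine similarity to measure how closely the texts are related. A minimal sketch; cosineSimilarity and compare are illustrative helpers, not part of the plugin:

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function compare(textA: string, textB: string): Promise<number> {
  // Embed both texts with the embedder configured above.
  const [a, b] = await Promise.all([
    ai.embed({ embedder: 'ollama/nomic-embed-text', content: textA }),
    ai.embed({ embedder: 'ollama/nomic-embed-text', content: textB }),
  ]);
  return cosineSimilarity(a[0].embedding, b[0].embedding);
}

compare('A cat sat on the mat.', 'A feline rested on the rug.').then(console.log);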

Go

The Ollama plugin provides interfaces to any of the local LLMs supported by Ollama.

This plugin requires that you first install and run the Ollama server. You can follow the instructions on the Download Ollama page.

Use the Ollama CLI to download the models you are interested in. For example:

ollama pull gemma3

For development, you can run Ollama on your development machine. Deployed apps usually run Ollama on a GPU-accelerated machine that is different from the one hosting the app backend running Genkit.

To use this plugin, pass an ollama.Ollama value to WithPlugins() when you initialize Genkit, specifying your Ollama server's address and, optionally, a response timeout in seconds (default: 30):

import (
    "context"
    "log"

    "github.com/firebase/genkit/go/ai"
    "github.com/firebase/genkit/go/genkit"
    "github.com/firebase/genkit/go/plugins/ollama"
)

func main() {
    ctx := context.Background()

    ollamaPlugin := &ollama.Ollama{
        ServerAddress: "http://127.0.0.1:11434",
        Timeout:       60, // Optional; request timeout in seconds
    }

    g := genkit.Init(ctx, genkit.WithPlugins(ollamaPlugin))
    _ = g // g, ai, and log are used in the examples that follow.
}
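
If your Ollama server runs on a separate machine, as in the deployment scenario above, point ServerAddress at that host instead. A minimal sketch with a hypothetical internal hostname:

ollamaPlugin := &ollama.Ollama{
    // Hypothetical GPU-accelerated host; substitute your own deployment.
    ServerAddress: "http://ollama.internal.example.com:11434",
    Timeout:       120, // Remote calls may warrant a longer timeout.
}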

To generate content, you first need to create a model definition based on the model you installed and want to use. For example, if you installed Gemma 3:

model := ollama.DefineModel(
    ollama.ModelDefinition{
        Name: "gemma3",
        Type: "chat", // "chat" or "generate"
    },
    &ai.ModelOptions{
        Supports: &ai.ModelSupports{
            Multiturn:  true,
            SystemRole: true,
            Tools:      false,
            Media:      false,
        },
    },
)

Then, you can use the model reference to send requests to your Ollama server:

resp, err := genkit.Generate(ctx,
    g,
    ai.WithModel(model),
    ai.WithPrompt("Tell me a joke."),
)
if err != nil {
    return err
}
log.Println(resp.Text())

Or you can refer to the model by its name:

resp, err := genkit.Generate(ctx,
    g,
    ai.WithModelName("ollama/gemma3:latest"),
    ai.WithPrompt("Tell me a joke."),
)
if err != nil {
    return err
}
log.Println(resp.Text())
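
Streaming follows Genkit's usual callback pattern. A sketch assuming the standard ai.WithStreaming option (also requires the fmt import):

resp, err := genkit.Generate(ctx,
    g,
    ai.WithModelName("ollama/gemma3:latest"),
    ai.WithPrompt("Tell me a long joke."),
    ai.WithStreaming(func(ctx context.Context, chunk *ai.ModelResponseChunk) error {
        fmt.Print(chunk.Text()) // Print each chunk as it arrives.
        return nil
    }),
)
if err != nil {
    return err
}
log.Println(resp.Text())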

See Generating content for more information.

Python

The Ollama plugin provides interfaces to any of the local LLMs supported by Ollama.

This plugin requires that you first install and run the Ollama server. You can follow the instructions on the Download Ollama page.

Use the Ollama CLI to download the models you are interested in. For example:

ollama pull llama3.2
ollama pull gemma2
ollama pull mistral

For development, you can run Ollama on your development machine. Deployed apps usually run Ollama on a GPU-accelerated machine that is different from the one hosting the app backend running Genkit.

Install the plugin:

uv add genkit-plugin-ollama

To use this plugin, import Ollama and specify it when you initialize Genkit:

from genkit import Genkit
from genkit.plugins.ollama import Ollama, ollama_name
from genkit.plugins.ollama.models import ModelDefinition

ai = Genkit(
    plugins=[
        Ollama(
            models=[
                ModelDefinition(name='llama3.2'),
                ModelDefinition(name='gemma2'),
            ],
            server_address='http://127.0.0.1:11434',  # default local address
        )
    ],
    model=ollama_name('llama3.2'),  # optional default model
)

If you would like to access remote deployments of Ollama that require custom headers (such as API keys), you can specify those in the Ollama plugin configuration:

ai = Genkit(
    plugins=[
        Ollama(
            models=[ModelDefinition(name='gemma2')],
            server_address='https://my-deployment',
            request_headers={
                'api-key': 'API Key goes here',
            },
        )
    ],
)

This plugin doesn’t statically export model references. Specify one of the models you configured using the ollama_name() helper or a string identifier:

from genkit import Genkit
from genkit.plugins.ollama import Ollama, ollama_name
from genkit.plugins.ollama.models import ModelDefinition

ai = Genkit(
    plugins=[
        Ollama(
            models=[ModelDefinition(name='llama3.2')],
        )
    ],
)

@ai.flow()
async def llama_flow(prompt: str) -> str:
    """Generate text using Llama.

    Args:
        prompt: The prompt to generate from.

    Returns:
        The generated text.
    """
    response = await ai.generate(
        model=ollama_name('llama3.2'),
        prompt=prompt,
    )
    return response.text

Or reference the model directly by string:

response = await ai.generate(
    model='ollama/llama3.2',
    prompt='Tell me a joke.',
)

The Ollama plugin supports embeddings, which can be used for similarity searches and other NLP tasks:

from genkit import Genkit
from genkit.plugins.ollama import Ollama
from genkit.plugins.ollama.embedders import EmbeddingDefinition

ai = Genkit(
    plugins=[
        Ollama(
            server_address='http://localhost:11434',
            embedders=[
                EmbeddingDefinition(
                    name='nomic-embed-text',
                    dimensions=768,
                )
            ],
        )
    ],
)

@ai.flow()
async def get_embeddings(text: str) -> list[float]:
    """Generate embeddings for text.

    Args:
        text: The text to embed.

    Returns:
        The embedding vector.
    """
    result = await ai.embed(
        embedder='ollama/nomic-embed-text',
        content=text,
    )
    # Depending on the plugin version, you may need to extract the raw
    # vector from the response, as in the JavaScript example above.
    return result
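
For example, a similarity search can rank documents against a query using cosine similarity of their embeddings. A minimal sketch, assuming get_embeddings above returns the raw vector; cosine_similarity and rank_documents are illustrative helpers, not part of the plugin:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

@ai.flow()
async def rank_documents(query: str, documents: list[str]) -> list[str]:
    """Order documents from most to least similar to the query."""
    query_vec = await get_embeddings(query)
    scored = []
    for doc in documents:
        doc_vec = await get_embeddings(doc)
        scored.append((cosine_similarity(query_vec, doc_vec), doc))
    scored.sort(reverse=True)
    return [doc for _, doc in scored]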

Ollama models support streaming responses for real-time output:

from genkit import ActionRunContext

@ai.flow()
async def streaming_story(topic: str, ctx: ActionRunContext | None = None) -> str:
    """Generate a story with streaming output.

    Args:
        topic: Story topic.
        ctx: Action context for streaming chunks.

    Returns:
        The complete generated story.
    """
    response = await ai.generate(
        model=ollama_name('llama3.2'),
        prompt=f'Write a short story about {topic}',
        on_chunk=ctx.send_chunk if ctx else None,
    )
    return response.text

Some Ollama models support tool calling (e.g., Mistral, Llama 3.1+):

from pydantic import BaseModel, Field

class WeatherInput(BaseModel):
    """Input for weather tool."""

    location: str = Field(description='City name')

@ai.tool(description='Get current weather for a location')
def get_weather(input: WeatherInput) -> str:
    """Get the current weather for a location."""
    # In a real implementation, call a weather API
    return f'The weather in {input.location} is 72°F and sunny.'

@ai.flow()
async def weather_flow(location: str) -> str:
    """Get weather information using Ollama with tool calling.

    Note: Requires a model that supports tools, such as mistral-nemo
    or llama3.1 and newer.

    Args:
        location: The location to get weather for.

    Returns:
        Weather information for the location.
    """
    response = await ai.generate(
        model=ollama_name('mistral-nemo'),
        prompt=f"What's the weather like in {location}?",
        tools=['get_weather'],
    )
    return response.text

Generate structured data using Pydantic models:

from pydantic import BaseModel, Field
from genkit import Output

class Recipe(BaseModel):
    """A cooking recipe."""

    name: str = Field(description='Recipe name')
    ingredients: list[str] = Field(description='List of ingredients')
    steps: list[str] = Field(description='Cooking steps')
    prep_time_minutes: int = Field(description='Preparation time in minutes')

@ai.flow()
async def create_recipe(dish: str) -> Recipe:
    """Generate a recipe with structured output.

    Args:
        dish: The dish to create a recipe for.

    Returns:
        A structured recipe.
    """
    response = await ai.generate(
        model=ollama_name('llama3.2'),
        prompt=f'Create a recipe for {dish}',
        output=Output(schema=Recipe),
    )
    return response.output

For production deployments or custom Ollama server locations:

ai = Genkit(
    plugins=[
        Ollama(
            models=[ModelDefinition(name='llama3.2')],
            server_address='http://ollama-server.internal:11434',
            request_headers={
                'X-Custom-Header': 'value',
            },
        )
    ],
)