Metabase | Business Intelligence, Dashboards, and Data Visualization

Meet Repro-Bot, our GitHub issue triage agent

Fri, 10 Apr 2026 00:00:00 +0000

Reproducing bug reports is one of the most time-consuming parts of maintaining an open source project. We built an AI agent called Repro-Bot to help us with this task. In this post, we’re sharing how we built it and showing you how to build your own. If you want to skip ahead and check out our code, take a look at the Repro-Bot repo!

Repro-Bot automates the boring parts

Think about how you, a human person, would reproduce a bug report:

Set up an environment identical (or at least similar) to what the reporter has.
Follow the steps provided in the issue.
If you can reproduce the bug:
- Write a test for the bug
- Fix it
If you can’t reproduce, think through why:
- Is there information missing?
- Are there any hidden dependencies?
- Did anything change recently?
- etc

This process is a mix of judgement calls (like “fix it,” or what constitutes a relevant change) and chores like setting up the environment, following the steps, and asking the same questions over and over.

Repro-Bot automates the boring parts and gets us started on fixing the issue.

Results: repro steps, findings, possible root cause

As Repro-Bot attempts to repro an issue, it generates a report with its findings, a pointer to where in the code the bug probably occurs, and so on. For example, here is the first part of its output for this issue about disappearing percentages on some pie charts.

This information helps us respond to the person who reported the issue faster and get more details from them while it’s still fresh in their mind. We’ve also cleared out a number of issues from our backlog that Repro-Bot confirmed we had already fixed.

Of course, Repro-Bot isn’t infallible. Sometimes it can’t repro an issue. Sometimes it thinks it has reproduced a bug when it hasn’t. But even in those cases, Repro-Bot’s reports are still valuable. They give us hints and chronicle dead-ends, both of which save devs time getting to the root cause.

How Repro-Bot works, and how to build your own

The details of Repro-Bot are quite specific to Metabase, but we’ll walk you through its inner workings so you can build a similar agent for your own codebase and development setup. You can also fork our repo and adapt it to your workflow.

Repro-Bot needs to be able to perform these tasks:

Parse and understand issues
Spin up a test environment
Work through reproduction steps
Write tests
Write a report
Clean up and self-review

Here’s how we approached each one.

Parsing issues tells the LLM what you, a developer, are looking for when you analyze the issue yourself. For example, for Metabase, we need information about the Metabase version, the application database, and the data warehouse. We also triage the issue as backend-focused or frontend-focused to guide which tools the agent should use for reproduction.
Spinning up a test environment is very specific to your codebase and tools, of course. At Metabase, we use Playwright for browser automation, filling forms, and taking screenshots. Repro-Bot spins up an environment in the same way that a developer would, and uses REPL access to the instance.
To work through the reproduction steps, we wrote a skill that gives code recipes for common actions we see in reported reproduction steps, like inspecting a table or creating a query. We also wrote down some of the “folklore” domain knowledge around some features (for example, that there are two different ways to make a pivot table, each with a different code path). Repro-Bot uses the reproduction steps that it previously extracted from the issue, invokes the tools based on the “triage” into backend/frontend, and uses the recipes for those tools to run through the repro steps. It then evaluates what was tested and whether its results match the reported behavior. If the agent can’t determine whether the issue was reproduced, it tries again (but no more than three times total).
If the agent reproduces the issue, it writes a failing test so that a developer has something to test against when they’re fixing the issue. We give some directions on the kind of tests we write for our code and on troubleshooting common issues, but since the agent already has full access to the codebase, it can learn a lot from just analyzing existing tests.
Write a report and post to Linear. This step gives the bot a detailed outline for the report (as you saw in the previous section), along with instructions for posting to Linear and for preventing the internal report from getting synced back to GitHub.
Finally, cleanup and self-review. After each run, Repro-Bot reviews the notes from this and previous runs to make concrete suggestions for improvements, for example to address tool errors, close any knowledge gaps it found, document new tools it might need, etc.
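
The retry rule in the reproduction step can be sketched in a few lines. This is a minimal Python sketch and not Repro-Bot’s actual code: `Verdict`, `run_repro`, and the `attempt_repro` callback are hypothetical names, and only the cap of three attempts comes from the description above.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    REPRODUCED = "reproduced"
    NOT_REPRODUCED = "not_reproduced"
    INCONCLUSIVE = "inconclusive"


MAX_ATTEMPTS = 3  # "no more than three times total"


def run_repro(attempt_repro: Callable[[int], Verdict]) -> tuple[Verdict, int]:
    """Call attempt_repro until it gives a clear verdict or attempts run out.

    attempt_repro is a hypothetical callback that runs the repro steps once
    and reports what it concluded. Returns the final verdict and the number
    of attempts used.
    """
    verdict = Verdict.INCONCLUSIVE
    for attempt in range(1, MAX_ATTEMPTS + 1):
        verdict = attempt_repro(attempt)
        if verdict is not Verdict.INCONCLUSIVE:
            return verdict, attempt
    return verdict, MAX_ATTEMPTS
```

Keeping the cap in one constant makes it easy to tune, and returning the attempt count gives the report something concrete to say about how hard the bug was to pin down.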

Integrating Repro-Bot into the workflow

We use GitHub to collect reported bugs, and Linear to manage development work. To run Repro-Bot, a human tags a bug on GitHub with .Run Repro-Bot, which triggers a GitHub action that runs the workflow described above.

Running the bot is not an automated task by design: a human in the loop is essential to prevent injection attacks. Most issues come from our public GitHub issues, so it would be trivial for somebody to poison context. To guard against this, we sandbox the agent and limit its permissions. We also require a human to review issues before running it to make sure there is nothing suspicious in the issue.
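
As a rough illustration of the label gate, here is a minimal Python sketch that decides whether a GitHub event should start a run. It assumes the standard `issues` webhook payload, where a `labeled` event carries the applied label under `label.name`. The function name is hypothetical, the label text is copied from this post, and our actual GitHub Action may be wired differently.

```python
TRIGGER_LABEL = ".Run Repro-Bot"  # label text as written in this post


def should_run(event: dict) -> bool:
    """Return True only when a human has just applied the trigger label."""
    if event.get("action") != "labeled":
        return False
    label = event.get("label") or {}
    return label.get("name") == TRIGGER_LABEL
```

Nothing in the issue body is read here. The decision depends only on an action taken by a maintainer, which is the point of keeping a human in the loop.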

We intentionally did not ask Repro-Bot to fix the issue. We had initially wanted to make a more end-to-end bot that could do it all, but that wider scope opened up a number of wrong paths the bot could go down. Keeping the agent’s purview limited keeps its output manageable, and we can always introduce more automation downstream.

What we learned

We think that Repro-Bot is an interesting approach to using LLMs and AI tooling for software development, because it’s not about code generation. Our Repro-Bot repo is very specific to our setup, but with the code as a starting point and the description above, you can build your own.

Repro-Bot has become part of our daily development work, and continues to save us time. We hope that it inspires others to build (and share!) similar tools for themselves.

Meet Data Studio: tools to curate your semantic layer in Metabase

Tue, 10 Mar 2026 00:00:00 +0000

Metabase has grown a lot over the past few years. We’ve added a bunch of tools to help people stay on top of their analytics as things scale.

Eventually, it became clear these tools needed their own home.

Today, we’re introducing Metabase Data Studio, a place where teams can shape their data and define shared metrics.

Analytics starts simple. Then it gets… less simple

One goal at Metabase has always been to help non-technical people answer questions with data. And at first, that’s easy. You connect Metabase to your database, build a few dashboards, and things feel straightforward. But over time, cracks in the data model inevitably start to show.

People aren’t sure which tables to use, dashboard loading times get annoying, and there are three different queries that say “ARR”. AIs don’t stand a chance sifting through this stuff.

Data Studio has all the tools you need to clean up the mess

Data Studio lets teams transform raw tables into analytics-ready datasets. You can define reusable metrics (like MRR) and segments (like Active Customers) that everyone (including AI!) can trust when building dashboards and questions.

Data Studio lives in Metabase: no extra tools, no duplicate work, no workflow overhauls, just publish and share instantly. You can start small and grow into it naturally as analytics becomes more shared and harder to change.

The tools in the toolbox

The first version of Data Studio ships with the following tools:

Library: A curated space for your organization’s most trusted analytics content—tables, metrics, and SQL snippets that your data team recommends.
Data structure: Add table metadata to make tables easier to work with.
Glossary: Define terms relevant to your business, both for people and agents trying to understand your data.
Dependency graph: A visual map of how your content connects, so you can understand the impact of changes before you make them.
Dependency diagnostics: See which items have broken dependencies or aren’t used.
Transforms: Wrangle your data in Metabase, write the query results back to your database, and reuse them in Metabase as sources for new queries.

And we have more to come, so stay tuned.

Open source at the core

We want data structure and curation to be accessible to everyone, which is why foundational features of Data Studio are available in our open source edition, with Pro and Enterprise features to grow into as you need them.

Do I even need to care about Data Studio?

People tend to need some kind of data transformations when they have multiple sources of data (like your application and payments data), or a bunch of normalized tables. If you’re under 50 tables in your schema, don’t stress, watercress. If you have multiple data sources or a lot of tables, chances are you’ve been paying a tax on clarity, correctness, and performance. Data Studio can help get you sorted.

Data Studio is just one part of a bumper release. Check out what else is new in v59.

How to get started with Data Studio

Data Studio ships with both OSS and EE editions (with some paid features).

Admins can find Data Studio from the top right grid icon. Some paid plans can grant non-admins access to Data Studio by adding people to the Data Analysts group.

Try Metabase Pro for free.

February 2026 vulnerability: What happened?

Mon, 02 Mar 2026 12:10:18 +0000

What happened?

Sho Odagiri, a security researcher, reported a vulnerability in Metabase’s notification API. The vulnerability allowed an authenticated user to craft a specially formatted notification template that could extract database connection details, including credentials, and send them via outbound email.

Who was affected?

We have no evidence that this vulnerability was exploited by any customer or malicious actor prior to the fix being released.

See the Fixed versions below, and find the latest point version for the Metabase version you’re running. If you’re running a point version below that version, you’re still vulnerable and should upgrade immediately.

Why did it happen?

Two independent changes introduced this vulnerability:

We updated the notification system to support user-supplied Handlebars templates for rendering email content.
We added metadata objects to query results, which could be traversed to access database connection details.

Together, these changes made it possible for an authenticated user to write a template that could extract sensitive database details via an outgoing email.

Part of what made this vulnerability difficult to catch is that the Handlebars library did not clearly document a method resolver that allows templates to invoke arbitrary Java methods.

What did we fix?

We addressed this vulnerability through two fixes:

Locked down the Handlebars template engine. We removed the method resolver from the Handlebars library configuration, which prevents templates from invoking arbitrary Java methods on objects in the rendering context. This eliminates the ability for user-supplied templates to traverse into internal objects.
Stripped metadata from query results used in notifications. We ensured that internal metadata objects—which previously could carry a reference to database connection details—are no longer present in the notification rendering context.
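
To make the second fix concrete, here is an analogy in Python. It is not Metabase’s code (the real fix lives in the Java Handlebars configuration and the notification pipeline): the class names are hypothetical, and Python’s `str.format`, which resolves dotted attribute access, stands in for a permissive template engine.

```python
from dataclasses import dataclass


@dataclass
class ConnectionDetails:
    host: str
    password: str


@dataclass
class QueryResult:
    rows: list
    metadata: ConnectionDetails  # internal object; should never reach a template


def unsafe_context(result: QueryResult) -> dict:
    # Passes the whole object, so a template can walk result.metadata.password.
    return {"result": result}


def safe_context(result: QueryResult) -> dict:
    # Copies only plain row data; the metadata object is not reachable at all.
    return {"rows": [list(row) for row in result.rows]}


def render(template: str, context: dict) -> str:
    # str.format follows attribute paths like {result.metadata.password}.
    return template.format(**context)


result = QueryResult(rows=[(1, "a")], metadata=ConnectionDetails("db.internal", "s3cret"))
hostile_template = "{result.metadata.password}"
```

With `unsafe_context`, rendering `hostile_template` returns the password. With `safe_context`, the same template fails with a `KeyError` because there is nothing to traverse into, while a legitimate template like `"{rows}"` still renders.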

Fixed versions

All Metabase Cloud instances have been upgraded and are no longer vulnerable.

If you are self-hosted and haven’t already upgraded, please upgrade to one of the following versions (or higher) for your respective version.

Version 55: v0.55.20 / v1.55.20
Version 56: v0.56.20 / v1.56.20
Version 57: v0.57.13 / v1.57.13
Version 58: v0.58.7 / v1.58.7

What are we doing to prevent this in the future?

Along with the fixes we’ve made, we’re working through additional improvements to reduce risk:

Improving logging around template rendering so that we can audit user-supplied templates and detect unusual behavior.
Adding a wrapper around database credential access to prevent credential access outside of designated connection establishment paths.

Conclusion

Patches are live across all affected versions, and we have no evidence this vulnerability was exploited before the fix landed. We’re tightening template evaluation, locking down credential access paths, and improving logging to catch unusual behavior early.

If you’re self-hosted and haven’t upgraded already, please upgrade as soon as possible.

Credits

Hat tip to Sho Odagiri from GMO Cybersecurity by Ierae, Inc for discovering and disclosing this vulnerability.

Questions or concerns?

Reach out at support@metabase.com.

Security update available for Metabase - Please upgrade now

Thu, 19 Feb 2026 12:10:18 +0000

Sho Odagiri, an independent security researcher from GMO Cybersecurity by Ierae, reported a severe issue with Metabase. We generally don’t blog about every bug, but this one is dangerous, so we’re reaching out to our community on all channels to make sure they know to pay attention to it.

While we have no evidence that the vulnerability was ever exploited in the wild, and exploiting this vulnerability isn’t simple, if you are self-hosting Metabase, you should IMMEDIATELY update your Metabase instances (if you have not already).

The vulnerability

The vulnerability allows an authenticated user (including embedding users) to retrieve sensitive information from a Metabase instance, including database access credentials. For more info, check out the security advisory.

Are you affected?

Metabase Cloud customers don’t need to upgrade

No action needed. We’ve already upgraded your Metabase, and you’re no longer vulnerable.

All self-hosted Metabases, including customers, should upgrade immediately

If you haven’t already, you should immediately upgrade to the latest point version of whichever Metabase version you’re running.

See the list of minimum safe releases below, and find the latest point version for the Metabase version you’re running. If you’re running a point version below that version, you’re still vulnerable and should upgrade immediately.

For example, if you are running 1.58.6, you should upgrade to 1.58.7 release or later. If you’re running a version of Metabase below version 55, you should upgrade to one of the versions listed below. You can find your current version by clicking on the “gear” icon in the upper right and selecting “About Metabase.”
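
The check described above can be written down as a small script. This is a minimal Python sketch, with a hypothetical function name, that uses only the minimum safe releases listed in this post. Major versions newer than 58 are not covered by the list, so the sketch refuses to guess about them.

```python
# Minimum safe point release per major version, from the list in this post.
MIN_SAFE_PATCH = {55: 20, 56: 20, 57: 13, 58: 7}


def needs_upgrade(version: str) -> bool:
    """Return True if a version like 'v1.58.6' is below its minimum safe release.

    The first number is the edition (0 for OSS, 1 for Enterprise) and does not
    affect the answer; the same point release is safe for both.
    """
    _edition, major, patch = (int(part) for part in version.lstrip("v").split(".")[:3])
    if major < min(MIN_SAFE_PATCH):
        return True  # versions below 55 should move to one of the listed releases
    if major not in MIN_SAFE_PATCH:
        raise ValueError(f"Version {major} is not covered by this list")
    return patch < MIN_SAFE_PATCH[major]


print(needs_upgrade("1.58.6"))   # True
print(needs_upgrade("v1.58.7"))  # False
```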

If you’re running a custom fork of Metabase, reach out to us for the patches

Email us at help@metabase.com so we can provide you the appropriate patches.

Minimum safe releases for each Metabase version

The downloads below include the minimum safe release for each Metabase version.

55

v0.55.20

Docker image: metabase/metabase:v0.55.20
Download the JAR here: https://downloads.metabase.com/v0.55.20/metabase.jar

v1.55.20

Docker image: metabase/metabase-enterprise:v1.55.20
Download the JAR here: https://downloads.metabase.com/enterprise/v1.55.20/metabase.jar

56

v0.56.20

Docker image: metabase/metabase:v0.56.20
Download the JAR here: https://downloads.metabase.com/v0.56.20/metabase.jar

v1.56.20

Docker image: metabase/metabase-enterprise:v1.56.20
Download the JAR here: https://downloads.metabase.com/enterprise/v1.56.20/metabase.jar

57

v0.57.13

Docker image: metabase/metabase:v0.57.13
Download the JAR here: https://downloads.metabase.com/v0.57.13/metabase.jar

v1.57.13

Docker image: metabase/metabase-enterprise:v1.57.13
Download the JAR here: https://downloads.metabase.com/enterprise/v1.57.13/metabase.jar

58

v0.58.7

Docker image: metabase/metabase:v0.58.7
Download the JAR here: https://downloads.metabase.com/v0.58.7/metabase.jar

v1.58.7

Docker image: metabase/metabase-enterprise:v1.58.7
Download the JAR here: https://downloads.metabase.com/enterprise/v1.58.7/metabase.jar

Credits

We thank Sho Odagiri from GMO Cybersecurity by Ierae, Inc for discovering and disclosing this vulnerability.

Lessons learned from building AI analytics agents: build for chaos

Tue, 03 Feb 2026 00:00:00 +0000

Last year, I came back from a conference, pulled the latest code, and fired up our brand new version of Metabot to show our CEO the progress we’d made. While I was away, the team had been shipping new features and improvements.

My excitement quickly turned into one of the most embarrassing moments of my professional career. Metabot had transformed into a confused intern: eager to help but unable to remember what tools it had or how to use them.

But here’s the thing: this disaster taught us a lot about building production AI agents. In this post, I’ll walk you through what actually broke, why it happened, and the patterns we discovered that actually work in production for us.

If you want the full story with all the details, you can watch the talk we gave at AI Engineer Paris 2025.

What we were building (and why it’s hard)

When we started building Metabot, the text-to-SQL space was already crowded: you describe what you want, give an LLM your database schema, and it generates SQL. Easy, right?

Yes and no.

The happy path works great - you’ve probably seen the demos with 5 well-documented tables and simple questions. But even if it works, there’s a problem: not everyone speaks SQL. A query can look fancy and return results, but how do you know if it’s answering the right question?

We wanted to go beyond SQL generation. Our goal was to leverage the Metabase query builder, a visual interface where users can click together queries and actually see what filters and aggregations are applied. This gives non-SQL users a way to validate and iterate on results themselves more easily.

But here’s where it gets hard:

SQL is baked into LLM training data. Our query builder language? Not so much.
Real customer data is messy: hundreds of tables, vague descriptions, legacy cruft.
Humans are notoriously bad at providing context: “How many customers did we lose?” (Which time period? What’s a “customer”? Logo churn or revenue churn?)

The real challenge wasn’t query generation. It was building an agent that could navigate this chaos by understanding what users are looking at, what they actually mean, and helping them find answers even when they don’t know how to ask the question.

That’s what Metabot set out to do. And that’s what spectacularly broke.

What broke: local optimization

The demo failure traced back to parallel development without integration testing. One engineer perfected the context awareness to make sure Metabot knew exactly what dashboard you were looking at. Another engineer optimized the querying tool, fine-tuning descriptions, parameters, and prompts until it worked beautifully in isolation.

Together, they created chaos.

The LLM doesn’t experience your architecture. It sees one context window: every instruction, every tool description, every piece of dynamic state, flattened into a single prompt. Our individually-optimized components were sending contradictory signals. Tool descriptions assumed different conventions. Instructions overlapped and conflicted. The model couldn’t figure out what we wanted because we were telling it multiple inconsistent things simultaneously.

The fix required thinking differently about what we were building. We weren’t building a querying tool with some context features. We were building a context engineering system. The LLM handles the generation; our job is to ensure it sees clean, unambiguous context at every decision point.

What worked: context engineering over prompt engineering

We stopped front-loading prompts and started engineering context throughout the agent’s lifecycle. Three patterns (optimized data representations, just-in-time instructions, and actionable error guidance) transformed how the LLM understood and used its tools.

LLM-optimized data representations

We stopped dumping raw API responses into the context. Instead, we built explicit serialization templates for every data object Metabot works with — tables, fields, dashboards, questions — optimized for LLM consumption:

<table
  id=""
  name=""
  database_id=""
>
  ### Description

  ### Fields
  | Field Name | Field ID | Type | Description |
  |------------|----------|------|-------------|
</table>

This structured format gives the LLM consistent, hierarchical context it can parse reliably. Table metadata, field types, and relationships are always in the same place, reducing hallucination and tool misuse. The format is also reusable across all of Metabot’s tools, so when one person optimizes it, everyone benefits.

Use this pattern when your agent works with complex domain objects that appear across multiple tools or conversation turns.
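
A serializer for this kind of template can be quite small. The following is a minimal Python sketch rather than Metabot’s implementation: `serialize_table` and the shape of the input dict are assumptions, and only the tag attributes and the column headers come from the template above.

```python
def serialize_table(table: dict) -> str:
    """Render one table as a <table> tag wrapping Markdown sections."""
    field_rows = [
        "  | {name} | {id} | {type} | {description} |".format(
            name=field["name"],
            id=field["id"],
            type=field["type"],
            description=field.get("description", ""),
        )
        for field in table["fields"]
    ]
    lines = [
        "<table",
        '  id="{}"'.format(table["id"]),
        '  name="{}"'.format(table["name"]),
        '  database_id="{}"'.format(table["database_id"]),
        ">",
        "  ### Description",
        "  " + table.get("description", ""),
        "",
        "  ### Fields",
        "  | Field Name | Field ID | Type | Description |",
        "  |------------|----------|------|-------------|",
        *field_rows,
        "</table>",
    ]
    return "\n".join(lines)


orders = {
    "id": 1,
    "name": "orders",
    "database_id": 2,
    "description": "Customer orders",
    "fields": [
        {"id": 10, "name": "id", "type": "Integer", "description": "Primary key"},
        {"id": 11, "name": "total", "type": "Float", "description": "Order total"},
    ],
}
print(serialize_table(orders))
```

Because every tool calls the same function, a change to the layout reaches all of them at once, which is what makes the format worth optimizing.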

Just-in-time instructions

Our original architecture front-loaded everything into the system prompt. The LLM ignored most of it.

So we tried something different: include instructions in tool results, right in the relevant moment:

{
  "data": "Chart created with ID 123",
  "instructions": """Chart created but not yet visible to the user.

  To show them:
  - Navigate: use navigate_to tool with chart_id 123
  - Reference: include [View chart](metabase://chart/123) in your response
  """
}

When a chart gets created, tell the LLM right then how to show it. The LLM pays attention to context that shows up exactly when it’s relevant, not buried in a system prompt from 20 messages ago.
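
In code, this means the tool itself builds the instructions as part of its return value. Here is a minimal Python sketch with a hypothetical function name. The wording of the instructions follows the example above, and `navigate_to` is the tool named there.

```python
import json


def chart_created_result(chart_id: int) -> dict:
    """Build a tool result that carries its own follow-up instructions."""
    return {
        "data": f"Chart created with ID {chart_id}",
        "instructions": (
            "Chart created but not yet visible to the user.\n\n"
            "To show them:\n"
            f"- Navigate: use navigate_to tool with chart_id {chart_id}\n"
            f"- Reference: include [View chart](metabase://chart/{chart_id}) in your response\n"
        ),
    }


# The serialized result is what would go back to the model as the tool message.
tool_message = json.dumps(chart_created_result(123))
```

The instructions are generated from the same `chart_id` as the data, so they can never drift out of sync with what the tool actually did.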

Explicit error guidance

This pattern is more commonly known, but worth emphasizing: don’t just return error messages; return recovery paths as well.

{
  "error": "Table 'orders_v2' not found",
  "guidance": """This table may have been renamed or deprecated.

  Try:
  1. Search for tables matching 'orders'
  2. Check if 'orders' or 'order_items' fits your query
  3. Ask the user which orders table they want to use"""
}

The LLM handles ambiguity much better when you tell it how to handle ambiguity.
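
One way to produce guidance like this is to compute the suggestions from the tables that actually exist. This is a minimal Python sketch under assumptions: the function name and the fuzzy matching via the standard library’s `difflib` are illustrative choices, not a description of how Metabot does it.

```python
import difflib


def table_not_found(name: str, known_tables: list[str]) -> dict:
    """Return an error plus concrete next steps for a missing table."""
    stem = name.split("_")[0]
    suggestions = difflib.get_close_matches(name, known_tables, n=3, cutoff=0.5)
    steps = [f"Search for tables matching '{stem}'"]
    if suggestions:
        steps.append("Check if one of these fits your query: " + ", ".join(suggestions))
    steps.append("Ask the user which table they want to use")
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return {
        "error": f"Table '{name}' not found",
        "guidance": "This table may have been renamed or deprecated.\n\nTry:\n" + numbered,
    }


result = table_not_found("orders_v2", ["orders", "order_items", "people"])
print(result["guidance"])
```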

The benchmark problem in AI analytics agents

After the demo disaster, we built benchmarks. Scores climbed into the 90s, but perceived quality dropped.

The issue is subtle but important: engineers write benchmark prompts like engineers. “Count of orders grouped by created_at week.” Clean, precise, all context provided.

Real people say: “Why is revenue down?”

That question is missing a time period, revenue definition, comparison baseline, and probably some context about what triggered the question in the first place. And even if you nail the realistic test cases, there’s the uncontrollable data mess underneath - legacy tables, missing descriptions, inconsistent naming. The gap between benchmark coverage and production chaos is where AI products fail.

We now treat benchmarks as integration tests, not pure quality measures. If a change drops the score, something broke. But a passing score doesn’t mean the agent works, just that it handles clean inputs correctly. The real evaluation is production feedback, analyzed through a lens of what people actually asked versus what they needed.

Build for chaos, not happy paths

Our initial Metabot hackathon prototype had 5 perfectly documented tables and worked beautifully. Production has hundreds of tables with varying quality, used by people who phrase questions in wildly creative ways, and with edge cases we never imagined.

That’s the core lesson: don’t build for the happy path. Every polished demo with clean data creates expectations you can’t meet. People wander off the happy path in seconds. Better to understand the chaos, build for it, and deliver consistently than to show something impressive that falls apart on contact with reality.

We learned this the hard way. But it forced us to focus on what actually matters: robust context engineering, handling messy data gracefully, and building for the chaos that production inevitably brings.

Try it yourself

Metabot is now out of beta in Metabase. You can try it with your own data and see how these patterns work in practice. Pro tip: Do your homework and set up your semantic types like foreign key relations, metrics and segments. This will help improve the experience.

Want the full technical deep dive? Watch the complete talk from AI Engineer Paris 2025 where we go deeper into the implementation details.

These patterns apply beyond analytics agents. Any time you’re building with LLMs, think about the full context window, deliver instructions when they’re actionable, and always (always!) build for the chaos.

We simplified embedding

Tue, 13 Jan 2026 00:01:00 +0000

TL;DR: If you’re embedding Metabase and upgrading to 58, you don’t have to do anything. Your existing embeds will continue to work exactly as before. These changes are about simplifying our embedding options so it’s easier for people to pick the option they need.

What changed

Starting in Metabase 58, there are two ways to embed Metabase for new embeds.

Modular Embedding - Embed Metabase components, as Guest or as user via SSO.
Full-app Embedding - The full Metabase, SSO only.

If you’re already embedding Metabase, your existing embeds still work. Static embedding will still work for existing embeds, but new embeds must use Modular Embedding.

How the old options map to the new

Before 58	Starting in 58
Static embedding	Modular Embedding - Guest
Embedded analytics JS	Modular Embedding - SSO
Embedded analytics SDK	Modular Embedding SDK - Guest or SSO (React only)
Interactive embedding	Full-app Embedding - SSO only

Why we changed our embedding options

They were confusing. Static embedding was also somewhat interactive, but we also had interactive embedding, which was a different thing. Each had a different setup flow. It worked, but figuring out the right path could have been easier, so that’s what we did. Plus, the new Modular Embedding provides an easier upgrade path from static embedding—you can start with Guest embeds and upgrade to SSO without major code changes.

Modular Embedding overview

We built an in-app wizard that walks you through setup and generates a code snippet. Copy, paste, done. The choice you have to make is whether you give people in your app a Metabase account. If you do, it unlocks a lot of stuff, and reduces your maintenance burden.

Guest — People don’t need Metabase accounts. You sign embed URLs with JWT (just like static embedding), and they see the dashboard or question you’ve embedded. You can add and lock filters, but that’s about it. Guest embeds are available in the OSS Edition (with a “Powered by Metabase” badge) and on Pro/Enterprise.
SSO — People authenticate through your identity provider and get their own Metabase account. And since your Metabase knows who’s viewing what, it can unlock everything: drill-through, the query builder, AI chat, collection browser, and more, all with the correct permissions applied. Self-service embedded analytics.

Modular Embedding has an SDK for React, so if your app uses React, you should go with the SDK.

The nice part: Guest and SSO embeds share the same foundation, so it’s a much smoother upgrade path from Guest to SSO.
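
For a sense of what the JWT signing for Guest embeds involves, here is a minimal Python sketch that builds an HS256 token using only the standard library. The payload shape (`resource`, `params`, `exp`) follows the static embedding convention and is an assumption here, as are the function name and the placeholder secret. Check the embedding docs for the exact claims your version expects, and prefer a maintained JWT library in production.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_embed_token(secret: str, resource: dict, params: dict,
                     ttl_seconds: int = 600, now: Optional[int] = None) -> str:
    """Return an HS256-signed JWT for one embedded resource."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"resource": resource, "params": params, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


# Placeholder secret and dashboard ID; sign on your server, never in the browser.
token = sign_embed_token("not-a-real-secret", {"dashboard": 1}, {})
```

A short expiry limits how long a leaked token stays useful, and locked filter values would go in `params` so that viewers can’t change them.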

Upgrading existing embeds to 58

You don’t have to do anything special when upgrading. Your existing embeds will continue to work exactly as before.

Static embeds — Keep working. No code changes required. The iframe URLs, JWT signing, and parameter handling all work the same way. By default, you’ll see the new Modular Embedding wizard, but there’s an escape hatch: you can still access the legacy static embedding UI, configure embeds with the new wizard, and use the traditional iframe + JWT approach. Migrating to Modular Embedding SSO unlocks deep theming options that weren’t available with static embedding (Pro/Enterprise).
Interactive embeds — Keep working. Now called “Full-app Embedding,” but nothing changes on your end.
SDK embeds — Keep working. No changes to the SDK API. It’s just called the Modular Embedding SDK.

The only difference is what you’ll see in the Metabase UI when setting up new embeds.

Become a data analyst in 2026: a practical roadmap

Mon, 12 Jan 2026 00:00:00 +0000

Becoming a data analyst is both a great goal and a big undertaking. It’s tempting to try to list everything you might need to learn, but that quickly becomes overwhelming and still incomplete.

Instead, this guide gives you a solid starting point. It focuses on building a strong foundation that will support all the topics you’ll want, or need, to learn later. The topics are broken down below into chunks of related topics, and organized by tools you might know or want to learn: Spreadsheets, Metabase, and SQL.

Stay tuned for part 2, which will include pointers to more advanced topics for you to tackle once you’ve mastered the basics laid out here.

Step 1: Data basics

First, let’s look at what data and data analytics even are. This is the foundation for all further steps. Even if you’re familiar with some of the material here, it will pay off to remind yourself of these fundamental concepts.

Data analytics fundamentals

Analytics is all about understanding what your data means and what it can tell you about your business (or whatever your data is about). Analytics serves a number of purposes, and doesn’t just react to the data. There are different types of analytics you should be aware of, even if we are mostly covering descriptive analytics in this guide.

Core concepts

What data analysts do, explained in this expert guide
The main types of analytics: descriptive, diagnostic, predictive, and prescriptive
The difference between data science and data analytics

Data types and structures

Data comes in many forms, but we’re only looking at a fairly specific kind of data here: tabular data, the way it’s used in spreadsheets and databases. That said, much of the business and technology world runs on this kind of data. In this section, we’re looking at how this data is organized and what kinds of information you can find in databases and spreadsheets.

How data tables work, including rows and columns
The difference between quantitative and qualitative data
Understanding discrete vs continuous variables
What structured and unstructured data mean
Why data granularity matters
Additional reading on numerical and categorical data

Step 2: Getting started with your tools

Before diving into analysis, you need to get comfortable with your tools. The three most common are Excel (for spreadsheets), Metabase (a BI tool), and SQL (for querying databases).

Start with whichever tool is most relevant to your work. You don’t need to learn all three at once.

Excel basics

You can skip this if you’re comfortable in Excel, but take this opportunity to reacquaint yourself with the basics of spreadsheets if you’re unsure.

An overview of essential formulas like SUM, AVERAGE, COUNT, IF, and VLOOKUP in this Excel tutorial

Metabase basics

As a BI tool, Metabase allows you to work directly with databases without necessarily knowing SQL, and also create data visualizations.

An introduction to core Metabase concepts in the Metabase basics overview

SQL basics

SQL is the native language of databases. Even if it might look difficult, it’s worth knowing for more complex data analysis (and because it’s the industry standard). It’s also much less scary than it initially looks, once you get to know it a little bit.

A beginner-friendly SQL introduction from Metabase
Hands-on practice with SQLBolt’s interactive lessons
A concise SQL cheat sheet for quick reference

Step 3: Exploring and preparing data

Now let’s get to the actual work with data! The first step is being able to organize it. Filtering and sorting data are the first operations, and they are part of all the next steps below.

The good news is that once you understand the concept in one tool, it transfers easily to others.

Filtering data

Filtering in Excel: Learn how to narrow down rows in spreadsheets using built-in filters, as shown in this Excel filtering guide.
Filtering in Metabase: Apply filters visually in the query builder without writing SQL, as explained in the Metabase filtering guide.
Filtering in SQL: Filter rows in database queries using WHERE clauses, text and date conditions, and logical operators like AND, OR, and NOT, using examples from filtering by text and filtering by date.
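To make the SQL version concrete, here is a minimal sketch that runs a WHERE clause from Python using the built-in sqlite3 module. The `orders` table, its columns, and its rows are invented for illustration; your own database will look different.

```python
import sqlite3

# Hypothetical orders table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, category TEXT, created_at TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Gadget", "2025-01-15", 40.0),
        (2, "Widget", "2025-02-03", 25.0),
        (3, "Gadget", "2025-03-20", 90.0),
        (4, "Gizmo", "2025-03-22", 15.0),
    ],
)

# One WHERE clause combining a text condition, a date condition,
# and the logical operators AND, OR, and NOT.
query = """
    SELECT id
    FROM orders
    WHERE (category = 'Gadget' OR category LIKE 'Giz%')
      AND created_at >= '2025-03-01'
      AND NOT total < 20
    ORDER BY id
"""
filtered_ids = [row[0] for row in conn.execute(query)]
print(filtered_ids)
```

Dates are stored here as ISO-formatted text (YYYY-MM-DD), which is why a plain string comparison works for the date filter in this sketch.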

Data quality

Data is rarely clean and perfect. It may come from multiple sources, use inconsistent formats, or contain missing values.

Learn how to prevent errors using data validation in Excel
Understand different strategies for handling missing values in this guide to data cleaning
Format and clean results directly in Metabase using question formatting options
A checklist covering common data cleaning tasks in this data cleaning checklist

Step 4: Summarizing and analyzing data

Once your data has been checked and cleaned, and you’re able to perform basic sorting and filtering, the more advanced operations can begin. Data can be vast, so it is necessary to reduce it in different ways to make sense of it. This is done using various kinds of aggregations: groupings that compute values. They can be simple sums, or various statistical values like means or medians.

Basic statistics

You’ve heard of means and medians, but what do they mean? And how are they computed?

An explanation of mean, median, and mode in this statistics overview
An introduction to statistics concepts commonly used in data analysis in Basic statistics for data analysis
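As a quick illustration of how the three differ, here is a small sketch using Python’s standard statistics module. The order totals are made up; note how a single large value pulls the mean up while the median stays put.

```python
from statistics import mean, median, mode

# Made-up order totals with one outlier (120).
order_totals = [10, 20, 20, 30, 120]

average = mean(order_totals)       # sum of values divided by their count
middle = median(order_totals)      # middle value once the list is sorted
most_common = mode(order_totals)   # value that appears most often

print(average, middle, most_common)
```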

It’s also useful to understand data aggregation: how individual data points are grouped into summary values. See this data aggregation overview.

Aggregation in Excel: Pivot tables

Pivot tables are a powerful way of computing aggregations in spreadsheets. While they might seem daunting at first, the basic idea is the same as in SQL: subdivide the data and compute values over each segment. Much of data analysis is built on top of this approach.

Learn how to create them with Microsoft’s pivot table guide
Read why they matter in this Reddit discussion on pivot tables
See a visual explanation on Wikipedia’s pivot table page

Aggregation in Metabase

Similarly to Excel, Metabase has the tools for computing sums and breaking down large datasets into sections.

Learn how to summarize data using the Metabase summarize feature

Aggregation in SQL: `GROUP BY`

In SQL, you compute aggregations and statistics using the GROUP BY keyword. These resources will help you understand how it works.

A clear explanation of GROUP BY in this SQL guide
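Here is a minimal sketch of what a GROUP BY query looks like in practice, run from Python with the built-in sqlite3 module. The `orders` table and its values are hypothetical; the point is that each category becomes one summary row, much like a pivot table.

```python
import sqlite3

# Hypothetical orders table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("Gadget", 40.0),
        ("Gadget", 90.0),
        ("Widget", 25.0),
        ("Widget", 35.0),
        ("Gizmo", 15.0),
    ],
)

# Subdivide the rows by category, then compute values over each segment.
query = """
    SELECT category, COUNT(*) AS orders, SUM(total) AS revenue, AVG(total) AS average
    FROM orders
    GROUP BY category
    ORDER BY category
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```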

Metrics and KPIs

Once you can aggregate data, the next step is deciding what to measure. Metrics and key performance indicators (KPIs) help turn raw numbers into signals you can track over time and use to guide decisions.

An overview of common business metrics in Essential SaaS metrics
Tips for designing better metrics in How to design better metrics

Step 5: Analysis across data tables

In real databases, data is usually organized into several tables. To answer questions, you have to connect these tables through joins. This section explains how joins work and how to perform them in Excel, in a BI tool like Metabase, and in SQL.

XLOOKUP and VLOOKUP in Excel

To combine data from different tables, Excel has the XLOOKUP and VLOOKUP functions.

See how spreadsheet lookups relate to database joins in From XLOOKUP to joins

Joining tables in Metabase

In databases, this operation is called a join. Metabase can create joins in its query editor.

Learn how joins work in the Metabase joins guide

Database joins in SQL

SQL of course allows you to create joins using the JOIN keyword.

A practical introduction to joins in SQL joins explained
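As a small sketch of the idea, the example below joins two hypothetical tables, `customers` and `orders`, using Python’s built-in sqlite3 module. It uses a LEFT JOIN so that customers without any orders still show up in the result.

```python
import sqlite3

# Two hypothetical tables, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 40.0), (2, 1, 10.0), (3, 2, 25.0)],
)

# Match each order to its customer through the shared customer id.
# LEFT JOIN keeps customers that have no matching orders.
query = """
    SELECT c.name, COUNT(o.id) AS orders, COALESCE(SUM(o.total), 0) AS revenue
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
"""
per_customer = conn.execute(query).fetchall()
for row in per_customer:
    print(row)
```

This is the same operation a lookup formula performs in a spreadsheet: find the matching row in another table by a shared key.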

Step 6: Data visualization and dashboards

Finally, once your analysis is done, it’s time to show your results to the world using charts and visualizations. Here we cover the basics of creating visualizations from data and how to turn them into an interesting and compelling story.

Visualization fundamentals

Learn how to visualize trends with time series charts
Improve chart clarity with better line and bar charts
Explore geographic data using maps and geospatial visualizations
Choose the right chart with this visualization guide

Designing clear dashboards

Understand what dashboards are in this Coursera overview
Avoid common mistakes highlighted in Top dashboard fails

Data storytelling

Learn what data storytelling means in this introduction
Improve clarity by reducing clutter, as explained in What clutter can we eliminate?
Build better charts with the idea of a clear graph skeleton

Conclusion

Data analysis is a fascinating field to dive into, but it’s easy to get lost in the many things you can do and all the possible ways to do them. If you’re new to the field, this guide gives you a foundation to base your own explorations on.

We very consciously kept this guide quite simple and basic. It’s better to have a shorter list of items to really work through, than a massive list that you have no hope of ever completing.

In a few months, we will follow up with more advanced tutorials and topics, so stay tuned!

10 B2B SaaS product metrics that should be on your dashboard

Mon, 10 Nov 2025 00:05:05 +0000

For SaaS teams, product metrics are the backbone of decision-making. This guide walks through the core metrics every product and data team should track (conversion, activation, retention, churn, and NDR), showing how to define each one in SQL, calculate it correctly, and visualize results in Metabase.

Why it matters: Shows how well your website turns visitors into potential customers.

How to build: Divide the number of people who signed up by the number of unique visitors, and group by date (e.g. week or month).

Bonus points: If your signup flow has several steps, calculate the conversion rate into each step and display the results as a set of line charts.
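The calculation above can be sketched in a few lines of Python. The visit and signup logs here are invented, and the helper names `week_label` and `weekly_conversion` are just illustrative; in practice you would run the equivalent query against your own analytics tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event logs as (visitor_id, day) pairs.
visits = [
    ("v1", date(2025, 1, 6)),
    ("v1", date(2025, 1, 7)),   # repeat visit, counted once per week
    ("v2", date(2025, 1, 8)),
    ("v3", date(2025, 1, 9)),
    ("v4", date(2025, 1, 10)),
    ("v5", date(2025, 1, 14)),
    ("v6", date(2025, 1, 15)),
]
signups = [
    ("v2", date(2025, 1, 8)),
    ("v5", date(2025, 1, 14)),
]


def week_label(day):
    """Return an ISO week label such as '2025-W02'."""
    year, week, _ = day.isocalendar()
    return f"{year}-W{week:02d}"


def weekly_conversion(visits, signups):
    """Signups divided by unique visitors, grouped by week."""
    visitors_by_week = defaultdict(set)
    signups_by_week = defaultdict(set)
    for visitor_id, day in visits:
        visitors_by_week[week_label(day)].add(visitor_id)
    for visitor_id, day in signups:
        signups_by_week[week_label(day)].add(visitor_id)
    return {
        week: len(signups_by_week[week]) / len(visitors)
        for week, visitors in sorted(visitors_by_week.items())
    }


conversion = weekly_conversion(visits, signups)
print(conversion)
```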

Signups segmentation

Why it matters: Shows which marketing channels bring you the most customers and which are growing fastest.

How to build: Group signups by marketing_channel or any other attribution parameters you have.

Why it matters: Not all signups are equal, and you should monitor what kind of users you’re acquiring. Sudden changes in traffic quality can affect many metrics further down the funnel.

How to build: The marketing and product teams usually define signup quality together, based on historical data and easy-to-define attributes of a signup.

For B2B, those could be the share of signups with a business domain, or the share of signups from a specific country or list of countries. Calculate the percentage of ICP (ideal customer profile) signups over the total number of signups, and group by week or month.

Activation rate

Why it matters: Activation rate is the most important metric every product manager should watch.

Activation rate is the share of new signups that reached the point where they understand your product’s value and are taking meaningful actions.

There are three key things about a well-defined activation metric. It should:

Be easily measurable and represent value
Correlate well with conversion
Show results within a 2-3 day window after signup (great if you can do it in a 1-day window).

How to define and validate activation rate metrics

List hypotheses

You can very likely name one to five actions that users should perform in your product to signal that they understand what it’s about. For example, in Metabase, users have likely understood how to use the product if they were able to build a nice chart from their data. Make a list of activation metric hypotheses in the format:

Performed X events in Y days after signup, Y<3

Examples:

New user should get 7 friends in 10 days (Facebook)

Sent 1 document within 2 days after trial start (PandaDoc)

User executes 100 queries OR invites > 1 user within 3 days (Metabase)

Find correlations

For the hypotheses you listed, build the shares: share of users who performed event x / total signups. Plot these shares on a chart and add your historical conversion rate (share of users who paid / total signups).

You should get a picture like this:

Run a correlation analysis between the curves of the potential activation metrics and conversion on a weekly trend. Pick the metric that correlates best with conversion. You can check whether your metrics correlate using the basic CORR function in SQL.
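If you’d rather check the correlation outside the database, here is a minimal sketch of the Pearson coefficient in Python, which is what CORR computes in databases such as PostgreSQL. The weekly shares and the hypothesis names (`chart_built`, `invited_teammate`) are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up weekly series: conversion rate and the share of signups
# that satisfied each activation hypothesis.
weekly_conversion = [0.08, 0.10, 0.09, 0.12, 0.11]
hypotheses = {
    "chart_built": [0.30, 0.36, 0.33, 0.42, 0.39],
    "invited_teammate": [0.50, 0.48, 0.52, 0.49, 0.51],
}

correlations = {
    name: pearson(shares, weekly_conversion) for name, shares in hypotheses.items()
}
best_hypothesis = max(correlations, key=correlations.get)
print(correlations, best_hypothesis)
```

With only a handful of weekly data points, treat the coefficient as a hint for which hypothesis to investigate further, not as proof.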

Not seeing correlations?

If correlation analysis did not work, try running an Odds Ratio analysis or WOE/IV analysis.

Or add an “OR did Z events” condition to the hypotheses that you have and repeat the correlation exercise.

Hopefully these steps help. To dissect the activation rate further, it’s often suggested to split it into a “setup moment” and an “aha moment”. In the case of Metabase, for example, our “setup moment” is connecting a database, which is super important but isn’t an activation action per se, while our “aha moment” is the creation of a chart.

Conversion rate from trial to paid / conversion rate from free to paid customer

Why it matters: It tells you how big the share of traffic that actually pays you is. It’s also a metric that is easy to benchmark (see Benchmarks below).

How to build: Divide the number of converted customers by the number of total signups and group by the date of signup (week/month).

To make this metric more stable, add a conversion window that is typical for your customers. For example, if you offer a time-limited 14-day trial without a credit card, people can convert any time after the 14 days, sometimes after 60 or 90 days. Pick a window that makes it easy for you to make decisions, e.g. 15 days.
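Here is a minimal sketch of a windowed conversion rate in Python. The users and dates are invented, and the function name and the `window_days` parameter are just illustrative; the 15-day default mirrors the example window mentioned above.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical signup dates and first payment dates per user.
signup_dates = {
    "u1": date(2025, 1, 5),
    "u2": date(2025, 1, 10),
    "u3": date(2025, 1, 20),
    "u4": date(2025, 1, 25),
    "u5": date(2025, 2, 2),
    "u6": date(2025, 2, 15),
    "u7": date(2025, 2, 20),
    "u8": date(2025, 2, 21),
}
first_payment_dates = {
    "u1": date(2025, 1, 15),   # 10 days after signup: inside the window
    "u2": date(2025, 3, 20),   # 69 days after signup: outside the window
    "u4": date(2025, 2, 8),    # 14 days after signup: inside the window
    "u5": date(2025, 2, 10),   # 8 days after signup: inside the window
}


def conversion_by_signup_month(signup_dates, first_payment_dates, window_days=15):
    """Share of each month's signups that paid within the conversion window."""
    window = timedelta(days=window_days)
    signups = defaultdict(int)
    converted = defaultdict(int)
    for user, signed_up in signup_dates.items():
        month = signed_up.strftime("%Y-%m")
        signups[month] += 1
        paid_on = first_payment_dates.get(user)
        if paid_on is not None and paid_on - signed_up <= window:
            converted[month] += 1
    return {month: converted[month] / signups[month] for month in sorted(signups)}


conversion_rates = conversion_by_signup_month(signup_dates, first_payment_dates)
print(conversion_rates)
```

Grouping by signup date (not payment date) together with a fixed window means a month’s number stops changing once the window has passed, which is what makes the metric stable.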

Note: Conversion rate is a lagging metric and is not very suitable for being a goal for the product team. If you’d like to have goals on something, try activation rate or self-service revenue, not conversion.

New business revenue / new business MRR

Why it matters: Your product’s most important metric is revenue and you need to know what’s going on with it.

How to build: Sum the revenue/MRR of all new customers who converted in the given month.

Pro tip: Group by pricing plan to see which products are bringing you more new revenue.

Why it matters: The bigger the product becomes, the more features it has. Oftentimes new features are added without considering how they actually perform. This product strategy is called “Fire and forget”. To avoid it, monitor the adoption rate of your features.

How to build: Divide the number of customers who used the feature in a given month by the number of paid customers in that month. Do this for all features as a set of line charts over time to see the trends (and to see which features are doing well and which aren’t).

Pro tip: Segment the features by pricing plan to see what is actually driving customers to your higher plans.
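For a single month, the adoption calculation can be sketched like this in Python. The customer names, feature names, and the `adoption_rates` helper are all hypothetical.

```python
# Hypothetical data for one month: who is paying, and who used which feature.
paid_customers = {"acme", "globex", "initech", "umbrella"}
feature_users = {
    "dashboards": {"acme", "globex", "initech"},
    "alerts": {"acme"},
    "embedding": {"globex", "hooli"},  # hooli is not a paid customer this month
}


def adoption_rates(feature_users, paid_customers):
    """Share of paid customers who used each feature in the month."""
    return {
        feature: len(users & paid_customers) / len(paid_customers)
        for feature, users in feature_users.items()
    }


rates = adoption_rates(feature_users, paid_customers)
print(rates)
```

Repeating this per month gives you one line per feature for the trend chart.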

Retention rate (general and by specific feature/use case)

Why it matters: Retention is king, and knowing where the trends are heading is important. For anything new you ship, you have to be sure people stick with it: retention cohorts to the rescue. If you see that people drop the feature after the first week and don’t come back, it’s time to iterate and improve. Otherwise your product might turn into a Frankenstein.

How to build: Divide your customers into monthly cohorts and calculate what percentage of each cohort uses the feature in month 1, month 2, and so on.
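Here is a minimal sketch of a cohort retention table in Python. The customers and activity months are invented, and in this sketch offset 0 stands for the cohort’s own signup month, so offset 1 is the first month after signup.

```python
from collections import defaultdict


def month_index(month):
    """Turn 'YYYY-MM' into a running month number so months can be subtracted."""
    year, mon = month.split("-")
    return int(year) * 12 + int(mon)


# Hypothetical data: customer -> (cohort month, months the feature was used).
activity = {
    "c1": ("2025-01", {"2025-01", "2025-02", "2025-03"}),
    "c2": ("2025-01", {"2025-01", "2025-02"}),
    "c3": ("2025-01", {"2025-01"}),
    "c4": ("2025-01", {"2025-01", "2025-03"}),
    "c5": ("2025-02", {"2025-02", "2025-03"}),
    "c6": ("2025-02", {"2025-02"}),
}


def retention_by_cohort(activity):
    """For each cohort, the share of customers active N months after signup."""
    cohort_sizes = defaultdict(int)
    active_counts = defaultdict(lambda: defaultdict(int))
    for cohort, active_months in activity.values():
        cohort_sizes[cohort] += 1
        for month in active_months:
            offset = month_index(month) - month_index(cohort)
            active_counts[cohort][offset] += 1
    return {
        cohort: {
            offset: active_counts[cohort][offset] / size
            for offset in sorted(active_counts[cohort])
        }
        for cohort, size in sorted(cohort_sizes.items())
    }


retention = retention_by_cohort(activity)
print(retention)
```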

Net Dollar Retention rate / Expansion MRR

Why it matters: The only thing that can save your business in the long term is your ability to upsell and expand your existing customers. There’s a limit to how many new customers your marketing team can bring in cheaply. The more you grow, the more expensive acquisition becomes.

Thus invest early in upgrade and upsell paths in your product — and monitor the Expansion MRR that will eventually drive your Net Dollar Retention rate up as well.

How to build:

Expansion MRR: Sum the revenue from Expansion MRR events for the given month.
NDR in $: Expansion MRR + Reactivation MRR - Contraction MRR - Churn MRR
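The dollar formula above can be sketched in Python as follows. All MRR figures are made up and passed in as positive amounts. The percentage version is an assumption on our part: it uses a common definition (starting MRR plus the net change, divided by starting MRR), which the formula above does not spell out.

```python
def net_dollar_retention(starting_mrr, expansion, reactivation, contraction, churn):
    """Return the net MRR change in dollars and NDR as a percentage.

    All inputs are positive amounts for one month.
    """
    # NDR in $, following the formula above.
    net_change = expansion + reactivation - contraction - churn
    # NDR in %, an assumed common definition not taken from the text.
    ndr_percent = 100 * (starting_mrr + net_change) / starting_mrr
    return net_change, ndr_percent


# Made-up monthly figures.
net_change, ndr_percent = net_dollar_retention(
    starting_mrr=10_000,
    expansion=1_500,
    reactivation=200,
    contraction=300,
    churn=400,
)
print(net_change, ndr_percent)
```

Anything above 100% means revenue from existing customers grew during the month even without new signups.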

Churn rate (logo, MRR)

Why it matters: Churn is the opposite of retention rate. It tells you how much money you’re losing each month from customers who used to pay you but stopped paying this month. Churn in logo and in MRR % is a well-benchmarked metric.

How to build: For all customers who paid in the previous month, but didn’t pay this month, generate Churn MRR events with a negative value. Sum these events to get the Churn MRR.

Logo churn in % is the number of customers who paid you in the previous month but stopped paying you this month divided by the total number of paying customers in this month.
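Here is a minimal sketch of both churn numbers in Python, following the definitions above. The customers and MRR amounts are invented. Note that the logo churn denominator follows the text (paying customers this month); some teams divide by the previous month’s customer count instead.

```python
# Hypothetical MRR per paying customer in two consecutive months.
previous_month = {"acme": 100, "globex": 200, "initech": 50, "umbrella": 150}
current_month = {"acme": 100, "globex": 250, "hooli": 80}


def churn_metrics(previous_month, current_month):
    """Return churned customers, Churn MRR (negative), and logo churn share."""
    churned = sorted(set(previous_month) - set(current_month))
    # One Churn MRR event per churned customer, with a negative value.
    churn_events = {customer: -previous_month[customer] for customer in churned}
    churn_mrr = sum(churn_events.values())
    # Denominator as defined in the text: paying customers this month.
    logo_churn = len(churned) / len(current_month)
    return churned, churn_mrr, logo_churn


churned, churn_mrr, logo_churn = churn_metrics(previous_month, current_month)
print(churned, churn_mrr, logo_churn)
```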

Essential benchmarks for top-of-funnel SaaS metrics

Some of these metrics, mostly conversion rates, are easy to benchmark, especially if you’re building a product that has a similar business model to the canonical digital products.

Here are some examples of top-of-funnel benchmarks for B2B SaaS products:

Metric	Benchmark	Comment
Website visit (unique) → signup (finished)	8%	8% is for a mix of paid and organic traffic. Depending on your user acquisition strategy, it could be higher or lower for paid traffic. A very nice early indicator of how well your marketing campaigns perform — if they bring in traffic that converts to signups better than organic, you’re on the right track. This number might be lower if you’re offering a trial with credit card collection.
Signup start → signup finished	25%	Depends on the number of required fields/steps and again, on whether you collect a credit card or not.
Signup finished → Activated	35%	This is the canonical B2B SaaS activation rate, with activation defined as mentioned above.
Signup finished → paid (7 or 14d trial)	10% (trial without credit card, 30-day window); 40% (trial with credit card); 2.5–5% (freemium)	Trials with credit cards collected convert better, but you should be mindful of retention and refund rates. If you have a sales team engaging with leads from the self-service funnel, this number can be higher. Freemium products convert at best around 5%, which is considered very high. Your product must be very sticky and have a large user base to sustain revenue at this rate.
Churn MRR %	4%	For B2B SaaS, churn below this level is considered healthy. If you’re doing better, great!
Net Dollar Retention %	100%+	If you’re upselling well, you might reach 100%+ NDR, meaning your business would grow even if user acquisition stopped. Investors love this metric because it signals strong expansion and retention.

Source: OpenView Product Benchmarks

How to turn your product metrics into actionable insights

First, get a bird’s-eye view of your metrics: make a dashboard that will serve as a compass for your decisions. Metabase can help!

Look at the metrics and find the laggards: which metrics are far below the benchmarks?

Pick one or two of them and figure out why they could be so low.

Low activation and user engagement on trial will naturally lead to low conversions. Is your product too complicated to use? Are you acquiring the right users?

Perform user interviews and listen to sales calls to find the gaps in the first-time user experience, and focus on fixing those. Oh, and don’t forget to fix bugs: buggy software doesn’t convert people into paying, loyal customers either.

An activation problem isn’t fixed by adding tooltips, walkthrough guides, or extra documentation. The only thing that works well is thoughtful design of the navigation and first-time user experience, which should help users overcome setup hurdles, if those are necessary.

Churn is a derivative metric: it won’t improve unless you fix the activation rate for new customers and adoption for existing ones.

Final thoughts: making product metrics accessible and actionable

This post showed you how to choose the right product metrics and visualize them. Metabase is a powerful tool not only for measuring and displaying these metrics but also for giving the entire team access to them, so everyone can explore the data and discover valuable insights on their own, no data expertise or SQL needed.


Metabase Community Data Stack Report 2025

Wed, 03 Sep 2025 00:05:05 +0000

We asked 330+ teams across 50+ countries how they build and use their data stacks, from tool choices to AI adoption. This is what we learned.

Building a community resource for data stack decisions in 2025

For this report, we asked teams how they build their data stacks: what tools they choose, what challenges they face, and what their plans for the future are. Our goal is to build a community-sourced, open source resource that can help people make informed decisions about their data tools and shape modern data practices together.

In 2025, we heard from 330+ teams of all shapes and sizes - from two-person startups to orgs with hundreds of employees - from 15+ different industries and 50+ countries. Teams shared their tool choices, adoption timelines, happiness levels, and how AI is changing the way they work. We compiled all that wisdom into a report (built with Metabase, of course) that we’re making available to the community.

If you want to jump straight to the full report, go forth.

Key takeaways

Most data teams are small, even at large companies

Most companies in our survey started building out their stacks when they reached 20-50 people - but then again, most of the companies that we surveyed tend to be around that size as well, so take this with a grain of salt. We found that the sizes of data teams, however, don’t vary much - most data teams are around 1-3 people, even in companies with hundreds of employees.

PostgreSQL dominates both transactional and analytics workloads

Postgres is the most popular transactional database and the most popular analytics storage. It’s the database that people choose the most regardless of their main concerns, and the database that most people who are thinking of leaving their current tool are considering as a replacement. It’s also the highest-rated transactional database among our respondents, and in the top 3 highest-rated tools for analytics storage.

50% of teams don’t use a data warehouse or a data lake to store their analytics data

Nearly everyone we asked is separating their analytics data from their transactional data, but, maybe surprisingly, about half of respondents aren’t using a specialized tool (like a data warehouse or a data lake) to store their analytics data. No judgement from us: we’ve long believed that you don’t need a data warehouse (until you do).

Larger companies with larger data teams are more likely to use data warehouses and data lakes, probably because larger companies have more intense data needs.

ETL and transformation tools as data maturity indicators

About 60% of people are using an ingestion/ETL tool, and about 60% are using a modeling/transformation tool, with most of those people using both. Reverse ETL is also up there: if your company has a need for reverse ETL (which not every company does), you’re likely to be using ingestion and modeling tools as well. Companies also tend to adopt these tools around the same time. So if you’re reaching the point in your data journey where you’re considering an ingestion tool, you might want to evaluate whether it’s time to add a modeling tool too.

What surprised us

AI trust does not track with AI adoption

People across nearly all industries, roles, and company sizes have adopted AI querying and code generation. That’s not newsworthy in 2025, of course, but what surprised us was how low the trust in the results of AI queries actually was, considering that near-universal adoption. People in more technical roles tend to trust AI results less.

Engineers are the hardest to please

Across the entire stack, software and data engineers consistently gave lower ratings to their tools. The only exception is modeling/transformation tools: the end users of the data that comes out of those tools (like product managers) rate them much lower than other tools, while the people handling them day to day (data engineers) rate them much higher.

Individual tools matter less than how they play together

People generally rated the individual tools in their stack higher than they rated the stack as a whole. We think that’s because having the best tools in the world for each specific task doesn’t matter much if you can’t ensure that data flows through the stack smoothly and transparently. The tools whose ratings are closest to the rating of the whole stack are the ingestion/ETL tools, whose entire purpose is to facilitate the movement of data across the data stack.

The methodology and analytical process behind the report

It might not surprise you that we, people at Metabase, thought that Metabase was the best tool to analyze and present the results of the survey.

Our survey was conducted through a Typeform form, which gave us the results as a CSV file. Then we uploaded that CSV to Metabase Cloud Storage for analysis - no need to even set up a database.

The data needed some additional formatting and cleaning - like relabeling answers for better presentation, accounting for different spellings, or combining answers for different types of analytics storage into a single column - so we used a Metabase model to create a cleaned-up and transformed dataset based on the original CSV.

Metabase has a built-in graphical query builder which we used to build questions like “what is the average satisfaction by role” based on our augmented CSV model and build visualizations without writing any code.

This was sufficient for all the explorations we were interested in, except for one (can you guess which one?). That question required a more complicated query, so we used SQL on our CSV: when you upload data to Metabase Cloud Storage, you’re actually putting it into a ClickHouse database, so you can use Metabase to query the data from your CSV with SQL. For that one stray question, we used SQL instead of the graphical query builder because we needed a UNION.
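To give a feel for the kind of query where a UNION helps, here is a hypothetical sketch; it is not the actual query from the report. It stacks two answer columns into one list of tool mentions before counting them. The `responses` table and its columns are invented, and SQLite (via Python’s built-in sqlite3 module) stands in for ClickHouse.

```python
import sqlite3

# Hypothetical survey responses with two separate answer columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (id INTEGER, primary_storage TEXT, secondary_storage TEXT)"
)
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [
        (1, "PostgreSQL", "BigQuery"),
        (2, "PostgreSQL", None),
        (3, "Snowflake", "PostgreSQL"),
    ],
)

# UNION ALL stacks both columns into a single 'tool' column,
# so every mention is counted no matter which column it came from.
query = """
    SELECT tool, COUNT(*) AS mentions
    FROM (
        SELECT primary_storage AS tool FROM responses
        UNION ALL
        SELECT secondary_storage AS tool FROM responses
        WHERE secondary_storage IS NOT NULL
    ) AS all_mentions
    GROUP BY tool
    ORDER BY mentions DESC, tool
"""
tool_mentions = conn.execute(query).fetchall()
print(tool_mentions)
```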

Here at Metabase, we have a lot of strong opinions about best practices for building dashboards, which we applied judiciously to build a dashboard that communicates the insights we found interesting while making sure to avoid misrepresenting the data.

We wanted the dashboard to stand out visually, so we defined a custom color palette and uploaded a custom font to our Metabase instance and used it on the dashboard.

Once the dashboard was ready, we created a public embed so that people could access the dashboard without having a Metabase account and just iframe’d the embed into our website.

Explore the full report

You can check out the full report, or, if you want to run your own analysis of the survey data, we’ve set up a repository for you that spins up Metabase with the anonymized survey data pre-loaded, so you can explore it yourself.

Let’s get the conversation rolling. Post your insights and tag us! We love seeing your takeaways in action.

The story behind our AI Dataset Generator

Tue, 15 Jul 2025 00:10:18 +0000

At Metabase, I often need fake data to demo new features. I found myself digging through Kaggle, but not feeling very inspired, and wasting a lot of time searching. So I built a little tool to help me generate datasets and decided to open source it.

It ended up hitting the front page of Hacker News, got 600+ stars on GitHub, received contributions from a YC-backed startup, and was picked up by the TLDR newsletter.

We’ve brought the AI data generator to the browser so the community can generate datasets instantly. Open access, instant results, and free to explore.

Why not Kaggle or ChatGPT

As mentioned above, I was feeling very uninspired by Kaggle datasets and kept turning to ChatGPT to generate fake data. I’d ask for something, get results back, visualize it, and spot issues. Bar charts all the same height, growth trends going the wrong way, not enough variation, etc. I found myself repeating that cycle and thought… maybe there’s a better way.

What I actually did

Since I’d already been writing prompts and had some experience, I figured, why not turn that process into a simple tool? So I converted my prompt inputs into a few dropdowns:

Business type
Row count
Single or multi-table schema
Date range
Growth pattern
Variation and granularity

You hit “Preview Data” and get back a sample schema and 10 rows of data. If it looks good, you can export a full dataset as CSV, SQL, or launch Metabase to explore it.

How It Works

Step 1: How schema generation works

When you hit “Preview Data,” the app sends a prompt to your selected LLM provider (OpenAI, Anthropic, or Google) via LiteLLM. It’s tailored to the business type and returns a JSON spec defining the tables, fields, relationships, and logic. Think of it as a blueprint for a believable dataset.

Originally, I was just generating the schema with ChatGPT. But after a few folks on Hacker News mentioned it’d be cool to switch models, we got an awesome PR that added LiteLLM support, so now you can swap between providers easily. Thanks for the contribution @manueltarouca!

Step 2: Rows are generated locally by the DataFactory

I originally had the LLM generate all the rows, but it was painfully slow, even for 100 rows. I tried splitting the job into batches, but that introduced new issues. For example, a user ID might be 001, 002, 003 in the first batch and something like u099, u100 in the second.

So I took a step back and had a deep discussion with Cursor. I needed something fast, more realistic, and cheaper to run. After some back and forth, I decided to build the DataFactory. It generates data locally using Faker.js and applies the schema + simulation rules from the LLM. It also enforces logic like:

Realistic SaaS churn and pricing plans
E-commerce subtotals, tax, and shipping that actually add up
Healthcare claims where payouts never exceed procedure costs
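To illustrate the two-phase idea, here is a rough sketch in Python; the actual tool uses Faker.js, and its spec format isn’t shown in this post. The `spec` dictionary, its field names, and the `generate_orders` function are all hypothetical stand-ins. The point is that rows are generated locally from a small spec, with sequential IDs and totals that are computed so they always add up.

```python
import random

# Hypothetical stand-in for the JSON spec an LLM might return.
spec = {
    "table": "orders",
    "row_count": 5,
    "tax_rate": 0.08,
    "shipping_options": [0.0, 4.99, 9.99],
    "price_range": (5.0, 200.0),
}


def generate_orders(spec, seed=42):
    """Generate order rows locally, enforcing the business rules in code."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for i in range(1, spec["row_count"] + 1):
        subtotal = round(rng.uniform(*spec["price_range"]), 2)
        tax = round(subtotal * spec["tax_rate"], 2)
        shipping = rng.choice(spec["shipping_options"])
        rows.append(
            {
                # IDs come from one counter, so the format never drifts.
                "order_id": f"o{i:04d}",
                "subtotal": subtotal,
                "tax": tax,
                "shipping": shipping,
                # The total is derived from its parts, so it always adds up.
                "total": round(subtotal + tax + shipping, 2),
            }
        )
    return rows


orders = generate_orders(spec)
for order in orders:
    print(order)
```

Because the rules live in code and not in a prompt, they hold for every row regardless of how many rows you generate.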

Step 3: Performance and cost

By splitting it into two phases, the tool stays fast and surprisingly cheap. Schema generation is the only part that hits the LLM, and I wanted to make sure it wouldn’t lead me to bankruptcy. So I added token tracking and ran the numbers using a super advanced formula:

total_tokens × cost_per_token = ???

Turns out… not that bad. Most previews come in around $0.03–$0.05 with GPT-4o. After that, it’s all free. No extra API calls, just pure, 100%, Grade A data.
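The formula can be sketched in Python like this. The token counts and per-million prices below are placeholders, not quoted rates; check your provider’s current pricing. The sketch splits input and output tokens because providers commonly price them differently.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price_per_million, output_price_per_million):
    """Estimate the cost in dollars of one LLM call from its token counts."""
    input_cost = prompt_tokens * input_price_per_million / 1_000_000
    output_cost = completion_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Placeholder numbers for one schema-generation call.
preview_cost = estimate_cost(
    prompt_tokens=4_000,
    completion_tokens=3_000,
    input_price_per_million=2.50,    # assumed price, for illustration only
    output_price_per_million=10.00,  # assumed price, for illustration only
)
print(f"${preview_cost:.2f}")
```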

Try it yourself + contribute

It’s still early, so it’s not bulletproof. But if you need quick, realistic datasets, give it a try. Everything runs locally with Docker, and all you need is an API key from your favorite LLM provider to get started.

If you want to contribute, there’s plenty of room to jump in:

Add new business types or tweak existing ones
Improve schema logic or simulation rules
Add your awesome feature here

The groundwork is already there. If you’ve got ideas, I’d love your help taking it further. Star it, fork it, or open a PR on GitHub.
