#9106: Add PDiff similarity score to Version Tracking by clearbluejar · Pull Request #9107 · NationalSecurityAgency/ghidra

clearbluejar · 2026-04-07T22:05:28Z

Summary

Add a PDiff (basic-block mnemonic hash) similarity score to the Version Tracking match
table. The score is computed at match creation time, persisted in the database, and
exposed via a new Similarity column and filter in the UI.

Related: #9106, #5859

Problem

Version Tracking correlators each produce their own similarity scores, but these reflect
each correlator's matching algorithm rather than actual structural similarity. For example,
the Symbol Name correlator assigns a perfect 1.0 when names match, even if the functions
differ significantly. There is no correlator-independent metric for how structurally
similar two matched functions are at the basic-block level, making it difficult to
prioritize matches when patch diffing.

Solution

New PDiff similarity score — computed once at match creation time and stored in the DB.

Score formula

95% basic-block mnemonic hash similarity — for each basic block, mnemonic hashes are
collected, sorted (to tolerate compiler instruction reordering), and combined into a
per-block hash. A sorted-merge counts matching blocks between source and destination.
5% stack frame size similarity — min(a,b)/max(a,b) ratio; a subtle tiebreaker that
distinguishes otherwise-identical functions with different local variable allocation.
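
The formula above can be sketched roughly as follows. This is a minimal illustration and not the code in BasicBlockMnemonicFunctionBulker: the class and method names are hypothetical, mnemonics are passed as plain strings instead of being read from Ghidra instructions, and two details the description does not specify are assumptions here (the matching-block count is normalized by the larger block count, and empty inputs score 1.0).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the 95/5 PDiff score; not the PR's implementation. */
final class PDiffScoreSketch {

    /** Hash one basic block; sorting the mnemonics first makes the hash order-insensitive. */
    static long blockHash(List<String> mnemonics) {
        List<String> sorted = new ArrayList<>(mnemonics);
        Collections.sort(sorted);
        long h = 17;
        for (String m : sorted) {
            h = 31 * h + m.hashCode();
        }
        return h;
    }

    /** Sorted-merge: walk both sorted hash arrays in lockstep and count equal pairs. */
    static int countMatchingBlocks(long[] src, long[] dst) {
        long[] a = src.clone();
        long[] b = dst.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0, j = 0, matches = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                matches++;
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return matches;
    }

    /** Assumption: normalize by the larger block count; two empty functions score 1.0. */
    static double blockSimilarity(long[] src, long[] dst) {
        int max = Math.max(src.length, dst.length);
        return max == 0 ? 1.0 : (double) countMatchingBlocks(src, dst) / max;
    }

    /** min(a,b)/max(a,b) for non-negative frame sizes; assumption: 0 vs 0 scores 1.0. */
    static double stackSimilarity(int a, int b) {
        int max = Math.max(a, b);
        return max == 0 ? 1.0 : (double) Math.min(a, b) / max;
    }

    /** Weighted combination: 95% block similarity, 5% stack frame similarity. */
    static double score(long[] srcBlocks, long[] dstBlocks, int srcStack, int dstStack) {
        return 0.95 * blockSimilarity(srcBlocks, dstBlocks)
             + 0.05 * stackSimilarity(srcStack, dstStack);
    }
}
```

Under these assumptions, two functions with identical block hashes but frame sizes of 16 and 32 bytes would score 0.95 + 0.05 × 0.5 = 0.975, which is how the stack term separates otherwise-identical candidates.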

New UI components

Similarity column — displays the stored PDiff score in the match table
Similarity filter — filters matches by PDiff score range (DATA and null-score matches
pass through unfiltered)
Best PDiff Match filter — deduplicates matches across correlators, keeping the best
score per source/destination function pair
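
The deduplication performed by the Best PDiff Match filter can be pictured with the sketch below. It is an illustration only: the `Match` record, the `keepBestPerPair` method, and the use of plain `long` addresses are hypothetical stand-ins for the VT match types, and ranking null scores lowest and keeping the first match on ties are assumptions not stated in this PR. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of cross-correlator deduplication; not BestPDiffMatchFilter itself. */
final class BestPDiffSketch {

    /** Stand-in for one match table row. pdiffScore may be null (for example, DATA matches). */
    record Match(String correlator, long srcAddr, long dstAddr, Double pdiffScore) {}

    /** Keep one match per (source, destination) pair: the one with the highest score. */
    static List<Match> keepBestPerPair(List<Match> matches) {
        // LinkedHashMap preserves first-seen order of the pairs.
        Map<String, Match> best = new LinkedHashMap<>();
        for (Match m : matches) {
            String key = m.srcAddr() + "->" + m.dstAddr();
            // On a tie the earlier match wins because the comparison is strict.
            best.merge(key, m, (old, cur) -> scoreOf(cur) > scoreOf(old) ? cur : old);
        }
        return new ArrayList<>(best.values());
    }

    /** Assumption: a null score ranks below every real score. */
    private static double scoreOf(Match m) {
        return m.pdiffScore() == null ? -1.0 : m.pdiffScore();
    }
}
```

With this shape, a pair reported by both the Symbol Name correlator and another correlator collapses to a single row carrying whichever stored score is higher.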

Backward compatibility

Auto-upgrade: v0 sessions are automatically migrated to v1 schema on open
Backfill: After programs are loaded, any function match with a null PDiff score is
automatically computed and persisted — upgraded sessions get full scores on first open
Filter safety: Null scores (DATA matches, pre-migration) pass through the Similarity
filter instead of being rejected
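
The backfill and filter-safety rules above might look like this in isolation. This is a sketch under stated assumptions, not the code in VTSessionDB or SimilarityFilter: the class and method names are hypothetical, match rows are reduced to a map from an ID string to a nullable score, and the map is assumed to hold function matches only (DATA matches keep their null score).

```java
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of null-score handling; not the PR's implementation. */
final class NullScoreSketch {

    /** Filter safety: a null score passes; any other score must lie in [lower, upper]. */
    static boolean passes(Double score, double lower, double upper) {
        return score == null || (score >= lower && score <= upper);
    }

    /**
     * Backfill: compute a score only where none is stored.
     * The map must permit null values (for example, a HashMap).
     * Returns the number of entries that were filled in.
     */
    static int backfill(Map<String, Double> scoresByMatchId,
                        ToDoubleFunction<String> computeScore) {
        int filled = 0;
        for (Map.Entry<String, Double> e : scoresByMatchId.entrySet()) {
            if (e.getValue() == null) {
                e.setValue(computeScore.applyAsDouble(e.getKey()));
                filled++;
            }
        }
        return filled;
    }
}
```

Because already-scored entries are skipped, running the backfill a second time on the same session would compute nothing.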

Changes (15 files, 846 insertions, 13 deletions)

Core DB schema

VTMatchTableDBAdapter.java — Add PDIFF_SIMILARITY_SCORE_COL, bump schema 0→1
VTMatchTableDBAdapterV0.java — Accept v0 and v1, auto-upgrade v0→v1, write new column
VTMatchDB.java — Read/write stored PDiff score
VTSessionDB.java — Wrap adapter init in transaction (for upgrade), backfill on program open

Domain model

VTMatch.java — Add getPdiffSimilarityScore() interface method
VTMatchInfo.java — Add pdiffSimilarityScore field with getter/setter

Score computation

VTMatchSetDB.java — Compute PDiff score in addMatch() for FUNCTION matches
BasicBlockMnemonicFunctionBulker.java — Per-basic-block mnemonic hashing + combined
similarity
FunctionBulker.java — Interface for function hash strategies

UI components

AbstractVTMatchTableModel.java — New Similarity column reading stored value
SimilarityFilter.java — New filter by PDiff score range
BestPDiffMatchFilter.java — New deduplication filter across correlators
VTMatchTableModel.java — Register Similarity column
VTMatchTableProvider.java — Register Similarity and BestPDiff filters

Support

MatchMapper.java — Delegate getPdiffSimilarityScore() for implied matches

Test plan

All 6 existing VT database tests pass (./gradlew :VersionTracking:test)
Build compiles clean (./gradlew jar)
New VT sessions compute and display Similarity scores immediately
Old v0 sessions auto-upgrade and backfill scores on open
Similarity filter works responsively
DATA matches and null-score matches pass through filter correctly
BestPDiff filter correctly deduplicates across correlators

Compute the PDiff similarity score (95% basic-block mnemonic hash similarity + 5% stack frame size similarity) once at match creation time and persist it in the match table DB. This eliminates expensive on-the-fly recomputation every time the Similarity column renders or the Similarity filter runs.

Schema changes:
- Add PDIFF_SIMILARITY_SCORE_COL to match table (schema v0 -> v1)
- Auto-upgrade v0 tables on session open (recreate with new column)
- Backfill scores for migrated matches once programs are loaded

Key files:
- VTMatchSetDB.addMatch(): computes score for all FUNCTION matches
- VTMatchTableDBAdapterV0: v0->v1 schema migration
- VTSessionDB.backfillPdiffScores(): one-time backfill on open
- BulkBBSimilarityTableColumn: reads stored score instead of computing
- SimilarityFilter: reads stored score, passes null scores through
- BasicBlockMnemonicFunctionBulker: adjusted weights to 95/5
- BestPDiffMatchFilter: deduplicates matches across correlators

Correlators reuse a single VTMatchInfo across all addMatch() calls. The null check on getPdiffSimilarityScore() meant the score was only computed for the first match — all subsequent matches inherited that stale value. Remove the null guard so the score is always recomputed for FUNCTION matches.
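
The stale-score bug described in this commit can be reproduced in miniature. The sketch below is not the VTMatchSetDB code: `Info` is a hypothetical stand-in for the reused VTMatchInfo, and the two methods only mirror the guarded and unguarded shapes of the score assignment.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical reproduction of the reused-carrier bug; not the PR's code. */
final class StaleScoreSketch {

    /** Stand-in for a VTMatchInfo that a correlator fills in and reuses per match. */
    static final class Info {
        String function;
        Double pdiffScore;
    }

    /** Buggy shape: once any score is present, later matches never recompute it. */
    static double addMatchGuarded(Info info, ToDoubleFunction<String> compute) {
        if (info.pdiffScore == null) {
            info.pdiffScore = compute.applyAsDouble(info.function);
        }
        return info.pdiffScore;
    }

    /** Fixed shape: always recompute for the match currently described by info. */
    static double addMatchAlways(Info info, ToDoubleFunction<String> compute) {
        info.pdiffScore = compute.applyAsDouble(info.function);
        return info.pdiffScore;
    }
}
```

If the same `Info` is first filled in for one function and then mutated to describe another, the guarded version returns the first function's score both times, while the unguarded version returns a fresh score for each call.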

clearbluejar added 2 commits April 7, 2026 15:45

ryanmkurtz linked an issue Apr 8, 2026 that may be closed by this pull request

Add correlator-independent similarity score to Version Tracking match table #9106

Open

ryanmkurtz assigned ghidra007 Apr 8, 2026

ryanmkurtz added the Feature: Version Tracking and Status: Triage (Information is being gathered) labels Apr 8, 2026

clearbluejar wants to merge 2 commits into NationalSecurityAgency:master from clearbluejar:pdiff-similarity-score


3 participants
