#9106: Add PDiff similarity score to Version Tracking #9107
Open
clearbluejar wants to merge 2 commits into NationalSecurityAgency:master
Compute the PDiff similarity score (95% basic-block mnemonic hash similarity + 5% stack frame size similarity) once at match creation time and persist it in the match table DB. This eliminates expensive on-the-fly recomputation every time the Similarity column renders or the Similarity filter runs.

Schema changes:
- Add PDIFF_SIMILARITY_SCORE_COL to match table (schema v0 -> v1)
- Auto-upgrade v0 tables on session open (recreate with new column)
- Backfill scores for migrated matches once programs are loaded

Key files:
- VTMatchSetDB.addMatch(): computes score for all FUNCTION matches
- VTMatchTableDBAdapterV0: v0->v1 schema migration
- VTSessionDB.backfillPdiffScores(): one-time backfill on open
- BulkBBSimilarityTableColumn: reads stored score instead of computing
- SimilarityFilter: reads stored score, passes null scores through
- BasicBlockMnemonicFunctionBulker: adjusted weights to 95/5
- BestPDiffMatchFilter: deduplicates matches across correlators
Correlators reuse a single VTMatchInfo across all addMatch() calls. The null check on getPdiffSimilarityScore() meant the score was only computed for the first match — all subsequent matches inherited that stale value. Remove the null guard so the score is always recomputed for FUNCTION matches.
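The stale-score problem described in this commit can be reproduced in isolation. The sketch below is illustrative only: `MatchInfo`, `computeScore`, and both `addMatch*` methods are hypothetical stand-ins rather than the actual Ghidra classes, and the placeholder score is not the PDiff formula. It shows why a null guard on a reused holder freezes the first computed value.

```java
import java.util.ArrayList;
import java.util.List;

class StaleScoreSketch {
    /** Hypothetical stand-in for a match-info holder that a correlator reuses. */
    static class MatchInfo {
        int functionSize;   // stand-in for whatever the score depends on
        Double pdiffScore;  // null until first computed
    }

    /** Placeholder score, NOT the PDiff formula. */
    static double computeScore(MatchInfo info) {
        return info.functionSize / 100.0;
    }

    /** Guarded variant: skips recomputation once the holder carries a score. */
    static double addMatchGuarded(MatchInfo info) {
        if (info.pdiffScore == null) {
            info.pdiffScore = computeScore(info);
        }
        return info.pdiffScore;
    }

    /** Fixed variant: always recomputes, so a reused holder cannot leak a stale value. */
    static double addMatchAlways(MatchInfo info) {
        info.pdiffScore = computeScore(info);
        return info.pdiffScore;
    }

    /** Feeds several matches through ONE shared holder and records what would be stored. */
    static List<Double> run(boolean guarded, int... sizes) {
        MatchInfo shared = new MatchInfo();
        List<Double> stored = new ArrayList<>();
        for (int size : sizes) {
            shared.functionSize = size;
            stored.add(guarded ? addMatchGuarded(shared) : addMatchAlways(shared));
        }
        return stored;
    }
}
```

With the guard in place, every match after the first stores the first match's score; without it, each match gets its own value.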
Summary
Add a PDiff (basic-block mnemonic hash) similarity score to the Version Tracking match
table. The score is computed at match creation time, persisted in the database, and
exposed via a new Similarity column and filter in the UI.
Related: #9106, #5859
Problem
Version Tracking correlators each produce their own similarity scores, but these reflect
each correlator's matching algorithm rather than actual structural similarity. For example,
the Symbol Name correlator assigns a perfect 1.0 when names match, even if the functions
differ significantly. There is no correlator-independent metric for how structurally
similar two matched functions are at the basic-block level, making it difficult to
prioritize matches when patch diffing.
Solution
New PDiff similarity score — computed once at match creation time and stored in the DB.
Score formula
- Basic-block mnemonic hash similarity (95%): the mnemonics of each basic block are collected, sorted (to tolerate compiler instruction reordering), and combined into a per-block hash. A sorted merge counts matching blocks between source and destination.
- Stack frame size similarity (5%): min(a,b)/max(a,b) ratio; a subtle tiebreaker that distinguishes otherwise-identical functions with different local variable allocation.
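To make the formula concrete, here is a minimal standalone sketch. It is not the code in BasicBlockMnemonicFunctionBulker: the class and method names are hypothetical, blocks are passed in as plain lists of mnemonic strings instead of being read from a Ghidra program, `String.hashCode()` stands in for the real per-block hash, and normalizing the matched-block count by the larger block count is an assumption the description does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PdiffScoreSketch {
    static final double BLOCK_WEIGHT = 0.95; // weights as stated in the PR
    static final double FRAME_WEIGHT = 0.05;

    /** Hash one basic block; sorting first makes instruction order irrelevant. */
    static long blockHash(List<String> mnemonics) {
        List<String> sorted = new ArrayList<>(mnemonics);
        Collections.sort(sorted);
        return String.join(",", sorted).hashCode(); // stand-in hash (assumption)
    }

    /** Per-block hashes of a function, sorted so they can be merged. */
    static long[] sortedBlockHashes(List<List<String>> blocks) {
        long[] hashes = blocks.stream().mapToLong(PdiffScoreSketch::blockHash).toArray();
        Arrays.sort(hashes);
        return hashes;
    }

    /** Sorted merge: counts hashes present in both arrays (multiset intersection). */
    static int countMatches(long[] a, long[] b) {
        int i = 0, j = 0, matches = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                matches++;
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return matches;
    }

    /** min/max ratio; two zero values are treated as identical. */
    static double ratio(long a, long b) {
        long max = Math.max(a, b);
        return max == 0 ? 1.0 : (double) Math.min(a, b) / max;
    }

    /** Weighted combination of block similarity and stack frame size similarity. */
    static double similarity(List<List<String>> srcBlocks, int srcFrameSize,
                             List<List<String>> dstBlocks, int dstFrameSize) {
        long[] src = sortedBlockHashes(srcBlocks);
        long[] dst = sortedBlockHashes(dstBlocks);
        int maxBlocks = Math.max(src.length, dst.length);
        // Normalizing by the larger block count is an assumption of this sketch.
        double blockSim = maxBlocks == 0 ? 1.0 : (double) countMatches(src, dst) / maxBlocks;
        return BLOCK_WEIGHT * blockSim + FRAME_WEIGHT * ratio(srcFrameSize, dstFrameSize);
    }
}
```

Under these assumptions, two functions whose blocks differ only in instruction order and whose frame sizes are equal come out at 1.0, and doubling one frame size lowers that by 0.025, which is the tiebreaker behaviour the second bullet describes.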
New UI components
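- Similarity column and Similarity filter (by PDiff score range), both reading the stored score (computed for FUNCTION matches)
- Best PDiff match filter: deduplicates matches across correlators, keeping the best score per source/destination function pair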
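One plausible reading of the cross-correlator deduplication (BestPDiffMatchFilter) is "keep the highest-scoring match for each source/destination pair". The sketch below illustrates that reading only. The `Match` class, its fields, and the rule that a missing score loses to any real score are assumptions for illustration, not the filter's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BestMatchSketch {
    /** Hypothetical flattened view of one match row. */
    static class Match {
        final String correlator;
        final long srcAddr;
        final long dstAddr;
        final Double score; // may be null when no score is stored

        Match(String correlator, long srcAddr, long dstAddr, Double score) {
            this.correlator = correlator;
            this.srcAddr = srcAddr;
            this.dstAddr = dstAddr;
            this.score = score;
        }
    }

    /** Treats a missing score as lower than any real score (assumption). */
    static double scoreOf(Match m) {
        return m.score == null ? -1.0 : m.score;
    }

    /** Keeps one match per (source, destination) pair: the one with the highest score. */
    static List<Match> bestPerPair(List<Match> matches) {
        Map<String, Match> best = new LinkedHashMap<>();
        for (Match m : matches) {
            String key = m.srcAddr + "->" + m.dstAddr;
            Match current = best.get(key);
            if (current == null || scoreOf(m) > scoreOf(current)) {
                best.put(key, m);
            }
        }
        return new ArrayList<>(best.values());
    }
}
```

Because the key is the address pair and not the correlator, two correlators that report the same pair collapse into a single row.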
Backward compatibility
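- Existing (schema v0) sessions are upgraded automatically on open; missing scores are automatically computed and persisted, so upgraded sessions get full scores on first open
- Matches with no stored score (null) pass through the Similarity filter instead of being rejected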
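The pass-through behaviour for missing scores amounts to a single null check in the range predicate. This is a sketch with hypothetical names, assuming an inclusive score range; it is not SimilarityFilter's actual code.

```java
class SimilarityRangeSketch {
    /**
     * Returns true when a match should stay visible.
     * A null score (nothing stored yet) passes through and is not rejected.
     */
    static boolean passes(Double score, double lower, double upper) {
        if (score == null) {
            return true;
        }
        return score >= lower && score <= upper;
    }
}
```

Letting null through keeps matches visible while their scores are still missing, for example before a backfill has run.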
Changes (15 files, 846 insertions, 13 deletions)
Core DB schema
- VTMatchTableDBAdapter.java: Add PDIFF_SIMILARITY_SCORE_COL, bump schema 0→1
- VTMatchTableDBAdapterV0.java: Accept v0 and v1, auto-upgrade v0→v1, write new column
- VTMatchDB.java: Read/write stored PDiff score
- VTSessionDB.java: Wrap adapter init in transaction (for upgrade), backfill on program open
Domain model
- VTMatch.java: Add getPdiffSimilarityScore() interface method
- VTMatchInfo.java: Add pdiffSimilarityScore field with getter/setter
Score computation
- VTMatchSetDB.java: Compute PDiff score in addMatch() for FUNCTION matches
- BasicBlockMnemonicFunctionBulker.java: Per-basic-block mnemonic hashing + combined similarity
- FunctionBulker.java: Interface for function hash strategies
UI components
- AbstractVTMatchTableModel.java: New Similarity column reading stored value
- SimilarityFilter.java: New filter by PDiff score range
- BestPDiffMatchFilter.java: New deduplication filter across correlators
- VTMatchTableModel.java: Register Similarity column
- VTMatchTableProvider.java: Register Similarity and BestPDiff filters
Support
- MatchMapper.java: Delegate getPdiffSimilarityScore() for implied matches
Test plan
- ./gradlew :VersionTracking:test
- ./gradlew jar