Fix nil pointer panic in Raft-dependent methods during startup (issue #1702)#1831

Merged
vcastellm merged 5 commits into main from copilot/investigate-issue-1702 on Dec 11, 2025

Copilot AI commented Oct 27, 2025

After a node restart, Dkron panics with "runtime error: invalid memory address or nil pointer dereference", which causes connection delays of 5+ minutes. The HTTP server starts before Raft is initialized, so an API request that arrives in that window can call a Raft-dependent method while a.raft is still nil.
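
As a rough illustration of the failure mode described above, the sketch below calls an unguarded method while the Raft handle is still nil and recovers the resulting panic. The `raftNode` type and the `callDuringStartup` helper are hypothetical stand-ins written for this example, not Dkron or hashicorp/raft code.

```go
package main

import "fmt"

// raftNode is a hypothetical stand-in for *raft.Raft.
type raftNode struct{ state string }

// State reads a field, so it dereferences the receiver.
func (r *raftNode) State() string { return r.state }

// Agent mirrors only the field relevant to this example.
type Agent struct{ raft *raftNode }

// IsLeader is the unguarded form: it panics when a.raft is nil.
func (a *Agent) IsLeader() bool { return a.raft.State() == "Leader" }

// callDuringStartup (hypothetical helper) simulates an API request
// reaching the agent and converts any panic into an error.
func callDuringStartup(a *Agent) (leader bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a.IsLeader(), nil
}

func main() {
	// Raft has not been initialized yet, so the field is nil.
	_, err := callDuringStartup(&Agent{})
	fmt.Println(err)
}
```

In a real server the panic would surface inside the request path rather than in a helper like this; the recover wrapper is only there to make the behavior observable.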

Changes

Added nil checks to Raft-dependent methods. When a.raft is nil, each method now behaves as follows:

  • IsLeader() - returns false
  • leaderMember() - returns ErrLeaderNotFound
  • Leader() - returns empty ServerAddress
  • applySetJob() - returns error
  • RaftApply() - returns nil
  • Stop() - guards raft.Shutdown() call
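
The guard pattern in the list above can be sketched roughly as follows. This is a simplified illustration, not the merged code: `ServerAddress` and `raftNode` are stand-ins for the hashicorp/raft types, `leaderAddress` is a hypothetical analogue of leaderMember(), and the error message text is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// ServerAddress and raftNode are stand-ins, not the real raft API.
type ServerAddress string

type raftNode struct{ leader ServerAddress }

func (r *raftNode) Leader() ServerAddress { return r.leader }

// ErrLeaderNotFound reuses the name from the list; the message is assumed.
var ErrLeaderNotFound = errors.New("no leader found")

type Agent struct{ raft *raftNode }

// Leader returns an empty address while Raft is not initialized.
func (a *Agent) Leader() ServerAddress {
	if a.raft == nil {
		return ""
	}
	return a.raft.Leader()
}

// leaderAddress (hypothetical) returns ErrLeaderNotFound instead of panicking.
func (a *Agent) leaderAddress() (ServerAddress, error) {
	if a.raft == nil {
		return "", ErrLeaderNotFound
	}
	addr := a.raft.Leader()
	if addr == "" {
		return "", ErrLeaderNotFound
	}
	return addr, nil
}

func main() {
	starting := &Agent{} // Raft not yet initialized
	_, err := starting.leaderAddress()
	fmt.Printf("starting: leader=%q err=%v\n", starting.Leader(), err)

	ready := &Agent{raft: &raftNode{leader: "10.0.0.1:8946"}}
	addr, err := ready.leaderAddress()
	fmt.Printf("ready: leader=%q err=%v\n", addr, err)
}
```

Returning a zero value or a sentinel error lets callers treat "Raft not ready" the same way they already treat "no leader elected", which is presumably why the list maps each method to its existing failure value.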

Added tests:

  • Unit test verifying methods handle nil raft gracefully
  • Integration test simulating API access during early startup
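
A test along the lines of the second bullet might look like the sketch below, which sends an in-memory request to a handler while the Raft handle is nil and checks that nothing panics. The handler, the `/v1/leader` path as used here, and the `probe` helper are assumptions made for illustration; the actual tests in this PR may be structured differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// raftNode is a hypothetical stand-in for *raft.Raft.
type raftNode struct{ leader bool }

func (r *raftNode) IsLeader() bool { return r.leader }

type Agent struct{ raft *raftNode }

// IsLeader is the guarded form: false while Raft is uninitialized.
func (a *Agent) IsLeader() bool {
	if a.raft == nil {
		return false
	}
	return a.raft.IsLeader()
}

// leaderHandler is a hypothetical endpoint that reports leadership.
func leaderHandler(a *Agent) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintf(w, "leader=%t", a.IsLeader())
	}
}

// probe sends one in-memory request and reports whether the handler panicked.
func probe(h http.Handler) (body string, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/v1/leader", nil))
	return rec.Body.String(), false
}

func main() {
	// Simulate early startup: the API is reachable, Raft is still nil.
	body, panicked := probe(leaderHandler(&Agent{}))
	fmt.Println(body, panicked)
}
```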

Example

Before:

func (a *Agent) IsLeader() bool {
    return a.raft.State() == raft.Leader  // panics if a.raft is nil
}

After:

func (a *Agent) IsLeader() bool {
    if a.raft == nil {
        return false
    }
    return a.raft.State() == raft.Leader
}

Fixes #1702

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • 127.0.0.10
    • Triggering command: /tmp/go-build1868015620/b001/dkron.test -test.testlogfile=/tmp/go-build1868015620/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=TestAgent|TestIsLeader -test.timeout=1m0s (packet block)
    • Triggering command: /tmp/go-build3610477801/b001/dkron.test -test.testlogfile=/tmp/go-build3610477801/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=API -test.timeout=2m0s (packet block)
  • 127.0.0.11
    • Triggering command: /tmp/go-build830898429/b001/dkron.test -test.testlogfile=/tmp/go-build830898429/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=TestAgentCommand_runForElection -test.timeout=1m0s (packet block)
    • Triggering command: /tmp/go-build1868015620/b001/dkron.test -test.testlogfile=/tmp/go-build1868015620/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=TestAgent|TestIsLeader -test.timeout=1m0s (packet block)
  • 127.0.0.12
    • Triggering command: /tmp/go-build3610477801/b001/dkron.test -test.testlogfile=/tmp/go-build3610477801/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=API -test.timeout=2m0s (packet block)
  • 127.0.0.14
    • Triggering command: /tmp/go-build3610477801/b001/dkron.test -test.testlogfile=/tmp/go-build3610477801/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=API -test.timeout=2m0s (packet block)
  • 127.0.0.15
    • Triggering command: /tmp/go-build3610477801/b001/dkron.test -test.testlogfile=/tmp/go-build3610477801/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=API -test.timeout=2m0s (packet block)
  • 127.0.0.21
    • Triggering command: /tmp/go-build3610477801/b001/dkron.test -test.testlogfile=/tmp/go-build3610477801/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=API -test.timeout=2m0s (packet block)
  • 127.0.0.22
    • Triggering command: /tmp/go-build3610477801/b001/dkron.test -test.testlogfile=/tmp/go-build3610477801/b001/testlog.txt -test.paniconexit0 -test.v=true -test.run=API -test.timeout=2m0s (packet block)

Original prompt

Investigate the possible cause #1702


Copilot AI and others added 3 commits October 27, 2025 18:01
…1702)

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Investigate possible cause of issue 1702" to "Fix nil pointer panic in Raft-dependent methods during startup (issue #1702)" Oct 27, 2025
Copilot AI requested a review from vcastellm October 27, 2025 18:10
vcastellm marked this pull request as ready for review December 11, 2025 08:32
vcastellm merged commit bde41eb into main Dec 11, 2025
2 checks passed
vcastellm deleted the copilot/investigate-issue-1702 branch December 11, 2025 08:32
codecov bot commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.62%. Comparing base (8b55676) to head (15ce925).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dkron/agent.go | 66.66% | 3 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1831      +/-   ##
==========================================
+ Coverage   31.39%   31.62%   +0.22%     
==========================================
  Files          76       76              
  Lines        6806     6817      +11     
==========================================
+ Hits         2137     2156      +19     
+ Misses       4375     4364      -11     
- Partials      294      297       +3     


Development

Successfully merging this pull request may close these issues.

Server not promptly detected after restart on a healthy 5-node cluster.
