Skip to content

Add comprehensive E2E test suite for Dkron API#1895

Merged
vcastellm merged 13 commits intomainfrom
feature/e2e-testing
Jan 29, 2026
Merged

Add comprehensive E2E test suite for Dkron API#1895
vcastellm merged 13 commits intomainfrom
feature/e2e-testing

Conversation

@vcastellm
Copy link
Copy Markdown
Member

@vcastellm vcastellm commented Jan 25, 2026

  • Create e2e test infrastructure using Bats (Bash Automated Testing System)
  • Add Docker Compose setup for isolated e2e testing environment
  • Add 71 tests covering:
    • Health, cluster, leader, and debug endpoints (19 tests)
    • Job CRUD operations including timezone, retries, concurrency (24 tests)
    • Job execution and scheduling (17 tests)
    • Pause/unpause functionality (11 tests)
  • Add helper functions for API requests, response parsing, and assertions
  • Add test runner script with options for filtering, verbose output, and debugging
  • Add Makefile targets: e2e, e2e-quick, e2e-debug

Key findings documented in tests:

  • Disabled jobs cannot be manually triggered via /run endpoint
  • Pause only blocks new job creation, not scheduled execution
  • Timezone jobs require 6-field cron expressions (with seconds)

Proposed changes

Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request.
If it fixes a bug or resolves a feature request, be sure to link to that issue.

Types of changes

What types of changes does your code introduce?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Summary by CodeRabbit

  • New Features

    • 30-day execution statistics API and dashboard chart; plugin health exposed in health/leader views; agent run automatic retry with exponential backoff.
  • UI

    • Redesigned dashboard, themed admin UI, new execution-stats chart, updated login and various visual enhancements.
  • Tests

    • New comprehensive E2E test suite, test runner, helpers, CI workflow, and Makefile targets for running/debugging tests.
  • Chores

    • Test infrastructure (Docker-based) and dependency updates.

✏️ Tip: You can customize this high-level summary in your review settings.

- Create e2e test infrastructure using Bats (Bash Automated Testing System)
- Add Docker Compose setup for isolated e2e testing environment
- Add 71 tests covering:
  - Health, cluster, leader, and debug endpoints (19 tests)
  - Job CRUD operations including timezone, retries, concurrency (24 tests)
  - Job execution and scheduling (17 tests)
  - Pause/unpause functionality (11 tests)
- Add helper functions for API requests, response parsing, and assertions
- Add test runner script with options for filtering, verbose output, and debugging
- Add Makefile targets: e2e, e2e-quick, e2e-debug

Key findings documented in tests:
- Disabled jobs cannot be manually triggered via /run endpoint
- Pause only blocks new job creation, not scheduled execution
- Timezone jobs require 6-field cron expressions (with seconds)
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 25, 2026

📝 Walkthrough

Walkthrough

Adds end-to-end testing infra and suites; implements per-day execution statistics (store, API, UI); adds gRPC AgentRun retry/backoff with configurable settings and tests; tracks plugin clients for health reporting; various UI/theme refinements and dependency bumps.

Changes

Cohort / File(s) Summary
E2E Testing Infrastructure
Makefile, e2e/run-e2e-tests.sh, e2e/docker-compose.e2e.yml, e2e/lib/helpers.bash, e2e/tests/*, .github/workflows/e2e.yml
Adds Make targets (e2e, e2e-quick, e2e-debug), a test runner with dependency checks/flags, docker-compose test cluster with healthcheck, a rich Bash helper library for API interactions and waits, four bats suites covering health, jobs CRUD, executions, pause/unpause, and a CI workflow.
Execution Statistics (storage → API → UI)
dkron/execution_stats.go, dkron/store.go, dkron/storage.go, dkron/execution_stats_test.go, dkron/api.go, website/static/openapi/openapi.yaml, ui/src/dashboard/ExecutionStatsChart.tsx, ui/src/dashboard/Dashboard.tsx
Adds ExecutionStat types and totals helper; store methods to increment and retrieve per-day stats; integrates best-effort stat updates on execution completion; exposes GET /v1/stats?days= via API and OpenAPI; adds tests and a 30-day execution stats chart in the dashboard.
Plugin Client Tracking & Health
cmd/plugins.go, cmd/agent.go, dkron/agent.go, dkron/options.go, dkron/api.go, dkron/api_test.go
Introduces PluginClients maps in Plugins and Agent, updates pluginFactory to return plugin clients, populates clients during discovery, surfaces plugin health in /health response, and adds a health endpoint unit test.
gRPC Agent Retry Logic
dkron/grpc_client.go, dkron/grpc_test.go, dkron/config.go
Refactors AgentRun into single-attempt function with a retry loop and exponential backoff; adds isRetryableError classification; introduces config flags for max retries and intervals; adds tests for retryability.
UI / Theming / Components
ui/src/*, ui/src/theme.ts, ui/index.html, ui/package.json, dkron/ui-dist/*
Multiple UI improvements: new lightTheme, dashboard rewrite, ExecutionStatsChart, revamped Job/Execution views, themed layout, styles, font changes, and two new frontend deps (date-fns, query-string). Updated built UI assets.
E2E Docker Compose & CI
e2e/docker-compose.e2e.yml, .github/workflows/e2e.yml
Adds a minimal Dkron cluster compose file with healthcheck and a GitHub Actions workflow to run e2e via Make, collect logs on failure, and always teardown.
Storage Interface / Store Additions
dkron/storage.go, dkron/store.go
Extends Storage interface with GetExecutionStats and IncrementExecutionStat; implements persistent per-day stat keys, format helpers, and retrieval over day ranges.
Misc / Module bumps
go.mod, Makefile, dkron/ui-dist/index.html
Bumps several Go modules, updates UI build artifacts, and adds Makefile e2e targets.
Tests
dkron/execution_stats_test.go, dkron/grpc_test.go, dkron/api_test.go, e2e/tests/*
Adds unit tests for execution stats and gRPC retry logic and a health endpoint test; adds multiple e2e bats suites.

Sequence Diagram(s)

sequenceDiagram
    participant Frontend
    participant API as API Server
    participant Store as Persistent Store
    participant JobExec as Job Execution Path

    Frontend->>API: GET /v1/stats?days=30
    API->>Store: GetExecutionStats(days)
    Store-->>API: {stats: [{date, success_count, failed_count}, ...]}
    API-->>Frontend: JSON payload

    JobExec->>API: SetExecutionDone(execution)
    API->>Store: Persist execution
    API->>Store: IncrementExecutionStat(date, success)
    Store-->>API: ACK (best-effort)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐇 I hop through logs and midnight queues,

Counting wins and those that lose,
E2E guards the warren gate,
Retries nudge me when things wait,
Charts bloom bright — the dashboard chews.

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is mostly complete with detailed information about changes, but the required 'Types of changes' section has no checkbox marked, which is a missing required element. Mark at least one checkbox in the 'Types of changes' section. This appears to be a 'New feature' (adding E2E test infrastructure), so check that box.
Docstring Coverage ⚠️ Warning Docstring coverage is 56.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main objective of the PR: adding a comprehensive end-to-end test suite for the Dkron API using Bats.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@e2e/run-e2e-tests.sh`:
- Around line 234-264: In main(), add validation before consuming the --filter
argument to ensure a value exists and isn't another flag: check that $2 is
non-empty and does not begin with '-' (or otherwise validate its format) before
doing FILTER="$2" and shift 2; if the check fails, call log_error with a clear
message about the missing/invalid --filter value, call print_usage, and exit
with a non-zero status to avoid treating the next option as the filter value.

In `@e2e/tests/03_job_execution.bats`:
- Around line 319-331: The test is flaky because the second run waits a fixed
sleep instead of confirming completion; replace the fixed sleep with the same
polling helper used earlier: after calling api_post
"/v1/jobs/${TEST_JOB_NAME}/run" use wait_for_execution "$TEST_JOB_NAME" 30 (or
the existing timeout) to wait for the second execution to finish before fetching
executions with api_get and counting via get_response_body and
json_array_length; ensure you keep the first api_post + wait_for_execution and
remove the sleep 3 so both runs use wait_for_execution for deterministic
behavior.
- Around line 28-62: Update the first test titled "POST /v1/jobs/:job triggers
job execution" to call the bare endpoint instead of the /run variant: change the
api_post invocation that currently posts to "/v1/jobs/${TEST_JOB_NAME}/run" to
post to "/v1/jobs/${TEST_JOB_NAME}" (leave the surrounding setup, sleeps, and
assertions unchanged so the test exercises the alternate route as intended and
avoids duplicating the second test).

In `@Makefile`:
- Line 61: The .PHONY declaration is missing the e2e-debug target; update the
.PHONY line that currently lists docs apidoc test ui updatetestcert test-email
e2e e2e-quick to also include e2e-debug so the make target e2e-debug is treated
as phony; modify the .PHONY symbol in the Makefile to add "e2e-debug" alongside
the existing entries.
🧹 Nitpick comments (8)
e2e/lib/helpers.bash (3)

10-13: Remove unused color variables.

These color variables are defined but never used in the helper library. Based on static analysis hints, consider removing them or exporting if intended for use in test files.

♻️ Suggested fix
-# Colors for output
-RED='\033[0;31m'
-GREEN='\033[0;32m'
-YELLOW='\033[1;33m'
-NC='\033[0m' # No Color

If these are intended for future use, export them:

-RED='\033[0;31m'
-GREEN='\033[0;32m'
-YELLOW='\033[1;33m'
-NC='\033[0m' # No Color
+export RED='\033[0;31m'
+export GREEN='\033[0;32m'
+export YELLOW='\033[1;33m'
+export NC='\033[0m' # No Color

278-298: Consider using jq for safer JSON construction.

The heredoc approach for building JSON can break if $command contains double quotes, backslashes, or other JSON special characters. For example, create_shell_job "test" "@every 1s" 'echo "hello"' would produce malformed JSON.

♻️ Suggested fix using jq
 create_shell_job() {
     local name="$1"
     local schedule="$2"
     local command="$3"
     local disabled="${4:-false}"
     
     local job_json
-    job_json=$(cat <<EOF
-{
-    "name": "$name",
-    "schedule": "$schedule",
-    "disabled": $disabled,
-    "executor": "shell",
-    "executor_config": {
-        "command": "$command"
-    }
-}
-EOF
-)
+    job_json=$(jq -n \
+        --arg name "$name" \
+        --arg schedule "$schedule" \
+        --argjson disabled "$disabled" \
+        --arg command "$command" \
+        '{
+            name: $name,
+            schedule: $schedule,
+            disabled: $disabled,
+            executor: "shell",
+            executor_config: { command: $command }
+        }')
     
     api_post "/v1/jobs" "$job_json"
 }

118-123: Use --arg for safer jq value interpolation.

String interpolation in the jq filter can break or behave unexpectedly if $value contains quotes or special characters.

♻️ Suggested fix
 json_array_contains() {
     local json="$1"
     local field="$2"
     local value="$3"
-    echo "$json" | jq -e "map(select($field == \"$value\")) | length > 0" > /dev/null 2>&1
+    echo "$json" | jq -e --arg v "$value" "map(select($field == \$v)) | length > 0" > /dev/null 2>&1
 }
e2e/tests/04_pause_unpause.bats (1)

17-22: Consider portability of md5sum command.

On macOS, the command is md5 instead of md5sum. If tests need to run on macOS (e.g., developer machines), this could cause failures.

♻️ Portable alternative
 setup() {
     # Generate unique job name for each test
-    export TEST_JOB_NAME="e2e-pause-$(date +%s%N | md5sum | head -c 8)"
+    export TEST_JOB_NAME="e2e-pause-$(date +%s)-$$-$RANDOM"
     # Ensure we start unpaused
     api_post "/v1/unpause" "" > /dev/null 2>&1 || true
 }

Or use a portable hash approach:

export TEST_JOB_NAME="e2e-pause-$(echo "$$-$(date +%s%N)" | cksum | cut -d' ' -f1)"
e2e/tests/01_health_cluster.bats (1)

79-104: Consider using helper functions consistently for JSON operations.

Lines 84, 93, and 102 use direct jq calls while other tests use the json_get helper. For consistency and maintainability, consider wrapping these in helper functions or using a consistent approach throughout.

This is a minor style suggestion and doesn't affect correctness.

e2e/tests/03_job_execution.bats (2)

36-38: Consider using a wait/retry pattern instead of fixed sleeps.

The fixed sleep 2 waits for job persistence could cause flaky tests on slower systems. Consider a helper that polls for job existence similar to wait_for_execution.

This is a minor improvement suggestion—the current approach is functional for most environments.

Example helper function
wait_for_job() {
    local job_name="$1"
    local timeout="${2:-10}"
    local interval=1
    local elapsed=0
    
    while [ $elapsed -lt $timeout ]; do
        local response
        response=$(api_get "/v1/jobs/${job_name}" 2>/dev/null)
        local status
        status=$(get_status_code "$response")
        
        if [ "$status" = "200" ]; then
            return 0
        fi
        
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

195-199: Weak assertion with fallback pattern match.

The assertion [[ "$output" == *"$test_output"* ]] || [[ "$output" == *"e2e"* ]] has a very loose fallback. If the specific output isn't captured, matching any string containing "e2e" could pass incorrectly (e.g., if output contains the job name).

Consider removing the fallback or making it more specific:

Suggested fix
     # Output should contain our test string
-    [[ "$output" == *"$test_output"* ]] || [[ "$output" == *"e2e"* ]]
+    [[ "$output" == *"$test_output"* ]]
e2e/run-e2e-tests.sh (1)

14-22: Unused variable PROJECT_ROOT detected.

The variable PROJECT_ROOT is defined but never used in the script. Either remove it or use it where appropriate (e.g., in paths).

Suggested fix - remove if not needed
 # Get script directory
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
 
 # Configuration
 COMPOSE_FILE="${SCRIPT_DIR}/docker-compose.e2e.yml"

Comment on lines +234 to +264
main() {
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
--no-build)
BUILD=false
shift
;;
--keep)
KEEP_RUNNING=true
shift
;;
--filter)
FILTER="$2"
shift 2
;;
--verbose)
VERBOSE=true
shift
;;
--help|-h)
print_usage
exit 0
;;
*)
log_error "Unknown option: $1"
print_usage
exit 1
;;
esac
done
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Argument parsing missing validation for --filter value.

If --filter is provided without a value (e.g., at end of command line), shift 2 will fail or $2 will be empty/next option. Consider adding validation.

Suggested fix
             --filter)
+                if [[ -z "${2:-}" || "$2" == --* ]]; then
+                    log_error "--filter requires a pattern argument"
+                    exit 1
+                fi
                 FILTER="$2"
                 shift 2
                 ;;
🤖 Prompt for AI Agents
In `@e2e/run-e2e-tests.sh` around lines 234 - 264, In main(), add validation
before consuming the --filter argument to ensure a value exists and isn't
another flag: check that $2 is non-empty and does not begin with '-' (or
otherwise validate its format) before doing FILTER="$2" and shift 2; if the
check fails, call log_error with a clear message about the missing/invalid
--filter value, call print_usage, and exit with a non-zero status to avoid
treating the next option as the filter value.

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 28, 2026

@vcastellm I've opened a new pull request, #1897, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 28, 2026

@vcastellm I've opened a new pull request, #1898, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@vcastellm
Copy link
Copy Markdown
Member Author

@copilot create a GH Actions workflow to run the e2e tests in this PR

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 28, 2026

@vcastellm I've opened a new pull request, #1899, to work on those changes. Once the pull request is ready, I'll request review from you.

* Initial plan

* Add GitHub Actions workflow for E2E tests

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

* Fix workflow: remove invalid tags-ignore and use apt for bats

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

* Add explicit permissions to e2e workflow

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
@vcastellm
Copy link
Copy Markdown
Member Author

@copilot merge main and fix conflicts

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 28, 2026

@vcastellm I've opened a new pull request, #1902, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 3 commits January 28, 2026 21:09
* Initial plan

* Merge main branch into PR branch - resolve conflicts

Merged latest changes from main branch (commit 28f91f1) into the E2E test PR branch.

Conflict resolution:
- Accepted main branch version for all source files (Go code, UI, configs)
- Preserved E2E test infrastructure (e2e/ directory and all test files)
- Added e2e, e2e-quick, e2e-debug targets back to Makefile
- Kept .github/workflows/e2e.yml workflow file

New features from main now included:
- Execution statistics storage and dashboard graph
- Plugin health monitoring
- Retry logic with exponential backoff for AgentRun RPC
- Various dependency updates

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
* Initial plan

* Update first test to use bare endpoint /v1/jobs/:job

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
…#1898)

* Initial plan

* Replace sleep 3 with wait_for_execution for deterministic test behavior

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dkron/grpc_client.go (1)

3-17: Review retryable status codes against current gRPC guidance.

The current retry policy is overly aggressive. Current gRPC documentation recommends:

  • Safe to retry: Unavailable (explicitly "most likely transient")
  • Risky without conditions: DeadlineExceeded (client timeout ≠ request not executed; can cause duplicates), ResourceExhausted (should only retry for transient overload with backoff signal, not blind retry), Aborted (better handled at application level for concurrency control), Internal (gRFC A6 warns this may occur after side effects—method semantics matter; job execution is not safely idempotent)

For a distributed job scheduler, retrying DeadlineExceeded and Internal without additional guards risks executing jobs multiple times. Consider narrowing to Unavailable with exponential backoff, or add explicit guards (e.g., check for ResourceExhausted only if server sent retry pushback).

Also applies to: 505–529

cmd/plugins.go (1)

148-160: Kill the plugin client immediately on Client() or Dispense() errors to avoid subprocess leaks.

Even though config.Managed = true allows eventual cleanup via plugin.CleanupClients() at shutdown, the spawned plugin subprocess can linger between the error and program exit. Call client.Kill() immediately on both error paths (lines 153 and 158) to ensure timely cleanup.

🛠️ Suggested fix
	rpcClient, err := client.Client()
	if err != nil {
+		client.Kill()
		return nil, nil, err
	}

	raw, err := rpcClient.Dispense(pluginType)
	if err != nil {
+		client.Kill()
		return nil, nil, err
	}
🤖 Fix all issues with AI agents
In `@dkron/api_test.go`:
- Around line 612-638: The test leaks the agent returned by setupAPITest; add
cleanup to stop the agent like other tests that capture the variable `a` in
TestHealthEndpoint by calling the agent shutdown method (e.g., defer a.Stop() or
defer a.Shutdown() depending on which exists) immediately after setupAPITest
returns to ensure goroutines/ports are released.

In `@dkron/grpc_client.go`:
- Around line 435-503: The retry loop can be skipped when AgentRunMaxRetries is
negative causing lastErr to stay nil; clamp or validate maxRetries before the
loop (e.g., set maxRetries = 0 if AgentRunMaxRetries < 0) so at least one
AgentRun attempt occurs; update any derived values/fields that use maxRetries
(like "total_attempts") to reflect the clamped value and keep the rest of the
logic (backoff calc, lastErr handling, isRetryableError, logging) unchanged.

In `@e2e/tests/03_job_execution.bats`:
- Around line 152-173: The test "Failed job execution has success=false"
currently only checks that at least one execution exists; modify it to assert
the execution's success flag is false by parsing the executions response (from
api_get "/v1/jobs/${TEST_JOB_NAME}/executions" / get_response_body) and
verifying the latest/first execution object's "success" field equals false (use
the same helpers: get_response_body, json_array_length and a JSON extractor like
jq to read .[0].success or the last element). Keep the existing flow
(create_shell_job, api_post, wait_for_execution) and add a final assertion that
the execution's success value is false.

In `@go.mod`:
- Line 157: Find which direct module is pulling in
github.com/jhump/protoreflect/v2 (the indirect v2.0.0-beta.1) by inspecting
module graph (e.g., run go mod graph or go list -m all) and locate the direct
dependency in go.mod that references it; then try upgrading that direct
dependency to a newer release that depends on a non-beta protoreflect (or to a
version that no longer pulls v2), and if upgrading isn’t possible, pin the
transitive module to a stable version (e.g., github.com/jhump/protoreflect
v1.18.0) using an explicit require/replace in go.mod and run go mod tidy to
verify the beta v2 is no longer pulled.
🧹 Nitpick comments (2)
dkron/api.go (1)

583-598: Consider validating the days parameter bounds.

The OpenAPI spec defines days with minimum: 1 and maximum: 365, but the handler doesn't enforce these bounds. Invalid values could lead to unexpected behavior or performance issues with very large values.

♻️ Proposed fix to add bounds validation
 func (h *HTTPTransport) statsHandler(c *gin.Context) {
 	daysStr := c.DefaultQuery("days", "30")
 	days, err := strconv.Atoi(daysStr)
 	if err != nil {
 		days = 30
 	}
+	if days < 1 {
+		days = 1
+	} else if days > 365 {
+		days = 365
+	}
 
 	stats, err := h.agent.Store.GetExecutionStats(c.Request.Context(), days)
ui/src/dashboard/ExecutionStatsChart.tsx (1)

43-69: Consider handling the case where stats array is undefined or null.

If the API response doesn't include the stats field or returns null, accessing statsResponse.stats.map() will throw an error.

♻️ Proposed defensive check
                 const response = await httpClient(`${apiUrl}/stats?days=30`);
                 const statsResponse: ExecutionStatsResponse = response.json;
                 
-                const chartData: ChartDataPoint[] = statsResponse.stats.map((stat) => ({
+                const chartData: ChartDataPoint[] = (statsResponse.stats || []).map((stat) => ({
                     date: stat.date,
                     dateFormatted: dateFormatter(stat.date),
                     success: stat.success_count,
                     failed: stat.failed_count,
                     total: stat.success_count + stat.failed_count,
                 }));

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@vcastellm
Copy link
Copy Markdown
Member Author

@copilot In @go.mod at line 157, Find which direct module is pulling in
github.com/jhump/protoreflect/v2 (the indirect v2.0.0-beta.1) by inspecting
module graph (e.g., run go mod graph or go list -m all) and locate the direct
dependency in go.mod that references it; then try upgrading that direct
dependency to a newer release that depends on a non-beta protoreflect (or to a
version that no longer pulls v2), and if upgrading isn’t possible, pin the
transitive module to a stable version (e.g., github.com/jhump/protoreflect
v1.18.0) using an explicit require/replace in go.mod and run go mod tidy to
verify the beta v2 is no longer pulled.

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 29, 2026

@vcastellm I've opened a new pull request, #1903, to work on those changes. Once the pull request is ready, I'll request review from you.

@vcastellm
Copy link
Copy Markdown
Member Author

@copilot In @e2e/tests/03_job_execution.bats around lines 152 - 173, The test "Failed job
execution has success=false" currently only checks that at least one execution
exists; modify it to assert the execution's success flag is false by parsing the
executions response (from api_get "/v1/jobs/${TEST_JOB_NAME}/executions" /
get_response_body) and verifying the latest/first execution object's "success"
field equals false (use the same helpers: get_response_body, json_array_length
and a JSON extractor like jq to read .[0].success or the last element). Keep the
existing flow (create_shell_job, api_post, wait_for_execution) and add a final
assertion that the execution's success value is false.

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 29, 2026

@vcastellm I've opened a new pull request, #1904, to work on those changes. Once the pull request is ready, I'll request review from you.

@vcastellm
Copy link
Copy Markdown
Member Author

@copilot In @dkron/grpc_client.go around lines 435 - 503, The retry loop can be skipped
when AgentRunMaxRetries is negative causing lastErr to stay nil; clamp or
validate maxRetries before the loop (e.g., set maxRetries = 0 if
AgentRunMaxRetries < 0) so at least one AgentRun attempt occurs; update any
derived values/fields that use maxRetries (like "total_attempts") to reflect the
clamped value and keep the rest of the logic (backoff calc, lastErr handling,
isRetryableError, logging) unchanged.

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 29, 2026

@vcastellm I've opened a new pull request, #1905, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@dkron/api_test.go`:
- Around line 612-617: TestHealthEndpoint currently hardcodes port "8099", which
conflicts with other tests; change it to allocate a free port dynamically and
pass that port to setupAPITest. Use net.Listen on ":0" (or an existing helper
like GetFreePort if present) to obtain an available port, extract the actual
port string, build healthURL with that port, then call setupAPITest with the
dynamic port and proceed as before (referencing TestHealthEndpoint and
setupAPITest).
- Around line 640-641: There is an extra closing brace causing a syntax error at
the end of the test file; remove the stray `}` that follows the already-closed
TestHealthEndpoint function so the file ends with the proper closing of that
test and no additional brace; search for the TestHealthEndpoint function in
dkron/api_test.go and delete the lone `}` after its closing brace.

Comment on lines +612 to +617
func TestHealthEndpoint(t *testing.T) {
port := "8099"
healthURL := fmt.Sprintf("http://localhost:%s/health", port)
dir, a := setupAPITest(t, port)
defer os.RemoveAll(dir)
defer a.Stop() // nolint: errcheck
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Port "8099" is already in use by other tests.

Port 8099 is used by TestAPIJobCreateUpdateValidationBadTimezone (line 282) and TestAPIJobCreateUpdateValidationBadShellExecutorTimeout (line 295). This can cause flaky test failures when tests run in parallel.

🔧 Proposed fix: use a unique port
 func TestHealthEndpoint(t *testing.T) {
-	port := "8099"
+	port := "8112"
 	healthURL := fmt.Sprintf("http://localhost:%s/health", port)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestHealthEndpoint(t *testing.T) {
port := "8099"
healthURL := fmt.Sprintf("http://localhost:%s/health", port)
dir, a := setupAPITest(t, port)
defer os.RemoveAll(dir)
defer a.Stop() // nolint: errcheck
func TestHealthEndpoint(t *testing.T) {
port := "8112"
healthURL := fmt.Sprintf("http://localhost:%s/health", port)
dir, a := setupAPITest(t, port)
defer os.RemoveAll(dir)
defer a.Stop() // nolint: errcheck
🤖 Prompt for AI Agents
In `@dkron/api_test.go` around lines 612 - 617, TestHealthEndpoint currently
hardcodes port "8099", which conflicts with other tests; change it to allocate a
free port dynamically and pass that port to setupAPITest. Use net.Listen on ":0"
(or an existing helper like GetFreePort if present) to obtain an available port,
extract the actual port string, build healthURL with that port, then call
setupAPITest with the dynamic port and proceed as before (referencing
TestHealthEndpoint and setupAPITest).

@vcastellm
Copy link
Copy Markdown
Member Author

@copilot In @dkron/api_test.go:

  • Around line 612-638: The test leaks the agent returned by setupAPITest; add
    cleanup to stop the agent like other tests that capture the variable a in
    TestHealthEndpoint by calling the agent shutdown method (e.g., defer a.Stop() or
    defer a.Shutdown() depending on which exists) immediately after setupAPITest
    returns to ensure goroutines/ports are released.

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 29, 2026

@vcastellm I've opened a new pull request, #1906, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 4 commits January 29, 2026 21:37
…1906)

* Initial plan

* Fix syntax error in api_test.go - remove duplicate closing brace

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
* Initial plan

* Fix retry loop to ensure at least one attempt when maxRetries is negative

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
* Initial plan

* Enhance failed job execution test to assert success=false

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
* Initial plan

* Downgrade protoreflect from v1.18.0 to v1.17.0 to remove beta v2 dependency

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
@vcastellm
Copy link
Copy Markdown
Member Author

@copilot merge main and fix conflicts

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 29, 2026

@vcastellm I've opened a new pull request, #1907, to work on those changes. Once the pull request is ready, I'll request review from you.

* Initial plan

* Merge main branch into E2E test suite PR

Merged latest changes from main branch (commit 2b5f80b) into the E2E test PR branch.

Conflict resolution strategy:
- Accepted main branch version for all source files (Go code, UI, configs)
- Preserved E2E test infrastructure (e2e/ directory and all test files)
- Added e2e, e2e-quick, e2e-debug targets back to Makefile
- Kept .github/workflows/e2e.yml workflow file

New features from main now included:
- Redesigned UI with modern professional styling (#1894)
- Execution statistics storage and dashboard graph (#1896)
- Plugin health monitoring in /health endpoint (#1884)
- Retry logic with exponential backoff for AgentRun RPC (#1882)
- Various dependency updates

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

* Fix unresolved merge conflict in ui/src/layout/index.css

Replaced file with clean version from main branch to remove conflict markers.

Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dkron/grpc_client.go (1)

586-590: Defer inside loop captures variable by reference—may leak entries.

The defer on line 590 is inside the for loop and captures execution.Key() as a closure. Since execution is reassigned on each iteration (line 589), all accumulated defers will evaluate execution.Key() at function return time, using the final value. If the execution key changes across iterations, earlier keys stored in activeExecutions won't be deleted.

🛡️ Proposed fix to capture key immediately
 		// Registers an active stream
 		grpcc.agent.activeExecutions.Store(ars.Execution.Key(), ars.Execution)
 		grpcc.logger.WithField("key", ars.Execution.Key()).Debug("grpc: received execution stream")

 		execution = ars.Execution
-		defer grpcc.agent.activeExecutions.Delete(execution.Key())
+		execKey := execution.Key()
+		defer grpcc.agent.activeExecutions.Delete(execKey)
ui/src/jobs/JobShow.tsx (1)

66-126: Use useRefresh() to trigger UI update after fetching full output.

Lines 80-81 mutate the record object directly, but in react-admin, useRecordContext() only re-renders when the RecordContext value itself changes—not when the record object is mutated. This means the button won't disappear and the output won't display even after the fetch succeeds.

Similar components in the codebase (RunButton.tsx, ToggleButton.tsx) use useRefresh() to handle this. Import useRefresh and call it in the onSuccess handler to refetch and re-render the parent view:

Suggested fix
-import { useNotify, Button, useRecordContext } from "react-admin";
+import { useNotify, Button, useRecordContext, useRefresh } from "react-admin";
 const FullButton = ({ record }: any) => {
   const notify = useNotify();
+  const refresh = useRefresh();
   const [loading, setLoading] = useState(false);
   const handleClick = () => {
     setLoading(true);
     httpClient(`${apiUrl}/jobs/${record.job_name}/executions/${record.id}`)
       .then((response) => {
         if (response.status === 200) {
           notify("Success loading full output");
           return response.json;
         }
         throw response;
       })
       .then((data) => {
-        record.output_truncated = false;
-        record.output = data.output;
+        refresh();
       })
       .catch((e) => {
         notify("Error on loading full output", { type: "warning" });
       })
       .finally(() => {
         setLoading(false);
       });
   };
🤖 Fix all issues with AI agents
In `@ui/src/jobs/JobEdit.tsx`:
- Around line 31-45: JobEdit currently calls useRecordContext() before Edit
creates the provider and also doesn't accept routing props, causing
record/resource/id to be undefined; update JobEdit to accept props (e.g.,
function JobEdit(props)) and spread those props onto the Edit component (replace
the current {...record} usage), remove or move any premature useRecordContext()
usage so useRecordContext is only called inside children after Edit provides
context, and keep existing children/components (JobEditActions, EditForm)
intact.

In `@ui/src/layout/index.css`:
- Around line 121-129: Global CSS rules .RaSidebar-fixed and
.RaMenuItemLink-active use !important and override the theme-aware styles in
ThemedLayout.tsx; remove or migrate them so only one source controls the sidebar
styling. Fix by deleting or commenting out the .RaSidebar-fixed and
.RaMenuItemLink-active declarations in ui/src/layout/index.css, then move their
intended styles into the styled component(s) inside ThemedLayout.tsx (use the
component's theme-driven backgroundColor values and ensure class names/props
match), or conversely remove the styled component rules and keep the global
CSS—do not keep both with conflicting !important rules.
🧹 Nitpick comments (7)
ui/src/jobs/JobList.tsx (1)

130-171: Remember to rebuild UI assets before shipping.
Since UI source changed, run yarn build --out-dir ../dkron/ui-dist (or make ui) before building the main binary so embedded assets stay in sync.

ui/src/executions/BusyList.tsx (2)

11-11: Misleading function name for row selection logic.

The function selectRowDisabled returns false, which when passed to isRowSelectable means rows are not selectable. The name implies the opposite behavior. Consider renaming for clarity.

💡 Suggested rename for clarity
-const selectRowDisabled = () => false;
+const isRowSelectable = () => false;

Then update line 76:

-isRowSelectable={selectRowDisabled}
+isRowSelectable={isRowSelectable}

20-53: Consider using theme tokens for consistent styling.

The ListHeader uses hard-coded colors (e.g., #d69e2e, #b7791f) that match the warning palette defined in theme.ts. Using theme tokens would improve maintainability and ensure consistency if the palette changes.

💡 Example using theme
+import { useTheme } from '@mui/material/styles';

-const ListHeader = () => (
+const ListHeader = () => {
+  const theme = useTheme();
+  return (
   <Box ...>
     <Box
       sx={{
         ...
-        background: 'linear-gradient(135deg, `#d69e2e` 0%, `#b7791f` 100%)',
+        background: `linear-gradient(135deg, ${theme.palette.warning.main} 0%, ${theme.palette.warning.dark} 100%)`,
         ...
-        boxShadow: '0 4px 6px -1px rgba(214, 158, 46, 0.2)',
+        boxShadow: `0 4px 6px -1px ${theme.palette.warning.main}33`,
       }}
     >
ui/src/layout/ThemedLayout.tsx (1)

47-47: Consider typing the StyledSidebar props.

The any type on props reduces type safety. Consider using proper typing.

💡 Type-safe alternative
-const StyledSidebar = (props: any) => <CustomSidebar {...props} size={220} />;
+import { SidebarProps } from 'react-admin';
+const StyledSidebar = (props: SidebarProps) => <CustomSidebar {...props} size={220} />;
ui/src/layout/index.css (1)

34-52: Custom scrollbar styling is WebKit-only.

The ::-webkit-scrollbar pseudo-elements only work in Chromium-based browsers and Safari. Firefox users will see the default scrollbar. Consider adding Firefox support if consistent scrollbar appearance is important.

💡 Add Firefox scrollbar support
 .execution-output {
     ...
+    /* Firefox scrollbar */
+    scrollbar-width: thin;
+    scrollbar-color: `#4a5568` `#2d3748`;
 }
ui/src/App.tsx (1)

30-37: Ensure the UI build step runs before building the main binary.
This keeps embedded UI assets in sync with these changes.

As per coding guidelines: Build the UI with yarn build --out-dir ../dkron/ui-dist or make ui before building the main binary to ensure embedded UI assets are available.

ui/src/jobs/JobShow.tsx (1)

263-409: Reminder: rebuild UI assets after these UI changes.
Run the UI build step before producing the main binary so the embedded assets stay in sync.

As per coding guidelines: Build the UI with yarn build --out-dir ../dkron/ui-dist or make ui before building the main binary to ensure embedded UI assets are available.

Comment on lines 31 to +45
export const JobEdit = () => {
const record = useRecordContext();
return (<Edit {...record}>
<EditForm />
</Edit>);
return (
<Edit
{...record}
actions={<JobEditActions />}
sx={{
'& .RaEdit-main': {
mt: 2,
},
}}
>
<EditForm />
</Edit>
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the JobEdit.tsx file
find . -name "JobEdit.tsx" -type f

Repository: dkron-io/dkron

Length of output: 83


🏁 Script executed:

# Let's also check the file structure around UI
git ls-files ui/ | head -20

Repository: dkron-io/dkron

Length of output: 478


🏁 Script executed:

# Search for JobEdit usage and imports
rg "JobEdit" --type ts --type tsx -A 3 -B 3

Repository: dkron-io/dkron

Length of output: 85


🏁 Script executed:

# Check the full content of JobEdit.tsx
find . -name "JobEdit.tsx" -type f -exec wc -l {} \;

Repository: dkron-io/dkron

Length of output: 87


🏁 Script executed:

# Read the JobEdit.tsx file
cat -n ./ui/src/jobs/JobEdit.tsx

Repository: dkron-io/dkron

Length of output: 5406


🏁 Script executed:

# Search for JobEdit usage in the codebase
rg "JobEdit" -A 3 -B 3

Repository: dkron-io/dkron

Length of output: 1484


🏁 Script executed:

# Check App.tsx to understand routing/component hierarchy
cat -n ./ui/src/App.tsx | head -100

Repository: dkron-io/dkron

Length of output: 1839


JobEdit is losing react-admin routing props.

useRecordContext() has no provider at the point where it's called (it's above the Edit component which creates the provider), so record is undefined. The Edit component requires resource and id props from react-admin's routing to function. This breaks the edit view.

Accept the incoming props parameter and spread them onto Edit, matching the pattern already used correctly in JobCreate:

Diff
import { 
    Edit,
    SelectInput,
    TextInput,
    Create,
    SimpleForm,
    BooleanInput,
    NumberInput,
    DateTimeInput,
    required,
-   useRecordContext,
    TopToolbar,
    ListButton,
    ShowButton,
} from 'react-admin';

-export const JobEdit = () => {
-    const record = useRecordContext();
-    return (
-        <Edit 
-            {...record}
+export const JobEdit = (props: any) => (
+    <Edit 
+        {...props}
         actions={<JobEditActions />}
         sx={{
             '& .RaEdit-main': {
                 mt: 2,
             },
         }}
     >
         <EditForm />
     </Edit>
-    );
-}
+);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export const JobEdit = () => {
const record = useRecordContext();
return (<Edit {...record}>
<EditForm />
</Edit>);
return (
<Edit
{...record}
actions={<JobEditActions />}
sx={{
'& .RaEdit-main': {
mt: 2,
},
}}
>
<EditForm />
</Edit>
);
import {
Edit,
SelectInput,
TextInput,
Create,
SimpleForm,
BooleanInput,
NumberInput,
DateTimeInput,
required,
TopToolbar,
ListButton,
ShowButton,
} from 'react-admin';
export const JobEdit = (props: any) => (
<Edit
{...props}
actions={<JobEditActions />}
sx={{
'& .RaEdit-main': {
mt: 2,
},
}}
>
<EditForm />
</Edit>
);
🤖 Prompt for AI Agents
In `@ui/src/jobs/JobEdit.tsx` around lines 31 - 45, JobEdit currently calls
useRecordContext() before Edit creates the provider and also doesn't accept
routing props, causing record/resource/id to be undefined; update JobEdit to
accept props (e.g., function JobEdit(props)) and spread those props onto the
Edit component (replace the current {...record} usage), remove or move any
premature useRecordContext() usage so useRecordContext is only called inside
children after Edit provides context, and keep existing children/components
(JobEditActions, EditForm) intact.

Comment on lines +121 to +129
/* Sidebar styling */
.RaSidebar-fixed {
background-color: #1a365d !important;
}

/* Menu items in sidebar */
.RaMenuItemLink-active {
background-color: rgba(255, 255, 255, 0.1) !important;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's search for references to these classes
rg -n "RaSidebar|RaMenuItemLink" --type=css --type=ts --type=tsx -g '!node_modules'

Repository: dkron-io/dkron

Length of output: 85


🏁 Script executed:

#!/bin/bash
# Search for references to these classes across the codebase
rg -n "RaSidebar|RaMenuItemLink" -g '!node_modules'

Repository: dkron-io/dkron

Length of output: 50372


🏁 Script executed:

#!/bin/bash
# Read the specific lines from index.css
echo "=== index.css lines 120-130 ==="
sed -n '120,130p' ui/src/layout/index.css

echo -e "\n=== ThemedLayout.tsx lines 1-50 ==="
head -50 ui/src/layout/ThemedLayout.tsx

Repository: dkron-io/dkron

Length of output: 1802


Consolidate sidebar styling to avoid CSS cascade conflicts.

The global CSS rules for .RaSidebar-fixed and .RaMenuItemLink-active (lines 122-128) use !important and will override the styled component rules in ThemedLayout.tsx (lines 5-45), which don't have !important. This means the styled component's theme-aware backgroundColor settings won't actually apply—the hard-coded global CSS values will always win. Either move these styles into the styled component and remove the global CSS rules, or vice versa, but don't maintain both approaches simultaneously.

🤖 Prompt for AI Agents
In `@ui/src/layout/index.css` around lines 121 - 129, Global CSS rules
.RaSidebar-fixed and .RaMenuItemLink-active use !important and override the
theme-aware styles in ThemedLayout.tsx; remove or migrate them so only one
source controls the sidebar styling. Fix by deleting or commenting out the
.RaSidebar-fixed and .RaMenuItemLink-active declarations in
ui/src/layout/index.css, then move their intended styles into the styled
component(s) inside ThemedLayout.tsx (use the component's theme-driven
backgroundColor values and ensure class names/props match), or conversely remove
the styled component rules and keep the global CSS—do not keep both with
conflicting !important rules.

@vcastellm vcastellm merged commit 73fd28e into main Jan 29, 2026
7 checks passed
@vcastellm vcastellm deleted the feature/e2e-testing branch January 29, 2026 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants